The security scanning decision record: why the SAST/DAST integration you chose determines your vulnerability detection latency and your false-positive triage burden
The security scanner is added to CI the week after the SOC 2 auditor asks for evidence of automated vulnerability detection. Someone installs Semgrep or Snyk, runs it against the existing codebase, inherits 312 findings from the first scan, and merges a CI configuration that blocks builds on critical findings. The choice of scanner, the choice of rule sets, the choice of gate threshold, and the choice of who owns the remediation queue are all made in the same afternoon, under compliance pressure, without explicit reasoning about false-positive rate, triage overhead, or the relationship between the gate severity and delivery velocity. The decision is load-bearing and invisible — and it stays invisible until the queue has grown to 847 open findings, the security team and the engineering team are in a standoff about whose problem the queue is, and someone proposes disabling the scanner in CI because it blocks every merge.
The security scanning decision record makes the tool selection, the coverage scope, the false-positive policy, and the remediation ownership explicit before the queue becomes the organizational incident that supersedes the security program itself. This is not documentation for compliance purposes — a compliance auditor wants evidence of scanning, not an ADR. This is the record that allows a future security engineer or a future engineering lead to answer: what did we expect this scanner to find, how accurate did we expect it to be, who is responsible for each category of finding, and what constitutes an acceptable disposition of a finding that is not going to be fixed?
Two things that happen when the decision is not written down
The false-positive accumulation
A 40-person fintech company added Semgrep to its CI pipeline after a SOC 2 Type II audit identified "no automated static analysis tooling" as a gap. The security engineer added Semgrep with the default rule set — 847 rules covering Python, JavaScript, SQL injection, XSS, insecure randomness, hardcoded credentials, and a dozen other categories. The first scan returned 312 findings across the existing codebase. The security engineer opened 312 tickets in the issue tracker and sent an email to the engineering leads explaining that all findings needed to be reviewed.
No documentation recorded which rules had been enabled or why. No documentation recorded what the expected false-positive rate was for the default rule set against a Django and React codebase. No documentation recorded what the remediation SLA was for each finding severity. No documentation recorded who was responsible for triaging new findings generated on future pull requests. The CI configuration blocked merges on critical-severity findings; Semgrep's default rule set classified 23 of the 312 initial findings as critical.
Over the next six months, engineers opened pull requests, saw new Semgrep findings in the CI output, and responded in one of three ways: suppressed the finding with a # nosemgrep comment without documenting why the suppression was appropriate, left the finding unresolved and asked a senior engineer to review it (who also left it unresolved), or rewrote the code in a way that avoided the pattern Semgrep was matching — sometimes correctly because the code was genuinely vulnerable, and sometimes incorrectly because the code was not vulnerable and the engineer did not know it. Twelve months after the initial installation, the finding queue contained 847 open items. The engineering team estimated that 60-70% were false positives — phantom results from rules that matched Django ORM query patterns as potential SQL injection because the rule could not model the ORM's parameterization. The security team could not confirm the estimate because no baseline measurement of the false-positive rate had been taken when the scanner was configured. The queue was growing at approximately 30 new findings per week from pull request activity. Nobody knew which of the 847 findings represented genuine vulnerabilities that required remediation and which were phantom results that should be suppressed.
The scanner that was disabled
A 25-person developer tools company integrated Snyk into its Node.js monorepo CI pipeline to satisfy a requirement from an enterprise customer's vendor assessment. The integration ran snyk test on every pull request and blocked merge on any finding of medium severity or higher. Within the first week, the scanner was blocking 80% of pull requests. The cause was not that 80% of pull requests introduced vulnerabilities; it was that Snyk's dependency vulnerability database flagged three transitive dependencies that were present in every package in the monorepo. The three dependencies had published CVEs that Snyk's severity model rated as medium or higher. Two of the three CVEs were in code paths that the application did not exercise (the vulnerable function was exported by the library but never imported by the application). One CVE was in a development dependency that was not included in the production build.
No documentation recorded the rationale for blocking on medium-severity findings rather than high-severity findings. No documentation recorded whether the CVE scope — all findings from all transitive dependencies — was intentional or a default configuration that had not been evaluated against the codebase's actual dependency usage. No documentation recorded who was authorized to make the decision that a given CVE was acceptable risk, or what evidence was required to support that determination. The options were: update the three transitive dependencies (which required first updating the direct dependencies that pinned them, which required testing that the updated versions did not break anything), mark the CVEs as acceptable risk in Snyk's suppression interface (which required a documented rationale that the security engineer was not present to write), or disable Snyk's CI gate.
The lead engineer disabled the Snyk CI gate to unblock delivery. The snyk test command was moved from a blocking CI step to a nightly scheduled scan with results posted to a Slack channel. Within a month, nobody was reading the Slack channel. The enterprise customer's vendor assessment had passed — the evidence submitted was the initial integration configuration, not the current state — and the operational pressure to maintain the scanning program had dissipated. The security scanning program that had been designed to detect supply chain vulnerabilities in a production Node.js application was producing nightly Slack messages that nobody read, and the three CVEs that had triggered the original crisis were still present in the dependency tree, unresolved, eighteen months later.
Both outcomes share the same root cause: the security scanning configuration was never documented as a decision with explicit reasoning about what the scanner was supposed to find, what the acceptable false-positive rate was, and who was empowered to make disposition decisions on findings. Without that documentation, the scanning program could not be calibrated — the gate threshold was either too strict (blocking delivery on phantom results) or, once disabled, nonexistent. The path between those extremes requires documented trade-off reasoning that cannot be recovered from the CI configuration file.
Three structural properties that are set at scanner selection time
1. The detection coverage surface and its inherent blind spots
The security scanner's detection coverage surface is determined by the combination of the scanning approach and the specific rule set or vulnerability database it uses. The approaches are static application security testing (SAST), dynamic application security testing (DAST), software composition analysis (SCA), or a combination. Each approach covers a different portion of the vulnerability space and leaves a different portion uncovered. SAST detects vulnerabilities in the application's source code by analyzing code patterns without executing the application; it is effective for injection vulnerabilities, hardcoded credentials, insecure algorithm selection, and path traversal where the vulnerable code pattern is present in the source. It is structurally unable to detect vulnerabilities that depend on runtime behavior: authentication bypass that depends on JWT validation timing behavior, session fixation that depends on cookie attribute handling in the HTTP framework, server-side request forgery that depends on the interaction between user input and a URL-fetching library at runtime. DAST detects vulnerabilities by probing a running application; it is effective for those runtime-dependent vulnerability categories but requires a deployed environment and produces findings that are harder to map back to specific code locations for remediation. SCA detects known vulnerabilities in third-party dependencies by matching the versions in the dependency tree against a vulnerability database; it covers published CVEs and says nothing about flaws in the application's own code.
The inherent blind spots are not defects in the tools — they are structural properties of the approach. SAST will always have false positives because static analysis cannot model runtime behavior precisely: a SAST rule for SQL injection will match an ORM query that looks structurally similar to a raw string-concatenated query even if the ORM's parameterization makes the actual query safe. The false-positive rate is not a number that can be reduced to zero by choosing a better scanner; it is a property of the analysis approach applied to a specific codebase, and it varies based on the codebase's use of frameworks that the scanner models with varying accuracy. Document the expected false-positive rate for the chosen scanner and rule set at selection time, measured against the existing codebase rather than quoted from vendor marketing materials. The measured rate is the rate that determines the triage burden for the remediation queue — and the triage burden is what determines whether the security program is operationally sustainable.
The coverage surface also determines which attack categories the scanning program cannot detect and therefore what compensating controls are required. A SAST-only program with no DAST component does not detect runtime-dependent authentication vulnerabilities; the compensating control may be a manual penetration test on a defined cadence or a security-focused code review checklist for authentication-adjacent code changes. Document the compensating controls explicitly so that the security posture can be evaluated as a complete picture rather than as "the scanner handles security." The security ADR for threat model and compliance defines the threat model; the security scanning decision record documents which threats the automated scanning addresses and which require other controls.
2. The false-positive rate and its effect on the remediation queue's signal-to-noise ratio
The false-positive rate determines the ratio of genuine vulnerabilities to phantom results in the remediation queue, and this ratio determines whether engineers treat new findings as urgent security issues or as scanner noise. A queue where 70% of findings are phantom results does not simply have a 70% false-positive problem; it trains engineers to assume that any given finding is probably phantom. Genuine critical vulnerabilities then receive the same skeptical treatment as the phantom results and may sit in the queue at the same priority level as findings that an engineer has already decided are probably not real.
The false-positive rate is not fixed at selection time — it is a function of the rule set configuration relative to the codebase's patterns. A Semgrep rule for SQL injection against raw string concatenation in SQL queries will produce zero false positives in a codebase that uses only ORM queries, because the pattern the rule matches never appears. The same rule will produce a high false-positive rate in a codebase that constructs some queries via ORM and others via a query builder that uses parameterized templates — the rule will match the query builder's template syntax as potential injection because the rule cannot distinguish between a parameterized template and a string-concatenated template without full data-flow analysis. The false-positive rate is reduced by curating the rule set to exclude rules that produce high false-positive rates on the specific codebase, not by selecting a scanner with a lower stated false-positive rate in vendor documentation. Document which rules were disabled and why — both to justify the coverage gap they leave and to ensure that a future rule set update does not silently re-enable rules that were deliberately excluded.
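One way to keep a rule set update from silently re-enabling excluded rules is to compare the active rule IDs against the documented exclusion list on every update. The following is a minimal sketch, assuming the exclusions are stored as a mapping from rule ID to rationale; the rule IDs and function names are hypothetical and do not come from any scanner's API.

```python
# Hypothetical exclusion list kept next to the decision record:
# rule ID -> documented reason the rule was disabled.
EXCLUDED_RULES = {
    "example.sql-injection.string-format": "Matches parameterized query-builder templates",
    "example.insecure-random": "Only fires in test fixtures",
}


def find_reenabled_rules(active_rule_ids, excluded_rules):
    """Return excluded rule IDs that are present in the active rule set."""
    return sorted(set(active_rule_ids) & set(excluded_rules))


def find_undocumented_exclusions(excluded_rules):
    """Return rule IDs whose exclusion has no written rationale."""
    return sorted(rule for rule, reason in excluded_rules.items() if not reason.strip())


# Illustrative rule set after an update (made-up IDs).
active_after_update = [
    "example.xss.unsafe-html",
    "example.sql-injection.string-format",
]
reenabled = find_reenabled_rules(active_after_update, EXCLUDED_RULES)
for rule_id in reenabled:
    print(f"{rule_id} was re-enabled; documented exclusion: {EXCLUDED_RULES[rule_id]}")
```

Running a check like this in CI turns the exclusion list from a note in the record into something that fails visibly when the rule set drifts.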
The triage cost of a finding is the time required to determine whether a finding is a genuine vulnerability (requiring remediation) or a phantom result (requiring suppression with a documented rationale). For a finding that requires code reading and understanding of the data flow through the application, triage takes 30 to 90 minutes. For a finding in an unfamiliar part of the codebase, triage takes longer, because the engineer must first understand what the code does before determining whether the pattern the scanner matched is actually exploitable. Document the expected triage cost per finding category in the decision record, so that the queue's operational overhead can be estimated and staffed. Every finding must be triaged before it can be classified, so a scanning configuration that generates 50 new findings per month at an average of 45 minutes each requires approximately 50 × 45 minutes = 37.5 engineer-hours of triage per month to maintain a current queue. With a 60% false-positive rate, 50 × 0.6 × 45 minutes = 22.5 of those hours are spent on the 30 phantom results, and the remediation work for the 20 genuine findings comes on top. An organization that adds a scanning program without estimating this cost will discover the cost operationally, after the queue has grown to the point where the triage backlog itself is a risk because genuine findings are buried in untriaged findings.
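The estimate above can be reproduced with a few lines of arithmetic. This is a minimal sketch using the paragraph's own figures (50 findings per month, a 60% false-positive rate, 45 minutes per finding); the function name is hypothetical, and it reports both the triage hours across all findings and the share spent on phantom results.

```python
def monthly_triage_hours(findings_per_month, false_positive_rate, minutes_per_finding):
    """Estimate monthly triage effort.

    Returns (total_hours, phantom_hours, genuine_count), where phantom_hours
    is the part of total_hours spent on findings that turn out to be phantom.
    """
    total_hours = findings_per_month * minutes_per_finding / 60
    phantom_hours = total_hours * false_positive_rate
    genuine_count = round(findings_per_month * (1 - false_positive_rate))
    return total_hours, phantom_hours, genuine_count


total, phantom, genuine = monthly_triage_hours(50, 0.6, 45)
print(f"triage hours, all findings: {total:.1f}")
print(f"triage hours, phantom results only: {phantom:.1f}")
print(f"genuine findings left to remediate: {genuine}")
```

Replacing the single 45-minute figure with a per-category value gives the per-category estimate the decision record asks for.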
3. The remediation queue ownership and the escalation model for contested findings
The remediation queue ownership question — who is responsible for triaging new findings, who is authorized to suppress findings as acceptable risk, and who is responsible for ensuring that critical findings are remediated within the SLA — is the question that determines whether the security scanning program produces security improvements or produces organizational conflict. In the absence of explicit ownership documentation, the default outcome is shared-ownership deadlock: the security team believes engineering is responsible for remediating all findings; the engineering team believes the security team is responsible for triaging findings and determining which are genuine; and findings accumulate in the queue while each team waits for the other to take the first action.
Effective remediation queue ownership requires three documented roles: the triage owner (who reviews new findings, makes the initial genuine-vs-phantom determination, and assigns genuine findings to the appropriate engineering team or accepts them as acceptable risk with documented rationale); the remediation assignee (the engineering team that owns the code where the finding was identified, responsible for fixing it within the severity's SLA or escalating a remediation timeline exception); and the exception approver (who is authorized to extend a remediation SLA or formally accept a finding as acceptable risk). Without all three roles documented, the triage and remediation processes cannot execute — there is no defined handoff between the scanner output and the remediation action, and the queue grows because the process for draining it was never specified.
The escalation model for contested findings — findings where the triage owner determines the finding is genuine but the remediation assignee disputes the determination — requires a documented resolution path. Security engineers and software engineers regularly disagree about whether a specific code pattern is exploitable in context. A SQL query constructed from user input that passes through the application's input validation layer may be flagged by a SAST rule as potential SQL injection; the engineer who wrote the input validation argues that the validation makes the query safe; the security engineer argues that the validation may have edge cases that the scanner cannot model. This dispute requires a documented resolution process: who makes the final determination (a senior security engineer with a defined scope, a third-party security review, or a documented risk acceptance by the engineering lead), and what evidence is required to support the engineer's claim that the validation is sufficient. Without a documented resolution process, contested findings persist in the queue indefinitely — technically open, practically ignored — until they are resolved by a security incident that confirms or refutes the engineer's claim about the validation's adequacy.
The five ADR sections for a security scanning decision
1. Scanner selection and scanning approach rationale
Document the chosen scanning approach — SAST only, DAST only, SCA only, or a combination — with explicit reasoning about what each approach covers and what it leaves uncovered. For each scanning category (SAST, DAST, SCA), document whether it was implemented, deferred, or deliberately excluded, and the reasoning for each disposition. A team that implements SAST and SCA but defers DAST should document why DAST is deferred (no deployed staging environment, no automated browser-level test harness to drive the DAST scanner, the threat model does not include the vulnerability categories DAST specializes in), and what compensating control covers the DAST-addressable vulnerability categories in the interim (manual penetration test on an annual cadence, security-focused code review checklist for authentication-adjacent changes).
For each scanner tool selected, document the alternatives that were evaluated with explicit rejection reasoning. Alternatives to Semgrep include CodeQL (deeper data-flow analysis, higher accuracy, significantly slower scan time — ruled out because 15-minute scan time on the monorepo would block PR feedback loops), Checkmarx (enterprise pricing at $X per developer per year — ruled out on cost until the team exceeds 50 engineers), and Bandit (Python-only, incompatible with the polyglot codebase). The rejection reasoning documents both the property that made each alternative unsuitable and the threshold that would make it worth reconsidering — so that a future security engineer who is evaluating whether the original scanner selection still fits the team's scale has the information to make a calibrated comparison rather than a fresh evaluation from scratch. The CI/CD pipeline decision record governs the overall pipeline architecture within which the scanner executes; document the integration point explicitly (pre-merge check, post-merge scan, scheduled nightly scan, or a combination) and the reasoning for the chosen integration point.
2. Rule set configuration and false-positive management policy
Document the rule set configuration: which rule categories are enabled, which are disabled, and the rationale for each exclusion. For SAST scanners with configurable rule sets (Semgrep, SonarQube, Checkmarx), the default rule set is designed to be broad — it is intended to catch all patterns that might be vulnerable across all codebases using the scanner's supported languages, not to be calibrated for accuracy on a specific codebase. The accuracy on a specific codebase is determined by the codebase's use of frameworks, patterns, and idioms that the rules model with varying accuracy. Document the baseline false-positive rate measured on the existing codebase at the time of the scanner configuration: run the scanner against the full codebase, triage a random sample of 50 findings, and record the fraction that are genuine vulnerabilities versus phantom results. This measurement is the baseline that allows future calibration — if the false-positive rate measured six months later is significantly higher than the baseline, it indicates either that the codebase has adopted new patterns that the rule set models poorly or that new rules were added to the rule set without calibration against the codebase's patterns.
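The baseline measurement described above amounts to drawing a reproducible random sample and counting verdicts. The sketch below assumes findings are identified by string IDs and that manual triage yields a true/false verdict per sampled finding; the function names, the IDs, and the placeholder verdicts are illustrative only.

```python
import random


def draw_triage_sample(finding_ids, sample_size=50, seed=0):
    """Pick a reproducible random sample of findings to triage by hand."""
    rng = random.Random(seed)
    ordered = sorted(finding_ids)
    return rng.sample(ordered, min(sample_size, len(ordered)))


def false_positive_rate(verdicts):
    """verdicts maps finding ID -> True if genuine, False if phantom."""
    if not verdicts:
        raise ValueError("no triaged findings to measure")
    phantom = sum(1 for is_genuine in verdicts.values() if not is_genuine)
    return phantom / len(verdicts)


# 312 findings, as in the initial scan described earlier (IDs are made up).
all_findings = [f"finding-{n}" for n in range(312)]
sample = draw_triage_sample(all_findings)

# Placeholder verdicts for illustration; real verdicts come from manual triage.
verdicts = {finding_id: (i % 3 == 0) for i, finding_id in enumerate(sample)}
baseline = false_positive_rate(verdicts)
print(f"sampled {len(sample)} findings, baseline false-positive rate {baseline:.0%}")
```

Recording the seed and the sampled IDs in the decision record lets a later measurement be compared against the same procedure rather than against a recollection of it.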
Document the suppression policy: what justification is required to suppress a finding with a # nosemgrep comment, a Snyk .snyk ignore entry, or an equivalent scanner-specific mechanism. The minimum suppression justification should include: the finding ID and description, the reason the finding is a phantom result (the specific property of the code that makes the vulnerability pattern inapplicable), and the name of the engineer and the date of the suppression. Suppressions without justification are indistinguishable from engineers silencing scanner noise without evaluating whether the finding is genuine — and a codebase with hundreds of unjustified suppressions has, in effect, manually disabled the scanner's coverage of the suppressed code paths. The suppression policy should also document who reviews suppression justifications: a finding suppressed by the engineer who wrote the code under a time deadline receives less scrutiny than a finding suppressed after review by a second engineer or a security engineer. The review requirement is a process control on the accuracy of the suppression decision.
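A suppression policy is easier to enforce when unjustified suppressions can be found mechanically. The sketch below assumes a hypothetical team convention, not a Semgrep feature: every line carrying a # nosemgrep comment must be preceded by a comment line of the form "# suppression-rationale: reason | engineer | YYYY-MM-DD". The convention, the regular expressions, and the sample source are all assumptions for illustration.

```python
import re

NOSEMGREP = re.compile(r"#\s*nosemgrep\b")
# Hypothetical convention: rationale comment on the line above the suppression.
RATIONALE = re.compile(
    r"#\s*suppression-rationale:\s*[^|]+?\s*\|\s*[^|]+?\s*\|\s*\d{4}-\d{2}-\d{2}\s*$"
)


def unjustified_suppressions(source):
    """Return 1-based line numbers of nosemgrep comments lacking a rationale line."""
    lines = source.splitlines()
    missing = []
    for index, line in enumerate(lines):
        if not NOSEMGREP.search(line):
            continue
        previous = lines[index - 1].strip() if index > 0 else ""
        if not RATIONALE.search(previous):
            missing.append(index + 1)
    return missing


# Made-up source file content for illustration.
example = '''\
# suppression-rationale: ORM filter() parameterizes the value | a.engineer | 2024-01-15
rows = Model.objects.filter(name=name)  # nosemgrep
token = make_token(seed)  # nosemgrep
'''
print(unjustified_suppressions(example))
```

A check like this only confirms that a rationale exists; whether the rationale is correct still depends on the second-reviewer requirement described above.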
3. Severity gate threshold and delivery velocity trade-off
Document the gate threshold — the finding severity that blocks merge on a pull request — and the explicit reasoning about the trade-off between security coverage and delivery velocity. A gate that blocks on all severity levels (critical, high, medium, low, informational) maximizes the security signal that cannot be bypassed but creates a delivery velocity problem if the false-positive rate at medium and low severity is high: engineers will find paths around the gate rather than triaging every low-severity finding on every pull request. A gate that blocks only on critical severity minimizes the delivery velocity impact but allows high and medium severity genuine findings to merge without remediation, relying on the remediation queue process to address them asynchronously. Document the threshold chosen, the reasoning (including the expected false-positive rate at each severity level and its relationship to the gate decision), and the condition under which the threshold would be revisited — typically when the measured false-positive rate at the gate threshold exceeds a documented maximum, or when a genuine vulnerability at a severity level below the gate threshold is discovered in production.
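The gate threshold and its revisit condition can be written down as two small predicates. This is a sketch under assumptions: the severity names follow the list in the paragraph above, and the function and parameter names are hypothetical rather than taken from any scanner.

```python
# Severity levels from least to most severe, as listed in the text.
SEVERITY_ORDER = ["informational", "low", "medium", "high", "critical"]


def blocks_merge(finding_severity, gate_threshold):
    """A finding blocks merge when its severity is at or above the gate."""
    return SEVERITY_ORDER.index(finding_severity) >= SEVERITY_ORDER.index(gate_threshold)


def threshold_needs_review(measured_fp_rate, documented_max_fp_rate, escaped_below_gate):
    """Revisit the gate when false positives exceed the documented maximum,
    or when a genuine vulnerability below the gate reached production."""
    return measured_fp_rate > documented_max_fp_rate or escaped_below_gate


print(blocks_merge("high", "critical"))
print(blocks_merge("critical", "critical"))
print(threshold_needs_review(0.45, 0.30, escaped_below_gate=False))
```

Keeping the documented maximum false-positive rate as an explicit number is what makes the revisit condition checkable instead of a matter of opinion.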
Document the handling of legacy findings — findings that existed in the codebase before the scanner was introduced. A gate that blocks pull requests on any finding, including findings that predate the pull request, will block pull requests that do not introduce any new vulnerabilities, causing friction for engineers whose code is functionally unrelated to the legacy finding. The standard approach is a baseline suppression: all findings from the initial scan of the existing codebase are marked as accepted risk or suppressed with documented rationale, and the gate applies only to net-new findings introduced by the pull request's changes. Document whether baseline suppression was used, the date of the baseline, and the mechanism for distinguishing baseline-suppressed findings from findings introduced by new code changes. The feature flag decision record and other technology adoption decisions follow a similar pattern of managing legacy state separately from new requirements — the baseline suppression approach for security scanning is the same operational pattern applied to the specific case of existing codebase findings.
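One possible mechanism for distinguishing baseline findings from net-new ones is a fingerprint that ignores line numbers, so that unrelated edits shifting a legacy finding up or down the file do not make it look new. The sketch below assumes findings are dictionaries with rule_id, path, and snippet fields; those field names and the sample data are hypothetical, and real scanners ship their own baseline features that may work differently.

```python
import hashlib


def fingerprint(finding):
    """Stable identity for a finding that does not depend on its line number."""
    parts = (finding["rule_id"], finding["path"], finding["snippet"].strip())
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def net_new_findings(current_findings, baseline_findings):
    """Return findings in the current scan that are absent from the baseline."""
    baseline = {fingerprint(f) for f in baseline_findings}
    return [f for f in current_findings if fingerprint(f) not in baseline]


# Made-up scan results for illustration.
baseline_scan = [
    {"rule_id": "example.sql-injection", "path": "app/views.py",
     "line": 40, "snippet": "cursor.execute(query)"},
]
pull_request_scan = [
    # Same legacy finding, moved down seven lines by an unrelated edit.
    {"rule_id": "example.sql-injection", "path": "app/views.py",
     "line": 47, "snippet": "cursor.execute(query)"},
    {"rule_id": "example.hardcoded-secret", "path": "app/settings.py",
     "line": 12, "snippet": "API_KEY = 'abc'"},
]
new = net_new_findings(pull_request_scan, baseline_scan)
print([f["rule_id"] for f in new])
```

The trade-off of this fingerprint is that two identical snippets in the same file collapse into one identity, so a second copy of a legacy pattern would not be reported as new; the record should note whichever limitation the chosen mechanism has.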
4. Dependency vulnerability scanning scope and SCA configuration
Document the SCA configuration: which dependency trees are scanned, what the severity threshold is for findings in transitive (indirect) dependencies versus direct dependencies, and what the remediation policy is for CVEs in code paths that the application does not exercise. The scope question is consequential: a monorepo with 40 packages has a dependency tree that includes every package's dependencies transitively; a CVE in a library that is a transitive dependency of a development-only build tool is not a production security risk, but an SCA scanner configured to scan all dependencies including development dependencies will report it at the same severity as a CVE in a runtime production dependency. Document the scope explicitly — production dependencies only, all dependencies, or a curated subset — and the reasoning for the scope choice.
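The scope and threshold choices described above can be expressed as a filter over dependency findings. This is a minimal sketch, assuming each finding carries its dependency scope and whether the dependency is direct; the class, the field names, and the advisory IDs are placeholders and do not correspond to any scanner's output format or to real CVEs.

```python
from dataclasses import dataclass

SEVERITY_ORDER = ["low", "medium", "high", "critical"]


@dataclass(frozen=True)
class DependencyFinding:
    advisory_id: str   # placeholder IDs, not real CVEs
    package: str
    severity: str      # one of SEVERITY_ORDER
    scope: str         # "production" or "development"
    direct: bool       # False for transitive dependencies


def blocking_findings(findings, scanned_scopes, direct_threshold, transitive_threshold):
    """Return findings that are in scope and at or above their threshold."""
    blocking = []
    for finding in findings:
        if finding.scope not in scanned_scopes:
            continue
        threshold = direct_threshold if finding.direct else transitive_threshold
        if SEVERITY_ORDER.index(finding.severity) >= SEVERITY_ORDER.index(threshold):
            blocking.append(finding)
    return blocking


findings = [
    DependencyFinding("EXAMPLE-1", "web-framework", "medium", "production", True),
    DependencyFinding("EXAMPLE-2", "build-tool-helper", "high", "development", False),
    DependencyFinding("EXAMPLE-3", "parser-lib", "medium", "production", False),
]
blocked = blocking_findings(findings, {"production"}, "medium", "high")
print([f.advisory_id for f in blocked])
```

In this illustration only the direct production finding blocks; the development-only and transitive findings still exist and still belong in the queue, they just do not gate the merge.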
The code-path question is the most contested area in SCA triage. A CVE for a remote code execution vulnerability in a library function that the application never calls is not an exploitable vulnerability in the application, even though the vulnerable library version is present in the dependency tree. Confirming that the application never calls the vulnerable function requires either static analysis (which the SCA scanner may not perform) or manual code review. The remediation policy for CVEs in unexploited code paths should document the standard of evidence required to accept a CVE as unexploited (a code search showing no imports of the vulnerable function plus a review confirmation, or a SAST scan confirming no data flow to the vulnerable function), who is authorized to make that determination, and the review cadence for accepted-unexploited CVEs if the application's usage of the library changes. Without a documented policy, each CVE in an unexploited code path becomes a contested finding that blocks delivery while engineers and the security team debate the exploitability of a code path that neither party has fully analyzed. The dependency injection decision record documents the application's dependency management architecture; the SCA configuration should be designed in the context of that architecture, particularly for codebases where the dependency injection container determines which library code paths are instantiated at runtime.
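The standard of evidence, the approver, and the review cadence can be captured in one record per accepted CVE. The sketch below is an assumption about what such a record might hold; the class name, the fields, the 90-day default interval, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class UnexploitedAcceptance:
    advisory_id: str               # placeholder ID, not a real CVE
    package: str
    evidence: str                  # e.g. code search result plus reviewer confirmation
    approved_by: str               # person authorized to accept the risk
    accepted_on: date
    review_interval_days: int = 90  # assumed cadence, set by policy

    def is_complete(self):
        """An acceptance without evidence or an approver is not valid."""
        return bool(self.evidence.strip()) and bool(self.approved_by.strip())

    def review_due(self, today):
        """True once the review interval has elapsed since acceptance."""
        return today >= self.accepted_on + timedelta(days=self.review_interval_days)


acceptance = UnexploitedAcceptance(
    advisory_id="EXAMPLE-1",
    package="parser-lib",
    evidence="Code search shows no imports of the vulnerable function; confirmed in review",
    approved_by="security-lead",
    accepted_on=date(2024, 1, 1),
)
print(acceptance.is_complete(), acceptance.review_due(date(2024, 6, 1)))
```

The periodic review matters because the acceptance is only valid while the application continues not to call the vulnerable function.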
5. Scanning cadence, finding queue ownership, and remediation SLA
Document three interrelated operational parameters that determine whether the security scanning program is sustainable over time: the scanning cadence (when scans run and what triggers them), the finding queue ownership model (who triages findings, who remediates them, who resolves disputes), and the remediation SLA (maximum time between a finding's creation and its disposition — either remediated or formally accepted as risk with documented rationale).
The scanning cadence should cover pull request scanning, main branch scanning, and scheduled periodic scans independently. Pull request scanning catches vulnerabilities introduced by new code changes before they merge; main branch scanning catches vulnerabilities that were not introduced by a specific pull request (library version updates that introduce new CVEs, rule set updates that identify patterns in existing code); scheduled periodic scanning re-evaluates the full dependency tree against current CVE databases, catching newly disclosed CVEs in libraries whose version has not changed. Document the cadence for each scan type and the integration point where each type's findings enter the remediation queue.
Document the remediation SLA per severity tier — the maximum calendar time from a finding's creation to its disposition. A representative SLA structure: critical findings must be triaged within 24 hours and remediated within 7 days; high findings must be triaged within 7 days and remediated within 30 days; medium and low findings are triaged as queue capacity allows, with no hard SLA but a target of 90 days. Document whether the SLA applies to the triage decision (confirming the finding is genuine), the remediation completion (the fix is merged and deployed), or both. Document the exception process: when a critical finding cannot be remediated within 7 days because the remediation requires a dependency update that has not been validated in the test suite, who approves an extension and what compensating control (WAF rule, feature flag disabling the affected code path, infrastructure-level network restriction) is required during the extension window. The observability platform decision record governs production monitoring; if a security finding's remediation involves a production configuration change (a WAF rule, a network policy update, a secrets rotation), document the observable signals that confirm the compensating control is in effect and the procedure for verifying the remediation has reached production.
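The representative SLA structure above translates directly into deadline arithmetic. This sketch uses the paragraph's figures (24 hours and 7 days for critical, 7 days and 30 days for high, a 90-day target with no hard SLA for medium and low); the function names and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Severity -> (triage window, remediation window), from the representative SLA.
HARD_SLA = {
    "critical": (timedelta(hours=24), timedelta(days=7)),
    "high": (timedelta(days=7), timedelta(days=30)),
}
SOFT_TARGET = timedelta(days=90)  # medium and low: target only, no hard SLA


def deadlines(severity, created_at):
    """Return (triage_due, remediation_due, is_hard_sla) for a finding."""
    if severity in HARD_SLA:
        triage_window, remediation_window = HARD_SLA[severity]
        return created_at + triage_window, created_at + remediation_window, True
    # Triage happens as queue capacity allows, so there is no triage deadline.
    return None, created_at + SOFT_TARGET, False


def breaches_sla(severity, created_at, now):
    """True when a hard remediation deadline has passed without disposition."""
    _, remediation_due, is_hard = deadlines(severity, created_at)
    return is_hard and now > remediation_due


created = datetime(2024, 1, 1, 9, 0)
print(deadlines("critical", created))
print(breaches_sla("critical", created, datetime(2024, 1, 10, 9, 0)))
```

An approved extension would be modeled as a later remediation deadline attached to the individual finding, together with the compensating control that justifies it.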
The security conversation buried in your AI chat history
Security scanning decisions belong to the class of decisions made in AI chat conversations rather than in synchronous meetings. "What's the best SAST scanner for a Python Django + React codebase?" is a natural Claude or ChatGPT question. The answer — including the Semgrep versus CodeQL trade-off, the specific rule categories to enable for an ORM-heavy Django codebase, the expected false-positive rate for the default rule set, and the recommendation to start with a high-severity gate and expand coverage as the false-positive rate is measured — lives in that conversation. The decision record that the next security engineer and the next engineering lead will need to understand why the scanner is configured the way it is does not exist. It requires the reasoning from the chat conversation, structured into the five ADR sections above, and committed to the repository alongside the scanner configuration file.
The WhyChose open-source extractor recovers exactly these security design conversations from your AI chat history — the discussion about whether to use Snyk or Dependabot for SCA, the reasoning about gate thresholds and delivery velocity, the question about false-positive rates in Django ORM codebases that you asked when you were evaluating Semgrep. That reasoning is the content of the security scanning decision record. The alternative is a team that inherits a scanner configuration with 847 open findings and no documented basis for understanding whether the findings represent real vulnerabilities or accumulated phantom results — and no ability to calibrate the configuration without running the original evaluation again from scratch.
Further reading
- The security ADR for threat model and compliance — the threat model defines which attack categories the security scanning program is expected to address; the scanning configuration should be designed to cover the threat model's highest-priority attack paths, with documented gaps where the chosen scanning approach cannot cover specific threat model entries
- The CI/CD pipeline decision record — the security scanner integrates into the CI pipeline at a specific stage; the pipeline decision record governs the overall CI architecture and the constraints (scan time budget, parallelization model, caching strategy) that determine which scanner configurations are feasible without degrading pull request feedback latency
- The authentication strategy decision record — authentication vulnerabilities (session fixation, token leakage, JWT algorithm confusion) are a major category of DAST findings; the authentication strategy decision record documents the authentication architecture that the security scanner should be configured to probe
- The secrets management decision record — hardcoded credentials and improperly stored secrets are a primary SAST finding category; the secrets management decision record documents where secrets are stored and how they flow through the application, which determines the scope of the hardcoded-credential rule set that should be enabled in the SAST configuration
- The API gateway decision record — API-level vulnerabilities (broken object-level authorization, mass assignment, rate limiting bypass) are best detected by DAST configured to probe the API's endpoints; the API gateway decision record documents the API's structure and authentication model, which determines the DAST scanner's attack surface
- The dependency injection decision record — the dependency injection architecture determines which library code paths are instantiated at runtime; for SCA findings in transitive dependencies, understanding the DI container's instantiation model is required to confirm whether a vulnerable library function is reachable in the application's execution graph
- The test strategy decision record — DAST requires a running application to probe; the test strategy decision record documents the test environment architecture and the automated test suite that can drive the DAST scanner; without an automated test harness that exercises the application's code paths, DAST coverage is limited to the paths the scanner's built-in probes can discover without application-specific test sequences
- The developer experience decision record — local development environment parity with the scanner's staging target determines whether security findings identified in CI are reproducible locally; a DAST finding in a staging environment that uses a different authentication configuration than the local development environment may not be reproducible locally, making triage harder
- The observability platform decision record — security findings that require production mitigations (WAF rules, network policies, feature flags disabling affected code paths) produce observable signals that confirm the mitigation is in effect; the observability platform decision record governs the monitoring infrastructure used to verify that security remediations have reached production and are behaving as expected
- The feature flag decision record — feature flags are a compensating control for critical security findings that cannot be remediated within the SLA: the affected functionality can be disabled via a feature flag while the remediation is in progress; the feature flag decision record documents the flag management infrastructure that enables this compensating control
- The decisions that never get written down — security scanning configuration decisions join the class of consequential undocumented technical choices: made under compliance pressure in a single afternoon, discovered as organizational incidents when the false-positive accumulation or the disabled scanner reveals that the original configuration assumptions were never recorded
- The WhyChose open-source extractor — recover the original security scanner evaluation from your AI chat history, including the Semgrep versus CodeQL trade-off analysis and the false-positive rate discussion that informed the original rule set configuration before anyone wrote it down