Blog · 2026-06-11 · ~13 min read
Decision records for infrastructure-as-code: ADRs alongside Terraform and Kubernetes config
A new engineer joins the platform team and starts reading the Terraform codebase. She finds a lifecycle { prevent_destroy = true } block on an RDS instance. She doesn't know what it's protecting. The original engineer who added it left fourteen months ago. She opens a PR with a change that forces the instance to be replaced, and terraform plan refuses to proceed because the flag blocks the destroy. She asks in Slack. Nobody knows why it's there. She removes the lifecycle block. The PR passes review. Four weeks later, the database is accidentally destroyed in a staging environment cleanup run. The prevent_destroy flag existed because that "staging" database held a compliance copy of production data that legal had requested, and destroying it would break a regulatory retention obligation. That context never appeared anywhere in the codebase except as a one-line flag in a resource block that looked like a cautious default.
TL;DR
Infrastructure-as-code gives teams the config-as-code property: every infrastructure change is traceable through git history, reviewable before apply, and reproducible from source. What it doesn't give you is the deliberation. It won't tell you why prevent_destroy is set, why workspaces were chosen over folder separation for environments, why Kyverno won the admission controller evaluation, or why the module boundary between networking and compute is drawn where it is. The reasoning for those decisions often lives in AI chat sessions that happened weeks or months before the first terraform apply. Five IaC decision categories consistently need documentation: state management, module boundaries, security and lifecycle, providers and services, and naming and structure conventions. Two placement patterns cover most cases:

- a root /decisions/ folder for cross-cutting decisions (state backend, workspace model, provider choice);
- proximity placement alongside the module for module-specific decisions.

Repositories that mix application code and IaC can add a parallel /infrastructure/decisions/ folder. Terraform and Kubernetes teams generate rich extraction targets because IaC evaluations usually happen in AI chat. The deliberation before a migration is where the reasoning lives, and the WhyChose extractor surfaces it by narrowing to the six-week window before the commit that made the change.
Why IaC decisions are different from application decisions
Infrastructure-as-code teams have a structural advantage that application teams often lack: the config IS the implementation. In an application codebase, source code is compiled or packaged, then shipped, then executed, so the running system is a step removed from what engineers read. In IaC, the config is applied directly, and it shapes every resource that runs on top of it. When someone changes a Terraform module's interface, every caller inherits the change. When someone updates a namespace's ResourceQuota, every pod subsequently admitted to that namespace is held to the new limits. The config-as-code property means IaC decisions have visible, durable consequences encoded directly in source.
That property cuts both ways. It makes IaC decisions easy to see in retrospect — the state of the infrastructure at any point in git history is recoverable. It makes IaC decisions hard to understand — because the config shows you the outcome of the decision, not the deliberation that produced it. Reading a Terraform file tells you what was chosen. It doesn't tell you what alternatives were evaluated, what constraint drove the selection, or whether the choice was intentional or the path of least resistance.
Application decisions have the same problem, but with one mitigation: code comments and PR descriptions accumulate around significant changes. Engineers leave notes about non-obvious patterns. IaC code has less of this, partly because the declarative syntax makes commentary feel redundant (the config is already readable), and partly because IaC changes often involve many mechanical lines where the actual decision is a handful of attribute values. prevent_destroy = true is three words. The decision behind it might be ten sentences if someone wrote it down.
The other asymmetry: IaC decisions are often made by fewer people, over longer deliberations, with more tools evaluated. A Terraform workspace-vs-folder evaluation for environment isolation might involve a week of AI chat, a proof-of-concept, and a team meeting. The raw material for a decision record is that week of chat, the spike results, and the meeting notes. None of it ends up in the .tf file, and often none of it is written down at all.
Five IaC decision categories that consistently need documentation
Not every Terraform attribute choice warrants an ADR. A tags = { team = "platform" } block doesn't need a decision record. The five categories below consistently produce decisions that a new engineer cannot reconstruct from the config, that carry real consequences if reversed without understanding the original context, and that are almost never documented in the repository that contains them.
1. State management decisions
Remote state backend choices are the highest-consequence undocumented decisions in many IaC codebases. That covers S3 vs. GCS vs. Terraform Cloud, locking via DynamoDB vs. native backend locking, and workspace vs. folder separation for environment isolation. The reason is that reversing a state management decision requires migrating state, which is expensive and error-prone. It usually also means freezing changes to the affected infrastructure while the migration runs. Teams that discover two years in that they chose the wrong state isolation model, and need to migrate from workspaces to folders, face a multi-sprint project. A decision record written when the original choice was made would tell the new team whether the original constraint still applies and whether the migration is worth the cost.
The workspace-vs-folder question is especially worth documenting because it's one of the genuine architectural forks in Terraform usage, with real trade-offs and a strong community preference that has shifted over time. A team that chose workspaces in 2022 because Terraform Cloud workspace isolation seemed simpler has a different constraint landscape in 2026 — the decision record that captured "we chose workspaces because our compliance team required environment isolation and workspaces provided the simplest per-environment state boundary given our small infrastructure footprint" gives the current team the information they need to decide whether to migrate.
2. Module boundary decisions
When a platform team creates a reusable Terraform module, the module's responsibility boundary is a decision: what goes inside the module and what is left to the caller. A networking module that includes security groups is a different decision from one that doesn't. A database module that creates the IAM role for application access is a different decision from one that treats IAM as the caller's responsibility. These boundary decisions cascade — every team that calls the module inherits the boundary, and changing it later requires updating every caller.
Module boundary decisions are almost never documented because they feel like implementation details at the time. The module was written by one engineer in a week, the boundary emerged organically from what seemed natural, and the decision wasn't framed as a decision. But three years later, when a new engineer is adding a third module that needs to decide whether to follow the same boundary convention or diverge, the original reasoning matters. Was the boundary chosen deliberately, or did it just happen? Would the same engineer make the same choice today?
These decisions connect directly to the monorepo ADR pattern described in the monorepo decision log post: shared Terraform modules are analogous to shared libraries, and the decisions about their interfaces are platform-level decisions that belong in a root decisions folder with appropriate governance.
3. Security and lifecycle decisions
Security decisions in IaC are the category where undocumented consequences are most expensive — for the same reasons covered in the security ADR post. The specific IaC flavors: minimum IAM scope decisions (why does this role have this specific set of permissions?), network topology decisions (why is this subnet private and not public?), encryption decisions (why is KMS encryption enabled on this resource?), and lifecycle decisions.
The prevent_destroy lifecycle attribute deserves a special mention. It makes Terraform refuse any plan that would destroy the resource, which means someone judged that resource too important to lose, for reasons the code doesn't show. That's exactly the scenario where a decision record is most valuable: the attribute signals that a human made a deliberate choice, the config doesn't say why, and the reasoning matters to any future engineer who wants to modify or remove the attribute.

The protection is also fragile. It only applies while the lifecycle block is present in the configuration, so deleting the block (or the whole resource block) silently removes it. That is what happened in this post's opening story.

A simple rule: any time prevent_destroy = true is added to a resource, a decision record should accompany the PR that adds it. The record explains what the resource protects and what the consequence of destroying it would be. This is a two-sentence ADR, not a full Nygard write-up, but those two sentences are what prevent the scenario in the opening hook.
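One way to enforce that rule is a CI check. The sketch below is hypothetical: the function name is ours, and it assumes a team convention that any new file under a decisions/ directory counts as the accompanying record. It reads a unified diff on stdin, so it can be fed from git diff in a pipeline.

```bash
#!/usr/bin/env bash
# Hypothetical CI gate: fail when a diff adds `prevent_destroy = true` to a .tf
# file but creates no new file under a decisions/ directory in the same change.
# Reads a unified diff (for example from `git diff origin/main...HEAD`) on stdin.
check_prevent_destroy_adr() {
  awk '
    # Each "diff --git" line starts a new file section: reset per-file state.
    /^diff --git / { in_hunk = 0; new_file = 0; path = ""; next }
    # "--- /dev/null" in the file header means the file is being created.
    !in_hunk && /^--- \/dev\/null/ { new_file = 1; next }
    # "+++ b/<path>" names the file; a newly created decisions/ file counts as a record.
    !in_hunk && /^\+\+\+ b\// {
      path = substr($0, 7)
      if (new_file && path ~ /(^|\/)decisions\//) adr = 1
      next
    }
    /^@@/ { in_hunk = 1; next }
    # Inside a hunk: an added line in a .tf file that sets prevent_destroy = true.
    in_hunk && /^\+/ && path ~ /\.tf$/ && /prevent_destroy[[:space:]]*=[[:space:]]*true/ { pd = 1 }
    END {
      if (pd && !adr) {
        print "prevent_destroy = true added without a new decisions/ record"
        exit 1
      }
    }
  '
}
```

In CI this might run as `git diff origin/main...HEAD | check_prevent_destroy_adr`, failing the job on a non-zero exit. It only inspects added lines, so a flag that is merely moved within a file will also trip it.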
4. Provider and service decisions
Which cloud provider, which managed service, which service tier: these decisions are made once, at the beginning, and then inherited silently for years. The original AWS-vs-GCP evaluation that happened before the first infrastructure commit is almost certainly not documented anywhere in the repository. Neither is the decision to use RDS over self-hosted PostgreSQL, Fargate over self-managed EC2 container instances, or CloudFront over a self-managed Nginx caching proxy. Each of those decisions had a constraint that drove it, usually a mix of team familiarity, operational simplicity at the time, and cost at the team's then-current scale.
These decisions matter when the team is reconsidering them. The engineer who proposes migrating from AWS to GCP or from RDS to Aurora Serverless needs to know what the original constraint was. The point is not to rule out the migration but to evaluate whether that constraint still applies. Consider a decision record that says "we chose RDS over Aurora Serverless because Aurora Serverless v1 had cold start latency that exceeded our p99 SLA; v2 didn't exist yet; revisit if Aurora Serverless v2 eliminates cold starts at our current workload". That is exactly the information needed to decide whether the migration is now justified.
5. Naming and structure conventions
Naming schemas for resources, tagging strategies, folder hierarchies: these feel like style choices, not decisions, and they rarely appear in ADRs or design docs. They're usually established by the first engineer who wrote the first Terraform, and every subsequent resource inherits the pattern through copy-paste. The decisions embedded in naming conventions stay invisible until someone tries to change them. Only then does it turn out that the convention is load-bearing for some tool or process that depends on the pattern.
A typical example: a resource naming convention that was designed to fit a specific CI/CD tool's naming assumptions. The CI tool was replaced two years ago, but the naming convention persisted, and now it's embedded in every resource and in the IAM policies that reference resource names. A decision record could have made the dependency explicit: "resource names follow [prefix]-[environment]-[service] because the CircleCI pipeline at the time used [prefix] as the project identifier and [environment] to select the deployment context; revisit if CircleCI is replaced".
Where ADRs live in an IaC repository
Three placement options work for IaC ADRs. The right choice depends on whether the decision is module-specific, cross-cutting, or part of a broader architecture log that covers both infrastructure and application.
Root /decisions/ folder for cross-cutting decisions
State management decisions, provider choices, naming conventions, and workspace models belong in a root /decisions/ or /doc/decisions/ folder. These are decisions that affect the entire IaC codebase, not a single module. A new engineer who needs to understand the overall infrastructure philosophy — why it's structured the way it is — should find these records in one place, accessible from the repository root without navigating the module tree.
If the IaC repository is separate from the application codebase, a root decisions/ folder there is straightforward. If IaC lives in a subdirectory of a monorepo (common in platform engineering setups), the root /doc/decisions/ folder at the monorepo level is the right home for platform-wide IaC decisions, with module-specific decisions living closer to the modules. This mirrors the three-category structure from the monorepo post: platform-wide in root, cross-service in shared libs, service-local in each service folder.
Proximity placement for module-specific decisions
Module boundary decisions, module security decisions, and significant module configuration choices belong alongside the module. Consider a /modules/database/decisions/ folder containing the ADR that explains the module's IAM boundary. It is navigable without knowing the broader repo structure: an engineer reading /modules/database/main.tf who wonders about the boundary finds the record one directory down from the file in front of them.
The proximity principle is the deciding factor for module-specific placement: put the decision record where an engineer reading the code it governs will naturally look. For Terraform modules, that's the module folder. For Helm charts, that's the chart directory. For Kubernetes manifests organized by component, that's the component directory. The goal is to reduce the search cost for engineers who encounter an unfamiliar configuration choice and want to understand it — they should find the decision record without knowing in advance that a decision record exists.
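The proximity rule can also be encoded in tooling, such as an editor task or a PR bot that points reviewers at the relevant records. This is a minimal sketch that assumes a hypothetical layout: decisions/ folders at module level, plus decisions/ or doc/decisions/ at the root. It walks up from the file being changed and returns the first decisions folder it finds. The function name is illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper for the proximity principle: starting from the directory
# of a config file, walk upward and print the closest decisions folder.
# At each level it checks "<dir>/decisions", then "<dir>/doc/decisions",
# and it stops after checking the repository root given as the second argument.
nearest_decisions() {
  local file="$1" root="${2:-.}" dir candidate
  dir=$(cd "$(dirname "$file")" && pwd) || return 1
  root=$(cd "$root" && pwd) || return 1
  while :; do
    for candidate in "$dir/decisions" "$dir/doc/decisions"; do
      if [ -d "$candidate" ]; then
        printf '%s\n' "$candidate"
        return 0
      fi
    done
    # Give up once the repository root (or the filesystem root) has been checked.
    if [ "$dir" = "$root" ] || [ "$dir" = "/" ]; then
      return 1
    fi
    dir=$(dirname "$dir")
  done
}
```

Running `nearest_decisions modules/database/main.tf .` returns the module's own decisions folder if it has one. Otherwise it falls back to whatever the root provides.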
A /infrastructure/decisions/ folder in a shared application + infra repository
For teams that keep application code and IaC in the same repository and have an existing /decisions/ folder for application ADRs, adding /infrastructure/decisions/ as a parallel folder avoids mixing decision types while keeping everything in one place. The GitHub ADR workflow with CODEOWNERS can assign infrastructure decisions to the platform team and application decisions to engineering leads — the ownership is enforced at the folder level.
The main risk with parallel folders is decisions that are both infrastructure and application. An API gateway decision is one example: it affects both the Terraform that provisions the gateway and the application code that routes requests through it. These are the cross-team decisions covered in the cross-team ADR post. They belong in the root decisions/ folder, with a downstream stakeholders field listing both the platform team and the application team.
Terraform-specific patterns
State locking decisions
State locking is one of the few IaC configurations where the consequence of the wrong choice tends to surface during a production incident rather than during a normal change. A team that chose DynamoDB-based locking for S3 remote state needs to know which failure mode that choice was protecting against. A team that later removes DynamoDB locking to simplify the setup should know whether the original concern was eliminated or merely deferred.
State locking decisions typically generate rich AI chat deliberation. The engineer responsible for setting up remote state will research options, compare the failure scenarios for different locking approaches, and evaluate the operational overhead. The chat session from that research phase is a high-value extraction target — it contains the comparison of options, the failure scenarios considered, and the rationale for the final choice. The WhyChose extractor surfaces these as high-confidence candidates because they contain explicit question shapes and trade-off markers that the locking evaluation generates.
Provider version pinning decisions
Provider version constraints, such as required_providers { aws = { version = "~> 5.0" } }, are decisions with a half-life. A pin added because a specific AWS provider release broke a resource the team was using is a different constraint from a blanket "pin everything" policy. Consider a constraint explained in a comment: "pinned below 5.31 because 5.31 introduced breaking changes to the RDS cluster resource that we haven't migrated to yet; see open ticket #4821". That comment is a decision record. Promoting it to an ADR (even a short one) creates a searchable record with a review trigger and a link to the ticket. When the ticket is resolved, the ADR is superseded.
This is the same ADR lifecycle that applies to application decisions — the Superseded status mechanism works equally well for IaC decisions that are resolved by a migration or a provider update.
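Superseding a record can be as mechanical as editing its status line. This sketch assumes a hypothetical ADR layout in which each record has a single line beginning with "Status:". The function name is illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: mark an ADR as superseded once the pinning ticket is
# resolved. Assumes each ADR is a Markdown file with one "Status: ..." line.
supersede_adr() {
  local old="$1" new="$2" scratch
  [ -f "$old" ] || { echo "no such ADR: $old" >&2; return 1; }
  grep -q '^Status:' "$old" || { echo "no Status line in $old" >&2; return 1; }
  scratch=$(mktemp) || return 1
  # Rewrite only the Status line. Writing through a temp file avoids the
  # GNU/BSD differences in `sed -i`.
  sed "s|^Status:.*|Status: Superseded by $(basename "$new")|" "$old" > "$scratch" \
    && mv "$scratch" "$old"
}
```

For example, run `supersede_adr decisions/0012-pin-aws-provider.md decisions/0019-unpin-aws-provider.md` when ticket #4821 closes. The file names here are illustrative.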
Module DRY vs. explicit trade-off
Terraform module design sits on a spectrum from "maximum DRY" (every resource pattern gets a reusable module) to "maximum explicit" (raw resources directly in the environment configuration, no modules). Neither extreme is universally correct. Teams that went maximum DRY often encounter deep module abstractions that hide configuration options and make debugging harder. Teams that stayed explicit often accumulate copy-pasted configuration blocks that diverge in ways that are hard to find and expensive to unify.
The decision about where a team sits on this spectrum — and the specific threshold for "this pattern justifies a module" — is worth documenting once rather than re-litigating with every new resource. A decision record that says "we create a module when the same resource pattern is used in four or more places and has at least two non-trivial configuration surfaces (RBAC, encryption, networking) — below this threshold, raw resources in the environment configuration are preferred" gives the team a consistent rubric and eliminates the recurring "should we module this?" discussion.
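A rubric like that can be partially checked by script. The sketch below counts resource declarations by type across .tf files and prints the types that meet a threshold (default 4). Resource type is only a rough proxy for "the same resource pattern". The second half of the rubric (non-trivial configuration surfaces) still needs human judgment. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical first-pass filter for a "module it when used in N or more places"
# rubric: count `resource "<type>"` declarations under a directory and print
# "<type> <count>" for every type at or above the threshold.
module_candidates() {
  local root="${1:-.}" threshold="${2:-4}"
  # -h: no file names, -o: only the matched `resource "<type>"` text.
  grep -rhoE --include='*.tf' '^[[:space:]]*resource[[:space:]]+"[^"]+"' "$root" \
    | sed -E 's/.*"([^"]+)"/\1/' \
    | sort | uniq -c \
    | awk -v t="$threshold" '$1 >= t { print $2, $1 }' \
    | sort
}
```

Point it at the environment configurations (for example `module_candidates envs 4`), not at the modules directory. Resources already wrapped in modules would otherwise inflate the counts.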
Kubernetes-specific patterns
CRD and operator adoption decisions
The decision to adopt a Kubernetes operator (cert-manager, the AWS Load Balancer Controller, the Prometheus Operator, a service mesh control plane) is a cross-cutting decision. It affects every namespace and every workload that depends on the capability the operator provides. Operator adoption decisions are expensive to reverse: workloads that depend on the operator's CRDs must be migrated before the operator can be removed, and in practice operator dependencies accumulate faster than they're cleaned up.
Operator adoption decisions generate clear extraction candidates. Consider the session where an engineer evaluates "cert-manager vs. AWS Certificate Manager for TLS certificate management". It contains the question shape, explicit comparison language, and trade-off markers that the extractor is calibrated to surface. The constraint that drove the adoption might be "cert-manager was chosen because certificates for customer-managed domains have to be issued and renewed inside the cluster". That is the sentence that prevents a future engineer from removing cert-manager to "simplify" the cluster without realizing that customer-managed domains depend on it.
Namespace architecture decisions
Three common Kubernetes namespace models: flat namespaces (one per application or component), team-per-namespace (one per team, applications coexist), environment-per-namespace (one per environment, teams coexist). Each model encodes assumptions about RBAC isolation, network policy scope, resource quota allocation, and billing attribution. The model that's correct at a five-engineer company running three services is often wrong at a twenty-engineer company running fifteen services — but the namespace model is hard to change because it's embedded in RBAC policies, network policies, Helm release names, and the DNS names that inter-service calls use.
Namespace architecture decisions have a natural review trigger: team size and service count thresholds. Consider a decision record that says "we use environment-per-namespace because RBAC isolation between production and staging is the primary concern; at the current scale of 3 engineers and 4 services there is effectively a single team, so per-team namespaces would add no isolation; revisit when service count exceeds 10 or when teams have meaningfully different resource consumption profiles". It sets a concrete revisit condition rather than leaving future engineers to discover the constraint by accident.
Admission policy framework decisions
OPA Gatekeeper vs. Kyverno vs. a custom webhook is one of the most consequential Kubernetes decisions a platform team makes, and it's almost never documented. Admission controllers enforce cluster-wide policy at the admission webhook layer — every workload submitted to the cluster is evaluated by them. Replacing the admission framework requires migrating every policy to the new format, which is a significant project that teams often defer indefinitely once the original framework is in place.
The admission controller evaluation typically happens during an initial security hardening sprint, in AI chat, over several sessions. The engineer responsible will compare Gatekeeper's OPA Rego syntax against Kyverno's Kubernetes-native policy format, evaluate the operational overhead of each, and factor in the team's existing OPA or Go expertise. That deliberation is the richest Kubernetes extraction target: it's a decision with multiple serious contenders, a named constraint (team expertise, policy language familiarity, operational complexity), and durable consequences that affect every workload the cluster will ever run.
These are security decisions with compliance scope implications — admission policies are often required by SOC 2 controls or internal security standards, which means the decision record may be reviewed by auditors who want to see the deliberation, not just the current configuration.
Helm vs. Kustomize decisions
The Helm vs. Kustomize choice is the Kubernetes equivalent of the Terraform workspace-vs-folder decision. It is a genuine architectural fork: both approaches have significant adoption and real trade-offs, and the choice is difficult to reverse once a team has invested in one approach's mental model and toolchain.
What makes this decision hard to recover from AI chat: teams often migrate between tools mid-deployment lifecycle, and the migration deliberation is split across multiple sessions, multiple engineers, and sometimes multiple quarters. An engineer who asks "should we use Kustomize instead of Helm for our staging environment" in a single chat session is generating an extraction candidate. But the final decision may have been made in a team discussion three weeks later, after two proof-of-concept implementations. The multi-session, multi-engineer deliberation pattern is exactly the case where the distributed team quarterly extraction pass produces better results — pooling exports from multiple engineers across the evaluation period gives a more complete picture of the deliberation than any single person's export.
Using AI chat extraction for IaC decisions
IaC evaluations are among the highest-confidence extraction targets in any engineer's chat history. The reason is structural: when an engineer evaluates infrastructure options, they almost always frame the question explicitly. "Compare Terraform workspaces vs. folder-based environment separation for a team of five with three environments." "What are the trade-offs between using Fargate vs. EC2 instances for our container workloads?" "Should we use RDS Multi-AZ or Aurora with replicas for high availability?" These prompts generate AI responses with explicit comparative structure — pros, cons, considerations — that the engineer then reacts to with commit phrases ("we'll go with workspaces because...") or reversal markers ("actually, the folder approach makes more sense given that...").
The extraction workflow for IaC decisions:
- Run the following on the IaC repository to find the files with the most commits. These correspond to the most-revised infrastructure decisions, which are often the ones most worth documenting:
git log --all --name-only --format= | grep -E '\.(tf|yaml|yml|json)$' | sort | uniq -c | sort -rn | head -20
- For each high-revision file, run the following and look at the dates of significant changes. Major structural changes, not minor attribute updates, are the decision points:
git log --follow -- path/to/file.tf
- Identify the engineer responsible (git blame gives the author; git log gives the commit message context). Export that engineer's AI chat history from the six-week window before the significant change date.
- Run the WhyChose extractor on the export. IaC evaluations produce high-confidence candidates because they contain the question shapes, trade-off markers, and commit phrases the extractor is calibrated to find.
- Use the extraction output to write the decision record. The IaC chat session often contains the full Considered Options content — the alternatives evaluated, the rejection reason for each, and the constraint that drove the final choice — in a form that maps directly onto the ADR template.
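The date arithmetic for the export step is easy to script. This minimal sketch assumes GNU date (on macOS, coreutils' gdate). It uses a hypothetical pair of helpers: one computes the window ending on a given date, and the other takes that end date from a commit's author date.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for choosing the chat-export window.
# Assumes GNU date; on macOS, install coreutils and substitute gdate.

# window_before END_DATE [DAYS]: print "START END", where START is DAYS
# (default 42, i.e. six weeks) before END. Dates are YYYY-MM-DD.
window_before() {
  local end="$1" days="${2:-42}" start
  start=$(date -u -d "$end $days days ago" +%F) || return 1
  printf '%s %s\n' "$start" "$end"
}

# window_for_commit REV [DAYS]: same window, ending on the commit's author date.
window_for_commit() {
  local rev="$1" days="${2:-42}" end
  end=$(git show -s --date=short --format=%ad "$rev") || return 1
  window_before "$end" "$days"
}
```

For example, `window_before 2026-03-01` prints `2026-01-18 2026-03-01`. Passing a larger second argument to `window_for_commit` widens the window for long migrations, as described below.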
The six-week window is a calibration, not a rule. For decisions made during a major infrastructure migration (moving from EC2 to Fargate, migrating from self-managed Kubernetes to EKS, consolidating Terraform state into a shared backend), the deliberation window may be longer. Git history gives the actual date range — the commits that represent the migration give you the endpoints, and the chat history from that period is the extraction target.
The migration deliberation advantage
Teams undertaking a major IaC migration — Terraform 0.12 to 0.13, workspace to folder environment model, Helm v2 to v3, a cloud provider migration — generate the most thorough infrastructure reasoning they will ever produce. The engineer doing the migration evaluation spends real time in AI chat working through the implications, the migration steps, the risk scenarios. That deliberation is richer than the original decision-making that established the infrastructure, which often happened quickly during initial setup.
The paradox: the original decision matters more (it established the baseline everything inherits) but is harder to recover (it was made quickly, before the team's AI chat habit was established, possibly before ChatGPT and Claude existed). The migration deliberation is easier to recover and less foundational but still worth capturing — it produces a decision record that explains both why the team moved away from the original approach and what the new approach's trade-offs are.
For IaC codebases with no existing decision records, the migration deliberation approach is the highest-ROI starting point: find the two or three engineers who led the most significant infrastructure changes in the last 18 months, export their chat history, run the extractor on the migration-window periods, and use the output to write retrospective ADRs. This produces a baseline decision log that covers the highest-consequence decisions without requiring the team to audit all infrastructure decisions from scratch.
Review triggers for IaC decisions
IaC decisions tend to expire faster than application decisions because they depend on external conditions (cloud provider pricing, provider API deprecations, compliance requirement changes) that evolve independently of the codebase. The same ADR lifecycle review triggers that work for application decisions work for IaC decisions, with three additional categories specific to infrastructure.
Cloud provider pricing changes
A reserved instance vs. on-demand decision, a spot instance adoption decision, a service tier selection — these are explicitly economic decisions at the time they're made and are explicitly invalidated by pricing changes. A decision record for a service tier selection that says "revisit if this service's cost exceeds 15% of total infrastructure spend — at that threshold, a dedicated managed version becomes cost-justified" sets a concrete condition that can be evaluated quarterly against actual billing data. Without the review trigger, the decision is re-litigated by intuition rather than by the explicit condition the original engineer identified.
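Once billing data is exported, checking a trigger like that each quarter takes only a few lines. This sketch assumes a hypothetical headerless CSV of service,cost rows already aggregated for the period; real billing exports need reshaping first. The exit status is 0 when the trigger fires, so the check can gate an "ADR due for review" ticket. The function name is illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical quarterly check for a pricing review trigger such as
# "revisit if this service's cost exceeds 15% of total infrastructure spend".
# Input: headerless CSV with lines of the form service,cost
# Exit status: 0 = trigger fired, 1 = below threshold, 2 = no spend data.
cost_share_trigger() {
  local csv="$1" service="$2" threshold="${3:-15}"
  awk -F',' -v svc="$service" -v t="$threshold" '
    { total += $2; if ($1 == svc) cost += $2 }
    END {
      if (total <= 0) { print "no spend data"; exit 2 }
      share = 100 * cost / total
      printf "%s: %.1f%% of spend (trigger at %s%%)\n", svc, share, t
      if (share > t + 0) exit 0
      exit 1
    }
  ' "$csv"
}
```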
Provider feature changes and deprecations
AWS retires services. Kubernetes deprecates APIs. Terraform provider major versions introduce breaking changes. IaC decisions made in the context of a provider's features at one point in time may be invalidated when those features change. A decision record that says "we use Kubernetes CronJobs for background processing rather than a queue-based system because the CronJob implementation meets our current concurrency requirements; revisit if job count exceeds 50 active concurrent jobs — at that scale, a message queue provides better visibility and retry semantics" gives future engineers both the original constraint and the specific invalidation condition.
Kubernetes API deprecations are especially worth writing review triggers for. The removal of deprecated APIs in Kubernetes minor releases has broken more than one cluster upgrade. A decision record for a workload that uses a deprecated API version should include a review trigger keyed to the Kubernetes version in which that API is removed, not just the version that deprecated it.
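Finding the affected manifests is the first step toward writing those triggers. The sketch below flags top-level apiVersion/kind pairs that Kubernetes has removed, reporting the version that removed them. The lookup table is a small, non-exhaustive sample, so check the upstream deprecated API migration guide for the full list. Helm templates are not parsed, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: flag manifests whose top-level apiVersion/kind pair has been removed
# from Kubernetes, so the related ADR can carry a version-keyed review trigger.
# Only unindented keys in plain YAML are read; List items and Helm templates
# are not covered. The table is a sample, not the full removal list.
removed_apis() {
  awk '
    BEGIN {
      removed["extensions/v1beta1 Ingress"] = "1.22"
      removed["networking.k8s.io/v1beta1 Ingress"] = "1.22"
      removed["batch/v1beta1 CronJob"] = "1.25"
      removed["policy/v1beta1 PodSecurityPolicy"] = "1.25"
      removed["policy/v1beta1 PodDisruptionBudget"] = "1.25"
      removed["autoscaling/v2beta2 HorizontalPodAutoscaler"] = "1.26"
    }
    # A new file or a "---" separator starts a new YAML document.
    FNR == 1 || /^---/ { api = ""; kind = ""; reported = 0 }
    /^apiVersion:/ { api = $2; gsub(/"/, "", api) }
    /^kind:/ { kind = $2; gsub(/"/, "", kind) }
    api != "" && kind != "" && !reported {
      key = api " " kind
      if (key in removed) {
        print FILENAME ": " key " (removed in v" removed[key] ")"
        reported = 1
      }
    }
  ' "$@"
}
```

Usage: `removed_apis manifests/*.yaml`. Each hit names a workload whose decision record needs a trigger tied to the listed version.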
Team growth thresholds
Many IaC decisions are correct at one team size and wrong at another. The namespace architecture decision mentioned above is one example. Another: the choice to use a flat IAM model vs. cross-account IAM roles for environment isolation is often driven by operational simplicity at small team sizes, where managing multiple accounts adds overhead that isn't justified by the isolation benefit. At larger team sizes, the same isolation that was overhead at five engineers is a necessary control at twenty-five. A decision record with a review trigger — "revisit when engineering headcount exceeds 15" — makes the team-size dependency explicit rather than leaving future engineers to discover it when the policy stops fitting.
The prevent_destroy audit
A practical first step for any IaC codebase without existing decision records: run a search for every prevent_destroy = true attribute across all .tf files.
grep -r "prevent_destroy" --include="*.tf" -l
For every file that contains it: does the repository have a decision record explaining why that resource is protected? Is the explanation in a comment that will be readable in three years? In most codebases, the answer is no — the attribute exists, the resource it protects is important for reasons that live in AI chat or in the head of an engineer who may have left, and any future engineer who finds the attribute will either honor it as a mysterious constraint or remove it without understanding the consequence.
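The audit question can be scripted under one assumed convention: a record "covers" a .tf file if some Markdown file under a decisions/ directory mentions that file's repository-relative path. The function name and the convention are hypothetical. Matching on resource addresses instead would be stricter.

```bash
#!/usr/bin/env bash
# Hypothetical audit: for each .tf file that sets prevent_destroy = true,
# report OK if a Markdown file under some decisions/ directory mentions the
# file's path relative to the root, and MISSING otherwise.
# Returns non-zero when any protected file lacks a record.
audit_prevent_destroy() {
  local root="${1:-.}" report
  report=$(
    grep -rlE --include='*.tf' 'prevent_destroy[[:space:]]*=[[:space:]]*true' "$root" \
      | sort \
      | while IFS= read -r tf; do
          rel=${tf#"$root"/}
          # -F: search for the path as a literal string, not a regex.
          if grep -rlF --include='*.md' "$rel" "$root" | grep -q '/decisions/'; then
            printf 'OK       %s\n' "$rel"
          else
            printf 'MISSING  %s\n' "$rel"
          fi
        done
  )
  [ -n "$report" ] && printf '%s\n' "$report"
  # Succeed only if no line was reported as MISSING.
  ! printf '%s\n' "$report" | grep -q '^MISSING'
}
```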
Writing a short ADR for each prevent_destroy instance (just context, decision, and consequence) is a bounded project, typically a few hours for most codebases. It closes off the failure mode from this post's opening hook. The ADR doesn't need to be long. It needs to answer four questions:

- What is this resource?
- Why must it not be destroyed?
- What would break if it were?
- Are there any conditions under which removing prevent_destroy would be correct?
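A scaffold keeps those records short and consistent. This sketch assumes four-digit numbered Markdown records (NNNN-title.md) in the target folder. The function name, file-name pattern, and Status line format are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical scaffold for a short prevent_destroy ADR: creates the next
# numbered record in a decisions folder with one heading per question and
# prints the new file's path.
new_prevent_destroy_adr() {
  local dir="$1" address="$2" last next slug file
  mkdir -p "$dir" || return 1
  # Highest existing NNNN- prefix, with leading zeros stripped so shell
  # arithmetic does not treat it as octal; empty if there are no records yet.
  last=$(find "$dir" -maxdepth 1 -name '[0-9][0-9][0-9][0-9]-*.md' -exec basename {} \; \
    | cut -c1-4 | sort -n | tail -n 1 | sed 's/^0*//')
  next=$(printf '%04d' $(( ${last:-0} + 1 )))
  # Turn a resource address such as aws_db_instance.compliance into a slug.
  slug=$(printf '%s' "$address" | sed 's/[^A-Za-z0-9]/-/g' | tr '[:upper:]' '[:lower:]')
  file="$dir/$next-protect-$slug.md"
  cat > "$file" <<EOF
# $next. Protect $address with prevent_destroy
Status: Accepted

## What is this resource?

## Why must it not be destroyed?

## What would break if it were?

## When would removing prevent_destroy be correct?
EOF
  printf '%s\n' "$file"
}
```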
The same audit pattern applies to other lifecycle and dependency signals:

- ignore_changes blocks: why is this attribute not managed by Terraform?
- create_before_destroy = true: why was Terraform's default destroy-then-create ordering reversed?
- depends_on entries that reference non-obvious dependencies.

Each of these is a signal that an engineer made a deliberate choice, and each deserves a decision record explaining the context.
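Extending the search to those attributes gives a worklist. This sketch prints one "attribute, file:line" row per occurrence. It is a plain text search, so commented-out lines will appear too. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list every prevent_destroy, ignore_changes, create_before_destroy and
# depends_on assignment under a directory as "<attribute><TAB><file>:<line>",
# sorted by attribute, as a worklist for decision records.
lifecycle_inventory() {
  local root="${1:-.}"
  grep -rnE --include='*.tf' \
    '(prevent_destroy|ignore_changes|create_before_destroy|depends_on)[[:space:]]*=' "$root" \
    | awk -F: '{
        # grep -n output is path:line:content; match only within the content.
        content = $0
        sub(/^[^:]*:[^:]*:/, "", content)
        match(content, /prevent_destroy|ignore_changes|create_before_destroy|depends_on/)
        printf "%s\t%s:%s\n", substr(content, RSTART, RLENGTH), $1, $2
      }' \
    | sort
}
```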
Where to start
If your IaC codebase has no existing decision records, the highest-ROI starting point is the prevent_destroy audit above — it produces decision records for the most explicitly protected decisions in your infrastructure in a bounded time investment.
The second step: find the five .tf files with the most commit history. Files that change more often than others usually hold decisions that have been revisited and reconsidered, and that have accumulated context along the way. The decisions in those files are the ones new engineers are most likely to encounter and least likely to understand without a record.
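One way to find those files is sketched here. It runs git log with --pretty=format: and --name-only, so only changed paths are printed, and counts how many commits touched each .tf file. Since each commit lists a path once, the count equals the number of commits. Renames are not followed, so a file moved during a refactor starts its count again under the new name.

```python
import subprocess
from collections import Counter


def count_tf_changes(git_log_output: str) -> Counter:
    """Count how many commits touched each .tf path in `git log --name-only` output."""
    counts = Counter()
    for line in git_log_output.splitlines():
        path = line.strip()
        if path.endswith(".tf"):
            counts[path] += 1
    return counts


def most_changed_tf_files(repo: str = ".", n: int = 5) -> list[tuple[str, int]]:
    # An empty --pretty=format: suppresses commit headers, leaving only file names.
    out = subprocess.run(
        ["git", "-C", repo, "log", "--pretty=format:", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_tf_changes(out).most_common(n)
```

most_changed_tf_files() returns (path, commit_count) pairs, highest first. Those five paths are where to start reading for undocumented decisions.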
For Kubernetes teams: start with admission controller decisions. If your cluster runs OPA Gatekeeper or Kyverno, the decision to adopt it is very likely undocumented and is often the highest-consequence undocumented decision in the cluster. The engineer who made that decision probably has the deliberation in their AI chat history. Running the WhyChose extractor on that engineer's export from the relevant time window recovers the deliberation. Writing a one-page ADR from the output takes less time than the first admission policy debugging session a new engineer would otherwise face.
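If it is not obvious which framework the repository uses, the manifests usually say so. Kyverno policies use API groups under kyverno.io (for example kyverno.io/v1 for ClusterPolicy). Gatekeeper uses groups under gatekeeper.sh (for example templates.gatekeeper.sh/v1 for ConstraintTemplate). The following is a minimal sketch that reads apiVersion lines with a regex instead of a YAML parser. It only sees what is committed to the repository; against a live cluster, kubectl get crds shows the installed groups.

```python
import re
from pathlib import Path

# API group domains each framework registers; a group matches the domain or any subdomain.
FRAMEWORK_DOMAINS = {
    "Kyverno": "kyverno.io",
    "OPA Gatekeeper": "gatekeeper.sh",
}
API_VERSION_RE = re.compile(r"""^\s*apiVersion:\s*["']?([^\s"']+)""", re.MULTILINE)


def frameworks_in(manifest_text: str) -> set[str]:
    """Return the admission policy frameworks whose resources appear in a manifest."""
    found = set()
    for match in API_VERSION_RE.finditer(manifest_text):
        group = match.group(1).split("/")[0]  # "kyverno.io/v1" -> "kyverno.io"
        for framework, domain in FRAMEWORK_DOMAINS.items():
            if group == domain or group.endswith("." + domain):
                found.add(framework)
    return found


def scan_manifests(root: str = ".") -> dict[str, set[str]]:
    """Map each YAML file under root to the frameworks it references."""
    results = {}
    for path in sorted(Path(root).rglob("*.y*ml")):
        hits = frameworks_in(path.read_text(encoding="utf-8"))
        if hits:
            results[str(path)] = hits
    return results
```

The earliest commit that introduced one of these files also gives a date to anchor the chat-export window on.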
For teams starting from scratch: the three decisions that matter most before the first Terraform apply are the state backend choice, the workspace-vs-folder environment model, and the naming convention. Writing those three ADRs before any code is written — as planning documents that become decision records when the implementation confirms the choices — sets a documentation habit from day one and produces the exact records that will matter most when a new engineer joins the platform team in 18 months.
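The "planning document first, decision record later" habit can be scaffolded. The following is a small sketch that creates the three starter ADRs with Status: Proposed and later flips one to Accepted once the implementation confirms it. The numbering scheme, slugs, section headings and the decisions/ location are assumptions, chosen to match common ADR conventions.

```python
import re
from pathlib import Path

# Hypothetical titles for the three pre-apply decisions named above.
STARTER_DECISIONS = (
    "State backend",
    "Workspace vs. folder environment model",
    "Naming convention",
)

TEMPLATE = "# {title}\n\nStatus: Proposed\n\n## Context\n\n## Decision\n\n## Consequences\n"


def slugify(title: str) -> str:
    """'Workspace vs. folder environment model' -> 'workspace-vs-folder-environment-model'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def scaffold(directory: str = "decisions") -> list[Path]:
    """Create numbered Proposed ADRs, skipping any file that already exists."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    created = []
    for number, title in enumerate(STARTER_DECISIONS, start=1):
        path = target / f"{number:04d}-{slugify(title)}.md"
        if path.exists():
            continue  # never overwrite a record someone has already started
        path.write_text(TEMPLATE.format(title=title), encoding="utf-8")
        created.append(path)
    return created


def accept(path: Path) -> None:
    """Turn a planning document into a decision record once implementation confirms it."""
    text = path.read_text(encoding="utf-8")
    path.write_text(text.replace("Status: Proposed", "Status: Accepted", 1), encoding="utf-8")
```

Keeping the Proposed to Accepted transition explicit means the git history of each file shows when the plan was confirmed, not just when it was written.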
See also: the monorepo decision log (IaC in a monorepo — which decisions belong in root vs. alongside modules), the security ADR (the five fields a security ADR adds for IaC decisions that affect threat model or compliance scope), the ADR review checklist (applies to IaC ADRs as written — prevent_destroy decisions especially need an honest Consequences section), and how to document architecture decisions (the end-to-end process from decision to documented record).
Frequently asked questions
Where should ADRs live in a Terraform or Kubernetes repository?
Two patterns: a root /decisions/ folder for cross-cutting decisions (state backend, workspace model, provider choice, naming conventions), and proximity placement alongside the module for module-specific decisions. A /modules/database/decisions/ folder puts the ADR where an engineer reading the module will look. In a monorepo, use a root /doc/decisions/ folder for platform-wide IaC decisions and keep module decisions alongside the modules.
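With both patterns in place, tooling and reviewers need a lookup order. The following is a sketch of one possible rule: walk up from the changed file, checking a decisions/ folder at each directory level, and fall back to the root folder last. The walk-up rule is an assumption, not an established convention. Pass root_dir="doc/decisions" for the monorepo layout.

```python
from pathlib import PurePosixPath


def candidate_decision_dirs(changed_file: str, root_dir: str = "decisions") -> list[str]:
    """Decision folders to consult for a repo-relative file path, nearest first, root last."""
    candidates = []
    parent = PurePosixPath(changed_file).parent
    while parent.name:  # stops at "." (or "/" for an absolute path)
        candidates.append(str(parent / "decisions"))
        parent = parent.parent
    candidates.append(root_dir)
    return candidates
```

For modules/database/main.tf this yields modules/database/decisions first, so the module-specific record wins over the platform-wide one.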
What Terraform decisions should have ADRs?
Five categories: state management (backend, locking, workspace model), module boundaries (what's inside vs. outside the module), security and lifecycle (prevent_destroy scope, least-privilege IAM permissions), provider and service choices, and naming and structure conventions. Any PR that adds prevent_destroy = true to a resource should include a short decision record explaining what the resource is and why it must not be destroyed.
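That rule can be enforced in CI. The following is a minimal sketch that inspects a unified diff, such as the output of git diff origin/main...HEAD. It returns True when an added line sets prevent_destroy = true but no changed file path contains "decisions/". The marker string, the branch name and how a CI job would call this (for example a small wrapper that reads the diff from stdin and fails on True) are assumptions.

```python
import re

ADDED_PROTECTION_RE = re.compile(r"^\+\s*prevent_destroy\s*=\s*true\b")


def needs_decision_record(diff: str, marker: str = "decisions/") -> bool:
    """True if a unified diff adds prevent_destroy = true without touching any decision record."""
    adds_protection = False
    touches_record = False
    for line in diff.splitlines():
        if line.startswith("+++ "):
            # File header for the post-change side, e.g. "+++ b/decisions/0012-protect-db.md"
            touches_record = touches_record or marker in line[4:]
        elif ADDED_PROTECTION_RE.match(line):
            adds_protection = True
    return adds_protection and not touches_record
```

Because the marker is a substring match, a record placed alongside the module (modules/db/decisions/...) satisfies the check just as well as one in the root folder.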
How do you extract IaC decision reasoning from AI chat?
IaC evaluations produce explicit question shapes and trade-off language that make them high-confidence extraction targets. Identify the significant change dates from git log, export chat history from the six-week window before each change, and run the WhyChose extractor on the export. The deliberation-before-migration is the richest source: engineers spend real time in AI chat working through IaC migration options before committing to an approach.
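The date arithmetic is simple enough to script. The following is a sketch that reads a file's commit dates from git log (author dates via --format=%ad --date=short) and computes the six-week window before each change. It then filters an exported chat history to that window. The chat export is assumed, hypothetically, to be a list of (timestamp, text) pairs; the WhyChose extractor's actual input format is not shown here.

```python
import subprocess
from datetime import date, datetime, timedelta

WINDOW = timedelta(weeks=6)


def change_dates(path: str, repo: str = ".") -> list[date]:
    """Distinct author dates (YYYY-MM-DD) of commits touching a path, oldest first."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%ad", "--date=short", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return sorted({date.fromisoformat(token) for token in out.split()})


def window_for(change: date) -> tuple[date, date]:
    """The six-week window ending on the day of the change."""
    return change - WINDOW, change


def messages_in_window(
    messages: list[tuple[datetime, str]], change: date
) -> list[tuple[datetime, str]]:
    """Keep chat messages whose timestamp falls inside the window before a change."""
    start, end = window_for(change)
    return [m for m in messages if start <= m[0].date() <= end]
```

For a change committed on 2026-06-11, the window runs from 2026-04-30 to the commit date. Narrowing the export this way keeps unrelated conversations out of what the extractor has to read.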
What Kubernetes ADRs are most often missing?
Three categories are consistently undocumented. The first is CRD and operator adoption decisions, which are very hard to reverse once workloads depend on them. The second is namespace architecture decisions, often correct at five engineers and wrong at twenty. The third is admission policy framework decisions (OPA Gatekeeper vs. Kyverno vs. a custom webhook), which are cross-cutting, long-lived, and audit-relevant. The admission controller adoption session in an engineer's AI chat history is typically the highest-confidence Kubernetes extraction target in the codebase.