How post-mortems and ADRs work together: using incident history to fill the decision log

Most engineering teams run post-mortems and write architecture decision records as entirely separate processes. The post-mortem goes into the incident tracker. The ADR goes into the decisions directory. They almost never cross-reference each other. This separation is expensive: the post-mortem surfaces the exact constraint that broke in production, and that constraint was almost always implicit before the incident. The incident is the highest-confidence moment to write the ADR that should have existed before it — and most teams let that window close without writing anything.

The case for connecting post-mortems and ADRs is specific, not general. It's not "document everything" or "retrospectives produce good learning." It's that the post-mortem does a substantial fraction of the ADR's intellectual work — surfaces the constraint, names the decision that created the failure mode, identifies the alternatives that were implicitly not chosen — and then files that work in an incident ticket where it decays and can't be cited by future architectural decisions.

This post covers the specific pipeline from post-mortem to ADR: what types of ADRs incidents surface, how to map from post-mortem sections to ADR sections, how to recover the original decision context when the decision was made years ago, and why the 48-hour window after the post-mortem is the only window that makes retrospective ADRs accurate rather than reconstructed.

Why incidents surface the most valuable implicit constraints

Every production incident involves at least one constraint that was implicit before the failure. That's almost definitional: if the constraint had been explicit, it would have been documented, and either the decision creating the failure mode would have been made differently, or the known constraint would have been tested, or someone monitoring the system would have flagged the approaching boundary before it broke.

The implicit constraint takes one of a few forms. Sometimes it's a performance assumption that was true when the system was designed but stopped being true as usage grew — a database query that ran in 2ms at 10k rows and 800ms at 10M rows; the constraint was that the dataset would stay small, and that constraint was never written down. Sometimes it's a failure mode that was known but classified as "unlikely" without a documented threshold — the system can't handle a split-brain scenario because the original choice was to defer distributed coordination until the team grew, and that deferral was never recorded as a deferral decision with conditions. Sometimes it's a dependency assumption that was obvious to the team that built it and invisible to every subsequent engineer: the job scheduler assumes the database and the message broker are in the same availability zone; the constraint was obvious in 2021, invisible in 2024 when the infrastructure team moved the broker.

In all these cases, the post-mortem surfaces the constraint clearly, often for the first time. The incident timeline makes the failure mode concrete. The root cause analysis names the decision that created it. The team's understanding of the constraint is at its maximum depth — they just ran the failure mode in production. And within a week, that clarity will have started to abstract back into "we need to be more careful about scaling assumptions" — useful in aggregate, useless as a specific constraint in a future ADR.

The three ADR types a post-mortem typically produces

Not every incident produces the same kind of ADR. Most post-mortems surface one or more of three distinct types, and writing the right type requires recognizing which one applies.

The original decision ADR. The decision made months or years ago that left the system in a state where this incident could occur. A well-structured ADR for this decision would have documented the constraint explicitly: "we're accepting that this approach has no automatic failover because the team lacks operational experience with distributed consensus at this scale; revisit when we exceed X concurrent users or add a second on-call engineer." That ADR doesn't exist — which is why the constraint was implicit, and why the incident occurred. Writing it now, immediately after the post-mortem, captures the constraint with a precision that will be impossible to reconstruct in six months.

The deferral ADR. The implicit decision to not address a known risk. Engineering teams accumulate deferred risks as a normal part of operating — a known single point of failure that hasn't been a problem yet; a schema that should be normalized but the migration cost is too high right now; a third-party dependency whose API is fragile but whose replacement would take a sprint. These deferrals are decisions, but they almost never get written down. The deferral ADR makes them explicit: "We are deferring distributed session storage until Q3. Known risk: regional outage causes session loss for active users. Acceptable at current user count. Trigger condition: user count exceeds 50k or first customer SLA includes session persistence requirements."
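A deferral ADR is only useful if someone notices when its trigger condition fires. One option is to record the triggers in a form that can be checked against current metrics. The sketch below encodes the session-storage example above. The `DeferralADR` and `Trigger` structures and the metric names (`user_count`, `sla_requires_session_persistence`) are hypothetical illustrations, not part of any ADR standard.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Trigger:
    description: str
    # Hypothetical predicate over a dict of current metrics; True means "revisit now".
    fired: Callable[[dict], bool]

@dataclass
class DeferralADR:
    title: str
    known_risk: str
    acceptable_because: str
    triggers: list = field(default_factory=list)

    def fired_triggers(self, metrics: dict) -> list:
        """Return the description of every trigger condition met by the given metrics."""
        return [t.description for t in self.triggers if t.fired(metrics)]

session_deferral = DeferralADR(
    title="Defer distributed session storage until Q3",
    known_risk="Regional outage causes session loss for active users",
    acceptable_because="Acceptable at current user count",
    triggers=[
        Trigger("User count exceeds 50k", lambda m: m.get("user_count", 0) > 50_000),
        Trigger(
            "A customer SLA includes session persistence",
            lambda m: bool(m.get("sla_requires_session_persistence", False)),
        ),
    ],
)

print(session_deferral.fired_triggers({"user_count": 42_000}))  # → []
print(session_deferral.fired_triggers({"user_count": 61_000}))  # → ['User count exceeds 50k']
```

Running a check like this on a schedule turns "revisit when X" from a sentence nobody rereads into a signal someone actually sees.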

The incident response ADR. The decision about how to mitigate. Rollback or hotfix? Accept and monitor or take the service down? Feature flag the broken path or push an emergency deploy? These decisions are made under time pressure, and the constraint that was actually decisive — time to restore service, blast radius of the fix, confidence in the rollback path — is almost never documented. The incident response ADR is the most underwritten of the three because the pressure is off once the incident resolves. But future incidents in the same system will benefit from knowing what the team decided, why, and what was given up by choosing that path over the alternatives.

Mapping post-mortem sections to ADR sections

The structural relationship between a post-mortem and an ADR is closer than it looks. Post-mortems and ADRs ask different questions in different orders, but a well-written post-mortem contains most of the raw material for the ADRs it should produce.

The post-mortem's "what failed?" section maps to the ADR's Consequences section — specifically the part of Consequences that names what the chosen design forecloses. The failure mode is the consequence that wasn't named when the original decision was made. "This design has no automatic failover" should have appeared in the original Consequences section; the post-mortem's "what failed?" section writes it in retrospect.

The post-mortem's "why did this design allow the failure?" section maps to the ADR's Constraint section. The root cause analysis identifies the constraint that was decisive in the original design: the team lacked distributed systems operational experience, or the timeline didn't allow for a more complex solution, or the failure mode was known but classified below the risk threshold. That constraint analysis is the raw material for the Constraint section of the original decision ADR.

The post-mortem's "what contributing factors made this worse?" section maps to the ADR's Alternatives Considered section. The factors that amplified the incident — no circuit breaker on the dependency, no health check on the scheduler, no rate limit on the retry loop — are usually alternatives that were implicitly not chosen. The ADR's Alternatives Considered section should have listed these with rejection reasons. The post-mortem writes the rejection reasons in retrospect: "no circuit breaker — not implemented because the team had no prior experience with circuit breaker patterns and the added complexity wasn't justified at 2021's traffic volume."

The post-mortem's "what will we do differently?" section is the input to the new ADR — the forward-looking decision that the incident is forcing. This is the most natural ADR candidate, and the one teams are most likely to convert into a formal record because it's part of the action items. But the original decision ADR is more valuable than the remediation ADR, because it's the record that prevents the same failure mode from recurring in a different part of the system where the post-mortem's action items don't reach.
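The section mapping above is mechanical enough to seed a first draft automatically. The sketch below assumes a post-mortem stored as a dict keyed by the section questions used in this post; the exact section names and the `TODO` placeholder convention are assumptions to adjust to your own templates. It only copies raw material into place; the engineer still has to rewrite it as a decision record.

```python
# Hypothetical section names; adapt to your own post-mortem and ADR templates.
SECTION_MAP = {
    "What failed?": "Consequences",
    "Why did this design allow the failure?": "Constraint",
    "What contributing factors made this worse?": "Alternatives Considered",
}
# This section feeds the new (remediation) ADR rather than the original decision ADR.
REMEDIATION_SECTION = "What will we do differently?"
MISSING = "TODO: not covered in the post-mortem"

def draft_original_decision_adr(postmortem: dict) -> dict:
    """Seed the original decision ADR's sections from post-mortem text."""
    draft = {}
    for pm_section, adr_section in SECTION_MAP.items():
        text = postmortem.get(pm_section, "").strip()
        # Mark gaps explicitly so they are visible when the draft is reviewed.
        draft[adr_section] = text or MISSING
    return draft

def draft_remediation_adr(postmortem: dict) -> dict:
    """Seed the forward-looking ADR from the action-oriented section."""
    return {"Decision": postmortem.get(REMEDIATION_SECTION, "").strip() or "TODO"}

pm = {
    "What failed?": "No automatic failover when the primary cache node went down.",
    "Why did this design allow the failure?": "Team lacked operational experience with distributed consensus in 2021.",
}
draft = draft_original_decision_adr(pm)
# draft["Alternatives Considered"] is the MISSING placeholder: the contributing factors were never written up.
```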

The 48-hour window

The case for writing post-mortem ADRs quickly is about the rate at which clarity degrades.

Immediately after a post-mortem, the team's understanding of the original decision is at its sharpest. The root cause analysis has traced the failure mode back to the original constraint. The engineers who made the original decision — or who inherited it — have spent hours reconstructing the reasoning. The alternatives that were implicitly not chosen are visible from the failure mode itself. The constraint that was decisive is named in the post-mortem.

Forty-eight hours later, the urgency has dropped significantly. The system is back up. The action items are in the tracker. The team is back to sprint work. The specific constraint has started to abstract back into "we need better resilience patterns" — a lesson rather than a specific constraint in a specific decision context.

Two weeks later, the incident is a row in the post-mortem archive. The original decision ADR is still not written. A new engineer joins the team, looks at the system design, and asks why there's no automatic failover. Nobody can answer precisely: the precise constraint, which was clear to everyone in the 48-hour window, was never written down, and the abstracted lesson has replaced it in the team's memory.

ADRs written from incident memory, rather than during the 48-hour window of post-mortem clarity, are especially likely to record a false constraint. Memory two months after an incident tends to produce the defensible-sounding constraint rather than the honest one. "The team lacked operational experience with distributed consensus" becomes "the operational overhead of Raft wasn't justified at 2021's scale." Both claims may be true, but only one carried the actual weight of the decision, and the defensible framing is the rationalized version.
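If the 48-hour window matters, it helps to make it visible in incident tooling rather than relying on someone remembering it. A minimal sketch of such a reminder follows. The function name, the status strings, and the choice to start the clock when the post-mortem meeting ends are all assumptions, not an established convention.

```python
from datetime import datetime, timedelta, timezone

ADR_WINDOW = timedelta(hours=48)  # the window argued for in this section

def adr_draft_status(postmortem_ended_at: datetime, now: datetime, drafts_written: int) -> str:
    """Report whether ADR drafts are done, still inside the window, or overdue."""
    if drafts_written > 0:
        return "done"
    deadline = postmortem_ended_at + ADR_WINDOW
    if now <= deadline:
        hours_left = (deadline - now).total_seconds() / 3600
        return f"open: {hours_left:.0f}h left"
    # Past the window, drafts are reconstructions from memory, not records.
    return "overdue"

ended = datetime(2025, 11, 14, 17, 0, tzinfo=timezone.utc)
print(adr_draft_status(ended, ended + timedelta(hours=10), drafts_written=0))  # → open: 38h left
```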

Recovering original decision context from AI chat history

For decisions made in the last few years, the original decision that created the failure mode may well have been deliberated in AI chat. Say an engineer made a stack choice or architecture decision in 2023 by working through it in ChatGPT. The alternatives were considered in the chat session. The constraint was identified there. The trade-offs were named. None of it made it into an ADR because the team wasn't writing ADRs yet, or the decision didn't seem significant enough at the time, or the engineer meant to write it up and didn't.

The post-mortem gives you a date range: the system was designed in Q2 2023, which means the deliberation sessions were in that window. Running the WhyChose extractor on exports from that period surfaces the decision sessions — the conversations where the engineer weighed alternatives, identified the constraint, and made the choice.

The retrospective ADR written against the deliberation session is significantly more accurate than the one written from memory. The session contains the alternatives that were actually on the table, not the ones the engineer remembers being on the table. It contains the constraint that was decisive in the conversation, not the one that sounds most defensible two years later. It often contains the specific acknowledgment of the trade-off that was accepted — "I know this doesn't handle split-brain well but we don't expect to be in that situation for at least a year" — which is exactly the material the Consequences section should contain.

The path: use the post-mortem timeline to narrow the date range, export the relevant ChatGPT or Claude history for that period, and run the extractor. The sessions that match the architectural decision will show up in the extracted decision log. The original deliberation is the authoritative source for the ADR's Alternatives Considered, Constraint, and Consequences sections — more authoritative than memory, and more accurate than the rationalized version the post-mortem narrative tends to produce.

The specific step of exporting the ChatGPT conversation history for the relevant time period takes about two minutes. The export includes all conversations, and the extractor identifies which ones contained architectural deliberation. For a decision made in Q2 2023, the relevant sessions typically cluster within a two-week window around when the implementation started — the deliberation happens just before the work begins.
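Narrowing a full export to the deliberation window before running any extractor keeps the review manageable. The sketch below assumes the export's `conversations.json` is a JSON list of objects with a `title` and a Unix-epoch `create_time`, which matches ChatGPT exports we have seen; the format is not formally documented and may change. The two-week lead before implementation start is the heuristic described above, not a rule. It does not call the WhyChose extractor, whose interface isn't covered here; it only produces the subset you would hand to it.

```python
import json
from datetime import datetime, timedelta, timezone

def deliberation_window(implementation_start: datetime, lead: timedelta = timedelta(weeks=2)):
    """Heuristic window: deliberation tends to happen just before the work begins."""
    return implementation_start - lead, implementation_start

def conversations_in_window(conversations: list, start: datetime, end: datetime) -> list:
    """Keep conversations whose create_time (assumed Unix epoch seconds) falls in [start, end]."""
    selected = []
    for conv in conversations:
        ts = conv.get("create_time")
        if ts is None:
            continue
        created = datetime.fromtimestamp(ts, tz=timezone.utc)
        if start <= created <= end:
            selected.append(conv)
    return selected

def load_export(path: str) -> list:
    """Load an assumed conversations.json export file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Inline sample data standing in for load_export("conversations.json").
impl_start = datetime(2023, 5, 15, tzinfo=timezone.utc)
start, end = deliberation_window(impl_start)
sample = [
    {"title": "Redis vs Postgres for sessions",
     "create_time": datetime(2023, 5, 8, tzinfo=timezone.utc).timestamp()},
    {"title": "Birthday gift ideas",
     "create_time": datetime(2023, 3, 2, tzinfo=timezone.utc).timestamp()},
]
candidates = conversations_in_window(sample, start, end)
```

If the implementation start date from the post-mortem is uncertain, widening `lead` is cheaper than missing the session that actually contains the decision.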

The post-mortem that produces no ADR is incomplete

The standard post-mortem format — timeline, root cause analysis, contributing factors, action items — is well-designed for incident response. It captures what happened, why it happened, and what the team will do to prevent recurrence. But it does not capture what should be documented permanently about the decision that created the failure mode. Action items age out of relevance as they're completed. The post-mortem archive is an incident history, not a decision history. Future architectural decisions can't cite a post-mortem the way they can cite an ADR.

A post-mortem that produces no ADR has externalized the learning into a format that the architecture process can't consume. The team knows what happened. Future decisions won't benefit from it, because future decision-makers look at ADRs when making architectural choices — not at the incident archive.

The ADR lifecycle makes this connection explicit. When a post-mortem reveals that the original decision was made on a false or outdated constraint, the post-mortem-generated ADR supersedes the original: "ADR-0042 is superseded by ADR-0107, written following the 2025-11-14 cache availability incident, which revealed that the per-node session storage assumption documented in ADR-0042's Constraint section no longer holds given current traffic patterns." The post-mortem becomes part of the ADR's evidence chain, not just an incident archive entry.
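The supersession step can be sketched as a small operation on ADR records that updates both sides and attaches the incident as evidence. The `ADR` fields and the `supersede` helper below are hypothetical; most teams keep ADRs as Markdown files, and this only models the metadata changes they would make by hand.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ADR:
    number: int
    title: str
    status: str = "Accepted"
    supersedes: Optional[int] = None
    superseded_by: Optional[int] = None
    evidence: list = field(default_factory=list)

def supersede(old: ADR, new: ADR, postmortem_ref: str) -> None:
    """Link old and new records in both directions and record the incident as evidence on each."""
    old.status = f"Superseded by ADR-{new.number:04d}"
    old.superseded_by = new.number
    new.supersedes = old.number
    for record in (old, new):
        record.evidence.append(postmortem_ref)

adr_42 = ADR(42, "Per-node session storage")
adr_107 = ADR(107, "Session storage after the cache availability incident")
supersede(adr_42, adr_107, "Post-mortem: 2025-11-14 cache availability incident")
print(adr_42.status)  # → Superseded by ADR-0107
```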

Why the cultural separation persists

Post-mortems are owned by the on-call rotation and SRE function. ADRs are owned by the engineering or architecture function. In most engineering organizations these are different people with different processes and different review cadences. The post-mortem closes when the action items are assigned. The ADR starts when an architectural decision is being made. The two processes don't have a formal handoff, so the handoff doesn't happen.

The same forcing-function logic that makes a blank ADR template useful applies here. If the post-mortem template includes a section, "ADRs produced by this incident: [ ]", the gap becomes visible. A post-mortem filed with that section empty has explicitly not produced an ADR. The section being there makes the omission a choice rather than an oversight.
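A check like the following could run when a post-mortem is filed, distinguishing an empty section from a deliberate "no ADR needed." It assumes the section is a single line in a plain-text post-mortem, that ADR identifiers follow the four-digit `ADR-NNNN` form used in this post's examples, and that declining is written as `none: <reason>`. All three are hypothetical conventions.

```python
import re

ADR_SECTION = "ADRs produced by this incident:"
ADR_ID = re.compile(r"\bADR-\d{4}\b")

def adr_section_status(postmortem_text: str) -> str:
    """Classify the ADR section as linked, explicitly declined, empty, or missing."""
    for line in postmortem_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(ADR_SECTION):
            value = stripped[len(ADR_SECTION):].strip()
            if ADR_ID.search(value):
                return "linked"
            if value.lower().startswith("none:"):
                return "explicitly declined"
            return "empty"
    return "missing section"

print(adr_section_status("Timeline ...\nADRs produced by this incident: [ ]"))  # → empty
```

Blocking "empty" and "missing section" at filing time, while accepting "explicitly declined", is what turns the omission into a recorded choice.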

The ADR review process should also have a post-mortem check: for any ADR that supersedes an existing record or that addresses a known failure mode, does the ADR link to the relevant post-mortem? A decision record that documents a constraint that was revealed by a production failure should cite the incident as evidence — it makes the constraint more credible and gives future reviewers access to the full context.
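That review check can be partly automated as a lint over ADR text. The sketch assumes ADRs carry hypothetical `Supersedes:` and `Addresses failure mode:` header lines and that post-mortems live under a URL path containing `/postmortem/` or `/postmortems/`. Substitute whatever markers and link formats your repository actually uses.

```python
import re

SUPERSEDES = re.compile(r"^Supersedes:\s*ADR-\d{4}", re.MULTILINE)
FAILURE_MODE = re.compile(r"^Addresses failure mode:", re.MULTILINE)
POSTMORTEM_LINK = re.compile(r"https?://\S*/postmortems?/\S+")

def review_findings(adr_text: str) -> list:
    """Flag ADRs that should cite a post-mortem as evidence but don't."""
    needs_evidence = bool(SUPERSEDES.search(adr_text) or FAILURE_MODE.search(adr_text))
    if needs_evidence and not POSTMORTEM_LINK.search(adr_text):
        return ["ADR supersedes a record or addresses a known failure mode but cites no post-mortem"]
    return []

ok = "Supersedes: ADR-0042\nEvidence: https://wiki.example.com/postmortems/2025-11-14-cache\n"
print(review_findings(ok))  # → []
```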

The practical workflow

The specific process that closes the loop between post-mortems and ADRs:

During the post-mortem. Before the meeting closes, one team member is explicitly assigned the ADR task, not as one of the action items but as a separate deliverable: "Write the ADR drafts within 48 hours." There are up to three drafts: the original decision ADR, the deferral ADR if one applies, and the incident response ADR. The assignment is explicit because ADR writing doesn't fall naturally into the on-call rotation's process unless someone names it.

Within 48 hours. The assigned engineer writes the ADR drafts while the post-mortem analysis is still fresh. The drafts are informed by the post-mortem document (the root cause analysis, the contributing factors, the timeline), and by the AI chat extraction for the original decision period. The drafts are shared in the same channel as the post-mortem follow-up so the team can review them while the incident is still in working memory.

Cross-referencing. Each ADR links to the post-mortem as evidence for its Constraint section: "This constraint was identified following the 2025-11-14 incident — see [post-mortem link]." The post-mortem is updated to include links to the ADRs it produced: "Architectural follow-through: ADR-0107 (session storage constraint), ADR-0108 (deferral of distributed coordination)." The two documents reference each other as part of the same evidence chain.
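Because the two documents should point at each other, a periodic check can catch links that only go one way. The sketch assumes you can extract, for each post-mortem, the set of ADR ids it lists under "Architectural follow-through," and for each ADR, the set of post-mortem ids it cites. The `PM-2025-11-14` identifier format is hypothetical.

```python
def missing_backlinks(postmortem_links: dict, adr_links: dict) -> list:
    """
    postmortem_links: post-mortem id -> set of ADR ids it lists as follow-through
    adr_links: ADR id -> set of post-mortem ids it cites as evidence
    Returns one message per one-directional link.
    """
    problems = []
    for pm, adrs in postmortem_links.items():
        for adr in sorted(adrs):
            if pm not in adr_links.get(adr, set()):
                problems.append(f"{adr} does not cite {pm}")
    for adr, pms in adr_links.items():
        for pm in sorted(pms):
            if adr not in postmortem_links.get(pm, set()):
                problems.append(f"{pm} does not list {adr}")
    return problems

pm_links = {"PM-2025-11-14": {"ADR-0107", "ADR-0108"}}
adr_links = {"ADR-0107": {"PM-2025-11-14"}, "ADR-0108": set()}
print(missing_backlinks(pm_links, adr_links))  # → ['ADR-0108 does not cite PM-2025-11-14']
```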

For future architectural decisions. When a decision is being made that touches the same system or constraint domain, the ADR-from-post-mortem is cited in Alternatives Considered or Constraint: "ADR-0107 documents the session storage constraint that produced the 2025-11-14 incident; the alternative we're rejecting here would recreate that constraint in the message queue layer." The incident's learning reaches future decisions through the ADR, not through the post-mortem archive.

What the decision log looks like when this is working

The difference between a decision log built with post-mortem integration and one without it is visible in the Constraint sections. ADRs without post-mortem integration tend to have Constraint sections that state the rationale for the chosen option. ADRs with post-mortem integration have Constraint sections that name the specific failure modes the constraint was designed to prevent — and link to the incidents that revealed the constraints that earlier decisions had missed.

The decision log becomes a record of both what the team decided and what it learned — including the expensive lessons that only production incidents can teach. A new CTO inheriting a system can read the ADR archive and see not just why each decision was made, but which ones were revised after incidents, which constraints were implicit before they broke, and which deferral decisions accumulated into the current system state. That's a significantly more useful artifact than a separate incident archive and a separate ADR archive that don't cross-reference each other.

The post-mortem is expensive to produce. It requires engineering time under pressure, a structured retrospective meeting, and follow-through on action items. The ADR costs almost nothing by comparison — it's thirty minutes of writing that converts the post-mortem's analytical work into a format the architecture process can consume. Most teams pay the full post-mortem cost and then leave the ADR conversion undone. The window for accurate ADR writing closes in 48 hours, and so does most of the return on the post-mortem investment.

Further reading