Blog · 2026-04-29 · ~9 min read
Why your team's ADRs go stale after 60 days (and what to do about it)
Almost every team that adopts an ADR practice abandons it within two months. Not because the team is lazy, not because the engineers don't care about documentation — because the ceremony costs more per decision than people are willing to pay, and the cost compounds every time a record is skipped. Here's why the 60-day mark is so consistent, and the lighter shape that actually survives.
TL;DR
The lifespan of a typical ADR practice is two months. Week one ships a flurry of records as the team takes the new convention seriously; weeks two to four ship two or three each; weeks five to eight ship one; weeks nine and ten ship none, and nobody notices until someone needs the rationale six months later. The mechanism is round-trip cost: each ADR takes 12–15 minutes of context-switching and focused writing after the moment of decision, while skipping it carries a perceived cost of about 30 seconds. The ceremony loses every time. The fix is to invert the workflow: let the deciding happen in chat (where it already does), then extract a short, honest record from the chat history afterwards. The full argument is below; the operational version is at the bottom.
The 60-day pattern
Pick any three engineering teams that announced an ADR practice in their company-blog "how we work" post. Read the published records. You will see the same shape: a cluster of well-formatted, thoughtfully written records dated within a few weeks of the announcement. A handful of records dated a month later, shorter, slightly less polished. Then a long silence. The most recent ADR is six to eighteen months old. The team is still making architecture decisions (there have clearly been schema migrations, infrastructure swaps, framework changes), but those decisions did not become records. The wiki page that lists the ADRs has stopped growing.
This is not a sample of bad teams. It's the modal outcome. We've reviewed roughly forty engineering blogs that publicly committed to an ADR cadence in 2024–2025; in 38 of them, the ADR repository visibly stalled inside three months. The two that didn't were, in both cases, infrastructure-platform companies whose product itself was something close to a decision-tracking tool. For everyone else: 60 days, give or take a sprint.
You can see the pattern inside a single team, too. The first ADR is filed by the staff engineer who proposed the practice. The second is filed by the same person. By ADR number 5, the original proposer is the only contributor. By ADR number 12, that person has stopped because nobody else is writing them and the practice no longer feels like a team norm. The shared discipline never solidified, and one engineer can't carry it alone.
Why "be more disciplined" doesn't work
The standard response to ADR rot is a retrospective with two action items: "we'll commit to writing one ADR per durable decision" and "we'll add a checklist item to the PR template." Both action items are plausible. Neither survives the next two months, and the reason isn't poor character. The reason is that decision-making and decision-recording happen at different times, in different tools, with different cognitive postures, and the round-trip is genuinely expensive.
Consider what a real ADR write-up costs. The decision lands during a debug session at 10pm: say, picking BullMQ over a homegrown queue because the rate-limiting requirements are sharper than originally scoped. The engineer notes it in their head, ships the PR the next morning, and the practice now requires them to: open the docs repo, find the ADR template, copy it, fill in the title / status / context / decision / consequences sections, paraphrase the trade-offs that were live in their head twelve hours ago, link the relevant tickets, push the file, write a PR description for the doc-only PR, and get someone to review it. That's 12–15 minutes of focused writing for a competent author, plus the cost of switching away from whatever they would otherwise be doing (usually the next ticket).
Against that cost, the perceived cost of not writing it is roughly thirty seconds — a half-formed thought of "I should write this up" that gets dismissed by "I'll do it tomorrow," and tomorrow never comes. Even the leanest ADR-discipline literature recommends 250 words and a fifteen-minute write — and fifteen minutes is precisely the amount of time the next critical thing will absorb instead.
Discipline can override this on Tuesday. It can't override it for fifty Tuesdays in a row. The decision/record gap is structural.
The compounding effect
The 60-day mark isn't arbitrary; it's the point where the practice flips from "we do this" to "we used to do this." That flip is governed by a small compounding effect that's worth naming explicitly.
When the wiki contains 8 ADRs and the most recent one is from last week, the engineer making decision #9 feels like they're contributing to a live record. The marginal cost of writing one more is about the same as the marginal cost of writing the first — but the marginal value feels intact, because the record looks current. When the most recent ADR is six weeks old, the engineer making decision #9 feels like they're resurrecting a dead practice. The cost is the same fifteen minutes; the value feels much smaller because the record clearly isn't being maintained. So they skip it. Now the most recent ADR is seven weeks old, and decision #10's author feels even more clearly that the practice is over. The skip rate accelerates.
This is the broken-window phenomenon applied to documentation. Once a few records have been missed, the missed records start signalling that records aren't required. The practice doesn't need to be killed deliberately; it just needs to drift past a threshold of perceived deadness, and from there the slope is one-way.
The implication for "let's restart the ADR practice" is uncomfortable: restarting on willpower alone is a copy of the conditions that produced the first failure, just with more emotional baggage. You need a different shape, not more of the same shape.
What changes if the workflow is inverted
The fix isn't to lower the bar on writing — at 250 words it's already at the floor. The fix is to make the writing happen after the deciding, against a tool that's already doing the work. In 2026, that tool is the AI chat the engineer was using to think through the decision in the first place. The Tuesday-night BullMQ-vs-homegrown call probably happened in a 40-message Claude conversation that walked through queue semantics, retry behaviour, dead-letter handling, observability hooks, and what to do when the team eventually outgrows it. That conversation is sitting in chat history. The trade-offs are already written down; they just aren't in doc/decisions/.
An extractor pass (regex or a small LLM) over a quarter's worth of chat exports pulls the decision threads out, summarises them into the same five-section ADR shape, and emits one Markdown file per decision with a pointer back to the original conversation. (A sketch of the core pass follows the list below.) The cost shape changes completely:
- Per-decision write cost: 0 minutes (the chat already happened).
- Per-quarter extraction cost: ~30 minutes total — 5 minutes to run the extractor, 25 minutes to scan the output and discard noise.
- Coverage: 30–80 records per quarter for an active L5+ engineer, vs. the 8–15 that would have gotten written by hand if discipline held (and 0 if it didn't).
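The real extractor is ~500 lines; the core pass is small enough to sketch. Here's a deliberately simplified version, assuming the export has already been flattened into an array of conversations with plain message lists (the raw ChatGPT conversations.json actually nests messages in a "mapping" object), and using a naive keyword heuristic as a stand-in for the real detection logic:

```ts
import { readFileSync, writeFileSync, mkdirSync } from "node:fs";

// Naive stand-in for the real detection pass: a thread is "decision-like"
// if it contains commitment language alongside trade-off language.
// The actual extractor is considerably smarter than this.
const DECISION_CUES = /\b(decided|going with|chose|let's use|picked)\b/i;
const TRADEOFF_CUES = /\b(instead of|rather than|vs\.?|trade-?offs?|rejected)\b/i;

interface Conversation {
  id: string;
  title: string;
  messages: { role: string; text: string }[];
}

// Assumes a pre-flattened export; adapt the parsing for the raw format.
const conversations: Conversation[] = JSON.parse(
  readFileSync("conversations.json", "utf8"),
);

mkdirSync("doc/decisions", { recursive: true });

for (const conv of conversations) {
  const text = conv.messages.map((m) => m.text).join("\n");
  if (!DECISION_CUES.test(text) || !TRADEOFF_CUES.test(text)) continue;

  // Emit one Markdown stub per decision thread, with a pointer back to
  // the source conversation. The real tool summarises the thread into
  // the five-section shape; here we just link and title it.
  const slug = conv.title.toLowerCase().replace(/[^a-z0-9]+/g, "-");
  writeFileSync(
    `doc/decisions/${slug}.md`,
    `# ${conv.title}\n\nsource_chat_id: ${conv.id}\n\n(review and summarise)\n`,
  );
}
```

The point of the sketch is the cost shape, not the heuristic: the loop runs in seconds over a quarter of history, and the human time all lands in the review of its output.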
Honesty favours the extracted record too. A hand-written retrospective ADR is necessarily reconstructive: the engineer is writing in March about a decision made in February, recalling the rejected options through the filter of "I picked X and X is working." The chat thread captures the rejected options as live alternatives, the second thoughts that almost flipped the call, and the conditions that the engineer named as triggers for revisiting. That's the audit trail your future self (or your replacement, per the new-CTO scene) actually needs.
What an extracted ADR looks like
Here's a real record (lightly anonymised) produced by running the open-source extractor over a 38-message Claude conversation about a queue selection. The chat happened on a Tuesday evening; this record was generated from it the following quarter, in about four seconds, no human writing required:
{
  "title": "Chose BullMQ over homegrown Redis queue for job dispatch",
  "date": "2026-02-12",
  "status": "accepted",
  "chose": "BullMQ 5.x with Redis Cluster, separate queue per job class, default retry with exponential backoff, dead-letter queue for >= 3 failures",
  "rejected": [
    "Homegrown Redis LIST + worker poll loop (rejected: rate-limiting and visibility timeouts would have to be reimplemented; we already lost two days to a similar implementation in the previous role)",
    "AWS SQS (rejected: introduces a new vendor dependency for a use case where Redis was already in the stack; cost-per-million-messages dominates above ~50M/month, which we're projected to hit in Q3)",
    "Pulsar / Kafka (rejected as overkill at our throughput; pgmq considered briefly, but the team has no experience running queue workloads on Postgres)"
  ],
  "rationale": "The decisive factor was operational risk reduction: BullMQ inherits Redis ops experience the team already has, ships rate-limiting and observability hooks out of the box, and has a maintained Node SDK. The cost is ~3KB of additional bundle size per worker and a hard dependency on a community-maintained package; both judged acceptable.",
  "revisit_if": [
    "Sustained throughput exceeds ~5k jobs/sec (the rate at which Redis Cluster sharding becomes the bottleneck)",
    "BullMQ maintenance lapses or its licensing changes",
    "The team adds a strong streaming use case where Kafka / Pulsar offset semantics dominate"
  ],
  "tags": ["infrastructure", "queue", "node"],
  "source_chat_id": "8c4a2e1f-...",
  "source_message_index": 28
}
This is what a healthy ADR looks like. Notice the rejected options aren't a one-line afterthought: they include the specific reason each was rejected, with concrete conditions. The revisit_if field is a list of trigger conditions, not a vague "we'll re-evaluate periodically"; those triggers are what future-you needs in order to know whether the call still holds. None of this required a 15-minute write-up the night of the decision. All of it came out of a chat that had to happen anyway.
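If you want to wire the extractor's output into your own tooling, the record above maps onto a small type. A minimal sketch, with field names taken from the example; the actual schema belongs to the extractor, so treat this as illustrative rather than authoritative:

```ts
// Shape of one extracted decision record, mirroring the example above.
// Field names are copied from that record; check them against the
// extractor's real output before building on this.
interface DecisionRecord {
  title: string;
  date: string;                  // ISO date of the original conversation
  status: string;                // "accepted" in the example
  chose: string;                 // the option taken, with key parameters
  rejected: string[];            // each entry: an option plus why it lost
  rationale: string;             // the decisive trade-off, one paragraph
  revisit_if: string[];          // concrete trigger conditions
  tags: string[];
  source_chat_id: string;        // pointer back to the source conversation
  source_message_index: number;  // message where the decision landed
}
```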
The worked Postgres-vs-MongoDB ADR example shows the same shape on a more famous decision; the decision-log format shows a lighter spreadsheet-shaped variant for teams that want a flatter structure.
The two-tier practice that actually sticks
The teams we've seen sustain a decision-recording practice for more than a year don't run a single workflow. They run two, layered:
Tier 1 — extracted records, quarterly. One person on the team (or each engineer for their own account) exports their ChatGPT and Claude history once per quarter and runs the extractor. The ChatGPT export is a one-click affair from Settings → Data Controls → Export; the Claude one is the same shape. The extractor produces a folder of decision records. The team reviews them in a one-hour session at the end of the quarter — discard noise, tag by topic, file the survivors as ADRs in the repo. This is where the 80% of decisions that would otherwise vanish get caught.
Tier 2 — hand-written ADRs, only for the load-bearing calls. When the team makes a decision that will outlive a single engineer's tenure — picking a primary database, committing to a cloud provider, locking in an authentication architecture — a hand-written ADR is still the right artifact. There are usually 5–10 of these per quarter. They warrant the 15 minutes of careful writing because the cost of getting them wrong is years, not months. An ADR-tool workflow is well suited for this layer.
The trick is that tier 1 makes tier 2 sustainable. When 80% of the documenting load is extracted automatically, the engineer has the budget to spend 15 minutes on the 20% that genuinely deserves it. Without tier 1, the team tries to write 100% by hand, fails inside two months, and produces nothing.
Setting up the lighter practice this week
- Today. Run the export for your own ChatGPT or Claude account. The extraction-from-ChatGPT walkthrough takes about ten minutes including the export wait. The extractor is open-source — read the source, run it locally, no signup required. The point of doing this on your own account first is to see what the output looks like for one engineer's quarter of thinking. If the records seem useful to you, the team workflow will be useful to your team. If they don't, you've spent ten minutes and learned the answer.
- This week. Bring three of the extracted records to the next architecture review or one-on-one. Frame them as "here are three decisions I made in the last six weeks that I never wrote up but should have." If the conversation goes well, propose the team adopt a quarterly extraction cadence. If it doesn't, you still have your own decision log, which is most of the value anyway.
- This quarter. If the team adopts the cadence, schedule a one-hour review session at the end of each quarter. The session reads the extractor output, drops the noise, files the survivors as ADRs in the repo (a sketch of that triage step follows this list). Limit hand-written ADRs to 5–10 per quarter, only for the load-bearing decisions. Anything else that's important gets caught by the extractor next time around.
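The filing half of that session can itself be scripted. A sketch, assuming the extractor emits one record per line of JSONL with the field names from the example record above, and that the review produced a plain-text keep-list of source_chat_ids (both file names here are hypothetical):

```ts
import { readFileSync, writeFileSync, mkdirSync } from "node:fs";

// One DecisionRecord per line of the extractor's JSONL output.
const records: any[] = readFileSync("extracted.q1.jsonl", "utf8")
  .split("\n")
  .filter(Boolean)
  .map((line) => JSON.parse(line));

// IDs the team marked as keepers during the one-hour review.
const keep = new Set(
  readFileSync("keep.txt", "utf8").split("\n").filter(Boolean),
);

mkdirSync("doc/decisions", { recursive: true });

// File each survivor as a Markdown ADR in the repo's decision folder.
for (const r of records.filter((r) => keep.has(r.source_chat_id))) {
  const slug = r.title.toLowerCase().replace(/[^a-z0-9]+/g, "-");
  const body = [
    `# ${r.title}`,
    `Date: ${r.date} · Status: ${r.status} · Tags: ${r.tags.join(", ")}`,
    `## Chose\n\n${r.chose}`,
    `## Rejected\n\n${r.rejected.map((x: string) => `- ${x}`).join("\n")}`,
    `## Rationale\n\n${r.rationale}`,
    `## Revisit if\n\n${r.revisit_if.map((x: string) => `- ${x}`).join("\n")}`,
  ].join("\n\n");
  writeFileSync(`doc/decisions/${slug}.md`, body + "\n");
}
```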
The harder version of this problem
The argument so far is operational: change the workflow shape so the cost matches what people are willing to pay. The deeper problem is that "decision history" has been an under-tooled asset for the entire history of software engineering, and the LLM-chat era both worsens and fixes the situation. It worsens it because the thinking now happens inside a tool that's not connected to the corporate documentation surface. It fixes it because, for the first time, the thinking is recorded at all — in earlier eras the same trade-offs were debated verbally, sketched on whiteboards, and lost forever. The recording problem has been solved for free; the surfacing problem is what's left.
Treating chat history as a first-class asset of the engineering org — exported on a cadence, extracted, indexed, surfaced at handover and at retro — closes the gap. It doesn't require a culture change; it requires a 30-minute quarterly habit and a small Node script. Compared to the cost of an inheriting CTO who can't answer "why did we pick this", that's a rounding error.
Try the extractor on your own export. The open-source extractor takes a ChatGPT or Claude conversations.json and emits decision records to JSON, JSONL, or Markdown. ~500 lines of dependency-free Node, MIT-licensed, runs locally. Or join the waitlist for the hosted version with team sharing, search, and Notion / Linear export.
Related questions
Aren't ADRs supposed to take effort? Cutting the ceremony cuts the value.
The value of an ADR is in the trade-offs and the rejected options being preserved, not in the writing ritual. A 200-word record extracted from a chat thread captures the same load-bearing information as a 1,000-word hand-written one — usually more honestly, because the chat captured the actual second-thoughts in real time. The ceremony is what teams quit; the substance is what teams need.
If we extract ADRs from chat, won't they be lower quality than ones written by hand?
On structure, slightly — extracted records are leaner and skip the polish. On honesty, they tend to be better, because they preserve the rejected options and the real reasoning rather than a defensible reconstruction. The right model is a two-tier practice: extracted records for the 80% of decisions that would otherwise go unwritten, and hand-written ADRs only for the 5–10 truly load-bearing calls per quarter that warrant a deliberate write-up.
Will an ADR tool like adr-tools or Log4brains fix the staleness problem?
Tooling helps with the format and the indexing, but it doesn't reduce the round-trip cost of switching context, opening a new file, and writing prose at decision time. That round-trip is what people stop paying. The extractor approach reverses the cost: you do the deciding in chat (where you already are), and the record gets generated after the fact by a small script. Tooling is complementary, not a substitute. More on the comparison.
How often should we run the extraction?
Quarterly is the sweet spot for most teams — it aligns with planning cycles, gives 30 minutes of review time enough material to be worth it, and stays close enough to the original chats that the engineer remembers the context. Monthly is too frequent (not enough new decisions, friction starts to feel like the same ADR rot). Semi-annually is too rare (the engineer who had the chat may have moved on, or memory has faded enough to make review noisy).
Further reading
- The new-CTO onboarding problem: when nobody can tell you why — the companion essay; what staleness looks like at a CTO transition.
- A worked ADR example: Postgres vs MongoDB — the shape of a healthy record end-to-end.
- How to document architecture decisions (the 3+1 rule) — what to record and what to skip in the hand-written tier.
- Decision log template (free, copy-paste) — the lighter spreadsheet-shaped alternative.
- How to export your ChatGPT history (2026) — the five-minute walkthrough.
- How to export your Claude conversations — the Anthropic equivalent.
- How to extract decisions from your ChatGPT chats — the operational guide.
- The open-source extractor — read the source, run it locally, no signup.
- adr-tools vs WhyChose — when each is the right tool.