Blog · 2026-06-06 · ~10 min read

The quarterly decision review: a 30-minute ritual for engineering teams

You ran the extractor on your first ChatGPT or Claude export. You got 30-odd records. Some of them are genuinely surprising. Now you have a new problem: what do you do with them? How often do you run this again? Which records warrant a full ADR write-up? And how do you keep the log useful a year from now when you've accumulated 150 records across four quarters? Here's the ritual: the exact 30-minute workflow, the four triage categories, and what the second and third quarterly reviews look like once the practice is established.

TL;DR

Run extraction quarterly — not more, not less. The workflow is five steps: request your export (5 min, happens the day before), run the extractor (3 min), triage output into four buckets — Promote / Link / Park / Dismiss (12 min), write up 2–5 promoted records in your ADR format (8 min), archive and set the next quarter's reminder (2 min). Total: 30 minutes per quarter per person. The second quarterly review changes character — once you have an existing log, you spend less time on fresh records and more time checking new extractions against what already exists.

Why quarterly (not weekly, monthly, or ad hoc)

The cadence is set by two constraints that pull in opposite directions: export friction and memory accuracy.

Export friction sets the floor. Requesting your ChatGPT export takes 30 seconds, but the download link arrives 18–24 hours later. You can't extract on a whim the moment you decide to do a review. This makes daily or even weekly reviews impractical: each one depends on a request submitted the day before, and a habit built around that delay tends to break within a month. Very short cadences are also wrong on the content side: decisions made yesterday haven't matured. The immediate aftermath of a decision is when you're most convinced it was right. Three months later is when you have enough perspective to evaluate whether the decision created the constraints you thought it would.

Memory accuracy sets the ceiling. At 12 months, the triage pass shifts from recognition to reconstruction. With records made 4–5 months ago, you can typically read the extracted title and immediately remember the full context: who was in the conversation, what the trade-offs were, which alternative you almost chose. By the 12-month mark, many records produce only a vague sense of familiarity without the specific reasoning. The triage pass is still useful at that age, but the annotation step (adding revisit conditions, tagging the domain, filling in context the extractor missed) becomes guesswork rather than recall. A quarterly cadence keeps every record well inside the window where the annotation pass is still accurate.

Planning cycle alignment is the practical bonus. Quarterly reviews coincide with the review rhythms most engineering teams already have: sprint retrospectives roll up into quarterly check-ins, annual plans are broken into four quarterly cycles, and many teams run a quarterly "tech debt triage" that the decision review can attach to. A new standalone ritual tends to fail; a 30-minute review attached to an existing calendar anchor tends to survive. The decision review is short enough to append to the end of a quarterly retrospective without extending it meaningfully.

The 30-minute workflow

The session is split across two days: the export request happens the day before, and the review happens once the export arrives. This isn't a weakness; it's a useful forcing function. The day you submit the export request is the day you block the review time on the calendar. Both happen together or neither does.

Day before: Request the export (5 minutes)

For ChatGPT: Settings → Data Controls → Export Data → Confirm. You'll get an email with a download link within 24 hours; the full walkthrough covers the ZIP contents and what to do if the email doesn't arrive. For Claude: Settings → Privacy → Export Data, followed by the same 24-hour wait. The Claude export guide covers the flatter format (no mapping DAG, linear conversation array). While you're waiting, block 30 minutes on tomorrow's calendar explicitly as "decision review": not as a task in a notes app, but as a calendar block that will collide with other meetings and force a reschedule rather than a quiet drop.

Step 1: Run the extractor (3 minutes)

Once the export arrives, download and unzip it. The file you need is conversations.json. Run the extractor locally:

node bin/extractor.js --input conversations.json --output ./decisions --format jsonl

On a typical quarterly export (250–400 conversations since the last run), this takes 45–90 seconds. The output is a JSONL file — one JSON object per extracted decision, each with a title, the extracted choice, rejected alternatives with rejection reasons where the extractor found them, a confidence score, and a pointer back to the source conversation ID. For the first quarterly run on a full year of history, expect 25–50 records. For subsequent quarterly runs, expect 8–18. The count per quarter tends to stabilise once you've extracted the backlog — the ongoing quarterly extraction surfaces the decisions you made in the most recent 90 days, not the accumulated years of reasoning that the first run pulls from. A detailed walkthrough of what the numbers mean is in the 1,200-chats post.
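To make the output shape concrete, here is a minimal sketch of reading the JSONL and listing records with the highest-confidence ones first. The field names (title, choice, rejected, confidence, sourceConversationId) are assumptions based on the description above, not the extractor's confirmed schema, and the sample records are made up for illustration.

```javascript
// Parse JSONL text: one JSON object per non-empty line.
function parseJsonl(text) {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));
}

// Hypothetical sample records; real field names may differ.
const sample = [
  JSON.stringify({
    title: "Primary OLTP store",
    choice: "Postgres",
    rejected: [{ option: "CockroachDB", reason: "operational complexity" }],
    confidence: 0.82,
    sourceConversationId: "conv-001",
  }),
  JSON.stringify({
    title: "Real-time sync",
    choice: "Not building it",
    rejected: [],
    confidence: 0.41,
    sourceConversationId: "conv-002",
  }),
].join("\n");

const records = parseJsonl(sample);

// Sort a copy so the original extraction order is preserved.
const byConfidence = [...records].sort((a, b) => b.confidence - a.confidence);
for (const r of byConfidence) {
  console.log(`${r.confidence.toFixed(2)}  ${r.title}  (${r.rejected.length} rejected)`);
}
```

In practice the sample string would be replaced by the contents of the extractor's output file, read with fs.readFileSync(path, "utf8").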

Step 2: The triage pass (12 minutes)

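Open the JSONL output in a text editor or pipe it through jq '.title' to get a title-only list first. Read each title, skim the extracted choice and the first rejected alternative. You're sorting into four buckets, and the sorting itself should take 15–30 seconds per record:

  1. Promote: a decision that warrants a full ADR write-up.
  2. Link: a decision that refines, extends, or depends on a record already in the log, and should carry a cross-reference to it.
  3. Park: a real decision that isn't ready for the ADR tier yet; it stays in the lighter decision log.
  4. Dismiss: a false positive.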

The Dismiss pass goes fast — you'll recognise false positives immediately. The time goes to the boundary cases between Promote and Park. The rule of thumb: if you can state the constraint the decision creates for future work in one sentence from memory, it's ready to promote. If you need to re-read the source conversation to reconstruct the constraint, park it.
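One way to keep the triage pass honest is to tag each record with its bucket as you go and tally the result at the end. The sketch below assumes you add a bucket field to each record by hand; that field, the lowercase bucket names, and the overPromoted flag (based on the 2–5 promoted records per quarter suggested in this post) are illustrative choices, not part of the extractor.

```javascript
const BUCKETS = ["promote", "link", "park", "dismiss"];

// Count records per bucket and fail loudly on anything left untriaged.
function summarizeTriage(records) {
  const counts = Object.fromEntries(BUCKETS.map((b) => [b, 0]));
  for (const r of records) {
    if (!BUCKETS.includes(r.bucket)) {
      throw new Error(`Untriaged or unknown bucket for "${r.title}": ${r.bucket}`);
    }
    counts[r.bucket] += 1;
  }
  // More than 5 promotions in one quarter suggests over-promotion.
  return { counts, overPromoted: counts.promote > 5 };
}

const summary = summarizeTriage([
  { title: "Primary OLTP store", bucket: "promote" },
  { title: "Tab vs. spaces aside", bucket: "dismiss" },
  { title: "Queue library choice", bucket: "park" },
]);
console.log(summary);
```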

Step 3: Write up the promoted records (8 minutes)

For each Promote record, open your ADR template — Nygard format, MADR, or whatever format your team uses — and fill in four fields:

  1. Decision: One active-voice sentence. Not "We decided to evaluate Postgres" — "We use Postgres for the primary OLTP store."
  2. Context: What made this a decision worth making. The constraint or question that forced a choice. Usually one paragraph — the extracted rationale field is a good starting point.
  3. Rejected alternatives: At least one explicit entry from the extracted record, with the rejection reason. This is the load-bearing part of the ADR — without it, the record is a press release, not a decision document.
  4. Revisit condition: The specific trigger that would make this decision worth revisiting. "When we hire a DBA" or "if query latency exceeds 100ms p95 for 3 consecutive days" or "when the team grows past 15 engineers." A decision record without a revisit condition is one you'll either follow forever or re-litigate from scratch when someone on the team gets frustrated with it. Naming the trigger makes it a living rule rather than a frozen artifact.
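Because the write-up is mostly an editing pass, a first draft can be generated from the extracted record and then corrected by hand. The following is a rough sketch, assuming hypothetical field names (title, choice, rationale, rejected, revisit) and a plain four-section layout; adapt the headings to whichever ADR template your team actually uses.

```javascript
// Render a first-draft ADR from an extracted record.
// Missing fields become visible TODOs rather than silent gaps.
function renderAdr(record) {
  const rejected =
    record.rejected && record.rejected.length > 0
      ? record.rejected.map((r) => `- ${r.option}: ${r.reason}`).join("\n")
      : "- TODO: none extracted; check the source conversation";
  return [
    `# ${record.title}`,
    "",
    "## Decision",
    record.choice,
    "",
    "## Context",
    record.rationale || "TODO: add context",
    "",
    "## Rejected alternatives",
    rejected,
    "",
    "## Revisit condition",
    record.revisit || "TODO: name the trigger",
    "",
  ].join("\n");
}

const draft = renderAdr({
  title: "Primary OLTP store",
  choice: "We use Postgres for the primary OLTP store.",
  rationale: "New services need a default persistence layer.",
  rejected: [{ option: "CockroachDB", reason: "operational complexity" }],
  revisit: "",
});
console.log(draft);
```

The empty revisit field in the example renders as a TODO, which makes the missing trigger hard to overlook during the 90-second editing pass.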

The extracted record already has rough versions of all four fields — the ADR write-up is an editing pass, not a writing pass. Budget 90 seconds per promoted record for the first draft; a short review session later in the week is fine for polish. Don't let the polish step become a reason to delay writing the first draft — a rough ADR committed today is worth more than a polished one that hasn't been written yet, for exactly the reasons described in the ADR staleness post.

Step 4: Archive and set the reminder (2 minutes)

Move the triage output to a decisions/quarterly/YYYY-QN.jsonl file in your repo. Commit the promoted ADRs to doc/decisions/. Set a calendar reminder for 90 days from today labelled "Q[N+1] decision review — submit export request today." That's it — the cadence is self-reinforcing once the calendar anchor is in place.
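The archive filename and the reminder date are both simple date arithmetic. Here is a small sketch, assuming calendar quarters (Q1 starts in January) and UTC dates; the function names are made up for illustration.

```javascript
// Calendar-quarter label such as "2026-Q2" for a given date (UTC).
function quarterLabel(date) {
  const quarter = Math.floor(date.getUTCMonth() / 3) + 1;
  return `${date.getUTCFullYear()}-Q${quarter}`;
}

// Path for the archived triage output, following the layout above.
function archivePath(date) {
  return `decisions/quarterly/${quarterLabel(date)}.jsonl`;
}

// Date of the next export request, a fixed number of days out.
function nextReminder(date, days = 90) {
  const next = new Date(date.getTime());
  next.setUTCDate(next.getUTCDate() + days);
  return next;
}

const today = new Date(Date.UTC(2026, 5, 6)); // 2026-06-06
console.log(archivePath(today));
console.log(nextReminder(today).toISOString().slice(0, 10));
```

If your team's fiscal quarters don't line up with calendar quarters, only quarterLabel needs to change.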

Which records rise to ADR level

The Promote/Park/Dismiss split is where most engineers spend the most time on their first triage pass, and where the most mistakes happen. Over-promotion is the common failure mode: treating every extracted record as worthy of full ADR ceremony defeats the purpose of having two tiers. Under-promotion is also a failure mode, but it's less common on the first run — the first extraction tends to surface genuinely significant decisions that were never documented, and most engineers recognise them on sight.

Four properties reliably indicate a record is worth promoting to full ADR:

  1. It names a rejected alternative explicitly. A record that says "chose Postgres" is a statement. A record that says "chose Postgres over CockroachDB because operational complexity" is a decision. The extractor filters for this, but confidence is low on some records. If the extracted record doesn't name at least one rejected alternative, check the source conversation — if the conversation didn't consider alternatives, the decision may not warrant an ADR.
  2. It creates a constraint you can state. The decision tells future engineers not just what was chosen, but what it rules out. "We use Postgres for the primary store" constrains future service design: new services that need persistence can default to Postgres without re-evaluating alternatives from scratch. If you can't state the constraint the decision creates, it's not load-bearing enough for the ADR tier.
  3. A new team member would behave differently knowing it. This is the simplest test. Onboarding engineers ask "why do we do things this way?" The ADR tier exists to answer that question without requiring a conversation with the original decision-maker. If the answer to the record's question would change a new engineer's behavior, it belongs in the searchable formal record. If the new engineer would arrive at the same decision independently without knowing it, it probably belongs in the lighter decision log.
  4. It documents a deliberate "not building this" choice. These are the most valuable records and the most commonly undocumented. The extracted records that say "decided not to build real-time sync" or "decided not to add SSO until we have 10+ enterprise accounts" are the decisions that produce the most friction when re-litigated — because the original reasoning for not building something is rarely obvious to someone who wasn't in the conversation. Promote every "not building X" record that still applies.

The when-to-write-an-adr guide covers these thresholds in more detail, including the three conditions under which a decision that passed the "promote" threshold can be safely held at the lighter tier instead.

The second quarterly review: what changes

The first quarterly review is mostly discovery — you're finding out what's in your decision history. The second quarterly review is different in character because you now have a decision log to check against. This changes two parts of the workflow.

The Link category becomes load-bearing. On the first run, the Link bucket is small because there are few existing records to link to. By the second run, maybe 20% of the extracted records will be decisions that refine, extend, or depend on a first-quarter record. A pricing revision decision from Q2 should carry a reference to the original pricing ADR from Q1. A new infrastructure choice made in the context of the Postgres-over-CockroachDB decision from Q1 should cross-reference that record explicitly. These cross-references are what turn a list of records into a knowledge graph — and they only happen during the triage pass, not automatically.
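Cross-references still need a human decision, but a crude helper can shortlist candidates to look at during triage. The sketch below is one possible heuristic, not something the extractor does: it suggests existing records that share at least one meaningful word with a new record's title or choice. The id, title, and choice fields and the stop-word list are assumptions.

```javascript
const STOP_WORDS = new Set(["the", "and", "for", "over", "use", "with", "not"]);

// Lowercased word set, minus very short words and stop words.
function tokens(text) {
  return new Set(
    text
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter((w) => w.length > 2 && !STOP_WORDS.has(w))
  );
}

// Existing records that share words with the new record, best match first.
function suggestLinks(newRecord, existingLog, minShared = 1) {
  const newTokens = tokens(`${newRecord.title} ${newRecord.choice}`);
  return existingLog
    .map((old) => ({
      id: old.id,
      shared: [...tokens(`${old.title} ${old.choice}`)].filter((w) => newTokens.has(w)),
    }))
    .filter((candidate) => candidate.shared.length >= minShared)
    .sort((a, b) => b.shared.length - a.shared.length);
}

const existingLog = [
  { id: "ADR-0001", title: "Primary OLTP store", choice: "Postgres" },
  { id: "ADR-0002", title: "Pricing tiers", choice: "Three flat tiers" },
];
const candidates = suggestLinks(
  { title: "Read replicas for reporting", choice: "Postgres streaming replication" },
  existingLog
);
console.log(candidates);
```

Word overlap will miss links that are conceptual rather than lexical, so treat the output as a prompt for the triage pass rather than a substitute for it.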

The obsolescence check becomes necessary. Every quarterly review should include a pass over the existing promoted ADRs to check for records that are no longer operative. A decision that was made for a constraint that no longer exists — a library that's been replaced, a team structure that's changed, a product feature that's been retired — should be marked with a deprecation status, not left in the active log. A stale-but-active record is worse than no record: it misleads the exact engineer who most needs the decision log — the new team member who hasn't yet learned which records are outdated. The obsolescence check takes about 2 minutes on a log of 20 records, 5 minutes on 50.
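The obsolescence pass can be as simple as printing a checklist of active records and flipping the status on the ones that no longer apply. This is a minimal sketch, assuming each promoted ADR is tracked as an object with hypothetical id, title, status, and revisit fields; the status values "active" and "deprecated" are illustrative.

```javascript
// One checklist line per active record, showing its revisit trigger.
function activeChecklist(log) {
  return log
    .filter((adr) => adr.status === "active")
    .map((adr) => `[ ] ${adr.id} ${adr.title} (revisit: ${adr.revisit})`);
}

// Return a new log with one record marked deprecated; input is not mutated.
function deprecate(log, id, reason, today) {
  return log.map((adr) =>
    adr.id === id
      ? { ...adr, status: "deprecated", deprecatedOn: today, deprecationReason: reason }
      : adr
  );
}

const log = [
  { id: "ADR-0001", title: "Primary OLTP store", status: "active", revisit: "when we hire a DBA" },
  { id: "ADR-0002", title: "Legacy queue library", status: "active", revisit: "if the library is replaced" },
];
const updated = deprecate(log, "ADR-0002", "library replaced", "2026-06-06");
console.log(activeChecklist(updated).join("\n"));
```

Keeping deprecated records in the log with a reason, rather than deleting them, preserves the history for anyone who later asks why the old constraint existed.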

The third quarterly review is when the practice tends to either become self-sustaining or die. By quarter three, you've had enough triage experience to get the triage pass down to under 10 minutes, the ADR write-ups are routine rather than effortful, and the cross-reference pass has produced enough links that the log starts answering questions you hadn't anticipated asking. If it hasn't produced that payoff by quarter three, the most common reasons are: the log is stored somewhere nobody looks (not in the repo), records are too abstract (no rejected alternatives), or there's no shared team habit of checking the log before making a decision. All three are fixable, and all three are problems that hand-written ADR practices have too — they're not specific to the extraction approach.

Three anti-patterns that kill decision logs

Three failure modes come up most often in the extraction-based practice and are worth naming explicitly.

Extracting without triaging. Running the extractor and committing the raw JSONL output to the repo without a triage pass is the most common mistake on the first run. The raw output has false positives, missing revisit conditions, and records without enough context for someone who wasn't in the conversation. Committing it unreviewed trains future engineers that the decision log is noisy — once that impression is set, people stop checking it. The 12-minute triage pass is not optional.

Promoting too many records. If the first quarterly run produces 40 extracted records and you promote 25 of them to full ADRs, you've turned the triage pass into the same high-ceremony documentation practice that teams abandon after 60 days. Promote 2–5 per quarter. The rest go in the lighter decision log tier — the decision log template covers the right format for these. The two-tier structure is the design constraint that makes the practice sustainable.

Records without rejected alternatives. An ADR that says "We chose Postgres" without a rejected alternative is a record of what happened, not a record of why. The only question that justifies the ceremony of an ADR is "why did we do this instead of the obvious alternative?" — and that question requires the rejected alternative to be named. The extractor tries to surface rejected alternatives from the source conversation; if it can't find one, check the source conversation manually before promoting. If the original conversation genuinely didn't consider alternatives, the record may not warrant the ADR tier.
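A quick pre-promotion check can separate records that already name a rejected alternative from those that need a manual look at the source conversation. The sketch below assumes a hypothetical rejected array of { option, reason } objects on each record; the actual extractor output may be shaped differently.

```javascript
// True when at least one rejected alternative has both a name and a reason.
function hasNamedAlternative(record) {
  return (
    Array.isArray(record.rejected) &&
    record.rejected.some((r) => Boolean(r.option) && Boolean(r.reason))
  );
}

// Split promotion candidates into ready-to-write and check-the-source lists.
function splitByAlternatives(records) {
  const ready = [];
  const checkSource = [];
  for (const record of records) {
    (hasNamedAlternative(record) ? ready : checkSource).push(record);
  }
  return { ready, checkSource };
}

const result = splitByAlternatives([
  { title: "Primary OLTP store", rejected: [{ option: "CockroachDB", reason: "operational complexity" }] },
  { title: "Chose Postgres", rejected: [] },
  { title: "Auth provider", rejected: [{ option: "Roll our own", reason: "" }] },
]);
console.log(result.checkSource.map((r) => r.title));
```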

Run your first extraction now. The open-source extractor takes a conversations.json from ChatGPT or Claude and emits a structured decision log. ~500 lines of dependency-free Node, MIT-licensed, runs locally — nothing sent to a server. Download your ChatGPT export or your Claude export, run the extractor, and spend 12 minutes on the triage pass. The quarterly cadence is easy to maintain once you've done it once. Or join the waitlist for the hosted version with team sharing, Notion / Linear export, and search across all quarters.

Related questions

How long does the triage pass actually take?

For a typical first extraction of 12 months of AI chat (25–50 extracted records), the triage pass runs 12–20 minutes. For ongoing quarterly extractions (8–18 new records per quarter), it's closer to 8 minutes. The time is dominated by the Promote decisions (records where you need to decide whether they warrant the fuller ADR treatment), not by the Dismiss pass, which goes fast. Budget 30 minutes total for the first run, 20 minutes for subsequent quarters.

What if I only get 5-10 records in my first quarterly run — is that enough to be worth it?

Five durable decisions with their rejected alternatives and revisit conditions are worth more than fifty undocumented ones. The value isn't in the quantity of records; it's in having the reasoning available when someone asks "why did we do this?" The question isn't whether 10 records is enough; it's whether those 10 records answer questions your team currently has to answer from memory. If two of them prevent a re-litigation or answer an onboarding question, the 30-minute investment paid for itself in the first use.

My team has been using ADRs for years. How does extraction fit with an existing ADR practice?

Extraction fills the gap that an existing ADR practice leaves open: the decisions that happened in AI chat sessions and never made it into a formal write-up. In a team with an active ADR practice, the typical split is 8–15 hand-written ADRs per quarter and 20–40 chat decisions that were never written up. The quarterly extraction surfaces the second category without disrupting the first. The triage pass tells you which of those chat decisions rise to ADR level (add them to your existing doc/decisions/ repo) and which belong in a lighter decision log.

Should I use the extracted records as-is, or rewrite them in my ADR format?

Use them as-is for the decision log tier — the extracted format (title, choice, rejected alternatives, rationale, revisit condition) is already sufficient for the 80% of records that don't warrant ceremony. Rewrite in your ADR format only for records you're promoting to the ADR tier, because those go into the team's formal doc/decisions/ repository where format consistency matters for search and tooling. Two tiers, two formats: the promoted records get your standard MADR or Nygard template; the archived records stay in the leaner extracted shape.

Further reading