How to Extract Decisions from Your ChatGPT Chats
You've had several hundred ChatGPT conversations. The architectural decisions you actually made are in there somewhere — behind the scratch thinking, the clarification questions, and the dead-end tangents. Here's how to pull them out.
TL;DR
Export your chats from chatgpt.com → Settings → Data Controls → Export data. Open the resulting conversations.json and run an extractor over it: either a regex pass looking for phrases like "we'll go with X over Y because…", or a small LLM pass that returns JSON matching a fixed JSON Schema. Typical result from a year of use: 20–80 durable decision records, each with the original chat snippet attached. The open-source CLI at whychose.com/extractor does this locally in ~500 lines of dependency-free Node. The hosted product wraps the same engine with team sharing and Notion/Linear export.
Why this matters
Many senior engineers use ChatGPT 15–30 times a week to think through trade-offs: stack choices, pricing, hiring, architecture migrations. Six months later a new teammate asks "why did we pick Postgres over Mongo?", and the reasoning is gone. It lives somewhere in ~800 chats, across ~30,000 messages. Cmd+F doesn't work because you don't remember the exact phrasing. ChatGPT's native search retrieves conversations, not the structured reasoning inside them. The value of a decision record is not the thinking itself (you already did that); it's having it findable in September when someone on the interview panel asks.
How to approach it
- Export the archive. Settings → Data Controls → Export data. OpenAI emails a ZIP link within 30 minutes. The interesting file inside is conversations.json, an array of objects with a mapping field holding the message tree.
- Flatten conversations to linear text. Each conversation's mapping is a DAG (because of edits/branches). Walk from the root down each branch, keep only user + assistant messages, join them as plain text. This step is the one most hand-rolled scripts get wrong: you end up with duplicate messages from edit forks.
- Run the extractor. Two approaches work. (a) Regex-first: scan each flattened conversation for phrases like "we'll go with", "let's pick", "the answer is", "decided to", "vs" within ~40 tokens of each other. Noisy but zero-cost. (b) LLM pass: feed each conversation into Claude or GPT-4o-mini with a JSON Schema asking for {title, chose, rejected, rationale}[]. More accurate, costs ~$0.001/conversation.
- Dedupe and tag. Most decisions get revisited 2–4 times over weeks. Group by normalized-topic string and keep only the last-reached conclusion. Add tags for downstream filtering: stack, pricing, hiring, migration.
- Link back to source. For every record, keep the conversation ID + message index so reviewers can click through to the original chat. Without the backlink, the extract is a nice summary; with it, it's a real audit trail.
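The flatten, regex, dedupe, and backlink steps above can be sketched roughly as follows. This is a minimal illustration, not the WhyChose extractor itself. The function names are hypothetical, and the field names (current_node, parent, message.author.role, message.content.parts) are assumptions about the export shape that should be checked against your own conversations.json. To avoid the duplicate-message problem from edit forks, the sketch follows a single branch by walking parent links up from the conversation's current leaf. It also leaves out the "vs" cue and the ~40-token proximity window for brevity.

```javascript
// Assumed export shape: conv.mapping is keyed by node id, each node has a
// `parent` id and an optional `message`; conv.current_node is the active leaf.

// Walk from the active leaf to the root so abandoned edit forks are skipped.
function flattenConversation(conv) {
  const turns = [];
  const seen = new Set(); // guards against a malformed parent cycle
  let id = conv.current_node;
  while (id && conv.mapping[id] && !seen.has(id)) {
    seen.add(id);
    const node = conv.mapping[id];
    const msg = node.message;
    const role = msg && msg.author && msg.author.role;
    if (role === "user" || role === "assistant") {
      const parts = (msg.content && msg.content.parts) || [];
      const text = parts.filter((p) => typeof p === "string").join("\n").trim();
      if (text) turns.push({ id, role, text });
    }
    id = node.parent;
  }
  // Collected leaf-to-root; reverse into reading order and number the turns.
  return turns.reverse().map((t, index) => ({ index, ...t }));
}

// Regex-first pass: cue phrases taken from the list above (simplified).
const DECISION_CUES = /\b(?:we['’]ll go with|let['’]s pick|the answer is|decided to)\b/i;

// Each candidate keeps the conversation ID and message index as a backlink.
function findCandidates(conversationId, turns) {
  return turns
    .filter((t) => DECISION_CUES.test(t.text))
    .map((t) => ({
      conversationId,
      messageIndex: t.index,
      role: t.role,
      snippet: t.text.slice(0, 280),
    }));
}

// Group by a normalized topic string and keep the most recent conclusion.
// Records are assumed to carry a `title` and a sortable `timestamp`.
function dedupeByTopic(records) {
  const latest = new Map();
  for (const r of records) {
    const key = r.title.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
    const prev = latest.get(key);
    if (!prev || r.timestamp >= prev.timestamp) latest.set(key, r);
  }
  return [...latest.values()];
}
```

In use, you would parse conversations.json with JSON.parse, call flattenConversation on each entry, and pass the resulting turns to findCandidates. Walking a single branch is a deliberate simplification: it drops reasoning that lived only in abandoned forks, which is usually what you want for a "last-reached conclusion" log.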
How WhyChose helps
WhyChose is the productized version of the above. You drop your conversations.json into the browser, extraction runs client-side (we never see your transcripts), and you get back a searchable decision log with the original chat snippets attached. Pro tier exports to Notion, Linear, and Obsidian; Team tier adds shared decision logs with per-teammate access. If you'd rather self-host, the extractor is MIT-licensed — download the tarball, run node bin/extractor.js conversations.json, keep your data off our servers entirely. The hosted and open-source paths run the same extraction engine; the difference is UI, storage, and team features.
Related questions
Does this work with Claude exports too?
Yes. Claude's export ships conversations.json with a flatter shape (no mapping DAG; messages live under chat_messages[] per conversation). Our extractor handles both formats and emits the same output schema. See the Claude export walkthrough.
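As a rough illustration of what "handles both formats" could look like, the sketch below detects the shape of a conversation and flattens the Claude variant into simple turn objects. It is not the actual extractor code. The function names are hypothetical, and the per-message fields used here (uuid, sender with values "human" or "assistant", text) are assumptions about the Claude export to verify against your own file.

```javascript
// Decide which flattener applies, based only on the two shapes described above.
function detectFormat(conv) {
  if (conv && conv.mapping !== null && typeof conv.mapping === "object") return "chatgpt";
  if (conv && Array.isArray(conv.chat_messages)) return "claude";
  return "unknown";
}

// Map the assumed Claude sender values onto the user/assistant roles.
const CLAUDE_ROLES = new Map([
  ["human", "user"],
  ["assistant", "assistant"],
]);

// Claude messages are already linear, so no tree walk is needed:
// filter out empty or unknown-sender messages and number what remains.
function flattenClaudeConversation(conv) {
  return (conv.chat_messages || [])
    .filter((m) => CLAUDE_ROLES.has(m.sender) && typeof m.text === "string" && m.text.trim() !== "")
    .map((m, index) => ({
      index,
      id: m.uuid,
      role: CLAUDE_ROLES.get(m.sender),
      text: m.text.trim(),
    }));
}
```

Normalizing both exports to the same turn shape is what lets a single regex or LLM pass run over either source.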
What about Perplexity or Gemini?
Perplexity has no user-facing export as of April 2026. Gemini's Takeout export is partial and inconsistent. For now, WhyChose focuses on ChatGPT + Claude, where the exports are first-class. We track Perplexity's changelog and will add support when they ship a stable format.
Do you see my chat content?
In the hosted product, extraction runs in your browser — the raw transcript never crosses our network. We persist only the extracted records (short strings: title, chose, rejected, rationale, timestamp) and a hash of the source message for de-dup. In the open-source CLI, nothing leaves your machine.
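To make the "records plus a hash" idea concrete, here is a minimal sketch of building a persisted record that carries only the short fields and a digest of the source message. The choice of SHA-256, the whitespace normalization, and the names sourceHash, toPersistedRecord, and dropDuplicateSources are assumptions for illustration; the text above does not specify which hash the hosted product uses.

```javascript
const { createHash } = require("node:crypto");

// Digest of the source message; trimming and NFC normalization are assumed
// here so trivially different copies of the same message hash identically.
function sourceHash(messageText) {
  return createHash("sha256")
    .update(messageText.normalize("NFC").trim(), "utf8")
    .digest("hex");
}

// Keep only the short fields named above plus the hash; the raw transcript
// text is used to compute the digest and is not copied into the record.
function toPersistedRecord(extracted, sourceMessageText) {
  const { title, chose, rejected, rationale, timestamp } = extracted;
  return {
    title,
    chose,
    rejected,
    rationale,
    timestamp,
    sourceHash: sourceHash(sourceMessageText),
  };
}

// Drop records that point at a source message already seen.
function dropDuplicateSources(records) {
  const seen = new Set();
  return records.filter((r) => {
    if (seen.has(r.sourceHash)) return false;
    seen.add(r.sourceHash);
    return true;
  });
}
```

A digest lets re-imports of the same export be recognized without storing the message itself, though a hash of a short, guessable message is not a strong privacy guarantee on its own.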
Why not just let ChatGPT summarize it?
Asking ChatGPT "summarize my decisions" in a fresh chat doesn't work: it can only see the current context, not your history. The export-and-extract pattern works because you control the corpus and apply the same prompt and output schema to every conversation, which keeps results consistent and repeatable from run to run.
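One way to keep the prompt identical across conversations is to build it from a fixed prefix and a fixed schema, then validate whatever comes back. The sketch below is provider-agnostic and makes no API call; the prompt wording and the names DECISION_SCHEMA, buildPrompt, and parseDecisions are hypothetical, while the four fields match the record shape described earlier on this page.

```javascript
// Fixed JSON Schema for the reply: an array of {title, chose, rejected, rationale}.
const DECISION_SCHEMA = {
  type: "array",
  items: {
    type: "object",
    additionalProperties: false,
    required: ["title", "chose", "rejected", "rationale"],
    properties: {
      title: { type: "string" },
      chose: { type: "string" },
      rejected: { type: "string" },
      rationale: { type: "string" },
    },
  },
};

// Illustrative prompt text; only the conversation body varies between calls.
const PROMPT_PREFIX =
  "Extract every durable decision from the conversation below. " +
  "Reply with JSON only, matching this JSON Schema:\n";

function buildPrompt(conversationText) {
  return `${PROMPT_PREFIX}${JSON.stringify(DECISION_SCHEMA)}\n\nConversation:\n${conversationText}`;
}

// Defensive parse: a hand-rolled check of the required string fields rather
// than a full JSON Schema validator. Malformed replies yield an empty list.
function parseDecisions(rawReply) {
  let data;
  try {
    data = JSON.parse(rawReply);
  } catch {
    return [];
  }
  if (!Array.isArray(data)) return [];
  const keys = DECISION_SCHEMA.items.required;
  return data
    .filter(
      (d) =>
        d !== null &&
        typeof d === "object" &&
        keys.every((k) => typeof d[k] === "string" && d[k].trim() !== "")
    )
    .map((d) => Object.fromEntries(keys.map((k) => [k, d[k]])));
}
```

Model output can still vary between runs even with a fixed prompt, so validating and discarding malformed replies is what keeps the resulting log uniform.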
Further reading
- How to export your ChatGPT history (2026 guide) — the step-by-step on the OpenAI side.
- How to export your Claude conversations — the Anthropic equivalent, with the JSON shape difference explained.
- conversations.json field reference — the schema-level companion to this page; the leaf-walk JS lives here.
- How to search your ChatGPT history — what to do before jumping straight to extraction; covers the lighter sidebar / ripgrep / jq levels.
- ADR example: Postgres vs MongoDB — what a single extracted record looks like, fully written out.
- How to extract decisions from your Claude conversations — the symmetric Anthropic-side guide; flat chat_messages shape, Artifact-as-decision framing, project_uuid grouping.
- Gemini conversation export — the third-platform precondition; Gemini's HTML-only export normalized via the parsing script lets the same regex + LLM passes run on Google's chat history.
- ChatGPT Projects export — the Project-scoped variant; preserve project_id as a first-class column in the decision log so per-Project decision queries are one filter, not a manual group-by; the custom instructions from each Project's README scope decision-relevance scoring to the Project's intent.
- ChatGPT Team export — differences from Plus, workspace admin flow, and the Compliance API — workspace exports add created_by_user_id and participants[] as filterable columns alongside project_id; per-Project decision audits become one filter, per-member decision-history reviews become two filters, and the audit log integrates as a second source for "who had access to the rationale when this decision was made."
- The open-source extractor — ~500 lines of Node, zero deps, MIT licensed.
- ChatGPT Custom GPTs export — conversations vs configurations — Custom GPT conversations are filtered by gizmo_id in the extractor output; if you built a Custom GPT specifically for architecture or decision-making work, this companion page explains how to use the GPT Builder export to back up the configuration and how gizmo_id becomes a per-GPT decision journal filter.
- Perplexity conversation export — how to save your AI research history — Perplexity research threads (often the research step before a ChatGPT decision conversation) have no batch export path; the extractor processes Perplexity content as plain-text paste input. If your decision workflow uses Perplexity for research and ChatGPT for synthesis, this page covers how the two layers fit together in a complete decision record.
- Gemini Workspace export — Google Vault, admin console, and enterprise data portability — for decision workflows that include Gemini for Workspace: the Vault MBOX export path, the Python MBOX parsing script that extracts conversation text, and how the plain-text output integrates with the extractor as a paste input. Teams that use Gemini for research and ChatGPT for synthesis have Gemini history available in MBOX form — parse it, paste the conversation text, and the extractor identifies the decision-shaped content.
- ChatGPT web search in conversations.json — tether content types, cited URLs, and extraction recipes — the extraction companion for conversations where ChatGPT cited web sources: the tether_browsing_display and tether_quote nodes contain the research evidence (query, cited URLs, quoted passages) behind the decisions the extractor surfaces. Understanding the tether node schema improves decision extraction quality — a record that captures "chose Valkey because of the Redis licensing change" is more defensible when the tether_quote node confirming the RSAL license is extractable alongside the decision text.
- ChatGPT voice mode in the data export — transcripts, what's missing, and how to process them — voice conversations export as plain text in conversations.json; audio is never stored; how to identify voice turns, understand Whisper transcription quality, and extract decisions from voice-heavy sessions.
- GitHub Copilot Chat export — why there isn't one, and where your history actually lives — for engineers who make architecture decisions in their IDE: Copilot Chat conversations are stored locally in VS Code's workspace storage, not exported by ChatGPT's data archive (different product). If the decision happened in Copilot Chat rather than a ChatGPT or Claude conversation, deliberate at-the-time capture is the only path — the extractor on this page only processes ChatGPT and Claude exports, not Copilot Chat's local storage.
- ChatGPT Canvas export — documents in conversations.json and decision-drafting workflow — Canvas (the collaborative document editing mode) is a high-value surface for the extractor because Canvas sessions often contain the structured decision language the extractor targets: trade-off comparisons, Alternatives Considered sections, explicit Consequences enumerations. Canvas documents appear in conversations.json as long assistant messages in the Canvas conversation thread — the extractor processes them alongside regular chat conversations. This companion page explains the Canvas message structure, the jq recipes for identifying Canvas-heavy conversations before running the extractor, and why Canvas-drafted ADRs typically pass decision extraction on the first pass without the noise-filtering that informal chat conversations require.