OPEN-SOURCE CORE · MIT

whychose-extractor

The ~500-line Node CLI that turns a ChatGPT or Claude export into a structured decision log. Same engine the hosted product uses. Read the source, run it locally, keep your data on your laptop.

v1.0.0 · MIT licensed · Zero dependencies · Node ≥ 18 · Runs 100% locally
Download tarball (v1.0.0) · Read the source · Full README

Install & run — 60 seconds

curl -sL https://whychose.com/extractor/whychose-extractor-v1.0.0.tar.gz | tar -xz
cd whychose-extractor
node bin/extractor.js sample-chatgpt.json

No npm install. No build step. No native bindings. Node 18+ is the only requirement — if you have a modern laptop you're set.
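If you're not sure which Node you have, a quick check like this sketch works (illustrative only; the extractor itself does not ship this helper):

```javascript
// Sketch: check that the running Node satisfies the v18+ requirement.
// Illustrative only -- not part of the CLI.
const major = Number(process.version.slice(1).split(".")[0]);
const ok = major >= 18;
console.log(
  ok
    ? `Node ${process.version} is new enough`
    : `Node ${process.version} is too old; v18+ required`
);
```

Or just run `node --version` and look for v18 or higher.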

Run it against your own export:

# ChatGPT: Settings → Data Controls → Export → unzip → conversations.json
node bin/extractor.js ~/Downloads/conversations.json > decisions.json

# Claude: Settings → Account → Export data → unzip → conversations.json
node bin/extractor.js ~/Downloads/claude-export/conversations.json --format=md > decisions.md

Browse the source

Every file is plain text, served straight by the web server. Click to read — no login, no account.

README.md                   What it is, install, quickstart
bin/extractor.js            The whole CLI — ~500 lines, zero deps
schema.json                 DecisionRecord shape (JSON Schema draft-07)
patterns.md                 Every regex + heuristic, with rationale
sample-chatgpt.json         Test input — realistic ChatGPT export shape
sample-claude.json          Test input — realistic Claude export shape
sample-chatgpt-output.json  Golden output — what the ChatGPT sample produces
sample-claude-output.json   Golden output — what the Claude sample produces
sample-claude-output.md     Same Claude sample rendered as markdown
package.json                No runtime deps; `npm test` runs the diff check
LICENSE                     MIT

Flags

Flag           Values           Default  What it does
--sensitivity  normal, high     normal   high includes confidence: low records. More recall, more false positives.
--format       json, jsonl, md  json     Output shape. jsonl = one record per line. md = human-browsable markdown.
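The jsonl format is handy for streaming: each line is one complete record, so downstream scripts can process output without loading the whole file. A minimal consumer sketch (the two records here are made-up examples in the documented shape, not real extractor output):

```javascript
// Sketch: parse JSONL decision records one line at a time.
// The sample lines below are invented for illustration.
const jsonl = [
  '{"id":"chatgpt-20260101-aaaa1111","chosen":"postgres","confidence":"high"}',
  '{"id":"claude-20260203-d7cf9e98","chosen":"kubernetes","confidence":"medium"}',
].join("\n");

const records = jsonl
  .split("\n")
  .filter((line) => line.trim() !== "")
  .map((line) => JSON.parse(line));

console.log(records.map((r) => r.chosen)); // → [ 'postgres', 'kubernetes' ]
```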

Output — the DecisionRecord shape

Every record follows schema.json. One real example, extracted from the bundled Claude sample:

{
  "id": "claude-20260203-d7cf9e98",
  "date": "2026-02-03",
  "source": "claude",
  "chat_title": "Kubernetes or Fly.io for the next deploy",
  "question": "kubernetes vs fly.io",
  "chosen": "kubernetes",
  "rejected": ["fly.io"],
  "trade_offs": [],
  "confidence": "medium",
  "snippet": "assistant: If the rest of infra is on k8s and CI is wired, ...\nuser: Agreed. Sticking with kubernetes. The scale-to-zero thing was cool but...",
  "tags": ["infra"]
}
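Because the output is plain JSON, post-processing needs no libraries. For example, a sketch that drops low-confidence records and indexes the rest by tag (the record below is a trimmed copy of the sample above; the helper itself is not part of the CLI):

```javascript
// Sketch: filter and index DecisionRecords. Not part of the CLI.
// One trimmed record, copied from the bundled Claude sample.
const records = [
  {
    id: "claude-20260203-d7cf9e98",
    chosen: "kubernetes",
    rejected: ["fly.io"],
    confidence: "medium",
    tags: ["infra"],
  },
];

// Drop confidence: low records, then bucket the survivors by tag.
const kept = records.filter((r) => r.confidence !== "low");
const byTag = {};
for (const r of kept) {
  for (const tag of r.tags) (byTag[tag] ??= []).push(r.id);
}
console.log(byTag); // → { infra: [ 'claude-20260203-d7cf9e98' ] }
```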

Privacy — this CLI is local-only

The extractor makes zero network requests. You can verify this yourself:

grep -nE 'require\(|fetch|http\.|https\.' bin/extractor.js

Only fs, path, and crypto show up — all stdlib. No telemetry. No phone-home. The ChatGPT or Claude export you feed in never leaves your laptop.
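If you prefer scripting the audit over eyeballing grep output, the same check can be sketched in Node itself — scan source text for require() calls and flag anything outside a stdlib allow-list (the helper and sample string below are assumptions for illustration, not shipped with the extractor):

```javascript
// Sketch: flag any require() of a module outside a stdlib allow-list.
// Illustrative helper -- not part of the extractor.
function auditRequires(src, allowed = ["fs", "path", "crypto"]) {
  const names = [...src.matchAll(/require\(["']([^"']+)["']\)/g)].map(
    (m) => m[1]
  );
  return names.filter((n) => !allowed.includes(n));
}

// Made-up two-line sample: one stdlib require, one third-party require.
const sample = 'const fs = require("fs");\nconst axios = require("axios");';
console.log(auditRequires(sample)); // → [ 'axios' ]
```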

If you then upload the extracted decision records (not the transcript) to the hosted product at whychose.com, that's 5–50 short strings per quarterly export — not the raw chat. See whychose.com/privacy for the full version.

Known misses (v1)

Documented openly so you know what to expect:

If your export has a common case we're missing, send us a redacted snippet via @bitinvestigator on X. The pattern library gets tightened every time a real miss shows up.

License

MIT. See LICENSE.

What about the hosted product?

This CLI extracts decisions to stdout. The hosted product at whychose.com wraps the same engine with:

The CLI is strictly a subset of the hosted product. If you want a UI and multi-device sync, see pricing. If you want to keep everything on your laptop — you're already done.