# patterns.md — decision-detection heuristics (v1)

This is the list of regular-expression patterns and string heuristics the
extractor uses to turn a chat transcript into `DecisionRecord` objects.
It's deliberately conservative: v1 optimises for **precision over recall**
so users don't get drowned in false-positive records on their first export.
The tradeoff is that some genuine decisions are missed: raise
`--sensitivity` to surface more candidates, and report missed cases on the
open-source issue tracker.

A decision thread is detected in two passes:

1. **Question pass**: find a user message that matches one of the
   *question shapes* below. That anchors the start of a decision thread.
2. **Commit pass**: within the next 6 messages of the same
   conversation, look for a *commit phrase* from the user. If found,
   emit a `DecisionRecord` with `confidence: medium`. If the intervening
   messages also contain two or more distinct *trade-off markers*, bump
   to `confidence: high`.

If only the question pass matches with no commit, emit `confidence: low`
(only surfaced at `--sensitivity=high` — off by default).
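The two passes and the confidence ladder can be sketched roughly as follows. This is a minimal illustration, not the extractor's actual code: the `Message` class and `classify` function are hypothetical names, the three regexes are abbreviated stand-ins for the full lists in sections 1 to 3, and "distinct" is assumed to mean distinct matched text.

```python
import re
from dataclasses import dataclass

# Abbreviated stand-ins for the full pattern lists in sections 1-3.
QUESTION = re.compile(r"\b(?:should I|should we)\s+(?:pick|use|choose|go with)\b", re.I)
COMMIT = re.compile(r"\b(?:I'?ll|let'?s)\s+(?:go with|pick|use)\b", re.I)
TRADEOFF = re.compile(r"\b(?:trade[- ]?off|upside|downside|benefit|drawback|concern)s?\b", re.I)

WINDOW = 6  # commit search window: messages after the question


@dataclass
class Message:  # hypothetical transcript shape
    role: str   # "user" or "assistant"
    text: str


def classify(messages, start):
    """Return 'high', 'medium', 'low', or None for a thread anchored at `start`."""
    anchor = messages[start]
    if anchor.role != "user" or not QUESTION.search(anchor.text):
        return None  # question pass failed, so no thread starts here
    window = messages[start + 1 : start + 1 + WINDOW]
    for offset, msg in enumerate(window):
        if msg.role == "user" and COMMIT.search(msg.text):
            # Assumption: only messages between question and commit are scanned.
            between = " ".join(m.text for m in window[:offset])
            markers = {m.group(0).lower() for m in TRADEOFF.finditer(between)}
            return "high" if len(markers) >= 2 else "medium"
    return "low"  # question only; surfaced only at --sensitivity=high


thread = [
    Message("user", "Should I use Postgres or SQLite?"),
    Message("assistant", "The upside of Postgres is concurrency; the trade-off is ops overhead."),
    Message("user", "OK, I'll go with Postgres."),
]
print(classify(thread, 0))  # high
```

An assistant message containing a commit phrase is skipped by the `msg.role == "user"` check, so it leaves the thread at `low`.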

---

## 1. Question shapes (user messages that start a decision thread)

Matches are case-insensitive. `\b` is a word boundary and `(?:...)` is a
non-capturing group. `X` and `Y` stand for the two options being compared.

```
\b(?:should I|should we|shall I|do I|do we)\s+(?:pick|use|choose|go with|adopt|move to|migrate to)\b
\b(?:X\s+(?:vs\.?|versus)\s+Y|\w+\s+vs\.?\s+\w+)\b
\b(?:torn between|deciding between|choosing between|weighing)\b
\b(?:which is better|what is better|what'?s better)\b
\b(?:recommend (?:a|an|the)?\s*\w+)\b
\b(?:what would you (?:pick|choose|recommend))\b
\b(?:pros and cons of|trade[- ]?offs? of|pros/cons of)\b
\b(?:is it worth|worth (?:it|using|adopting))\b
```

Rejection filters (pattern matched but NOT a decision): questions inside a
code block, questions where both options are the SAME word (likely typo),
questions shorter than 8 characters (too ambiguous).
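A minimal sketch of the question pass with the three rejection filters applied, under some assumptions: `is_decision_question` is a hypothetical name, only three of the shapes are included, "inside a code block" is read as "inside a fenced block", and the `vs` pattern is rewritten with capturing groups so the two options can be compared.

```python
import re

TICKS = "`" * 3  # fence delimiter, built this way to keep the example readable
FENCE = re.compile(TICKS + r".*?" + TICKS, re.S)

# Capturing variant of the "X vs Y" shape (an assumption for this sketch).
VS = re.compile(r"\b(\w+)\s+(?:vs\.?|versus)\s+(\w+)\b", re.I)
SHAPES = [
    re.compile(
        r"\b(?:should I|should we|shall I|do I|do we)\s+"
        r"(?:pick|use|choose|go with|adopt|move to|migrate to)\b",
        re.I,
    ),
    VS,
    re.compile(r"\b(?:torn between|deciding between|choosing between|weighing)\b", re.I),
]


def is_decision_question(text):
    """True if the message matches a question shape and survives the filters."""
    prose = FENCE.sub(" ", text)      # filter: ignore anything inside fenced code
    if len(prose.strip()) < 8:        # filter: too short to be unambiguous
        return False
    pair = VS.search(prose)
    if pair and pair.group(1).lower() == pair.group(2).lower():
        return False                  # filter: both options are the same word
    return any(p.search(prose) for p in SHAPES)


print(is_decision_question("Should we migrate to Postgres?"))  # True
print(is_decision_question("Redis vs Redis for caching?"))     # False
```

Stripping fenced code before matching means a question that appears only inside pasted code never anchors a thread.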

---

## 2. Commit phrases (the user messages that "settle" the decision)

The user committing is what makes a decision a decision. An assistant saying
"I'd go with X" does NOT count — that's a recommendation, not a commit.

```
\b(?:I'?ll|I will|I am going to|I'?m going to|let me|let'?s)\s+(?:go with|pick|use|choose|adopt|stick with|roll with|try)\b
\b(?:going with|sticking with|rolling with|picking|choosing)\s+\w+\b
\b(?:decided on|decided to (?:use|go with|pick)|settled on)\b
\b(?:we'?ll (?:go with|use|pick|adopt)|we are going with)\b
\b(?:final answer|final choice|my pick)\b
```

The *captured option* is the first noun or code span after the commit phrase.
If no noun is detected, the extractor falls back to the first non-stopword token.

Rejection filters: commit phrase inside a code block (someone's pasting in
code that literally says "let's use X"), commit phrase from the assistant
(role: assistant), or commit followed within 2 messages by reversal
language ("actually no", "wait, scratch that").
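Option capture might look something like the sketch below. It covers only the code-span case and the non-stopword fallback; noun detection is left out because it would need a part-of-speech tagger. `capture_option` and the `STOPWORDS` set are hypothetical, and the code span is only taken when it directly follows the commit phrase.

```python
import re

COMMIT = re.compile(
    r"\b(?:I'?ll|I will|I am going to|I'?m going to|let me|let'?s)\s+"
    r"(?:go with|pick|use|choose|adopt|stick with|roll with|try)\b",
    re.I,
)
CODE_SPAN = re.compile(r"`([^`]+)`")
STOPWORDS = {"the", "a", "an", "this", "that", "just", "with", "to"}  # hypothetical list


def capture_option(text):
    """Return the option named after the first commit phrase, or None."""
    m = COMMIT.search(text)
    if not m:
        return None
    tail = text[m.end():]
    span = CODE_SPAN.match(tail.lstrip())  # code span directly after the phrase
    if span:
        return span.group(1)
    for token in re.findall(r"\w+", tail):  # fallback: first non-stopword token
        if token.lower() not in STOPWORDS:
            return token
    return None


print(capture_option("OK, I'll go with Postgres."))  # Postgres
print(capture_option("let's try `pnpm`"))            # pnpm
```

The role and code-block filters are not repeated here; they would run before this function is called.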

---

## 3. Trade-off markers (bump confidence to `high`)

```
\b(?:on (?:the )?one hand|on (?:the )?other hand)\b
\b(?:pros?|cons?):?\s*\n                 # listed pros/cons
\btrade[- ]?offs?\b
\b(?:upside|downside|benefit|drawback|concern)s?\b
\b(?:advantage|disadvantage)s?\s+of\b
\b(?:tempting but|attractive but|love the .+ but)\b
```

Two or more distinct trade-off markers in the decision thread → `confidence: high`.
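One way to count distinct markers, as a rough sketch. It assumes "distinct" means distinct matched text after lowercasing and whitespace collapsing, so a repeated "downside" counts once; the function names are hypothetical, and the listed pros/cons and "love the ... but" patterns are omitted for brevity.

```python
import re

TRADEOFF = re.compile(
    r"\bon (?:the )?(?:one|other) hand\b"
    r"|\btrade[- ]?offs?\b"
    r"|\b(?:upside|downside|benefit|drawback|concern)s?\b"
    r"|\b(?:advantage|disadvantage)s?\s+of\b"
    r"|\b(?:tempting but|attractive but)\b",
    re.I,
)


def distinct_markers(thread_text):
    """Set of marker strings found, lowercased with whitespace collapsed."""
    return {" ".join(m.group(0).lower().split()) for m in TRADEOFF.finditer(thread_text)}


def is_high_confidence(thread_text):
    """True when the thread contains two or more distinct markers."""
    return len(distinct_markers(thread_text)) >= 2


text = "On one hand Redis is fast. The downside is memory cost, and that downside grows."
print(sorted(distinct_markers(text)))  # ['downside', 'on one hand']
print(is_high_confidence(text))        # True
```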

---

## 4. Reversal markers (disqualify the commit)

```
\b(?:actually no|scratch that|never mind|changed my mind|on (?:second|2nd) thought)\b
\b(?:wait,?\s+(?:no|let me reconsider|back up))\b
\b(?:strike that|ignore (?:the|my) last)\b
```

If a commit is followed within 2 messages by any of these, drop the record.
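The reversal check can be sketched as a look-ahead over the two messages after the commit. `commit_survives` is a hypothetical name, and this version checks every following message regardless of role, which is an assumption.

```python
import re

REVERSAL = re.compile(
    r"\b(?:actually no|scratch that|never mind|changed my mind|on (?:second|2nd) thought)\b"
    r"|\bwait,?\s+(?:no|let me reconsider|back up)\b"
    r"|\b(?:strike that|ignore (?:the|my) last)\b",
    re.I,
)
REVERSAL_WINDOW = 2  # messages after the commit that can still cancel it


def commit_survives(texts, commit_index):
    """False if any of the next 2 messages contains reversal language."""
    following = texts[commit_index + 1 : commit_index + 1 + REVERSAL_WINDOW]
    return not any(REVERSAL.search(t) for t in following)


chat = ["I'll go with Redis.", "Good choice.", "Wait, no, memory is too tight."]
print(commit_survives(chat, 0))  # False
```

A reversal that arrives three or more messages after the commit falls outside the window and leaves the record in place.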

---

## 5. Tag derivation (for the `tags[]` field)

Simple keyword bucketing — not ML, just a fixed lexicon. Each decision gets
up to 3 tags based on the highest-score buckets:

| Tag | Keywords |
|---|---|
| `database` | postgres, mysql, mongo, sqlite, redis, dynamodb, cassandra, cockroach, supabase, db |
| `architecture` | microservice, monolith, api, rest, graphql, grpc, queue, kafka, rabbitmq |
| `language` | typescript, javascript, python, rust, go, java, ruby, kotlin, swift |
| `frontend` | react, vue, svelte, next, remix, astro, tailwind, vite, webpack |
| `infra` | aws, gcp, azure, kubernetes, k8s, docker, terraform, vercel, netlify, cloudflare |
| `pricing` | price, pricing, tier, plan, subscription, annual, monthly, dollar, $ |
| `hiring` | hire, hiring, candidate, interview, offer, salary, comp, compensation |
| `product` | feature, roadmap, launch, ship, release, mvp, beta |

Threshold: a tag is added to the record when at least 2 of its keywords
appear across the question, the chosen option, and the snippet combined.
At most 3 tags are kept; the highest-scoring buckets win.
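A small sketch of the bucketing, assuming the score for a tag is the number of distinct lexicon keywords found as whole tokens in the combined text. `derive_tags` is a hypothetical name and the lexicon is cut down to four buckets.

```python
import re
from collections import Counter

# Cut-down lexicon; the full keyword lists are in the table above.
LEXICON = {
    "database": {"postgres", "mysql", "mongo", "sqlite", "redis"},
    "language": {"typescript", "python", "rust", "go"},
    "infra": {"aws", "docker", "kubernetes", "terraform"},
    "frontend": {"react", "vue", "svelte"},
}
THRESHOLD = 2  # minimum matching keywords for a tag to qualify
MAX_TAGS = 3


def derive_tags(text):
    """Return up to MAX_TAGS tags, highest score first."""
    tokens = set(re.findall(r"[a-z0-9$]+", text.lower()))
    scores = Counter({tag: len(tokens & words) for tag, words in LEXICON.items()})
    ranked = [tag for tag, score in scores.most_common() if score >= THRESHOLD]
    return ranked[:MAX_TAGS]


print(derive_tags("Should I use Postgres or SQLite? Going with Postgres on Docker."))
# ['database']
```

Matching whole tokens keeps `go` from firing inside "going", although in this sketch a literal "go with" would still count toward `language`.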

---

## 6. Known misses (v1 doesn't catch these)

Documented openly so users know what to add themselves or send a PR for:

- **Multi-turn decisions that span 20+ messages** — the commit-phrase search
  window is 6 messages. Long threads with a slow crescendo are missed.
- **Decisions made implicitly by just starting to do the thing** — "ok let me
  scaffold this" with no explicit commit phrase.
- **Decisions phrased as statements, not questions** — "I think we should
  probably use Postgres" followed by "yeah ok" isn't caught.
- **Non-English transcripts** — patterns are English-only in v1.
- **Decisions where both options are shell commands or code** — the option
  extractor is tuned for nouns.

If your export has these, please open an issue with a redacted snippet at
[whychose.com/extractor](https://whychose.com/extractor). This file
gets tightened every time a real missed case shows up.
