Topic: convert claude export to markdown
How to Convert Your Claude Export to Markdown
You exported the JSON from Anthropic. Now you want one readable .md file per conversation, with Artifacts as proper fenced code blocks instead of raw XML, and conversations grouped by Project. Here's the short script — and the four edge cases the naive version misses.
TL;DR
Claude's conversations.json is a flat array — one object per conversation, with chat_messages already in render order. No DAG, no parent-pointer walk, no regenerated branches to filter out. The interesting work is in three places: (1) extracting <antartifact> blocks from the assistant text and re-rendering them as fenced code; (2) grouping output files under project_uuid so per-Project conversations stay together; (3) using jq --stream for the 50–300 MB exports that are normal for Artifact-heavy power users. The full script is below; one file per conversation, named YYYY-MM-DD-<slug>.md, written under out/<project_uuid>/ when the conversation has a Project, out/_unscoped/ otherwise.
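Concretely, the output layout looks like this (the project UUID and the conversation filenames below are hypothetical):

```
out/
├── 2f6e9c1a-…/                      # one directory per project_uuid
│   ├── 2024-04-02-adr-postgres-vs-dynamo.md
│   └── 2024-04-11-mobile-rewrite-kickoff.md
└── _unscoped/
    └── 2024-03-28-quick-jq-question.md
```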
Why this is mostly easy (with three sharp edges)
The good news first: Claude's export is dramatically simpler than ChatGPT's. There's no mapping tree, no current_node pointer, no edit-history branches preserved as siblings. Each conversation is a flat chat_messages array in chronological order — i = 0 is the first turn, i = n-1 is the last, every entry is on the visible thread. The format reference documents the shape; the practical implication is that the ChatGPT-conversion script's recursive parent-walk collapses to a single .chat_messages[] iteration. Most of the conceptual complexity disappears.
What replaces it: three things specific to Claude that the naive port-from-ChatGPT misses.
- Artifacts are inline pseudo-XML. When Claude produced a code block, schema, ADR, design doc, or HTML page in the chat UI, it shipped as an <antartifact> wrapper inside the assistant's text field. Rendering that text raw to Markdown gives you <antartifact identifier="adr-postgres" type="text/markdown" title="..."> as literal characters — every Markdown reader shows it as plain text, and any GitHub-flavored renderer treats the angle brackets as broken HTML. Pull the Artifacts out and re-emit them as fenced code blocks tagged with the right language.
- Project grouping is the export's missing organizing axis. If you used Claude Projects (system prompt + knowledge base + bundled conversations), the export flattens the boundary — every Project conversation is in the same top-level array as your unscoped chats. The conversation object carries a project_uuid, but if you don't route by it, you lose the only signal that says "all of these decisions were made inside the Mobile-rewrite-Q2 Project." Group output files by project_uuid.
- Exports run big. Claude payloads of 50–300 MB are normal because Artifacts ride inline rather than as separate files. The straight jq -c '.[]' approach OOMs once you cross ~500 MB on a 16 GB machine. jq --stream walks one conversation at a time and keeps memory bounded.
The script handles all three. The four edge cases at the end cover what's left.
The script
Save as claude-to-md.sh. Requires jq 1.7+. Reads conversations.json from the current directory, writes one file per conversation under ./out/:
#!/usr/bin/env bash
set -euo pipefail
mkdir -p out
# Use --stream for files >500 MB; the form below is bounded-memory.
jq -cn --stream 'fromstream(1|truncate_stream(inputs))' conversations.json | while read -r convo; do
uuid=$(jq -r '.uuid' <<<"$convo")
name=$(jq -r '.name // "untitled"' <<<"$convo")
ctime=$(jq -r '.created_at[0:10]' <<<"$convo")
proj=$(jq -r '.project_uuid // "_unscoped"' <<<"$convo")
slug=$(echo "$name" | tr '[:upper:]' '[:lower:]' \
| sed -e 's/[^a-z0-9]\+/-/g' -e 's/^-\+//' -e 's/-\+$//' | cut -c1-60)
mkdir -p "out/$proj"
out="out/$proj/${ctime}-${slug:-untitled}.md"
# Header + per-message rendering. Pre-process Artifact wrappers in awk so
# they re-emit as fenced code blocks tagged with the Artifact language.
jq -r --arg NAME "$name" --arg UUID "$uuid" --arg PROJ "$proj" '
"# " + $NAME + "\n\n> conversation_uuid: " + $UUID
+ "\n> project_uuid: " + $PROJ + "\n",
( .chat_messages[]
| "\n## " + (.created_at | sub("\\..*Z$"; "Z"))
+ " — " + .sender + "\n\n" + (.text // "") )
' <<<"$convo" \
| awk '
/<antartifact / {
match($0, /title="[^"]*"/);
title = (RSTART ? substr($0, RSTART+7, RLENGTH-8) : "Artifact");
match($0, /identifier="[^"]*"/);
id = (RSTART ? substr($0, RSTART+12, RLENGTH-13) : "");
match($0, /language="[^"]*"/);
lang = (RSTART ? substr($0, RSTART+10, RLENGTH-11) : "");
match($0, /type="[^"]*"/);
type = (RSTART ? substr($0, RSTART+6, RLENGTH-7) : "");
if (lang == "" && type ~ /markdown/) lang = "markdown";
if (lang == "" && type ~ /react/) lang = "jsx";
if (lang == "" && type ~ /html/) lang = "html";
if (lang == "" && type ~ /mermaid/) lang = "mermaid";
print "\n**Artifact: " title (id ? " (" id ")" : "") "**\n\n```" lang;
in_artifact = 1; next;
}
/<\/antartifact>/ { print "```"; in_artifact = 0; next; }
{ print }
' > "$out"
echo "wrote $out"
done
Run with ./claude-to-md.sh from the directory holding conversations.json. Expect ~80 ms per conversation on a recent laptop; a 600-conversation export takes ~50 seconds. The streaming form is ~2× slower than naive but won't OOM on the multi-hundred-megabyte exports that Artifact-heavy users get.
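If you want to see the slug stage in isolation, feed it a sample title (GNU sed assumed, for the \+ quantifier):

```shell
# The script's slug pipeline, applied to a hypothetical conversation title.
echo "ADR: Postgres vs. DynamoDB?" \
  | tr '[:upper:]' '[:lower:]' \
  | sed -e 's/[^a-z0-9]\+/-/g' -e 's/^-\+//' -e 's/-\+$//' \
  | cut -c1-60
# → adr-postgres-vs-dynamodb
```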
The four edge cases the naive version hits
- Inline Artifacts as raw XML. The biggest visible failure of a port-from-ChatGPT script. Without the awk Artifact pass, the output looks like <antartifact identifier="adr-postgres" type="text/markdown" title="..."> # ADR-0007 ... — angle-bracket noise that no Markdown renderer treats as code, and no syntax highlighter recognizes. The script's awk stage detects the wrapper, parses the four common attributes (identifier, title, type, language), maps the MIME type to a sensible fenced-code language tag, and re-emits the body inside a triple-backtick block. application/vnd.ant.code with a language attribute renders as that language; text/markdown renders as markdown (which highlighters mostly leave alone, which is what you want for nested headings); text/html as html; application/vnd.ant.react as jsx; application/vnd.ant.mermaid as mermaid (which GitHub renders as a diagram).
- Attachments and files aren't in text. When a human turn included an uploaded file, the user's text field is the typed prompt only — the attachment is in attachments[] or files_v2[] as a separate array of objects with file_name, file_size, and (for some types) extracted_content. The naive script silently drops them. If you want to preserve evidence that a file was attached, add a section after the message text: jq -r '.attachments[]? | "📎 attached: " + .file_name' appended inside the per-message stanza. The actual binary contents aren't in the export — only the names and (for text-extractable formats) the parsed contents.
- Project metadata is in a separate file. The export ZIP ships at least three files: conversations.json, users.json, and (when applicable) projects.json with the per-Project system prompts and descriptions. The script above references project_uuid for grouping but doesn't read projects.json; if you want each out/<project_uuid>/ directory to also contain a system-prompt.md and description.md, add a pre-pass: jq -r '.[] | "out/\(.uuid)/system-prompt.md\n\(.system_prompt // "")"' projects.json piped to a small writer loop. The full project-rebuild script lives at claude project export.
- Versioned Artifacts (the command="update" shape). When Claude updated an Artifact in-conversation rather than rewriting the whole thing, the export contains a partial wrapper of the form <antartifact identifier="adr-postgres" command="update" old_str="..." new_str="..."> with no body to render — the actual artifact is patched, not replaced. The awk pass above will print an empty fenced code block for these, which is misleading. If you care about reconstruction, scan for command="update" and emit a small note (**Artifact patch on adr-postgres**) in place of the empty block. The complete patching reconstruction is documented at claude artifacts export — it's a 12-line jq recipe that walks updates in order and applies them to the most recent command="create" body.
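A minimal, self-contained sketch of the fourth fix. It reuses the main script's attribute offsets (identifier=" is 12 characters, so the value starts at RSTART+12) and swallows an update wrapper instead of emitting an empty fence; the sample wrapper piped in here is hypothetical:

```shell
# Hedged sketch: turn a command="update" Artifact wrapper into a patch note.
printf '%s\n' 'prose before' \
  '<antartifact identifier="adr-postgres" command="update" old_str="a" new_str="b">' \
  '</antartifact>' \
  'prose after' |
awk '
  /<antartifact [^>]*command="update"/ {
    match($0, /identifier="[^"]*"/);
    id = (RSTART ? substr($0, RSTART+12, RLENGTH-13) : "unknown");
    print "\n**Artifact patch on " id "**"; in_patch = 1; next;
  }
  # Swallow the closing tag of the patch wrapper; everything else passes through.
  in_patch && /<\/antartifact>/ { in_patch = 0; next }
  { print }
'
```

In the full script you'd place the update rule above the generic Artifact rule so it matches first.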
Sanity-check the output before you trust it
Three cheap checks. (1) Compare conversation count: jq 'length' conversations.json on the input and find out -name '*.md' | wc -l on the output should match (minus any whose name was empty and created multiple untitled.md collisions in the same Project — disambiguate by appending uuid[0:8] to the slug if you hit them). (2) Compare visible-message count: open one conversation in the Claude UI, count human-and-assistant turns, then run grep -c '^## ' out/<project>/<file>.md. The numbers should match exactly because there's no system-message filtering needed. If the markdown count is lower, the conversation has truncated (truncated: true) messages — a Claude message that hit the per-message length cap and was cut off; the export preserves these and the script doesn't lose them, but the UI's "(continued)" label can confuse the comparison. (3) Spot-check Artifact rendering: pick a conversation you know contained Artifacts and confirm the .md file shows them as fenced code blocks with a bold "Artifact: ..." heading, not as raw <antartifact> tags.
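As commands, assuming the conversations.json and out/ layout from the script above:

```shell
# (1) Conversation count in vs out — the two numbers should match.
jq 'length' conversations.json
find out -name '*.md' | wc -l

# (2) Turn count per file — every '## ' heading is one human or assistant turn.
grep -c '^## ' out/*/*.md

# (3) Raw Artifact tags that leaked through the awk pass (want no matches).
grep -rn '<antartifact' out || true
```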
Once you have markdown, what next?
Per-conversation Markdown is the right format for two follow-up workflows. (1) Archive. The out/<project_uuid>/ structure makes a clean archive: per-Project directories, dated filenames, system-prompt and description alongside the conversations they shaped. Drop into a private repo or a Notion Backups database and you have grep-able, ripgrep-able, dated history that survives Anthropic account changes. (2) Decision extraction. The conversion gives you readable text per chat, but it doesn't surface which chats contained durable decisions vs which were scratch thinking. Decision extraction applies regex patterns and an optional LLM pass to the same flat chat_messages array and emits structured records — title, decision, options considered, source conversation, source Artifact (if any) — that read like ADRs. WhyChose's open-source extractor ships exactly this loader and treats Artifacts as first-class decision sources via the heuristic that an Artifact with command="update" blocks past the first version almost always represents a decision that locked in.
How WhyChose helps
WhyChose treats your Claude export as a source of decisions, not as a content archive. The conversion script above gets you to readable Markdown; WhyChose's extractor goes one layer further and surfaces the architectural and product calls buried in those conversations. Same flat-array iteration under the hood — the open-source extractor implements the loader documented in the format reference, then layers a decision classifier on top with explicit Artifact handling per the Artifact-as-decision framing. The hosted product adds a teammate-shareable link, Notion / Linear export, and per-Project decision queries via project_uuid. If you're already comfortable with jq, the extractor is the more honest path: it's MIT-licensed, runs locally, and you can read every regex in patterns.md.
Related questions
Why is the Claude script shorter than the ChatGPT version?
Because there's no DAG to walk. Claude's conversations.json stores chat_messages as a flat array in chronological order, so you iterate i = 0..n-1 and don't need to reconstruct render order from a parent-pointer graph. The whole render-order section of the ChatGPT script — jq's recursive walk, the reverse, the parent-chain coalescing — collapses to a single .chat_messages[] iteration. Most of the script is slug-building, output naming, and Artifact-block extraction.
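For a concrete sense of how little is left, the entire render-order stage is one filter. The fixture here is a made-up two-turn conversation in the documented flat shape:

```shell
# Tiny hypothetical fixture: one conversation, messages already in order.
echo '[{"chat_messages":[{"sender":"human","text":"pick a db"},{"sender":"assistant","text":"Postgres."}]}]' > sample.json

# No tree walk, no parent pointers — just iterate the array.
jq -r '.[0].chat_messages[] | .sender + ": " + (.text // "")' sample.json
# → human: pick a db
# → assistant: Postgres.
```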
Should I leave Artifacts inline in the assistant text or extract them?
Extract them. Artifacts are wrapped in pseudo-XML (<antartifact identifier=… type=… title=…>…</antartifact>) which renders as raw markup in any Markdown reader and breaks code-block highlighters. The script above pulls every Artifact out of the assistant text and re-renders it as a fenced code block tagged with the Artifact's language attribute, with a bold heading naming the title and identifier. The surrounding prose stays as the assistant's commentary; the Artifact body becomes a first-class code block that GitHub, Obsidian, VS Code preview, and Marked all render correctly.
How do I keep per-project context when the export flattens it?
Group output files by project_uuid. The script writes to out/<project_uuid>/<date>-<slug>.md when project_uuid is non-null, and to out/_unscoped/<date>-<slug>.md otherwise. If you also exported the project metadata (system prompt + description), drop those into out/<project_uuid>/ as system-prompt.md and description.md so the per-project directory is a self-contained archive. The claude project export page covers the full project-rebuild flow if you need conversations grouped under their original system prompts.
What about huge exports — does the script handle 200 MB conversations.json?
Use --stream. Anthropic exports run bigger than ChatGPT's because Artifacts ride inline in the JSON rather than as separate files; payloads of 50–300 MB are normal for power users. The naive jq -c '.[]' loads the entire file into memory and OOMs at ~500 MB on a 16 GB machine. The script above uses jq -cn --stream 'fromstream(1|truncate_stream(inputs))' to walk one conversation at a time without holding the full array. Slower (~2× wall-clock) but bounded memory.
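To convince yourself the streaming form emits the same per-conversation objects as the naive one, run it on a two-element fixture (tiny and hypothetical, but the idiom is jq's standard streaming recipe):

```shell
printf '[{"uuid":"a"},{"uuid":"b"}]' > tiny.json

# Reassemble each top-level array element from the event stream, one at a time.
jq -cn --stream 'fromstream(1|truncate_stream(inputs))' tiny.json
# → {"uuid":"a"}
# → {"uuid":"b"}
```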
Further reading
- How to export your Claude conversations — produce conversations.json in the first place.
- Claude conversation export format reference — the schema this script depends on.
- How to convert your ChatGPT export to Markdown — the symmetric ChatGPT-side script; if you handle both platforms, the two scripts emit the same output shape so downstream tooling can treat them uniformly.
- How to search your Claude conversations — once you have markdown, ripgrep is the level-2 retrieval tool.
- Claude export not working? — eight common failure modes if your export never landed or arrived broken.
- How to export your Claude Artifacts — the wrapper schema, version model, and the patching reconstruction for command="update" blocks.
- How to export a Claude Project — the full per-Project rebuild including system prompt, description, and knowledge-base manifest.
- How to extract decisions from your Claude conversations — the level-4 step beyond search.
- The open-source extractor — MIT, runs locally, ships the same flat-array loader and Artifact-extraction logic shown here.