How to Convert Your ChatGPT Export to Markdown
You exported the JSON. Now you want one readable .md file per conversation, in render order, without the regenerated branches and the hidden system boilerplate. Here's the 30-line script — and the four edge cases that break the naive version.
TL;DR
ChatGPT's conversations.json stores each chat as a DAG (a mapping object plus a current_node pointer), not a flat array. Walking parent-pointers from current_node back to the root gives you the visible thread; everything else is regenerated branches you don't want in the output. Filter out author.role == "system", concatenate content.parts[] for assistant turns, render with one heading per message, and write one file per conversation named YYYY-MM-DD-<slug>.md. The full script is below.
Why a one-line dump won't work
The first instinct — jq '.[] | .mapping[] | .message' — fails for three reasons that show up immediately. (1) Messages aren't sorted. The mapping object is keyed by uuid and iterates in insertion order, which is roughly creation order, but creation includes every regenerated assistant turn you discarded. A 40-message chat in the UI becomes 120 messages in the dump. (2) The visible thread is selected by current_node, not by recency. ChatGPT records every "Regenerate response" you clicked as a sibling node; only the path from root to current_node is the chat you actually had. (3) System messages are noise. The first node after the conversation root is almost always a system-role message containing safety boilerplate and your Custom Instructions; rendering it pollutes the file with content the user never wrote.
The shape of the export is documented in the ChatGPT conversations.json field reference. The script below assumes that schema and walks it correctly.
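A trimmed sketch of that shape, with illustrative values (field names follow the reference above; the node ids and contents here are made up):

```json
[
  {
    "id": "conv-uuid",
    "title": "Postgres vs MongoDB",
    "create_time": 1710000000.123,
    "current_node": "node-3b",
    "mapping": {
      "node-root": { "parent": null, "children": ["node-1"], "message": null },
      "node-1":    { "parent": "node-root", "children": ["node-2"],
                     "message": { "author": { "role": "system" },
                                  "content": { "parts": [""] } } },
      "node-2":    { "parent": "node-1", "children": ["node-3a", "node-3b"],
                     "message": { "author": { "role": "user" },
                                  "content": { "parts": ["Which database?"] } } },
      "node-3a":   { "parent": "node-2", "children": [],
                     "message": { "author": { "role": "assistant" },
                                  "content": { "parts": ["(discarded regen)"] } } },
      "node-3b":   { "parent": "node-2", "children": [],
                     "message": { "author": { "role": "assistant" },
                                  "content": { "parts": ["Postgres, because…"] } } }
    }
  }
]
```

Note that node-3a and node-3b are siblings: only node-3b is on the path from current_node back to the root, so only it belongs in the output.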
The 30-line script
Save as chatgpt-to-md.sh. Requires jq 1.7+. Reads conversations.json from the current directory and writes one file per conversation into ./out/:
#!/usr/bin/env bash
set -euo pipefail
mkdir -p out
jq -c '.[]' conversations.json | while read -r convo; do
  id=$(jq -r '.id' <<<"$convo")
  title=$(jq -r '.title // "untitled"' <<<"$convo")
  ctime=$(jq -r '.create_time // 0 | floor | gmtime | strftime("%Y-%m-%d")' <<<"$convo")
  slug=$(echo "$title" | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\+/-/g' -e 's/^-\+//' -e 's/-\+$//' | cut -c1-60)
  out="out/${ctime}-${slug:-untitled}.md"
  # Walk current_node back to root, then reverse — gives render order.
  jq -r --arg ID "$id" --arg TITLE "$title" '
    . as $c                          # keep the conversation object in scope
    | def walk(node):                # shadows the jq builtin; intentional
        if node == null then []
        else [node] + walk($c.mapping[node].parent) end;
      (walk($c.current_node) | reverse) as $path
    | "# " + $TITLE + "\n\n> conversation_id: " + $ID + "\n"
    , ( $path[]
        | $c.mapping[.] as $n
        | $n.message
        | select(. != null)
        | select(.author.role != "system")
        | select((.content.parts // []) | length > 0)
        | "\n## " + (.create_time // 0 | floor | gmtime | strftime("%Y-%m-%d %H:%M UTC"))
          + " — " + .author.role + "\n\n"
          + ( (.content.parts // [])
              | map(if type == "string" then . else (.text // "") end)
              | join("\n\n") )
      )
  ' <<<"$convo" > "$out"
  echo "wrote $out"
done
Make it executable (chmod +x chatgpt-to-md.sh), then run ./chatgpt-to-md.sh from the directory holding conversations.json. Expect ~150 ms per conversation on a recent laptop; a 1,200-chat export takes ~3 minutes.
The four edge cases the naive version hits
- Orphan nodes. Some mapping entries have no parent and aren't reachable from current_node — typically tool-call placeholders or fragments from a chat that hit a backend error mid-turn. The walk above ignores them by construction (they're outside the parent chain), which is the right behavior. If you want to audit them, run a separate query: jq '.[].mapping | to_entries[] | select(.value.parent == null and .value.message != null)'.
- Hidden system messages. The root node's message is usually null, but its first child is a system-role boilerplate message ("You are ChatGPT, a large language model trained by OpenAI…"). The select(.author.role != "system") filter drops it. If you have Custom Instructions enabled, those land as a second system message — the same filter handles both.
- Multi-part assistant turns. When a turn involves DALL·E, the code interpreter, or any function call, content.parts is an array of {content_type, text} objects, not a list of strings. A naive parts[0] takes the first fragment (often a tool-call placeholder) and drops the actual reply. The map(if type == "string" then . else (.text // "") end) coalesces both shapes; join("\n\n") preserves the order. For tool-call diagnostics, replace the fallback with .text // (.image_url // "[non-text part]").
- Regenerated branches. "Regenerate response" creates a sibling node in the DAG, not a replacement. The walk(current_node) | reverse path naturally selects only the chosen branch. If you want to recover earlier regens for diff review, query jq '.[].mapping | to_entries[] | select(.value.children | length > 1)' — every entry there is a fork point, and .value.children is the list of sibling node ids you can walk individually.
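The fork-point audit from the last bullet can be run end-to-end against a toy export. The fixture below is hypothetical (two assistant siblings under one user turn), but it follows the mapping / parent / children field names the script relies on:

```shell
# Hypothetical two-branch fixture: node n1 has two children because one
# assistant reply was regenerated; only n2b is on the current_node path.
cat > /tmp/sample-conversations.json <<'EOF'
[{"id":"c1","title":"demo","current_node":"n2b",
  "mapping":{
    "root":{"parent":null,"children":["n1"],"message":null},
    "n1":{"parent":"root","children":["n2a","n2b"],
          "message":{"author":{"role":"user"},"content":{"parts":["hi"]}}},
    "n2a":{"parent":"n1","children":[],
           "message":{"author":{"role":"assistant"},"content":{"parts":["first try"]}}},
    "n2b":{"parent":"n1","children":[],
           "message":{"author":{"role":"assistant"},"content":{"parts":["regenerated"]}}}
  }}]
EOF

# Fork points: mapping entries with more than one child.
jq -r '.[].mapping | to_entries[] | select(.value.children | length > 1) | .key' \
  /tmp/sample-conversations.json    # prints: n1
```

From a fork point like n1, each entry in children is the head of a branch you can walk forward independently.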
Sanity-check the output before you trust it
Two cheap checks. (1) Compare conversation counts: jq 'length' conversations.json gives the input count, and out/ should contain the same number of .md files (minus any whose title was empty and whose create_time was missing — those collide on untitled.md; rerun with the conversation id in the filename instead of the title if you hit them). (2) Compare visible-message counts: open one chat in the ChatGPT UI, count the user and assistant turns, then grep -c '^## ' out/<that-file>.md. The numbers should match. If the markdown count is higher, you forgot to filter system messages; if lower, you're walking parent from the wrong node (use current_node, not the root).
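Both checks script cleanly. The fixture below is a throwaway stand-in (two stub conversations, two stub files) so the commands run as-is; point them at your real conversations.json and out/ instead:

```shell
# Throwaway fixture standing in for a real export and a real out/ directory.
tmp=$(mktemp -d)
printf '[{"id":"a"},{"id":"b"}]' > "$tmp/conversations.json"
mkdir "$tmp/out"
printf '# A\n\n## 2024-01-01 10:00 UTC — user\n\nhi\n' > "$tmp/out/a.md"
printf '# B\n\n## 2024-01-02 11:00 UTC — assistant\n\nyo\n' > "$tmp/out/b.md"

# Check 1: conversation count in vs. file count out.
convos=$(jq 'length' "$tmp/conversations.json")
files=$(ls "$tmp/out"/*.md | wc -l | tr -d ' ')
echo "conversations=$convos files=$files"    # prints: conversations=2 files=2

# Check 2: visible turns in one file (compare by hand against the UI).
grep -c '^## ' "$tmp/out/a.md"               # prints: 1
```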
Once you have markdown, what next?
Per-conversation Markdown is the right format for two follow-up workflows. (1) Archive. Drop the out/ directory into a private repo or a Notion Backups database; you now have grep-able, ripgrep-able, dated history that survives ChatGPT account changes. (2) Decision extraction. The conversion above gives you readable text per chat, but it doesn't surface which chats contained durable decisions vs which were scratch thinking. That's the layer above search — a different kind of question, with a different tool. Decision extraction walks the same DAG you just walked, applies heuristic patterns for "chose X over Y" / "decided to" / "going with" framings, and emits a structured record per match instead of raw markdown. WhyChose's open-source extractor ships the same DAG-walk this page describes — fork it, run it locally, audit every match.
How WhyChose helps
WhyChose treats your ChatGPT export as a source of decisions, not as a content archive. The conversion script above gets you to readable markdown; WhyChose's extractor goes one layer further and surfaces the architectural and product calls buried in those conversations. Same DAG-walk under the hood — the open-source extractor implements exactly the parent-pointer reversal shown above, then layers a decision classifier on top. The hosted product adds a teammate-shareable link, Notion / Linear export, and an audit trail. If you're already comfortable with jq, the extractor is the more honest path: it's MIT-licensed, runs locally, and you can read every regex in patterns.md.
Related questions
Why can't I just dump every message in conversations.json to Markdown?
Because conversations.json doesn't store messages in render order — each chat is a DAG (the mapping object) plus a current_node pointer, and only the path from root to current_node is the visible thread. A flat dump returns every regenerated branch, every hidden system prompt, and every orphan, in arbitrary order. A 40-message chat becomes 120 unordered messages. You have to walk the DAG.
Do I need to handle the parts array, or is the first element always the message?
You have to handle the array. content.parts is a list, not a singleton, and assistant turns frequently contain multiple parts (tool call + tool result + text reply). Taking parts[0] drops the actual reply for every chat that involved DALL·E, the code interpreter, or any function call. Concatenate parts[] with a separator, or filter on content_type before joining.
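A minimal demonstration of the coalescing filter the script uses, on a hypothetical mixed-shape parts array (one plain string, one object part):

```shell
# One string part and one {content_type, text} object part (made-up data).
echo '{"content":{"parts":["hello",{"content_type":"text","text":"world"}]}}' \
  | jq -r '.content.parts
           | map(if type == "string" then . else (.text // "") end)
           | join("\n\n")'
# prints "hello", a blank line, then "world"
```

A naive .content.parts[0] on the same input would print only "hello" and silently drop the second part.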
Should I include the system message in the output?
For a readable archive, no. The first node after the root is almost always a system-role message containing OpenAI's safety boilerplate and your Custom Instructions; rendering it pollutes the file with content the user never wrote. Filter author.role != "system" unless you're auditing prompt injection or studying how Custom Instructions shaped a chat.
What's the right way to name the output files?
Use a slug derived from the conversation title plus the create_time prefix in YYYY-MM-DD form, so files sort chronologically and remain unique even when two chats share a title. Example: 2026-03-12-postgres-vs-mongodb.md. Keep the conversation id as a comment in the file's frontmatter — it's canonical but unfriendly for filenames.
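The slug pipeline from the script, run standalone on a hypothetical title (GNU sed assumed, for the \+ repetition operator):

```shell
# 'Postgres vs. MongoDB?' -> 'postgres-vs-mongodb' (GNU sed assumed).
slug=$(echo 'Postgres vs. MongoDB?' | tr '[:upper:]' '[:lower:]' \
  | sed -e 's/[^a-z0-9]\+/-/g' -e 's/^-\+//' -e 's/-\+$//' | cut -c1-60)
echo "2026-03-12-${slug}.md"    # prints: 2026-03-12-postgres-vs-mongodb.md
```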
Further reading
- How to export your ChatGPT history — produce conversations.json in the first place.
- ChatGPT conversations.json format reference — the schema this script depends on.
- How to search your ChatGPT history — once you have markdown, ripgrep is the level-2 retrieval tool.
- ChatGPT export not working? — eight common failure modes if your export never landed.
- How to extract decisions from your ChatGPT chats — the level-4 step beyond search.
- The open-source extractor — MIT, runs locally, ships the same DAG-walk shown here.