Topic: chatgpt conversations.json format
ChatGPT conversations.json Format — Field Reference (2026)
Every field, what it actually contains, the DAG-flattening algorithm, and the four gotchas that trip up every first-time consumer of the export.
TL;DR
OpenAI's conversations.json is a top-level array of conversation objects. Each object has a mapping field — a directed acyclic graph of messages keyed by message UUID, not a list. The user-visible thread is one leaf path through the graph. To reconstruct it, follow current_node back to the synthetic root via each entry's parent. Edited prompts produce sibling branches under the same parent; ignore non-current branches unless you're auditing message edits.
Top-level shape
The conversations.json file inside the export ZIP is a single JSON array. Each array element is one conversation:
```json
[
  {
    "title": "Postgres vs MongoDB for metrics",
    "create_time": 1736209400,
    "update_time": 1736209800,
    "mapping": { ... },
    "moderation_results": [],
    "current_node": "<leaf-message-uuid>",
    "plugin_ids": null,
    "conversation_id": "8f3a2b1c-...",
    "conversation_template_id": null,
    "gizmo_id": "g-abc123",
    "is_archived": false,
    "safe_urls": [],
    "default_model_slug": "gpt-4o",
    "voice": null,
    "id": "8f3a2b1c-..."
  },
  ...
]
```
The interesting fields are title, create_time, update_time, mapping, current_node, conversation_id, default_model_slug, and gizmo_id. Everything else is auxiliary or vestigial.
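As a rough illustration of which fields matter, here is a minimal sketch that projects one conversation object onto those interesting fields. The helper name summarize and the output property names are hypothetical; the only behavior it assumes from the export is what the sample above shows (second-based timestamps, a mapping object, an optional gizmo_id).

```javascript
// Hypothetical helper: project a conversation object onto the fields this
// page calls interesting. create_time/update_time are seconds since the
// epoch, so they are multiplied by 1000 for JavaScript Dates.
function summarize(conv) {
  return {
    id: conv.conversation_id ?? conv.id,
    title: conv.title,
    created: new Date(conv.create_time * 1000),
    updated: new Date(conv.update_time * 1000),
    currentNode: conv.current_node,
    model: conv.default_model_slug,
    gizmoId: conv.gizmo_id ?? null, // custom-GPT id, if any
    nodeCount: Object.keys(conv.mapping ?? {}).length, // includes the synthetic root
  };
}
```

In Node you might load the file with JSON.parse(fs.readFileSync('conversations.json', 'utf8')) and map each array element through summarize.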
The mapping field
Each entry inside mapping is keyed by message UUID and looks like this:
```json
"<message-uuid>": {
  "id": "<message-uuid>",
  "message": {
    "id": "<message-uuid>",
    "author": {
      "role": "system|user|assistant|tool",
      "name": null,
      "metadata": {}
    },
    "create_time": 1736209412.345,
    "update_time": null,
    "content": {
      "content_type": "text",
      "parts": ["<the actual text>"]
    },
    "status": "finished_successfully",
    "end_turn": true,
    "weight": 1.0,
    "metadata": { "model_slug": "gpt-4o", ... },
    "recipient": "all",
    "channel": null
  },
  "parent": "<parent-message-uuid>",
  "children": ["<child-uuid>", ...]
}
```
Three structural rules: (1) the entry whose parent is null is the synthetic root; its message is usually a system-role placeholder. (2) parent/children pointers form a tree, and a parent with multiple children marks a branch point, created when the user edited a prompt (or regenerated a response) at that point. (3) current_node on the enclosing conversation is the UUID of the most recent leaf the user actually saw.
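The three rules can be checked mechanically. A minimal sketch, assuming the mapping object has already been parsed from JSON; describeMapping and its return shape are hypothetical names, not part of the export. It reports the root (missing or null parent), the branch points (more than one child), and the leaves (no children), one of which should equal current_node.

```javascript
// Classify mapping entries by the three structural rules.
function describeMapping(mapping) {
  let rootId = null;
  const branchPoints = [];
  const leaves = [];
  for (const [id, node] of Object.entries(mapping)) {
    // Rule 1: the synthetic root has a null (or missing) parent.
    if (node.parent == null) rootId = id;
    const children = node.children ?? [];
    // Rule 2: more than one child means the thread branched here.
    if (children.length > 1) branchPoints.push(id);
    // Rule 3 candidates: current_node should be one of these leaves.
    if (children.length === 0) leaves.push(id);
  }
  return { rootId, branchPoints, leaves };
}
```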
Reconstructing the user-visible thread
Most hand-rolled exporters print every message in the mapping and emit duplicates because they ignore the branch structure. The correct walk is leaf-to-root, then reverse:
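```javascript
// Walk from the visible leaf (current_node) up to the root via parent
// pointers, then reverse so the oldest message comes first.
function flatten(conv) {
  const path = [];
  let nodeId = conv.current_node;
  while (nodeId) {
    const node = conv.mapping[nodeId];
    if (!node) break; // dangling pointer: stop instead of throwing
    const msg = node.message;
    // Skip entries with no message and system-role placeholders such as the root.
    if (msg && msg.author && msg.author.role !== 'system') {
      path.push(msg);
    }
    nodeId = node.parent;
  }
  return path.reverse();
}
```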
This gives you the linear conversation as the user remembers it. Skipping the root (system-role, empty parts) makes the first user prompt index 0. If you also want to surface edited branches for audit, walk every leaf (any node whose children array is empty) and output each as a separate path; tag the one matching current_node as the canonical thread.
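The audit variant described above can be sketched by running the same leaf-to-root walk from every leaf instead of only from current_node. This is a minimal sketch under that assumption; allPaths and the leafId/canonical/messages fields are hypothetical names.

```javascript
// Hypothetical audit helper: return every root-to-leaf path in a conversation.
// A leaf is a mapping entry whose children array is empty; the path ending at
// current_node is flagged as the canonical (user-visible) thread.
function allPaths(conv) {
  const paths = [];
  for (const [leafId, node] of Object.entries(conv.mapping)) {
    if ((node.children ?? []).length > 0) continue; // not a leaf
    const messages = [];
    let id = leafId;
    while (id) {
      const n = conv.mapping[id];
      if (!n) break; // dangling parent pointer
      // Same filter as flatten(): skip the empty root and system placeholders.
      if (n.message && n.message.author?.role !== 'system') {
        messages.push(n.message);
      }
      id = n.parent;
    }
    paths.push({
      leafId,
      canonical: leafId === conv.current_node,
      messages: messages.reverse(),
    });
  }
  return paths;
}
```

Walking each leaf independently repeats work on shared prefixes; that is fine for per-conversation audits, where mappings are small.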
Four gotchas
- Tool messages have author.role = "tool". If your role filter only allows "user" and "assistant", you silently drop code-interpreter outputs, browsing snippets, and DALL-E captions. Either include "tool" as a third valid role or explicitly note its omission.
- content.parts is an array, not a string. For multimodal messages it can contain text plus image-asset references. Most consumers want parts.join('\n'); for image-aware processing, walk parts and check each element's type.
- create_time is float seconds, not milliseconds. Multiply by 1000 before passing it to new Date(). Conversation-level create_time is integer seconds; message-level create_time is a float with sub-second precision.
- gizmo_id is the custom-GPT identifier. If a conversation was with a custom GPT, gizmo_id starts with g-. The custom GPT's system prompt and instructions are not in the export, only your conversation with it. Group by gizmo_id if you want per-GPT analytics.
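Each gotcha maps to a small guard. The sketch below is one possible set of helpers (keepMessage, messageText, toDate, groupByGizmo are hypothetical names). One design choice differs slightly from a plain parts.join('\n'): it joins only string parts, on the assumption that non-string elements are image-asset reference objects you would handle separately.

```javascript
// Roles worth keeping: include "tool" so code-interpreter output, browsing
// snippets, and DALL-E captions are not silently dropped.
const KEPT_ROLES = new Set(['user', 'assistant', 'tool']);

function keepMessage(message) {
  return KEPT_ROLES.has(message?.author?.role);
}

// content.parts is an array. Strings are text; other elements (for example
// image-asset references in multimodal messages) are objects, which
// parts.join() would render as "[object Object]", so only strings are joined.
function messageText(message) {
  const parts = message?.content?.parts ?? [];
  return parts.filter((p) => typeof p === 'string').join('\n');
}

// create_time is seconds (integer or float), while Date expects milliseconds.
function toDate(seconds) {
  return seconds == null ? null : new Date(seconds * 1000);
}

// Group conversations by custom-GPT id; chats without one go under "none".
function groupByGizmo(conversations) {
  const groups = new Map();
  for (const conv of conversations) {
    const key = conv.gizmo_id ?? 'none';
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(conv);
  }
  return groups;
}
```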
How WhyChose handles it
The open-source extractor implements the leaf-path walk above and emits a common internal shape that's symmetric with the Claude loader. It also surfaces edited branches as alternates[] on the canonical message, so you don't lose audit signal even though they're skipped from the linear thread. Pattern matching for decisions then runs against the flattened message stream — the regexes in patterns.md are tuned for the post-flatten shape, not the raw mapping. If you're writing your own consumer, the loader code at bin/extractor.js is ~80 lines and MIT-licensed; copy it.
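To make the alternates[] idea concrete, here is a hypothetical sketch of one way to compute it; it is not the extractor's code. It assumes an alternate is any sibling of a canonical node (another child of the same parent) and attaches those siblings' messages to the canonical message while walking from current_node.

```javascript
// Hypothetical sketch (not bin/extractor.js): walk the canonical path and
// attach each node's non-canonical siblings as alternates[].
function withAlternates(conv) {
  const out = [];
  let id = conv.current_node;
  while (id) {
    const node = conv.mapping[id];
    if (!node) break;
    const parent = node.parent ? conv.mapping[node.parent] : null;
    // Siblings share the same parent; they exist only where the thread branched.
    const siblings = (parent?.children ?? []).filter((c) => c !== id);
    if (node.message && node.message.author?.role !== 'system') {
      out.push({
        ...node.message,
        alternates: siblings.map((c) => conv.mapping[c]?.message).filter(Boolean),
      });
    }
    id = node.parent;
  }
  return out.reverse();
}
```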
Related questions
Why is mapping a tree instead of a list?
Because every time you edit a previous prompt, ChatGPT branches the conversation. The original message and the edited message both stay in the export — under the same parent — and each has its own subtree of subsequent replies. The flat conversation you remember reading is just one path through the DAG: the leaf path of whichever branch you ended up using.
Which message is the root?
The mapping object contains a synthetic root entry: the one whose parent is null. Its message is usually a system-role placeholder with empty content. You can also start from the current_node field on the conversation, which points to the latest leaf, and walk parent pointers until you reach the root.
What's the difference between create_time on the conversation and on each message?
Conversation create_time is when you first opened the chat. Message create_time is per message, including edits: an edited message gets its own create_time, while the older sibling it superseded keeps its original timestamp. Use message-level timestamps for time-of-decision analytics and conversation-level timestamps for session boundaries.
Are tool calls and code-interpreter outputs in the export?
Yes. Tool messages appear with author.role of tool and author.name set to the tool that ran (e.g., python, browser). Their content.parts contain the tool's output as plain text. Code-interpreter binary attachments aren't embedded — only the text-rendered output and a reference path that no longer resolves outside ChatGPT.
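A quick way to see which tools appear in your export is to count tool-role messages by author.name. A minimal sketch, assuming the parsed conversations array; toolUsage is a hypothetical name, and it deliberately scans every mapping node (including non-canonical branches) rather than only the current_node path.

```javascript
// Count tool messages per tool name (e.g. python, browser) across all
// mapping nodes, including branches that are not on the canonical thread.
function toolUsage(conversations) {
  const counts = {};
  for (const conv of conversations) {
    for (const node of Object.values(conv.mapping ?? {})) {
      const author = node.message?.author;
      if (author?.role !== 'tool') continue;
      const name = author.name ?? 'unknown';
      counts[name] = (counts[name] ?? 0) + 1;
    }
  }
  return counts;
}
```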
Further reading
- How to export your ChatGPT history: the prerequisite; get the ZIP first.
- Claude conversation export format: Anthropic's shape, for comparison.
- How to extract decisions from your ChatGPT chats: what to do with the flattened messages.
- Convert your ChatGPT export to Markdown: a 30-line script that walks the DAG and writes one .md file per chat.
- Gemini conversation export: the third-platform format reference. Google's HTML-only activity log is normalized to a flat-array JSON shape, with the explicit divergences from this DAG-shaped format documented.
- ChatGPT Projects export: the Project-scoped extension of the schema. project_id is the only top-level field added when a conversation belongs to a Project, plus a sibling projects.json file with the per-Project metadata (custom instructions, file manifest, bound Custom GPT id).
- ChatGPT Team export (differences from Plus, workspace admin flow, and the Compliance API): the workspace-scoped extension of the schema. Team exports add two top-level fields per conversation (created_by_user_id, participants[]) plus workspace-level metadata (workspace.json, members.json, audit-log.jsonl, shared-gpts/) absent from personal Plus exports.
- ChatGPT Memory export (where your memories live in the data download): the companion file that rides alongside conversations.json in the same ZIP. memory.json stores what ChatGPT remembered across sessions, with content and timestamps; the significant gap is that no source conversation is linked.
- The open-source extractor: reference implementation of the leaf-path walk.
- ChatGPT Custom GPTs export (conversations vs configurations): starts from the gizmo_id field documented on this page. The companion page explains what gizmo_id identifies, what is and isn't in the export for Custom GPT conversations, and a 22-line jq recipe that groups conversations by gizmo_id to produce per-GPT usage stats.
- Perplexity conversation export (how to save your AI research history): the platform with no equivalent to the conversations.json schema documented on this page. Perplexity stores threads in your Library but exports nothing as a downloadable file. Reading this page first highlights what a well-structured AI export looks like; the Perplexity page documents what the absence of that structure means in practice.
- Uploaded files in ChatGPT exports (what's included, what's missing, and how to recover them): the companion reference for content_type: "multimodal_text" message parts and image_asset_pointer objects in this schema. It covers what those attachment fields look like in practice, what the internal asset_pointer URI contains (file name and size, not a downloadable URL), and which content types preserve binary data versus text references only (DALL-E: expiring URL; uploaded PDF: name only; Code Interpreter code: full text).
- ChatGPT shared links (what persists, what expires, and how to archive conversations): the complement to this schema reference. Shared links are a common informal archival mechanism for important conversations (engineers paste them into PRs and Slack), but they are NOT represented anywhere in conversations.json. The share UUID, the share creation date, and the list of which conversations were shared are all absent from the export this page documents. To find conversation content that you previously shared, search conversations.json by title or date (for example with jq) rather than by share URL.
- ChatGPT web search in conversations.json (what's in the export, what's missing, and how to extract citations): the companion reference for the tether content types. tether_browsing_display (search query plus a results list with URL, title, and snippet per result) and tether_quote (a specific quoted passage with URL) are author.role: "tool" nodes in the DAG, but naive consumers miss them, for example by keeping only user and assistant roles. That page covers jq recipes that walk all DAG nodes to extract cited URLs.
- Can you export OpenAI Playground conversations? No, and here's why: the structural contrast to this page. conversations.json exists because ChatGPT stores every turn server-side. The Playground (Chat mode) does not; it is a UI wrapper around a stateless API call. Playground sessions produce no analog to the conversations.json schema documented here, which is why they're absent from the data export entirely.
- ChatGPT voice mode in the data export (transcripts, what's missing, and how to process them): voice conversations export as plain text in conversations.json, and audio is never stored. Covers how to identify voice turns, understand Whisper transcription quality, and extract decisions from voice-heavy sessions.
- ChatGPT Canvas export (document structure in conversations.json): Canvas, the collaborative document editing mode, stores document content as message parts in the standard conversations.json schema documented on this page. Canvas messages use content_type: "text" with the full Markdown document body in parts[0]; some Canvas messages carry a metadata.canvas sub-object with document title and format. The companion page explains how to identify Canvas conversations by length and title heuristics, jq recipes for extracting Canvas documents to separate Markdown files, and which Canvas features (UI annotations, interactive revision timeline) are absent from the export even though the document text is fully preserved.
- ChatGPT image generation in the data export (DALL-E prompts, CDN URLs, and what expires): the detailed schema reference for DALL-E 3 tool invocations (author.name: "dalle.text2im") and GPT-4o native image generation (content_type: "image_asset_pointer") in the mapping DAG. Expands on the image_asset_pointer nodes, covering the CDN URL expiry timeline, the revised_prompt field, and jq extraction recipes for bulk image history recovery.
- ChatGPT o1 and o3 export (what appears in conversations.json and what's hidden): the model-specific supplement to this schema reference. Conversations with o1, o1-mini, o3, o3-mini, and o4-mini (reasoning models) use the identical schema documented on this page, with one notable absence: the internal reasoning chain (thinking tokens) is not present in any field. The message.metadata.model_slug field documented here (values such as "o1", "o3", "o4-mini") is the primary identifier for reasoning-model sessions in the export; the companion page explains what the final response does (and doesn't) contain for decision-extraction purposes.