ChatGPT conversations.json Format — Field Reference (2026)

Every field, what it actually contains, the DAG-flattening algorithm, and the four gotchas that trip up every first-time consumer of the export.

TL;DR

OpenAI's conversations.json is a top-level array of conversation objects. Each object has a mapping field — a directed acyclic graph of messages keyed by message UUID, not a list. The user-visible thread is one leaf path through the graph. To reconstruct it, follow current_node back to the synthetic root via each entry's parent. Edited prompts produce sibling branches under the same parent; ignore non-current branches unless you're auditing message edits.

Top-level shape

The export is a single JSON file containing an array. Each array element is one conversation:

[
  {
    "title": "Postgres vs MongoDB for metrics",
    "create_time": 1736209400,
    "update_time": 1736209800,
    "mapping": { ... },
    "moderation_results": [],
    "current_node": "<leaf-message-uuid>",
    "plugin_ids": null,
    "conversation_id": "8f3a2b1c-...",
    "conversation_template_id": null,
    "gizmo_id": "g-abc123",
    "is_archived": false,
    "safe_urls": [],
    "default_model_slug": "gpt-4o",
    "voice": null,
    "id": "8f3a2b1c-..."
  },
  ...
]

The interesting fields are title, create_time, update_time, mapping, current_node, conversation_id, default_model_slug, and gizmo_id. Everything else is auxiliary or vestigial.
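As a quick sketch of how those fields are typically consumed, here's a minimal summarizer. `summarize` is an illustrative helper name, not part of any official API, and it assumes the export was already parsed with JSON.parse:

```javascript
// Pull the useful fields out of one conversation object.
// Hypothetical helper — names and shape are this sketch's own choices.
function summarize(conv) {
  return {
    id: conv.conversation_id ?? conv.id,
    title: conv.title,
    created: new Date(conv.create_time * 1000), // seconds → millis
    model: conv.default_model_slug,
    gizmo: conv.gizmo_id, // non-null only for custom-GPT chats
    messageCount: Object.keys(conv.mapping ?? {}).length,
  };
}

// Usage (assuming conversations.json sits next to the script):
// const convs = JSON.parse(require('fs').readFileSync('conversations.json', 'utf8'));
// console.log(convs.map(summarize));
```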

The mapping field

Each entry inside mapping is keyed by message UUID and looks like this:

"<message-uuid>": {
  "id": "<message-uuid>",
  "message": {
    "id": "<message-uuid>",
    "author": {
      "role": "system|user|assistant|tool",
      "name": null,
      "metadata": {}
    },
    "create_time": 1736209412.345,
    "update_time": null,
    "content": {
      "content_type": "text",
      "parts": ["<the actual text>"]
    },
    "status": "finished_successfully",
    "end_turn": true,
    "weight": 1.0,
    "metadata": { "model_slug": "gpt-4o", ... },
    "recipient": "all",
    "channel": null
  },
  "parent": "<parent-message-uuid>",
  "children": ["<child-uuid>", ...]
}

Three structural rules: (1) the entry whose parent is null is the synthetic root — its message is usually a system-role placeholder; (2) parent/children form a tree, but a parent with multiple children means the user edited a prompt at that point; (3) current_node on the enclosing conversation is the UUID of the most recent leaf the user actually saw.
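Rules (1) and (2) translate directly into two one-liners. This is a sketch against the mapping shape shown above; the helper names are illustrative:

```javascript
// Rule 1: the synthetic root is the sole entry with a null parent.
function findRoot(mapping) {
  return Object.values(mapping).find((node) => node.parent === null);
}

// Rule 2: more than one child means the user edited a prompt at that node —
// each child heads a sibling branch.
function findEditPoints(mapping) {
  return Object.values(mapping).filter((node) => node.children.length > 1);
}
```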

Reconstructing the user-visible thread

Most hand-rolled exporters print every message in the mapping and emit duplicates because they ignore the branch structure. The correct walk is leaf-to-root, then reverse:

function flatten(conv) {
  const path = [];
  let nodeId = conv.current_node;
  while (nodeId) {
    const node = conv.mapping[nodeId];
    if (!node) break; // defensive: dangling pointer in a truncated export
    // Skip the synthetic root and any other system-role placeholders.
    if (node.message && node.message.author.role !== 'system') {
      path.push(node.message);
    }
    nodeId = node.parent; // root has parent: null, which ends the loop
  }
  return path.reverse(); // walked leaf-to-root, so flip to reading order
}

This gives you the linear conversation as the user remembers it. Skip the root (system-role, empty parts) so the first user prompt is index 0. If you also want to surface edited branches for audit, walk every leaf (any node whose children is empty) and output each as a separate path; tag the one matching current_node as the canonical thread.
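The audit variant is the same leaf-to-root walk, repeated once per leaf. A sketch, with `allThreads` as an illustrative name:

```javascript
// Enumerate every branch, not just the canonical one.
function allThreads(conv) {
  const leaves = Object.values(conv.mapping)
    .filter((node) => node.children.length === 0);
  return leaves.map((leaf) => {
    const path = [];
    let nodeId = leaf.id;
    while (nodeId) {
      const node = conv.mapping[nodeId];
      if (node.message && node.message.author.role !== 'system') {
        path.push(node.message);
      }
      nodeId = node.parent;
    }
    return {
      canonical: leaf.id === conv.current_node, // the thread the user saw
      messages: path.reverse(),
    };
  });
}
```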

Four gotchas

  1. Tool messages have author.role = "tool". If your role-filter only allows user and assistant, you silently drop code-interpreter outputs, browsing snippets, and DALL-E captions. Either include tool as a third valid role or explicitly note its omission.
  2. content.parts is an array, not a string. For multi-modal messages it can contain text plus image-asset references. Most consumers want parts.join('\n'); for image-aware processing, walk parts and check each element's type.
  3. create_time is float-seconds, not millis. Multiply by 1000 before passing to new Date(). Conversation-level create_time is integer-seconds; message-level is float with sub-second precision.
  4. gizmo_id is the custom-GPT identifier. If a conversation was with a custom GPT, gizmo_id starts with g-. The custom GPT's system prompt and instructions are not in the export — only your conversation with it. Group by gizmo_id if you want per-GPT analytics.
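Gotchas 1–3 can all be handled in a single message normalizer. This is a sketch, not the extractor's actual code; the `[content_type]` placeholder for object-shaped parts is this example's own convention:

```javascript
// Gotcha 1: include "tool" so code-interpreter and browsing output survive.
const ROLES = new Set(['user', 'assistant', 'tool']);

function normalize(msg) {
  if (!ROLES.has(msg.author.role)) return null;
  // Gotcha 2: parts is an array; non-string elements are asset references.
  const text = msg.content.parts
    .map((p) => (typeof p === 'string' ? p : `[${p.content_type ?? 'asset'}]`))
    .join('\n');
  return {
    role: msg.author.role,
    text,
    // Gotcha 3: float-seconds → millis before handing to Date.
    at: msg.create_time ? new Date(msg.create_time * 1000) : null,
  };
}
```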

How WhyChose handles it

The open-source extractor implements the leaf-path walk above and emits a common internal shape that's symmetric with the Claude loader. It also surfaces edited branches as alternates[] on the canonical message, so you don't lose audit signal even though they're skipped from the linear thread. Pattern matching for decisions then runs against the flattened message stream — the regexes in patterns.md are tuned for the post-flatten shape, not the raw mapping. If you're writing your own consumer, the loader code at bin/extractor.js is ~80 lines and MIT-licensed; copy it.

Related questions

Why is mapping a tree instead of a list?

Because every time you edit a previous prompt, ChatGPT branches the conversation. The original message and the edited message both stay in the export — under the same parent — and each has its own subtree of subsequent replies. The flat conversation you remember reading is just one path through the DAG: the leaf path of whichever branch you ended up using.

Which message is the root?

The mapping object always contains a synthetic root message whose author role is system and whose content is empty. Look for the entry where parent is null. The current_node field on the conversation also points to the latest leaf — walk parent-pointers from there to reach the root.

What's the difference between create_time on the conversation and on each message?

Conversation create_time is when you first opened the chat. Message create_time is per-message, including edits — an edited message gets a new create_time even though it replaces an older sibling. Use message-level timestamps for time-of-decision analytics, conversation-level for session boundaries.

Are tool calls and code-interpreter outputs in the export?

Yes. Tool messages appear with author.role of tool and author.name set to the tool that ran (e.g., python, browser). Their content.parts contain the tool's output as plain text. Code-interpreter binary attachments aren't embedded — only the text-rendered output and a reference path that no longer resolves outside ChatGPT.

Further reading