ChatGPT conversations.json Format — Field Reference (2026)

Every field, what it actually contains, the DAG-flattening algorithm, and the four gotchas that trip up every first-time consumer of the export.

TL;DR

OpenAI's conversations.json is a top-level array of conversation objects. Each object has a mapping field — a directed acyclic graph of messages keyed by message UUID, not a list. The user-visible thread is one leaf path through the graph. To reconstruct it, follow current_node back to the synthetic root via each entry's parent. Edited prompts produce sibling branches under the same parent; ignore non-current branches unless you're auditing message edits.

Top-level shape

The export is a single JSON file containing an array. Each array element is one conversation:

[
  {
    "title": "Postgres vs MongoDB for metrics",
    "create_time": 1736209400,
    "update_time": 1736209800,
    "mapping": { ... },
    "moderation_results": [],
    "current_node": "<leaf-message-uuid>",
    "plugin_ids": null,
    "conversation_id": "8f3a2b1c-...",
    "conversation_template_id": null,
    "gizmo_id": "g-abc123",
    "is_archived": false,
    "safe_urls": [],
    "default_model_slug": "gpt-4o",
    "voice": null,
    "id": "8f3a2b1c-..."
  },
  ...
]

The interesting fields are title, create_time, update_time, mapping, current_node, conversation_id, default_model_slug, and gizmo_id. Everything else is auxiliary or vestigial.
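As a quick sketch of how those fields are typically consumed, here's a minimal summarizer. `summarize` is an illustrative helper name, not part of any official API, and it assumes the export was already parsed with JSON.parse:

```javascript
// Pull the useful fields out of one conversation object.
// Hypothetical helper — names and shape are this sketch's own choices.
function summarize(conv) {
  return {
    id: conv.conversation_id ?? conv.id,
    title: conv.title,
    created: new Date(conv.create_time * 1000), // seconds → millis
    model: conv.default_model_slug,
    gizmo: conv.gizmo_id, // non-null only for custom-GPT chats
    messageCount: Object.keys(conv.mapping ?? {}).length,
  };
}

// Usage (assuming conversations.json sits next to the script):
// const convs = JSON.parse(require('fs').readFileSync('conversations.json', 'utf8'));
// console.log(convs.map(summarize));
```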

The mapping field

Each entry inside mapping is keyed by message UUID and looks like this:

"<message-uuid>": {
  "id": "<message-uuid>",
  "message": {
    "id": "<message-uuid>",
    "author": {
      "role": "system|user|assistant|tool",
      "name": null,
      "metadata": {}
    },
    "create_time": 1736209412.345,
    "update_time": null,
    "content": {
      "content_type": "text",
      "parts": ["<the actual text>"]
    },
    "status": "finished_successfully",
    "end_turn": true,
    "weight": 1.0,
    "metadata": { "model_slug": "gpt-4o", ... },
    "recipient": "all",
    "channel": null
  },
  "parent": "<parent-message-uuid>",
  "children": ["<child-uuid>", ...]
}

Three structural rules: (1) the entry whose parent is null is the synthetic root — its message is usually a system-role placeholder; (2) parent/children form a tree, but a parent with multiple children means the user edited a prompt at that point; (3) current_node on the enclosing conversation is the UUID of the most recent leaf the user actually saw.
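Rules (1) and (2) translate directly into two one-liners. This is a sketch against the mapping shape shown above; the helper names are illustrative:

```javascript
// Rule 1: the synthetic root is the sole entry with a null parent.
function findRoot(mapping) {
  return Object.values(mapping).find((node) => node.parent === null);
}

// Rule 2: more than one child means the user edited a prompt at that node —
// each child heads a sibling branch.
function findEditPoints(mapping) {
  return Object.values(mapping).filter((node) => node.children.length > 1);
}
```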

Reconstructing the user-visible thread

Most hand-rolled exporters print every message in the mapping and emit duplicates because they ignore the branch structure. The correct walk is leaf-to-root, then reverse:

function flatten(conv) {
  const path = [];
  let nodeId = conv.current_node;
  while (nodeId) {
    const node = conv.mapping[nodeId];
    if (!node) break; // defensive: dangling pointer in a truncated export
    // Skip the synthetic root and any other system-role placeholders.
    if (node.message && node.message.author.role !== 'system') {
      path.push(node.message);
    }
    nodeId = node.parent; // root has parent: null, which ends the loop
  }
  return path.reverse(); // walked leaf-to-root, so flip to reading order
}

This gives you the linear conversation as the user remembers it. Skip the root (system-role, empty parts) so the first user prompt is index 0. If you also want to surface edited branches for audit, walk every leaf (any node whose children is empty) and output each as a separate path; tag the one matching current_node as the canonical thread.
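The audit variant is the same leaf-to-root walk, repeated once per leaf. A sketch, with `allThreads` as an illustrative name:

```javascript
// Enumerate every branch, not just the canonical one.
function allThreads(conv) {
  const leaves = Object.values(conv.mapping)
    .filter((node) => node.children.length === 0);
  return leaves.map((leaf) => {
    const path = [];
    let nodeId = leaf.id;
    while (nodeId) {
      const node = conv.mapping[nodeId];
      if (node.message && node.message.author.role !== 'system') {
        path.push(node.message);
      }
      nodeId = node.parent;
    }
    return {
      canonical: leaf.id === conv.current_node, // the thread the user saw
      messages: path.reverse(),
    };
  });
}
```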

Four gotchas

  1. Tool messages have author.role = "tool". If your role-filter only allows user and assistant, you silently drop code-interpreter outputs, browsing snippets, and DALL-E captions. Either include tool as a third valid role or explicitly note its omission.
  2. content.parts is an array, not a string. For multi-modal messages it can contain text plus image-asset references. Most consumers want parts.join('\n'); for image-aware processing, walk parts and check each element's type.
  3. create_time is float-seconds, not millis. Multiply by 1000 before passing to new Date(). Conversation-level create_time is integer-seconds; message-level is float with sub-second precision.
  4. gizmo_id is the custom-GPT identifier. If a conversation was with a custom GPT, gizmo_id starts with g-. The custom GPT's system prompt and instructions are not in the export — only your conversation with it. Group by gizmo_id if you want per-GPT analytics.
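Gotchas 1–3 can all be handled in a single message normalizer. This is a sketch, not the extractor's actual code; the `[content_type]` placeholder for object-shaped parts is this example's own convention:

```javascript
// Gotcha 1: include "tool" so code-interpreter and browsing output survive.
const ROLES = new Set(['user', 'assistant', 'tool']);

function normalize(msg) {
  if (!ROLES.has(msg.author.role)) return null;
  // Gotcha 2: parts is an array; non-string elements are asset references.
  const text = msg.content.parts
    .map((p) => (typeof p === 'string' ? p : `[${p.content_type ?? 'asset'}]`))
    .join('\n');
  return {
    role: msg.author.role,
    text,
    // Gotcha 3: float-seconds → millis before handing to Date.
    at: msg.create_time ? new Date(msg.create_time * 1000) : null,
  };
}
```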

How WhyChose handles it

The open-source extractor implements the leaf-path walk above and emits a common internal shape that's symmetric with the Claude loader. It also surfaces edited branches as alternates[] on the canonical message, so you don't lose audit signal even though they're skipped from the linear thread. Pattern matching for decisions then runs against the flattened message stream — the regexes in patterns.md are tuned for the post-flatten shape, not the raw mapping. If you're writing your own consumer, the loader code at bin/extractor.js is ~80 lines and MIT-licensed; copy it.

Related questions

Why is mapping a tree instead of a list?

Because every time you edit a previous prompt, ChatGPT branches the conversation. The original message and the edited message both stay in the export — under the same parent — and each has its own subtree of subsequent replies. The flat conversation you remember reading is just one path through the DAG: the leaf path of whichever branch you ended up using.

Which message is the root?

The mapping object always contains a synthetic root message whose author role is system and whose content is empty. Look for the entry where parent is null. The current_node field on the conversation also points to the latest leaf — walk parent-pointers from there to reach the root.

What's the difference between create_time on the conversation and on each message?

Conversation create_time is when you first opened the chat. Message create_time is per-message, including edits — an edited message gets a new create_time even though it replaces an older sibling. Use message-level timestamps for time-of-decision analytics, conversation-level for session boundaries.

Are tool calls and code-interpreter outputs in the export?

Yes. Tool messages appear with author.role of tool and author.name set to the tool that ran (e.g., python, browser). Their content.parts contain the tool's output as plain text. Code-interpreter binary attachments aren't embedded — only the text-rendered output and a reference path that no longer resolves outside ChatGPT.

Further reading