Claude Conversation Export Format — Field Reference (2026)

Every field in Anthropic's export, what chat_messages[] actually contains, how Projects and Artifacts surface, and the differences from ChatGPT's shape that matter when you write a consumer for both.

TL;DR

Anthropic's Claude export is two files: conversations.json (a flat JSON array — one object per conversation) and users.json (account metadata). Each conversation has a chat_messages array of message objects in chronological order. Unlike OpenAI, there is no DAG and no branching — what you see is what was. Artifacts appear inline inside assistant text as <antartifact> blocks. Projects are surfaced via a project_uuid field on the conversation.

Top-level shape

The conversations file is an array of conversation objects:

[
  {
    "uuid": "8f3a2b1c-...",
    "name": "Postgres vs MongoDB for metrics",
    "summary": "",
    "model": "claude-sonnet-4-6",
    "created_at": "2026-02-14T10:04:22.000Z",
    "updated_at": "2026-02-14T10:22:41.000Z",
    "settings": { ... },
    "is_starred": false,
    "project_uuid": "abc-...",
    "current_leaf_message_uuid": "...",
    "chat_messages": [ ... ],
    "account": { "uuid": "..." }
  },
  ...
]

The interesting fields are uuid, name (the conversation title), model, created_at, updated_at, project_uuid, and chat_messages. Timestamps are ISO-8601 strings, not unix epochs — easier to work with than ChatGPT's float-seconds, but you'll need to convert if you're joining with ChatGPT data.
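If you do need to join against ChatGPT's float-seconds timestamps, the conversion is one line in each direction. A minimal sketch (the function names here are illustrative, not part of any export):

```javascript
// Convert a Claude export ISO-8601 timestamp to unix float-seconds,
// matching the representation ChatGPT's export uses for create_time.
function isoToEpochSeconds(iso) {
  return new Date(iso).getTime() / 1000;
}

// And back, for joining in the opposite direction.
function epochSecondsToIso(seconds) {
  return new Date(seconds * 1000).toISOString();
}
```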

The chat_messages array

Each entry in chat_messages is a flat record with the message text already rendered:

{
  "uuid": "msg-uuid-...",
  "text": "<the actual message text>",
  "content": [
    { "type": "text", "text": "..." }
  ],
  "sender": "human|assistant",
  "index": 0,
  "created_at": "2026-02-14T10:04:22.000Z",
  "updated_at": "2026-02-14T10:04:22.000Z",
  "truncated": false,
  "stop_reason": "end_turn",
  "attachments": [ ... ],
  "files": [ ... ],
  "files_v2": [ ... ],
  "sync_sources": [],
  "parent_message_uuid": "..."
}

Three structural rules: (1) sender is either human or assistant — there's no separate tool role like ChatGPT (Claude.ai's tool-use ergonomics live inside the assistant's content blocks, not as standalone messages); (2) chat_messages is already in render order, so you can iterate i = 0..n-1 and don't need to walk parent-pointers; (3) parent_message_uuid exists for compatibility but in practice points to the previous message's uuid in linear order.
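Because of rule (2), a Claude message walker reduces to a plain loop over the array. A sketch, assuming `conversation` is one element of the conversations.json array:

```javascript
// Walk a Claude conversation's messages in render order — no
// parent-pointer traversal needed, unlike ChatGPT's mapping DAG.
function* iterateTurns(conversation) {
  for (const msg of conversation.chat_messages) {
    yield {
      role: msg.sender,        // "human" or "assistant"
      text: msg.text,          // already-rendered message text
      created_at: msg.created_at,
    };
  }
}
```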

Artifacts inside text

Claude's Artifacts (the structured code/document panels you see in the UI) ship inline inside the assistant's text field, wrapped in pseudo-XML:

<antartifact identifier="adr-postgres-mongo"
            type="text/markdown"
            title="ADR-0007 Postgres vs MongoDB">
# ADR-0007: Postgres vs MongoDB for metrics

## Status
Accepted
...
</antartifact>

If you want to extract artifacts cleanly, scan the assistant text for the regex <antartifact[^>]*>([\s\S]*?)</antartifact> and parse the attribute string for identifier, type, language, and title. The artifact body is everything between the open/close tags. Treat the surrounding prose as the assistant's commentary on the artifact and the artifact body as the deliverable.
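That extraction can be written in a few lines of Node. This sketch captures the attribute string in a separate group before parsing it; the attribute names mirror the ones described above:

```javascript
// Pull <antartifact> blocks out of an assistant message's text.
// Returns [{ attrs, body }], where attrs holds identifier, type,
// language, title, etc., and body is the artifact payload.
function extractArtifacts(text) {
  const blockRe = /<antartifact([^>]*)>([\s\S]*?)<\/antartifact>/g;
  const attrRe = /(\w+)="([^"]*)"/g;
  const artifacts = [];
  for (const m of text.matchAll(blockRe)) {
    const attrs = {};
    for (const a of m[1].matchAll(attrRe)) attrs[a[1]] = a[2];
    artifacts.push({ attrs, body: m[2].trim() });
  }
  return artifacts;
}
```

Note that `[^>]*` deliberately spans newlines, since the UI emits attributes across multiple lines as in the example above.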

Differences from ChatGPT's format

| Concern | ChatGPT | Claude |
| --- | --- | --- |
| Message structure | DAG (mapping with parent/children) | Linear array |
| Edit history | Branches preserved | Overwritten in place |
| Tool calls | Separate tool-role messages | Embedded in assistant content |
| Timestamps | Float-seconds since epoch | ISO-8601 strings |
| Per-conversation grouping | gizmo_id for custom GPTs | project_uuid for Projects |
| Structured outputs | Inline text only | <antartifact> blocks |

The practical implication: if your consumer handles both formats, write two loaders that each emit the same internal shape (e.g., { source, conversation_id, title, model, created_at, messages: [{ role, text, created_at, attachments? }] }) and run downstream logic on the unified stream. This is what every multi-platform extractor does in practice.
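The Claude side of that pair is short. A sketch of a loader emitting the internal shape suggested above — the field names and the human-to-user role mapping are choices for illustration, not part of the export:

```javascript
// Normalize one Claude conversation object into the unified
// internal record shape shared with the ChatGPT loader.
function loadClaudeConversation(convo) {
  return {
    source: "claude",
    conversation_id: convo.uuid,
    title: convo.name,
    model: convo.model,
    created_at: convo.created_at,
    messages: convo.chat_messages.map((m) => ({
      // Map Claude's "human" to the cross-platform "user" role.
      role: m.sender === "human" ? "user" : "assistant",
      text: m.text,
      created_at: m.created_at,
      attachments: m.attachments ?? [],
    })),
  };
}
```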

How WhyChose handles it

The open-source extractor sniffs the file shape on load: if the top-level array contains objects with chat_messages, it routes to the Claude loader; if they carry a mapping object, to the ChatGPT one. Both emit the same internal record shape. Artifact blocks are pulled out of assistant text and surfaced as artifacts[] on the message; the surrounding prose stays as text. Decision-extraction patterns then run against the unified stream, so the same regexes in patterns.md work on both sources. If you're writing your own consumer, the Claude loader is ~50 lines of MIT-licensed Node — copy it from bin/extractor.js.
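A shape sniff in this spirit is a handful of lines. This is a sketch of the heuristic described above, not WhyChose's actual code:

```javascript
// Guess which platform produced a parsed export file: Claude
// conversations carry chat_messages; ChatGPT ones carry a mapping DAG.
function sniffFormat(parsed) {
  if (!Array.isArray(parsed) || parsed.length === 0) return "unknown";
  const first = parsed[0];
  if (first && typeof first === "object") {
    if ("chat_messages" in first) return "claude";
    if ("mapping" in first) return "chatgpt";
  }
  return "unknown";
}
```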

Related questions

Why is Claude's export so much simpler than ChatGPT's?

Claude doesn't expose prompt-edit branching the way ChatGPT does — when you edit a message in Claude, the prior version is overwritten in your view rather than preserved as a branch. So the export only needs to ship the messages that survived, in order. Hence chat_messages is a flat array, not a graph.

Are Artifacts in chat_messages or separate?

Artifacts are inline. They appear as part of the assistant's text content, wrapped in <antartifact> tags with attributes like identifier, type, language, and title. Parse the assistant text for those blocks and treat each as a structured payload.

What's the project_uuid for?

When a conversation belongs to a Claude Project, the conversation object carries a project_uuid pointing to the project's identifier. Conversations not tied to a project either omit the field or set it to null. Group by project_uuid to reconstruct project-scoped activity.
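Grouping is a one-pass reduce over the conversations array. A sketch — the "(no project)" bucket label is a choice made here, not something the export defines:

```javascript
// Group conversations by project_uuid. Conversations with a null or
// missing project_uuid land in a shared "(no project)" bucket.
function groupByProject(conversations) {
  const groups = new Map();
  for (const convo of conversations) {
    const key = convo.project_uuid ?? "(no project)";
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(convo);
  }
  return groups;
}
```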

Can I get the same data via the API?

No. The Anthropic Messages API is stateless — it doesn't store conversations server-side, so there's nothing to pull. The Claude.ai chat history is a separate system from the API, and the export ZIP is the only way to retrieve it.
