Claude Conversation Export Format — Field Reference (2026)

Every field in Anthropic's export, what chat_messages[] actually contains, how Projects and Artifacts surface, and the differences from ChatGPT's shape that matter when you write a consumer for both.

TL;DR

Anthropic's Claude export is two files: conversations.json (a flat JSON array — one object per conversation) and users.json (account metadata). Each conversation has a chat_messages array of message objects in chronological order. Unlike OpenAI, there is no DAG and no branching — what you see is what was. Artifacts appear inline inside assistant text as <antartifact> blocks. Projects are surfaced via a project_uuid field on the conversation.

Top-level shape

The conversations file is an array of conversation objects:

[
  {
    "uuid": "8f3a2b1c-...",
    "name": "Postgres vs MongoDB for metrics",
    "summary": "",
    "model": "claude-sonnet-4-6",
    "created_at": "2026-02-14T10:04:22.000Z",
    "updated_at": "2026-02-14T10:22:41.000Z",
    "settings": { ... },
    "is_starred": false,
    "project_uuid": "abc-...",
    "current_leaf_message_uuid": "...",
    "chat_messages": [ ... ],
    "account": { "uuid": "..." }
  },
  ...
]

The interesting fields are uuid, name (the conversation title), model, created_at, updated_at, project_uuid, and chat_messages. Timestamps are ISO-8601 strings, not unix epochs — easier to work with than ChatGPT's float-seconds, but you'll need to convert if you're joining with ChatGPT data.
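If you do need to join against ChatGPT's float-seconds timestamps, the conversion is one line in each direction. A minimal sketch (the function names here are illustrative, not part of any export):

```javascript
// Convert a Claude export ISO-8601 timestamp to unix float-seconds,
// matching the representation ChatGPT's export uses for create_time.
function isoToEpochSeconds(iso) {
  return new Date(iso).getTime() / 1000;
}

// And back, for joining in the opposite direction.
function epochSecondsToIso(seconds) {
  return new Date(seconds * 1000).toISOString();
}
```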

The chat_messages array

Each entry in chat_messages is a flat record with the message text already rendered:

{
  "uuid": "msg-uuid-...",
  "text": "<the actual message text>",
  "content": [
    { "type": "text", "text": "..." }
  ],
  "sender": "human|assistant",
  "index": 0,
  "created_at": "2026-02-14T10:04:22.000Z",
  "updated_at": "2026-02-14T10:04:22.000Z",
  "truncated": false,
  "stop_reason": "end_turn",
  "attachments": [ ... ],
  "files": [ ... ],
  "files_v2": [ ... ],
  "sync_sources": [],
  "parent_message_uuid": "..."
}

Three structural rules: (1) sender is either human or assistant — there's no separate tool role like ChatGPT (Claude.ai's tool-use ergonomics live inside the assistant's content blocks, not as standalone messages); (2) chat_messages is already in render order, so you can iterate i = 0..n-1 and don't need to walk parent-pointers; (3) parent_message_uuid exists for compatibility but in practice points to the previous message's uuid in linear order.
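Because of rule (2), a Claude message walker reduces to a plain loop over the array. A sketch, assuming `conversation` is one element of the conversations.json array:

```javascript
// Walk a Claude conversation's messages in render order — no
// parent-pointer traversal needed, unlike ChatGPT's mapping DAG.
function* iterateTurns(conversation) {
  for (const msg of conversation.chat_messages) {
    yield {
      role: msg.sender,        // "human" or "assistant"
      text: msg.text,          // already-rendered message text
      created_at: msg.created_at,
    };
  }
}
```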

Artifacts inside text

Claude's Artifacts (the structured code/document panels you see in the UI) ship inline inside the assistant's text field, wrapped in pseudo-XML:

<antartifact identifier="adr-postgres-mongo"
            type="text/markdown"
            title="ADR-0007 Postgres vs MongoDB">
# ADR-0007: Postgres vs MongoDB for metrics

## Status
Accepted
...
</antartifact>

If you want to extract artifacts cleanly, scan the assistant text for the regex <antartifact[^>]*>([\s\S]*?)</antartifact> and parse the attribute string for identifier, type, language, and title. The artifact body is everything between the open/close tags. Treat the surrounding prose as the assistant's commentary on the artifact and the artifact body as the deliverable.
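That extraction can be written in a few lines of Node. This sketch captures the attribute string in a separate group before parsing it; the attribute names mirror the ones described above:

```javascript
// Pull <antartifact> blocks out of an assistant message's text.
// Returns [{ attrs, body }], where attrs holds identifier, type,
// language, title, etc., and body is the artifact payload.
function extractArtifacts(text) {
  const blockRe = /<antartifact([^>]*)>([\s\S]*?)<\/antartifact>/g;
  const attrRe = /(\w+)="([^"]*)"/g;
  const artifacts = [];
  for (const m of text.matchAll(blockRe)) {
    const attrs = {};
    for (const a of m[1].matchAll(attrRe)) attrs[a[1]] = a[2];
    artifacts.push({ attrs, body: m[2].trim() });
  }
  return artifacts;
}
```

Note that `[^>]*` deliberately spans newlines, since the UI emits attributes across multiple lines as in the example above.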

Differences from ChatGPT's format

| Concern | ChatGPT | Claude |
| --- | --- | --- |
| Message structure | DAG (mapping with parent/children) | Linear array |
| Edit history | Branches preserved | Overwritten in place |
| Tool calls | Separate tool-role messages | Embedded in assistant content |
| Timestamps | Float-seconds since epoch | ISO-8601 strings |
| Per-conversation grouping | gizmo_id for custom GPTs | project_uuid for Projects |
| Structured outputs | Inline text only | <antartifact> blocks |

The practical implication: if your consumer handles both formats, write two loaders that each emit the same internal shape (e.g., { source, conversation_id, title, model, created_at, messages: [{ role, text, created_at, attachments? }] }) and run downstream logic on the unified stream. This is what every multi-platform extractor does in practice.
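The Claude side of that pair is short. A sketch of a loader emitting the internal shape suggested above — the field names and the human-to-user role mapping are choices for illustration, not part of the export:

```javascript
// Normalize one Claude conversation object into the unified
// internal record shape shared with the ChatGPT loader.
function loadClaudeConversation(convo) {
  return {
    source: "claude",
    conversation_id: convo.uuid,
    title: convo.name,
    model: convo.model,
    created_at: convo.created_at,
    messages: convo.chat_messages.map((m) => ({
      // Map Claude's "human" to the cross-platform "user" role.
      role: m.sender === "human" ? "user" : "assistant",
      text: m.text,
      created_at: m.created_at,
      attachments: m.attachments ?? [],
    })),
  };
}
```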

How WhyChose handles it

The open-source extractor sniffs the file shape on load: if the top-level array contains objects with chat_messages, it routes to the Claude loader; if they carry a mapping object, to the ChatGPT one. Both emit the same internal record shape. Artifact blocks are pulled out of assistant text and surfaced as artifacts[] on the message; the surrounding prose stays as text. Decision-extraction patterns then run against the unified stream, so the same regexes in patterns.md work on both sources. If you're writing your own consumer, the Claude loader is ~50 lines of MIT-licensed Node — copy it from bin/extractor.js.
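A shape sniff in this spirit is a handful of lines. This is a sketch of the heuristic described above, not WhyChose's actual code:

```javascript
// Guess which platform produced a parsed export file: Claude
// conversations carry chat_messages; ChatGPT ones carry a mapping DAG.
function sniffFormat(parsed) {
  if (!Array.isArray(parsed) || parsed.length === 0) return "unknown";
  const first = parsed[0];
  if (first && typeof first === "object") {
    if ("chat_messages" in first) return "claude";
    if ("mapping" in first) return "chatgpt";
  }
  return "unknown";
}
```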

Related questions

Why is Claude's export so much simpler than ChatGPT's?

Claude doesn't expose prompt-edit branching the way ChatGPT does — when you edit a message in Claude, the prior version is overwritten in your view rather than preserved as a branch. So the export only needs to ship the messages that survived, in order. Hence chat_messages is a flat array, not a graph.

Are Artifacts in chat_messages or separate?

Artifacts are inline. They appear as part of the assistant's text content, wrapped in <antartifact> tags with attributes like identifier, type, language, and title. Parse the assistant text for those blocks and treat each as a structured payload.

What's the project_uuid for?

When a conversation belongs to a Claude Project, the conversation object carries a project_uuid pointing to the project's identifier. Conversations not tied to a project either omit the field or set it to null. Group by project_uuid to reconstruct project-scoped activity.
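Grouping is a one-pass reduce over the conversations array. A sketch — the "(no project)" bucket label is a choice made here, not something the export defines:

```javascript
// Group conversations by project_uuid. Conversations with a null or
// missing project_uuid land in a shared "(no project)" bucket.
function groupByProject(conversations) {
  const groups = new Map();
  for (const convo of conversations) {
    const key = convo.project_uuid ?? "(no project)";
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(convo);
  }
  return groups;
}
```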

Can I get the same data via the API?

No. The Anthropic Messages API is stateless — it doesn't store conversations server-side, so there's nothing to pull. The Claude.ai chat history is a separate system from the API, and the export ZIP is the only way to retrieve it.
