Topic: chatgpt projects export

ChatGPT Projects Export — Custom Instructions, Files, and Conversation Routing

OpenAI's Projects feature scopes a workspace with custom instructions, uploaded files, and a bound Custom GPT. The standard ChatGPT export does include Project conversations — but it flattens the Project boundary in three specific ways that bite when you're trying to archive a multi-month initiative cleanly. This page documents what's exported, what's missing, and the conversion script that rebuilds the per-Project directory layout from a flat conversations.json.

TL;DR

Project conversations are in conversations.json, but interleaved with non-Project chats and identified only by a project_id field. Project metadata (name, description, custom instructions) is in a separate projects.json. Three things are NOT exported: uploaded Project files (only the manifest), the bound Custom GPT's configuration (only the gizmo_id reference), and the model-default override per Project (only inferable from per-conversation model fields). The conversion script at the bottom of this page rebuilds out/<project_id>/ with the per-Project README and conversation files; pre-export, manually download any Project files from the UI because there's no API path. The output archive layout matches the symmetric Claude Project export recipe so multi-platform decision audits can run the same downstream tooling.

Why Projects matter for the export

OpenAI's Projects feature rolled out broadly in late 2024 and is now the canonical way to scope a long-running workstream inside ChatGPT. A Project bundles four things: per-Project custom instructions, uploaded files that serve as Project knowledge, an optionally bound Custom GPT, and a pinned default model that overrides the account-wide setting.

For the senior engineer or solo-founder CTO who structures multi-month initiatives as Projects (a Q2 architecture migration, a product redesign, a fundraise), the Project is the unit of decision archaeology. Six months later, the question isn't "what did I talk to ChatGPT about" — it's "what did the team conclude inside the Q2 Mobile Rewrite Project?" Preserving the Project boundary in the export is what makes that question answerable.

The catch: the export was designed before Projects were GA, and the structure shows it.

What's in the export — and where Projects live

The standard export ZIP (requested via Settings → Data Controls → Export Data; covered in detail in how to export your ChatGPT history) contains the Project data, but split across two top-level files rather than collected into a per-Project directory:

chatgpt-export-<date>.zip
├── conversations.json   # all conversations, flat array, Project-scoped + non-Project mixed
├── projects.json        # Project metadata: id, name, description, custom_instructions, file_manifest
├── user.json            # account metadata
├── message_feedback.json
├── shared_conversations.json
└── chat.html            # rendered HTML view (informational only; conversations.json is the source)

Inside conversations.json, every conversation has the same DAG-shaped mapping tree documented in the conversations.json format reference. What's added for Project conversations is a single top-level field:

{
  "id": "abc-123",
  "title": "Postgres connection pool sizing",
  "create_time": 1714489232.5,
  "project_id": "p_xyz789",     // <-- present only for Project conversations
  "current_node": "...",
  "mapping": { ... }
}

For non-Project conversations, project_id is absent (not null — absent). That's the only signal that ties a conversation to its Project. To rebuild the per-Project view, you filter conversations.json by project_id and look up the matching record in projects.json.
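
A minimal jq sketch of that filter-and-join, assuming the flat-array shapes shown on this page:

# List every Project conversation with its Project name (tab-separated)
jq -r --slurpfile projects projects.json '
  .[]
  | select(has("project_id"))                      # field is absent on non-Project chats
  | . as $c
  | ($projects[0][] | select(.id == $c.project_id) | .name) as $pname
  | "\($pname)\t\($c.title)"
' conversations.json

Conversations whose Project record is gone (the orphan case covered in the FAQ below) simply drop out of this listing, because the name lookup produces nothing.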

The projects.json file is a flat array of Project metadata records. A typical entry looks like:

{
  "id": "p_xyz789",
  "name": "Q2 Mobile Rewrite",
  "description": "Architecture decisions for the iOS/Android rewrite",
  "custom_instructions": "You are working on the mobile rewrite...",
  "file_manifest": [
    { "name": "current-arch-v1.pdf", "size": 482113, "uploaded_at": 1712880000.0, "mime": "application/pdf" },
    { "name": "performance-budget.csv", "size": 4218, "uploaded_at": 1712923500.0, "mime": "text/csv" }
  ],
  "bound_gpt_id": "g-abc123",       // optional; null if no Custom GPT bound
  "created_at": 1712793600.0,
  "archived_at": null               // populated if Project is archived
}

The file_manifest is a manifest, not the file contents — see what's missing below.

What's missing — three known gaps

  1. Uploaded Project files (binaries). The largest gap and the one most teams discover too late. The file_manifest ships filename, size, upload date, and MIME type — but not the bytes. PDFs, CSVs, code files, images uploaded as Project knowledge are not in the ZIP. Three recovery paths: (a) pre-export download — before requesting the export, navigate to each Project in the UI and download each file via the file's three-dot menu (this is the only path; there's no bulk-download API as of this writing); (b) document-the-gap — accept that conversations reference Project files by name only and the binaries live in OpenAI's storage until the Project is deleted; (c) re-upload-on-restore — if the archive is being restored to a new Project (e.g. for handoff to another team), restore conversations first and re-upload Project files manually using the manifest as a checklist. Path (a) is the only one that produces a self-contained archive.
  2. Custom GPT configuration. When a Project binds a Custom GPT (the bound_gpt_id field above), each conversation that used the GPT carries the gizmo_id in per-turn metadata. But the GPT's actual configuration — its system prompt, attached knowledge files, defined actions — is NOT in the Project owner's export. Custom GPTs are owned by their creator's account and exported separately when that creator runs their own export. For a Project that uses a shared Custom GPT created by a teammate, the export captures the transcripts but not the GPT's behavior definition. Mitigation: have the GPT creator run a separate export and store it alongside the Project archive; or document the GPT's prompt manually in the per-Project README so a future reader knows the inputs that produced the conversations.
  3. Per-Project model defaults. A Project can pin a default model (e.g. always GPT-4o, always o3-mini) that overrides the user's account default. The pinned model is not stored as a Project metadata field; it's only inferable from the per-turn model_slug field in each message. For most archives this doesn't matter — the conversation transcript is the same regardless of model — but for teams that pin specific models for cost or capability reasons, the override is part of the Project's intent and should be documented in the per-Project README (the jq sketch after this list shows how to recover it from the transcripts).
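
Where that documentation matters, the pinned model can usually be recovered from the transcripts. A jq sketch, assuming model_slug sits under each message's metadata (its exact location varies across export vintages):

# Per Project: the set of model slugs actually used across its conversations
jq -r '
  [ .[]
    | select(has("project_id"))
    | { pid: .project_id,
        models: ([ .mapping[].message.metadata.model_slug? // empty ] | unique) } ]
  | group_by(.pid)[]
  | "\(.[0].pid): \([ .[].models[] ] | unique | join(", "))"
' conversations.json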

Two non-gaps worth naming explicitly: (1) Project conversations DO carry full DAG history, including alternate branches via the mapping tree — the Project boundary doesn't change conversation shape; (2) archived Projects ARE in the export as long as the Project hasn't been hard-deleted (archived sets archived_at but keeps the data; deletion removes it from future exports).
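
To make non-gap (1) concrete, here is a sketch that counts branch points (mapping nodes with more than one child, i.e. regenerations or edits) per Project conversation; it assumes the children arrays described in the format reference:

# Branch points per Project conversation
jq -r '
  .[]
  | select(has("project_id"))
  | "\(.title): \([ .mapping[] | select((.children | length) > 1) ] | length) branch point(s)"
' conversations.json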

The conversion script — rebuild per-Project directories

The script below routes each conversation from the flat export into out/<project_id>/, writes a per-Project README with custom instructions and file manifest, and produces a top-level INDEX.md listing every Project. The output layout matches the symmetric Claude Project archive shape so a multi-platform audit can run the same downstream tools on either output.

#!/usr/bin/env bash
# split-chatgpt-projects.sh — rebuild per-Project directories from a flat ChatGPT export
# Usage: ./split-chatgpt-projects.sh path/to/export-extracted/ ./out

set -euo pipefail
src=$1
out=$2

mkdir -p "$out"

# 1. Per-Project READMEs from projects.json
jq -c '.[]' "$src/projects.json" | while IFS= read -r proj; do
  pid=$(echo "$proj" | jq -r '.id')
  pname=$(echo "$proj" | jq -r '.name')
  pdir="$out/$pid"
  mkdir -p "$pdir/conversations" "$pdir/files"

  # README captures the Project's intent — what conversations don't tell you on their own
  {
    echo "# $pname"
    echo
    echo "$(echo "$proj" | jq -r '.description // "(no description)"')"
    echo
    echo "## Custom instructions"
    echo
    echo '```'
    echo "$proj" | jq -r '.custom_instructions // "(none set)"'
    echo '```'
    echo
    echo "## File manifest (binaries NOT included — see files/ for any pre-export downloads)"
    echo
    echo "$proj" | jq -r '.file_manifest // [] | map("- " + .name + " (" + (.size | tostring) + " bytes, " + .mime + ")") | .[]'
    echo
    echo "## Bound Custom GPT"
    echo
    bgpt=$(echo "$proj" | jq -r '.bound_gpt_id // empty')
    if [ -n "$bgpt" ]; then
      echo "Project bound to GPT \`$bgpt\`. The GPT's configuration is NOT in this export — request it separately from the GPT's owner."
    else
      echo "(no GPT bound; conversations used the account default model)"
    fi
  } > "$pdir/README.md"
done

# 2. Per-conversation files routed by project_id
jq -c '.[]' "$src/conversations.json" | while IFS= read -r conv; do
  pid=$(echo "$conv" | jq -r '.project_id // empty')
  cid=$(echo "$conv" | jq -r '.id')
  if [ -n "$pid" ]; then
    mkdir -p "$out/$pid/conversations"
    echo "$conv" > "$out/$pid/conversations/$cid.json"
  fi
done

# 3. Top-level INDEX listing every Project archived
{
  echo "# ChatGPT Projects archive"
  echo
  echo "Generated $(date -u +%Y-%m-%dT%H:%M:%SZ) from \`$src\`."
  echo
  for d in "$out"/*/; do
    pid=$(basename "$d")
    pname=$(jq -r --arg id "$pid" '.[] | select(.id == $id) | .name' "$src/projects.json")
    cc=$(ls "$d/conversations" 2>/dev/null | wc -l)
    fc=$(jq -r --arg id "$pid" '.[] | select(.id == $id) | (.file_manifest // []) | length' "$src/projects.json")
    echo "- [$pname]($pid/README.md) — $cc conversations, $fc files (manifest only)"
  done
} > "$out/INDEX.md"

echo "split $(jq -r 'length' "$src/projects.json") projects into $out/"

The script is idempotent — re-running with the same source overwrites cleanly. Conversations without a project_id (i.e. non-Project chats) are intentionally skipped; if you want them archived too, route them to out/_unscoped/ with a one-line addition. The output of running this against a real export is exactly the layout the WhyChose extractor expects when invoked with the --per-project flag.
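
That addition, sketched against step 2 of the script: the if-guard becomes a default bucket, reusing the _unscoped name the Gemini recipe uses below.

  # Step 2 variant: route non-Project chats to out/_unscoped/ instead of skipping them
  pid=$(echo "$conv" | jq -r '.project_id // "_unscoped"')
  mkdir -p "$out/$pid/conversations"
  echo "$conv" > "$out/$pid/conversations/$cid.json"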

Three sanity checks for the converted archive

  1. Project count matches UI. ls out/ | grep -v INDEX | wc -l should equal the Project count from ChatGPT → Projects sidebar (counting both active and archived). Off-by-one usually means a Project was hard-deleted between request and build; off by more usually means the export was generated before a recently-created Project propagated to the export builder (wait 24 hours and re-request).
  2. Per-Project conversation counts match. For each Project, ls out/<pid>/conversations/ | wc -l should equal the conversation count visible in that Project's sidebar tray. Mismatch usually means individual conversations were deleted from the Project before export; the export reflects the state at request time, not the state at usage time. (A sketch after this list automates a companion check against conversations.json itself.)
  3. File manifest covers every UI-visible file. Open each Project's README.md and cross-check the file manifest against the file list visible in Project → Files. Missing files in the manifest usually mean the file was uploaded after the export was queued; re-request to capture them. Files marked archived in the UI are still in the manifest — archiving a file doesn't remove it from the manifest.
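
Check 2 compares against the UI; the sketch below verifies the routing itself, i.e. that every Project conversation in conversations.json landed in the archive. Run it from the directory holding the extracted export:

# Cross-check routed conversation counts against the source export
for d in out/*/; do
  pid=$(basename "$d")
  [ "$pid" = "_unscoped" ] && continue   # skip the optional default bucket
  routed=$(find "$d/conversations" -name '*.json' 2>/dev/null | wc -l | tr -d ' ')
  expected=$(jq --arg id "$pid" '[ .[] | select(.project_id == $id) ] | length' conversations.json)
  [ "$routed" = "$expected" ] || echo "MISMATCH $pid: routed=$routed, export has $expected"
done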

The Project-as-decision-boundary framing

For a senior engineer or CTO using ChatGPT Projects, the Project boundary IS context. A conversation in the "Q2 Mobile Rewrite" Project starts with a different mental model than a conversation in the "Hiring decisions" Project — the same prompt produces different reasoning because the custom instructions, file context, and conversation history all differ. A flat export without per-Project routing throws that context away.

The decision-archaeology question is rarely "what did ChatGPT say about X." It's almost always "what did the team conclude in the architecture initiative we were running last quarter." That question requires the Project boundary to be queryable. The recipe above puts it back.

For multi-platform users (ChatGPT + Claude + Gemini), the symmetric Claude Project export recipe produces an identical directory layout from a Claude export. The two scripts together let you build a unified per-Project archive across both platforms — useful when teams use ChatGPT for some Projects and Claude for others (or both for the same Project, comparing outputs). The Gemini conversation export recipe has no Project boundary to preserve (Gemini has no equivalent feature), but its converted output drops into the same archive shape with project_id: "_unscoped" for downstream tooling.

How WhyChose fits in

The Projects boundary is exactly the access boundary the WhyChose extractor preserves as a first-class column in the decision log. Drop the per-Project archive into the uploader (or pipe it to the CLI) and the extractor surfaces decision-shaped exchanges with project_id as a filterable field — so "every durable decision made inside Q2 Mobile Rewrite" is a one-filter query, not a manual group-by. The custom_instructions from each Project's README.md becomes the system-prompt context the extractor uses to score decision relevance (a conversation about latency budgets in a Project whose custom instructions say "target 200ms" produces higher-scored decision records than the same conversation in an unscoped chat). The Pro tier exports decision logs to Notion / Linear / Obsidian with the Project as a Notion database property or Linear label; the Team tier ships shared decision logs scoped by Project membership — the same access boundary the export carries forward.

Related questions

Does the standard ChatGPT export include Project conversations?

Yes — every conversation from every Project is in conversations.json, identified by a project_id field. Project metadata is in a separate projects.json. The structure is flat; the script on this page rebuilds the per-Project directory tree.

Are Project files (PDFs, code, data) in the export?

No — only the file manifest (filename, size, upload date, MIME type). Three recovery paths: pre-export download via the Projects UI (the only path to recover binaries), document-the-gap, or re-upload-on-restore. There is no bulk-download API as of this writing.

What about Custom GPTs bound to a Project?

The conversation references the GPT by gizmo_id in per-turn metadata, but the GPT's configuration (system prompt, knowledge files, actions) is NOT in the export. Custom GPTs are owned by their creator and exported separately. Have the GPT creator run their own export, or document the GPT's prompt in the per-Project README.

How do I archive an old Project so I can answer questions about it later?

Run the conversion script on this page to rebuild out/<project_id>/ with conversations + README. Pre-export, manually download Project files via the UI and place them in out/<project_id>/files/. The archive layout matches the Claude Project archive recipe so multi-platform tooling works on either.

Why does my export have project_id on some conversations but projects.json is empty?

This means the Project was deleted between conversation creation and export request. The conversation still carries the project_id reference (immutable in the conversation record), but the Project metadata is gone. Treat these as orphan Project conversations — the per-conversation reasoning is intact, but the Project-level custom instructions and file manifest are unrecoverable.
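
A quick sketch to list those orphaned project_ids, assuming the two flat arrays shown above:

# project_ids referenced by conversations but absent from projects.json
jq -r --slurpfile p projects.json '
  ($p[0] | map(.id)) as $known
  | .[]
  | select(has("project_id"))
  | select(.project_id as $id | $known | index($id) | not)
  | .project_id
' conversations.json | sort -u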
