ChatGPT Projects Export — Custom Instructions, Files, and Conversation Routing
OpenAI's Projects feature scopes a workspace with custom instructions, uploaded files, and a bound Custom GPT. The standard ChatGPT export does include Project conversations — but it flattens the Project boundary in three specific ways that bite when you're trying to archive a multi-month initiative cleanly. This page documents what's exported, what's missing, and the conversion script that rebuilds the per-Project directory layout from a flat conversations.json.
TL;DR
Project conversations are in conversations.json, but interleaved with non-Project chats and identified only by a project_id field. Project metadata (name, description, custom instructions) is in a separate projects.json. Three things are NOT exported: uploaded Project files (only the manifest), the bound Custom GPT's configuration (only the gizmo_id reference), and the model-default override per Project (only inferable from per-conversation model fields). The conversion script at the bottom of this page rebuilds out/<project_id>/ with the per-Project README and conversation files; pre-export, manually download any Project files from the UI because there's no API path. The output archive layout matches the symmetric Claude Project export recipe so multi-platform decision audits can run the same downstream tooling.
Why Projects matter for the export
OpenAI's Projects feature went GA broadly in late 2024 and is now the canonical way to scope a long-running workstream inside ChatGPT. A Project bundles four things:
- Custom instructions — system-prompt-like text every conversation in the Project inherits. For an architecture initiative, this is where the team encodes "we run on Postgres, we ship monoliths first, target latency budget 200ms" so every conversation starts with that context.
- Uploaded files — PDFs, code, data attached to the Project. Files are accessible to every conversation in the Project as if they were uploaded inline, but they belong to the Project rather than to a single chat.
- A bound Custom GPT (optional) — when set, every new conversation in the Project starts with that GPT instead of the default model. This is how teams pin a specific persona to a Project (e.g. "the architecture-review GPT" with sharper review criteria).
- Conversations — the chats themselves, scoped to the Project. They are distinct from the user's personal chats and appear in a separate sidebar tray.
For the senior engineer or solo-founder CTO who structures multi-month initiatives as Projects (a Q2 architecture migration, a product redesign, a fundraise), the Project is the unit of decision archaeology. Six months later, the question isn't "what did I talk to ChatGPT about" — it's "what did the team conclude inside the Q2 Mobile Rewrite Project?" Preserving the Project boundary in the export is what makes that question answerable.
The catch: the export was designed before Projects were GA, and the structure shows it.
What's in the export — and where Projects live
The standard export ZIP (requested via Settings → Data Controls → Export Data; covered in detail in how to export your ChatGPT history) contains the Project data, but split across two top-level files rather than collected into a per-Project directory:
chatgpt-export-<date>.zip
├── conversations.json # all conversations, flat array, Project-scoped + non-Project mixed
├── projects.json # Project metadata: id, name, description, custom_instructions, file_manifest
├── user.json # account metadata
├── message_feedback.json
├── shared_conversations.json
└── chat.html # rendered HTML view (informational only; conversations.json is the source)
Inside conversations.json, every conversation has the same DAG-shaped mapping tree documented in the conversations.json format reference. What's added for Project conversations is a single top-level field:
{
"id": "abc-123",
"title": "Postgres connection pool sizing",
"create_time": 1714489232.5,
"project_id": "p_xyz789", // <-- present only for Project conversations
"current_node": "...",
"mapping": { ... }
}
For non-Project conversations, project_id is absent (not null — absent). That's the only signal that ties a conversation to its Project. To rebuild the per-Project view, you filter conversations.json by project_id and look up the matching record in projects.json.
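That filter-and-join is a short jq exercise. A minimal sketch with inline sample data (the project_id value is the illustrative one from the snippet above; point jq at the real conversations.json from your extracted export):

```shell
# Create a tiny sample so the snippet runs standalone; in practice this is
# the conversations.json from the extracted export ZIP.
cat > conversations.json <<'EOF'
[{"id": "abc-123", "title": "Postgres connection pool sizing", "project_id": "p_xyz789"},
 {"id": "def-456", "title": "Unscoped personal chat"}]
EOF

# `.project_id?` yields null when the field is absent, so non-Project chats
# fall out of the filter without erroring.
jq -c '[ .[] | select(.project_id? == "p_xyz789") | .id ]' conversations.json
# → ["abc-123"]
```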
The projects.json file is a flat array of Project metadata records. A typical entry looks like:
{
"id": "p_xyz789",
"name": "Q2 Mobile Rewrite",
"description": "Architecture decisions for the iOS/Android rewrite",
"custom_instructions": "You are working on the mobile rewrite...",
"file_manifest": [
{ "name": "current-arch-v1.pdf", "size": 482113, "uploaded_at": 1712880000.0, "mime": "application/pdf" },
{ "name": "performance-budget.csv", "size": 4218, "uploaded_at": 1712923500.0, "mime": "text/csv" }
],
"bound_gpt_id": "g-abc123", // optional; null if no Custom GPT bound
"created_at": 1712793600.0,
"archived_at": null // populated if Project is archived
}
The file_manifest is a manifest, not the file contents — see what's missing below.
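One practical use of the manifest: turn it into a download checklist for the manual pre-export pass. A sketch (inline sample mirroring the entry above; run it against the real projects.json):

```shell
# Sample projects.json so the snippet runs standalone.
cat > projects.json <<'EOF'
[{"id": "p_xyz789", "name": "Q2 Mobile Rewrite",
  "file_manifest": [
    {"name": "current-arch-v1.pdf", "size": 482113, "mime": "application/pdf"},
    {"name": "performance-budget.csv", "size": 4218, "mime": "text/csv"}]}]
EOF

# One checklist line per file, prefixed with the Project name, so nothing is
# missed while clicking through each file's three-dot menu in the UI.
jq -r '.[] | .name as $p | .file_manifest[] | "[ ] \($p): \(.name) (\(.size) bytes)"' projects.json
# → [ ] Q2 Mobile Rewrite: current-arch-v1.pdf (482113 bytes)
# → [ ] Q2 Mobile Rewrite: performance-budget.csv (4218 bytes)
```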
What's missing — three known gaps
- Uploaded Project files (binaries). The largest gap and the one most teams discover too late. The file_manifest ships filename, size, upload date, and MIME type — but not the bytes. PDFs, CSVs, code files, images uploaded as Project knowledge are not in the ZIP. Three recovery paths: (a) pre-export download — before requesting the export, navigate to each Project in the UI and download each file via the file's three-dot menu (this is the only path; there's no bulk-download API as of this writing); (b) document-the-gap — accept that conversations reference Project files by name only and the binaries live in OpenAI's storage until the Project is deleted; (c) re-upload-on-restore — if the archive is being restored to a new Project (e.g. for handoff to another team), restore conversations first and re-upload Project files manually using the manifest as a checklist. Path (a) is the only one that produces a self-contained archive.
- Custom GPT configuration. When a Project binds a Custom GPT (the bound_gpt_id field above), each conversation that used the GPT carries the gizmo_id in per-turn metadata. But the GPT's actual configuration — its system prompt, attached knowledge files, defined actions — is NOT in the Project owner's export. Custom GPTs are owned by their creator's account and exported separately when that creator runs their own export. For a Project that uses a shared Custom GPT created by a teammate, the export captures the transcripts but not the GPT's behavior definition. Mitigation: have the GPT creator run a separate export and store it alongside the Project archive; or document the GPT's prompt manually in the per-Project README so a future reader knows the inputs that produced the conversations.
- Per-Project model defaults. A Project can pin a default model (e.g. always GPT-4o, always o3-mini) that overrides the user's account default. The pinned model is not stored as a Project metadata field; it's only inferable from the per-turn model_slug field in each message. For most archives this doesn't matter — the conversation transcript is the same regardless of model — but for teams that pin specific models for cost or capability reasons, the override is part of the Project's intent and should be documented in the per-Project README.
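The model inference described in the last bullet can be sketched as a jq pass over a conversation's mapping tree. The message.metadata.model_slug nesting below matches typical export dumps, but treat it as an assumption to verify against your own conversations.json:

```shell
# Sample conversation record so the snippet runs standalone; the nesting of
# model_slug under message.metadata is an assumption to check per export.
cat > conv.json <<'EOF'
{"id": "abc-123", "project_id": "p_xyz789",
 "mapping": {
   "n1": {"message": {"metadata": {"model_slug": "gpt-4o"}}},
   "n2": {"message": {"metadata": {"model_slug": "gpt-4o"}}},
   "n3": {"message": null}}}
EOF

# Distinct models used in one conversation; a single value repeated across
# every conversation in a Project suggests a pinned model default.
jq -c '[ .mapping[] | .message.metadata.model_slug? // empty ] | unique' conv.json
# → ["gpt-4o"]
```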
Two non-gaps worth naming explicitly: (1) Project conversations DO carry full DAG history, including alternate branches via the mapping tree — the Project boundary doesn't change conversation shape; (2) archived Projects ARE in the export as long as the Project hasn't been hard-deleted (archived sets archived_at but keeps the data; deletion removes it from future exports).
The conversion script — rebuild per-Project directories
The script below routes each conversation from the flat export into out/<project_id>/, writes a per-Project README with custom instructions and file manifest, and produces a top-level INDEX.md listing every Project. The output layout matches the symmetric Claude Project archive shape so a multi-platform audit can run the same downstream tools on either output.
#!/usr/bin/env bash
# split-chatgpt-projects.sh — rebuild per-Project directories from a flat ChatGPT export
# Usage: ./split-chatgpt-projects.sh path/to/export-extracted/ ./out
set -euo pipefail
src=$1
out=$2
mkdir -p "$out"
# 1. Per-Project READMEs from projects.json
jq -c '.[]' "$src/projects.json" | while IFS= read -r proj; do
pid=$(echo "$proj" | jq -r '.id')
pname=$(echo "$proj" | jq -r '.name')
pdir="$out/$pid"
mkdir -p "$pdir/conversations" "$pdir/files"
# README captures the Project's intent — what conversations don't tell you on their own
{
echo "# $pname"
echo
echo "$(echo "$proj" | jq -r '.description // "(no description)"')"
echo
echo "## Custom instructions"
echo
echo '```'
echo "$proj" | jq -r '.custom_instructions // "(none set)"'
echo '```'
echo
echo "## File manifest (binaries NOT included — see files/ for any pre-export downloads)"
echo
echo "$proj" | jq -r '.file_manifest // [] | map("- " + .name + " (" + (.size | tostring) + " bytes, " + .mime + ")") | .[]'
echo
echo "## Bound Custom GPT"
echo
bgpt=$(echo "$proj" | jq -r '.bound_gpt_id // empty')
if [ -n "$bgpt" ]; then
echo "Project bound to GPT \`$bgpt\`. The GPT's configuration is NOT in this export — request it separately from the GPT's owner."
else
echo "(no GPT bound; conversations used the account default model)"
fi
} > "$pdir/README.md"
done
# 2. Per-conversation files routed by project_id
jq -c '.[]' "$src/conversations.json" | while IFS= read -r conv; do
pid=$(echo "$conv" | jq -r '.project_id // empty')
cid=$(echo "$conv" | jq -r '.id')
if [ -n "$pid" ]; then
mkdir -p "$out/$pid/conversations"
printf '%s\n' "$conv" > "$out/$pid/conversations/$cid.json" # printf: avoids echo's escape handling on JSON containing backslashes
fi
done
# 3. Top-level INDEX listing every Project archived
{
echo "# ChatGPT Projects archive"
echo
echo "Generated $(date -u +%Y-%m-%dT%H:%M:%SZ) from \`$src\`."
echo
for d in "$out"/*/; do
pid=$(basename "$d")
pname=$(jq -r --arg id "$pid" '.[] | select(.id == $id) | .name' "$src/projects.json")
cc=$(ls "$d/conversations" 2>/dev/null | wc -l)
fc=$(jq -r --arg id "$pid" '.[] | select(.id == $id) | (.file_manifest // []) | length' "$src/projects.json")
echo "- [$pname]($pid/README.md) — $cc conversations, $fc files (manifest only)"
done
} > "$out/INDEX.md"
echo "split $(jq -r 'length' "$src/projects.json") projects into $out/"
The script is idempotent — re-running with the same source overwrites cleanly. Conversations without a project_id (i.e. non-Project chats) are intentionally skipped; if you want them archived too, route them to out/_unscoped/ with a one-line addition. The output of running this against a real export is exactly the layout the WhyChose extractor expects when invoked with the --per-project flag.
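The _unscoped routing mentioned above amounts to a default bucket in the step-2 loop. A standalone demo of the change with inline sample data (in the real script, only the destination line changes):

```shell
# Demo: conversations without a project_id land in out/_unscoped/ instead
# of being skipped.
out=$(mktemp -d)
cat > conversations.json <<'EOF'
[{"id": "abc-123", "project_id": "p_xyz789"}, {"id": "def-456"}]
EOF
jq -c '.[]' conversations.json | while IFS= read -r conv; do
  pid=$(printf '%s' "$conv" | jq -r '.project_id // empty')
  cid=$(printf '%s' "$conv" | jq -r '.id')
  dest="$out/${pid:-_unscoped}/conversations" # the one-line change: catch-all bucket when pid is empty
  mkdir -p "$dest"
  printf '%s\n' "$conv" > "$dest/$cid.json"
done
ls "$out"
```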
Three sanity checks for the converted archive
- Project count matches UI. `ls out/ | grep -v INDEX | wc -l` should equal the Project count from ChatGPT → Projects sidebar (counting both active and archived). Off-by-one usually means a Project was hard-deleted between request and build; off by more usually means the export was generated before a recently-created Project propagated to the export builder (wait 24 hours and re-request).
- Per-Project conversation counts match. For each Project, `ls out/<pid>/conversations/ | wc -l` should equal the conversation count visible in that Project's sidebar tray. Mismatch usually means individual conversations were deleted from the Project before export; the export reflects the state at request time, not the state at usage time.
- File manifest covers every UI-visible file. Open each Project's README.md and cross-check the file manifest against the file list visible in Project → Files. Missing files in the manifest usually mean the file was uploaded after the export was queued; re-request to capture them. Files marked archived in the UI are still in the manifest — archival doesn't remove entries from the manifest.
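The per-Project count check can be scripted rather than eyeballed: group the source conversations by project_id and diff the counts against each directory. A sketch with inline sample data (substitute the real conversations.json):

```shell
# Sample so the snippet runs standalone; {"id":"c"} has no project_id and
# is excluded, matching how the conversion script skips unscoped chats.
cat > conversations.json <<'EOF'
[{"id": "a", "project_id": "p_xyz789"},
 {"id": "b", "project_id": "p_xyz789"},
 {"id": "c"}]
EOF

# One "<project_id><TAB><count>" line per Project, to compare against
# `ls out/<pid>/conversations/ | wc -l`.
jq -r '[ .[] | select(.project_id?) ] | group_by(.project_id)[] | "\(.[0].project_id)\t\(length)"' conversations.json
# → p_xyz789	2
```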
The Project-as-decision-boundary framing
For a senior engineer or CTO using ChatGPT Projects, the Project boundary IS context. A conversation in the "Q2 Mobile Rewrite" Project starts with a different mental model than a conversation in the "Hiring decisions" Project — the same prompt produces different reasoning because the custom instructions, file context, and conversation history all differ. A flat export without per-Project routing throws that context away.
The decision-archaeology question is rarely "what did ChatGPT say about X." It's almost always "what did the team conclude in the architecture initiative we were running last quarter." That question requires the Project boundary to be queryable. The recipe above puts it back.
For multi-platform users (ChatGPT + Claude + Gemini), the symmetric Claude Project export recipe produces an identical directory layout from a Claude export. The two scripts together let you build a unified per-Project archive across both platforms — useful when teams use ChatGPT for some Projects and Claude for others (or both for the same Project, comparing outputs). The Gemini conversation export recipe doesn't have a Project boundary to preserve (Gemini doesn't have an equivalent feature) but the converted output drops into the same archive shape with project_id: "_unscoped" for downstream tooling.
How WhyChose fits in
The Projects boundary is exactly the access boundary the WhyChose extractor preserves as a first-class column in the decision log. Drop the per-Project archive into the uploader (or pipe it to the CLI) and the extractor surfaces decision-shaped exchanges with project_id as a filterable field — so "every durable decision made inside Q2 Mobile Rewrite" is a one-filter query, not a manual group-by. The custom_instructions from each Project's README.md becomes the system-prompt context the extractor uses to score decision relevance (a conversation about latency budgets in a Project whose custom instructions say "target 200ms" produces higher-scored decision records than the same conversation in an unscoped chat). The Pro tier exports decision logs to Notion / Linear / Obsidian with the Project as a Notion database property or Linear label; the Team tier ships shared decision logs scoped by Project membership — the same access boundary the export carries forward.
Related questions
Does the standard ChatGPT export include Project conversations?
Yes — every conversation from every Project is in conversations.json, identified by a project_id field. Project metadata is in a separate projects.json. The structure is flat; the script in this page rebuilds the per-Project directory tree.
Are Project files (PDFs, code, data) in the export?
No — only the file manifest (filename, size, upload date, MIME type). Three recovery paths: pre-export download via the Projects UI (the only path to recover binaries), document-the-gap, or re-upload-on-restore. There is no bulk-download API as of this writing.
What about Custom GPTs bound to a Project?
The conversation references the GPT by gizmo_id in per-turn metadata, but the GPT's configuration (system prompt, knowledge files, actions) is NOT in the export. Custom GPTs are owned by their creator and exported separately. Have the GPT creator run their own export, or document the GPT's prompt in the per-Project README.
How do I archive an old Project so I can answer questions about it later?
Run the conversion script in this page to rebuild out/<project_id>/ with conversations + README. Pre-export, manually download Project files via the UI and place them in out/<project_id>/files/. The archive layout matches the Claude Project archive recipe so multi-platform tooling works on either.
Why does my export have project_id on some conversations but projects.json is empty?
This means the Project was deleted between conversation creation and export request. The conversation still carries the project_id reference (immutable in the conversation record), but the Project metadata is gone. Treat these as orphan Project conversations — the per-conversation reasoning is intact, but the Project-level custom instructions and file manifest are unrecoverable.
Further reading
- How to export your ChatGPT history — the prerequisite; covers the export flow, what's in the ZIP top-level, and the 72-hour download window.
- ChatGPT conversations.json format reference — the per-conversation schema; same DAG shape applies to Project conversations.
- How to export a Claude Project — the symmetric Anthropic-side recipe; output layout matches so multi-platform archives use the same tooling.
- ChatGPT export not working — eight failure modes — Mode 5 (missing custom GPTs + Memory + images) covers the binary-file gap shared by Project files.
- Convert your ChatGPT export to Markdown — the conversion script works on Project conversations too; route by project_id and write per-Project Markdown into out/<project_id>/.
- How to search your ChatGPT history (the four levels) — the search levels apply identically to Project archives; Level 3 jq recipes work better on the per-Project directory tree.
- Extract decisions from your ChatGPT chats — the level-4 step beyond search; the extractor reads per-Project archives natively.
- Claude Team workspace export — the cross-platform parallel for shared workspaces; the workspace export shape mirrors the multi-Project archive this script produces.
- Gemini conversation export — Gemini doesn't have Projects, but the converted output drops into the same archive shape under _unscoped.
- The open-source extractor — reads per-Project archives natively; project_id is a first-class column in the decision log output.