Topic: chatgpt file uploads export
Uploaded Files in ChatGPT Exports — What's Included, What's Missing, and How to Recover Them
You downloaded your ChatGPT data export, unzipped it, opened conversations.json, and your PDF analysis isn't there. Your Code Interpreter charts are missing. Your DALL-E images are 404s. None of this is a bug — it's how ChatGPT's data model works. Files you upload to ChatGPT are treated as temporary session context, not permanent user data. The conversation text that references and discusses those files IS in the export; the files themselves are not. The same applies to artifacts that ChatGPT generated for you: Code Interpreter outputs, processed datasets, and DALL-E images are not in conversations.json as binary content. This page documents exactly what appears in conversations.json for each file type, why OpenAI made this design choice, what expires and when, and how to recover each category of missing content.
TL;DR
The ChatGPT data export contains the conversation text that references files, not the files themselves. Uploaded PDFs: file name only. Code Interpreter outputs: text description and Python code, no binary. DALL-E images: expiring CDN URLs that 404 after ~30 days. Recovery: uploaded files come from your local original; Code Interpreter outputs must be downloaded during the session; DALL-E images must be saved immediately. The decision reasoning you worked through in the conversation IS preserved in conversations.json — that's the part WhyChose extracts.
The file inclusion table: what's in the export and what isn't
| Content type | In conversations.json? | What's actually there | How to recover the binary |
|---|---|---|---|
| PDF you uploaded | Partial — name only | File name in message content; conversation text discussing the PDF's contents | Use your local original — it's still on your machine or in the cloud storage you uploaded from |
| Image you uploaded (JPG, PNG, etc.) | Partial — name only | File name reference; ChatGPT's vision analysis in the assistant message | Your original local file |
| Spreadsheet / CSV you uploaded | Partial — name and sometimes preview text | File name; some row/column previews ChatGPT included in its response | Your original local file |
| Code files you uploaded (.py, .js, etc.) | Yes — text content often included | Code file contents frequently appear in the conversation as quoted text blocks | Your original; or extract from conversations.json if ChatGPT quoted it |
| Code Interpreter outputs (.csv, .xlsx, .png charts) | No | Python code that generated the output (yes); text description of the output (yes); the binary file (no) | Download during the session via the Download button; re-run the code locally from conversations.json |
| DALL-E generated images | Expiring URL | CDN URL (valid ~30 days after generation); no binary content | Save immediately in the session; no recovery path after URL expiry |
| Voice mode transcripts | Yes — text transcription | The transcribed text appears in the conversation as a user or assistant message | N/A — text transcript is preserved; original audio is not stored by OpenAI |
| Web search results (SearchGPT / Browse) | Partial | The answer text citing web sources; source URLs cited in the assistant message; not the full fetched page content | Re-visit source URLs from the conversation text |
Why your uploaded files aren't in the export
ChatGPT's data model distinguishes between data about you and data you contributed temporarily. Your conversations are data about you — they record your questions, your reasoning, your decisions. Under GDPR Article 20 (data portability), OpenAI is obligated to return them. Files you uploaded are different: they're your own existing documents, which you already have locally. OpenAI's position is that returning your own PDF to you via a data export is not data portability — you never lost access to the file; you uploaded a copy temporarily for ChatGPT to read.
There is also a practical storage reason. ChatGPT handles millions of file uploads per day. Storing every uploaded PDF permanently — and including it in every user's data export — would be prohibitively expensive and is not required by any regulatory framework. OpenAI stores uploaded files in temporary session storage with a retention window (the exact duration is not published in OpenAI's documentation, but anecdotal reports suggest 30–60 days after the last conversation where the file was referenced). After that window, the file is deleted from OpenAI's servers. Your conversations.json retains the file name and the conversation that discussed the file, but the file itself is gone from OpenAI's side.
The practical implication: ChatGPT is not a file storage service. Files you upload are session context for the duration of your analysis; they are not backed up by OpenAI on your behalf. If you need to preserve the output of a ChatGPT analysis session, download Code Interpreter outputs immediately, save DALL-E images before the session ends, and keep your original uploaded files in your own storage.
What conversations.json actually contains for file uploads
When you upload a file and send a message, the resulting JSON in conversations.json looks something like this:
{
  "id": "msg_abc123",
  "author": { "role": "user" },
  "content": {
    "content_type": "multimodal_text",
    "parts": [
      {
        "content_type": "text",
        "text": "Summarize the key findings in this report."
      },
      {
        "content_type": "image_asset_pointer",
        "asset_pointer": "file-service://file-XXXXXXXXXXXXXXXX",
        "size_bytes": 1234567,
        "width": null,
        "height": null,
        "fovea": null,
        "metadata": {
          "dalle": null,
          "gizmo": null,
          "sanitized": false,
          "is_file_upload": true,
          "file_name": "q3-strategy-report.pdf",
          "file_type": "pdf"
        }
      }
    ]
  }
}
Key fields to know:
- asset_pointer: The internal file-service URI used to reference the file within OpenAI's systems. This is not a publicly accessible URL: you cannot curl it, download from it, or access the file via this pointer. It's an internal identifier, not a download link.
- file_name: The original filename you provided when uploading. This is preserved in the export.
- file_type: The MIME category (pdf, image, etc.).
- size_bytes: The file size at upload time, useful for verifying a match if you locate the original file.
The binary content of the file is nowhere in conversations.json. If you grep for the file content (say, a specific string from the PDF), you may find ChatGPT's summary or quoted text from the document in the assistant's response messages — but only what ChatGPT chose to quote, not the full file.
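The fields described above are enough to build an inventory of the uploads an export refers to, and to check a candidate local file against the recorded size. The following is a minimal sketch, not a documented OpenAI schema: it assumes the part layout from the sample message (a `metadata` object carrying `is_file_upload`, `file_name`, and `file_type`) and the `mapping` / `message` nesting used by the extraction script elsewhere on this page. The helper names `iter_upload_refs` and `matches_local_copy` and the sample data are hypothetical.

```python
from pathlib import Path


def iter_upload_refs(conversations):
    """Yield (file_name, file_type, size_bytes) for every uploaded-file part.

    Assumes the part layout from the sample message on this page.
    """
    for conv in conversations:
        for node in (conv.get("mapping") or {}).values():
            msg = node.get("message") or {}  # some nodes may have no message
            content = msg.get("content") or {}
            for part in content.get("parts") or []:
                if not isinstance(part, dict):
                    continue  # skip parts stored as plain strings
                meta = part.get("metadata") or {}
                if meta.get("is_file_upload"):
                    yield meta.get("file_name"), meta.get("file_type"), part.get("size_bytes")


def matches_local_copy(path, size_bytes):
    """Return True if `path` is an existing file whose size equals `size_bytes`."""
    p = Path(path)
    return p.is_file() and p.stat().st_size == size_bytes


# In-memory stand-in for json.load() of conversations.json (illustrative data only)
sample_export = [{
    "mapping": {
        "node-1": {
            "message": {
                "author": {"role": "user"},
                "content": {
                    "content_type": "multimodal_text",
                    "parts": [
                        {"content_type": "text",
                         "text": "Summarize the key findings in this report."},
                        {
                            "content_type": "image_asset_pointer",
                            "size_bytes": 1234567,
                            "metadata": {
                                "is_file_upload": True,
                                "file_name": "q3-strategy-report.pdf",
                                "file_type": "pdf",
                            },
                        },
                    ],
                },
            }
        }
    }
}]

for name, kind, size in iter_upload_refs(sample_export):
    print(name, kind, size)
```

To run this against a real export, load the file with `json.load` and pass the resulting list to `iter_upload_refs`. A size match is only a quick plausibility check: two different files can share a byte count, so treat it as a hint rather than proof that you found the exact original.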
The DALL-E case: expiring URLs you may not have saved
DALL-E generated images are the most commonly lost content category because users often don't realize the URLs expire. When ChatGPT generates an image with DALL-E, conversations.json contains something like:
{
  "content_type": "image_asset_pointer",
  "asset_pointer": "file-service://file-YYYYYYYYYYYYYYYY",
  "metadata": {
    "dalle": {
      "gen_id": "gen-ZZZZZZZZZZ",
      "prompt": "a minimalist flat-design illustration of...",
      "seed": 1234567890,
      "parent_gen_id": null,
      "edit_op": null,
      "serialization_title": "DALL-E generation"
    }
  }
}
And in some versions of the export, you'll also find a parts array where the image is represented as a URL:
{
  "content_type": "tether_browsing_display",
  "result": "https://oaidalleapiprodscus.blob.core.windows.net/private/org-XXX/user-YYY/img-ZZZ.png?..."
}
That URL has a signed expiry embedded in the query string. After the expiry (typically 30 days from generation), the URL returns a 403 Forbidden or 404 Not Found. OpenAI does not offer a way to regenerate expired DALL-E image URLs through the user interface or the data export.
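If you want to know whether a saved URL is already dead without requesting it, the expiry can often be read from the query string. The sketch below rests on an assumption this page does not confirm: that the URL follows the Azure Blob Storage shared access signature convention, in which the `se` parameter holds the expiry timestamp in ISO 8601 form. The function names and the example URL are hypothetical; if your URLs carry no `se` parameter, the helpers return `None` rather than guessing.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


def signed_url_expiry(url):
    """Return the `se` expiry as a timezone-aware datetime, or None if absent."""
    values = parse_qs(urlparse(url).query).get("se")
    if not values:
        return None
    # fromisoformat() on older Python versions rejects a trailing "Z"
    expiry = datetime.fromisoformat(values[0].replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return expiry


def is_expired(url, now=None):
    """Return True/False when the expiry is known, None when it is not."""
    expiry = signed_url_expiry(url)
    if expiry is None:
        return None
    now = now or datetime.now(timezone.utc)
    return now >= expiry


# Hypothetical URL used only to exercise the parser
example_url = (
    "https://example.blob.core.windows.net/private/img.png"
    "?se=2024-01-31T12%3A00%3A00Z&sig=abc"
)
print(signed_url_expiry(example_url))
print(is_expired(example_url))
```

Reading the expiry this way only tells you what the signature claims. A URL can stop working earlier for other reasons, so a live request is still the definitive check.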
What you do have after expiry: The DALL-E prompt (the exact text description you or ChatGPT used to generate the image). If you need to recreate the image, you can reuse that prompt. The output won't be pixel-identical, because generation is stochastic, but the same prompt will produce a similar result. The gen_id and seed values in conversations.json are not accepted as inputs to regenerate the exact image via the API.
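Because the prompt is the one piece that survives, it can be worth collecting every prompt from an export in one pass. This is a minimal sketch that assumes the `metadata.dalle` layout shown in the sample above and the `mapping` / `message` nesting used by the extraction script elsewhere on this page; `iter_dalle_prompts` is a hypothetical helper name and the sample data is illustrative.

```python
def iter_dalle_prompts(conversations):
    """Yield (gen_id, prompt) for each part whose metadata has a `dalle` object."""
    for conv in conversations:
        for node in (conv.get("mapping") or {}).values():
            msg = node.get("message") or {}  # some nodes may have no message
            content = msg.get("content") or {}
            for part in content.get("parts") or []:
                if not isinstance(part, dict):
                    continue  # skip parts stored as plain strings
                dalle = (part.get("metadata") or {}).get("dalle")
                # Uploaded files carry "dalle": null, so they are skipped here
                if dalle and dalle.get("prompt"):
                    yield dalle.get("gen_id"), dalle["prompt"]


# In-memory stand-in for json.load() of conversations.json (illustrative data only)
sample_export = [{
    "mapping": {
        "node-1": {
            "message": {
                "content": {
                    "parts": [
                        {
                            "content_type": "image_asset_pointer",
                            "metadata": {
                                "dalle": {
                                    "gen_id": "gen-ZZZZZZZZZZ",
                                    "prompt": "a minimalist flat-design illustration of...",
                                }
                            },
                        },
                        {
                            "content_type": "image_asset_pointer",
                            "metadata": {"dalle": None, "is_file_upload": True},
                        },
                    ]
                }
            }
        }
    }
}]

for gen_id, prompt in iter_dalle_prompts(sample_export):
    print(gen_id, "->", prompt)
```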
Prevention: Right-click, Save Image As on every DALL-E image you intend to keep. Do this during the session or within a few days of generation — don't rely on the conversations.json URL as a permanent reference.
Code Interpreter: what survives and what doesn't
Code Interpreter sessions (the feature is labeled Advanced Data Analysis in some ChatGPT interfaces) produce several types of content, each with different export behavior:
What IS in conversations.json for Code Interpreter sessions
- The Python code that was run. Every code cell executed in a Code Interpreter session appears in the assistant's message content as a fenced code block. If you need to reproduce the analysis, the code is fully recoverable from conversations.json.
- ChatGPT's text description of the output. The assistant's response after running code typically describes what the output showed: "The chart shows a steady increase from Q1 to Q3, with a 23% peak in August." This description is in conversations.json.
- Error messages and stack traces. If the code failed, the error message is in the assistant's response. Useful for debugging.
- Your follow-up questions and ChatGPT's interpretations. The full dialogue is preserved.
What is NOT in conversations.json for Code Interpreter sessions
- The generated files. The .png chart, the .csv output, the processed .xlsx file — none of these are in conversations.json as binary content. They appear as download links during the session (temporary CDN URLs similar to DALL-E images) and expire on the same ~30-day schedule.
- The uploaded input data. If you uploaded a spreadsheet for Code Interpreter to analyze, the spreadsheet binary is not in conversations.json for the same reasons as any other uploaded file (name only, no content).
Reproducing Code Interpreter outputs from conversations.json
Because the Python code is preserved in conversations.json, many Code Interpreter analyses are reproducible locally:
# Extract all Python code blocks from a conversations.json export
import json
import re

with open('conversations.json', encoding='utf-8') as f:
    data = json.load(f)

for conv in data:
    for node in conv.get('mapping', {}).values():
        # Some nodes may have "message": null, so fall back to an empty dict
        msg = node.get('message') or {}
        if msg.get('author', {}).get('role') == 'assistant':
            parts = (msg.get('content') or {}).get('parts') or []
            for part in parts:
                # Only string parts can hold fenced code; skip object parts
                if isinstance(part, str):
                    # Extract Python fenced code blocks
                    blocks = re.findall(r'```python\n(.*?)```', part, re.DOTALL)
                    for block in blocks:
                        print(block)
                        print('---')
Run this against your conversations.json to extract all Python code from Code Interpreter sessions. You can then run the code locally (pip install the required libraries if needed) using your original input data files to regenerate the outputs.
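Export layouts vary, so it can help to make the extraction tolerant of more than one shape. The sketch below is a hedged variation on the script above. Besides fenced blocks inside string parts, it also accepts a layout in which the code sits in the content's own `text` field with `content_type` set to `"code"`. That second layout is an assumption, not something this page documents, and `extract_python` is a hypothetical helper name.

```python
import re

FENCE = "`" * 3  # three backticks, built this way so this example's own fence stays intact
FENCED_PYTHON = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def extract_python(message):
    """Return the Python snippets found in one message dict (possibly empty)."""
    content = (message or {}).get("content") or {}
    snippets = []
    # Assumed alternate layout: the code is the content's own `text` field
    if content.get("content_type") == "code" and isinstance(content.get("text"), str):
        snippets.append(content["text"])
    # Layout described on this page: fenced blocks inside string parts
    for part in content.get("parts") or []:
        if isinstance(part, str):
            snippets.extend(FENCED_PYTHON.findall(part))
    return snippets


# Illustrative messages, one per layout
fenced_msg = {"content": {"parts": ["Here is the plot:\n" + FENCE + "python\nprint(1)\n" + FENCE]}}
code_msg = {"content": {"content_type": "code", "text": "import pandas as pd"}}

print(extract_python(fenced_msg))
print(extract_python(code_msg))
```

Call `extract_python(node.get('message'))` inside the same loop over `mapping` values. Because it accepts `None`, nodes without a message need no special handling.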
The contrarian insight: conversation text is the most portable asset
Most users assume that the files they brought to ChatGPT are the fragile part — they uploaded them, so ChatGPT should return them. In practice, the opposite is true. Files you uploaded still exist on your local machine. The fragile content is what ChatGPT generated: the charts, the DALL-E images, the processed datasets — these are the artifacts at risk of permanent loss. Your conversation text (the reasoning, the analysis, the decisions you worked through) is the most portable part of the export. It's fully in conversations.json, fully readable, and not subject to expiry.
This has a practical implication for how to treat ChatGPT sessions. The conversation thread where you worked through a decision — weighing trade-offs, considering alternatives, reaching a conclusion — is permanently preserved in your data export. The DALL-E visualization you generated to illustrate the decision is not. Treat text as permanent; treat generated binaries as ephemeral.
Project Knowledge files: same pattern, additional gap
If you use ChatGPT Projects, the Project Knowledge files — PDFs, documents, and reference materials you uploaded to a Project's knowledge base — follow the same inclusion rules as session uploads. The Project's metadata (custom instructions, bound Custom GPT ID, file manifest) is in a sibling projects.json file in the export ZIP. But the actual content of the knowledge files is not included in the export.
The projects.json entry for a Project includes a files array with file names and sizes — confirming which files were in the knowledge base — but the binary content is absent. This is the same gap as regular session uploads, now extended to Project-persistent knowledge. Teams that use ChatGPT Projects as a team knowledge base and want to export or migrate that knowledge need to maintain their own copies of the source files.
How WhyChose fits in
The conversation content that IS in your conversations.json export — the reasoning you worked through, the trade-offs you named, the decisions you reached — is exactly what the WhyChose extractor targets. Even when the supporting files (the PDF you analyzed, the chart that illustrated the decision) are gone, the decision thread is preserved. The extractor identifies decision-shaped conversations and structures them as a searchable log: "Chose Postgres over MongoDB for the user service," "Priced Pro at $9 after analyzing the $7 vs $12 trade-off." The conversation text where that reasoning happened is fully recoverable from conversations.json, regardless of whether the files attached to that conversation are still available.
Related questions
Does the ChatGPT data export include files I uploaded?
No. The ChatGPT data export (conversations.json) includes the file name and the conversation text that references the file, but not the binary content of the file itself. This applies to PDFs, images, spreadsheets, and any other files you uploaded. The file is treated as temporary session context, not permanent user data. To recover uploaded files, use your original local copy — you have it; ChatGPT doesn't keep it permanently.
Where are my Code Interpreter outputs in the ChatGPT data export?
Code Interpreter outputs (charts, processed CSVs, generated files) are not included as binaries in conversations.json. The Python code that generated the outputs IS in the export (in the assistant's code cell messages), as are text descriptions of the outputs. The actual .png, .csv, or .xlsx files must be downloaded during the session; afterward, their download links expire on the same ~30-day schedule as other ChatGPT-generated content. To regenerate them, extract the Python code from conversations.json (using the script on this page) and re-run it locally with your original input data.
Why are my DALL-E image URLs broken in the ChatGPT export?
DALL-E images appear in conversations.json as signed CDN URLs that expire approximately 30 days after generation. After expiry, the URL returns 403 or 404. There's no recovery path once the URL has expired. To prevent this: download DALL-E images immediately after generation (right-click, Save Image As) during the session. The DALL-E prompt that generated the image is preserved in conversations.json, so you can regenerate a similar image — but not the exact same pixel output.
How do I recover files I uploaded to ChatGPT?
Files you uploaded to ChatGPT (PDFs, images, spreadsheets) are your own documents. They still exist wherever you originally had them — your local machine, Google Drive, Dropbox, or wherever you stored them. ChatGPT does not keep permanent copies of uploaded files and does not include their binary content in data exports. The conversation text that discussed the file content is in conversations.json. If the file came from a one-time download and you deleted it locally, check your browser's Downloads folder, your Trash/Recycle Bin, and any backups you maintain.
Further reading
- How to export your ChatGPT history (2026 guide) — the prerequisite: how to request and download the data export ZIP that contains conversations.json and the file references documented on this page. The step-by-step walkthrough covers the Settings → Data Controls → Export data flow and what to expect in the ZIP contents.
- ChatGPT conversations.json format — field reference — the full schema reference for conversations.json, including the content_type: "multimodal_text" message parts where file attachment references appear. Reading this alongside the current page clarifies exactly which JSON fields contain file names, asset pointers, and DALL-E metadata versus the binary content that's absent.
- ChatGPT export not working? Eight failure modes and how to recover — the troubleshooting companion for when the export process itself has problems. Missing files in a successful export (this page's topic) is distinct from the export failing to complete — if you're not sure which problem you have, check this page's eight failure modes first to rule out an export process issue before concluding that the content is simply absent by design.
- ChatGPT Projects export — custom instructions, files, and conversation routing — the Project-specific export reference, including the projects.json file manifest that lists Project Knowledge files by name and size. The same binary-exclusion rule applies to Project Knowledge files as to session uploads: names are in the export, binary content is not.
- ChatGPT Memory export — where your memories live in the data download — the companion "what's included" reference for the memory.json file that rides alongside conversations.json in the same export ZIP. Memory entries are fully included (they're text, not binaries); this contrast with file uploads illustrates the design principle: text content is portable data, binary files are not.
- ChatGPT Custom GPTs export — conversations vs configurations, what's included and what's not — a parallel "what's included" reference for Custom GPT content: GPT conversations are in conversations.json (with gizmo_id), but the knowledge files uploaded to the GPT Builder are not accessible via the standard data export — the same binary exclusion applies.
- How to extract decisions from your ChatGPT chats — what to do with the conversation content that IS preserved in conversations.json: the decision reasoning, trade-off analysis, and conclusions you worked through are fully portable even when the files that supported that analysis are gone.
- ChatGPT Team export — differences from Plus, workspace admin flow, and the Compliance API — the workspace-scoped extension of the export format; Team exports add audit-log.jsonl and workspace metadata, but the same file exclusion rules apply: conversation text is included, uploaded file binaries are not, regardless of whether the export is personal or workspace-admin initiated.
- ChatGPT web search in conversations.json — tether content types, what's stored, and how to extract cited URLs — the companion reference for another category of content that requires non-obvious extraction: when ChatGPT browses the web, the search queries, cited URLs, and page snippets appear in tether_browsing_display and tether_quote nodes in the mapping DAG. Like uploaded file binaries, full page HTML is not stored — only the extracted snippets. Unlike uploaded files, the text content (query, title, URL, snippet) IS fully preserved.
- ChatGPT image generation in the data export — DALL-E prompts, CDN URLs, and what expires — the dedicated reference for DALL-E 3 and GPT-4o native image generation in conversations.json. Goes deeper on the image-specific export schema: the dalle.text2im tool invocation structure, the revised_prompt that DALL-E actually used (often significantly different from your input), the CDN URL expiry timeline, bulk-check scripts, and Python extraction of all image prompts for regeneration. The CDN URL expiry is a specific failure mode that this page documents but doesn't detail.