Topic: chatgpt file uploads export

Uploaded Files in ChatGPT Exports — What's Included, What's Missing, and How to Recover Them

Q: Does the ChatGPT data export include files I uploaded?

No. The ChatGPT data export (conversations.json) does not include the binary content of files you uploaded. What is included is the file name as it appears in the conversation and the conversation text that references the file. The actual PDF, image, spreadsheet, or other binary is not in the export ZIP. This is by design: ChatGPT treats uploaded files as temporary session context with a retention window, not as permanent user data. OpenAI does not publish the exact window; anecdotal reports suggest 30–60 days after the last conversation where the file was used. The rationale is that you already have the file locally, since it is your own document, so the export omits it to avoid redundancy. To recover the file, use your original local copy. If the file came from a third-party source and you no longer have it, the export still contains the conversation text that discusses the file, but not the file itself.

Q: Where are my Code Interpreter outputs in the ChatGPT data export?

Code Interpreter session outputs — charts, CSVs, processed files, and other artifacts generated by ChatGPT during a Code Interpreter session — are not included as binary files in the data export. What is in the export: the Python code that was run (as text in the assistant's message content), ChatGPT's text description of what the code produced, and any follow-up conversation about the output. The actual generated files (the .png chart, the .csv spreadsheet, the cleaned dataset) are not in the export. The only reliable way to preserve Code Interpreter outputs is to download them during the session using the Download button that appears below each generated file in the ChatGPT interface. Once the session is closed and the temporary file storage expires (usually within 30 days), the files are no longer accessible. The code that produced them is still in conversations.json, so a reproducible Code Interpreter session can regenerate the outputs — but only if the input data was not itself an uploaded file that has since expired.

Q: Why are my DALL-E image URLs broken in the ChatGPT export?

DALL-E generated images appear in conversations.json as URLs pointing to OpenAI's CDN (typically oaidalleapiprodscus.blob.core.windows.net or files.oaiusercontent.com). These URLs expire after approximately 30 days. After expiry, the URL in conversations.json returns a 404 or an access-denied error. This is the most common source of the 'broken images in my export' complaint. The only way to preserve DALL-E outputs is to download the image during or shortly after the session — right-click, Save Image As. Once the URL has expired, the image is not recoverable from OpenAI's servers via the data export or any other user-accessible path. If you have a ChatGPT Plus or Pro subscription and used the My GPTs DALL-E capability, some images may persist longer in a separate Gallery view accessible from your account settings, but this is not guaranteed and the Gallery is also subject to storage limits.

Q: How do I recover files I uploaded to ChatGPT?

Files you uploaded to ChatGPT (PDFs, images, spreadsheets, code files) are not stored permanently by OpenAI and are not in the data export. To recover them, go back to the original source. If the file came from your local machine, it should still be there unless you have since moved or deleted it. If it came from a cloud storage service (Google Drive, Dropbox, SharePoint), look there. If you downloaded it from the web, check your browser's Downloads folder or your local file system. The conversation text that references and discusses the file content is in conversations.json, so you have ChatGPT's analysis and your follow-up questions, just not the file itself. For Code Interpreter outputs specifically: if the Python code that produced the output is in conversations.json, you can re-run it (in Code Interpreter or locally) to regenerate the output, provided the input data is still available.

You downloaded your ChatGPT data export, unzipped it, opened conversations.json, and your PDF analysis isn't there. Your Code Interpreter charts are missing. Your DALL-E images are 404s. None of this is a bug — it's how ChatGPT's data model works. Files you upload to ChatGPT are treated as temporary session context, not permanent user data. The conversation text that references and discusses those files IS in the export; the files themselves are not. The same applies to artifacts that ChatGPT generated for you: Code Interpreter outputs, processed datasets, and DALL-E images are not in conversations.json as binary content. This page documents exactly what appears in conversations.json for each file type, why OpenAI made this design choice, what expires and when, and how to recover each category of missing content.

TL;DR

The ChatGPT data export contains the conversation text that references files, not the files themselves. Uploaded PDFs: file name only. Code Interpreter outputs: text description and Python code, no binary. DALL-E images: expiring CDN URLs that 404 after ~30 days. Recovery: uploaded files come from your local original; Code Interpreter outputs must be downloaded during the session; DALL-E images must be saved immediately. The decision reasoning you worked through in the conversation IS preserved in conversations.json — that's the part WhyChose extracts.

The file inclusion table: what's in the export and what isn't

Content type	In conversations.json?	What's actually there	How to recover the binary
PDF you uploaded	Partial — name only	File name in message content; conversation text discussing the PDF's contents	Use your local original — it's still on your machine or in the cloud storage you uploaded from
Image you uploaded (JPG, PNG, etc.)	Partial — name only	File name reference; ChatGPT's vision analysis in the assistant message	Your original local file
Spreadsheet / CSV you uploaded	Partial — name and sometimes preview text	File name; some row/column previews ChatGPT included in its response	Your original local file
Code files you uploaded (.py, .js, etc.)	Yes — text content often included	Code file contents frequently appear in the conversation as quoted text blocks	Your original; or extract from conversations.json if ChatGPT quoted it
Code Interpreter outputs (.csv, .xlsx, .png charts)	No	Python code that generated the output (yes); text description of the output (yes); the binary file (no)	Download during the session via the Download button; re-run the code locally from conversations.json
DALL-E generated images	Expiring URL	CDN URL (valid ~30 days after generation); no binary content	Save immediately in the session; no recovery path after URL expiry
Voice mode transcripts	Yes — text transcription	The transcribed text appears in the conversation as a user or assistant message	N/A — text transcript is preserved; original audio is not stored by OpenAI
Web search results (SearchGPT / Browse)	Partial	The answer text citing web sources; source URLs cited in the assistant message; not the full fetched page content	Re-visit source URLs from the conversation text

Why your uploaded files aren't in the export

ChatGPT's data model distinguishes between data about you and data you contributed temporarily. Your conversations are data about you — they record your questions, your reasoning, your decisions. Under GDPR Article 20 (data portability), OpenAI is obligated to return them. Files you uploaded are different: they're your own existing documents, which you already have locally. OpenAI's position is that returning your own PDF to you via a data export is not data portability — you never lost access to the file; you uploaded a copy temporarily for ChatGPT to read.

There is also a practical storage reason. ChatGPT handles millions of file uploads per day. Storing every uploaded PDF permanently — and including it in every user's data export — would be prohibitively expensive and is not required by any regulatory framework. OpenAI stores uploaded files in temporary session storage with a retention window (the exact duration is not published in OpenAI's documentation, but anecdotal reports suggest 30–60 days after the last conversation where the file was referenced). After that window, the file is deleted from OpenAI's servers. Your conversations.json retains the file name and the conversation that discussed the file, but the file itself is gone from OpenAI's side.

The practical implication: ChatGPT is not a file storage service. Files you upload are session context for the duration of your analysis; they are not backed up by OpenAI on your behalf. If you need to preserve the output of a ChatGPT analysis session, download Code Interpreter outputs immediately, save DALL-E images before the session ends, and keep your original uploaded files in your own storage.

What conversations.json actually contains for file uploads

When you upload a file and send a message, the resulting JSON in conversations.json looks something like this:

{
  "id": "msg_abc123",
  "author": { "role": "user" },
  "content": {
    "content_type": "multimodal_text",
    "parts": [
      {
        "content_type": "text",
        "text": "Summarize the key findings in this report."
      },
      {
        "content_type": "image_asset_pointer",
        "asset_pointer": "file-service://file-XXXXXXXXXXXXXXXX",
        "size_bytes": 1234567,
        "width": null,
        "height": null,
        "fovea": null,
        "metadata": {
          "dalle": null,
          "gizmo": null,
          "sanitized": false,
          "is_file_upload": true,
          "file_name": "q3-strategy-report.pdf",
          "file_type": "pdf"
        }
      }
    ]
  }
}

Key fields to know:

asset_pointer: The internal file-service URI used to reference the file within OpenAI's systems. This is not a publicly accessible URL — you cannot curl it, download from it, or access the file via this pointer. It's an internal identifier, not a download link.
file_name: The original filename you provided when uploading. This is preserved in the export.
file_type: A short type label (pdf, image, etc.) rather than a full MIME type such as application/pdf.
size_bytes: The file size at upload time — useful for verifying if you locate the original file.
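As a rough illustration of how these fields can be used, the sketch below walks an export, lists every part flagged as an upload, and checks whether a local candidate file has the recorded size. It assumes the message shape shown in the sample above (a mapping of nodes, dict-valued parts, and a metadata.is_file_upload flag) plus a per-conversation title key; real exports vary between versions, and list_uploads and size_matches are hypothetical helper names.

```python
import json
import os


def list_uploads(conversations):
    """Return (conversation title, file name, size in bytes) per upload reference.

    Assumes the part layout shown in the sample above; other export
    versions may record attachments differently.
    """
    found = []
    for conv in conversations:
        for node in conv.get("mapping", {}).values():
            msg = node.get("message") or {}  # some nodes carry a null message
            content = msg.get("content") or {}
            for part in content.get("parts") or []:
                if not isinstance(part, dict):
                    continue  # plain-text parts are bare strings
                meta = part.get("metadata") or {}
                if meta.get("is_file_upload"):
                    # "title" is an assumed key on the conversation object
                    found.append(
                        (conv.get("title"), meta.get("file_name"), part.get("size_bytes"))
                    )
    return found


def size_matches(local_path, size_bytes):
    """True if a local candidate file has the size recorded in the export."""
    return os.path.isfile(local_path) and os.path.getsize(local_path) == size_bytes


if __name__ == "__main__" and os.path.exists("conversations.json"):
    with open("conversations.json", encoding="utf-8") as f:
        for title, name, size in list_uploads(json.load(f)):
            print(f"{title!r}: {name} ({size} bytes)")
```

A matching name and size is a strong hint, not proof, that a local file is the one you uploaded; the export carries no checksum in the sample shown here.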

The binary content of the file is nowhere in conversations.json. If you grep for the file content (say, a specific string from the PDF), you may find ChatGPT's summary or quoted text from the document in the assistant's response messages — but only what ChatGPT chose to quote, not the full file.
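The "grep" step can be sketched in Python so that each hit comes back with its conversation and author. This is a minimal example under the same assumptions about the export layout as the sample above (text parts are either bare strings or dicts with a text key, and conversations have a title key); find_phrase and part_text are hypothetical helper names.

```python
import json
import os


def part_text(part):
    """Return the text of a message part, whether it is a bare string or a dict."""
    if isinstance(part, str):
        return part
    if isinstance(part, dict):
        return part.get("text") or ""
    return ""


def find_phrase(conversations, phrase, context=40):
    """Yield (title, author role, snippet) for every text part containing phrase."""
    needle = phrase.lower()
    for conv in conversations:
        for node in conv.get("mapping", {}).values():
            msg = node.get("message") or {}  # some nodes carry a null message
            role = (msg.get("author") or {}).get("role")
            content = msg.get("content") or {}
            for part in content.get("parts") or []:
                text = part_text(part)
                at = text.lower().find(needle)  # case-insensitive match
                if at == -1:
                    continue
                start = max(0, at - context)
                yield conv.get("title"), role, text[start:at + len(needle) + context]


if __name__ == "__main__" and os.path.exists("conversations.json"):
    with open("conversations.json", encoding="utf-8") as f:
        export = json.load(f)
    # "revenue grew" is a placeholder; use a string you remember from the document.
    for title, role, snippet in find_phrase(export, "revenue grew"):
        print(f"[{role}] {title!r}: ...{snippet}...")
```

Hits in assistant messages are ChatGPT's quotes or paraphrases of the document, so an empty result does not mean the phrase was absent from the original file.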

The DALL-E case: expiring URLs you may not have saved

DALL-E generated images are the most commonly lost content category because users often don't realize the URLs expire. When ChatGPT generates an image with DALL-E, conversations.json contains something like:

{
  "content_type": "image_asset_pointer",
  "asset_pointer": "file-service://file-YYYYYYYYYYYYYYYY",
  "metadata": {
    "dalle": {
      "gen_id": "gen-ZZZZZZZZZZ",
      "prompt": "a minimalist flat-design illustration of...",
      "seed": 1234567890,
      "parent_gen_id": null,
      "edit_op": null,
      "serialization_title": "DALL-E generation"
    }
  }
}

And in some versions of the export, you'll also find a parts array where the image is represented as a URL:

{
  "content_type": "tether_browsing_display",
  "result": "https://oaidalleapiprodscus.blob.core.windows.net/private/org-XXX/user-YYY/img-ZZZ.png?..."
}

That URL has a signed expiry embedded in the query string. After the expiry (typically 30 days from generation), the URL returns a 403 Forbidden or 404 Not Found. OpenAI does not offer a way to regenerate expired DALL-E image URLs through the user interface or the data export.
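If the URL in your export is an Azure Blob Storage signed URL, the expiry is normally carried in the se (signed expiry) query parameter of the shared access signature. The sketch below reads that value with the standard library. Treat the parameter name as an assumption about your particular export: other hosts may encode expiry differently, in which case the functions return None. signed_url_expiry and is_expired are hypothetical helper names.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


def signed_url_expiry(url):
    """Return the expiry (UTC) read from an Azure-style `se` parameter, or None."""
    values = parse_qs(urlparse(url).query).get("se")
    if not values:
        return None
    raw = values[0]
    if raw.endswith("Z"):
        raw = raw[:-1] + "+00:00"  # fromisoformat rejects "Z" before Python 3.11
    try:
        expiry = datetime.fromisoformat(raw)
    except ValueError:
        return None  # unrecognised timestamp format
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)  # assume UTC when no offset is given
    return expiry.astimezone(timezone.utc)


def is_expired(url, now=None):
    """True if expired, False if still valid, None if no expiry could be read."""
    expiry = signed_url_expiry(url)
    if expiry is None:
        return None
    now = now or datetime.now(timezone.utc)
    return now >= expiry
```

Checking the timestamp locally avoids sending a request for each URL, and it tells you which images are still worth downloading before their links lapse.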

What you do have after expiry: The DALL-E prompt (the exact text description you or ChatGPT used to generate the image). If you need to recreate the image, you can reuse the same prompt. The output won't be pixel-identical, because generation is stochastic, but it should be similar. The gen_id and seed values in conversations.json are not accepted as inputs to regenerate the exact image via the API.
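Collecting the surviving prompts is straightforward if your export follows the metadata.dalle layout shown earlier. The following is a minimal sketch under that assumption (plus an assumed title key on each conversation); dalle_prompts is a hypothetical helper name.

```python
import json
import os


def dalle_prompts(conversations):
    """Return one dict per DALL-E image part with its title, gen_id and prompt."""
    out = []
    for conv in conversations:
        for node in conv.get("mapping", {}).values():
            msg = node.get("message") or {}  # some nodes carry a null message
            content = msg.get("content") or {}
            for part in content.get("parts") or []:
                if not isinstance(part, dict):
                    continue  # plain-text parts are bare strings
                dalle = (part.get("metadata") or {}).get("dalle")
                # Uploaded files carry "dalle": null, so they are skipped here.
                if dalle and dalle.get("prompt"):
                    out.append({
                        "title": conv.get("title"),
                        "gen_id": dalle.get("gen_id"),
                        "prompt": dalle["prompt"],
                    })
    return out


if __name__ == "__main__" and os.path.exists("conversations.json"):
    with open("conversations.json", encoding="utf-8") as f:
        for item in dalle_prompts(json.load(f)):
            print(f"{item['title']!r}: {item['prompt']}")
```

The printed prompts can be pasted into a new session to generate a comparable image.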

Prevention: Right-click, Save Image As on every DALL-E image you intend to keep. Do this during the session or within a few days of generation — don't rely on the conversations.json URL as a permanent reference.

Code Interpreter: what survives and what doesn't

Code Interpreter (now called Advanced Data Analysis in some ChatGPT interfaces) sessions produce several types of content with different export behavior:

What IS in conversations.json for Code Interpreter sessions

The Python code that was run. Code executed in a Code Interpreter session appears in the assistant's messages, typically as a fenced code block in the message text; some export versions may instead store it as a separate code-type message. Either way, if you need to reproduce the analysis, the code is recoverable from conversations.json.
ChatGPT's text description of the output. The assistant's response after running code typically describes what the output showed: "The chart shows a steady increase from Q1 to Q3, with a 23% peak in August." This description is in conversations.json.
Error messages and stack traces. If the code failed, the error message is in the assistant's response. Useful for debugging.
Your follow-up questions and ChatGPT's interpretations. The full dialogue is preserved.

What is NOT in conversations.json for Code Interpreter sessions

The generated files. The .png chart, the .csv output, the processed .xlsx file — none of these are in conversations.json as binary content. They appear as download links during the session (temporary CDN URLs similar to DALL-E images) and expire on the same ~30-day schedule.
The uploaded input data. If you uploaded a spreadsheet for Code Interpreter to analyze, the spreadsheet binary is not in conversations.json for the same reasons as any other uploaded file (name only, no content).

Reproducing Code Interpreter outputs from conversations.json

Because the Python code is preserved in conversations.json, many Code Interpreter analyses are reproducible locally:

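# Extract Python code from a conversations.json export
import json
import re

# Matches fenced Python blocks inside plain-text message parts
FENCE = re.compile(r'```python\n(.*?)```', re.DOTALL)

with open('conversations.json', encoding='utf-8') as f:
    data = json.load(f)

for conv in data:
    for node in conv.get('mapping', {}).values():
        # Root and placeholder nodes can carry "message": null
        msg = node.get('message') or {}
        if (msg.get('author') or {}).get('role') != 'assistant':
            continue
        content = msg.get('content') or {}
        # Assumption: some exports store executed code as content_type "code"
        # with the source in a "text" field instead of inside "parts"
        if content.get('content_type') == 'code' and content.get('text'):
            print(content['text'])
            print('---')
        for part in content.get('parts') or []:
            if isinstance(part, str):
                for block in FENCE.findall(part):
                    print(block)
                    print('---')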

Run this against your conversations.json to extract the Python code found in assistant messages. The results may include code ChatGPT only suggested as well as code it actually ran, so review them before reuse. You can then run the code locally (pip install the required libraries if needed) using your original input data files to regenerate the outputs.

The contrarian insight: conversation text is the most portable asset

Most users assume that the files they brought to ChatGPT are the fragile part — they uploaded them, so ChatGPT should return them. In practice, the opposite is true. Files you uploaded still exist on your local machine. The fragile content is what ChatGPT generated: the charts, the DALL-E images, the processed datasets — these are the artifacts at risk of permanent loss. Your conversation text (the reasoning, the analysis, the decisions you worked through) is the most portable part of the export. It's fully in conversations.json, fully readable, and not subject to expiry.

This has a practical implication for how to treat ChatGPT sessions. The conversation thread where you worked through a decision — weighing trade-offs, considering alternatives, reaching a conclusion — is permanently preserved in your data export. The DALL-E visualization you generated to illustrate the decision is not. Treat text as permanent; treat generated binaries as ephemeral.

Project Knowledge files: same pattern, additional gap

If you use ChatGPT Projects, the Project Knowledge files — PDFs, documents, and reference materials you uploaded to a Project's knowledge base — follow the same inclusion rules as session uploads. The Project's metadata (custom instructions, bound Custom GPT ID, file manifest) is in a sibling projects.json file in the export ZIP. But the actual content of the knowledge files is not included in the export.

The projects.json entry for a Project includes a files array with file names and sizes — confirming which files were in the knowledge base — but the binary content is absent. This is the same gap as regular session uploads, now extended to Project-persistent knowledge. Teams that use ChatGPT Projects as a team knowledge base and want to export or migrate that knowledge need to maintain their own copies of the source files.

How WhyChose fits in

The conversation content that IS in your conversations.json export — the reasoning you worked through, the trade-offs you named, the decisions you reached — is exactly what the WhyChose extractor targets. Even when the supporting files (the PDF you analyzed, the chart that illustrated the decision) are gone, the decision thread is preserved. The extractor identifies decision-shaped conversations and structures them as a searchable log: "Chose Postgres over MongoDB for the user service," "Priced Pro at $9 after analyzing the $7 vs $12 trade-off." The conversation text where that reasoning happened is fully recoverable from conversations.json, regardless of whether the files attached to that conversation are still available.
