Uploaded Files in ChatGPT Exports — What's Included, What's Missing, and How to Recover Them

You downloaded your ChatGPT data export, unzipped it, opened conversations.json, and your PDF analysis isn't there. Your Code Interpreter charts are missing. Your DALL-E images are 404s. None of this is a bug — it's how ChatGPT's data model works. Files you upload to ChatGPT are treated as temporary session context, not permanent user data. The conversation text that references and discusses those files IS in the export; the files themselves are not. The same applies to artifacts that ChatGPT generated for you: Code Interpreter outputs, processed datasets, and DALL-E images are not in conversations.json as binary content. This page documents exactly what appears in conversations.json for each file type, why OpenAI made this design choice, what expires and when, and how to recover each category of missing content.

TL;DR

The ChatGPT data export contains the conversation text that references files, not the files themselves. Uploaded PDFs: file name only. Code Interpreter outputs: text description and Python code, no binary. DALL-E images: expiring CDN URLs that 404 after ~30 days. Recovery: uploaded files come from your local original; Code Interpreter outputs must be downloaded during the session; DALL-E images must be saved immediately. The decision reasoning you worked through in the conversation IS preserved in conversations.json — that's the part WhyChose extracts.

The file inclusion table: what's in the export and what isn't

Content type | In conversations.json? | What's actually there | How to recover the binary
--- | --- | --- | ---
PDF you uploaded | Partial (name only) | File name in message content; conversation text discussing the PDF's contents | Use your local original; it's still on your machine or in the cloud storage you uploaded from
Image you uploaded (JPG, PNG, etc.) | Partial (name only) | File name reference; ChatGPT's vision analysis in the assistant message | Your original local file
Spreadsheet / CSV you uploaded | Partial (name and sometimes preview text) | File name; some row/column previews ChatGPT included in its response | Your original local file
Code files you uploaded (.py, .js, etc.) | Yes (text content often included) | Code file contents frequently appear in the conversation as quoted text blocks | Your original; or extract from conversations.json if ChatGPT quoted it
Code Interpreter outputs (.csv, .xlsx, .png charts) | No | Python code that generated the output (yes); text description of the output (yes); the binary file (no) | Download during the session via the Download button; re-run the code locally from conversations.json
DALL-E generated images | Expiring URL | CDN URL (valid ~30 days after generation); no binary content | Save immediately in the session; no recovery path after URL expiry
Voice mode transcripts | Yes (text transcription) | The transcribed text appears in the conversation as a user or assistant message | N/A; the text transcript is preserved; original audio is not stored by OpenAI
Web search results (SearchGPT / Browse) | Partial | The answer text citing web sources; source URLs cited in the assistant message; not the full fetched page content | Re-visit source URLs from the conversation text

Why your uploaded files aren't in the export

ChatGPT's data model distinguishes between data about you and data you contributed temporarily. Your conversations are data about you — they record your questions, your reasoning, your decisions. Under GDPR Article 20 (data portability), OpenAI is obligated to return them. Files you uploaded are different: they're your own existing documents, which you already have locally. OpenAI's position is that returning your own PDF to you via a data export is not data portability — you never lost access to the file; you uploaded a copy temporarily for ChatGPT to read.

There is also a practical storage reason. ChatGPT handles millions of file uploads per day. Storing every uploaded PDF permanently — and including it in every user's data export — would be prohibitively expensive and is not required by any regulatory framework. OpenAI stores uploaded files in temporary session storage with a retention window (the exact duration is not published in OpenAI's documentation, but anecdotal reports suggest 30–60 days after the last conversation where the file was referenced). After that window, the file is deleted from OpenAI's servers. Your conversations.json retains the file name and the conversation that discussed the file, but the file itself is gone from OpenAI's side.

The practical implication: ChatGPT is not a file storage service. Files you upload are session context for the duration of your analysis; they are not backed up by OpenAI on your behalf. If you need to preserve the output of a ChatGPT analysis session, download Code Interpreter outputs immediately, save DALL-E images before the session ends, and keep your original uploaded files in your own storage.

What conversations.json actually contains for file uploads

When you upload a file and send a message, the resulting JSON in conversations.json looks something like this:

{
  "id": "msg_abc123",
  "author": { "role": "user" },
  "content": {
    "content_type": "multimodal_text",
    "parts": [
      {
        "content_type": "text",
        "text": "Summarize the key findings in this report."
      },
      {
        "content_type": "image_asset_pointer",
        "asset_pointer": "file-service://file-XXXXXXXXXXXXXXXX",
        "size_bytes": 1234567,
        "width": null,
        "height": null,
        "fovea": null,
        "metadata": {
          "dalle": null,
          "gizmo": null,
          "sanitized": false,
          "is_file_upload": true,
          "file_name": "q3-strategy-report.pdf",
          "file_type": "pdf"
        }
      }
    ]
  }
}

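Key fields to know (as they appear in the sample above):

asset_pointer: an internal file-service:// reference to the uploaded file, not a web URL you can open in a browser.
size_bytes: the size of the uploaded file.
metadata.file_name and metadata.file_type: the original file name and type, which is what you need to locate your local copy.
metadata.is_file_upload: marks the part as a user upload rather than generated content.
metadata.dalle: null for uploads; populated only for DALL-E generations.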

The binary content of the file is nowhere in conversations.json. If you grep for the file content (say, a specific string from the PDF), you may find ChatGPT's summary or quoted text from the document in the assistant's response messages — but only what ChatGPT chose to quote, not the full file.
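To see which files a given export references, the structure shown above can be walked with a short script. This is a minimal sketch that assumes the field names from the sample (parts, metadata, is_file_upload, file_name) and the top-level mapping layout used elsewhere on this page; real exports may differ by version, so treat the keys as assumptions. The inline sample_conversations data is made up so the sketch runs on its own.

```python
import json  # used when loading a real export, see the note below


def iter_uploaded_files(conversations):
    """Yield (conversation title, file name) for each upload reference found."""
    for conv in conversations:
        title = conv.get("title") or "(untitled)"
        for node in (conv.get("mapping") or {}).values():
            # A node's "message" can be null, so fall back to an empty dict.
            msg = node.get("message") or {}
            content = msg.get("content") or {}
            for part in content.get("parts") or []:
                # Plain-text parts may be bare strings; file references are dicts.
                if not isinstance(part, dict):
                    continue
                meta = part.get("metadata") or {}
                # Assumed keys, taken from the sample JSON above.
                if meta.get("is_file_upload") and meta.get("file_name"):
                    yield title, meta["file_name"]


# Made-up sample mirroring the structure shown above.
sample_conversations = [{
    "title": "Q3 strategy review",
    "mapping": {
        "n1": {"message": None},
        "n2": {"message": {
            "author": {"role": "user"},
            "content": {
                "content_type": "multimodal_text",
                "parts": [
                    "Summarize the key findings in this report.",
                    {
                        "content_type": "image_asset_pointer",
                        "metadata": {
                            "is_file_upload": True,
                            "file_name": "q3-strategy-report.pdf",
                        },
                    },
                ],
            },
        }},
    },
}]

for title, name in iter_uploaded_files(sample_conversations):
    print(f"{title}: {name}")
```

To run it on a real export, load the file with json.load(open('conversations.json', encoding='utf-8')) and pass the result to iter_uploaded_files. The output is a checklist of file names to look for in your own storage.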

The DALL-E case: expiring URLs you may not have saved

DALL-E generated images are the most commonly lost content category because users often don't realize the URLs expire. When ChatGPT generates an image with DALL-E, conversations.json contains something like:

{
  "content_type": "image_asset_pointer",
  "asset_pointer": "file-service://file-YYYYYYYYYYYYYYYY",
  "metadata": {
    "dalle": {
      "gen_id": "gen-ZZZZZZZZZZ",
      "prompt": "a minimalist flat-design illustration of...",
      "seed": 1234567890,
      "parent_gen_id": null,
      "edit_op": null,
      "serialization_title": "DALL-E generation"
    }
  }
}

And in some versions of the export, you'll also find a parts array where the image is represented as a URL:

{
  "content_type": "tether_browsing_display",
  "result": "https://oaidalleapiprodscus.blob.core.windows.net/private/org-XXX/user-YYY/img-ZZZ.png?..."
}

That URL has a signed expiry embedded in the query string. After the expiry (typically 30 days from generation), the URL returns a 403 Forbidden or 404 Not Found. OpenAI does not offer a way to regenerate expired DALL-E image URLs through the user interface or the data export.
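If you want to check whether a URL from your export is still live before relying on it, the expiry can be read from the query string. The sketch below assumes the URL is a standard Azure Blob Storage shared access signature, where the se parameter carries the expiry timestamp; the export sample above elides the query string, so that is an assumption, and the example URL is made up.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


def signed_url_expiry(url):
    """Return the expiry as a UTC datetime, or None if there is no 'se' parameter."""
    values = parse_qs(urlparse(url).query).get("se")
    if not values:
        return None
    # SAS timestamps look like 2024-05-01T12:00:00Z; swap the Z suffix for an
    # explicit offset so datetime.fromisoformat accepts it on older Pythons.
    expiry = datetime.fromisoformat(values[0].replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        # Date-only values carry no offset; treat them as UTC.
        expiry = expiry.replace(tzinfo=timezone.utc)
    return expiry


def is_expired(url, now=None):
    """True or False if the URL has an expiry, None if it cannot be determined."""
    expiry = signed_url_expiry(url)
    if expiry is None:
        return None
    return (now or datetime.now(timezone.utc)) >= expiry


# Hypothetical URL for illustration only.
example_url = (
    "https://example.blob.core.windows.net/private/img.png"
    "?st=2024-04-01T12%3A00%3A00Z&se=2024-05-01T12%3A00%3A00Z&sig=abc"
)
print(signed_url_expiry(example_url), is_expired(example_url))
```

A None result means the URL carries no recognizable expiry, in which case the only reliable check is to request it and see whether it still resolves.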

What you do have after expiry: the DALL-E prompt (the exact text description you or ChatGPT used to generate the image). If you need to recreate the image, you can reuse that prompt. The output won't be pixel-identical, because DALL-E generation is stochastic, but the same prompt will produce a similar result. The gen_id and seed values in conversations.json are not accepted as inputs to regenerate the exact image via the API.
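Since the prompt is the one piece that survives, it can be worth pulling every DALL-E prompt out of the export in one pass. This is a rough sketch that assumes the metadata.dalle layout from the sample above (prompt and gen_id keys) and the mapping structure used elsewhere on this page; the inline sample_export data is made up.

```python
def iter_dalle_prompts(conversations):
    """Yield one dict per DALL-E generation with its conversation, gen_id and prompt."""
    for conv in conversations:
        title = conv.get("title") or "(untitled)"
        for node in (conv.get("mapping") or {}).values():
            msg = node.get("message") or {}  # "message" can be null
            content = msg.get("content") or {}
            for part in content.get("parts") or []:
                if not isinstance(part, dict):
                    continue  # plain-text parts carry no image metadata
                # metadata.dalle is null for uploads, a dict for generations.
                dalle = (part.get("metadata") or {}).get("dalle")
                if dalle and dalle.get("prompt"):
                    yield {
                        "conversation": title,
                        "gen_id": dalle.get("gen_id"),
                        "prompt": dalle["prompt"],
                    }


# Made-up sample mirroring the DALL-E structure shown above.
sample_export = [{
    "title": "Landing page art",
    "mapping": {
        "n1": {"message": {
            "author": {"role": "tool"},
            "content": {
                "content_type": "multimodal_text",
                "parts": [
                    {
                        "content_type": "image_asset_pointer",
                        "asset_pointer": "file-service://file-YYYYYYYYYYYYYYYY",
                        "metadata": {"dalle": {
                            "gen_id": "gen-ZZZZZZZZZZ",
                            "prompt": "a minimalist flat-design illustration of...",
                            "seed": 1234567890,
                        }},
                    },
                    # An upload reference: dalle is null, so it is skipped.
                    {"content_type": "image_asset_pointer",
                     "metadata": {"dalle": None, "is_file_upload": True}},
                ],
            },
        }},
    },
}]

for item in iter_dalle_prompts(sample_export):
    print(item["conversation"], "->", item["prompt"])
```

Saving these prompts to a text file gives you a starting point for regenerating similar images later, even after the original URLs have expired.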

Prevention: Right-click, Save Image As on every DALL-E image you intend to keep. Do this during the session or within a few days of generation — don't rely on the conversations.json URL as a permanent reference.

Code Interpreter: what survives and what doesn't

Code Interpreter (now called Advanced Data Analysis in some ChatGPT interfaces) sessions produce several types of content with different export behavior:

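What IS in conversations.json for Code Interpreter sessions

The Python code that ChatGPT wrote and ran (in the assistant's code cell messages), the text descriptions of each output, and the surrounding conversation that interprets the results.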

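What is NOT in conversations.json for Code Interpreter sessions

The binary output files themselves: generated charts (.png), processed datasets (.csv, .xlsx), and any other files the session produced. These have to be downloaded during the session via the Download button.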

Reproducing Code Interpreter outputs from conversations.json

Because the Python code is preserved in conversations.json, many Code Interpreter analyses are reproducible locally:

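# Extract all Python code from a conversations.json export
import json
import re

# Matches fenced Python blocks embedded in assistant message text
FENCED_PYTHON = re.compile(r'```python\n(.*?)```', re.DOTALL)

with open('conversations.json', encoding='utf-8') as f:
    data = json.load(f)

for conv in data:
    for node in (conv.get('mapping') or {}).values():
        # Some nodes carry "message": null, so guard with `or {}`
        msg = node.get('message') or {}
        if (msg.get('author') or {}).get('role') != 'assistant':
            continue
        content = msg.get('content') or {}
        blocks = []
        # Assumption: code cells are stored with content_type "code"
        # and the source in a "text" field; adjust if your export differs
        if content.get('content_type') == 'code' and content.get('text'):
            blocks.append(content['text'])
        for part in content.get('parts') or []:
            if isinstance(part, str):
                blocks.extend(FENCED_PYTHON.findall(part))
        for block in blocks:
            print(block)
            print('---')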
Run this against your conversations.json to extract all Python code from Code Interpreter sessions. You can then run the code locally (pip install the required libraries if needed) using your original input data files to regenerate the outputs.

The contrarian insight: conversation text is the most portable asset

Most users assume that the files they brought to ChatGPT are the fragile part — they uploaded them, so ChatGPT should return them. In practice, the opposite is true. Files you uploaded still exist on your local machine. The fragile content is what ChatGPT generated: the charts, the DALL-E images, the processed datasets — these are the artifacts at risk of permanent loss. Your conversation text (the reasoning, the analysis, the decisions you worked through) is the most portable part of the export. It's fully in conversations.json, fully readable, and not subject to expiry.

This has a practical implication for how to treat ChatGPT sessions. The conversation thread where you worked through a decision — weighing trade-offs, considering alternatives, reaching a conclusion — is permanently preserved in your data export. The DALL-E visualization you generated to illustrate the decision is not. Treat text as permanent; treat generated binaries as ephemeral.

Project Knowledge files: same pattern, additional gap

If you use ChatGPT Projects, the Project Knowledge files — PDFs, documents, and reference materials you uploaded to a Project's knowledge base — follow the same inclusion rules as session uploads. The Project's metadata (custom instructions, bound Custom GPT ID, file manifest) is in a sibling projects.json file in the export ZIP. But the actual content of the knowledge files is not included in the export.

The projects.json entry for a Project includes a files array with file names and sizes — confirming which files were in the knowledge base — but the binary content is absent. This is the same gap as regular session uploads, now extended to Project-persistent knowledge. Teams that use ChatGPT Projects as a team knowledge base and want to export or migrate that knowledge need to maintain their own copies of the source files.
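Before migrating a Project, it can help to check which knowledge files you still hold copies of. The sketch below compares a Project's file manifest against the names in a local folder. The key names ("files", "name", "size") are hypothetical, since the exact projects.json schema is not documented here; adjust them to match your export. The sample_project entry is made up.

```python
from pathlib import Path


def local_file_names(directory):
    """Return the names of regular files directly inside `directory`."""
    return {p.name for p in Path(directory).iterdir() if p.is_file()}


def missing_knowledge_files(project, available_names):
    """List manifest file names that are not among `available_names`."""
    missing = []
    # "files" and "name" are assumed keys, not a documented schema.
    for entry in project.get("files") or []:
        name = entry.get("name")
        if name and name not in available_names:
            missing.append(name)
    return missing


# Hypothetical projects.json entry for illustration only.
sample_project = {
    "name": "Pricing research",
    "files": [
        {"name": "competitor-pricing.pdf", "size": 482113},
        {"name": "survey-results.csv", "size": 90211},
    ],
}

# In practice: have = local_file_names("path/to/your/source-files")
have = {"survey-results.csv"}
print(missing_knowledge_files(sample_project, have))
```

Any name this reports is a file that exists only in the Project's knowledge base, so it should be tracked down from its original source before you depend on the export alone.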

How WhyChose fits in

The conversation content that IS in your conversations.json export — the reasoning you worked through, the trade-offs you named, the decisions you reached — is exactly what the WhyChose extractor targets. Even when the supporting files (the PDF you analyzed, the chart that illustrated the decision) are gone, the decision thread is preserved. The extractor identifies decision-shaped conversations and structures them as a searchable log: "Chose Postgres over MongoDB for the user service," "Priced Pro at $9 after analyzing the $7 vs $12 trade-off." The conversation text where that reasoning happened is fully recoverable from conversations.json, regardless of whether the files attached to that conversation are still available.

Related questions

Does the ChatGPT data export include files I uploaded?

No. The ChatGPT data export (conversations.json) includes the file name and the conversation text that references the file, but not the binary content of the file itself. This applies to PDFs, images, spreadsheets, and any other files you uploaded. The file is treated as temporary session context, not permanent user data. To recover uploaded files, use your original local copy — you have it; ChatGPT doesn't keep it permanently.

Where are my Code Interpreter outputs in the ChatGPT data export?

Code Interpreter outputs (charts, processed CSVs, generated files) are not included as binaries in conversations.json. The Python code that generated the outputs IS in the export (in the assistant's code cell messages), as are text descriptions of the outputs. The actual .png, .csv, or .xlsx files must be downloaded during the session; afterward they expire on roughly the same 30-day schedule as other ChatGPT-generated content. To regenerate them, extract the Python code from conversations.json (using the script on this page) and re-run it locally with your original input data.

Why are my DALL-E image URLs broken in the ChatGPT export?

DALL-E images appear in conversations.json as signed CDN URLs that expire approximately 30 days after generation. After expiry, the URL returns 403 or 404. There's no recovery path once the URL has expired. To prevent this: download DALL-E images immediately after generation (right-click, Save Image As) during the session. The DALL-E prompt that generated the image is preserved in conversations.json, so you can regenerate a similar image — but not the exact same pixel output.

How do I recover files I uploaded to ChatGPT?

Files you uploaded to ChatGPT (PDFs, images, spreadsheets) are your own documents. They still exist wherever you originally had them — your local machine, Google Drive, Dropbox, or wherever you stored them. ChatGPT does not keep permanent copies of uploaded files and does not include their binary content in data exports. The conversation text that discussed the file content is in conversations.json. If the file came from a one-time download and you deleted it locally, check your browser's Downloads folder, your Trash/Recycle Bin, and any backups you maintain.