Topic: ChatGPT image generation export

ChatGPT Image Generation in the Data Export — DALL-E Prompts, CDN URLs, and What Expires

Engineers and designers who use ChatGPT to generate UI mockups, architecture diagrams, brand assets, and design explorations often discover a painful fact when they export their conversation history: the image files are not in the ZIP. The conversations.json export preserves the prompt text and a CDN URL for each generated image — but the CDN URL expires roughly 30 days after the image was created, after which the image is gone from OpenAI's servers permanently. This page explains exactly how DALL-E 3 and GPT-4o native image generation appear in conversations.json, how to extract your image generation history, and what recovery options exist when the CDN URLs have already expired.

TL;DR

DALL-E 3 images: the export preserves your original prompt, the DALL-E revised prompt (which may differ substantially), and a CDN URL that expires approximately 30 days post-generation. The image binary is not in the ZIP. DALL-E invocations appear as tool messages with author.name: "dalle.text2im" in the mapping DAG.

GPT-4o native image generation: appears as content_type: "image_asset_pointer" with an internal file reference URI, which is not a public URL and is not downloadable from the export.

Recovery after expiry: regenerate from the preserved prompt text (revised_prompt for DALL-E 3). The prompt is always preserved regardless of image URL status.

Capture strategy: download images at generation time; don't rely on CDN URLs as an archive.

What the ChatGPT data export does and doesn't include for images

| Item | In conversations.json? | As a downloadable file? | Expiry? | Notes |
|---|---|---|---|---|
| Your original image prompt (what you typed) | Yes, in the user message content | n/a (text) | Never (permanent) | Preserved in the parts array of the user message turn |
| DALL-E revised prompt (what DALL-E actually used) | Yes, in the dalle.text2im tool invocation | n/a (text) | Never (permanent) | Often significantly different from your original prompt; contains the full revised instruction |
| DALL-E image CDN URL | Yes, in the dalle.text2im tool result | Not in the ZIP; must fetch the URL while it's live | ~30 days from generation | URL format: https://files.oaiusercontent.com/file-XXXXX?.... Returns HTTP 404 after expiry. |
| DALL-E image binary | No | No | ~30 days | OpenAI does not store image binaries indefinitely; only the CDN URL is preserved in the export |
| GPT-4o native image (content_type: image_asset_pointer) | Yes, as an internal asset_pointer URI | No; the URI is an internal reference, not a public URL | Unknown; may be shorter than DALL-E CDN | Worse export coverage than DALL-E 3; no public URL path exists |
| Image generation model version | Yes, as model_slug (e.g., gpt-4-gizmo-dalle) | n/a | Never | Identifies whether DALL-E 3, DALL-E 2, or GPT-4o native generation was used |
| Number of images generated per prompt | Partially; each image is a separate URL in the tool result | No | Per URL | When you ask for multiple images, each gets its own CDN URL in the tool result content |

The critical asymmetry: the text-based components of image generation (your prompt, the revised prompt, the conversation around the image) are preserved permanently in conversations.json. The image itself is ephemeral. If you need the image file, download it at generation time — do not assume the CDN URL will remain accessible.

How DALL-E 3 image generation appears in conversations.json

Understanding the schema is necessary before you can write the jq queries to extract image generation history. DALL-E 3 image generation produces three types of messages in the mapping DAG:

1. The user prompt message

A standard user message with your image description in the parts array:

{
  "id": "user-turn-uuid",
  "message": {
    "id": "msg-uuid",
    "author": { "role": "user" },
    "content": {
      "content_type": "text",
      "parts": ["Generate a diagram showing a three-tier web architecture with load balancer, application servers, and database cluster"]
    },
    "create_time": 1738234567.123
  },
  "parent": "parent-uuid",
  "children": ["tool-invocation-uuid"]
}

2. The DALL-E tool invocation message

This is the message that identifies image generation sessions. The key identifier is author.name: "dalle.text2im":

{
  "id": "tool-invocation-uuid",
  "message": {
    "id": "msg-uuid",
    "author": {
      "role": "tool",
      "name": "dalle.text2im"
    },
    "content": {
      "content_type": "tether_browsing_display",
      "result": "{\"size\":\"1792x1024\",\"prompt\":\"A professional technical diagram showing a three-tier web architecture. At the top, a load balancer distributes traffic to three application server nodes in the middle tier, shown as blue rectangles with server icons. At the bottom, a primary PostgreSQL database cluster flanked by two read replicas, shown as dark cylinders. Clean white background, minimal style, architecture diagram aesthetic.\",\"dalle_prompt\":\"...\"}",
      "parts": [
        {
          "content_type": "image_asset_pointer",
          "asset_pointer": "file-service://file-XXXXXXXXXXXXX",
          "size_bytes": 1234567,
          "width": 1792,
          "height": 1024
        }
      ]
    },
    "create_time": 1738234570.456
  },
  "parent": "user-turn-uuid",
  "children": ["assistant-followup-uuid"]
}

Note: the exact schema varies by ChatGPT version. Older DALL-E 3 sessions (pre-2025) may carry the revised prompt as a text field inside the JSON string stored in result (the prompt key in the example above). Newer sessions use the parts array structure shown above. Both variants include the prompt; only the field names differ.

3. The assistant follow-up message

The assistant's response after image generation — typically includes a Markdown image embed with the CDN URL and a text description of what was generated:

{
  "id": "assistant-followup-uuid",
  "message": {
    "id": "msg-uuid",
    "author": { "role": "assistant" },
    "content": {
      "content_type": "text",
      "parts": ["Here's your three-tier architecture diagram:\n\n![Architecture diagram](https://files.oaiusercontent.com/file-XXXXX?se=2025-03-15T12%3A30%3A00Z&sp=r&...)\n\nThe diagram shows the load balancer at the top routing traffic to three application servers..."]
    }
  }
}

The CDN URL appears in the Markdown image syntax ![alt](url) embedded in the assistant's text. This is a second location where you can find the URL — in addition to the tool invocation message.
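As a minimal sketch of pulling the URL out of that Markdown embed, the Python snippet below scans a message string for ![alt](url) pairs. The function name markdown_images and the sample text are illustrative assumptions; the regex only assumes that the URL contains no whitespace and no closing parenthesis.

```python
import re

# Matches Markdown image embeds of the form ![alt text](url).
# Assumption: the URL itself contains no whitespace and no ")" character.
MD_IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def markdown_images(text):
    """Return a list of (alt_text, url) pairs found in a message string."""
    return MD_IMAGE.findall(text)


# Hypothetical assistant follow-up text, shaped like the example above.
sample = (
    "Here's your three-tier architecture diagram:\n\n"
    "![Architecture diagram](https://files.oaiusercontent.com/file-XXXXX"
    "?se=2025-03-15T12%3A30%3A00Z&sp=r)\n\n"
    "The diagram shows the load balancer at the top."
)

for alt, url in markdown_images(sample):
    print(alt, "->", url)
```

Keeping the alt text alongside the URL gives each image a human-readable label when the URL itself is only an opaque file ID.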

jq recipes for image generation extraction

Count image generation sessions

jq '[
  .[] | select(
    # any() yields a single boolean per conversation, so each one is counted once
    any(.mapping[]?; .message.author.name? == "dalle.text2im")
  )
] | length' conversations.json

Returns the number of conversations that contain at least one DALL-E image generation.

Extract all image prompts (what you asked for)

jq -r '
  [.[] |
    .mapping | to_entries[].value |
    select(.message.author.role? == "user") |
    .message.content.parts[]? |
    select(type == "string")
  ] | .[]
' conversations.json

This extracts the user-turn text from all conversations. To filter for conversations that contain image generation, chain with the DALL-E session filter.
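One way to do that chaining is to follow each user node's children links and keep only the turns that led to a DALL-E node. The Python sketch below assumes, as in the schema examples above, that the dalle.text2im node is a direct child of the user turn; if your export places another message between them, the walk would need an extra hop. The helper names and the sample conversation are hypothetical.

```python
def author_name(node):
    """Return message.author.name for a mapping node, tolerating null fields."""
    message = (node or {}).get("message") or {}
    return (message.get("author") or {}).get("name")


def image_prompts(conversation):
    """Yield user-turn text whose direct child is a dalle.text2im node."""
    mapping = conversation.get("mapping") or {}
    for node in mapping.values():
        message = node.get("message") or {}
        if (message.get("author") or {}).get("role") != "user":
            continue
        # Follow the children links to see whether DALL-E ran on this turn.
        child_names = {author_name(mapping.get(cid)) for cid in node.get("children") or []}
        if "dalle.text2im" not in child_names:
            continue
        for part in (message.get("content") or {}).get("parts") or []:
            if isinstance(part, str):
                yield part


# Tiny hand-made conversation (hypothetical data in the shape shown above).
sample = {
    "title": "demo",
    "mapping": {
        "root": {"message": None, "parent": None, "children": ["u1", "u2"]},
        "u1": {
            "message": {"author": {"role": "user"},
                        "content": {"content_type": "text",
                                    "parts": ["Draw a three-tier architecture"]}},
            "parent": "root", "children": ["t1"],
        },
        "t1": {
            "message": {"author": {"name": "dalle.text2im"},
                        "content": {"parts": []}},
            "parent": "u1", "children": [],
        },
        "u2": {
            "message": {"author": {"role": "user"},
                        "content": {"content_type": "text",
                                    "parts": ["What is a load balancer?"]}},
            "parent": "root", "children": [],
        },
    },
}

print(list(image_prompts(sample)))
```

To run it on a real export, load conversations.json with json.load and call image_prompts on each conversation in the resulting list.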

Extract all CDN URLs from the assistant follow-up messages

jq -r '
  [.[] |
    .mapping | to_entries[].value |
    select(.message.author.role? == "assistant" and
           (.message.author.name? != "dalle.text2im")) |
    .message.content.parts[]? |
    select(type == "string") |
    scan("https://files\\.oaiusercontent\\.com/[^)\"\\s]+")
  ] | .[]
' conversations.json

This scans assistant message text for CDN URLs matching the files.oaiusercontent.com pattern. Output is one URL per line, suitable for piping to a bulk-download script.
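Before handing the list to a downloader, it can help to de-duplicate it (the same URL may appear in more than one message) and to derive a stable local filename for each entry. The Python sketch below does only that preparation step and makes no network calls. The names unique_urls and local_filename are hypothetical, and the .webp default extension is an assumption; check the actual Content-Type when you download.

```python
from urllib.parse import urlparse


def unique_urls(lines):
    """Strip whitespace, drop blank lines, and de-duplicate while keeping order."""
    seen = {}
    for line in lines:
        url = line.strip()
        if url:
            seen.setdefault(url, None)
    return list(seen)


def local_filename(url, extension=".webp"):
    """Derive a local filename from the last path segment of a CDN URL.

    Assumption: the last path segment (for example "file-XXXXX") identifies
    the image. The extension is a guess, not something read from the URL.
    """
    segment = urlparse(url).path.rstrip("/").split("/")[-1]
    # Keep only characters that are safe in filenames on common systems.
    safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in segment)
    return (safe or "image") + extension


# Hypothetical extractor output, including a duplicate and a blank line.
urls = unique_urls([
    "https://files.oaiusercontent.com/file-AAA?se=2025-03-15T12%3A30%3A00Z&sp=r\n",
    "\n",
    "https://files.oaiusercontent.com/file-AAA?se=2025-03-15T12%3A30%3A00Z&sp=r\n",
    "https://files.oaiusercontent.com/file-BBB?se=2025-03-16T09%3A00%3A00Z&sp=r\n",
])

for url in urls:
    print(local_filename(url), url)
```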

Full image generation inventory: prompt + URL + date

#!/usr/bin/env bash
# Produces a TSV: conversation_title | create_date | user_prompt | cdn_url

jq -r '
  .[] | . as $conv |
  .mapping | to_entries[] | . as $entry |
  select($entry.value.message.author.name? == "dalle.text2im") |
  {
    title: $conv.title,
    date: (($entry.value.message.create_time // 0) | floor | todate),
    parent_id: $entry.value.parent
  } as $meta |
  # Get the user prompt from the parent node
  ($conv.mapping | to_entries[] |
    select(.key == $meta.parent_id) |
    .value.message.content.parts[]? |
    select(type == "string")) as $prompt |
  # Get the CDN URL from the children (assistant follow-up)
  ($conv.mapping | to_entries[] |
    select(.value.parent == $entry.key) |
    .value.message.content.parts[]? |
    select(type == "string") |
    scan("https://files\\.oaiusercontent\\.com/[^)\"\\s]+")
  ) as $url |
  [$meta.title, $meta.date, $prompt, $url] | @tsv
' conversations.json

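Save this as extract-images.sh and run bash extract-images.sh > image-inventory.tsv. The result is a tab-separated file you can open in a spreadsheet application for review. A DALL-E node whose parent has no text part, or whose children contain no CDN URL, produces no row, so treat the inventory as a list of recoverable URLs rather than a complete log of every generation.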

Bulk-check which CDN URLs are still live

#!/usr/bin/env bash
# Read CDN URLs from a file (one per line) and report live vs expired

while IFS= read -r url; do
  status=$(curl -s -o /dev/null -w "%{http_code}" --head --max-time 5 "$url")
  if [ "$status" = "200" ]; then
    echo "LIVE $url"
  else
    echo "EXPIRED($status) $url"
  fi
done < cdn-urls.txt

Redirect the output of the CDN URL extraction recipe above to cdn-urls.txt, then run this script to identify which images are still downloadable and which have expired. For images still returning HTTP 200, download with curl -o output-filename.webp "URL" before they expire.

GPT-4o native image generation

GPT-4o's native image generation capability (launched in 2025, distinct from the DALL-E 3 tool) produces images that appear differently in conversations.json than DALL-E 3 invocations. The key difference: GPT-4o native images are referenced via internal asset_pointer URIs rather than public CDN URLs.

How to identify GPT-4o native image generation in your export:

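jq '[
  .[] | .mapping[]? |
  # Skip DALL-E tool nodes and user uploads, which can also carry image_asset_pointer parts
  select(.message.author.name? != "dalle.text2im" and .message.author.role? != "user") |
  select(
    any(.message.content.parts[]?; type == "object" and .content_type == "image_asset_pointer")
  )
] | length' conversations.json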

The asset_pointer field contains an internal URI like file-service://file-XXXXXXXXXXXXX. This is not a URL you can fetch with curl. There is no documented public endpoint that resolves these internal file references from outside ChatGPT's authenticated session. The image is referenced in your export data, but it is not accessible from the export alone.

The practical difference from DALL-E 3:

| Attribute | DALL-E 3 (tool invocation) | GPT-4o native image generation |
|---|---|---|
| Identifier in export | author.name: "dalle.text2im" | content_type: "image_asset_pointer" in parts array |
| Image URL type | Public CDN URL (files.oaiusercontent.com) | Internal file reference URI (file-service://file-XXX) |
| Downloadable from export? | Yes, while CDN URL is live (~30 days) | No: internal URI, no public endpoint |
| Prompt preserved? | Yes, both original and revised prompt | Yes, in the conversation turn where image was requested |
| Recovery path when image lost | Regenerate from preserved revised_prompt | Regenerate from original user prompt in conversation |
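Because DALL-E tool nodes and user uploads can also carry image_asset_pointer parts in this export layout, a script usually needs the author fields to tell the cases apart. The Python sketch below labels each image-bearing node using those heuristics. The labels, the function name classify_image_nodes, and the rule that anything which is neither a DALL-E node nor a user message counts as native generation are assumptions, not a documented schema.

```python
def classify_image_nodes(conversation):
    """Return {node_id: label} for every node with an image_asset_pointer part.

    Heuristic labels (assumptions, not a documented schema):
      "dalle"  - author.name is "dalle.text2im"
      "upload" - author.role is "user" (an image the user attached)
      "native" - anything else, treated here as GPT-4o native generation
    """
    labels = {}
    for node_id, node in (conversation.get("mapping") or {}).items():
        message = node.get("message") or {}
        author = message.get("author") or {}
        parts = (message.get("content") or {}).get("parts") or []
        has_pointer = any(
            isinstance(p, dict) and p.get("content_type") == "image_asset_pointer"
            for p in parts
        )
        if not has_pointer:
            continue
        if author.get("name") == "dalle.text2im":
            labels[node_id] = "dalle"
        elif author.get("role") == "user":
            labels[node_id] = "upload"
        else:
            labels[node_id] = "native"
    return labels


def pointer(file_id):
    """Build a hypothetical image_asset_pointer part for the demo data."""
    return {"content_type": "image_asset_pointer",
            "asset_pointer": "file-service://" + file_id}


sample = {
    "mapping": {
        "u1": {"message": {"author": {"role": "user"},
                           "content": {"parts": [pointer("file-UPLOAD"), "What is this?"]}}},
        "t1": {"message": {"author": {"name": "dalle.text2im"},
                           "content": {"parts": [pointer("file-DALLE")]}}},
        "a1": {"message": {"author": {"role": "assistant"},
                           "content": {"parts": [pointer("file-NATIVE")]}}},
        "a2": {"message": {"author": {"role": "assistant"},
                           "content": {"parts": ["plain text only"]}}},
    }
}

print(classify_image_nodes(sample))
```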

The CDN URL expiry timeline

Based on observed behaviour in conversations.json exports, DALL-E 3 CDN URLs expire roughly 30 days after generation. The expiry timestamp is encoded in the CDN URL itself, in the se query parameter. Extract it with:

python3 -c "
from urllib.parse import urlparse, parse_qs
import sys
url = sys.stdin.read().strip()
params = parse_qs(urlparse(url).query)
print('Expires:', params.get('se', ['not found'])[0])
" <<< "https://files.oaiusercontent.com/file-XXXXX?se=2025-03-15T12%3A00%3A00Z&sp=r&..."

This prints the expiry date for any CDN URL you extract from conversations.json. Images generated more than 30 days before your export date will have already expired; images generated within 30 days of the export may still be downloadable if you act immediately after downloading the ZIP.
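To triage a whole list of URLs without making any network requests, the same se parameter can be compared against the current time. The Python sketch below assumes the timestamp has the form shown in the example URL (for instance 2025-03-15T12:00:00Z, in UTC); the function names expiry_of and is_expired are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs


def expiry_of(url):
    """Return the `se` timestamp of a CDN URL as an aware datetime, or None."""
    values = parse_qs(urlparse(url).query).get("se")
    if not values:
        return None
    # parse_qs percent-decodes the value, so it looks like 2025-03-15T12:00:00Z.
    try:
        parsed = datetime.strptime(values[0], "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return None  # Unexpected format: report "unknown" rather than guess.
    return parsed.replace(tzinfo=timezone.utc)


def is_expired(url, now=None):
    """True if the `se` time has passed, False if not, None if it is unreadable."""
    expires = expiry_of(url)
    if expires is None:
        return None
    now = now or datetime.now(timezone.utc)
    return now >= expires


url = "https://files.oaiusercontent.com/file-XXXXX?se=2025-03-15T12%3A00%3A00Z&sp=r"
print("Expires:", expiry_of(url))
print("Expired now?", is_expired(url))
```

A URL that this check reports as live can still fail for other reasons, so it complements the HTTP status check rather than replacing it.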

Recovery strategy when images have already expired

The preserved prompt text is your primary recovery asset. DALL-E 3 preserves the revised_prompt — the full, detailed prompt that DALL-E actually used to generate the image — which is typically more detailed and useful for regeneration than the original short prompt you typed.

Regeneration workflow:

  1. Extract the revised_prompt from the DALL-E tool invocation using the script under "Extracting revised prompts for regeneration" below.
  2. Submit the revised_prompt verbatim to DALL-E 3 in a new ChatGPT conversation. The result will not be pixel-identical (generative models produce different outputs on each run) but will be stylistically consistent with the original.
  3. If exact consistency is required, note that DALL-E 3 via ChatGPT does not expose a seed parameter in the public UI, and the OpenAI Images API (POST /v1/images/generations with model: dall-e-3) does not document one either. Treat regeneration as producing a close variant, not a reproduction of the original.

For design work where the image represents a specific decision or specification, the revised_prompt itself may be more valuable than the image — it is an unambiguous, machine-readable specification of what was designed.

Extracting revised prompts for regeneration

#!/usr/bin/env bash
# Extract all DALL-E revised prompts from conversations.json
# Output: dated records separated by "---", sorted by date

python3 <<'EOF'
import json
from datetime import datetime

with open("conversations.json", encoding="utf-8") as f:
    data = json.load(f)

prompts = []
for conv in data:
    for node in (conv.get("mapping") or {}).values():
        # Root nodes can carry "message": null, so fall back to empty dicts
        msg = node.get("message") or {}
        if (msg.get("author") or {}).get("name") != "dalle.text2im":
            continue
        content = msg.get("content") or {}
        record = {
            "conversation": conv.get("title") or "",
            "date": msg.get("create_time") or 0,
        }
        # Try parts array (newer format)
        for part in content.get("parts") or []:
            if isinstance(part, dict) and "text" in part:
                prompts.append({**record, "revised_prompt": part["text"]})
        # Try result field (older format): a JSON string with a "prompt" key
        result = content.get("result") or ""
        if result:
            try:
                result_json = json.loads(result)
            except (json.JSONDecodeError, TypeError):
                continue
            if isinstance(result_json, dict) and "prompt" in result_json:
                prompts.append({**record, "revised_prompt": result_json["prompt"]})

for p in sorted(prompts, key=lambda x: x["date"]):
    date_str = datetime.fromtimestamp(p["date"]).strftime("%Y-%m-%d")
    print(f"[{date_str}] {p['conversation']}")
    print(p["revised_prompt"])
    print("---")
EOF

This script handles both the pre-2025 and post-2025 DALL-E 3 schema variants and outputs a human-readable list of every revised prompt in your export, sorted by date.

Capture strategy: don't rely on CDN URLs

The architectural lesson from the CDN URL expiry is the same as the lesson from ChatGPT shared links: content that lives only on OpenAI's servers — rather than being embedded in the export — expires. The export is a snapshot of text and references; it is not a complete archive of all generated artifacts.

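The practical capture strategy for engineers and designers who use ChatGPT image generation for work artifacts is to download each image at generation time and keep it next to the prompt that produced it, rather than treating the CDN URL as an archive.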
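One way to make download-at-generation-time stick is to store each saved image with a small JSON sidecar holding its prompts. The Python sketch below is a hypothetical helper: save_with_sidecar, the sidecar keys, and the .webp default are all assumptions, not part of any OpenAI tooling. The demo writes placeholder bytes to a temporary directory instead of a real image.

```python
import json
import tempfile
from pathlib import Path


def save_with_sidecar(image_bytes, directory, stem, prompt,
                      revised_prompt=None, extension=".webp"):
    """Write the image plus a JSON sidecar holding the prompts next to it."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    image_path = directory / f"{stem}{extension}"
    sidecar_path = directory / f"{stem}.json"
    image_path.write_bytes(image_bytes)
    sidecar = {
        "prompt": prompt,
        "revised_prompt": revised_prompt,
        "image": image_path.name,
    }
    sidecar_path.write_text(json.dumps(sidecar, indent=2), encoding="utf-8")
    return image_path, sidecar_path


# Demo with placeholder bytes in a temporary directory (no real image involved).
with tempfile.TemporaryDirectory() as tmp:
    img, meta = save_with_sidecar(
        b"\x00placeholder",
        tmp,
        "three-tier-architecture",
        prompt="Generate a diagram showing a three-tier web architecture",
    )
    print(img.name, meta.name)
    print(json.loads(meta.read_text(encoding="utf-8"))["prompt"])
```

Storing the prompt beside the binary means the pair survives even if the conversation is later deleted or the export is never taken.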

How image generation sessions compare to other ChatGPT export gaps

| Content type | In conversations.json? | As downloadable binary? | Permanently preserved? |
|---|---|---|---|
| Conversation text (all turns) | Yes, full text | n/a | Yes |
| DALL-E image prompt + revised prompt | Yes | n/a (text) | Yes |
| DALL-E image binary | No (CDN URL only) | No (must fetch live URL) | No (~30 days) |
| GPT-4o native image | Internal URI only | No | No (no public download path) |
| Uploaded image files | Filename only (asset_pointer) | No | No (binary excluded from export) |
| Code Interpreter output files | Code text yes; generated file binaries no | No | No (only code text) |
| ChatGPT Memory entries | In memory.json (separate file in ZIP) | n/a (text) | Yes (until deleted) |

Image generation as a decision-capture surface

For engineers and product designers, ChatGPT image generation sessions often contain specification-quality content: the prompt you wrote to generate an architecture diagram, a UI mockup, or a brand asset is effectively a specification of what you were designing. "Generate a diagram showing event-driven data flow between three microservices using an event bus, with the order-service as producer and inventory-service plus notification-service as consumers" is a more precise description of the intended architecture than most informal ADRs.

The revised_prompt that DALL-E 3 produces from your original prompt is even more detailed — it expands the specification with rendering decisions (style, colour palette, composition) that reflect how the image was actually produced. For design documentation, preserving the revised_prompt alongside the image is equivalent to preserving the compiled output alongside the source code.

When you use the WhyChose extractor on your conversations.json, image generation turns are processed differently from text turns — the extractor identifies tool-invocation messages and flags them as potential specification artifacts rather than decision rationale turns. For sessions where image generation was the output of a design decision (e.g., you discussed options for an architecture diagram and then generated the winning design), the extractor surfaces the decision context from the text turns, and the image prompt documents the specification that resulted from that decision. Together they form a complete decision record: the reasoning that led to the design and the specification that captured the design itself.


Related questions

Are DALL-E images included in the ChatGPT data export?

Not as downloadable image files. The ChatGPT export preserves the original prompt, the DALL-E revised prompt, and a CDN URL for each generated image. The CDN URL expires approximately 30 days after generation and returns HTTP 404 after expiry. The image binary itself is not in the ZIP — you must download images during the live CDN window or regenerate them from the preserved prompt text.

How do I find image generation sessions in my conversations.json export?

DALL-E 3 invocations appear as tool messages with author.name: "dalle.text2im" in the mapping DAG. Run: jq '[.[] | select(any(.mapping[]?; .message.author.name? == "dalle.text2im"))] | length' conversations.json to count conversations with image generation. For GPT-4o native images, look for content_type: "image_asset_pointer" in message parts arrays.

What happens to GPT-4o native image generation in the export?

GPT-4o native image generation (from 2025 onwards) appears as content_type: "image_asset_pointer" with an internal file-service://file-XXX URI. This is not a public URL — it cannot be fetched with curl or a browser outside an authenticated ChatGPT session. There is no supported download path for GPT-4o native images from the data export. Only the prompt text that produced the image is recoverable.

What is the revised_prompt field and why is it different from my original prompt?

DALL-E 3 automatically rewrites your prompt before generating images to improve quality and safety compliance. The revised_prompt in the export is what DALL-E actually used — often significantly more detailed than what you typed. For a 5-word original prompt, the revised prompt may be 100+ words. Both are valuable: the original reflects your intent (the requirement), the revised prompt reflects the specification that was built. The revised prompt is often more useful for regeneration because it contains the rendering instructions that produced the original output style.
