Topic: ChatGPT image generation export

ChatGPT Image Generation in the Data Export — DALL-E Prompts, CDN URLs, and What Expires

Q: Are DALL-E images included in the ChatGPT data export?

Not as downloadable image files. The ChatGPT data export (Settings → Data Controls → Export Data) includes conversations.json, which contains the text of every conversation including image generation sessions. For each image you generated, the export preserves: the original prompt text you wrote, the revised prompt that DALL-E actually used (often different — DALL-E 3 rewrites prompts for quality), and a CDN URL pointing to where the image was hosted. The image binary itself is not included in the ZIP. The CDN URLs expire approximately 30 days after the image was generated and return HTTP 404 after expiry. After the URL expires, the image is gone from OpenAI's servers — but the prompt text is permanently preserved in conversations.json.

Q: How do I find image generation sessions in my conversations.json export?

Image generation sessions contain tool messages with author.name set to 'dalle.text2im' (DALL-E 3), or multimodal content with parts whose content_type is 'image_asset_pointer' (GPT-4o native image generation). The quickest jq filter to count image-generation conversations: jq '[.[] | select(any(.mapping[]; .message.author.name? == "dalle.text2im"))] | length' conversations.json. Wrapping the test in any() counts each conversation once, even when it contains several DALL-E invocations. For a full list of image prompts: jq -r '.[] | .mapping[] | select(.message.author.name? == "dalle.text2im") | .message.content.parts[]? | objects | .text? // empty' conversations.json. This prints the text field of each object part in a DALL-E tool message, which is where the revised prompt sits in exports that use that layout; for messages that keep the prompt in the result field instead, it prints nothing.

Q: What happens to GPT-4o native image generation in the export?

GPT-4o native image generation (available from early 2025, distinct from DALL-E tool calls) appears differently in conversations.json. These images are stored as content_type 'image_asset_pointer' in the message content array, with an asset_pointer URI like file-service://file-XXXXX rather than a public CDN URL. The asset_pointer URI is an internal OpenAI file reference — it does not resolve to a publicly accessible URL. There is no supported path to download GPT-4o natively generated images from the data export; the image is referenced but not downloadable. The prompt text that generated the image is preserved in the conversation turn where you asked for the image. This is a worse export story than DALL-E 3, where at least the CDN URL is accessible for ~30 days post-generation.

Q: What is the revised_prompt field and why is it different from my original prompt?

DALL-E 3 automatically rewrites user prompts before generating images — a feature OpenAI added to improve image quality and safety. The original prompt is what you typed; the revised_prompt (stored in the DALL-E tool invocation in conversations.json) is what DALL-E actually used. The rewrite can be significant: a prompt like 'a dark forest at night' might become 'A dense, ancient forest shrouded in darkness, the gnarled branches of towering oak trees forming a canopy that blocks out the stars. Bioluminescent mushrooms dot the forest floor, providing the only light. Rendered in a painterly style with deep shadows and rich dark greens.' For decision-capture purposes, both prompts matter: the original prompt reflects what you intended (the requirement), and the revised prompt reflects what was actually built (the specification). Keeping both in your archive is analogous to keeping both the RFC and the ADR.

Engineers and designers who use ChatGPT to generate UI mockups, architecture diagrams, brand assets, and design explorations often discover a painful fact when they export their conversation history: the image files are not in the ZIP. The conversations.json export preserves the prompt text and a CDN URL for each generated image — but the CDN URL expires roughly 30 days after the image was created, after which the image is gone from OpenAI's servers permanently. This page explains exactly how DALL-E 3 and GPT-4o native image generation appear in conversations.json, how to extract your image generation history, and what recovery options exist when the CDN URLs have already expired.

TL;DR

DALL-E 3 images: the export preserves your original prompt, the DALL-E revised prompt (which may differ substantially), and a CDN URL that expires approximately 30 days post-generation. The image binary is not in the ZIP. DALL-E invocations appear as tool messages with author.name: "dalle.text2im" in the mapping DAG.
GPT-4o native image generation: appears as content_type: "image_asset_pointer" with an internal file reference URI, which is neither a public URL nor downloadable from the export.
Recovery after expiry: regenerate from the preserved prompt text (revised_prompt for DALL-E 3). The prompt is always preserved regardless of image URL status.
Capture strategy: download images at generation time; don't rely on CDN URLs as an archive.

What the ChatGPT data export does and doesn't include for images

Item	In conversations.json?	As a downloadable file?	Expiry?	Notes
Your original image prompt (what you typed)	Yes — in the user message content	n/a (text)	Never — permanent	Preserved in the `parts` array of the user message turn
DALL-E revised prompt (what DALL-E actually used)	Yes — in the dalle.text2im tool invocation	n/a (text)	Never — permanent	Often significantly different from your original prompt; contains the full revised instruction
DALL-E image CDN URL	Yes — in the dalle.text2im tool result	Not in the ZIP; must fetch the URL while it's live	~30 days from generation	URL format: `https://files.oaiusercontent.com/file-XXXXX?...`. Returns HTTP 404 after expiry.
DALL-E image binary	No	No	~30 days	OpenAI does not store image binaries indefinitely; only the CDN URL is preserved in the export
GPT-4o native image (content_type: image_asset_pointer)	Yes — as an internal asset_pointer URI	No — the URI is an internal reference, not a public URL	Unknown — may be shorter than DALL-E CDN	Worse export coverage than DALL-E 3; no public URL path exists
Image generation model version	Yes — as model_slug (e.g., gpt-4-gizmo-dalle)	n/a	Never	Identifies whether DALL-E 3, DALL-E 2, or GPT-4o native generation was used
Number of images generated per prompt	Partially — each image is a separate URL in the tool result	No	Per URL	When you ask for multiple images, each gets its own CDN URL in the tool result content

The critical asymmetry: the text-based components of image generation (your prompt, the revised prompt, the conversation around the image) are preserved permanently in conversations.json. The image itself is ephemeral. If you need the image file, download it at generation time — do not assume the CDN URL will remain accessible.

How DALL-E 3 image generation appears in conversations.json

Understanding the schema is necessary before you can write the jq queries to extract image generation history. DALL-E 3 image generation produces three types of messages in the mapping DAG:

1. The user prompt message

A standard user message with your image description in the parts array:

{
  "id": "user-turn-uuid",
  "message": {
    "id": "msg-uuid",
    "author": { "role": "user" },
    "content": {
      "content_type": "text",
      "parts": ["Generate a diagram showing a three-tier web architecture with load balancer, application servers, and database cluster"]
    },
    "create_time": 1738234567.123
  },
  "parent": "parent-uuid",
  "children": ["tool-invocation-uuid"]
}

2. The DALL-E tool invocation message

This is the message that identifies image generation sessions. The key identifier is author.name: "dalle.text2im":

{
  "id": "tool-invocation-uuid",
  "message": {
    "id": "msg-uuid",
    "author": {
      "role": "tool",
      "name": "dalle.text2im"
    },
    "content": {
      "content_type": "tether_browsing_display",
      "result": "{\"size\":\"1792x1024\",\"prompt\":\"A professional technical diagram showing a three-tier web architecture. At the top, a load balancer distributes traffic to three application server nodes in the middle tier, shown as blue rectangles with server icons. At the bottom, a primary PostgreSQL database cluster flanked by two read replicas, shown as dark cylinders. Clean white background, minimal style, architecture diagram aesthetic.\",\"dalle_prompt\":\"...\"}",
      "parts": [
        {
          "content_type": "image_asset_pointer",
          "asset_pointer": "file-service://file-XXXXXXXXXXXXX",
          "size_bytes": 1234567,
          "width": 1792,
          "height": 1024
        }
      ]
    },
    "create_time": 1738234570.456
  },
  "parent": "user-turn-uuid",
  "children": ["assistant-followup-uuid"]
}

Note: the exact schema varies by ChatGPT version. Older DALL-E 3 sessions (pre-2025) may store the revised prompt under the prompt key of a JSON string held in the result field. Newer sessions use the parts array structure shown above. Both include the prompt; the field names differ.

3. The assistant follow-up message

The assistant's response after image generation — typically includes a Markdown image embed with the CDN URL and a text description of what was generated:

{
  "id": "assistant-followup-uuid",
  "message": {
    "id": "msg-uuid",
    "author": { "role": "assistant" },
    "content": {
      "content_type": "text",
      "parts": ["Here's your three-tier architecture diagram:\n\n![Architecture diagram](https://files.oaiusercontent.com/file-XXXXX?se=2025-03-15T12%3A30%3A00Z&sp=r&...)\n\nThe diagram shows the load balancer at the top routing traffic to three application servers..."]
    }
  }
}

The CDN URL appears in the Markdown image syntax ![alt](url) embedded in the assistant's text. This is a second location where you can find the URL — in addition to the tool invocation message.
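The Markdown embed can be pulled out of an assistant message's text with a small regular expression. The Python sketch below is illustrative only: the function name `markdown_image_urls` is hypothetical, and it assumes the URL contains no whitespace and no closing parenthesis.

```python
import re

# Matches ![alt](url) and captures the URL.
# Assumption: the URL contains no ")" and no whitespace.
MD_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")

def markdown_image_urls(text, host="files.oaiusercontent.com"):
    """Return URLs of Markdown image embeds in `text` that point at `host`."""
    return [u for u in MD_IMAGE.findall(text) if f"//{host}/" in u]

sample = (
    "Here's your three-tier architecture diagram:\n\n"
    "![Architecture diagram]"
    "(https://files.oaiusercontent.com/file-XXXXX?se=2025-03-15T12%3A30%3A00Z&sp=r)\n\n"
    "The diagram shows the load balancer at the top."
)
for url in markdown_image_urls(sample):
    print(url)
```

Matching on the Markdown syntax rather than on bare URLs keeps the alt-text association intact and avoids picking up links that merely mention the CDN host in running text.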

jq recipes for image generation extraction

Count image generation sessions

jq '[
  .[] | select(
    any(.mapping[]; .message.author.name? == "dalle.text2im")
  )
] | length' conversations.json

Returns the number of conversations that contain at least one DALL-E image generation.

Extract all image prompts (what you asked for)

jq -r '
  [.[] |
    .mapping | to_entries[].value |
    select(.message.author.role? == "user") |
    .message.content.parts[]? |
    select(type == "string")
  ] | .[]
' conversations.json

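This extracts the user-turn text from every conversation, including turns that have nothing to do with images. To restrict it to conversations that contain image generation, insert select(any(.mapping[]; .message.author.name? == "dalle.text2im")) | immediately after the leading .[] |.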
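A narrower variant is to keep only the user turns that directly triggered an image. The Python sketch below assumes, as in the schema shown earlier, that the parent of each dalle.text2im node is the user message; exports where an intermediate node sits between the two would need one more hop. The function name `image_prompts` and the sample data are hypothetical.

```python
def image_prompts(conversations):
    """Yield (title, prompt) for user turns that are the parent of a DALL-E node."""
    for conv in conversations:
        mapping = conv.get("mapping") or {}
        for node in mapping.values():
            msg = node.get("message") or {}  # root nodes carry "message": null
            if (msg.get("author") or {}).get("name") != "dalle.text2im":
                continue
            parent = mapping.get(node.get("parent")) or {}
            pmsg = parent.get("message") or {}
            if (pmsg.get("author") or {}).get("role") != "user":
                continue  # parent is not a user turn in this export variant
            for part in (pmsg.get("content") or {}).get("parts") or []:
                if isinstance(part, str):
                    yield conv.get("title") or "", part

# Toy data shaped like the export (an assumption, not real output).
sample = [{
    "title": "Arch",
    "mapping": {
        "root": {"message": None, "parent": None},
        "u1": {"message": {"author": {"role": "user"},
                           "content": {"parts": ["Draw a three-tier diagram"]}},
               "parent": "root"},
        "t1": {"message": {"author": {"name": "dalle.text2im"},
                           "content": {"parts": []}},
               "parent": "u1"},
        "u2": {"message": {"author": {"role": "user"},
                           "content": {"parts": ["Thanks"]}},
               "parent": "t1"},
    },
}]
for title, prompt in image_prompts(sample):
    print(title, "|", prompt)
```

With real data, load conversations.json via json.load and pass the resulting list in place of `sample`.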

Extract all CDN URLs from the assistant follow-up messages

jq -r '
  [.[] |
    .mapping | to_entries[].value |
    select(.message.author.role? == "assistant" and
           (.message.author.name? != "dalle.text2im")) |
    .message.content.parts[]? |
    select(type == "string") |
    scan("https://files\\.oaiusercontent\\.com/[^)\"\\s]+")
  ] | .[]
' conversations.json

This scans assistant message text for CDN URLs matching the files.oaiusercontent.com pattern. Output is one URL per line, suitable for piping to a bulk-download script.

Full image generation inventory: prompt + URL + date

#!/usr/bin/env bash
# Produces a TSV: conversation_title | create_date | user_prompt | cdn_url

jq -r '
  .[] | . as $conv |
  .mapping | to_entries[] | . as $entry |
  select($entry.value.message.author.name? == "dalle.text2im") |
  {
    title: $conv.title,
    date: (($entry.value.message.create_time // 0) | floor | todate),
    parent_id: $entry.value.parent
  } as $meta |
  # Get the user prompt from the parent node
  ($conv.mapping | to_entries[] |
    select(.key == $meta.parent_id) |
    .value.message.content.parts[]? |
    select(type == "string")) as $prompt |
  # Get the CDN URL from the children (assistant follow-up)
  ($conv.mapping | to_entries[] |
    select(.value.parent == $entry.key) |
    .value.message.content.parts[]? |
    select(type == "string") |
    scan("https://files\\.oaiusercontent\\.com/[^)\"\\s]+")
  ) as $url |
  [$meta.title, $meta.date, $prompt, $url] | @tsv
' conversations.json

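Save this as extract-images.sh and run bash extract-images.sh > image-inventory.tsv. The result is a tab-separated file you can open in a spreadsheet application for review. Because the prompt and the URL are each bound with as, a DALL-E invocation whose parent has no text part, or whose follow-up message contains no CDN URL, produces no row; an invocation with several URLs produces one row per URL.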

Bulk-check which CDN URLs are still live

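#!/usr/bin/env bash
# Read CDN URLs from a file (one per line) and report live vs expired

while IFS= read -r url; do
  [ -z "$url" ] && continue   # skip blank lines
  # HEAD request only; curl prints 000 when no HTTP response arrives
  status=$(curl -s -o /dev/null -w "%{http_code}" --head --max-time 5 "$url")
  if [ "$status" = "200" ]; then
    echo "LIVE $url"
  elif [ "$status" = "000" ]; then
    echo "UNREACHABLE $url"
  else
    echo "EXPIRED($status) $url"
  fi
done < cdn-urls.txt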

Redirect the output of the CDN URL extraction recipe above to cdn-urls.txt, then run this script to identify which images are still downloadable and which have expired. For images still returning HTTP 200, download with curl -o output-filename.webp "URL" before they expire.
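When downloading many images, it helps to derive a stable local filename from each URL rather than naming files by hand. The Python sketch below takes the last path segment (the file-XXXXX identifier) as the name. The helper `local_filename` is hypothetical, and the default .webp extension is an assumption carried over from the curl example; the actual image format may differ.

```python
from urllib.parse import urlparse

def local_filename(url, ext=".webp"):
    """Derive a local filename from the last path segment of a CDN URL."""
    segment = urlparse(url).path.rsplit("/", 1)[-1]
    if not segment:
        raise ValueError(f"no file segment in URL: {url}")
    # Keep an existing extension; otherwise append the assumed one.
    return segment if "." in segment else segment + ext

print(local_filename(
    "https://files.oaiusercontent.com/file-XXXXX?se=2025-03-15T12%3A30%3A00Z&sp=r"
))
```

Using the file identifier as the name also makes it easy to match a downloaded file back to the URL recorded in conversations.json.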

GPT-4o native image generation

GPT-4o's native image generation capability (launched in 2025, distinct from the DALL-E 3 tool) produces images that appear differently in conversations.json than DALL-E 3 invocations. The key difference: GPT-4o native images are referenced via internal asset_pointer URIs rather than public CDN URLs.

How to identify GPT-4o native image generation in your export:

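jq '[
  .[] | .mapping[] |
  # DALL-E tool messages and user uploads can also carry image_asset_pointer parts
  select(.message.author.name? != "dalle.text2im") |
  select(.message.author.role? != "user") |
  select(any(.message.content.parts[]?;
             type == "object" and .content_type == "image_asset_pointer"))
] | length' conversations.json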

The asset_pointer field contains an internal URI like file-service://file-XXXXXXXXXXXXX. This is not a URL you can fetch with curl. There is no documented public endpoint that resolves these internal file references from outside ChatGPT's authenticated session. The image is referenced in your export data, but it is not accessible from the export alone.
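Even though the pointers cannot be fetched, listing them gives a record of which conversations contain images that exist only inside ChatGPT. The Python sketch below is one way to build that list. It assumes image parts on dalle.text2im messages and on user turns (uploads) should be excluded, which is a heuristic rather than a documented rule; `native_image_pointers` and the sample data are hypothetical.

```python
def native_image_pointers(conversations):
    """Return (title, asset_pointer) pairs for image parts outside DALL-E and user turns."""
    found = []
    for conv in conversations:
        for node in (conv.get("mapping") or {}).values():
            msg = node.get("message") or {}  # root nodes carry "message": null
            author = msg.get("author") or {}
            if author.get("name") == "dalle.text2im" or author.get("role") == "user":
                continue
            for part in (msg.get("content") or {}).get("parts") or []:
                if isinstance(part, dict) and part.get("content_type") == "image_asset_pointer":
                    found.append((conv.get("title") or "", part.get("asset_pointer")))
    return found

def _img(pointer):
    """Build a toy image part (test data only)."""
    return {"content_type": "image_asset_pointer", "asset_pointer": pointer}

sample = [{
    "title": "Logo",
    "mapping": {
        "a": {"message": None},
        "b": {"message": {"author": {"role": "user"},
                          "content": {"parts": [_img("file-service://file-UPLOAD"), "what is this?"]}}},
        "c": {"message": {"author": {"name": "dalle.text2im"},
                          "content": {"parts": [_img("file-service://file-DALLE")]}}},
        "d": {"message": {"author": {"role": "assistant"},
                          "content": {"parts": [_img("file-service://file-NATIVE")]}}},
    },
}]
print(native_image_pointers(sample))
```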

The practical difference from DALL-E 3:

Attribute	DALL-E 3 (tool invocation)	GPT-4o native image generation
Identifier in export	`author.name: "dalle.text2im"`	`content_type: "image_asset_pointer"` in parts array
Image URL type	Public CDN URL (`files.oaiusercontent.com`)	Internal file reference URI (`file-service://file-XXX`)
Downloadable from export?	Yes — while CDN URL is live (~30 days)	No — internal URI, no public endpoint
Prompt preserved?	Yes — both original and revised prompt	Yes — in the conversation turn where image was requested
Recovery path when image lost	Regenerate from preserved revised_prompt	Regenerate from original user prompt in conversation

The CDN URL expiry timeline

Based on observed behaviour in conversations.json exports, DALL-E 3 CDN URLs follow this pattern:

Day 0–7 (fresh): URL returns HTTP 200. Image is downloadable. The URL contains a signed expiry parameter (se=YYYY-MM-DDTHH%3AMM%3ASSZ in the query string, where %3A is a URL-encoded colon) that shows the exact expiry timestamp.
Day 8–30 (aging): URL may still return HTTP 200 but expiry is approaching. Check the se= parameter to confirm remaining lifetime.
Day 31+ (expired): URL returns HTTP 404. OpenAI has deleted the stored image. The URL in your conversations.json is now a dead reference. The prompt text remains accessible.

The expiry timestamp is encoded in the CDN URL itself. Extract it with:

python3 -c "
from urllib.parse import urlparse, parse_qs
import sys
url = sys.stdin.read().strip()
params = parse_qs(urlparse(url).query)
print('Expires:', params.get('se', ['not found'])[0])
" <<< "https://files.oaiusercontent.com/file-XXXXX?se=2025-03-15T12%3A00%3A00Z&sp=r&..."

This prints the expiry date for any CDN URL you extract from conversations.json. Images generated more than 30 days before your export date will have already expired; images generated within 30 days of the export may still be downloadable if you act immediately after downloading the ZIP.
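To prioritise downloads, the same parameter can be turned into a remaining-lifetime figure. The Python sketch below assumes the se value always has the form shown in the example URL (UTC, second precision, trailing Z); the function names `expiry_of` and `hours_left` are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs

def expiry_of(url):
    """Return the `se` expiry as an aware UTC datetime, or None if absent or unparseable."""
    values = parse_qs(urlparse(url).query).get("se")
    if not values:
        return None
    try:
        # parse_qs has already percent-decoded the value, e.g. 2025-03-15T12:30:00Z
        parsed = datetime.strptime(values[0], "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return None
    return parsed.replace(tzinfo=timezone.utc)

def hours_left(url, now=None):
    """Hours until expiry (negative once past), or None without a usable `se`."""
    expires = expiry_of(url)
    if expires is None:
        return None
    now = now or datetime.now(timezone.utc)
    return (expires - now).total_seconds() / 3600

url = "https://files.oaiusercontent.com/file-XXXXX?se=2025-03-15T12%3A30%3A00Z&sp=r"
print(expiry_of(url), hours_left(url))
```

Sorting extracted URLs by `hours_left` puts the images closest to expiry at the front of the download queue.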

Recovery strategy when images have already expired

The preserved prompt text is your primary recovery asset. DALL-E 3 preserves the revised_prompt — the full, detailed prompt that DALL-E actually used to generate the image — which is typically more detailed and useful for regeneration than the original short prompt you typed.

Regeneration workflow:

Extract the revised_prompt from the DALL-E tool invocation using the jq recipes above.
Submit the revised_prompt verbatim to DALL-E 3 in a new ChatGPT conversation. The result will not be pixel-identical (generative models produce different outputs on each run) but will be stylistically consistent with the original.
If exact consistency is required, note that DALL-E 3 via ChatGPT does not expose a seed parameter in the public UI, and the OpenAI Images API (POST /v1/images/generations with model: dall-e-3) does not document one either. Calling the API directly gives you scripted, repeatable submission of the same prompt, but not pixel-identical output; the API may also rewrite the prompt again and return its own revised_prompt.

For design work where the image represents a specific decision or specification, the revised_prompt itself may be more valuable than the image — it is an unambiguous, machine-readable specification of what was designed.
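If you regenerate through the API rather than the ChatGPT UI, the request body is small. The Python sketch below only builds the JSON payload and does not send it. The helper `build_generation_request` is hypothetical, and the model name and size list reflect the Images API as understood at the time of writing, so check them against current documentation before relying on them.

```python
import json

# Sizes assumed to be accepted for dall-e-3 (verify against current API docs).
DALLE3_SIZES = {"1024x1024", "1792x1024", "1024x1792"}

def build_generation_request(revised_prompt, size="1024x1024"):
    """Build the JSON body for POST /v1/images/generations (the request is not sent)."""
    if not revised_prompt.strip():
        raise ValueError("revised_prompt is empty")
    if size not in DALLE3_SIZES:
        raise ValueError(f"unsupported size for dall-e-3: {size}")
    # n is fixed at 1 here; one image per request keeps each result traceable to its prompt.
    return {"model": "dall-e-3", "prompt": revised_prompt, "n": 1, "size": size}

body = build_generation_request("A dense, ancient forest shrouded in darkness", size="1792x1024")
print(json.dumps(body, indent=2))
```

The size recorded in the export (1792x1024 in the schema example) can be passed through so the regenerated image keeps the original aspect ratio.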

Extracting revised prompts for regeneration

#!/usr/bin/env bash
# Extract all DALL-E revised prompts from conversations.json
# Output: a dated block per prompt, separated by "---", sorted oldest first

python3 <<'EOF'
import json
from datetime import datetime, timezone

with open("conversations.json", encoding="utf-8") as f:
    data = json.load(f)

prompts = []
for conv in data:
    for node in (conv.get("mapping") or {}).values():
        # Root nodes carry "message": null, so fall back to an empty dict
        msg = node.get("message") or {}
        if (msg.get("author") or {}).get("name") != "dalle.text2im":
            continue
        content = msg.get("content") or {}
        found = []
        # Try parts array (newer format)
        for part in content.get("parts") or []:
            if isinstance(part, dict) and "text" in part:
                found.append(part["text"])
        # Try result field (older format): a JSON string with a "prompt" key
        result = content.get("result")
        if isinstance(result, str) and result:
            try:
                result_json = json.loads(result)
            except json.JSONDecodeError:
                result_json = None
            if isinstance(result_json, dict) and "prompt" in result_json:
                found.append(result_json["prompt"])
        for text in found:
            prompts.append({
                "conversation": conv.get("title") or "",
                "date": msg.get("create_time") or 0,
                "revised_prompt": text,
            })

for p in sorted(prompts, key=lambda x: x["date"]):
    # create_time is a Unix timestamp; render it in UTC
    date_str = datetime.fromtimestamp(p["date"], tz=timezone.utc).strftime("%Y-%m-%d")
    print(f"[{date_str}] {p['conversation']}")
    print(p["revised_prompt"])
    print("---")
EOF

This script handles both the pre-2025 and post-2025 DALL-E 3 schema variants and outputs a human-readable list of every revised prompt in your export, sorted by date.

Capture strategy: don't rely on CDN URLs

The architectural lesson from the CDN URL expiry is the same as the lesson from ChatGPT shared links: content that lives only on OpenAI's servers — rather than being embedded in the export — expires. The export is a snapshot of text and references; it is not a complete archive of all generated artifacts.

Practical capture strategy for engineers and designers who use ChatGPT image generation for work artifacts:

At generation time: right-click each image in the ChatGPT UI and save it locally. The UI-displayed image is at full resolution and does not expire during the browser session. This 10-second action at generation time prevents hours of regeneration work later.
For batch recovery (recent export): run the CDN URL extractor immediately after downloading your conversations.json ZIP, check which URLs are still live using the bulk-check script, and download live images before they expire. Images from the last 30 days are recoverable; older images are not.
For prompt archival: run the revised_prompt extractor and save the output to a plaintext file (image-prompts.md). The prompts are permanently preserved and give you regeneration capability on demand, even after the CDN URLs expire. For design-decision documentation, the revised prompt is the specification — paste it into the decision record alongside the ADR or design doc it informs.
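For the prompt-archival step, a small formatter can turn extracted records into the plaintext file. The Python sketch below assumes each record is a dict with conversation, date (Unix timestamp), and revised_prompt keys, plus an optional original_prompt; that record shape and the function name `format_prompt_archive` are illustrative choices, not a fixed format.

```python
from datetime import datetime, timezone

def format_prompt_archive(entries):
    """Render prompt records as Markdown text, oldest first."""
    lines = ["# Image prompts", ""]
    for e in sorted(entries, key=lambda e: e.get("date") or 0):
        day = datetime.fromtimestamp(e.get("date") or 0, tz=timezone.utc).strftime("%Y-%m-%d")
        lines += [f"## {day} {e.get('conversation') or '(untitled)'}", ""]
        if e.get("original_prompt"):
            lines += [f"Original: {e['original_prompt']}", ""]
        lines += [f"Revised: {e['revised_prompt']}", ""]
    return "\n".join(lines)

entries = [
    {"conversation": "Architecture", "date": 1738234570.456,
     "original_prompt": "Generate a three-tier diagram",
     "revised_prompt": "A professional technical diagram showing a three-tier web architecture."},
]
print(format_prompt_archive(entries))
```

Writing the returned string to image-prompts.md (for example with pathlib.Path.write_text) gives a file that can be pasted into a decision record as-is.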

How image generation sessions compare to other ChatGPT export gaps

Content type	In conversations.json?	As downloadable binary?	Permanently preserved?
Conversation text (all turns)	Yes — full text	n/a	Yes
DALL-E image prompt + revised prompt	Yes	n/a (text)	Yes
DALL-E image binary	No — CDN URL only	No (must fetch live URL)	No (~30 days)
GPT-4o native image	Internal URI only	No	No (no public download path)
Uploaded image files	Filename only (asset_pointer)	No	No (binary excluded from export)
Code Interpreter output files	Code text yes; generated file binaries no	No	No (only code text)
ChatGPT Memory entries	In memory.json (separate file in ZIP)	n/a (text)	Yes (until deleted)

Image generation as a decision-capture surface

For engineers and product designers, ChatGPT image generation sessions often contain specification-quality content: the prompt you wrote to generate an architecture diagram, a UI mockup, or a brand asset is effectively a specification of what you were designing. "Generate a diagram showing event-driven data flow between three microservices using an event bus, with the order-service as producer and inventory-service plus notification-service as consumers" is a more precise description of the intended architecture than most informal ADRs.

The revised_prompt that DALL-E 3 produces from your original prompt is even more detailed — it expands the specification with rendering decisions (style, colour palette, composition) that reflect how the image was actually produced. For design documentation, preserving the revised_prompt alongside the image is equivalent to preserving the compiled output alongside the source code.

When you use the WhyChose extractor on your conversations.json, image generation turns are processed differently from text turns — the extractor identifies tool-invocation messages and flags them as potential specification artifacts rather than decision rationale turns. For sessions where image generation was the output of a design decision (e.g., you discussed options for an architecture diagram and then generated the winning design), the extractor surfaces the decision context from the text turns, and the image prompt documents the specification that resulted from that decision. Together they form a complete decision record: the reasoning that led to the design and the specification that captured the design itself.
