
ChatGPT Web Search in conversations.json — What's in the Export, What's Missing, and How to Extract Citations (2026)

Q: What does tether_browsing_display contain?

A tether_browsing_display node contains the search query string that ChatGPT submitted to the search engine and an array of result objects, each with a title (page title), a url (full URL), and a short snippet (the search-result preview text). The number of results varies, typically 3–10 per search query. Each result object corresponds to one web page returned for that query. Note that the snippet is the search-engine preview text, not the full page content. If ChatGPT visited a page to extract a longer passage, that longer passage appears in a separate tether_quote node pointing to the same URL.

Q: How do I extract all URLs ChatGPT cited in a conversation?

Use jq to target the tether_quote nodes — these are the most reliable source of cited URLs because each node has a single url field. Run: jq '[.[] | .mapping | to_entries[].value | select(.message != null) | select(.message.content.content_type == "tether_quote") | .message.content.url] | unique[]' conversations.json. For the search query strings themselves, target tether_browsing_display nodes: jq '[.[] | .mapping | to_entries[].value | select(.message != null) | select(.message.content.content_type == "tether_browsing_display") | .message.content.query] | unique[]' conversations.json. Replace conversations.json with the path to your export file.

Q: What's the difference between Browse with Bing and ChatGPT Search in the export?

Browse with Bing (the older feature, available 2023-2024) and ChatGPT Search (the integrated search product launched late 2024) both produce tether content types in conversations.json, but with minor schema differences. Browse with Bing exports use content_type: "tether_browsing_display" with a results array. ChatGPT Search exports may use "search_result_group" as an outer wrapper or slightly different field names (search_query vs query) depending on when the export was taken. The tether_quote structure is consistent across both. If your jq recipe targeting tether_browsing_display returns no results for a clearly search-heavy conversation, try content_type == "search_result_group" as the fallback selector. Inside a jq program the string must be double-quoted; single quotes are not valid jq string delimiters.

When ChatGPT searches the web to back up a recommendation — choosing a library, checking a license, comparing two API approaches — that research session leaves a structured trace in your conversations.json export. But it's invisible to the extract-the-assistant-messages approach that most tutorials describe. The content types are tether_browsing_display and tether_quote, and they live in the mapping DAG alongside regular messages. This page is the reference for what ChatGPT's web search stores in the export, the schema of each content type, what's deliberately excluded (full page HTML, ranking signals, browsing session cookies), and the jq recipes to pull every URL ChatGPT cited for a decision.

TL;DR

ChatGPT web search IS in conversations.json, in author.role: "tool" nodes with content_type: "tether_browsing_display" (query + results list with URLs and titles) and tether_quote (specific quoted passage with URL). The assistant's final answer uses content_type: "text" with inline citation markers; the marker-to-URL mapping is not stored as an explicit field and has to be reconstructed by index. Full HTML of browsed pages is not stored. Use jq targeting tether_quote nodes to get a deduplicated list of every URL ChatGPT cited in a conversation.

How ChatGPT web search appears in conversations.json

The conversations.json mapping DAG (covered in detail at the full mapping DAG schema reference) represents every message as a node with an id, a parent pointer, and a message object. When ChatGPT performs a web search, the DAG gains tool-role messages interleaved between the user query and the assistant response. These tool messages are not filtered out of the export — they are full nodes in the mapping DAG — but they use content_type values that the common "walk from current_node to root and collect assistant turns" script ignores entirely.

A tool-role node in the mapping DAG looks like this:

{
  "id": "msg-abc123",
  "parent": "msg-user-query-id",
  "children": ["msg-def456"],
  "message": {
    "id": "msg-abc123",
    "author": {
      "role": "tool",
      "name": "browser",
      "metadata": {}
    },
    "create_time": 1717190000.0,
    "content": {
      "content_type": "tether_browsing_display",
      "result": "Search results for: valkey vs redis license",
      "query": "valkey vs redis license",
      "results": [
        {
          "title": "Valkey 1.0 release — BSD license confirmed",
          "url": "https://valkey.io/blog/valkey-1-0/",
          "snippet": "Valkey is released under the BSD 3-Clause license..."
        }
      ]
    },
    "status": "finished_successfully"
  }
}

The current_node field at the top level of each conversation object points to the final assistant text response, the last thing the user sees. The typical DAG traversal script walks backward from current_node through parent pointers and collects nodes where author.role == "assistant". The tether nodes sit on the same path (they appear as parents of the assistant response and children of the user query), but because their author.role is "tool" rather than "assistant", they are skipped. To collect tether nodes, either drop the role filter from the lineage walk or, more simply, iterate over every node in the mapping with to_entries[].value.
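
The two traversal strategies can be compared side by side in a minimal Python sketch. It assumes the node shape shown in the example above (parent, message, content.content_type); the function names are hypothetical and chosen only for illustration.

```python
TETHER_TYPES = {"tether_browsing_display", "tether_quote"}


def content_type(node):
    """Return the node's content_type, or None for nodes without a message."""
    message = node.get("message") or {}
    content = message.get("content") or {}
    return content.get("content_type")


def lineage(conversation):
    """Return the nodes from root to current_node by following parent pointers."""
    mapping = conversation.get("mapping") or {}
    node_id = conversation.get("current_node")
    path, seen = [], set()
    while node_id and node_id in mapping and node_id not in seen:
        seen.add(node_id)  # guard against a malformed parent cycle
        node = mapping[node_id]
        path.append(node)
        node_id = node.get("parent")
    path.reverse()
    return path


def tether_nodes_on_lineage(conversation):
    """Lineage walk that filters by content_type instead of by assistant role."""
    return [n for n in lineage(conversation) if content_type(n) in TETHER_TYPES]


def tether_nodes_anywhere(conversation):
    """Visit every node in the mapping, regardless of position in the lineage."""
    nodes = (conversation.get("mapping") or {}).values()
    return [n for n in nodes if content_type(n) in TETHER_TYPES]
```

The second function is the equivalent of to_entries[].value. It also returns tether nodes that sit on branches outside the current lineage, which the first function does not see.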

The tether content types — what each one means

When ChatGPT issues a web search, a single user turn can produce multiple tool-role nodes in the DAG — one per search query, one per quoted passage. Understanding which content type carries which data is essential before writing any jq recipe.

| content_type | role | What it contains | Notes |
| --- | --- | --- | --- |
| `tether_browsing_display` | tool | Search query string + results array (title, url, snippet per result) | Appears once per search query issued; multiple queries in one turn produce multiple nodes |
| `tether_quote` | tool | url + quoted text passage extracted from a specific page | Appears for each page ChatGPT extracted a passage from; one node per quoted passage; multiple passages from the same URL produce multiple nodes |
| `tether_browsing_code` | tool | Internal browser-control code | Implementation detail; not useful for citation extraction; not consistently present across all export versions |
| `text` (with citations) | assistant | The assistant's final response with inline `【N†source】` citation markers | The N indices reference positions in the tether_browsing_display results array for that turn; the mapping from marker to URL is reconstructable by index but not stored as an explicit field |
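
The index-based reconstruction described in the last row can be sketched in Python as follows. This is an illustration under stated assumptions, not a confirmed decoding of the export: whether N counts from 0 or from 1 is not established here, so the hypothetical index_base parameter makes that choice explicit, and resolve_markers is an invented name.

```python
import re

# Matches markers such as 【3†source】 and captures the numeric index N.
MARKER = re.compile(r"【(\d+)†[^】]*】")


def resolve_markers(answer_text, results, index_base=0):
    """Map each marker index found in answer_text to a result URL.

    index_base is an ASSUMPTION: pass 0 if N is zero-based, 1 if one-based.
    Indices that fall outside the results array map to None.
    """
    resolved = {}
    for match in MARKER.finditer(answer_text):
        n = int(match.group(1))
        position = n - index_base
        if 0 <= position < len(results):
            resolved[n] = results[position].get("url")
        else:
            resolved[n] = None
    return resolved
```

Check the output against a conversation whose sources you know before trusting either index base for your export.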

A tether_browsing_display node with the full results array looks like this:

{
  "content_type": "tether_browsing_display",
  "result": "Search results for: best in-memory cache library Go 2024",
  "query": "best in-memory cache library Go 2024",
  "results": [
    {
      "title": "ristretto vs groupcache — benchmark comparison",
      "url": "https://example-bench.dev/go-cache-2024",
      "snippet": "Ristretto outperforms groupcache by 3x on high-contention reads..."
    },
    {
      "title": "dgraph-io/ristretto — GitHub",
      "url": "https://github.com/dgraph-io/ristretto",
      "snippet": "A high performance memory-bound Go cache. Ristretto is a fast..."
    }
  ]
}

A tether_quote node — the most useful for citation extraction — looks like this:

{
  "content_type": "tether_quote",
  "url": "https://github.com/dgraph-io/ristretto",
  "domain": "github.com",
  "title": "dgraph-io/ristretto — GitHub",
  "text": "Ristretto is a fast, fixed size, in-memory cache with a focus on performance and correctness. The motivation to build Ristretto comes from the need for a contention-free cache in Dgraph."
}

Note the text field in tether_quote: it is meaningfully longer than the snippet in a tether_browsing_display result. The snippet is what the search engine returned as a preview. The tether_quote text is what ChatGPT extracted after visiting the page: typically a full paragraph, and usually the passage that directly supported the assistant's claim. For decision-extraction purposes, the tether_quote nodes are the higher-signal artifact.
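
Because one URL can be quoted several times, it can help to group the passages per URL. The Python sketch below does that for a single conversation object, assuming the tether_quote fields shown above (url, text); quotes_by_url is a hypothetical helper name.

```python
from collections import defaultdict


def quotes_by_url(conversation):
    """Group tether_quote passages by URL, dropping exact duplicate passages."""
    grouped = defaultdict(list)
    for node in (conversation.get("mapping") or {}).values():
        content = (node.get("message") or {}).get("content") or {}
        if content.get("content_type") != "tether_quote":
            continue
        url, text = content.get("url"), content.get("text")
        # Skip quotes missing either field; keep each passage once per URL.
        if url and text and text not in grouped[url]:
            grouped[url].append(text)
    return dict(grouped)
```

The result is a plain dict from URL to a list of quoted passages, which is a convenient shape for attaching evidence to a decision record.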

What ChatGPT stores vs what it doesn't

| Element | Stored in export? | Notes |
| --- | --- | --- |
| Search query string | Yes | In `.message.content.query` on `tether_browsing_display` nodes (or `search_query` in newer ChatGPT Search exports) |
| Cited URLs with titles | Yes | In `.message.content.results[].url` and `.title` on `tether_browsing_display` nodes, one pair per result in the list |
| Text snippet per result | Yes | In `.message.content.results[].snippet`; search-engine preview length, roughly 1–3 sentences, not the full page content |
| Full HTML of browsed pages | No | ChatGPT processes pages server-side and stores only the extracted content; raw HTML, DOM structure, and inline JavaScript are never written to the export |
| Quoted passage from a specific page | Yes | In `.message.content.text` on `tether_quote` nodes; typically a paragraph, longer and more specific than the results snippet |
| Which assistant sentence corresponds to which citation | No | The inline citation markers `【N†source】` reference results by index, but no explicit marker-to-URL field is stored; reconstruction requires correlating the N index against the results array |
| Number of pages ChatGPT visited | Inferrable | Count the distinct URLs in `tether_quote` nodes for pages ChatGPT extracted a passage from; distinct URLs across `results[]` entries give the wider set of pages the searches surfaced |
| Order of pages visited | Yes | DAG node order and `create_time` timestamps preserve the search session sequence |
| Search engine used | Not stored explicitly | Whichever search engine OpenAI contracted with when the conversation was created; the engine name does not appear in the export |
| Ranking signals or click-through data | No | Only the results that appeared in the results list are stored; results beyond that list and search ranking internals are not present |
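
The "Order of pages visited" row can be made concrete with a short Python sketch that sorts a conversation's search and quote nodes by create_time. It assumes create_time is a numeric timestamp as in the example node earlier on this page; search_timeline is a hypothetical name, and nodes without a timestamp are sorted first as an arbitrary choice.

```python
def search_timeline(conversation):
    """Return (create_time, content_type, label) tuples in chronological order.

    The label is the query string for search nodes and the URL for quote nodes.
    """
    events = []
    for node in (conversation.get("mapping") or {}).values():
        message = node.get("message") or {}
        content = message.get("content") or {}
        kind = content.get("content_type")
        if kind == "tether_browsing_display":
            # Newer exports may name the field search_query instead of query.
            label = content.get("query") or content.get("search_query")
        elif kind == "tether_quote":
            label = content.get("url")
        else:
            continue
        # A missing create_time is treated as 0.0 so the sort never fails.
        events.append((message.get("create_time") or 0.0, kind, label))
    events.sort(key=lambda event: event[0])
    return events
```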

jq recipes to extract cited URLs

All three recipes below operate on the full conversations.json file. The key principle: use .mapping | to_entries[].value to walk every node in the DAG rather than the current-node lineage walk. As usually written, the lineage walk keeps only assistant-role nodes and so drops the tool-role nodes, which are exactly the nodes that contain the web search data.

Recipe 1: All URLs cited in a single conversation (deduplicated)

This targets tether_quote nodes, which each have a single url field. Replace "Your Conversation Title Here" with the exact conversation title from your export:

jq -r '
  .[]
  | select(.title == "Your Conversation Title Here")
  | .mapping
  | to_entries[].value
  | select(.message != null)
  | select(.message.content.content_type == "tether_quote")
  | .message.content.url
' conversations.json | sort -u

The -r flag outputs raw strings (no JSON quotes). The sort -u deduplicates — the same URL may appear in multiple tether_quote nodes if ChatGPT quoted it multiple times across the conversation.

Recipe 2: All search queries across every conversation in the full export

This extracts every search query string ChatGPT ever submitted across your entire export history. Useful for auditing which research sessions are captured:

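jq -r '
  [
    .[]
    | .mapping
    | to_entries[].value
    | select(.message != null)
    | select(.message.content.content_type == "tether_browsing_display")
    # fall back to search_query (newer exports); skip nodes that have neither field
    | (.message.content.query // .message.content.search_query // empty)
  ]
  | unique[]
' conversations.json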

If tether_browsing_display returns no results for conversations where the assistant clearly cited web sources, also try content_type == "search_result_group" as the selector — see the Browse with Bing vs ChatGPT Search section below.
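
For readers who prefer Python to jq, the same audit can be sketched as below. It accepts either content_type and either query field name mentioned on this page. Whether a search_result_group node actually carries the query at this level is an assumption; all_queries and load_export are hypothetical names, and the path passed to load_export is a placeholder.

```python
import json

# Both names come from this page; the second is the possible newer wrapper type.
SEARCH_TYPES = {"tether_browsing_display", "search_result_group"}


def all_queries(conversations):
    """Return the sorted, deduplicated search queries across all conversations."""
    queries = set()
    for conversation in conversations:
        for node in (conversation.get("mapping") or {}).values():
            content = (node.get("message") or {}).get("content") or {}
            if content.get("content_type") not in SEARCH_TYPES:
                continue
            query = content.get("query") or content.get("search_query")
            if query:
                queries.add(query)
    return sorted(queries)


def load_export(path):
    """Load a conversations.json export (a JSON array of conversation objects)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Typical use would be all_queries(load_export("conversations.json")), with the path adjusted to your export.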

Recipe 3: Query + cited URLs grouped by search session

This produces combined output pairing each search query with the URLs that appeared in that query's results list — useful for reconstructing the research context behind a decision:

jq -r '
  .[]
  | .title as $title
  | .mapping
  | to_entries[].value
  | select(.message != null)
  | select(.message.content.content_type == "tether_browsing_display")
  | {
      conversation: $title,
      query: .message.content.query,
      # the ? keeps a node without a results array from aborting the run
      urls: [.message.content.results[]?.url]
    }
' conversations.json

This outputs one JSON object per search-query node, with the parent conversation title, the query string, and the array of result URLs. Pipe through jq -s '.' to collect all objects into a single array. For large exports, add a select((.title // "") | test("keyword"; "i")) filter after the first .[] to limit to relevant conversations; the // "" guard is there because test raises an error when a conversation's title is null.
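
A Python sketch of the same grouping, with the optional title filter built in, might look like this. It assumes the results array shape shown earlier; sessions and title_pattern are hypothetical names, and the input is the parsed JSON array from the export.

```python
import re


def sessions(conversations, title_pattern=None):
    """Pair each search query with its result URLs, optionally filtered by title.

    title_pattern is a case-insensitive regular expression; None means no filter.
    """
    matcher = re.compile(title_pattern, re.IGNORECASE) if title_pattern else None
    out = []
    for conversation in conversations:
        title = conversation.get("title") or ""  # tolerate null titles
        if matcher and not matcher.search(title):
            continue
        for node in (conversation.get("mapping") or {}).values():
            content = (node.get("message") or {}).get("content") or {}
            if content.get("content_type") != "tether_browsing_display":
                continue
            results = content.get("results") or []  # tolerate a missing array
            out.append({
                "conversation": title,
                "query": content.get("query"),
                "urls": [r.get("url") for r in results if r.get("url")],
            })
    return out
```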

Why to_entries[].value and not the current-node walk: The standard DAG traversal walks from current_node backward through parent pointers, stopping at root. Tool-role nodes are on this path — they appear as intermediate nodes between the user query and the assistant response — but the traversal filters them out by role. Using to_entries[].value iterates over every key-value pair in the mapping object, visiting all nodes regardless of role or position in the lineage. This is the only reliable way to collect tether nodes without also rewriting the DAG traversal to preserve tool turns.

Browse with Bing vs ChatGPT Search — schema differences

The web search feature has gone through two distinct product phases, and the export schema differs slightly between them.

Browse with Bing (2023–2024)

The original Browse with Bing feature, available to ChatGPT Plus users from mid-2023 onward, used content_type: "tether_browsing_display" as the primary search result node, with a results array containing per-result objects with title, url, and snippet. The query field is named query. This schema is the most widely documented in ChatGPT export guides and is what the jq recipes in this page target by default.

ChatGPT Search (late 2024 onward)

The ChatGPT Search product (integrated directly into the main ChatGPT UI, not a plugin-style toggle) may use content_type: "search_result_group" as an outer wrapper node that contains multiple tether_browsing_display children, or may use slightly different field names — specifically search_query instead of query for the query string field, depending on the exact export version. The tether_quote structure is consistent across both product versions and is therefore the more stable target for citation URL extraction.

Diagnosing which schema your export uses

If your jq recipe targeting tether_browsing_display returns no results for a conversation where the assistant clearly cited web sources — you can see inline citation markers in the response text — run this enumeration query to see all content_type values present in your export:

jq -r '
  .[]
  | .mapping
  | to_entries[].value
  | select(.message != null)
  | .message.content.content_type
' conversations.json | sort -u

This lists every distinct content_type value in your entire export, deduplicated. The output tells you the exact type names for the search tool calls in your specific export file. If you see search_result_group instead of tether_browsing_display, substitute accordingly in your recipes. If you see both, your export spans conversations from both product eras.
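
If counts per type are also useful, for example to see how many conversations fall into each product era, a Python sketch along these lines tallies them. The name content_type_counts is hypothetical, and the input is the parsed JSON array from the export.

```python
from collections import Counter


def content_type_counts(conversations):
    """Count how many nodes carry each content_type across the whole export."""
    counts = Counter()
    for conversation in conversations:
        for node in (conversation.get("mapping") or {}).values():
            content = (node.get("message") or {}).get("content") or {}
            kind = content.get("content_type")
            if kind:
                counts[kind] += 1
    return counts
```

Calling counts.most_common() on the result lists the types from most to least frequent.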

Team workspace exports

For ChatGPT Team and Enterprise workspace exports (covered at the ChatGPT Team export guide), web search tool calls are included in the workspace-scoped conversations.json in the same tether node format. The workspace export does not add or remove tether nodes relative to the Plus export — the schema is consistent. If workspace members use ChatGPT Search in shared project conversations, those tether nodes appear in the per-project conversations/ directory in the same format as individual account exports.

One workspace-specific consideration: in Team accounts, multiple members may have held related conversations where ChatGPT searched for overlapping sources. The per-member export structure means these overlapping research sessions are in separate files; if you want a consolidated view of all URLs ChatGPT retrieved across the team for a given architectural decision, you need to run the jq recipes against each member's conversations.json and merge the URL sets. There is no single federated export that combines member conversations for programmatic analysis.
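
The merge step can be sketched in Python as below. The sketch makes several assumptions: each member's export has already been obtained as a separate conversations.json, the parent directory name is used as the member label (a hypothetical layout), and only tether_quote URLs are counted. The function names are invented for illustration.

```python
import json
from pathlib import Path


def cited_urls(conversations):
    """Collect the set of tether_quote URLs in one member's export."""
    urls = set()
    for conversation in conversations:
        for node in (conversation.get("mapping") or {}).values():
            content = (node.get("message") or {}).get("content") or {}
            if content.get("content_type") == "tether_quote" and content.get("url"):
                urls.add(content["url"])
    return urls


def merge_url_sets(exports_by_member):
    """Map each cited URL to the sorted list of members whose export contains it."""
    merged = {}
    for member, conversations in exports_by_member.items():
        for url in cited_urls(conversations):
            merged.setdefault(url, set()).add(member)
    return {url: sorted(members) for url, members in sorted(merged.items())}


def load_member_exports(paths):
    """Load each export file, labelled by its parent directory name (assumed layout)."""
    exports = {}
    for path in map(Path, paths):
        with path.open(encoding="utf-8") as handle:
            exports[path.parent.name] = json.load(handle)
    return exports
```

URLs that map to more than one member are the overlapping sources; a URL with a single member was researched by that person alone.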

How WhyChose fits in

Web search tool calls are often the most decision-relevant turns in a ChatGPT conversation. When an engineer asks ChatGPT to help evaluate two competing libraries, the tether_quote nodes contain the specific passages from the documentation, licensing pages, or performance benchmarks that informed the choice. The assistant's final recommendation is downstream of this research — and the research is fully captured in the export.

The WhyChose extractor targets not just the assistant response but also the supporting tool calls. So the extracted decision record captures both the recommendation and the research that backed it. This matters for ADR-quality output: a decision record that says "chose Valkey over Redis" is useful; one that also captures "based on the Valkey 1.0 release notes confirming the BSD license" — sourced from a tether_quote node pointing at the Valkey release blog — is what makes the ADR defensible when someone asks eighteen months later whether the license was actually verified or just assumed.

The naive approach — copying the assistant's final response text — captures the conclusion but loses the evidence chain. The tether nodes are the evidence chain. An ADR written from just the assistant response says "we chose this because it has a permissive license." An ADR written from the assistant response plus the tether_quote nodes says "we chose this because it has a permissive license, verified against the 1.0 release announcement at this URL on this date." The second ADR survives scrutiny; the first doesn't.
