Topic: chatgpt search export
ChatGPT Web Search in conversations.json — What's in the Export, What's Missing, and How to Extract Citations (2026)
When ChatGPT searches the web to back up a recommendation — choosing a library, checking a license, comparing two API approaches — that research session leaves a structured trace in your conversations.json export. But it's invisible to the extract-the-assistant-messages approach that most tutorials describe. The content types are tether_browsing_display and tether_quote, and they live in the mapping DAG alongside regular messages. This page is the reference for what ChatGPT's web search stores in the export, the schema of each content type, what's deliberately excluded (full page HTML, ranking signals, browsing session cookies), and the jq recipes to pull every URL ChatGPT cited for a decision.
TL;DR
ChatGPT web search IS in conversations.json, in author.role: "tool" nodes with content_type: "tether_browsing_display" (query + results list with URLs and titles) and tether_quote (specific quoted passage with URL). The assistant's final answer uses content_type: "text" with inline citation markers — the URL-to-sentence mapping is not in the export. Full HTML of browsed pages is not stored. Use jq targeting tether_quote nodes to get a deduplicated list of every URL ChatGPT cited in a conversation.
How ChatGPT web search appears in conversations.json
The conversations.json mapping DAG (covered in detail at the full mapping DAG schema reference) represents every message as a node with an id, a parent pointer, and a message object. When ChatGPT performs a web search, the DAG gains tool-role messages interleaved between the user query and the assistant response. These tool messages are not filtered out of the export — they are full nodes in the mapping DAG — but they use content_type values that the common "walk from current_node to root and collect assistant turns" script ignores entirely.
A tool-role node in the mapping DAG looks like this:
{
"id": "msg-abc123",
"parent": "msg-user-query-id",
"children": ["msg-def456"],
"message": {
"id": "msg-abc123",
"author": {
"role": "tool",
"name": "browser",
"metadata": {}
},
"create_time": 1717190000.0,
"content": {
"content_type": "tether_browsing_display",
"result": "Search results for: valkey vs redis license",
"query": "valkey vs redis license",
"results": [
{
"title": "Valkey 1.0 release — BSD license confirmed",
"url": "https://valkey.io/blog/valkey-1-0/",
"snippet": "Valkey is released under the BSD 3-Clause license..."
}
]
},
"status": "finished_successfully"
}
}
The current_node field at the top level of each conversation object points to the final assistant text response — the last thing the user sees. The typical DAG traversal script walks backward from current_node through parent pointers and collects nodes where author.role == "assistant". The tether nodes sit on the same path (they appear as parents of the assistant response and children of the user query), but because their author.role is "tool" rather than "assistant", they are skipped. To collect tether nodes, you must walk ALL nodes in the mapping using to_entries[].value, not the current-node lineage walk.
The tether content types — what each one means
When ChatGPT issues a web search, a single user turn can produce multiple tool-role nodes in the DAG — one per search query, one per quoted passage. Understanding which content type carries which data is essential before writing any jq recipe.
| content_type | role | What it contains | Notes |
|---|---|---|---|
tether_browsing_display |
tool | Search query string + results array (title, url, snippet per result) | Appears once per search query issued; multiple queries in one turn produce multiple nodes |
tether_quote |
tool | url + quoted text passage extracted from a specific page | Appears for each page ChatGPT extracted a passage from; one node per quoted passage; multiple passages from the same URL produce multiple nodes |
tether_browsing_code |
tool | Internal browser-control code | Implementation detail; not useful for citation extraction; not consistently present across all export versions |
text (with citations) |
assistant | The assistant's final response with inline 【N†source】 citation markers |
The N indices reference positions in the tether_browsing_display results array for that turn; the mapping from marker to URL is reconstructable by index but not stored as an explicit field |
A tether_browsing_display node with the full results array looks like this:
{
"content_type": "tether_browsing_display",
"result": "Search results for: best in-memory cache library Go 2024",
"query": "best in-memory cache library Go 2024",
"results": [
{
"title": "ristretto vs groupcache — benchmark comparison",
"url": "https://example-bench.dev/go-cache-2024",
"snippet": "Ristretto outperforms groupcache by 3x on high-contention reads..."
},
{
"title": "dgraph-io/ristretto — GitHub",
"url": "https://github.com/dgraph-io/ristretto",
"snippet": "A high performance memory-bound Go cache. Ristretto is a fast..."
}
]
}
A tether_quote node — the most useful for citation extraction — looks like this:
{
"content_type": "tether_quote",
"url": "https://github.com/dgraph-io/ristretto",
"domain": "github.com",
"title": "dgraph-io/ristretto — GitHub",
"text": "Ristretto is a fast, fixed size, in-memory cache with a focus on performance and correctness. The motivation to build Ristretto comes from the need for a contention-free cache in Dgraph."
}
Note the text field in tether_quote: it is meaningfully longer than the snippet in a tether_browsing_display result. The snippet is what the search engine returned as a preview. The tether_quote text is what ChatGPT extracted after visiting the page — typically a full paragraph and the passage that directly supported the assistant's claim. For decision-extraction purposes, the tether_quote nodes are the higher-signal artifact.
What ChatGPT stores vs what it doesn't
| Element | Stored in export? | Notes |
|---|---|---|
| Search query string | Yes | In tether_browsing_display.content.query (or search_query in newer ChatGPT Search exports) |
| Cited URLs with titles | Yes | In tether_browsing_display.content.results[].url and .title for each result in the list |
| Text snippet per result | Yes | In tether_browsing_display.content.results[].snippet — this is search-engine preview length, roughly 1–3 sentences; not the full page content |
| Full HTML of browsed pages | No | ChatGPT processes pages server-side and stores only the extracted content; raw HTML, DOM structure, and inline JavaScript are never written to the export |
| Quoted passage from a specific page | Yes | In tether_quote.content.text — typically a paragraph; longer and more specific than the results snippet |
| Which assistant sentence corresponds to which citation | No | The inline citation markers 【N†source】 reference results by index, but the index-to-sentence mapping is not stored as an explicit field; reconstruction requires correlating the N index against the results array |
| Number of pages ChatGPT visited | Inferrable | Count the distinct URLs across all tether_browsing_display result entries and tether_quote nodes for the conversation |
| Order of pages visited | Yes | DAG node order and create_time timestamps preserve the search session sequence |
| Search engine used | Not stored explicitly | Always the search engine OpenAI contracts with at the time the conversation was created; the engine name does not appear in the export |
| Ranking signals or click-through data | No | Only the results that appeared in the results list are stored; position-beyond-list and search ranking internals are not present |
jq recipes to extract cited URLs
All three recipes below operate on the full conversations.json file. The key principle: use .mapping | to_entries[].value to walk ALL nodes in the DAG, not the current-node lineage walk. The current-node lineage walk skips tool-role nodes — exactly the nodes that contain the web search data.
Recipe 1: All URLs cited in a single conversation (deduplicated)
This targets tether_quote nodes, which each have a single url field. Replace "Your Conversation Title Here" with the exact conversation title from your export:
jq -r '
.[]
| select(.title == "Your Conversation Title Here")
| .mapping
| to_entries[].value
| select(.message != null)
| select(.message.content.content_type == "tether_quote")
| .message.content.url
' conversations.json | sort -u
The -r flag outputs raw strings (no JSON quotes). The sort -u deduplicates — the same URL may appear in multiple tether_quote nodes if ChatGPT quoted it multiple times across the conversation.
Recipe 2: All search queries across every conversation in the full export
This extracts every search query string ChatGPT ever submitted across your entire export history. Useful for auditing which research sessions are captured:
jq -r '
[
.[]
| .mapping
| to_entries[].value
| select(.message != null)
| select(.message.content.content_type == "tether_browsing_display")
| .message.content.query
]
| unique[]
' conversations.json
If tether_browsing_display returns no results for conversations where the assistant clearly cited web sources, also try content_type == "search_result_group" as the selector — see the Browse with Bing vs ChatGPT Search section below.
Recipe 3: Query + cited URLs grouped by search session
This produces combined output pairing each search query with the URLs that appeared in that query's results list — useful for reconstructing the research context behind a decision:
jq -r '
.[]
| .title as $title
| .mapping
| to_entries[].value
| select(.message != null)
| select(.message.content.content_type == "tether_browsing_display")
| {
conversation: $title,
query: .message.content.query,
urls: [.message.content.results[].url]
}
' conversations.json
This outputs one JSON object per search-query node, with the parent conversation title, the query string, and the array of result URLs. Pipe through jq -s '.' to collect all objects into a single array. For large exports, add a select(.title | test("keyword"; "i")) filter after the first .[] to limit to relevant conversations.
Why to_entries[].value and not the current-node walk: The standard DAG traversal walks from current_node backward through parent pointers, stopping at root. Tool-role nodes are on this path — they appear as intermediate nodes between the user query and the assistant response — but the traversal filters them out by role. Using to_entries[].value iterates over every key-value pair in the mapping object, visiting all nodes regardless of role or position in the lineage. This is the only reliable way to collect tether nodes without also rewriting the DAG traversal to preserve tool turns.
Browse with Bing vs ChatGPT Search — schema differences
The web search feature has gone through two distinct product phases, and the export schema differs slightly between them.
Browse with Bing (2023–2024)
The original Browse with Bing feature, available to ChatGPT Plus users from mid-2023 onward, used content_type: "tether_browsing_display" as the primary search result node, with a results array containing per-result objects with title, url, and snippet. The query field is named query. This schema is the most widely documented in ChatGPT export guides and is what the jq recipes in this page target by default.
ChatGPT Search (late 2024 onward)
The ChatGPT Search product (integrated directly into the main ChatGPT UI, not a plugin-style toggle) may use content_type: "search_result_group" as an outer wrapper node that contains multiple tether_browsing_display children, or may use slightly different field names — specifically search_query instead of query for the query string field, depending on the exact export version. The tether_quote structure is consistent across both product versions and is therefore the more stable target for citation URL extraction.
Diagnosing which schema your export uses
If your jq recipe targeting tether_browsing_display returns no results for a conversation where the assistant clearly cited web sources — you can see inline citation markers in the response text — run this enumeration query to see all content_type values present in your export:
jq -r '
.[]
| .mapping
| to_entries[].value
| select(.message != null)
| .message.content.content_type
' conversations.json | sort -u
This lists every distinct content_type value in your entire export, deduplicated. The output tells you the exact type names for the search tool calls in your specific export file. If you see search_result_group instead of tether_browsing_display, substitute accordingly in your recipes. If you see both, your export spans conversations from both product eras.
Team workspace exports
For ChatGPT Team and Enterprise workspace exports (covered at the ChatGPT Team export guide), web search tool calls are included in the workspace-scoped conversations.json in the same tether node format. The workspace export does not add or remove tether nodes relative to the Plus export — the schema is consistent. If workspace members use ChatGPT Search in shared project conversations, those tether nodes appear in the per-project conversations/ directory in the same format as individual account exports.
One workspace-specific consideration: in Team accounts, multiple members may have held related conversations where ChatGPT searched for overlapping sources. The per-member export structure means these overlapping research sessions are in separate files; if you want a consolidated view of all URLs ChatGPT retrieved across the team for a given architectural decision, you need to run the jq recipes against each member's conversations.json and merge the URL sets. There is no single federated export that combines member conversations for programmatic analysis.
How WhyChose fits in
Web search tool calls are often the most decision-relevant turns in a ChatGPT conversation. When an engineer asks ChatGPT to help evaluate two competing libraries, the tether_quote nodes contain the specific passages from the documentation, licensing pages, or performance benchmarks that informed the choice. The assistant's final recommendation is downstream of this research — and the research is fully captured in the export.
The WhyChose extractor targets not just the assistant response but also the supporting tool calls. So the extracted decision record captures both the recommendation and the research that backed it. This matters for ADR-quality output: a decision record that says "chose Valkey over Redis" is useful; one that also captures "based on the Valkey 1.0 release notes confirming the BSD license" — sourced from a tether_quote node pointing at the Valkey release blog — is what makes the ADR defensible when someone asks eighteen months later whether the license was actually verified or just assumed.
The naive approach — copying the assistant's final response text — captures the conclusion but loses the evidence chain. The tether nodes are the evidence chain. An ADR written from just the assistant response says "we chose this because it has a permissive license." An ADR written from the assistant response plus the tether_quote nodes says "we chose this because it has a permissive license, verified against the 1.0 release announcement at this URL on this date." The second ADR survives scrutiny; the first doesn't.
Related questions
Are ChatGPT web search results in conversations.json?
Yes, but not in the places most jq scripts look. When ChatGPT uses web search (Browse with Bing or the newer ChatGPT Search feature), the search session produces tool-role messages with content_type values of tether_browsing_display (the search query plus a results list with title, URL, and snippet for each result) and tether_quote (a specific text snippet extracted from one of the cited pages). Most ChatGPT export tutorials only extract author.role: "assistant" messages, making these tool-role nodes invisible. The full HTML of browsed pages is not stored — only the query, the result list metadata, and quoted snippets.
What does tether_browsing_display contain?
A tether_browsing_display node contains the search query string that ChatGPT submitted to the search engine, plus an array of result objects each with title (page title), url (full URL), and a short snippet (the search-result preview text, roughly 1–3 sentences). The number of results varies — typically 3–10 per search query. The snippet is the search-engine preview text, not the full page content. If ChatGPT visited a page to extract a longer passage, that longer passage appears in a separate tether_quote node pointing to the same URL.
How do I extract all URLs ChatGPT cited in a conversation?
Use jq to target the tether_quote nodes — these are the most reliable source of cited URLs because each node has a single url field. Run: jq '[.[] | .mapping | to_entries[].value | select(.message != null) | select(.message.content.content_type == "tether_quote") | .message.content.url] | unique[]' conversations.json. Use to_entries[].value to walk all DAG nodes, not just the current-node lineage — the tool-role nodes are skipped by the standard DAG walk. For the search query strings themselves, target tether_browsing_display nodes and filter by .message.content.query.
What's the difference between Browse with Bing and ChatGPT Search in the export?
Browse with Bing (2023–2024) uses content_type: "tether_browsing_display" with a results array and a query field. ChatGPT Search (late 2024 onward) may use content_type: "search_result_group" as an outer wrapper, or may use search_query instead of query as the field name. The tether_quote structure is consistent across both. If targeting tether_browsing_display returns nothing for a clearly search-heavy conversation, run the content-type enumeration query (.message.content.content_type | sort -u) to see the actual type names in your export and adjust the selector accordingly.
Further reading
- ChatGPT conversations.json format — field reference — the full mapping DAG schema and how to walk it; the tether nodes are standard DAG nodes, not special-cased, but the current-node traversal pattern used in most guides silently skips them. This reference covers the complete node structure including tool-role messages.
- How to export your ChatGPT history (2026 guide) — the export flow that produces the conversations.json file containing the tether nodes. Covers the Settings → Data controls → Export data path, the typical turnaround time, and what to do when the export email doesn't arrive.
- Uploaded files in ChatGPT exports — what's included, what's missing — companion reference for the other content types that exporters commonly miss: uploaded file binaries (not stored) and DALL-E image CDN URLs (expiring ~30 days). The web search nodes are one category of invisible-to-naive-extractor content; uploaded file binary absence is another.
- ChatGPT shared links — what persists, what expires, and how to archive — shared links render the assistant's text with inline
【N†source】citation markers but not the tether source nodes themselves, so the URLs behind those markers in a shared link are not independently verifiable without the full conversations.json export. This guide covers the shared-link-as-decision-record anti-pattern and archival strategies. - How to extract decisions from your ChatGPT chats — the extraction guide;
tether_quotepassages are high-signal input for the decision extraction step because they contain the research evidence behind choices, not just the conclusions. This page covers the full extraction pipeline from conversations.json to structured ADR-equivalent output. - The open-source extractor — handles tether node extraction as part of the full conversations.json pass, surfacing both the decision text and the supporting citations in the structured output record.