The GraphQL vs REST decision record: why the API paradigm you chose determines your client coupling surface and your N+1 query exposure
GraphQL versus REST is decided in the founding sprint when the first API endpoint is needed — and never documented as a deliberate architecture choice with client coupling strategy, DataLoader policy, or schema governance evaluated. The query interface determines whether adding a new mobile view requires a backend engineer or not, whether N+1 database queries are invisible until production load exposes them, whether removing a field breaks an external consumer six months later without warning, and whether HTTP caching at the CDN layer is structurally available or architecturally inaccessible. Each of these properties is established when the founding engineer answers "REST or GraphQL?" in the first sprint, invisible in the API reference documentation, and impossible to reconstruct from the schema file once the reasoning has been discarded.
A 15-person fintech startup chose GraphQL for their investment portfolio dashboard. The founding engineer had built with GraphQL at a previous company and recommended it: product managers would be able to request new dashboard components without involving a backend engineer for each new data combination, and the schema would serve as a contract that made the frontend and backend teams independent. The first year validated the choice. The frontend team added six new dashboard panels across two quarters without a single backend API ticket. The schema grew from 12 types to 38 types across that period, with each new type added by the engineer responsible for the corresponding feature.
The incident was triggered by a new admin view. The operations team requested a "portfolio overview" page showing all customer portfolios in a single table — customer name, portfolio value, asset count, and most recent transaction date. The frontend engineer implementing the view wrote a GraphQL query that requested the portfolios list, and for each portfolio, the customer object (for name) and the recentTransaction object (for date). The query was straightforward — four lines requesting four fields across three types. The engineer tested it locally against a development database with 12 portfolios and saw a 340-millisecond response time, acceptable for an internal admin tool.
The query went live against the production database, which contained 340 customer portfolios. The response time was 47 seconds. The admin view timed out on every load attempt. The on-call engineer investigated and found 681 database queries in the query log for a single GraphQL request: one query for the list of portfolios, 340 queries for the corresponding customer record (one per portfolio), and 340 queries for the most recent transaction per portfolio. The customer resolver and the recentTransaction resolver each fetched their target by a foreign key from the portfolio row, independently, once per portfolio. No DataLoader was in use on either resolver.
The team had implemented DataLoader on the resolvers built in the first quarter — the founding engineer who set up the schema had used DataLoader for the user resolver and the account resolver and had included it in the README as "use this pattern for resolvers that look up by ID." The three engineers who added new types in the following months had read the schema and the resolver implementations but had not read the README section on DataLoader. The customer and recentTransaction resolvers were added by two engineers who knew GraphQL well enough to write correct resolvers but did not know that "correct" in this codebase required DataLoader on any resolver that fetched a database row by a foreign key from a parent object. The README instruction was not enforced by any code review checklist, linter rule, or CI test.
Adding DataLoader to the two resolvers took two hours and reduced the query count from 681 to 3 (portfolios list, customers batch, recent transactions batch). The response time dropped to 280 milliseconds. The fix was not the problem. The problem was that the DataLoader requirement — the constraint that prevents GraphQL's resolver model from producing unbounded database query counts on list views — was a convention known to the founding engineer, partially documented in a README section, and unknown to the three engineers who added new types across the following year. The schema had no mechanism to express that a field required DataLoader. The code review process had no checklist item asking whether the new resolver accessed a database by a foreign key from a parent object. The CI test suite had no query count assertion for list queries. The N+1 problem accumulated silently across twelve months of schema additions until a production list query exposed it.
A 22-person B2B SaaS company built their mobile application against a REST API. The API design followed standard resource conventions: GET /users/:id, GET /projects/:id, GET /tasks, GET /teams/:id/members. The founding backend engineer had experience with REST and chose it because the team understood HTTP semantics, the API would be consumed by both the mobile app and future third-party integrations, and REST's stateless request model aligned with their CDN caching strategy for read-heavy endpoints.
The mobile application launched with a dashboard view that displayed user profile information, the user's active projects, and the team member count for each project. The mobile engineer fetched this with three sequential calls: GET /users/:id for the profile, GET /projects?user_id=:id for the projects list, and then one GET /teams/:id/members call per project to get the member count. On a fast connection the sequential calls took 1.2 seconds. The team accepted this at launch.
Six months later, the mobile team added a "home feed" view that displayed recent activity across all projects a user was involved in. Each activity item showed the project name, the activity type, the actor's name, and the actor's avatar URL. Fetching this required the activity list, then a project lookup per activity (for the project name), and then a user lookup per activity (for the actor's name and avatar). The sequential call chain was too slow for a feed view. The mobile engineer filed a ticket requesting a dedicated endpoint. The backend engineer created GET /feed?user_id=:id, which returned activity items with embedded project and actor data in a single response. The endpoint was named for its consumer — the mobile feed view.
The pattern repeated. The mobile team requested a "project detail" view with tasks, member list, recent files, and open comments aggregated into one response. The backend engineer created GET /projects/:id/detail. The team requested an "onboarding" view that assembled user progress data alongside team metadata. The backend engineer created GET /onboarding/summary?user_id=:id. Each new view had a corresponding backend ticket, a new endpoint, and a new aggregation function that pulled data from multiple database tables into a combined response built around the mobile view's current requirements.
After eighteen months, the API had 54 endpoints. Fourteen of those were view-specific aggregation endpoints serving the mobile app: seven named after the view they served (/feed, /detail, /summary, /overview, /profile-card, /dashboard-stats, /search-preview), and seven that combined a resource path with an aggregation (including /projects/:id/detail, /users/:id/activity-summary, and /teams/:id/health-snapshot). When the mobile team added a tablet layout with different data density requirements, six of the fourteen view-specific endpoints needed to be cloned or modified because the tablet view required additional fields or different nesting. The backend engineer who owned the API section wrote in the PR description: "This is now the third variant of the dashboard endpoint. We need a better fetching strategy."
The root cause was not that REST was the wrong choice. The root cause was that REST's fixed-response-shape model was never paired with a fetching strategy for multi-resource client views. The founding decision to use REST implicitly required a decision about how clients with complex data requirements would fetch multiple resource types efficiently — and that secondary decision was never made. Instead, the team improvised incrementally: each mobile ticket produced a new endpoint, and the endpoint proliferation accumulated as a structural liability that required cloning endpoints for each new client variant.
The four structural properties that are decided in the founding sprint
Both incidents, the N+1 query explosion in production and the view-specific endpoint proliferation, were caused by structural properties established when the API paradigm was selected. These properties are not visible in the API reference documentation, the schema file, or the endpoint list. They are visible only in how the API behaves under real use: whether client-view data requirements are served by a fixed endpoint or a flexible query, whether list queries batch their related-entity lookups, whether read-heavy responses can be cached by URL at the CDN, and whether schema changes that remove fields produce client errors immediately or silently. The founding session that answers "REST or GraphQL?" establishes all four properties and closes before any of them are documented.
1. Over-fetching, under-fetching, and the client coupling tradeoff
REST endpoints return a fixed response shape defined by the server. A GET /users/:id endpoint returns the full user object regardless of which fields the client needs. A mobile view that displays only the user's name and avatar receives the full object including address, preferences, subscription tier, and any other fields the server includes. This is over-fetching: the server sends more data than the client needs. Over-fetching wastes bandwidth, increases serialization and deserialization time, and scales poorly on slow mobile connections where payload size directly affects render latency.
Under-fetching is the inverse: a single REST endpoint does not return all the data the client needs for the current view. A dashboard that displays user profile data alongside recent orders and team member count must make multiple requests — one per resource type — adding a round-trip for each additional call. Teams that encounter under-fetching solve it by creating combo endpoints that aggregate data from multiple resources for specific client views. Combo endpoints eliminate round-trips but couple the server API surface to the client's current view requirements: when the view changes, a backend engineer must update or create an endpoint. The combo endpoint proliferation documented in the second incident is the natural evolution of under-fetching without a fetching strategy.
GraphQL addresses both problems by letting the client declare exactly the fields it needs in each query. The server returns only the declared fields; the client makes one request per logical view regardless of how many resource types are involved. A mobile dashboard that needs user name, three recent orders with their totals, and team member count sends a single GraphQL query with those exact fields and receives a response with exactly that data. No over-fetching, no sequential round-trips, no backend ticket required for the field selection. The tradeoff is that the client now authors queries — a responsibility that requires the frontend team to understand the schema structure and maintain queries as schema evolves. The benefit — client teams evolving independently of backend teams — is the stated rationale behind most GraphQL adoptions.
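The difference between a fixed response shape and a client-declared selection can be sketched in a few lines. This is an illustration only, not how a GraphQL runtime is implemented; `select_fields`, the `True`-for-leaf convention, and the sample record are hypothetical.

```python
def select_fields(record: dict, selection: dict) -> dict:
    """Return only the fields named in `selection`.

    A value of True marks a leaf field; a nested dict selects sub-fields
    of an object or of every item in a list.
    """
    result = {}
    for name, sub in selection.items():
        value = record[name]
        if sub is True:
            result[name] = value
        elif isinstance(value, list):
            result[name] = [select_fields(item, sub) for item in value]
        else:
            result[name] = select_fields(value, sub)
    return result


# What a fixed-shape GET /users/:id might return (hypothetical data).
full_user = {
    "name": "Ada",
    "avatar": "https://example.com/ada.png",
    "address": "1 Main St",
    "subscription_tier": "pro",
    "orders": [
        {"id": 1, "total": 10, "items": ["a", "b"]},
        {"id": 2, "total": 20, "items": ["c"]},
    ],
}

# What a mobile dashboard actually needs.
dashboard = select_fields(full_user, {"name": True, "orders": {"total": True}})
```

Here `dashboard` carries the name and the order totals and nothing else, which is the payload reduction the paragraph above describes.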
The coupling surface shifts; it does not disappear. In REST, the server controls the response shape and the client is coupled to the server's aggregation decisions (the combo endpoint must match the view). In GraphQL, the client controls the query shape and the server is coupled to the schema's type system (the schema must evolve in a backward-compatible way because clients depend on specific field paths). The API schema design decision record documents the response shape and the evolution policy; the GraphQL vs REST ADR documents where the selected paradigm places that coupling surface: at the server's endpoint aggregation or at the schema's type and field contract.
2. N+1 query exposure and the DataLoader requirement
GraphQL resolvers are functions that return a field's value for a single parent object. The runtime calls each resolver independently for each parent object in a list. When a query requests a list of orders and the customer name for each order, the GraphQL runtime calls the order resolver once (which queries the database for the list), and then calls the customer resolver once per order to fetch that order's customer. Each customer resolver invocation issues an independent database query for one customer by ID. A list of 340 orders produces 340 independent database queries for customers, in addition to the initial orders query.
The N+1 problem is invisible in a development environment where the list contains 5–20 items. It surfaces in production when the list reaches meaningful sizes and the per-item resolver latency — each database query adding 2–15 milliseconds — multiplies across the list. A query that returns 340 items with two related fields that each trigger an independent database lookup produces 681 database queries, as in the first incident. The query duration scales linearly with list size; there is no natural backpressure that makes the slowness visible in development.
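The arithmetic can be reproduced with a small simulation. The sketch below is not a GraphQL runtime; `FakeDb`, `resolve_portfolios_naive`, and the field names are hypothetical stand-ins that only count how many lookups a per-parent resolver model issues.

```python
from dataclasses import dataclass


@dataclass
class FakeDb:
    """Hypothetical stand-in for a database; it only counts queries."""
    query_count: int = 0

    def list_portfolios(self, n: int) -> list:
        self.query_count += 1  # one query for the whole list
        return [{"id": i, "customer_id": 1000 + i} for i in range(n)]

    def get_customer(self, customer_id: int) -> dict:
        self.query_count += 1  # one query per call
        return {"id": customer_id, "name": f"Customer {customer_id}"}

    def get_recent_transaction(self, portfolio_id: int) -> dict:
        self.query_count += 1  # one query per call
        return {"portfolio_id": portfolio_id, "date": "2024-01-01"}


def resolve_portfolios_naive(db: FakeDb, n: int) -> list:
    """Mimics per-parent resolvers: each related field is fetched per item."""
    rows = []
    for portfolio in db.list_portfolios(n):
        rows.append({
            "id": portfolio["id"],
            "customer": db.get_customer(portfolio["customer_id"]),
            "recentTransaction": db.get_recent_transaction(portfolio["id"]),
        })
    return rows


def queries_for(n: int) -> int:
    """Total queries issued for a list of n portfolios."""
    db = FakeDb()
    resolve_portfolios_naive(db, n)
    return db.query_count
```

With these definitions, `queries_for(12)` is 25 and `queries_for(340)` is 681: the count is 1 + 2n, so a development database with a dozen rows hides a cost that grows with every row added in production.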
DataLoader solves N+1 by batching: it collects all resolver invocations within a single GraphQL execution tick and issues a single database query for all collected IDs. A DataLoader for the customer field collects all customer IDs requested across the order resolvers in the current query, issues SELECT * FROM customers WHERE id IN (...), and distributes the results back to each waiting resolver. The result is one database query for customers regardless of how many orders are in the list. DataLoader requires a per-request instance — a DataLoader shared across requests will return stale cached results and must not be used as a singleton. The per-request lifecycle is a convention that must be documented and enforced because it is not visible in the resolver's type signature.
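The batching mechanism can be sketched without the library. The class below is a minimal illustration built on asyncio, not the DataLoader package and not its API; `BatchLoader`, `fetch_customers`, and `demo` are hypothetical names, and error handling and per-key memoization are left out.

```python
import asyncio
from typing import Optional


class BatchLoader:
    """Collects keys requested in the same event-loop tick, then fetches once."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn  # async: list of keys -> list of rows, same order
        self._pending: dict = {}
        self._task: Optional[asyncio.Task] = None

    def load(self, key: int) -> asyncio.Future:
        loop = asyncio.get_running_loop()
        if key not in self._pending:
            self._pending[key] = loop.create_future()
        if self._task is None:
            # Schedule a single dispatch for everything queued in this tick.
            self._task = loop.create_task(self._dispatch())
        return self._pending[key]

    async def _dispatch(self) -> None:
        await asyncio.sleep(0)  # yield once so sibling resolvers can register keys
        pending, self._pending = self._pending, {}
        self._task = None
        keys = list(pending)
        rows = await self._batch_fn(keys)
        for key, row in zip(keys, rows):
            pending[key].set_result(row)


async def demo(n: int):
    calls = []

    async def fetch_customers(ids):
        calls.append(ids)  # stands in for SELECT ... WHERE id IN (...)
        return [{"id": i, "name": f"Customer {i}"} for i in ids]

    loader = BatchLoader(fetch_customers)  # one instance per request, never shared

    async def resolve_customer_name(portfolio: dict) -> str:
        customer = await loader.load(portfolio["customer_id"])
        return customer["name"]

    portfolios = [{"id": i, "customer_id": 1000 + i} for i in range(n)]
    names = await asyncio.gather(*(resolve_customer_name(p) for p in portfolios))
    return len(calls), list(names)
```

Running `asyncio.run(demo(340))` resolves 340 customer names through a single call to `fetch_customers`. Because `loader` is created inside `demo`, it lives and dies with one request, which is the lifecycle the paragraph above requires.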
REST does not have the N+1 problem at the API boundary because the endpoint implementation aggregates data on the server. A GET /orders?user_id=:id endpoint implementation uses a SQL JOIN or a single batched query to fetch orders with their customer data in one database round-trip. The aggregation strategy is invisible to the client. The cost is that changing the aggregation requires a backend code change; the benefit is that the database query strategy is controlled by the engineer who writes the endpoint, not by the query the client happens to send. An endpoint implementation that produces an N+1 is a bug visible in code review; a GraphQL resolver that produces N+1 on list queries is a pattern that requires a convention (DataLoader) invisible to code review unless specifically checked.
The DataLoader convention — which resolvers require it, the per-request instance requirement, and the enforcement mechanism — must be documented in the GraphQL vs REST ADR, not in the DataLoader library's README. A code review checklist item ("does this resolver access a database using a field from the parent object?") is the minimum enforcement. A CI test that asserts query count on a list query of known size is the stronger enforcement. Neither is automatic from the GraphQL runtime. The performance optimization decision record covers query analysis and profiling methodology; the GraphQL vs REST ADR documents the DataLoader requirement as a convention applied at every new resolver that fetches a related entity.
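A query count assertion of the kind described could look like the following sketch. It uses SQLite's statement trace hook from the Python standard library so that it is self-contained; the table layout and the `overview_naive` and `overview_batched` functions are hypothetical, and a real suite would run the actual GraphQL list query against a seeded test database instead.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def count_queries(conn: sqlite3.Connection):
    """Record every SQL statement executed on `conn` inside the with-block."""
    seen = []
    conn.set_trace_callback(seen.append)
    try:
        yield seen
    finally:
        conn.set_trace_callback(None)


def select_count(statements) -> int:
    return sum(1 for s in statements if s.lstrip().upper().startswith("SELECT"))


def make_db(n: int) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE portfolios (id INTEGER PRIMARY KEY, customer_id INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(i, f"Customer {i}") for i in range(n)])
    conn.executemany("INSERT INTO portfolios VALUES (?, ?)", [(i, i) for i in range(n)])
    conn.commit()
    return conn


PORTFOLIOS_SQL = "SELECT id, customer_id FROM portfolios ORDER BY id"


def overview_naive(conn):
    """One customer lookup per portfolio: the N+1 shape."""
    rows = conn.execute(PORTFOLIOS_SQL).fetchall()
    return [(pid, conn.execute("SELECT name FROM customers WHERE id = ?",
                               (cid,)).fetchone()[0]) for pid, cid in rows]


def overview_batched(conn):
    """One IN (...) lookup for all customers: the batched shape."""
    rows = conn.execute(PORTFOLIOS_SQL).fetchall()
    ids = [cid for _, cid in rows]
    marks = ",".join("?" * len(ids))
    names = dict(conn.execute(
        f"SELECT id, name FROM customers WHERE id IN ({marks})", ids))
    return [(pid, names[cid]) for pid, cid in rows]
```

A test built on this would seed a known number of rows, run the list query inside `count_queries`, and fail the build when `select_count` exceeds a small constant that does not depend on the row count.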
3. HTTP caching model and client cache strategy
REST endpoints are addressable by URL. A GET /users/42 response can carry Cache-Control: max-age=300 and be cached by the browser, a CDN, or a reverse proxy for five minutes. A second request to the same URL within the cache window returns the cached response without hitting the application server. REST's HTTP caching is structural — it requires no additional infrastructure, no client-side library, and no application-level configuration beyond the response headers. The caching strategy decision record documents the cache duration by resource type, the invalidation trigger, and the CDN configuration; these decisions are expressible directly in HTTP semantics.
GraphQL uses a single endpoint — typically POST /graphql — for all queries. POST requests are not cacheable by default in HTTP. A CDN or reverse proxy sees every GraphQL request as a POST to the same URL and cannot differentiate between a query for user profile data and a query for order history. HTTP caching at the CDN layer is architecturally inaccessible for standard GraphQL over POST. Persisted queries — where the client sends a query hash and the server looks up the full query string — allow GET requests with query parameters that are cacheable by URL, but require both client and server infrastructure to register and look up persisted queries, adding operational complexity that standard REST caching does not require.
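The reason persisted queries restore URL-level caching is that the query text is replaced by a stable hash, so identical requests map to identical URLs. The sketch below shows the idea only. The parameter layout loosely follows the shape used by Apollo's automatic persisted queries, but treat it as an assumption and check the server's documentation; `persisted_query_url` and the example URL are hypothetical.

```python
import hashlib
import json
from urllib.parse import urlencode


def persisted_query_url(base_url: str, query: str, variables: dict) -> str:
    """Build a GET URL that identifies a query by its SHA-256 hash."""
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
    params = {
        # Canonical JSON so equal inputs always produce the same URL.
        "variables": json.dumps(variables, sort_keys=True, separators=(",", ":")),
        "extensions": json.dumps(
            {"persistedQuery": {"version": 1, "sha256Hash": digest}},
            sort_keys=True, separators=(",", ":")),
    }
    return f"{base_url}?{urlencode(sorted(params.items()))}"


QUERY = "query Portfolio($id: ID!) { portfolio(id: $id) { value } }"
url_a = persisted_query_url("https://api.example.com/graphql", QUERY, {"id": "42"})
url_b = persisted_query_url("https://api.example.com/graphql", QUERY, {"id": "42"})
url_c = persisted_query_url("https://api.example.com/graphql", QUERY, {"id": "43"})
```

`url_a` and `url_b` are identical, so a CDN can treat the second request as a cache hit, while `url_c` is a distinct cache entry. The server-side registry that maps the hash back to the query text is the operational cost mentioned above and is not shown here.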
GraphQL clients solve the caching problem client-side. Apollo Client maintains a normalized cache that stores fetched data keyed by type and ID (User:42) and serves subsequent queries that overlap with cached data without making a network request. The normalized cache means that a query fetching order list data and a separate query fetching a single order's detail can share cached data if the single order's fields overlap with what was already fetched in the list query. This is more sophisticated than HTTP caching — it operates at the field level, not the URL level — but it requires the client to use Apollo Client or an equivalent library, requires the cache to be configured with the correct type policies for each type that has an ID, and requires that the server's schema reliably returns ID fields so the cache can key its entries correctly. A GraphQL implementation that uses Apollo Client for caching without documenting the cache configuration, the type policies, and the cache invalidation triggers has chosen a caching strategy that is invisible to a reader of the API documentation.
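The core idea of a normalized cache, entities stored once under a type-and-ID key and merged as new fields arrive, can be sketched as follows. This is a toy model and not Apollo Client's implementation; `NormalizedCache` and its methods are hypothetical, and it ignores type policies, nested references, and invalidation.

```python
from typing import Optional


class NormalizedCache:
    """Stores each entity once under a 'Type:id' key and merges field writes."""

    def __init__(self):
        self.entities: dict = {}

    def write(self, obj: dict) -> str:
        # Requires __typename and id, which is why the schema must
        # reliably return ID fields.
        key = f'{obj["__typename"]}:{obj["id"]}'
        fields = {k: v for k, v in obj.items() if k != "__typename"}
        self.entities.setdefault(key, {}).update(fields)
        return key

    def read(self, typename: str, id_, fields: list) -> Optional[dict]:
        entity = self.entities.get(f"{typename}:{id_}")
        if entity is None or any(f not in entity for f in fields):
            return None  # cache miss: a network request would be needed
        return {f: entity[f] for f in fields}


cache = NormalizedCache()
# Written by a list query that selected id and total.
cache.write({"__typename": "Order", "id": 1, "total": 250})
# Written later by a detail query that selected id and status.
cache.write({"__typename": "Order", "id": 1, "status": "shipped"})
```

After both writes, a read of `total` and `status` for `Order:1` is served from the cache even though no single query fetched both fields, while a read of a field that was never fetched returns `None`.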
The caching architecture decision (CDN-cacheable REST endpoints versus a client-normalized GraphQL cache) determines the performance characteristics of read-heavy queries at scale and must be documented explicitly. A team that selects GraphQL for its fetching flexibility without planning its caching strategy will discover at the first load spike that POST /graphql is a cache miss for every request at the CDN layer, while a comparable REST endpoint serving the same data could answer most repeated reads from the CDN cache, with the actual hit rate depending on traffic shape and cache lifetimes.
4. Schema governance, deprecation, and breaking change surface
A GraphQL schema is a typed contract. Removing a field from the schema breaks every client that includes that field in its queries — immediately, at the next request, without a version boundary. A REST endpoint that removes a JSON field from its response breaks clients that depend on that field, but REST provides no mechanism to declare the field's existence as part of a contract, so the breaking change may not be detected until the consumer reports an error. GraphQL's explicit schema makes breaking changes more precisely definable and more precisely dangerous: a field that appears in the schema is a commitment to every client currently querying that field.
Schema governance, the process by which fields are added, deprecated, and removed, must be documented before the schema is used by any consumer outside the founding team. The deprecation workflow has five steps: (1) add the @deprecated(reason: "use newFieldName instead") directive to the field, (2) ensure the deprecation reason includes a migration path, (3) identify all current consumers of the field using schema usage analysis or Apollo Studio's field usage metrics, (4) notify consumers of the deprecation timeline, and (5) remove the field after all consumers have migrated. A field that is deprecated without steps 3 and 4 risks being removed while an unidentified external client still uses it. A field that is deprecated without a documented removal timeline will remain in the schema indefinitely because no engineer owns the removal.
The API versioning strategy decision record documents whether the API uses URL versioning, header versioning, or schema versioning; for GraphQL, schema versioning typically means additive-only evolution (never remove or rename fields, only add new fields alongside deprecated old fields) rather than explicit version namespaces. The ADR must document which evolution strategy the schema uses, what "additive-only" means in practice (can you change a field's type? can you add a required argument?), and what triggers a breaking change review before a schema change is merged.
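A breaking change review can be triggered mechanically by comparing the schema before and after a change. The sketch below works on a simplified dictionary representation rather than real SDL; `breaking_changes` and the schema layout are hypothetical, and it conservatively flags every type change as breaking, which a real policy may relax.

```python
def breaking_changes(old: dict, new: dict) -> list:
    """Compare {type: {field: {"type": str, "args": {name: type}}}} schemas."""
    problems = []
    for type_name, old_fields in old.items():
        new_fields = new.get(type_name)
        if new_fields is None:
            problems.append(f"type removed: {type_name}")
            continue
        for field_name, old_spec in old_fields.items():
            path = f"{type_name}.{field_name}"
            new_spec = new_fields.get(field_name)
            if new_spec is None:
                problems.append(f"field removed: {path}")
                continue
            if new_spec["type"] != old_spec["type"]:
                problems.append(
                    f"field type changed: {path} "
                    f"({old_spec['type']} -> {new_spec['type']})")
            old_args = old_spec.get("args", {})
            for arg, arg_type in new_spec.get("args", {}).items():
                # A new non-null argument breaks every existing query.
                if arg not in old_args and arg_type.endswith("!"):
                    problems.append(
                        f"required argument added: {path}({arg}: {arg_type})")
    return problems


old_schema = {
    "Portfolio": {
        "value": {"type": "Float!"},
        "owner": {"type": "User"},
        "transactions": {"type": "[Transaction!]!", "args": {"first": "Int"}},
    },
    "Legacy": {"id": {"type": "ID!"}},
}
new_schema = {
    "Portfolio": {
        "value": {"type": "String!"},
        "customer": {"type": "User"},  # additive: not reported
        "transactions": {"type": "[Transaction!]!",
                         "args": {"first": "Int", "accountId": "ID!"}},
    },
}
report = breaking_changes(old_schema, new_schema)
```

For this pair of schemas the report lists the changed type of `Portfolio.value`, the removed `Portfolio.owner`, the new required argument on `Portfolio.transactions`, and the removed `Legacy` type. The added `customer` field is not reported, which matches the additive-only model.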
REST's schema governance problem is less formally defined but equally real. A REST API that changes the shape of GET /users/:id (renaming a field, changing a nested object to a flat field, or removing a field that currently returns null but that clients still check) breaks consumers with no deprecation mechanism built into the response format. The HTTP Sunset header (RFC 8594) and the Deprecation header, together with a Link: <docs>; rel="deprecation" header pointing at migration documentation, exist but are rarely implemented. REST API governance relies on versioning (adding a /v2/users/:id endpoint alongside the existing endpoint and maintaining both until consumers migrate) or on communication (announcing changes in API changelogs). The versioning strategy for REST must be documented in the same ADR, not separately, because the decision to version via URL, via header, or via schema evolution determines the consumer migration burden when a response shape must change. The API versioning decision record covers the versioning mechanism; the GraphQL vs REST ADR documents how the selected paradigm's contract model interacts with versioning.
API paradigm options and their structural properties
REST with resource-based endpoints: One URL per resource type, HTTP methods express CRUD intent, response shapes are fixed by the server. Supports HTTP caching natively at the CDN and browser level. Produces over-fetching (fixed shapes return more than clients need) and under-fetching (single resource endpoints require multiple round-trips for multi-resource views). Suitable when the API is consumed by third parties who need stable, predictable endpoints and when CDN caching of read-heavy endpoints is a performance requirement. Requires a versioning strategy documented alongside the endpoint design. The API gateway decision record covers how REST's multiple-endpoint surface is exposed through a gateway with rate limiting and authentication; the gateway's route configuration is the operational representation of the REST API contract.
REST with BFF (Backend For Frontend) pattern: A dedicated backend service per client type (mobile BFF, web BFF, third-party BFF) aggregates data from underlying REST or internal services and exposes view-optimized endpoints tailored to each client's data requirements. Eliminates under-fetching for the target client type without coupling the underlying API surface to client view requirements. The BFF is owned by the frontend team or a team aligned with the client; backend teams evolve the underlying services without directly negotiating endpoint shapes with each client. The operational cost is a separate deployable service per client type; the build vs buy decision record covers whether to build the BFF as a custom Node.js/Go aggregation service or use an API composition tool. BFF is a structured answer to the combo endpoint proliferation problem — it acknowledges that client-optimized aggregation is a real requirement and assigns it a designated architectural boundary rather than accumulating it as ad-hoc endpoints on the core API.
GraphQL with a single schema (Apollo Server, GraphQL Yoga, Strawberry for Python, graphql-ruby): A single typed schema serves all client types; clients declare their exact data requirements in queries; the runtime calls resolvers per field. Eliminates under-fetching across all client types without view-specific endpoint proliferation. Produces N+1 query risk on every resolver that fetches a related entity — DataLoader is required by convention, not by the runtime. HTTP caching is structurally inaccessible without persisted queries; client-side normalized caching requires Apollo Client or Relay. Schema governance (deprecation, breaking change policy, usage analysis) must be implemented as process because the runtime does not enforce it. Suitable when multiple client types with divergent data requirements must be served from a single API surface and when the team is willing to implement and enforce the DataLoader convention on every new resolver.
GraphQL with federation (Apollo Federation, GraphQL Mesh, Cosmo): Multiple backend services each own a subgraph of the full schema; a gateway composes the subgraphs into a unified schema that clients query as if it were a single server. Each subgraph team evolves their portion of the schema independently, subject to federation compatibility rules. Federation is the answer to "we chose GraphQL but our services are split across five teams": it maintains GraphQL's client-facing benefits while distributing schema ownership across service teams. The operational cost is a gateway process, a schema registry (such as Apollo Studio or the registry in Cosmo), and schema composition validation in CI. Federation introduces a new class of N+1 problem: the gateway may batch entity resolvers across subgraphs, but the batching efficiency depends on the subgraph's entity resolver implementation. Federation ADRs belong to the same decision as the initial GraphQL vs REST choice. Adopting federation is not a separate decision made after GraphQL is established, because the subgraph design constraints are most economically addressed at the time the API paradigm is selected. The container orchestration decision record covers the infrastructure for running the gateway and subgraph services alongside the application fleet.
tRPC (TypeScript-only): Type-safe RPC framework in which client types are inferred directly from server procedure definitions, eliminating the need for schema files, code generation, or API documentation for TypeScript full-stack applications. A tRPC procedure is a TypeScript function; the client calls it as if it were a local function, with full type inference. End-to-end type safety catches breaking API changes at compile time rather than at runtime: a changed procedure output type fails the client's TypeScript compilation before the change reaches production. tRPC delivers these benefits only when both the client and the server are TypeScript; a non-TypeScript client gets none of the type inference, which removes the reason to choose it. Suitable for internal TypeScript full-stack applications (Next.js frontend + Node.js backend) where the client and server are in the same repository and the consumer is not a third party. Not suitable for public APIs, mobile applications using native code, or teams where the client and server are in separate technology stacks. The test strategy decision record covers how end-to-end type safety changes the testing pyramid: fewer integration tests for type correctness are needed when the type system enforces the contract at compile time.
gRPC (Protocol Buffers): Binary RPC protocol for internal service-to-service communication. Defined by .proto files that generate typed client and server stubs in any supported language. Provides streaming, bidirectional communication, and efficient binary serialization. Not designed for browser consumption in most configurations (requires grpc-web or a proxy). Suitable for internal service-to-service calls where bandwidth efficiency, streaming, and cross-language typed contracts matter. Not a competing choice with REST or GraphQL for browser-facing APIs — it occupies a different use case. A team that selects gRPC for internal services and REST or GraphQL for their external API has made two correct non-competing decisions; a team that evaluates gRPC for their browser-facing API is comparing incompatible use cases. The distinction — internal service protocol versus client-facing API paradigm — belongs in the ADR as the scope limitation of the decision.
AI chat sessions where API paradigm decisions are made
API paradigm decisions are made in four types of AI chat sessions, each of which establishes structural properties that are not visible in the schema or the endpoint list:
The founding "REST or GraphQL?" session. "I'm building a SaaS with a React frontend and a mobile app. Should I use REST or GraphQL for the API?" This session covers the conceptual tradeoffs — flexible queries vs stable endpoints, schema contract vs HTTP caching — and closes with a recommendation that the founding engineer implements. It does not cover the DataLoader convention, the cache strategy for the chosen paradigm, the schema governance process, or the authorization model. All four structural properties are established by the paradigm selection without being documented. This is the highest-value session to recover from AI chat history: it contains the alternatives considered, the constraints that drove the recommendation, and the client types (mobile, web, third-party) that shaped the fetching strategy rationale. The WhyChose extractor recovers this session from the AI chat export; the founding sprint session that chose GraphQL over REST is the first entry in the API paradigm ADR's context section.
The "how do we fix the slow query?" session. "Our GraphQL query is timing out. The profiler shows 341 database queries for one request." This session covers DataLoader, explains the N+1 problem, and produces a DataLoader implementation for the specific resolver that caused the incident. It does not produce a documented convention requiring DataLoader on all future resolvers that fetch related entities, a CI test asserting query count on list queries, or a code review checklist item. The fix is in the code; the convention that prevents recurrence is in the session history. A team that recovers this session from their AI chat history finds the original reasoning for the DataLoader pattern, the scope of the fix (one resolver), and the gap between the fix and the prevention policy. The observability strategy decision record covers the database query monitoring that would have made the N+1 pattern in the first incident visible before the admin view went live; its absence from the original setup is the second finding from this session type.
The "we need a new mobile endpoint" session. "The mobile team needs a single endpoint that returns user profile, their active projects, and each project's member count. How should I structure this?" This session produces a new combo endpoint, a new route, and a new aggregation function. It does not produce a fetching strategy decision that would prevent the next mobile ticket from producing another combo endpoint. The pattern repeats across twelve such sessions, each independently correct, collectively forming the endpoint proliferation documented in the second incident. Recovering these sessions from AI chat history shows the accumulation: the first session was a reasonable tactical decision; the fifth session in the same pattern was the signal that a fetching strategy decision was overdue. The WhyChose extractor surfaces all twelve sessions grouped by the same question pattern, making the proliferation visible as a pattern rather than as twelve independent acceptable decisions.
The "how do we deprecate this GraphQL field?" session. "We renamed a field in our GraphQL schema and now some clients are getting errors. How do we handle deprecation?" This session covers the @deprecated directive, explains the additive-only evolution model, and may produce a deprecation notice in the schema. It does not produce a schema governance document, a field usage analysis process, or a consumer notification workflow. The incident that triggers the session, a client breaking without warning because a field was renamed or removed, is the first time the team discovers that GraphQL schema evolution requires a documented governance process, not just correct syntax. Recovering this session from AI chat history surfaces the original schema that caused the break, the consumers that were affected, and the deprecation approach that was chosen under time pressure rather than as a deliberate policy.
Writing the GraphQL vs REST ADR
The GraphQL vs REST ADR has five sections. Each section addresses one structural property that is established when the API paradigm is selected and difficult to change retroactively — because the entire client codebase is coupled to the fetching model, the caching model, and the schema evolution model that the founding session chose.
Section 1: API paradigm selection and client fetching strategy. Document the paradigm selected (REST resource endpoints, GraphQL single schema, REST with BFF, tRPC, GraphQL federation) with the alternatives considered and the reason each alternative was rejected. Include the client types the API serves (browser, mobile native, third-party consumers, internal services), because the client type population is the primary driver of the fetching strategy: a GraphQL schema with a single mobile client has a different coupling surface than a schema shared by a mobile app, a web dashboard, and a public third-party API. Include the fetching strategy for multi-resource views: if REST, how are multi-resource views served (sequential client-side calls, combo endpoints, BFF)? If GraphQL, how are client queries bounded (query depth limits, query complexity limits, query allowlist)? The rationale for the paradigm selection must be specific enough that a new engineer can evaluate whether the rationale still holds two years later — not "GraphQL is more flexible" but "GraphQL was selected because the mobile team was adding two new dashboard panels per sprint and each panel required a backend ticket under the REST combo-endpoint model; the flexibility benefit required implementing DataLoader on all resolvers accessing related entities and an Apollo Client normalized cache."
Section 2: N+1 prevention policy (GraphQL) or aggregation boundary (REST). For GraphQL, document the DataLoader requirement: which resolver types require DataLoader (any resolver that fetches a related database row using a key taken from its parent object), the per-request instance lifecycle (DataLoader must be instantiated per GraphQL execution context, not as a module-level singleton), and the enforcement mechanism (code review checklist item, linter rule, or CI assertion). Include the query depth limit (the maximum allowed nesting level in a GraphQL query) and the query complexity limit (the maximum allowed complexity score, where each field contributes a weight and nested list fields contribute a multiplying weight). Both limits prevent a single query from triggering unbounded database operations. For REST, document the aggregation boundary: which aggregation level is permitted in the core API (resource-level endpoints only, no combo endpoints), and which aggregation is the responsibility of the BFF or the client. Include the convention for when a new combo endpoint is acceptable versus when a BFF is the correct answer. A decision that says "combo endpoints are acceptable for authenticated internal views but not for the public API" is a documented aggregation policy; "we add combo endpoints when the mobile team requests them" is not a policy. It is the absence of one, and that absence is what produces endpoint proliferation. The database vendor decision record and the connection pool configuration interact with both DataLoader batching efficiency (batch size versus connection pool depth) and REST endpoint aggregation performance (JOIN complexity on the chosen database).
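The depth and complexity limits described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a production validator: the Selection type is a hypothetical simplified stand-in for a parsed query (a real server would walk the query AST), the assumed list size of 10 is an arbitrary weight, and the value and assetCount field names are invented for the example.

```typescript
// Hypothetical simplified query tree; not the AST of any real GraphQL library.
interface Selection {
  field: string;
  isList?: boolean; // true when the field returns a list
  children?: Selection[];
}

// Depth: number of nested field levels, counting this selection as level 1.
function queryDepth(sel: Selection): number {
  const children = sel.children ?? [];
  if (children.length === 0) return 1;
  return 1 + Math.max(...children.map(queryDepth));
}

// Complexity: each field costs 1; the cost of everything under a list field
// is multiplied by an assumed list size, so nested lists multiply.
function queryComplexity(sel: Selection, assumedListSize = 10): number {
  const childCost = (sel.children ?? [])
    .map((child) => queryComplexity(child, assumedListSize))
    .reduce((sum, cost) => sum + cost, 0);
  return 1 + (sel.isList ? assumedListSize : 1) * childCost;
}

interface Limits {
  maxDepth: number;
  maxComplexity: number;
}

// Returns one message per violated limit; an empty array means the query passes.
function checkLimits(sel: Selection, limits: Limits): string[] {
  const errors: string[] = [];
  const depth = queryDepth(sel);
  const complexity = queryComplexity(sel);
  if (depth > limits.maxDepth) {
    errors.push(`depth ${depth} exceeds limit ${limits.maxDepth}`);
  }
  if (complexity > limits.maxComplexity) {
    errors.push(`complexity ${complexity} exceeds limit ${limits.maxComplexity}`);
  }
  return errors;
}

// Shape of the admin "portfolio overview" query: a list with two related objects.
const portfolioOverview: Selection = {
  field: "portfolios",
  isList: true,
  children: [
    { field: "value" },
    { field: "assetCount" },
    { field: "customer", children: [{ field: "name" }] },
    { field: "recentTransaction", children: [{ field: "date" }] },
  ],
};
```

With these weights the overview query has depth 3 and complexity 61, so checkLimits(portfolioOverview, { maxDepth: 5, maxComplexity: 50 }) would reject it on complexity alone. The weights and thresholds are the part the ADR should record, since they are the policy; the scoring function is interchangeable.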
Section 3: Caching strategy for the selected paradigm. For REST, document the cache policy per resource type: which endpoints carry Cache-Control: max-age=N headers, the cache duration for user-specific versus shared resources, the CDN configuration for cacheable endpoints, and the cache invalidation trigger per resource type. The caching strategy decision record covers the full cache topology; the GraphQL vs REST ADR references it for the cache policy and documents which endpoints participate in CDN caching. For GraphQL, document the client-side cache strategy (Apollo Client normalized cache, Relay store, or no cache), the type policy configuration that maps schema types to cache keys, the cache invalidation approach for mutations (refetchQueries versus cache updates), and the persisted query strategy if CDN caching of common queries is required. Include the cache miss path: a GraphQL request that bypasses the normalized client cache because the query shape does not overlap with previously cached data will hit the application server; the cache hit rate for common query patterns must be measurable, for example through instrumentation around the client cache. A GraphQL implementation without a documented cache strategy and without cache hit rate monitoring has a performance dependency that is invisible until traffic volume exposes it.
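For the REST side, the per-resource cache policy can be written down as data and turned into headers mechanically. The sketch below assumes a hypothetical policy table; the route names and durations are illustrative, not recommendations. The Cache-Control directives themselves (public, private, max-age, no-store) are standard HTTP.

```typescript
type Visibility = "shared" | "user-specific" | "uncacheable";

interface CachePolicy {
  visibility: Visibility;
  maxAgeSeconds: number;
}

// Hypothetical per-resource policy table; routes and durations are examples only.
const cachePolicies: Record<string, CachePolicy> = {
  "/assets": { visibility: "shared", maxAgeSeconds: 300 },
  "/portfolios": { visibility: "user-specific", maxAgeSeconds: 30 },
  "/transactions": { visibility: "uncacheable", maxAgeSeconds: 0 },
};

// Builds the Cache-Control header value for a route.
// Routes with no documented policy fall back to no-store, so an
// undocumented endpoint is never cached by accident.
function cacheControlHeader(route: string): string {
  const policy = cachePolicies[route];
  if (policy === undefined || policy.visibility === "uncacheable") {
    return "no-store";
  }
  // "public" allows shared caches such as a CDN; "private" restricts
  // storage to the end user's own cache.
  const scope = policy.visibility === "shared" ? "public" : "private";
  return `${scope}, max-age=${policy.maxAgeSeconds}`;
}
```

Under this table, cacheControlHeader("/assets") yields "public, max-age=300" and an unlisted route yields "no-store". Keeping the table in one place gives the ADR something concrete to reference when it states which endpoints participate in CDN caching.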
Section 4: Schema governance and evolution policy. Document the breaking change definition: what constitutes a breaking change in the selected paradigm. For GraphQL: removing a field, renaming a field, changing an output field's type from non-nullable to nullable (or an argument's or input field's type from nullable to non-nullable), adding a required argument. For REST: removing a response field, changing a field's type, removing an endpoint, changing authentication requirements. Document the deprecation workflow: how a field or endpoint is marked deprecated, how consumers are notified, the minimum deprecation period before removal, and the owner responsible for verifying that all consumers have migrated. Document the schema changelog format: where schema changes are recorded, how they are versioned, and how consumers discover breaking changes before they are deployed. The API versioning strategy ADR documents the versioning mechanism (URL versioning, header versioning, schema evolution); the GraphQL vs REST ADR documents the governance process that enforces the versioning policy at schema review time. For GraphQL, include the schema checking configuration (for example, GraphQL Inspector for schema diffs and graphql-eslint for lint rules) that catches breaking changes in CI before they reach a deployed environment. A schema change that removes a field must fail CI if any registered client query includes that field.
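The CI rule in the last sentence can be sketched as a comparison between two schema snapshots and a registry of client field usage. This is a simplified illustration, not the behavior of any particular tool: SchemaFields, registeredUsage, and the operation name AdminOverview are hypothetical, and field types are compared as plain SDL strings.

```typescript
// Hypothetical simplified schema model: type name -> field name -> SDL type string.
type SchemaFields = Record<string, Record<string, string>>;

interface BreakingChange {
  path: string; // "Type.field"
  reason: string;
  usedBy: string[]; // registered client operations that select this field
}

function findBreakingChanges(
  oldSchema: SchemaFields,
  newSchema: SchemaFields,
  registeredUsage: Record<string, string[]>,
): BreakingChange[] {
  const changes: BreakingChange[] = [];
  for (const [typeName, fields] of Object.entries(oldSchema)) {
    for (const [fieldName, oldType] of Object.entries(fields)) {
      const path = `${typeName}.${fieldName}`;
      const newFields = newSchema[typeName];
      const newType = newFields === undefined ? undefined : newFields[fieldName];
      const usedBy = registeredUsage[path] ?? [];
      if (newType === undefined) {
        changes.push({ path, reason: "field removed", usedBy });
      } else if (newType !== oldType && newType !== `${oldType}!`) {
        // Tightening an output field to non-null (T -> T!) is safe for clients.
        // Every other type change is flagged conservatively, even though a
        // full checker would accept a few more safe cases.
        changes.push({ path, reason: `type changed from ${oldType} to ${newType}`, usedBy });
      }
    }
  }
  return changes;
}

// CI fails only when a breaking change touches a field some registered client uses.
function ciShouldFail(changes: BreakingChange[]): boolean {
  return changes.some((change) => change.usedBy.length > 0);
}

const before: SchemaFields = {
  Portfolio: { value: "Float!", legacyScore: "Int", recentTransaction: "Transaction" },
};
const after: SchemaFields = {
  Portfolio: { value: "Float", recentTransaction: "Transaction!" },
};
const usage: Record<string, string[]> = { "Portfolio.legacyScore": ["AdminOverview"] };

const detected = findBreakingChanges(before, after, usage);
```

Here detected contains two entries: Portfolio.value loosened from Float! to Float, and Portfolio.legacyScore removed. Only the removal blocks CI, because it is the only change that a registered operation depends on. The usage registry is the part that needs an owner; without it the check can only report changes, not decide which ones matter.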
Section 5: Authorization model at the API boundary. For REST: authorization is typically enforced at the middleware layer — every request to a protected endpoint passes through an authentication check and a permission check before the handler runs. The authorization policy is expressed in route-level middleware and is uniform across the endpoint: a user who can access GET /projects/:id receives all fields the endpoint returns, or receives a 403. For GraphQL: the authorization surface is more complex because a single query can request fields across multiple types, and different fields may have different authorization requirements. Field-level authorization — where the resolver checks permissions for each field before returning its value — prevents unauthorized fields from being returned even when the query is syntactically valid. Schema-level authorization — enforced at the gateway before resolvers run — rejects entire queries that request unauthorized field paths. The authorization model must be documented with: which fields require per-resolver authorization checks, which fields are public (no check), how authentication context is passed to resolvers (usually via the GraphQL context object), and whether field-level authorization is enforced in addition to or instead of schema-level gateway authorization. The authentication strategy decision record covers the token format and validation mechanism; the GraphQL vs REST ADR documents where in the GraphQL execution pipeline the token is validated and how the resulting permission set reaches each resolver. A GraphQL implementation that relies entirely on route-level authentication without field-level authorization exposes all fields in the schema to any authenticated user — including administrative fields, internal metadata, and data belonging to other users that the authenticated user's query happens to reach through a related-entity traversal.
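Field-level authorization is often implemented by wrapping resolvers. The sketch below is one possible shape under simplifying assumptions: the resolver signature is reduced to (parent, context) rather than the full GraphQL resolver signature, and RequestContext, requireRole, requireOwner, and the Portfolio fields are hypothetical names invented for the example.

```typescript
// Hypothetical per-request context, built once after the token is validated.
interface RequestContext {
  userId: string;
  roles: string[];
}

// Simplified resolver signature; real GraphQL resolvers also receive args and info.
type Resolver<P, R> = (parent: P, ctx: RequestContext) => R;

// Wraps a resolver so the field is only returned to callers holding the role.
function requireRole<P, R>(role: string, resolve: Resolver<P, R>): Resolver<P, R> {
  return (parent, ctx) => {
    if (!ctx.roles.includes(role)) {
      throw new Error(`forbidden: role "${role}" required`);
    }
    return resolve(parent, ctx);
  };
}

// Wraps a resolver so the field is only returned to the owner of the parent
// object (or an admin). This is the check that stops a related-entity
// traversal from reaching another user's data.
function requireOwner<P extends { ownerId: string }, R>(
  resolve: Resolver<P, R>,
): Resolver<P, R> {
  return (parent, ctx) => {
    if (parent.ownerId !== ctx.userId && !ctx.roles.includes("admin")) {
      throw new Error("forbidden: not the owner of this object");
    }
    return resolve(parent, ctx);
  };
}

interface Portfolio {
  id: string;
  ownerId: string;
  value: number;
  internalRiskScore: number;
}

// The resolver map doubles as the documentation the ADR asks for:
// each field states its check, or visibly has none.
const portfolioResolvers = {
  id: (p: Portfolio) => p.id, // public, no check
  value: requireOwner<Portfolio, number>((p) => p.value),
  internalRiskScore: requireRole<Portfolio, number>("admin", (p) => p.internalRiskScore),
};
```

A design note: putting the check in a wrapper rather than inside each resolver body makes a missing check visible in code review, because an unwrapped field stands out in the resolver map.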
None of these five sections appear in the founding "REST or GraphQL?" session that selects the paradigm. That session answers the immediate question — which query interface to use — and closes when the engineer starts implementing the first endpoint or resolver. The DataLoader convention, the caching model, the schema governance process, the aggregation boundary, and the authorization surface are the operational requirements of an API that handles production load, evolves across multiple engineers and client versions, and enforces consistent data access permissions across all query paths. They are not advanced optimization concerns. They are the properties that determine whether adding a second engineer to the GraphQL schema accumulates N+1 queries silently, whether a mobile view change requires a backend ticket or a frontend query update, whether a deprecated field reaches removal or persists forever, and whether a sophisticated client query can traverse the type graph to reach data belonging to another account. The WhyChose extractor surfaces the founding session, the N+1 incident session, the mobile endpoint proliferation sessions, and the first deprecation crisis session from AI chat history; the GraphQL vs REST ADR takes the paradigm choice buried in those sessions and converts it into a documented fetching strategy, a DataLoader convention with enforcement, a caching model with measurable cache hit rates, a governance process that catches breaking changes before deployment, and an authorization model that is explicit about which fields require permission checks at the resolver level.
FAQs
Why does GraphQL's resolver model produce N+1 database queries, and what is the correct mechanism to prevent it?
GraphQL resolvers are functions that return a field's value for a single parent object. When a query requests a list of objects and a related field for each — 100 orders and each order's customer name — GraphQL calls the customer resolver once per order. Each invocation independently queries the database for one customer by ID. The result is N+1 database queries: one for the parent list plus one per item for each related field. The N+1 is invisible in development against small datasets and surfaces in production when list sizes reach meaningful counts and per-item query latency multiplies across the list.
DataLoader solves N+1 by batching: it collects all load calls that resolvers make within a single tick of the event loop and issues one batched database query for all collected IDs. A DataLoader for customers collects all customer IDs requested across the order resolvers in the current query execution, issues SELECT * FROM customers WHERE id IN (...), and distributes results back to each waiting resolver. The result is one database query for customers regardless of list size. DataLoader must be instantiated per request: a module-level singleton shares its cache across requests, so it will return stale results and can serve one user's data to another.
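The batching mechanism can be illustrated with a stripped-down loader. This is a teaching sketch written for this example, not the dataloader package and not its API: BatchLoader, fetchCustomersByIds, the in-memory customersTable, and the queryCount counter are all hypothetical stand-ins, and the batch is dispatched on a microtask rather than with the library's scheduling.

```typescript
// Minimal batching loader for illustration only; not the dataloader package.
class BatchLoader<K, V> {
  private batchFn: (keys: K[]) => Promise<Map<K, V>>;
  private queue: {
    key: K;
    resolve: (value: V | undefined) => void;
    reject: (error: unknown) => void;
  }[] = [];
  private cache = new Map<K, Promise<V | undefined>>();

  constructor(batchFn: (keys: K[]) => Promise<Map<K, V>>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V | undefined> {
    const cached = this.cache.get(key);
    if (cached !== undefined) return cached; // duplicate keys share one lookup
    const promise = new Promise<V | undefined>((resolve, reject) => {
      if (this.queue.length === 0) {
        // First key in this batch: dispatch once the current synchronous
        // work (all the sibling resolver calls) has finished.
        void Promise.resolve().then(() => this.dispatch());
      }
      this.queue.push({ key, resolve, reject });
    });
    this.cache.set(key, promise);
    return promise;
  }

  private async dispatch(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    try {
      const rows = await this.batchFn(batch.map((item) => item.key));
      for (const item of batch) item.resolve(rows.get(item.key));
    } catch (error) {
      for (const item of batch) item.reject(error);
    }
  }
}

interface Customer { id: number; name: string; }
interface Order { id: number; customerId: number; }

const customersTable: Customer[] = [
  { id: 1, name: "Ada" },
  { id: 2, name: "Grace" },
];
let queryCount = 0; // counts simulated database round trips

// Stand-in for: SELECT * FROM customers WHERE id IN (...)
async function fetchCustomersByIds(ids: number[]): Promise<Map<number, Customer>> {
  queryCount += 1;
  const rows = customersTable.filter((c) => ids.includes(c.id));
  return new Map(rows.map((c) => [c.id, c] as [number, Customer]));
}

// A fresh loader per request, so nothing cached leaks between requests.
async function resolveCustomerNames(orders: Order[]): Promise<(string | undefined)[]> {
  const customerLoader = new BatchLoader<number, Customer>(fetchCustomersByIds);
  const customers = await Promise.all(
    orders.map((order) => customerLoader.load(order.customerId)),
  );
  return customers.map((customer) => customer?.name);
}
```

Calling resolveCustomerNames with three orders that reference two distinct customers triggers a single call to fetchCustomersByIds. Asserting on a counter like queryCount is also one way to build the CI check that guards list queries against regressions.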
DataLoader is a convention, not a runtime enforcement. A GraphQL server without a documented convention requiring DataLoader on any resolver that fetches a database row by a foreign key from a parent object will accumulate N+1 resolver implementations as engineers add new types and fields without knowing the requirement applies to their new code. The enforcement mechanism — a code review checklist item, a linter rule, or a CI test asserting query count on a list query of known size — must be documented in the ADR, not assumed from the DataLoader library's documentation.
What is the difference between over-fetching and under-fetching in REST APIs, and how does GraphQL change this tradeoff?
Over-fetching is when a REST endpoint returns more data than the client needs. A mobile view that displays only a user's name and avatar receives the full user object — preferences, subscription tier, address, all fields the server includes — regardless of what the client needs. Under-fetching is when a single REST endpoint does not return all the data the client needs, requiring multiple sequential requests to multiple endpoints and adding a round-trip per additional resource type.
GraphQL addresses both problems: the client declares exactly the fields it needs in each query and makes one request per logical view regardless of how many resource types are involved. The over-fetching problem is eliminated because the server returns only declared fields. The under-fetching problem is eliminated because one query can span multiple types. The tradeoff shifts: REST's coupling surface is the server's endpoint aggregation decisions (combo endpoints must match view requirements); GraphQL's coupling surface is the schema's type and field contract (clients depend on specific field paths, making schema changes that remove fields breaking by default).
The secondary tradeoff is caching: REST endpoints are cacheable by URL at the CDN and browser level because each resource has a stable address; GraphQL clients typically send queries via POST to a single endpoint, which puts HTTP caching out of reach unless persisted queries are sent over GET. A GraphQL implementation without a documented client-side cache strategy and without persisted queries for common read-heavy operations does not benefit from CDN caching: every request hits the application server regardless of how recently the same data was fetched by any other client.
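One way persisted queries restore URL-based caching is to replace the query text with its hash and send the request over GET, so that identical operations share a stable URL. The sketch below assumes a Node.js runtime and uses a hypothetical endpoint; the extensions payload follows the shape commonly used for automatic persisted queries, but treat the exact parameter layout as an assumption to verify against the server in use.

```typescript
import { createHash } from "node:crypto";

// Builds a cacheable GET URL that identifies the operation by the SHA-256
// hash of its query text instead of sending the text in a POST body.
function persistedQueryUrl(
  endpoint: string,
  query: string,
  variables: Record<string, unknown>,
): string {
  const sha256Hash = createHash("sha256").update(query).digest("hex");
  const params = new URLSearchParams({
    // Assumed payload shape; confirm against the GraphQL server's documentation.
    extensions: JSON.stringify({ persistedQuery: { version: 1, sha256Hash } }),
    variables: JSON.stringify(variables),
  });
  return `${endpoint}?${params.toString()}`;
}

// Hypothetical endpoint and query, for illustration only.
const overviewUrl = persistedQueryUrl(
  "https://api.example.com/graphql",
  "{ portfolios { id value } }",
  {},
);
```

Because the same query and variables always produce the same URL, a CDN can cache the response like any REST resource. This only suits shared, read-heavy data; user-specific responses still need the cache scoping rules from the caching section of the ADR.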
What should a GraphQL vs REST ADR document that a general API design document does not?
A general API design document specifies resource naming, response envelopes, error codes, and authentication. A GraphQL vs REST ADR must document the paradigm-specific structural properties that differ between the two models and are invisible in the API reference: (1) The DataLoader convention — which resolvers require DataLoader, the per-request instance requirement, and the enforcement mechanism. (2) The caching strategy — CDN caching policy for REST endpoints or client-side normalized cache configuration for GraphQL, including the cache hit rate measurement approach. (3) The schema governance process — deprecation workflow, breaking change definition, consumer notification procedure, and the minimum deprecation period before field removal. (4) The authorization model — whether authorization is enforced at route middleware (REST) or per-resolver and at schema gateway (GraphQL), and which fields require field-level permission checks beyond route-level authentication. (5) The aggregation boundary — which multi-resource aggregation is permitted in the core API and which belongs in a BFF layer or client-side query composition, documented as a policy rather than an implicit accumulation of ad-hoc decisions.