The performance optimization decision record: why the "we added a cache" decision is not self-documenting

The Redis cluster in your production environment has been there for three years. The commit message that introduced it says: "add caching for user profile queries." There is no ADR. There is no ticket with discussion. The engineer who added it is at a different company. A new engineer joining the team has a legitimate set of questions that the codebase cannot answer: Which queries are cached? What was the latency before the cache? What load condition made it necessary? What is the TTL and why? Is the cache still needed, or has the query been rewritten since? What happens if the cache goes down — does the application degrade gracefully or hard-fail? The cache is visible in the stack. The decision that put it there is not.

Performance optimization decisions are the category most likely to be treated as self-evident in a commit message. The code change is the explanation: you added Redis, so you were caching. You added an index, so the query was slow. You switched from lazy loading to eager loading, so you had an N+1 problem. The what is visible. The why — which queries, at what baseline, under which load conditions, with what alternatives considered, and with what ongoing maintenance obligations accepted — is invisible from the implementation alone.

This matters because performance optimizations have a maintenance lifecycle that other architectural decisions do not. A decision to use PostgreSQL over MySQL is largely permanent — the switching cost is extremely high, and the decision is unlikely to be revisited unless the product's data model changes fundamentally. A decision to add a Redis cache for a specific query pattern is impermanent by design. The cache was added because the query was slow under some load condition. If that query is later rewritten, the cache may be unnecessary — but it will remain in the stack until someone investigates whether the original problem still exists, which requires knowing what the original problem was.

The performance optimization ADR is the record that makes this lifecycle manageable. It tells the next engineer — or the same engineer two years later — what problem the optimization was solving, what it cost to add, what it continues to cost to maintain, and when it should be reconsidered. Without it, every optimization accumulates as permanent infrastructure that no one has the information to remove.

Why performance decisions look self-documenting but aren't

The illusion of self-documentation comes from the specificity of the implementation. A commit that adds an index on (user_id, created_at) to a posts table seems to explain itself — clearly some query was selecting posts by user sorted by date, and it was slow. But this reading leaves out: which query specifically, what the query plan showed before the index, what the latency improvement was, whether the index was the only option considered or whether a covering index was evaluated and rejected, what the write overhead of the index is at the table's current insert rate, and whether a subsequent ORM query change has made the index irrelevant. The commit message "add index on posts (user_id, created_at) for dashboard query performance" is accurate at the word level and incomplete at the decision level.
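As a minimal sketch of what capturing the query plan before and after an index can look like, the following uses Python's built-in sqlite3 module as a stand-in for the production database. That is an assumption made so the example runs anywhere; planner output on PostgreSQL or MySQL is formatted differently. The posts table layout, the query text, and the index name idx_posts_user_created are illustrative, not taken from any real schema.

```python
import sqlite3

# Hypothetical dashboard query: one user's posts, newest first.
QUERY = "SELECT id, title FROM posts WHERE user_id = ? ORDER BY created_at DESC"


def plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the detail text


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT, created_at TEXT)"
)

# Plan as the planner sees it before the optimization.
plan_before = plan(conn, QUERY, (1,))

# The optimization under discussion: a composite index matching filter + sort.
conn.execute("CREATE INDEX idx_posts_user_created ON posts (user_id, created_at)")

# Plan after the optimization, for the same query text.
plan_after = plan(conn, QUERY, (1,))

print("before:", plan_before)
print("after: ", plan_after)
```

On SQLite, the plan captured before the index reports a table scan plus a temporary B-tree for the ORDER BY, and the plan captured afterwards reports a search using the composite index. Pasting both into the decision record is what lets a later reader check whether the query still uses the index.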

Performance decisions are also unique in that they document what they added but not what they chose not to add. The engineering team that evaluated three approaches to the slow dashboard query — adding a read replica, adding a Redis cache, and adding a covering index — and chose the index has a record of the index in the code and a record of nothing else. The two alternatives that were rejected are completely invisible. A new engineer who sees the index and wonders whether a cache would be better doesn't know that a cache was already evaluated and found to produce insufficient latency reduction for this query's access pattern. The rejected option is a decision too, and its absence from the record forces the next evaluation to start from scratch.

There is a third category of invisible performance decision: the optimization that was the right choice at the time and has since become a liability. The cache with a 300-second TTL that was appropriate when the user profile data changed infrequently may have become a stale-data risk after the product added real-time presence features. The denormalized comment_count column that was appropriate when comments were appended occasionally may have become a write bottleneck after the product added a high-volume notification system. An optimization that was correctly made under one set of product constraints continues to exist under different constraints, and no one has the information to reconsider it, because the original constraints were never recorded as the context in which the optimization was made.

Three categories of performance decisions worth documenting

Not every performance optimization produces a record-worthy architectural decision. Adding a missing index on a foreign key column that Postgres's query planner was already screaming about is close to operational maintenance. What produces a record-worthy performance decision is a choice between multiple valid approaches, an acceptance of ongoing maintenance obligations, or an optimization that is designed to be temporary and might outlive the condition that made it necessary.

Caching decisions. A decision to add Redis, Memcached, or an in-process cache for a specific query pattern is among the most reliably underdocumented performance decisions. The cache is added when a query becomes slow enough that the product's performance requirements are not being met. The commit adds Redis, the application code adds cache reads and writes and invalidation, and the deployment pipeline adds a Redis cluster to the infrastructure. Three years later, the Redis cluster is a permanent and unquestioned piece of the infrastructure, and no one knows what query originally required it, whether that query still exists, whether the baseline condition that made caching necessary still applies, or whether the cache's staleness guarantees are still appropriate for the data it is serving. The caching ADR is the document that answers all of these questions — not by preventing the cache from being added, but by recording why it was added in a form that allows it to be reconsidered when the conditions change.

Index decisions. Database index decisions are a common source of invisible performance debt. Teams add indexes reactively, when a query is identified as slow in production monitoring, and the index is added with a commit that says "add index for performance" without recording which queries motivated it. Over time, the database accumulates indexes that were added for queries that were later changed or removed, covering columns that are no longer selected, or optimizing join patterns that have been replaced by denormalization. Each unused index carries write overhead — every insert, update, and delete on the indexed table must update the index. A table with fifteen indexes accumulated across three years of reactive optimization may have five to eight indexes that are no longer serving their original queries, contributing write overhead on every insert without providing read benefit. Finding and removing these indexes requires knowing what each index was for — information that is in the commit message only if the commit message named the specific query, which it rarely does. The new technical leader who inherits a database with fifteen indexes on the posts table has no way to evaluate which are load-bearing without reconstructing the query history from scratch.

Query optimization decisions. This category includes N+1 fixes, query restructuring, denormalization decisions, and the introduction of materialized views or summary tables. These decisions are particularly invisible because the optimization changes the code in ways that look like ordinary implementation: adding an includes() call to an ORM query, changing a SELECT * to a specific column list, adding a summary table that is updated asynchronously. The performance motivation for these changes is rarely visible from the code change alone. An N+1 fix that adds eager loading for an association looks like an ORM configuration change; the fact that the original lazy loading was producing 200 database queries per page load at 500 concurrent users, that this was causing 2-second page load times, and that eager loading reduced this to 3 queries per page load, is entirely invisible. A year later, when the product changes the page to not display the eagerly loaded association, no one knows whether the eager loading can be removed — because the performance motivation for adding it is not in the code.

The baseline problem

Performance optimization decisions are made in response to a specific observed condition: a query that takes 800ms when the application's SLA requires 200ms, a page that loads in 3 seconds at 500 concurrent users when marketing has promised sub-1-second performance, a database that shows 95% CPU utilization at peak load when the team wants to stay under 70%. The optimization is evaluated against this baseline: does it bring the metric within the acceptable range?

The baseline is the most important context for every performance decision, and it is the context most likely to be absent from the commit message. "Add Redis cache for user profile queries" says nothing about what the pre-cache latency was. "Add composite index on orders (customer_id, status, created_at)" says nothing about which query was slow or by how much. The baseline is the context that makes the optimization decision legible to someone reading it a year later — without it, the optimization is an assertion that "this was slow" with no way to verify the claim or evaluate whether the problem still exists.

Baselines also change in ways that invalidate optimizations. A cache that was necessary when the user profile query took 400ms at 1,000 concurrent users may not be necessary if the underlying database has been upgraded, if the query has been rewritten to use a more efficient join, or if the application's peak concurrent user count has actually declined from the projected number that motivated the optimization. Conversely, an optimization that was adequate when traffic was 1,000 concurrent users may be insufficient when traffic has grown to 10,000. A new engineer evaluating the performance characteristics of the application needs to know: what was the baseline when this optimization was made, and has the baseline changed enough that the optimization should be reconsidered? That question cannot be answered from the code.

The performance optimization ADR records the baseline not to be pedantic about metrics but because the baseline is what makes the revisitation condition meaningful. "Reconsider this cache if the underlying query latency drops below 50ms" is only meaningful if the record also states that the pre-cache latency was 400ms and the optimization reduced it to approximately 15ms (the cache hit path). Without the baseline, the revisitation condition is unanchored: a future engineer can measure the current latency, but cannot tell how far it has moved or why 50ms was chosen as the threshold, because the starting point was never recorded.

The "is this still needed?" question

The lifecycle of a performance optimization follows a pattern that other architectural decisions do not share. A database choice is made once and rarely revisited. A caching layer is added when a query becomes a bottleneck, accumulates cache invalidation logic as the data model evolves, and eventually becomes a source of complexity that exceeds the performance benefit it was added to provide — at which point it should be removed. But removing it requires answering the question: is the underlying problem still present without the cache?

Answering that question requires knowing what the underlying problem was. If the cache ADR records that the user profile query took 400ms without caching, a team evaluating cache removal can take the following steps: disable the cache in a staging environment, measure the current query latency, and compare the result to the 400ms baseline. If the query now takes 35ms without caching — because the query was rewritten in a subsequent optimization, or because the database has been upgraded, or because the access pattern has changed — the cache can be removed. If the query still takes 380ms without caching, the cache is still needed. The decision to remove the cache is as much a performance decision as the decision to add it, and it is only possible to make it correctly if the original optimization is documented with enough specificity to establish what "the problem is solved" would mean.
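The comparison described above can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed procedure: the CacheBaseline type, the removal_verdict function, and the 150ms SLA value are hypothetical, and the latency samples stand in for measurements taken in staging with the cache disabled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheBaseline:
    """Numbers copied from the Context section of the cache ADR."""
    uncached_p99_ms: float  # latency recorded before the cache was added
    sla_p99_ms: float       # requirement the cache was added to meet (assumed)


def p99(samples_ms):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(samples_ms)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]


def removal_verdict(baseline, uncached_samples_ms):
    """Compare today's uncached latency with the recorded baseline and SLA.

    Returns (verdict, current_p99_ms, change_from_baseline_ms).
    """
    current = p99(uncached_samples_ms)
    change = current - baseline.uncached_p99_ms
    if current <= baseline.sla_p99_ms:
        return ("removal candidate", current, change)
    return ("keep cache", current, change)


# Baseline from the record: 400ms without caching.
baseline = CacheBaseline(uncached_p99_ms=400.0, sla_p99_ms=150.0)

# Two staging measurements with the cache disabled (made-up samples).
rewritten_query = removal_verdict(baseline, [35.0] * 100)
unchanged_query = removal_verdict(baseline, [380.0] * 100)

print(rewritten_query)
print(unchanged_query)
```

The point of the sketch is that neither branch can be evaluated without the two recorded numbers. The measurement is easy to take; the thing it is compared against has to come from the decision record.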

Without the record, the team is left with the risk-averse choice: keep the cache because removing it might cause a production performance regression, even if the underlying problem has long since been resolved. This is how performance infrastructure accumulates. The cache that solved a real problem in 2022 stays in the stack through 2026 because no one can verify whether the problem exists in 2026 without the record of what the problem was in 2022. An optimization ADR that is superseded by a removal record is as complete a decision chain as a dependency ADR superseded by a migration — the removal record explains what changed to make the optimization unnecessary, which closes the loop on why the infrastructure was added in the first place.

The "is this still needed?" question also arises at a different level: across the set of optimizations applied to a system. A team that has added six separate optimizations to a slow checkout flow over three years (an index, a cache, a denormalized subtotal column, eager loading for cart items, a read replica for inventory queries, and a background job for invoice generation) has a system whose cumulative behavior is difficult to understand without knowing what each individual optimization was solving. A quarterly decision review applied to performance optimizations is a systematic check on this question: which optimizations exist, what each was addressing, whether the condition still holds, and whether any have become dead weight that adds cost without providing benefit.

Writing the performance optimization ADR

The Nygard ADR format applies to performance decisions with one important adjustment: the Context section must include the performance baseline, not just a description of the problem. "The user profile endpoint was slow" is not useful context. "The user profile endpoint was taking 380–450ms p99 at 800 concurrent users under the peak load observed during the Q3 marketing campaign; the application's SLA for profile reads is 150ms p99; the load test at 1,200 concurrent users — projected peak for the Q4 campaign — showed 620ms p99" is the context that allows a future reader to understand what condition motivated the decision and whether that condition still exists.

The decision-statement title convention for performance optimization ADRs should name the optimization approach and the target query or system component, not just the direction of the optimization. The sections that follow the title carry the rest of the record.

Context. The performance symptom, the baseline measurement, the load condition under which the measurement was taken, and the SLA or product requirement that was not being met. This section should also name how the root cause was identified: "EXPLAIN ANALYZE showed a sequential scan on the posts table for the user dashboard query despite the existing index on user_id, because the query's ORDER BY created_at clause required a sort that the index could not satisfy" gives a future reader the ability to verify whether the root cause has been addressed by subsequent changes.

Alternatives Considered. Each optimization approach that was evaluated, with the specific reason it was not chosen. For a caching decision, the alternatives might include a covering index (evaluated and found to require index scan on large date ranges, insufficient improvement for the aggregation step), a database read replica (evaluated but deferred because the replication lag would require accepting 30s stale data on the profile page — unacceptable for the notification count field), and query restructuring (evaluated and found to require a schema change that is a larger project than the immediate performance need). For an index decision, the alternatives might include a partial index (covering only recent records — rejected because the dashboard query spans the full history for some users), a materialized view (evaluated but found to require a refresh strategy that adds operational complexity beyond the current team's operational capacity), or query refactoring to avoid the sort (rejected because the sort order is a product requirement, not a query artifact).

Decision. What was implemented, the specific performance improvement achieved, and any approximations or limitations in the solution. "Added Redis cache with 300-second TTL for user profile queries. In staging load tests at 800 concurrent users, p99 latency dropped from 420ms to 18ms (cache hit) and 380ms (cache miss). Cache hit rate at 800 concurrent users is approximately 94% after two minutes of warm-up. Cache is invalidated on user profile update (synchronous) and on role change (synchronous). Notification count is included in the cached object and will be stale by up to 300 seconds — this is accepted as the trade-off for the performance improvement."

Consequences. The ongoing obligations the optimization creates. Caching decisions create the most significant ongoing consequences and these are the ones most likely to be omitted from commit messages: "This decision adds Redis to the required infrastructure. The cache invalidation strategy requires that all code paths that update user profile data also call the cache invalidation function — any new code path that modifies profile data without calling invalidation will produce stale cache entries. The cache serialization format must remain stable across deployments; schema changes to the user profile model require updating the cache serialization before deploying. Cache misses for uncached users add one Redis round-trip to the profile query latency; at very low traffic (below approximately 50 concurrent users), the cache provides negligible benefit and adds the round-trip overhead."

Revisitation condition. Named, checkable triggers for reconsidering the optimization — this is the section that most directly prevents optimization debt accumulation: "Re-evaluate this cache if: (1) the user profile query is restructured to eliminate the aggregation join that was the root cause of the latency — in that case, measure the query latency without the cache before deciding whether caching is still necessary; (2) the notification count is moved to a separate, real-time endpoint — in that case, the 300-second staleness for notification count is no longer a known trade-off and the cached object's freshness guarantees should be re-evaluated; (3) a Redis operational incident reveals that the cache is causing more operational complexity than the performance benefit justifies — in that case, evaluate whether query optimization has advanced enough to eliminate the cache."
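The five sections above can be held in a small structure that refuses to call a record complete while a section is empty. The following is one possible sketch, not a standard ADR tool: the PerformanceADR class, its field names, and the example title are hypothetical, and the section text is abbreviated from the examples in this article.

```python
from dataclasses import dataclass, field


@dataclass
class PerformanceADR:
    title: str
    context: str = ""                                  # symptom, baseline, load, unmet requirement
    alternatives: list = field(default_factory=list)   # (approach, reason not chosen)
    decision: str = ""                                 # what was built and the measured result
    consequences: list = field(default_factory=list)   # ongoing obligations
    revisit_when: list = field(default_factory=list)   # named, checkable triggers

    def _sections(self):
        return {
            "Context": self.context,
            "Alternatives Considered": self.alternatives,
            "Decision": self.decision,
            "Consequences": self.consequences,
            "Revisitation condition": self.revisit_when,
        }

    def missing_sections(self):
        """Names of sections that are still empty."""
        return [name for name, value in self._sections().items() if not value]

    def render(self):
        """Plain-text rendering, one heading per section."""
        lines = [self.title, "", "Context", self.context, ""]
        lines.append("Alternatives Considered")
        lines += [f"- {approach}: {reason}" for approach, reason in self.alternatives]
        lines += ["", "Decision", self.decision, "", "Consequences"]
        lines += [f"- {item}" for item in self.consequences]
        lines += ["", "Revisitation condition"]
        lines += [f"({i}) {item}" for i, item in enumerate(self.revisit_when, 1)]
        return "\n".join(lines)


adr = PerformanceADR(
    title="Cache user profile queries in Redis with a 300-second TTL",  # illustrative title
    context="Profile endpoint at 380-450ms p99 at 800 concurrent users; SLA is 150ms p99.",
    alternatives=[
        ("Covering index", "insufficient improvement for the aggregation step"),
        ("Read replica", "replication lag unacceptable for the notification count"),
    ],
    decision="Redis cache, 300-second TTL; p99 18ms on cache hit in staging load tests.",
    consequences=["Every code path that updates profile data must call cache invalidation."],
    revisit_when=["The profile query is restructured to eliminate the aggregation join."],
)

print(adr.render())
```

A check such as adr.missing_sections() could run in review tooling, so that a record containing only a title and a decision is flagged before the baseline is forgotten.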

The index accumulation problem

Database indexes deserve special attention in the context of performance decision records because they accumulate silently. Unlike a Redis cluster, which is a visible piece of infrastructure that teams notice and question, indexes are invisible at the application layer. They exist in the database schema, added by migration scripts, often with comments that are lost as the database is provisioned and re-provisioned. A database table that was created in 2020 and has been actively developed for six years may have fifteen to twenty indexes, of which the development team can explain the purpose of perhaps eight or ten.

The indexes that cannot be explained fall into predictable categories. Some were added for queries that were later rewritten in a way that no longer uses the index. Some were added for reports that were deprecated when the business model changed. Some were added for JOIN patterns that were replaced by denormalization. Some were added proactively for queries that were anticipated but never implemented. Each of these indexes continues to add write overhead, because every write that touches the indexed columns must also maintain the index. Index maintenance cost grows roughly with the number of indexes, so a table with twenty indexes does substantially more work per insert than the same table with ten, although the exact ratio depends on the database engine, the index types, and the write pattern. On a table with high write volume, accumulated index debt is a material performance cost.

The index decision record solves this problem by naming the specific query the index was added for. An index ADR that records "Added composite index on orders (customer_id, status, created_at) for the open orders dashboard query in the customer service panel — the query at /api/orders?status=open&sort=created_at was taking 1.4s at tables above 200k rows" can be evaluated when the open orders dashboard query changes. If the customer service panel is redesigned and the query is changed to a different endpoint with a different filter pattern, the old index can be evaluated for removal: does the new query use this index? If not, and if no other query uses it, it is a candidate for removal. Without the record, the index cannot be attributed to any specific query, and the safe choice is to leave it in place.
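One way to make that attribution checkable is to keep the index-to-query mapping in a form a script can read. The sketch below assumes a hypothetical IndexRecord structure and hypothetical query identifiers; it shows the bookkeeping only, and does not inspect a real database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexRecord:
    """What an index ADR records about one index (hypothetical shape)."""
    name: str
    table: str
    columns: tuple
    motivating_queries: frozenset  # identifiers of the queries named in the ADR


def removal_candidates(records, live_queries):
    """Names of indexes whose recorded motivating queries no longer exist.

    A candidate is not yet safe to drop: the remaining step is to confirm
    that no other live query has come to rely on the index.
    """
    live = set(live_queries)
    return [r.name for r in records if not (r.motivating_queries & live)]


records = [
    IndexRecord(
        name="idx_orders_customer_status_created",
        table="orders",
        columns=("customer_id", "status", "created_at"),
        motivating_queries=frozenset({"cs_panel.open_orders_dashboard"}),
    ),
    IndexRecord(
        name="idx_orders_created",
        table="orders",
        columns=("created_at",),
        motivating_queries=frozenset({"reports.daily_order_volume"}),
    ),
]

# After a redesign, the open orders dashboard query no longer exists.
live_queries = ["reports.daily_order_volume", "cs_panel.order_search"]

candidates = removal_candidates(records, live_queries)
print(candidates)
```

The function only narrows the search. It turns "fifteen unexplained indexes" into a short list of named candidates whose original purpose is known to be gone.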

Index decision records also improve the quality of index creation decisions at the time they are made. An engineer who must write a formal record of why they are adding an index — including what query it is for, what the query plan shows, what the write overhead is at the table's current insert rate, and what the specific performance improvement is — is less likely to add speculative indexes for queries that haven't been written yet, and is more likely to verify that the index actually improves the query plan before committing. Writing the ADR before finalizing the optimization is the discipline that prevents the accumulation of indexes that seemed like good ideas at the time they were added.

Finding performance deliberations in AI chat

Performance debugging sessions are among the most structured and most complete patterns in AI chat history. They follow a tightly constrained workflow: a developer presents a performance symptom (slow endpoint, high database CPU, long page load), provides the relevant code or query, and works through root cause diagnosis (query plan analysis, N+1 detection, lock contention, memory pressure) before arriving at an optimization recommendation. This structure makes performance debugging sessions highly identifiable in AI chat exports, and it makes them among the highest-quality sources of decision content — the session contains the baseline, the root cause, the alternatives, and the recommended approach in a single connected thread.

The baseline measurement is almost always in the session. A developer who is asking for help with a slow endpoint will provide the current performance numbers: "this is taking 1.2 seconds," "the Datadog graph shows p99 at 800ms," "the EXPLAIN ANALYZE output shows the sequential scan taking 600ms." This is exactly the information that the performance optimization ADR needs in its Context section, and it exists only in the AI chat session — it is not in the commit message ("add Redis cache"), not in the code change itself, and not in the ticket description (usually "performance issue on user dashboard — investigate").

The alternatives evaluation is also usually present, implicitly if not explicitly. An AI performance debugging session will typically go through multiple candidate fixes before converging on a recommendation: "an index on that column won't help because the sort is on a different column," "a cache would work but the staleness tolerance depends on how frequently the data changes," "you could denormalize the count but that adds a trigger or application-level maintenance." These elimination steps are exactly the Alternatives Considered section of the ADR, and they represent real engineering evaluation that happens nowhere else in the documentation chain.

The WhyChose extractor identifies performance debugging sessions through the performance symptom pattern — latency measurements, query performance metrics, or timing data combined with a query or endpoint context. Identification phrases include "is taking X seconds," "p99 is X ms," "EXPLAIN shows," "query plan shows," "why is this slow," and "N+1 queries." The extractor treats the first session in a performance debugging thread as the baseline-definition session and subsequent sessions as alternative-evaluation sessions, reconstructing the complete decision thread across what may have been a multi-day debugging investigation.
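To make the phrase-matching idea concrete, here is a minimal sketch of how such identification could work. It is not the WhyChose extractor's implementation, which is not shown in this article; the regular expressions, the matched_symptoms and label_thread functions, and the label strings are all assumptions chosen for illustration.

```python
import re

# One pattern per identification phrase listed above (patterns are guesses).
SYMPTOM_PATTERNS = {
    "is taking X seconds": re.compile(
        r"\bis taking\s+\d+(?:\.\d+)?\s*(?:ms|milliseconds|seconds|s)\b", re.I
    ),
    "p99 is X ms": re.compile(r"\bp99\b[^.\n]*?\d+\s*ms\b", re.I),
    # Case-sensitive so ordinary uses of the verb "explain" do not match.
    "EXPLAIN shows": re.compile(r"\bEXPLAIN\b[^.\n]*\bshows?\b"),
    "query plan shows": re.compile(r"\bquery plan shows\b", re.I),
    "why is this slow": re.compile(r"\bwhy is this slow\b", re.I),
    "N+1 queries": re.compile(r"\bN\+1\b", re.I),
}


def matched_symptoms(text):
    """Labels of every symptom pattern found in one message or session."""
    return [label for label, pattern in SYMPTOM_PATTERNS.items() if pattern.search(text)]


def label_thread(sessions):
    """Label the sessions of one thread, given in chronological order.

    Follows the rule described above: if the first session contains a
    performance symptom, it defines the baseline and later sessions are
    treated as alternative evaluation. Otherwise nothing is labeled.
    """
    if not sessions or not matched_symptoms(sessions[0]):
        return []
    return ["baseline-definition"] + ["alternative-evaluation"] * (len(sessions) - 1)


thread = [
    "The dashboard endpoint is taking 1.2 seconds and EXPLAIN ANALYZE shows a sequential scan.",
    "Would a cache work here, or does the staleness tolerance rule it out?",
]
print(matched_symptoms(thread[0]))
print(label_thread(thread))
```

Plain pattern matching like this will produce false positives and misses. A real extractor would presumably also require a query or endpoint context near the match, as the description above says.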

The sessions most valuable to find are the ones immediately after the optimization was applied — the verification session, where the developer confirms that the optimization worked: "the p99 has dropped from 800ms to 45ms," "the EXPLAIN now shows an index scan instead of a sequential scan," "the N+1 count dropped from 150 to 3." This verification session contains the post-optimization measurement, which is the data that completes the ADR: the optimization is documented not just as an intent but as a measured improvement. The complete record is: baseline (pre-optimization), reasoning (why this approach), implementation, and verification (post-optimization measurement). Most performance ADRs written without the AI chat session record the first two and omit the fourth — the verification measurement that would let a future engineer evaluate whether the optimization is still necessary by comparing current performance to both the pre-optimization baseline and the post-optimization measurement.

The N+1 fix as a decision record

N+1 query fixes deserve specific attention because they represent a category of performance optimization that is almost never documented as a decision, even though the trade-offs are significant and the reversal conditions are real.

An N+1 fix converts a lazy loading access pattern to an eager loading pattern. The original code loads a collection (orders) and then accesses an association for each item (customer name) — producing N+1 queries where N is the collection size. The fix adds eager loading (an ORM includes or join) that loads the associations in a single query. The commit says "fix N+1 in orders controller" and the optimization looks like a routine cleanup.

But the eager loading fix has trade-offs that the commit message does not capture. Eager loading fetches the associated data for every item in the collection, even if only some items' associations are accessed in a given request context. If the orders collection is filtered before display and only 10% of orders are shown to the user in a given request, the eager loading is fetching associations for 100% of the orders to display 10% of them. This is a trade-off the original lazy loading did not have — the N+1 fix has solved the "200 queries for 200 orders" problem and introduced a "fetch all customer data even for filtered orders" problem that may become significant if the orders collection grows large or if the filtering becomes more aggressive.

The N+1 fix ADR records this trade-off explicitly, which allows the decision to be revisited when the context changes. If the orders page is later redesigned to paginate at 20 orders per page, the eager loading fetches associations for 20 orders per page load — a trade-off that looks very different from the original "200 queries for 200 orders" problem. If the page is redesigned to show only a summary view where customer names are not displayed, the eager loading is fetching associations that are never used. Without the decision record, the eager loading persists as apparently correct code — the N+1 is fixed, the code looks clean — even after the context that made eager loading the right choice has changed.
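The difference between the two access patterns in this section is easy to show by counting queries. The sketch below uses Python's built-in sqlite3 module and hand-written SQL in place of an ORM, which is an assumption made to keep it self-contained; the CountingDB wrapper and the table layout are hypothetical.

```python
import sqlite3


class CountingDB:
    """Thin wrapper that counts how many queries are issued."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    """
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(i, f"customer-{i}") for i in range(1, 201)]
)
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(i, i) for i in range(1, 201)])


def names_lazy(db):
    """Lazy pattern: one query for the orders, then one per order."""
    orders = db.query("SELECT id, customer_id FROM orders ORDER BY id")
    return [
        db.query("SELECT name FROM customers WHERE id = ?", (customer_id,))[0][0]
        for _, customer_id in orders
    ]


def names_eager(db):
    """Eager pattern: load every association up front in a single join."""
    rows = db.query(
        "SELECT o.id, c.name FROM orders o "
        "JOIN customers c ON c.id = o.customer_id ORDER BY o.id"
    )
    return [name for _, name in rows]


lazy_db, eager_db = CountingDB(conn), CountingDB(conn)
lazy_names = names_lazy(lazy_db)
eager_names = names_eager(eager_db)

print("lazy queries: ", lazy_db.count)
print("eager queries:", eager_db.count)
```

Both functions return the same names, but the eager version always loads the customer for every order in the collection. If the page later shows only a fraction of the orders, or stops showing customer names at all, the single join is doing work no request needs, and that is the trade-off the record should name.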

What the honest performance record enables

The performance optimization decision record — the one that names the baseline, the root cause, the alternatives considered, the trade-offs accepted, and the revisitation condition — enables three things that the commit message alone does not.

It enables informed removal decisions. The team that wants to remove the Redis cache, the unused index, or the over-eager loading knows what each optimization was solving and can evaluate whether the solution is still necessary. This is the difference between optimization debt management and optimization debt accumulation. Systems that accumulate performance optimizations without recording them accumulate infrastructure complexity that cannot be safely reduced — the cache cannot be removed because no one can verify the underlying query is fast enough without it. Systems that document their optimizations can systematically evaluate whether each one is still providing value.

It enables performance regression attribution. When a system's performance regresses, the team investigating the regression needs to understand which optimizations are in place and what each is doing. A cache that serves increasingly stale data because its TTL is too long for a new write pattern is a regression that looks like a query problem until the cache behavior is understood. An index that was added for a specific query pattern and is now being used by a different, unintended query is a query planning change that looks correct until the original intent of the index is known. The decision record tells the investigating engineer what each optimization was designed to do, which is the starting point for understanding why it may no longer be doing it correctly.

And it enables the honest conversation about performance trade-offs that teams often avoid. A cache adds infrastructure complexity, staleness risk, and invalidation surface. An index adds write overhead. Denormalization adds synchronization cost and consistency surface. Documenting these trade-offs honestly — naming the consequences that were accepted, not just the improvements that were achieved — is the same discipline as naming the real constraints in a build vs. buy decision rather than the principle-sounding reasons. The team that documents its performance decisions honestly knows what it is accepting when it adds each optimization. The team that commits "add cache" without a record knows that it is adding Redis — but not what it is accepting in exchange for the performance improvement Redis provides. The difference between these two teams becomes visible not at the moment of optimization, but in the three-year accumulation of performance infrastructure that one team can reason about and the other cannot.
