Why does caching adoption need an architecture decision record?

Caching appears to be a performance optimization — a way to make slow endpoints faster — and the adoption decision is typically made quickly when the first database performance problem appears. The mechanism choice (cache-aside, write-through, write-behind, read-through) and the invalidation strategy (TTL-based, event-driven invalidation, tag-based purge) carry consistency commitments that are invisible at the time of adoption but become the source of bugs when cached data diverges from authoritative data. The TTL policy chosen to improve read latency is simultaneously the staleness window that determines how long a user's stale billing address, expired permission, or outdated price can remain visible in production. The failure behavior chosen for 'what happens when the cache is unavailable' determines whether the application degrades gracefully or breaks entirely when the cache node restarts. None of these consequences are visible when the first endpoint is accelerated. The decision record captures the caching mechanism, the invalidation strategy, the consistency guarantee, the cache key design policy, and the failure behavior — the five decisions inside 'we use Redis' that most teams have made informally and cannot reconstruct when a new engineer asks why users sometimes see stale data after an update.

What is a cache stampede, and why does it matter for caching strategy decisions?

A cache stampede (also called thundering herd) occurs when a cached value expires and multiple concurrent requests arrive simultaneously, each finding a cache miss and each independently executing the expensive computation or database query to regenerate the cached value. For a value that is expensive to compute — a complex aggregation query, an external API call, a machine learning inference — simultaneous regeneration by dozens or hundreds of concurrent requests can saturate the database or downstream service, producing a cascading failure at the moment the cache should be protecting the system. The severity depends on traffic volume, cache TTL, and the cost of cache miss computation. Teams that never documented their cache stampede behavior discover it after a cache flush (when all cached values expire simultaneously), a cache node restart (same effect), or a coordinated TTL expiry for a set of hot keys set to the same TTL value. The two common mitigations are mutex/single-flight patterns (only one request regenerates the cached value; others wait for the result) and probabilistic early expiration (cached values are regenerated before they expire, using a randomized early-expiry check that reduces the probability of simultaneous expiry for concurrent requests). Which mitigation is in place, whether it is applied uniformly or only to specific cache keys, and what the application does while the cache is being regenerated are decisions that belong in the caching decision record — because they determine what users experience when a cache flush happens in production.

What should a caching strategy architecture decision record include?

A caching ADR needs five sections that most cache adoptions skip entirely. First, the caching mechanism and cache provider decision: the caching pattern chosen (cache-aside, write-through, write-behind, read-through, CDN edge caching), the cache backend (Redis, Memcached, Varnish, CDN), alternatives evaluated with rejection reasons, and the deployment topology (standalone, replica set, cluster — which determines failure behavior and consistency model under node failure). Second, the TTL policy and invalidation mechanism: TTL values for each cached data class with the rationale for each value, whether invalidation is TTL-only or also event-driven (explicit delete on write), the invalidation trigger for each data class, and the behavior for data classes where TTL invalidation is insufficient (profile data that must be current immediately after user update). Third, the consistency guarantee: for each cached data class, the documented consistency level (strong consistency, bounded staleness with TTL, eventual consistency), the acceptable staleness window, and the data classes that are explicitly excluded from caching because their consistency requirements cannot be met by the chosen mechanism. Fourth, the cache key design and namespace policy: key naming conventions, namespace isolation between environments, key encoding for composite keys (user_id + resource_type + resource_id), TTL-per-key-type policy, and the maximum key length constraint. Fifth, the failure behavior and fallback policy: what the application does when the cache is unavailable (fail open to the database, fail closed with a 503, serve stale data from a local in-process cache), the circuit breaker configuration if one exists, the cache stampede mitigation strategy, and the cache warm-up procedure after a node restart or flush.

2026-06-18 · ~18 min read

The caching strategy decision record: why the cache invalidation approach you chose shapes your consistency guarantees and the classes of bugs your users experience in production

Q: How do caching decisions appear in AI chat history?

Caching decisions surface in four session types. Initial adoption sessions contain the mechanism selection: 'Redis vs. Memcached for Django caching', 'how to add Redis caching to a Node.js API', 'cache-aside vs. read-through caching — which is better?', 'should we use database query result caching or object caching?', 'how to set Redis TTL for user session data.' These sessions hold the cache provider choice, the mechanism, and the initial TTL reasoning. Staleness incident sessions contain the invalidation mechanism decision: 'user updated their profile but still sees old data', 'how to invalidate Redis cache after a database write', 'why does my cache not reflect the latest database update?', 'Redis cache invalidation pattern for cache-aside', 'how to delete a Redis key when the underlying data changes.' These sessions reveal when the team first encountered the staleness problem and what invalidation strategy they applied. Cache stampede sessions contain the thundering herd discovery: 'Redis cache stampede — how to prevent thundering herd', 'how to use mutex for Redis cache regeneration', 'many requests hitting the database when cache expires at the same time', 'probabilistic cache expiration to prevent thundering herd', 'cache warming after Redis restart.' These sessions emerge after a cache flush or coordinated TTL expiry produces a database spike. Cache failure sessions contain the fallback behavior decision: 'what happens when Redis is down', 'application performance without Redis cache', 'how to implement cache fallback to database', 'Redis connection pool exhausted — application failing', 'how to handle cache unavailability without breaking the application.'

Caching adoption is treated as a performance optimization, not an architecture decision. The mechanism is chosen quickly when the first slow endpoint appears and rarely documented. Two years later, the TTL policy determines the staleness window for every cached resource, the invalidation strategy determines which writes propagate immediately versus after a delay, the cache stampede behavior determines what users experience when the cache is flushed, and the failure mode determines whether the application degrades gracefully or breaks entirely when the cache node restarts. None of this was visible when the first endpoint was accelerated. None of it is written down.

A user updates their billing address. The update succeeds — the database write completes, the response returns HTTP 200, the UI shows a success message. Four hours later, the user's invoice arrives with the old address. The support ticket arrives the same day. The on-call engineer traces the billing invoice generation to a user profile lookup, which hits the cache, which returns the profile as it existed at the time of the last cache population. The profile was cached with a 24-hour TTL. The cache was not invalidated on the address write. The billing service read a six-hour-old profile and generated the invoice with the previous address.

The fix is straightforward: add a cache invalidation call to the address update endpoint. The issue is that "add a cache invalidation call" requires knowing which cache keys encode the user's profile — and the profile is cached under three different key patterns depending on which service populated the cache. The billing service caches user:{id}:profile. The API gateway caches the full user object under session:{token}:user. The frontend GraphQL layer caches the profile query result under a key derived from the query hash. Invalidating "the user's profile" requires invalidating three different cache keys across two different cache stores, and the mapping between "user changed their address" and "these keys must be deleted" was never documented.

Like most foundational infrastructure decisions, the caching mechanism is visible as a fact — the application uses Redis, the TTL is 24 hours, the keys follow a naming pattern — but invisible as a decision. The fact answers "what is true now?" The decision record answers "what consistency commitment was made when the TTL was set to 24 hours, what data classes are excluded from caching because their consistency requirements cannot be met, and what the invalidation policy is for data that changes outside the primary write path." Without the record, the stale address bug is a surprise rather than a documented consequence of a known policy gap.

What "we use caching" means across five patterns

The first decision inside "we use Redis" is the caching mechanism — the architectural pattern by which cached data is populated, validated against the authoritative source, and invalidated when the authoritative data changes. The mechanism choice is often made by whoever is blocked on a slow endpoint, driven by the first tutorial that appears in a search, and carries specific consistency, latency, and invalidation model commitments that determine the behavior of every cached data class that follows.

Cache-aside (lazy loading) is the most common pattern and the one most often adopted by default. The application checks the cache before reading from the authoritative source (database, external API). On a cache hit, the cached value is returned without touching the authoritative source. On a cache miss, the application reads from the authoritative source, writes the result to the cache with a TTL, and returns the value. The application owns both the cache read and the cache population logic. Cache and authoritative source are not automatically synchronized — divergence accumulates until the cached value expires. The write path is decoupled from the cache: an update to the database does not automatically invalidate the corresponding cache key unless the application explicitly adds an invalidation call to the write path. Teams that add cache-aside without adding invalidation-on-write accept eventual consistency at the TTL boundary as the default behavior for all cached data, often without naming this as the consistency model they have chosen.
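The read and write paths described above can be sketched in a few lines. This is a minimal illustration, not a production client: `TTLCache` is a hypothetical in-memory stand-in for a cache such as Redis, the database is a plain dict, and the `invalidate` flag exists only to show what happens when the write path forgets the cache.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-memory stand-in for a cache such as Redis."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key: str, value: dict, ttl: float) -> None:
        self._items[key] = (self._clock() + ttl, value)

    def delete(self, key: str) -> None:
        self._items.pop(key, None)


def get_profile(cache: TTLCache, db: Dict[int, dict], user_id: int,
                ttl: float = 300.0) -> dict:
    """Cache-aside read: check the cache, fall back to the database on a miss."""
    key = f"user:{user_id}:profile"
    cached = cache.get(key)
    if cached is not None:
        return cached
    profile = dict(db[user_id])  # copy, so later database writes do not alter the cached value
    cache.set(key, profile, ttl)
    return profile


def update_address(cache: TTLCache, db: Dict[int, dict], user_id: int,
                   address: str, invalidate: bool = True) -> None:
    """Write path: the cache only learns about the write if it is told."""
    db[user_id]["address"] = address
    if invalidate:
        cache.delete(f"user:{user_id}:profile")
```

With `invalidate=False`, a read after the write keeps returning the old address until the TTL elapses, which is the default consistency model the paragraph above describes.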

The write path decoupling is where staleness bugs originate. A user's account tier is cached under cache-aside with a 6-hour TTL. The user's subscription expires at 11am. The subscription expiry is processed by a background job that writes to the database at 11am and does not touch the cache. The user attempts to use a Pro-tier feature at 1pm — the cache still shows Pro tier, the feature gate reads from the cache, the user accesses a feature they should not have. At 5pm, when the 6-hour TTL expires, the cache is re-populated from the database, and the user loses access. The 6-hour window of incorrect access is a direct consequence of the TTL policy and the absence of invalidation on write — both decisions that were made (implicitly) at cache adoption and that determine the consistency guarantee for account tier data.

Write-through caching updates the cache synchronously on every write to the authoritative source. When the application writes to the database, it also writes the new value to the cache before returning the response to the caller. In normal operation the cache reflects every committed write, because it is updated in the same write path immediately after the database commit. The consistency guarantee is strong in practice but not absolute: a read that follows a write sees the most recent data as long as the read goes through the cache and the cache write succeeded. The database and the cache are separate systems, so a failure between the two writes, or two concurrent writers whose cache updates land in a different order than their database commits, can still leave them divergent; a fallback TTL is the usual safety net. The write latency cost is the cache write added to the write path (typically 1–5ms for a local Redis instance), which is negligible for most write patterns. The cold start problem is more significant: a freshly provisioned cache contains no data, so every read is a cache miss until the cache warms up through writes. Write-through caching without a separate cache-warming mechanism produces a period of degraded read latency after a cache flush or node restart, because the cache only populates through writes, not reads.
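A small sketch can make the cold start behavior concrete. The class below is hypothetical and uses dicts for both the database and the cache; it models pure write-through, where reads that miss go to the database and do not populate the cache.

```python
from typing import Dict, Optional


class WriteThroughStore:
    """Hypothetical pure write-through store backed by two dicts."""

    def __init__(self) -> None:
        self.db: Dict[str, str] = {}
        self.cache: Dict[str, str] = {}
        self.db_reads = 0  # counts reads that had to fall back to the database

    def write(self, key: str, value: str) -> None:
        self.db[key] = value     # authoritative write first
        self.cache[key] = value  # then the cache, before returning to the caller

    def read(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        return self.db.get(key)  # pure write-through: a miss does not fill the cache

    def flush_cache(self) -> None:
        self.cache.clear()
```

After `flush_cache()`, every read of a key falls through to the database until that key is written again, which is why write-through is usually paired with read-side population or an explicit warming step.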

Write-behind (write-back) caching acknowledges writes to the cache immediately and persists to the authoritative source asynchronously. The application writes to the cache, the cache acknowledges the write, and the response is returned to the caller — the database write happens later, in a background process. Write latency is minimized because the slow operation (the database write) is moved off the critical path. The risk is durability: if the cache node fails or is flushed between the cache write acknowledgment and the database write completion, the write is lost. The application told the user "your update succeeded" — and it is silently gone. Write-behind is correct for write-heavy workloads where write latency is the primary constraint and some data loss under cache failure is acceptable (analytics counters, view counts, non-critical preference updates). It is incorrect for any data where the user expectation is that "your update succeeded" means "your update is durable." Most teams that adopted write-behind for a specific high-write use case discover that the pattern has been applied to other data classes that require durability — when the cache node fails, they lose data that users assumed was persisted.
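The durability gap can be shown with a toy model. Everything here is an assumption made for illustration: the pending queue lives in the cache node's memory, `flush_to_db` stands in for the background persister, and `crash_cache_node` simulates a restart or flush.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class WriteBehindStore:
    """Hypothetical write-behind store: acknowledge first, persist later."""

    def __init__(self) -> None:
        self.db: Dict[str, str] = {}
        self.cache: Dict[str, str] = {}
        self._pending: Deque[Tuple[str, str]] = deque()

    def write(self, key: str, value: str) -> str:
        self.cache[key] = value
        self._pending.append((key, value))
        return "ok"  # acknowledged before the database has seen the write

    def flush_to_db(self) -> int:
        """Background step: persist queued writes, return how many were written."""
        persisted = 0
        while self._pending:
            key, value = self._pending.popleft()
            self.db[key] = value
            persisted += 1
        return persisted

    def crash_cache_node(self) -> None:
        """Simulate a cache restart: cached values and queued writes are both gone."""
        self.cache.clear()
        self._pending.clear()
```

A write that was acknowledged with `"ok"` but not yet flushed disappears when the node crashes, which is acceptable for a view counter and not for a billing address.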

Read-through caching moves the cache miss handling from the application into the caching layer itself. When the application reads through a cache miss, the cache layer automatically calls a configured data loader to fetch from the authoritative source, stores the result, and returns it — the application receives the result without needing separate cache population logic. The application always reads from the cache; the cache is responsible for cache miss resolution. This pattern requires a cache provider that supports read-through configuration (some Redis client libraries, Ehcache, Hazelcast) and a data loader implementation that the cache layer can call. The benefit is consistent cache population logic — the application does not contain multiple code paths that each populate the cache differently, which is the source of the multi-key-pattern problem in the billing address scenario above. The constraint is that the cache layer must be able to call the data loader with the same dependencies and context as the application — authorization context, tenant identifier, database connection — which is straightforward for simple data loaders and complex for data that requires application-layer business logic to assemble.
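As a rough sketch of the shape of this pattern, the class below takes the data loader at construction time so that callers never populate the cache themselves. The class name and the `loads` counter are illustrative assumptions, not the API of any of the products named above.

```python
from typing import Callable, Dict, Generic, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class ReadThroughCache(Generic[K, V]):
    """Hypothetical read-through cache: the loader is configured once, here."""

    def __init__(self, loader: Callable[[K], V]) -> None:
        self._loader = loader
        self._items: Dict[K, V] = {}
        self.loads = 0  # how many times the loader was invoked

    def get(self, key: K) -> V:
        if key not in self._items:
            # The cache layer, not the caller, resolves the miss.
            self._items[key] = self._loader(key)
            self.loads += 1
        return self._items[key]

    def invalidate(self, key: K) -> None:
        self._items.pop(key, None)
```

Because there is exactly one loader per cache, there is exactly one place that decides what a cached profile looks like and under which key it is stored.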

CDN and edge caching caches HTTP responses at network edges — CDN nodes geographically distributed close to users — without requiring application code changes, because the caching behavior is controlled by Cache-Control response headers. An origin server that responds with Cache-Control: public, max-age=3600 instructs CDN nodes to cache the response for one hour and serve subsequent requests from the edge without hitting the origin. The latency reduction for static and semi-static content (product catalog pages, pricing pages, blog posts, API responses for data that changes infrequently) is significant: CDN edge latency is typically 5–20ms vs. 100–300ms for an origin request from a distant user. The invalidation model is where CDN caching decisions have the greatest architectural consequence. TTL expiry requires waiting for the cache to expire before updated content is served — a one-hour TTL means users may see stale content for up to one hour after an update. Cache purge APIs (Cloudflare, Fastly, AWS CloudFront) allow the application to explicitly invalidate specific URLs or patterns — but the purge call must be wired into the write path, and the mapping from "this record changed" to "these URLs must be purged" must be maintained. Surrogate key invalidation (Fastly Surrogate-Key, Cloudflare Cache-Tag) allows tagging cached responses with entity identifiers and purging by tag — when a blog post is updated, the application emits a purge for the post's cache tag, which invalidates all edge-cached responses that include that post's content, including listing pages and related-content blocks. The surrogate key model requires that the application track which cache tags are emitted for each response, and the tag design is a cache architecture decision that determines the granularity at which content can be invalidated.
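On the origin side, the header contract described above might be assembled like this. The function name and the tag values are hypothetical; the header formats follow the vendors' documented conventions (comma-separated tags for Cloudflare's `Cache-Tag`, space-separated keys for Fastly's `Surrogate-Key`). A real origin would emit only the header for the CDN it actually uses.

```python
from typing import Dict, Iterable


def edge_cache_headers(max_age_seconds: int, tags: Iterable[str]) -> Dict[str, str]:
    """Build response headers that tell the edge how long to cache and how to purge."""
    tag_list = sorted(set(tags))  # stable order, no duplicates
    headers = {"Cache-Control": f"public, max-age={max_age_seconds}"}
    if tag_list:
        headers["Cache-Tag"] = ",".join(tag_list)      # Cloudflare: comma-separated
        headers["Surrogate-Key"] = " ".join(tag_list)  # Fastly: space-separated
    return headers
```

For example, a blog listing page that embeds post 42 could be served with `edge_cache_headers(3600, ["post:42", "listing:blog"])`, so that a later purge of `post:42` also drops the listing page.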

The invalidation mechanism decision

The invalidation mechanism is the second decision inside caching strategy, and it is the one most frequently left undocumented because it feels like an implementation detail rather than an architecture decision. The mechanism chosen at adoption determines the consistency boundary for every cached data class — how long a write to the authoritative source takes to propagate to cache readers, and which events trigger cache invalidation versus leaving the cache to expire naturally.

TTL-based invalidation is the default for most cache-aside implementations. The cached value is stored with a fixed expiration time; at expiry, the next read produces a cache miss and the cache is re-populated from the authoritative source. The consistency guarantee is bounded staleness: a cached value is at most TTL seconds out of date. The TTL is typically chosen to balance cache hit rate (longer TTL = higher hit rate = less database load) against staleness tolerance (shorter TTL = more current data = more database load). The choice is made at cache adoption under the conditions that exist at that time: a team with a small database and a medium-latency read path might choose a 60-second TTL; the same team three years later with a high-traffic API and a slower database might never revisit the TTL because it was never documented as a policy decision. Like retention policies that determine how long historical data is queryable, TTL values set once at adoption become the operative policy until a staleness bug forces a revisit — and without documentation, each revisit rediscovers the TTL from first principles rather than building on prior reasoning.

Event-driven invalidation supplements or replaces TTL expiry by explicitly deleting or updating cached keys when the authoritative data changes. When a user updates their profile, the write path calls cache.del("user:{id}:profile"), normally after the database write commits, because deleting first lets a concurrent read re-populate the cache with the old value. The consistency guarantee improves to near-real-time: a read that follows a write sees updated data as soon as the invalidation completes. The complexity cost is the requirement to maintain the mapping from "write event" to "cache keys to invalidate", a mapping that must be updated every time a new cache key pattern is introduced and every time a new write path is added. An omission in that mapping is the root cause of most cache staleness bugs: a new write path is added (background job, API endpoint, admin interface, batch import) that modifies data without calling the corresponding cache invalidation, and the cache serves stale data until the TTL expires. Like error handling decisions that determine which failure modes the application surfaces versus silently absorbs, the invalidation mapping decision determines which writes propagate immediately and which silently diverge from the cache until TTL expiry.
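One way to keep the event-to-keys mapping from living only in people's heads is to make it a single table that every write path must go through. The sketch below is an assumption about how that could look: the event names, the second key template, and the dict-based cache are all hypothetical.

```python
from typing import Dict, List

# Hypothetical mapping from write events to the cache key templates they invalidate.
INVALIDATION_MAP: Dict[str, List[str]] = {
    "profile_updated": ["user:{user_id}:profile", "user:{user_id}:summary"],
    "comment_created": ["post:{post_id}:comments"],
}


def keys_for(event: str, **params: object) -> List[str]:
    """Resolve an event to concrete cache keys; unknown events fail loudly."""
    templates = INVALIDATION_MAP.get(event)
    if templates is None:
        raise KeyError(f"no invalidation mapping for event {event!r}")
    return [template.format(**params) for template in templates]


def invalidate(cache: Dict[str, object], event: str, **params: object) -> List[str]:
    """Delete every key mapped to the event and return the keys that were targeted."""
    keys = keys_for(event, **params)
    for key in keys:
        cache.pop(key, None)
    return keys
```

Raising on an unmapped event is a deliberate choice in this sketch: a new write path that has no entry in the table breaks in testing instead of silently serving stale data until the TTL expires.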

Tag-based invalidation (available in HTTP caching via Fastly Surrogate-Key or Cloudflare Cache-Tag, and in application-layer caching via libraries like cache-tags for Redis) allows grouping cached values under named tags and invalidating all values associated with a tag in a single operation. A product listing page might be tagged with product-catalog and category:{id}. When any product in the catalog is updated, the application purges the product-catalog tag, invalidating all listing pages simultaneously. When a specific category is updated, the application purges category:{id}, invalidating only the pages for that category. The tag design is the critical decision: coarse tags (purging product-catalog on every product update) produce large invalidation scope and reduce cache efficiency; fine-grained tags (purging product:{id} on update) require tracking which cached responses include which products, which is complex to maintain for responses that include multiple products. The tag design decision is rarely documented because it is made incrementally — each new cache tag is added when needed — and the aggregate policy (which data class maps to which tags, what the invalidation granularity is for each class) is never stated as a coherent decision.
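The mechanics of tag-based purge reduce to a reverse index from tag to keys. The class below is a hypothetical in-memory model of that index, not the API of any CDN or Redis library; it reuses the `product-catalog` and `category:{id}` tags from the example above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


class TaggedCache:
    """Hypothetical cache that tracks which keys carry which tags."""

    def __init__(self) -> None:
        self._values: Dict[str, object] = {}
        self._keys_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def set(self, key: str, value: object, tags: Iterable[str]) -> None:
        self._values[key] = value
        for tag in tags:
            self._keys_by_tag[tag].add(key)

    def get(self, key: str) -> Optional[object]:
        return self._values.get(key)

    def purge_tag(self, tag: str) -> int:
        """Remove every cached value carrying the tag; return how many were removed."""
        removed = 0
        for key in self._keys_by_tag.pop(tag, set()):
            if key in self._values:
                del self._values[key]
                removed += 1
        return removed
```

Purging `category:1` removes only that category's page, while purging `product-catalog` removes every listing page, which is the granularity tradeoff the tag design has to settle.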

The cache stampede and cold start problem

The cache stampede — also called thundering herd — occurs when a high-traffic cached value expires and concurrent requests simultaneously find a cache miss. Each request independently executes the cache miss handler: the database query, the external API call, the complex aggregation. For a value that is cheap to compute (a simple database row lookup), the stampede produces duplicate reads with negligible impact. For a value that is expensive to compute (a JOIN across three tables aggregating 10,000 rows, an external API call with a 200ms latency, a machine learning model inference), the stampede saturates the database or downstream service with duplicate in-flight computations, each of which will produce the same result.

The severity multiplier is traffic volume at the moment of expiry. The stampede size is the number of requests that arrive in the window between the expiry and the first re-population. A cached value with a 60-second TTL that receives 1,000 requests per minute (about 17 per second) produces a stampede of roughly 17 simultaneous cache miss handlers at expiry if re-population takes about one second. A value receiving 10,000 requests per minute produces a stampede of roughly 170 simultaneous handlers under the same assumption. Teams that cache high-traffic values with fixed TTLs and never documented the stampede behavior encounter it after a cache flush (which removes all keys at once, producing simultaneous stampedes for all hot values), a cache node restart (same effect), or a coordinated TTL expiry for a set of keys written with the same TTL at the same time (a common consequence of cache warming scripts that populate a batch of keys simultaneously).
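The arithmetic behind these figures is request rate multiplied by the time the first regeneration takes. The helper below is a back-of-the-envelope sketch; the one-second regeneration time is an assumption that reproduces the numbers quoted above, and real traffic is burstier than a uniform rate.

```python
def stampede_size(requests_per_minute: float, regeneration_seconds: float) -> float:
    """Expected number of requests that miss while the first regeneration is running."""
    requests_per_second = requests_per_minute / 60.0
    return requests_per_second * regeneration_seconds
```

With a one-second regeneration, 1,000 requests per minute gives about 17 concurrent miss handlers and 10,000 requests per minute gives about 167. Halving the regeneration time halves the stampede, which is why the cost of the miss handler belongs in the record alongside the TTL.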

The two standard mitigations have different operational tradeoffs. The mutex or single-flight pattern allows only one request to execute the cache miss handler per key; other concurrent requests wait for the first to complete and then read the newly populated cache value. The load guarantee is strong, since only one computation runs per key, but the waiting requests experience latency equal to the computation time of the miss handler, which is typically the slow operation the cache was added to avoid. For a 500ms database query, all requests that arrive during the stampede window wait up to 500ms for the single-flight computation to complete. The probabilistic early expiration pattern (also called "random early expiration") has each request treat the cached value as expired slightly ahead of its real expiry, with a probability that rises as the expiry approaches and scales with the cost of recomputation. One request usually regenerates the value shortly before it expires while the others keep reading the existing entry. For a value with a 60-second TTL, the refresh might be triggered by a single request at 58 seconds, so that no request finds the key missing at 60 seconds. The related "jitter TTL" technique is different: it randomizes the TTL at write time so that keys populated together do not expire together. Like performance optimization decisions that determine latency behavior under load, the stampede mitigation decision determines user-visible latency at the moment the system is under its highest pressure: after a cache flush, which often happens during or shortly after a deployment.
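Both mitigations fit in a short sketch. `SingleFlight` is a hypothetical in-process implementation of the mutex pattern using threads; a multi-server deployment would need a shared lock instead. `should_refresh_early` is the commonly used form of probabilistic early expiration, in which each reader refreshes early when `now - compute_seconds * beta * ln(rand)` reaches the expiry time; the parameter names and the `beta` default are assumptions.

```python
import math
import random
import threading
import time
from typing import Callable, Dict, Optional


class _Call:
    """State shared between the leader and the requests waiting on it."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: object = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    """Collapse concurrent loads of the same key into one loader call."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, _Call] = {}

    def do(self, key: str, loader: Callable[[], object]) -> object:
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._inflight[key] = call
        if leader:
            try:
                call.result = loader()
            except BaseException as exc:  # waiters must see the failure too
                call.error = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                call.done.set()
        else:
            call.done.wait()  # followers block until the leader finishes
        if call.error is not None:
            raise call.error
        return call.result


def should_refresh_early(now: float, expires_at: float, compute_seconds: float,
                         beta: float = 1.0,
                         rand: Callable[[], float] = random.random) -> bool:
    """Decide whether this reader should regenerate the value before it expires."""
    r = rand()
    if r <= 0.0:
        return True  # avoid log(0); treat it as the earliest possible refresh
    # -log(r) is positive, so the check fires more often as `now` nears expiry.
    return now - compute_seconds * beta * math.log(r) >= expires_at
```

Single-flight bounds the load at the cost of waiting; the early-refresh check avoids the wait for most requests but still lets an occasional duplicate regeneration through, so the two are often combined.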

The cold start problem is the related constraint that applies after a cache node restart or a full cache flush. A write-through cache is populated through writes — after a cold start, the cache is empty and every read is a cache miss until the cache accumulates data through production writes. For a cache-aside implementation, every read is a cache miss until the hot data set has been read at least once. For high-traffic applications, the transition from a cold cache to a warm cache can last minutes during which the database handles a traffic load it was not provisioned to handle independently. Cache warming procedures — pre-populating the cache with hot data after a restart before routing traffic to the application — are the standard mitigation, but the warming procedure is rarely documented at cache adoption time. It is typically developed reactively after the first cold start incident reveals that the application cannot handle its production traffic load without the cache. Like the feature flag bootstrap behavior that determines what the application returns before the SDK has synced its configuration, the cache cold start behavior determines what the application does in the window between startup and full cache warmth — a constraint that is only visible when the cache is not warm.
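A warming step can be sketched as a loop over the known hot keys. The function below is an illustration under stated assumptions: the hot key list and the loader are supplied by the caller, and the result is a plan of values and TTLs that the caller then writes to the real cache. It spreads the TTLs by a random fraction so that keys warmed together do not all expire together.

```python
import random
from typing import Callable, Dict, Iterable, Optional, Tuple


def warm_cache(hot_keys: Iterable[str], loader: Callable[[str], object],
               base_ttl: float, jitter_fraction: float = 0.1,
               rng: Optional[random.Random] = None) -> Dict[str, Tuple[object, float]]:
    """Load each hot key once and assign it a TTL within +/- jitter_fraction of base_ttl."""
    rng = rng or random.Random()
    plan: Dict[str, Tuple[object, float]] = {}
    for key in hot_keys:
        ttl = base_ttl * (1.0 + rng.uniform(-jitter_fraction, jitter_fraction))
        plan[key] = (loader(key), ttl)
    return plan
```

Running this before traffic is routed to a restarted node means the database sees one controlled read per hot key rather than the full production miss rate.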

The consistency guarantee and the bugs it produces

The most consequential undocumented aspect of caching strategy is the consistency guarantee — the formal statement of what level of data currency the application commits to for each cached data class. Without a documented consistency guarantee, every cached data class implicitly commits to whatever the TTL policy produces: bounded staleness at the TTL boundary. Most teams never named this commitment, never evaluated which data classes can tolerate it, and never excluded data classes whose consistency requirements exceed it.

The stale permission bug is the most common high-severity consequence. Account permissions, subscription tier, and feature entitlements are frequently cached for read performance — permission checks happen on every authenticated request, and reading from the database on every check adds a database query per request at the volume of authenticated traffic. A cache with a 30-minute TTL for permission data means that a permission change — subscription upgrade, subscription expiry, role assignment, role revocation — takes up to 30 minutes to propagate to the application layer. The specific failure modes: a user whose subscription is cancelled should lose access to Pro features immediately; with a 30-minute TTL they retain access for up to 30 minutes, during which they can export data, add team members, or take actions that the subscription cancellation was intended to prevent. A user who is granted admin role by an administrator should have admin access immediately; with a 30-minute TTL they use the application as a regular user for up to 30 minutes while their admin session is pending. Neither failure mode was visible when the 30-minute TTL was chosen to reduce database load — the TTL was evaluated for its cache hit rate impact, not for its consistency implications for permission data.

The stale price display bug produces financial and trust consequences. A product catalog's pricing is cached for read performance, since pricing pages receive high traffic and pricing data changes infrequently. A TTL of 24 hours is chosen. A pricing change is made: the Pro plan price changes from $9/month to $12/month at the start of a new pricing tier rollout. The database is updated. For up to 24 hours afterward (12 hours on average, if the remaining TTL of the cache entry at the moment of the change is uniformly distributed), users see the $9/month price on the pricing page. Users who sign up during this window pay $9/month and expect to continue paying $9/month, the price they saw when they signed up. Like API versioning decisions that determine which clients are affected by breaking changes, the TTL policy determines which users see updated pricing and which see stale pricing, and the answer is determined by the time at which their cache entry was populated, not by any intentional policy.

The checkout race condition is a specific version of stale price data where the consistency window creates a financial liability. A user adds items to their cart, sees prices from the cached catalog, and proceeds to checkout. The cart total is computed from cached prices. The actual prices at checkout time are read from the database. If a price increased between the time the cart was populated from the cache and the time the checkout total was computed from the database, the user sees a higher total at checkout than in the cart. If a price decreased, the cart overstated the total, and any flow that charges the cached cart total rather than the authoritative price overcharges the user, who then expects a refund. The checkout race condition is a direct consequence of using cached prices for display and authoritative prices for payment computation, an inconsistency that is built into the architecture by the combination of cache-aside on the display path and direct database reads on the payment path. Whether this inconsistency is acceptable (show stale prices in cart, always use authoritative prices at payment) or not (invalidate product caches on any price change) is an architecture decision that should be documented, not discovered through customer support tickets.
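If the documented policy is "show cached prices in the cart, always charge authoritative prices", the checkout step can at least detect the mismatch instead of surprising the user. The function below is a hypothetical sketch of that check; the SKU names and the dict-based price sources are assumptions.

```python
from decimal import Decimal
from typing import Dict, List, Tuple


def reprice_cart(cart: Dict[str, int], displayed: Dict[str, Decimal],
                 authoritative: Dict[str, Decimal]) -> Tuple[Decimal, List[str]]:
    """Return the total to charge and the SKUs whose displayed price was stale."""
    total = Decimal("0")
    changed: List[str] = []
    for sku, quantity in cart.items():
        price = authoritative[sku]  # payment always uses the database price
        total += price * quantity
        if displayed.get(sku) != price:
            changed.append(sku)     # surface the difference to the user before charging
    return total, changed
```

A non-empty `changed` list is the signal to show an updated total and ask for confirmation before payment is taken.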

The cache coherence problem across multiple cache stores is the hardest version of the consistency challenge. When the same logical data is cached in multiple stores (the CDN edge cache, the application-layer Redis cache, and the in-process memory cache of a long-running application server), a single write to the authoritative source must invalidate all three caches to achieve consistency. Invalidating only the Redis cache leaves the CDN edge cache and the in-process memory cache serving stale data. The propagation order matters: invalidating the CDN before the Redis cache lets the edge re-fetch from an origin that still reads the stale Redis entry and cache that stale response for another full TTL, whereas invalidating Redis first leaves only a short window in which the edge serves the old response until its own purge completes. Multi-level cache invalidation is an architecture problem that is invisible when each cache layer is adopted independently and first becomes visible when a write fails to propagate to one of the layers and a user reports seeing different data depending on whether their request hits an edge node or the origin. Like the service mesh observability constraint where trace context propagation requires explicit application-layer participation at every service boundary, multi-level cache invalidation requires explicit application-layer coordination at every write path, a coordination requirement that must be documented as a policy, not discovered through inconsistency incidents.
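A coordinated purge across layers can be sketched as an ordered list of purge callables. The ordering used here, from the layer nearest the database outward, is an assumption chosen so that an outer layer cannot refill itself from an inner layer that is still stale; the layer names and callables are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Layer = Tuple[str, Callable[[str], None]]


def invalidate_all(layers: Sequence[Layer], key: str) -> List[str]:
    """Purge `key` from each layer in order; return the layers whose purge failed."""
    failed: List[str] = []
    for name, purge in layers:
        try:
            purge(key)
        except Exception:
            failed.append(name)  # report the failure instead of silently skipping it
    return failed
```

Returning the failed layers gives the write path something to retry or alert on, which is the difference between a documented propagation policy and an inconsistency report from a user.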

Writing the caching strategy decision record

The Nygard ADR format adapts for caching decisions with five sections that most cache adoptions leave entirely undocumented.

The caching mechanism and cache provider decision. Name the caching pattern, the cache backend, and the alternatives evaluated with rejection reasons. "We evaluated three approaches in January 2025: Memcached (simpler data model — key-value only, no data structures; horizontal scaling via consistent hashing; no persistence; evaluated for session caching use case only and rejected because we also needed sorted sets for the activity feed ranking), Redis Cluster (distributed Redis with automatic sharding across 6 nodes; supports all Redis data structures; higher operational complexity; evaluated and rejected for the initial deployment — we have one primary application without a write volume that justifies cluster overhead), and Redis Standalone with read replicas (single write primary with 2 read replicas via Redis Sentinel; supports all data structures; automatic failover via Sentinel; our current database cluster already uses a primary-replica pattern we operate confidently). Redis Standalone with Sentinel was selected. Caching pattern: cache-aside for application-layer caching (application checks Redis before reading from Postgres, populates Redis on miss). Write-through is not used — write path is responsible for explicit cache invalidation (see invalidation section). CDN caching: Cloudflare, controlled via Cache-Control response headers, with Cache-Tag headers for surrogate key invalidation on content that requires purge-on-update."

The TTL policy and invalidation mechanism. Name TTL values by data class and the invalidation strategy for each. "Data classes and TTL policies:
(1) User profile (name, email, preferences; not account tier or permissions): TTL 300 seconds (5 minutes); invalidation: explicit cache delete on any profile write, including writes by background jobs and admin operations. The 5-minute TTL is the fallback for missed invalidations; the explicit delete is the primary invalidation path.
(2) Account tier and permissions: NOT CACHED. Permission data (account tier, feature entitlements, team membership, role assignments) is always read from Postgres. The read cost (one database query per authenticated request on the permission-checked path) is acceptable at current traffic volume. The consistency requirement for permission data (immediate propagation of subscription changes, immediate reflection of role assignments) cannot be met by any TTL policy, because the acceptable staleness window for permissions is zero seconds. If permission data is added to the cache in the future, it requires event-driven invalidation wired to every write path that modifies permissions, and this decision record must be updated before the cache key is added.
(3) Product catalog (names, descriptions, feature lists; not prices): TTL 3600 seconds (1 hour); invalidation: cache tag purge via Cloudflare Cache-Tag API on any catalog update. The one-hour TTL is the CDN edge TTL; the application layer reads product catalog from the database.
(4) Product prices: NOT CACHED at application layer. Prices are always read from Postgres. Prices shown on the pricing page are subject to CDN TTL (see CDN policy). If the pricing page is updated, the Cloudflare cache tag for pricing content must be explicitly purged.
(5) User-generated content (comments, posts, decision records): TTL 60 seconds; invalidation: explicit delete on content write. The 60-second TTL addresses the case where invalidation is missed and ensures content is not stale for more than one minute. Invalidation must be called from the write path before returning the success response to the user.
(6) Aggregated counts (total decisions, team member count): TTL 30 seconds; no explicit invalidation (the count changes frequently and the 30-second bounded staleness is acceptable: displaying a count that is 25 seconds out of date does not affect correctness)."
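
One way to keep a policy like this enforceable is to mirror it in a single lookup table that cache code must consult. The sketch below is an assumption about how that could look: the identifiers are hypothetical, and the `layer` values reflect a reading of the policy above in which profile, content, and counts live in Redis while the catalog TTL applies only at the CDN edge.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CachePolicy:
    layer: str                  # "redis", "cdn", or "none" (never cached)
    ttl_seconds: Optional[int]  # None when the data class is not cached
    invalidation: str


# Hypothetical mirror of the six data classes in the decision record.
CACHE_POLICIES: Dict[str, CachePolicy] = {
    "user_profile": CachePolicy("redis", 300, "explicit delete on any profile write"),
    "permissions": CachePolicy("none", None, "not cached; always read from Postgres"),
    "product_catalog": CachePolicy("cdn", 3600, "cache tag purge on catalog update"),
    "product_prices": CachePolicy("none", None, "not cached at application layer"),
    "user_content": CachePolicy("redis", 60, "explicit delete on content write"),
    "aggregated_counts": CachePolicy("redis", 30, "TTL only, no explicit invalidation"),
}


def redis_ttl_for(data_class: str) -> Optional[int]:
    """Return the Redis TTL for a data class, or None if it must not be in Redis.

    Unlisted data classes raise, so a new cache key cannot be added
    without first adding a documented policy entry.
    """
    policy = CACHE_POLICIES.get(data_class)
    if policy is None:
        raise LookupError(f"no documented cache policy for {data_class!r}")
    return policy.ttl_seconds if policy.layer == "redis" else None
```

A caller that receives `None` reads from the database directly; a caller that hits `LookupError` has found a data class that needs a policy review before it is cached.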

The consistency guarantee. State explicitly what level of consistency each cached data class provides. "Consistency levels by data class: User profile — near-real-time via explicit invalidation, with 5-minute bounded staleness fallback. Account tier and permissions — strong consistency (always read from database, no caching). Product catalog — eventually consistent with 1-hour TTL, near-real-time for explicit Cloudflare purges. Product prices — strong consistency at application layer (always database); eventually consistent at CDN edge for pricing page HTML (Cache-Control max-age=3600, Cloudflare Cache-Tag purge on price change). User-generated content — near-real-time via explicit invalidation, with 60-second bounded staleness fallback. Data classes not listed: if a data class is not listed, it must not be added to the cache without a documented TTL and invalidation policy and a review of the consistency requirement by the data owner. The consistency guarantee for any cached data class is determined by the TTL and invalidation policy in this section, not by the correctness assumptions in the consuming code. If a service assumes strong consistency for data it reads from the cache, and the cached data class has a non-zero TTL with TTL-only invalidation, the service's assumption is incorrect and will produce incorrect behavior during the staleness window."

The cache key design and namespace policy. Name key conventions, namespace isolation, and composite key encoding. "Key naming convention: {namespace}:{entity_type}:{entity_id}[:{sub_resource}]. Examples: wc:user:8f2a9b3c:profile (user profile for user ID 8f2a9b3c in the 'wc' namespace), wc:decision:{id}:full (full decision record), wc:team:{id}:members (team member list). Namespace: all application keys use the 'wc' prefix. Test and staging environments use 'wc-test' and 'wc-stg'. Do not use IDs as raw key components on their own; always prefix them with the entity type to prevent key collisions between different entity types that happen to share ID values. Maximum key length: 512 bytes (a policy limit chosen by the team; Redis itself accepts keys up to 512 MB, but long keys cost memory and comparison time). Composite keys for queries that return collections (e.g., 'all decisions for user X in team Y') must include all query parameters that affect the result, including sort order and pagination parameters if the result set is paginated. Collections must be invalidated when any member entity changes: the collection cache key must be deleted when a member is added, removed, or modified. This is the primary source of staleness bugs for collection caches: a new member is added, the individual member's cache key is invalidated, but the collection key that includes that member is not invalidated and serves a stale list."
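
The key convention can be sketched as two small builder functions. This is illustrative only: the function names, the `list` segment used for collection keys, and the URL-style parameter encoding are assumptions, not part of the convention quoted above.

```python
from typing import Mapping, Optional
from urllib.parse import quote

NAMESPACE = "wc"      # 'wc-test' and 'wc-stg' in test and staging
MAX_KEY_BYTES = 512   # policy limit from the decision record


def _checked(key: str) -> str:
    if len(key.encode("utf-8")) > MAX_KEY_BYTES:
        raise ValueError(f"cache key exceeds {MAX_KEY_BYTES} bytes")
    return key


def entity_key(entity_type: str, entity_id: str,
               sub_resource: Optional[str] = None,
               namespace: str = NAMESPACE) -> str:
    """Build {namespace}:{entity_type}:{entity_id}[:{sub_resource}]."""
    # Percent-encode the ID so a ':' inside it cannot forge extra segments.
    parts = [namespace, entity_type, quote(str(entity_id), safe="")]
    if sub_resource is not None:
        parts.append(sub_resource)
    return _checked(":".join(parts))


def collection_key(entity_type: str, params: Mapping[str, object],
                   namespace: str = NAMESPACE) -> str:
    """Build a key for a query result, covering every result-affecting parameter.

    Parameters are sorted by name so the same query always maps to the
    same key regardless of argument order.
    """
    encoded = "&".join(
        f"{quote(str(name), safe='')}={quote(str(value), safe='')}"
        for name, value in sorted(params.items())
    )
    return _checked(f"{namespace}:{entity_type}:list:{encoded}")
```

For example, `collection_key("decision", {"user": "u1", "team": "t9", "sort": "created_desc", "page": 2})` and the same call with the parameters in a different order produce an identical key, so one delete covers both.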

The failure behavior and cache stampede policy. Name what happens when the cache is unavailable and how stampedes are mitigated. "Cache unavailability: the application treats Redis as a performance optimization, not a required dependency. If Redis is unavailable (connection timeout, connection refused, command timeout), the application falls through to the database for all reads. Cache writes are best-effort: if the Redis write fails, the application logs the failure at WARN level and returns the database result without caching it. The application must not return an error to the user because Redis is unavailable.
Circuit breaker: after 10 consecutive Redis failures in a 10-second window, the Redis client opens a circuit breaker and all cache reads return 'cache unavailable' immediately without attempting a Redis connection. The circuit resets after 30 seconds. This prevents Redis connection pool exhaustion from cascading into application thread exhaustion during a Redis outage.
Cache stampede: the application uses the single-flight pattern (a singleflight library or equivalent) for cache miss handlers that execute database queries costing more than 50ms at p99. Single-flight deduplicates concurrent cache miss executions for the same key: only one database query runs, and other concurrent requests wait and receive the shared result. Single-flight is not applied to cache misses for cheap database queries (p99 under 50ms), because the overhead of the deduplication mechanism exceeds the benefit for fast queries.
Cache warm-up: after a Redis node restart or a cache flush, the application does not pre-warm the cache. The application relies on the database to handle the increased read load during the warm-up period (typically 5–15 minutes for most traffic patterns). If load testing shows that the database cannot sustain production read load while the cache is cold, a warm-up script that pre-populates hot user profile keys and other hot application-layer keys must be added to the deployment runbook and this section updated with the warm-up procedure."
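
The fail-open read path and the circuit breaker thresholds above can be sketched as follows. This is a simplified, single-threaded illustration under stated assumptions: the class and function names are hypothetical, the cache is modelled as a plain callable, and a production breaker would also need locking and logging.

```python
import time
from typing import Any, Callable, Optional


class CacheCircuitBreaker:
    """Opens after N consecutive failures inside a time window, then resets."""

    def __init__(self, failure_threshold: int = 10, window_seconds: float = 10.0,
                 reset_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._window = window_seconds
        self._reset = reset_seconds
        self._clock = clock
        self._failures = 0
        self._first_failure_at: Optional[float] = None
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self._reset:
            # Reset period elapsed: close the circuit and try Redis again.
            self._opened_at = None
            self.record_success()
            return True
        return False

    def record_success(self) -> None:
        self._failures = 0
        self._first_failure_at = None

    def record_failure(self) -> None:
        now = self._clock()
        if self._first_failure_at is None or now - self._first_failure_at > self._window:
            # Start a new counting window.
            self._first_failure_at = now
            self._failures = 0
        self._failures += 1
        if self._failures >= self._threshold:
            self._opened_at = now


def read_with_fallback(breaker: CacheCircuitBreaker,
                       cache_get: Callable[[str], Optional[Any]],
                       load_from_db: Callable[[str], Any], key: str) -> Any:
    """Fail open: any cache problem falls through to the database."""
    if breaker.allow_request():
        try:
            cached = cache_get(key)
            breaker.record_success()
            if cached is not None:
                return cached
        except Exception:
            # Connection refused, timeout, and similar: never surface to the user.
            breaker.record_failure()
    return load_from_db(key)
```

While the circuit is open, `cache_get` is not called at all, which is what keeps a Redis outage from tying up connections and threads.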

Finding caching decisions in AI chat

The WhyChose extractor surfaces caching decisions from four session types that contain the reasoning most teams cannot reconstruct when a new engineer asks why users sometimes see stale data after an update, or when a cache stampede incident prompts someone to ask whether the cache miss handler is protected against thundering herd.

The initial adoption session. "Redis vs. Memcached for a Node.js API — which should we use?", "how to add caching to a Django REST framework API", "best practices for caching database query results in Python", "cache-aside pattern vs. read-through — when to use which?", "how to set Redis TTL for user session data vs. product data", "should we cache database query results or full API responses?" These sessions contain the cache provider choice, the caching pattern, and the initial TTL reasoning. The adoption session is the most important to recover because the mechanism chosen at adoption carries all the downstream consistency commitments — and the rejection reasons for Memcached, for write-through, for database-level query caching are why the chosen mechanism cannot simply be replaced when a consistency bug reveals its limitations. A team that rejected write-through because of write latency concerns has a documented reason to revisit that tradeoff if write latency tolerance changes; without the record, the decision to use cache-aside is just an observed fact, not a reasoned position that can be updated.

The staleness incident session. "User updated their profile but still sees old data after page refresh", "how to invalidate Redis cache after a database write in Python", "Redis cache not reflecting updated database values", "cache invalidation pattern for cache-aside — when to delete vs. update the cached value", "user sees expired subscription features — permissions are cached with wrong data", "how to make cache invalidation happen immediately when a record changes." These sessions reveal when the team first encountered the staleness problem and what invalidation strategy they applied. The session that asks "user sees expired subscription features" is the permission caching bug being diagnosed in real time — it contains the diagnosis (permissions cached without invalidation on subscription change), the fix applied (add explicit cache delete to subscription expiry handler), and the scope of the problem (all other permission-modifying write paths that also lack invalidation). For platform teams, recovering staleness incident sessions from individual service teams identifies which write paths lack cache invalidation — the map of consistency gaps that a platform-level invalidation standard would address.

The stampede session. "Redis cache stampede — how to prevent thundering herd in production", "multiple requests hitting the database simultaneously when cache key expires", "how to use mutex for Redis cache regeneration in Node.js", "probabilistic early cache expiration to prevent thundering herd", "singleflight pattern for cache miss deduplication in Go", "cache warming after Redis restart — application slow after Redis node failure", "how to prevent all cache keys from expiring at the same time." These sessions emerge after a cache flush or coordinated TTL expiry incident. Like performance debugging sessions that reveal system behavior under production load, the stampede incident session contains the actual database saturation metrics, the specific hot keys that produced the stampede, and the fix applied. Recovering this session produces the stampede mitigation section of the decision record without requiring the team to reconstruct the mitigation from first principles after the second stampede incident.
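
The single-flight mitigation that these sessions converge on can be sketched with the standard library's threading primitives. This is a minimal in-process illustration with hypothetical names; it deduplicates work within one process only, not across application servers.

```python
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    """State shared between the leader and the requests waiting on it."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Optional[BaseException] = None
        self.waiters = 0  # number of followers currently sharing this call


class SingleFlight:
    """Runs fn once per key at a time; concurrent callers share the result."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight: Dict[str, _Call] = {}

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._in_flight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._in_flight[key] = call
            else:
                call.waiters += 1

        if not leader:
            # Follower: wait for the leader's result instead of querying again.
            call.done.wait()
            if call.error is not None:
                raise call.error
            return call.result

        try:
            call.result = fn()
            return call.result
        except BaseException as exc:
            call.error = exc  # followers re-raise the same failure
            raise
        finally:
            with self._lock:
                del self._in_flight[key]
            call.done.set()
```

A cache miss handler would wrap its database query as `single_flight.do(cache_key, run_query)`, so that a burst of misses on one hot key produces a single query.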

The cache failure session. "What happens to the application when Redis goes down", "how to make my application continue working without the cache", "Redis connection pool exhausted — application throwing errors on cache reads", "how to implement graceful degradation when cache is unavailable", "circuit breaker for Redis in a Python application", "application performance without Redis — do we need to scale the database?", "how to handle Redis timeout without failing the user request." These sessions contain the failure behavior decision — whether the application fails open to the database, implements a circuit breaker, or returns an error when the cache is unavailable. A technical leader who inherits a system where Redis is an undocumented hard dependency — where Redis unavailability produces application failures because the code treats Redis as required infrastructure rather than optional acceleration — cannot assess the failure blast radius without reading the cache client code in every service. The failure session from the first Redis outage contains the decision about whether Redis should be a hard or soft dependency, and the fix applied (fail open vs. circuit breaker vs. error return) is the failure behavior policy that new services should follow.

What the decision record prevents

A documented caching strategy prevents three recurring problems that teams encounter as their cache usage grows and their engineering team turns over.

It prevents the undocumented consistency commitment. A team that caches account permissions with a 30-minute TTL has implicitly committed to 30-minute eventual consistency for permission changes — but without documenting this commitment, no engineer can confirm whether 30-minute staleness is acceptable for permission data, and no product decision that modifies how permissions are enforced can account for the staleness window. The decision record that names "account tier and permissions: NOT CACHED — strong consistency required" converts the implicit policy into an explicit constraint that new services can rely on. A new service that reads permissions from the cache, believing them to be current, builds on a false premise; a new service that reads permissions from the database, knowing that permission data is explicitly excluded from caching, builds on a documented guarantee. Like security decisions that determine which threat models are in scope, the consistency guarantee determines which data integrity assumptions are valid for downstream services — and it must be documented to be reliable.

It prevents the write-path invalidation gap. The most common source of staleness bugs is a new write path added after the cache was deployed that modifies cached data without adding the corresponding cache invalidation. An admin interface added six months after launch that can modify user profiles without calling the profile cache invalidation. A batch import job that updates product descriptions without purging the catalog cache. A background job that processes subscription cancellations without invalidating the permission cache. Each new write path that modifies cached data must be paired with the cache invalidation for the modified data class — and without a documented policy that makes this pairing explicit, each new write path represents a potential staleness bug. The decision record that lists the invalidation requirement for each cached data class converts the pairing requirement from an implicit engineering convention into an explicit policy that code reviewers can verify. Like ADR lifecycle policies that define when a decision requires revisitation, the invalidation policy is only reliable if it is documented as a requirement that applies to all future write paths, not just the ones that exist at the time of adoption.
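
One way to make the write-and-invalidate pairing visible to code reviewers is to declare it on the write function itself. The decorator below is a hypothetical sketch, with plain dicts standing in for Postgres and Redis; it is an illustration of the pairing, not a prescribed mechanism.

```python
import functools
from typing import Any, Callable, Dict


def invalidates(cache_delete: Callable[[str], Any],
                key_for: Callable[..., str]) -> Callable:
    """Pair a write function with the cache key it must invalidate.

    The key is deleted only after the write returns without raising.
    """
    def decorator(write_fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(write_fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            result = write_fn(*args, **kwargs)
            cache_delete(key_for(*args, **kwargs))
            return result
        return wrapper
    return decorator


# Stand-ins for the database and the cache.
profile_db: Dict[str, Dict[str, str]] = {"8f2a9b3c": {"name": "Ada"}}
profile_cache: Dict[str, str] = {}


def profile_key(user_id: str, *_args: Any, **_kwargs: Any) -> str:
    return f"wc:user:{user_id}:profile"


def delete_from_cache(key: str) -> None:
    profile_cache.pop(key, None)


@invalidates(delete_from_cache, profile_key)
def update_profile_name(user_id: str, name: str) -> None:
    profile_db[user_id]["name"] = name


# A write path added later (for example an admin tool) declares the same pairing.
@invalidates(delete_from_cache, profile_key)
def admin_rename_user(user_id: str, name: str) -> None:
    profile_db[user_id]["name"] = name
```

A write function that touches profile data without the decorator then stands out in review as a candidate invalidation gap.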

It prevents the cascade failure after a cache flush. A team that deploys without knowing their application cannot sustain production database load without the cache discovers this during the post-deploy cache warm-up period, when the database experiences a sudden increase in read load that saturates its connection pool and produces cascading timeouts. The decision record that documents the cache cold start behavior — whether the application can sustain production load from a cold cache, and if not, what the warm-up procedure is — converts the cache flush from an unpredictable production event into a documented operational procedure with a known recovery time. Like the logging infrastructure decisions that determine whether the on-call engineer can answer incident questions at 3am, the cache failure behavior documented in the decision record determines whether the on-call engineer has a procedure to follow when Redis restarts or whether they are diagnosing the application's cache dependency from first principles during a production incident.