The event sourcing decision record: why the append-only event log you chose determines your temporal query capability and your event schema migration cost
Event sourcing is decided in the founding sprint when an audit trail requirement appears — and never documented as a deliberate architecture choice with temporal query requirements evaluated, projection rebuild strategy planned, or event schema migration policy defined. The append-only log determines whether "what was the account balance at noon on March 15?" is a native query or a reconstruction from backup; it also determines whether migrating an event type requires two hours of careful upcasting or a production incident caused by handler mismatch on events written eighteen months ago. Both properties are set when the founding engineer writes the first INSERT INTO events row — and neither appears in the audit trail session that triggered the pattern.
A 14-person compliance SaaS team chose event sourcing for their financial workflow product. The founding engineer had read Martin Fowler's event sourcing articles and Greg Young's CQRS documentation before starting. The immutable append-only event log was the correct model for a product whose primary value proposition was a complete, tamper-evident audit trail for regulatory review. Auditors needed to reconstruct the exact sequence of decisions that led to any workflow state. CRUD mutable tables could not answer that question natively without a separate audit table, and a separate audit table was a second system of record that could diverge from the primary. Event sourcing made the event log the single system of record; the mutable state was derived data, always reconstructible from the events.
The first two years validated the choice. The system handled financial workflows for mid-market companies, each with distinct approval chains, escalation rules, and status transitions. The event model was expressive: WorkflowSubmitted, ReviewerAssigned, ReviewerApproved, EscalationTriggered, WorkflowRejected, WorkflowCompleted. Regulatory auditors could request the full event history for any workflow and receive a timestamped narrative of every state transition, every reviewer action, and every system-triggered escalation. One customer's legal team explicitly praised the audit export format in a renewal call. The founding engineer's instinct had been correct for the core use case.
The problem emerged in the third year, as the product expanded and the event types accumulated. The team had now defined 47 distinct event types. Twelve of those event types had been through at least one schema revision — some through three. The ReviewerAssigned event had been revised twice: first to add a reviewerRole field (added in month 8), then to replace the reviewerId string field with a structured reviewer object containing id, name, and department (added in month 20, when the system integrated with an HR directory). Events written before month 8 had no reviewerRole field. Events written between month 8 and month 20 had a reviewerId string. Events written after month 20 had a reviewer object.
The upcaster, the code that converted old events to the current schema before the handler processed them, was a 740-line file called event-upcasters.ts. Each event type with version history had a chain of upcasting functions; for ReviewerAssigned, the chain was upcastReviewerAssignedV1toV2 and upcastReviewerAssignedV2toV3. Events loaded from the store passed through the applicable chain before reaching the handler. The upcasters were correct. The system worked. But every engineer who touched the event handler for ReviewerAssigned had to understand three event shapes simultaneously: the shape the handler expected, the shape stored in the event log for events written before month 8, and the shape stored for events written between month 8 and month 20. The upcasters were the translation layer. When a new engineer joined and asked why upcastReviewerAssignedV1toV2 existed, the answer was partially in a Slack thread from month 8, partially in the PR that introduced the v2 event, and partially in the institutional memory of the engineer who had written both the original event type and the upcaster. The upcaster file was the highest-friction file in the codebase: touched most often for bug fixes, avoided most often during refactors.
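The chain described above can be sketched as follows. This is a minimal illustration, not the team's actual event-upcasters.ts: the numeric version field, the workflowId field, and the placeholder defaults ("unknown", empty name and department) are assumptions, since the text does not say how versions are stored or what defaults were chosen.

```typescript
// Hypothetical shapes for the three stored generations of ReviewerAssigned.
interface ReviewerAssignedV1 {
  version: 1;
  workflowId: string;
  reviewerId: string;
}

interface ReviewerAssignedV2 {
  version: 2;
  workflowId: string;
  reviewerId: string;
  reviewerRole: string; // added in month 8
}

interface ReviewerAssignedV3 {
  version: 3;
  workflowId: string;
  reviewerRole: string;
  reviewer: { id: string; name: string; department: string }; // added in month 20
}

type StoredReviewerAssigned = ReviewerAssignedV1 | ReviewerAssignedV2 | ReviewerAssignedV3;

// V1 -> V2: old events have no reviewerRole, so a default is supplied.
// The "unknown" default is an assumption; the real value is a domain decision.
function upcastReviewerAssignedV1toV2(e: ReviewerAssignedV1): ReviewerAssignedV2 {
  return { version: 2, workflowId: e.workflowId, reviewerId: e.reviewerId, reviewerRole: "unknown" };
}

// V2 -> V3: the reviewerId string becomes a structured reviewer object.
// Name and department are not recoverable from the old event, so they are left empty here.
function upcastReviewerAssignedV2toV3(e: ReviewerAssignedV2): ReviewerAssignedV3 {
  return {
    version: 3,
    workflowId: e.workflowId,
    reviewerRole: e.reviewerRole,
    reviewer: { id: e.reviewerId, name: "", department: "" },
  };
}

// Every stored event passes through the applicable part of the chain on load,
// so the handler only ever sees the current (V3) shape.
function upcastReviewerAssigned(e: StoredReviewerAssigned): ReviewerAssignedV3 {
  if (e.version === 1) return upcastReviewerAssignedV2toV3(upcastReviewerAssignedV1toV2(e));
  if (e.version === 2) return upcastReviewerAssignedV2toV3(e);
  return e;
}
```

Even in this small form, the V2 to V3 step shows where institutional memory leaks in: whoever picks the placeholder for name and department is making a decision that every later reader of the handler has to know about.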
A new engineer joined in month 32. Their first task was adding a new WorkflowDelegated event type. Straightforward — they defined the event, registered the handler, wrote the upcasting stub (no upcasting needed for a new event type), and shipped. Their second task was adding a delegationChain field to an existing projection so the UI could show the delegation tree. This required the projection to read from ReviewerAssigned events and WorkflowDelegated events together. They modified the projection handler. Then they rebuilt the projection — necessary to backfill the delegationChain for historical workflows. The event log contained 4.1 million events at this point. The projection rebuild took eleven hours. During those eleven hours the UI delegation view showed a loading spinner. The team's customer success manager received four support tickets about the delegation view being broken. The eleven-hour rebuild had not been estimated, planned for, or communicated to customers ahead of time. It was the first time the team had rebuilt a projection on a log of this size, and the rebuild time was not a number anyone had thought to calculate before initiating it.
A 9-person marketplace startup adopted event sourcing fourteen months in, after a blog post convinced the team that "append-only events are more correct than mutable state." Their product was a peer-to-peer rental marketplace: users listed items, other users made bookings, bookings went through a request/confirm/complete lifecycle. The founding engineer had built the system on PostgreSQL with a standard CRUD model. The data model was clean: a listings table, a bookings table with a status column, a reviews table. The status column tracked the booking lifecycle with an enum.
The trigger for adopting event sourcing was a product manager request: "can we replay the exact state of any booking at any point in time for dispute resolution?" The CRUD model could not answer this without a complete change history. The team adopted event sourcing for the bookings aggregate: BookingRequested, BookingConfirmed, BookingCancelled, BookingCompleted, DisputeOpened, DisputeResolved. The bookings table became a projection populated by the event stream. The engineer who implemented the migration was pleased with how clean it felt: the event log was the source of truth, the projection was derived data, and the dispute resolution team could now replay any booking's full history.
Six months later the data team requested a new analytics view: "show me all bookings where the renter churned within 30 days of completing their first booking." This required correlating BookingCompleted events across the same renter's entire booking history — the first completion event per renter, then checking whether any further activity occurred within 30 days. The engineer started building the projection. The projection handler needed to process BookingCompleted events in chronological order, maintaining a running record of each renter's first completion date and subsequent activity. It could not be populated incrementally from the current time because the first completion might have occurred years earlier; the projection needed to read from event 0 to correctly identify first-time completers. The event log contained 2.3 million booking events accumulated over fourteen months. The projection rebuild took 14 hours. The analytics view was unavailable for 14 hours after the rebuild was initiated. The team did not have a snapshot strategy — there were no checkpoints in the booking event stream, so every rebuild started from event 0. The rebuild SLA had never been estimated because this was only the second time a new projection had been added, and the event log had been much smaller when the first projection was built.
The data team's follow-up request arrived two weeks later: "we also need churn segmented by listing category." A third projection rebuild, reading the same 2.3 million events, correlating bookings with listing categories via a join to the CRUD listings table. The engineer added the listing category join to the event handler, noted that the rebuild would take another 14 hours, scheduled it for a Friday night, and mentioned in the Slack channel that analytics would be unavailable from midnight to mid-morning Saturday. The pattern of 14-hour weekend rebuilds for new analytics projections became a known operational rhythm — not because it was planned, but because the projection rebuild SLA had never been established as a design constraint.
Structural properties set by the event sourcing decision
Four structural properties are determined when an event-sourced write model is chosen. None of them appear explicitly in the founding session that adopted the pattern — they are the operational consequences of a design choice made in response to an audit trail requirement, a temporal query request, or a blog post about data immutability.
Property 1: Temporal query capability and its cost model. Event sourcing makes "what was the state at time T?" a native query: replay all events with a timestamp at or before T, and the resulting aggregate state is the answer. For any event-sourced aggregate, this query is always available, always correct (events are immutable), and always derivable from first principles. No secondary audit table is needed. No bitemporal column tracking is required. The event log is the temporal record. This is the genuine, non-negotiable strength of event sourcing, and it is the property that justifies the pattern for regulatory audit trails, financial ledgers, and dispute resolution systems.
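As a rough illustration of the replay described above, the sketch below answers "what was the balance at time T?" by folding over events up to a cutoff. The AccountEvent shape and the balanceAt name are hypothetical, and the code assumes the stream is already ordered by occurredAt.

```typescript
// Hypothetical event shape for a single account stream.
interface AccountEvent {
  occurredAt: string; // ISO 8601 timestamp
  type: "Deposited" | "Withdrawn";
  amount: number;
}

// Replays every event with a timestamp at or before asOf and returns the resulting balance.
// Assumes events are sorted by occurredAt, so replay can stop at the first later event.
function balanceAt(events: readonly AccountEvent[], asOf: string): number {
  const cutoff = Date.parse(asOf);
  let balance = 0;
  for (const e of events) {
    if (Date.parse(e.occurredAt) > cutoff) break;
    balance += e.type === "Deposited" ? e.amount : -e.amount;
  }
  return balance;
}

const accountStream: AccountEvent[] = [
  { occurredAt: "2024-03-15T09:00:00Z", type: "Deposited", amount: 100 },
  { occurredAt: "2024-03-15T11:30:00Z", type: "Withdrawn", amount: 30 },
  { occurredAt: "2024-03-15T14:00:00Z", type: "Deposited", amount: 50 },
];

// Balance at noon on March 15: the 14:00 deposit is not yet visible.
const balanceAtNoon = balanceAt(accountStream, "2024-03-15T12:00:00Z"); // 70
```

The cost of this query is the number of events replayed, which is why the same capability behaves very differently on a single small stream and on a multi-million-event log.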
The cost model of the temporal query is proportional to event count and replay speed. Replaying 10,000 events for a single aggregate takes milliseconds. Replaying 2.3 million events for a new cross-entity projection took 14 hours in the marketplace example above. The temporal query capability is not free: it is bounded by the event volume and the projection rebuild infrastructure. Total event volume is the number of aggregate instances multiplied by the events each instance generates, so a marketplace booking that generates 6 events per lifecycle can still accumulate more events than a workflow that generates 12 events per lifecycle if there are far more bookings than workflows. The replay speed for any given projection depends on the event handler's complexity, the event store's read throughput, and whether snapshot checkpoints exist that allow replays to begin partway through the stream rather than at event 0.
The temporal query cost model must be estimated at design time and re-evaluated as event volume grows. A projection that rebuilds in 90 minutes at 500,000 events will take 12 hours at 4 million events if the handler throughput is constant (eight times the events, eight times the duration). The rebuild SLA (the acceptable duration of read model inconsistency during a rebuild) must be documented per projection and must account for the event volume growth rate. The observability strategy decision record documents monitoring for projection lag and event consumer health; an event-sourced system should track projection rebuild progress as a first-class operational metric, not a one-time manual process.
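The estimate is simple enough to keep next to the projection definition. The sketch below assumes constant handler throughput, which is the simplest possible model; the function names are illustrative and real rebuilds may scale worse than linearly.

```typescript
// Linear extrapolation: with constant throughput, rebuild time scales with event count.
function estimateRebuildMinutes(measuredEvents: number, measuredMinutes: number, targetEvents: number): number {
  return measuredMinutes * (targetEvents / measuredEvents);
}

// Whole months of growth remaining before a full rebuild no longer fits inside the rebuild SLA.
function monthsUntilSlaBreach(
  currentEvents: number,
  eventsPerMonth: number,
  eventsPerMinute: number,
  slaMinutes: number,
): number {
  const maxEvents = eventsPerMinute * slaMinutes; // largest log that still rebuilds within the SLA
  if (currentEvents >= maxEvents) return 0;
  return Math.floor((maxEvents - currentEvents) / eventsPerMonth);
}

// 90 minutes at 500,000 events extrapolates to 720 minutes (12 hours) at 4 million events.
const minutesAtFourMillion = estimateRebuildMinutes(500000, 90, 4000000);
```

Running monthsUntilSlaBreach against the measured growth rate turns the rebuild SLA from a one-time estimate into a date on which the SLA has to be revisited.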
Property 2: CQRS projection model and consistency lag. Event sourcing does not require CQRS, but the two patterns appear together in practice because the append-only event log is an awkward query surface for most read patterns. You do not query an event log to find "all active listings in San Francisco sorted by price" — you project the event stream into a denormalized read model optimized for that query. The read model is a projection: a derived database view (or table, or document) maintained by an event handler that processes each event and updates the read model accordingly.
Projections can be updated synchronously or asynchronously. Synchronous projection update means the projection handler runs within the same transaction as the event write: the event is stored and the read model is updated atomically. Zero consistency lag, but write operations carry the overhead of updating every projection that cares about the written event type. Asynchronous projection update means the event is stored first, then a separate subscriber processes the event and updates the read model. The write operation is fast; the read model trails the write model by the subscriber's processing lag — typically milliseconds in a healthy system, potentially minutes or hours if the subscriber falls behind or the projection rebuild is running. The consistency lag during normal operation and the consistency lag during rebuild must be documented separately. A consumer of the projection must know whether the projection guarantees read-after-write consistency or whether it is eventually consistent with a documented lag bound. The real-time architecture decision record documents the event subscription model for real-time views; the event sourcing ADR must document the consistency model for each projection — synchronous or asynchronous, and the lag SLA for asynchronous projections under both normal operation and rebuild.
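For the asynchronous case, the lag can be made concrete as the distance between the head of the log and the projection's checkpoint. The following is an in-memory sketch under stated assumptions: StoredEvent, EventCountProjection, and the global position field are hypothetical, and a real subscriber would persist its checkpoint together with the read model.

```typescript
// Hypothetical stored event with a global, monotonically increasing position.
interface StoredEvent {
  position: number;
  type: string;
}

// An asynchronous projection that counts events per type and tracks its own checkpoint.
class EventCountProjection {
  private checkpoint = 0; // position of the last event applied
  private counts: Record<string, number> = {};

  // Applies up to batchSize events after the checkpoint; returns how many were applied.
  // Assumes the log is ordered by position.
  catchUp(log: readonly StoredEvent[], batchSize: number): number {
    const pending = log.filter((e) => e.position > this.checkpoint).slice(0, batchSize);
    for (const e of pending) {
      this.counts[e.type] = (this.counts[e.type] ?? 0) + 1;
      this.checkpoint = e.position;
    }
    return pending.length;
  }

  // Consistency lag measured in events: how far the read model trails the write model.
  lag(log: readonly StoredEvent[]): number {
    const head = log.reduce((max, e) => Math.max(max, e.position), 0);
    return head - this.checkpoint;
  }

  count(type: string): number {
    return this.counts[type] ?? 0;
  }
}
```

During normal operation the lag stays near zero; during a rebuild the checkpoint restarts from 0 and the lag equals the size of the log, which is why the two lag figures need separate documentation.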
Property 3: Event schema migration and upcaster accumulation. Events are immutable — once written, they are never modified. Event handlers evolve. This tension is the highest-friction maintenance cost of a long-lived event-sourced system. Three patterns exist for managing event schema evolution:
Weak schema: event payloads are JSON, handlers ignore unknown fields and supply defaults for missing ones. Adding a new optional field to an event is non-breaking: old events are missing the field, the handler supplies a default, the system continues. Removing a field is a coordination problem: handlers that read the field will encounter absent values for all historical events. Renaming a field is effectively a remove-and-add. Weak schema is the simplest migration path and works well when the business domain is stable and events do not carry required fields whose presence can be guaranteed only for events written after a specific date.
Strong versioning: event types carry an explicit version number (ReviewerAssignedV1, ReviewerAssignedV2). When the schema changes, a new version is registered. An upcaster function converts the old version to the new version before the handler receives it. The handler always processes the current version; old events pass through the upcaster chain on load. Strong versioning is explicit and auditable: the upcaster documents exactly what transformation is applied to each old event shape. The cost is the accumulation of upcasters. An event type with three schema versions has a two-function chain (V1→V2, V2→V3). With five versions it has four functions. With ten versions it has nine. Engineers joining the codebase must understand the upcaster chain to modify the handler. The upcaster file becomes the most complex file in the codebase and the first place a bug in historical event processing is diagnosed.
Migration events: rather than upcasting old events, write a new migration event that explicitly brings old aggregates up to the current schema. A ReviewerAssignedMigrated event is appended to each existing stream, carrying the fields that the current handler needs. Future replays process the migration event rather than upcasting. Migration events add noise to the event log but eliminate the upcaster chain for historical events. They are appropriate when the upcaster chain has accumulated past a threshold complexity or when the migration logic is too complex to express reliably as a stateless transformation of old event fields.
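Of the three patterns, weak schema is the easiest to show in a few lines. The sketch below assumes JSON payloads with hypothetical workflowId, reviewerId, and reviewerRole fields; the "unknown" default is a placeholder, not a recommendation.

```typescript
// The shape the handler works with, regardless of when the event was written.
interface ReviewerAssignedView {
  workflowId: string;
  reviewerId: string;
  reviewerRole: string;
}

// Weak-schema read: ignore unknown fields, supply a default for a missing optional field,
// and fail loudly only when a field that has always been required is absent.
function readReviewerAssigned(payloadJson: string): ReviewerAssignedView {
  const raw: unknown = JSON.parse(payloadJson);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("payload must be a JSON object");
  }
  const payload = raw as Record<string, unknown>;
  const workflowId = payload["workflowId"];
  const reviewerId = payload["reviewerId"];
  const reviewerRole = payload["reviewerRole"];
  if (typeof workflowId !== "string" || typeof reviewerId !== "string") {
    throw new Error("missing required field");
  }
  return {
    workflowId,
    reviewerId,
    // Events written before the field existed fall back to the default.
    reviewerRole: typeof reviewerRole === "string" ? reviewerRole : "unknown",
  };
}
```

The limits of the pattern are visible in the error branch: a field can only be treated as required if every historical event carries it.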
The API schema design decision record documents breaking vs non-breaking change definitions for external API surfaces; an event sourcing ADR must document the equivalent for internal event schemas — what constitutes a breaking change to an event type (removing or renaming a required field, changing a field's type), which migration pattern is used (weak schema, strong versioning, or migration events), and the budget for upcaster chain depth before a migration event is preferred. Without this policy, upcasters accumulate session by session as individual engineering decisions and the 740-line event-upcasters.ts is the output.
Property 4: Event store selection and operational constraints. The event store is the append-only log. Three broad categories exist, each with distinct operational constraints that compound over the system's lifetime:
PostgreSQL as event store: familiar technology, ACID transactions that can write the event and update the outbox table atomically, operational tooling the team already uses. The events table is a standard relational table with stream_id, event_type, sequence_number, payload (JSONB), and occurred_at. Optimistic concurrency on writes is enforced by the unique constraint on (stream_id, sequence_number). The events table grows without bound — event sourcing systems do not delete events. At 10 million rows with JSONB payloads averaging 2 KB, the table is 20 GB. At 100 million rows it is 200 GB. Partition by occurred_at to keep active partitions manageable and move old partitions to cheaper storage. Read throughput for projection rebuilds is bounded by PostgreSQL's sequential scan speed, which is predictable but not optimized for high-fan-out event streaming. The database vendor decision record documents the storage technology choice; the event sourcing ADR must document the archiving policy as the event table grows (partition rotation, partition archive to cold storage, retention policy for compliance-driven minimum retention) and the read throughput estimate for projection rebuilds at current and projected event volume.
Dedicated event store (EventStoreDB): purpose-built for event sourcing, stream-per-aggregate model, built-in catch-up subscriptions, persistent subscriptions with consumer groups, optimistic concurrency on stream version, projections as first-class server-side feature. Higher operational complexity than adding a table to an existing PostgreSQL database: separate deployment, separate backup and recovery procedure, separate operational expertise. Catch-up subscriptions let async projections resume from a checkpoint position stored by the subscriber rather than replaying from event 0 after restart, and persistent subscriptions keep the checkpoint on the server; both reduce the hand-rolled position tracking that a PostgreSQL events table requires. The projection rebuild problem is reduced (not eliminated) by EventStoreDB's native stream subscription model and server-side projections. Justified when the team has engineering capacity to operate an additional stateful service and the event throughput or subscription fan-out exceeds what PostgreSQL can serve efficiently.
Outbox pattern without full event sourcing: the primary CRUD database is the system of record; each write transaction also inserts a row into an outbox table in the same database. A background worker reads unprocessed outbox rows and publishes them as events to a message broker (Kafka, RabbitMQ, Redis Streams). The broker delivers events to consumers who update read projections or trigger downstream processes. This is not event sourcing — the write model is still CRUD and cannot answer arbitrary temporal queries — but it provides reliable event publishing from a CRUD system and decouples read model updates from write operations. The outbox pattern is appropriate when the goal is reliable asynchronous event publishing and downstream read model separation, but temporal query capability and aggregate replay are not required. The background job infrastructure decision record documents the outbox worker's execution model, failure handling, and idempotency contract.
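The outbox flow can be sketched in memory to show where atomicity and retry sit. Everything here is hypothetical: InMemoryDb stands in for a real database transaction, and the publish callback stands in for a broker client. Delivery is at-least-once, so consumers are assumed to be idempotent.

```typescript
interface Booking {
  id: string;
  status: string;
}

interface OutboxRow {
  id: number;
  eventType: string;
  payload: string;
  publishedAt: number | null; // null until the worker has published the row
}

// Stand-in for a relational database; each method body represents one transaction.
class InMemoryDb {
  bookings: Record<string, Booking> = {};
  outbox: OutboxRow[] = [];
  private nextOutboxId = 1;

  // The CRUD write and the outbox insert commit together or not at all.
  confirmBooking(id: string): void {
    this.bookings[id] = { id, status: "confirmed" };
    this.outbox.push({
      id: this.nextOutboxId++,
      eventType: "BookingConfirmed",
      payload: JSON.stringify({ bookingId: id }),
      publishedAt: null,
    });
  }
}

type Publish = (eventType: string, payload: string) => void;

// Background worker pass: publish every unpublished row in insertion order.
// If publish throws, the row stays unpublished and is retried on the next pass.
function drainOutbox(db: InMemoryDb, publish: Publish, now: number): number {
  let published = 0;
  for (const row of db.outbox) {
    if (row.publishedAt !== null) continue;
    publish(row.eventType, row.payload);
    row.publishedAt = now;
    published++;
  }
  return published;
}
```

Note what is missing compared with event sourcing: the bookings row is overwritten in place, so the outbox gives reliable publishing but no replayable history.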
What the founding session records and what it omits
Event sourcing is almost always adopted in response to a specific trigger: a regulatory audit trail requirement, a temporal query request from a product manager, a read scaling problem where the primary database cannot support the query volume, or a blog post or conference talk that made the pattern seem like the correct general-purpose data model for the domain. The founding session that responds to this trigger records the event type vocabulary, the aggregate boundary, the chosen event store technology, and the rationale for the audit trail or temporal query capability. It does not record the properties that determine whether the system is maintainable and operationally predictable two years later.
Four types of AI chat sessions generate these gaps:
The "how do we build an audit trail?" session. The engineer asks how to implement a complete, tamper-evident audit trail. The session discusses event sourcing, explains the append-only log, and produces the initial event type vocabulary and the events table schema. The session does not ask: what is the expected event volume per aggregate per month? How frequently will existing event types need to evolve? Which queries need to be answered by temporal event replay versus materialized projections? What is the acceptable rebuild time if a projection must be corrected? These questions are not asked because the problem being solved is the audit trail, not the long-term operational model. The resulting ADR — if one is written at all — records the pattern choice without the projection rebuild strategy, the snapshot policy, or the event schema migration plan.
The "how do we migrate this event type?" session. Eight months after launch, a field needs to be added to a core event type. The engineer discovers that historical events do not have the field and the handler will break on load. The session explains upcasters and produces the first upcastEventTypeV1toV2 function. The session does not establish a versioning policy, an upcaster accumulation budget, or a criterion for when a migration event should be written instead of another upcaster. The upcaster is correct for the current migration. It does not account for the fact that this is the beginning of an accumulation that will reach upcaster chains for twelve event types and 740 lines before anyone pauses to document a migration policy.
The "why is the projection rebuild taking so long?" session. The engineer initiates a projection rebuild after a bug fix and discovers it takes eleven hours. The session explains snapshot checkpoints — write a materialized snapshot of the aggregate state every N events, so future replays start from the most recent snapshot rather than event 0. The session produces the snapshot implementation. It does not retroactively establish a rebuild SLA, does not explain how to estimate rebuild time as a function of event volume, and does not recommend communicating the rebuild duration and consistency impact to customers before initiating it. The snapshot is correct. The process discipline around rebuild planning is not established.
The "how do we add a new analytics projection?" session. The product manager requests a new report that requires correlating events across entity lifetimes. The engineer asks how to build the projection. The session explains CQRS projection design and produces the event handler. It does not surface the constraint that this projection requires reading from event 0, cannot be incrementally populated, and will take 14 hours to rebuild on the current event log. The rebuild duration is discovered when the engineer runs it on production. The caching strategy decision record documents the cache layer for read models; in an event-sourced system the consistency model of each projection — including the rebuild duration at current event volume — is the equivalent of the cache's TTL and must be documented per projection before the projection is built.
The WhyChose extractor surfaces the founding "how do we build an audit trail?" session, the first upcaster session, the projection rebuild incident session, and the analytics projection design session from AI chat history. The event sourcing ADR takes the pattern adoption buried in those sessions and converts it into a documented aggregate boundary, a projection inventory with rebuild SLA estimates, an event schema migration policy with upcaster accumulation budget, a snapshot checkpoint frequency, and a temporal query policy that distinguishes between projections built for common patterns and ad-hoc replays reserved for rare queries.
The five sections of an event sourcing ADR
Section 1: Write model boundary and aggregate selection. Document which aggregates are event-sourced and which remain CRUD, with explicit rationale for the boundary. Not every entity in the system benefits from event sourcing. A UserPreference aggregate has no meaningful event semantics — UserPreferenceUpdated carrying a JSON diff of changed fields is a technical artifact, not a domain event. Event sourcing adds value where domain events have independent business meaning: BookingRequested, PaymentCaptured, DisputeOpened are things that happened in the business domain, not just state changes in a database row. Document the criterion for including an aggregate in the event-sourced boundary: meaningful domain event semantics, temporal query requirement, or need for multiple independent read projections. Document the criterion for excluding: mutable reference data with no meaningful event history, configuration tables, user preferences that carry no business audit requirement. A mixed architecture — event sourcing for core domain aggregates, CRUD for supporting data — is operationally correct and should be documented explicitly rather than treated as a temporary state before "migrating everything to event sourcing." The multi-tenancy decision record documents whether event streams are partitioned per tenant or shared with a tenant filter; an event-sourced multi-tenant system must document the stream isolation strategy to prevent cross-tenant event visibility and to allow per-tenant projection rebuilds without processing the entire event log.
Section 2: Event schema design and versioning policy. Document the versioning strategy selected (weak schema, strong versioning, or migration events) with the rationale based on the domain's schema stability expectations. For strong versioning: document the upcaster accumulation budget (the maximum number of upcaster generations before the team prefers writing a migration event over extending the chain), the definition of a breaking change to an event schema (any removal or rename of an existing field, any type change of an existing field, any new required field; adding an optional field with a documented default is non-breaking), and the process for adding a new event type versus revising an existing event type (new event types are always non-breaking; revisions require the versioning workflow). Document the event type naming convention: past tense, business-meaningful (ReviewerApproved not ReviewerStatusUpdated), domain-scoped (aggregate name as prefix where streams contain multiple aggregate types). Document the payload field naming convention and whether events carry a correlation ID for cross-entity tracing. The observability strategy decision record documents distributed tracing; an event-sourced system should carry a correlation ID through the event payload to trace a business operation across multiple aggregate streams and projections.
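The breaking-change definition is mechanical enough to check in code review or CI. The sketch below is one possible encoding, assuming a hypothetical FieldSpec description of each event schema; it cannot tell a rename from a removal, so both are reported the same way.

```typescript
// Hypothetical description of one field in an event schema.
interface FieldSpec {
  type: string;
  required: boolean;
}

type EventSchema = Record<string, FieldSpec>;

// Returns the breaking changes between two revisions of an event schema:
// removed or renamed fields, type changes, and new required fields.
// Adding an optional field produces no entry.
function breakingChanges(oldSchema: EventSchema, newSchema: EventSchema): string[] {
  const problems: string[] = [];
  for (const name of Object.keys(oldSchema)) {
    const before: FieldSpec | undefined = oldSchema[name];
    const after: FieldSpec | undefined = newSchema[name];
    if (before === undefined) continue;
    if (after === undefined) {
      problems.push(`removed or renamed field: ${name}`);
    } else if (after.type !== before.type) {
      problems.push(`type change: ${name}`);
    }
  }
  for (const name of Object.keys(newSchema)) {
    const added: FieldSpec | undefined = newSchema[name];
    if (oldSchema[name] === undefined && added !== undefined && added.required) {
      problems.push(`new required field: ${name}`);
    }
  }
  return problems;
}
```

A non-empty result is the trigger for the versioning workflow; an empty result means the revision can ship under the weak-schema rules.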
Section 3: Event store technology and operational constraints. Document the technology selected with the specific rationale — not just "PostgreSQL is familiar" but the explicit tradeoff against EventStoreDB or other options for this system's event volume, team size, and subscription fan-out requirements. Document the operational plan for event log growth: partitioning strategy by occurred_at, the partition rotation schedule, the archive destination for old partitions, and the minimum retention period per event type for compliance requirements. Document the snapshot checkpoint policy: the frequency (every N events per aggregate stream, where N is chosen so that the worst-case replay from the most recent snapshot is within the accepted rebuild SLA), the snapshot storage schema, and the trigger for writing a snapshot (event-count-based, time-based, or deployment-triggered when an event handler changes). Document the read throughput estimate for projection rebuilds at current event volume and the scaling plan as volume grows: at what event count does the current rebuild infrastructure require optimization? The build vs buy decision record documents the general tooling choice methodology; for event store selection specifically the tradeoff between operational simplicity (PostgreSQL) and purpose-built capability (EventStoreDB) should be made explicitly at architecture time, not by default.
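The snapshot policy can be illustrated with a toy aggregate whose state is a running total. The names (CounterEvent, Snapshot, loadState, maxSnapshotInterval) are hypothetical, and the interval calculation assumes a constant replay rate.

```typescript
interface CounterEvent {
  sequence: number; // per-stream sequence number, starting at 1
  delta: number;
}

interface Snapshot {
  sequence: number; // sequence number of the last event folded into state
  state: number;
}

// Loads aggregate state starting from the most recent snapshot, replaying only later events.
// With no snapshots, replay starts from an empty state at sequence 0.
function loadState(
  events: readonly CounterEvent[],
  snapshots: readonly Snapshot[],
): { state: number; replayed: number } {
  const start = snapshots.reduce<Snapshot>(
    (best, s) => (s.sequence > best.sequence ? s : best),
    { sequence: 0, state: 0 },
  );
  let state = start.state;
  let replayed = 0;
  for (const e of events) {
    if (e.sequence <= start.sequence) continue;
    state += e.delta;
    replayed++;
  }
  return { state, replayed };
}

// Event-count-based trigger: write a snapshot every N events.
function shouldSnapshot(sequence: number, everyN: number): boolean {
  return sequence > 0 && sequence % everyN === 0;
}

// Largest N for which the worst-case replay still fits the replay time budget.
function maxSnapshotInterval(eventsPerSecond: number, replayBudgetSeconds: number): number {
  return Math.floor(eventsPerSecond * replayBudgetSeconds);
}
```

The replayed count is the number worth logging in production: it shows whether the chosen N is actually keeping replays inside the budget.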
Section 4: Projection inventory and consistency model. Document each projection: its name, the event types it subscribes to, its update model (synchronous within the write transaction or asynchronous via subscriber), the consistency lag bound for asynchronous projections under normal operation, the rebuild time estimate at current event volume, the rebuild SLA (acceptable read staleness or unavailability during rebuild), and the consistency model visible to API consumers (read-after-write guaranteed, or eventually consistent with a documented lag bound). This inventory should be a living section of the ADR — updated when a new projection is added and reviewed when the event volume crosses a rebuild-time threshold that invalidates the existing SLA. Projections that require reading from event 0 to populate correctly (cross-entity correlations, lifetime-aggregated metrics) must be flagged explicitly as "full-log rebuild required," and the rebuild time estimate must be current, not computed at the event volume when the projection was first built. The rebuild duration for "full-log rebuild required" projections should be communicated to stakeholders before any rebuild is initiated. The test strategy decision record documents the testing approach for event handlers; event-sourced aggregate handlers are best tested with given/when/then: given a sequence of prior events, when a command is issued, then these events are produced — this pattern isolates the handler from the event store and tests business logic without a database.
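The given/when/then pattern needs nothing more than a pure decision function. The sketch below uses the booking lifecycle as an example; the decide and evolve names, the ConfirmBooking command, and the rule that only a requested booking can be confirmed are assumptions made for illustration.

```typescript
type BookingEvent =
  | { type: "BookingRequested"; bookingId: string }
  | { type: "BookingConfirmed"; bookingId: string }
  | { type: "BookingCancelled"; bookingId: string };

type BookingStatus = "none" | "requested" | "confirmed" | "cancelled";

interface ConfirmBooking {
  type: "ConfirmBooking";
  bookingId: string;
}

// Folds one event into the aggregate status; the previous status is not needed in this toy model.
function evolve(_previous: BookingStatus, e: BookingEvent): BookingStatus {
  if (e.type === "BookingRequested") return "requested";
  if (e.type === "BookingConfirmed") return "confirmed";
  return "cancelled";
}

// Pure decision function: prior events in, new events out. No event store, no database.
function decide(history: readonly BookingEvent[], command: ConfirmBooking): BookingEvent[] {
  const status = history.reduce<BookingStatus>(evolve, "none");
  if (status !== "requested") {
    throw new Error(`cannot confirm booking in status ${status}`);
  }
  return [{ type: "BookingConfirmed", bookingId: command.bookingId }];
}

// given: a requested booking
const given: BookingEvent[] = [{ type: "BookingRequested", bookingId: "b-1" }];
// when: the confirm command is issued
const produced = decide(given, { type: "ConfirmBooking", bookingId: "b-1" });
// then: exactly one BookingConfirmed event is produced
```

Because decide never touches storage, the same test runs unchanged whichever event store Section 3 selects.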
Section 5: Temporal query policy. Document which queries require event replay — ad-hoc historical state reconstruction at an arbitrary timestamp — versus which are served by dedicated projections for common patterns. Ad-hoc event replay is powerful but does not scale for frequent queries: replaying 4 million events for every dispute resolution request is operationally different from replaying them once for a weekly analytics export. The temporal query policy documents: the maximum acceptable replay window for an ad-hoc query (the maximum number of events that will be replayed on demand without a snapshot checkpoint or dedicated projection), the trigger for promoting a common temporal query pattern from ad-hoc replay to a dedicated projection (frequency threshold, or a specific query latency SLA that ad-hoc replay cannot meet), and the "as of date" query strategy for the most common temporal queries (a dedicated aggregate_snapshots table that stores daily snapshots for frequently queried aggregates, allowing "as of" queries to start from the nearest daily snapshot rather than event 0). The temporal query capability is the primary justification for event sourcing over CRUD with an audit table; the temporal query policy is the operational discipline that ensures this capability remains performant as the event log grows beyond the scale at which ad-hoc replay is trivially fast. Without the policy, the first time a customer requests a historical state reconstruction on a large aggregate stream is the first time the team discovers whether the system can answer the query in seconds or hours.
None of these five sections appear in the founding sprint session that adopted event sourcing in response to a "we need an audit trail" requirement. The session records the event type vocabulary, the aggregate boundary, the event store choice, and the append-only model. It does not ask what happens when the event log reaches 10 million rows, how the team will migrate the third revision of a core event type, or what the rebuild SLA is for the new analytics projection the product manager will request in six months. The event sourcing ADR is the document that converts the pattern adoption decision into the operational parameters that determine whether the second year of the system is characterized by fast temporal queries and clean event evolution or by fourteen-hour weekend rebuilds and a 740-line upcaster file nobody wants to touch. The WhyChose extractor recovers the founding audit trail session, the first upcaster session, and the projection rebuild incident session from AI chat history; the event sourcing ADR extracts the durable architectural constraints from those sessions and documents them where engineers encounter them: next to the event handlers and projection definitions, not in a Slack thread from month 8.
FAQs
When does event sourcing provide genuine value over a CRUD system with an audit table?
Event sourcing provides genuine value in three scenarios: temporal queries are a core business requirement (auditors, dispute resolution, financial reconstruction at arbitrary timestamps); the business domain has natural event semantics where the event sequence is itself the valuable data (financial transactions, workflow approvals, supply chain movements); or multiple independent read models of the same data are needed and new read patterns appear frequently enough to justify the separation of write and read models.
Event sourcing adds complexity without commensurate benefit when the audit trail requirement is basic (who changed what when, without temporal state reconstruction), the data has no natural event semantics (a UserPreference update is not a meaningful domain event), the team has no prior experience with the operational requirements of event store management and projection rebuilds, or the read query patterns are simple enough that a well-indexed CRUD database serves them without a separate read model layer. An audit table with change records — a separate table recording who changed each column and when — satisfies basic audit requirements with a fraction of the operational complexity of an event-sourced system. The decision between them should be explicit, not implicit in the choice of whether to reach for event sourcing at the first mention of "audit trail."
What is a projection rebuild and why does it become a production risk?
A projection is a read model — a database table or document optimized for a specific query pattern — populated by replaying events from the event store. A projection rebuild discards the current read model and recomputes it from event 0. This becomes a production risk when: a bug in the projection's handler must be corrected by rebuilding from the beginning of the log (the projection serves incorrect data during the rebuild); a new projection requires correlating events across entity lifetimes and cannot be populated incrementally (the rebuild reads the entire event log); or the event log has grown faster than anticipated and the rebuild time has crossed the team's implicit SLA without the team noticing.
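The first risk case, a handler bug that can only be corrected by a full replay, can be sketched as follows. The event log, the handler names, and the counters are hypothetical. Building the corrected model in a fresh structure and swapping it in afterwards is one possible approach, assumed here for illustration; it is not necessarily what any given team does.

```python
from collections import Counter

# Hypothetical event log: (sequence number, event type, workflow id).
EVENT_LOG = [
    (1, "WorkflowSubmitted", "wf-1"),
    (2, "WorkflowSubmitted", "wf-2"),
    (3, "WorkflowCompleted", "wf-1"),
    (4, "WorkflowRejected", "wf-2"),
]

def buggy_handler(counts, event_type):
    # Bug: a rejected workflow is never removed from the open count.
    if event_type == "WorkflowSubmitted":
        counts["open"] += 1
    elif event_type == "WorkflowCompleted":
        counts["open"] -= 1
        counts["completed"] += 1

def fixed_handler(counts, event_type):
    if event_type == "WorkflowSubmitted":
        counts["open"] += 1
    elif event_type == "WorkflowCompleted":
        counts["open"] -= 1
        counts["completed"] += 1
    elif event_type == "WorkflowRejected":
        counts["open"] -= 1
        counts["rejected"] += 1

def rebuild(log, handler):
    """Recompute the read model from event 0 into a fresh structure."""
    fresh = Counter()
    for _seq, event_type, _workflow_id in log:
        handler(fresh, event_type)
    return fresh

served_before_fix = rebuild(EVENT_LOG, buggy_handler)  # reports 1 open workflow
corrected = rebuild(EVENT_LOG, fixed_handler)          # full replay with the fix
live = corrected                                       # swap after the rebuild
```

The rebuild cost is one handler call per event in the log, so its duration grows with the log even though the fix itself is three lines. Until the swap happens, readers still see the count produced by the buggy handler.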
The rebuild risk is managed by three design decisions made at architecture time: snapshot checkpoints (write a point-in-time aggregate state snapshot every N events; future rebuilds start from the nearest snapshot rather than event 0), an explicit rebuild SLA per projection (estimate rebuild time at current event volume, document the consistency model during rebuild, communicate the duration to customers before initiating it), and the synchronous versus asynchronous projection update strategy (synchronous projections have zero consistency lag but add write latency; asynchronous projections lag and require checkpoint tracking to resume after a restart without a full replay).
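Checkpoint tracking for an asynchronous projection can be sketched as below. The CheckpointedProjection class and the (seq, workflow_id, status) event shape are hypothetical, and a plain dict stands in for the read-model table. The sketch assumes the checkpoint is persisted together with the read model; in a real system the two would usually be written in the same transaction so that a crash cannot leave them out of step.

```python
class CheckpointedProjection:
    """Async projection that remembers the last sequence number it applied."""

    def __init__(self, state=None, checkpoint=0):
        self.state = dict(state or {})  # workflow_id -> latest status
        self.checkpoint = checkpoint    # last applied event sequence number

    def catch_up(self, log):
        """Apply only events after the checkpoint; return how many ran."""
        applied = 0
        for seq, workflow_id, status in log:
            if seq <= self.checkpoint:
                continue  # already reflected in the read model
            self.state[workflow_id] = status
            self.checkpoint = seq
            applied += 1
        return applied

log = [
    (1, "wf-1", "submitted"),
    (2, "wf-2", "submitted"),
    (3, "wf-1", "approved"),
]
projection = CheckpointedProjection()
first_run = projection.catch_up(log)  # applies all three events

# Simulated restart: reload the persisted state and checkpoint, not event 0.
saved_state, saved_checkpoint = projection.state, projection.checkpoint
log += [(4, "wf-2", "rejected"), (5, "wf-1", "completed")]
restarted = CheckpointedProjection(saved_state, saved_checkpoint)
second_run = restarted.catch_up(log)  # applies only events 4 and 5
```

After the restart the subscriber processes two events instead of five. Without the stored checkpoint, the only safe recovery would be a replay from the start of the log.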
What should an event sourcing ADR document that a general architecture decision does not?
A general architecture decision records that event sourcing was chosen and the motivation. An event sourcing ADR must document: (1) The write model boundary — which aggregates are event-sourced and which remain CRUD, with the criterion for the boundary. (2) The event schema versioning policy — weak schema versus strong versioning, the upcaster accumulation budget, and the definition of a breaking change. (3) The projection inventory — each projection's update model, consistency lag, rebuild time estimate at current event volume, and rebuild SLA. (4) The snapshot checkpoint policy — frequency, storage schema, and the trigger for writing a new checkpoint. (5) The temporal query policy — which patterns require ad-hoc event replay versus dedicated projections, the maximum replay window for ad-hoc queries, and the trigger for promoting a frequent temporal query to a dedicated projection. None of these appear in the founding session that adopted the pattern; all of them determine whether the projection rebuild and event schema evolution are predictable operational activities or production incidents discovered under pressure two years after the first event was written.
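Item (3), the projection inventory, lends itself to being kept as structured data so that the rebuild estimate is recomputed at current event volume and not remembered from the day the projection shipped. The sketch below is illustrative only: the ProjectionEntry fields, the projection names, the replay throughput figures, and the SLA values are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProjectionEntry:
    name: str
    event_types: tuple
    update_model: str                 # "sync" or "async"
    full_log_rebuild: bool            # must read from event 0 to be correct
    replay_events_per_second: float   # measured replay throughput (assumed)
    rebuild_sla_seconds: float        # acceptable rebuild duration

def rebuild_estimate_seconds(entry, events_in_log):
    """Estimate rebuild time at the current event volume."""
    return events_in_log / entry.replay_events_per_second

def sla_breaches(inventory, events_in_log):
    """Names of projections whose estimated rebuild exceeds their SLA."""
    return [
        entry.name
        for entry in inventory
        if rebuild_estimate_seconds(entry, events_in_log) > entry.rebuild_sla_seconds
    ]

# Hypothetical inventory; every number here is made up for illustration.
INVENTORY = [
    ProjectionEntry(
        name="workflow_status",
        event_types=("WorkflowSubmitted", "WorkflowCompleted", "WorkflowRejected"),
        update_model="sync",
        full_log_rebuild=False,
        replay_events_per_second=2000.0,
        rebuild_sla_seconds=3600.0,
    ),
    ProjectionEntry(
        name="reviewer_lifetime_metrics",
        event_types=("ReviewerAssigned", "ReviewerApproved"),
        update_model="async",
        full_log_rebuild=True,
        replay_events_per_second=500.0,
        rebuild_sla_seconds=7200.0,
    ),
]

at_one_million = sla_breaches(INVENTORY, 1_000_000)
at_four_million = sla_breaches(INVENTORY, 4_000_000)
at_ten_million = sla_breaches(INVENTORY, 10_000_000)
```

Running a check like this on a schedule, or in CI with the current event count, is one way to notice that a rebuild estimate has crossed its SLA before the rebuild is actually needed.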