Why does multi-region deployment need an architecture decision record?

Multi-region deployment appears to be an infrastructure configuration: choose a cloud provider, select regions, enable database replication, deploy application stacks. This framing conceals the architectural decisions embedded in the topology — decisions that determine what happens when the primary region goes down, when an enterprise customer asks where their data is stored, when a user in London experiences 150ms latency because all API servers are in Virginia, and when a GDPR audit asks which third-party analytics tools receive EU personal data. The failover model (active-passive with a manual procedure, active-active with a global load balancer, multi-primary with conflict resolution) determines RPO and RTO during regional outages. The data residency policy determines whether EU customer personal data can be stored in US-region infrastructure, and which third-party tools violate that policy by forwarding event data to US-based SaaS servers. The cross-region consistency model (synchronous replication with write latency, asynchronous replication with replication lag) determines the tradeoff between write performance and RPO, and what happens to in-flight writes when a regional partition occurs. The traffic routing approach (DNS latency-based routing, anycast, regional affinity) determines the actual latency floor users experience, which is bounded below by the speed of light to the nearest application server. None of these decisions are visible as architectural rationale in the infrastructure code — they appear as Terraform region configurations, database replication settings, Route53 routing policies, and third-party API integrations. Without an ADR, the team cannot explain the topology to a new engineer, a security auditor, or a sales prospect who asks about data residency.

What is the difference between active-passive and active-active multi-region deployments?

In an active-passive multi-region deployment, one region (the primary) serves all production traffic and all write operations. The second region (the secondary) is a warm standby: it runs the application stack and receives replicated data from the primary, but does not serve traffic under normal conditions. During a primary region outage, traffic is rerouted to the secondary (via DNS change, load balancer failover, or global traffic manager) and the database secondary is promoted to primary. The RPO (Recovery Point Objective) is determined by the replication lag at the moment of failure — with asynchronous replication, the secondary may be behind by some number of committed transactions that were not yet replicated. The RTO (Recovery Time Objective) is determined by the time to detect the outage, execute the DNS change or load balancer update, promote the database, and verify the secondary is serving traffic correctly. Active-passive is simpler to implement and operate than active-active: there is no question about write routing (all writes go to the primary), no consistency conflicts between regions, and no need for a global coordination mechanism. The cost is that the secondary region's resources are underutilized during normal operation, and the failover procedure is executed infrequently enough that it is almost never practiced. In an active-active multi-region deployment, both regions serve production traffic simultaneously. Traffic is distributed between regions (typically by geographic proximity, so users are routed to the nearest region). Write operations require a strategy: either all writes are routed to a single designated write region (primary affinity for writes, regional affinity for reads) regardless of which region received the request, or the deployment uses a multi-primary database (CockroachDB, Google Spanner, PlanetScale, Cassandra) where each region can accept writes independently. 
Active-active provides redundancy with full utilization of both regions' resources and lower RTO (traffic already flows to both regions; a regional outage affects only the traffic that was routed to the failing region). The structural complexity is significantly higher: write routing requires coordination, multi-primary databases require a conflict resolution strategy for concurrent writes to the same record, and the operational model must handle partial regional degradation without a clean primary/secondary distinction.

What is a data residency requirement and what technical controls does it require?

A data residency requirement is a constraint that certain data, typically personal data belonging to customers in a specific jurisdiction, must be stored and processed within that jurisdiction and must not transit to servers in other jurisdictions. Chapter V of the GDPR (Articles 44 to 49) permits transfers of personal data outside the European Economic Area only under a recognized mechanism: an adequacy decision under Article 45, or appropriate safeguards such as standard contractual clauses or binding corporate rules under Article 46. For a B2B SaaS product with EU customers, data residency in the EU means that EU customer personal data must reside in EU-region infrastructure and must not be sent to non-EU servers without an appropriate transfer mechanism. The technical controls required by data residency go far beyond selecting an EU cloud region for the primary database. First, the database replication strategy: if the disaster recovery plan involves cross-region replication to a US-region replica, that replication is a personal data transfer outside the EU and must be replaced with in-EU replication (eu-west-1 to eu-central-1, for example). Second, the logging and observability pipeline: if application logs containing personal data (user IDs, email addresses, IP addresses) are forwarded to a centralized logging service in a US region (Datadog, Splunk Cloud, Elastic Cloud on a US cluster), that forwarding is a personal data transfer. Third, analytics tooling: Mixpanel, Segment, Amplitude, and similar analytics services transmit events to US-based servers by default; analytics for EU customers must either be collected on EU-region infrastructure or excluded from third-party analytics tools, with documentation for the exception.
Fourth, error tracking: Sentry, Bugsnag, Rollbar store error events (which may include personal data such as user IDs and request parameters) on their cloud infrastructure; EU data residency customers may require that error events for their sessions are excluded from error tracking tools, or that a self-hosted error tracking instance in the EU is used. Fifth, customer support and communication tools: Intercom, Zendesk, HubSpot are US-based services; support conversations with EU customers transmitted to these services are cross-border transfers. A data residency promise in a DPA (Data Processing Agreement) is a legal commitment; the technical enforcement mechanisms are what actually satisfy the requirement, and they must be enumerated in the multi-region deployment ADR.

What should a multi-region deployment decision record include?

A multi-region deployment ADR needs six sections. First, the region topology decision: which regions are deployed, the rationale (latency requirements, data residency requirements, disaster recovery requirements), the active-passive or active-active designation, and the write region designation. Second, the failover procedure: RPO and RTO targets with acceptance criteria, the mechanism for traffic rerouting on primary failure (DNS change with specific TTL values, global load balancer health check configuration, Route53 health check and routing policy), the database failover process (standby promotion procedure, DNS or connection string update mechanism, read replica promotion and replication reconnection), and the testing cadence (how often the failover procedure is exercised, what constitutes a successful test, who is responsible for the exercise). Third, the data residency policy: which data categories are subject to residency constraints, the technical mechanisms that enforce the constraint (EU-region infrastructure for relevant services, excluded third-party services with documented alternatives, restricted database replication topology), a compliance matrix of every third-party service used and its data residency status, and the procedure for adding new third-party integrations while maintaining residency compliance. Fourth, the cross-region consistency model: the replication mechanism (streaming replication, logical replication, managed database service replication), synchronous versus asynchronous configuration with explicit acceptance of the RPO implied by the asynchronous lag, the replication lag monitoring alert threshold, and the conflict resolution policy for any multi-primary deployment. 
Fifth, the traffic routing configuration: the routing mechanism (DNS latency routing, anycast, global load balancer), the routing policy per endpoint type (regional affinity for write operations and user-session-dependent reads, CDN-eligible status for cacheable responses, read replica routing for read-heavy queries), and the latency monitoring by geographic region. Sixth, the operational model: deployment sequencing across regions (canary in one region before full deploy, or simultaneous), alert routing by region, cross-region network latency monitoring, capacity planning assumptions by region, and the on-call rotation's coverage for the secondary region.

2026-06-20 · ~20 min read

The multi-region deployment decision record: why the region topology you chose determines your latency floor and your data residency compliance posture

Multi-region deployment looks like infrastructure configuration — pick a cloud provider, select additional regions, enable database replication, deploy identical application stacks. This framing conceals the architectural decisions embedded in the topology: the failover model that determines RPO and RTO during regional outages; the data residency policy that determines which data can leave which jurisdiction; the cross-region consistency model that determines the tradeoff between write latency and data loss risk; and the traffic routing approach that determines the actual latency floor users experience. Most teams discover these decisions not during the design phase, but during the first enterprise sales negotiation that includes data residency requirements, or when the primary region suffers an outage and the failover procedure has never been tested.

A B2B SaaS team serves customers across the US and Europe. Their infrastructure is entirely in us-east-1. They have a Postgres database on RDS, an application cluster on ECS, and a CloudFront distribution for static assets. The product works well. No customer has complained about latency. No enterprise sales cycle has required infrastructure documentation.

A German mid-market logistics company with 400 employees enters the sales pipeline. The deal size is €160,000 ACV. The prospect's IT security team sends a pre-purchase questionnaire with 47 questions. Question 12: "In which geographic region(s) is customer data stored? Can you guarantee that EU customer data, including personal data as defined under GDPR, never transits to or is processed by systems located outside the European Economic Area?" The sales team escalates to engineering.

Engineering's answer, after two days of investigation: all data is in us-east-1. There is no EU data residency. The CloudFront distribution caches content from US-region origins. The centralized logging service (Datadog) stores logs on US servers. The analytics pipeline sends all events to Mixpanel's US datacenter. Error tracking runs through Sentry Cloud, whose servers are in the United States. None of this was a deliberate choice against data residency — it was the accumulation of individually reasonable infrastructure decisions made by a team that was not anticipating an EU data residency requirement.

The deal is put on hold pending a data residency roadmap. Engineering begins an assessment. Adding EU data residency requires: a new RDS instance in eu-central-1 (Frankfurt), a strategy for per-tenant data routing (which tenant's data goes to which region — the current schema has no tenant-region column), changes to the authentication flow (JWT validation must happen in the same region as the data it protects), migration of the logging pipeline to a Datadog EU-region cluster (Datadog EU exists but the agent configuration must be updated and log forwarding must be redirected), replacement of Mixpanel with a GDPR-compliant analytics tool configured for EU data, a self-hosted Sentry instance in the EU or exclusion of EU customer error events from Sentry Cloud, and a revised backup strategy (cross-region replication to us-east-1 is not acceptable for EU data; replication must remain within the EU). The total migration estimate: six months, approximately €240,000 in engineering time. The decisions that created this situation were never written down.

Why multi-region deployment is an architectural decision, not just an infrastructure configuration

Infrastructure-as-code makes multi-region deployment look mechanical: copy a Terraform module, change the region parameter, add a database replication rule. The appearance of mechanical work conceals five architectural decisions that are embedded in every multi-region topology, each with consequences that become visible only when a regional outage occurs, when an enterprise prospect asks about data sovereignty, or when a user base expands to a continent where the nearest application server is 150ms away.

The failover model determines what happens during a regional outage. Active-passive topologies designate one region as the primary and maintain a warm standby in a second region. Active-active topologies serve traffic from both regions simultaneously, with a strategy for routing write operations to the correct region. Multi-primary topologies allow concurrent writes in multiple regions with a conflict resolution mechanism. The choice is an architectural commitment: active-passive is simpler to implement and has a well-understood failure mode but requires a tested failover procedure that is almost never practiced until it is needed; active-active provides higher availability and better resource utilization but requires write routing coordination that adds operational complexity; multi-primary eliminates the single write region but requires a conflict resolution policy for concurrent writes that must be specified before the first write conflict occurs in production.

The data residency policy determines which data can cross which geographic boundaries. Chapter V of the GDPR (Articles 44 to 49) permits transfers of personal data outside the European Economic Area only under a recognized mechanism: an adequacy decision under Article 45, or appropriate safeguards such as standard contractual clauses or binding corporate rules under Article 46. This constraint is not satisfied by deploying a database in an EU cloud region. It extends to every system that processes or stores copies of that data: logging pipelines, analytics tools, error tracking services, customer support platforms, backup storage, and disaster recovery replicas. A data residency commitment in a Data Processing Agreement is a legal obligation backed by technical controls. If the technical controls are not enumerated, they cannot be audited, and the legal commitment is unsupported.

The cross-region consistency model determines the tradeoff between write latency and recovery risk. Synchronous database replication guarantees that a write is not acknowledged until the secondary has confirmed receipt: zero replication lag, at the cost of adding the round-trip time to the secondary to every write. Asynchronous replication acknowledges writes immediately and replicates in the background: normal write latency, at the cost of replication lag that determines the RPO. If the primary fails while the secondary is 8 seconds behind, up to 8 seconds of committed transactions are lost on failover. This is not a configuration detail. It is the acceptance of a specific data loss risk that must be documented alongside the failover procedure.

The traffic routing approach determines the actual latency floor. The latency floor is bounded by physics: light in fiber needs tens of milliseconds to cross an ocean and return, and real transatlantic round trips are typically 80-120ms once routing is accounted for. A user in Frankfurt accessing an application server in us-east-1 therefore pays roughly 80-120ms of round-trip latency before any application code runs. DNS-based geographic routing, anycast, and regional affinity can reduce this, but only if application servers exist in the user's geographic region. CDN edge caching reduces the latency floor for cacheable responses but does not help for non-cacheable API responses that carry per-user session data. The traffic routing ADR must specify which endpoints are eligible for CDN caching and which require a round-trip to the application server, because the answer determines the latency floor for the product's most common user actions.

The operational model determines whether the multi-region topology is maintainable. A second region is a second deployment target, a second monitoring domain, a second on-call alert surface, and a second capacity planning concern. Teams that deploy to a second region without updating their deployment pipeline, monitoring configuration, and on-call runbooks discover the operational model gap at 3am during an incident in the secondary region, when the on-call engineer's runbooks describe only the primary region's infrastructure.

Failover models: active-passive, active-active, and multi-primary

The failover model is the most visible architectural decision in a multi-region topology because it determines what happens during the highest-stakes event: a regional cloud outage. The model must be specified explicitly, not inferred from the replication configuration, because the replication configuration enables a failover but does not specify the failover procedure, the RPO/RTO targets, or the person responsible for executing the procedure at 3am.

Active-passive designates one region as the primary, serving all traffic under normal conditions, and maintains one or more secondary regions as warm standbys. The secondary runs the application stack and receives replicated data from the primary database, but does not serve production traffic. During a primary region outage, three things must happen in sequence: traffic must be rerouted to the secondary (via DNS change, global load balancer failover, or anycast routing update), the secondary database must be promoted to primary (taking over the write role), and the application tier must be verified as healthy and serving traffic from the secondary region. The total elapsed time from outage detection to secondary serving verified traffic is the RTO. The data loss from transactions committed on the primary that had not yet been replicated to the secondary at the moment of failure is the RPO. Both must be specified as targets before the failover model is selected, because they constrain the replication configuration (asynchronous replication can provide better write performance but with a larger RPO than synchronous) and the detection and notification setup (a longer detection time directly increases the RTO).
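The way the sequential failover steps add up to an RTO can be sketched as a simple budget. This is a minimal illustration, not a measurement tool: the class name, field names, and all the durations are hypothetical placeholders that a team would replace with values observed during its own failover exercises.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverBudget:
    """Hypothetical per-step estimates for an active-passive failover, in seconds."""
    detection_s: float     # time for health checks to declare the primary down
    dns_ttl_s: float       # worst case: clients keep the old record for a full TTL
    promotion_s: float     # time to promote the standby database
    verification_s: float  # time to confirm the secondary serves traffic correctly

    def estimated_rto_s(self) -> float:
        # The steps run in sequence, so the estimate is their sum.
        return (self.detection_s + self.dns_ttl_s
                + self.promotion_s + self.verification_s)

    def meets_target(self, rto_target_s: float) -> bool:
        return self.estimated_rto_s() <= rto_target_s


# Placeholder numbers, not recommendations.
budget = FailoverBudget(detection_s=90, dns_ttl_s=60, promotion_s=120, verification_s=180)
print(budget.estimated_rto_s())  # 450
print(budget.meets_target(300))  # False
```

Writing the budget down per step makes it visible which step dominates, and therefore which one a tighter RTO target would have to shrink.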

The critical failure of active-passive is the untested failover. The procedure exists in a runbook. It may even be correct at the time it was written. But the team has never executed a live failover, the DNS TTL has never been validated under production conditions, the database promotion procedure has never been run against the current database version and replication configuration, and the application tier has never been verified as healthy in isolation from the primary region. The first time the failover procedure runs under real conditions, it runs during a regional outage, with customer traffic affected and engineers under time pressure. The multi-region deployment ADR must specify a testing cadence — at minimum, an annual scheduled failover exercise where production traffic is deliberately rerouted to the secondary to validate the procedure end-to-end.

Active-active serves traffic from both regions simultaneously, distributing users between regions (typically by geographic proximity). Write operations require a routing strategy, because writing to two independent databases concurrently without coordination produces divergent state. The two write routing approaches have distinct tradeoffs: primary-affinity write routing sends all write operations to a single designated write region regardless of which region received the user's request; this eliminates write conflicts but adds cross-region latency for writes from users in the non-write region. Full multi-primary write routing allows writes at each region's local database, requires a consensus or conflict resolution mechanism, and eliminates cross-region write latency at the cost of conflict resolution complexity.
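Primary-affinity write routing can be sketched in a few lines. The sketch assumes that HTTP method is a good enough proxy for "is this a write", which will not hold for every API; the function name and region arguments are illustrative only.

```python
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def target_region(method: str, receiving_region: str, write_region: str) -> str:
    """Primary-affinity routing: writes go to the write region, reads stay local."""
    if method.upper() in WRITE_METHODS:
        return write_region
    return receiving_region


print(target_region("GET", "eu-central-1", "us-east-1"))   # eu-central-1
print(target_region("POST", "eu-central-1", "us-east-1"))  # us-east-1
```

A real router would likely also send reads that must observe the user's own recent writes to the write region, since the local replica may lag behind it.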

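Active-active's primary advantage over active-passive is that both regions are already serving traffic during normal operation, so a regional outage requires only traffic rerouting for the users of the failed region. With multi-primary writes, or when the failed region is not the designated write region, no database promotion is needed, and promotion is the most complex and risky step in the active-passive failover. With primary-affinity write routing, losing the write region still requires promoting the surviving region's database. The surviving region has been running the application stack continuously and is warmed up; when no promotion is needed, the RTO for active-active is bounded by the traffic rerouting time rather than the sum of detection time, database promotion time, and application verification time.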

Multi-primary allows concurrent writes to independent databases in each region, with asynchronous replication and a conflict resolution mechanism that handles records written concurrently to both primaries before replication completes. The conflict resolution policy is a correctness decision: last-write-wins (the higher-timestamp write survives) is simple to implement but produces data loss for the losing write; application-level conflict detection (each write includes a version vector, conflicts surface to the application for resolution) preserves all writes but requires the application to handle conflict resolution at every write path; CRDTs (conflict-free replicated data types) provide conflict-free merge semantics for specific data structures (counters, sets, maps with specific merge rules) but not for arbitrary relational data models. Multi-primary is the correct model for globally distributed products where cross-region write latency is a product quality problem, but the conflict resolution policy must be specified in the ADR, named per data entity, and tested before the first conflict occurs in production.
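To make the CRDT option concrete, here is a minimal sketch of a grow-only counter, one of the simplest CRDTs. The class name and region labels are illustrative; production systems would normally use a database or library that provides CRDTs instead of hand-rolling one.

```python
class GCounter:
    """Grow-only counter: one slot per region, merged by element-wise max."""

    def __init__(self) -> None:
        self.counts: dict[str, int] = {}

    def increment(self, region: str, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[region] = self.counts.get(region, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Taking the max per region makes merge commutative and idempotent.
        for region, count in other.counts.items():
            self.counts[region] = max(self.counts.get(region, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


us, eu = GCounter(), GCounter()
us.increment("us-east-1", 3)      # written in the US region
eu.increment("eu-central-1", 2)   # written concurrently in the EU region
us.merge(eu)
eu.merge(us)
print(us.value(), eu.value())  # 5 5
```

Both replicas converge without coordination because each region only ever advances its own slot. That same restriction is why the approach does not extend to arbitrary relational records.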

Managed distributed databases take different approaches. CockroachDB and Google Spanner implement geo-distributed consensus replication, which orders conflicting writes through the protocol instead of resolving them after the fact. Amazon Aurora Global Database and PlanetScale keep writes on a single primary and replicate to readers in other regions. The cost of consensus is latency overhead on every write: a write to CockroachDB must be acknowledged by a majority of replicas, which in a geo-distributed deployment means waiting for a round-trip to the farthest replica needed to complete that majority. The latency profile is different from single-region deployments and must be measured and documented alongside the consistency model.
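The consensus latency cost can be estimated with a back-of-the-envelope model. The sketch below assumes a leader-based protocol in which the leader counts as one vote and a write commits once a majority of replicas has acknowledged it; the function name and the round-trip times are hypothetical, and real systems add processing and disk time on top.

```python
def quorum_commit_latency_ms(follower_rtts_ms: list[float]) -> float:
    """Network wait before a leader can commit, given RTTs to each follower."""
    replicas = len(follower_rtts_ms) + 1  # followers plus the leader
    majority = replicas // 2 + 1
    acks_needed = majority - 1            # the leader's own vote is free
    if acks_needed == 0:
        return 0.0
    # The commit waits for the slowest follower among the nearest acks_needed.
    return sorted(follower_rtts_ms)[acks_needed - 1]


# Hypothetical round-trip times from the leader to its followers.
print(quorum_commit_latency_ms([12.0, 90.0]))               # 12.0
print(quorum_commit_latency_ms([12.0, 90.0, 95.0, 160.0]))  # 90.0
```

Under this model, replica placement matters as much as replica count: three replicas with one nearby follower commit quickly, while five replicas spread across continents force every write to wait on a cross-ocean acknowledgement.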

Data residency and sovereignty compliance

Data residency requirements represent one of the most common blockers in enterprise B2B sales, and one of the most expensive architectural remediation projects when encountered after the infrastructure has been built. The remediation cost is high because data residency is not a property of a single system — it is a property of every system that touches the data, including systems that were not considered data storage at the time they were integrated.

The compliance perimeter is larger than the database. A team that deploys a database in an EU-region cloud provider and believes this satisfies GDPR data residency has usually underestimated the scope. Every system that receives or stores a copy of EU customer personal data is within the compliance perimeter. The standard inventory includes: the primary database and its read replicas (where is each one located?); the backup storage (cross-region database backups are personal data transfers); the logging pipeline (structured application logs contain user IDs, IP addresses, and email addresses: which logging service receives them, and where?); analytics tooling (session events, page views, and feature usage events: which analytics platforms receive them?); error tracking (stack traces and request parameters may include user context: where are errors stored?); customer support tools (support conversations contain personal data: where does the support platform store it?); and any batch export or reporting jobs that extract personal data for internal analysis. Most SaaS products have between 15 and 30 third-party integrations that receive personal data; a genuine data residency commitment requires a compliance matrix covering each one.

The per-tenant residency model is the most common way for B2B SaaS products to meet residency requirements. Rather than requiring all customers' data to reside in a specific region, a per-tenant residency model assigns each customer to a home region based on their jurisdiction and stores their data in that region's infrastructure. A European customer is assigned to the EU region; a US customer is assigned to the US region. The application routes each request to the customer's home region. This requires: a tenant-to-region mapping (a lookup that must be maintained and consistent), a database per region (with schema parity maintained across regions through coordinated migrations), application instances in each region that serve only requests routed to their region, and an authentication layer that is either stateless (JWT validation with the same signing keys in each region) or per-region (session state stored in the customer's home region). The authentication strategy ADR must account for regional token validation if the multi-region topology requires it.
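The tenant-to-region mapping at the center of this model can be sketched as a lookup that fails closed. The tenant IDs, hostnames, and function name below are hypothetical; in practice the mapping would live in a replicated configuration store, not a module-level dictionary.

```python
# Hypothetical tenants and internal hostnames, for illustration only.
TENANT_HOME_REGION = {
    "acme-logistics-de": "eu-central-1",
    "example-corp-us": "us-east-1",
}

REGION_DATABASE_URL = {
    "eu-central-1": "postgresql://db.eu-central-1.internal.example/app",
    "us-east-1": "postgresql://db.us-east-1.internal.example/app",
}


def database_url_for(tenant_id: str) -> str:
    """Resolve the database for a tenant's home region."""
    region = TENANT_HOME_REGION.get(tenant_id)
    if region is None:
        # Fail closed: never fall back to a default region for an unmapped tenant.
        raise LookupError(f"no home region recorded for tenant {tenant_id!r}")
    return REGION_DATABASE_URL[region]


print(database_url_for("acme-logistics-de"))
```

Raising on an unknown tenant is a deliberate choice in this sketch: a silent default region is exactly how EU data would end up in a US database.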

Technical enforcement mechanisms must be enumerable. A data residency commitment in a DPA is a legal statement; the underlying claim is that the technical systems prevent EU personal data from reaching non-EU infrastructure. Enumerating the technical controls is what converts a legal promise into an auditable fact. The controls include: infrastructure-as-code that prevents deployment of EU-region resources to non-EU cloud regions; VPC configurations that restrict cross-region traffic; database replication topology that excludes non-EU replicas for EU-region databases; analytics pipeline configuration that routes EU events to EU-region analytics infrastructure; log shipping configuration that sends EU-region logs only to EU-region log storage; and a documented approval process for adding new third-party integrations that collects data residency information before integration. Without enumeration, the data residency commitment cannot be audited, and each new integration is a potential silent compliance violation.
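One of these controls, restricting where copies of EU data may land, can be expressed as a check over the declared topology. This is a sketch under assumptions: the allowlist contains only a few AWS regions located in the EU, and the `topology` dictionary is a hypothetical summary that a team would generate from its infrastructure-as-code.

```python
# A partial allowlist of AWS regions located in the EU (Ireland, Frankfurt, Stockholm).
EU_REGIONS = {"eu-west-1", "eu-central-1", "eu-north-1"}


def non_eu_destinations(primary_region: str,
                        destinations: dict[str, list[str]]) -> dict[str, list[str]]:
    """For an EU primary, list every destination region outside the allowlist."""
    if primary_region not in EU_REGIONS:
        return {}
    offenders: dict[str, list[str]] = {}
    for kind, regions in destinations.items():
        outside = [r for r in regions if r not in EU_REGIONS]
        if outside:
            offenders[kind] = outside
    return offenders


# Hypothetical topology for an EU-region database.
topology = {
    "replicas": ["eu-west-1"],
    "backups": ["eu-west-1", "us-east-1"],
    "log_storage": ["eu-central-1"],
}
print(non_eu_destinations("eu-central-1", topology))  # {'backups': ['us-east-1']}
```

A check of this kind gives an auditor something concrete to inspect: the rule, the input it ran against, and the result.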

Data retention intersects with data residency. The data retention decision record specifies how long data is kept before deletion. For EU customer data, the retention policy must be enforced in the EU-region infrastructure, and the deletion procedure must cover all EU-region systems where copies exist. A retention policy of 90 days for inactive account data must trigger deletion from the EU-region primary database, the EU-region backup storage, the EU-region log archives, and the EU-region analytics pipeline — not just the primary database. The cross-system deletion procedure is the mechanism that actually satisfies the retention policy for data residency purposes.

Cross-region consistency: the replication lag problem

Database replication is the mechanism that keeps the secondary region's data synchronized with the primary. The replication model — synchronous or asynchronous, streaming or logical, managed service or self-configured — determines the consistency guarantees available during normal operation and during failover. The replication lag is not a technical metric to be monitored and forgotten; it is the direct expression of the RPO the team has accepted.

Synchronous replication requires that a write transaction is committed on the primary only after the secondary has confirmed receipt and durability of the write. The write is not acknowledged to the application until the secondary has persisted the change. The result is zero replication lag under normal conditions: the secondary is always current as of the last committed transaction. The cost is write latency: every write must complete a round-trip to the secondary before returning to the application. For replication across a continental distance (us-east-1 to eu-west-1), the round-trip adds approximately 80-120ms to every write operation. For a product where writes are in the critical path of user-facing interactions, this is a significant latency increase. Synchronous replication is the correct choice when the RPO is zero — when no committed transactions can be lost during a regional failover — and when the write latency penalty is acceptable for the product's transaction volume.

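PostgreSQL synchronous replication is configured by setting synchronous_standby_names to the standbys that must confirm writes, combined with a synchronous_commit level: remote_write waits until the standby has written the change to its operating system, on waits until the standby has flushed it to durable storage, and remote_apply waits until the standby has replayed it. Each commit then waits for the standby's acknowledgement. The risk is that PostgreSQL has no built-in timeout for this wait: if the standby becomes unreachable, commits block until it reconnects or an operator removes it from synchronous_standby_names. The primary's write availability depends on the standby's connectivity.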

Asynchronous replication acknowledges writes immediately and applies them to the secondary in the background. The secondary is always some number of transactions behind the primary — the replication lag. The lag is determined by the transaction rate, the replication throughput, and the network bandwidth between regions. Under low load, the lag is typically milliseconds. Under heavy write load or network congestion, the lag can grow to seconds or minutes. The RPO equals the replication lag at the moment of primary failure: transactions committed on the primary but not yet replicated to the secondary are lost when the secondary is promoted to primary. This is the explicit acceptance of data loss risk — a risk that must be acknowledged in the multi-region deployment ADR, not hidden in a replication configuration setting.
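The relationship between lag and data loss is simple arithmetic, which is worth writing out so the accepted risk has a number attached. The sketch assumes a steady commit rate, which real workloads rarely have; the function name and the figures are illustrative.

```python
def transactions_at_risk(replication_lag_s: float, commits_per_second: float) -> float:
    """Committed transactions not yet on the secondary, assuming a steady commit rate."""
    return replication_lag_s * commits_per_second


# 8 seconds of lag at a hypothetical 250 commits per second.
print(transactions_at_risk(8, 250))  # 2000
```

Stating the RPO as "up to 2,000 transactions" instead of "8 seconds" tends to make the acceptance decision more concrete for non-engineering stakeholders.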

Replication lag monitoring is a critical production metric for any active-passive deployment with asynchronous replication. The lag must be measured continuously (PostgreSQL exposes it via pg_stat_replication.write_lag and pg_stat_replication.replay_lag) and must trigger alerts when it exceeds the RPO threshold. A replication lag of 30 seconds against an RPO target of 5 seconds means the team is operating outside its stated recovery commitment. The alert must be actioned (investigated and resolved), not acknowledged and dismissed, because a sustained high replication lag means the actual RPO in a failover scenario is the lag value at the time of failure, which may be minutes rather than seconds. The observability strategy must include replication lag alongside application performance metrics, not only in a database monitoring dashboard that is never viewed until an outage begins.
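A lag check against the RPO target might look like the sketch below. It assumes the rows come from pg_stat_replication via a driver that maps interval columns to datetime.timedelta, and that lag columns can be NULL on an idle standby. The function name, the query constant, and the sample rows are hypothetical.

```python
from datetime import timedelta

# Query sketch: the lag columns are intervals in pg_stat_replication (PostgreSQL 10+).
LAG_QUERY = (
    "SELECT application_name, write_lag, flush_lag, replay_lag "
    "FROM pg_stat_replication"
)


def standbys_over_rpo(rows, rpo: timedelta):
    """Return (standby, worst_lag) for each standby whose lag exceeds the RPO target.

    Taking the maximum of the three lag columns is a conservative choice.
    """
    breaches = []
    for name, write_lag, flush_lag, replay_lag in rows:
        lags = [lag for lag in (write_lag, flush_lag, replay_lag) if lag is not None]
        worst = max(lags, default=timedelta(0))
        if worst > rpo:
            breaches.append((name, worst))
    return breaches


# Hypothetical rows, shaped like the query result.
rows = [
    ("standby-eu-west-1", timedelta(seconds=1), timedelta(seconds=2), timedelta(seconds=9)),
    ("standby-idle", None, None, None),
]
print(standbys_over_rpo(rows, rpo=timedelta(seconds=5)))
```

The check is deliberately separated from the database call so it can be unit-tested and reused by whatever alerting system the team already runs.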

Managed global databases (Amazon Aurora Global Database, CockroachDB, Google Spanner, PlanetScale) implement cross-region replication as a managed service, exposing higher-level primitives: Aurora Global Database provides sub-second replication with a managed promotion procedure; CockroachDB and Google Spanner provide geo-distributed ACID transactions with consensus replication at the cost of consensus-protocol write latency; PlanetScale provides branching-based schema migrations with globally distributed read replicas. The consistency and latency properties of each managed offering are different from one another and from self-managed PostgreSQL replication. The multi-region deployment ADR must specify the replication mechanism, its consistency properties, and the monitoring approach, regardless of whether the implementation is managed or self-managed.

Conflict resolution for multi-primary deployments must be specified per entity type before the first write conflict occurs. Write conflicts occur when two independent primaries accept writes to the same record before replication synchronizes them. The conflict resolution policy options include: last-write-wins by timestamp (the write with the higher commit timestamp survives and the other is discarded: data loss for the losing write, acceptable only when writes are idempotent or when losing a write is tolerable, and dependent on clocks that are synchronized closely enough across regions to order the writes); application-level conflict detection using version vectors (each write carries a version vector, and the application receives both versions of a conflicted record and implements domain-specific merge logic: no data loss, but application code is required for every entity type that can conflict); CRDTs for specific data structures (counters that only increment, sets that only grow, maps with defined merge rules: conflict-free by design within the CRDT's constraints, not applicable to arbitrary relational records); and optimistic locking with retry (the application specifies an expected version and the write is rejected if the record has been modified since the read: the application must retry on rejection, which is appropriate when conflicts are rare and retry is acceptable). The conflict resolution policy is a correctness decision that must be specified alongside the multi-primary model, not discovered when the first conflict occurs in production.
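
Three of the policies above are small enough to sketch directly. The following Python is illustrative only: the type and function names are hypothetical, the store is an in-memory dict standing in for a database, and the last-write-wins function assumes the two regions' commit timestamps are comparable. It shows last-write-wins with a deterministic tie-breaker, a grow-only counter (G-Counter) merge, and an optimistic version check.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class VersionedValue:
    value: str
    committed_at: float   # commit timestamp in seconds
    region: str           # tie-breaker so every region picks the same winner


def last_write_wins(a: VersionedValue, b: VersionedValue) -> VersionedValue:
    """Keep the write with the higher timestamp; the other write is discarded."""
    return max(a, b, key=lambda v: (v.committed_at, v.region))


def merge_g_counter(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, int]:
    """Merge two grow-only counters by taking the per-region maximum."""
    return {region: max(a.get(region, 0), b.get(region, 0))
            for region in a.keys() | b.keys()}


def g_counter_value(counter: Dict[str, int]) -> int:
    """The counter's value is the sum of every region's local count."""
    return sum(counter.values())


class StaleWriteError(Exception):
    """Raised when the record changed since the caller read it."""


def apply_if_version_matches(store: Dict[str, Tuple[int, str]], key: str,
                             expected_version: int, new_value: str) -> None:
    """Optimistic locking: reject the write unless the version is unchanged."""
    current_version, _ = store[key]
    if current_version != expected_version:
        raise StaleWriteError(key)   # caller re-reads and retries
    store[key] = (current_version + 1, new_value)
```

The G-Counter merge is commutative and idempotent, so regions can exchange state in any order and converge on the same value. The last-write-wins function converges as well, but only by throwing one of the two writes away, which is why the policy has to be chosen per entity type.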

Traffic routing and the latency floor

The latency floor is the minimum round-trip time a user can experience regardless of how well-tuned the application is. It is bounded by physics: the speed of light in fiber is approximately 200,000 km/s, so a New York to Frankfurt round trip (approximately 11,000 km of fiber in total, out and back) takes at minimum 55ms. In practice, with routing and queuing overhead, cross-Atlantic round-trip times are typically 80-120ms, and a new connection pays that round trip more than once for the TCP and TLS handshakes. A user in Frankfurt accessing an application server in us-east-1 will therefore spend roughly 100ms on the network for every non-cached request, before any application processing time is added. No amount of database query optimization reduces this floor.
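
The arithmetic behind the 55ms figure can be written out as a short calculation. This sketch uses the numbers from the paragraph above (200,000 km/s in fiber, about 11,000 km of round-trip path); the function names and the round_trips parameter are assumptions added for illustration.

```python
SPEED_OF_LIGHT_IN_FIBER_KM_PER_S = 200_000


def min_round_trip_ms(round_trip_fiber_km: float) -> float:
    """Physical lower bound on one round trip over the given fiber length."""
    return round_trip_fiber_km * 1000 / SPEED_OF_LIGHT_IN_FIBER_KM_PER_S


def request_floor_ms(round_trip_ms: float, round_trips: int = 1) -> float:
    """Network time for a request that needs several sequential round trips.

    round_trips is a modelling assumption: 1 for a request on an already
    open connection, more when handshakes must complete first.
    """
    if round_trips < 1:
        raise ValueError("a request needs at least one round trip")
    return round_trip_ms * round_trips


floor = min_round_trip_ms(11_000)          # 55.0 ms
cold_start = request_floor_ms(floor, 3)    # 165.0 ms if three round trips are needed
```

The second function makes the practical point explicit: the floor applies per round trip, so any protocol step that adds a round trip multiplies the cost for distant users.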

DNS-based geographic routing (Route53 Latency Routing, Cloudflare Traffic Management) routes users to the nearest regional endpoint based on DNS resolution. A user in Frankfurt resolves the application domain and receives the IP address of the EU-region load balancer rather than the US-region load balancer. DNS routing is effective but has a propagation delay: DNS TTL values determine how long a user's DNS resolver caches the resolution. Short TTLs (60 seconds) provide faster rerouting during a failover but increase DNS query volume; long TTLs (300 seconds) reduce query volume but slow down rerouting. The TTL must be specified in the routing configuration and must be considered in the failover RTO calculation: a 300-second TTL means users may continue reaching the failed region for up to 5 minutes after DNS is updated.
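
To see how the TTL enters the RTO calculation, here is a rough model in Python. It assumes that DNS rerouting and database promotion proceed in parallel once the outage is detected, and that verification starts when both are done; that sequencing, the function name, and the example durations are illustrative assumptions, not values from any real runbook.

```python
def worst_case_rto_seconds(detection_s: float, dns_update_s: float,
                           dns_ttl_s: float, db_promotion_s: float,
                           verification_s: float) -> float:
    """Worst-case time from outage start to verified recovery.

    Assumes two parallel tracks after detection:
      1. update DNS, then wait out cached answers (up to one full TTL)
      2. promote the database in the secondary region
    followed by a verification step.
    """
    traffic_track = dns_update_s + dns_ttl_s
    return detection_s + max(traffic_track, db_promotion_s) + verification_s


# Hypothetical step durations, in seconds.
with_long_ttl = worst_case_rto_seconds(120, 30, 300, 180, 60)   # 510
with_short_ttl = worst_case_rto_seconds(120, 30, 60, 180, 60)   # 360
```

Under these assumed durations, the 300-second TTL is the dominant term in the recovery time, while with a 60-second TTL the database promotion becomes the bottleneck instead. The useful output of a model like this is knowing which step to shorten first.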

Anycast routing assigns the same IP address to multiple regional endpoints and routes requests to the nearest endpoint based on network topology rather than DNS resolution. AWS Global Accelerator uses anycast to route users to the nearest AWS point-of-presence and then over the AWS backbone to the target region. Anycast routing provides faster failover than DNS-based routing (rerouting is handled at the network layer without waiting for DNS cache expiration) and reduces the latency users observe (AWS backbone routing between the point-of-presence and the application region is faster than public internet routing). The cost is an additional service in the request path and, for teams that operate anycast themselves rather than through a managed service, the operational complexity of announcing and withdrawing routes from each region.

CDN edge caching reduces the latency floor for cacheable responses by serving them from CDN edge nodes that are geographically close to the user. A static asset served from a CloudFront edge node in Frankfurt is 5-10ms away rather than 100ms away. For a content-heavy product, CDN edge caching dramatically improves the perceived performance for users in non-primary regions. The limitation is cacheability: API responses that carry per-user session data, personalized content, or transactional state cannot be cached at the CDN edge. The multi-region deployment ADR must specify which endpoints are CDN-eligible (and their cache TTLs) and which require a round-trip to the application server, because the answer determines the actual latency floor for the product's most common user-facing interactions. A single-page application where the HTML and JavaScript are served from a CDN edge but every API call reaches an application server in a distant region achieves fast initial page load and slow post-load interactions — a latency profile that is worse than a naive assessment of "CDN deployed" suggests.

Read replicas in additional regions reduce read latency for read-heavy workloads without the complexity of a full active-active deployment. A read replica in eu-central-1 can serve queries from EU users with regional latency while writes still go to the primary in us-east-1. The consistency model for regional reads is eventual: the replica is always some number of transactions behind the primary. A user who writes a record and immediately reads it from the regional replica may not see their own write — the read is served from the replica's view of the database, which may not yet reflect the write that was sent to the primary. The application must handle this — either by routing reads for recently written data to the primary, by using write-through caching that populates the EU region's cache immediately after the primary write, or by accepting the replication lag as a product constraint documented to users. The consistency behavior of regional read replicas must be specified in the ADR, not discovered when a user reports that "I just saved this and it's not showing up." The queue and messaging ADR intersects here when events triggering regional data updates flow through a message queue — the queue's lag adds to the replica's replication lag as the total staleness window for event-driven regional data.
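
The first mitigation above, routing reads for recently written data to the primary, can be sketched as a small router. This is a simplified illustration: the class name and the in-memory dictionary are assumptions, and a deployment with more than one application instance would need to keep the last-write time somewhere shared (for example in the session) rather than in process memory.

```python
import time
from typing import Callable, Dict


class ReadRouter:
    """Send a user's reads to the primary for a short window after they write."""

    def __init__(self, staleness_window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # staleness_window_s should cover the worst replication lag the team
        # is willing to assume; it is a policy choice, not a measured value.
        self._window = staleness_window_s
        self._clock = clock
        self._last_write: Dict[str, float] = {}

    def record_write(self, user_id: str) -> None:
        self._last_write[user_id] = self._clock()

    def target_for_read(self, user_id: str) -> str:
        last = self._last_write.get(user_id)
        if last is not None and self._clock() - last < self._window:
            return "primary"   # replica may not have the user's write yet
        return "replica"       # safe to serve with regional latency
```

The tradeoff is visible in the window size: a longer window gives a stronger read-your-writes guarantee but sends more reads across regions, which gives back some of the latency the replica was added to save.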

Regional affinity for write operations in an active-active deployment routes each user's write operations to their home region (the region where their data resides). This eliminates cross-region writes in the common case, at the cost of routing complexity: the application must determine which region is the home region for the current user and route accordingly. The routing decision must be made before the database write — a write sent to the wrong region is either a correctness error (if the regions have independent databases), a cross-region write (if the application is routing the write back to the home region's database), or a conflict (if the wrong region's database accepts the write and replication later conflicts with the home region's write). The routing mechanism — a tenant-to-region lookup in the authentication layer, a regional header injected by the load balancer, or a region identifier in the JWT — must be specified in the ADR and consistent with the authentication strategy.
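
A tenant-to-region lookup of the kind described above might look like the following. The tenant names, the mapping, and the WriteRoute type are hypothetical; in a real system the mapping would come from the authentication layer or a token claim, as the ADR specifies.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class WriteRoute:
    home_region: str
    forward: bool   # True when the request arrived in a non-home region


class UnknownTenantError(Exception):
    """Raised instead of guessing a region for an unmapped tenant."""


def route_write(tenant_id: str, receiving_region: str,
                home_regions: Mapping[str, str]) -> WriteRoute:
    """Decide where a write must be executed before touching any database."""
    try:
        home = home_regions[tenant_id]
    except KeyError:
        raise UnknownTenantError(tenant_id) from None
    return WriteRoute(home_region=home, forward=(home != receiving_region))


# Hypothetical tenant-to-region mapping.
HOME_REGIONS = {"acme-gmbh": "eu-central-1", "initech": "us-east-1"}
route = route_write("acme-gmbh", "us-east-1", HOME_REGIONS)
# route.forward is True: the write must be sent to eu-central-1, not applied locally.
```

Failing loudly on an unknown tenant is a deliberate choice in this sketch: defaulting to the local region would silently produce the wrong-region write that the routing decision is meant to prevent.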

Multi-region deployment decisions in AI chat history

Multi-region deployment decisions surface in AI chat history across four distinct session types, each of which preserves different decisions with different consequences.

Initial infrastructure sessions capture the region selection and the first deployment topology: "which AWS region should we deploy to?", "should we set up database replication for disaster recovery?", "how do I make my application resilient to an AWS region going down?", "should I use RDS Multi-AZ or a cross-region read replica?". These sessions contain the rejection rationale for alternatives that were not chosen: why eu-west-1 was not selected alongside us-east-1 ("we don't have any European customers yet"), why synchronous replication was not used ("the latency penalty for writes seemed too high"), why active-active was not implemented ("it's too complex for where we are right now"). The stated reasons are time-bounded: "we don't have European customers yet" is accurate at session time but obsolete six months later when the German logistics prospect arrives. The initial infrastructure sessions also contain the RPO/RTO targets the team was aiming for and whether they were achieved — or whether the disaster recovery goal was stated informally ("we want to be able to recover if a region goes down") without being quantified as specific RTO and RPO values.

Enterprise sales sessions capture the first data residency requirement and the engineering assessment that follows: "our enterprise customer requires EU data residency — what does this mean for our AWS setup?", "which AWS services store data outside the region they're deployed in?", "how do we know if Mixpanel stores data in the EU?", "what is GDPR Article 46 and what transfer mechanisms satisfy it?", "can we just sign standard contractual clauses and keep using our current infrastructure?". These sessions are the highest-value recovery target for the compliance matrix: they contain the engineering team's real-time assessment of which systems are compliant, which are not, and what the remediation path looks like. The sessions also reveal the business context that was driving urgency — deal size, prospect timeline, whether the requirement was negotiable — which explains why specific remediation decisions were made in the order they were. Decisions made under enterprise sales pressure often have a different risk profile than decisions made during deliberate architecture planning, and the AI chat record of those sessions is the only place that context is preserved.

Outage sessions capture the failover procedure under real-world conditions: "us-east-1 is having a major outage — how do we fail over to us-west-2?", "how do we promote the RDS read replica to become the primary?", "what is the DNS TTL for our application domain and how quickly will the change propagate?", "we've rerouted traffic to the secondary but some requests are still hitting the primary — why?", "the secondary database is serving traffic but the application is throwing errors about a read-only connection — what went wrong?". These sessions contain the actual failure modes encountered during the real failover, which are almost always different from the failure modes the team anticipated in the runbook. The outage sessions reveal whether the RPO/RTO targets were met, what steps took longer than expected, and what procedural gaps were discovered under pressure. This is the data that should feed back into the multi-region deployment ADR — but without capturing it in a postmortem that updates the ADR, the incident knowledge stays in the chat history and is lost when the engineer who ran the failover leaves the team.

Performance complaint sessions capture the latency floor encountered after the user base becomes geographically distributed: "our EU users are reporting the application feels slow compared to our US users", "why would latency be different for users in different countries if the application code is the same?", "what is the difference between CDN latency and application API latency? We have CloudFront deployed, why isn't it helping for API calls?", "how do we add a European API server without running a separate database in Europe?", "what is a read replica and would it help with latency for EU users?". These sessions contain the team's discovery of the latency floor concept and the options for addressing it. They reveal whether the latency problem was anticipated during the initial infrastructure design or came as a surprise, and what the constraints were on the solution: budget, engineering capacity, database replication complexity, and data residency requirements that either helped (by requiring EU infrastructure that also reduces EU latency) or complicated the solution (by requiring EU-data-only infrastructure that cannot serve US users from the EU region).

Writing the multi-region deployment ADR

The multi-region deployment ADR needs six sections to be complete. Each section addresses a distinct set of questions that will be asked by different stakeholders — engineers during incident response, auditors during compliance reviews, sales engineers during enterprise procurement, and new team members trying to understand why the topology is configured the way it is.

Section 1: Region topology and failover model. Which regions are deployed, with the rationale for each (latency requirements for user populations, data residency requirements, disaster recovery requirements). The active-passive or active-active designation. The write region designation and write routing mechanism for active-active. The RPO target (maximum acceptable data loss, expressed as a duration) and RTO target (maximum acceptable downtime from detection to recovered service, expressed as a duration). The acceptance statement for the RPO — for asynchronous replication, the team must explicitly state that they accept losing up to N seconds of committed transactions in a regional failover, because this is a business decision that should not be buried in replication configuration.

Section 2: Failover procedure. The step-by-step procedure for executing a regional failover, including: outage detection (monitoring alert, threshold, responsible party), traffic rerouting (DNS change with specific record values and TTL, or load balancer configuration, with estimated propagation time), database failover (standby promotion or replica promotion steps, application connection string update mechanism, verification that the promoted database is accepting writes), and service verification (health check URLs, acceptance criteria for considering the failover complete). The last-tested date of the procedure. The testing cadence — how often the failover procedure is exercised end-to-end, with production traffic. For a procedure that has never been tested, the ADR must include a testing milestone.

Section 3: Data residency policy. Which data categories are subject to residency constraints, citing the specific regulatory requirements. The technical enforcement mechanisms for each constraint. A compliance matrix listing every third-party service that receives customer personal data, the geographic region of their data storage, whether it satisfies the residency constraint, and either the mitigation (EU data is excluded from the service, an EU-region product alternative is used, or a Data Processing Agreement is in place with documented SCCs) or the documented acceptance of a gap with a remediation plan. The procedure for evaluating new third-party integrations against the data residency policy before integration.
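
The compliance matrix lends itself to a machine-checkable form. The sketch below is one possible shape, with hypothetical service names and field names; it flags every service that receives personal data, stores it outside the required region, and has no recorded mitigation. It illustrates the bookkeeping only and is not a legal assessment.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ThirdPartyService:
    name: str
    receives_personal_data: bool
    storage_region: str                # e.g. "EU" or "US"
    mitigation: Optional[str] = None   # e.g. "DPA with SCCs", "EU data excluded"


def residency_gaps(services: Iterable[ThirdPartyService],
                   required_region: str = "EU") -> List[str]:
    """Names of services that violate the residency policy with no mitigation."""
    return [
        s.name
        for s in services
        if s.receives_personal_data
        and s.storage_region != required_region
        and not s.mitigation
    ]


# Hypothetical matrix entries for illustration.
MATRIX = [
    ThirdPartyService("analytics-vendor", True, "US"),
    ThirdPartyService("error-tracker", True, "US", mitigation="DPA with SCCs"),
    ThirdPartyService("email-provider", True, "EU"),
    ThirdPartyService("status-page", False, "US"),
]
gaps = residency_gaps(MATRIX)   # ["analytics-vendor"]
```

Running a check like this when a new integration is proposed gives the evaluation procedure a concrete gate: a new entry either satisfies the constraint, carries a documented mitigation, or shows up as a gap that needs a remediation plan.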

Section 4: Cross-region consistency model. The replication mechanism (synchronous streaming replication, asynchronous streaming replication, logical replication, managed replication service). The configuration values that implement the consistency model (PostgreSQL synchronous_commit value, synchronous_standby_names, replication slot configuration). The replication lag monitoring: the specific metric used, the alert threshold, and the response procedure when the threshold is exceeded. For multi-primary deployments, the conflict resolution policy per entity type and the evidence that the policy has been tested against realistic concurrent write scenarios.

Section 5: Traffic routing configuration. The routing mechanism (DNS latency routing, anycast, global load balancer). The routing policy per endpoint category: which endpoints have regional affinity (personalized, transactional, session-dependent), which are CDN-eligible with specific cache TTL values, and which serve read queries from regional read replicas. The observed latency by geographic region under normal conditions, and the latency monitoring that will detect when latency degrades. For read replica routing, the documented consistency behavior (read-your-writes guarantee, or documented eventual consistency with the staleness window).

Section 6: Operational model. The deployment sequencing across regions (sequential with health gates between regions, simultaneous, blue-green by region). The monitoring configuration for each region: which dashboards cover each region's infrastructure, whether alerts are region-specific or aggregated, and whether the on-call runbooks cover every region equally. Capacity planning assumptions by region: is the secondary region sized to handle full primary traffic during a failover? If not, what degradation is expected and accepted? The procedure for adding a new region, including the checklist of systems that must be configured in the new region and the data residency review for that region. The security posture for the multi-region topology: cross-region traffic encryption, inter-region VPC peering security groups, and IAM role configuration per region.

The compounding cost of a missing multi-region ADR

The logistics company deal is a visible example of the cost of a missing data residency decision — a single lost deal. The less visible cost is the accumulated architectural debt that makes the remediation expensive. Every system that was integrated without data residency awareness is a system that must be evaluated, migrated, or excluded during remediation. Every database migration that assumed a single-region topology is a migration that must be re-examined when the schema must support per-tenant region routing. Every analytics event that was sent without regional context is an event whose origin cannot be determined retroactively for GDPR compliance.

The multi-region deployment ADR is not primarily documentation of infrastructure. It is the governance mechanism that makes regional topology decisions visible to the people who need them: sales engineers answering data residency questionnaires, engineers evaluating new third-party tools, on-call engineers executing a failover, and security auditors reviewing GDPR compliance. Without it, each person in each of those situations must reconstruct the architecture from infrastructure code, runbooks, and institutional memory — and the reconstruction is usually incomplete, because the rationale for each choice is not in the code.

Three months of AI chat history contains the complete record of the initial region selection, the first data residency assessment, the first enterprise security questionnaire discussion, and every infrastructure decision made in between — including the rejection rationale for alternatives that were ruled out at the time. Extracting those sessions produces the first version of the multi-region ADR, with the decisions dated to when they were actually made rather than reconstructed under pressure. The open-source extractor surfaces the infrastructure decisions from the chat history; the ADR template gives them a structure that persists when the next engineer or the next enterprise prospect asks the questions that the team has already answered, once, in a chat session that was closed and forgotten.