The mobile deployment decision record: why the release cadence you chose determines your hotfix window and your forced upgrade surface

2026-07-03 · Decision record · Mobile deployment · Release engineering · App store

A consumer fintech app with 180,000 monthly active users ships on a weekly release cadence. The release schedule is decided during the first week of product development — weekly syncs with the weekly sprint, weekly builds, weekly App Store submissions. It is never written down as a decision with explicit reasoning. There is no document that records why weekly was chosen over biweekly or monthly, what the implications are for hotfix window latency, or what backward compatibility the server-side API must maintain to support the inevitable tail of users who do not update promptly.

Eighteen months after launch, on a Wednesday afternoon, the payment engineering team discovers a critical bug in the checkout flow. A race condition in the order submission logic causes duplicate charges when a user taps the payment button twice within 300 milliseconds — a gesture common on slow network connections where the first tap appears to not register. The bug has been live for eleven days, introduced in the previous release. Customer support has received 23 duplicate charge complaints; the team estimates approximately 340 affected transactions totaling $18,400 in duplicate charges across the user base in the eleven days since the bug was deployed. The finance team calculates that at current transaction volume, leaving the bug live costs approximately $1,700 per day in duplicate charges that must be manually refunded.

The mobile engineering team begins preparing a hotfix release. The App Store review for the previous week's submission took 31 hours. The team submits the hotfix build at 4:30 PM Wednesday. App Store review completes at 11:15 PM Thursday — 30 hours and 45 minutes after submission. Google Play approves the hotfix within four hours. But approval is not adoption: 72 hours after the iOS hotfix is available, Mixpanel telemetry shows that 61% of iOS users have updated to the fixed version. The remaining 39% — approximately 44,000 monthly active iOS users — are still running the buggy client. The team has no forced upgrade mechanism. There is no in-app prompt that requires users to update before accessing the checkout flow. There is no server-side minimum version check that blocks checkout requests from client versions known to have the race condition.

The server-side team proposes a fix: add a version check to the payment processing API that rejects checkout requests from client versions below the hotfix version number and returns an error that prompts the user to update. The mobile engineer who built the original payment flow reviews the proposal and raises a concern: the version-gating logic will require parsing the X-App-Version header, which was not included in checkout requests from clients older than version 2.3.0, shipped four months ago. Clients older than 2.3.0 will receive an unhandled server error rather than a prompt to update, because they do not send the version header the gate checks. The team queries the active session database: 8% of active iOS sessions in the past 7 days originated from clients older than 2.3.0. Implementing the server-side version gate risks breaking the checkout flow for these users entirely, replacing a duplicate-charge bug with a complete checkout outage for 8% of iOS sessions. Nobody had documented that the X-App-Version header was a 2.3.0 addition, what client versions were in active use at any given time, or what the server-side API contract was for clients that did not send the header.

The second failure looks different but involves the same undocumented constraint set. A B2B SaaS company with an iOS companion app for field service technicians releases a major API redesign. The new API uses a token-based authentication flow that replaces the session-cookie authentication used by all previous client versions. The engineering team schedules the old authentication endpoint for deprecation 60 days after the new client version ships. On day 61, the team removes the old authentication endpoint. Within 90 minutes, the support queue fills with reports from field technicians who cannot log in. The telemetry team pulls version distribution data and discovers that 22% of active app sessions over the previous week came from client versions that pre-date the new authentication flow. These are technicians who use the app infrequently — monthly or quarterly — and who had not updated since the new client version was available. The company's minimum supported version policy was never formally set; the engineering team assumed "last two major versions" but had never written down what "major version" meant or how the policy would be enforced. The 60-day deprecation window was set by the API team independently, without reference to the actual update adoption rate for this user population. The old authentication endpoint is restored at the emergency priority; the root cause investigation takes two days because nothing documented the update adoption rate for infrequent-use apps, the minimum supported version policy, or the relationship between the deprecation timeline and the forced upgrade surface.

Both teams made their mobile deployment decisions during initial development, when the release cadence seemed like a scheduling choice rather than an architectural constraint. The mobile deployment decision record is the document that makes the relationship between release cadence, hotfix window, forced upgrade surface, and API backward compatibility window explicit — before the first incident reveals that they interact.

What a mobile deployment decision record covers

Mobile deployment decisions are distinct from server-side deployment decisions because the production environment is the user's device, not a server you control. A server-side deployment takes effect instantly for all users. A mobile deployment takes effect only after app store review approves the build, the user installs the update, and the user runs the updated version. These three steps introduce latency, uncertainty, and a multi-version client population that must all be accommodated by the server-side API simultaneously. The decisions that govern this process are made once during initial development and rarely revisited until an incident forces the team to confront their consequences.

The five decisions that belong in a mobile deployment ADR are:

Three structural properties that the release cadence decides

1. The hotfix window and its lower bound

The hotfix window is the elapsed time between a critical bug being discovered in production and a fix being available to all active users. It has four additive components: the time to write and test the hotfix (engineering effort), the time for app store review (platform constraint), the time for users to install the update (adoption latency), and the time until the server-side fix can be safely deployed without breaking users on the old client version (backward compatibility hold). The release cadence affects two of these components directly.

The first is the app store review component. Review latency is not a function of release cadence, but the team's familiarity with the expedited review process is. Teams that release weekly build institutional knowledge of what qualifies for expedited review, what the submission checklist is, and how long to expect. Teams that release monthly or less frequently often discover during an incident that the expedited review process exists but have never used it, adding exploratory overhead at the worst possible time. More importantly, a weekly cadence means the team has recent app store review metrics — they know that the last three submissions took between 18 and 36 hours — while an infrequent release team may be working from stale assumptions about review latency that underestimate the actual window.

The second is the adoption latency component. A weekly release cadence builds user muscle memory for updating — users who update reliably produce a steeper adoption curve after each release. A monthly release cadence produces a flatter adoption curve: users update less frequently, and the tail of non-updating users is longer. The practical consequence is that the server-side API backward compatibility hold — the period during which the old client behavior must be preserved on the server — is longer for monthly-cadence apps than weekly-cadence apps at equivalent forced upgrade policies. The hotfix window's fourth component is therefore shorter for teams with weekly cadence and steep adoption curves, all else being equal. This interaction is almost never documented at cadence selection time, because it only matters if the team later needs to ship a server-side breaking change or a version-gated server-side fix during an incident.

2. The forced upgrade surface and its client version depth

The forced upgrade surface is the set of server-side API contracts that must be preserved to support every client version that the minimum supported version policy permits to connect. It is determined by the product of the release cadence and the forced upgrade window. If releases are weekly and the forced upgrade window is 90 days, the server must support approximately 13 client versions simultaneously. Each of those versions may call different API endpoints, send different request shapes, expect different response formats, and depend on different server-side behaviors that have been superseded in subsequent releases. The union of all those per-version requirements is the forced upgrade surface.

The forced upgrade surface determines the cost of every API deprecation. Before any API endpoint can be removed, the team must verify that no client version within the forced upgrade window still calls that endpoint. If the forced upgrade window is 90 days and the release cadence is weekly, a deprecation submitted with the current release must remain live for at least 13 releases — until the minimum supported version ceiling has advanced past the last version that called the endpoint. For a monthly-cadence app with the same 90-day window, the same deprecation only requires 3 releases of backward compatibility. The monthly-cadence team has a much narrower forced upgrade surface for the same 90-day window, because the window contains fewer distinct client versions.

This arithmetic matters for API design velocity. A team with a weekly cadence and a permissive forced upgrade policy (6-month window) is maintaining backward compatibility for approximately 26 simultaneous client versions. Every API change that is not purely additive requires documenting which client versions are affected, waiting until all affected versions are outside the forced upgrade window, and then executing the removal. Teams that set release cadence and forced upgrade policy independently — without documenting the forced upgrade surface that results from their combination — often discover that their API is effectively immutable in practice, because removing any non-trivial behavior requires a deprecation window longer than any single engineer's planning horizon.

3. The decoupling requirement and the feature flag dependency

The fundamental constraint of mobile deployment is that the server and the client cannot be updated atomically. A server-side change is live immediately; a client-side change requires app store review and user adoption. This temporal decoupling means that the server must support both the old client behavior and the new client behavior simultaneously during the adoption window. For features that are purely additive — a new API endpoint, a new response field that old clients ignore — this decoupling is free: old clients simply do not use the new endpoint or field. For features that change existing behavior — a modified API response format, a changed authentication flow, a revised checkout calculation — the server must implement conditional logic that serves the old behavior to old clients and the new behavior to new clients, typically gated on the client version header.

The decoupling requirement also creates a class of production incidents that cannot be resolved by a server-side change alone: incidents caused by client-side bugs that have no server-side workaround. The only remediation options are a hotfix client release (subject to review latency and adoption latency) or disabling the affected feature entirely via a server-side feature flag (which avoids the incident but removes functionality for all users, including those already running the fixed client version). Teams that invest in server-side feature flag infrastructure before they encounter this class of incident can disable a broken feature within minutes of discovery, capping the blast radius while the hotfix client progresses through review. Teams that do not have feature flag infrastructure must choose between a full hotfix release cycle (days of blast radius) and a complete feature removal (immediate but disruptive). The decision about whether to invest in feature flag infrastructure is a mobile deployment architecture decision that should be made explicitly, with an understanding of what incident mitigation capability it provides and what engineering investment it requires.

Five ADR sections for a mobile deployment decision record

1. Release cadence and emergency release procedure

Document the normal release cadence explicitly, with the reasoning behind the chosen frequency. The reasoning matters because the cadence has downstream constraints: a weekly cadence creates a 13-version forced upgrade surface at a 90-day upgrade window; a biweekly cadence creates a 6-version surface. Record why the current cadence was chosen — sprint alignment, app store review overhead, testing capacity, user update fatigue — so that future engineers can evaluate whether those reasons still hold if the cadence is proposed for change.

Document the criteria for emergency release. An emergency release is a release outside the normal cadence triggered by a production incident that cannot wait for the next scheduled release. Define "cannot wait" precisely: the criteria should specify the impact threshold (data loss, security vulnerability, payment processing failure, core-flow breakage above a defined user percentage), the revenue impact threshold if applicable, and the regulatory or compliance trigger conditions (a GDPR data exposure, a PCI DSS violation in payment handling). Without explicit criteria, each potential emergency release requires a judgment call from leadership under time pressure, introducing delay and inconsistency. With explicit criteria, the on-call engineer can self-certify that an incident meets the emergency release threshold and begin the hotfix build without waiting for approval.

Document the emergency release procedure step by step: who is notified when the emergency release criteria are met, who authorizes the hotfix build, what the expedited review request process is for each platform (Apple App Store Connect's expedited review request form requires a description of the customer-facing impact and the reason it cannot wait for the next scheduled release; Google Play's review typically completes within hours without a special request), what the hotfix branch naming convention is, what the minimum test coverage requirement is for the hotfix (an emergency release may accept a reduced test suite to compress the build-test cycle, but the minimum acceptable coverage must be specified in advance), and who communicates the incident status to users during the review window. Document the post-incident requirement: every emergency release must be followed by a post-mortem that identifies whether the incident could have been mitigated by a server-side feature flag, improving the flag infrastructure to prevent the same class of incident from requiring a future emergency release.

2. Forced upgrade policy and minimum version enforcement

Document the forced upgrade policy: the oldest client version that is permitted to connect to production services without a mandatory upgrade prompt. Express this as a time window ("clients more than 90 days old receive a mandatory upgrade prompt") rather than a version number ("clients older than v3.2.0 are forced to upgrade"), because time-based policies remain correct as new versions are released without requiring manual updates. Document the upgrade prompt behavior: whether the prompt is soft (dismissable, with a reminder on next session) or hard (blocks app use until the update is installed), and what the escalation path is from soft to hard (after three dismissals, the prompt becomes mandatory).

Document the minimum version enforcement mechanism on the server side. The server-side minimum version check is the backstop for the client-side prompt: if a client version below the minimum connects to the server, the server returns a 426 Upgrade Required response (or a custom application-layer equivalent) that the client renders as a non-dismissable update prompt. This server-side check is the mechanism that allows the server to block known-buggy client versions from executing sensitive operations — payment processing, data deletion, authentication — after a critical bug is discovered in a specific version range. Document which endpoints are subject to server-side minimum version enforcement, what the response format is (so client-side handling can be implemented correctly in a single place rather than per-endpoint), and what the procedure is for updating the minimum version floor when a critical bug in a specific version range is discovered post-release.

Document the minimum version floor advancement policy: after each release, how does the minimum version floor change? A policy that advances the floor automatically after every release (the floor is always "current minus N versions") produces a predictable forced upgrade surface. A policy that advances the floor manually (the team sets the floor when a specific old version's active session count drops below a threshold) produces a more flexible surface but requires active maintenance and produces inconsistency between releases. Record which policy is in use and why, including the threshold at which active session counts are considered negligible enough to advance the floor past a given version.

3. Server-side API backward compatibility window

Document the minimum backward compatibility window for each API endpoint category, derived from the release cadence and forced upgrade policy. The formula is: backward compatibility window = forced upgrade window + maximum app store review latency + adoption tail buffer. If the forced upgrade window is 90 days, app store review typically takes up to 48 hours, and the team uses a 7-day adoption tail buffer (allowing for users in regions with slower update adoption), the minimum backward compatibility window is 99 days — approximately 100 days. This window applies to every API contract change that is not purely additive: removing an endpoint, removing a response field, changing a field's type or semantics, or changing the authentication mechanism.

Distinguish between API endpoint categories with different backward compatibility requirements. Authentication endpoints and payment processing endpoints have higher backward compatibility requirements than feature endpoints because a backward compatibility violation in authentication or payment handling causes a complete functional outage for affected users, while a backward compatibility violation in a non-critical feature endpoint may cause a degraded experience. Document the backward compatibility window separately for each category: authentication endpoints (minimum window equal to the full forced upgrade window plus the adoption tail), payment processing endpoints (same), feature endpoints (shorter window acceptable if the client-side behavior degrades gracefully when the endpoint returns an error), and deprecated endpoints (the window during which a deprecated endpoint remains live after the replacement is shipped).

Document the process for extending the backward compatibility window when the active session distribution shows a long tail of old client versions. If telemetry shows that 12% of active sessions in the past 30 days are running client versions older than the nominal forced upgrade window (indicating that the forced upgrade policy is not being enforced as designed, or that the app store review for the update containing the forced upgrade prompt was delayed), the backward compatibility window must be extended beyond its nominal value until the tail drops below a defined threshold (typically 1–2% of active sessions). Document the telemetry query that measures the active version distribution, the threshold at which the tail is considered negligible, and who is responsible for monitoring the distribution before advancing a deprecation.

4. Feature flag strategy for mobile releases

Document the feature flag infrastructure and its operational scope. Server-side feature flags that control client-visible behavior are the primary tool for decoupling the server deployment timeline from the client release timeline. A server-side flag that disables a broken feature takes effect immediately for all clients, including those running the buggy client version, without requiring a new client release or app store approval. Document which classes of features are eligible for server-side flag control (typically: UI feature entry points that can be hidden server-side, API endpoints that can return a disabled response, and behavioral flags that clients check before executing a code path), and which classes are not eligible (client-side rendering logic, local state management, offline behavior — these require a client release regardless of flag infrastructure).

Document the flag resolution architecture: whether flags are resolved by the client making a flags API call at startup (requiring network connectivity), embedded in the authentication response (resolved as part of every session), or resolved at each API call (real-time resolution with per-request latency cost). The architecture determines the flag update latency — the time between changing a flag value in the flag management interface and all active clients seeing the new value. For incident mitigation, low flag update latency (under 60 seconds) is required for the flag to be useful as an emergency response tool. Document the expected flag update latency for the chosen architecture and its degradation behavior when the flag resolution service is unavailable (fail-open: all flags use their default values; fail-closed: the app is blocked until flags resolve).

Document the flag lifecycle policy: when a flag is created, what is its expected lifetime? Flags created for a gradual feature rollout should have an expiration date (the date by which the feature reaches 100% rollout and the flag is removed). Flags created for emergency incident mitigation should have a post-incident review trigger (after the hotfix client reaches 100% rollout, the flag is removed from the codebase). Undocumented flags that persist indefinitely become invisible constraints: they are enabled in production but nobody remembers why, and disabling them to test a hypothesis risks uncovering an active dependency. The mobile deployment ADR should specify the flag lifecycle policy and the flag audit procedure (quarterly review of all active flags, with a required justification for any flag that has exceeded its planned lifetime).

5. Staged rollout strategy and rollback procedure

Document the staged rollout percentage schedule for normal releases and the different schedule for hotfix releases. Both Apple App Store and Google Play support phased releases that incrementally expose the new version to a percentage of users, allowing the team to monitor crash rates and error rates before reaching 100% rollout. The normal release schedule might be: 5% on day 1, 20% on day 2, 50% on day 3, 100% on day 5 — gated on crash rate and core-flow error rate staying within defined thresholds. The hotfix release schedule should be faster: 10% immediately, 50% after one hour of stability monitoring, 100% after four hours — because the hotfix addresses a known production incident and the urgency of reaching 100% adoption is higher than for routine feature releases.

Document the monitoring gates that advance or halt rollout. The gates must be specific and measurable: "crash rate per active user below 0.5%" is a gate; "no significant increase in crashes" is not. For each gate, document the data source (Crashlytics crash rate by app version, the payment processing error rate from the server-side API logs, the core-flow completion rate from the client-side event stream), the threshold value, the measurement window (crash rate measured over the past 15 minutes of data, not the past hour, to enable faster gate evaluation), and the escalation path when a gate fails (who is notified, what the default decision is, and who has authority to override the gate and continue rollout manually when the failure is explained by a known confound).

Document the rollback procedure for staged releases. For Google Play phased releases, rolling back means reducing the rollout percentage to 0% — which stops the new version from being offered to additional users but does not automatically revert users who have already updated. For Apple App Store phased releases, pausing the release stops further distribution but already-updated users remain on the new version. A complete rollback — returning all users to the previous version — is not possible through app store mechanisms for users who have already updated. The rollback procedure must therefore account for this asymmetry: document the server-side mitigation that is applied when a staged release is halted (typically: enabling a feature flag that disables the problematic behavior for all clients, including those that have already updated), the criteria for halting a staged release rather than pausing it, and the criteria for resuming rollout after a halt versus abandoning the release and beginning a hotfix build for the root cause.

Further reading