The API deprecation strategy decision record: why the sunset timeline you chose determines your breaking-change migration cost and your consumer communication surface
The sunset timeline on a deprecated API version is decided in a Slack message thread and never written down. Two years later, the v1 endpoint is still serving 40% of production traffic because the engineers who own consuming services never saw the deprecation notice — or saw it once, filed it as a future concern, and had no forcing function to move it above their current sprint commitments. The decision that determines how many simultaneous API versions you are maintaining forever is not the versioning scheme you chose. It is the deprecation process you never documented: how long consumers have to migrate, how you will reach the specific engineers who have code to change, and what mechanism will ensure they migrate before the deadline rather than after the incident.
A devtools company built a CI/CD platform for 2,000 engineering teams and shipped a REST API for webhook event delivery in early 2022. The webhook payload schema in v1 used camelCase field names (buildId, pipelineStatus, triggeredBy) and a flat structure. By mid-2023, the company had accumulated enough customer feedback to know that a redesigned payload structure — snake_case field names, a nested metadata object, and an explicit event_version field — would substantially reduce the integration effort for new customers. The change was breaking: existing consumers had code that destructured camelCase fields from the webhook payload, and renaming those fields would break every consumer silently if the consumer did not explicitly handle an unexpected field structure.
The engineering team decided to ship the v2 webhook schema and deprecate v1 with a 12-month sunset timeline. They published a changelog entry in October 2023: "Webhook payload v1 deprecated. v2 schema now available at /webhooks/v2/. v1 will be removed October 1, 2024." They updated the API documentation. They sent one email to the API developer mailing list with the same information. They considered adding a Sunset response header to v1 webhook responses but decided against it because it required a code change and they were behind on the v2 release.
In April 2024 — six months into the 12-month window — the team queried their API gateway access logs for traffic to the v1 webhook endpoint. The traffic had not declined. Of approximately 1,800 active consumer integrations registered against v1, only 340 had migrated to v2. The remaining 1,460 integrations were still sending to v1. The team sent a second email: "Reminder: v1 webhooks are being removed October 1, 2024. Please migrate to v2." Open rate: 31%. Of those who opened the email, most were engineering managers who forwarded the message to their teams with "is this relevant to us?" attached. A large fraction of those forwards were never acted on.
In August 2024, the team extended the sunset timeline to April 1, 2025. They posted the extension in the changelog and sent a third email. By October 2024 — the original sunset date — 720 of 1,800 consumer integrations had migrated. The team had now spent 18 months maintaining two webhook payload schemas in parallel, keeping both documented, fixing bugs in both paths, and fielding support tickets that required knowing which schema version the consumer was using. On April 1, 2025, the team removed the v1 endpoint. Traffic that had been declining accelerated its decline, but 230 integrations made requests to the now-removed v1 endpoint on April 1. Three enterprise customers opened urgent support tickets; two of those customers had CI pipelines that broke in production because a webhook event their pipeline was waiting for was never delivered. The engineering team handled six hours of incident calls. None of those three customers had replied to any of the three deprecation emails.
A 150-person SaaS company in the HR technology space had the inverse problem: an internal API deprecation with a known consumer count. Their platform team had built an authentication service in 2021 that issued session tokens via a custom token protocol. In 2023, the company standardized on OAuth 2.0, implemented an identity provider, and needed to retire the old authentication API. Unlike the external webhook case, the platform team knew exactly which services consumed the old auth API: they had an internal service registry listing 17 microservices as registered consumers.
The platform team set a 90-day sunset timeline and sent one email to the engineering-all distribution list: "Old auth API is being removed in 90 days. All services must migrate to the new OAuth 2.0 flow. Migration guide at internal-docs.example.com/auth-migration." Forty-five days later, a team lead on the platform team ran a query against the API gateway access logs: 12 of 17 services were still hitting the old authentication endpoint. The platform team reached out to the 12 teams individually. The response revealed a consistent pattern. The deprecation email had been read by engineering managers who then forwarded it to their teams or flagged it in a Slack channel. In 8 of 12 cases, no engineer on the team could find the original email or the migration guide link. In 4 of 12 cases, the migration had been added to the backlog but not yet scheduled because it was not blocking any current feature work and the sprint was full. In 2 of 12 cases, the service was owned by a contractor who had left the company 3 months earlier and no current engineer had working knowledge of the service's authentication flow.
The platform team extended the deadline by 45 days. At the new deadline, 3 services remained on the old endpoint: one was the contractor-owned service where the migration required archaeology into undocumented code; one was a legacy reporting service that ran on a quarterly cadence and the relevant sprint had already been committed; one was a third-party vendor integration that used the old auth endpoint via an API key and required a vendor support ticket to update. The old auth API continued running for 5 additional months while these three cases were resolved. The 90-day migration project took 8 months.
Three structural properties set at deprecation decision time
Both outcomes share the same root: the deprecation decision was made as a date, not as a process. The date was communicated; the process — how consumers would be identified, reached, and moved — was not documented and therefore did not exist as an executable plan. Three structural properties of the deprecation decision determine whether the sunset date holds.
1. Sunset timeline and migration window length
A sunset timeline has two failure modes: too short and too long. A timeline that is too short produces a forced extension — the engineering team discovers at 80% of the window that consumer migration rate is not on track to meet the deadline and adds months. Every forced extension erodes the credibility of future deadlines, which accelerates the problem in the next deprecation cycle. A timeline that is too long produces indefinite deferral — the deadline is beyond the planning horizon of any engineering team's sprint cadence, so no team adds the migration to their roadmap, and the API owner loses the organizational will to enforce the deadline as the date approaches. Internal-facing stakeholders conclude that the team doesn't actually mean it.
The realistic migration window for an external API with a breaking payload change is a function of three variables: the number of consumers, the per-consumer migration effort, and the scheduling latency — the time between when an engineer learns about the migration requirement and when they can start it. For the CI/CD webhook example: 1,800 consumers, medium migration effort (2–6 hours per integration), scheduling latency of 4–8 weeks for a task that is not blocking current product work. The minimum realistic window for a consumer to learn about the migration, prioritize it, implement it, test it, and deploy it is 10–14 weeks. The window for achieving 95% consumer migration across 1,800 integrations — accounting for the longest tail (enterprise customers who require change control processes, consumers who have the integration buried in a legacy system with unclear ownership, and consumers who discover the migration only when they investigate a production break) — is 12–18 months for a well-communicated migration with an enforced forcing function.
The relationship between timeline and enforcement is not linear. A 6-month timeline with a hard sunset enforced by a 429 "deprecated endpoint" response degradation beginning at month 4 will drive higher migration rates than a 12-month timeline with no enforcement mechanism other than a future removal. The timeline length is less important than the forcing function. A 6-month timeline with weekly graduated degradation — error rate increasing by 10% per week starting at week 20 — produces more migration than a 12-month timeline with a single hard cutoff at month 12 and no intermediate signal.
For internal APIs with a known consumer set, the timeline calculation is different. With 17 known consumers and a migration guide that reduces per-consumer effort, a 90-day timeline is theoretically achievable — but only if the deprecation notice reaches the engineer who owns the consuming code, that engineer has sprint capacity to prioritize the migration, and no consumer requires special-case handling. The HR technology platform team's failure was not the 90-day timeline; it was that the 90-day timeline assumed all 17 teams would receive the notice through one email to a distribution list and prioritize the migration without a forcing function. Neither assumption held.
2. Consumer communication surface
The consumer communication surface is the set of channels through which the deprecation notice reaches consumers, and more specifically the probability that it reaches the engineer who has code to change rather than the engineering manager who has an inbox to triage. These are different people with different information processing behaviors, and a deprecation notice that reaches only the manager produces a calendar event that produces a Slack message that produces a mention in a standup that produces nothing in a sprint backlog.
The channels available have different reach characteristics. A changelog entry reaches engineers who read changelogs, which in practice means early adopters, developers actively debugging integration issues, and developers who follow the product closely. This is typically 5–15% of the active consumer base for an external API. A deprecation email to a developer mailing list reaches developers who subscribed to the list, opened the email, and read it carefully enough to understand they have action required — approximately 15–25% of active consumers in a well-maintained list. A Sunset response header (RFC 8594) returned on every response from the deprecated endpoint reaches every consumer who has HTTP response monitoring and every engineer who inspects the API response while debugging, which in a production service with observability tooling approaches 80–90% of the consumer base — because the header appears in the service's own monitoring output without requiring the engineer to take any proactive action to receive it.
The Sunset header's reach advantage over email is not about channel preference. It is about the information context in which the notice appears. An email about a future API deprecation arrives in a context where the engineer is processing email, not working on the service that consumes the API. The connection between the notice and the action required — making a code change in a specific service — requires the engineer to mentally map the deprecated endpoint to their service's integration, estimate the effort, and decide to put it in the sprint backlog. An alert generated by the Sunset header appears in the context where the engineer is looking at that service's monitoring dashboard, observing the service's HTTP traffic, or debugging a production issue. The deprecated endpoint and the service that uses it are already in the same mental context. The connection between notice and action requires no mapping.
The Sunset header also provides a machine-readable deadline: Sunset: Tue, 01 Apr 2025 00:00:00 GMT. Monitoring systems can parse this header and auto-create alert rules that escalate as the deadline approaches — 90 days out: informational; 30 days out: warning; 7 days out: critical. This escalation happens automatically without requiring the API team to send additional emails or the consumer team to track the deadline manually. The API consumer's own alerting infrastructure enforces awareness of the deadline on the API team's behalf.
For internal APIs where the consumer set is known, the communication surface should include direct individual outreach to the specific engineers who own each consuming service — not distribution list email, but a message to the person whose name appears on the service in the internal service registry, or a ticket created in the team's project tracker with the migration guide attached and the sunset deadline as the due date. The goal is to put the migration into the consumer team's ticketing system, where sprint planning can act on it, not into the manager's inbox, where it will be triaged and potentially lost.
3. Migration forcing function
A sunset date without a forcing function is a suggestion. Every engineering team's sprint planning prioritizes migration work against currently shipping features, currently active bugs, and currently escalated incidents. A migration that is not blocking anything and has no near-term consequence for deferral will be deferred every sprint until it is blocking something — which means it will be deferred until the API is removed and the consumer breaks in production. The forcing function is the mechanism that makes deferral more costly than migration at some point before the deadline.
Forcing functions exist on a spectrum from cooperative to coercive. At the cooperative end: the Sunset header, which creates awareness but no degradation; deprecation warnings in the API response body or as an X-Deprecation-Warning header; and documentation that stops receiving updates for the deprecated version. These mechanisms inform consumers without degrading service, which respects the consumer's ability to plan the migration. The cost is that a consumer who is comfortable with the current integration may deprioritize migration as long as the service works normally.
In the middle of the spectrum: graduated degradation. Beginning at a defined point in the migration window — typically 70–80% through the timeline — the deprecated endpoint's error rate begins increasing: 1% of requests return a 429 with a response body containing the migration guide URL and the sunset date; 2% the following week; 5% the week after; escalating until the endpoint is fully removed. This approach gives consumers a degraded-but-functional service during the migration window, which creates urgency without causing a hard production outage. The degradation is graduated so consumers who discover the problem early have time to respond; consumers who discover it late are operating on a worse service-level agreement and have an operational incentive to migrate quickly. The cost to the API owner is that graduated degradation requires implementation and monitoring — adding the degradation logic to the endpoint, monitoring the degradation rate as it escalates, and handling the support tickets from consumers who notice the degradation before they understand it is intentional.
At the coercive end: a hard cutoff, where the endpoint returns 410 Gone on and after the sunset date. A hard cutoff is the most effective forcing function and the most expensive for consumers who miss it. In the CI/CD webhook case, the three enterprise customers who had production CI pipelines break when the v1 endpoint returned 410 were at the coercive end of this spectrum — they had effectively experienced the coercive forcing function of the hard cutoff without the preparation that the graduated degradation approach would have provided. The hard cutoff should be reserved for situations where the deprecated endpoint poses a security risk (credentials or tokens that cannot be rotated without removing the endpoint), where supporting the deprecated endpoint is blocking the API team's ability to ship critical changes, or where all other forcing functions have been applied and a tail of holdout consumers has been individually contacted and has had extended support options presented to them.
Five sections the API deprecation ADR must contain
1. Breaking change classification and consumer discovery
The first section documents how the team classifies breaking changes and how consumers are enumerated before the sunset timeline is set. A change is breaking if it causes a consumer integration to produce incorrect results or throw an unhandled exception when the consumer code is unchanged after the change is deployed. Field renames are breaking. Status code changes are breaking for consumers that branch on specific codes. Response structure changes that alter the nesting of fields the consumer accesses are breaking. Removing endpoints is breaking. Adding required fields to a request schema is breaking. Deprecating optional fields is not breaking but is a migration signal.
Consumer discovery for an external API is a query against the API gateway access logs: SELECT api_key_id, client_name, last_seen_at, request_count_30d FROM api_access_log WHERE endpoint_prefix = '/v1/webhooks' AND last_seen_at > NOW() - INTERVAL '90 days' GROUP BY api_key_id ORDER BY request_count_30d DESC. This query produces the consumer list the team is managing. For each consumer with a request count above a minimum threshold (to exclude one-time test callers), the team needs a migration contact — the email address of the developer registered against that API key, plus any escalation contact from the customer account. For internal APIs, the consumer list is the service registry filtered by the deprecated endpoint as a declared dependency.
The consumer discovery output feeds the migration window calculation: number of consumers × estimated per-consumer migration effort in engineering-hours, divided by the average engineering capacity of the teams that consume this API, adjusted by a scheduling latency multiplier (typically 2–3×, because an engineering team's available sprint capacity for a non-blocking migration is a fraction of their total capacity). This calculation gives the minimum realistic migration window before the sunset date is announced. If the calculation produces a window longer than 12 months, the team should consider whether graduated deprecation with extended support contracts is more appropriate than a firm sunset.
2. Sunset timeline and migration window policy
The second section documents the policy for calculating sunset timelines for different change types and consumer populations. A policy separates the deprecation decision into two categories: internal APIs (known consumer count, direct communication path, internal sprint prioritization mechanisms available) and external APIs (unknown consumer count until discovery, indirect communication, no control over consumer prioritization).
For internal APIs: the minimum timeline is 60 days for a simple migration (authentication header change, field rename, no logic change), 90 days for a medium migration (response structure change, protocol change with a migration guide), and 180 days for a complex migration (endpoint removal requiring architectural change in the consumer). These minimums assume the communication protocol in section 3 reaches every consuming team's engineering backlog within the first two weeks. If the communication protocol cannot guarantee that, add 30 days to each category.
For external APIs: the minimum timeline is 6 months for a simple migration, 9 months for a medium migration, and 12 months for a complex migration, measured from the first public communication to the first consumer-visible degradation. The hard removal date is 3 months after the degradation begins. The timeline policy should also define what constitutes an acceptable extension policy: a single extension is permitted for migrations where consumer discovery reveals a larger-than-expected tail; two extensions indicate a process failure (forcing function was insufficient or timeline was unrealistic) and trigger a retrospective before the rescheduled date. Indefinite extensions are prohibited — the maintenance cost of an indefinitely-supported deprecated endpoint is unbounded and must be explicitly budgeted if the decision is made to continue support.
The policy should also define extended support contracts: if a consumer cannot migrate within the sunset window due to factors outside their control (regulatory review process, contractual constraint, architectural dependencies requiring multi-quarter planning), the API team may offer an extended support contract that specifies a fixed additional period (maximum 6 months), a maintenance fee that covers the marginal cost of supporting the deprecated endpoint, and a non-renewable clause. Extended support is a business decision, not a technical one, and should be negotiated at the account management level rather than the engineering level.
3. Consumer communication protocol
The third section documents the sequence of communication actions the team takes when deprecating an endpoint and the verification steps that confirm each action reached its target. The protocol is a checklist, not a one-time broadcast.
The communication sequence for an external API deprecation: (1) Add the Sunset response header (RFC 8594) to every response from the deprecated endpoint on the announcement date, with the value set to the hard removal date. (2) Add a Link response header pointing to the migration guide: Link: <https://docs.example.com/migrate-to-v2>; rel="sunset". (3) Publish a changelog entry with the exact sunset date, a link to the migration guide, and a contact email for migration support questions. (4) Send an email to all developer contacts registered against API keys that have hit the deprecated endpoint in the last 90 days — not to a general distribution list, but to the specific developer accounts that are active consumers. Include the sunset date, the migration guide URL, and a request for acknowledgment. (5) Repeat the targeted email at 70% of the migration window and at 30 days before the degradation start date. (6) At 30% of the migration window, query the access logs for consumers who have not yet migrated and send those accounts a direct outreach message from a named account manager or support engineer, not an automated system.
The verification step after each communication action is a query against the access logs: what percentage of consumers who were on the deprecated endpoint before the communication are still on it 2 weeks after? If the migration rate is not on track to meet the timeline — a rough heuristic is 10% of consumers migrating per month for the first half of the window, accelerating to 20–30% per month in the second half — the communication protocol should escalate to more direct outreach before extending the timeline. A timeline extension without a change to the communication protocol will produce the same result as the original timeline.
4. Migration forcing function design
The fourth section documents the specific forcing function the team will implement and the triggering conditions for each escalation step. The forcing function is a technical mechanism, not a social one — it cannot rely on consumers voluntarily prioritizing migration before the deadline.
The recommended forcing function for an external API with a high consumer count is graduated response degradation beginning at 75% of the migration window: (a) Weeks 1–2 of degradation: 1% of requests to the deprecated endpoint return a 429 Too Many Requests with a response body of {"error": "deprecated_endpoint", "sunset_date": "2025-04-01", "migration_guide": "https://docs.example.com/migrate-to-v2"}. The 429 should not be logged as an error in the API team's monitoring — it is intentional. The consumer's monitoring will log it as an unexpected error rate, which is the intended behavior. (b) Weeks 3–4: 5% of requests return 429. (c) Weeks 5–6: 15% of requests return 429. (d) Weeks 7+: 50% of requests return 429 until the sunset date, at which point the endpoint begins returning 410 Gone for all requests.
The graduated schedule gives consumers who discover the degradation in week 1 a 6-week period to migrate while the service is still 99% functional. Consumers who discover it in week 5 have 2 weeks to migrate at 85% service quality. The schedule makes deferral increasingly costly without causing a hard production outage until the endpoint is fully removed. The 429 error code is appropriate because it is retriable — consumer clients with retry logic will succeed on a retry, keeping the consumer's integration functional while producing the visibility signal in their monitoring.
For internal APIs, the forcing function can be more direct: create a ticket in each consuming team's project tracker with the migration deadline as the ticket due date and the team lead's name as the assignee. This bypasses the email-to-manager pathway and places the migration work in the planning tool where sprint capacity decisions are made. Follow up with a weekly status query in the team's Slack channel asking for a migration status update. For teams that have not started migration at the 50% mark of the window, escalate to their engineering manager with a specific ask: "what sprint will this team complete the migration in?" This is a social forcing function that works for internal teams because the relationship between the API owner and the consumer team allows for direct coordination.
5. Backwards-compatibility shim cost and the indefinite-support trap
The fifth section documents what happens when consumers cannot complete migration within the sunset window and the API team decides to continue supporting the deprecated endpoint beyond the announced date. This section exists because the decision to extend support is almost always made under time pressure — an enterprise customer threatens churn, a consumer's migration is blocked by a dependency outside their control, the engineering team does not want to absorb the incident cost of a hard cutoff — and without a documented cost model, the extension decisions accumulate until the deprecated endpoint has been maintained for 3+ years.
The backwards-compatibility shim cost has three components. First, direct maintenance: every bug fix, security patch, and infrastructure upgrade applied to the current API version must also be evaluated for applicability to the deprecated version. If the deprecated version has a divergent code path, bugs fixed in the current version may also exist in the deprecated version and require a separate fix. Security vulnerabilities in shared infrastructure — authentication middleware, request parsing, database connection handling — must be patched in both paths simultaneously. For a deprecated endpoint that shares infrastructure with the current version, maintenance overhead is approximately 15–20% of the current endpoint's maintenance cost; for a deprecated endpoint with a divergent code path, it can approach 50%.
Second, documentation divergence: the deprecated endpoint must maintain its own documentation, separate from the current endpoint. Every time the current endpoint's documentation is updated, the team must decide whether the deprecated endpoint's documentation also needs to be updated. Over time, this decision is consistently made in favor of not updating the deprecated documentation — it costs effort and the endpoint is "going away anyway." Consumer support tickets for the deprecated endpoint then require engineers to recall how the deprecated endpoint worked from memory or from code, rather than from documentation. This is the primary source of the "only one engineer knows how the old API works" problem.
Third, opportunity cost: every engineering hour spent on deprecated endpoint maintenance is an hour not spent on the current product. For a team of 5 engineers maintaining a deprecated endpoint at 20% overhead, that is 1 engineer-equivalent per year. Over 3 years of indefinite support, the cumulative cost is 3 engineer-years — the equivalent of a full-stack hire who built nothing new. The decision to extend support for 6 months to avoid a customer churn event is often correct in isolation; the decision to extend support indefinitely because the forcing function was insufficient to drive consumer migration is rarely documented as the cost it represents.
The indefinite-support trap is broken by two practices: the non-renewable extended support contract (maximum 6 months, maintenance fee, explicit end date with no extension option), and the versioning strategy decision that limits simultaneous supported versions to a defined maximum — typically two (current and one previous) for internal APIs and three (current, previous, and LTS) for external APIs with enterprise customers who require long migration windows. When a new version is added that exceeds the maximum, the oldest version's extended support contract is terminated regardless of migration completion status. This constraint makes the cost of each versioning decision visible at the time the version is added, rather than allowing the supported version count to grow without bound.
What a ChatGPT session on API deprecation usually leaves out
Most AI chat sessions on API deprecation cover the versioning scheme and the changelog announcement. The session produces a migration guide outline, a deprecation notice template, and possibly a note about the Sunset response header. What the session rarely produces is the forcing function design, the consumer discovery query, and the cost model for extended support — because those require knowledge of the specific consumer base, the engineering team's incident history, and the organization's track record on following through on sunset dates. These are the details that live in the team's actual AI chat history, not in the published deprecation documentation.
An export of the AI chat sessions around an API deprecation decision typically surfaces: the initial discussion of what counts as a breaking change and how to classify this specific change; the timeline calculation discussion (often producing a date that turns out to be too optimistic); the debate about whether to use the Sunset header or rely on email; and, most valuably, the acknowledgment that the team has done this before and it did not work — the conversation where someone on the team said "last time we said 6 months and it took 18, so we should just say 12 months from the start." That context is the deprecation strategy. It is not in the changelog entry.
The five ADR sections above — breaking change classification, sunset timeline policy, consumer communication protocol, forcing function design, and extended support cost model — are the structured form of that conversation. The team that extracts these from their AI chat history and writes them down as a decision record has an executable deprecation process rather than a repeated pattern of announced dates and extended timelines. The team that does not has a history of deprecations that took longer than the sunset date, cost more in maintenance than the incident they were trying to avoid, and produced at least one production outage for a consumer who was confident the deadline would be extended again.
Further reading
- The API versioning decision record — the undocumented versioning strategy and the record to write when a breaking change forces the question
- The API versioning strategy decision record — URL path vs. header vs. date-based versioning and how the choice constrains migration in year three
- The API schema design decision record — schema choices that determine the frequency and cost of breaking changes
- The API gateway decision record — gateway layer for consumer traffic monitoring and deprecation enforcement
- The API rate limiting decision record — rate limiting infrastructure reused for graduated deprecation enforcement
- The GraphQL vs. REST decision record — GraphQL field-level deprecation and how it changes the consumer migration surface
- The test strategy decision record — consumer contract testing and how it surfaces breaking changes before they reach production
- The error handling strategy decision record — error response conventions during the degradation window
- The logging strategy decision record — access log structure that makes consumer discovery queries possible
- The notification system decision record — webhook infrastructure and how deprecation affects event delivery
- The CI/CD pipeline decision record — pipeline consumers and how breaking changes propagate through build and deployment automation
- The decisions that never get written down — how deprecation process decisions join the class of consequential undocumented choices
- The WhyChose open-source extractor — recover the original API deprecation discussion from your AI chat history