Why does 'we keep everything forever' count as a retention strategy that should be documented?

A decision not to delete data is a retention strategy with specific consequences: it commits the team to growing storage costs, an expanding GDPR compliance surface, and query performance degradation that compounds over time. 'Keep everything' typically means one of two things — either the team evaluated deletion and decided the data's future value outweighs the cost of keeping it, or the team never considered deletion at all and is operating on the path-of-least-resistance default. Both of these are real claims about the team's engineering and compliance model. The first is a defensible decision that can be explained to an auditor or a new engineer. The second is a liability that compounds until it cannot be ignored — at which point the team is discovering their retention policy rather than enforcing one they made deliberately.

What should a data retention ADR include?

A data retention ADR should include: the data classification inventory at the time of the decision (which tables and columns contain personal data, which contain business-critical data, which contain purely operational data that is safe to delete); the deletion policy for each data category (what gets deleted, when, by what mechanism — hard delete, soft delete, or anonymization — and who triggers it); the backup and archival retention period (how long full and incremental backups are retained, and whether archives are subject to the same deletion policy as the primary store); the compliance scope statement (which regulations apply to which data categories and what the minimum and maximum retention periods are for each); a legal hold exception policy (what happens to the deletion schedule when a data category is placed under legal hold); and a revisitation condition that names when the retention policy should be re-evaluated, typically tied to GDPR territory expansion, significant user growth, infrastructure cost thresholds, or a first legal hold event.

What is the difference between a soft delete and hard delete decision and why does it belong in an ADR?

A soft delete marks records as deleted in the database without removing them (typically with a deleted_at timestamp or an is_deleted flag), while a hard delete removes the record entirely. The choice between them determines whether a GDPR right-to-erasure request is handled by flipping a flag, which leaves the personal data in the database and may not satisfy erasure on its own, or by a multi-table deletion cascade; whether 'deleted' records appear in query results if engineers forget to filter on the deleted_at column; how backup restoration interacts with deletion requests (a backup restored from before the deletion will resurface deleted records); and how audit trails are maintained for regulatory purposes. These trade-offs interact with the compliance scope, the backup strategy, and the query performance model in ways that are not obvious from reading the schema. The soft-delete-versus-hard-delete decision deserves an ADR because the downstream implications compound over the life of the system, and the reasoning that made soft delete the right choice at one scale of user data may not hold at ten times that scale.

How do you find data retention decisions in AI chat history?

Retention decisions appear in AI chat at three predictable points: early database design sessions ('should we soft-delete or hard-delete users?', 'do we need to store every event or can we aggregate after 90 days?'), GDPR compliance preparation sessions ('what personal data do we store and where?', 'how do we handle a right-to-erasure request?'), and infrastructure cost review sessions ('our S3 costs are growing 20% per month, what can we archive?', 'the user_events table is 400GB, do we need all of it?'). The sessions that made the retention decision by default are identifiable through the absence of a conclusion: 'probably fine for now' or 'we can deal with that later' at the end of a compliance thread is a retention decision without a trigger condition. The WhyChose extractor identifies these through the compliance vocabulary (GDPR, retention, erasure, PII, archival) and the cost-concern vocabulary (table size, storage growth, archival, S3 lifecycle).

2026-06-14 · ~14 min read

The data retention decision record: why "we keep everything forever" is an undocumented retention strategy

Most engineering teams discover their data retention policy during a GDPR audit, a legal hold request, or the quarter when their infrastructure costs spike 40% and someone asks why the user_events table is 800 gigabytes. The policy was always there — accumulated across a thousand small choices and non-choices: the decision not to add a deleted_at column, the decision to keep raw logs "in case they're useful for debugging," the decision to retain every user session record because "storage is cheap." These are not defaults. They are undocumented decisions with legal, financial, and performance consequences that compound over the life of the system.

Data retention looks like an infrastructure concern because the infrastructure makes it visible. The growing S3 bill is visible. The query that used to take 20 milliseconds and now takes 2 seconds is visible. The 3-hour backup window that is threatening the recovery-time objective is visible. What is not visible is the decision behind any of this: whether the team explicitly chose to keep data indefinitely or simply never chose to delete it; what the data classification is for each table and whether it falls under GDPR, CCPA, or neither; whether "soft delete" means the data is removed from user-visible interfaces but still exists in the database, or whether it means the data is genuinely inaccessible to all systems; and what the team's actual obligation is when a user submits a right-to-erasure request under Article 17.

This invisibility matters because data retention decisions are among the few architectural decisions where the legal consequences are borne by parties outside engineering. A caching strategy that performs poorly affects engineering and product. A data retention policy that violates GDPR affects the user whose data was retained without a lawful basis, and the regulatory body that receives their complaint. Writing the retention ADR is not a compliance exercise. It is the act of making the team's data obligations legible — to the compliance team, to the legal team, to the engineers who will implement the next feature that touches personal data, and to the auditor who will eventually ask: "What is your data retention policy, and when did you decide on it?"

Why "keep everything" is a decision, not a default

The claim "we don't have a data retention policy" is not a neutral statement. It is a description of an implicit policy: everything is retained indefinitely unless explicitly deleted. That policy has specific consequences. It commits the team to storage costs that grow linearly with user count and activity over the lifetime of the system. It commits the team to an expanding GDPR compliance surface — every new row in a table containing personal data is a record that must be accounted for in a deletion request. It commits the team to a backup scope that grows with the database, lengthening backup windows and increasing recovery complexity. And it commits the team to query performance degradation that is invisible until a table crosses a size threshold and suddenly becomes the bottleneck for a query that has not changed.

None of these consequences were chosen. They were accepted by default when the team did not make a retention decision. The engineer who added the user_events table at the beginning of the product's life made a schema decision, not a retention decision — and the two are not the same. The schema decision was about what to store. The retention decision is about how long to keep it. A team that has made hundreds of schema decisions without any retention decisions has accumulated retention obligations that are entirely shaped by the path of least resistance: keep everything, because deletion requires deliberation and deliberation requires time that the sprint does not have.

The "we'll deal with data deletion later" decision is not a deferral — it is a claim: that the current data volume does not create compliance or cost obligations serious enough to prioritize, and that the team will recognize when the volume reaches a threshold that demands a retention policy. The second half of this claim is the part that is never documented. How large does the user_events table need to be before archival becomes worth the engineering investment? Is a GDPR audit request the trigger, or does the team wait until the first individual rights request? Does a 30% month-over-month growth in S3 costs constitute the trigger, or does the team wait until a specific dollar threshold? Without a documented trigger condition, the team relies on the shared recognition that "now" is when the retention problem has become serious — a recognition that is harder to reach under the pressure of a compliance deadline or an infrastructure cost review than it would have been in a quiet session before the data volume made the problem urgent.

Three categories of retention decisions

The data retention strategy for a system is not a single decision — it is a set of related decisions that are usually made at different times, by different people, and under different pressures. The trouble is that these decisions interact. A team that has decided to soft-delete user records without documenting what "deleted" means for backup restoration has made a partial deletion decision: they have chosen the mechanism but not the scope. A team that has a clear deletion policy for primary records but no policy for the analytical database that replicates from the primary has made a different partial decision: they have addressed the OLTP surface but not the OLAP surface. The complete retention ADR covers all three categories.

The deletion policy. What gets deleted, when, by what mechanism, and who or what triggers the deletion. The deletion policy is the core of the retention ADR, and it is the section most commonly absent. A deletion policy for a SaaS application typically covers at minimum: user account records (what is deleted when a user closes their account — the account record itself, the activity records, the content records, the audit records, and the records in any downstream analytical systems); time-series operational data (session logs, event logs, API request logs — categories where the data's utility declines over time and its volume grows fastest); and cached or derivative data (data that is computed from primary records and can be recomputed if needed — a strong candidate for shorter retention periods than the primary records it derives from).

The mechanism choice within the deletion policy is a separate decision with significant downstream consequences. Hard delete removes the record from the database. Soft delete marks the record as deleted (with a deleted_at timestamp or an is_deleted flag) without removing it. Anonymization replaces identifying fields with null or irreversibly scrambled values, keeping the record for aggregate analysis while removing the personal data; swapping in a pseudonymous identifier that can still be mapped back to the user is pseudonymization, and GDPR continues to treat pseudonymized data as personal data. Each mechanism has different implications for GDPR compliance (a soft-deleted record still exists and must be disclosed in a right-of-access response), for query correctness (a soft delete that engineers forget to filter produces queries that return "deleted" records), for backup restoration (a database restored from a backup made before a deletion request will restore the deleted record), and for audit trail integrity (hard-deleting a record that is the subject of a legal hold destroys evidence).
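The three mechanisms can be compared side by side in a small sketch. It uses Python's built-in sqlite3 module and a hypothetical users table (id, email, deleted_at); the schema and function names are assumptions made for illustration, not a prescribed design.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """Build an in-memory table with three example users (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, deleted_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (id, email) VALUES (?, ?)",
        [(1, "a@example.com"), (2, "b@example.com"), (3, "c@example.com")],
    )
    return conn

def hard_delete(conn: sqlite3.Connection, user_id: int) -> None:
    # The row is removed from the primary store (backups are a separate question).
    conn.execute("DELETE FROM users WHERE id = ?", (user_id,))

def soft_delete(conn: sqlite3.Connection, user_id: int, now: str) -> None:
    # The row and its personal data remain; only a marker is set.
    conn.execute("UPDATE users SET deleted_at = ? WHERE id = ?", (now, user_id))

def anonymize(conn: sqlite3.Connection, user_id: int) -> None:
    # The row survives for aggregate counts; the identifying field is nulled.
    conn.execute("UPDATE users SET email = NULL WHERE id = ?", (user_id,))

conn = make_db()
hard_delete(conn, 1)
soft_delete(conn, 2, "2026-06-14T00:00:00Z")
anonymize(conn, 3)

# A query that forgets the deleted_at filter still returns the soft-deleted user.
unfiltered = conn.execute("SELECT id FROM users ORDER BY id").fetchall()
filtered = conn.execute(
    "SELECT id FROM users WHERE deleted_at IS NULL ORDER BY id"
).fetchall()
# What personal data is still stored after each mechanism has run.
emails = conn.execute("SELECT id, email FROM users ORDER BY id").fetchall()
```

After the three calls, the soft-deleted user's email is still stored and still appears in any query without the deleted_at filter, which is the query-correctness and right-of-access exposure described above.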

The backup and archival retention period. How long do full and incremental backups live? Are archived records subject to the same deletion policy as the primary store? The backup retention decision is often made by whoever sets up the backup infrastructure, in whatever way the backup tool defaults to, without any connection to the deletion policy. The result is a system where user records are hard-deleted from the primary database in compliance with GDPR deletion requests — but the backups that were created before the deletion request remain for 90 days, each containing the deleted record. The backup retention policy is a form of infrastructure decision with compliance consequences, and it belongs in the retention ADR rather than in a Terraform comment that nobody reads.

The archival strategy — the decision to move older data from the primary database to a cheaper, slower storage tier rather than deleting it — is a third mechanism that sits between retention and deletion. Data that is archived to cold storage is no longer visible in production queries, which addresses the performance consequence of indefinite retention. But archived data is still subject to GDPR deletion requests, SOC 2 audit requirements, and legal holds. An archival decision that is not coordinated with the deletion policy creates a compliance surface that grows invisibly: teams that archive to S3 without a corresponding archival deletion policy discover years later that they have terabytes of archived user data that was supposed to be deleted when users closed their accounts.
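One way to keep archival and deletion coordinated is to make the erasure routine aware of every tier. The sketch below models the cold tier as a second sqlite3 table purely for illustration; the table names (user_events, user_events_archive) and both functions are hypothetical, and a real archive in object storage would need its own record-level deletion path.

```python
import sqlite3

def archive_old_events(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move events older than `cutoff` from the primary table to the archive."""
    with conn:  # copy and delete in one transaction
        conn.execute(
            "INSERT INTO user_events_archive (id, user_id, created_at) "
            "SELECT id, user_id, created_at FROM user_events WHERE created_at < ?",
            (cutoff,),
        )
        moved = conn.execute(
            "DELETE FROM user_events WHERE created_at < ?", (cutoff,)
        ).rowcount
    return moved

def erase_user_events(conn: sqlite3.Connection, user_id: int) -> dict:
    """Erasure has to cover both tiers; returns rows deleted per tier."""
    with conn:
        primary = conn.execute(
            "DELETE FROM user_events WHERE user_id = ?", (user_id,)
        ).rowcount
        archived = conn.execute(
            "DELETE FROM user_events_archive WHERE user_id = ?", (user_id,)
        ).rowcount
    return {"primary": primary, "archive": archived}

conn = sqlite3.connect(":memory:")
for table in ("user_events", "user_events_archive"):
    conn.execute(
        f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT)"
    )
conn.executemany(
    "INSERT INTO user_events (id, user_id, created_at) VALUES (?, ?, ?)",
    [(1, 7, "2024-03-01"), (2, 7, "2026-05-01"), (3, 8, "2024-04-01")],
)

moved = archive_old_events(conn, "2025-06-14")
erased = erase_user_events(conn, 7)
archive_left = conn.execute(
    "SELECT id, user_id FROM user_events_archive ORDER BY id"
).fetchall()
```

An erasure routine that only touched user_events would have reported success here while leaving user 7's oldest event in the archive.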

The compliance scope decision. Which tables and columns contain personal data? Which regulation applies to which data categories? What are the minimum and maximum retention periods for each regulated category? The compliance scope decision is a prerequisite for every other retention decision — without a data classification, the team cannot determine which records are subject to GDPR, which require a minimum retention period for SOC 2, and which are purely operational and can be deleted freely. Data classification is one of the five sections that a security ADR adds beyond the standard template, and it belongs in the retention ADR as well, because the threat model and the retention policy are both shaped by what personal data exists and where it lives.

The "default" pathology: how retention strategies accumulate

Most teams do not make a retention decision so much as accumulate one. The accumulation follows a predictable pattern across products.

The audit log is created for SOC 2 compliance, for which the auditor expects at least 90 days of retention. The engineer who creates it sets no maximum. The 90-day minimum becomes the implicit permanent retention policy, not because the team decided to keep audit logs forever, but because no one ever revisited the retention decision after the compliance minimum was met. Three years later, the audit log table is 60 GB and growing at 2 GB per month. The query performance has degraded. The backup window has lengthened. A GDPR audit reveals that the audit log contains user email addresses that were logged as part of the authentication events, making the entire table a regulated personal data store subject to the right of erasure. The team is now performing retention policy discovery rather than retention policy enforcement, and the discovery is happening under audit pressure.

The user activity log is created during an analytics sprint because the product team wants to track feature usage. "Storage is cheap" is the answer to the implicit question of how long to keep it. Two years later, the table is 400 GB. A data export request under GDPR reveals that the activity log contains granular user behavior data (which page the user viewed, how long they stayed, which buttons they clicked) that the team did not classify as personal data because it is not obviously identifying in isolation. It becomes identifying through the user_id column, which was added because the analytics queries needed it and which links every row to a specific user. The team is now discovering that their activity log is a regulated personal data store and that they have two years of retention without a lawful basis documented anywhere.

The chat transcript archive is created because the customer success team wants to review conversations when users report issues. The transcripts are kept "for the life of the account" because someone on the support team found an old transcript useful once. Years later, a user submits a right-to-erasure request. The engineering team discovers that "for the life of the account" was never defined anywhere, that transcripts are not deleted when the user closes their account because the support team's workflow assumes they will be available post-closure, and that the transcripts contain not only the requesting user's messages but the messages of every other user they conversed with — creating a multi-party erasure problem where deleting one user's records requires deciding how to handle the other parties' messages in the same thread.

Each of these scenarios shares a common structure: a legitimate operational decision was made (create the log, track the activity, store the transcript) without a corresponding retention decision (how long, under what policy, with what compliance scope). The operational decisions are in the code, in the schema, in the infrastructure configuration. The retention decisions are nowhere.

The infrastructure and query performance consequences

The performance consequences of indefinite retention are predictable and compounding. Queries against tables that grow without bounds slow down roughly in proportion to table size: a full table scan that took 100 milliseconds at 1 million rows takes 2 seconds at 20 million rows and 45 seconds at 400 million rows. Indexes that fit in memory at small table sizes outgrow it as the table grows, so lookups that were served from cache start reading from disk. Backup windows that fit within the nightly maintenance window at 50 GB begin to overflow at 500 GB. These are not catastrophic failures. They are gradual degradations that cross thresholds one at a time, each triggering an engineering sprint to add an index, optimize a query, or renegotiate the backup SLA. The cumulative engineering cost of these reactive sprints is the infrastructure cost of the undocumented "keep everything" decision.

The AI chat history of most engineering teams contains a recognizable sequence of these threshold-crossing conversations: "our user_events table is huge, should we add a composite index?", followed months later by "the index isn't helping anymore, can we partition the table?", followed later by "the partition maintenance is expensive, should we archive events older than a year?". Each of these is a reactive response to the performance consequence of the undocumented retention decision. The archival conversation, if it happens at all, is usually the first time the team deliberates about how long user events should be retained, and it happens under the pressure of a production performance problem rather than in the calm before the table was created.

The decision to use a managed archival service versus building archival infrastructure internally is a common companion to the retention policy decision. Managed services like AWS S3 Intelligent-Tiering, Glacier, or BigQuery long-term storage make archival operationally cheap but create new questions: is the archival storage subject to the same GDPR deletion obligations as the primary database? How does the team delete one user's records from an S3-backed archive when the data sits in bulk objects and the storage API deletes whole objects, not individual records? Does the archival solution support field-level deletion for anonymization rather than full-record deletion? These questions only arise after the archival decision is made, and they are easier to answer before the archival infrastructure is built than after, when they have to be retrofitted into a system that was designed around bulk storage rather than record-level deletion.

The compliance and legal hold dimensions

The GDPR right to erasure under Article 17 is the most operationally consequential compliance obligation for teams that have not documented their retention policy. A right-to-erasure request requires the team to identify every location where the requesting user's personal data is stored, delete or anonymize it without undue delay (GDPR sets a baseline of one month from receipt, which teams commonly operationalize as 30 days), and confirm completion. The scope of this operation is entirely determined by the data classification and the deletion policy, two things that are usually not documented. A team that has documented its data classification (which tables contain personal data and in which columns) and its deletion policy (hard delete versus anonymize, which tables are in scope, which dependent systems must be notified) can respond to a right-to-erasure request in hours. A team that has not documented either must perform data classification discovery under compliance deadline pressure.

The compliance scope decision determines what counts as personal data in the system, and the answer is not always obvious. A user_id column looks harmless in isolation, but it is personal data when it is the join key that links behavioral records to identifiable users. A server log that contains IP addresses generally contains personal data under GDPR, which counts online identifiers among the ways a person can be identified. A support ticket that contains the user's description of their problem may contain names, locations, and other identifying information that the team did not classify as a regulated field when they created the support table. The data classification inventory in the retention ADR must capture not just the obviously personal tables (the user profile, the payment record, the authentication log) but the less obviously personal tables where personal data accumulates through operational use.

SOC 2 Type II compliance creates a minimum retention requirement that is commonly mistaken for the retention policy itself. SOC 2 requires that audit logs be retained for a minimum period — the exact period depends on the trust service criteria and the audit firm's interpretation, but 90 days to one year is common. This minimum requirement is the floor of the retention policy, not the ceiling, and the two are frequently conflated. Teams that retain audit logs "for SOC 2 compliance" often retain them indefinitely, because the SOC 2 requirement is the only retention decision that was ever made for that table, and it specifies only the minimum. The retention ADR makes the ceiling explicit — the maximum period after which data is deleted or archived — and distinguishes it from the compliance minimum, which is a legal constraint rather than a policy choice.

Legal holds are the case where the deletion policy must be suspended for specific data. A legal hold is an obligation to preserve data that may be relevant to litigation, a regulatory investigation, or an audit. When a legal hold is issued, the normal deletion schedule for the affected data must be paused until the hold is released. The retention ADR must include a legal hold exception that names the mechanism for identifying data under hold, the process for suspending deletion for that data, and the process for resuming deletion when the hold is released. Without this, the team faces a structural conflict: the deletion policy requires deleting data at a specified time, the legal hold requires preserving it, and the engineering system has no mechanism for reconciling the two.

Writing the data retention ADR

The Nygard ADR format applies to retention decisions with several important additions. The standard sections — Context, Decision, Consequences — are insufficient for a retention decision because the data classification, the deletion mechanism, the backup retention period, and the compliance scope are not captured by any of these sections as they are typically written. The retention ADR requires a "Data Classification" section that precedes the Decision, and a "Compliance Scope" section in the Consequences that makes explicit what regulatory obligations are created by the retention choices.

The decision-statement title convention for retention ADRs should name the primary choice and the regulatory context that drove it:

"Hard-deleted user account records after 30 days of account closure — GDPR right-to-erasure scope limited to one-table operation" — the mechanism and the compliance motivation are visible at the title level
"Retained audit logs 12 months then archived to cold storage — SOC 2 minimum 90 days, extended to 12 months for incident investigation utility" — the retention period, the floor, and the reasoning for the extended ceiling are all named
"Anonymized user_events after 90 days, hard-deleted after 365 days — aggregate analytics preserved, PII compliance surface bounded" — the two-phase approach and its separate motivations are named in the title

Data Classification. A prerequisite section that inventories which tables and columns contain personal data (GDPR definition: any data that can identify a natural person directly or indirectly), which contain business-critical data that has legal or contractual retention minimums, and which contain purely operational data that is safe to delete freely. The classification section is not optional — without it, the deletion policy cannot be scoped correctly. A team that writes a deletion policy without first completing the data classification will discover during implementation that the policy's scope is either too narrow (missing tables that contain personal data) or too broad (deleting tables that have compliance minimums the policy does not account for).
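A classification inventory is easier to keep honest when it is machine-checkable. The following is a minimal sketch, assuming the inventory lives in code next to the schema; the three categories mirror the ones named above, while the table names, column names, and helper functions are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class DataClass(Enum):
    PERSONAL = "personal"
    BUSINESS_CRITICAL = "business_critical"
    OPERATIONAL = "operational"

@dataclass(frozen=True)
class TableClassification:
    table: str
    data_class: DataClass
    personal_columns: tuple[str, ...] = ()

# Hypothetical inventory; a real one is produced by reviewing the schema.
INVENTORY = [
    TableClassification("users", DataClass.PERSONAL, ("email", "name")),
    TableClassification("user_events", DataClass.PERSONAL, ("user_id", "ip_address")),
    TableClassification("invoices", DataClass.BUSINESS_CRITICAL),
    TableClassification("job_queue", DataClass.OPERATIONAL),
]

def unclassified_tables(schema_tables: set[str], inventory) -> set[str]:
    """Tables that exist in the schema but have no classification entry."""
    return schema_tables - {entry.table for entry in inventory}

def erasure_scope(inventory) -> list[str]:
    """Tables a deletion policy must cover because they hold personal data."""
    return sorted(
        entry.table for entry in inventory if entry.data_class is DataClass.PERSONAL
    )

schema = {"users", "user_events", "invoices", "job_queue", "support_tickets"}
missing = unclassified_tables(schema, INVENTORY)
scope = erasure_scope(INVENTORY)
```

Running a check like unclassified_tables in CI would flag the new support_tickets table before it ships without a classification, which addresses the "scope too narrow" failure described above.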

Alternatives Considered. Each deletion mechanism evaluated, with the specific reason it was or was not chosen for each data category. For a user account deletion decision, the alternatives might include: soft delete (evaluated and found to be insufficient for GDPR compliance in the team's specific context, because soft-deleted records remain in the database and must be disclosed in a right-of-access response; the compliance team determined this did not satisfy the right to erasure without additional anonymization); hard delete (chosen for the account record and most dependent records); anonymization (chosen for records that must be preserved for aggregate analytics: the user_id is replaced with a random identifier that cannot be mapped back to the user, preserving the behavioral record while removing the identifying link). The rationale for choosing different mechanisms for different record types is the kind of context that exists only in the AI chat session where the decision was made and in the ADR, never in the schema migration that implements it.

Decision. The deletion policy, the backup retention period, and the compliance scope, stated specifically enough that an engineer implementing the next schema migration can determine whether their new table requires a deletion pathway and what that pathway looks like. "Delete personal data when requested" is not a deletion policy. "Hard-delete the users table record and all rows in user_preferences, user_sessions, user_content, and user_payment_methods where user_id matches the requesting user's ID, within 30 days of a verified right-to-erasure request; anonymize rows in user_events by setting user_id to NULL and ip_address to NULL within the same 30-day window, retaining the anonymized event records for aggregate analytics; delete all records from all tables after 12 months of account inactivity regardless of user request, with 30-day email notice before deletion" is a deletion policy.
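A policy stated at that level of detail translates almost directly into code. The sketch below covers only the erasure-request path of the example policy (not the 12-month inactivity rule), using sqlite3 and the table names from the example; the column layouts and the erase_user and build_demo_db helpers are assumptions made for illustration.

```python
import sqlite3

# Tables named in the example policy whose rows are hard-deleted by user_id.
HARD_DELETE_TABLES = [
    "user_preferences",
    "user_sessions",
    "user_content",
    "user_payment_methods",
]

def erase_user(conn: sqlite3.Connection, user_id: int) -> None:
    """Apply the erasure path of the policy in a single transaction."""
    with conn:
        for table in HARD_DELETE_TABLES:  # names come from the constant above
            conn.execute(f"DELETE FROM {table} WHERE user_id = ?", (user_id,))
        # Anonymize events: keep the row, drop the identifying fields.
        conn.execute(
            "UPDATE user_events SET user_id = NULL, ip_address = NULL "
            "WHERE user_id = ?",
            (user_id,),
        )
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))

def build_demo_db() -> sqlite3.Connection:
    """Two users with one row in every table (hypothetical column layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    for table in HARD_DELETE_TABLES:
        conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, user_id INTEGER)")
    conn.execute(
        "CREATE TABLE user_events "
        "(id INTEGER PRIMARY KEY, user_id INTEGER, ip_address TEXT, event TEXT)"
    )
    for user_id in (1, 2):
        conn.execute(
            "INSERT INTO users (id, email) VALUES (?, ?)",
            (user_id, f"user{user_id}@example.com"),
        )
        for table in HARD_DELETE_TABLES:
            conn.execute(f"INSERT INTO {table} (user_id) VALUES (?)", (user_id,))
        conn.execute(
            "INSERT INTO user_events (user_id, ip_address, event) VALUES (?, ?, ?)",
            (user_id, "203.0.113.7", "login"),
        )
    conn.commit()
    return conn

conn = build_demo_db()
erase_user(conn, 1)
remaining_users = conn.execute("SELECT id FROM users").fetchall()
events = conn.execute(
    "SELECT user_id, ip_address, event FROM user_events ORDER BY id"
).fetchall()
leftover = sum(
    conn.execute(f"SELECT COUNT(*) FROM {t} WHERE user_id = 1").fetchone()[0]
    for t in HARD_DELETE_TABLES
)
```

Keeping the table list in one constant means a new table with a user_id column has an obvious place to be registered, which is the question the Decision section is meant to answer for the next schema migration.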

Backup Retention. The backup frequency, window, and retention period stated explicitly, along with the interaction with the deletion policy. If full backups are retained for 30 days, a record erased from the primary database today can persist in backups for up to 30 more days, and any restore from one of those backups brings it back. The team must either purge the record from the affected backups, or accept that backups hold the erased record until they expire and commit to re-applying erasures after any restore. Both are defensible positions, but the team that has not documented this interaction discovers it during the first GDPR deletion request, at which point "we have to restore the backup and re-delete" is a surprise engineering task with a compliance deadline attached.
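The date arithmetic behind this interaction is simple enough to write down. This sketch assumes the most recent backup holding the record was taken on the day of erasure and that each backup expires a fixed number of days after it is taken; the function names and the 30-day default deadline are illustrative assumptions, and whether backup copies count against the deadline is a question for the compliance team, not for this code.

```python
from datetime import date, timedelta

def backups_clear_on(erased_on: date, backup_retention_days: int) -> date:
    """Date by which every backup that could hold the erased record has expired."""
    return erased_on + timedelta(days=backup_retention_days)

def clears_before_deadline(
    request_received: date,
    erased_on: date,
    backup_retention_days: int,
    deadline_days: int = 30,  # assumed operational deadline
) -> bool:
    """True if backups age out on or before the response deadline."""
    deadline = request_received + timedelta(days=deadline_days)
    return backups_clear_on(erased_on, backup_retention_days) <= deadline

received = date(2026, 6, 1)
erased = date(2026, 6, 3)

clear_date_30 = backups_clear_on(erased, 30)
ok_with_30_day_backups = clears_before_deadline(received, erased, 30)
ok_with_14_day_backups = clears_before_deadline(received, erased, 14)
```

With 30-day backup retention the record outlives the deadline in backups by two days; with 14-day retention it does not. Either outcome can be acceptable, provided the ADR states which one the team has chosen.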

Compliance Scope. Which regulations apply to which data categories, what the minimum retention periods are for each, and what the maximum periods are. SOC 2 audit log minimums, GDPR lawful basis for retention, CCPA deletion rights, financial record retention requirements — each of these creates a different minimum retention period for different data categories. The compliance scope section names the regulatory landscape the team is operating in and makes the retention policy's relationship to each regulatory requirement explicit.

Legal Hold Exception. The mechanism for identifying data under legal hold, the process for suspending the normal deletion schedule, and the process for resuming deletion when the hold is released. This section can be short ("any data under legal hold is excluded from the automatic deletion schedule; a legal hold flag in the retention_holds table is the mechanism; the legal team is responsible for setting and releasing holds"), but its absence means the deletion system and the legal hold process will conflict the first time both apply to the same data.
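The short version of the exception can be sketched as a deletion job that skips anything under an active hold. The retention_holds table name comes from the example above; its columns (user_id, released_at), the user_sessions table, and the function name are assumptions for illustration.

```python
import sqlite3

def delete_expired_sessions(conn: sqlite3.Connection, cutoff: str) -> int:
    """Delete sessions older than `cutoff`, skipping users under an active hold."""
    with conn:
        cursor = conn.execute(
            """
            DELETE FROM user_sessions
            WHERE created_at < ?
              AND NOT EXISTS (
                  SELECT 1 FROM retention_holds AS h
                  WHERE h.user_id = user_sessions.user_id
                    AND h.released_at IS NULL  -- hold not yet released
              )
            """,
            (cutoff,),
        )
    return cursor.rowcount

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_sessions (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT)"
)
conn.execute(
    "CREATE TABLE retention_holds (user_id INTEGER NOT NULL, released_at TEXT)"
)
conn.executemany(
    "INSERT INTO user_sessions (user_id, created_at) VALUES (?, ?)",
    [(1, "2025-01-10"), (2, "2025-01-11"), (1, "2026-06-01")],
)
conn.execute("INSERT INTO retention_holds (user_id) VALUES (2)")  # user 2 is on hold

deleted_during_hold = delete_expired_sessions(conn, "2026-01-01")
# The legal team releases the hold; the next scheduled run resumes deletion.
conn.execute("UPDATE retention_holds SET released_at = '2026-06-14' WHERE user_id = 2")
deleted_after_release = delete_expired_sessions(conn, "2026-01-01")
remaining = conn.execute("SELECT user_id, created_at FROM user_sessions").fetchall()
```

Because the hold check lives inside the deletion query, resuming deletion needs no separate catch-up process: the next scheduled run picks up whatever the hold had been protecting.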

Revisitation Condition. Named triggers for re-evaluating the retention policy: "Re-evaluate this retention policy if: the monthly infrastructure cost for storage exceeds $X, because the cost-benefit calculation for current retention periods changes at that threshold; the team begins operating in a new GDPR territory, because the regulatory scope may expand; the user count exceeds Y, because the compliance surface and the operational cost of the current retention periods scale with user count; a legal hold is issued for the first time, because the interaction between the deletion system and the legal hold mechanism will require engineering work that should be planned in advance rather than discovered under legal deadline; or the backup window exceeds Z hours, because the backup retention period and the recovery-time objective are directly linked."
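Named triggers become more useful when something actually evaluates them. The following sketch encodes the five triggers as a check over a periodic metrics snapshot; X, Y, and Z stay as fields the team must fill in, and the numbers in the usage example are made-up placeholders, as are the class and function names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Thresholds:
    monthly_storage_cost_usd: float  # "X" in the ADR
    user_count: int                  # "Y" in the ADR
    backup_window_hours: float       # "Z" in the ADR

@dataclass(frozen=True)
class Snapshot:
    monthly_storage_cost_usd: float
    user_count: int
    backup_window_hours: float
    new_gdpr_territory: bool = False
    first_legal_hold_issued: bool = False

def fired_triggers(snapshot: Snapshot, thresholds: Thresholds) -> list[str]:
    """Return the names of the revisitation triggers that currently apply."""
    fired = []
    if snapshot.monthly_storage_cost_usd > thresholds.monthly_storage_cost_usd:
        fired.append("storage_cost")
    if snapshot.new_gdpr_territory:
        fired.append("new_territory")
    if snapshot.user_count > thresholds.user_count:
        fired.append("user_count")
    if snapshot.first_legal_hold_issued:
        fired.append("first_legal_hold")
    if snapshot.backup_window_hours > thresholds.backup_window_hours:
        fired.append("backup_window")
    return fired

# Placeholder values only; the real X, Y, and Z belong in the ADR.
thresholds = Thresholds(
    monthly_storage_cost_usd=5000.0, user_count=100_000, backup_window_hours=4.0
)
snapshot = Snapshot(
    monthly_storage_cost_usd=6200.0,
    user_count=80_000,
    backup_window_hours=3.0,
    first_legal_hold_issued=True,
)
fired = fired_triggers(snapshot, thresholds)
```

A non-empty result is a prompt to reopen the ADR, not an automatic policy change; the reasoning attached to each trigger in the ADR text is what tells the team what to reconsider.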

Finding retention decisions in AI chat history

Data retention decisions appear in AI chat at three structurally distinct points, each with a different extraction profile.

The first is the early database design session where the deletion mechanism was chosen or not chosen. A session in which a developer asks "should we soft-delete or hard-delete users?", "do we need to keep the full audit trail or can we summarize after 90 days?", or "what happens to user data when they cancel their subscription?" contains the alternatives evaluation for the deletion mechanism. These sessions are identifiable through the schema vocabulary (soft delete, hard delete, deleted_at, is_deleted, cascade) combined with the user lifecycle vocabulary (account closure, cancellation, GDPR, deletion request). The sessions that did not produce a deletion decision are also extractable. They are identifiable through the absence of a conclusion: "let's come back to that" or "we'll handle that when we get to GDPR compliance" at the end of a database design thread is an undocumented deferral.

The second is the compliance preparation session that precedes a SOC 2 or GDPR audit. An engineer asking "what personal data do we store?", "how do we handle right-to-erasure requests?", or "what's our data retention policy for the audit?" is performing data classification discovery in real time. These sessions are high-value extraction targets because the engineer is mapping the data model against a compliance framework, which produces the kind of structured inventory that should be in the retention ADR. The discovery is often uncomfortable — these sessions frequently uncover personal data in unexpected tables, ambiguities in the deletion mechanism's scope, and backup retention periods that conflict with the GDPR right to erasure. The discomfort is the signal that the session contains real retention decision content.

The third is the infrastructure cost review session where archival becomes a serious consideration. An engineer asking "our S3 costs are growing 20% per month, can we archive older data?", "the user_events table is 400GB, do we need all of it?", or "what's the minimum we need to retain for compliance?" is performing retroactive retention policy deliberation under cost pressure. These sessions often contain the first explicit discussion of retention periods — "how long do we actually need to keep user events?" — and the answer is usually negotiated between the compliance minimum and the operational utility, producing a retention period that has no ADR and exists only in the AI chat session and in the S3 lifecycle rule that was eventually configured.

The WhyChose extractor identifies retention-related sessions through the compliance vocabulary (GDPR, right to erasure, data retention, PII, personal data, right of access, SOC 2, audit log, deletion request) and the infrastructure cost vocabulary (table size, storage growth, archival, S3 lifecycle, backup window, cold storage). Retention sessions that made a decision through default — "probably fine for now" — are identifiable through the pattern of a compliance concern raised and resolved without a documented conclusion. These are the sessions where the retention policy was made without being made, and they are the highest-value extraction targets because the team that recovers these sessions recovers not just the reasoning but the implicit trigger conditions that were operating at the time of the non-decision.
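The vocabulary matching described above can be sketched in a few lines. This is a minimal illustration, not the WhyChose implementation: the term lists are copied from the paragraph above, while the function names, the deferral phrases, and the rule that a deferral in the final message signals a decision by default are all assumptions made for the example.

```python
import re

# Term lists taken from the vocabularies named in the text.
COMPLIANCE_TERMS = [
    "gdpr", "right to erasure", "data retention", "pii", "personal data",
    "right of access", "soc 2", "audit log", "deletion request",
]
COST_TERMS = [
    "table size", "storage growth", "archival", "s3 lifecycle",
    "backup window", "cold storage",
]
# Hypothetical deferral markers; a real extractor would need a richer signal.
DEFERRAL_PHRASES = ["probably fine for now", "let's come back to that"]


def find_terms(text, terms):
    """Return the terms that occur in text as whole words or phrases."""
    lowered = text.lower()
    return [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", lowered)]


def classify_session(messages):
    """Flag a chat session (a list of message strings) as retention-related."""
    text = " ".join(messages)
    compliance = find_terms(text, COMPLIANCE_TERMS)
    cost = find_terms(text, COST_TERMS)
    # Crude proxy for "resolved without a documented conclusion":
    # a compliance term was raised and the last message is a deferral.
    last = messages[-1].lower() if messages else ""
    deferred = any(phrase in last for phrase in DEFERRAL_PHRASES)
    return {
        "retention_related": bool(compliance or cost),
        "compliance_terms": compliance,
        "cost_terms": cost,
        "decision_by_default": bool(compliance) and deferred,
    }
```

Keyword matching of this kind will over-match (any mention of "archival" qualifies), so in practice it is better treated as a first-pass filter that narrows the sessions a person or a second stage then reviews.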

What the retention record enables

The data retention decision record — the one that names the data classification, the deletion policy, the backup retention period, the compliance scope, and the legal hold exception — enables three things that the schema migration and the S3 lifecycle rule alone do not.

It enables a right-to-erasure response that takes 20 minutes rather than two days. An engineer handling a GDPR deletion request with the retention ADR open can check the deletion policy section to identify every table in scope, confirm the mechanism for each, execute the deletion, and report completion with confidence that the scope is correct. An engineer handling the same request without the ADR must perform data classification discovery, check with the compliance team about which tables are in scope, identify whether any records are under legal hold, confirm that the backup retention period does not conflict with GDPR's one-month response deadline, and execute a deletion whose scope they are uncertain about. The retention ADR does not make the engineering work disappear; it makes the work predictable and executable without discovery.
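One way to make the deletion policy section directly executable is to keep it as data that a script can read. The sketch below assumes a hypothetical policy table and a legal hold list keyed by table name; the table names, columns, and mechanisms are invented for illustration and would come from the team's own ADR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TablePolicy:
    table: str
    contains_personal_data: bool
    mechanism: str    # "hard_delete", "soft_delete", or "anonymize"
    user_column: str  # column that links a row to the requesting user


# Hypothetical deletion policy, as it might be transcribed from the ADR.
DELETION_POLICY = [
    TablePolicy("users", True, "anonymize", "id"),
    TablePolicy("user_events", True, "hard_delete", "user_id"),
    TablePolicy("invoices", True, "anonymize", "customer_id"),
    TablePolicy("job_queue", False, "hard_delete", "owner_id"),
]


def erasure_plan(policy, legal_holds):
    """Split the in-scope tables into steps to execute and steps blocked by a hold.

    legal_holds is a set of table names currently under legal hold.
    Each step is a (table, mechanism, user_column) tuple.
    """
    plan = {"execute": [], "blocked_by_legal_hold": []}
    for entry in policy:
        if not entry.contains_personal_data:
            continue  # out of scope for a right-to-erasure request
        step = (entry.table, entry.mechanism, entry.user_column)
        if entry.table in legal_holds:
            plan["blocked_by_legal_hold"].append(step)
        else:
            plan["execute"].append(step)
    return plan
```

Calling `erasure_plan(DELETION_POLICY, legal_holds={"invoices"})` yields the two tables that can be processed immediately and the one that has to wait. The function only plans; running the deletions and recording completion are left out of the sketch.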

It enables infrastructure cost projections that include planned archival. A team with a documented retention policy knows how long each data category lives in the primary database, when it transitions to cold storage, and when it is deleted from the archive. This makes storage cost modeling possible: the team can project how the primary database size, the archive size, and the backup window will grow under different user growth scenarios, and they can plan archival infrastructure investments before the current infrastructure becomes the bottleneck. A team without a documented retention policy cannot project storage costs with any accuracy because they do not know what fraction of their current storage reflects data that should have been deleted or archived already.
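As a rough illustration of the modeling this makes possible, the sketch below compares storage under a documented policy with storage under "keep everything". It assumes a deliberately simple model: data arrives in monthly batches, the batch size grows at a fixed rate, the newest batches stay in the primary store, the next-oldest sit in the archive, and anything older is deleted. The function name and all parameters are hypothetical.

```python
def project_storage(initial_gb_per_month, monthly_growth, months,
                    primary_months, archive_months):
    """Return (primary_gb, archive_gb, keep_everything_gb) after `months` months.

    monthly_growth is a fraction, e.g. 0.05 for 5% growth in monthly ingest.
    """
    # Size of the batch ingested in each month, oldest first.
    ingest = [initial_gb_per_month * (1 + monthly_growth) ** m
              for m in range(months)]
    n = len(ingest)
    primary_start = max(0, n - primary_months)
    archive_start = max(0, n - primary_months - archive_months)
    primary = sum(ingest[primary_start:])               # newest batches
    archive = sum(ingest[archive_start:primary_start])  # next-oldest batches
    return primary, archive, sum(ingest)                # last value: no deletion


# Example: 10 GB/month today, 5% monthly growth, projected over three years,
# with 3 months in the primary store and a further 9 months in the archive.
primary_gb, archive_gb, keep_all_gb = project_storage(10, 0.05, 36, 3, 9)
```

Running the same function over several growth rates gives the scenario comparison the paragraph describes. The gap between `keep_all_gb` and `primary_gb + archive_gb` is the storage that the retention policy removes from the plan.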

And it enables the honest answer to the auditor's question. A documented retention policy is what allows the team to respond to "what is your data retention policy?" with a document rather than a description. The auditor who receives a retention ADR — with a data classification, a deletion policy, backup retention periods, a compliance scope, and a legal hold exception — receives evidence that the team has made deliberate retention decisions rather than accumulated them. The auditor who receives a verbal description of "we think we keep logs for about a year" has received an invitation to dig further, because a retention policy that exists only in someone's memory is a retention policy that has not been implemented, monitored, or enforced.

The new technical leader who inherits a system without retention documentation faces a particularly compressed version of the data classification discovery problem. They need to understand what personal data the system stores, what the deletion obligations are, and whether the current implementation of the deletion policy is correct — and they need this understanding quickly enough to respond to compliance questions from the board, the legal team, or an incoming auditor. The retention ADR gives them this understanding in a document rather than in a series of interviews with engineers who may have different and conflicting recollections of what was decided and when.

The interaction with the security ADR

The data retention decision and the security decision are not independent. The threat model is affected by what data exists. A team that retains granular user behavioral data indefinitely has a different threat model than a team that anonymizes behavioral data after 90 days: the first has a larger blast radius for a data breach, because an attacker who compromises the database gains the system's entire behavioral history rather than the most recent 90 days. A team that retains payment card information past the transaction that required it has created a PCI DSS compliance obligation through a retention decision that was never explicitly made. The security ADR should reference the retention policy, and the retention ADR should reference the security ADR, because the two are jointly responsible for the team's compliance posture.

The data classification inventory is the natural bridge between the retention ADR and the security ADR. A data classification that identifies which tables contain personal data, which contain payment data, which contain health data, and which contain purely operational data provides the input for both the retention policy (what must be deleted and when, under which regulation) and the security threat model (which tables represent the highest-value targets, what the blast radius of a compromise would be, which tables require additional access controls or encryption at rest). Teams that maintain a single data classification document shared between the retention ADR and the security ADR avoid the common failure mode where the two documents make conflicting assumptions about what personal data exists and where it lives.
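A shared classification can be as small as one list of records from which both documents derive their view. The sketch below is an assumption about how such an inventory might look: the table names, the sensitivity ranking, and the retention figures are placeholders invented for the example, not recommended values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Classification:
    table: str
    data_class: str                    # "personal", "payment", "health", "operational"
    regulation: Optional[str]          # regulation driving retention, if any
    max_retention_days: Optional[int]  # None means no regulatory maximum recorded


# Hypothetical inventory; the numbers are placeholders, not legal guidance.
INVENTORY = [
    Classification("users", "personal", "GDPR", 730),
    Classification("payment_methods", "payment", "PCI DSS", 365),
    Classification("user_events", "personal", "GDPR", 90),
    Classification("job_queue", "operational", None, None),
]

# Assumed ranking used only to order targets in the security view.
SENSITIVITY = {"health": 3, "payment": 3, "personal": 2, "operational": 1}


def retention_view(inventory):
    """What the retention ADR needs: regulation and maximum retention per table."""
    return {c.table: (c.regulation, c.max_retention_days)
            for c in inventory if c.max_retention_days is not None}


def security_view(inventory):
    """What the security ADR needs: sensitive tables, highest-value targets first."""
    sensitive = [c for c in inventory if c.data_class != "operational"]
    ranked = sorted(sensitive, key=lambda c: SENSITIVITY[c.data_class], reverse=True)
    return [c.table for c in ranked]
```

Because both views are computed from the same records, adding a table or reclassifying a column changes the retention policy input and the threat model input together, which is the property the shared document is meant to guarantee.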

Data retention decisions constrain every team that builds features touching user data. The product team that wants to add a behavioral analytics feature must understand whether the behavioral data they want to collect falls within the current retention policy's scope, or whether the feature creates a new data category that requires a retention decision. The data team that wants to train a model on historical user behavior must understand whether the retention policy allows the historical data to be used for that purpose, or whether GDPR's purpose limitation principle means that the purpose the data was originally collected for does not extend to model training. Without a documented retention policy, these questions must be escalated to someone who knows the implicit policy, a process that is slow, unreliable, and likely to produce inconsistent answers over time as the people who hold the institutional knowledge change.