2026-07-02 · ~23 min read

The data governance decision record: why the data classification schema you chose determines your GDPR deletion cost and your breach notification surface

Q: What should a data governance ADR document that a privacy policy or terms of service does not?

A privacy policy is a public-facing legal document describing what data is collected, why, and how it is used — written for users and regulators, not engineers. A terms of service is a contract governing user behavior. Neither is an operational engineering artifact. A data governance ADR must document four things that a privacy policy does not. First, the data classification schema: a machine-readable or engineer-queryable registry of which database tables and columns contain personal data, their classification (PII, special-category PII, pseudonymous, anonymized, aggregate), and the business justification for collection — the legal basis under GDPR Article 6. Second, the deletion scope contract: the exact set of tables and columns that must be modified (deleted, nulled, or anonymized) when a GDPR Article 17 right-to-erasure request arrives for a specific user ID, and the cascading deletion order that satisfies foreign key constraints without orphaning records. Third, the breach notification scope: the set of systems containing personal data that must be included in the Article 33 notification assessment when any one of them is breached, including third-party systems that receive data via API or file transfer. Fourth, the third-party data sharing registry: the set of vendors, sub-processors, and integrations that receive personal data, what categories they receive, under what legal basis, and whether they are covered by a Data Processing Agreement — the operational document that the privacy policy's 'we share data with trusted partners' statement is a summary of. A privacy policy cannot substitute for these artifacts because it describes intent, not implementation. The data governance ADR documents the implementation, and the gap between the privacy policy's statements and the implementation is both a regulatory risk and an operational cost.

The data classification schema is decided implicitly when the first user email address enters the database. The decision is not recorded anywhere. Over the next two years, personal data migrates into twenty-three tables added by nine engineers across seventeen sprints, none of whom read the schema decisions that came before them because there were none to read. The first GDPR right-to-erasure request arrives. The engineering team runs a DELETE on the users table and closes the support ticket. Two weeks later the user sends a second message noting that their session logs, their billing address, and their usage analytics are still visible in the admin dashboard. The second remediation takes six engineering-hours and a full schema grep. A data map, at thirty minutes per table when each was created, would have made the erasure immediate and complete the first time. The same gap that determines deletion cost also determines breach notification cost: when a database credential is leaked and rotated within four hours, the team cannot answer the GDPR Article 33 question (does this breach involve personal data, and if so what categories and how many records?) without conducting a schema audit that takes longer than the 72-hour notification window.

A ten-person B2B SaaS startup building a project management tool for engineering teams launched in early 2023 with a PostgreSQL database and three tables: users (id, email, name, created_at), teams (id, name, plan, created_at), and team_members (user_id, team_id, role). The founding sprint did not include any discussion of data classification. The fields were obvious and the team was small enough that everyone knew which fields contained personal data — there was no need to write it down.

Over the following eighteen months, the product grew. The engineering team added a profile table (phone, job_title, timezone, avatar_url, company_name) in sprint 4 when customers asked for richer contact information. They added an activity_events table (user_id, event_type, metadata jsonb, created_at) in sprint 7 when the product team wanted to understand feature usage. In sprint 9, a contractor added session_logs (session_id, user_id, ip_address, user_agent, started_at, last_active_at) to power a "currently active users" feature in the admin dashboard. Sprint 11 brought an integration_configs table (user_id, provider, access_token_encrypted, refresh_token_encrypted, scopes, created_at) after the team added GitHub and Jira integrations. Sprint 13 added billing_addresses (user_id, line1, line2, city, state, postal_code, country) when the team moved from card-on-file to invoice billing for enterprise customers. By sprint 17, the database had twenty-three tables, and the engineering team had grown from three engineers to nine — five of whom had joined after sprint 6 and had no memory of what each field in the older tables was for or which fields contained personal data.

The first GDPR right-to-erasure request arrived in February 2025. A user in Germany sent an email to the support address — the contact listed in the privacy policy — requesting deletion of all their personal data under GDPR Article 17. The support engineer opened a ticket, tagged it as a "data deletion request," and assigned it to the most senior backend engineer on the team. The engineer ran a search of the database for the user's ID, found the users table, and executed a cascade delete from team_members and then users. The ticket was marked resolved.

Fourteen days later, the user sent a second email. They had checked the team's admin dashboard, which remained accessible because their company had an active subscription, and could see their session logs, their billing address, their activity history, and their avatar still appearing in the interface. The senior engineer reopened the ticket and spent ninety minutes auditing the schema to identify which tables contained the user's personal data. They found eight tables beyond the users table with direct user_id foreign keys containing personal data: profiles, activity_events, session_logs, integration_configs, billing_addresses, notification_preferences, api_tokens, and audit_log. They found one table with an indirect reference: support_conversations contained the user's email address as a plain text field in the requester_email column, populated from the Zendesk integration. The audit also discovered that the activity_events table's metadata jsonb column sometimes contained the user's email address as a string when the event_type was "account.email_changed", because the old email was logged in the metadata. The eight direct references could be found by following foreign keys. The requester_email column and the emails embedded in the jsonb metadata could not be found from the schema alone; finding them required reading the application code that wrote to each table.
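The embedded-email case is the kind of personal data a schema inspection misses but a content scan can surface. What follows is a minimal sketch, assuming the event metadata has already been decoded into Python dicts and lists. The function name find_embedded_emails, the sample event, and the regular expression are all illustrative assumptions rather than the team's actual tooling, and the pattern is deliberately loose rather than a full address validator.

```python
import re
from typing import Any, Iterator

# Loose pattern for email-like strings; an illustrative assumption,
# not an RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_embedded_emails(value: Any, path: str = "$") -> Iterator[tuple[str, str]]:
    """Yield (json_path, email) for every email-like string nested in value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from find_embedded_emails(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from find_embedded_emails(child, f"{path}[{index}]")
    elif isinstance(value, str):
        for match in EMAIL_RE.findall(value):
            yield path, match


# Hypothetical metadata for an "account.email_changed" event.
event_metadata = {
    "old_email": "john.doe@example.com",
    "source": "settings_page",
    "history": [{"note": "changed from j.doe@example.org"}],
}
hits = list(find_embedded_emails(event_metadata))
```

A scan like this only finds values that look like emails. It does not replace the data map, but running it over a sample of each jsonb column is a cheap way to check whether the map's annotations are complete.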

The erasure procedure, implemented correctly the second time, took six engineering-hours, including the audit, the deletions, and a review by a second engineer to confirm completeness. The team then spent two additional hours writing a data map document listing all twenty-three tables, which fields in each contained personal data, and in what format, so that future erasure requests could be fulfilled without repeating the audit. The data map took two hours because they had the schema in front of them and because the product was still small enough that the application code could be read in an afternoon. The data map could have been maintained incrementally at thirty minutes per table as each was created: roughly five hours, spread across eighteen months, for the ten tables the audit found to hold personal data. That compares with a six-hour remediation plus a two-hour catch-up audit for the first erasure request, plus the reputational cost of a confirmed non-compliance during the fourteen-day gap between the first and second deletions.

A fifteen-person healthcare-adjacent SaaS startup built a patient intake coordination platform for private medical practices. The product was a web application where patients completed intake surveys before appointments — medical history, current symptoms, lifestyle factors, insurance information — and practice staff reviewed the submissions in a dashboard. The engineering team stored intake survey responses in a PostgreSQL JSONB column: intake_submissions (id, patient_email, practice_id, submitted_at, form_version, responses jsonb). The responses column contained the patient's answers to forty-seven survey questions and thirty-one supplementary fields added by practices through a custom-fields feature. The engineering team had discussed data classification during the founding sprint and agreed that the intake data was "sensitive" and should be encrypted at rest — the database was encrypted and the column values were additionally encrypted using a per-row key derived from the patient's email. This was a correct technical control. The classification discussion ended there. The team did not document which fields in the responses JSONB constituted personal data, which constituted special-category data under GDPR Article 9, or what the legal basis for processing each category was under GDPR Article 6.

In March 2025, the team's database infrastructure engineer discovered that a database read-replica credential — a read-only PostgreSQL user used by the analytics service — had been included in a git commit by a contractor eight months earlier. The commit was in a private repository. A secrets-scanning tool the team added to their CI pipeline flagged the credential. The engineer rotated the credential within four hours of detection. There was no evidence of the credential being used from any IP address other than the team's own infrastructure. The breach appeared to be a credential exposure with no evidence of exploitation.

Under GDPR Article 33, the team needed to determine within seventy-two hours of becoming aware of the breach: whether the breach involved personal data; if so, the categories of personal data concerned; the approximate number of data subjects concerned; and the likely consequences of the breach. The analytics service had read access to two tables via the compromised credential: intake_submissions and practices. The practices table contained practice names, addresses, and billing contacts; the billing contacts, at minimum, are personal data. The intake_submissions table was the question. Was any of the JSONB content in the responses column personal data? Obviously yes: every submission row carried the patient_email, and the responses column contained the patient's own answers. But GDPR Article 9 singles out special-category personal data (data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, genetic data, biometric data for the purpose of uniquely identifying a person, data concerning health, and data concerning a person's sex life or sexual orientation), which carries stricter processing restrictions than ordinary personal data and raises the assessed risk, and therefore the notification stakes, when it is breached.

The team's CTO escalated to their data protection officer contact — an external legal firm retained for GDPR compliance questions. The legal team asked: which of the forty-seven survey questions and thirty-one custom fields collect health data? The engineering team did not know. The form configuration was stored in a separate forms table, and the field labels were practice-specific — a question labeled "Current medications" was clearly health data; a question labeled "Emergency contact name" was not; a question labeled "Activity level" might or might not be considered health data depending on the context in which it was collected. The legal team needed to review all seventy-eight field definitions and make a classification determination for each. This review took twelve days, during which the team could not file the Article 33 notification because they could not answer the required field "categories of personal data concerned." The team filed the notification on day fifteen. The supervisory authority issued a €15,000 fine for the late notification — explicitly noting that the delay was not caused by the breach itself but by the absence of a data classification registry that would have made the notification scope immediately determinable.

The engineering team subsequently spent three weeks building a data classification registry: a structured document listing every field in every form version, its classification (non-personal, ordinary PII, special-category health data, special-category other), the legal basis for processing under GDPR Article 6, and — for special-category fields — the additional condition satisfied under GDPR Article 9(2). This registry, maintained going forward by the engineer who adds each new form field, is the artifact that would have reduced the twelve-day legal review to a thirty-minute document lookup. The €15,000 fine plus three weeks of engineering time to build the registry retroactively is the cost of the governance decision that was not made at the founding sprint.
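A registry of this shape can be very small. The sketch below is one possible layout, not the team's actual artifact: the class names, the form version "v3", and the legal-basis values are hypothetical placeholders rather than legal determinations. It shows the two lookups the breach response needed: which categories are present, and which special-category fields still lack an Article 9(2) condition.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Classification(Enum):
    NON_PERSONAL = "non-personal"
    ORDINARY_PII = "ordinary PII"
    SPECIAL_HEALTH = "special-category health data"
    SPECIAL_OTHER = "special-category other"


@dataclass(frozen=True)
class FormField:
    form_version: str
    label: str
    classification: Classification
    article_6_basis: Optional[str]      # legal basis under Article 6
    article_9_condition: Optional[str]  # needed for special-category fields


# Hypothetical entries; the bases shown are placeholders, not legal advice.
REGISTRY = [
    FormField("v3", "Current medications", Classification.SPECIAL_HEALTH, "6(1)(b)", "9(2)(h)"),
    FormField("v3", "Emergency contact name", Classification.ORDINARY_PII, "6(1)(b)", None),
    # Classified as health data here, but the Article 9(2) condition is
    # still undecided, so the entry is incomplete.
    FormField("v3", "Activity level", Classification.SPECIAL_HEALTH, "6(1)(b)", None),
]


def categories_concerned(registry: list[FormField]) -> list[str]:
    """Distinct personal-data categories present, for the notification form."""
    found = {
        f.classification
        for f in registry
        if f.classification is not Classification.NON_PERSONAL
    }
    return sorted(c.value for c in found)


def incomplete_entries(registry: list[FormField]) -> list[str]:
    """Labels of special-category fields with no Article 9(2) condition."""
    special = {Classification.SPECIAL_HEALTH, Classification.SPECIAL_OTHER}
    return [
        f.label
        for f in registry
        if f.classification in special and not f.article_9_condition
    ]
```

With entries like these, "categories of personal data concerned" becomes a call to categories_concerned rather than a twelve-day review, and incomplete_entries can run in CI so that gaps are found before an incident.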

Structural properties set by the data classification schema choice

Three structural properties are determined when a team decides — explicitly or by omission — whether to maintain a data classification schema. All three are invisible at launch. The first manifests when the first erasure request arrives. The second manifests when the first retention-tier enforcement runs. The third manifests when the first breach occurs.

Property 1: Data classification schema and the deletion scope contract. The deletion scope contract is the set of tables and columns that must be modified to fulfill a GDPR Article 17 right-to-erasure request for a specific data subject. The contract cannot be executed without the classification schema — you cannot enumerate the tables that must be modified without knowing which tables contain personal data. In a simple application with a single users table and a handful of related records, the contract is obvious and can be constructed by inspecting the foreign key graph. In an application with twenty or more tables, JSONB columns containing embedded PII, third-party integrations that cache user data in local tables, and audit logs that capture historical email addresses as string values, the contract is not derivable from the schema alone. It requires reading the application code that writes to each table, understanding what data flows from which user-facing actions, and testing the deletion procedure against a realistic data snapshot to confirm that no personal data remains after execution.
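For the part of the contract that is derivable from the foreign key graph, the deletion order can be computed rather than remembered. This is a minimal sketch using Python's standard graphlib module. The FOREIGN_KEYS mapping is a hypothetical subset of the project-management schema described earlier, and it covers only the foreign-key case: JSONB content and plain-text columns would still have to come from the data map.

```python
from graphlib import TopologicalSorter

# Referencing (child) table -> tables it references via foreign key.
# Hypothetical subset of the schema described earlier.
FOREIGN_KEYS = {
    "users": set(),
    "team_members": {"users"},
    "profiles": {"users"},
    "session_logs": {"users"},
    "billing_addresses": {"users"},
}


def deletion_order(foreign_keys: dict[str, set[str]]) -> list[str]:
    """Order tables so each referencing table is cleared before its parent."""
    # TopologicalSorter emits a node only after its predecessors, so each
    # child is registered as a predecessor of the parent it references.
    sorter = TopologicalSorter()
    for child, parents in foreign_keys.items():
        sorter.add(child)
        for parent in parents:
            sorter.add(parent, child)
    return list(sorter.static_order())


order = deletion_order(FOREIGN_KEYS)
```

The users table always comes last in the result, which is the property that avoids orphaned rows and foreign key violations. A cycle in the graph raises graphlib.CycleError, which is a useful signal that the contract needs a manual nullification step.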

The database vendor decision record documents the database technology and the schema design principles; the data governance ADR must document the deletion scope contract that executes against that schema. The deletion scope contract must be maintained as a living document — updated when each new table is added — not reconstructed from scratch when each erasure request arrives. A deletion scope contract that is derived from the data map incrementally at table-creation time costs thirty minutes per table. A deletion scope contract that is reconstructed from the application codebase at erasure-request time costs hours to days depending on codebase size, and carries the risk of incomplete reconstruction if the engineer responsible for the audit is not the same engineer who wrote the code that populated each table.

The deletion scope contract must also specify the deletion method for each field: hard deletion (the record is removed from the table), nullification (the personal data fields are replaced with null while the record skeleton is retained for referential integrity), or anonymization (the personal data is replaced with synthetic or aggregate values that cannot be linked back to the data subject). Audit logs present a specific challenge: an audit log that records "user john.doe@example.com changed their password" cannot have the email address hard-deleted without destroying the audit trail for the password change event. The deletion scope contract must document whether audit log entries are retained with the email replaced by a pseudonym (a hash of the user ID that links events to a single user without containing an email address), whether the audit log retention period is limited to a window after which all entries are purged (making the question moot for requests that arrive after the retention window), or whether a legal basis exists to retain the audit log entry with identifiable data despite the erasure request. The data retention decision record documents the retention periods and storage tier transitions; the deletion scope contract must be consistent with the retention decisions — if audit logs are retained for seven years for legal compliance reasons, the deletion scope contract must document that audit log entries are excluded from the erasure procedure under GDPR Article 17(3)(b) (processing necessary for compliance with a legal obligation).
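The per-field methods can be recorded as data so that the erasure procedure and the review both read from the same contract. The sketch below is illustrative only: the table and column names (including actor_email and invoices) are hypothetical. The pseudonym uses a keyed hash (HMAC-SHA256) of the user ID rather than the bare hash mentioned above. That is a substitution on my part, made because small sequential IDs are easy to guess and re-hash.

```python
import hashlib
import hmac
from enum import Enum


class Method(Enum):
    HARD_DELETE = "hard_delete"
    NULLIFY = "nullify"
    ANONYMIZE = "anonymize"
    RETAIN = "retain"  # excluded from erasure under a documented exception


# (table, column) -> (method, note). Hypothetical contract entries.
CONTRACT = {
    ("session_logs", "*"): (Method.HARD_DELETE, ""),
    ("billing_addresses", "line1"): (Method.NULLIFY, ""),
    ("audit_log", "actor_email"): (Method.ANONYMIZE, "replace with pseudonym"),
    ("invoices", "*"): (Method.RETAIN, "Article 17(3)(b), legal obligation"),
}


def pseudonym(user_id: int, secret: bytes) -> str:
    """Stable keyed hash of the user ID: links events without an email."""
    digest = hmac.new(secret, str(user_id).encode(), hashlib.sha256).hexdigest()
    return f"user-{digest[:16]}"


def erasure_plan(contract: dict) -> dict:
    """Group contract entries by method so each group is reviewed separately."""
    plan = {method: [] for method in Method}
    for (table, column), (method, _note) in contract.items():
        plan[method].append(f"{table}.{column}")
    return plan
```

Keeping RETAIN as an explicit method, with the legal basis in the note, means an exclusion is a recorded decision rather than an omission.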

Property 2: Retention tier assignment and the storage-cost versus deletion-cost tradeoff. Data that is not assigned a retention tier defaults to indefinite retention in the operational database — hot storage, queryable, backed up, and therefore present in all backup snapshots. This default compounds two costs simultaneously: storage cost (accumulating over time as the dataset grows) and deletion cost (every additional record makes the erasure procedure more expensive because there are more records to find and delete, and because backup snapshots containing the pre-deletion data must eventually be handled). A retention tier assignment — at field or table granularity — specifies the maximum period for which each data category is retained in each storage tier, enabling automated enforcement of the deletion pipeline rather than requiring manual erasure for each request.

The data pipeline decision record documents the data pipeline architecture; the deletion pipeline is a data pipeline component — a scheduled job that identifies records eligible for deletion based on their age and retention tier, executes the deletion, and logs the result. Without a retention tier assignment in the governance ADR, the deletion pipeline has no input: it cannot determine which records are eligible for deletion and which are required to be retained. The deletion pipeline implementation is straightforward once the retention tiers are documented; the difficult part is maintaining the tier assignments as the schema evolves.
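The core of such a job is a comparison between record age and the documented retention period. Here is a minimal sketch with in-memory records standing in for database rows. The RETENTION_DAYS values and the Record type are assumptions for illustration; the real tier assignments would come from the governance ADR.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Table -> maximum retention in days; None means "retain, never auto-delete".
# Hypothetical assignments that the governance ADR would supply.
RETENTION_DAYS: dict[str, Optional[int]] = {
    "session_logs": 90,
    "activity_events": 365,
    "audit_log": None,
}


@dataclass(frozen=True)
class Record:
    table: str
    record_id: int
    created_at: datetime


def eligible_for_deletion(records: list[Record], now: datetime) -> list[Record]:
    """Return records older than their table's retention period.

    A table missing from RETENTION_DAYS raises KeyError: an unclassified
    table is a governance gap and should fail loudly, not be kept silently.
    """
    expired = []
    for record in records:
        limit = RETENTION_DAYS[record.table]
        if limit is not None and now - record.created_at > timedelta(days=limit):
            expired.append(record)
    return expired
```

Raising on an unknown table is a deliberate choice in this sketch. It turns "a table was added without a tier" from invisible drift into a failed job that someone has to look at.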

Retention tiers also determine the breach notification scope by constraining which data is present in each system at any given time. A system that retains user session logs indefinitely has session logs for all users who have ever used the product: a breach of that system exposes session data (including IP addresses, user agent strings, and session timestamps, all of which constitute personal data under GDPR's broad definition) for every user in the product's history. A system that purges session logs after ninety days holds session data only for users who were active in the last ninety days. The breach notification scope (the approximate number of data subjects concerned) is determined by the intersection of which data is in the breached system and what the retention tiers allow to accumulate. The observability strategy decision record documents the logging and tracing infrastructure; observability data (distributed traces, structured logs, error reports) frequently contains user identifiers, request parameters, and, in poorly instrumented systems, personal data values that were accidentally included in log fields. The data governance ADR must document whether observability data is classified and what its retention tier is, because observability systems are a common source of personal data accumulation that is not covered by the primary database's retention policy.

Property 3: Breach notification scope and the personal data inventory. GDPR Article 33 requires notification of personal data breaches to the supervisory authority within seventy-two hours of the controller becoming aware of the breach. The notification must include the categories of personal data concerned. This requirement cannot be fulfilled without a personal data inventory — a registry of which systems contain personal data, what categories, and for how many data subjects. The personal data inventory is a direct output of the data classification schema: if the schema documents which fields are PII and which are special-category PII, the personal data inventory is the aggregation of that documentation by system, with record counts added. The inventory must be updated when data flows between systems change — when a new integration is added, when a new analytics pipeline is configured, when a new third-party vendor is given access to a data export.

The seventy-two-hour window is the constraint that makes the personal data inventory operationally critical rather than administratively useful. A team that has maintained the inventory can assess a breach notification in hours: identify the breached system, look up its personal data classification in the inventory, determine whether special-category data was involved, retrieve the approximate record count, and file the notification with accurate scope information. A team that has not maintained the inventory must conduct the classification review during the breach response — while simultaneously rotating credentials, investigating the breach scope, remediating the vulnerability, and managing stakeholder communication. The classification review adds days of delay to the notification. GDPR regulators have consistently held that late notifications caused by the absence of internal data maps are not excused by the investigation complexity — the absence of the data map is itself the governance failure, not a force majeure event.
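The "hours, not days" path amounts to a lookup. The sketch below assumes the inventory is kept as rows of system, table, categories, and an approximate subject count. The system names and counts are invented for illustration, and the summed count is only an upper bound because the same person can appear in more than one table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class InventoryEntry:
    system: str
    table: str
    categories: frozenset
    data_subjects: int  # approximate count, refreshed periodically


# Hypothetical inventory rows; the counts are made up for illustration.
INVENTORY = [
    InventoryEntry("analytics-replica", "intake_submissions",
                   frozenset({"ordinary PII", "special-category health"}), 12000),
    InventoryEntry("analytics-replica", "practices",
                   frozenset({"ordinary PII"}), 300),
    InventoryEntry("billing", "invoices", frozenset({"ordinary PII"}), 300),
]


def assess_breach(system: str, aware_at: datetime) -> dict:
    """Summarize notification scope for one breached system."""
    rows = [e for e in INVENTORY if e.system == system]
    categories = sorted(set().union(*(e.categories for e in rows)))
    return {
        "tables": [e.table for e in rows],
        "categories": categories,
        "special_category": any(c.startswith("special-category") for c in categories),
        # Upper bound: one person may appear in several tables.
        "data_subjects_at_most": sum(e.data_subjects for e in rows),
        # Article 33 clock: 72 hours from the moment of awareness.
        "notify_by": aware_at + timedelta(hours=72),
    }
```

A usage example would be assess_breach("analytics-replica", aware_at), where aware_at is the timestamp at which the team became aware of the exposure. The result carries the fields the notification form asks for.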

The secrets management decision record documents the credential rotation and secret storage infrastructure; a breach caused by a leaked credential requires knowing, at the moment the credential is rotated, which systems the credential had access to and what personal data those systems contain. This determination cannot wait for a post-incident audit — it must be available within the first few hours of the breach response, when the notification timeline assessment must begin. The logging strategy decision record documents the structured logging infrastructure; access logs for the breached system — which queries ran under the compromised credential, at what times, from what IP addresses — are the primary evidence for determining whether the breach was exploited. If the access logs were not retained (because no retention tier was assigned to them), the team cannot determine whether the credential was used to exfiltrate data, and must notify under a worst-case assumption that all accessible data was exfiltrated. The personal data inventory determines the scope of that worst-case assumption.

What the founding session records and what it omits

The data governance decision is rarely made explicitly. It emerges from four types of sessions that each address a specific narrow question without anchoring the answer to the broader governance framework that would make the answer operationally durable.

The "what fields should we store in the user profile?" session. The team is designing the user model for a new product. They ask what information to collect from users. The session produces a list of fields: email, name, password hash, profile photo URL, company name, timezone, notification preferences. These are sensible fields for a product of this type. The session does not ask two questions. First, which of these fields is personal data under GDPR's broad definition? (All of them are, including timezone, which can be used in combination with other data to narrow identification.) Second, what is the legal basis for processing each one? The email and name are collected with the user's consent for the purpose of account creation, but what is the legal basis for storing the timezone? If it is necessary for contract performance (the product shows localized times), that is GDPR Article 6(1)(b). If it is collected because it might be useful for analytics later, that is legitimate interest under Article 6(1)(f), which requires a legitimate interest assessment. The session that designed the user profile model did not produce a field-by-field legal basis registry. Twelve months later, when the data protection officer asks what legal basis the company uses to process each field, the engineering team cannot answer without reverse-engineering the product's data collection logic against a legal framework they were not trained to apply. The authentication strategy decision record documents the identity data architecture; the data governance ADR must document the personal data classification for the identity layer (the fields collected at registration, their legal basis, and their retention period) because identity data is the anchor for every subsequent deletion scope contract and breach notification scope assessment.
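A field-by-field legal basis registry does not need to be elaborate. In this sketch the field list comes from the session described above, but the basis assigned to each field is a placeholder for illustration only: the real assignments are a legal determination, and None marks fields where nobody has decided yet.

```python
from typing import Optional

# Field -> GDPR Article 6(1) basis, or None where it is still undecided.
# Placeholder assignments for illustration, not legal advice.
USER_PROFILE_LEGAL_BASIS: dict[str, Optional[str]] = {
    "email": "6(1)(a) consent",
    "name": "6(1)(a) consent",
    "password_hash": "6(1)(b) contract",
    "profile_photo_url": "6(1)(a) consent",
    "company_name": "6(1)(f) legitimate interests",
    "timezone": "6(1)(b) contract",
    "notification_preferences": None,
}


def fields_without_basis(registry: dict[str, Optional[str]]) -> list[str]:
    """Fields that are collected but have no documented legal basis."""
    return sorted(name for name, basis in registry.items() if basis is None)


def fields_needing_lia(registry: dict[str, Optional[str]]) -> list[str]:
    """Fields relying on Article 6(1)(f), which calls for a legitimate
    interest assessment."""
    return sorted(
        name for name, basis in registry.items()
        if basis is not None and basis.startswith("6(1)(f)")
    )
```

Checking that fields_without_basis returns an empty list before a migration merges is one way to make the registry a gate rather than a document that drifts.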

The "how do we implement GDPR right-to-erasure?" session. The team has received their first erasure request or is preparing for GDPR compliance ahead of a launch into the EU market. They ask how to implement the right to erasure. The session produces an implementation plan: add a "delete my account" button in the settings page, wire it to an API endpoint that executes a DELETE on the users table, and send a confirmation email. This plan fulfills the most visible part of the erasure obligation — the user's direct records are deleted and they receive confirmation. What it does not address is the full scope of the erasure obligation: all personal data held about the user, not just the primary record. The session that designed the erasure implementation did not enumerate the full set of tables containing personal data for the user, did not design the cascading deletion procedure for related tables, did not address the backup snapshot question (personal data deleted from the live database remains in backup snapshots; GDPR erasure obligations extend to backups unless the retention period for backups is shorter than the deletion confirmation timeline), and did not document the retention exceptions that apply to specific data categories (audit logs retained for legal compliance, financial transaction records retained for tax purposes). The authorization model decision record documents the access control architecture; the erasure procedure must revoke all access tokens, API keys, and session credentials for the deleted user before or concurrent with the record deletion — an access token that remains valid after the user record is deleted creates an authentication state inconsistency that can allow continued access to the product by the deleted user's credentials. The erasure procedure must include the access credential revocation as a required step, not an afterthought.
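The ordering requirement can be made explicit in code. The sketch below uses in-memory dicts as stand-ins for the real stores, so every name in it is hypothetical. The point it illustrates is that credential revocation is a required first step, and that the deletion step refuses to run while a credential for the user is still live.

```python
from typing import Callable

# In-memory stand-ins for the real stores; all names are hypothetical.
api_tokens = {7: ["tok_a", "tok_b"]}
sessions = {7: ["sess_1"]}
personal_rows = {7: {"email": "user@example.com"}}
retained = []  # (user_id, reason) pairs for documented exceptions


def revoke_credentials(user_id: int) -> None:
    api_tokens.pop(user_id, None)
    sessions.pop(user_id, None)


def delete_personal_data(user_id: int) -> None:
    # Guard: refuse to delete while any credential for the user is live.
    if user_id in api_tokens or user_id in sessions:
        raise RuntimeError("credentials must be revoked before deletion")
    personal_rows.pop(user_id, None)


def record_exceptions(user_id: int) -> None:
    retained.append((user_id, "audit_log retained under Article 17(3)(b)"))


ERASURE_STEPS: list[tuple[str, Callable[[int], None]]] = [
    ("revoke_credentials", revoke_credentials),
    ("delete_personal_data", delete_personal_data),
    ("record_exceptions", record_exceptions),
]


def erase_user(user_id: int) -> list[str]:
    """Run the steps in order; an exception halts before later steps."""
    completed = []
    for name, action in ERASURE_STEPS:
        action(user_id)
        completed.append(name)
    return completed
```

The returned list of completed steps doubles as the evidence attached to the support ticket. Backups are out of scope for this sketch and need their own documented handling.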

The "how do we respond to this data breach?" session. The team has discovered a potential security incident: a leaked credential, an unauthorized access log entry, or a vulnerability report from a security researcher. They ask how to respond. The session produces an incident response procedure: identify the scope of the breach, rotate affected credentials, patch the vulnerability, notify affected users, file a regulatory notification if required. The step "file a regulatory notification if required" is where the data governance gap surfaces, because it depends on three answers. Whether personal data was involved at all, which is the threshold for GDPR Article 33, requires the personal data inventory. Which categories of personal data were in the breached system, which shapes the risk assessment behind both the Article 33 notification and the Article 34 notification to affected data subjects, requires the data classification schema. The estimated number of data subjects concerned, a required field in the Article 33 notification, requires the record counts in the personal data inventory. A session that produces an incident response checklist without a pre-existing data map defers all three answers to the post-incident classification review, which takes days and potentially pushes the notification past the 72-hour deadline. The security and threat model decision record documents the threat model and security controls; the data governance ADR must document the personal data inventory alongside the security architecture, because breach response depends on knowing what personal data is in each system as quickly as the team knows what technical systems were involved.

The "what data do we collect from users?" session. The team is preparing a privacy policy, a GDPR Article 30 Record of Processing Activities, or a response to a customer security questionnaire. They ask what data they collect. The session produces a list of data categories: contact information, usage analytics, session data, payment information (tokenized, not stored directly), technical metadata. This list is accurate for the purpose it was drafted — the privacy policy's data collection disclosure. It is not operational: it does not map data categories to specific database tables and columns, does not specify retention periods per category, does not name the third-party sub-processors who receive each category, and does not include the legal basis for each category's processing. The privacy policy's "contact information" category covers at minimum the users table, the profiles table, the billing_addresses table, and the support_conversations table's requester_email column — four distinct schema locations with potentially different retention requirements and deletion procedures. The privacy policy does not distinguish them because the privacy policy's purpose is disclosure, not operational governance. The email infrastructure decision record documents the email service provider selection; the email service provider receives personal data (at minimum the recipient email addresses for every transactional email sent) and constitutes a GDPR sub-processor. The data governance ADR must document the sub-processor registry — the list of vendors who receive personal data, what categories they receive, under what legal basis, and whether a Data Processing Agreement is in place — because this registry is required for responding to DSAR (Data Subject Access Request) questions about data sharing, for the Article 30 Record of Processing Activities, and for the Article 33 notification's "description of the categories of data subjects and categories of personal data concerned."
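The sub-processor registry can be kept as structured data alongside the data map. The vendors, purposes, and legal bases in this sketch are invented placeholders; the category names reuse the privacy-policy categories listed above. It supports the two questions that come up most: who receives a given category, and which vendors lack a Data Processing Agreement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubProcessor:
    vendor: str
    purpose: str
    categories: frozenset
    legal_basis: str
    dpa_signed: bool


# Hypothetical vendors and entries, for illustration only.
SUB_PROCESSORS = [
    SubProcessor("ExampleMail", "transactional email",
                 frozenset({"contact information"}), "6(1)(b)", True),
    SubProcessor("ExampleDesk", "support tickets",
                 frozenset({"contact information"}), "6(1)(b)", True),
    SubProcessor("ExampleMetrics", "product analytics",
                 frozenset({"usage analytics", "technical metadata"}), "6(1)(f)", False),
]


def recipients_of(category: str) -> list[str]:
    """Vendors that receive a given category of personal data."""
    return sorted(s.vendor for s in SUB_PROCESSORS if category in s.categories)


def missing_dpa() -> list[str]:
    """Vendors receiving personal data without a signed DPA on record."""
    return sorted(s.vendor for s in SUB_PROCESSORS if not s.dpa_signed)
```

For a DSAR question about data sharing, recipients_of("contact information") is the answer the support team needs. missing_dpa is the list the next compliance review should start from.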

The WhyChose extractor surfaces the founding user model session, the erasure implementation session, and the breach response planning session from AI chat history. The data governance ADR converts the implicit classification decision in those sessions — which fields were added without a governance annotation — into a documented data map, a deletion scope contract, and a personal data inventory, so that the next erasure request, retention enforcement run, or breach response assessment can be executed against a living document rather than requiring a schema audit that takes longer than the regulatory deadline.

The five sections of a data governance ADR

Section 1: Personal data classification schema and data map maintenance protocol. Document the data classification taxonomy: the categories into which personal data fields will be classified, the criteria for each category, and the labeling convention that will be applied in the schema documentation. A minimum useful taxonomy for a GDPR-regulated product is: non-personal (data that does not identify or relate to an individual — aggregate metrics, system events with no user association), ordinary PII (data that identifies or can identify an individual — name, email, phone, address, IP address, user agent, device fingerprint, national identifier), pseudonymous (data associated with a non-identifying token — behavioral events keyed by user ID, where the user ID is not itself a personal data field but is linkable to the users table), and special-category PII (data in the GDPR Article 9 categories — health, ethnicity, biometric, genetic, political opinion, religious belief, trade union membership, sex life or sexual orientation). Document the maintenance protocol: the requirement that each new table addition to the schema includes a governance annotation in the schema migration or in a companion data map document, the engineer responsible for maintaining the data map (typically the engineer writing the migration), the review step that confirms the annotation is complete before the migration is merged, and the cadence at which the data map is audited against the live schema to catch fields added without annotations. The data warehouse decision record documents the analytics data architecture; the data map must cover the data warehouse schema separately from the operational database schema — analytical tables often contain personal data derived from the operational schema (user-level behavioral aggregates, cohort membership, funnel event sequences) that are not covered by the operational database's retention and deletion procedures. 
The governance ADR must document whether the data warehouse is in scope for erasure requests and, if so, the deletion procedure for data warehouse tables containing personal data.
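The audit step described in Section 1 can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the table and column names, the `DATA_MAP` structure, and the `audit_data_map` helper are all hypothetical, and the live column list is assumed to come from something like a query against the database's schema catalog.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    NON_PERSONAL = "non_personal"
    ORDINARY_PII = "ordinary_pii"
    PSEUDONYMOUS = "pseudonymous"
    SPECIAL_CATEGORY = "special_category"


@dataclass(frozen=True)
class Annotation:
    classification: Classification
    legal_basis: str  # illustrative free text, e.g. "contract" or "consent"


# Hypothetical data map: (table, column) -> governance annotation.
DATA_MAP = {
    ("users", "email"): Annotation(Classification.ORDINARY_PII, "contract"),
    ("users", "plan"): Annotation(Classification.NON_PERSONAL, "n/a"),
    ("events", "user_id"): Annotation(Classification.PSEUDONYMOUS, "legitimate interest"),
    ("intake_survey", "condition"): Annotation(Classification.SPECIAL_CATEGORY, "explicit consent"),
}

# Hypothetical snapshot of the live schema (assumed to be read from the database).
LIVE_COLUMNS = [
    ("users", "email"),
    ("users", "plan"),
    ("users", "last_login_ip"),  # added by a migration without an annotation
    ("events", "user_id"),
    ("intake_survey", "condition"),
]


def audit_data_map(live_columns, data_map):
    """Compare the live schema with the data map.

    Returns (unannotated, stale): columns that exist but have no annotation,
    and annotations whose columns no longer exist.
    """
    live = set(live_columns)
    mapped = set(data_map)
    return sorted(live - mapped), sorted(mapped - live)


unannotated, stale = audit_data_map(LIVE_COLUMNS, DATA_MAP)
print(unannotated)  # [('users', 'last_login_ip')]
print(stale)        # []
```

Running a check like this in CI, or on the audit cadence the ADR names, turns "fields added without annotations" from something discovered during an erasure request into a failed build.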

Section 2: Retention tier assignment and the archival-to-deletion pipeline. Document the retention tiers: the named tiers (hot, warm, cold, deleted), the storage technology for each tier, the access latency for each tier, the cost per GB per tier, and the transition cadence (data moves from hot to warm after X days, warm to cold after Y days, cold to deleted after Z days). Document the retention period for each data category: the maximum number of days that each classified data category is retained in each tier before transition or deletion. For special-category personal data, retention periods must be justified against a specific legal basis: open-ended retention of health data conflicts with GDPR's storage limitation principle (Article 5(1)(e)), so the retention period must be limited to what is necessary for the processing purpose, and any extended retention needs a documented legal basis. Document the deletion pipeline: the scheduled job or process that executes tier transitions and deletions, the input (the data map's retention tier assignments and the record timestamps), the output (records deleted, anonymized, or moved to the next tier), and the audit log that records what was deleted and when for compliance reporting. The background job infrastructure decision record documents the scheduled job execution framework; the deletion pipeline is a scheduled job that must be monitored for completion, with alerting when a deletion job fails. A deletion job that fails silently leaves records past their retention window in place, creating compliance drift that stays invisible until a data map audit is run. The deletion pipeline audit log must be queryable by data subject ID so the team can demonstrate that a specific user's data was deleted within the required timeframe.
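As a rough sketch of the tier logic above, the following assumes that the X, Y, and Z thresholds are expressed as record ages in days, measured from creation. The policy values, record fields, and function names are hypothetical; a real pipeline would move or delete data in the storage layer instead of updating an in-memory dict.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionPolicy:
    warm_after_days: int    # age at which a record leaves the hot tier
    cold_after_days: int    # age at which it leaves the warm tier
    delete_after_days: int  # age at which it must be deleted


# Hypothetical policies per data category; real values belong in the ADR.
POLICIES = {
    "ordinary_pii": RetentionPolicy(30, 90, 365),
    "special_category": RetentionPolicy(30, 60, 180),
}


def target_tier(category, created_on, today):
    """Return the tier a record should be in, based only on its age."""
    policy = POLICIES[category]
    age_days = (today - created_on).days
    if age_days >= policy.delete_after_days:
        return "deleted"
    if age_days >= policy.cold_after_days:
        return "cold"
    if age_days >= policy.warm_after_days:
        return "warm"
    return "hot"


def run_pipeline(records, today):
    """Apply tier transitions in place and return audit log entries."""
    audit_log = []
    for record in records:
        new_tier = target_tier(record["category"], record["created_on"], today)
        if new_tier != record["tier"]:
            audit_log.append({
                "subject_id": record["subject_id"],
                "record_id": record["id"],
                "from": record["tier"],
                "to": new_tier,
                "on": today.isoformat(),
            })
            record["tier"] = new_tier
    return audit_log


records = [
    {"id": 1, "subject_id": "u1", "category": "ordinary_pii",
     "created_on": date(2025, 6, 1), "tier": "cold"},
    {"id": 2, "subject_id": "u2", "category": "ordinary_pii",
     "created_on": date(2026, 6, 20), "tier": "hot"},
    {"id": 3, "subject_id": "u1", "category": "special_category",
     "created_on": date(2026, 4, 1), "tier": "hot"},
]
log = run_pipeline(records, today=date(2026, 7, 2))

# The audit log can be filtered by data subject ID for compliance reporting.
u1_entries = [entry for entry in log if entry["subject_id"] == "u1"]
print(len(log), len(u1_entries))  # 2 2
```

Because every transition writes an audit entry carrying the subject ID, the per-subject query the section asks for is a filter over the log, not a reconstruction from job output.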

Section 3: GDPR/CCPA deletion scope and the erasure procedure. Document the erasure procedure as a numbered checklist: the ordered set of tables to be modified for a given data subject ID erasure request, the modification method for each table (hard delete, nullification, anonymization), the order of operations (child records before parent records where foreign key constraints require it, so that no step fails on a constraint or leaves orphaned rows), the credential revocation step (access tokens, API keys, active session cookies), the backup snapshot handling (whether the erasure confirmation is sent before or after backup snapshots are overwritten by the next backup cycle, and whether on-demand snapshot scrubbing is supported or whether the retention period for backups is short enough that the backup overwrite completes within the erasure confirmation window), and the notification step (the confirmation email to the data subject and the internal ticket closure). Document the erasure request intake procedure: which email address or UI flow accepts erasure requests, what identity verification is required before processing (to prevent unauthorized erasure of another user's data), the maximum response time committed to the data subject (GDPR Article 12(3) requires a response "without undue delay and in any event within one month of receipt"), and the internal escalation path when the request involves data categories that require legal review before deletion (e.g., records retained under the Article 17(3) exceptions). The file storage decision record documents the object storage architecture; files stored in object storage (user-uploaded files, generated exports, profile photos) are personal data when they are associated with an identifiable user. The erasure procedure must include the deletion of files from object storage (deleting the objects from the S3 or GCS bucket and invalidating any CDN cache for publicly accessible files), not only the deletion of the database records that reference those files. 
An erasure that deletes the database record but leaves the file in object storage has not completed the erasure under GDPR.
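One way to derive the table order for the checklist is a topological sort over the foreign key graph, so that child tables are always handled before the tables they reference. The sketch below uses Python's standard `graphlib` module; the table names, the `FOREIGN_KEYS` and `METHODS` mappings, and the wording of the non-database steps are hypothetical placeholders for what the ADR would actually list.

```python
from graphlib import TopologicalSorter

# Hypothetical foreign keys: child table -> parent tables it references.
FOREIGN_KEYS = {
    "sessions": ["users"],
    "profiles": ["users"],
    "support_tickets": ["users"],
    "ticket_attachments": ["support_tickets"],
    "users": [],
}

# Hypothetical modification method per table.
METHODS = {
    "sessions": "hard delete",
    "profiles": "hard delete",
    "support_tickets": "anonymize",
    "ticket_attachments": "hard delete",
    "users": "hard delete",
}


def deletion_order(foreign_keys):
    """Order tables so every child is processed before its parents."""
    sorter = TopologicalSorter()
    for child, parents in foreign_keys.items():
        sorter.add(child)
        for parent in parents:
            # The child is a predecessor of the parent in processing order.
            sorter.add(parent, child)
    return list(sorter.static_order())  # raises CycleError on circular keys


def erasure_checklist(subject_id):
    """Build the ordered checklist for one erasure request."""
    steps = [
        f"{METHODS[table]} rows in {table} for subject {subject_id}"
        for table in deletion_order(FOREIGN_KEYS)
    ]
    steps += [
        "delete the subject's objects from object storage and invalidate CDN cache",
        "revoke access tokens, API keys, and active sessions",
        "record backup snapshot handling for this request",
        "send confirmation to the data subject and close the internal ticket",
    ]
    return steps


order = deletion_order(FOREIGN_KEYS)
for number, step in enumerate(erasure_checklist("user-123"), start=1):
    print(f"{number}. {step}")
```

Deriving the order from the foreign key graph, instead of maintaining it by hand, means a new child table changes the checklist as soon as its keys are registered.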

Section 4: Breach notification scope and the personal data inventory. Document the personal data inventory: the set of systems that contain personal data, organized by system name, the personal data categories present in each system, the approximate number of data subjects represented, and whether the system contains special-category data. The personal data inventory is the primary input to the GDPR Article 33 notification assessment. When a system is breached, the team looks up the system in the inventory, identifies the personal data categories present, reads the record counts, and uses this information to determine: whether the breach involves personal data (if the system contains no personal data, Article 33 does not apply); whether the breach involves special-category personal data (if it does, the likely consequences are more severe and notification to data subjects is more likely to be required); the approximate number of data subjects concerned (from the record counts); and the likely consequences of the breach (derived from the data categories: a breach of contact data has different consequences than a breach of health data or financial data). Document the notification decision tree: the questions the team must answer, in order, to determine whether notification is required and to whom (Article 33 notification to the supervisory authority is required unless the personal data breach is unlikely to result in a risk to the rights and freedoms of individuals; Article 34 notification to affected data subjects is additionally required when the breach is likely to result in a high risk to those rights and freedoms). Document the notification template: a pre-filled Article 33 notification form with the system-level fields already populated from the personal data inventory and placeholders for the breach-specific fields (nature of the breach, measures taken, DPO contact details, likely consequences). 
Having the template prepared with the inventory data pre-filled can reduce the time to complete the notification from days to hours under breach response conditions. The data pipeline decision record documents the data pipeline architecture; personal data that flows through the pipeline (from the operational database to the data warehouse, from the warehouse to analytics sub-processors, from the operational database to email service providers) must be tracked in the personal data inventory as it moves, so that a breach at any point in the pipeline can be assessed against the inventory entry for that system rather than requiring a pipeline audit during the breach response.
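The inventory lookup and the decision tree can be sketched together. In this illustration the system names, counts, and the `assess_breach` helper are hypothetical, and the risk level is deliberately an input: it is the team's judgment, which the inventory informs but does not compute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryEntry:
    system: str
    categories: tuple       # personal data categories present in the system
    approx_subjects: int    # approximate number of data subjects
    special_category: bool  # whether Article 9 data is present


# Hypothetical personal data inventory keyed by system name.
INVENTORY = {
    "app_db": InventoryEntry("app_db", ("contact", "ip_address"), 120_000, False),
    "intake_store": InventoryEntry("intake_store", ("contact", "health"), 8_500, True),
    "metrics_cache": InventoryEntry("metrics_cache", (), 0, False),
}


def assess_breach(system, risk):
    """Walk the decision tree for one breached system.

    risk is the team's assessment: "unlikely", "risk", or "high".
    """
    if risk not in ("unlikely", "risk", "high"):
        raise ValueError(f"unknown risk level: {risk}")
    entry = INVENTORY[system]
    personal = bool(entry.categories)
    return {
        "system": entry.system,
        "personal_data_involved": personal,
        "special_category": entry.special_category,
        "categories": list(entry.categories),
        "approx_subjects": entry.approx_subjects,
        # Article 33: notify unless the breach is unlikely to result in a risk.
        "notify_authority": personal and risk != "unlikely",
        # Article 34: notify data subjects when a high risk is likely.
        "notify_data_subjects": personal and risk == "high",
    }


result = assess_breach("intake_store", risk="high")
print(result["notify_authority"], result["notify_data_subjects"])  # True True
```

The returned dictionary doubles as the pre-filled portion of the notification template: the system-level fields come straight from the inventory, and only the breach-specific fields remain to be written.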

Section 5: Data sharing registry and third-party transfer mapping. Document the sub-processor registry: the list of third-party vendors who receive personal data from the product, what categories of personal data they receive, under what legal basis the transfer is made, whether a Data Processing Agreement (DPA) is in place, and whether the vendor is located outside the EU (triggering additional transfer mechanism requirements under GDPR Chapter V — Standard Contractual Clauses, adequacy decision, or Binding Corporate Rules). The sub-processor registry is required for the Article 30 Record of Processing Activities, for responding to DSAR questions about data sharing, and for the Article 33 notification's scope assessment — if a sub-processor's systems are breached, the controller (the product) is responsible for notifying the supervisory authority about the breach of data the controller transferred to the sub-processor. Document the transfer mechanism for each non-EU sub-processor: the legal mechanism used to transfer personal data outside the EU (Standard Contractual Clauses signed with the vendor, the vendor's participation in an adequacy framework such as the EU-US Data Privacy Framework, or a derogation under GDPR Article 49). Document the DPA status: whether a DPA is in place with each sub-processor, the version signed, the date signed, and the review cadence (DPAs must be updated when the sub-processor's data processing activities change materially). The email infrastructure decision record documents the email service provider; the email service provider is a sub-processor that receives at minimum the recipient email addresses for every transactional email — often including names, and in some implementations the full email content which may contain personal data referenced in the email body. 
The sub-processor registry entry for the email provider must document what personal data is included in the API payloads sent to the provider, whether the provider's DPA covers the data categories sent, and the provider's data retention policy for email content and delivery metadata.
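A registry in this shape can be checked mechanically for the two gaps Section 5 describes: a vendor with no DPA on file, and a non-EU vendor with no documented transfer mechanism. The vendor names, field names, and mechanism labels below are hypothetical; this is a sketch of the check, not a statement of what any real vendor requires.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Illustrative labels for the mechanisms named in the text.
ACCEPTED_MECHANISMS = {"scc", "adequacy", "bcr", "article_49"}


@dataclass(frozen=True)
class SubProcessor:
    name: str
    categories: tuple                 # personal data categories received
    legal_basis: str
    dpa_signed_on: Optional[date]     # None means no DPA on file
    outside_eu: bool
    transfer_mechanism: Optional[str] = None


# Hypothetical registry entries.
REGISTRY = [
    SubProcessor("ExampleMail", ("email", "name"), "contract",
                 date(2025, 3, 1), outside_eu=True, transfer_mechanism="scc"),
    SubProcessor("ExampleAnalytics", ("user_id", "ip_address"), "legitimate interest",
                 date(2025, 5, 12), outside_eu=True),
    SubProcessor("ExampleSupport", ("email", "ticket_body"), "contract",
                 None, outside_eu=False),
]


def registry_gaps(registry):
    """Return (vendor, issue) pairs for entries that need follow-up."""
    gaps = []
    for vendor in registry:
        if vendor.dpa_signed_on is None:
            gaps.append((vendor.name, "no DPA on file"))
        if vendor.outside_eu and vendor.transfer_mechanism not in ACCEPTED_MECHANISMS:
            gaps.append((vendor.name, "no documented transfer mechanism"))
    return gaps


gaps = registry_gaps(REGISTRY)
for name, issue in gaps:
    print(f"{name}: {issue}")
```

Keeping the registry as structured data means the same entries can feed the Article 30 record, the breach scope assessment, and a security questionnaire without being retyped each time.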

None of these five sections appears in the founding session that asked what fields to store in the user profile. The founding session records that email, name, and company are collected. It does not document that all of them are personal data under GDPR, that IP addresses logged in the session table are also personal data, that the intake survey fields might constitute special-category health data, that the email service provider is a sub-processor requiring a DPA, or that a GDPR erasure request requires modifications to eight tables, not just the primary users record. The data governance ADR converts the implicit classification decision (fields added without governance annotation over eighteen months) into a documented data map, a deletion scope contract, a personal data inventory, and a sub-processor registry, so that the next erasure request, retention enforcement run, breach notification assessment, or security questionnaire can be answered in hours rather than days. The WhyChose extractor recovers the founding user model session, the erasure implementation session, the breach response planning session, and the privacy policy drafting session from AI chat history, and surfaces them as the decision record chain that determines whether the team can respond to a regulatory deadline within the required window or must conduct a schema audit that takes longer than the window allows.

FAQs

What is the difference between a data retention decision record and a data governance decision record?

A data retention decision record documents how long each category of data is kept and when it transitions between storage tiers — hot, warm, cold, and deleted. It determines storage cost at scale and the time window during which data is available for recovery or investigation.

A data governance decision record documents the upstream question that makes retention decisions executable: which database tables and columns contain personal data, their classification (PII, special-category PII, pseudonymous, anonymized), the legal basis for collection under GDPR Article 6, who is responsible for maintaining that classification as the schema evolves, and the deletion scope contract that executes against those classifications when an erasure request arrives. Without the classification schema, retention tiers cannot be applied selectively — you cannot purge PII after a 90-day retention window without knowing which fields are PII. The governance ADR also documents the breach notification scope and the third-party data sharing registry, which are entirely absent from a retention decision record.

When does GDPR Article 33 require notification and how does data classification affect the 72-hour deadline?

GDPR Article 33 requires a controller to notify the supervisory authority of a personal data breach without undue delay and, where feasible, not later than 72 hours after becoming aware of it, unless the breach is unlikely to result in a risk to the rights and freedoms of individuals. The notification must include the nature of the breach, the categories and approximate number of data subjects concerned, the categories and approximate number of personal data records concerned, the name and contact details of the data protection officer or other contact point, the likely consequences, and the measures taken or proposed.

A team with a maintained data classification map can answer every required field within hours of discovering a breach: look up the breached system in the personal data inventory, read the personal data categories and record counts, assess the likely consequences based on the data categories, and file the notification. A team without a data map must conduct the classification review during the breach response (determining which fields are personal data, whether any are special-category data under Article 9, and estimating record counts from query results) while simultaneously rotating credentials, patching the vulnerability, and managing stakeholder communication. For a system of non-trivial complexity, this review can take days. A notification filed after 72 hours must be accompanied by reasons for the delay, and an unjustified delay is itself a GDPR violation; the absence of a data map is not a mitigating factor.

What should a data governance ADR document that a privacy policy or terms of service does not?

A privacy policy is a public-facing legal disclosure document — accurate for its purpose but not operational. A data governance ADR must document four things a privacy policy cannot: first, the data classification schema — a field-by-field registry of which database columns contain personal data, their classification, and the legal basis for processing each; second, the deletion scope contract — the exact set of tables and columns modified when a GDPR erasure request arrives and the cascading deletion order; third, the breach notification scope — which systems are included in an Article 33 assessment and the pre-filled notification template from the personal data inventory; fourth, the sub-processor registry — the list of vendors who receive personal data, what categories, under what legal basis, and whether a DPA is in place.

A privacy policy says "we collect contact information." A data governance ADR says "contact information is stored in four tables (users, profiles, billing_addresses, and support_conversations, in its requester_email column) with retention periods of 90 days, 1 year, 7 years (tax compliance), and 3 years respectively, and is transferred to SendGrid (email), Stripe (billing), and Zendesk (support) under DPAs signed on these dates." The gap between these two statements is the operational cost that accumulates when erasure requests, retention enforcement, and breach response use the privacy policy as their only source of truth.
