The database vendor decision record: why the database you chose determines your schema migration risk and your scaling ceiling
Database vendor selection looks like a technical default until a MySQL 5.7 application discovers window functions three years later — or a PlanetScale application discovers that foreign key constraints were never enforced. The SQL feature availability ceiling, the referential integrity enforcement model, and the horizontal scaling architecture are set at vendor selection time, before the team has the production experience that makes them visible. The vendor you chose at prototype time determines which analytics queries are unavailable without an engine upgrade, whether DELETE scripts silently create orphan rows, and what the migration cost looks like when the product actually needs to scale.
This post covers database vendor selection: the structural consequences of choosing PostgreSQL versus MySQL versus PlanetScale versus CockroachDB versus SQLite. It is explicitly different from the database migration strategy decision record, which addresses the tooling and process for applying schema changes safely — the expand-contract pattern, DDL locking behavior, large-table migration policy — rather than the vendor. You can choose the right migration tooling and still choose a vendor whose SQL feature availability, referential integrity model, or scaling architecture creates a multi-week remediation project two years after setup. This post covers the vendor decision.
A 12-person B2B analytics startup chose MySQL because the founding CTO's previous company had used MySQL and the DigitalOcean "How to set up a Rails app" tutorial used MySQL. The setup session covered configuring the Rails database.yml for MySQL, running the first migration, and confirming that the application could read and write records. Nothing in the tutorial or the setup session covered which version of MySQL was being installed, which SQL features that version supported, or what the upgrade path looked like. The instance was MySQL 5.7 — the current stable release at the time.
Three years later, the analytics team was building a dashboard showing cumulative activation events per user cohort over time. The feature required calculating running totals: for each user, the sum of activation events up to and including each date. The correct SQL for this is a window function: SUM(events) OVER (PARTITION BY user_id ORDER BY event_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). A developer opened a ChatGPT session to write the query and received a correct window function formulation. The application test failed with MySQL error 1064: You have an error in your SQL syntax near OVER (PARTITION BY. Window functions are available in MySQL 8.0, released in 2018. The application was running MySQL 5.7.
The developer opened a second ChatGPT session: "running totals in MySQL without window functions." The alternatives involved a self-join (SELECT a.user_id, a.event_date, SUM(b.events) FROM events a JOIN events b ON b.user_id = a.user_id AND b.event_date <= a.event_date GROUP BY a.user_id, a.event_date) or a correlated subquery. Both worked on development data with 200 rows. On production with 3.2 million event rows, the self-join query took 31 seconds. The window function equivalent on any MySQL 8.0 instance would have taken under 400 milliseconds. The developer added EXPLAIN output to another ChatGPT session, tuned indexes for two days, and got the self-join down to 8 seconds — still unusable for a user-facing dashboard, shipped as a nightly batch report instead.
The MySQL 5.7 to 8.0 upgrade required three weeks. The upgrade path on RDS was an in-place major version upgrade, which required: auditing all 340 application queries and 23 stored procedures for compatibility with MySQL 8.0 behavioral changes (approximately 40 documented incompatible changes, including new reserved keywords added in 8.0 — GROUPS, CUBE, ROLLUP, LATERAL — that conflicted with existing column names in two tables; changed default authentication plugin from mysql_native_password to caching_sha2_password that broke two older service clients; changed SQL mode enabling ONLY_FULL_GROUP_BY that rejected 11 existing GROUP BY queries); a maintenance window for the RDS major version upgrade that was unavailable for 18 minutes during the upgrade procedure; and regression testing across the full application. The window function query that had been available for three years was deployed the day after the upgrade completed. The analytics dashboard feature that was blocked for two months shipped the following morning.
The MySQL 5.7 version constraint was never documented. The selection rationale ("the CTO's last company used MySQL, the tutorial used MySQL") was never written down. The SQL features the application would eventually need — CTEs, window functions, JSON_TABLE — were never compared against the version's feature set at selection time. The upgrade revealed not just the analytics feature but also that the WITH ... AS CTE syntax, added in MySQL 8.0, had been unavailable for the same three years. Six other queries in the codebase had been written as nested subqueries specifically to avoid CTEs, based on a Stack Overflow comment from a MySQL 5.7-era answer. The workaround complexity had accumulated invisibly. See the pattern in decisions never written down.
The second incident was independent and more consequential for data integrity. A 9-person startup chose PlanetScale after the founding CTO saw a conference talk about PlanetScale's branching workflow for schema changes. The branching model was genuinely impressive: create a branch from production, apply schema changes on the branch, open a deploy request, merge it — zero-downtime schema migrations with a review workflow that prevented accidental production changes. The ChatGPT setup session covered connecting the Rails application to PlanetScale, creating the first deploy request for the initial schema migration, and understanding the branch-based workflow. The session was thorough documentation of how PlanetScale's tooling works. Nothing in the session covered PlanetScale's foreign key constraint model.
The schema migrations included add_foreign_key :comments, :users, add_foreign_key :projects, :teams, and five other foreign key declarations written in standard Rails migration syntax. All six migrations ran on PlanetScale via deploy requests without error. The Rails schema.rb file showed all six add_foreign_key declarations. The application code assumed referential integrity was enforced: the User.find! calls throughout the codebase raised ActiveRecord::RecordNotFound on missing users, which the team treated as a programming error, not as a data integrity error they would encounter in production with valid IDs.
Five months after launch, an engineer wrote a data cleanup script to remove users who had not logged in for more than a year. The script ran in a Rails task: User.where('last_active_at < ?', 1.year.ago).delete_all. It was tested on staging, where it ran without errors and removed 23 records. On production, it deleted 4,200 users. Their user_id references remained in the comments, projects, activity_logs, team_memberships, and notifications tables — all pointing to deleted users. No cascade happened. No database error was raised. PlanetScale runs on Vitess, which does not enforce foreign key constraints. The add_foreign_key declarations in the Rails migrations were accepted by PlanetScale's migration tooling but no underlying database constraint was created. INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS returns zero rows on any PlanetScale database.
The application discovered the orphaned rows 11 hours later via error monitoring, when a user browsed to a page that loaded comments from deleted accounts. User.find!(comment.user_id) raised ActiveRecord::RecordNotFound for 14% of loaded comment sets. The next four days involved identifying all affected rows (a LEFT JOIN audit across six tables), deciding which orphaned records could be safely deleted versus which needed a placeholder "deleted user" record, writing migration scripts with PlanetScale deploy requests, and deploying application code changes to handle the missing-user case gracefully. The referential integrity had been application-managed from day one — the team simply had not known that, because nothing in the setup session, the PlanetScale documentation linked from the setup session, or the migration tooling output had surfaced the constraint.
PlanetScale's documentation does document the foreign key limitation. It appears in a "Limitations" page under their documentation. The setup sessions that produce PlanetScale deployments almost never include that limitations page — because the session goal is to get the application connecting and the first migration running, and both succeed without error regardless of whether FK constraints are in scope.
The three structural properties that database vendor selection determines
When teams evaluate databases, the conversation focuses on ecosystem fit (which database does the ORM recommend?), operational familiarity (what has the team worked with before?), and time to first working query. These are real evaluation criteria. The structural properties that determine whether the selection ages well — whether the SQL feature set covers the analytics queries the team will need in year two, whether referential integrity is enforced at the layer that prevents data corruption scripts, and whether the scaling architecture matches the product's actual growth trajectory — are set at selection time and are rarely revisited until one of them causes an incident.
SQL feature availability and version ceiling
The SQL feature set is not just a property of the database engine family but of the specific version deployed and whether the team has a policy for major version upgrades. A PostgreSQL deployment is almost never blocked by a missing SQL feature — PostgreSQL has supported window functions since version 8.4 (2009), CTEs since 8.4, lateral joins since 9.3, and JSONB since 9.4. Any PostgreSQL deployment running a version from the last decade supports the full feature set most applications use. PostgreSQL's annual major version release cycle with five-year support windows gives teams clear visibility into the upgrade path and the feature availability at each version.
MySQL's feature availability is more stratified across versions. MySQL 5.7, which was the default installation for a significant portion of the decade from 2015 to 2023, is missing window functions, CTEs, and Instant DDL for most ALTER TABLE operations — all of which are available in MySQL 8.0 and in PostgreSQL since 2009. MySQL 5.7 reached end of life in October 2023, meaning applications still on 5.7 after that date receive no security patches. The upgrade from 5.7 to 8.0 is a major version upgrade with approximately 40 documented behavioral incompatibilities, making it a non-trivial testing effort. Many teams that chose MySQL in 2016–2019 are running 5.7 installations that have never been upgraded, carrying both a security liability and a SQL feature ceiling that prevents modern query patterns.
The specific features that matter are the ones that become load-bearing as the product matures. Window functions (ROW_NUMBER(), RANK(), LAG(), LEAD(), SUM() OVER()) are the standard tool for analytics queries involving rankings, running totals, and cohort analysis — queries that appear in year two when the product has data worth analyzing and someone builds a reporting feature. CTEs are the standard tool for complex multi-step queries and recursive queries. Both are features that applications work around with subqueries and self-joins — at significant query performance cost — if they are unavailable at the database version in use.
Beyond window functions and CTEs, the feature gap that matters most between PostgreSQL and MySQL is in JSON support and full-text search. PostgreSQL's JSONB type supports GIN indexing (which enables fast containment queries: jsonb_column @> '{"key": "value"}' uses a GIN index), a rich operator set, and functions like jsonb_to_recordset() for treating JSON arrays as relational result sets. MySQL's JSON type, available since 5.7, lacks GIN-style indexing on JSON content (MySQL 8.0 adds multi-valued indexes on JSON arrays, but it is not the same model), and the JSON operator syntax is different enough from PostgreSQL's that ORM-level JSON queries are not portable between the two. PostgreSQL's full-text search via tsvector, tsquery, and GIN indexes is a complete solution for relevance-ranked text search with language stemming — teams use it successfully to avoid a dedicated search infrastructure decision until their corpus reaches millions of documents. MySQL's FULLTEXT index supports full-text search but without PostgreSQL's ts_rank relevance scoring, without stemming by default, and with a minimum word length that excludes short technical terms without configuration.
The version policy for the database is as important as the initial vendor choice. A PostgreSQL selection without a documented major version upgrade cadence has the same problem as a MySQL 5.7 installation: the application accumulates SQL workarounds for features that are available in the current version but not in the deployed version, and the upgrade accumulates unexamined behavioral changes. The version policy — current version, minimum required version for the features in use, EOL date, upgrade procedure, and test plan for behavioral changes — belongs in the database selection ADR, not in a future sprint ticket.
Referential integrity enforcement model
Whether the database enforces foreign key constraints at the engine level is not a configuration detail — it is a structural property of the database vendor choice that determines whether DELETE and UPDATE operations on parent tables can produce orphan rows in child tables without any error being raised.
PostgreSQL enforces foreign key constraints by default. ALTER TABLE child ADD CONSTRAINT fk_parent FOREIGN KEY (parent_id) REFERENCES parent(id) creates a real constraint that the database engine enforces on every INSERT into child and on every DELETE or UPDATE of a referenced row in parent. Attempting to delete a parent row that has child rows raises ERROR: update or delete on table "parent" violates foreign key constraint "fk_parent" on table "child". The application never sees a null reference from a deleted parent — the database rejects the delete. PostgreSQL also provides cascade options (ON DELETE CASCADE, ON DELETE SET NULL, ON DELETE RESTRICT) for automated handling of the cascade. The NOT VALID option enables adding FK constraints to large tables without scanning existing rows, validating only new rows — making it safe to add constraints to tables with millions of rows without a full table scan. The database migration strategy decision record covers this pattern: adding a constraint as NOT VALID first, then running VALIDATE CONSTRAINT separately as a less-blocking operation.
MySQL InnoDB enforces foreign key constraints. The InnoDB storage engine, which has been the default MySQL storage engine since MySQL 5.5, implements FK constraints and cascade behavior equivalently to PostgreSQL. The global variable foreign_key_checks can be set to 0 to disable constraint checking session-wide — a practice that is sometimes used during bulk data imports but which is dangerous in application code paths. MariaDB, the MySQL fork, also enforces FK constraints via InnoDB.
SQLite presents a subtle failure mode. SQLite parses and stores FOREIGN KEY declarations in the schema, but it does not enforce them unless the pragma PRAGMA foreign_keys = ON is set for the database connection. The default is off. This means that a SQLite database schema can declare foreign key relationships and the application ORM can include foreign key declarations in migration files — and the constraints will silently not be enforced unless every database connection in every process explicitly sets the pragma at connection open time. SQLite's documentation is clear about this behavior, but it is counterintuitive enough that many teams discover it only when a data integrity issue surfaces. ORM configuration for SQLite FK enforcement: Rails/Active Record sets the pragma via the foreign_keys: true pragma in database.yml; Sequelize requires explicit connection hook configuration; SQLAlchemy uses event.listen(engine, "connect", lambda c, r: c.execute("pragma foreign_keys=ON")). Teams using SQLite must verify that every connection that reaches the database has the FK pragma enabled — including background job workers, analytics queries, and data import scripts that may use a separate database connection.
PlanetScale does not enforce foreign key constraints. As described in the opening narrative, PlanetScale runs on Vitess — a sharding layer for MySQL designed for horizontal scale-out. Cross-shard foreign key enforcement is architecturally incompatible with Vitess's sharding model: a parent row and its child rows can reside on different shards, and enforcing a foreign key constraint across shards would require distributed transaction coordination that defeats the throughput benefits of sharding. PlanetScale accepts FOREIGN KEY syntax in schema migrations (they are parsed and stored in the schema metadata) but does not create database-level constraints. Rails' add_foreign_key migration runs without error. No constraint exists. All referential integrity is application-managed. This is documented in PlanetScale's official documentation under their list of limitations. The appropriate detection procedure for applications already running on PlanetScale is a LEFT JOIN audit query for each parent-child relationship: SELECT count(*) FROM child LEFT JOIN parent ON child.parent_id = parent.id WHERE parent.id IS NULL. Teams that discover unexpected orphan counts from this query should audit every DELETE and UPDATE code path in the application for whether it handles the missing-parent case.
CockroachDB enforces foreign key constraints, including across geo-partitioned data. The enforcement works because CockroachDB's distributed transaction model guarantees strong consistency across nodes — a FK constraint check is just another read in the transaction. The latency implication is that FK constraint checks for geo-partitioned tables can add cross-region read latency to writes: if the parent table is in us-east and the write is in eu-west, the FK check reads the parent row from us-east before confirming the child row write in eu-west. This is typically 80–150ms of additional write latency for cross-region FK checks, which is acceptable for most application writes but must be documented so that teams don't discover it as a performance regression after deploying geo-partitioned tables.
The referential integrity model belongs in the database selection ADR as an explicit property, not as a footnote. For every team running on PlanetScale or SQLite, the ADR should state: "FK constraints are not enforced at the database level; referential integrity is application-managed; the following validation requirements apply to all DELETE and UPDATE scripts against parent tables." See the data retention decision record for the intersection of referential integrity enforcement with data archival and deletion procedures — bulk deletion scripts that run correctly under PostgreSQL FK enforcement may produce orphan rows silently under PlanetScale.
Scaling architecture and migration cost
Database scaling is a problem that most SaaS products do not encounter at the scale at which it is discussed in database selection conversations. The discourse around "what happens when you need to scale?" encourages teams to evaluate distributed databases at prototype stage, before they have evidence of the throughput requirements that would justify the operational cost. Understanding the actual scaling ceiling of each option — and the migration cost when that ceiling is reached — produces better decisions than abstract discussions of horizontal scalability.
PostgreSQL on managed infrastructure scales vertically to sizes that handle the vast majority of SaaS products through their Series B. RDS db.r6g.16xlarge provides 64 vCPUs and 512 GB of RAM. A well-indexed PostgreSQL instance on that hardware can handle tens of thousands of queries per second and store billions of rows across multiple tables. Adding read replicas offloads read traffic from the primary: Aurora PostgreSQL supports up to 15 read replicas with sub-10ms replica lag, and read replicas can serve analytics queries and background job reads without impacting primary write throughput. PgBouncer in transaction mode can multiplex thousands of application connections onto hundreds of database server processes, extending the connection capacity ceiling without changing the underlying infrastructure. The vertical scaling ceiling — the point at which a single primary can no longer handle the write throughput — for a well-managed PostgreSQL deployment is typically reached at write rates exceeding 50,000–100,000 transactions per second on large instance types. Few SaaS products ever reach that threshold.
Aurora PostgreSQL and Aurora Serverless v2 extend PostgreSQL with cloud-native features: Aurora Global Database for multi-region active-active and active-passive topologies (typically 20–40ms replication lag to secondary regions), automated storage scaling (up to 128 TB per cluster without manual volume management), and Aurora Serverless v2 for auto-scaling compute that responds to traffic spikes without manual instance resizing. These features are layered on top of standard PostgreSQL without changing the wire protocol, the SQL dialect, or the FK enforcement behavior. Aurora's main limitation is the storage I/O pricing model: Aurora charges per I/O operation at $0.20 per million IOs on standard storage, which can be significant for write-heavy workloads; Aurora I/O-Optimized pricing (flat per-cluster-hour with no per-IO charges) is available and often more predictable for write-intensive applications.
PlanetScale's Vitess-based horizontal sharding is designed for write throughput that exceeds what a single MySQL primary can handle — workloads at the scale of Slack's and GitHub's message stores, where the write rate is millions of rows per second across a horizontally sharded table space. For applications at startup scale, PlanetScale's sharding capability is not the feature being used: most PlanetScale customers run on a single-shard deployment that functions identically to a hosted MySQL instance, with the additional developer experience of the branching and deploy request workflow. The FK constraint absence is a cost of the sharding architecture that applies at single-shard scale too — the FK constraints are absent from deployment day one, regardless of whether sharding is ever used.
CockroachDB's distributed transaction model provides genuine horizontal write scaling: multiple nodes, each holding a portion of the data, coordinate writes via the Raft consensus protocol. A CockroachDB cluster of three nodes in the same region handles write coordination with approximately 2–5ms of additional latency per transaction versus a single-node PostgreSQL write. A CockroachDB cluster with geo-distributed nodes for data residency adds 80–200ms of Raft coordination latency for cross-region writes. The appropriate use case for CockroachDB at startup stage is when there is a contractual or regulatory data residency requirement that mandates that specific rows (e.g., EU customer data) are stored only on EU hardware, with strong consistency guarantees — CockroachDB's geo-partitioned tables make this tractable without application-level sharding logic. The inappropriate use case is choosing CockroachDB preemptively because "we might need horizontal scale someday" — the Raft coordination overhead, the 5% PostgreSQL compatibility gap, and the operational complexity of a distributed cluster are costs the product pays from day one, not only when the scale is needed.
The migration cost from one database vendor to another is a useful forcing function for evaluating whether the scaling rationale is real. Migrating from PostgreSQL to CockroachDB: 95% SQL compatibility means most queries work without changes; the 5% that doesn't includes specific syntax, some pg_catalog queries used by frameworks, and plpgsql stored procedure constructs; a migration project for a 200GB PostgreSQL database with a dozen services is typically 6–12 weeks including schema compatibility audit, ORM compatibility testing, dual-write validation, and cutover procedure. Migrating from MySQL to PostgreSQL: the SQL dialects are different enough that query-level migration tooling exists but requires per-query review; stored procedures require rewriting; the migration project for a comparable database is typically 8–16 weeks including query audit, stored procedure rewrite, data migration, and application testing. Migrating from PlanetScale to standard MySQL: straightforward if no PlanetScale-specific features are used (branching workflow is build-time, not runtime); the FK constraint absence may have produced orphan rows that must be cleaned up before FK enforcement can be added on the target MySQL instance. These migration costs are worth estimating before making the initial selection, not after the migration becomes necessary.
The options
PostgreSQL
PostgreSQL is the most feature-complete open-source relational database. It supports window functions (since 8.4, 2009), CTEs and recursive CTEs (since 8.4), JSONB with GIN indexing (since 9.4), array types, range types, advisory locks, LISTEN/NOTIFY for asynchronous notifications, full-text search via tsvector/tsquery/GIN, lateral joins, and an extension ecosystem that includes pgvector for vector similarity search, pg_trgm for trigram similarity and fuzzy text search, and PostGIS for geospatial data. Foreign key constraints are enforced by default. The annual major version release cycle with five-year support windows gives teams clear version management. Managed hosting options include RDS PostgreSQL, Aurora PostgreSQL, Supabase (PostgreSQL plus a built-in REST API, realtime, auth, and storage layer), Neon (PostgreSQL with branching for development/test environments and auto-suspend at zero traffic), Railway, and Render. Advisory locks and LISTEN/NOTIFY are used by PostgreSQL-backed background job systems like Oban (Elixir), GoodJob (Ruby), and Que (Ruby) for distributed lock coordination and job notification — these features are PostgreSQL-specific and unavailable on MySQL-family databases, which means switching PostgreSQL-backed job queues to a MySQL database requires replacing the job queue library.
MySQL and MariaDB
MySQL has the widest hosting support of any relational database — nearly every shared hosting, VPS provider, and PaaS platform supports MySQL. InnoDB enforces foreign key constraints. MySQL 8.0 adds window functions, CTEs, Instant DDL for many ALTER TABLE operations, improved JSON support (JSON_TABLE(), multi-valued indexes on JSON arrays), and the caching_sha2_password authentication plugin. MySQL 5.7 reached end of life in October 2023. Applications on MySQL 5.7 should be treated as a security liability and an upgrade priority, not as a stable baseline. The upgrade from 5.7 to 8.0 has approximately 40 documented behavioral incompatibilities that require application review. MariaDB is a MySQL-compatible fork under the GPLv2 license that diverges from MySQL at specific features (MariaDB temporal tables, MariaDB-specific JSON extensions) while maintaining query-level compatibility for most applications. MySQL is appropriate for applications already running MySQL, legacy CMSes and frameworks (WordPress, Drupal, Laravel tutorials that default to MySQL), and teams with existing MySQL operational expertise where the 8.0 feature set is adequate for the application's query requirements.
SQLite
SQLite is a serverless embedded database — the database runs as a library within the application process with no separate server process, no network protocol, and no connection management overhead. WAL (Write-Ahead Logging) mode enables concurrent readers alongside a single writer, making it suitable for read-heavy application workloads. SQLite supports window functions from version 3.25.0 (2018) and CTEs from 3.8.3 (2014). Foreign key constraints are enforced only when PRAGMA foreign_keys = ON is set per connection — the default is off, and ORM configurations must explicitly enable this pragma. SQLite is appropriate for local-first applications, desktop software, edge deployments, and single-process applications with low write concurrency. Turso provides a hosted distributed SQLite platform. Litestream provides continuous streaming replication of a SQLite database to S3 as a lightweight backup solution. SQLite is not appropriate for multi-process application servers without careful WAL mode configuration and explicit FK pragma management — the single-writer constraint becomes a bottleneck under concurrent write load, and the FK-off default produces silent data integrity gaps if not explicitly managed in every connection path.
PlanetScale
PlanetScale provides a MySQL-compatible hosted database built on Vitess, Google's open-source MySQL sharding layer. The developer experience features — branching for schema changes, deploy requests for migration review and rollout, query insights for performance analysis — are genuinely best-in-class. PlanetScale's branching workflow solves a real problem: schema migrations on a branch that can be reviewed, tested, and merged with zero-downtime deployment, without requiring external migration tooling like gh-ost or pt-online-schema-change. Foreign key constraints are not enforced — this is a documented Vitess limitation that cannot be changed without abandoning the horizontal sharding architecture. PlanetScale is appropriate for applications with genuine extreme write throughput requirements (millions of rows per second across sharded tables) where the FK constraint absence is an accepted tradeoff, or for applications that evaluate the branching workflow as worth the FK limitation tradeoff and implement application-level referential integrity consistently. Applications that select PlanetScale primarily for the developer experience without evaluating the FK constraint model are choosing application-managed referential integrity without realizing it.
CockroachDB
CockroachDB is a distributed SQL database with a PostgreSQL-compatible wire protocol. SERIALIZABLE isolation is the default and only isolation level — stronger than PostgreSQL's READ COMMITTED default, which eliminates certain read anomalies but produces more transaction retry errors under high contention. Foreign key constraints are enforced, including for geo-partitioned tables at the cost of cross-region read latency for the FK check. CockroachDB's PostgreSQL compatibility is approximately 95%: most PostgreSQL applications connect and query CockroachDB without code changes; the 5% includes COPY TO STDOUT, certain pg_catalog queries, some plpgsql constructs, and sequence behaviors in specific contexts. CockroachDB Serverless provides a pay-per-use model appropriate for development and low-traffic applications; CockroachDB Dedicated provides a dedicated cluster for production workloads. The geo-partitioned replicas feature enables data residency compliance by pinning specific rows (by row locality) to specific geographic regions, with strongly consistent reads from the closest replica — a genuine capability for GDPR EU data residency requirements at scale. CockroachDB is appropriate when horizontal write scaling is a demonstrated requirement or when geo-partitioned data residency with strong consistency is a contractual need; it is not appropriate as a preemptive selection at prototype scale where a single PostgreSQL primary would handle the load with orders of magnitude of headroom.
The AI chat sessions that produce undocumented database vendor decisions
Database vendor selection happens in four types of ChatGPT and Claude sessions. Each session closes with a working outcome. Each session's reasoning disappears. The combination produces a deployed database whose version ceiling, referential integrity model, and scaling architecture are undocumented.
The initial setup session is the one that selects the vendor. "What database should I use for a Rails app?" or "How do I set up a Node.js app with a database?" The response recommends a vendor based on the tutorial ecosystem (MySQL for many Rails and PHP tutorials, PostgreSQL for most recent Node.js and Python tutorials), the team's stated familiarity, or the stack the responder happens to know. The session covers installing the database, configuring the ORM connection, running the first migration, and verifying that the application can read and write. It closes when the first query returns results. The version pinned at that moment is whatever was current at the time. The SQL features that version supports are not enumerated. The FK enforcement model is not discussed. The scaling ceiling is not projected. Nothing in the session output is wrong — but nothing about the structural consequences of the choice appears anywhere the team will find it two years later when the consequences arrive.
The ORM migration setup session comes next. "How do I use [ORM] migrations with [database]?" This session sets up the migration framework — Flyway, Liquibase, Rails migrations, Alembic, Django migrations — and establishes the conventions for writing migrations. Foreign key declarations are added to migration files as a matter of course, because ORMs make it easy and because the team expects them to be enforced. On PostgreSQL and MySQL/InnoDB, they are enforced. On PlanetScale, the migration files succeed without error and the constraints don't exist. On SQLite without the FK pragma, they also don't exist. Nothing in the session output identifies whether the FK declarations are real — because on PostgreSQL they are, and most sessions assume PostgreSQL-equivalent behavior.
The analytics feature session is the one that discovers the version ceiling. "How do I calculate a running total in MySQL?" or "How do I rank users by activity?" The response correctly suggests window functions. The application throws a SQL syntax error because it's on MySQL 5.7. A follow-up session works around the limitation. The version that introduced window functions is mentioned in passing in the response, but the connection between the constraint ("your version is too old") and the remediation plan ("upgrade to 8.0 on this schedule with this test plan") is never made explicit and never written down. The workaround ships. The upgrade becomes a deferred item that, if it ships at all, takes three weeks instead of one because the compatibility checklist was never prepared.
The data cleanup session is the one that discovers — or silently avoids discovering — the referential integrity gap. "How do I delete inactive users?" The SQL is correct. The test passes on staging. The production run deletes parent rows without cascade. Whether orphan rows are produced depends entirely on the FK enforcement model, and whether the team discovers them depends on whether application error monitoring fires within the staleness window. Teams running PostgreSQL never see the orphan row scenario from a correctly written delete script. Teams running PlanetScale see it eventually — the question is whether they see it during a controlled test run or during a production incident. The session that wrote the delete script never asked "does this database enforce FK constraints?" because the team believed the FK declarations in the migration files meant the constraints were enforced. The decision that the constraints were real was never made — it was assumed from the ORM syntax, and the assumption was wrong.
Writing the database vendor ADR
The database vendor ADR has five sections. None are derivable from the schema files or the ORM configuration in the codebase. Each represents a decision made at selection time that every future migration, analytics feature, and scaling conversation will depend on.
The first section is the vendor selection with alternatives evaluated. Not just "we chose PostgreSQL because the team knows it" but: why MySQL was evaluated and rejected (specific SQL feature gap in version at time of evaluation, or specific operational concern), why CockroachDB was evaluated and rejected at this stage (vertical scaling ceiling not yet reached, Raft coordination overhead not justified by current write throughput, 5% compatibility gap introduces application code risk without a demonstrated need for the horizontal scale), and which managed hosting layer was chosen and why. The rejection reasons are the section that prevents the next engineer from re-opening the vendor evaluation from scratch when a scaling concern appears.
The second section is the version policy. Current version and minimum required version (with specific SQL features that depend on that minimum: "minimum MySQL 8.0 required for window functions used in analytics queries in analytics/cohort_calculator.rb"), end-of-life date for the current version, major version upgrade cadence (annually, or on a specific trigger), and the upgrade procedure with a compatibility checklist reference. The upgrade procedure should exist before the upgrade is needed — three weeks of a MySQL 5.7-to-8.0 migration compresses to one week if the compatibility checklist was prepared at selection time and kept current as new queries were added to the codebase.
The third section is the referential integrity model. Enforcement mechanism: database-level (PostgreSQL, MySQL InnoDB, CockroachDB) or application-level (PlanetScale, SQLite without FK pragma). For application-level enforcement: the explicit list of every parent-child relationship and the application code path that enforces integrity for each. The detection procedure for orphan rows: the LEFT JOIN audit query for each FK relationship, with a schedule for running it (monthly, or before any bulk delete operation). The validation requirement for all DELETE and UPDATE scripts against tables that have FK relationships: "scripts deleting rows from parent tables must be reviewed against the FK relationship list and tested with the LEFT JOIN audit query on a staging snapshot before production execution." For SQLite: the specific connection configuration that enables the FK pragma, and confirmation that every application connection path — including background workers, data import scripts, and analytics queries — enables the pragma.
The fourth section is the SQL feature and portability policy. Two subsections: features the application IS using that are engine-specific (PostgreSQL advisory locks for job queue coordination, LISTEN/NOTIFY for realtime notifications, JSONB GIN index operators, array types, pg_trgm for trigram similarity — each with the code path that uses it and the migration cost if the feature is unavailable on a future database), and features the application explicitly avoids in order to preserve migration optionality (if the team has decided to remain portable between PostgreSQL and MySQL, which features are excluded from the codebase and enforced via a linter or code review checklist). This is a deliberate choice about whether to use the full capability of the chosen engine or restrict to portable SQL. Neither choice is universally correct — using engine-specific features produces better-performing, more maintainable code at the cost of migration optionality; restricting to portable SQL preserves options at the cost of workarounds. The ADR makes the choice explicit rather than leaving it as an implicit constraint discovered during a future migration planning session. See the connection between feature choices and multi-tenancy model decisions — per-schema multi-tenancy relies on PostgreSQL-specific schema support that is unavailable in MySQL's namespace model.
The fifth section is the scaling model and trigger. Vertical scaling ceiling at the current instance class (current class: db.t4g.medium — 2 vCPU, 8 GB; ceiling before degradation: approximately 500 concurrent connections, 5,000 queries/second; next class: db.r7g.xlarge — 4 vCPU, 32 GB; ceiling: approximately 2,000 connections, 20,000 queries/second). Expected record volume and write throughput at 12-month projection based on current growth rate. Specific measurable conditions for evaluating horizontal scaling: "re-evaluate database architecture when daily write rate exceeds 5 million rows per day AND p99 write latency on the primary exceeds 50ms after read replica offload AND query analysis confirms write amplification rather than query optimization as the bottleneck." Not "when we need more scale" — which is vague and tends to produce premature migrations — but specific metrics with thresholds that can be checked against the current Caddy logs and slow query log. The performance optimization decision record covers the methodology for identifying write amplification versus query optimization as the root cause of database latency, which is the decision point before a horizontal scaling migration is justified.
Write all five sections before the first production incident. The version ceiling, FK enforcement model, and SQL feature restrictions are cheapest to document when the setup session is fresh — the original developer still knows what was evaluated and why. The documentation that prevents the MySQL 5.7 window function incident is a version policy written when 5.7 was installed. The documentation that prevents the PlanetScale orphan row incident is the referential integrity model section that says "FK constraints are not enforced at the database level; all DELETE scripts against parent tables must be reviewed against the FK relationship list before execution." Specific consequences, not general caveats. Use the ADR format guidance for writing a Consequences section that gives the next engineer the information they need to evaluate whether the original decision is still correct — not just confirmation that a decision was made. For database vendor selection, that means specific SQL feature lists, specific FK enforcement facts, and specific scaling numbers, not "Postgres is powerful and widely supported."
The database vendor ADR is also the document that informs onboarding. A new engineer joining the team can read the ADR and understand in fifteen minutes: why this database was chosen over the alternatives, which SQL features are in use and what version is required, whether FK constraints are enforced at the database layer or must be validated manually, which engine-specific features have been deliberately adopted, and what the trigger is for the next scaling evaluation. That fifteen-minute briefing is what the new CTO problem is about — the decisions made before the new engineer arrived, extracted from the AI chat sessions where they were made, structured into a document that survives the original developer's departure. The WhyChose extractor surfaces the database setup and scaling sessions where the original vendor reasoning lives, turning the chat history into the evidence base the ADR needs.
FAQs
Does PlanetScale enforce foreign key constraints?
PlanetScale does not enforce foreign key constraints. PlanetScale runs on Vitess, which is a MySQL sharding layer designed for horizontal write scalability. Vitess's sharding model is incompatible with database-level foreign key constraint enforcement because parent and child rows can reside on different shards, and cross-shard constraint checks would require distributed transaction coordination that undermines the throughput benefits of sharding. PlanetScale accepts FOREIGN KEY syntax in schema migrations without error — Rails add_foreign_key migrations succeed, MySQL Workbench shows FK declarations in the schema inspector — but INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS returns zero rows on any PlanetScale database because no underlying constraints were created. All referential integrity is application-managed from deployment day one.
The practical consequence is that DELETE and UPDATE operations on parent tables succeed without raising a constraint violation and without executing cascade behavior, even when child rows exist that reference the deleted or updated parent. The application discovers orphaned rows through business logic errors — null reference exceptions on user lookups, missing records in joins — not through database errors. The appropriate detection procedure is a LEFT JOIN audit query for each parent-child relationship: SELECT count(*) FROM child LEFT JOIN parent ON child.parent_id = parent.id WHERE parent.id IS NULL. Teams already running on PlanetScale should run this audit for every declared FK relationship in the schema. The percentage of orphaned rows in a mature PlanetScale application that was not explicitly designed for application-managed referential integrity is typically between 1% and 8% of child table rows. Remediation requires identifying which orphaned rows can be deleted, which need a placeholder parent record, and updating application code to handle the missing-parent case defensively. PlanetScale's documentation lists foreign key constraints as a limitation; the limitation appears on the documentation page that setup sessions and ORM configuration sessions almost never open.
What SQL features are unavailable in MySQL 5.7 that require upgrading to MySQL 8.0?
MySQL 5.7 is missing several SQL features that became standard in the years after its release. The most consequential for application development are window functions (ROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), LEAD(), NTILE(), FIRST_VALUE(), LAST_VALUE(), aggregate window functions like SUM() OVER() and AVG() OVER()) — all added in MySQL 8.0 and available in PostgreSQL since version 8.4 (2009); Common Table Expressions including recursive CTEs (WITH ... AS () syntax) — added in MySQL 8.0, available in PostgreSQL since 8.4; Instant DDL (ALGORITHM=INSTANT) for many ALTER TABLE operations — MySQL 5.7 requires a full table copy for most schema changes, while MySQL 8.0 adds instant-mode changes for operations including ADD COLUMN and MODIFY COLUMN for compatible types; and improved JSON functions including JSON_TABLE() for treating JSON arrays as relational result sets, available only in 8.0.
MySQL 5.7 reached end of life in October 2023. Applications still running 5.7 after that date receive no security patches. The upgrade to MySQL 8.0 involves approximately 40 documented behavioral changes: new reserved keywords (GROUPS, CUBE, ROLLUP, LATERAL, SYSTEM) that conflict with existing table or column names in some applications; changed default character set from latin1 to utf8mb4 (which is the correct default but can affect storage estimates and index sizes); changed default authentication plugin from mysql_native_password to caching_sha2_password (which breaks older client libraries that do not support caching_sha2_password); and ONLY_FULL_GROUP_BY added to the default SQL mode, which rejects GROUP BY queries where the SELECT list includes non-aggregated columns not in the GROUP BY — queries that MySQL 5.7 accepted with non-deterministic behavior. A migration from 5.7 to 8.0 requires reviewing all queries and stored procedures against these change categories before running the RDS in-place upgrade. Teams that prepared the compatibility checklist at 5.7 installation time can execute the upgrade in approximately one week. Teams running the audit retrospectively typically require three to five weeks depending on application size.
When should a startup choose CockroachDB over PostgreSQL?
A startup should choose CockroachDB over PostgreSQL when they have a specific, demonstrated requirement for horizontal write scaling that vertical scaling cannot meet, or when they have a contractual data residency requirement that mandates specific rows be stored only on hardware in a specific geographic region with strong consistency guarantees. These are uncommon requirements at startup scale. The vast majority of SaaS products — including products with millions of active users and billions of rows — can be served by a properly configured PostgreSQL instance on the largest available managed instance class. RDS db.r6g.16xlarge (64 vCPUs, 512 GB RAM) with PgBouncer transaction mode pooling and Aurora read replicas handles write throughput that exceeds the requirements of nearly all Series A and many Series B companies. The vertical scaling ceiling, where a single-primary write throughput is genuinely exhausted, is typically above 50,000 transactions per second on large instance types — a threshold that almost no SaaS startup reaches before they have the engineering resources to design a proper sharding or CQRS architecture anyway.
CockroachDB carries costs that apply from day one. Every write transaction goes through Raft consensus coordination, adding 2–5ms of latency per transaction in a single-region cluster versus a local write on a single-node PostgreSQL instance. The PostgreSQL compatibility is approximately 95%, not 100%: COPY TO STDOUT, some pg_catalog queries, plpgsql constructs, and certain sequence behaviors require application code changes — small changes, but changes that must be discovered and fixed during the migration validation phase. Operating a distributed database cluster requires expertise in distributed system operations — node replacement, rolling upgrades, Raft log inspection — that is different from, and rarer than, PostgreSQL operations knowledge. The correct question is not "could we need horizontal write scaling?" but "have we demonstrated that our write throughput cannot be met by a larger PostgreSQL instance with read replicas, and do we have the evidence to show that vertical scaling headroom is exhausted?" If the answer is no — if the vertical ceiling has not been reached — choose PostgreSQL and revisit CockroachDB with data when the need is demonstrated. If the answer is yes, CockroachDB is a legitimate choice; its PostgreSQL wire protocol compatibility means the migration is feasible, and its FK enforcement and SERIALIZABLE default isolation are advantages over sharding PostgreSQL manually.
What should a database vendor selection ADR document that teams typically skip?
Teams typically document the database vendor name, the ORM configuration, and the migration framework. The ADR sections that prevent the incidents described in this post are: first, the version policy — the current version and minimum required version with specific features that depend on that minimum (not "we need MySQL 8.0" but "we need MySQL 8.0 because window functions are used in analytics/cohort_queries.rb and CTEs are used in reports/user_retention.rb"), the EOL date for the current version, and the major version upgrade procedure with a compatibility checklist. Second, the referential integrity model — explicitly documenting whether FK constraints are enforced at the database level or application level; for PlanetScale the statement should be "FK constraints are not enforced at the database engine level; PlanetScale runs on Vitess which does not support cross-shard FK enforcement; all referential integrity is application-managed; the detection procedure for orphan rows is [LEFT JOIN queries]; all DELETE and UPDATE scripts affecting parent tables must be run against the FK relationship list before production execution." For SQLite: which connection configuration enables the FK pragma and confirmation that every application code path sets it. Third, the SQL feature and portability policy — which engine-specific features are intentionally used (PostgreSQL advisory locks, LISTEN/NOTIFY, JSONB operators, array types) and the application code paths that use them, versus which features are intentionally avoided to preserve migration optionality.
Fourth, the scaling model — current instance class with its specific capacity ceiling (connections, queries/second, storage), 12-month projected record volume and write rate based on current growth, and specific measurable thresholds that trigger a scaling architecture evaluation (not aspirational, but checkable against monitoring data: "re-evaluate when daily writes exceed X and p99 write latency exceeds Y ms"). Fifth, the managed hosting layer — which service hosts the database, what the backup and restore procedure is, what the major version upgrade procedure is on that specific platform, and what the migration path off the platform looks like with a cost estimate. None of these five sections appear in the codebase — they cannot be inferred from the schema files, the ORM configuration, or the migration history. They represent the vendor reasoning that every future engineer who touches the database will need to understand, and the documentation that prevents a three-week MySQL upgrade from becoming a six-week one, a PlanetScale orphan row incident from becoming a multi-day data remediation, and a CockroachDB preemptive migration from being proposed when a read replica would solve the problem in two hours.