The monorepo vs polyrepo decision record: why the repository structure you chose determines your cross-service change atomicity and your CI pipeline isolation
Monorepo versus polyrepo looks like a file organization preference until a cross-service type rename requires eleven coordinated pull requests across eleven repositories to land in the correct sequence because any intermediate state breaks the build, or a monorepo CI pipeline grows from six minutes to forty-seven because every commit rebuilds all twenty-three packages and nobody set up a task graph. The repository structure you chose at the second service determines your cross-service change atomicity model, your CI pipeline blast radius, and your dependency version alignment policy — none of which were visible when the first service was created and the choice felt obvious.
A 16-person team building a B2B developer-tools product had structured their codebase as a polyrepo since the company's founding: 11 service repositories (API gateway, authentication service, billing service, notification service, data pipeline, five domain microservices, a shared TypeScript type library, and a React frontend), each with its own GitHub repository, its own CI pipeline, and its own deployment configuration. The structure had worked cleanly for the first eighteen months. Services were developed mostly independently, the type library published to a private npm registry with semver versioning, and cross-service changes were infrequent enough that coordinating two or three pull requests at a time was acceptable overhead.
In month nineteen, the team decided to rename a core domain identifier. The type UserId had been the primary identifier since the product's first day, but an enterprise customer requested multi-tenant support, and the identifier needed to become AccountId to reflect that a single account could have multiple users. The type was defined in the shared TypeScript type library and imported by all 11 repositories. Every service had its own copy of the lockfile pinned to the published version of the type library. The rename was a breaking change: the old type name was referenced in API request and response shapes, in database column naming conventions, in log field names, and in the frontend's state management types. A service running the old type library version alongside a service running the new version would produce type mismatches at the API boundary that caused runtime deserialization failures.
The engineering lead began planning the migration. The type library PR could be merged and published first, creating version 2.0.0 with the renamed type. Then the API gateway could be updated to version 2.0.0 of the type library. But the authentication service also needed updating before the API gateway could route requests to it using the new type — which meant the authentication service PR had to merge before the API gateway PR. The billing service depended on the authentication service's UserId field in session tokens — so the billing service PR had to merge after the authentication service PR but before the API gateway started forwarding requests under the new type. The notification service and data pipeline were consumers of the billing service's events, so their PRs had to follow. The five domain microservices were all stateless and could merge in any order after the gateway, but they had to all be at the new version before the gateway started routing traffic through them. The frontend had to go last, after all backend services had been updated, because it rendered UserId values from API responses that would now be AccountId.
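The ordering constraints above form a dependency graph, and a safe merge sequence is any topological order of it. The following is a minimal sketch of that derivation, not the team's actual tooling. The repository names are hypothetical, and only two of the five domain microservices are listed for brevity.

```typescript
// Hypothetical repository names. Each pair [before, after] encodes
// "before must merge and deploy ahead of after", as described in the text.
type Repo = string;

const mustPrecede: Array<[Repo, Repo]> = [
  ["type-library", "auth"],
  ["auth", "billing"],
  ["auth", "gateway"],
  ["billing", "gateway"],
  ["billing", "notification"],
  ["billing", "data-pipeline"],
  ["gateway", "domain-a"], // two of the five domain services, for brevity
  ["gateway", "domain-b"],
  ["domain-a", "frontend"],
  ["domain-b", "frontend"],
  ["notification", "frontend"],
  ["data-pipeline", "frontend"],
];

// Kahn's algorithm: repeatedly emit a repository with no unmerged prerequisites.
function mergeOrder(edges: Array<[Repo, Repo]>): Repo[] {
  const pending = new Map<Repo, number>(); // count of unmerged prerequisites
  const dependents = new Map<Repo, Repo[]>();
  for (const [before, after] of edges) {
    if (!pending.has(before)) pending.set(before, 0);
    pending.set(after, (pending.get(after) ?? 0) + 1);
    dependents.set(before, [...(dependents.get(before) ?? []), after]);
  }

  const ready = [...pending.keys()].filter((r) => pending.get(r) === 0);
  const order: Repo[] = [];
  while (ready.length > 0) {
    const repo = ready.shift()!;
    order.push(repo);
    for (const next of dependents.get(repo) ?? []) {
      const remaining = pending.get(next)! - 1;
      pending.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  if (order.length !== pending.size) {
    throw new Error("ordering constraints contain a cycle");
  }
  return order;
}

const order = mergeOrder(mustPrecede);
```

A sequence like this only covers merge order. It says nothing about how long each deploy takes, which is the gap that caused the reverts described next.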
The dependency ordering produced a chain of 11 PRs that was almost entirely sequential: apart from the five domain microservices, which could merge in any order once the gateway was updated, each PR required the previous one to be merged and deployed before its own merge was safe. The engineering team ran the migration over three weeks, with one engineer dedicated to coordinating reviews, resolving rebase conflicts as main branches diverged during the migration window, and monitoring for runtime errors after each deploy. Two of the PRs had to be reverted and re-applied because a service was deployed in isolation before its dependency had finished deploying, producing a five-minute window where the API returned deserialization errors on requests that hit the partially migrated service mesh. The total engineering cost of the rename was estimated at six person-weeks, including the coordination overhead, the reverts, and the post-migration cleanup of temporary compatibility shims that had been added to absorb the intermediate state. In a monorepo, the same rename would have been a single refactoring pass: a one-line change in the type definition, a global find-and-replace across all packages, a single CI run confirming no type errors, and a single PR. See the structural pattern described in decisions never written down: the polyrepo structure had been set at the company's founding, without a decision record, and the team had no documentation explaining why it had been chosen, what the expected cross-service change frequency was, or under what conditions the structure should be reconsidered.
The second incident was a CI performance collapse. A 22-person team had built a TypeScript monorepo using Yarn workspaces to share a component library, a shared utility package, and a shared type package across five product applications. When the monorepo contained four packages, the CI pipeline ran in six minutes: it installed dependencies once at the root, then ran yarn workspaces foreach run build and yarn workspaces foreach run test in sequence. Each package's build and test ran serially. This was acceptable. Over the following eighteen months, the team added 19 more packages: new product applications, additional shared libraries, a design system package, a CLI package, and several integration test packages. The monorepo grew to 23 packages.
The CI pipeline was never updated. It still ran yarn workspaces foreach run build and yarn workspaces foreach run test across all 23 packages on every commit — including commits that touched a single file in a single package. At 23 packages, the full build took 47 minutes. A pull request to fix a typo in the CLI's README file triggered a 47-minute CI run that rebuilt all 23 packages, including the five product applications with their full webpack bundles, even though no application code had changed. Engineers began stacking multiple unrelated changes into single commits to reduce the number of CI runs per day. Feature branches diverged from main for two to three days as engineers waited for CI slots. The team attributed the slowness to inadequate CI compute and spent $800 per month on additional GitHub Actions runner capacity. The 47-minute pipeline became a 44-minute pipeline. The root cause — the absence of a task graph and affected-package detection — was never identified, because the CI configuration had never been the subject of an architectural decision record. An engineer who joined the team six months before the incident had proposed using Turborepo to add a task graph; the proposal was deprioritized because the CI slowness had been attributed to compute, not architecture, and adding Turborepo "seemed like a big change." Three months after the compute upgrade, the team added Turborepo with a properly declared task graph. The CI pipeline went from 44 minutes (full rebuild, extra compute) to 8 minutes on a hot cache (only packages transitively affected by the changed files are rebuilt). The $800 per month in additional compute was cancelled. The decision to use Yarn workspaces without a task graph had been made implicitly — the team added packages one at a time, and the CI configuration was copied from the initial four-package setup without modification. 
The pattern is the same one described in the new CTO onboarding problem: the CI configuration that worked at four packages was assumed to still be correct at twenty-three, because no one had documented what the configuration was designed to support or what the re-evaluation threshold was.
The three structural properties that repository structure determines
When teams choose between a monorepo and a polyrepo, the evaluation is rarely a formal architecture decision. Most teams create their first repository, then create a second repository for a new service, and find themselves in a de facto polyrepo. Or they follow a tutorial that uses a workspace setup, add a shared package, and find themselves in a de facto monorepo. The structural properties that the choice sets — cross-service change atomicity, CI pipeline blast radius, and dependency version alignment — are not immediately visible when the codebase contains two or three services and changes are small. They become visible when the codebase has grown to the point where cross-service changes are frequent, CI runs are slow, or dependency versions have drifted.
Cross-service change atomicity. In a monorepo, a change that spans multiple packages is a single commit. The commit touches files in package A, package B, and package C simultaneously. The CI pipeline runs against the unified commit. The PR contains all changes. Reviewers see the full scope of the change in one place. The merge either lands all of it or none of it. Atomic commits eliminate the intermediate state problem in the source tree: there is no commit in which package A has the new interface and packages B and C still have the old one, because all three are updated together. Services built from that commit are still deployed separately, so rollout ordering remains a deployment concern.
In a polyrepo, a change that spans multiple repositories requires a PR in each repository. The PRs are independent commits with independent review and merge timelines. If the cross-service change is backwards-compatible — if the new interface can coexist with the old one, if old consumers continue to work against the new provider — then the PR sequence can be merged in any order. A deprecation-and-replacement pattern that keeps the old type alias valid while introducing the new one is backwards-compatible; the 11-service migration could have been executed as parallel PRs if the type library had introduced AccountId as an alias of UserId for a transition period. But backwards-compatibility requires explicit planning — the interface must be designed to allow the transition, which is not always possible for breaking schema changes, database column renames, or changes to serialized API response fields that are consumed directly by client applications. When the change cannot be made backwards-compatible, the polyrepo requires a coordinated merge sequence, and the coordination overhead scales with the number of consumers. The API versioning strategy context is directly relevant here: the versioning policy for internal API contracts between services determines whether cross-service changes can be executed with forwards and backwards compatibility, and teams that haven't documented the policy discover it by accident when a breaking change forces a coordinated migration.
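The deprecation-and-replacement pattern can be sketched as follows. This is an illustration under assumptions, not the team's code: it assumes the identifier is a string, and the `SessionPayload` shape, the `decodeSession` helper, and the `userId`/`accountId` field names are hypothetical. The alias covers compile-time compatibility. The tolerant decoder covers the serialized field name, which a type alias alone cannot fix.

```typescript
// Transitional release of the shared type library (hypothetical sketch).
// Assumption: the identifier is represented as a string.
type AccountId = string;

/** @deprecated Use AccountId. Kept as an alias for the transition period. */
type UserId = AccountId;

// Hypothetical payload shape used by consumers after the rename.
interface SessionPayload {
  accountId: AccountId;
}

// Tolerant reader: accepts the new field name and falls back to the old one,
// so old and new producers can coexist while PRs merge in any order.
function decodeSession(raw: unknown): SessionPayload {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("payload must be an object");
  }
  const record = raw as Record<string, unknown>;
  const id = record.accountId ?? record.userId;
  if (typeof id !== "string") {
    throw new Error("payload has neither accountId nor userId");
  }
  return { accountId: id };
}

// Existing code that still names the old type keeps compiling.
const legacyId: UserId = "acct_123";
const fromOldProducer = decodeSession({ userId: legacyId });
const fromNewProducer = decodeSession({ accountId: "acct_456" });
```

Once every consumer reads `accountId` and every producer writes it, the alias and the fallback can be removed in a final cleanup release.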
The cross-service change atomicity property is most consequential for teams whose services share rich domain types — not just simple data transfer objects, but domain entities, event schemas, and API contract types that appear in both the producer and all consumers. Teams whose services communicate only via external APIs (HTTP JSON with no shared type library) have a different atomicity profile: the API contract is the boundary, and either side can change independently as long as the contract is honored. The repository structure decision record must document the expected cross-service change frequency and the interface coupling model, because these are the variables that determine whether polyrepo coordination overhead is acceptable or structural.
CI pipeline blast radius. In a polyrepo, the CI pipeline for repository A is completely isolated from repository B. A commit to repository A triggers only repository A's pipeline. The blast radius of any individual commit is bounded by the repository boundary. A bug in a CI configuration in one repository cannot cause another repository's pipeline to fail. Individual teams can own and optimize their repository's CI pipeline without coordination. The tradeoff is that cross-repository changes require separate CI runs in each repository — verifying that a change to the shared type library does not break any consumer requires running every consumer's test suite, which requires separate trigger mechanisms (webhook-based downstream builds, or manual triggering of downstream pipelines) rather than a single unified CI run.
In a monorepo without affected-package detection, the CI blast radius is the entire repository. Every commit triggers a rebuild and retest of every package. This is the configuration that produced the 47-minute pipeline in the second incident. Adding a task graph and affected-package detection removes unaffected packages from the run entirely. A properly configured task graph declares: "package B depends on package A." When a commit changes files only in package A, the graph is traversed to find every package that depends on package A, directly or transitively; package A and those dependents are rebuilt and retested. Package C, which does not depend on package A, is not rebuilt. The CI run's duration is proportional to the blast radius of the specific change, not to the total size of the monorepo.
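The traversal can be sketched in a few lines. This is a simplified model of affected-package detection, not how Turborepo or Nx implement it; the package names and the `affectedPackages` helper are hypothetical.

```typescript
type PackageName = string;

// dependencies[pkg] lists the workspace packages that pkg imports from.
type DependencyGraph = Record<PackageName, PackageName[]>;

// Returns the changed packages plus everything that depends on them,
// directly or transitively.
function affectedPackages(
  dependencies: DependencyGraph,
  changed: PackageName[],
): Set<PackageName> {
  // Invert the edges: for each package, who depends on it?
  const dependents = new Map<PackageName, PackageName[]>();
  for (const [pkg, deps] of Object.entries(dependencies)) {
    for (const dep of deps) {
      dependents.set(dep, [...(dependents.get(dep) ?? []), pkg]);
    }
  }

  const affected = new Set<PackageName>(changed);
  const queue = [...changed];
  while (queue.length > 0) {
    const current = queue.pop()!;
    for (const dependent of dependents.get(current) ?? []) {
      if (!affected.has(dependent)) {
        affected.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return affected;
}

// Hypothetical graph: B imports A, D imports B, C is independent.
const graph: DependencyGraph = { A: [], B: ["A"], C: [], D: ["B"] };
const affected = affectedPackages(graph, ["A"]);
```

In this graph a change to A selects A, B, and D, and leaves C out of the run. If the edge from B to A were missing from the graph, B and D would be silently skipped, which is why the accuracy of the declared edges matters.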
Remote caching extends this further. Build and test outputs are stored in a content-addressed cache keyed to the hash of the package's inputs (source files, dependency versions, build configuration). If a package's inputs are identical to a previous run, the cached output is restored without recomputation. A CI run that touches one package in a 50-package monorepo, with a warm cache from a recent run of the unchanged packages, can complete in under three minutes, because only the affected package and its dependents run fresh computation. The CI/CD pipeline decision record is where the task graph configuration, the cache storage backend (Turborepo's Vercel remote cache, Nx Cloud, an S3-backed cache server for Bazel), and the cache invalidation policy should be documented. Without documentation, the task graph configuration is treated as a Turborepo or Nx implementation detail rather than an architectural decision, and the engineer who wants to add a new package dependency relationship may not know to update the task graph. The result is a silent blast-radius miscalculation: a change to package A should affect package B, but the CI pipeline does not rebuild it because the dependency edge is missing.
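A content-addressed cache key can be sketched as a hash over the declared inputs. This is an illustration of the idea, not the key format of any particular tool; the `TaskInputs` shape and the `cacheKey` function are hypothetical, and SHA-256 is an assumed choice of hash.

```typescript
import { createHash } from "node:crypto";

// Hypothetical description of everything a task's output depends on.
interface TaskInputs {
  sourceFiles: Record<string, string>; // path -> file contents
  dependencyVersions: Record<string, string>; // name -> resolved version
  buildConfig: string; // serialized build configuration
}

// Sort entries by key so the same inputs always hash the same way,
// regardless of the order in which they were collected.
function sortedEntries(record: Record<string, string>): Array<[string, string]> {
  return Object.entries(record).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
}

function cacheKey(inputs: TaskInputs): string {
  const hash = createHash("sha256");
  for (const [path, contents] of sortedEntries(inputs.sourceFiles)) {
    hash.update(`file:${path}\n${contents}\n`);
  }
  for (const [name, version] of sortedEntries(inputs.dependencyVersions)) {
    hash.update(`dep:${name}@${version}\n`);
  }
  hash.update(`config:${inputs.buildConfig}`);
  return hash.digest("hex");
}

const baseline: TaskInputs = {
  sourceFiles: { "src/index.ts": "export const x = 1;", "src/util.ts": "export {};" },
  dependencyVersions: { react: "18.3.1", typescript: "5.5.4" },
  buildConfig: '{"target":"es2020"}',
};
const baselineKey = cacheKey(baseline);
```

The key changes only when one of the declared inputs changes. Anything that varies per run, such as a timestamp, would need to stay out of the inputs for the cache to produce hits.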
Dependency version alignment. In a monorepo with a shared lockfile (npm workspaces, Yarn workspaces, or pnpm workspaces at the repository root), every package's external dependencies are resolved in one place. When package A declares react: "^18.2.0" and package B declares react: "^18.3.0", the package manager can resolve a single version of React that satisfies both ranges. When the ranges are disjoint (package A on "^18.0.0", package B on "^19.0.0"), no single version satisfies both, and the package manager installs two copies unless the team enforces a single-version policy, for example through review or a check in CI. A shared lockfile makes that policy practical to enforce, and enforcing it has two consequences. First, it eliminates version drift: the state where a shared UI component library was built against React 18 but is consumed by an application bundle running React 19, producing hydration mismatches and duplicate React instances in the bundle. Under a single-version policy, the version is the same for every package. Second, it requires coordinated upgrades: when React releases a breaking major version, all packages in the monorepo must be updated together. A PR that upgrades React in package A must also update every other package that depends on React, or the workspace ends up with two React versions installed side by side. For a 23-package monorepo where all packages depend on React, a React major version upgrade is a large, coordinated PR that may require updating dozens of component APIs across the codebase.
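A version alignment check can be sketched as a scan over each package's declared ranges. This is a deliberately simple illustration: the `WorkspacePackage` shape and the `misalignedRanges` function are hypothetical, and it compares range strings literally rather than evaluating semver overlap.

```typescript
// Hypothetical view of one workspace package's package.json.
interface WorkspacePackage {
  name: string;
  dependencies: Record<string, string>; // dependency name -> declared range
}

// Returns every external dependency that is declared with more than one
// distinct range string across the workspace. Note: this compares strings
// only; "^18.2.0" and "^18.3.0" are reported even though they overlap.
function misalignedRanges(packages: WorkspacePackage[]): Map<string, string[]> {
  const rangesByDependency = new Map<string, Set<string>>();
  for (const pkg of packages) {
    for (const [dependency, range] of Object.entries(pkg.dependencies)) {
      if (!rangesByDependency.has(dependency)) {
        rangesByDependency.set(dependency, new Set());
      }
      rangesByDependency.get(dependency)!.add(range);
    }
  }

  const misaligned = new Map<string, string[]>();
  for (const [dependency, ranges] of rangesByDependency) {
    if (ranges.size > 1) {
      misaligned.set(dependency, [...ranges].sort());
    }
  }
  return misaligned;
}

const workspace: WorkspacePackage[] = [
  { name: "package-a", dependencies: { react: "^18.0.0", typescript: "~5.5.0" } },
  { name: "package-b", dependencies: { react: "^19.0.0", typescript: "~5.5.0" } },
];
const drift = misalignedRanges(workspace);
```

Run in CI, a check of this kind turns version drift into a failed build at the moment it is introduced, rather than a bundle problem discovered later.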
In a polyrepo, each repository manages its own lockfile and dependency versions independently. A new service created six months after the others can depend on a newer major version of a shared library without affecting existing services. A team that owns service D can upgrade TypeScript from 5.0 to 5.5 without waiting for the team that owns service A to complete their upgrade. The tradeoff is version drift: over time, the version spread across services widens. The shared type library that all services depend on must be compatible with TypeScript 5.0 through 5.5 to support all services, which means the library cannot use any type feature introduced after 5.0, in versions 5.1 through 5.5. As the version spread widens, the shared library's feature set is constrained by the lowest-version consumer. Teams that attempt to move the type library forward find themselves maintaining multiple major versions of the library in parallel, backporting fixes to each version in use across the polyrepo. The startup decision log first year pattern applies: the version alignment policy that is invisible at two services (both on the same version) becomes a maintenance burden at eight services (spread across three major versions), and the architectural consequence of the repository structure choice is not felt until the spread has already compounded.
Repository structure options and their structural properties
Monorepo with package manager workspaces only (no task graph) is the default when teams adopt a monorepo incrementally — they add a shared package, configure Yarn or npm workspaces, and proceed without adding build tooling. The workspace feature handles dependency hoisting and cross-package symlinks; CI runs the full build and test suite for every package on every commit. This configuration is correct for small monorepos (under five packages, under ten minutes CI) where the simplicity of no additional tooling outweighs the efficiency cost of occasional full rebuilds. It becomes incorrect when the repository grows beyond that threshold, and the correct response is to add a task graph at that point — not to add more CI compute. The absence of a re-evaluation trigger in the CI configuration is the failure mode: teams that never document "re-evaluate this configuration when CI exceeds 15 minutes" run full-rebuild pipelines on 25-package monorepos until an engineer proposes Turborepo and the proposal is approved.
Turborepo adds a task graph (declared in turbo.json), parallel task execution, affected detection via the --filter flag of turbo run, and remote caching via Vercel's hosted cache or a self-hosted backend. The task graph maps each package's build, test, lint, and typecheck tasks to their inputs and outputs, declaring which tasks in which packages must complete before dependent packages' tasks can begin. Turborepo then executes the task graph in parallel, maximizing CPU utilization across the available cores. Remote caching stores task output artifacts keyed to input hash: if the inputs haven't changed since the last run, the cached output is restored instead of being recomputed. Turborepo's configuration overhead is low: a turbo.json with around ten lines of task configuration (the pipeline key in Turborepo 1.x, renamed tasks in 2.x) and per-package package.json script declarations is sufficient for most monorepos. The build-versus-buy framing applies here. Turborepo is the buy option: the task graph and caching infrastructure are maintained by Vercel, and the team configures rather than builds the tooling. The tradeoff is vendor alignment: remote caching hosted by Vercel is a dependency on Vercel's infrastructure, and the cache is scoped to a Vercel team. Teams that want to self-host the cache or use an S3 backend can configure a custom remote cache endpoint, but this requires additional setup and removes the zero-configuration onboarding path. Turborepo is focused on JavaScript and TypeScript: its task graph understands npm/Yarn/pnpm package relationships natively, and its generators and scaffolding tools are oriented toward the Node.js ecosystem.
Nx is more opinionated than Turborepo and supports a wider range of languages and frameworks. Nx provides a project graph (the equivalent of Turborepo's task graph, but extended to understand library imports at the AST level, not just package.json dependency declarations), distributed task execution across multiple CI machines, affected-package detection via nx affected, and an extensive plugin ecosystem for React, Angular, Next.js, NestJS, Go, Java, and .NET. Nx's project graph can infer package dependencies from TypeScript imports, which means a new package that imports from an existing package is automatically added to the graph without a manual configuration update. Turborepo builds its package graph from package.json dependency declarations, so the developer must add the dependency to the consuming package's package.json before Turborepo sees the edge. The tradeoff is setup complexity: Nx's configuration surface is larger than Turborepo's, the plugin model requires understanding which plugins are active and what each injects into the task graph, and the Nx Cloud distributed task execution requires a separate configuration and subscription. The infrastructure as code framing applies: the Nx configuration (the project graph, the affected detection configuration, the distributed execution settings) is infrastructure that should be version-controlled, reviewed in PRs, and documented in the ADR, not treated as configuration files that engineers modify experimentally without understanding the blast radius of a change to the task graph.
Bazel is the correct choice at large scale (hundreds of engineers, millions of lines of code, strict hermeticity requirements) or when the codebase is genuinely polyglot and the team requires reproducible, hermetic builds for compliance reasons (SBOM generation, deterministic artifact hashing, reproducibility audits). Bazel's BUILD files declare every build rule — every library, binary, and test — with explicit input and output declarations. Hermeticity means that two engineers running the same BUILD rule on the same inputs will produce bit-identical outputs, regardless of their local environments. The Bazel remote cache stores these outputs and reuses them across machines and CI runs. The tradeoff is very high setup cost: every package must have a BUILD file, every dependency must be declared in the Bazel dependency graph, and the Starlark rule language has a steep learning curve. Teams adopting Bazel from a standard package manager workflow typically spend two to four engineer-months on the initial migration and ongoing configuration maintenance. Bazel is rarely the correct default for teams under 50 engineers. It becomes the correct choice when the team's needs — hermetic builds, cross-language task graph, reproducibility guarantees — cannot be met by Turborepo or Nx, and when the engineering capacity to own the Bazel configuration exists. The performance optimization framing applies: Bazel solves a real performance and correctness problem at scale, but adopting it to solve a CI performance problem that Turborepo could resolve at a fraction of the setup cost is overengineering.
Polyrepo is the correct default when services are genuinely independent — when they have different teams, different release cadences, different technology stacks, or different organizational ownership boundaries that are intentionally maintained. A company where the mobile team owns the iOS and Android repositories, the backend team owns the API repository, and the data team owns the pipeline repository may have no shared domain types and no reason for cross-repository atomic changes. The organizational boundary is the right boundary for the repository. Polyrepo becomes costly when teams that share rich domain types or shared libraries create a high volume of cross-service changes. The re-evaluation trigger is the frequency of coordinated multi-repository PRs: when a team is regularly coordinating two or more simultaneous PRs across repositories for single logical changes, the coordination overhead has likely exceeded the cost of migrating to a monorepo. Tracking this metric requires documentation — the ADR's re-evaluation triggers. Without the triggers, the team cannot objectively evaluate whether the coordination overhead has reached the threshold; each cross-service migration feels like a one-time event rather than evidence of a structural pattern.
AI chat session types and what each one misses
The repository structure decision follows a predictable pattern of AI chat sessions. The WhyChose extractor surfaces these sessions from chat export files, and the structural decisions they omit are consistent across the decision records reviewed. The choice is typically made in the second or third service creation session — when the team has outgrown a single repository — and is not revisited until a painful cross-service change or a CI performance collapse reveals the structural consequence.
The initial repository setup session covers: how to structure a TypeScript project, how to configure a package.json, how to set up a basic CI pipeline, and how to run tests. The session ends when the first service builds and tests pass. What the session does not cover: whether the codebase is expected to contain multiple services that share types, whether the CI pipeline is designed to scale when more services are added, what the expected frequency of cross-service changes is, or what the re-evaluation threshold for the repository structure should be. The initial session has no reason to surface these questions because the codebase contains one service — the consequences of the structure choice are not yet visible. The decision record written at this point would appear speculative. But the session is also the last time the structure can be chosen rather than inherited, because by the third or fourth service, the polyrepo is established by convention and migration to a monorepo requires importing three separate commit histories.
The "we need a shared library" session is the critical decision point that is almost never documented. The team has two or three services and realizes they are copying the same utility functions or domain types across them. The AI session covers: how to extract the shared code into a separate package, how to publish it, how to consume it in the existing services. The session ends when the shared package is live and both services import from it. What the session misses: whether the shared package will be published to an npm registry (which requires versioning, release management, and a semver policy) or used as a workspace package reference (which implies a monorepo and requires moving both services into the same repository), and what the implications of each choice are for future cross-service changes. The "just publish it to npm" path commits the team to the polyrepo coordination model for all future changes to the shared package. The "move everything into a workspace" path is the monorepo migration — simpler to execute now, before the repository count grows. Neither path is wrong, but neither is documented. The next time a cross-service change is painful, the team has no record of why the structure was chosen or what conditions would justify changing it.
The CI performance debugging session covers: a specific symptom — CI takes too long — and a diagnosis of the immediate cause: slow tests, insufficient parallelism, large build artifacts. The session typically ends with a recommendation to parallelize test runners, add more CI compute, or cache npm install. What the session misses: whether the slowness is caused by running all packages on every commit (a task graph problem) versus the packages themselves being slow to build and test (an individual package optimization problem). The distinction matters because they require different fixes — the task graph problem requires Turborepo or Nx configuration, which is an architectural change; the individual package problem requires optimizing the specific slow packages, which is a targeted engineering effort. Without this distinction, the team applies compute-scaling fixes to a task-graph problem and continues paying for compute that does not address the root cause. The session also misses the question of remote caching: whether the CI environment is configured to restore cached task outputs across runs, and whether the cache key is correct (a cache key that changes on every commit, such as one that includes the commit hash, produces a zero-hit-rate cache that wastes the storage and network overhead without providing any acceleration). The observability framing applies: CI pipeline duration is a metric that should be tracked across weeks, not just noted when it becomes a complaint. A pipeline that grows from 6 minutes to 47 minutes over 18 months is invisible at each incremental step but obvious in retrospect. A weekly report of CI p50 and p95 duration would have made the growth visible at the 15-minute threshold rather than at the 47-minute crisis point.
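The weekly duration report can be sketched as a percentile calculation over the week's pipeline runs. This is a minimal illustration: the function names are hypothetical, the nearest-rank percentile method is an assumed choice, and alerting on p95 against the 15-minute threshold is one possible reading of the re-evaluation trigger.

```typescript
// Nearest-rank percentile over an ascending-sorted, non-empty sample.
function percentile(sortedAscending: number[], p: number): number {
  if (sortedAscending.length === 0) {
    throw new Error("no samples");
  }
  const rank = Math.ceil((p / 100) * sortedAscending.length);
  return sortedAscending[Math.max(rank, 1) - 1];
}

interface WeeklyCiReport {
  p50Minutes: number;
  p95Minutes: number;
  exceedsThreshold: boolean;
}

// durationsMinutes: one entry per CI run in the reporting week.
function weeklyCiReport(
  durationsMinutes: number[],
  thresholdMinutes = 15, // assumed re-evaluation threshold
): WeeklyCiReport {
  const sorted = [...durationsMinutes].sort((a, b) => a - b);
  const p50Minutes = percentile(sorted, 50);
  const p95Minutes = percentile(sorted, 95);
  return { p50Minutes, p95Minutes, exceedsThreshold: p95Minutes > thresholdMinutes };
}

// Hypothetical week of ten runs, mostly fast with one slow outlier.
const report = weeklyCiReport([6, 7, 8, 9, 10, 11, 12, 13, 14, 30]);
```

Tracking both percentiles matters because the median can stay flat while the slowest runs degrade, and the slowest runs are what engineers complain about.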
The cross-service refactor session covers: how to rename a type, how to update an API, how to coordinate a breaking change across multiple services. The session typically produces a plan: create a PR in repository A, then repository B, then repository C, in the order determined by the dependency graph. What the session misses: whether the intermediate states in the PR sequence are valid, and what the recovery procedure is if an intermediate state produces a production error. In the 11-service migration described in this post, two PRs had to be reverted because of a timing window where a service was deployed before its dependency had finished deploying. The session that planned the migration did not model deployment timing — it assumed that PRs would merge and deploy instantaneously, which was not true in a system where each repository's CI pipeline took 12 minutes. The deployment ordering constraint, and the procedure for handling a failed deploy in the middle of a multi-repository migration, are not questions that arise in a session about "how do I rename a TypeScript type." They are architectural questions that belong in a decision record, not an AI session. The test strategy framing applies: the integration tests that would have caught the intermediate-state deserialization failure — a test that runs consumer service B against a deployed version of provider service A using the new type, before B's own PR merges — require a cross-repository test infrastructure that was never built because the cross-service change protocol was never documented.
The consistent pattern across all four session types is the one described in decisions never written down: the session closes when the immediate task is working, not when the structural implications of the approach have been made explicit. The first service builds, the shared library publishes, the CI parallelism improves, the cross-service rename completes — and the session is over. The task graph configuration, the version alignment policy, the deployment ordering constraints, and the cross-service change protocol are invisible at the moment the success criterion is met. They become visible only when a CI performance collapse or a multi-repository coordination failure reveals the gap.
Five ADR sections for repository structure selection
A repository structure ADR that prevents the coordination failures and CI performance collapses described in this post covers five sections that teams consistently skip.
First, the repository structure choice with alternatives, rejection reasons, and re-evaluation triggers. The ADR records whether the codebase uses a monorepo or a polyrepo, which alternatives were evaluated, the rejection reasons for each, and the specific conditions under which the choice should be reconsidered. "Monorepo with Turborepo task graph chosen over polyrepo because: the codebase contains three services (API, frontend, shared domain type library) that share rich domain types; cross-service type changes are expected to occur monthly as the domain model evolves; the team is small enough (8 engineers) that a shared commit history and unified CI pipeline reduce context-switching overhead; polyrepo evaluated and rejected because the shared domain type library would require semver release coordination on every type change, and the expected change frequency makes that overhead prohibitive. Re-evaluate the repository structure when: the repository contains more than 40 packages and CI p95 duration exceeds 20 minutes despite a properly configured task graph (indicating that the affected blast radius has grown beyond what a task graph can efficiently bound, and per-team repository isolation may be the correct solution); when two or more distinct teams own services with genuinely independent release cadences and no shared domain types (at which point splitting those services into separate repositories reduces CI noise without coordination cost); when the codebase requires hermetic reproducible builds for compliance reasons (SBOM generation, security audit) that Turborepo cannot provide without migrating to Bazel." The rejection reasons prevent future re-evaluations from starting at zero. The re-evaluation triggers make the switch criteria explicit rather than driven by whoever advocates loudest in a planning meeting where the pain of the current structure is highest.
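Re-evaluation triggers are easier to act on when they are written as checks against measured values. The sketch below encodes the three example triggers from the sample ADR text; the `RepoMetrics` shape, the field names, and the trigger labels are hypothetical.

```typescript
// Hypothetical snapshot of the values the example triggers refer to.
interface RepoMetrics {
  packageCount: number;
  ciP95Minutes: number; // measured with the task graph already in place
  independentTeams: number; // teams with independent cadence and no shared domain types
  requiresHermeticBuilds: boolean;
}

// Returns a label for every trigger from the example ADR that currently fires.
function firedTriggers(metrics: RepoMetrics): string[] {
  const fired: string[] = [];
  if (metrics.packageCount > 40 && metrics.ciP95Minutes > 20) {
    fired.push("blast-radius-exceeds-task-graph");
  }
  if (metrics.independentTeams >= 2) {
    fired.push("independent-teams-without-shared-types");
  }
  if (metrics.requiresHermeticBuilds) {
    fired.push("hermetic-builds-required");
  }
  return fired;
}

const today: RepoMetrics = {
  packageCount: 23,
  ciP95Minutes: 8,
  independentTeams: 0,
  requiresHermeticBuilds: false,
};
const triggers = firedTriggers(today);
```

An empty result means the recorded decision still holds under its own stated conditions. A non-empty result is the cue to reopen the ADR, independent of how painful the current structure feels that week.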
Second, the affected-package detection and CI task graph model. The ADR documents how the CI pipeline determines which packages to rebuild on each commit, what the task graph declaration format is, and what the failure mode is when a dependency edge is missing. "CI pipeline uses Turborepo task graph with remote cache hosted on Vercel. Task graph declared in turbo.json at repository root; per-package package.json scripts declare the build, test, lint, and typecheck commands. Build dependencies: the API package's build depends on the shared-types package's build completing first (declared as dependsOn: ['shared-types#build'] in the API package's turbo pipeline). Test tasks: each package's tests are declared as depending on its own build completing, not on other packages' builds, to allow parallel test execution after the build graph is resolved. CI runs turbo run build test lint --filter='...[HEAD^1]' to execute only the tasks for packages changed relative to the previous commit and their downstream dependents. If the task graph is missing a dependency edge (for example, if a new package imports from shared-types but does not declare the dependency in package.json), Turborepo may schedule the new package's build before or in parallel with the shared-types build, producing an intermittent build failure that may appear as a cryptic module resolution error. To add a new dependency between two packages: (1) add the import in the source code; (2) add the dependency to the consuming package's package.json dependencies field; (3) run pnpm install at the repo root to update the lockfile; (4) verify that turbo run build --filter=consuming-package completes without errors before pushing. Remote cache hit rate target: above 80% on CI runs for packages that have not changed in the current PR. Cache hit rate below 60% for three consecutive weeks most likely indicates that the cache key is including non-deterministic inputs (timestamps, process IDs, or environment variables that differ between CI and local) and requires investigation."
The task graph documentation is what makes the CI pipeline legible to an engineer who joins the team after the initial Turborepo configuration and wants to add a new package or understand why a particular package is being rebuilt on every commit. Without this documentation, the task graph is a configuration file whose behavior is opaque to anyone who did not write it. The ADR Consequences section should state the specific CI duration ceiling and blast radius the configuration is designed to maintain.
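The affected-package computation that the task graph enables can be sketched in a few lines. This is an illustration of the general technique (a reverse traversal of the dependency graph), not Turborepo's implementation; the DependencyGraph type, the affectedPackages function, and the package names are assumptions made for the example.

```typescript
// Package name -> names of the workspace packages it depends on.
type DependencyGraph = Record<string, string[]>;

// Returns the changed packages plus every package that transitively depends on them.
function affectedPackages(graph: DependencyGraph, changed: string[]): Set<string> {
  // Invert the edges: dependency -> packages that directly depend on it.
  const dependents = new Map<string, string[]>();
  for (const pkg of Object.keys(graph)) {
    for (const dep of graph[pkg] ?? []) {
      dependents.set(dep, [...(dependents.get(dep) ?? []), pkg]);
    }
  }
  // Breadth-first walk from the changed packages along the inverted edges.
  const affected = new Set<string>(changed);
  const queue = [...changed];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const dependent of dependents.get(current) ?? []) {
      if (!affected.has(dependent)) {
        affected.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return affected;
}

// Hypothetical four-package workspace.
const graph: DependencyGraph = {
  "shared-types": [],
  api: ["shared-types"],
  frontend: ["shared-types"],
  billing: [],
};

const afterTypeChange = affectedPackages(graph, ["shared-types"]);
const afterBillingChange = affectedPackages(graph, ["billing"]);
```

In this sketch a change to shared-types selects shared-types, api, and frontend, while a change to billing selects only billing. It also shows why a missing edge is dangerous: if frontend's entry omitted shared-types, a type change would silently skip the frontend build.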
Third, the dependency version alignment and upgrade coordination policy. The ADR documents whether the monorepo enforces a single version of each external dependency, the procedure for upgrading a dependency with breaking changes, and the exception handling when two packages cannot share the same version. "The repository uses pnpm workspaces with a single lockfile at the root. Each external dependency is expected to resolve to a single version shared across all packages. The single lockfile does not enforce this by itself: if package A declares react: '^18.0.0' and package B declares react: '^19.0.0', pnpm installs both versions side by side without raising an error. Alignment is therefore enforced by a CI check that fails when two packages declare different version ranges for the same external dependency, and a failing check must be resolved by aligning the version range across all packages before merging. The pnpm overrides field in the root package.json may additionally be used to force transitive dependencies onto the shared version. Upgrading a major version of a shared dependency: open a single PR that updates the dependency version in all packages simultaneously; run the full build and test suite on the PR; do not merge partial upgrades where some packages are at the new major version and others are at the old major version, because the intermediate state may compile but will produce runtime incompatibilities if the packages interact. Exception: if two packages genuinely cannot share the same version of a dependency (for example, a legacy package that requires React 17 and a new package that requires React 18), let the divergent package declare its own version in its package.json, exempt that package and dependency from the alignment check, and document the exception in the ADR with the specific package and version, the reason the exception is required, and the target date for resolving the incompatibility. Exceptions that persist beyond three months without a resolution plan should be escalated in the engineering planning session." The upgrade procedure prevents the partial-upgrade intermediate state that produces runtime incompatibilities.
The exception handling with explicit documentation and resolution targets prevents exceptions from accumulating silently into a permanent version fragmentation that undermines the single-version guarantee. The infrastructure as code framing: the lockfile and the dependency version alignment policy are infrastructure that determines the runtime behavior of every package, and changes to the policy should go through the same PR and review process as changes to application code.
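A version alignment policy is easier to keep when a script checks it on every PR. The sketch below assumes the check compares declared version ranges as plain strings, which is stricter than comparing resolved versions; the Manifest shape is reduced to the one field the check reads, and the function name and sample packages are hypothetical.

```typescript
// Minimal shape of a package manifest; only the field this check reads.
interface Manifest {
  name: string;
  dependencies?: Record<string, string>;
}

// dependency -> (declared range -> packages declaring that range)
type RangeUsage = Record<string, Record<string, string[]>>;

// Returns only the dependencies that are declared with more than one distinct range.
function findMisalignedDependencies(manifests: Manifest[]): RangeUsage {
  const usage: RangeUsage = {};
  for (const manifest of manifests) {
    const deps = manifest.dependencies ?? {};
    for (const dep of Object.keys(deps)) {
      const range = deps[dep];
      if (range === undefined) continue;
      const ranges = usage[dep] ?? (usage[dep] = {});
      const names = ranges[range] ?? (ranges[range] = []);
      names.push(manifest.name);
    }
  }
  const misaligned: RangeUsage = {};
  for (const dep of Object.keys(usage)) {
    const ranges = usage[dep];
    if (ranges !== undefined && Object.keys(ranges).length > 1) {
      misaligned[dep] = ranges;
    }
  }
  return misaligned;
}

// Hypothetical workspace: react is declared with two ranges, zod with one.
const misaligned = findMisalignedDependencies([
  { name: "package-a", dependencies: { react: "^18.0.0", zod: "^3.22.0" } },
  { name: "package-b", dependencies: { react: "^19.0.0", zod: "^3.22.0" } },
]);
```

Here only react is reported, with each range mapped to the package that declares it. A documented exception would be handled by filtering the report against an allowlist that mirrors the ADR's exception table.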
Fourth, the cross-service change coordination protocol. The ADR documents how changes that span multiple packages are planned and executed, how backwards-compatible changes are distinguished from breaking changes, and how breaking changes are sequenced to avoid invalid intermediate states. "Cross-service changes are classified at the PR planning stage as backwards-compatible or breaking. A backwards-compatible change: the new interface can coexist with the old interface during a transition period; consumers can be updated in any order after the provider is updated; no consumer experiences a runtime failure at any point during the transition. Backwards-compatible changes are executed as a sequence of independent PRs with no ordering requirement beyond the provider PR merging first. A breaking change: the new interface is incompatible with the old interface; a consumer running the new provider interface without being updated will produce a runtime error. Breaking changes require: (1) a migration plan specifying the exact PR sequence and the deployment ordering; (2) a verification step after each PR deploys that confirms the deployed service is functioning correctly before the next PR in the sequence is reviewed or merged; (3) a rollback procedure for each step, documented before the migration begins, covering the specific commands and the conditions that trigger a rollback. For breaking changes in a monorepo: if all consumers can be updated in a single PR, do so. If the change is too large for a single PR, use a feature flag to deploy the new interface without activating it, update all consumers behind the flag, then activate the flag across all services simultaneously. For a polyrepo: assess whether a backwards-compatible transition interface (deprecating the old interface while introducing the new one) can be introduced before the breaking change, reducing the coordination required. If not, coordinate the PR sequence as specified above." 
The deployment ordering verification — confirming that each deployed service is functioning before the next migration step — is the safeguard that the 11-service migration lacked, and its absence caused the two reverts. The rollback procedure documentation closes the gap between "we know we can roll back" (general confidence) and "here is the specific command to run and the specific condition that triggers it" (operational readiness). The CI/CD pipeline decision record should cross-reference this section: the deployment pipeline for each service must support the rollback procedure described here, and any change to the deployment pipeline that removes that capability must update this ADR.
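The deploy, verify, and rollback sequence for a breaking change can be expressed as a small runner. This is a sketch under stated assumptions: the MigrationStep and MigrationResult shapes are hypothetical, the steps are synchronous for brevity (real deployments would be asynchronous), and the choice to roll back every earlier step when one verification fails is one possible policy rather than something the protocol above mandates.

```typescript
// One step of a breaking-change migration, with its documented rollback.
interface MigrationStep {
  name: string;
  deploy: () => void;
  verify: () => boolean; // true when the deployed service is healthy
  rollback: () => void;
}

interface MigrationResult {
  completed: string[];
  rolledBack: string[];
  failedAt?: string;
}

// Runs steps in order. On the first failed verification it rolls back the
// failed step and then every earlier step, newest first, and stops.
function runMigration(steps: MigrationStep[]): MigrationResult {
  const done: MigrationStep[] = [];
  for (const step of steps) {
    step.deploy();
    if (!step.verify()) {
      const toUndo = [step, ...done.slice().reverse()];
      for (const undo of toUndo) {
        undo.rollback();
      }
      return { completed: [], rolledBack: toUndo.map((s) => s.name), failedAt: step.name };
    }
    done.push(step);
  }
  return { completed: done.map((s) => s.name), rolledBack: [] };
}

// Illustrative steps that record what happened instead of deploying anything.
const log: string[] = [];
function fakeStep(name: string, healthy: boolean): MigrationStep {
  return {
    name,
    deploy: () => { log.push(`deploy ${name}`); },
    verify: () => healthy,
    rollback: () => { log.push(`rollback ${name}`); },
  };
}

const result = runMigration([
  fakeStep("shared-types", true),
  fakeStep("api", false),
  fakeStep("frontend", true),
]);
```

In this run the api step fails verification, so api and then shared-types are rolled back and frontend is never deployed. The point of the structure is that no step can be added to the plan without its verification and rollback being written down first.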
Fifth, the internal package publishing policy. The ADR documents whether shared code is distributed via workspace package references (available only within the monorepo), via a private registry (available to polyrepo consumers and external teams), or via both, and the versioning model for internal packages. "Internal shared packages (shared-types, shared-ui, shared-utils) are distributed as workspace package references within the monorepo. Packages outside this repository cannot depend on these packages. If a package from this repository must be consumed by an external team or an external repository: (1) assess whether the consuming team should be moved into this repository (preferred for teams with high cross-service change frequency); (2) publish the package to the private npm registry at @company/package-name with independent semver versioning (not lockstep with the monorepo), with a public changelog and a semver major version bump policy for breaking changes; (3) maintain the published package in a separate directory from the workspace package, or use a build step that produces the publishable artifact from the workspace source, to avoid the workspace reference and the published package having incompatible module resolution. Internal packages that are published externally must have an explicit owner (a named engineer or team responsible for semver compatibility, changelog maintenance, and published version testing), a support policy (whether breaking changes require advance notice, how long old major versions are maintained), and a deprecation policy (how and when a published package is retired). Packages without a named owner must not be published externally. Packages that accumulate more than three external consumers should be evaluated for extraction into a separate open-source or cross-team repository rather than maintained as a side product within this monorepo." The internal package publishing policy is the boundary management document for the monorepo. 
Without it, the monorepo grows to accumulate all shared code regardless of whether the consumers are inside or outside the repository, and the first external consumer of a workspace package triggers a painful ad-hoc publishing setup. The policy also prevents the dual-maintenance problem where the same code exists as both a workspace package and a published package with diverging implementations. The build-versus-buy framing: the decision to maintain an internal shared package is a build decision — the team commits to owning the API surface, the versioning policy, the changelog, and the consumer support — and that commitment must be explicit before the first external consumer is onboarded.
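The publishing requirements in the example policy can be checked before a package is pushed to the registry. The sketch below is illustrative only: the PublishMetadata shape and its field names are hypothetical, and treating the three-consumer threshold as an advisory rather than a blocker is an interpretation of the policy text.

```typescript
// Hypothetical metadata a package would carry before being published externally.
interface PublishMetadata {
  name: string;
  owner?: string;
  supportPolicy?: string;
  deprecationPolicy?: string;
  externalConsumers: number;
}

interface PublishCheck {
  blocking: string[];   // must be fixed before publishing
  advisories: string[]; // should be raised in planning, does not block
}

function checkPublishingPolicy(pkg: PublishMetadata): PublishCheck {
  const blocking: string[] = [];
  const advisories: string[] = [];
  if (!pkg.owner) blocking.push("missing named owner");
  if (!pkg.supportPolicy) blocking.push("missing support policy");
  if (!pkg.deprecationPolicy) blocking.push("missing deprecation policy");
  if (pkg.externalConsumers > 3) {
    advisories.push("more than three external consumers: evaluate extraction");
  }
  return { blocking, advisories };
}

// A package with an owner but no written support or deprecation policy.
const sharedTypesCheck = checkPublishingPolicy({
  name: "@company/shared-types",
  owner: "platform-team",
  externalConsumers: 4,
});
```

For this input the check blocks on the two missing policies and adds the extraction advisory, which matches the order the policy text gives: ownership and support commitments first, repository boundaries second.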
None of these five sections are visible in the repository's file tree, the turbo.json configuration, the CI pipeline YAML, or the package.json files. They are the repository structure reasoning that every engineer who adds a new service, proposes a shared abstraction, coordinates a cross-service refactor, or inherits the codebase after the original team has left depends on to understand why the structure is what it is and what the rules are for operating within it. The WhyChose extractor surfaces the initial setup session, the shared library session, and the CI debugging session from AI chat history; the ADR is what takes the reasoning from those sessions and makes it legible to the team inheriting the decisions. The 11-service coordination failure and the 47-minute CI collapse are not caused by poor engineering in the individual sessions. They are caused by structural decisions that were not made explicit at decision time and that therefore could not be applied during the events that revealed them. The ADR is the artifact that closes that gap.
FAQs
What is the difference between a monorepo and a polyrepo for software teams?
A monorepo (monolithic repository) is a single version-control repository containing the source code for multiple services, libraries, or applications. All packages live under one root, share a commit history, and are versioned together. A polyrepo maintains each service or library in its own dedicated repository with its own commit history, CI pipeline, and release cadence. The distinction is not about code size. The meaningful difference is in how cross-service changes are coordinated, how CI pipelines are scoped, and how shared dependencies are aligned.
In a monorepo, a change spanning multiple packages is a single commit that either passes CI or does not, so the source tree never sits in an intermediate state (deployment ordering across services still has to be managed separately). In a polyrepo, the same change requires a PR in each repository, creating an intermediate state where some repositories are updated and others are not. If the intermediate state is valid (backwards-compatible interface change), the PRs can be merged in any order. If it is invalid (breaking change), the PRs must land in a specific sequence, and any merge failure requires backtracking. The coordination overhead scales with the number of consumers. Teams that choose one model rarely switch, because both migrations are expensive: polyrepo to monorepo requires importing multiple commit histories; monorepo to polyrepo requires splitting histories and replacing intra-repository imports with published package references. The choice made at the second service is effectively permanent unless deliberately revisited.
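What makes an intermediate state valid is easiest to see with a transitional interface. The sketch below shows one way the UserId to AccountId rename from this post could have been staged so that old and new consumers coexist; it is an illustration, not what the team did, and the payload shape and function names are hypothetical.

```typescript
// New name for the identifier.
type AccountId = string;

/** @deprecated Transitional alias; remove once every consumer imports AccountId. */
type UserId = AccountId;

// Wire shape during the transition: either field may be present.
interface AccountPayload {
  accountId?: AccountId;
  /** @deprecated Old field name, still accepted while consumers migrate. */
  userId?: UserId;
}

// Reads the identifier from either field, preferring the new one.
function readAccountId(payload: AccountPayload): AccountId {
  const id = payload.accountId ?? payload.userId;
  if (id === undefined) {
    throw new Error("payload has neither accountId nor userId");
  }
  return id;
}

// Writes both fields so that consumers that have not been updated keep working.
function writeAccountPayload(id: AccountId): AccountPayload {
  return { accountId: id, userId: id };
}

const fromOldProducer = readAccountId({ userId: "u_1" });
const fromNewProducer = readAccountId(writeAccountPayload("a_2"));
```

With readers that accept both names and writers that emit both, each repository's PR can merge and deploy in any order. The breaking step, removing userId, is deferred until every consumer has been updated, at which point it is no longer breaking for anyone.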
When does a monorepo CI pipeline become slow and how does affected-package detection fix it?
A monorepo CI pipeline becomes slow when it treats every commit as requiring a full rebuild and retest of every package. At four packages and six minutes of CI, a full rebuild is tolerable. At 23 packages, the same pipeline takes 47 minutes and produces incentives that undermine engineering practices: engineers batch commits to reduce pipeline runs, push directly to main to skip PR-based CI, or ignore failing tests because re-running is faster than diagnosing. Each adaptation reduces the safety guarantee that CI is supposed to provide.
Affected-package detection limits CI work to packages transitively affected by the files that changed. A commit that modifies only the authentication service should not rebuild the billing service if billing does not import from authentication. The task graph, a declaration of which packages depend on which other packages, is the data structure that makes this determination. Tools like Turborepo (turbo.json pipeline), Nx (a project graph inferred from imports, with project.json for explicit configuration), and Bazel (BUILD file deps) maintain this graph. Remote caching extends the benefit: build and test outputs are stored keyed to input hashes, so a package whose inputs have not changed since the last run restores the cached output in seconds rather than recomputing it. A properly configured task graph with a warm remote cache can reduce a 47-minute monorepo CI run to under five minutes, because only the packages whose inputs actually changed require fresh computation. The task graph is an architectural artifact: it should be documented in the repository structure ADR, not treated as a Turborepo configuration detail.
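The idea of outputs stored keyed to input hashes can be shown with an in-memory cache. This sketch is not how Turborepo, Nx, or Bazel compute keys: it uses a 32-bit FNV-1a hash over UTF-16 code units as a stand-in for the cryptographic content hash a real tool would use, and the TaskInputs shape and function names are assumptions.

```typescript
// FNV-1a (32-bit) over UTF-16 code units; a stand-in for a real content hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

interface TaskInputs {
  task: string;                  // for example "build"
  files: Record<string, string>; // path -> file contents
  dependencyKeys: string[];      // cache keys of upstream tasks
}

// The key depends only on the task name, file contents, and upstream keys,
// sorted so that enumeration order cannot change the result.
function cacheKey(inputs: TaskInputs): string {
  const paths = Object.keys(inputs.files).sort();
  const parts = [
    inputs.task,
    ...paths.map((p) => `${p}\n${inputs.files[p]}`),
    ...inputs.dependencyKeys.slice().sort(),
  ];
  return fnv1a(parts.join("\0"));
}

const cache = new Map<string, string>();
let computations = 0;

// Returns the cached output when the key matches; otherwise computes and stores it.
function runCached(inputs: TaskInputs, compute: () => string): string {
  const key = cacheKey(inputs);
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  computations++;
  const output = compute();
  cache.set(key, output);
  return output;
}

const unchanged: TaskInputs = { task: "build", files: { "src/index.ts": "export const a = 1;" }, dependencyKeys: [] };
runCached(unchanged, () => "bundle-v1");
runCached(unchanged, () => "bundle-v1"); // same inputs: served from the cache
```

The second call does no work because the key is identical. Anything non-deterministic that leaks into the inputs, such as a timestamp written into a source file, changes the key on every run and shows up as a falling cache hit rate.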
How does monorepo dependency version alignment differ from polyrepo dependency management?
In a monorepo with a shared lockfile (npm, Yarn, or pnpm workspaces), external dependencies can be held to a single version shared across all packages, provided the team enforces it; the lockfile alone permits several versions of the same dependency to coexist. Enforced alignment eliminates version drift, the state where package A uses React 18 and package D (added six months later) uses React 19, requiring two React runtimes or compatibility shims. The tradeoff is coordinated upgrades: a breaking major version change in a shared dependency must be applied to all packages simultaneously, which can produce a large, complex PR when the monorepo contains 20+ packages.
In a polyrepo, each repository manages its own lockfile and dependency versions independently. A new service can adopt a newer major version of a library without waiting for other services. The tradeoff is version drift over time: the shared type library that all services depend on must maintain compatibility with the full range of versions in use across the polyrepo, constraining the library to the lowest common denominator of available type system features. As the version spread widens, the library maintainers must support multiple major versions in parallel, backporting fixes to each. The version alignment policy — single version in a monorepo, per-service version contract in a polyrepo — must be documented in the ADR for any engineer who adds a new shared library, proposes a major dependency upgrade, or discovers that two packages cannot currently share the same version of a dependency and needs to understand the exception handling procedure.
What should a monorepo vs polyrepo ADR document that teams typically skip?
Teams typically document the repository structure choice (monorepo vs polyrepo) and the tooling (Nx, Turborepo, standard workspaces). The sections that prevent the coordination failures and CI performance collapses described in this post are the following five.
First, the affected-package detection model: whether CI uses a task graph, what the graph declaration format is, what the fallback behavior is when a dependency edge is missing, and what the remote cache hit rate target is.
Second, the dependency version alignment policy: whether the monorepo enforces a single version per dependency, the procedure for coordinated major-version upgrades, and the exception handling for packages that cannot share the same version (with a named owner and a resolution target date for each exception).
Third, the cross-service change coordination protocol: how changes are classified as backwards-compatible or breaking, how breaking changes are sequenced, what the deployment-ordering verification step is, and what the rollback procedure is for each step in a multi-repository migration.
Fourth, the internal package publishing policy: whether shared code is workspace-only or published to a registry, the versioning model for published packages, the owner requirement before a package can be published externally, and the deprecation policy.
Fifth, the re-evaluation triggers: specific measurable conditions under which the repository structure should be reconsidered, because the polyrepo that is correct at three independent services may be structurally wrong at fifteen services with shared domain types, and the monorepo without a task graph that is correct at four packages is structurally wrong at twenty-three.
None of these sections are visible in the file tree, the CI configuration, or the package.json files. They are the repository structure reasoning that every engineer who adds a service, coordinates a cross-service change, or debugs a CI performance problem needs — and that is missing when the WhyChose extractor surfaces the initial setup session and the shared library session from AI chat history without a corresponding ADR to explain why the structure is what it is and what the rules are for operating within it.