2026-06-25 · ~20 min read

The authorization model decision record: why the permission architecture you chose determines your multi-tenant data leak surface and your role delegation depth

Authorization is often treated as an implementation detail of the authentication system: a few role checks added alongside the login flow, a roles table with Admin and Viewer, a middleware that checks the role field. The permission model, the tenant isolation boundary, and the delegation policy are chosen during the first access control ticket and never documented. Two years later, one of two things happens. Either the tenant isolation model set in that first ticket is the root cause of a cross-tenant data exposure that requires a permission schema migration across three million rows, or the RBAC structure adopted for five internal roles cannot express the custom role combinations that enterprise customers require without a second authorization system bolted onto the side with different enforcement semantics. The authorization model is an architectural decision. It should be documented before the first role check ships.

A 30-engineer B2B SaaS company had built a document collaboration product. Authorization was implemented as a middleware that checked the user's role field — Admin, Editor, or Viewer — against a permission map that specified which HTTP methods on which routes each role could access. Editors could GET and PUT documents. Viewers could only GET documents. Admins could do everything. The model had worked for the first eighteen months of the product's life, when every customer was a single organization and the concept of tenant isolation had not yet arisen.

In month twenty, the product added a workspace collaboration feature: users from different organizations could be invited into a shared workspace to co-edit a document set. To implement this, the team added a workspaces table, a workspace_members table, and a workspace_documents table. The existing permission check was: userHasPermission(userId, method, route) — it checked the user's role against the global permission map. For the workspace feature, the team added a separate check: userIsWorkspaceMember(userId, workspaceId). A user who passed either check — global Editor role, or workspace member — could access the requested document.

The tenant isolation bug was embedded in this design from the start. The permission check for documents was: "does the user have the Editor role OR is the user a member of any workspace that contains this document?" The document identifier was an auto-incrementing integer. A document with ID 4291 in the database could belong to organization A's private workspace or to organization B's private workspace — the identifier was not scoped to a tenant. A user from organization A who was a member of a shared workspace containing organization A's document #4291 satisfied the workspace membership check. If they knew or guessed the integer ID of organization B's document #4291 — or if any document ID collision existed — the membership check would return true for the wrong tenant's document, because the check did not verify that the document's owner workspace belonged to the user's organization.
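
The shape of this failure is easier to see in code. The following is a minimal reconstruction under stated assumptions, not the team's actual implementation: the types, the in-memory membership list, and the function names canEditUnsafe and canEditScoped are all hypothetical. It shows one way an OR of a global role check and a membership check can admit a user to another organization's document, and how comparing the document's owning organization before honoring the role closes that gap.

```typescript
type RoleName = "Admin" | "Editor" | "Viewer";

interface AppUser { id: string; orgId: string; role: RoleName }
interface DocRecord { id: number; ownerOrgId: string; workspaceId: string }
interface WorkspaceMember { userId: string; workspaceId: string }

// Hypothetical stand-in for the workspace_members table.
const workspaceMembers: WorkspaceMember[] = [
  { userId: "alice", workspaceId: "ws-shared" },
];

function isWorkspaceMember(userId: string, workspaceId: string): boolean {
  return workspaceMembers.some(
    (m) => m.userId === userId && m.workspaceId === workspaceId,
  );
}

// Unsafe: the role branch never asks which organization owns the document,
// so an Editor can reach any document whose integer ID they can guess.
function canEditUnsafe(user: AppUser, doc: DocRecord): boolean {
  const roleAllows = user.role !== "Viewer";
  return roleAllows || isWorkspaceMember(user.id, doc.workspaceId);
}

// Scoped: the role only counts inside the user's own organization.
// Access to another organization's document must come from membership.
function canEditScoped(user: AppUser, doc: DocRecord): boolean {
  const roleAllows = user.role !== "Viewer";
  if (doc.ownerOrgId === user.orgId) {
    return roleAllows || isWorkspaceMember(user.id, doc.workspaceId);
  }
  return isWorkspaceMember(user.id, doc.workspaceId);
}

const alice: AppUser = { id: "alice", orgId: "org-a", role: "Editor" };
const orgBPrivateDoc: DocRecord = { id: 4291, ownerOrgId: "org-b", workspaceId: "ws-b-private" };
const orgBSharedDoc: DocRecord = { id: 4292, ownerOrgId: "org-b", workspaceId: "ws-shared" };

const unsafeOnPrivate = canEditUnsafe(alice, orgBPrivateDoc); // leaks
const scopedOnPrivate = canEditScoped(alice, orgBPrivateDoc); // denied
const scopedOnShared = canEditScoped(alice, orgBSharedDoc);   // allowed via membership
```

The scoped version still supports the shared-workspace feature; it only stops the global role from applying outside the user's own organization.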

The exposure was discovered six weeks into the collaboration pilot when a user reported seeing document content that was not theirs. The security team traced the report to a numeric ID prediction in a URL. The fix required three changes: adding a tenantId parameter to the permission check interface (a required argument at every call site, not a value the check derives implicitly from the user's session); auditing every authorization call site in the codebase (47 distinct locations across 9 services) to add the tenant parameter; and migrating 2.8 million permission grant rows in the workspace_members table to add an explicit tenant_id foreign key. The migration used an expand-contract pattern over eight weeks: add the nullable tenant_id column, backfill it from a JOIN against the organizations table, validate backfill coverage, add the NOT NULL constraint, update all call sites, and remove the code that tolerated a missing tenant_id. The total engineering cost was eleven person-weeks, plus a security disclosure to affected enterprise customers. The authorization model, chosen during the first access control ticket without a written decision record, had never stated that tenantId would be required as a permission check parameter once multi-tenant isolation was added. The engineers who built the workspace feature had no record of what the permission model was designed to guarantee, and added the workspace membership check without understanding that it needed to include tenant scoping.

The second incident was an enterprise onboarding failure. A 45-engineer team had built a content management product with four roles: Admin, Publisher, Editor, Commenter. The roles were sufficient for the product's initial SMB customers, where one or two Admins managed the full team. When the product moved upmarket, enterprise customers began requesting customization. A 2,000-seat healthcare organization needed a "Department Editor" role: users who could create and edit content within their department's content space but could not publish to the global site. A "Department Admin" role: users who could manage team members within their department but could not access content or users in other departments. A "Read-only Reviewer" role that could see unpublished drafts but not edit them — different from the existing Commenter who could see only published content. Three custom role requirements from one customer.

The engineering team's first response was to add these as new built-in roles, hardcoded alongside Admin, Publisher, Editor, Commenter. By the time the fourth enterprise customer was onboarded, the role list had grown to eleven. The fifth enterprise customer wanted the Department Admin role to have the ability to invite new Commenters but not new Editors. The sixth enterprise customer wanted Editors to be able to approve their own content for publication within a specific project they owned, but not across the site. The eleventh hardcoded role was "Project-Owner-Publisher" — a role invented for one customer and meaningless to any other. The team recognized the pattern after the sixth customer: enterprise customers needed to compose their own roles from permission primitives, and the hardcoded role model could not express this without an indefinite proliferation of special-case roles. The fix required building a custom roles system: a role builder UI where Admins could create named roles from a set of permission checkboxes. But the custom roles system was built as an extension of the existing hardcoded RBAC model, storing custom role assignments in a separate table with a different enforcement code path. The hardcoded roles went through the middleware permission map. The custom roles went through a separate service that evaluated the custom role's permission set against the request. Two authorization paths, inconsistent semantics, a security audit finding in the next quarter. The pattern is the same one described in decisions never written down: the RBAC model was adopted during the first access control ticket, without a decision record, without documenting what the model was designed to express or under what conditions it should be extended or replaced.

The three structural properties that the authorization model determines

When teams implement authorization for the first time, the scope is narrow: a logged-in user should be able to view their own data but not other users' data. A middleware role check achieves this in an afternoon. The structural properties that the choice sets — tenant isolation surface, permission model expressiveness ceiling, and delegation depth — are not visible when the product has one organization and five internal roles. They become visible when the product is multi-tenant, when enterprise customers require organizational hierarchies in the permission structure, or when a sharing feature needs to express permissions that the original model cannot represent.

Tenant isolation surface. In a multi-tenant SaaS application, the authorization question is never "can user U perform action A on resource R?" — it is always "can user U from tenant T perform action A on resource R that belongs to tenant T?" The tenant dimension is not an optional filter that can be added to the data query; it is a required parameter of every authorization decision. If the authorization model stores permission grants as (userId, roleId) tuples without a tenant dimension, then a permission grant is evaluated independently of which tenant the requested resource belongs to. The data layer may enforce tenant isolation through WHERE clauses — queries include AND tenant_id = ? — but the authorization layer evaluates the grant without knowing the resource's tenant. The gap between data-layer isolation and authorization-layer isolation is where cross-tenant exposure occurs: a direct authorization check that asks "does user U have permission to access resource R?" without also asking "does resource R belong to user U's tenant?" may return true for a resource in a different tenant.

The three tenant isolation models available at the authorization layer are: implicit tenant scoping, where the user's authenticated session carries a tenant ID that the authorization middleware automatically injects into every permission check; explicit tenant parameter, where every permission check function requires a tenantId argument at the call site; and resource ownership verification, where the permission check retrieves the resource's tenant_id from the database and verifies that it matches the requesting user's tenant before evaluating the permission grant. Implicit scoping is the simplest and is correct when users belong to exactly one tenant. It fails for cross-tenant features — shared workspaces, guest users, partner integrations — where a user's session may be authenticating into one tenant while accessing resources managed by another. Explicit tenant parameter is correct by construction but requires auditing every authorization call site to ensure the tenantId is always passed and never derived unsafely. Resource ownership verification is the most defensive and prevents a class of bugs where a permission grant drifts out of sync with resource ownership, but it requires a database lookup per permission check and must be designed to avoid N+1 query patterns when evaluating permissions for a list of resources. The multi-tenancy decision record establishes the tenant model — how the database partitions tenant data, how the API authenticates tenancy, and what the isolation guarantee is. The authorization model ADR must reference the multi-tenancy decision and specify how tenant context flows from the request into the permission check at every call site.
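
As an illustration of resource ownership verification without an N+1 pattern, the sketch below checks a whole list of resources with a single ownership lookup. It is a sketch under assumptions: the TenantLookup type, filterOwnedByTenant, and the in-memory table are hypothetical names, and the lookup is synchronous only to keep the example short (a real one would be an asynchronous query along the lines of SELECT id, tenant_id ... WHERE id IN (...)).

```typescript
// Hypothetical batch lookup: resource ID -> owning tenant ID, one round trip.
type TenantLookup = (resourceIds: string[]) => Record<string, string>;

// Keeps only the resources owned by the caller's tenant.
// IDs the lookup does not know about are dropped (fail closed).
function filterOwnedByTenant(
  resourceIds: string[],
  tenantId: string,
  lookupTenantIds: TenantLookup,
): string[] {
  const ownerByResource = lookupTenantIds(resourceIds);
  return resourceIds.filter((id) => ownerByResource[id] === tenantId);
}

// In-memory stand-in for the resources table, with a call counter
// to make the "one lookup per list" property observable.
const tenantByResource: Record<string, string> = {
  "doc-1": "tenant-a",
  "doc-2": "tenant-b",
  "doc-3": "tenant-a",
};
let lookupCalls = 0;
const lookupFromMemory: TenantLookup = (ids) => {
  lookupCalls += 1;
  const found: Record<string, string> = {};
  for (const id of ids) {
    const owner = tenantByResource[id];
    if (owner !== undefined) found[id] = owner;
  }
  return found;
};

const visibleToTenantA = filterOwnedByTenant(
  ["doc-1", "doc-2", "doc-3", "doc-404"],
  "tenant-a",
  lookupFromMemory,
);
```

Here the tenantId argument is passed explicitly by the caller, so the same function also illustrates the explicit tenant parameter model; only the ownership comparison is specific to resource ownership verification.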

Permission model expressiveness ceiling. The permission model determines what access control requirements can be expressed without extending or replacing the model. RBAC can express: user U has role R, role R permits action A on resource type T. RBAC cannot express, without extension: user U can perform action A on resource instances R1 and R3 but not R2 of the same type (resource-instance-level grants); user U can perform action A on resource R only if resource R has attribute X (attribute-conditional access); user U can perform action A on resource R because U is a member of team T which has inherited permission to project P which contains R (hierarchical permission inheritance). Each of these requirements arises naturally in product evolution — a sharing feature produces resource-instance-level grants, a compliance requirement produces attribute-conditional access, an organizational hierarchy produces hierarchical inheritance — and each one requires either an extension to the RBAC model that produces inconsistent semantics or a migration to a more expressive model.

The expressiveness ceiling is structural because of the permission data shape. RBAC stores (userId, roleId) tuples. A resource-level grant requires either a new (userId, roleId, resourceId) tuple type (a schema migration plus new enforcement logic) or a supplementary (userId, permission, resourceId) table that is evaluated in addition to the role table (two code paths, two data stores, two sets of enforcement logic). ABAC stores policies evaluated at runtime against user, resource, and environment attributes; it can express any access control requirement that can be expressed as a policy function, but it does not store grants as data. It stores policies, and changing what a user can do requires changing the policy, not changing the grant data. ReBAC stores (subject, relation, object) relationship tuples; it can express hierarchical inheritance natively, because the question "can user U perform action A on document D?" is answered by traversing the relationship graph from U. In exchange, it requires a dedicated relationship store (SpiceDB, OpenFGA, or a custom graph query), and the relationship graph must be kept current as organizational memberships change. The authorization model ADR must document not just what the model does express, but what it deliberately does not express, so that when a new requirement arises the team can evaluate whether it fits the existing model or requires a documented model extension. The authentication decision record addresses how users prove their identity; the authorization model ADR addresses what authenticated users are permitted to do. These are separate decisions with separate failure modes and must be documented separately: conflating them produces an authentication ADR that omits the permission expressiveness ceiling and a security review that addresses identity assurance without addressing access control boundaries.
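
To make the ReBAC data shape concrete, here is a small sketch of answering a permission question by walking (subject, relation, object) tuples. Everything in it is an assumption for illustration: the tuple contents, the relation names member, editor, and parent, and the hard-coded derivation rules. Real ReBAC stores such as SpiceDB or OpenFGA declare these rules in a schema rather than in application code.

```typescript
interface RelationTuple { subject: string; relation: string; object: string }

// Hypothetical relationship data: ana is on the design team, the design team
// edits the site project, and document d1 lives inside that project.
const tuples: RelationTuple[] = [
  { subject: "user:ana", relation: "member", object: "team:design" },
  { subject: "team:design", relation: "editor", object: "project:site" },
  { subject: "project:site", relation: "parent", object: "document:d1" },
];

function isMember(subject: string, team: string): boolean {
  return tuples.some(
    (t) => t.subject === subject && t.relation === "member" && t.object === team,
  );
}

// A subject is an editor of an object if it holds the relation directly,
// belongs to a team that holds it, or is an editor of the object's parent.
function hasEditor(
  subject: string,
  object: string,
  visited: Set<string> = new Set<string>(),
): boolean {
  if (visited.has(object)) return false; // guard against cyclic parent chains
  visited.add(object);
  for (const t of tuples) {
    if (t.object !== object) continue;
    if (t.relation === "editor") {
      if (t.subject === subject) return true;           // direct grant
      if (isMember(subject, t.subject)) return true;    // grant via team
    }
    if (t.relation === "parent" && hasEditor(subject, t.subject, visited)) {
      return true;                                      // inherited from parent
    }
  }
  return false;
}

const anaCanEdit = hasEditor("user:ana", "document:d1");
const boCanEdit = hasEditor("user:bo", "document:d1");
```

No tuple mentions ana and d1 together; the answer comes entirely from traversal, which is the property an RBAC (userId, roleId) table cannot reproduce without extra tables.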

Delegation depth. Role delegation determines who can grant permissions and how far those grants can propagate. In a product with no delegation, only product Admins can assign roles and grant permissions. This is simple and safe, but it creates an admin bottleneck at scale: every permission change requires contacting a product Admin, which becomes impractical when enterprise customers have hundreds of users with varying access requirements. In a product with single-hop delegation, an Admin can grant a Manager the ability to manage permissions within a defined scope: the Manager can assign users to a specific project, but cannot create new Managers or extend their own delegation rights. This is the most common model in early SaaS products: workspace admins who can invite users, project leads who can manage their team. In a product with multi-hop delegation, any user can grant to another user any permission they themselves hold, up to a configurable depth limit. This is the sharing behavior familiar from Google Drive, whose authorization is backed by Zanzibar: a user who has editor access to a document can share that document with another user, granting editor access that the second user can in turn share further. Multi-hop delegation is correct for consumer sharing features; it requires cycle detection (if A delegates to B who delegates back to A, the traversal must not loop) and a depth limit (permissions should not propagate indefinitely through delegation chains). Scoped delegation, the fourth option, is required for enterprise customers who need role management within their organization without product admin access: a tenant IT admin who can create and assign custom roles within their tenant, but has no visibility into or control over other tenants' role structures or the product's global admin surface.
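
The cycle detection and depth limit that multi-hop delegation needs can be sketched as a bounded breadth-first walk over delegation edges. This is an illustrative sketch, not a reference implementation: the DelegationEdge shape, the holdsViaDelegation function, and the convention that the original holder sits at depth 0 are all assumptions.

```typescript
interface DelegationEdge { grantor: string; grantee: string }

// Returns true if `user` holds the permission through a chain of at most
// `maxDepth` delegations starting from `rootHolder`.
function holdsViaDelegation(
  rootHolder: string,
  user: string,
  edges: DelegationEdge[],
  maxDepth: number,
): boolean {
  let frontier: string[] = [rootHolder];
  const visited = new Set<string>([rootHolder]); // cycle detection
  for (let depth = 0; depth <= maxDepth; depth++) {
    if (frontier.indexOf(user) !== -1) return true;
    const next: string[] = [];
    for (const holder of frontier) {
      for (const edge of edges) {
        if (edge.grantor === holder && !visited.has(edge.grantee)) {
          visited.add(edge.grantee);
          next.push(edge.grantee);
        }
      }
    }
    frontier = next; // one hop further from the root holder
  }
  return false; // not reachable within the depth limit
}

// owner -> a -> b -> c -> d, plus b -> a, which forms a cycle.
const delegationEdges: DelegationEdge[] = [
  { grantor: "owner", grantee: "a" },
  { grantor: "a", grantee: "b" },
  { grantor: "b", grantee: "a" },
  { grantor: "b", grantee: "c" },
  { grantor: "c", grantee: "d" },
];

const bWithinTwoHops = holdsViaDelegation("owner", "b", delegationEdges, 2);
const cWithinTwoHops = holdsViaDelegation("owner", "c", delegationEdges, 2);
const cWithinThreeHops = holdsViaDelegation("owner", "c", delegationEdges, 3);
```

The visited set is what keeps the b to a edge from looping, and the depth counter is what keeps c out when the limit is two hops.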

The delegation depth is a consequence of the permission model: RBAC can express single-hop delegation by adding a "can manage users in scope S" permission to a role; it cannot express multi-hop delegation without tracking the delegation chain across grant records; ReBAC expresses multi-hop delegation natively through the relationship graph, where a sharing action creates a new relationship edge that becomes part of the graph traversed by subsequent permission checks. The delegation policy must be documented alongside the permission model, because enterprise onboarding requirements consistently reveal delegation gaps that were not anticipated at product design time. The new CTO onboarding problem manifests here as well: the incoming engineering lead who asks "why do we have eleven built-in roles?" should be able to find the authorization model ADR explaining that the delegation model does not support scoped role management, so each enterprise-specific permission combination was hardcoded as a new role rather than expressed through a delegation mechanism the customer could manage themselves.

Authorization model options and their structural properties

In-application RBAC with a roles table is the default: a roles table, a user_roles join table, a permission map keyed by role and resource type, and a middleware that checks the user's roles against the permission map before the request reaches the handler. This model is correct for products with a small, stable set of roles, no resource-instance-level grants, no organizational hierarchy, and single-tenant or simple multi-tenant use. It becomes incorrect when: resource-instance grants are needed (user X can edit document D1 but not D2 of the same type), when roles multiply beyond ten to accommodate enterprise customization, or when multi-tenant isolation requires the tenant dimension in every permission check. The failure mode is the dual-system pattern: when the in-application RBAC cannot express a new requirement, teams add a parallel permission store (a resource_grants table, a custom_roles system) with separate enforcement code, producing two authorization paths with inconsistent semantics and a surface area for bugs in the gap between them. Correct approach: document the expressiveness scope explicitly so that when the first out-of-scope requirement arrives, the response is a planned model extension or migration, not an ad-hoc parallel system.
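
A minimal sketch of this default model, assuming a hypothetical permission map and a roleAllows helper (the role names follow the earlier document collaboration example; the workspace resource type and the action names are invented for illustration):

```typescript
type RoleId = "Admin" | "Editor" | "Viewer";
type ResourceType = "document" | "workspace";
type ActionName = "read" | "write" | "delete";

// Permission map keyed by role, then by resource type.
const permissionMap: Record<RoleId, Partial<Record<ResourceType, ActionName[]>>> = {
  Admin: {
    document: ["read", "write", "delete"],
    workspace: ["read", "write", "delete"],
  },
  Editor: { document: ["read", "write"], workspace: ["read"] },
  Viewer: { document: ["read"] },
};

// The check a middleware would run before the request reaches the handler.
// Note what is absent from the signature: no resource ID and no tenant ID.
function roleAllows(
  roles: RoleId[],
  resourceType: ResourceType,
  action: ActionName,
): boolean {
  return roles.some((role) => {
    const allowed = permissionMap[role][resourceType];
    return allowed !== undefined && allowed.indexOf(action) !== -1;
  });
}

const editorCanWrite = roleAllows(["Editor"], "document", "write");
const viewerCanWrite = roleAllows(["Viewer"], "document", "write");
```

Because the function receives only a resource type, "user X can edit D1 but not D2" has nowhere to live in this model, which is the expressiveness limit the paragraph above describes.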

Casbin is an open-source policy engine that supports RBAC, ABAC, and a hybrid model. The policy is stored in a database or file as rules that Casbin evaluates at permission check time. Casbin's reference implementation is in Go (where it is widely used), with ports for Node.js, Python, Java, and Rust. The policy model is expressed in a Casbin Model file that specifies the request format, the policy format, the effect aggregation rule, and the matchers; changing the policy model requires no code changes, only a model file change. Casbin supports role hierarchies natively: a role can inherit from another role, and the policy evaluation traverses the hierarchy. This makes it suitable for products that need organizational hierarchy in their permission model without the full complexity of a ReBAC graph store. The tradeoff is operational ownership: Casbin is a library that runs in-process or as a sidecar, with no hosted service, no management UI, and no built-in audit trail beyond what the application implements. Policy changes require database writes to the Casbin policy table and may require cache invalidation if the evaluation result is cached. Casbin is correct for teams that want a flexible policy model with self-hosting control and can commit to owning the operational burden of policy management. The build-versus-buy framing applies: Casbin is the build option, in which the team owns the policy language, the policy storage, the evaluation infrastructure, and the audit trail. The buy options are managed authorization services that provide hosted policy management, audit trails, and SDKs at the cost of vendor dependency.

Open Policy Agent (OPA) is an ABAC-first policy engine with a declarative policy language (Rego) that can evaluate policies against arbitrary JSON input — user attributes, resource attributes, environment context — and return a boolean or a structured decision object. OPA is deployed as a sidecar to each service or as a centralized service, and each service sends authorization queries to OPA as JSON HTTP requests; OPA evaluates the relevant policies and returns the decision. OPA is correct for products with compliance-driven access control requirements that cannot be expressed as simple role checks: time-of-day restrictions, attribute-based constraints, data classification policies. The Rego policy language is powerful but has a learning curve: engineers accustomed to RBAC role checks find Rego's logic-programming style unintuitive, and debugging a policy that returns an unexpected result requires understanding how Rego unifies rules. OPA's bundle distribution mechanism allows policies to be compiled and distributed to OPA instances without redeployment, which is useful for compliance environments where policy updates must be audited and versioned. OPA does not natively answer "what can user U do?" in a way that produces a human-readable permission list — it answers "can user U do X?" for a specific query, which makes permission display in a UI (showing a user which actions they can perform) require constructing and evaluating a query for every possible action. The observability framing applies: OPA decision logs — the record of every authorization query and its result — are essential for security auditing and for diagnosing unexpected access denials, and the decision log format, storage backend, and retention policy should be specified in the authorization model ADR.
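
The cost of building a permission list from a yes-or-no decision API can be sketched as follows. The input field names, the listAllowedActions helper, and the stubDecide policy are hypothetical; stubDecide is a local stand-in for a round trip to OPA, whose Data API accepts a POST to /v1/data/ followed by the policy path, with a JSON body of the form {"input": ...}.

```typescript
interface OpaUser { id: string; roles: string[] }
interface OpaResource { id: string; classification: string }
interface OpaInput { user: OpaUser; action: string; resource: OpaResource }

// Body for a single decision query against OPA's Data API.
function buildOpaRequestBody(input: OpaInput): string {
  return JSON.stringify({ input });
}

// Stand-in for "send the query, read the boolean result".
type Decide = (input: OpaInput) => boolean;

// Showing a user what they can do means one decision per candidate action.
function listAllowedActions(
  user: OpaUser,
  resource: OpaResource,
  candidateActions: string[],
  decide: Decide,
): string[] {
  return candidateActions.filter((action) => decide({ user, action, resource }));
}

// Hypothetical policy, written inline only so the sketch runs without OPA:
// restricted documents need the compliance role, reads are otherwise open,
// writes need the editor role, and everything else is denied.
const stubDecide: Decide = ({ user, action, resource }) => {
  if (resource.classification === "restricted" && user.roles.indexOf("compliance") === -1) {
    return false;
  }
  if (action === "read") return true;
  if (action === "write") return user.roles.indexOf("editor") !== -1;
  return false;
};

const editorUser: OpaUser = { id: "u1", roles: ["editor"] };
const publicDoc: OpaResource = { id: "d1", classification: "public" };
const restrictedDoc: OpaResource = { id: "d2", classification: "restricted" };
const candidateActions = ["read", "write", "delete"];

const onPublic = listAllowedActions(editorUser, publicDoc, candidateActions, stubDecide);
const onRestricted = listAllowedActions(editorUser, restrictedDoc, candidateActions, stubDecide);
const sampleBody = buildOpaRequestBody({ user: editorUser, action: "read", resource: publicDoc });
```

With three candidate actions this is three decisions per resource; a page listing fifty resources would need one hundred and fifty, which is why the permission display path deserves its own line in the ADR.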

SpiceDB / OpenFGA are ReBAC-first authorization services derived from Google's Zanzibar paper. They store authorization data as relationship tuples of the form (subject, relation, object) in a dedicated database and answer permission queries by traversing the relationship graph. SpiceDB uses a schema language to declare the types of relationships and how permissions are derived from them; a document that grants view access to all members of the team that owns it is expressed as a schema relationship chain, not as imperative code. OpenFGA is the CNCF-hosted equivalent; its authorization model can be written in a DSL or in JSON. Both systems provide a Check API (does subject S have relation R on object O?), an Expand API (which subjects have relation R on object O?), and a listing API (which objects of type T does subject S have relation R on?), named ListObjects in OpenFGA and LookupResources in SpiceDB. The listing API is what RBAC systems cannot provide efficiently for resource-level grants across large datasets. SpiceDB and OpenFGA are correct for products where permission inheritance through organizational hierarchy is the core requirement: nested teams, inherited project permissions, cross-organizational sharing with revocable delegation chains. The tradeoff is operational complexity: both systems require a dedicated database for the relationship store, a consistency model for relationship writes versus permission checks (Zanzibar uses a zookie, a consistency token, to guarantee that a permission check observes relationships written in a specific transaction), and SDK integration across every service that makes permission checks. Remote-cached SpiceDB deployments with a PostgreSQL or CockroachDB backend are production-proven, but the team must understand the consistency model to avoid reading stale relationship data after a write.
The database connection pooling framing applies: the relationship store is a database, and its connection pooling, failover, and query latency characteristics determine the p99 latency of every permission check in the application.

Managed authorization services (Auth0 Fine-Grained Authorization, Permit.io, Oso Cloud, AWS Verified Permissions with Cedar) provide hosted policy management, SDKs, audit trails, and management UIs at the cost of vendor dependency and data residency constraints. These services are correct when the team wants to avoid the operational burden of running a self-hosted policy engine, when compliance requirements mandate an auditable managed service, or when the engineering team is small and authorization infrastructure is not a core competency. The tradeoff is the data model constraint: each service has an opinionated data model for how permissions are expressed, and the team's permission requirements must fit that model. Migrating from a managed authorization service to a self-hosted solution requires exporting the full policy and relationship dataset in a format compatible with the target system, which may not be a supported export path. The startup decision log first year pattern applies: managed authorization services are often adopted during a rapid-growth period to avoid infrastructure overhead, and the vendor lock-in is not felt until the team needs to migrate to a lower-cost or more customizable solution at scale.

AI chat session types and what each one misses

The authorization model decision follows a consistent pattern of AI chat sessions. The WhyChose extractor surfaces these sessions from chat export files, and the structural decisions they omit are consistent across the decision records reviewed. The authorization model is typically chosen during the first authenticated route implementation session, is not revisited until an enterprise customer reveals a delegation gap or a security review reveals a tenant isolation gap, and is then extended ad-hoc rather than replaced through a planned migration.

The initial access control session covers: how to add authentication to an HTTP server, how to protect routes so only logged-in users can access them, and how to add an admin role that can access the admin panel. The session ends when the middleware works and the admin route is protected. What the session does not cover: whether the permission model will need to express resource-level grants (can this user access this specific document, not just the document resource type?), whether the product will be multi-tenant and if so how tenant context will flow into the authorization check, what the delegation model will be when customers need to manage their own users, or what happens when the number of roles needs to grow beyond Admin and User. These questions are not visible in the session because the product has one organization and the access control requirement is binary. The decision made in this session — use a role field on the user record, check it in middleware — is the authorization model the product will carry until a painful incident or an enterprise requirement reveals its limitations.

The "adding a sharing feature" session is the moment when the in-application RBAC model first encounters a requirement it cannot cleanly express. The team needs user X to be able to share document D with user Y, granting Y read access to D without giving Y access to all other documents of the same type. The AI session covers: how to add a document_shares table, how to check if a user has been granted access to a specific document, how to display the sharing UI. The session ends when the sharing feature works. What the session misses: the sharing feature has just added a second authorization code path — the resource-level grant check — alongside the existing role check. When a request arrives for document D, the application now checks: (1) does the user have an Editor or Admin role? OR (2) has this document been explicitly shared with the user? Two conditions, evaluated independently, each with different data stores and different enforcement code. When a bug is introduced in either path — a missing tenant check in the sharing path, or a role check that does not account for shared documents being visible to non-editors — the authorization behavior diverges from the intended model in ways that may not be caught by tests that exercise only one path at a time. The dual-path pattern is the seed of the inconsistent enforcement semantics finding that shows up in the next security audit. The threat model framing applies: the authorization model is the primary control for data confidentiality, and a dual-enforcement-path model has a higher threat surface than a unified model, because the gap between paths is where unauthorized access most commonly occurs.

The enterprise onboarding session covers: how to add a custom roles feature, how to let customers create their own roles with configurable permissions, how to build the role management UI. The session ends when the custom roles feature ships for the first enterprise customer. What the session misses: the custom roles feature has been built as a third authorization code path — beside the original role middleware and the document sharing grants table. Custom role assignments are stored in a custom_role_assignments table and checked in a custom roles service that is called after the main middleware. The permission evaluation order — which path takes precedence when a user has conflicting grants from different paths — is not specified, and the evaluation order has been determined by the order of conditional checks in the route handler rather than by a documented policy. An engineer who adds a new resource type two months later implements authorization checks in the order they encountered them: role check, then sharing check, missing the custom roles check entirely, because the custom roles service is a separate call not integrated into the main permission check flow. The API versioning framing applies: the permission check interface — checkPermission(userId, action, resourceId) — is an internal API contract, and adding a third authorization code path without versioning the interface produces consumers that call the old interface signature and miss the new authorization path.
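
One way to avoid a path being skipped is to register every grant source behind a single check with a written precedence rule. The sketch below assumes a hypothetical rule (an explicit deny from any source wins, otherwise any allow is enough, otherwise deny) and hypothetical source functions; the source does not say which rule the team should have chosen, only that one was never written down.

```typescript
type Verdict = "allow" | "deny" | "abstain";
type GrantSource = (userId: string, action: string, resourceId: string) => Verdict;
type CheckPermission = (userId: string, action: string, resourceId: string) => boolean;

// Builds the single checkPermission(userId, action, resourceId) entry point.
// Every source is always consulted, so a new resource type cannot miss one.
function makeCheckPermission(sources: GrantSource[]): CheckPermission {
  return (userId, action, resourceId) => {
    const verdicts = sources.map((source) => source(userId, action, resourceId));
    if (verdicts.indexOf("deny") !== -1) return false; // explicit deny wins
    return verdicts.indexOf("allow") !== -1;           // default is deny
  };
}

// Hypothetical stand-ins for the three paths described above.
const roleSource: GrantSource = (userId, action) =>
  (userId === "editor-1" && action !== "delete") ? "allow" : "abstain";
const sharingSource: GrantSource = (userId, action, resourceId) =>
  (userId === "guest-1" && resourceId === "doc-7" && action === "read") ? "allow" : "abstain";
const customRoleSource: GrantSource = (userId, action) =>
  (userId === "editor-1" && action === "publish") ? "deny" : "abstain";

const checkPermission = makeCheckPermission([roleSource, sharingSource, customRoleSource]);

const editorWrites = checkPermission("editor-1", "write", "doc-1");
const editorPublishes = checkPermission("editor-1", "publish", "doc-1");
const guestReadsShared = checkPermission("guest-1", "read", "doc-7");
```

The call signature stays the same as the existing interface, so existing consumers pick up a newly registered source without changing their code.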

The security audit response session covers: a specific finding from an external security audit — "the application has inconsistent authorization enforcement across different permission paths" — and a plan to remediate it: unify the authorization paths into a single permission check function, add tenant context to all permission checks, add tests for cross-tenant access attempts. The session ends with a remediation plan. What the session misses: the remediation plan addresses the symptoms of the authorization model not having been formally decided — multiple enforcement paths, missing tenant context, no documented permission check interface contract — without recording why the model evolved as it did or what the unified model is designed to guarantee. Six months after the audit remediation, a new engineer adds a feature that requires a new kind of permission — attribute-conditional access based on document classification — and, finding no ADR describing the authorization model's expressiveness scope, adds a fourth permission path for the new requirement. The cycle continues. The decisions never written down pattern is complete: the authorization model exists as a set of enforcement code paths whose design rationale is distributed across chat sessions, GitHub PR descriptions, and the institutional memory of the engineers who built each feature, none of it accessible to the engineer who inherits the system.

Five ADR sections for authorization model selection

An authorization model ADR that prevents the multi-tenant exposure and enterprise onboarding failures described in this post covers five sections that teams consistently omit.

First, the permission model selection with alternatives, expressiveness scope, and re-evaluation triggers. The ADR documents the selected permission model — RBAC, ABAC, ReBAC, or a hybrid — the alternatives evaluated, the rejection reasons, and the explicit expressiveness scope. "Role-based access control with a roles table and a permission map was selected over attribute-based access control (OPA Rego) and relationship-based access control (SpiceDB). ABAC rejected: the current access control requirements are expressible as role checks without attribute conditions; OPA's Rego policy language introduces operational complexity not justified by the current requirements; re-evaluate when compliance requirements mandate attribute-conditional access based on data classification, user jurisdiction, or time-based restrictions. SpiceDB rejected: the product currently has no hierarchical team structures or inherited permission chains; the relationship store and consistency model introduce operational overhead not justified by the current product structure; re-evaluate when the sharing feature requires revocable multi-hop delegation or when organizational hierarchy inheritance is required for the team management feature. Expressiveness scope of the selected model: the RBAC model can express role assignments at the user level, role-level permission grants at the resource type level, and explicit resource-instance grants via the resource_grants table. It cannot express: permission inheritance through organizational membership hierarchies (nested teams where a team's permission inherits to all members and sub-teams); attribute-conditional access where the permission depends on a resource attribute other than its type; or time-limited permission grants with automatic expiry. Requirements that fall outside the expressiveness scope must be escalated before implementation so the team can evaluate whether to extend the model or migrate to a more expressive one. 
Re-evaluate the permission model when: a sharing feature requires granting a non-Admin user the ability to extend their own permissions to additional users (multi-hop delegation); enterprise customers with organizational hierarchies require inherited permissions across nested teams; or three or more resource types require attribute-conditional access." The expressiveness scope, meaning the explicit statement of what the model does not do, is the section that prevents the ad-hoc parallel authorization system that grows from each out-of-scope requirement.
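To make the expressiveness scope concrete, here is a minimal sketch of the two things the quoted RBAC model can express: role-level grants per resource type, and explicit per-instance grants. The in-memory `permissionMap`, `userRoles`, and `resourceGrants` values are hypothetical stand-ins for the roles table, permission map, and resource_grants table, and `isAllowed` is an illustrative name. Tenant context is deliberately left out to keep the example small.

```typescript
// Hypothetical in-memory stand-ins for the roles table, the permission map,
// and the resource_grants table named in the ADR excerpt.
type Role = "admin" | "editor" | "viewer";
type ResourceType = "document" | "workspace";
type Action = "read" | "edit" | "delete";

interface ResourceGrant {
  userId: string;
  resourceId: string;
  action: Action;
}

// Role-level permission grants at the resource-type level.
const permissionMap: Record<Role, Record<ResourceType, readonly Action[]>> = {
  admin: { document: ["read", "edit", "delete"], workspace: ["read", "edit", "delete"] },
  editor: { document: ["read", "edit"], workspace: ["read"] },
  viewer: { document: ["read"], workspace: ["read"] },
};

// Example data (assumed): u1 is a Viewer with one explicit edit grant.
const userRoles = new Map<string, Role>([["u1", "viewer"]]);
const resourceGrants: ResourceGrant[] = [
  { userId: "u1", resourceId: "doc-7", action: "edit" },
];

function isAllowed(
  userId: string,
  action: Action,
  resourceType: ResourceType,
  resourceId: string,
): boolean {
  // 1. Role check: does the user's role permit this action on this type?
  const role = userRoles.get(userId);
  if (role !== undefined && permissionMap[role][resourceType].includes(action)) {
    return true;
  }
  // 2. Instance check: is there an explicit grant on this one resource?
  return resourceGrants.some(
    (g) => g.userId === userId && g.resourceId === resourceId && g.action === action,
  );
}
```

Nothing in these two lookups can walk a team hierarchy, read a resource attribute, or expire a grant. That is exactly the out-of-scope list the ADR excerpt spells out.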

Second, the tenant isolation model and cross-tenant access policy. The ADR documents how tenant context flows into every permission check, the default cross-tenant access policy, and the explicit grant model for cross-tenant features. "Tenant context is an explicit required parameter at every permission check call site. The permission check function signature is checkPermission(userId: string, action: Action, resourceId: string, tenantId: string). The tenantId parameter is never optional and is never read from a global or thread-local context inside the check; it must be passed explicitly at every call site. The call site takes tenantId from the authenticated user's session token, which includes the tenant claim set at authentication time. For resources that belong to a specific tenant, the checkPermission call also verifies resource ownership: the resource record's tenant_id column is fetched and compared to the tenantId parameter before the role check is evaluated. If resource.tenant_id does not match tenantId, the permission check returns false regardless of the user's role, unless a valid cross-tenant grant (defined next) covers the request. Cross-tenant access default: all cross-tenant access is denied unless an explicit cross-tenant grant record exists in the cross_tenant_grants table. A cross-tenant grant specifies: (grantor_tenant_id, grantee_user_id, resource_id, permission_type, expires_at, created_by_admin_id). Cross-tenant grants can only be created by product Admins, not by tenant-level Admins. When a request arrives with a cross-tenant access pattern (the authenticated user's tenant differs from the requested resource's tenant), the permission check first queries the cross_tenant_grants table for a valid unexpired grant before evaluating the role check. Multi-tenant membership: users who belong to multiple tenants authenticate into one tenant at a time via an explicit tenant selection step at login; the session token carries the active tenant ID; switching tenants requires re-authentication into the target tenant."
The explicit parameter requirement at every call site is the enforcement mechanism that prevents the class of bugs where a new permission check function is added without the tenant parameter because the engineer did not know it was required. The database migration framing applies: adding the tenantId parameter to an existing permission check interface across forty-seven call sites is equivalent in scope to a schema migration and requires the same expand-contract discipline. For a column, that means: add it as nullable, backfill, validate, add NOT NULL. For the interface, it means: add the parameter as optional, update every call site to pass it, validate that no call omits it, then make it required and remove the optional phase.
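A minimal sketch of the tenant-aware check described in the ADR excerpt might look like the following. The `AuthzStore` shape, the `roleActions` map, and the extra `store` and `now` parameters are assumptions added so the example runs on its own; the excerpt's signature takes only userId, action, resourceId, and tenantId. The sketch also assumes a cross-tenant grant's permission_type decides the outcome by itself on the cross-tenant path.

```typescript
type Action = "DocumentRead" | "DocumentEdit";
type Role = "viewer" | "editor";

interface ResourceRecord { id: string; tenantId: string }
interface RoleAssignment { userId: string; tenantId: string; role: Role }
interface CrossTenantGrant {
  grantorTenantId: string;
  granteeUserId: string;
  resourceId: string;
  permissionType: Action;
  expiresAt: Date;
}
// Hypothetical in-memory stand-in for the database tables.
interface AuthzStore {
  resources: ResourceRecord[];
  roleAssignments: RoleAssignment[];
  crossTenantGrants: CrossTenantGrant[];
}

const roleActions: Record<Role, readonly Action[]> = {
  viewer: ["DocumentRead"],
  editor: ["DocumentRead", "DocumentEdit"],
};

function checkPermission(
  store: AuthzStore,
  userId: string,
  action: Action,
  resourceId: string,
  tenantId: string, // required: deliberately no default and no global lookup
  now: Date,
): boolean {
  const resource = store.resources.find((r) => r.id === resourceId);
  if (resource === undefined) return false;

  if (resource.tenantId !== tenantId) {
    // Cross-tenant request: denied unless a valid, unexpired grant exists.
    return store.crossTenantGrants.some(
      (g) =>
        g.grantorTenantId === resource.tenantId &&
        g.granteeUserId === userId &&
        g.resourceId === resourceId &&
        g.permissionType === action &&
        g.expiresAt.getTime() > now.getTime(),
    );
  }

  // Same-tenant request: only role assignments made in this tenant count.
  return store.roleAssignments.some(
    (a) =>
      a.userId === userId &&
      a.tenantId === tenantId &&
      roleActions[a.role].includes(action),
  );
}
```

Because `tenantId` is a plain required parameter, a call site that forgets it fails to compile instead of silently falling back to a global role check.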

Third, the delegation policy and role management scope. The ADR documents who can create and assign roles, the delegation depth, and the scope constraints for enterprise role management. "Delegation model in v1: single-hop, Admin-restricted. Only product Admins (role_id = admin, permission_scope = global) can create roles and can assign or revoke roles in any tenant. Tenant Admins (role_id = tenant_admin, permission_scope = tenant_id) can manage users within their tenant (invite new users and assign them to roles that exist within the allowed role set for their tenant) but cannot create new roles or grant delegation rights to other users. Tenant Admins cannot assign roles that exceed their own permission level: a Tenant Admin cannot grant the Editor role to a user unless the Tenant Admin themselves holds the Editor role for the same resource scope. Delegation depth limit: one. A Tenant Admin can assign the Viewer or Editor role to a user within their tenant. That user cannot in turn assign the role to additional users; only Admins and Tenant Admins have grant authority. Custom roles in v2 (enterprise): a Tenant Admin can compose a custom role from a defined set of permission primitives (documented in the role_permission_primitives table, managed by product Admins). Custom role creation is scoped to the tenant: a custom role created by Tenant A's Admin is not visible to Tenant B. A Tenant Admin can assign a custom role only to users within their own tenant. Custom role assignments are enforced through the same checkPermission function as built-in roles, with no separate code path.
The delegation scope constraint — Tenant Admins cannot create roles or assign roles outside their tenant — is enforced at the permission check layer, not at the API input validation layer: the permission check for 'create_role' and 'assign_role' actions verifies that the requesting Tenant Admin's tenant_id matches the target resource's tenant_id, regardless of the API parameters passed." The no-separate-code-path requirement for custom roles is the constraint that prevents the dual-enforcement-path pattern described in the incidents above. Enforcing it requires that the role management API uses the same checkPermission function as every other resource, with custom roles stored in the same permission data structures as built-in roles.
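The delegation rules above reduce to a small predicate. The sketch below is one way to write it, under stated assumptions: `Actor`, `RoleDef`, `canAssignRole`, and the `tenantAssignableBuiltIns` set are hypothetical names, and the rule that a Tenant Admin cannot grant more than they hold is left out for brevity.

```typescript
type Scope = { kind: "global" } | { kind: "tenant"; tenantId: string };

interface Actor {
  userId: string;
  roleId: string; // "admin", "tenant_admin", "editor", "viewer", or a custom role id
  scope: Scope;
}

interface RoleDef {
  roleId: string;
  ownerTenantId: string | null; // null marks a built-in role
}

// Built-in roles a Tenant Admin may hand out (assumed from the ADR excerpt).
const tenantAssignableBuiltIns = new Set<string>(["viewer", "editor"]);

function canAssignRole(actor: Actor, role: RoleDef, targetUserTenantId: string): boolean {
  const scope = actor.scope;

  // Product Admins hold global grant authority.
  if (actor.roleId === "admin" && scope.kind === "global") return true;

  if (actor.roleId === "tenant_admin" && scope.kind === "tenant") {
    // The tenant comparison lives in the authorization decision itself,
    // not in API input validation.
    const sameTenant = scope.tenantId === targetUserTenantId;
    const roleInAllowedSet =
      role.ownerTenantId === null
        ? tenantAssignableBuiltIns.has(role.roleId)
        : role.ownerTenantId === scope.tenantId;
    return sameTenant && roleInAllowedSet;
  }

  // Depth limit of one: a user who merely holds a role has no grant authority.
  return false;
}
```

Built-in and custom roles go through the same function here, which is the single-code-path property the ADR excerpt asks for.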

Fourth, the permission check interface contract. The ADR documents the function signature, the required parameters, the call site policy, and the bypass conditions. "The permission check interface is the single point of authorization enforcement in the application. All authorization decisions go through checkPermission(ctx: RequestContext, action: Action, resource: AuthzResource). The RequestContext carries the authenticated userId and tenantId from the session token — these are set at authentication middleware and are not parameters that call sites can override. The AuthzResource carries the resourceId and the resource's owner tenantId (fetched from the resource record before the permission check is called). The Action is an enum of all permitted actions in the application (DocumentRead, DocumentEdit, DocumentPublish, UserInvite, RoleCreate, etc.), enforced at compile time. The function returns a boolean. Call site policy: checkPermission must be called at the beginning of every handler function that accesses, modifies, or deletes a resource, before any business logic or database queries that depend on the resource's content. checkPermission must not be called inside loops that iterate over a list of resources — use the bulk checkPermissions variant that evaluates grants for a list of resources in a single database query. Bypass conditions: none. There are no administrator override modes, no development-environment skip flags, and no internal service tokens that bypass authorization checks. If a background job needs to access resources without an authenticated user context, it uses a service account identity that has specific permission grants configured through the same permission data structures as user grants. The 403 vs 404 policy: unauthorized access to a resource returns 404 (not 403) when the resource exists but the user does not have permission, to avoid confirming resource existence to unauthorized requestors. 
The decision logic: if the resource does not exist, return 404; if the resource exists but checkPermission returns false, return 404; if checkPermission returns true, proceed with the handler. Every authorization denial — every case where checkPermission returns false — generates an audit event logged to the authorization_audit table with: userId, tenantId, action, resourceId, resource_tenantId, result (denied), timestamp, and denial_reason (role_insufficient / tenant_mismatch / cross_tenant_no_grant / resource_not_found). The audit log is the data source for security review and for investigating user-reported access issues." The 403-versus-404 policy is a frequently omitted section that has significant security implications: returning 403 on a resource the user cannot access confirms that the resource exists at that ID, which enables resource ID enumeration attacks against guessable sequential identifiers. The rate limiting framing applies: even with consistent 404 responses, a caller who can enumerate resource IDs at high velocity can identify which IDs exist by correlating response timing; rate limiting on resource lookups from unauthorized users reduces the enumeration risk.
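The 404 decision logic and the denial audit event can be sketched together. One assumption to flag loudly: the excerpt says checkPermission returns a boolean, while this sketch has the check return a hypothetical `Decision` value so that the denial_reason can be recorded. `resolveStatus`, `AuditEvent`, and the in-memory `auditLog` array are likewise illustrative, not the post's API.

```typescript
type DenialReason =
  | "role_insufficient"
  | "tenant_mismatch"
  | "cross_tenant_no_grant"
  | "resource_not_found";

interface RequestContext { userId: string; tenantId: string }
interface AuthzResource { resourceId: string; ownerTenantId: string }

// Assumption: a decision object instead of a bare boolean, to carry the reason.
type Decision = { allowed: true } | { allowed: false; reason: DenialReason };

interface AuditEvent {
  userId: string;
  tenantId: string;
  action: string;
  resourceId: string;
  resourceTenantId: string | null;
  result: "denied";
  denialReason: DenialReason;
  timestamp: Date;
}

function resolveStatus(
  ctx: RequestContext,
  action: string,
  requestedId: string,
  resource: AuthzResource | undefined, // undefined when the lookup found nothing
  decide: (ctx: RequestContext, action: string, resource: AuthzResource) => Decision,
  auditLog: AuditEvent[],
  now: Date,
): 200 | 404 {
  // Every denial is recorded, and every denial looks identical to the caller.
  const deny = (reason: DenialReason): 404 => {
    auditLog.push({
      userId: ctx.userId,
      tenantId: ctx.tenantId,
      action,
      resourceId: requestedId,
      resourceTenantId: resource?.ownerTenantId ?? null,
      result: "denied",
      denialReason: reason,
      timestamp: now,
    });
    return 404;
  };

  if (resource === undefined) return deny("resource_not_found");
  const decision = decide(ctx, action, resource);
  if (!decision.allowed) return deny(decision.reason);
  return 200; // the handler may proceed
}
```

The caller sees the same 404 for a missing resource and a forbidden one; only the audit row distinguishes them.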

Fifth, the authorization error surface and audit trail. The ADR documents the audit log schema, the retention policy, the alerting thresholds, and the permission grant lifecycle management. "Authorization audit log: every permission check that results in a denial is logged to the authorization_audit table (userId, tenantId, action, resourceId, resource_tenantId, result, denial_reason, timestamp). Every permission grant creation and revocation is logged to the permission_grant_audit table (grantor_userId, grantor_tenantId, grantee_userId, action, resourceId, grant_type, timestamp, admin_override_reason if applicable). Both tables are append-only and are retained for 24 months to support security audits and compliance investigations. Denial rate alerting: if a single user generates more than 50 authorization denials within a 10-minute window, an alert fires to the security on-call channel. This threshold catches resource enumeration attempts and misconfigured integrations without producing alert fatigue from normal 404 traffic. Permission grant lifecycle: role assignments do not expire by default; cross-tenant grants and shared resource grants carry a configurable expiry. Expired grants are not deleted. They are marked expired in the grant record, so the audit trail shows when access was active. Role assignments that have been inactive for 90 days (the role-holder has made no authorized requests against resources covered by the role) generate a monthly report for Admins, who can decide whether the assignment should be revoked. The report is advisory, not automated: the team has decided that automated revocation has too high a risk of unexpected access loss, and that monthly human review is the correct balance at current scale. Re-evaluate automated revocation when the number of active role assignments exceeds 10,000, at which point manual review becomes impractical."
The authorization audit trail is the artifact that makes a security incident investigation tractable: without it, the question "what access did user U hold to resource R between dates X and Y, and which of their requests were denied?" requires parsing raw HTTP logs and correlating them with the authorization code, which is a multi-day forensic exercise. With it, the answer is a pair of SQL queries: one against the permission_grant_audit table for the grants in force, one against the authorization_audit table for the denials. Because authorization_audit records only denials, successful accesses still have to be confirmed from request logs. The ADR Consequences section for the authorization model should state the data confidentiality guarantee the model is designed to enforce, naming the specific class of unauthorized access the model prevents and the specific class it does not, so that the security team and future engineers understand what the authorization layer is and is not responsible for.
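The denial rate alert in the ADR excerpt (more than 50 denials from one user within 10 minutes) is a sliding-window count. Here is a minimal in-memory sketch. The `DenialRateMonitor` class is a hypothetical name, it assumes denials arrive in timestamp order, and a real deployment would more likely run this as a query over the authorization_audit table.

```typescript
const WINDOW_MS = 10 * 60 * 1000; // 10-minute window from the ADR excerpt
const THRESHOLD = 50; // alert when a user exceeds 50 denials in the window

class DenialRateMonitor {
  // Per-user timestamps (ms) of denials still inside the window.
  private readonly denials = new Map<string, number[]>();

  // Records one denial and reports whether the alert should fire.
  recordDenial(userId: string, timestampMs: number): boolean {
    const previous = this.denials.get(userId) ?? [];
    // Drop denials that have aged out of the window.
    const recent = previous.filter((t) => timestampMs - t < WINDOW_MS);
    recent.push(timestampMs);
    this.denials.set(userId, recent);
    return recent.length > THRESHOLD;
  }
}
```

Counting per user keeps one noisy integration from masking, or being masked by, everyone else's ordinary 404 traffic.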

None of these five sections are visible in the roles table schema, the permission map constant, the middleware function, or the route handler permission checks. They are the authorization reasoning that every engineer who adds a new resource type, builds a sharing feature, onboards an enterprise customer, or investigates a reported access anomaly depends on to understand what the permission model guarantees and what it does not. The 3-million-row tenant_id backfill migration and the eleven hardcoded enterprise roles are not caused by poor engineering in the individual sessions. They are caused by an authorization model that was chosen without being documented — without specifying the tenant isolation model, the expressiveness scope, the delegation policy, or the bypass conditions — so that each engineer who later encountered an access control requirement out of scope for the existing model had no way to know the model's designed boundaries or the sanctioned extension mechanism. The WhyChose extractor surfaces the initial role check session, the sharing feature session, and the enterprise onboarding session from AI chat history; the authorization model ADR is what takes the reasoning from those sessions and makes it legible to the team that inherits the permission model and must operate within it, extend it, or replace it.

FAQs

What is the difference between RBAC, ABAC, and ReBAC in a SaaS authorization model?

RBAC assigns users to roles, and roles define which actions on which resource types are permitted. Simple to implement and audit. Breaks down when permissions must vary at the individual resource instance level — user X can edit document D1 but not D2 of the same type — because that requires either role explosion (a new role per resource) or a parallel resource-level grant table outside the RBAC model with separate enforcement semantics.

ABAC evaluates policies against attributes of the user, action, resource, and environment at check time. Extremely flexible — can express "users in the engineering department may edit documents tagged internal if they were created in the last 90 days" — but auditing is difficult because "what can this user do?" requires evaluating every policy against every resource type rather than looking up a static role assignment. The policy language (OPA Rego, XACML, Cedar) has a steeper learning curve than a roles table.
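The example policy in the ABAC description can be written as a plain predicate over attributes. This is a sketch in TypeScript, not Rego, XACML, or Cedar syntax; `UserAttrs`, `DocAttrs`, and `mayEdit` are hypothetical names, and `now` is passed in so the check is deterministic.

```typescript
interface UserAttrs { department: string }
interface DocAttrs { tags: string[]; createdAt: Date }

const NINETY_DAYS_MS = 90 * 24 * 60 * 60 * 1000;

// "Users in the engineering department may edit documents tagged internal
// if they were created in the last 90 days."
function mayEdit(user: UserAttrs, doc: DocAttrs, now: Date): boolean {
  const ageMs = now.getTime() - doc.createdAt.getTime();
  return (
    user.department === "engineering" &&
    doc.tags.includes("internal") &&
    ageMs >= 0 &&
    ageMs <= NINETY_DAYS_MS
  );
}
```

The auditing difficulty is visible even at this size: the answer depends on the document's tags and age at check time, so no static table lists what a given user can edit.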

ReBAC derives permissions from relationships in a graph. User X can edit document D because X is a member of team T, T has editor access to project P, P contains D. Correct for products with organizational hierarchy where permissions are inherited through membership chains. Requires a dedicated relationship store (SpiceDB, OpenFGA) and a consistency model for relationship writes versus permission checks. The permission model choice is not reversible without a data migration, because each model stores permission data in a different structure. Choose based on what the product needs to express in the next 18 months, not just what it needs today.
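The membership chain in the ReBAC description (X is a member of T, T has editor access to P, P contains D) can be sketched as a traversal over relationship tuples. This is an in-memory illustration only: the `Tuple` shape, the `parent` relation name, and `canEdit` are assumptions, not the SpiceDB or OpenFGA API.

```typescript
interface Tuple {
  subject: string;
  relation: "member" | "editor" | "parent";
  object: string;
}

// Example relationship data (assumed) mirroring the chain in the text.
const tuples: Tuple[] = [
  { subject: "user:x", relation: "member", object: "team:t" },
  { subject: "team:t", relation: "editor", object: "project:p" },
  { subject: "project:p", relation: "parent", object: "doc:d" },
];

function canEdit(user: string, doc: string, store: Tuple[]): boolean {
  // Projects that contain the document.
  const projects = store
    .filter((t) => t.relation === "parent" && t.object === doc)
    .map((t) => t.subject);

  // Teams with editor access to any of those projects.
  const teams = store
    .filter((t) => t.relation === "editor" && projects.includes(t.object))
    .map((t) => t.subject);

  // The user can edit if they are a member of any such team.
  return store.some(
    (t) => t.relation === "member" && t.subject === user && teams.includes(t.object),
  );
}
```

Revoking access means deleting one edge: remove the membership tuple and every permission derived through it disappears, which is why this model suits inherited, revocable delegation.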

How does multi-tenant isolation fail in a permission model that was designed for single-tenant use?

The failure pattern: the permission check asks "can user U perform action A on resource R?" without including a tenant dimension. The data layer enforces tenant isolation through WHERE tenant_id = ? query filters. But the authorization layer evaluates the permission grant — (userId, roleId) — without knowing the resource's tenant. A user from tenant A who has a permission grant on resource ID 4291 (which belongs to tenant A) may also have the permission grant evaluated as true for resource ID 4291 in tenant B, because the grant does not carry tenant context and the resource ID collides across tenants.
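The collision is easy to reproduce in a few lines. In this sketch, `tenantBlindCheck` evaluates the grant the way the failure pattern describes, and `tenantAwareCheck` adds the ownership comparison; both function names and the data shapes are hypothetical.

```typescript
interface Grant { userId: string; resourceId: number }
interface TenantGrant extends Grant { tenantId: string }
interface Resource { id: number; tenantId: string }

// Failure pattern: the grant is matched on (userId, resourceId) alone.
function tenantBlindCheck(grants: Grant[], userId: string, resourceId: number): boolean {
  return grants.some((g) => g.userId === userId && g.resourceId === resourceId);
}

// Tenant-aware variant: ownership is verified before any grant is evaluated.
function tenantAwareCheck(
  grants: TenantGrant[],
  userId: string,
  userTenantId: string,
  resource: Resource,
): boolean {
  if (resource.tenantId !== userTenantId) return false;
  return grants.some(
    (g) =>
      g.userId === userId &&
      g.resourceId === resource.id &&
      g.tenantId === resource.tenantId,
  );
}

// Example data (assumed): the same numeric ID exists in both tenants.
const exampleGrants: TenantGrant[] = [
  { userId: "alice", resourceId: 4291, tenantId: "tenant-a" },
];
const tenantBDocument: Resource = { id: 4291, tenantId: "tenant-b" };
```

With this data the blind check cannot tell tenant A's resource 4291 from tenant B's, because the only inputs it ever sees are the user and the numeric ID.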

The fix requires one of three approaches: adding tenantId as a required explicit parameter at every permission check call site; verifying resource ownership by fetching the resource's tenant_id and comparing it to the requesting user's tenant before evaluating the role; or migrating to a ReBAC model where tenant membership is a relationship edge that must be traversed in every permission check. Each approach requires migrating existing permission data and auditing every call site. Teams that discover this class of vulnerability during a security audit rather than through a user report have the advantage of controlled migration; teams that discover it through a reported cross-tenant data access are in a security incident. The section of the authorization model ADR that specifies the tenant isolation model and the tenantId parameter requirement at every call site is what would have prevented the incident at design time rather than requiring an eleven-person-week migration to remediate.

What should an authorization model ADR document that teams typically skip?

Teams typically document the permission model chosen and the initial role set. The sections that prevent the multi-tenant exposure and enterprise onboarding failures described in this post are the ones they skip. First, the tenant isolation model: how tenant context flows into every permission check, whether it is implicit or an explicit required parameter, and how cross-tenant access is handled. Second, the expressiveness scope: what the permission model is designed to express and explicitly what it is not, so that out-of-scope requirements trigger a planned extension or migration rather than an ad-hoc parallel enforcement path. Third, the delegation policy: who can grant permissions, the delegation depth limit, and the scope constraint for tenant-level role management. Fourth, the permission check interface contract: the function signature, the required parameters including tenantId, the call site policy (always first, never in loops, no bypass conditions), and the 403-versus-404 policy for unauthorized resource access. Fifth, the authorization audit trail: what is logged on every denial and every grant change, the retention period, the denial rate alerting threshold, and the grant lifecycle review cadence.

None of these sections are visible in the roles table schema or the middleware code. They are the authorization reasoning that every engineer who adds a resource type, builds a sharing feature, onboards an enterprise customer, or investigates a reported access anomaly depends on. Without them, each new access control requirement is implemented against the existing enforcement code without understanding its designed boundaries, and the gap between what the model guarantees and what the code actually enforces widens with each addition until a security audit or a production incident reveals it.