2026-06-27 · ~20 min read

The file storage decision record: why the object storage model you chose determines your presigned URL security surface and your CDN integration cost

S3 versus GCS versus Azure Blob is decided in the first sprint session where an engineer asks "how do I store user-uploaded files?" and is never documented as a deliberate storage architecture choice with alternatives evaluated. Presigned URL expiry and user binding determine whether download links become bearer tokens that can be forwarded outside the product. CDN caching and invalidation configuration determines whether deleting an object makes it inaccessible within seconds or leaves it cached and downloadable for hours. The virus scanning integration point (pre-upload versus post-upload) determines whether the scanning layer catches novel exploits or only known signature matches. Cross-region replication and data residency configuration determine whether a regional outage triggers data loss or a failover, and whether storing a user's file in a secondary region creates a GDPR compliance gap. Each structural property is determined by configuration choices in the founding implementation session that are invisible in the final code.

A 34-person SaaS company built a document collaboration platform that stored user files in S3 and generated presigned URLs for downloads. The founding engineer set up the storage in a single afternoon following a Claude session about S3 presigned URLs in Python. The session provided working code: an S3 client configured with IAM role credentials, an upload handler that put objects under a per-user prefix, and a download handler that generated presigned URLs with a 7-day expiry. The choice of 7 days was not deliberate — the example in the session used 7 days and the engineer kept it. The implementation worked correctly. Files uploaded, files downloaded, the team launched and acquired their first 200 users.

Eight months after launch, a support ticket arrived from an enterprise customer reporting that an employee had received a document download link from a colleague at a different company — a company that was not a customer of the platform. The recipient had been able to download the document successfully. An investigation traced the URL to a specific user who had downloaded a document and forwarded the presigned URL to a contact in a Slack channel for review. The contact, who had no account, clicked the URL and downloaded the document because the 7-day presigned URL was still valid. The document contained the customer's unreleased pricing strategy. The enterprise customer's security team classified the incident as a data leakage event under their information security policy and issued a formal notice to the platform.

The investigation revealed the full scope of the architectural problem. The presigned URL model provided no user binding — the URL was a bearer token, and any HTTP client that possessed it could download the object regardless of whether they were an authenticated user of the platform. The 7-day expiry meant URLs emailed for review, pasted into documents, or shared in messaging applications remained valid for a week. There was no revocation mechanism: once a presigned URL was generated, it could not be invalidated before its expiry without changing the object's storage path — which would invalidate all existing URLs for that object, including URLs held by legitimate users who had not yet downloaded. The URLs were not logged at the application level in correlation with the user who generated them: the application server generated the URL and returned it to the client without recording which URL was issued to which user at what time, making post-incident audit impossible except by cross-referencing S3 server access logs with application-level user session logs — a manual correlation that took the security team three days to complete for a single incident.

The fix required three sprints. The presigned URL expiry was reduced to 15 minutes. A URL generation log table was added to the application database, recording the URL signature hash, the user who generated it, the object key, the generation timestamp, and the expiry timestamp. The download endpoint was changed to regenerate a fresh presigned URL on each request from the authenticated user, requiring active authentication for each download attempt rather than permitting a long-lived URL to serve as a persistent access credential. For enterprise customers with strict access control requirements, a CloudFront signed URL path was added that bound the URL to the user's current session IP address range using a CloudFront policy document, trading some download friction for network-level access control. None of this was driven by a deliberate initial decision: the 7-day expiry, the bearer-token model, and the absence of URL generation logging were all consequences of using an example value from the founding session without evaluating the access control implications of the presigned URL model for a product storing sensitive business documents.
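The URL generation log described above can be sketched in a few lines. This is a minimal illustration, not the company's implementation: the table name `url_issuance`, the helper names, and the injected `presign` callable are all hypothetical, and SQLite stands in for the application database. In production the `presign` callable would wrap the SDK's presigning call. The sketch hashes only the `X-Amz-Signature` query parameter so the log never contains a usable URL.

```python
import hashlib
import sqlite3
import time
from typing import Callable, Optional, Tuple
from urllib.parse import parse_qs, urlsplit

# Hypothetical schema mirroring the fields listed in the text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS url_issuance (
    signature_hash TEXT NOT NULL,
    user_id        TEXT NOT NULL,
    object_key     TEXT NOT NULL,
    issued_at      INTEGER NOT NULL,
    expires_at     INTEGER NOT NULL
)
"""


def signature_hash(url: str) -> str:
    """Hash the signature parameter (or the whole URL if none is present)."""
    query = parse_qs(urlsplit(url).query)
    signature = query.get("X-Amz-Signature", [url])[0]
    return hashlib.sha256(signature.encode()).hexdigest()


def issue_download_url(
    conn: sqlite3.Connection,
    user_id: str,
    object_key: str,
    presign: Callable[[str, int], str],  # assumed: (key, ttl_seconds) -> URL
    ttl_seconds: int = 900,              # 15 minutes, as in the fix
    now: Optional[int] = None,
) -> str:
    """Generate a fresh URL for an authenticated user and record who got it."""
    issued_at = int(time.time()) if now is None else now
    url = presign(object_key, ttl_seconds)
    conn.execute(
        "INSERT INTO url_issuance VALUES (?, ?, ?, ?, ?)",
        (signature_hash(url), user_id, object_key, issued_at, issued_at + ttl_seconds),
    )
    conn.commit()
    return url


def who_issued(conn: sqlite3.Connection, url: str) -> Optional[Tuple[str, str]]:
    """Answer the post-incident question: which user generated this URL?"""
    return conn.execute(
        "SELECT user_id, object_key FROM url_issuance WHERE signature_hash = ?",
        (signature_hash(url),),
    ).fetchone()
```

With a table like this, tracing a forwarded URL back to the issuing user becomes a single indexed lookup rather than a multi-day correlation of access logs and session logs.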

A 22-person project management platform added file attachment support to their task cards. The founding engineer built the upload flow in a week using the AWS SDK's managed upload utility, which selects between single-put and multi-part upload automatically based on file size. For files larger than 5 MB, the SDK used multi-part upload with 5 MB parts. The implementation uploaded files to S3 through a presigned multi-part upload flow — the application server generated the presigned URLs for each part and returned them to the browser, which uploaded the parts directly to S3 without routing file bytes through the application server. The flow worked correctly in development and staging.

Three weeks after launching file attachments, the engineering team enabled CloudFront as a CDN in front of their S3 bucket to reduce download latency for users in regions distant from the S3 bucket's home region. The CloudFront distribution was configured with the S3 bucket as the origin, with a default cache behavior that cached all GET responses for 24 hours. The team added an exclusion rule for GET requests — they did not want to cache object downloads, only static assets — but did not add an exclusion rule for POST and PUT requests, because the default cache behavior did not appear to apply to write operations.

Two days after enabling CloudFront, the engineering team began receiving bug reports that large file uploads were failing with errors. Users uploading files above 5 MB saw upload progress indicators that stalled at intermediate percentages and then reported an upload failure with no specific error message. The application logs showed HTTP errors from the S3 API: NoSuchUpload errors on the UploadPart calls, indicating that S3 had no record of the UploadId that the browser was submitting. The team investigated the S3 API call logs. The CreateMultipartUpload call — which initiates a multi-part upload session and returns an UploadId — was succeeding on the first attempt for each user session. But subsequent upload attempts from different users or different browser tabs within the CloudFront cache TTL window were receiving a cached response containing the UploadId from the first initiation — a stale UploadId that corresponded to a different user's upload session that S3 had already completed or expired.

CloudFront had cached the CreateMultipartUpload response. The initiation endpoint uses an HTTP POST to the object's path with the query parameter ?uploads — the same path structure as any other S3 API call, with no response header that CloudFront's default configuration recognized as uncacheable. The CloudFront cache TTL of 24 hours meant that every upload initiation within 24 hours of the first received a cached UploadId from the first session. The UploadPart calls sent to that stale UploadId failed because the UploadId was either from a completed upload session (which S3 had cleaned up) or from a different user's in-progress session (which S3 validated against the session's IAM credentials — different from the current user's). The fix required adding explicit CloudFront cache behaviors to exclude the upload initiation path (POST /?uploads), the part upload path (PUT ?partNumber=*&uploadId=*), and the upload completion path (POST ?uploadId=*) from all caching. The exclusion configuration took four hours to identify and implement; the bug had been active for two days and had affected every large file upload during that period, all of which had silently failed. No file larger than 5 MB had been successfully uploaded since CloudFront was enabled. The team discovered this only when a customer complained that their project's attachment — a 40 MB architecture diagram — had never arrived despite the upload flow completing with a success indicator.
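The three request shapes named in the fix can be captured in a small predicate, which is useful in a configuration test that asserts every such request is routed to a non-caching behavior. This is a sketch under assumptions: the function name is hypothetical, and it recognizes only the three multipart calls listed above.

```python
from urllib.parse import parse_qs


def is_multipart_api_call(method: str, query_string: str) -> bool:
    """Return True for the S3 multipart calls that must never be cached."""
    # keep_blank_values is required because "?uploads" carries no value.
    params = parse_qs(query_string.lstrip("?"), keep_blank_values=True)
    method = method.upper()
    if method == "POST" and "uploads" in params:
        return True  # CreateMultipartUpload
    if method == "PUT" and "partNumber" in params and "uploadId" in params:
        return True  # UploadPart
    if method == "POST" and "uploadId" in params:
        return True  # CompleteMultipartUpload
    return False
```

For example, `is_multipart_api_call("POST", "?uploads")` is true, while a plain `GET` with no query string is not matched.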

The four structural properties the storage decision determines

When a team chooses an object storage provider and configures the initial bucket and access model in an early sprint AI session, they are making a decision with four structural properties that create security, performance, and compliance surfaces as the product scales. Each property generates a failure mode that becomes visible only after the system is in production, under real load, with real users and real data.

Presigned URL security model and URL lifetime. Presigned URLs are cryptographic capability tokens — any bearer can use them regardless of application-level authentication status. The security surface of the presigned URL model is determined by three decisions made at adoption time: the expiry window, the user binding mechanism, and the URL generation audit trail. The expiry window determines the practical life of a forwarded URL: a 7-day presigned URL is functionally equivalent to a public URL for 7 days; a 15-minute presigned URL limits forwarding risk to the active download session but requires the application to regenerate URLs on each download request from an authenticated user, adding a round-trip to the download flow. User binding does not exist in the S3 presigned URL model natively — the URL is generated by the server, returned to the authenticated client, and from the S3 layer's perspective is accessible to any bearer. Application-level user binding requires either generating a fresh presigned URL per authenticated request (so the URL is never stored or forwarded separately from the authenticated session) or using CloudFront signed URLs with policy documents that restrict access by IP address or time window beyond the URL's stated expiry. The URL generation audit trail — a log of which URL was issued to which authenticated user, at what time, for which object — is not produced by S3 presigned URL generation by default; the application must write this record explicitly if it needs to answer post-incident questions about which user generated a URL that was subsequently forwarded externally. 
The audit trail records who received a URL, but it does not revoke one. If the application needs to revoke access to an object before the URL expiry, the options are to change the object's storage path (invalidating all outstanding URLs for that object, including legitimate ones), to rotate the signing credentials (invalidating all outstanding presigned URLs for all objects, breaking active download sessions), or to move to application-proxied downloads (where the application server fetches the object from S3 and streams it to the client on each authenticated request, giving the application full control over access but routing all bytes through the application server). None of these options exist in the code unless the storage ADR specified the revocation requirement at adoption time. The security and compliance decision record intersects with the presigned URL model at the data classification boundary: files that contain personal data under GDPR, health data under HIPAA, or financial data under PCI-DSS require an access control model that can be audited and that can revoke access on user data deletion requests. The bearer-token presigned URL model cannot satisfy those requirements without additional application-layer controls.
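The effect of the expiry window on revocation can be made concrete with a little arithmetic. The sketch below assumes the application stops issuing new URLs at the moment access is revoked, so the only remaining exposure is the unexpired lifetime of URLs already handed out. The function name is hypothetical.

```python
def exposure_after_revocation(issued_at: int, ttl_seconds: int, revoked_at: int) -> int:
    """Seconds an already-issued URL stays usable after access is revoked."""
    expires_at = issued_at + ttl_seconds
    # Clamp to [0, ttl]: an expired URL has no exposure, and a URL cannot be
    # exposed for longer than its own lifetime.
    return max(0, min(ttl_seconds, expires_at - revoked_at))


SEVEN_DAYS = 7 * 24 * 3600
FIFTEEN_MINUTES = 15 * 60

# Access revoked one hour after the URL was issued:
week_long = exposure_after_revocation(0, SEVEN_DAYS, 3600)        # 601200 seconds
short_lived = exposure_after_revocation(0, FIFTEEN_MINUTES, 3600)  # 0 seconds
```

A 7-day URL revoked after an hour remains usable for almost the full week, while a 15-minute URL has already expired. That difference is the practical argument for short expiry when no true revocation path exists.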

CDN integration and cache invalidation behavior. A CDN in front of object storage improves download performance and reduces S3 request costs at the expense of a cache invalidation requirement: when an object is deleted from S3, it must also be invalidated in the CDN cache, or the CDN will continue serving the cached copy to clients within the cache TTL. The cache invalidation requirement is operationally significant for any product where file deletion has legal or compliance consequences: a user who requests deletion of their data under GDPR's right to erasure expects the data to be inaccessible once the deletion request is honored. If the CDN continues to serve a cached copy of a deleted object for up to 24 hours, the erasure request has been honored at the storage layer but not at the delivery layer. That compliance gap is not visible in the application logs, because the S3 delete request succeeded and the application recorded the deletion. CloudFront invalidation costs $0.005 per path, with the first 1,000 paths each month free. For products that delete user files in bulk (account deletion, project deletion, data export followed by source deletion), the invalidation cost is proportional to the number of deleted objects, and the invalidation API has a rate limit that can create a delay between the deletion request and the completion of cache invalidation for large batches. 
The CDN integration must also specify the behavior for presigned URL delivery: if CloudFront is the CDN and S3 presigned URLs are served directly (without using CloudFront as the signing entity), the presigned URL's IAM-credential-based signature is valid at the S3 origin but CloudFront cannot validate it — a CDN that caches presigned URL responses will serve the cached object to any bearer of the original presigned URL path, regardless of whether the presigned URL's expiry has passed, because the CDN caches the object bytes not the URL's validity window. The correct CDN integration for presigned URL delivery requires either using CloudFront signed URLs (so the CDN is the signing entity and enforces expiry) or configuring CloudFront to forward the presigned URL query parameters to the S3 origin for validation on each request (which defeats the caching benefit, since each request requires an S3 origin fetch). The CDN decision record covers the provider selection and caching architecture; the file storage ADR's CDN section should specify the interaction between the CDN caching model and the object deletion and presigned URL access control policies, since those interactions are specific to object storage and are not covered in a general CDN ADR.
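The invalidation cost and batching concerns above reduce to simple arithmetic. The sketch below assumes CloudFront's pricing is the first 1,000 invalidation paths per month free and $0.005 per path after that. The batch size is a parameter because the right value depends on the distribution's in-flight invalidation limits, which this sketch does not encode. Both function names are hypothetical.

```python
from typing import Iterator, List

FREE_PATHS_PER_MONTH = 1_000   # assumed free allowance
PRICE_PER_PATH_USD = 0.005     # assumed price per path beyond the allowance


def monthly_invalidation_cost(paths_invalidated: int) -> float:
    """Estimated monthly invalidation charge for a number of object paths."""
    billable = max(0, paths_invalidated - FREE_PATHS_PER_MONTH)
    return billable * PRICE_PER_PATH_USD


def batch_paths(paths: List[str], batch_size: int) -> Iterator[List[str]]:
    """Split a bulk deletion into invalidation batches of at most batch_size."""
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]
```

Under these assumptions, an account deletion that removes 51,000 objects in one month costs roughly $250 in invalidations. That is small in isolation but worth recording in the ADR next to the deletion latency expectation.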

Virus scanning and content policy integration point. A product that accepts user file uploads must decide where in the upload pipeline to integrate content scanning — and that decision determines what class of threat the scanning layer can detect. Pre-upload scanning (scanning before the file is written to the storage bucket) provides synchronous rejection of malicious content at the cost of scanning latency added to the upload request path. Post-upload scanning (writing to a quarantine bucket first, scanning asynchronously, promoting clean files to the production bucket) allows the upload to complete at the client's perspective without scanning latency while enabling deeper analysis than the synchronous path's time budget permits. The class of malware that pre-upload scanning misses relative to post-upload scanning is determined by the scanning depth that is feasible within the upload request's response time budget. ClamAV, the most common open-source scanning engine for self-hosted integrations, uses signature-based detection against a database of known malware hashes and byte patterns. A novel malware variant not yet in the ClamAV signature database passes pre-upload scanning with a clean result. A polyglot file — a file that is simultaneously valid in two different formats, such as a JPEG that is also a valid ZIP archive containing malicious content — passes signature-based scanning if the JPEG format is parsed first and the ZIP content is not extracted. Post-upload scanning in a quarantine-to-production pipeline can integrate with managed threat intelligence services that run behavioral analysis, sandboxed execution, and heuristic analysis beyond what signature matching provides, because the asynchronous scan does not have a user-facing request timeout constraint. 
The quarantine bucket model requires two additional design decisions: the quarantine-to-production transition trigger (an S3 EventBridge event when a scan result is written, which invokes a Lambda that moves the file to the production bucket and notifies the application database) and the user notification policy for quarantined files (how to inform the uploading user that their file is under review, without revealing scanning engine details that would help an attacker refine evasion). The scan result must be recorded in the application database alongside the object metadata — not only in the scanning service's own log — because the scan result is part of the provenance of the stored object and is needed to answer audit questions about which files were accepted and which were rejected, and by which scanning engine version. The observability strategy decision record intersects with the scanning pipeline: the quarantine bucket depth (files pending scan), scan processing latency distribution, and scan failure rate (scans that timed out or returned an error rather than a clean or infected result) are operational metrics that must be monitored to detect scanning pipeline failures that would allow unscanned files to accumulate in the quarantine bucket without being promoted or rejected.
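The quarantine-to-production transition can be sketched as a small decision function plus the provenance record the text says belongs in the application database. The type names, field names, and the three-way verdict are illustrative assumptions, not the output format of any particular scanning engine.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    CLEAN = "clean"
    INFECTED = "infected"
    ERROR = "error"  # timeout or scanner failure: neither clean nor infected


@dataclass(frozen=True)
class ScanResult:
    object_key: str
    verdict: Verdict
    engine: str
    engine_version: str


def next_action(result: ScanResult) -> str:
    """Decide what happens to a quarantined object after a scan."""
    if result.verdict is Verdict.CLEAN:
        return "promote"  # copy to the production bucket
    if result.verdict is Verdict.INFECTED:
        return "reject"   # delete or retain for analysis, never promote
    return "retry"        # stays in quarantine and counts toward failure rate


def provenance_record(result: ScanResult) -> dict:
    """Row to store alongside the object metadata for later audits."""
    return {
        "object_key": result.object_key,
        "verdict": result.verdict.value,
        "engine": result.engine,
        "engine_version": result.engine_version,
        "action": next_action(result),
    }
```

Treating a scan error as its own outcome, rather than folding it into clean or infected, is what makes the scan failure rate measurable as an operational metric.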

Cross-region replication and data residency. Object storage is regional by default — an S3 bucket in us-east-1 stores objects in the US East region's infrastructure. Cross-region replication (S3 CRR, GCS dual-region buckets, Azure Blob geo-redundant storage) copies objects to a secondary region asynchronously, providing a replicated copy that can serve as a disaster recovery source if the primary region is unavailable. The replication configuration determines three structural properties with distinct failure modes. First, the replication time SLA: S3 CRR with Replication Time Control (S3 RTC) guarantees 99.99% of objects are replicated within 15 minutes; standard CRR without RTC provides no time guarantee — replication lag is typically minutes but can be hours for large objects or during regional events, meaning the disaster recovery copy may be up to several hours stale when the primary region becomes unavailable. The replication time SLA must match the product's Recovery Point Objective documented in the disaster recovery decision record — a product with an RPO of 1 hour and standard CRR without RTC has a documentation gap between the stated RPO and the actual replication guarantee. Second, the versioning and replication interaction: S3 CRR requires versioning to be enabled on both the source and destination buckets. Versioning means that overwriting an object creates a new version rather than replacing the previous one; deleting an object creates a delete marker rather than removing the object bytes. 
A product that stores user files without versioning enabled and then enables versioning to configure CRR must decide how to handle the existing objects (they are under version control going forward, but their version history starts at the enable date) and how to handle delete operations (deleting a versioned object produces a delete marker rather than removing the object bytes, which remain accessible by version ID until a lifecycle rule expires them). Storage costs increase with versioning enabled because all previous versions are retained until a lifecycle rule removes them; the lifecycle rule must be configured on both the source and destination buckets, and the delete marker replication policy must specify whether delete markers are replicated (so that logically deleted objects appear deleted in the replica as well) or not (so that the replica retains objects that have been deleted from the source, providing a recovery path for accidental deletions). Third, data residency and GDPR compliance: if a user's file is stored in the primary region (us-east-1) and cross-region replicated to a secondary region (eu-west-1), the file's bytes exist in both regions simultaneously. If the user is an EU resident and the primary region is in the US, storing the replicated copy in eu-west-1 does not satisfy GDPR's data transfer requirements: the primary copy in the US region means the data has been transferred to the US, and the eu-west-1 replica is a secondary copy of US-hosted data, not the primary storage location. The reverse arrangement has the same problem, because an EU-primary bucket that replicates to a US region still places a copy of EU users' data in the US. A product that serves EU users must either use an EU-primary bucket whose replica also sits in an EU region (satisfying data residency for EU users at the cost of higher latency for US users, whose requests are served from the EU), or use separate regional buckets with routing logic that stores each user's files in the bucket matching their data residency requirement (adding application complexity to manage multi-bucket uploads and download routing). 
The residency configuration is determined at bucket creation time, because a bucket's region cannot be changed afterward; migrating an existing user population from a US-primary bucket to an EU-primary bucket requires a data migration that may trigger re-consent obligations under GDPR. The multi-region deployment decision record covers the application-layer regional routing; the file storage ADR's data residency section must specify the per-user residency determination logic and the bucket configuration that implements it, because the application server and the storage layer must agree on which regional bucket receives which user's files. That agreement is distributed across the application code, the infrastructure configuration, and the data residency policy, and it must be documented in the storage ADR to remain coherent as all three components evolve independently.
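The separate-regional-buckets option depends on one piece of routing logic that the application and the infrastructure must agree on. A minimal sketch follows, with hypothetical bucket names and a two-value residency model. A real product would derive the residency value from its own policy, which this sketch does not attempt.

```python
from typing import Tuple

# Hypothetical mapping; the bucket names are placeholders, not real buckets.
BUCKET_BY_RESIDENCY = {
    "EU": "example-files-eu-west-1",
    "US": "example-files-us-east-1",
}


def bucket_for(residency: str) -> str:
    """Return the bucket that must hold files for a residency requirement."""
    try:
        return BUCKET_BY_RESIDENCY[residency]
    except KeyError:
        # Fail closed: never fall back to a default region for unknown values.
        raise ValueError(f"no bucket configured for residency {residency!r}") from None


def object_location(residency: str, user_id: str, filename: str) -> Tuple[str, str]:
    """Bucket and key for an upload, using a per-user key prefix."""
    return bucket_for(residency), f"{user_id}/{filename}"
```

Failing closed on an unknown residency value is the design choice worth noting: a silent default bucket is how EU data ends up in a US region without anyone deciding that it should.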

Provider options and their structural properties

Amazon S3. S3 is the reference implementation for object storage: GCS, R2, and MinIO all provide S3-compatible APIs, while Azure Blob exposes its own API. S3's structural advantages are the depth of the AWS ecosystem integrations: S3 EventBridge event notifications for upload triggers, S3 Object Lambda for processing object content on retrieval, S3 Access Points for fine-grained per-application access policies, S3 Intelligent-Tiering for automatic cost optimization across access frequency tiers, and S3 Glacier for archival with configurable retrieval time SLAs. The S3 pricing model has two costs that are not apparent in the headline per-GB storage price: S3 request costs (PUT, COPY, POST, LIST requests at $0.005 per 1,000; GET and all other requests at $0.0004 per 1,000) and S3 data transfer costs (data transfer out to the internet at $0.09 per GB; data transfer to CloudFront at $0.00). That differential makes CloudFront the lowest-cost CDN choice for S3 origins. S3 request costs become material when the application generates high request volume: a product that lists object metadata frequently, serves every download as an uncached GET, or uses S3 as a trigger for Lambda functions at high event volume can accumulate request costs that exceed the storage costs. Generating a presigned URL is not itself a billed request, because the SDK computes the signature locally; the request is billed when a client uses the URL. The S3 request cost model must be in the storage ADR alongside the storage pricing, because the request cost drives the architecture of the upload flow (direct browser-to-S3 upload keeps file bytes off the application server, but each uploaded part is a billed PUT), the CDN caching strategy (caching GET responses in CloudFront eliminates repeated S3 GET requests for the same object), and the metadata retrieval pattern (HeadObject for existence checks generates one request per check; S3 Inventory for bulk metadata is a batch alternative that generates one manifest file per day rather than one API call per object). 
The infrastructure-as-code strategy decision record intersects with S3 configuration: bucket policies, CORS configuration, lifecycle rules, versioning settings, and replication configuration must all be managed in Terraform or CloudFormation — manual S3 console configuration creates drift that is invisible until an audit or an incident exposes the gap between the documented configuration and the actual bucket state.
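The request cost model can be written down as a short calculation using the per-1,000 prices quoted above. The function names are hypothetical, and the cache hit ratio is an input the team would have to measure rather than something the sketch can know.

```python
WRITE_PRICE_PER_1000 = 0.005   # PUT, COPY, POST, LIST (price quoted in the text)
READ_PRICE_PER_1000 = 0.0004   # GET and all other requests


def origin_reads(read_requests: int, cache_hit_ratio: float) -> int:
    """GET requests that still reach S3 after the CDN absorbs cache hits."""
    return round(read_requests * (1.0 - cache_hit_ratio))


def monthly_request_cost(write_requests: int, read_requests: int) -> float:
    """Monthly S3 request charge in USD for the given request counts."""
    return (
        write_requests / 1000 * WRITE_PRICE_PER_1000
        + read_requests / 1000 * READ_PRICE_PER_1000
    )
```

With 1 million writes and 10 million reads, the uncached request bill is about $9.00. With a 90% CDN hit ratio only 1 million reads reach S3, and the bill falls to about $5.40. The point for the ADR is that the caching strategy and the request bill are the same decision.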

Google Cloud Storage (GCS). GCS provides an S3-compatible XML API in addition to its native JSON API, making migration from S3 to GCS feasible with SDK reconfiguration rather than code rewrite for basic operations. GCS's structural differentiator is its storage class model: Standard storage carries no retrieval fee and no minimum storage duration, although GET requests are still billed per operation as they are on S3. GCS charges for data retrieval from Nearline (30-day minimum storage, $0.01 per GB retrieval), Coldline (90-day minimum, $0.02 per GB retrieval), and Archive (365-day minimum, $0.05 per GB retrieval) classes, making the class selection an architectural decision for products that store large volumes of infrequently accessed files. GCS signed URLs use service account credentials rather than IAM user credentials: the signing identity is a service account JSON key file or a Workload Identity service account, each with different key rotation and credential management requirements. GCS does not provide native cross-region replication in the S3 CRR sense; data residency is configured through bucket location types: Regional (single region), Dual-Regional (two specific regions), and Multi-Regional (distributed across a continent). A Dual-Regional bucket is a single bucket with strongly consistent reads and writes, but object data reaches the second region asynchronously by default; the optional turbo replication feature targets replication within 15 minutes. This removes the separate destination bucket that S3 CRR requires, at a 2.75× storage cost multiplier relative to Regional. The multi-tenancy decision record intersects with GCS bucket organization: a multi-tenant SaaS on GCS must decide whether to use per-tenant buckets (strong isolation, higher operational overhead as tenant count grows) or a single bucket with per-tenant prefix namespacing (simpler operation, requires application-layer enforcement of tenant isolation rather than bucket-level policies).
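The class selection trade-off can be expressed as a lookup over the minimum durations and retrieval fees quoted above. This is an illustrative sketch with hypothetical names. It covers only the retrieval fee and the minimum-duration check, not storage or operation charges.

```python
# (minimum storage days, retrieval fee in USD per GB), figures as quoted in the text.
STORAGE_CLASSES = {
    "STANDARD": (0, 0.00),
    "NEARLINE": (30, 0.01),
    "COLDLINE": (90, 0.02),
    "ARCHIVE": (365, 0.05),
}


def retrieval_cost(storage_class: str, gb_retrieved: float) -> float:
    """Retrieval fee in USD for reading gb_retrieved from a storage class."""
    _, fee_per_gb = STORAGE_CLASSES[storage_class]
    return gb_retrieved * fee_per_gb


def within_minimum_duration(storage_class: str, age_days: int) -> bool:
    """True if deleting or rewriting the object now would be an early deletion."""
    minimum_days, _ = STORAGE_CLASSES[storage_class]
    return age_days < minimum_days
```

Reading 500 GB back from Coldline costs about $10 in retrieval fees alone, which is why a class chosen for cheap storage can be the wrong choice for files that users reopen.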

Azure Blob Storage. Azure Blob uses Shared Access Signatures (SAS tokens) rather than presigned URLs — the access control model is analogous but the token structure and signing mechanism differ. Azure Blob SAS tokens can be scoped to the storage account level (account SAS), the container level (service SAS for containers), or the individual blob level (service SAS for blobs), providing finer-grained expiry and permission scoping than S3 presigned URLs, which are always scoped to a single object. Azure Blob provides two replication models relevant to the file storage ADR: Geo-Redundant Storage (GRS, asynchronous replication to a paired region with no time guarantee) and Geo-Zone-Redundant Storage (GZRS, synchronous replication across three availability zones in the primary region plus asynchronous replication to the paired region). Azure Blob's access tier system (Hot, Cool, Cold, Archive) is analogous to S3 Intelligent-Tiering but requires explicit per-blob tier assignment at upload time rather than automatic tier migration based on access patterns — a product that uploads all files to Hot tier without reviewing access patterns will pay Hot tier pricing for files that transition to infrequent access, while S3 Intelligent-Tiering would have migrated those files automatically. The Azure CDN integration uses Azure Content Delivery Network as the CDN layer rather than CloudFront; the cache invalidation and signed URL interaction patterns mirror the S3/CloudFront model but with Azure-specific API surface and configuration.

Cloudflare R2. R2 is S3-compatible object storage with zero egress charges: data transfer out to the internet is free, eliminating the $0.09 per GB S3 egress cost that makes high-download-volume products significantly more expensive on AWS. R2's pricing advantage is material for products where total data transfer volume is high relative to total stored data volume: a video platform, a file backup service, or a document download hub where users download files frequently accumulates S3 egress costs that can exceed the storage costs at scale. R2's structural limitations relative to S3 are a narrower lifecycle model (R2 lifecycle rules cover object expiry and transition to R2's Infrequent Access class, not the range of tier and archival transitions that S3 lifecycle rules offer), a Cloudflare-specific event model (R2 event notifications are delivered to Cloudflare Queues and consumed by Workers, so S3 EventBridge events that trigger Lambda on upload have no direct equivalent), and the coupling of the delivery path to Cloudflare's own network (R2 is designed to be served through Cloudflare's CDN; placing a third-party CDN in front of it gives up the integrated caching that R2 is built around). For products that are already deployed on Cloudflare's network or that are building a Cloudflare Workers-native architecture, R2 is the lowest total cost storage option at high download volume. For products deployed on AWS or Azure with existing CDN configurations, R2 requires a network-layer migration that changes the delivery path for all file downloads.

MinIO. MinIO is self-hosted S3-compatible object storage, providing S3 API compatibility on infrastructure the operator controls. MinIO is the correct choice for products with data sovereignty requirements that prohibit storing user data in public cloud provider infrastructure (government contracts, regulated healthcare, financial services with on-premises data requirements) and for development and testing environments where S3 costs during development are worth eliminating. MinIO's structural constraints are the operational cost of self-hosting: capacity planning, disk failure handling, cluster rebalancing on node addition, backup configuration, and version upgrade management are all operational responsibilities that public cloud object storage providers absorb. A MinIO deployment on a single node with a single drive provides no redundancy: one disk failure results in data loss that an S3-hosted equivalent would not experience. The erasure coding configuration (the number of data and parity shards across the cluster's drives) determines the fault tolerance: an erasure set of N drives configured with N/2 parity can lose up to N/2 drives without data loss, at the cost of storing 2× the raw data. The erasure coding configuration must be specified in the storage ADR alongside the backup and disaster recovery policy, because the fault tolerance guarantee of the self-hosted cluster is the substitute for the durability guarantee that S3 provides by default (eleven nines durability through redundant storage across multiple Availability Zones).
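The relationship between parity, fault tolerance, and storage overhead is generic erasure coding arithmetic and can be checked before any cluster is provisioned. The sketch below is not MinIO's API. The function name and return keys are hypothetical, and it assumes parity may be at most half of the drives in an erasure set.

```python
def erasure_profile(total_drives: int, parity_drives: int) -> dict:
    """Fault tolerance and raw-storage overhead for one erasure set."""
    if total_drives < 2 or not 0 < parity_drives <= total_drives // 2:
        raise ValueError("parity must be between 1 and half of the drives")
    data_drives = total_drives - parity_drives
    return {
        "data_shards": data_drives,
        "parity_shards": parity_drives,
        # Objects stay readable while no more than this many drives are lost.
        "tolerated_drive_losses": parity_drives,
        # Raw bytes written per byte of user data.
        "raw_bytes_per_stored_byte": total_drives / data_drives,
    }
```

For 16 drives with 8 parity shards, the profile tolerates 8 drive losses at exactly 2× raw storage, matching the N/2 case described above. Dropping to 4 parity shards lowers the overhead to about 1.33× and the tolerance to 4 drives.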

AI chat session types and what each one misses

The file storage decision follows a consistent pattern in AI chat history. The founding session establishes the provider and basic configuration based on the immediate use case. A user support ticket triggers an investigation of presigned URL forwarding or download failures. A cost review surfaces unexpected egress or request charges. An audit or compliance review discovers residency or encryption gaps. Each session addresses the immediate symptom without generalizing to the storage architecture that would prevent the next variant of the same class of failure. The WhyChose extractor surfaces these sessions because they contain decisions that belong in a file storage ADR and that are consistently left in conversational form — visible in the chat history but not accessible to the team member who inherits the storage system without context.

The "how do I store user-uploaded files?" session. This is the founding session — the first time an engineer asks how to accept user file uploads and store them durably. The session covers the S3 SDK upload API, the bucket creation and permission configuration, and the presigned URL generation for downloads. What the session misses: the engineer is solving the immediate upload-and-download problem, not designing a storage architecture. The session recommends an expiry value for presigned URLs that matches the code example rather than the product's access control requirement. It does not address virus scanning because no files have been uploaded yet and the scanning requirement is not in scope for the initial implementation. It does not address CDN integration because the product is pre-launch and CDN cost optimization is not yet a consideration. It does not address cross-region replication because the initial deployment is to a single region and disaster recovery has not been discussed. The provider choice from this session — S3, GCS, R2, or MinIO — becomes the storage substrate for the product's entire file handling surface. Changing providers after files are in production requires a migration that moves all existing objects, updates all URL generation code, and validates that the new provider's presigned URL or SAS token model matches the existing access control expectations. An ADR written after the founding session documents the provider choice and the rationale, names the expiry policy and its access control implications, and explicitly addresses the virus scanning, CDN integration, and residency requirements that were out of scope for the initial session but must be designed before the first production upload at scale.

The "CDN isn't serving updated files" session. This session is triggered by a user report that an updated file — a revised document uploaded to replace a previous version — is still showing the old version to other users. An engineer queries: "I updated a file in S3 but the CDN is still serving the old version." The session covers the CloudFront cache invalidation API, the cache TTL configuration, and the CloudFront versioning strategy for static assets (cache-busting by filename suffix). What the session misses: the engineer is treating the CDN cache staleness as a content delivery problem, not as an access control problem. The session does not surface that the same caching behavior that serves stale updated files also serves deleted files — a deleted object that the CDN has cached continues to be served within the cache TTL window. The session recommends reducing the cache TTL or running an explicit cache invalidation on each file update, but does not address the delete case, the compliance implications of serving cached copies of deleted personal data, or the cost of programmatic cache invalidations at scale. The storage ADR's CDN section, written at adoption time, documents both the update invalidation policy and the delete invalidation requirement, specifying the trigger (S3 object delete event via EventBridge → Lambda → CloudFront invalidation API call) and the latency expectation (CloudFront invalidation typically completes within 60 seconds, but is not instantaneous). This prevents the delete invalidation gap from being discovered during a GDPR data deletion audit rather than during the initial CDN integration session.
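The delete trigger described above (delete event, function, invalidation call) can be sketched as follows. This is a minimal illustration under stated assumptions: the event is an EventBridge "Object Deleted" event carrying the key at `detail.object.key`, the CDN path is the object key with a leading slash, and the CloudFront client is passed in (for example `boto3.client("cloudfront")`). The function names and the `distribution_id` parameter are hypothetical.

```python
from typing import Any, Optional
from urllib.parse import quote


def invalidation_request(event: dict[str, Any], distribution_id: str) -> dict[str, Any]:
    """Build the arguments for a CloudFront create_invalidation call."""
    key = event["detail"]["object"]["key"]
    # Assumption: the distribution serves each object at "/<key>".
    path = "/" + quote(key, safe="/")
    return {
        "DistributionId": distribution_id,
        "InvalidationBatch": {
            "Paths": {"Quantity": 1, "Items": [path]},
            # Reusing the event id keeps a retried delivery of the same
            # event from creating a second, different invalidation.
            "CallerReference": event["id"],
        },
    }


def handle_delete_event(
    event: dict[str, Any], cloudfront: Any, distribution_id: str
) -> Optional[dict[str, Any]]:
    """Invalidate the CDN path for a deleted object; ignore other events."""
    if event.get("detail-type") != "Object Deleted":
        return None
    return cloudfront.create_invalidation(**invalidation_request(event, distribution_id))
```

Passing the client in keeps the path-building logic testable without AWS access. Whether the call runs in the request path or from a queue is the ADR decision the paragraph describes; the sketch takes no position on it.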

The "large file uploads are failing" session. This session is triggered by user reports of upload failures on files above a certain size threshold. An engineer queries: "users are saying large file uploads fail, what could cause this?" The session covers multi-part upload configuration, browser upload timeout settings, the S3 Transfer Manager's auto-selection of upload strategy by file size, and network timeout handling. What the session misses: the engineer is investigating the upload mechanism without knowing whether a CDN sits between the browser and the storage layer on the upload path. If the application generates presigned URLs for direct browser-to-S3 uploads and the CDN is configured to proxy all traffic to the S3 origin, the CDN is in the upload path — and the multi-part upload session management problem described above applies. The session recommends increasing timeouts and adjusting the multi-part part size, which are correct for network stability issues but do not address the CDN caching problem that causes upload initiation responses to be cached and returned to subsequent upload attempts. The storage ADR's upload path section documents whether the CDN is in the upload path or the download path exclusively, and if the CDN is on the upload path, the required cache exclusion rules for multi-part upload management endpoints. The exclusion rules are the first thing checked when large file uploads begin failing, rather than a discovery that emerges after two days of debugging traffic that appears to succeed from the S3 API's perspective.

The "user data deletion compliance" session. This session is triggered by a GDPR data deletion request or a privacy audit. An engineer queries: "a user requested deletion of all their data, how do I ensure their files are deleted from S3?" The session covers the S3 delete API, the versioning complication (in a versioned bucket a plain delete only adds a delete marker, so every version and the delete marker itself must be deleted by version ID), and the S3 lifecycle rules for automating version expiry. What the session misses: the CDN cache invalidation requirement, which means that deleting the object from S3 does not make the file immediately inaccessible if the CDN has cached it. The session also does not address the replication complication: if cross-region replication is enabled, the deletion must also take effect in every replica bucket. If delete marker replication is not configured, the replica keeps serving the object after the source bucket has hidden it. Even when delete markers are replicated, S3 does not replicate deletes that name a specific version ID, so the replica retains the object bytes until its versions are deleted there directly or expired by a lifecycle rule on the replica. The deletion flow for GDPR compliance must invalidate the CDN cache, delete all versions and delete markers in the primary bucket, delete or expire the corresponding versions in every replica bucket and confirm they are gone, and record the deletion timestamp and confirmation in the application's compliance log. This flow is not derivable from the S3 delete API documentation alone: it requires awareness of the CDN layer, the versioning configuration, and the replication configuration that were established in prior sessions and are not documented in any single place unless the storage ADR records them. The data retention decision record covers the retention policy for application data; the file storage ADR's deletion section must document the multi-step deletion flow and the compliance confirmation record specifically for object storage, where deletion has multiple layers that must all be completed for the data to be truly inaccessible.
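The storage-layer part of that flow can be sketched as below. It is a minimal illustration, assuming a per-user key prefix and boto3-style S3 clients passed in by the caller (one per bucket, since replicas live in other regions); `version_delete_batches`, `purge_prefix`, and `purge_user_files` are hypothetical names, and CDN invalidation is deliberately left out.

```python
from datetime import datetime, timezone
from typing import Any, Iterable, Iterator


def version_delete_batches(
    pages: Iterable[dict[str, Any]], batch_size: int = 1000
) -> Iterator[dict[str, Any]]:
    """Turn list_object_versions pages into delete_objects payloads.

    Covers both object versions and delete markers; 1000 is the
    per-request limit of delete_objects.
    """
    batch: list[dict[str, str]] = []
    for page in pages:
        entries = page.get("Versions", []) + page.get("DeleteMarkers", [])
        for entry in entries:
            batch.append({"Key": entry["Key"], "VersionId": entry["VersionId"]})
            if len(batch) == batch_size:
                yield {"Objects": batch, "Quiet": True}
                batch = []
    if batch:
        yield {"Objects": batch, "Quiet": True}


def purge_prefix(s3: Any, bucket: str, prefix: str) -> int:
    """Permanently delete every version and delete marker under a prefix."""
    paginator = s3.get_paginator("list_object_versions")
    deleted = 0
    for payload in version_delete_batches(paginator.paginate(Bucket=bucket, Prefix=prefix)):
        response = s3.delete_objects(Bucket=bucket, Delete=payload)
        if response.get("Errors"):
            raise RuntimeError(f"partial delete in {bucket}: {response['Errors']}")
        deleted += len(payload["Objects"])
    return deleted


def purge_user_files(targets: list[tuple[Any, str]], prefix: str) -> dict[str, Any]:
    """Purge the prefix in the primary and every replica bucket.

    `targets` is a list of (s3_client, bucket_name) pairs. The returned
    dict is the shape of a compliance log entry, not a fixed schema.
    """
    counts = {bucket: purge_prefix(s3, bucket, prefix) for s3, bucket in targets}
    return {
        "prefix": prefix,
        "deleted_versions": counts,
        "completed_at": datetime.now(timezone.utc).isoformat(),
    }
```

Running the purge against each replica bucket directly, rather than waiting for replication, avoids depending on how deletes propagate between buckets.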

Five ADR sections for file storage architecture

A file storage ADR that prevents the presigned URL forwarding incident, the CDN multi-part upload corruption, the scanning gap, and the GDPR deletion audit failure covers five sections that teams consistently omit from the founding implementation session.

First, provider and bucket configuration. The ADR documents the storage provider choice with rationale (S3 for AWS-native ecosystem integrations and CloudFront cost advantage; GCS for zero per-request costs at high GET volume; R2 for zero egress costs at high download volume; MinIO for data sovereignty), the bucket naming scheme and per-environment isolation strategy (separate buckets per environment — production, staging, development — rather than prefix-based environment separation in a shared bucket, to prevent accidental cross-environment access), the versioning configuration (enabled or disabled at adoption time, with the versioning-enable migration plan if the product starts without versioning and needs it later for disaster recovery), the bucket region selection aligned with the application's primary deployment region and user data residency requirements, and the object key structure (per-user prefix, per-tenant prefix in multi-tenant products, per-content-type prefix for mixed content, or flat UUID keys with application-layer mapping). The bucket configuration section also documents the encryption model: S3 server-side encryption with S3-managed keys (SSE-S3, no additional cost, no key management overhead), S3 server-side encryption with KMS-managed keys (SSE-KMS, per-request KMS charges, customer-controlled key rotation and access policy), or client-side encryption (the application encrypts before upload and decrypts after download, giving the application full control over the encryption key but requiring the application to manage key storage and rotation). The secrets management decision record intersects with the encryption key choice: SSE-KMS keys and client-side encryption keys are high-value secrets that must be managed with the same rigor as database credentials. 
The infrastructure-as-code strategy decision record determines whether the bucket configuration — policies, CORS, versioning, lifecycle rules, encryption, replication — is managed in Terraform or in the cloud provider's console; the storage ADR should specify that all bucket configuration is managed in infrastructure-as-code with the actual Terraform resource names, so that any drift between the documented configuration and the applied configuration is detectable by running a Terraform plan.

Second, access control model. The ADR documents the presigned URL expiry policy (the expiry window for each use case — file upload presigned URLs, file download presigned URLs for short-session downloads, and long-lived sharing links for collaboration features that require external access), the user binding mechanism (whether the application regenerates a fresh presigned URL on each authenticated download request, or issues a presigned URL with a long expiry that persists in the client), the URL generation audit log schema (the table that records URL generation events with issuing user, object key, expiry, and URL signature hash), the revocation mechanism (the application's response to a user reporting a forwarded URL — whether this is a manual support ticket response that invalidates the object's path, a CloudFront key rotation, or an application-proxied download that the application can block immediately by revoking the user's session), and the CloudFront signed URL policy for content that requires network-level access control beyond bearer-token expiry. The access control model section must specify whether the product uses S3 presigned URLs, CloudFront signed URLs, or CloudFront signed cookies, and the rationale for the choice based on the access control requirements of the specific content types the product stores. A product that stores both public assets (marketing images, product screenshots that can be CDN-cached without access control) and private user files (documents, attachments, personal data) may use different access control models for each content type — public assets served directly from CloudFront without signing, private files served via CloudFront signed URLs or presigned URLs with short expiry. The access control model for each content type must be documented explicitly; a gap where one content type is incorrectly treated as public creates the data leakage incident pattern described in the opening narrative.
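The per-request regeneration and audit logging described in this section might look like the sketch below. It assumes a boto3-style S3 client passed in by the caller and a 5-minute expiry as the policy value; `issue_download_url`, `UrlAuditRecord`, and `DOWNLOAD_EXPIRY_SECONDS` are illustrative names, and the record is a shape for the audit table rather than a fixed schema. The signature is read from the `X-Amz-Signature` query parameter (SigV4), falling back to `Signature` (SigV2).

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Optional
from urllib.parse import parse_qs, urlsplit

DOWNLOAD_EXPIRY_SECONDS = 300  # assumed policy value, set by the ADR


@dataclass(frozen=True)
class UrlAuditRecord:
    user_id: str
    object_key: str
    expires_at: datetime
    signature_sha256: str  # hash only, so the log cannot be replayed as a URL


def issue_download_url(
    s3: Any, bucket: str, key: str, user_id: str, now: Optional[datetime] = None
) -> tuple[str, UrlAuditRecord]:
    """Generate a short-lived URL for one authenticated request and log it."""
    issued_at = now or datetime.now(timezone.utc)
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=DOWNLOAD_EXPIRY_SECONDS,
    )
    query = parse_qs(urlsplit(url).query)
    signature = (query.get("X-Amz-Signature") or query.get("Signature") or [""])[0]
    record = UrlAuditRecord(
        user_id=user_id,
        object_key=key,
        expires_at=issued_at + timedelta(seconds=DOWNLOAD_EXPIRY_SECONDS),
        signature_sha256=hashlib.sha256(signature.encode()).hexdigest(),
    )
    return url, record
```

Calling this inside the authenticated download handler, and never persisting the URL, keeps the bearer token's lifetime bounded by the expiry constant. The stored hash lets an investigation match a forwarded URL back to the user who requested it.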

Third, CDN integration and cache invalidation. The ADR documents the CDN placement in the delivery path (download-only CDN, where uploads go directly to the storage provider and downloads are served through the CDN; full-proxy CDN, where all storage traffic routes through the CDN), the cache exclusion rules for the upload path (the multi-part upload initiation, part upload, and completion paths that must bypass the CDN cache if the CDN is on the upload path), the cache TTL for download responses (the maximum time a deleted or updated object remains accessible from the CDN cache after the source object is deleted or replaced), the cache invalidation trigger (the application event — object delete, object replace — that triggers a CloudFront invalidation API call, and whether the invalidation is synchronous in the request path or queued as a background task), and the signed URL key pair management (the CloudFront public-private key pair used for signed URL generation, the key rotation schedule, and the grace period for outstanding signed URLs during rotation). The CDN decision record covers the provider selection and general caching strategy; the storage ADR's CDN section documents the storage-specific cache behavior and invalidation requirements that differ from the general CDN configuration. The CDN integration section must explicitly address the delete invalidation latency — the time between an object being deleted from the storage layer and the CDN cache being invalidated — and whether this latency is acceptable under the product's data deletion compliance obligations. If the latency is not acceptable (for example, a GDPR deletion request that must result in immediate inaccessibility), the CDN layer must be bypassed for the delivery of personal data or the cache TTL must be set to zero for those content types, effectively converting the CDN from a caching layer to a routing layer for personal data.
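The delete invalidation latency question reduces to a small piece of arithmetic that the ADR can record per content type. The sketch below is a worst-case model under two assumptions: the object was cached immediately before deletion, and the invalidation latency is measured from the moment of deletion. The function names are illustrative.

```python
from typing import Optional


def exposure_window_seconds(
    cache_ttl: int, invalidation_latency: Optional[int] = None
) -> int:
    """Worst-case time a deleted object stays downloadable from the CDN.

    Without an invalidation trigger the cached copy lives for the full TTL.
    With one, it disappears when the invalidation completes or the TTL
    expires, whichever comes first.
    """
    if invalidation_latency is None:
        return cache_ttl
    return min(cache_ttl, invalidation_latency)


def cdn_caching_allowed(
    cache_ttl: int, deletion_budget: int, invalidation_latency: Optional[int] = None
) -> bool:
    """True if the exposure window fits the compliance budget for this content type."""
    return exposure_window_seconds(cache_ttl, invalidation_latency) <= deletion_budget
```

For example, a 24-hour TTL with a measured 120-second invalidation gives a 120-second window, which fails a budget of zero. That is the case where the section calls for a zero TTL or bypassing the CDN for personal data.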

Fourth, virus scanning and content policy. The ADR documents the scanning integration point (pre-upload scanning in the upload request path, post-upload scanning in the quarantine-to-production pipeline, or both), the scanning engine choice (ClamAV self-hosted for known signature detection, managed service for broader detection coverage), the quarantine bucket configuration (the staging bucket where uploads land before scanning completes, with no presigned URL generation or CDN delivery until the scan result is clean and the file is promoted to the production bucket), the quarantine-to-production transition trigger (the scan completion event that invokes the promotion Lambda or background job), the rejection policy for infected files (delete from quarantine immediately, or retain in a forensic bucket for a defined period for security team review), and the user notification flow for quarantined files (informing the uploading user that their file is under review without exposing scanning engine details). The content policy section also documents the file type allowlist or blocklist: which file types the product accepts (enforced by content-type header validation and magic-byte validation, not by file extension alone, since extensions are attacker-controlled), which file types are rejected before scanning, and whether the product applies different scanning or quarantine policies for different file types (PDFs and Office documents with macro support receive sandbox execution analysis; image files receive EXIF metadata stripping; executable file types are rejected regardless of scan result). The content type validation must verify both the declared content-type header and the file's binary magic bytes: a file uploaded with Content-Type: image/jpeg whose leading bytes are a ZIP signature is mislabeled and should be rejected at the content type validation step rather than passed to a scanning engine that may analyze it only as a JPEG. A true polyglot, which carries a valid JPEG header and a ZIP archive in the same file, passes a leading-bytes check and has to be caught by structural validation or by the scanning pipeline.
The observability strategy decision record must include monitoring for the quarantine pipeline: quarantine bucket depth as a leading indicator of scanning pipeline blockage, scan processing latency as a measure of scanning service capacity, and the rate of scan result errors (neither clean nor infected — a scan service returning errors passes files through to production if the error handling is not explicitly specified as reject-on-error rather than accept-on-error).
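A content type check along these lines can be sketched in a few lines. This is an illustration, not a complete validator: the allowlist holds three types chosen for the example, and the polyglot test is a heuristic that looks for a ZIP end-of-central-directory marker near the end of a non-ZIP file, which can in rare cases flag a legitimate file. The names `ALLOWED_SIGNATURES` and `validate_upload` are hypothetical.

```python
# Leading magic bytes for each accepted type (example allowlist only).
ALLOWED_SIGNATURES: dict[str, tuple[bytes, ...]] = {
    "image/jpeg": (b"\xff\xd8\xff",),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "application/pdf": (b"%PDF-",),
}

# ZIP readers locate the archive from the end of the file, so a ZIP can be
# appended to a file that starts with a valid JPEG header.
ZIP_END_OF_CENTRAL_DIRECTORY = b"PK\x05\x06"
EOCD_SEARCH_WINDOW = 22 + 65_535  # record size plus maximum comment length


def validate_upload(declared_type: str, data: bytes) -> tuple[bool, str]:
    """Return (accepted, reason) for an upload's declared type and bytes."""
    signatures = ALLOWED_SIGNATURES.get(declared_type)
    if signatures is None:
        return False, "type not on allowlist"
    if not data.startswith(signatures):
        return False, "magic bytes do not match declared type"
    if ZIP_END_OF_CENTRAL_DIRECTORY in data[-EOCD_SEARCH_WINDOW:]:
        return False, "embedded ZIP archive (possible polyglot)"
    return True, "ok"
```

A check like this runs before the scanning engine and rejects on the first failed condition. Types that are themselves ZIP containers, such as modern Office documents, would need their own rule rather than the trailing-marker heuristic.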

Fifth, cross-region replication and data residency. The ADR documents the replication configuration (enabled or disabled, primary and secondary regions, and whether S3 Cross-Region Replication runs with or without Replication Time Control; S3 replication is asynchronous in either case, and RTC adds a time-bound replication SLA rather than synchronous writes), the replication time SLA relative to the disaster recovery RPO documented in the disaster recovery record, the versioning and delete marker replication policy (whether delete markers are replicated to the secondary region, providing logical deletion in the replica, or suppressed in replication, retaining object bytes in the replica for deletion recovery), the lifecycle rules on both primary and secondary buckets (version expiry and abort-incomplete-multipart-upload rules must be applied to both buckets independently), and the data residency determination logic for multi-tenant products (the mechanism that routes each user's files to the bucket in the correct region based on the user's data residency requirement). The data residency section must address the migration path for existing users when their residency requirement changes: for example, a user who initially signed up in a US-based account and whose employer subsequently requires EU data residency must have their existing files migrated to an EU bucket. The migration process (copy objects to EU bucket, verify copy integrity, update application database references, delete from US bucket, confirm CDN cache invalidation for the US bucket delivery path) must be documented before the first migration is required, because designing the migration process under a deadline from a specific customer's security team is a worse outcome than designing it at adoption time when the storage architecture is still fresh.
The disaster recovery decision record covers the failover procedure for the application layer; the file storage ADR's replication section documents the failover procedure specifically for object storage — the promotion of the secondary region replica to primary after a regional outage, the DNS change or CDN origin configuration change that routes downloads to the replica, and the replication restart configuration after the primary region recovers. The multi-region deployment decision record specifies the application-layer regional routing; the storage ADR documents the storage-layer regional configuration that the application routing depends on, since a failover that routes application traffic to the secondary region but continues to reference the primary region's S3 bucket endpoint will fail to serve file content if the primary region is inaccessible.
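The residency routing and the ordered migration steps can be captured in a small amount of code so that the order is enforced rather than remembered. The sketch below is illustrative only: the bucket names, regions, and residency codes are placeholders, and the step names simply mirror the migration process listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BucketLocation:
    region: str
    bucket: str


# Placeholder mapping; real values belong in the ADR and in configuration.
BUCKETS: dict[str, BucketLocation] = {
    "us": BucketLocation("us-east-1", "example-files-prod-us"),
    "eu": BucketLocation("eu-west-1", "example-files-prod-eu"),
}

MIGRATION_STEPS = ("copy", "verify", "repoint", "delete_source", "invalidate_cdn")


def bucket_for(residency: str) -> BucketLocation:
    """Resolve the bucket for a residency code, failing closed on unknown codes."""
    try:
        return BUCKETS[residency]
    except KeyError:
        raise ValueError(f"no bucket configured for residency {residency!r}") from None


def next_migration_step(completed: list[str]) -> Optional[str]:
    """Return the next step, or None when done; reject out-of-order histories."""
    if list(MIGRATION_STEPS[: len(completed)]) != completed:
        raise ValueError(f"steps completed out of order: {completed}")
    if len(completed) == len(MIGRATION_STEPS):
        return None
    return MIGRATION_STEPS[len(completed)]
```

Failing closed on an unknown residency code matters more than the mapping itself: a silent default to the US bucket is exactly the kind of gap a compliance review finds later.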

None of these five sections appear in the "how do I store user-uploaded files?" AI session that established the storage provider and basic configuration. The founding session covers the minimum viable integration — a working upload and download with presigned URLs. It does not cover the bearer-token access control model of presigned URLs and when it requires supplementation with CloudFront signing or application-proxied downloads. It does not cover the CDN interaction with multi-part uploads or the cache invalidation requirement for deleted objects. It does not cover the virus scanning integration point or the quarantine model for post-upload analysis. It does not cover cross-region replication, versioning, and delete marker replication policy. These are not advanced optimization concerns — they are the operational requirements of a file storage system that continues to work correctly as users share download links, files are uploaded at large sizes through CDN-proxied paths, malicious content is submitted, and data deletion requests arrive with compliance implications. The WhyChose extractor surfaces the founding session, the CDN configuration session, the large file upload investigation, and the GDPR deletion audit from AI chat history; the file storage ADR takes the provider and access control choices buried in those sessions and converts them into a documented presigned URL access control policy, CDN integration specification, scanning pipeline configuration, and replication disaster recovery plan — written before the incidents that make those requirements visible at the worst possible moment.

FAQs

What is the difference between S3 presigned URLs and CloudFront signed URLs, and when does each create a security vulnerability?

An S3 presigned URL is a time-limited bearer token: any HTTP client that possesses the URL can download the object within the validity window, with no authentication and no user binding. The URL carries the signature, the access key ID of the signing credentials, and the expiry as query parameters, with the object key in the path; the secret key itself never appears in the URL. Forwarding the URL to anyone gives them download access for the full expiry duration. The security surface of a presigned URL is entirely determined by its expiry window: a 7-day presigned URL is practically equivalent to a public URL for 7 days. S3 presigned URLs have no native per-URL revocation mechanism. Once issued, a presigned URL cannot be invalidated before expiry without changing the object's storage path (which invalidates all outstanding URLs for that object, including legitimate ones) or rotating the signing credentials (which invalidates all presigned URLs for all objects).

A CloudFront signed URL is signed using a CloudFront key pair managed separately from IAM credentials. It can be paired with a policy document that restricts access by IP address range and specifies a not-before time in addition to the expiry — a URL that is not valid before a specified time prevents pre-issuance URL hoarding. CloudFront signed cookies authorize access to an entire path prefix for a session, allowing an authenticated user to browse multiple files without per-object URL generation. The security vulnerability in CloudFront signed URL setups is key pair rotation: rotating the signing key pair invalidates all outstanding signed URLs immediately unless the application supports a grace period by maintaining two active key pairs during rotation. The correct model is short presigned URL expiry (5 to 15 minutes) for standard downloads, regenerated per authenticated request so the URL is never stored separately from the session; or CloudFront signed URLs with IP policy documents for content requiring network-level access control.
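The policy document mentioned above is a small JSON structure. The sketch below builds one with an expiry, an optional not-before time, and an optional source IP range. It covers only the policy: signing it with the CloudFront private key and encoding it into the URL are omitted, and the function name `custom_policy` is illustrative.

```python
import json
from typing import Any, Optional


def custom_policy(
    resource_url: str,
    expires_epoch: int,
    not_before_epoch: Optional[int] = None,
    source_cidr: Optional[str] = None,
) -> str:
    """Build a CloudFront custom policy as compact JSON (no whitespace)."""
    if not_before_epoch is not None and not_before_epoch >= expires_epoch:
        raise ValueError("not-before time must be earlier than the expiry")
    condition: dict[str, Any] = {"DateLessThan": {"AWS:EpochTime": expires_epoch}}
    if not_before_epoch is not None:
        condition["DateGreaterThan"] = {"AWS:EpochTime": not_before_epoch}
    if source_cidr is not None:
        condition["IpAddress"] = {"AWS:SourceIp": source_cidr}
    policy = {"Statement": [{"Resource": resource_url, "Condition": condition}]}
    return json.dumps(policy, separators=(",", ":"))
```

The IP condition is what distinguishes this from a presigned URL: a forwarded link fails for a recipient outside the stated range even while it is unexpired.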

Why does CDN integration require explicit exceptions for multi-part upload endpoints, and what happens when those exceptions are missing?

A multi-part S3 upload proceeds through three API calls: CreateMultipartUpload (which returns a session UploadId), UploadPart (which uploads each part and returns an ETag per part), and CompleteMultipartUpload (which assembles the object from the parts and ETags). The UploadId from CreateMultipartUpload is session-specific — it is valid only for the specific upload session and cannot be reused across sessions or users. If a CDN caches the CreateMultipartUpload response, subsequent upload initiations within the cache TTL receive the cached UploadId from a previous session. S3 returns a NoSuchUpload error for UploadPart calls submitted with a stale UploadId, causing every large file upload to fail silently from the user's perspective — the upload flow appears to proceed but every part upload fails.

If UploadPart responses are cached, a different failure occurs: the client receives the ETag from a previous part's response instead of the ETag S3 computed for the bytes it just sent, and submits that ETag to CompleteMultipartUpload. A response served from the CDN cache also means the request may never have reached S3, so the part the client believes it uploaded may not exist at all. S3 compares each submitted ETag with the part it stored and rejects a mismatch with an InvalidPart error, so the upload fails at the final step, after every byte has been transferred, with an error that points at the parts rather than at the CDN. The fix requires explicit CDN cache exclusion rules for all three upload API endpoints, specified by query parameter pattern (?uploads for initiation, partNumber and uploadId for part uploads, uploadId alone for completion). These exclusion rules are CDN-provider-specific and should not be assumed to be present in a provider's default object storage integration template.
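The query parameter patterns above translate into a simple predicate that a CDN rule, edge function, or test suite can mirror. The sketch below is provider-neutral and illustrative: `must_bypass_cache` is a hypothetical name, and the conservative choice to bypass the cache for every method other than GET and HEAD is an assumption of the example.

```python
from urllib.parse import parse_qs, urlsplit

CACHEABLE_METHODS = {"GET", "HEAD"}


def is_multipart_request(url: str) -> bool:
    """True for S3 multipart calls: ?uploads (initiate) or ?uploadId=... (parts, complete)."""
    # keep_blank_values is needed because "?uploads" has no "=value" part.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return "uploads" in query or "uploadId" in query


def must_bypass_cache(method: str, url: str) -> bool:
    """Decide whether a request on the storage path must skip the CDN cache."""
    return method.upper() not in CACHEABLE_METHODS or is_multipart_request(url)
```

Checking for uploadId covers part uploads and completion together, since both carry it. A GET with uploadId (listing parts) is also excluded so that progress queries are never answered from cache.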

What is the correct integration point for virus scanning in an object storage pipeline, and why does pre-upload scanning miss a class of malware that post-upload scanning catches?

Pre-upload scanning intercepts the file at the application server before forwarding the upload to the storage provider, providing synchronous rejection of malicious content at the cost of scanning latency in the upload request path. It is effective for known malware signatures detectable by engines like ClamAV. Post-upload scanning writes to a quarantine bucket first, scans asynchronously, and promotes clean files to the production bucket on scan completion. The class of malware that pre-upload scanning misses is content that requires deeper analysis than signature matching: polyglot files (files simultaneously valid in two formats, such as a JPEG that is also a ZIP archive containing malicious content), novel malware variants not yet in the signature database, and files with obfuscated payloads that evade static signature matching but trigger behavioral analysis in a sandboxed execution environment.

Post-upload scanning can integrate with managed threat intelligence services that run behavioral analysis and heuristic detection, because the asynchronous scan has no user-facing request timeout constraint. Pre-upload scanning is constrained to the response time budget of the upload HTTP request — typically under 30 seconds, which is sufficient for signature matching but not for sandboxed execution analysis. The production architecture for high-security file storage combines both: pre-upload scanning with ClamAV to block known signatures synchronously at the application boundary, and post-upload scanning in the quarantine pipeline with a managed service for deeper analysis. The scan result for each file must be recorded in the application database alongside the object metadata, because the scan result is part of the provenance of the stored object and is needed to answer compliance audit questions about which files were accepted and by which scanning engine version.
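The scan result record and the promotion decision can be sketched together, since the decision is what the record needs to explain later. This is a minimal illustration with hypothetical names (`Verdict`, `Action`, `ScanRecord`, `record_scan`); the one rule it encodes is that only an explicit clean verdict promotes a file, so a scanner error keeps the file in quarantine.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Verdict(Enum):
    CLEAN = "clean"
    INFECTED = "infected"
    ERROR = "error"  # scanner failed or timed out: neither clean nor infected


class Action(Enum):
    PROMOTE = "promote"  # move from quarantine to the production bucket
    REJECT = "reject"    # delete or move to the forensic bucket
    HOLD = "hold"        # stay in quarantine and retry or escalate


def action_for(verdict: Verdict) -> Action:
    """Only an explicit clean verdict promotes; errors never pass through."""
    if verdict is Verdict.CLEAN:
        return Action.PROMOTE
    if verdict is Verdict.INFECTED:
        return Action.REJECT
    return Action.HOLD


@dataclass(frozen=True)
class ScanRecord:
    object_key: str
    engine: str
    engine_version: str
    verdict: Verdict
    action: Action
    scanned_at: datetime


def record_scan(
    object_key: str,
    engine: str,
    engine_version: str,
    verdict: Verdict,
    now: Optional[datetime] = None,
) -> ScanRecord:
    """Build the row stored alongside the object metadata."""
    return ScanRecord(
        object_key=object_key,
        engine=engine,
        engine_version=engine_version,
        verdict=verdict,
        action=action_for(verdict),
        scanned_at=now or datetime.now(timezone.utc),
    )
```

When both scanning stages are in use, one record per engine per file lets an audit answer which engine version accepted a given object.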