Why does search need an architecture decision record?

Search appears to be a query: a WHERE clause with a LIKE operator, a full-text search call, or an API call to a managed search service. This framing hides the architectural decisions embedded in the implementation. The indexing strategy (database full-text search, a dedicated search engine, a managed SaaS service, vector embeddings) determines the performance ceiling under corpus growth. A Postgres ILIKE query that returns results in 40 milliseconds at 50,000 records can take 12 seconds at 2,000,000 records, because a sequential scan reads every row on every query, while an inverted index looks up only the entries for the query terms. The relevance model (BM25, TF-IDF, learned ranking) determines whether the first result is the most relevant result or just the result with the most keyword matches. Relevance is not a configuration setting but a model that must be tuned as the corpus grows and user behavior reveals which results users actually click. The synchronization approach (synchronous dual-write, event-driven async, CDC log tailing, periodic reindex) determines how stale the search index can be and what the failure mode is when synchronization fails: whether users see a missing document or an outdated version of a document, and for how long. The schema evolution policy determines whether adding a new field to the search index requires downtime or can be performed with a zero-downtime blue-green reindex. The GDPR erasure procedure determines whether deleting a user's data from the primary database is sufficient or whether the search index is a separate system that requires its own deletion procedure. None of these decisions is visible in the code as design rationale; they appear as index configuration, query code, background jobs, and operational runbooks scattered across the codebase.
The search ADR holds all of them together with the reasoning that justifies each choice, making it possible to evolve the search architecture safely as the corpus grows and requirements change.

What is the performance cliff in database full-text search and when does it matter?

Database full-text search (Postgres tsvector with a GIN index, MySQL FULLTEXT, SQLite FTS5) is a valid search implementation for corpora up to a few million records, with important caveats about query complexity and update frequency. The performance cliff is not at a specific record count; it is at the intersection of corpus size, query complexity, and the write-to-read ratio of the indexed data. A Postgres GIN index on a tsvector column lets full-text queries use the index instead of a sequential scan. This is fast for single-table queries with simple text matching. It becomes slow in three situations. The first is when the search needs to join multiple tables: a product search that ranks by title match, body match, and sales volume requires joining the products, descriptions, and orders tables, and GIN indexes do not accelerate the join. The second is when relevance ranking needs corpus-wide statistics: the Postgres ranking functions (ts_rank, ts_rank_cd) score each document on term frequency, proximity, and lexeme weights alone, without inverse document frequency, so they are less sophisticated than Elasticsearch's BM25 implementation. The third is when the indexed data has a high update rate: GIN index updates are expensive, and heavy write workloads on indexed tables can slow down writes across the entire table. The practical boundary is: database full-text search is correct when the corpus is under a few million records, the query complexity is low (single-table or simple join, keyword matching without advanced relevance tuning), and the relevance model does not need to be tuned independently of the database schema. When relevance tuning, high query throughput, typo tolerance, synonym expansion, faceting, or per-field boosting are requirements, a dedicated search engine is the correct architectural choice, and that choice must be made before the corpus grows to the point where database search becomes a production problem.

What does zero-downtime reindexing require and why is it an architectural decision?

Zero-downtime reindexing is the procedure for rebuilding a search index (to add new fields, change analyzers, update mappings, or perform a full corpus re-ingestion) without taking the search feature offline during the rebuild. It requires three things: index aliasing, dual-write during the rebuild window, and an atomic alias swap at cutover. In Elasticsearch and OpenSearch, an alias is a stable name that points to one or more physical indices. Application code reads from and writes to the alias, and the alias pointer is changed at cutover without any application code change. The procedure is: create a new versioned index with the new schema (products_v2 while products_v1 is live); before starting the copy, configure dual-write so that new writes go to both v1 and v2, which ensures that no write falls into the gap between the point the copy reads from and the start of dual-write; run the reindex job to copy all historical documents from v1 to v2 with the new schema applied; atomically swap the alias from v1 to v2; stop writing to v1; and delete v1 after a monitoring window confirms that v2 is serving correctly. This procedure requires that the application code always references the search system through an alias, not a physical index name. If the application code writes directly to a named index, the alias-swap approach cannot be used and reindexing requires application downtime. The alias-based architecture must be designed from the first index creation: retrofitting it onto a system that writes to a hard-coded index name requires an alias migration that is itself a coordinated operation. The search ADR must specify whether the system uses aliases and versioned indices, because this determines whether future schema changes require downtime or can be performed with the zero-downtime procedure.
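The cutover steps can be written down as an ordered plan of Elasticsearch/OpenSearch REST calls. A minimal sketch, assuming the alias `products` and the versioned indices `products_v1`/`products_v2` from the example above; the helper names are hypothetical, "APP" marks steps that happen in application code, and the bodies would be sent with whatever HTTP client the team already uses. In this sketch dual-write is enabled before the bulk copy, and the copy uses `op_type: create` so it never overwrites a newer document that dual-write has already placed in the new index:

```python
# Hypothetical names: alias "products", physical indices products_v1 / products_v2.


def reindex_body(old_index: str, new_index: str) -> dict:
    """Body for POST /_reindex, copying every document from old_index to new_index.

    op_type "create" skips documents that already exist in new_index (for
    example, ones dual-write has already written there), and conflicts
    "proceed" counts those skips instead of aborting the job.
    """
    return {
        "conflicts": "proceed",
        "source": {"index": old_index},
        "dest": {"index": new_index, "op_type": "create"},
    }


def alias_swap_body(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases; all actions in one request are applied atomically."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def cutover_plan(alias: str, old_index: str, new_index: str, new_mappings: dict) -> list:
    """Ordered (method, path_or_step, body) tuples; "APP" marks application-side steps."""
    return [
        ("PUT", f"/{new_index}", {"mappings": new_mappings}),
        ("APP", "enable dual-write to both physical indices", None),
        ("POST", "/_reindex", reindex_body(old_index, new_index)),
        ("POST", "/_aliases", alias_swap_body(alias, old_index, new_index)),
        ("APP", "stop writing to the old index", None),
        ("DELETE", f"/{old_index}", None),  # only after the monitoring window
    ]
```

One caveat on this ordering: `_reindex` reads from a point-in-time view of the old index, so a document deleted during the copy can be copied into the new index after dual-write has already deleted it there. A reconciliation pass against the primary data after the copy is one way to catch such documents.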

What should a search architecture decision record include?

A search architecture ADR needs six sections. First, the indexing technology selection with rejection reasons for each alternative: not a benchmark comparison table but a specific statement of what was wrong with each alternative for this specific corpus size, query complexity, and operational context — why Postgres full-text was rejected (anticipated corpus growth beyond the performance ceiling, or a relevance requirement that GIN-index ranking cannot satisfy), why Algolia was rejected (cost at scale, inability to self-host, vendor dependency for a core product feature), why Elasticsearch was chosen (self-hostable, BM25 relevance model, native alias support for zero-downtime reindex, Lucene-backed full-text with language analysis). Second, the relevance model and tuning policy: the scoring algorithm (BM25 with what field weights, boosting for recency or popularity signals, query-time or index-time boosting), the synonym dictionary (what process adds synonyms, who reviews the synonym list, how the synonym expansion is tested before deployment), the fuzziness and typo tolerance configuration (maximum edit distance, what fields use fuzzy matching, whether prefix matching is enabled for autocomplete). Third, the synchronization approach: how changes in the primary database propagate to the search index (synchronous dual-write on every mutation, event-driven consumer reading from the change queue, CDC log tailing via Debezium reading the database WAL, periodic reindex job), the maximum acceptable staleness of the search index, and the failure mode when synchronization fails (search serves stale results, or search degrades to the database fallback, or search returns an error). 
Fourth, the index schema and evolution policy: what fields are indexed and with what analyzers, whether the system uses versioned indices and aliases, the procedure for schema changes (zero-downtime reindex procedure for additive changes, migration procedure for breaking changes), and the reindex time estimate for the current corpus size at current ingestion throughput. Fifth, the multi-tenancy model: whether tenant data is separated into per-tenant indices (isolation, clean deletion, multiplied index count) or a shared index with a mandatory tenant filter (simpler operations, but relevance statistics are shared across tenants), and how tenant-scoped API keys or query filters enforce the isolation. Sixth, the GDPR and personal data policy: what personal data is present in the search index, the deletion procedure for personal data on user erasure request (a delete-by-query on the field that identifies the user, how long physical deletion lags behind logical deletion until the affected index segments are merged, whether crypto-shredding is required for long-retention indices), and whether the search index is included in the data retention and erasure runbook.
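One way to keep any of the six sections from silently going unwritten is to treat them as a checklist that a review step can enforce. A minimal sketch, assuming a hypothetical `SearchADR` record kept alongside the ADR text; the field names are illustrative, not a standard format:

```python
from dataclasses import dataclass, fields


@dataclass
class SearchADR:
    """Hypothetical checklist: one field per section of the search ADR."""
    indexing_technology: str = ""  # choice plus a rejection reason per alternative
    relevance_policy: str = ""     # scoring, field weights, synonyms, fuzziness
    synchronization: str = ""      # mechanism, maximum staleness, failure mode
    schema_evolution: str = ""     # analyzers, aliases, reindex procedure and time
    multi_tenancy: str = ""        # per-tenant indices or shared index plus filter
    personal_data: str = ""        # erasure procedure, segment-merge lag, runbook

    def missing_sections(self) -> list[str]:
        """Names of sections that are still empty (whitespace counts as empty)."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]
```

A CI check or review template could then refuse an ADR whose `missing_sections()` is non-empty, which turns "we never decided the synonym process" into a visible gap rather than a default.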

2026-06-19 · ~20 min read

The search architecture decision record: why the search approach you chose determines your relevance tuning capability and your query latency under index growth

Search feels like a database query — a WHERE clause, a LIKE operator, maybe a full-text index. This framing obscures the architectural decisions embedded in the implementation: the indexing strategy that determines the performance ceiling under corpus growth; the relevance model that determines whether the first result is actually the most relevant result; the synchronization approach that determines how stale the index can be and what happens when synchronization fails; the schema evolution policy that determines whether adding a new field requires downtime. Most teams discover these decisions not when designing the search feature, but when the corpus has grown large enough that the original approach stops working.

A B2B SaaS team ships a document management feature. Users upload contracts, proposals, and compliance documents. The search bar at the top of the page runs a query against the primary Postgres database using ILIKE '%search term%' on the document title and extracted text columns. At launch, with 40,000 documents across all customers, search returns results in under 100 milliseconds. The product works well. Nobody flags search as an architectural concern.

Eight months later, the customer base has grown. The largest customer has 300,000 documents. Across all customers, the corpus is 2.1 million documents. The ILIKE query is now running a sequential scan on a 2-million-row table because ILIKE with a leading wildcard cannot use a B-tree index. Search takes 11 seconds. The product team adds a "search is currently slow — we are working on a fix" banner to the search results page.

The engineering team begins evaluating Elasticsearch. They discover that the migration requires answering questions that were never asked when the original search was built: What fields should be indexed? At what weights? Should document title matches rank higher than body matches? How should typos be handled? What is the maximum acceptable lag between a document upload and its appearance in search results? What happens to search when the indexing pipeline falls behind? How does search handle the 20 customers who have uploaded documents in German and French? None of these are questions about Elasticsearch configuration — they are product decisions that the original ILIKE query implicitly answered by ignoring relevance entirely, and that the migration forces the team to answer explicitly, under delivery pressure, with limited time to experiment.

The migration takes three months. The performance problem is solved. The relevance model, designed under pressure, is never tuned after the initial deployment — the synonym list is empty, the field weights are the Elasticsearch defaults, typo tolerance uses the default edit distance. The decisions made during the migration are not written down. Six months later, a new engineer asks why the search architecture is the way it is, and nobody can explain the relevance tuning choices.

Why search is an architectural decision, not just a query

Search appears to be an implementation detail: a database query with some text matching. The appearance conceals the fact that every search implementation embeds at least five distinct architectural decisions, each with consequences that become visible only after the corpus grows or the product requirements evolve.

The indexing strategy determines the performance ceiling under corpus growth. A database sequential scan with ILIKE performs a full table scan on every query — the query time grows linearly with the corpus size. A Postgres full-text index (tsvector with a GIN index) enables index-accelerated full-text matching but with a relevance model that is less sophisticated than dedicated search engines. A dedicated search engine (Elasticsearch, OpenSearch, Meilisearch, Typesense) uses an inverted index — a data structure that maps terms to documents, enabling sub-100-millisecond queries on corpora of tens of millions of documents. A vector search index (pgvector, Weaviate, Pinecone) enables semantic similarity search using embedding vectors, with different performance characteristics and a different query model than keyword search. The indexing strategy is an architectural decision because migrating from one approach to another requires rebuilding the index, changing the query code, and often redesigning the relevance model. A team that starts with ILIKE and migrates to Elasticsearch is performing an architectural migration, not a performance tuning exercise.
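The difference between a sequential scan and an inverted index can be seen in a toy model. A minimal sketch in plain Python (no real engine; the tokenizer is a simplifying assumption with no stemming): the scan reads every document on every query, while the index is built once at write time and each lookup touches only the postings for the query term.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (no stemming)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def sequential_scan(docs: dict[int, str], term: str) -> list[int]:
    """Roughly what ILIKE '%term%' does: read every document on every query."""
    needle = term.lower()
    return [doc_id for doc_id, text in docs.items() if needle in text.lower()]


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of document IDs that contain it (built at write time)."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def index_lookup(index: dict[str, set[int]], term: str) -> list[int]:
    """Touches only the postings for the query term, not the whole corpus."""
    return sorted(index.get(term.lower(), set()))
```

The two are not equivalent: the scan matches substrings (`tun` finds "tuning"), while the index matches whole tokens. Deciding which behavior the product needs is one of the questions a migration from ILIKE forces.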

The relevance model determines the quality of search results, which is a product property. BM25 (Best Matching 25, the default ranking algorithm in Elasticsearch and OpenSearch) scores documents by term frequency (how often the search term appears in the document, with diminishing returns for repeated occurrences), inverse document frequency (how rare the term is across the entire corpus; a term that appears in every document is a weaker signal than a term that appears in few documents), and field length normalization (a match in a short field counts for more than the same match in a long one, because term frequency is normalized against the average length of that field). TF-IDF is the predecessor to BM25 and was the default similarity in Lucene-based engines before Elasticsearch 5.0 switched to BM25; the built-in Postgres ranking functions use neither, scoring term frequency and proximity within each document without corpus-wide statistics. Learned ranking (machine-learned ranking models trained on user click behavior) can outperform BM25 but requires a corpus of user interactions to train. The relevance model is a product decision: it determines whether the first result is actually the most relevant, and it must be tuned as the corpus grows and user behavior reveals which results users actually select. A search ADR that specifies "BM25 with default field weights" is incomplete: it must specify what "default" means in the context of the product's corpus and query patterns, and what the tuning procedure is for improving relevance over time.
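The three BM25 signals are easiest to see in the per-term score itself. A minimal sketch of the textbook BM25 term score with the non-negative IDF variant that Lucene uses and the common defaults k1 = 1.2 and b = 0.75; real engines add per-field statistics, query-time boosts, and summation over query terms on top of this:

```python
import math


def bm25_term_score(tf: int, doc_len: int, avg_doc_len: float,
                    n_docs: int, docs_with_term: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 contribution of one query term to one document field's score."""
    # Inverse document frequency: rare terms are stronger signals.
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Length normalization: the same tf counts for more in a shorter field.
    norm = 1 - b + b * (doc_len / avg_doc_len)
    # Saturating term frequency: the 10th occurrence adds less than the 2nd.
    return idf * (tf * (k1 + 1)) / (tf + k1 * norm)
```

Comparing a rare term with a common one, a short field with a long one, or tf = 2 with tf = 10 shows each of the three effects described above in isolation.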

The synchronization approach determines the consistency model between the primary database and the search index. A search engine is a secondary data store — documents are indexed from a primary source (a relational database, an object store, a document database). Any time the primary data changes, the search index must be updated. The mechanism for propagating changes (synchronous write to the search engine on every database mutation, event-driven indexing via a message queue consumer, CDC log tailing via Debezium reading the database WAL, periodic batch reindex on a cron schedule) determines how stale the search index can be, what the failure mode is when the synchronization breaks, and what the recovery procedure is when the index falls out of sync. A search index that is updated synchronously on every write has zero lag but adds the search engine's write latency to every primary database write — and the search engine's availability becomes a dependency of every database write, which can cascade into write failures when the search cluster is degraded. A search index that is updated asynchronously via an event queue has configurable lag but isolates write failures to the search indexing path.

The schema evolution policy determines whether index changes require downtime. A search index has a schema — the set of fields, their data types, and their analyzer configurations. Adding a new field to the search schema requires reindexing: the existing documents must be re-processed to include the new field's values. For small corpora, reindexing takes seconds. For large corpora, reindexing takes hours or days. If the application writes directly to a named index, reindexing requires either accepting the latency of a background reindex while the index is inconsistent, or taking the search feature offline during the reindex. If the application uses index aliases (a stable name pointing to a versioned physical index), reindexing can be performed with a zero-downtime blue-green procedure: build the new index in the background, swap the alias atomically, decommission the old index. The alias architecture must be designed from the first index creation; retrofitting it later is a coordinated migration.

The GDPR erasure procedure must cover the search index as a separate system. A search index is not a cache of the primary database — it is a secondary data store with its own retention and deletion characteristics. When a user requests deletion of their data under GDPR's right to erasure, deleting the user's records from the primary database is not sufficient if the search index contains indexed copies of those records. A delete-by-query on the Elasticsearch index removes the document from the live search results but does not immediately remove the data from the underlying Lucene segments on disk — the data is marked as deleted and excluded from query results, but the physical bytes are not removed until the next segment merge. For a search index with infrequent merges, the physical deletion may lag the logical deletion by hours or days. The erasure procedure must account for this, specify the merge policy, and be included in the data retention ADR alongside the primary database deletion procedure.
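The logical and physical deletion steps map onto two Elasticsearch/OpenSearch requests. A minimal sketch, assuming the user identifier is indexed as a `keyword` field whose name the caller supplies (`owner_id` in the usage below is hypothetical); the function only builds the requests:

```python
def erasure_steps(index: str, user_field: str, user_id: str) -> list:
    """Ordered (method, path, body) requests for erasing one user's documents."""
    return [
        # Logical deletion: matching documents drop out of results after the refresh.
        ("POST", f"/{index}/_delete_by_query?refresh=true",
         {"query": {"term": {user_field: user_id}}}),
        # Physical deletion: rewrite segments that contain deleted documents so the
        # bytes are dropped now rather than at the next natural merge.
        ("POST", f"/{index}/_forcemerge?only_expunge_deletes=true", None),
    ]
```

The user ID travels in the request body rather than the URL, which keeps it out of access logs. Force merge is I/O-heavy on a large live index, so the erasure procedure would state whether it runs per request or in a scheduled batch; snapshots taken before the deletion still contain the data and need their own retention rule.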

Indexing technology: the decision that sets the ceiling

The choice of indexing technology is the foundational decision in the search ADR. It determines the performance ceiling, the relevance model options, the operational complexity, and the cost structure. The decision must be made against specific requirements, not as a general evaluation of search technologies.

Database full-text search (Postgres tsvector with a GIN index, MySQL FULLTEXT, SQLite FTS5) is the correct starting point for teams that do not yet know their search requirements, whose corpus is under a few hundred thousand records, and whose relevance requirements are basic keyword matching without synonym expansion, faceting, or field boosting. The operational advantage is zero additional infrastructure — the search index lives in the primary database, there is no separate cluster to manage, and the search results are consistent with the primary data because they are the primary data. The structural limitations emerge as the product grows: the GIN index accelerates text matching but not complex ranking; joining across multiple tables to compute a relevance score is expensive; high write rates on indexed columns degrade write throughput for the entire table; and the Postgres full-text ranking functions (ts_rank, ts_rank_cd) are less sophisticated than BM25. Database full-text search is rejected when the corpus exceeds a few million records with complex queries, when relevance tuning is a product requirement, or when query latency at the 95th percentile must be under 100 milliseconds at scale.
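For comparison with the ILIKE query in the opening example, here is a minimal sketch of the Postgres full-text setup, kept as SQL strings in Python so the statements can be reviewed and migrated like any other schema change. It assumes a hypothetical `documents` table with `title` and `extracted_text` columns, Postgres 12 or later for the generated column, and the `english` text search configuration. `setweight` marks title lexemes as weight A, so `ts_rank` favors title matches:

```python
# Hypothetical schema: documents(id, title text, extracted_text text). Postgres 12+.
ADD_SEARCH_VECTOR = """
ALTER TABLE documents ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(extracted_text, '')), 'B')
  ) STORED;
"""

# GIN index over the lexemes: the @@ match uses the index instead of a table scan.
CREATE_GIN_INDEX = """
CREATE INDEX documents_search_idx ON documents USING GIN (search_vector);
"""

# websearch_to_tsquery accepts arbitrary user input without raising syntax errors;
# %s is the driver's parameter placeholder.
SEARCH_QUERY = """
SELECT id, title, ts_rank(search_vector, query) AS rank
FROM documents, websearch_to_tsquery('english', %s) AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
"""


def search_params(user_input: str) -> tuple[str]:
    """Parameters for SEARCH_QUERY; never interpolate user input into the SQL."""
    return (user_input,)
```

The GIN index speeds up the `@@` match, but `ts_rank` is computed for every matching row before the sort, which is the "accelerates text matching but not complex ranking" limitation in concrete form.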

Dedicated search engines (Elasticsearch, OpenSearch, Meilisearch, Typesense) use inverted indices, either built on Lucene (Elasticsearch, OpenSearch) or implemented in purpose-built engines optimized for specific use cases (Meilisearch for typo-tolerant instant search, Typesense for multi-tenant SaaS search). The inverted index maps terms to the documents that contain them, enabling queries that scan the index rather than the corpus. A query on a 10-million-document corpus can take a similar time to a query on a 100,000-document corpus, because looking up a term in the index costs roughly the same at any corpus size, and the remaining work grows with the number of matching documents rather than the total number of documents. Elasticsearch and OpenSearch offer the most complete feature set: BM25 ranking, field boosting, synonym expansion with a synonym token filter in the analyzer chain, fuzzy matching via edit distance, aggregations for faceting, nested document support, and a rich query DSL. The cost is operational complexity: an Elasticsearch cluster requires capacity planning, index management, shard allocation tuning, and a monitoring posture distinct from the primary database. Managed Elasticsearch services (Elastic Cloud, Amazon OpenSearch Service, Bonsai) reduce the operational burden while preserving the feature set. Meilisearch and Typesense trade some of Elasticsearch's flexibility for simpler operations and first-class multi-tenancy support (Typesense's scoped API keys make per-tenant isolation a configuration option rather than an architecture pattern).
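On a shared index without engine-level scoped keys, the tenant filter has to be applied to every query, not just most of them. A minimal sketch of one way to enforce that in application code, assuming Elasticsearch/OpenSearch query DSL and a `keyword` field named `tenant_id` (a hypothetical name); the `filter` clause restricts matches without affecting scores:

```python
def scope_to_tenant(query: dict, tenant_id: str) -> dict:
    """Wrap any query so it can only match the given tenant's documents."""
    if not tenant_id:
        # Fail closed: an unscoped query on a shared index leaks other tenants' data.
        raise ValueError("tenant_id is required for every search")
    return {
        "bool": {
            "must": [query],
            # Filter context: no effect on relevance scores, and cacheable.
            "filter": [{"term": {"tenant_id": tenant_id}}],
        }
    }
```

Routing every search through one such function makes the isolation reviewable in one place. The filter isolates which documents match, but the IDF statistics behind BM25 are still computed over the whole shared index, so one tenant's vocabulary can shift another tenant's scores.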

Managed SaaS search (Algolia, Typesense Cloud) eliminates cluster operations entirely. The search index lives in the vendor's infrastructure; the application sends indexing requests and runs queries against the vendor API. Algolia's feature set is comprehensive (instant search, typo tolerance, faceting, personalization, and A/B testing for relevance experiments), and the operational cost is effectively zero, with API rate limits as the main operational constraint. The structural constraints are cost at scale (Algolia pricing is per-operation and per-record, and the cost grows significantly at millions of records and high query volumes), vendor dependency (the search feature's availability depends on the vendor's uptime, and the search schema is constrained to the vendor's data model), and data gravity (all indexed data flows to the vendor's infrastructure, which creates a compliance review requirement for any corpus that contains personal data). Algolia is rejected when the cost at projected scale is prohibitive, when data sovereignty requirements prohibit sending personal data to a third-party SaaS, or when the search feature requires customization beyond what the vendor API exposes.

Vector search (pgvector for Postgres, Weaviate, Pinecone, Qdrant) enables semantic similarity search using embedding vectors — numerical representations of document meaning generated by a language model. A vector search query computes the embedding of the search query and retrieves documents whose embeddings are nearest in vector space, regardless of whether the exact query terms appear in the documents. Vector search is the correct approach for semantic search (finding documents about a concept without requiring exact keyword matches), similar document retrieval, and hybrid search (combining keyword relevance from BM25 with semantic relevance from vector similarity). The structural requirement is an embedding pipeline: documents must be embedded by a language model before indexing, queries must be embedded at query time, and the embedding model must be fixed (or re-embedding the entire corpus must be planned) because changing the embedding model invalidates all stored embeddings. Vector search is not a replacement for keyword search — it is an additional signal that can be combined with BM25 via a hybrid ranking approach. The search ADR must specify whether vector search is in scope for the initial implementation, what embedding model is used, and how the embedding pipeline interacts with the synchronization approach.
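One common way to implement the hybrid approach is reciprocal rank fusion (RRF), which merges ranked lists by position rather than by raw score, so BM25 scores and cosine similarities never have to be put on the same scale. It is named here as one option, not as the method a search ADR must choose. A minimal sketch, assuming the keyword ranking comes from a BM25 query and the vector ranking from an exact cosine-similarity scan; k = 60 is the constant commonly used with RRF:

```python
import math
from collections import defaultdict


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_rank(query_vec: list[float], doc_vecs: dict[str, list[float]], top_k: int = 10) -> list[str]:
    """Exact nearest-neighbour ranking; vector indexes approximate this (e.g. HNSW)."""
    ranked = sorted(doc_vecs, key=lambda d: cosine_similarity(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:top_k]


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Score each document by the sum of 1 / (k + position) over every list it appears in."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + position)
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```

A document ranked well by both signals rises to the top even if neither list puts it first, which is the behavior hybrid search is after.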

Relevance tuning: the maintenance work that is always deferred

Relevance is not a configuration setting — it is a model that degrades if it is not maintained. The default BM25 configuration in Elasticsearch is a reasonable starting point for most corpora, but it is not tuned for any specific product's corpus or query patterns. The gap between default relevance and tuned relevance grows as the corpus grows and as user expectations rise.

Field boosting assigns different weights to matches in different fields. A match in the document title is a stronger signal than a match in the document body, because a document titled "Postgres performance tuning" is more specifically about Postgres performance tuning than a document that mentions the phrase in passing in its body. Elasticsearch field boosting is configured at query time (by specifying boost values in the multi-match query) or at index time (by using a boost in the field mapping, which was deprecated in Elasticsearch 5.0 in favor of query-time boosting). The field boost values must be calibrated against the specific corpus and query patterns: a document management product where document titles are short and precise needs different field weights than a knowledge base where article titles are long and body content is highly variable. The field weight calibration is a product experiment — it requires search result quality measurement (click-through rates, the position at which users find the result they click) to determine whether the current weights are producing good results.
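Query-time boosting uses the `^` suffix on field names in a `multi_match` query. A minimal sketch of a search body builder; the title boost of 3 is a hypothetical starting value that would be calibrated against click data, not a recommended setting:

```python
def boosted_search(query_text: str, title_boost: float = 3.0, size: int = 20) -> dict:
    """Search body with a query-time field boost (field^N multiplies that field's score)."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": query_text,
                "fields": [f"title^{title_boost:g}", "body"],
                # best_fields (the default type) scores by the best-matching field.
                "type": "best_fields",
            }
        },
    }
```

Keeping the boost as a parameter rather than a literal makes it easy to run the calibration experiments described above without code changes.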

Synonym expansion allows a query for "contract" to also match documents that contain "agreement" or "SLA" or "NDA". Synonyms are configured in the index analyzer pipeline as a synonym token filter. The synonym list is a dictionary that must be maintained — it must be created (populated with synonyms that matter for the product's domain), reviewed (wrong synonyms degrade relevance, because they cause irrelevant documents to match), deployed (synonym changes require reindexing to apply to existing documents, unless synonyms are configured at query time), and updated as the product domain evolves. A search ADR that does not specify the synonym list management process — who maintains it, how changes are tested before deployment, how the synonym filter is applied (index-time or query-time) — is leaving the synonym dictionary as a permanent empty default.
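Applying synonyms only at query time avoids the reindex on every synonym change. A minimal sketch of create-index settings that do this in Elasticsearch, with documents indexed by the `standard` analyzer and a custom `synonym_search` analyzer (hypothetical name) used only as the `search_analyzer`; the one-way rule in the usage below mirrors the "contract" example above:

```python
def index_settings_with_synonyms(synonym_rules: list[str]) -> dict:
    """Create-index body that applies synonym expansion at query time only."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "domain_synonyms": {
                        # synonym_graph is intended for search-time analyzers.
                        "type": "synonym_graph",
                        "synonyms": synonym_rules,
                    }
                },
                "analyzer": {
                    "synonym_search": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "domain_synonyms"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "title": {"type": "text", "analyzer": "standard", "search_analyzer": "synonym_search"},
                "body": {"type": "text", "analyzer": "standard", "search_analyzer": "synonym_search"},
            }
        },
    }


RULES = ["contract => contract, agreement, sla, nda"]  # query "contract" expands; "nda" does not
```

Inline analysis settings can only be changed on a closed index, so a synonym list that changes often is usually kept in a reloadable synonyms file or set instead; the review and test steps for the list stay the same either way.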

Fuzziness and typo tolerance allow a query for "managment" to match documents that contain "management". Elasticsearch implements fuzziness with an edit distance: the fuzziness setting specifies the maximum number of character operations (insertions, deletions, substitutions, and, by default, transpositions of adjacent characters, which makes it a Damerau-Levenshtein distance) allowed between the query term and the indexed term, up to a maximum of 2. An edit distance of 1 matches single-character typos. An edit distance of 2 matches two-character typos but also matches more irrelevant documents, because more terms fall within edit distance 2. Meilisearch and Typesense implement typo tolerance as a first-class feature with automatic edit distance scaling by word length; Elasticsearch's fuzziness: AUTO setting applies a similar length-based rule. The fuzziness configuration must be calibrated against the product's query patterns: a search over technical documentation should use lower fuzziness (technical terms are precise, and fuzzy matching of "SQL" can produce surprising results) than a search over user-generated content where typos are common.
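Both the distance and the length-based rule are small enough to check by hand. A minimal sketch: `osa_distance` computes the optimal-string-alignment form of Damerau-Levenshtein distance, which approximates how fuzzy matching with transpositions counts edits, and `auto_fuzziness` follows Elasticsearch's documented `AUTO:3,6` default (no edits below 3 characters, one edit for 3 to 5, two edits above 5). Both function names are illustrative:

```python
def osa_distance(a: str, b: str) -> int:
    """Insertions, deletions, substitutions, and adjacent transpositions each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Allowed edits under AUTO:low,high: 0 below low chars, 1 below high, else 2."""
    if len(term) < low:
        return 0
    if len(term) < high:
        return 1
    return 2
```

Under `AUTO`, the three-letter term `sql` already tolerates one edit, which puts `sqs` and `ssl` within reach. That is the kind of surprising match the paragraph above warns about for technical corpora.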

Relevance measurement, together with a regular improvement cadence, is the maintenance process that keeps relevance from degrading over time. Without a measurement framework, relevance problems are reported as user complaints rather than metrics. A basic relevance measurement framework has three components: (1) a set of standard queries ("test queries") with known good results, checked before each relevance change deployment to detect regressions; (2) a click-through rate metric that measures what fraction of searches result in a click, which indicates whether users are finding what they are looking for; (3) a mean reciprocal rank metric, the average across searches of 1/rank of the first result the user clicks, which indicates whether the most relevant result is appearing near the top. The observability strategy must include search quality metrics alongside latency and error rate metrics, because a search system can have excellent latency and zero errors while returning irrelevant results.
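All three components reduce to a few lines once search events are logged. A minimal sketch, assuming a hypothetical `SearchEvent` log record with the 1-based rank of the first clicked result, and the convention (an assumption) that a search without a click contributes 0 to mean reciprocal rank:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchEvent:
    query: str
    clicked_rank: Optional[int]  # 1-based rank of the first clicked result; None = no click


def click_through_rate(events: list[SearchEvent]) -> float:
    if not events:
        return 0.0
    return sum(e.clicked_rank is not None for e in events) / len(events)


def mean_reciprocal_rank(events: list[SearchEvent]) -> float:
    """Average of 1/rank of the first click; searches with no click contribute 0."""
    if not events:
        return 0.0
    return sum(1.0 / e.clicked_rank for e in events if e.clicked_rank is not None) / len(events)


def failing_test_queries(golden: dict[str, str], top_results: dict[str, list[str]],
                         depth: int = 3) -> list[str]:
    """Test queries whose known-good document is missing from the top `depth` results."""
    return [q for q, doc in golden.items() if doc not in top_results.get(q, [])[:depth]]
```

Running `failing_test_queries` before each relevance deployment and tracking the two metrics afterwards gives the before-and-after comparison that a tuning change needs.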

Synchronization: the consistency model between the database and the index

A search engine is a secondary data store. Any architecture that has a primary database and a search index has an eventual consistency problem: changes to the primary data must propagate to the search index, and between the primary write and the index update, the search index is stale. The synchronization approach determines the lag, the failure mode, and the recovery procedure.

Synchronous dual-write writes to the primary database and to the search index in the same request handler, sequentially. If the database write succeeds and the search index write fails, the request returns an error and the client retries — which may produce a duplicate write to the database (depending on the retry semantics and the write idempotency of the database operation). If the database write succeeds and the search index write times out, the caller does not know whether the index was updated or not. Synchronous dual-write couples the availability of the search engine to the availability of every write path in the application: if the search cluster is degraded, writes fail. It is the simplest implementation of search indexing and the most fragile — it is correct only for small-scale applications where the search engine's write latency and availability are acceptable as a dependency of every primary write.

Event-driven indexing via a message queue decouples the primary database write from the search index update. When a document is created or updated in the primary database, an event is published to a queue (as described in the queue and messaging decision record). A separate consumer reads events from the queue and sends indexing requests to the search engine. The search index update is asynchronous — the document appears in search results after the consumer processes the event, which may be seconds or minutes after the primary write depending on the queue lag and the indexing throughput. The advantage is isolation: if the search cluster is degraded, the queue accumulates events that are replayed when the cluster recovers, without failing the primary write path. The disadvantage is complexity: the consumer must be deployed, monitored, and maintained; the queue must be sized for peak indexing load; and the consumer must implement idempotency (re-processing an event that was already indexed must be safe).
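Consumer idempotency usually comes down to a version check per document. A minimal sketch, assuming each event carries a version that increases with every change to its document (for example a row version column; the `IndexEvent` shape is hypothetical), with an in-memory dict standing in for the search index. Elasticsearch offers the same guard natively through external versioning (`version_type=external`), which rejects a write whose version is not higher than the stored one:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexEvent:
    doc_id: str
    version: int              # increases with every change to the document
    document: Optional[dict]  # None means the document was deleted


class IdempotentIndexer:
    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}
        self.versions: dict[str, int] = {}  # kept for deleted docs too (tombstones)

    def handle(self, event: IndexEvent) -> bool:
        """Apply the event unless a same-or-newer version was already applied."""
        if event.version <= self.versions.get(event.doc_id, -1):
            return False  # duplicate or out-of-order redelivery: safe to skip
        self.versions[event.doc_id] = event.version
        if event.document is None:
            self.docs.pop(event.doc_id, None)
        else:
            self.docs[event.doc_id] = event.document
        return True
```

Keeping the version after a delete is what stops a late redelivery of an older update from resurrecting a deleted document.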

CDC log tailing (Change Data Capture using Debezium, the AWS DMS CDC feature, or database-specific CDC connectors) reads the database write-ahead log (WAL) and publishes events for every row change in the monitored tables. The application code makes no changes to implement indexing: the CDC process observes changes to the database at the storage layer and publishes indexing events without any application-level instrumentation. This eliminates the dual-write problem, because the CDC process captures every change, including changes made by administrative tools, migrations, and other processes that bypass the application layer. The structural requirements are higher operational complexity (Debezium usually runs as a Kafka Connect connector, so it needs a Kafka cluster, a deployed connector, and WAL retention configuration on the source database, such as a logical replication slot in Postgres) and schema coupling (change events reference column names at the row level, so the code that maps them into search documents must be updated whenever the monitored tables' schema changes). CDC is the correct synchronization approach for large-scale systems where complete change capture is a correctness requirement and the operational capacity to run CDC infrastructure exists.
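The consumer side of CDC translates row-change events into index actions. A minimal sketch, assuming Debezium's change-event payload (an `op` code plus `before` and `after` row images) and hypothetical `id`, `title`, `extracted_text`, and `tenant_id` columns like those in the opening example; `to_document` is where the schema coupling described above lives:

```python
from typing import Optional


def to_document(row: dict) -> dict:
    # Coupled to the table schema: renaming a column here breaks indexing.
    return {"title": row["title"], "body": row["extracted_text"], "tenant_id": row["tenant_id"]}


def to_index_action(change: Optional[dict], id_column: str = "id") -> Optional[tuple]:
    """Translate one Debezium change-event payload into (action, doc_id, document)."""
    if change is None:
        return None  # Kafka tombstone that follows a delete event; nothing to do
    op = change["op"]
    if op in ("c", "u", "r"):  # create, update, snapshot read
        row = change["after"]
        return ("index", str(row[id_column]), to_document(row))
    if op == "d":
        return ("delete", str(change["before"][id_column]), None)
    return None  # other operations (such as truncate) need their own handling
```

Because snapshot reads (`r`) are handled like inserts, the same consumer can populate a new index from the connector's initial snapshot and then keep it current from the log.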

Periodic batch reindex rebuilds the search index from the primary database on a scheduled interval: a cron job queries the database and pushes all changed records (or all records, for a full reindex) to the search engine. The simplest implementation uses a last-modified timestamp to identify records changed since the last reindex run. The lag is the reindex interval: if the reindex runs every 5 minutes, the search index is at most 5 minutes stale. Periodic reindex is the correct approach when eventual consistency with a bounded lag is acceptable, when the indexing volume is low enough that a full reindex completes within the interval, and when operational simplicity is a priority over minimal lag. The failure mode is bounded: if a reindex run fails, the next scheduled run picks up where the last successful run left off, provided the last-modified watermark is advanced only after a run succeeds. The limitation is that hard-deleted records are not detected automatically: a row that has been deleted from the primary database no longer exists to carry a last-modified timestamp, so periodic reindex must include a reconciliation step that compares the primary database record set against the search index and deletes orphan documents.
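The watermark and the reconciliation step fit in two small functions. A minimal sketch with in-memory stand-ins: `rows` plays the role of the primary table (a real job would query `WHERE updated_at > since`), and the dict plays the role of the search index. The 30-second overlap is a hypothetical value that re-reads a small window to narrow, not eliminate, the gap left by transactions that commit after the watermark is taken:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Row:
    id: str
    updated_at: datetime
    title: str


def incremental_reindex(rows: list[Row], index: dict, watermark: datetime,
                        overlap: timedelta = timedelta(seconds=30)) -> datetime:
    """Push rows changed since (watermark - overlap); return the new watermark."""
    since = watermark - overlap
    changed = [r for r in rows if r.updated_at > since]
    for r in changed:
        index[r.id] = {"title": r.title}  # re-pushing a row is idempotent
    # The caller should persist this only after every push succeeded, so a failed
    # run is retried from the old watermark.
    return max([watermark] + [r.updated_at for r in changed])


def reconcile_deletes(primary_ids: set[str], index: dict) -> list[str]:
    """Delete index documents whose rows no longer exist in the primary database."""
    orphans = sorted(set(index) - primary_ids)
    for doc_id in orphans:
        del index[doc_id]
    return orphans
```

The reconciliation pass reads every ID on both sides, so it typically runs less often than the incremental pass.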

Schema evolution and zero-downtime reindexing

A search schema is the definition of what is indexed and how it is analyzed. When the schema changes — a new field is added, an analyzer is reconfigured, a field type changes — the existing documents in the index do not automatically reflect the change. Reindexing is required: all documents must be re-processed through the new schema. The architectural question is whether reindexing can be performed without taking search offline.

The index alias pattern is the foundation of zero-downtime reindexing in Elasticsearch and OpenSearch. An alias is a stable name that maps to one or more physical indices. Application code interacts with the alias rather than with a named physical index. When the index schema must change, a new physical index is created with the new schema, the existing documents are reindexed from the old physical index to the new one, and the alias is atomically swapped to point to the new index. From the application's perspective, the alias name did not change, and search queries and indexing operations continue to use the same alias. The old index is decommissioned after a monitoring window confirms the new index is serving correctly. The pattern is simplest when the alias architecture is established at the first index creation. The first physical index is named with a version suffix (products_v1), and an alias (products pointing to products_v1) is created immediately, so that every later schema change can use the zero-downtime procedure. Retrofitting is possible but costs an extra migration, because an alias cannot share its name with an existing index. A live index named products must first be reindexed into products_v1, after which the original index is removed and the alias created in a single atomic _aliases request.
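The swap itself is a single request to Elasticsearch's `POST /_aliases` endpoint, whose list of actions is applied atomically. The following is a minimal sketch that builds that request body and derives the next versioned index name. The helper names are hypothetical, and the `_vN` suffix follows the naming convention described above.

```python
def alias_swap_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases that repoints `alias` from old_index to new_index.

    Both actions are applied atomically, so there is no moment at which the
    alias resolves to neither index.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def next_index_name(alias: str, current: str) -> str:
    """Derive products_v2 from products_v1 under the <alias>_v<N> convention."""
    prefix = alias + "_v"
    if not current.startswith(prefix):
        raise ValueError(f"{current!r} does not follow the {prefix}N convention")
    return f"{prefix}{int(current[len(prefix):]) + 1}"
```

Keeping the body construction in one function makes the cutover reviewable and testable separately from the HTTP client used to send it.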

During the reindexing window, new writes must reach the new physical index as well as the old one. There are two ways to achieve this:

- Dual-write. The indexing consumer writes each change to both physical indices by name. Writing through the alias does not work, because an alias that points to several indices accepts writes only through its single designated write index.
- Reindex-with-catch-up. New writes go to the old index during the bulk copy, and the documents changed since the copy started are re-indexed into the new index before cutover.

If new writes are not applied to the new index, it will be missing every document created or updated after the reindex started, and the alias swap will surface missing or stale results until the next synchronization cycle catches up. Either approach adds operational complexity to the reindex procedure. It must therefore be specified in the search ADR and documented before the first schema change, not improvised under pressure during the first production schema migration.
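The dual-write variant can be sketched with indices modelled as in-memory dictionaries, a hypothetical stand-in for real index clients. The writer addresses each physical index by name. The bulk copy only creates documents the destination lacks, mirroring what Elasticsearch's `_reindex` API does with `"op_type": "create"` on the destination and `"conflicts": "proceed"`, so the copy cannot overwrite a newer version that dual-write already placed there. The sketch does not handle a document deleted mid-copy and then re-created by a copy working from an older snapshot; a reconciliation pass before cutover covers that case.

```python
from typing import Dict, List


class ReindexWindowWriter:
    """Routes every write to each physical index in `targets` (sketch)."""

    def __init__(self, indices: Dict[str, Dict[str, dict]], active: str):
        self.indices = indices
        self.targets: List[str] = [active]

    def begin_window(self, new_index: str) -> None:
        # Start dual-writing before the bulk copy so no write is missed.
        self.targets.append(new_index)

    def end_window(self, new_index: str) -> None:
        # After the alias swap, only the new index receives writes.
        self.targets = [new_index]

    def upsert(self, doc_id: str, doc: dict) -> None:
        for name in self.targets:
            self.indices[name][doc_id] = doc

    def delete(self, doc_id: str) -> None:
        for name in self.targets:
            self.indices[name].pop(doc_id, None)


def bulk_copy(src: Dict[str, dict], dst: Dict[str, dict]) -> None:
    """Copy only documents the destination lacks (create-only semantics)."""
    for doc_id, doc in list(src.items()):
        dst.setdefault(doc_id, doc)
```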

Breaking schema changes in Elasticsearch require a full reindex. These include changing a field's type (a string field changed to a numeric field), changing a field's analyzer (for example, its tokenizer), and removing a field. Elasticsearch does not allow an existing field's type or index-time analyzer to be changed in place, nor a field to be removed from a mapping. The reindex time depends on the corpus size and the indexing throughput of the cluster. Take an illustrative throughput of 5,000 documents per second; actual throughput varies widely with document size, mapping complexity, shard count, and hardware, and should be measured on the production cluster. At that rate, a full reindex of 10 million documents takes 2,000 seconds, approximately 33 minutes, and a 100-million-document corpus takes approximately 5.5 hours. The reindex duration determines the length of the dual-write window and the total duration of the schema migration procedure. These estimates belong in the search ADR alongside the schema evolution policy, because they determine the planning and execution requirements for any future schema change.
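The estimate is plain division of corpus size by sustained throughput. The 5,000 documents per second below is the illustrative figure from the text, not a measured value. For 10 million documents the result is 2,000 seconds, and for 100 million it is 20,000 seconds, about 5.56 hours.

```python
def reindex_duration_seconds(doc_count: int, docs_per_second: float) -> float:
    """Full-reindex duration at a sustained indexing throughput."""
    if docs_per_second <= 0:
        raise ValueError("throughput must be positive")
    return doc_count / docs_per_second


for docs in (10_000_000, 100_000_000):
    secs = reindex_duration_seconds(docs, 5_000)  # illustrative throughput
    print(f"{docs:>11,} docs: {secs / 60:6.1f} min ({secs / 3600:.2f} h)")
```

Recording the measured throughput next to this formula in the ADR lets the next schema change be planned with a known dual-write window rather than a guess.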

Additive schema changes, such as adding a new optional field to the index, do not always require a full reindex. Elasticsearch allows new fields to be added to an existing mapping in place. If the new field's value can be computed from data in the primary database at index time, the field can be added to the index mapping. The synchronization process then populates it for every document written from that point on, and a background job fills it in for existing documents over time. This partial-reindex approach avoids the complexity of a zero-downtime full reindex for simple additive changes, at the cost of a window during which the new field is missing from existing documents. Whether partial reindex is acceptable depends on whether the new field is required for query correctness (for example, a filter on the field would silently exclude documents that lack it) or is an optional ranking signal.
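The following is a minimal sketch of the partial-reindex path, assuming a hypothetical `region` keyword field. The mapping fragment is the kind of body sent to `PUT /<index>/_mapping`. The backfill runs against an in-memory index, and `lookup` stands in for a primary-database query. Skipping documents that already carry the field makes the job safe to resume after a failure.

```python
from typing import Callable, Dict, Iterator, List

# Body for PUT /<index>/_mapping: adding a field is allowed in place.
NEW_FIELD_MAPPING = {"properties": {"region": {"type": "keyword"}}}  # hypothetical field


def batches(ids: List[str], size: int) -> Iterator[List[str]]:
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def backfill_field(index: Dict[str, dict], field: str,
                   lookup: Callable[[str], object], batch_size: int = 500) -> int:
    """Fill `field` on documents that lack it, batch by batch; returns the count updated."""
    missing = sorted(doc_id for doc_id, doc in index.items() if field not in doc)
    updated = 0
    for batch in batches(missing, batch_size):
        for doc_id in batch:
            index[doc_id][field] = lookup(doc_id)  # value from the primary database
            updated += 1
    return updated
```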

Multi-tenancy: isolation versus shared relevance

For B2B SaaS products with multiple customers, the search architecture must address multi-tenancy — how documents from different customers are isolated from each other so that a search by one customer does not surface results from another customer's corpus.

Per-tenant indices give each customer their own physical search index. A customer with 10,000 documents has an index of 10,000 documents; a customer with 1,000,000 documents has an index of 1,000,000 documents. Isolation is structural. There is no risk of cross-tenant exposure through a missing filter, because a query against one tenant's index cannot match another tenant's documents; the application must still route each query to the correct index. Deletion on tenant offboarding is clean: delete the tenant's index and all of their data is gone from the search system. BM25 relevance statistics are computed per index (strictly, per shard, unless a query uses the dfs_query_then_fetch search type). A term's IDF (inverse document frequency) is therefore computed within the tenant's own corpus, so relevance is calibrated to what is rare in that tenant's documents, not what is rare across all customers. The structural cost is index count: a SaaS product with 10,000 customers has 10,000 indices. Elasticsearch clusters have a practical limit on the number of indices they can manage. Every index has at least one shard with fixed memory and file-handle overhead, and every index and shard adds to the cluster state that the elected master node maintains and publishes; thousands of small indices can strain both. Per-tenant indices are correct when tenant corpora are large enough to justify their own index, when isolation is a compliance requirement, or when per-tenant relevance calibration is a product priority.
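Per-corpus calibration can be made concrete with the IDF term from Lucene's BM25 similarity, ln(1 + (N − n + 0.5) / (n + 0.5)), where N is the document count and n the number of documents containing the term. The figures below are hypothetical. A tenant has 100 documents that all contain "invoice". In its own index, the term gets an IDF of about 0.005, because it does not distinguish one of the tenant's documents from another. Placed among 100,000 documents where no other tenant uses the term, the same term gets an IDF of about 6.9.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF used by Lucene's BM25 similarity: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical: 100 tenant documents, all containing the term.
per_tenant = bm25_idf(doc_count=100, doc_freq=100)
# Same documents inside a 100,000-document corpus where only this tenant uses the term.
shared = bm25_idf(doc_count=100_000, doc_freq=100)
print(f"per-tenant IDF {per_tenant:.4f}, shared IDF {shared:.2f}")
```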

Shared index with tenant filter stores all tenants' documents in a single index, with a tenant ID field on every document. Every search query includes a mandatory filter on the tenant ID field. The filter is applied in filter context, so it restricts the matching documents without contributing to the relevance score. This approach keeps the index count at one (spread across a small number of shards) regardless of the number of tenants. The operational simplicity is significant: one index to manage, one alias to maintain, one reindexing procedure for schema changes. The structural limitation is relevance calibration. BM25 IDF statistics are computed across all tenants' documents in each shard, not only the documents that pass the tenant filter. A term's IDF is therefore determined by how rare it is across the multi-tenant corpus, not within a single tenant's documents. For small tenants, this can produce poor relevance: a term that is rare in the corpus as a whole receives a high IDF weight even if it appears in every one of that tenant's documents. The security requirement for a shared index is that the tenant ID filter is mandatory and cannot be bypassed. If the application code constructs the query, the filter must be added on every query path before the query is sent to the search engine.
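One way to make the filter hard to bypass at the application layer is to route every query through a single function that wraps the caller's query. The following is a minimal sketch using Elasticsearch's `bool` query, assuming `tenant_id` is mapped as a `keyword` field so that an exact `term` match applies. The function name is hypothetical.

```python
from typing import Any, Dict


def scoped_query(tenant_id: str, user_query: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap any query so that it can only match the caller's tenant.

    The tenant clause sits in the bool query's filter context: it restricts
    matches without contributing to the relevance score.
    """
    if not tenant_id:
        raise ValueError("tenant_id is required; refusing to build an unscoped query")
    return {
        "query": {
            "bool": {
                "must": [user_query],
                "filter": [{"term": {"tenant_id": tenant_id}}],
            }
        }
    }
```

Failing loudly on an empty tenant ID is deliberate. An unscoped query in a shared index is a data exposure, not a degraded search.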

Tenant-scoped API keys (Typesense's scoped search keys, Algolia's secured API keys) enforce the shared-index-with-filter model at the infrastructure layer. From a parent search-only key, the application derives a restricted key for each tenant that embeds a mandatory filter condition. The key carries an HMAC computed with the parent key, so the search engine can verify it and applies the embedded filter to every query made with it. Even if the application code omits the tenant filter, the key itself enforces the restriction. This approach provides the operational simplicity of a shared index with a security guarantee stronger than application-layer filtering. It is the correct approach when the search engine supports it and the development team is comfortable with the operational model of issuing and managing per-tenant API keys.
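The derivation step can be sketched as follows. The sketch follows the HMAC-SHA256 scheme used by the Typesense client libraries as best understood here; treat the exact byte layout as an assumption, and in production call the official client's `generate_scoped_search_key` rather than reimplementing it. The key is computed locally, with no call to the server. The server recomputes the HMAC with the parent key and enforces the embedded parameters. The security of the scheme depends on the parent key never leaving the backend.

```python
import base64
import hashlib
import hmac
import json
from typing import Any, Dict


def scoped_search_key(parent_search_key: str, embedded_params: Dict[str, Any]) -> str:
    """Derive a tenant-restricted key from a search-only parent key (sketch).

    Layout assumed: base64(base64(HMAC-SHA256(parent, params)) + parent[:4] + params).
    """
    params_str = json.dumps(embedded_params)
    digest = base64.b64encode(
        hmac.new(parent_search_key.encode(), params_str.encode(), hashlib.sha256).digest()
    ).decode()
    raw = f"{digest}{parent_search_key[:4]}{params_str}"
    return base64.b64encode(raw.encode()).decode()


# Hypothetical tenant filter in Typesense filter_by syntax (":=" is an exact match).
tenant_key = scoped_search_key("abcdSEARCHKEY", {"filter_by": "tenant_id:=t-42"})
```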

GDPR and personal data in the search index

A search index is a system that stores a copy of document content in a format optimized for retrieval. The indexed documents may contain personal data: names, email addresses, phone numbers, IP addresses, or any other information that identifies a natural person. In that case, storing and querying them in the search index is processing of personal data under GDPR, and the index is subject to the same obligations as the primary database, including the right to erasure.

Deletion propagation in Lucene-based engines behaves in a way that matters for GDPR erasure. When a document is deleted from an Elasticsearch or OpenSearch index, the deletion is initially logical: the document is marked as deleted and excluded from query results. The physical bytes of the deleted document remain in its Lucene segment on disk until that segment is merged. Lucene periodically merges smaller segments into larger ones, and deleted documents are physically dropped from the merged segment. When merges happen is governed by the merge policy and by write activity. In a write-heavy index with frequent merges, physical deletion may happen quickly. In a read-heavy index with few merges, it may be deferred for hours, days, or longer. For GDPR erasure, the question is whether the right to erasure requires physical deletion from disk or logical exclusion from query results. Supervisory authorities and legal advisers do not all interpret this the same way. The search ADR must document the interpretation applied and whether a forced merge (the _forcemerge API with only_expunge_deletes=true in Elasticsearch and OpenSearch) is required as part of the erasure procedure.
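If the ADR's interpretation requires physical deletion, the erasure procedure becomes a short sequence of requests. The following is a minimal sketch that only builds the (method, path) pairs: a delete by document ID, a refresh so that searches stop returning the document, and an optional forced merge that rewrites segments containing deleted documents. The helper name is hypothetical. Forced merges are I/O-intensive, so they are usually batched, for example nightly, rather than run once per request. Copies of the segments held in snapshots are outside the reach of this procedure.

```python
from typing import List, Tuple
from urllib.parse import quote


def erasure_requests(index: str, doc_id: str, force_expunge: bool) -> List[Tuple[str, str]]:
    """(method, path) pairs for erasing one document from a Lucene-based index."""
    idx = quote(index, safe="")
    requests = [
        ("DELETE", f"/{idx}/_doc/{quote(doc_id, safe='')}"),  # logical deletion
        ("POST", f"/{idx}/_refresh"),                           # make the delete visible to search
    ]
    if force_expunge:
        # Rewrites segments that contain deleted documents, dropping their bytes.
        requests.append(("POST", f"/{idx}/_forcemerge?only_expunge_deletes=true"))
    return requests
```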

Designing for erasure starts with minimizing the personal data embedded in the search index. Indexed documents can carry IDs that are resolved against the primary database at query time, rather than embedding the email address or user name in the indexed document. Erasure from the search index then means deleting whole indexed documents rather than editing personal data inside them, and the personal data itself is erased entirely in the primary database. This is the same principle as the message queue GDPR design: do not embed PII in secondary storage; embed a reference to the PII that lives in the primary system. The search ADR must specify which personal data fields are indexed, whether they are indexed as searchable terms or stored only as metadata, and whether the erasure procedure requires modifying indexed content or only deleting indexed documents.
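The reference-not-copy principle can be sketched as two functions, with the primary store modelled as a dictionary. The `Document` shape, field names, and helper names are hypothetical. The indexed document carries only `author_id`, and author details are attached when results are displayed, so erasing the user record needs no change to the index.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Document:
    doc_id: str
    title: str
    author_id: str     # reference into the primary database
    author_email: str  # personal data: deliberately not copied into the index


def to_search_doc(doc: Document) -> dict:
    """Index the reference to the author, never the author's personal data."""
    return {"doc_id": doc.doc_id, "title": doc.title, "author_id": doc.author_id}


def hydrate(hits: List[dict], users: Dict[str, dict]) -> List[dict]:
    """Attach author details from the primary store at query time.

    After erasure the user record is gone, so results show no author.
    """
    out = []
    for hit in hits:
        user: Optional[dict] = users.get(hit["author_id"])
        out.append({**hit, "author_name": user["name"] if user else None})
    return out
```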

Crypto-shredding in search indices applies when personal data is embedded in indexed content that cannot be separated from the indexed document. Consider an index that holds the full text of user-generated documents, where the text includes the author's name and contact information and the document must stay searchable after the author's data is erased. Deleting the document is not an option, and the personal data exists in two places inside Elasticsearch. One is the inverted index, which holds the searchable terms. The other is the original document, which Elasticsearch stores by default in the _source field. Disabling _source removes the stored copy. It also removes the ability to retrieve the original document from the index and breaks operations that depend on it, such as updates and the _reindex API, although the index can still be queried. Elasticsearch has no built-in per-user encryption of _source, so crypto-shredding is done by the application. Personal data is encrypted with a per-user key before indexing and stored in a field that is not searchable. The key is deleted as part of the erasure procedure, which leaves the ciphertext in the index but unreadable. Encrypted values cannot be searched. Any personal data that must remain searchable therefore stays in the inverted index in plaintext and can be removed only by reindexing the document with that data redacted. The approach must be specified in both the search ADR and the data retention ADR.
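The key-management side of crypto-shredding is sketched below with a hypothetical in-memory key store. The cipher is a standard-library-only stand-in (an HMAC-SHA256 keystream with a random nonce and no authentication), used only so that the example runs without dependencies. Production code should use a vetted AEAD such as AES-GCM from a maintained library and keys held in a key management service. The base64 output is the kind of value that fits an Elasticsearch `binary` field, which is kept in `_source` but is not searchable.

```python
import base64
import hashlib
import hmac
import os
from typing import Dict, Optional

NONCE_BYTES = 16


class ShreddingVault:
    """Per-user keys; deleting a user's key makes their ciphertext unreadable.

    Stand-in cipher for illustration only: not authenticated, not for production.
    """

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}  # hypothetical key store

    @staticmethod
    def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
        blocks = [
            hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
            for counter in range((length + 31) // 32)
        ]
        return b"".join(blocks)[:length]

    def encrypt(self, user_id: str, plaintext: str) -> str:
        key = self._keys.setdefault(user_id, os.urandom(32))
        nonce = os.urandom(NONCE_BYTES)
        data = plaintext.encode("utf-8")
        cipher = bytes(a ^ b for a, b in zip(data, self._keystream(key, nonce, len(data))))
        return base64.b64encode(nonce + cipher).decode("ascii")

    def decrypt(self, user_id: str, value: str) -> Optional[str]:
        key = self._keys.get(user_id)
        if key is None:
            return None  # key shredded: the stored ciphertext can no longer be read
        blob = base64.b64decode(value)
        nonce, cipher = blob[:NONCE_BYTES], blob[NONCE_BYTES:]
        plain = bytes(a ^ b for a, b in zip(cipher, self._keystream(key, nonce, len(cipher))))
        return plain.decode("utf-8")

    def shred(self, user_id: str) -> None:
        """The erasure step: delete the key, leave the ciphertext where it is."""
        self._keys.pop(user_id, None)
```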

Finding search architecture decisions in AI chat history

Search architecture decisions are some of the most consequential decisions buried in AI chat history, because they appear to be narrow technical questions at the time they are asked and reveal themselves as product-level decisions months later. Three months of AI chat history for a team that shipped a search feature typically contains the full archaeology of the search design, distributed across sessions that do not visibly connect to each other.

The initial implementation session contains the foundational decision that set the ceiling: "how do I add a search bar to my Rails app?", "what's the easiest way to search my database in Node.js?", "should I use Elasticsearch or is there something simpler?", "how do I do a case-insensitive search in Postgres?" The response the engineer received in this session determined whether the initial implementation used ILIKE, a Postgres full-text index, or an external search engine — and the choice was made under the framing of "the easiest way to add search" rather than "what approach will support the product at 10x the current scale." The initial implementation session is the session that set the performance ceiling; it is rarely labeled or indexed as a search architecture decision.

The performance incident session surfaces months later: "my search queries are taking 8 seconds, how do I speed them up?", "Postgres is slow on LIKE queries, what's the fix?", "we have 2 million rows and full-text search is killing the database." These sessions contain the migration decision. One option is to add a GIN index to the existing Postgres table: a pg_trgm trigram index for ILIKE queries, or a tsvector index for full-text queries. Either solves the sequential scan problem but not the relevance problem. The other option is to migrate to an external search engine. The migration decision is made under performance pressure, which means the relevance model, the synchronization approach, the multi-tenancy architecture, and the schema evolution policy may all be left as defaults because the immediate goal is to fix the performance problem, not to design the long-term search architecture.

The relevance complaint session follows the migration: "our search returns irrelevant results after we moved to Elasticsearch", "why does a search for 'contract' not find documents titled 'agreement'?", "how do I make title matches more important than body matches?", "how do I handle typos in search queries?" These sessions contain the relevance configuration decisions — field weights, synonyms, fuzziness, analyzer configuration — made ad-hoc in response to specific complaints rather than as a coherent relevance model. The WhyChose extractor surfaces these sessions and connects them into a coherent picture of the search architecture as it actually exists, which is often significantly different from the architecture as it was originally intended.

The compliance session arrives when a customer exercises a GDPR right: "how do I delete a user's data from Elasticsearch?", "does Elasticsearch delete data immediately or does it linger?", "we got a data subject access request, how do I tell the customer what data we have in our search index about them?" These sessions reveal whether the erasure procedure was designed or improvised, whether the search index was included in the data processing record, and whether the indexed data includes personal data that requires physical deletion rather than logical exclusion. Whether the search index processes personal data at all is a question frequently left unanswered until the first erasure request arrives.

Writing the search architecture ADR

A search ADR must hold together the indexing technology selection, the relevance model, the synchronization approach, the schema evolution policy, the multi-tenancy model, and the GDPR compliance approach. These decisions interact. The multi-tenancy model affects the index count and the alias architecture. The synchronization approach affects the schema evolution procedure: neither a CDC pipeline nor an event-driven consumer is unaffected by reindexing. During a reindex window, each must deliver changes to both the active index and the new index, or replay the changes made since the reindex started. The GDPR approach affects what data is indexed and how the erasure procedure is defined.

The first section documents the indexing technology selection with rejection reasons. Not a benchmark table — a specific statement of why each alternative was wrong for this product at this scale with these requirements. Postgres full-text search was rejected because the anticipated corpus growth (from 50,000 to 2,000,000 records in 12 months based on current growth rate) will exceed the performance range where Postgres GIN indexes produce acceptable p95 latency for the product's SLA of under 200 milliseconds. Algolia was rejected because the projected cost at 2,000,000 records and 500,000 queries per month exceeds the budget constraint of $500 per month for search infrastructure. Elasticsearch with a managed service (Elastic Cloud) was selected because it supports the anticipated corpus scale, provides BM25 relevance with configurable field boosting and synonym support, supports alias-based zero-downtime reindexing, and can be self-hosted if the managed service cost exceeds the budget at scale.

The second section documents the relevance model and its maintenance process. BM25 with field boost configuration: title^3, summary^2, body^1, tags^2. Synonym filter configured at index time with the dictionary at config/search/synonyms.txt. The synonym dictionary maintenance process: engineering team submits additions via a pull request to synonyms.txt, reviewed by the product team for correctness, deployed with the next scheduled reindex. Fuzziness: AUTO configuration (Elasticsearch's automatic edit distance scaling by word length — no fuzziness for 1-2 character terms, edit distance 1 for 3-5 character terms, edit distance 2 for 6+ character terms). Relevance measurement: click-through rate measured by logging search queries and result clicks to the analytics pipeline; mean reciprocal rank computed weekly from the analytics data; a set of 50 standard queries with known correct first results checked in CI before each relevance configuration change deployment.
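The measurement half of this section can be sketched in a few functions. The names are hypothetical, and the search function in the golden-query check stands in for a call to the cluster. Mean reciprocal rank is the average over queries of 1/rank of the first clicked result, counting 0 when nothing was clicked. Clicks are a position-biased proxy for relevance, which is one reason the fixed set of golden queries is checked separately in CI. The query body records the field boosts and fuzziness from the ADR as an Elasticsearch `multi_match` query.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def relevance_query(text: str) -> dict:
    """Query body using the boosts and fuzziness recorded in the ADR."""
    return {"query": {"multi_match": {
        "query": text,
        "fields": ["title^3", "summary^2", "body^1", "tags^2"],
        "fuzziness": "AUTO",
    }}}


def reciprocal_rank(results: Sequence[str], clicked: Optional[str]) -> float:
    """1/rank of the clicked result (1-based); 0.0 if nothing relevant was clicked."""
    if clicked is None or clicked not in results:
        return 0.0
    return 1.0 / (results.index(clicked) + 1)


def mean_reciprocal_rank(sessions: List[Tuple[Sequence[str], Optional[str]]]) -> float:
    """sessions: (result IDs in ranked order, first clicked ID or None)."""
    if not sessions:
        return 0.0
    return sum(reciprocal_rank(r, c) for r, c in sessions) / len(sessions)


def failing_golden_queries(search: Callable[[str], List[str]],
                           golden: Dict[str, str]) -> List[str]:
    """Queries whose known-correct result is not ranked first (a CI gate)."""
    return [q for q, expected in golden.items() if search(q)[:1] != [expected]]
```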

The third section documents the synchronization approach. Event-driven indexing via the messaging infrastructure: on document create/update/delete in the primary database, an event is published to the document_index_events queue. The search indexing consumer reads events from the queue and sends indexing requests to the search cluster. Maximum acceptable lag: 30 seconds under normal load, 5 minutes during indexing backlog. Failure mode: if the search cluster is unavailable, events accumulate in the document_index_events queue and are replayed when the cluster recovers. Consumer idempotency: redelivering the same document version is safe because indexing the same document ID with the same content produces the same indexed state. Idempotency alone does not make out-of-order delivery safe, because an older version delivered after a newer one would overwrite it. Each indexing request therefore carries the primary database's row version with version_type=external. Elasticsearch then rejects any write whose version is not greater than the stored one, and the consumer acknowledges such rejections as already applied.
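Duplicate and out-of-order events can be handled with an external version check. The following in-memory sketch models Elasticsearch's `version_type=external` behavior: a write applies only if its version is strictly greater than the one recorded for that document. The event shape and the monotonically increasing version from the primary database are assumptions. One caveat the sketch does not model: Elasticsearch remembers the version of a deleted document only for `index.gc_deletes` (60 seconds by default), so a stale write arriving later than that could recreate a deleted document.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class IndexEvent:
    doc_id: str
    version: int         # monotonically increasing version from the primary database
    doc: Optional[dict]  # None means delete


class ExternallyVersionedIndex:
    """Applies a write only if its version exceeds the recorded version."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}
        self.versions: Dict[str, int] = {}  # kept after deletes, like a tombstone

    def apply(self, event: IndexEvent) -> bool:
        current = self.versions.get(event.doc_id, 0)
        if event.version <= current:
            return False  # duplicate or stale; Elasticsearch answers 409, consumer acks
        self.versions[event.doc_id] = event.version
        if event.doc is None:
            self.docs.pop(event.doc_id, None)
        else:
            self.docs[event.doc_id] = event.doc
        return True
```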

The fourth section documents the index schema and schema evolution policy. Physical index names are versioned (documents_v1, documents_v2). The search alias (documents) points to the current physical index. All application code references the documents alias. Schema changes that require reindexing use the zero-downtime procedure:

1. Create the new index.
2. Enable dual-write to both physical indices. Dual-write starts before the reindex job so that no write made during the copy is missed.
3. Run the background reindex job (estimated duration: current corpus size × 200 ms per 1,000 documents, that is, 5,000 documents per second).
4. Swap the alias.
5. Disable dual-write.
6. Decommission the old index after the monitoring window.

The reindex procedure is documented in runbooks/search-reindex.md.

The fifth section documents the multi-tenancy model. Shared index with tenant filter and tenant-scoped API keys. Every indexed document includes a tenant_id field. Every query is restricted to the caller's tenant by an Elasticsearch API key whose role descriptor grants read access to the documents alias with a document-level security query on tenant_id. Document-level security is not available in every Elastic subscription tier, so the selected plan must include it. Application code does not construct the tenant filter, which is embedded in the API key. The key is issued per tenant at tenant provisioning time and invalidated on tenant offboarding. The authentication strategy ADR documents the API key issuance and rotation procedure.
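On Elasticsearch, the engine selected in the first section of this example ADR, a key with an embedded tenant filter is an API key created through `POST /_security/api_key`. Its role descriptor's index privilege carries a `query`, which Elasticsearch applies as document-level security to every search made with the key. The following is a minimal sketch that builds that request body. The key name, role name, and metadata layout are hypothetical conventions. Offboarding would invalidate the key through `DELETE /_security/api_key`, by ID or by name.

```python
import json
from typing import Any, Dict


def tenant_api_key_request(tenant_id: str, index_alias: str = "documents") -> Dict[str, Any]:
    """Body for POST /_security/api_key granting read access to one tenant's documents."""
    return {
        "name": f"search-{tenant_id}",  # hypothetical naming convention
        "role_descriptors": {
            "tenant_search": {
                "indices": [
                    {
                        "names": [index_alias],
                        "privileges": ["read"],
                        # Document-level security: applied to every query made with the key.
                        "query": {"term": {"tenant_id": tenant_id}},
                    }
                ]
            }
        },
        "metadata": {"tenant_id": tenant_id},
    }


print(json.dumps(tenant_api_key_request("t-42"), indent=2))
```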

The sixth section documents the GDPR compliance approach. Personal data indexed as dedicated fields: none. Documents are indexed by document ID. The document title is indexed as a searchable field but is not expected to contain personal data: titles are user-defined, and the application's content policy excludes personal data from the title field. Document body text is indexed as full-text but excluded from the stored source ("_source": {"excludes": ["body"]} in the index mapping). The index therefore holds the body's searchable terms but no retrievable copy of the text. Because the body is not in _source, reindexing rebuilds documents from the primary database rather than with the _reindex API. Because user-written body text may still incidentally contain personal data, erasure is performed by deleting whole documents. GDPR erasure for a document: delete-by-query on the document_id field removes the document from search results after the next index refresh (1 second by default). Physical deletion from Lucene segments is guaranteed within 24 hours by a nightly forced merge with only_expunge_deletes=true, because the merge policy alone does not bound when deleted documents are expunged. The search index is included in the data processing record maintained by the legal team. Erasure confirmation includes both the primary database deletion and the Elasticsearch delete-by-query result, logged to the erasure audit trail described in the data retention ADR.

Search architecture decisions are made in phases (initial implementation, performance migration, relevance tuning, and compliance pressure), and each phase adds decisions that interact with the decisions from previous phases. A startup's first year of search architecture is typically documented in several separate crisis sessions rather than one coherent ADR. The WhyChose extractor surfaces all of these phases from AI chat history and connects them into a single document. The next engineer who asks "why is the search architecture designed this way?" can then read the reasoning rather than re-derive it from production incidents.