The search infrastructure decision record: why the search engine you chose determines your query relevance ceiling and your schema evolution cost

Search engine selection looks like a configuration detail until a relevance improvement triggers a 4.5-hour index rebuild that freezes document creation, or a managed SaaS bill grows from $420 to $1,240 a month in three billing cycles — because the relevance model ceiling, the schema evolution constraint, and the per-operation cost at scale were never modeled when the first query returned correct results and the session closed. The engine you chose at prototype time determines what relevance improvements are free, which ones require hours of index rebuild work, and what the infrastructure bill looks like when the product is real.

This post covers search engine vendor and platform selection: the structural consequences of choosing Elasticsearch versus OpenSearch versus Typesense versus Algolia versus Postgres full-text search. It is explicitly different from the search architecture decision record, which addresses the architectural pattern — federated search, dedicated search service, embedded search, search-as-a-sidecar — rather than the vendor. You can choose the right architecture pattern and still choose a vendor whose schema evolution model or cost structure creates incidents. This post covers the vendor decision.

A 15-person SaaS company chose Elasticsearch because the AWS managed Elasticsearch tutorial was the first result for "how to add search to a Node.js app." The engineering lead ran the tutorial, spun up an Amazon OpenSearch Service domain (AWS had renamed the service by then, but the tutorial still called it Elasticsearch and the API was compatible), and within an afternoon had a working setup: an index with a basic mapping, a Lambda that indexed documents from a PostgreSQL table on write, and a search endpoint that ran a multi-match query across project name and description fields. The session covered creating the index mapping, writing the indexing Lambda, and wiring the search endpoint. It was thorough for its stated goal. The session closed when the first search query returned correct results. Nobody asked what happened when the field mapping needed to change.

Fourteen months later, the team needed to improve search relevance for their core use case — searching through project names and description text. Users were complaining that prefix searches were unreliable: searching for "dash" did not reliably surface "Dashboard redesign" or "Data dashboard cleanup." A developer researched Elasticsearch analyzers and discovered that switching from the default standard analyzer to a custom analyzer with edge n-grams would significantly improve prefix matching. The configuration looked straightforward: define a custom analyzer in the index settings, reference it from the field mapping, and the field would tokenize text into progressively longer prefixes at index time, making prefix queries exact rather than fuzzy. The developer opened a ChatGPT session, got a working custom analyzer configuration with the edge_ngram tokenizer, and added a mapping update to the next deploy.
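A minimal sketch of what the analyzer change does, in Python for illustration. The `edge_ngrams` helper is a simplified stand-in for index-time edge n-gram tokenization, not Elasticsearch's implementation. The settings and mapping dicts show the general shape of a custom analyzer definition; the names `prefix_tokenizer` and `prefix_analyzer`, the `name` field, and the gram sizes are hypothetical, since the team's actual configuration is not recorded here.

```python
import re

# Shape of the index settings and mapping (names and sizes are illustrative).
INDEX_SETTINGS = {
    "analysis": {
        "tokenizer": {
            "prefix_tokenizer": {
                "type": "edge_ngram",
                "min_gram": 2,
                "max_gram": 10,
                "token_chars": ["letter", "digit"],
            }
        },
        "analyzer": {
            "prefix_analyzer": {
                "type": "custom",
                "tokenizer": "prefix_tokenizer",
                "filter": ["lowercase"],
            }
        },
    }
}

INDEX_MAPPINGS = {
    "properties": {
        # Index with prefixes, but analyze the user's query with the standard
        # analyzer so the query itself is not split into n-grams.
        "name": {
            "type": "text",
            "analyzer": "prefix_analyzer",
            "search_analyzer": "standard",
        }
    }
}


def edge_ngrams(text, min_gram=2, max_gram=10):
    """Simplified model: emit every prefix of each word between the two lengths."""
    tokens = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:n])
    return tokens


for title in ("Dashboard redesign", "Data dashboard cleanup"):
    print(title, "->", "dash" in edge_ngrams(title))
```

With these tokens in the index, a query for "dash" is an exact term match against both project names. The tokens only exist for documents indexed under the new analyzer, which is why the change cannot be applied to an index that was built with the old one.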

The deploy failed. Elasticsearch does not permit changing the analyzer on an existing field. The analyzer configuration is part of the field's inverted index structure, which is built at indexing time and cannot be modified without rebuilding the index from scratch. The error returned by the Elasticsearch API was explicit: "cannot update static setting [index.analysis.analyzer.ngram.tokenizer]". A secondary error from the mapping update attempt read: "Rejecting mapping update to [projects] as the final mapping would have more than 1 type". The developer had never encountered an immutable schema constraint in any other data store they had worked with. PostgreSQL allows ALTER TABLE ... ALTER COLUMN ... TYPE, which converts the existing rows in place as part of the statement. MongoDB allows changing document structure without any schema migration at all. Elasticsearch's constraint is structural: the inverted index is a fundamentally different data structure from a row-oriented table, and changing how text is tokenized requires rebuilding every document's indexed representation.

The index rebuild required: creating a new index with the updated mapping (including the custom analyzer definition in the index settings), re-indexing all 2.4 million project documents from the PostgreSQL source table into the new index (4.5 hours at the cluster's indexing throughput limit, constrained by the write throttle on the Amazon OpenSearch Service domain's instance type), switching the application's read alias from the old index to the new index using an atomic alias swap (zero-downtime for reads once the new index was populated), validating search quality on the new index against a set of known good queries, and deleting the old index to recover cluster storage. The 4.5-hour rebuild window required the team to freeze document creation for the duration — the cluster's version predated the Reindex API, so the team was not able to use a live-reindex strategy where writes continue to both the old and new index during the migration. No runbook existed for the index rebuild procedure. The developer documented it afterward, in a Notion page that captured the specific API calls but not the reasoning — specifically, not why the standard analyzer had been chosen originally, why edge n-grams were the right improvement for this use case rather than a prefix query with wildcard or the search_as_you_type field type, or what would happen when the next relevance change required another rebuild. The decision that edge n-grams were the correct solution was made in the ChatGPT session and not documented in the codebase. See the pattern in decisions never written down.
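The rebuild steps above can be sketched as an ordered request plan. This is a hedged outline, not the team's runbook: the index names `projects_v1` and `projects_v2` and the alias `projects_search` are hypothetical, and the bulk load from PostgreSQL is represented only as a note because its details depend on the indexing pipeline. The `_aliases` body uses the remove/add action pair that makes the read switch atomic.

```python
def rebuild_plan(old_index, new_index, alias, new_index_body):
    """Return the rebuild as ordered (method, path, body) steps.

    Nothing is sent to a cluster here; the plan is data that a deploy
    script or a runbook could execute or print.
    """
    return [
        # 1. Create the new index with the updated settings and mapping.
        ("PUT", f"/{new_index}", new_index_body),
        # 2. Populate and validate it (pipeline-specific, so only a note).
        ("NOTE", "bulk-index all documents from the source table, "
                 "then run the known-good queries against the new index", None),
        # 3. Move the read alias in one request so readers never see a gap.
        ("POST", "/_aliases", {
            "actions": [
                {"remove": {"index": old_index, "alias": alias}},
                {"add": {"index": new_index, "alias": alias}},
            ]
        }),
        # 4. Reclaim storage once the new index is serving traffic.
        ("DELETE", f"/{old_index}", None),
    ]


plan = rebuild_plan("projects_v1", "projects_v2", "projects_search",
                    {"settings": {}, "mappings": {}})
for method, target, _body in plan:
    print(method, target)
```

Keeping the plan as data next to the mapping files gives the reasoning a place to live: each step can carry a comment explaining why it exists, which is the part the Notion page left out.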

The second incident was independent and more consequential in aggregate cost. An 8-person startup chose Algolia for their product search — auto-complete on product names, typo tolerance on search terms, faceted filtering on category, brand, and price range. The initial setup used Algolia's free tier: 10,000 records, 10,000 operations per month. The ChatGPT session covered InstantSearch.js integration with a React frontend, index configuration (searchable attributes, faceting attributes, custom ranking signals), and relevance tuning. The session was primarily a frontend implementation session — the developer wanted a polished search UI fast, and Algolia's InstantSearch components delivered it. The session covered enough of the Algolia dashboard to get the index configured correctly. Nobody looked at the pricing page beyond confirming the free tier covered the prototype.

Over the following year, the product grew. By month 16, the Algolia index had 450,000 records and the product was serving 4.8 million search operations per month — auto-complete requests, full-search requests, and facet-count requests all counted as operations. The Algolia bill for that month was $1,240. The previous month had been $980. Three months earlier it had been $420. The team had not modeled Algolia's cost at their record count and operation volume at initial selection time. The Algolia pricing calculator on Algolia's website was not consulted at launch, or if consulted, was consulted with the free-tier numbers that did not project the product's growth trajectory.

Algolia's Grow plan charges per record stored (approximately $0.50 per 1,000 records per month above the included base) and per search operation ($1.00 per 1,000 operations above the included base). At 450,000 records and 4.8 million operations, the math is straightforward — roughly $220/month in record storage charges and approximately $4,790 in excess operations, though Algolia's actual billing at this scale is through tiered bundles rather than pure per-unit charges, landing the actual bill in the $1,000–$1,500 range rather than the theoretical maximum. The growth curve was linear and predictable; the team simply had not done the projection at selection time.
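The per-unit arithmetic can be reproduced in a few lines. The rates come from the paragraph above; the included base of 10,000 records and 10,000 operations is an assumption inferred from the $220 and $4,790 figures, not a quoted plan term. The result is the theoretical per-unit ceiling, not the tiered-bundle bill the team actually received.

```python
def per_unit_overage(records, operations,
                     included_records=10_000, included_ops=10_000,
                     record_rate=0.50, op_rate=1.00):
    """Theoretical monthly overage in dollars; rates are per 1,000 units."""
    record_cost = max(records - included_records, 0) / 1000 * record_rate
    op_cost = max(operations - included_ops, 0) / 1000 * op_rate
    return record_cost, op_cost


record_cost, op_cost = per_unit_overage(450_000, 4_800_000)
print(record_cost, op_cost)  # 220.0 4790.0
```

Running the same function with the record count and operation volume projected for months 12 and 24 is the projection that was skipped at selection time.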

The team investigated migration. Typesense offered a self-hosted alternative with an API that was designed to be broadly compatible with Algolia's query model. The migration cost: rewriting the InstantSearch adapter from Algolia's proprietary response format to Typesense's compatible format (one week of frontend engineering work), re-importing all product records and re-tuning the relevance configuration in Typesense (Algolia's custom ranking signals — specifically the formula weighting sales velocity, inventory status, and margin — did not have a one-to-one mapping in Typesense's weighting model), and running Algolia and Typesense in parallel for three weeks to validate that search quality for the highest-traffic queries was equivalent before cutting over. They completed the migration. Their monthly search infrastructure cost went from $1,240 to $60 per month on Typesense Cloud. The migration was successful and the search quality was preserved. It was also 5–6 weeks of engineering time that would have been avoidable if the Algolia cost model had been projected at initial selection time with realistic growth assumptions.

The three structural properties that search engine selection determines

When teams evaluate search engines, the conversation focuses on time to first working query, quality of client library documentation, and whether the engine handles typos. These are real evaluation criteria. The structural properties that determine whether the selection ages well — whether relevance improvements are cheap or expensive, whether the cost model scales gracefully, whether the cluster requires a dedicated operator — are different, and they are set at selection time, before the team has the production use that makes them visible.

Relevance model and customization ceiling

The relevance model is the scoring function that determines which results rank above others for a given query. The five engines covered here do not share one scoring function: Elasticsearch and OpenSearch default to BM25 (Best Match 25), Typesense and Algolia rank with their own text-match and tie-breaking criteria, and Postgres scores with ts_rank. The customization available above the default varies significantly and determines what kinds of relevance improvements are possible without an index rebuild or a platform migration.

Elasticsearch and OpenSearch expose the most complete relevance customization: per-field boost weights in the query (boost the name field 3× above the description field), function score queries (multiply BM25 by a recency decay function or by a custom numeric field like view count), script score queries (arbitrary Painless script logic at query time), and custom field-level analyzers for controlling tokenization. Typo tolerance is not built-in as a first-class feature — it requires configuring a fuzzy match query (fuzziness: AUTO) or a custom analyzer with edge n-grams or trigrams. Getting good prefix matching requires the analyzer configuration that triggered the rebuild incident above. Getting synonym expansion requires loading a synonyms file into the analyzer configuration (and rebuilding the index if the synonyms file format is changed). Elasticsearch's relevance ceiling is effectively unlimited — any scoring logic expressible in Painless script is available — but reaching that ceiling requires analyst-level knowledge of Lucene query planning and analyzer configuration.
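As an illustration of the query-time options listed above, here is a sketch of a request body that combines a per-field boost, fuzzy matching, and a recency decay. It only builds the JSON; the field names `name`, `description`, and `updated_at` and the 30-day scale are hypothetical choices for this example.

```python
import json


def build_search_body(user_query):
    """Build a function_score query: boosted multi_match times a recency decay."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "multi_match": {
                        "query": user_query,
                        # name counts 3x as much as description
                        "fields": ["name^3", "description"],
                        "fuzziness": "AUTO",
                    }
                },
                "functions": [
                    # Score halves for documents last updated 30 days ago.
                    {"gauss": {"updated_at": {"origin": "now",
                                              "scale": "30d",
                                              "decay": 0.5}}}
                ],
                # Multiply the text score by the decay value.
                "boost_mode": "multiply",
            }
        }
    }


print(json.dumps(build_search_body("dashbord"), indent=2))
```

Everything in this body is a query-time decision, so it can change on any deploy without touching the index. That is the dividing line this section cares about: boosts and scoring functions are free to change, analyzers are not.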

Typesense and Algolia both provide typo tolerance as a first-class, default-on feature. Typesense's typo tolerance is configurable per search request: num_typos controls the number of character edits tolerated per word, and min_len_1typo and min_len_2typo control the minimum word length before tolerating one or two typos. No custom analyzer configuration is required. Algolia's typo tolerance is similarly built-in with configurable tolerance per attribute. Faceted search is native in the four dedicated engines (Postgres approximates it with SQL GROUP BY), but Elasticsearch's aggregations model is more powerful for complex faceting: nested aggregations (facets within facets), pipeline aggregations (percentile bucketing of a numeric field), geo-distance aggregations, and date histogram facets for time-series filtering are all supported in Elasticsearch's aggregations API. Typesense and Algolia support flat faceting (count of values per attribute, filter by selected value) well; deeply nested or computed facet logic requires Elasticsearch's aggregations.
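A small sketch of how the three Typesense parameters interact. `allowed_typos` is a simplified model of the length thresholds written for this post, not Typesense's matching code, and the threshold values and `query_by` fields are illustrative. The `search_params` dict shows the parameters as they would be passed with a search request.

```python
def allowed_typos(word, num_typos=2, min_len_1typo=4, min_len_2typo=7):
    """Simplified model: how many edits a query word of this length may contain."""
    if len(word) >= min_len_2typo:
        return min(num_typos, 2)
    if len(word) >= min_len_1typo:
        return min(num_typos, 1)
    return 0


# Parameters sent with the search request (values are illustrative).
search_params = {
    "q": "dashbord",
    "query_by": "name,description",
    "num_typos": 2,
    "min_len_1typo": 4,
    "min_len_2typo": 7,
}

for word in ("cat", "dash", "dashbord"):
    print(word, allowed_typos(word))
```

Short words get no tolerance because a single edit on a three-letter word matches too many unrelated terms. Tuning these thresholds is a parameter change, not a schema change.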

Vector and semantic search is the dimension with the most rapid capability change. Elasticsearch supports dense_vector fields and kNN queries as a first-class feature as of version 8.x, enabling hybrid BM25 + kNN search (keyword relevance combined with embedding similarity). OpenSearch added k-NN support as a plugin from version 1.x, with the k-NN plugin included in Amazon OpenSearch Service managed deployments. Typesense supports vector search from version 0.25 with hybrid search available out of the box. Algolia's NeuralSearch combines their proprietary embedding model with keyword matching and is available on Enterprise plans, not on Grow or standard plans. Postgres with the pgvector extension supports vector similarity search; combined with tsvector full-text search and SQL filtering, Postgres can implement hybrid search without a dedicated search engine at the cost of more complex query construction. The relevance model decision links to the performance optimization decision record for search latency: vector search kNN queries are more compute-intensive than BM25 full-text queries, and the latency at P95 is a function of the index size, the embedding dimensionality, and the approximate-nearest-neighbor algorithm configuration (HNSW parameters for Elasticsearch, Typesense, and pgvector).

Schema evolution cost

Schema evolution cost is the most underestimated structural property at search engine selection time, because prototype indexes have simple mappings and the team has not yet encountered the relevance improvement that requires a mapping change.

Elasticsearch and OpenSearch have an immutable field analyzer constraint: once a field is indexed with a specific analyzer, changing the analyzer requires a full index rebuild. Adding a new field to an existing mapping is safe — Elasticsearch accepts mapping additions without a rebuild. Changing a field's type (from text to keyword, from float to double) also requires a full rebuild. Changing a field from analyzed text to a search_as_you_type field type requires a rebuild. The operational cost of a rebuild is linear in document count and bounded by the cluster's indexing throughput: at 2.4 million documents and a modest indexing rate of 500 documents/second, the rebuild takes 80 minutes of pure indexing time. The practical duration including alias swap and validation is longer. The rebuild procedure involves creating a parallel index, populating it, aliasing traffic to it, and deleting the old index — a procedure that every Elasticsearch team should have documented before the first relevance change requires it, not during the incident when it is first needed. This is a class of schema migration — the database migration strategy decision record covers schema migration patterns broadly; Elasticsearch index rebuilds are the search-engine analog of destructive column type changes in a relational database.
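The rebuild-time estimate is worth keeping as a calculation rather than a remembered number. This sketch uses only figures from this post: 2.4 million documents, the 500 documents/second example rate, and the 4.5 hours the incident actually took.

```python
def rebuild_minutes(doc_count, docs_per_second):
    """Pure indexing time in minutes, ignoring alias swap and validation."""
    return doc_count / docs_per_second / 60


estimate = rebuild_minutes(2_400_000, 500)
implied_rate = 2_400_000 / (4.5 * 3600)  # docs/second the incident achieved

print(f"estimate at 500 docs/s: {estimate:.0f} minutes")
print(f"rate implied by the 4.5-hour rebuild: {implied_rate:.0f} docs/s")
```

The gap between the two numbers is the point: the throttled domain in the incident sustained roughly 148 documents per second, well under the 500 used in the estimate, so the runbook should record a measured rate for the cluster rather than an assumed one.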

Typesense takes a more flexible approach to schema evolution. From Typesense version 0.21+, the schema can be updated without re-indexing documents for most changes: adding new fields is supported and existing documents are treated as having the field absent. Modifying the configuration of an existing field (changing its type, toggling facet: true, adjusting sort configuration) may require a re-index in some cases, but the Typesense documentation explicitly documents which changes are backward-compatible and which require re-ingestion. The operational model for schema changes is significantly less risky in Typesense than in Elasticsearch, which makes iterative relevance improvement — adding a popularity_score numeric field, enabling faceting on a field that was not previously facetable — less operationally expensive.

Algolia is effectively schemaless from a storage perspective: any field in the indexed JSON document is stored and can be made searchable, facetable, or part of custom ranking without a schema declaration. Index settings — searchable attributes, faceting attributes, custom ranking formula — can be updated live via the Algolia dashboard or the Settings API without any document re-indexing. Changing the custom ranking formula is a settings change, not a schema migration. This is a structural advantage for iterative relevance improvement: the team can modify the ranking model without touching the documents or the indexing pipeline. The trade-off is that Algolia's schemaless approach means there is no enforcement of field type consistency — a field that should be a number can silently receive a string value, which breaks numeric sorting and range filtering in ways that may not be immediately visible.
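Because the engine does not enforce field types, one option is to enforce them in the indexing pipeline before records are sent. The sketch below is one possible guard, with a hypothetical `EXPECTED_TYPES` table for a product record; it is not part of any Algolia client library.

```python
from numbers import Number

# Hypothetical expectations for a product record.
EXPECTED_TYPES = {"name": str, "price": Number, "in_stock": bool}


def type_errors(record, expected=EXPECTED_TYPES):
    """Return a message for each field whose value has the wrong type."""
    errors = []
    for field, expected_type in expected.items():
        if field not in record:
            continue  # absence is allowed; only wrong types are reported
        value = record[field]
        # bool is a subclass of int in Python, so a numeric check would
        # otherwise accept True/False as a price.
        wrong = (isinstance(value, bool) if expected_type is Number
                 else False) or not isinstance(value, expected_type)
        if wrong:
            errors.append(f"{field}: expected {expected_type.__name__}, "
                          f"got {type(value).__name__}")
    return errors


print(type_errors({"name": "Desk", "price": "129.00", "in_stock": True}))
print(type_errors({"name": "Desk", "price": 129.0, "in_stock": True}))
```

Rejecting or logging the first record with a string price is cheaper than discovering, weeks later, that price sorting has been wrong for part of the catalog.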

Postgres full-text search uses tsvector columns populated by a generated column expression or a trigger. Schema evolution uses standard SQL migrations: ALTER TABLE to add a column, a change to the expression that generates the tsvector, CREATE INDEX to add a GIN index on the new column. A GIN index can be rebuilt with REINDEX INDEX CONCURRENTLY (PostgreSQL 12 and later) without blocking reads or writes. Changing the expression of a generated tsvector column is also a migration: before PostgreSQL 17 that means dropping and re-adding the column, while PostgreSQL 17 adds ALTER COLUMN ... SET EXPRESSION; in both cases the stored values are recomputed for every row. Schema evolution is the most flexible of the five options: SQL migrations are well-understood, the tooling is mature (Flyway, Liquibase, Rails migrations, any migration framework the team already uses), and there is no separate schema format to learn. The constraint is that Postgres full-text search provides no typo tolerance without pg_trgm (trigram similarity), and the relevance model for complex multi-field search with faceted navigation is more complex to implement in SQL than in a dedicated search engine's query DSL.
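A sketch of what such a migration might contain, written as Python that assembles the SQL so it can sit in any migration framework. The table `projects`, the column `search_vector`, the index name, and the A/B weights are hypothetical. Identifiers are interpolated directly, which is acceptable only because migration code is trusted input.

```python
def tsvector_column_ddl(table, column, weighted_fields, config="english"):
    """Build DDL for a stored generated tsvector column.

    weighted_fields is a list of (source_column, weight) pairs, where weight
    is one of 'A'..'D' as accepted by setweight().
    """
    parts = [
        f"setweight(to_tsvector('{config}', coalesce({field}, '')), '{weight}')"
        for field, weight in weighted_fields
    ]
    expression = " || ".join(parts)
    return (f"ALTER TABLE {table} ADD COLUMN {column} tsvector "
            f"GENERATED ALWAYS AS ({expression}) STORED")


MIGRATION = [
    tsvector_column_ddl("projects", "search_vector",
                        [("name", "A"), ("description", "B")]),
    # CONCURRENTLY avoids blocking writes but cannot run inside a
    # transaction block, so the migration tool must allow that.
    "CREATE INDEX CONCURRENTLY projects_search_idx "
    "ON projects USING GIN (search_vector)",
]

for statement in MIGRATION:
    print(statement + ";")
```

The weights give matches in the name more influence on ts_rank than matches in the description, which is the Postgres counterpart of a per-field boost.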

Operational model and cost at scale

The operational model covers who manages the search infrastructure, what expertise is required, and what the cost structure looks like as the product grows — a dimension that the team at prototype time cannot evaluate accurately because the operational overhead only becomes real when things break in production.

Self-hosted Elasticsearch and OpenSearch require the most operational investment. JVM heap configuration is the first operational hurdle: Elasticsearch recommends allocating no more than 50% of available RAM to the JVM heap, with a maximum of 32GB (beyond which compressed ordinary object pointers are disabled and memory efficiency drops). Getting the heap allocation wrong — too low causes excessive garbage collection and query latency spikes; too high causes long GC pauses — requires monitoring and tuning that is specific to JVM applications. Cluster operations include: snapshot and restore for backup (Elasticsearch snapshot API to S3 or GCS; the snapshot schedule and retention policy are operational decisions that need to be documented in the data retention decision record), rolling cluster upgrades (minor version upgrades can often be done rolling; major version upgrades may require index migration for old-format indices), and shard rebalancing (adding nodes to a cluster requires shard migration that consumes cluster resources). Teams without existing Elasticsearch operational experience consistently underestimate this burden at selection time.

Amazon OpenSearch Service and Elastic Cloud (the managed Elasticsearch service) eliminate the cluster management overhead in exchange for per-unit pricing. Amazon OpenSearch Service charges per-instance-hour for the data nodes plus EBS storage: a production cluster with two r6g.large.search instances (2 vCPU, 16GB RAM each) costs approximately $0.17 per instance-hour × 2 instances × 720 hours/month = $245/month before storage. Elastic Cloud charges by deployment size (memory and vCPU allocated to data and master nodes). Both managed services handle upgrades, backup, and scaling operations, but the per-unit cost is higher than self-hosted EC2 for equivalent compute. The managed cost needs to be modeled at the document count and query volume the team projects at 12 and 24 months, not at the prototype scale where a development cluster is sufficient.

Typesense Cloud charges at $0.000014 per document ingested and $0.000001 per search operation, making the cost model simple to project: at 500,000 documents ingested (a one-time cost for the initial index build) and 2 million search operations per month, the Typesense Cloud cost is $7 for the initial ingest plus $2/month in ongoing operations. The ongoing monthly cost at realistic operation volumes is substantially lower than Algolia at equivalent scale. Self-hosted Typesense on a VPS or Kubernetes deployment costs the underlying infrastructure — a 4GB RAM instance is adequate for collections under 1 million moderately-sized documents, at approximately $20–40/month on most cloud providers. Typesense's lower memory footprint versus Elasticsearch (no JVM overhead) means self-hosted Typesense requires less RAM per document than self-hosted Elasticsearch for equivalent document counts.

Algolia's SaaS model has no infrastructure to manage — no clusters to operate, no upgrades to plan, no snapshots to configure. The operational cost is effectively zero. The financial cost is the per-record and per-operation pricing described in the incident above. The build-vs-buy framing is explicit here: Algolia is buying the absence of operational overhead; self-hosted alternatives are building the operational capability. The build-versus-buy decision record should document this trade-off explicitly, including the engineering-hours-per-month cost of search cluster operations — if the team has zero spare operations capacity, Algolia's premium is not an Algolia cost, it is a staffing cost that Algolia is eliminating. The cost comparison should include that denominator.

Postgres full-text search has zero additional infrastructure cost for teams already running Postgres — there is no separate search cluster to provision, operate, or pay for. The operational model is the Postgres operational model the team already knows. Backup, restore, failover, and scaling are all standard Postgres operations. The relevant operational addition is maintaining the GIN index and the tsvector generated column as the table grows — GIN indexes on large tables can be slow to update during bulk imports, and REINDEX CONCURRENTLY during maintenance windows is the standard remediation. The constraint is that Postgres full-text search is appropriate for products with simple search requirements on existing Postgres data: single-language full-text queries, basic ranking by ts_rank, and limited faceting via SQL GROUP BY. It is not a good fit for typo tolerance (without pg_trgm for fuzzy matching and careful query construction), complex multi-facet navigation at Elasticsearch's aggregation depth, or vector search at scale (pgvector works for moderate document counts but degrades at very large collections without careful HNSW index tuning).

The engines: structural tradeoffs

Elasticsearch

Elasticsearch is the most powerful and most operationally demanding of the five options. Built on Apache Lucene, it runs on the JVM and exposes Lucene's full capability through a JSON query DSL. The aggregations API is the most sophisticated of any search engine available: nested aggregations, pipeline aggregations, geo-distance bucketing, significant terms analysis, and composite aggregations for paginated facet enumeration are all supported. This power makes Elasticsearch appropriate for use cases that go beyond simple text search: log analytics, time-series data exploration, and complex faceted product catalogs with computed facets are all valid Elasticsearch use cases.

The inverted index immutability constraint is the structural limitation to document in any Elasticsearch ADR. Field analyzer changes require a full index rebuild. Field type changes require a full index rebuild. Teams that choose Elasticsearch for application search (rather than log analytics) frequently discover this constraint when they attempt their first relevance improvement and find that the improvement requires a mapping change. The rebuild procedure should be documented before the first relevance improvement is attempted, not discovered during it.

From version 7.11, Elasticsearch is no longer Apache 2.0: the source is available under the Elastic License 2.0 (EL2) or SSPL, and Elastic added AGPLv3 as a further option in 2024. EL2 permits use for internal purposes and for SaaS products where Elasticsearch is not the primary commercial value, but restricts using Elasticsearch to build a competing managed search service. Teams building products where the search functionality is a core commercial offering should review EL2 carefully. For most application search use cases (search within a SaaS product), EL2 is not a constraint in practice.

The Elasticsearch index lifecycle management (ILM) feature enables hot/warm/cold data tiers: recent data on fast SSDs (hot), older data on larger slower storage (warm), archived data on object storage (cold). This is relevant for teams using Elasticsearch for both log storage and application search in the same cluster, a dual-purpose design that creates the risk of log data volume affecting application search query latency. The observability platform decision record covers this risk: when Elasticsearch is chosen for the observability stack (log aggregation, metrics, traces), and the same cluster is later used for application search, the resource contention and index lifecycle collision between log data (write-heavy, time-series, high volume) and application search data (lower write volume, relevance queries, long-lived documents) creates operational risk that a dedicated cluster would avoid.

The infrastructure-as-code strategy for Elasticsearch should cover both cluster configuration (instance types, shard count, replica count in Terraform or the managed service console) and index mappings (stored in version-controlled JSON files, applied by the deployment pipeline, not modified manually). Index mappings managed manually in the Elasticsearch console drift from the version-controlled definition; a cluster rebuild or disaster recovery event that applies the version-controlled mapping to a fresh cluster will produce a different mapping than what was in production if console edits were made and not committed. The mapping JSON should be the source of truth, applied in the deployment pipeline, version-controlled alongside the application code.
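Drift between the committed mapping and the live one can be detected mechanically. The sketch below compares two mapping dicts; how the live mapping is fetched is left out, and the example field names are hypothetical. It is one possible check for a deployment pipeline, not a feature of Elasticsearch itself.

```python
def mapping_drift(expected, live, path=""):
    """List the differences between the version-controlled and live mappings."""
    diffs = []
    for key in sorted(set(expected) | set(live)):
        here = f"{path}.{key}" if path else key
        if key not in live:
            diffs.append(f"missing in cluster: {here}")
        elif key not in expected:
            diffs.append(f"not in repo: {here}")
        elif isinstance(expected[key], dict) and isinstance(live[key], dict):
            diffs.extend(mapping_drift(expected[key], live[key], here))
        elif expected[key] != live[key]:
            diffs.append(f"changed: {here} "
                         f"({expected[key]!r} in repo, {live[key]!r} in cluster)")
    return diffs


repo = {"properties": {"name": {"type": "text", "analyzer": "prefix_analyzer"}}}
cluster = {"properties": {
    "name": {"type": "text", "analyzer": "standard"},
    "notes": {"type": "text"},  # added by hand in the console
}}

for line in mapping_drift(repo, cluster):
    print(line)
```

Failing the deploy when this list is non-empty turns a console edit into a visible event instead of a surprise during disaster recovery.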

OpenSearch

OpenSearch is the fork of Elasticsearch 7.10 that AWS started in 2021, when Elastic moved Elasticsearch from Apache 2.0 to the EL2/SSPL dual license. OpenSearch is licensed under Apache 2.0, making it a fully open-source option without the EL2 restrictions. The OpenSearch API is compatible with most Elasticsearch client libraries: the same Node.js, Python, Java, and Ruby clients that work with Elasticsearch work with OpenSearch with minor configuration changes. Amazon OpenSearch Service (the managed offering) has deep AWS integration: VPC support, IAM-based access control, CloudWatch metrics, S3 snapshot support, and integration with other AWS services via Lambda event pipelines.

OpenSearch and Elasticsearch diverge at the feature level for newer capabilities. Elasticsearch's vector search implementation (approximate kNN with HNSW graphs in version 8.x) is more mature and higher-performance than OpenSearch's k-NN plugin in equivalent versions. OpenSearch's ML-Commons framework for embedding model inference at the cluster level is a different architectural approach than Elasticsearch's inference API. Teams that need vector search or ML-augmented search should evaluate both engines' current capabilities at evaluation time rather than assuming compatibility — the API surface for these features is diverging faster than the core search functionality.

The schema evolution constraints are identical to Elasticsearch: field analyzer changes require a full index rebuild. OpenSearch has the same inverted index immutability property because it shares the Lucene foundation. Teams that choose OpenSearch for the Apache 2.0 license and AWS integration should document the same schema evolution policy as Elasticsearch teams: the full rebuild procedure, the acceptable downtime window, and the approval process for mapping changes.

Typesense

Typesense is written in C++ and runs without a JVM, which has two practical consequences: lower memory footprint per document than Elasticsearch (no heap overhead, no GC pauses) and no JVM tuning requirement for operations teams. A 4GB RAM Typesense instance is adequate for collections under 1 million moderately-sized documents. Typesense's design goal is application search (products, articles, users, documents) rather than log analytics or large-scale data exploration. This focus shows in the feature set: built-in typo tolerance (configurable threshold, no custom analyzer required), prefix search optimized at the engine level, geosearch for location-aware results, vector search (hybrid keyword + vector search from v0.25), and a real-time indexing model where documents appear in search results within milliseconds of being indexed.

Typesense is licensed under GPL-3.0 for the open-source distribution. GPL-3.0 requires that software incorporating Typesense be licensed under GPL-3.0 as well — a constraint that is relevant for teams embedding Typesense as a library, but not for teams running Typesense as a separate service accessed via its HTTP API (which is the standard deployment model and does not trigger the GPL copyleft requirement). Typesense Cloud is the managed offering with transparent per-operation pricing. Self-hosted Typesense can be deployed on Docker, Kubernetes, or bare instances and is the appropriate model for teams with existing container infrastructure and the capacity to manage the service. Typesense's raft-based clustering for high availability is documented and supported from version 0.20+, enabling multi-node deployments with automatic failover.

Typesense's relevance model is less customizable than Elasticsearch's at the high end: Typesense does not support arbitrary script scoring (Elasticsearch's Painless scripts), pipeline aggregations, or deeply nested facet structures. For straightforward application search — search by relevance, filter by attributes, sort by custom field — Typesense's defaults produce good results without significant configuration. For complex relevance requirements (merchandising rules that boost specific products based on business logic, multi-level nested aggregations, geospatial faceting within time ranges), Typesense reaches its ceiling before Elasticsearch does. The multi-tenancy model for Typesense can use per-tenant collections (full data isolation, more collections to manage, scoped API keys per tenant) or shared collections with a mandatory filter field, which links to the multi-tenancy decision record for the full isolation-versus-efficiency trade-off analysis.
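For the shared-collection model, the mandatory filter is safest when it is applied in one place rather than at each call site. The sketch below builds Typesense search parameters with a tenant clause always present; the `tenant_id` field, the `query_by` fields, and the helper itself are hypothetical, and the identifier check is a deliberately strict assumption to keep tenant values from altering the filter expression.

```python
import re


def tenant_search_params(tenant_id, q, extra_filter=None):
    """Build search parameters that always carry the tenant filter."""
    if not re.fullmatch(r"[A-Za-z0-9_-]+", tenant_id or ""):
        raise ValueError("tenant_id is required and must be a simple identifier")
    clauses = [f"tenant_id:={tenant_id}"]
    if extra_filter:
        # Callers may narrow results further; they cannot remove the tenant clause.
        clauses.append(extra_filter)
    return {
        "q": q,
        "query_by": "name,description",
        "filter_by": " && ".join(clauses),
    }


print(tenant_search_params("acme", "desk", "category:=chairs"))
```

A scoped API key that embeds the same filter moves this guarantee from application code to the key itself, which is the stronger form of the isolation described above.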

Algolia

Algolia is a SaaS-only search platform with no self-hosting option. Its relevance model orders results by a tie-breaking sequence of textual criteria (typo count, matched words, proximity, attribute, exactness), followed by a configurable custom ranking formula that can incorporate numeric fields (view count, sales count, recency score) as tiebreakers or primary ranking signals. Typo tolerance is built-in and highly configurable: typos are tolerated based on word length (by default, no typos for words shorter than 4 characters, one typo from 4 characters, two typos from 8 characters). InstantSearch.js, React InstantSearch, Vue InstantSearch, and native iOS and Android InstantSearch libraries provide pre-built UI components that integrate directly with Algolia's query API, a meaningful developer experience advantage for teams that want a polished search UI without building the UI layer from scratch.

Algolia's merchandising rules engine allows the team to configure business-logic overrides on top of the relevance model: pin a specific product to the top position for a given query, bury out-of-stock products, redirect a specific search term to a landing page. These merchandising rules are configured in the Algolia dashboard and apply without code changes or redeployment. For e-commerce teams with business stakeholders who need to manage search results directly, Algolia's merchandising interface is a genuine operational advantage over self-hosted alternatives where equivalent logic requires developer time to implement and deploy.

The pricing model is the structural constraint to document at selection time. Per-record and per-operation billing scales linearly with product growth. The cost projection at 12 and 24 months should be calculated before committing to Algolia, using realistic growth assumptions for record count and monthly operation volume. The operations count includes every search request, every auto-complete request, and every facet-count request — not just full-search queries. Products with aggressive auto-complete (triggering a search request on every keystroke) accumulate operations faster than products with debounced search. Algolia NeuralSearch (vector/semantic search) is available on Enterprise plans only, which changes the cost model significantly for teams that need vector search capabilities.
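To make the keystroke effect concrete, here is a rough arithmetic sketch in Python. The function names, the traffic figures, and the way debouncing is modeled (one request per N keystrokes rather than a timer) are illustrative assumptions, not Algolia behavior or real measurements.

```python
import math


def requests_per_typed_query(query_length: int, keystrokes_per_request: int = 1) -> int:
    """Requests sent while a user types one query of `query_length` characters.

    keystrokes_per_request=1 models a request on every keystroke. A larger
    value is a crude stand-in for debouncing (real debouncing is timer-based).
    """
    if query_length <= 0:
        return 0
    return math.ceil(query_length / keystrokes_per_request)


def monthly_operations(typed_queries_per_month: int, query_length: int,
                       keystrokes_per_request: int = 1) -> int:
    """Billable operations per month for a given typing and debounce pattern."""
    return typed_queries_per_month * requests_per_typed_query(
        query_length, keystrokes_per_request)


# Hypothetical traffic: 200,000 typed queries a month, 12 characters each.
every_keystroke = monthly_operations(200_000, 12)
debounced = monthly_operations(200_000, 12, keystrokes_per_request=4)
print(every_keystroke, debounced)
```

Under these assumed inputs the same user behavior produces four times as many billable operations without debouncing, which is why the auto-complete implementation belongs in the cost projection.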

Postgres full-text search

Postgres full-text search uses the to_tsvector function to convert text fields into tsvector values — sorted, de-duplicated lists of lexemes (normalized word stems) with position information. Queries use to_tsquery or plainto_tsquery to generate query vectors, and the @@ operator matches documents against queries. Relevance ranking uses ts_rank or ts_rank_cd, which score documents by term frequency and position. A GIN (Generalized Inverted Index) on the tsvector column makes full-text queries fast — GIN indexes are optimized for set-membership queries like full-text lookups.

The appropriate use case for Postgres full-text search is products with simple search requirements on existing Postgres data, where adding a separate search infrastructure would add operational complexity without proportional relevance improvement. If the team already runs Postgres, the development cost of implementing full-text search in Postgres is lower than standing up a dedicated search engine: no new infrastructure, no separate index pipeline to maintain, no additional operational expertise required. Schema evolution is standard SQL: ALTER TABLE to add columns, update the generated column expression for the tsvector, REINDEX CONCURRENTLY to rebuild the GIN index without downtime. Of the five options, this gives the most schema flexibility with the most familiar tooling.

Postgres full-text search's limitations are real and should be documented explicitly in the ADR. Typo tolerance requires pg_trgm (trigram similarity): enable the pg_trgm extension, build a GIN or GiST index on the searched column using the gin_trgm_ops or gist_trgm_ops operator class, and use similarity, word_similarity, or strict_word_similarity (or their operators %, <%, and <<%) for fuzzy matching. This works but requires more complex query construction than Typesense or Algolia's built-in typo tolerance. Complex multi-facet navigation (nested facets, computed facet counts across millions of documents) is possible in SQL but expensive: each facet count requires a GROUP BY query against potentially millions of rows. At large scale, Postgres full-text search faceting can create performance problems that require dedicated search engine aggregations to solve. Vector search via pgvector works for moderate scale (tens of millions of vectors with HNSW indexing), but approximate nearest neighbor search at billion-scale requires purpose-built vector databases. If the product's relevance requirements outgrow Postgres full-text, the migration path leads to one of the dedicated engines, and the migration cost (separate indexing pipeline, index format changes, query rewrite) should be estimated before choosing Postgres full-text as the long-term solution.
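For readers unfamiliar with trigram matching, the following Python sketch approximates how pg_trgm scores two strings: each word is lowercased, padded with two leading spaces and one trailing space, split into three-character windows, and the score is the shared trigrams divided by the total distinct trigrams. It is an illustration only. The function names are made up here, and the real extension runs inside Postgres and handles character classes and encodings that this sketch ignores.

```python
import re


def trigrams(text: str) -> set:
    """Approximate pg_trgm's trigram set: lowercase, split into alphanumeric
    words, pad each word with two leading spaces and one trailing space."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by distinct trigrams across both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# A one-letter typo still shares most trigrams with the intended word.
score = similarity("typesense", "typesence")
print(round(score, 3))
```

Because the score degrades gradually with each wrong character, a threshold on it gives fuzzy matching, at the cost of choosing and tuning that threshold in every query.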

The AI chat sessions that produced undocumented decisions

Search engine decisions are made across a cluster of sessions that feel like implementation work rather than architecture decisions. The initial session gets the engine running. Subsequent sessions tune relevance, change configuration, or investigate cost. Each session solves an immediate problem and closes. The structural constraints of the vendor choice — the schema evolution cost, the relevance ceiling, the cost at scale — accumulate silently until a production incident or a billing surprise makes them visible.

The initial search selection session — "how do I add search to my Rails app?" or "how do I add search to my Node.js app?" — produces the vendor choice based on what tutorial appears first in the search results, which managed service the team's cloud provider offers most prominently, or which engine the developer has used before. The session covers creating an index, indexing documents, and wiring a search endpoint. The session closes when the first query returns correct results. The structural constraints of the choice — Elasticsearch's immutable field analyzers, Algolia's per-operation pricing, OpenSearch's divergence from Elasticsearch at ML features, Typesense's relevance ceiling at complex scoring — are not discussed because they are not in scope for a "get search working" session. The relevance model is not tuned. The schema evolution policy is not discussed. The cost at scale is not modeled. See the decisions never written down pattern: the session closes when the first result returns, and every structural property of the engine choice that matters at production scale is undocumented.

The relevance tuning session — "how do I improve search quality for prefix matching?" or "how do I boost product name results above description results?" — produces analyzer configuration changes, field boost weights, or custom ranking signals. The session closes when search quality looks better in testing against a sample query set. The ranking changes made in this session are not documented as decisions: the developer applies the configuration, verifies that search looks better, and moves on. What is not documented is the list of alternatives considered (edge n-grams versus search_as_you_type versus prefix queries with wildcard), the trade-offs of the chosen approach (edge n-grams improve prefix matching but increase index size by 3–5×; wildcard queries have worse query-time performance), and whether the configuration change requires a rebuild (it does for any field analyzer change in Elasticsearch). The next developer to modify the search configuration starts from the current index mapping without knowing why edge n-grams were chosen, what the rebuild procedure is, or what the performance implications of further customization will be.

The schema change session — "I'm getting an Elasticsearch error when I try to change the field mapping" — is where the immutable field analyzer constraint is discovered for the first time. The session walks through the error, explains why the constraint exists (inverted index structure built at indexing time), and describes the rebuild procedure. The session produces a working rebuild procedure and a successfully improved index. What it does not produce is a documented rebuild runbook checked into the codebase, a documented schema evolution policy that specifies the approval process for mapping changes, or an understanding of which other fields in the mapping would also require a rebuild if their configuration changed. The rebuild procedure is in the chat history, not in a runbook. The next schema change session starts from scratch discovering the same constraint. The WhyChose extractor run on the schema change session surfaces the rebuild procedure and the constraint explanation — the information needed to write the runbook — from the chat history rather than requiring the developer to rediscover it.

The cost review session — "our Algolia bill is $1,240 this month, how do we reduce it?" — produces the migration evaluation described in the opening incident. The session generates a comparison of Algolia versus Typesense at the current operation volume, a migration cost estimate, and a recommendation. What it does not produce is documentation of why Algolia was chosen originally (the InstantSearch.js integration quality, the merchandising dashboard for the business team, the zero-infrastructure-operations overhead), which Algolia-specific features are in active use (custom ranking signals based on sales velocity, merchandising rules for seasonal products, InstantSearch UI components across three frontend surfaces), and what the true migration cost is for re-implementing those features in Typesense (one week for the query adapter, two weeks to re-implement the merchandising logic in application code, three weeks of parallel validation). Without the original ADR documenting why Algolia was chosen and which features were being used, the cost review session produces an underestimated migration cost and an oversimplified "Typesense is cheaper" recommendation. See the decisions never written down pattern again: the original selection session did not document the Algolia feature dependencies, so the migration cost session cannot accurately account for them.

What to document in the search engine ADR

A search engine ADR that prevents the schema evolution incident, the cost surprise, and the relevance ceiling discovery does not document the index mapping — the mapping JSON in version control covers that. It documents why this engine was chosen over the alternatives, what structural properties that choice imposes on every future relevance improvement and schema change, and what the cost model looks like when the product is at 10× its current scale.

The relevance requirements section documents what "good search" means for this product, specifically: whether typo tolerance is required and at what sensitivity (products with short names need more aggressive typo tolerance than products with long descriptive names), whether prefix and auto-complete search is required (and therefore whether edge n-grams, search_as_you_type fields, or built-in prefix handling is the appropriate approach), whether faceted navigation is required and at what complexity (flat facets on categorical attributes versus multi-level nested facets on hierarchical attributes), whether vector or semantic search is in scope now or in the next 12 months (and therefore whether the chosen engine supports it without a migration), and whether multi-language support is required (and therefore which language-specific analyzers or stemming configurations are needed). These requirements are not derivable from the initial index mapping. They represent the product team's judgment about what search quality is required, which the search engine ADR should make explicit and version-controlled alongside the code.

The schema evolution policy section documents what happens when a field mapping needs to change. For Elasticsearch and OpenSearch: state explicitly that field analyzer changes require a full index rebuild, provide the rebuild procedure (create new index, re-index from source, alias swap, delete old index), specify the acceptable downtime window for document creation during the rebuild (or the procedure for live reindexing if the Reindex API is in use), and specify who approves mapping changes before they are attempted. Mapping changes in Elasticsearch that hit the immutable constraint at deploy time are deployed-and-failed changes that require an unplanned rebuild — the schema evolution policy should ensure that mapping changes are reviewed and the rebuild procedure is planned before deployment, not discovered at deploy time. This section connects directly to the database migration strategy decision record, which covers the migration framework and approval process for the primary database; the search engine schema evolution policy should be consistent with that framework even though the tooling is different.
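The rebuild steps above can be sketched as a plan of REST calls. This is a minimal sketch that only builds the request methods, paths, and bodies. The helper names and the alias-plus-version naming scheme are assumptions, and sending the requests, waiting for the reindex to finish, and validating query quality before deleting the old index are left to the caller.

```python
def versioned_index(alias: str, version: int) -> str:
    """Index name for a given alias and version, e.g. projects-v3 (assumed scheme)."""
    return f"{alias}-v{version}"


def rebuild_plan(alias: str, current_version: int, new_index_body: dict) -> list:
    """Ordered (method, path, body) tuples for an alias-swap rebuild.

    new_index_body is the settings and mappings JSON for the new index.
    """
    old = versioned_index(alias, current_version)
    new = versioned_index(alias, current_version + 1)
    return [
        # 1. Create the new index with the updated mapping.
        ("PUT", f"/{new}", new_index_body),
        # 2. Copy documents from the old index into the new one.
        ("POST", "/_reindex", {"source": {"index": old}, "dest": {"index": new}}),
        # 3. Move the alias in a single request so readers never see a gap.
        ("POST", "/_aliases", {"actions": [
            {"remove": {"index": old, "alias": alias}},
            {"add": {"index": new, "alias": alias}},
        ]}),
        # 4. Delete the old index once the new one has been validated.
        ("DELETE", f"/{old}", None),
    ]


plan = rebuild_plan("projects", 3, {"settings": {}, "mappings": {}})
for method, path, _body in plan:
    print(method, path)
```

Keeping the plan as data makes it easy to check into the repository next to the mapping file and to review before a mapping change is deployed.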

The operational model section documents who owns cluster operations. For self-hosted Elasticsearch or OpenSearch: who is responsible for JVM heap configuration, cluster sizing, snapshot schedule, and cluster upgrades? Does the team have existing Elasticsearch operational expertise, or is this a capability being built from scratch? What is the snapshot and restore procedure (S3 snapshot repository configuration, snapshot schedule, restore test procedure)? What is the cluster upgrade path — are rolling minor-version upgrades supported, or do major version upgrades require index re-creation? For managed services (Elastic Cloud, Amazon OpenSearch Service, Typesense Cloud): what is the instance tier selection and what is the scaling trigger (at what query latency P95 or index size will the tier be upgraded)? For Algolia: no cluster operations required; document instead the Algolia account access policy (who can change index settings, who can access the merchandising dashboard) and the API key rotation procedure. The operational model links to the infrastructure-as-code strategy decision record: cluster configuration (for self-hosted engines) and index mappings should be managed in IaC or version-controlled configuration, not in the engine's dashboard or console, and the IaC source of truth should be explicit.

The cost model section documents the current and projected financial cost of the search infrastructure. For Algolia: current record count and monthly operations, projected at 12 and 24 months at the product's growth rate, per-record and per-operation cost at those projections (not at the current free-tier scale), total monthly cost at each projection, and the trigger that would prompt a cost review or migration evaluation. For self-hosted engines: current instance type and cost (EC2 or managed service), projected instance type at 12 and 24 months (as document count and query volume grow, what node type is required to maintain P95 query latency under 100ms), total monthly cost at each projection including storage (EBS for Elasticsearch, disk for Typesense). For Typesense Cloud: current cluster configuration (memory and vCPU) and monthly cost, the configuration required at the 12-month and 24-month projected dataset size and query volume, and the cost at each projection using the published cluster pricing. For Postgres full-text: zero additional infrastructure cost, but document the projected query latency impact of full-text search on the primary Postgres database at the projected document count, including the scale at which full-text search query load warrants a read replica dedicated to search traffic.
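A usage-based projection of this kind is simple enough to keep as a script next to the ADR. The sketch below compounds record count and operation volume forward at a fixed monthly growth rate and prices each horizon. Every number in it, including the per-1,000 rates, is a placeholder assumption. Substitute the product's measured figures and the vendor's current published rates.

```python
def project(value: float, monthly_growth: float, months: int) -> float:
    """Compound a starting value forward by a fixed monthly growth rate."""
    return value * (1 + monthly_growth) ** months


def usage_cost(records: float, operations: float, *,
               price_per_1k_records: float, price_per_1k_operations: float,
               free_records: int = 0, free_operations: int = 0) -> float:
    """Monthly cost for per-record plus per-operation billing."""
    billable_records = max(0.0, records - free_records)
    billable_operations = max(0.0, operations - free_operations)
    return (billable_records / 1000 * price_per_1k_records
            + billable_operations / 1000 * price_per_1k_operations)


# Placeholder rates, NOT a real price list.
RATES = dict(price_per_1k_records=0.10, price_per_1k_operations=0.20)


def projection_table(records, operations, record_growth, operation_growth,
                     horizons=(0, 12, 24)):
    """Rows of (months, records, operations, monthly cost) for each horizon."""
    rows = []
    for months in horizons:
        r = project(records, record_growth, months)
        o = project(operations, operation_growth, months)
        rows.append((months, round(r), round(o), round(usage_cost(r, o, **RATES), 2)))
    return rows


# Hypothetical product: 100k records growing 8%/month, 1M operations growing 10%/month.
for row in projection_table(100_000, 1_000_000, 0.08, 0.10):
    print(row)
```

The useful output is not the exact dollar figure but the horizon at which the projected cost crosses the review trigger written into the ADR.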

The multi-tenancy model section documents how search results are isolated between tenants. Per-tenant index gives the strongest isolation: each tenant's documents are in a separate Elasticsearch index or Typesense collection, and cross-tenant data leakage requires index-level access control misconfiguration rather than a missing query filter. Shared index with tenant filter requires a mandatory tenant_id filter on every query — every search endpoint, every auto-complete endpoint, every facet-count endpoint must include the filter. Missing the filter on one endpoint exposes all tenants' documents to any authenticated user. The multi-tenancy decision record covers this trade-off in the context of the full data architecture. The search engine ADR should reference that record and document which model is implemented for search specifically, because the implementation detail (per-tenant index versus shared index) has different operational costs: per-tenant indexes multiply the number of indexes managed by the tenant count (at 1,000 tenants, 1,000 indexes), while shared indexes require careful enforcement of the tenant filter at every query site in the application code.
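One common way to enforce the shared-index filter is a single wrapper that every query passes through. The sketch below shows the idea for an Elasticsearch-style bool query. The function name and the tenant_id field name are assumptions, and a real implementation would live in the search service layer so that no endpoint can build a query without it.

```python
def tenant_scoped_query(tenant_id: str, query: dict) -> dict:
    """Wrap any query clause in a bool query that always filters by tenant.

    Raises instead of searching when the tenant is missing, so a forgotten
    tenant context fails loudly rather than returning every tenant's data.
    """
    if not tenant_id:
        raise ValueError("tenant_id is required for every search query")
    return {
        "bool": {
            "must": [query],
            "filter": [{"term": {"tenant_id": tenant_id}}],
        }
    }


user_query = {"multi_match": {"query": "roadmap", "fields": ["name", "description"]}}
scoped = tenant_scoped_query("tenant-42", user_query)
print(scoped)
```

Routing search, auto-complete, and facet-count endpoints through the same wrapper turns "every query site must remember the filter" into one function that can be covered by a test.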

The search engine selection ADR template

The template below follows the Nygard format extended with search-specific sections. The five sections whose absence produced the incidents above — the schema evolution policy with the rebuild procedure, the relevance requirements, the operational model, the cost model at projected scale, and the multi-tenancy model — are all present. Adapt field values to the chosen engine.

# ADR-NNN: Search engine selection

## Status
Accepted / Proposed / Superseded by ADR-NNN

## Context
[What search requirements does the product have? What content is being
searched — products, articles, users, documents, logs? What are the
relevance requirements: typo tolerance, prefix/auto-complete, faceted
navigation, vector/semantic search, multi-language support? What is the
current document count and monthly query volume? What is the team's
operational expertise — does the team have Elasticsearch experience, or
is this a new capability? What is the primary constraint: zero operational
overhead (managed SaaS), cost at scale (self-hosted), relevance ceiling
(Elasticsearch full query DSL), ease of schema evolution (Typesense,
Algolia), or zero additional infrastructure (Postgres full-text)?]

[Note: this ADR covers vendor/platform selection. For search architecture
patterns — federated search, dedicated service, embedded search — see
/blog/search-architecture-decision-record.]

## Decision
We will use [Elasticsearch / OpenSearch / Typesense / Algolia / Postgres
full-text search] for [scope: all product search / document search only /
user/account search / etc].

## Relevance model
Typo tolerance: [built-in (Typesense, Algolia) / custom analyzer required
  (Elasticsearch edge n-grams or fuzzy queries) / pg_trgm (Postgres)]
Prefix / auto-complete: [search_as_you_type field / edge n-grams / Typesense
  built-in prefix / Algolia built-in / Postgres prefix query]
Faceted navigation: [Elasticsearch aggregations / Typesense facets /
  Algolia facets / Postgres GROUP BY — document complexity level]
Vector / semantic search: [dense_vector + kNN (Elasticsearch 8.x) /
  k-NN plugin (OpenSearch) / hybrid vector (Typesense 0.25+) /
  NeuralSearch Enterprise (Algolia) / pgvector (Postgres) / not in scope]
Multi-language: [per-language analyzer (Elasticsearch) / language-aware
  stemming (Typesense) / Algolia language support / Postgres text search
  dictionaries — document languages in scope]
Relevance ceiling: [describe what relevance improvements are NOT possible
  without migration — this is the commitment the team is making at
  selection time]

## Schema evolution policy
Field analyzer changes: [require full index rebuild (Elasticsearch, OpenSearch)
  / no rebuild required (Typesense most changes) / live settings update (Algolia)
  / SQL ALTER TABLE + REINDEX CONCURRENTLY (Postgres)]
Field type changes: [require full index rebuild (Elasticsearch, OpenSearch) /
  may require re-ingestion (Typesense) / no schema constraint (Algolia) /
  SQL migration (Postgres)]
New field additions: [safe without rebuild (all engines) / document any exceptions]
Rebuild procedure (for Elasticsearch / OpenSearch):
  1. Create new index with updated mapping:
     PUT /[index-name]-v[N+1] { "settings": {...}, "mappings": {...} }
  2. Re-index from source (database or source index):
     POST /_reindex { "source": {"index": "[old-index]"},
                      "dest": {"index": "[new-index]"} }
     (or from database via application re-index job)
  3. Alias swap (atomic): POST /_aliases { "actions": [
       {"remove": {"index": "[old-index]", "alias": "[alias]"}},
       {"add": {"index": "[new-index]", "alias": "[alias]"}} ] }
  4. Validate query quality on new index against test query set
  5. Delete old index: DELETE /[old-index]
Estimated rebuild time at current document count: [N hours at Y docs/sec
  indexing throughput — measure on staging before the first production rebuild]
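Document creation freeze required: [yes (the Reindex API copies a point-in-time
  snapshot of the source index; writes made during the rebuild are not carried over)
  / no (dual-write to old and new index, or re-index the delta from the database
  before the alias swap): document which]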
Mapping change approval: [who reviews and approves mapping changes before deploy?
  mapping changes that hit the immutable constraint at deploy time require
  an unplanned rebuild — review before deploy, not during]

## Operational model
Operational model: [self-hosted / managed (Elastic Cloud / Amazon OpenSearch
  Service / Typesense Cloud) / SaaS (Algolia) / in-database (Postgres)]
For self-hosted Elasticsearch / OpenSearch:
  JVM heap allocation: [N GB heap on M GB RAM instance — document the ratio]
  Cluster configuration: [number of data nodes, master nodes, shard count,
    replica count — document and manage in IaC, not in the console]
  Snapshot schedule: [daily snapshots to S3 bucket / frequency / retention]
  Snapshot restore test: [quarterly restore test procedure to staging cluster]
  Cluster upgrade path: [rolling minor upgrades supported / major version
    upgrade procedure — document specifically for in-use major version]
  JVM tuning owner: [who is responsible for GC configuration and heap tuning?]
For managed service:
  Instance tier: [current tier and selection rationale]
  Scaling trigger: [at what P95 query latency or index size will tier be upgraded?]
  Backup: [managed snapshot schedule and retention — document the policy]
For Algolia:
  Index settings access: [who can change index settings / merchandising rules]
  API key rotation: [procedure and schedule for rotating search API keys]
IaC source of truth:
  Cluster config: [Terraform module path / not applicable for SaaS]
  Index mapping: [version-controlled JSON in repo path; applied by deploy pipeline]
  Console edit policy: [not permitted / permitted for emergencies with PR within 24h]

## Cost model
Engine: [Elasticsearch / OpenSearch / Typesense / Algolia / Postgres]
Current monthly cost: $[amount] itemized:
  [Infrastructure: $X / Operations per month: $Y / Storage: $Z]
Current document count: [N] documents
Current monthly operations: [N] search operations / month
Projected document count at 12 months: [N]
Projected operations at 12 months: [N/month]
Projected cost at 12 months: $[amount] — show the calculation
Projected document count at 24 months: [N]
Projected operations at 24 months: [N/month]
Projected cost at 24 months: $[amount] — show the calculation
Cost review trigger: [at what monthly cost / document count / operation volume
  will the team re-evaluate the search engine selection?]
If Algolia: include per-record × record count + per-operation × operations/month
  at each projection using current Algolia plan pricing
If self-hosted: include instance cost + storage + engineering hours for operations
  (N hours/month × $Y/hour) — total cost of ownership, not just infrastructure

## Multi-tenancy model
Model: [per-tenant index / shared index with mandatory tenant filter]
For per-tenant index:
  Index naming convention: [projects-tenant-{tenant_id} or similar]
  Index count at current tenant scale: [N indexes]
  Index count at 12-month projected tenant count: [N indexes]
  Alias management: [how aliases are provisioned and deleted for tenant lifecycle]
  Access control: [API key or role scoped to tenant's index / document-level
    security — Elasticsearch X-Pack feature]
For shared index with tenant filter:
  Filter field: [tenant_id field name and type]
  Filter enforcement: [every query must include filter — document enforcement
    mechanism in application code, e.g., search service wrapper that injects
    filter on every query call]
  Audit: [how is correct filter application verified? test coverage?]
See also: /blog/multi-tenancy-decision-record for the broader data architecture

## IaC source of truth
Index mapping file: [path in repository — e.g., search/mappings/projects.json]
Mapping deploy step: [CI/CD pipeline step that applies mapping to the cluster]
Mapping versioning: [how index versions are tracked (index aliases by version,
  e.g., projects-v3)]
Cluster config: [Terraform module path or "not applicable — managed SaaS"]

## Consequences
Positive: [capabilities this engine provides — built-in typo tolerance /
  powerful aggregations / zero infrastructure / familiar SQL migration model /
  cost efficiency at scale]
Negative: [Elasticsearch field analyzer immutability — all relevance changes
  involving analyzer config require a full index rebuild;
  Algolia per-operation cost scales linearly — model the cost at 24 months now;
  Typesense relevance ceiling below Elasticsearch's full query DSL;
  Postgres full-text search: no built-in typo tolerance without pg_trgm,
    poor performance for complex multi-facet navigation at large scale;
  self-hosted Elasticsearch: JVM operational expertise required, snapshot
    and restore procedure must be maintained and tested;
  all engines: search index is a derived data store — source of truth is
    the primary database; re-index procedure must be documented and tested
    before it is needed in an incident]
Risks: [field mapping change attempted at deploy time without rebuild
    procedure planned — results in unplanned rebuild under pressure;
  Algolia cost growth not modeled — results in cost surprise at product scale;
  engine chosen for prototype constraints persisted to production scale
    beyond its relevance ceiling;
  dual-purpose Elasticsearch cluster for log analytics + application search
    — resource contention and operational complexity risk]

The sections that teams consistently skip are the schema evolution policy with the explicit rebuild procedure (most teams know that Elasticsearch has "immutable mappings" in the abstract; few have a written procedure before they need it), the cost model at 24 months (the Algolia cost at 450,000 records and 4.8 million operations is not surprising if it has been calculated in advance — it is only a surprise when it hasn't), and the relevance ceiling statement (documenting what the chosen engine cannot do without a migration forces the team to decide whether those capabilities are in scope before they are needed, rather than discovering the gap during a relevance improvement initiative). Write all three sections before the first relevance improvement is attempted on a production index, not after the first rebuild or the first billing surprise.

The search engine ADR also connects to the ADR format guidance for writing Consequences sections that are specific enough to be useful: not "search may be slow at scale" but "Elasticsearch kNN vector search latency at P95 for a 10M document index with 768-dimension embeddings and HNSW ef=100 is approximately 80–120ms on r6g.large.search instances — benchmark before selecting the kNN configuration for production." Not "Algolia costs money" but "at 12-month projected record count of 800,000 and 8 million operations/month, the Algolia Grow plan cost is approximately $2,100/month — trigger a migration evaluation if the monthly cost exceeds $1,500." The guidance on documenting architecture decisions covers the cost-tradeoff framing: the Consequences section should give the next engineer the information they need to evaluate whether the original decision is still correct, not just confirm that a decision was made. For search engine selection, that means specific cost projections, specific rebuild procedures, and specific relevance ceiling statements — not general observations about search complexity.

FAQs

What is the difference between Elasticsearch and Typesense for application search?

The fundamental structural differences between Elasticsearch and Typesense for application search are the runtime model, the schema evolution cost, and the operational overhead. Elasticsearch is built on Apache Lucene and runs on the JVM. Its inverted index structure is immutable at the field analyzer level: the tokenizer and analyzer configuration for each field is baked into the index at creation time and cannot be changed without rebuilding the index from scratch. Changing from the standard analyzer to a custom edge n-gram analyzer (a common relevance improvement for prefix search) requires creating a new index with the updated mapping, re-indexing every document from the source, swapping an alias from the old index to the new, and deleting the old index. At millions of documents this process can take hours and may require freezing document creation during the rebuild. Elasticsearch also requires JVM tuning knowledge: heap size configuration, garbage collection settings, and understanding the trade-off between on-heap and off-heap memory are operational requirements that Typesense does not impose.
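How many hours a rebuild takes can be estimated ahead of time from the document count and a measured indexing throughput. The sketch below is a back-of-the-envelope calculation. The function name and the example figures are hypothetical, and the throughput should come from a staging run rather than a guess.

```python
def rebuild_hours(document_count: int, docs_per_second: float) -> float:
    """Estimated wall-clock hours to re-index every document once."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be a measured, positive throughput")
    return document_count / docs_per_second / 3600


# Hypothetical: 1.8 million documents at a measured 500 documents per second.
print(rebuild_hours(1_800_000, 500))
```

Re-running the estimate whenever the document count doubles keeps the documented freeze window honest.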

Typesense is written in C++ without a JVM, which means lower memory footprint per document, no GC pauses, and no heap tuning requirement. Typesense's schema is more flexible at update time: adding fields and most configuration changes do not require a full re-index. Typesense has built-in typo tolerance as a default-on feature with configurable thresholds, so no custom analyzer configuration is required for prefix matching or fuzzy search. Typesense supports vector search from version 0.25, including hybrid keyword and vector (kNN) search. Elasticsearch's relevance ceiling is higher: custom Painless script scoring, pipeline aggregations, nested aggregations, and the full Lucene query DSL are available in Elasticsearch but not in Typesense. For straightforward application search (products, articles, users), Typesense's defaults produce good results with lower operational overhead. For complex relevance requirements (multi-level nested aggregations, arbitrary scoring functions, dual-use log analytics) or very large scale, Elasticsearch offers capabilities Typesense does not have.

When does Algolia's pricing model become more expensive than a self-hosted alternative?

Algolia's pricing charges per record stored and per search operation, with the Grow plan pricing approximately $0.50 per 1,000 records per month above the free tier and approximately $1.00 per 1,000 operations above 10,000. The crossover where Algolia becomes more expensive than a self-hosted alternative depends on operation volume and record count, but teams typically encounter it between 500,000 and 2,000,000 monthly search operations at moderate record counts. At 450,000 records and 4.8 million operations per month — a realistic scale for a product that started on Algolia's free tier and grew over 16 months — the Algolia bill lands in the range of $1,000–$1,500 per month depending on plan tier negotiation. A Typesense Cloud deployment at the same scale costs approximately $60–$120 per month. Self-hosted Typesense on adequate infrastructure costs $20–$60 per month in infrastructure, plus the engineering time for cluster operations.

The legitimate Algolia premium is the elimination of all infrastructure operations: no clusters to provision, no upgrades to run, no backups to configure, no capacity planning to perform. This operational cost is real and should be included in the comparison. If the team spends an average of 4 hours per month on search cluster operations at an engineer cost of $100/hour, that is $400/month in operational cost — which narrows the cost gap between Algolia and self-hosted substantially. The break-even calculation should be: (Algolia monthly bill) versus (self-hosted infrastructure cost + monthly engineering hours for operations × engineer hourly cost). At very low operation volumes (under 200,000 operations/month), Algolia's zero-operational-overhead often justifies the premium. At millions of operations per month with a team that has the operational capacity, the differential is typically $500–$2,000+/month in favor of self-hosted, which represents meaningful engineering budget. The cost model at projected 12-month and 24-month scale should be calculated before selecting Algolia, using realistic growth assumptions — not the prototype-scale numbers that fit within the free tier.
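The break-even comparison can be written down directly. The sketch below uses figures that appear in this post ($1,240 monthly bill, $60 of infrastructure, 4 hours a month at $100 an hour) plus one hypothetical small bill. The function names are illustrative.

```python
def self_hosted_monthly_cost(infrastructure: float, ops_hours: float,
                             hourly_rate: float) -> float:
    """Infrastructure plus the engineering time spent operating the cluster."""
    return infrastructure + ops_hours * hourly_rate


def monthly_difference(saas_bill: float, infrastructure: float,
                       ops_hours: float, hourly_rate: float) -> float:
    """Positive: self-hosting is cheaper by this amount. Negative: SaaS is cheaper."""
    return saas_bill - self_hosted_monthly_cost(infrastructure, ops_hours, hourly_rate)


# Figures from the text: $1,240 bill versus $60 infrastructure + 4 h at $100/h.
at_scale = monthly_difference(1240, 60, 4, 100)
# Hypothetical early-stage bill of $150 with the same operational overhead.
early = monthly_difference(150, 60, 4, 100)
print(at_scale, early)
```

With the operations hours included, the early-stage case favors the SaaS bill and the at-scale case favors self-hosting, which is the crossover the ADR's cost review trigger should capture.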

Why does changing an Elasticsearch field analyzer require a full index rebuild?

An Elasticsearch field analyzer is not a query-time configuration; it is applied at indexing time to convert raw text into the tokens stored in the inverted index. The inverted index maps each token to the list of documents containing that token, with position information. When a document is indexed with the standard analyzer, Elasticsearch stores the tokens the standard analyzer produces: lowercased words split on whitespace and punctuation. If the field's analyzer is changed to a custom edge_ngram analyzer after indexing, the new analyzer would produce substantially different tokens (progressively longer character n-grams from the start of each word) for the same input text. But the existing indexed tokens (produced by the old analyzer) are already stored in the inverted index. Elasticsearch does not re-tokenize existing indexed data in place: the inverted index is a read-optimized data structure written into immutable segments at indexing time, so the only way to get tokens from the new analyzer is to run every document through it again, which is exactly what a rebuild does.
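A toy inverted index makes the mismatch visible. The Python sketch below is not how Lucene stores data. The two tokenizers are simplified stand-ins for the standard and edge n-gram analyzers, and the documents are made up. It only shows that the tokens written at indexing time decide which lookups can succeed later.

```python
import re
from collections import defaultdict


def standard_tokens(text: str) -> list:
    """Simplified stand-in for the standard analyzer: lowercase whole words."""
    return re.findall(r"\w+", text.lower())


def edge_ngram_tokens(text: str, min_gram: int = 2, max_gram: int = 10) -> list:
    """Simplified stand-in for an edge n-gram analyzer: prefixes of each word."""
    tokens = []
    for word in standard_tokens(text):
        for n in range(min_gram, min(len(word), max_gram) + 1):
            tokens.append(word[:n])
    return tokens


def build_index(docs: dict, analyzer) -> dict:
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyzer(text):
            index[token].add(doc_id)
    return index


docs = {1: "Search dashboard", 2: "Billing report"}
standard_index = build_index(docs, standard_tokens)
ngram_index = build_index(docs, edge_ngram_tokens)

# The prefix "sea" was never written to the standard index, so it cannot match.
print(standard_index.get("sea"), ngram_index.get("sea"))
print(len(standard_index), len(ngram_index))
```

Switching analyzers on the first index would change what new documents write but leave the existing entries untouched, so old and new documents would answer the same query differently until everything is re-indexed. The token counts also show why edge n-grams enlarge the index.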

Attempting to change an existing field's analyzer through the mapping update API is rejected with an explicit error; the exact wording varies by version, but it reports that the field's analyzer parameter cannot be updated. The only supported path to changing the analyzer is to create a new index with the updated mapping, re-index all documents from the source data (the primary database, or the old index via the Reindex API), and swap an alias from the old index to the new index once re-indexing is complete. This is structurally different from PostgreSQL, where ALTER TABLE ... ALTER COLUMN ... TYPE converts the existing rows as part of the same statement. Changing a field's type (e.g., from text to keyword) also requires a full rebuild. Adding a new field to an Elasticsearch mapping is safe and does not require a rebuild: new fields can be added explicitly or picked up through dynamic mapping. The immutability constraint applies specifically to the configuration of existing indexed fields, not to the addition of new fields. Teams should treat Elasticsearch field analyzer configuration as a deployment-time structural commitment, not a runtime configuration value: every field's analyzer in the initial mapping is a decision that cannot be changed without an engineering rebuild procedure.
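As a rough sketch, the rebuild procedure comes down to three REST calls. The code below only builds the request bodies so it can be read and checked without a cluster; the index names, the alias, the `name` field, and the `prefix_analyzer` definition are all hypothetical, and the sketch assumes the application already queries through an alias.

```python
import json


def rebuild_plan(alias: str, old_index: str, new_index: str, new_index_body: dict):
    """Return the (method, path, body) calls for a create / reindex / alias-swap rebuild."""
    return [
        # 1. Create the new index with the updated analysis settings and mapping.
        ("PUT", f"/{new_index}", new_index_body),
        # 2. Copy every document from the old index through the new analyzer.
        ("POST", "/_reindex", {"source": {"index": old_index}, "dest": {"index": new_index}}),
        # 3. Move the alias in one request so readers never see a missing index.
        ("POST", "/_aliases", {"actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]}),
    ]


new_index_body = {
    "settings": {"analysis": {
        "tokenizer": {"prefix_tokenizer": {"type": "edge_ngram", "min_gram": 2, "max_gram": 10,
                                           "token_chars": ["letter", "digit"]}},
        "analyzer": {"prefix_analyzer": {"type": "custom", "tokenizer": "prefix_tokenizer",
                                         "filter": ["lowercase"]}},
    }},
    "mappings": {"properties": {
        "name": {"type": "text", "analyzer": "prefix_analyzer", "search_analyzer": "standard"},
    }},
}

plan = rebuild_plan("projects", "projects_v1", "projects_v2", new_index_body)
for method, path, body in plan:
    print(method, path, json.dumps(body)[:80])
```

Documents written to the old index after the reindex has started are not carried over by this plan, so a real procedure also needs a strategy for writes during the rebuild (pausing them, dual-writing, or replaying from the primary database).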

What should a search engine selection ADR document that teams typically skip?

Teams typically document the search engine vendor name, the initial index mapping, and the application query code. The ADR sections that prevent the schema evolution incident, the cost surprise, and the relevance ceiling discovery are the following. First, the schema evolution policy: explicitly stating that Elasticsearch and OpenSearch field analyzer changes require a full index rebuild, documenting the complete rebuild procedure before it is needed, specifying who approves mapping changes before deployment (mapping changes that hit the immutable constraint at deploy time become unplanned rebuilds under pressure), and estimating the rebuild time at the current document count based on the cluster's measured indexing throughput. Second, the cost model at projected scale: for Algolia, calculating the monthly cost at the 12-month and 24-month projected record count and operation volume using the Grow plan pricing, and setting an explicit cost-review trigger (e.g., "re-evaluate if monthly cost exceeds $1,000") rather than assuming that the free-tier economics will persist as the product grows.

Third, the relevance requirements and ceiling — documenting what search quality the product requires (typo tolerance, prefix matching, faceted navigation complexity, vector search), whether the chosen engine provides those capabilities without migration, and what the chosen engine explicitly cannot do without migration (Elasticsearch's full Painless script scoring is unavailable in Typesense; Algolia's merchandising rules dashboard is unavailable in self-hosted alternatives; Postgres full-text has no built-in typo tolerance without pg_trgm). Fourth, the multi-tenancy model — whether search is per-tenant index or shared index with filter, and the enforcement mechanism for the tenant filter if shared. Fifth, the operational model — who owns cluster operations for self-hosted engines, what the snapshot and restore procedure is, and what the cluster upgrade path is. None of these five sections are derivable from the index mapping or the search query code in the codebase. They represent the decisions made about the search engine that every future search improvement will depend on — and they are the decisions most consistently absent from engineering documentation because the initial "get search working" session produced working search, not working documentation.
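The rebuild-time estimate in the schema evolution policy is a single division, which makes it cheap to keep current in the ADR. A minimal sketch, assuming a steady indexing throughput; the document count and throughput below are hypothetical numbers, not measurements from any cluster described here.

```python
def estimated_rebuild_hours(document_count: int, docs_per_second: float) -> float:
    """Estimate a full reindex duration from a measured, sustained indexing rate."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return document_count / docs_per_second / 3600


# Hypothetical: 8.1 million documents at a measured 500 documents/second.
print(estimated_rebuild_hours(8_100_000, 500.0))  # 4.5
```

Re-running the estimate whenever the document count roughly doubles keeps the number in the ADR from describing a cluster that no longer exists.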