---
title: Postgres Full Text Search vs the rest
description: Comparing one of the most popular Postgres features against alternatives
author: victor
date: '2022-10-14'
tags:
  - postgres
  - planetpg
categories:
  - postgres
---
Postgres is one of the best F/OSS databases partly because of its immense feature-set.One of my favorite Postgres features is [Full Text Search](https://supabase.com/docs/guides/database/full-text-search) (FTS). Search is a common requirement of applications. Well-known search engines like [Solr](https://solr.apache.org/) and [ElasticSearch](https://www.elastic.co/elasticsearch) are often a first choice, but with Postgres in your stack you've got a great chance for [Pareto improvement](https://en.wikipedia.org/wiki/Pareto_principle) at low complexity cost.Many projects function just fine with Postgres Full Text Search and other [built-in](https://www.postgresql.org/docs/current/contrib.html) extensions like [trigram search (`pg_trgm`)](https://www.postgresql.org/docs/current/pgtrgm.html). [GitLab's blog has a great article](https://about.staging.gitlab.com/blog/2016/03/18/fast-search-using-postgresql-trigram-indexes/) on their use of Trigram indices to speed up search.## Search Engines: The Next GenerationThese days, the debate isn't *just* Postgres versus Solr or ElasticSearch, a new generation of F/OSS search engines has arrived. To name a few:* [MeiliSearch](https://www.meilisearch.com/)
* [OpenSearch](https://opensearch.org/)
* [SQLite FTS](https://www.sqlite.org/fts5.html)
* [Typesense](https://typesense.org/)Well OK, some of these engines aren't *quite* new; SQLite is stable and has supported the search use-case for a while and OpenSearch is a fork of ElasticSearch. While not new, both are worth comparing as choices in front of engineers today (spoiler alert: we're going to compare them).These search engines are all impressive, but it's hard to evaluate these options in a `VACUUM`.If Postgres is already your main database, then it's probably easier to integrate than anything else. But would it be *better* to take on more complexity by choosing a dedicated search engine?While search is an endlessly complicated problem and all of these tools are configurable, a comparison of minimal setup and tuning should yield some interesting insights.Before we get into it though, a *brief* primer on what "Full Text Search" *means*, in Postgres land.## A whirlwind tour through FTS in PostgresIf we want to compare other solutions to Postgres FTS, we'd better review how to set up and use [Postgres FTS](https://www.postgresql.org/docs/current/textsearch.html)!### Wrangling your existing dataImagine you have some information to search through, stored in your main database.Maybe a `movies` table with a structure like this:```sql
	create table movies (
	id bigint primary key generated by default as identity,
	title text not null,
	original_title text not null,
	overview text not null,
	created_at timestamptz not null default now()
);
```While you're inserting data you might perform `INSERTs` like this one:```sql
insert into movies (
	title,
	original_title,
	overview
)
values
(
	"Avengers: Age of Ultron",
	"Avengers: Age of Ultron",
	"When Tony Stark tries to jumpstart a dormant peacekeeping program, things go awry and Earth’s Mightiest Heroes are put to the ultimate test as the fate of the planet hangs in the balance. As the villainous Ultron emerges, it is up to The Avengers to stop him from enacting his terrible plans, and soon uneasy alliances and unexpected action pave the way for an epic and unique global adventure.",
);
```### Basic searching with select and likeWith nothing but the raw data in your database, we can actually do some *basic* searching using `select` and comparison tools like `like`.```sql
select
  *
from movies
where title like '%Avengers%' or overview like '%Avengers%';
```The `LIKE` [operator](https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-LIKE) as used in the query above will match *only* the term "Avengers", but it will match it in any part of a given row's `title` or `overview`.We can hack around this basic functionality and make it a bit more robust:* Storing the text as [case insensitive text (`citext`)](https://www.postgresql.org/docs/current/citext.html)
* Combining the text that will be searched into one column
* Using a [GIN index](https://www.postgresql.org/docs/14/indexes-types.html#INDEXES-TYPES-GIN)Our "v0" would be better, but using `like` isn't quite what people mean when they ask for *robust* full text search. **Not a bad first version, but Postgres can do much better.**### Robust Full Text Search with PostgresPostgres has built-in [Full Text Search](https://www.postgresql.org/docs/current/textsearch.html) which we can use instead of cobbling together our own with `like`.Luckily for us, [the Postgres FTS documentation](https://www.postgresql.org/docs/current/textsearch.html) is quite extensive, containing lots of information on the all-important [Postgres search data types: `tsvector` and `tsquery`](https://www.postgresql.org/docs/current/datatype-textsearch.html).State of the art search engines like [Solr](https://solr.apache.org/) do things to make searches work well:* [Stemming](https://en.wikipedia.org/wiki/Stemming) ("jumped", "jumper" -> "jump")
* [Lemmatization](https://en.wikipedia.org/wiki/lemmatization) ("better" -> "good")
* Result Ranking configuration
* Search by [edit distance](https://en.wikipedia.org/wiki/Levenshtein_distance)
* Vector based similarity (which you might use for a basic recommendation system)Postgres FTS does not do *all* of the things a purpose-built search engine can do, but Postgres can produce very good results with decent performance.## A working Postgres FTS setupIntegrating Full Text Search into our imaginary movies table is as simple as executing the following:```sql
-- Add a generated column that contains the search document
alter table movies
	add column fts_doc_en
	generated always as to_tsvector (
		'english', title || ' ' || original_title || ' ' || overview
	)
	stored;

-- Create a GIN index to make our searches faster
create index movies_fts_doc_en_idx
	on movies
	using gin (fts_doc_en);
```Postgres supports even more features like [facilities for ranking search results](https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-RANKING) but for now we'll simply use `title`, `original_title`, and `overview` as they are.```sql
select
	* from movies
where doc_en @@ websearch_to_tsquery('english', 'Avengers');
```## Comparable FTS enginesNow that we've got a grasp on what FTS is and how it's used in Postgres, let's discuss a few of the alternative solutions Postgres can be compared to.As you might expect, this isn't an exhaustive list of FTS engines, there are many more that we're not covering here (If you've got another search engine you'd like to see featured, [file an issue](https://github.com/VADOSWARE/fts-benchmark/issues)).### SQLite FTS[SQLite](https://www.sqlite.org/) is arguably the most widely deployed database in the world. SQLite is a library that produces and manages single file (or in-memory) databases that run on phones, planes, laptops, kiosks, Raspberry Pis, and everything in between.Much like Postgres, SQLite is an excellent F/OSS project that has adopted the ability to perform Full Text Search.The [SQLite FTS documentation](https://www.sqlite.org/fts5.html) is extensive, and the FTS subsystems have undergone a few version upgrades -- [FTS 3/4](https://www.sqlite.org/fts3.html) and [SQLite FTS 5](https://www.sqlite.org/fts5.html) are both options.For our experiments we'll use SQLite FTS5.### TypeSense[TypeSense](https://typesense.org/) is a relatively new project that focuses on *lightning fast*, typo-tolerant open source search. Typesense boasts a high performance C++ codebase, promising nearly instant search results.TypeSense provides many demos with example datasets available for query, which you can try out:* [32MM songs from MusicBrainz](https://songs-search.typesense.org/)
* [28M books from OpenLibrary](https://books-search.typesense.org/)
* [XKCD comic search by topic](https://xkcd-search.typesense.org/)Typesense quite ambitiously brands itself as the "Algolia" and "Elasticsearch" alternative, offering to solve your search problems without requiring a PhD.### MeiliSearch[MeiliSearch](https://www.meilisearch.com/) is a new search engine written from the ground up in [Rust](https://rust-lang.org/), which also promises *lightning fast* searches, flexibility, and typo tolerance.MeiliSearch has [extensive documentation](https://docs.meilisearch.com/), boasts an easy to use yet extendable architecture, and is well supported by a large community and team.### OpenSearchAWS [OpenSearch](https://opensearch.org/) is AWS's answer to ElasticSearch's license changes. While opinions are varied, ElasticSearch built a great piece (formerly Apache 2.0 licensed) piece of software, and the AWS fork represents at least a *stale* measure of ElasticSearch's capabilities.OpenSearch is the *same* as ElasticSearch (since the fork) but as it's license is much more permissive, it's the easy choice as a comparison to Postgres.In production make sure to evaluate ElasticSearch versus OpenSearch (or other solutions) sufficiently for your use case.## Testing our FTS solutionsWell, how do we test performance of these disparate complex systems on an infinitely redefinable, hard-to-solve problem? By sticking to the use cases.Any search engine has two jobs:* Ingest information (usually referred to as "documents")
* Return documents that match user queriesIngesting information *can* be important, but "search engine performance" usually refers to the second step - taking queries and outputting results.That is to say, the primary concern is assuming a cluster already has proper data loaded on to it, how fast can it deliver the relevant results for a user-submitted query?We'll focus on the end-user-centric meaning of performance here (query speed).### Getting a dataset[Hugging Face](https://huggingface.co/) has a [32MB movie data set](https://huggingface.co/spaces/Kamand/Movie_Recommendation/blob/main/movies_metadata.csv). We're interested in the follow columns in this data set:* `title`
* `original_title`
* `overview`Postgres is capable of loading CSVs, but since we'll be using other search engines as well, let's convert to format that's much easier to use and process - [Newline Delimited JSON](https://dataprotocols.org/ndjson/).After a few lines of code and one [`csv2ndjson.mjs` script](https://github.com/VADOSWARE/fts-benchmark/blob/main/src/util/csv2ndjson.mjs) later, we have a `movies.ndjson.json` full of JSON documents that we can easily ingest into any search engine (or other database for that matter!).### Inserting the dataEach system ingests data slightly differently, but with a little scripting we can paper over the differences.[The code](https://github.com/VADOSWARE/fts-benchmark) contains a scheme of "drivers" which are minimal ES6 modules.Here's an example of the `ingest` function used for Postgres (with [`slonik`](https://github.com/gajus/slonik) as our DB driver):```js
// Ingesting search documents
async function ingest({ document, pool }) {
  // Ignore rows without proper release dates
  if (document.release_date.trim().length === 0) {
    return
  }

  await pool.query(sql`
insert into movies (
	id, 
	title, 
	original_title, 
	overview, 
	released_at
)
values
(
	${document.id},
	${document.title},
	${document.original_title},
	${document.overview},
	${new Date(document.release_date).toISOString()}::timestamptz
)
on conflict (id) do update set
  title=${document.title}, 
	original_title=${document.original_title}, 
	overview=${document.overview}, 
	released_at=${new Date(document.release_date).toISOString()}::timestamptz
;
`)
}
```The [repo](https://github.com/VADOSWARE/fts-benchmark/) contains the complete code for Postgres as well as the other search engines, written in much this same fashion.### Picking a set of queriesThis part is somewhat arbitrary - I've gotten a list of queries here that one might search for in terms of movies:```
"super hero"
"superhero"
"superman"
"suprman"
"love"
"world war"
"spy"
"romance"
"comedy"
"awakening"
```Thanks to [boinboing.net's article from 2018](https://boingboing.net/2018/04/30/the-most-common-words-in-movie.html), there are some words that show up quite often in movie titles that we can search for.The data set isn't very big, but doing all these searches in quick succession should be enough to get a realistic performance baseline.### Running the queriesSimilar to ingesting data, all the search engines take slightly different input for queries, so we'll change our simple string queries into whatever the relevant search engine expects to receive.Here's what that script looks like for Postgres:```js
// Querying search phrases
async function query({ phrase, pool }) {
  const results = await pool.query(sql`
select id, title
from movies
where fts_doc_en @@ websearch_to_tsquery('english', ${phrase}::text)
`)
  const ids = results.rows.map((r) => r.id)
  return ids
}
```No surprises there, just like the Postgres FTS example from earlier.## Automating our toilOnce we've combined our scripting magic with some `Makefile` incantations, running the queries produces output like this:```
TIMING=true FTS_ENGINE=pg make query
[info] importing driver from [./pg.mjs]
[info] finished initializing driver [pg]
[info] processing lines in [/path/to/pg-fts-benchmark/search-phrases.ndjson.json]...
[timing] phrase [super hero]: returned [34] results in 3.132471.ms
[timing] phrase [superhero]: returned [86] results in 1.180798.ms
[timing] phrase [superman]: returned [47] results in 0.912615.ms
[timing] phrase [suprman]: returned [0] results in 0.781712.ms
[timing] phrase [love]: returned [5417] results in 19.088668.ms
[timing] phrase [world war]: returned [834] results in 2.902097.ms
[timing] phrase [spy]: returned [349] results in 1.497892.ms
[timing] phrase [romance]: returned [630] results in 1.91661.ms
[timing] phrase [comedy]: returned [1213] results in 2.534538.ms
[timing] phrase [awakening]: returned [210] results in 2.443748.ms
[info] successfully processed [10] lines
```Right off the bat we can see that Postgres is plenty quick! Some results are quite obvious - as you might expect, "love" is *quite* a common search result.## ResultsWith the `ingest` and `query` machinery defined for the search engines, it's easy to get some results.## What does the data tell us?There's a *lot* of tuning left undone here, but a few points stand out right away:* Even when consuming similar content, engines can produce different results, but generally ratios between queries on the same engine should be consistent.
* Postgres FTS is quite close performance-wise to many other solutions, at least in their default configuration.
* Only Typesense and MeiliSearch properly handled mis-spellings (the "suprman" query).
* Typesense was relatively strict with matches compared to other engines.
* OpenSearch was *very* fast with ingest, but the default configuration doesn't index misspellings.
* In-memory SQLite is by far the fastest, and PG isn't too far behind for this small data set.## Feedback from the other productsSince this is a benchmark, we felt it was fair to give the other products an opportunity to give their feedback.### Meilisearch> I agree that using PG in simple use cases is an excellent way to be pragmatic.
> Unfortunately, PG will never be able to offer you a search experience like the one you could have with Meilisearch. Meilisearch has perfect management of typos and searches by prefix, allowing a search at each keystroke. Enhanced relevancy with rules such as the number of words present, the importance of the attribute, the proximity of the terms searched in the document, and custom parameters. The ability to mix textual, geographic, and facet searches. Meilisearch automatically understands almost all languages with the possibility of having synonyms and stop words. Meilisearch has an incredible performance even on large volumes of data, and no backend proxy is needed, thanks to complete API Key management.*Quentin de Quelen, Meilisearch CEO*### Typesense> Some more context around Typesense's behavior: how loose or exact matches are done are configurable in Typesense. For eg, you can configure the number of typos that are accounted for (num\_typos), and when typo tolerance should kick-in, only if a configurable amount of results are not found (typo\_tokens\_threshold). For multi keyword searches, you can configure Typesense to expand the search by dropping some of the keywords, until at least a configurable number of search results are found (drop\_tokens\_threshold).
> We've chosen specifically to keep the search tight because we received feedback from users that some results were confusing if we let the defaults be too loose.*Jason Bosco, Typesense CEO*## Wrap-upWhile this certainly isn't "big data", it looks like Postgres has very much held it's own, even without too much wrangling/tuning. Postgres is clearly more than capable of delivering *good enough* search for this very basic case.Special purpose systems are expected to out-perform a more general system like Postgres, but clearly result quality and search speed provided Postgres is likely to be *good enough* for many use cases, and is “complexity neutral” -- no new systems needed!## More Postgres resources* [Postgres WASM by Snaplet and Supabase](https://supabase.com/blog/postgres-wasm)
* [Choosing a Postgres Primary Key](https://supabase.com/blog/choosing-a-postgres-primary-key)
* [Implementing "seen by" functionality with Postgres](https://supabase.com/blog/seen-by-in-postgresql)
* [Partial data dumps using Postgres Row Level Security](https://supabase.com/blog/partial-postgresql-data-dumps-with-rls)
* [Postgres Views](https://supabase.com/blog/postgresql-views)
* [Realtime Postgres RLS on Supabase](https://supabase.com/blog/realtime-row-level-security-in-postgresql)
