1. Introduction
Effective information retrieval is one of the most crucial functionalities for content-driven websites like news sites and e-commerce platforms, where users expect fast and relevant search results. Facing that challenge, developers often consider specialized search services or AI-generated embeddings for semantic search. While these solutions are powerful, they add complexity, cost, and maintenance overhead.
Before adopting external systems, it’s worth exploring the capabilities of your existing database. If you’re using PostgreSQL as your database, the built-in Full Text Search (FTS) functionality may offer a reasonable alternative. By leveraging this feature, you can build search functionality that is sufficient for your needs directly within your database, minimizing integration challenges and infrastructure costs.
In this article, we’ll dive into PostgreSQL’s Full Text Search feature, explore its core concepts, capabilities, and limitations in practice. By following along with practical examples, you’ll gain an overview of how to implement and optimize FTS in your own applications.
Why Choose FTS Over Pattern Matching?
Basic pattern matching with `LIKE` might work in many cases, but it has its limitations. Full Text Search provides several advantages over it, including:
- Language support: FTS considers derived words (e.g. ‘search’ and ‘searching’) and can be customized with a synonym dictionary if needed.
- Relevance-based sorting: FTS allows sorting results by relevance, showing the most relevant matches first.
- Performance: FTS is more efficient because it can be backed by indexes.
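To make the first point concrete, here is a minimal sketch (runnable in `psql`) of how stemming lets FTS match a derived word that a literal `LIKE` pattern misses:

```sql
-- LIKE needs a literal substring, so 'queries' does not match 'Querying':
SELECT 'Querying the database' LIKE '%queries%';   -- false

-- FTS stems both sides to the lexeme 'queri', so they match:
SELECT to_tsvector('english', 'Querying the database')
       @@ plainto_tsquery('english', 'queries');   -- true
```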
By the end of this article, you will understand the core concepts of Full Text Search in PostgreSQL and be able to apply them in practice by building a tailored search system for your application’s needs.
2. Follow Along: Setting Up Your Own Database
If you’d like to follow along with the examples in this article, they are easy to adapt to your own database setup. However, if you’d like to replicate the exact setup I used, I’m working with the Goodreads.com books dataset, available here. This dataset is quite large, containing approximately 2.3 million books, making it ideal for showcasing the real benefits of performance optimization using indexes.
Setting Up a PostgreSQL Database
If you don’t already have a database, you can easily spin one up using Docker:
docker run --name postgres -e POSTGRES_PASSWORD=password -d -p 5432:5432 postgres:16
docker exec -it postgres psql -U postgres
Creating the Table
CREATE TABLE books (
id SERIAL PRIMARY KEY,
isbn TEXT,
title TEXT,
description TEXT,
year INTEGER,
rating FLOAT,
rating_count INTEGER,
pages INTEGER,
authors INTEGER []
);
Importing the Data
To populate the database, we’ll use a script to process the Goodreads dataset (or your chosen dataset). The script reads a JSON file line by line, parses the data, and inserts it into PostgreSQL in batches for better performance. It’s written in JavaScript, but feel free to adapt it to any programming language you’re comfortable with.
import fs from "node:fs";
import readline from "node:readline";
import postgres from "postgres";
const FILE_PATH = `${process.env.HOME}/Downloads/goodreads_books.json`; // Replace with your JSON file's location
const POSTGRES_CONNECTION_STRING =
"postgres://postgres:password@localhost:5432/postgres"; // Replace with your database connection string
const BATCH_SIZE = 5_000;
const rl = readline.createInterface({
input: fs.createReadStream(FILE_PATH, { encoding: "utf-8" }),
});
const sql = postgres(POSTGRES_CONNECTION_STRING);
let rows = [];
for await (const line of rl) {
try {
const record = JSON.parse(line);
rows.push({
isbn: record.isbn || null,
title: record.title || null,
description: record.description || null,
year: record.publication_year || null,
rating: record.average_rating || null,
rating_count: record.ratings_count || null,
pages: record.num_pages ? parseInt(record.num_pages) : null,
authors: record.authors?.map((a) => parseInt(a.author_id)) ?? null,
});
if (rows.length >= BATCH_SIZE) {
await sql`INSERT INTO books ${sql(rows)}`;
rows = [];
}
} catch (e) {
console.error(e);
}
}
if (rows.length > 0) {
await sql`INSERT INTO books ${sql(rows)}`;
}
rl.close();
await sql.end();
Verifying the Import
Once the script finishes, you can verify that everything worked as expected:
SELECT COUNT(*) FROM books;
count
---------
2360655
This setup provides you with a good test environment for exploring PostgreSQL’s Full Text Search functionality and can demonstrate how these features perform at a scale that mirrors the size of a real-world application.
3. Key Concepts of Full Text Search
To use Full Text Search (FTS) effectively—and to understand its capabilities and limitations—it’s helpful to first familiarize yourself with its basic principles.
Basic terms
- Document: A unit of text to be searched. This can be a single column or a combination of multiple columns (e.g., title and description).
- Token: A single unit of text, such as a word or a number.
- Stop Words: Common words like `the` or `and` that occur so frequently and carry so little semantic weight that they are ignored. The full list for the `english` configuration can be found in the `$SHAREDIR/tsearch_data/english.stop` file and adjusted if needed.
- Lexemes: Tokens normalized to their base form through a process called stemming. For example, words like `query`, `queries`, and `querying` are reduced to `queri`. Stemming ensures that variations of a word match the same search query, improving relevance.
- Dictionary: A program that normalizes, filters, or transforms tokens. Some dictionaries stop processing after handling a token, while others (filtering dictionaries) pass it along for further processing. The order of dictionaries matters: they should be arranged from the most specific to the most general.
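If you want to watch this pipeline in action, PostgreSQL’s built-in `ts_debug` function shows, token by token, which parser token type was produced and which lexemes (if any) the dictionaries emitted:

```sql
SELECT alias, token, lexemes
FROM ts_debug('english', 'The quick queries');
-- 'The' maps to an empty lexeme list (it is a stop word),
-- while 'quick' and 'queries' become the lexemes 'quick' and 'queri'.
```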
tsvector
`tsvector` is a PostgreSQL data type made for storing preprocessed text. It’s optimized for search and can be indexed. To create a `tsvector`, we will use the `to_tsvector` function. Its only required argument is the text document to process, but you can also specify a configuration. The default configuration can be checked like this:
SHOW default_text_search_config;
default_text_search_config
----------------------------
pg_catalog.english
Here’s an example using the first sentence from the FTS documentation:
SELECT to_tsvector('english', 'Full Text Searching (or just text search) provides the capability to identify natural-language documents that satisfy a query, and optionally to sort them by relevance to the query.');
to_tsvector
-----------------------------------------------------------------------------------------------------------------------
'capabl':10 'document':16 'full':1 'identifi':12 'languag':15 'natur':14 'natural-languag':13 'option':22 'provid':8 'queri':20,30 'relev':27 'satisfi':18 'search':3,7 'sort':24 'text':2,6
What’s happening here?
- Stop words like `the`, `of`, and `that` are ignored.
- Punctuation signs are also ignored.
- Words are reduced to their root forms (lexemes). For example, both `searching` and `search` are indexed as `search`.
- `natural-language` is indexed as `natur`, `languag`, and `natural-languag`. This happens because the parser generates tokens for both the full hyphenated word and each of its individual components. This behavior is particularly useful in searches where users might query only part of a compound word.
There’s also a `simple` configuration, which doesn’t remove stop words or perform stemming. For example:
SELECT to_tsvector('simple', 'Harry Potter and the Prisoner of Azkaban');
to_tsvector
----------------------------------------------------------------------
'and':3 'azkaban':7 'harry':1 'of':6 'potter':2 'prisoner':5 'the':4
This configuration can be useful in specific scenarios: when the document’s language is unknown, when the text contains many significant stop words, or when there are words that we want to keep in their original form. If you use the `simple` configuration to index the documents, you should apply the same configuration when performing searches, in order to maintain consistency in how the text is processed.
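A quick sketch of why that consistency matters: if documents were indexed with `simple` but searched with `english`, the query side gets stemmed while the document side does not, and otherwise identical words fail to match:

```sql
-- 'simple' keeps 'running' as-is; 'english' stems the query to 'run':
SELECT to_tsvector('simple', 'running') @@ to_tsquery('english', 'running');  -- false

-- With matching configurations, the same word matches as expected:
SELECT to_tsvector('simple', 'running') @@ to_tsquery('simple', 'running');   -- true
```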
To sum up, a `tsvector` stores an alphabetically sorted list of lexemes, along with their positions and optionally their weights (we’ll talk about weights later).
tsquery
The second key building block of FTS in PostgreSQL is `tsquery`. It stores the lexemes that will be used for searching. Similar to `tsvector`, there’s a built-in function called `to_tsquery` to create it. Let’s take a look at the following example:
SELECT to_tsquery('full text search');
ERROR:  syntax error in tsquery: "full text search"
This happens because `to_tsquery` requires logical operators like `&` (AND) and `|` (OR) between lexemes. If we omit them, PostgreSQL throws a syntax error. To fix it, we will explicitly include the operators:
SELECT to_tsquery('full & text & search & (postgres | pg) & !mysql');
to_tsquery
-------------------------------------------------------------
'full' & 'text' & 'search' & ( 'postgr' | 'pg' ) & !'mysql'
Thankfully, PostgreSQL offers additional functions beyond the strict `to_tsquery` that are much easier to work with, especially in certain scenarios:
- `plainto_tsquery`: Parses and normalizes text, just like `to_tsquery`, but automatically connects the resulting lexemes with the AND (`&`) operator.
- `phraseto_tsquery`: Similar to `plainto_tsquery`, but instead of using AND, it links lexemes with the FOLLOWED BY (`<->`) operator.
- `websearch_to_tsquery`: The most user-friendly option when building search features for end users. It’s forgiving with input and supports syntax familiar from web search engines. For example, `OR` is converted to `|` and `-` is converted to `!`. If you use quotes (`"`), the lexemes are connected with FOLLOWED BY (`<->`), and if you provide input without quotes, they’re linked with AND (`&`).
| Function | Usage | Example Input | Example Output |
|---|---|---|---|
| `to_tsquery` | Requires strict syntax | `'full & text & search'` | `'full' & 'text' & 'search'` |
| `plainto_tsquery` | Converts input using AND (`&`) | `'full text search'` | `'full' & 'text' & 'search'` |
| `phraseto_tsquery` | Links lexemes with FOLLOWED BY (`<->`) | `'full text search'` | `'full' <-> 'text' <-> 'search'` |
| `websearch_to_tsquery` | Allows user-friendly syntax like web search engines | `'"full text" -mysql'` | `'full' <-> 'text' & !'mysql'` |
`@@` – match operator
The `@@` operator checks whether a document (`tsvector`) matches a search query (`tsquery`). We can combine the knowledge from the previous sections and see it in action using the queries we’ve written earlier:
SELECT to_tsvector('Full Text Searching (or just text search) provides the capability to identify natural-language documents that satisfy a query, and optionally to sort them by relevance to the query.') @@ to_tsquery('full & text & search');
?column?
----------
t
The result above means that all the relevant lexemes from the query are present in the document’s vector.
SELECT to_tsvector('Full Text Searching (or just text search) provides the capability to identify natural-language documents that satisfy a query, and optionally to sort them by relevance to the query.') @@ to_tsquery('full & text & search & (postgres | pg)');
?column?
----------
f

This time the result is false: the document contains neither of the lexemes `postgr` and `pg`, so the `(postgres | pg)` part of the query cannot be satisfied.
Putting It All Together
Using the information we’ve gathered so far, we can now start performing operations on our books database.
SELECT COUNT(*)
FROM books
WHERE to_tsvector('english', coalesce(title, '') || ' ' || coalesce(description, '')) @@ plainto_tsquery('english', 'Harry Potter');
count
-------
2874
In this example, we are searching for the phrase “Harry Potter” in both the title and description columns of the books table. The query uses `to_tsvector` to convert the concatenated text from both columns into a `tsvector`, and then matches it against a `tsquery` created using the `plainto_tsquery` function. The `coalesce` function ensures that `NULL` values in `title` or `description` are replaced with an empty string (''). This prevents the concatenation from producing `NULL`, which would exclude those rows from the search.
The result shows that there are 2,874 books that contain the phrase “Harry Potter” in their `title` or `description`.
4. Indexing for Better Performance
One of the main strengths of PostgreSQL Full Text Search is its ability to efficiently handle large amounts of data. However, Full Text Search operations can be computationally intensive. To maximize performance, indexing is crucial.
If you tried running the query from the previous example, you might have noticed that it can take over a minute to complete, depending on your hardware, which is unacceptable for most applications. The reason can be identified using the `EXPLAIN` command:
postgres=# EXPLAIN ANALYZE SELECT count(*) FROM books WHERE to_tsvector('english', coalesce(title, '') || ' ' || coalesce(description, '')) @@ plainto_tsquery('english', 'Harry Potter');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=490138.74..490138.75 rows=1 width=8) (actual time=64852.525..64860.303 rows=1 loops=1)
-> Gather (cost=490138.52..490138.73 rows=2 width=8) (actual time=64851.653..64860.285 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=489138.52..489138.53 rows=1 width=8) (actual time=64825.494..64825.495 rows=1 loops=3)
-> Parallel Seq Scan on books (cost=0.00..489138.46 rows=25 width=0) (actual time=53.240..64824.575 rows=958 loops=3)
Filter: (to_tsvector('english'::regconfig, ((COALESCE(title, ''::text) || ' '::text) || COALESCE(description, ''::text))) @@ '''harri'' & ''potter'''::tsquery)
Rows Removed by Filter: 785927
Planning Time: 0.295 ms
Execution Time: 64863.263 ms
The `EXPLAIN` command helps analyze query performance by showing the query execution plan. In this case, `Parallel Seq Scan` shows that PostgreSQL performs a full table scan: it checks every row, generates a `tsvector` for it, and evaluates its match against the `tsquery`, which is slow for large datasets. Indexing solves this by allowing PostgreSQL to quickly identify rows containing the searched terms, drastically reducing query time.
Using GIN Indexes
PostgreSQL recommends using a GIN (Generalized Inverted Index) for Full Text Search. A GIN index stores a list of all lexemes across all documents, along with their corresponding row locations. For example, if “Harry” is a search term, PostgreSQL can quickly identify rows containing this word using the index, avoiding a full table scan.
Assuming we want to search both the `title` and `description` columns, here are some possible indexing strategies:
- Direct Indexing on the Search Expression
CREATE INDEX search_vector_idx ON books USING GIN (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(description, '')));
- Generated Column with Index
ALTER TABLE books ADD COLUMN search_vector tsvector GENERATED ALWAYS AS (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(description, ''))) STORED;
CREATE INDEX search_vector_idx ON books USING GIN (search_vector);
- Storing Documents in a Separate Column
In our case, this approach doesn’t make much sense because we are directly using the unmodified title and description columns. However, there are scenarios when creating the document column from other fields with SQL expressions may be challenging, such as when data requires parsing (e.g. stripping HTML tags) or involves complex structures like JSON. In such cases, you can generate the document column in your application code and store it separately:
ALTER TABLE books ADD COLUMN search_document TEXT;
CREATE INDEX search_vector_idx ON books USING GIN (to_tsvector('english', search_document));
Verifying Index
All of these methods will speed up our search; for the example above, the query will now feel almost instant. Using the `EXPLAIN` command again shows how the index improves performance:
postgres=# EXPLAIN ANALYZE SELECT count(*) FROM books WHERE to_tsvector('english', coalesce(title, '') || ' ' || coalesce(description, '')) @@ plainto_tsquery('english', 'harry potter');
                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=297.01..297.02 rows=1 width=8) (actual time=45.266..45.269 rows=1 loops=1)
   ->  Bitmap Heap Scan on books  (cost=47.41..296.87 rows=59 width=0) (actual time=2.987..44.881 rows=2874 loops=1)
         Recheck Cond: (to_tsvector('english'::regconfig, ((COALESCE(title, ''::text) || ' '::text) || COALESCE(description, ''::text))) @@ '''harri'' & ''potter'''::tsquery)
         Heap Blocks: exact=2790
         ->  Bitmap Index Scan on search_vector_idx  (cost=0.00..47.40 rows=59 width=0) (actual time=2.214..2.215 rows=2874 loops=1)
               Index Cond: (to_tsvector('english'::regconfig, ((COALESCE(title, ''::text) || ' '::text) || COALESCE(description, ''::text))) @@ '''harri'' & ''potter'''::tsquery)
 Planning Time: 13.629 ms
 Execution Time: 46.008 ms
We will use the generated column `search_vector` (second method) to make the next queries more readable.
5. Ordering Results by Relevance
Ranking is a highly useful feature of Full Text Search that allows us to sort search results based on their relevance to a given query.
Basic Ranking
PostgreSQL provides two built-in functions to rank results based on how well a document (`tsvector`) matches a query:
- `ts_rank`: Computes ranking based on the frequency of matching lexemes.
- `ts_rank_cd`: Extends `ts_rank` by also considering cover density, which accounts for the proximity of matching lexemes to one another.
Both functions take a `tsvector` and a `tsquery` as their primary arguments.
Example: Sorting Books by Relevance with `ts_rank`
SELECT
title,
ts_rank(
search_vector,
plainto_tsquery('english', 'Harry Potter')
) as rank
FROM books
WHERE search_vector @@ plainto_tsquery('english', 'Harry Potter')
ORDER BY rank DESC;
This query returns books ranked by relevance to the search query using the combined title and description fields. Results are sorted in order of the computed rank.
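To illustrate the cover-density idea, the following sketch (a standalone illustration, not drawn from the book dataset) compares `ts_rank_cd` scores for the same two lexemes placed next to each other versus spread apart:

```sql
SELECT
  ts_rank_cd(to_tsvector('english', 'Harry Potter'),
             plainto_tsquery('english', 'Harry Potter')) AS adjacent,
  ts_rank_cd(to_tsvector('english', 'Harry met a man called Potter'),
             plainto_tsquery('english', 'Harry Potter')) AS far_apart;
-- 'adjacent' scores higher, because its matching lexemes form a tight cover.
```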
Considering Document Length
When calculating rankings, the document’s size can also be considered. Both `ts_rank` and `ts_rank_cd` accept an optional third argument, the normalization option, which influences how document length affects ranking.
Examples of normalization values:
- 0 (default): Ignores document length.
- 2: Divides the rank by the document’s length.
- 8: Divides the rank by the number of unique words in the document.
For all possible values, refer to the PostgreSQL documentation.
Example: Using Normalization with ts_rank
SELECT
title,
ts_rank(
search_vector,
plainto_tsquery('english', 'Harry Potter'),
2
) as rank
FROM books
WHERE search_vector @@ plainto_tsquery('english', 'Harry Potter')
ORDER BY rank DESC;
This query adjusts rankings by dividing them by document length, making shorter documents more likely to rank higher if they match well. This is not very useful in our case, as we are dealing with relatively short documents (book titles and descriptions), but it is definitely more applicable in systems containing very long documents, such as news articles, research papers, or entire books, where document length varies significantly and could impact relevance.
Customizing Ranking with setweight
The `setweight` function allows assigning weights to lexemes. For example, a search term appearing in the `title` might be more important than one in the `description`. PostgreSQL provides four weights:
- A: 1.0 (Highest priority)
- B: 0.4
- C: 0.2
- D: 0.1 (Lowest priority)
Example: Applying `setweight` to a `tsvector`
SELECT setweight(to_tsvector('Harry Potter'), 'A') || setweight(to_tsvector('and The Philosopher Stone'), 'B');
                     ?column?
--------------------------------------------------
 'harri':1A 'philosoph':5B 'potter':2A 'stone':6B
Example: Applying `setweight` while querying
SELECT
*,
ts_rank(
setweight(to_tsvector('english', coalesce(title, '')), 'A') || setweight(to_tsvector('english', coalesce(description, '')), 'D'),
plainto_tsquery('english', 'Harry Potter')
) as rank
FROM books
WHERE search_vector @@ plainto_tsquery('english', 'Harry Potter')
ORDER BY rank DESC;
It is worth mentioning that the weights have no effect on the `@@` operator, which is why we do not use them in the `WHERE` clause; this way, PostgreSQL is able to take advantage of the previously created index for filtering. Additionally, to combine two vectors with different weights, we used the `||` operator, which works similarly to string concatenation.
In the example above, any match directly in the `title` will have 10 times more impact on the ranking than a match in the `description`.
Custom Modifiers for Ranking
Finally, it is worth noting that the value returned by `ts_rank` is simply a float, so we can modify it in any way we like, for example by multiplying it or adding another number to it. Each application, depending on its specific needs, may require the consideration of different factors when prioritizing some search results over others. News websites may want to display the latest news matching the search query first, while e-commerce websites may take into account sales or current advertising campaigns.
If you run the query from the previous section, you may notice that the results are not exactly what you might expect:
SELECT
title,
ts_rank(
setweight(to_tsvector('english', coalesce(title, '')), 'A') || setweight(to_tsvector('english', coalesce(description, '')), 'D'),
plainto_tsquery('english', 'Harry Potter')
) as rank
FROM books
WHERE search_vector @@ plainto_tsquery('english', 'Harry Potter')
ORDER BY rank DESC
LIMIT 10;
title | rank
--------------------------------------------------------------------------------------------------------+------
Harry Potter and the Chamber of Secrets - Harry Potter dan Kamar Rahasia (Harry Potter, #2) | 1
Harry Potter and the Goblet of Fire - Harry Potter dan Piala Api (Harry Potter, #4) | 1
Harry Potter and the Philosopher's Stone - Harry Potter dan Batu Bertuah (Harry Potter, #1) | 1
Harry Potter and the Order of the Phoenix - Harry Potter dan Orde Phoenix (Harry Potter, #5) | 1
Articles on Harry Potter Films, Including: Harry Potter and the Philosopher's Stone (Film), Harry P... | 1
Harry Potter and the Prisoner of Azkaban - Harry Potter dan Tawanan Azkaban (Harry Potter, #3) | 1
Harry Potter: Harry Potter a L'Ecole Des Sorciers / Harry Potter Et Le Chambre Des Secrets / Harry ... | 1
The Potterhead's Unofficial Harry Potter Cookbook: The Best Recipes from Harry Potter - Harry Potte... | 1
Harry Potter and The Half-Blood Prince - Harry Potter dan Pangeran Berdarah-Campuran (Harry Potter,... | 1
Harry Potter and the Deathly Hallows - Harry Potter dan Relikui Kematian (Harry Potter, #7) | 1
(10 rows)
In the case of our database, it seems like a good idea to consider both the popularity (reflected by the number of ratings) and the average rating of a book. There is no one-size-fits-all approach to determining such a relationship. It is best to determine it through trial and error, considering our dataset and, if available, user behavior data. A good starting point for tuning could be to consider the rating relative to the maximum possible rating and the number of ratings relative to the maximum number of ratings in our dataset.
SELECT
title,
rating_count,
rating,
ts_rank(
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(description, '')), 'D'),
plainto_tsquery('english', 'Harry Potter')
)
* rating / 5
* log(rating_count + 1) / log(4899965)
as total_rank
FROM books
WHERE search_vector @@ plainto_tsquery('english', 'Harry Potter')
ORDER BY total_rank DESC NULLS LAST
LIMIT 10;
title | rating_count | rating | total_rank
--------------------------------------------------------------+--------------+--------+--------------------
Harry Potter and the Sorcerer's Stone (Harry Potter, #1) | 4765497 | 4.45 | 0.8883911497250822
Harry Potter and the Deathly Hallows (Harry Potter, #7) | 1784684 | 4.62 | 0.8634184949841442
Harry Potter and the Prisoner of Azkaban (Harry Potter, #3) | 1876252 | 4.53 | 0.8495411470439224
Harry Potter and the Goblet of Fire (Harry Potter, #4) | 1792561 | 4.53 | 0.8468575628943454
Harry Potter and the Half-Blood Prince (Harry Potter, #6) | 1713866 | 4.54 | 0.8460755511189171
Harry Potter and the Order of the Phoenix (Harry Potter, #5) | 1766895 | 4.47 | 0.8348022743499233
Harry Potter and the Chamber of Secrets (Harry Potter, #2) | 1821802 | 4.38 | 0.8197347807858469
Harry Potter Boxset (Harry Potter, #1-7) | 193057 | 4.74 | 0.7489816444604352
The Harry Potter Collection 1-4 (Harry Potter, #1-4) | 44587 | 4.66 | 0.6476747171319697
Harry Potter Boxed Set, Books 1-5 (Harry Potter, #1-5) | 34349 | 4.77 | 0.6468081396747123
(10 rows)
We made the following modifications:
- `ts_rank(...)`: Base relevance score from the search query.
- `rating / 5`: Normalizes the book rating (out of 5) to a 0–1 scale.
- `log(rating_count + 1) / log(4899965)`: Adjusts the score based on popularity, comparing the number of ratings to the maximum in the dataset (4899965). In a real application, this value should be fetched more dynamically. Logarithmic scaling flattens extreme values so outliers don’t dominate the results, and adding 1 to `rating_count` handles the case where the value is 0.
By combining these factors, the query prioritizes popular and highly rated books that match the search query. This is just a suggestion that should be tested against many other queries and adjusted if necessary. If it turns out that one of the components (e.g., book popularity) outweighs the others, it is also possible to apply weights, for example giving 50% influence to `ts_rank` and 25% each to `rating` and `rating_count`.
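As a sketch of that weighting idea (the 50/25/25 split and the hard-coded maximum rating count are illustrative assumptions, not tuned values):

```sql
SELECT
  title,
  -- 50% text relevance, 25% average rating, 25% popularity:
  0.50 * ts_rank(
           setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
           setweight(to_tsvector('english', coalesce(description, '')), 'D'),
           plainto_tsquery('english', 'Harry Potter'))
+ 0.25 * (rating / 5)
+ 0.25 * (log(rating_count + 1) / log(4899965)) AS total_rank
FROM books
WHERE search_vector @@ plainto_tsquery('english', 'Harry Potter')
ORDER BY total_rank DESC NULLS LAST
LIMIT 10;
```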
Ranking Performance
Ranking search results with `ts_rank` and `ts_rank_cd` is computationally expensive because scores are calculated dynamically for every matching row. For queries that match many rows (e.g., ‘world’ or ‘life’ match several thousand results), this can lead to slow performance, especially in cloud-hosted environments.
To optimize ranking performance:
- Limit Rows Before Ranking: The simplest and most effective approach. Apply additional filters (e.g., exclude low-rated or irrelevant rows) to reduce the number of results PostgreSQL needs to rank. For instance, if our ranking system heavily weights average ratings, we could filter out items with low ratings early on, since it’s unlikely they will appear on the first page of results. Use indexes like GIN to efficiently filter rows before applying ranking calculations.
- Simplify Ranking for Generic Queries: For very generic queries matching many rows, it’s a good idea to fall back to simpler criteria (e.g., just sort by popularity or rating instead of `ts_rank`).
- Cache Frequent Queries: Store results of common searches in an external cache or materialized views to avoid recalculating ranks repeatedly.
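A sketch of the first strategy against the dataset from this article (the `rating_count` threshold is an arbitrary illustration): the indexed `@@` condition and a cheap numeric filter shrink the candidate set before `ts_rank` runs on it:

```sql
SELECT title,
       ts_rank(search_vector, plainto_tsquery('english', 'world')) AS rank
FROM books
WHERE search_vector @@ plainto_tsquery('english', 'world')
  AND rating_count >= 100   -- drop obscure books before ranking them
ORDER BY rank DESC
LIMIT 10;
```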
6. Other Tools
Beyond these foundational features, PostgreSQL offers additional tools to further enhance your search capabilities:
- `ts_headline`: This function highlights fragments matching the query, making it easier to show relevant excerpts in your search results.
- Synonym Dictionaries: Extensions like `xsyn` allow you to include predefined variations of the same word in your search results, improving flexibility and relevance.
- Similarity Matching: Tools like `pg_trgm` can complement FTS in cases where its exact-match behavior falls short.
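For example, `ts_headline` takes a raw document (not a `tsvector`) plus a query, and wraps the matching fragments in configurable markers:

```sql
SELECT ts_headline('english',
         'Full Text Searching provides the capability to identify natural-language documents.',
         plainto_tsquery('english', 'document'),
         'StartSel=<em>, StopSel=</em>');
-- The matching word comes back wrapped: ... <em>documents</em> ...
```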
7. Limitations
While PostgreSQL FTS is powerful and versatile, it has some limitations that are important to consider:
- No Semantic Understanding: FTS matches lexemes but doesn’t recognize that some words can be semantically related (e.g., ‘wizard’ and ‘sorcerer’). Although it’s possible to use a synonym dictionary, it is very difficult to cover all potential cases.
- No Built-In Support for Approximate Matching: Full Text Search requires exact lexeme matches, so queries with typos won’t return any results. To address this, it can be combined with PostgreSQL extensions like `pg_trgm`.
- Scaling Limitations: PostgreSQL FTS is tied to a single database instance, which can make scaling more difficult for large or distributed applications.
- Limited Ranking Features: Custom ranking needs manual setup; there is no built-in support for factors like popularity or recency.
- Multilingual Challenges: Configuring FTS for multiple languages can be complex.
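A sketch of the typo problem and the `pg_trgm` escape hatch (assuming the extension is available in your installation):

```sql
-- FTS finds nothing for a misspelled term:
SELECT to_tsvector('english', 'Harry Potter')
       @@ plainto_tsquery('english', 'Hary Poter');  -- false

-- pg_trgm's trigram similarity still recognizes the strings as close:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
SELECT similarity('Harry Potter', 'Hary Poter');     -- high despite the typos
```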
8. Conclusion
In this article, we explored how PostgreSQL’s Full Text Search (FTS) can be used to build a search system directly within your database. We covered core concepts like `tsvector` and `tsquery`, ranking results with functions like `ts_rank`, and optimizing performance using GIN indexes. Additionally, we demonstrated how to tune relevance and customize ranking to meet specific application needs.
PostgreSQL Full Text Search can be a robust and versatile tool that in some cases can eliminate the need for external search systems. With thoughtful indexing, proper configuration, and adequate relevance tuning, it’s able to deliver a powerful search experience while keeping infrastructure simple and costs low.