1. Introduction
Effective information retrieval is one of the most crucial functionalities for content-driven websites like news sites and e-commerce platforms, where users expect fast and relevant search results. Facing that challenge, developers often consider specialized search services or AI-generated embeddings for semantic search. While these solutions are powerful, they add complexity, cost, and maintenance overhead.
Before adopting external systems, it’s worth exploring the capabilities of your existing database. If you’re using PostgreSQL as your database, the built-in Full Text Search (FTS) functionality may offer a reasonable alternative. By leveraging this feature, you can build search functionality that is sufficient for your needs directly within your database, minimizing integration challenges and infrastructure costs.
In this article, we’ll dive into PostgreSQL’s Full Text Search feature, explore its core concepts, capabilities, and limitations in practice. By following along with practical examples, you’ll gain an overview of how to implement and optimize FTS in your own applications.
Why Choose FTS Over Pattern Matching?
Basic pattern matching with `LIKE` might work in many cases, but it has its limitations. Full Text Search provides several advantages over it, including:
- Language support: FTS considers derived words (e.g. ‘search’ and ‘searching’) and can be customized with a synonym dictionary if needed.
- Relevance-based sorting: FTS allows sorting results by relevance, showing the most relevant matches first.
- Performance: FTS is more efficient because it can be backed by indexes.
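To make the first point concrete, here is a minimal sketch (runnable in `psql`) of how stemming lets FTS match a derived word that a literal `LIKE` pattern misses:

```sql
-- LIKE needs a literal substring, so 'queries' does not match 'Querying':
SELECT 'Querying the database' LIKE '%queries%';   -- false

-- FTS stems both sides to the lexeme 'queri', so they match:
SELECT to_tsvector('english', 'Querying the database')
       @@ plainto_tsquery('english', 'queries');   -- true
```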
By the end of this article, you will understand the core concepts of Full Text Search in PostgreSQL and be able to apply them in practice by building a tailored search system for your application’s needs.
2. Follow Along: Setting Up Your Own Database
If you’d like to follow along with the examples in this article, they are easy to adapt to your own database setup. However, if you’d like to replicate the exact setup I used, I’m working with the Goodreads.com books dataset, available here. This dataset is quite large, containing approximately 2.3 million books, making it ideal for showcasing the real benefits of performance optimization using indexes.
Setting Up a PostgreSQL Database
If you don’t already have a database, you can easily spin one up using Docker:
docker run --name postgres -e POSTGRES_PASSWORD=password -d -p 5432:5432 postgres:16
docker exec -it postgres psql -U postgres
Creating the Table
CREATE TABLE books (
id SERIAL PRIMARY KEY,
isbn TEXT,
title TEXT,
description TEXT,
year INTEGER,
rating FLOAT,
rating_count INTEGER,
pages INTEGER,
authors INTEGER []
);
Importing the Data
To populate the database, we’ll use a script to process the Goodreads dataset (or your chosen dataset). The script reads a JSON file line by line, parses the data, and inserts it into PostgreSQL in batches for better performance. It’s written in JavaScript, but feel free to adapt it to any programming language you’re comfortable with.
import fs from "node:fs";
import readline from "node:readline";
import postgres from "postgres";
const FILE_PATH = `${process.env.HOME}/Downloads/goodreads_books.json`; // Replace with your JSON file's location
const POSTGRES_CONNECTION_STRING =
"postgres://postgres:password@localhost:5432/postgres"; // Replace with your database connection string
const BATCH_SIZE = 5_000;
const rl = readline.createInterface({
input: fs.createReadStream(FILE_PATH, { encoding: "utf-8" }),
});
const sql = postgres(POSTGRES_CONNECTION_STRING);
let rows = [];
for await (const line of rl) {
try {
const record = JSON.parse(line);
rows.push({
isbn: record.isbn || null,
title: record.title || null,
description: record.description || null,
year: record.publication_year || null,
rating: record.average_rating || null,
rating_count: record.ratings_count || null,
pages: record.num_pages ? parseInt(record.num_pages) : null,
authors: record.authors?.map((a) => parseInt(a.author_id)) ?? null,
});
if (rows.length >= BATCH_SIZE) {
await sql`INSERT INTO books ${sql(rows)}`;
rows = [];
}
} catch (e) {
console.error(e);
}
}
if (rows.length > 0) {
await sql`INSERT INTO books ${sql(rows)}`;
}
rl.close();
await sql.end();
Verifying the Import
Once the script finishes, you can verify that everything worked as expected:
SELECT COUNT(*) FROM books;
count
---------
2360655
This setup provides you with a good test environment for exploring PostgreSQL’s Full Text Search functionality and can demonstrate how these features perform at a scale that mirrors the size of a real-world application.
3. Key Concepts of Full Text Search
To use Full Text Search (FTS) effectively—and to understand its capabilities and limitations—it’s helpful to first familiarize yourself with its basic principles.
Basic terms
- Document: A unit of text to be searched. This can be a single column or a combination of multiple columns (e.g., title and description).
- Token: A single unit of text, such as a word or a number.
- Stop Words: Common words like `the` or `and` that occur so frequently and carry so little semantic weight that they are ignored. The full list for the `english` configuration can be found in the `$SHAREDIR/tsearch_data/english.stop` file and adjusted if needed.
- Lexemes: Tokens normalized to their base form through a process called stemming. For example, words like `query`, `queries`, and `querying` are reduced to `queri`. Stemming ensures that variations of a word match the same search query, improving relevance.
- Dictionary: A program that normalizes, filters, or transforms tokens. Some dictionaries stop processing after handling a token, while others (filtering dictionaries) pass it along for further processing. The order of dictionaries matters: they should be arranged from the most specific to the most general.
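If you want to watch this pipeline in action, PostgreSQL’s built-in `ts_debug` function shows, token by token, which parser token type was produced and which lexemes (if any) the dictionaries emitted:

```sql
SELECT alias, token, lexemes
FROM ts_debug('english', 'The quick queries');
-- 'The' maps to an empty lexeme list (it is a stop word),
-- while 'quick' and 'queries' become the lexemes 'quick' and 'queri'.
```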
tsvector
`tsvector` is a PostgreSQL data type made for storing preprocessed text. It’s optimized for search and can be indexed. To create a `tsvector`, we will use the `to_tsvector` function. Its only required argument is the text document to process, but you can also specify a configuration. The default configuration can be checked like this:
SHOW default_text_search_config;
default_text_search_config
----------------------------
pg_catalog.english
Here’s an example using the first sentence from the FTS documentation:
SELECT to_tsvector('english', 'Full Text Searching (or just text search) provides the capability to identify natural-language documents that satisfy a query, and optionally to sort them by relevance to the query.');
to_tsvector
-----------------------------------------------------------------------------------------------------------------------
'capabl':10 'document':16 'full':1 'identifi':12 'languag':15 'natur':14 'natural-languag':13 'option':22 'provid':8 'queri':20,30 'relev':27 'satisfi':18 'search':3,7 'sort':24 'text':2,6
What’s happening here?
- Stop words like `the`, `of`, and `that` are ignored.
- Punctuation signs are also ignored.
- Words are reduced to their root forms (lexemes). For example, both `searching` and `search` are indexed as `search`.
- `natural-language` is indexed as `natur`, `languag`, and `natural-languag`. This happens because the parser generates tokens for both the full hyphenated word and each of its individual components. This behavior is particularly useful in searches where users might query only part of a compound word.
There’s also a `simple` configuration, which doesn’t remove stop words or perform stemming. For example:
SELECT to_tsvector('simple', 'Harry Potter and the Prisoner of Azkaban');
to_tsvector
----------------------------------------------------------------------
'and':3 'azkaban':7 'harry':1 'of':6 'potter':2 'prisoner':5 'the':4
This configuration can be useful in specific scenarios: when the document’s language is unknown, when the text contains many significant stop words, or when there are words that we want to keep in their original form. If you use the `simple` configuration to index the documents, you should apply the same configuration when performing searches, in order to maintain consistency in how the text is processed.
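A quick sketch of why that consistency matters: if documents were indexed with `simple` but searched with `english`, the query side gets stemmed while the document side does not, and otherwise identical words fail to match:

```sql
-- 'simple' keeps 'running' as-is; 'english' stems the query to 'run':
SELECT to_tsvector('simple', 'running') @@ to_tsquery('english', 'running');  -- false

-- With matching configurations, the same word matches as expected:
SELECT to_tsvector('simple', 'running') @@ to_tsquery('simple', 'running');   -- true
```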
To sum up, a `tsvector` stores an alphabetically sorted list of lexemes, along with their positions and optionally their weights (we’ll talk about weights later).
tsquery
The second key building block of FTS in PostgreSQL is `tsquery`. It stores the lexemes that will be used for searching. Similar to `tsvector`, there’s a built-in function called `to_tsquery` to create it. Let’s take a look at the following example:
SELECT to_tsquery('full text search');
ERROR:  syntax error in tsquery: "full text search"
This happens because `to_tsquery` requires logical operators like `&` (AND) and `|` (OR) between lexemes. If we omit them, PostgreSQL throws a syntax error. To fix it, we will explicitly include the operators:
SELECT to_tsquery('full & text & search & (postgres | pg) & !mysql');
to_tsquery
-------------------------------------------------------------
'full' & 'text' & 'search' & ( 'postgr' | 'pg' ) & !'mysql'
Thankfully, PostgreSQL offers additional functions beyond the strict `to_tsquery` that are much easier to work with, especially in certain scenarios:
- `plainto_tsquery`: Parses and normalizes text, just like `to_tsquery`, but automatically connects the resulting lexemes with the AND (`&`) operator.
- `phraseto_tsquery`: Similar to `plainto_tsquery`, but instead of using AND, it links lexemes with the FOLLOWED BY (`<->`) operator.
- `websearch_to_tsquery`: The most user-friendly option when building search features for end users. It’s forgiving with input and supports syntax familiar from web search engines. For example, `OR` is converted to `|` and `-` is converted to `!`. If you use quotes (`"`), the lexemes are connected with FOLLOWED BY (`<->`), and if you provide input without quotes, they’re linked with AND (`&`).
| Function | Usage | Example Input | Example Output |
|---|---|---|---|
| `to_tsquery` | Requires strict syntax | `'full & text & search'` | `'full' & 'text' & 'search'` |
| `plainto_tsquery` | Converts input using AND (`&`) | `'full text search'` | `'full' & 'text' & 'search'` |
| `phraseto_tsquery` | Links lexemes with FOLLOWED BY (`<->`) | `'full text search'` | `'full' <-> 'text' <-> 'search'` |
| `websearch_to_tsquery` | Allows user-friendly syntax like web search engines | `'"full text" -mysql'` | `'full' <-> 'text' & !'mysql'` |
`@@` – match operator
The `@@` operator checks whether a document (`tsvector`) matches a search query (`tsquery`). We can combine the knowledge from the previous sections and see it in action using the queries we’ve written earlier:
SELECT to_tsvector('Full Text Searching (or just text search) provides the capability to identify natural-language documents that satisfy a query, and optionally to sort them by relevance to the query.') @@ to_tsquery('full & text & search');
?column?
----------
t
The result above means that all the relevant lexemes from the query are present in the document’s vector.
SELECT to_tsvector('Full Text Searching (or just text search) provides the capability to identify natural-language documents that satisfy a query, and optionally to sort them by relevance to the query.') @@ to_tsquery('full & text & search & (postgres | pg)');
?column?
----------
f

This time the result is false: the document contains neither of the lexemes `postgr` and `pg`, so the `(postgres | pg)` part of the query cannot be satisfied.
Putting It All Together
Using the information we’ve gathered so far, we can now start performing operations on our books database.
SELECT COUNT(*)
FROM books
WHERE to_tsvector('english', coalesce(title, '') || ' ' || coalesce(description, '')) @@ plainto_tsquery('english', 'Harry Potter');
count
-------
2874
In this example, we are searching for the phrase “Harry Potter” in both the title and description columns of the books table. The query uses `to_tsvector` to convert the concatenated text from both columns into a `tsvector`, and then matches it against a `tsquery` created using the `plainto_tsquery` function. The `coalesce` function ensures that `NULL` values in `title` or `description` are replaced with an empty string (''). This prevents the concatenation from producing `NULL`, which would exclude those rows from the search.
The result shows that there are 2,874 books that contain the phrase “Harry Potter” in their `title` or `description`.
4. Indexing for Better Performance
One of the main strengths of PostgreSQL Full Text Search is its ability to efficiently handle large amounts of data. However, Full Text Search operations can be computationally intensive. To maximize performance, indexing is crucial.
If you tried running the query from the previous example, you might have noticed that it can take over a minute to complete, depending on your hardware, which is unacceptable for most applications. The reason can be identified using the `EXPLAIN` command:
postgres=# EXPLAIN ANALYZE SELECT count(*) FROM books WHERE to_tsvector('english', coalesce(title, '') || ' ' || coalesce(description, '')) @@ plainto_tsquery('english', 'Harry Potter');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=490138.74..490138.75 rows=1 width=8) (actual time=64852.525..64860.303 rows=1 loops=1)
-> Gather (cost=490138.52..490138.73 rows=2 width=8) (actual time=64851.653..64860.285 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=489138.52..489138.53 rows=1 width=8) (actual time=64825.494..64825.495 rows=1 loops=3)
-> Parallel Seq Scan on books (cost=0.00..489138.46 rows=25 width=0) (actual time=53.240..64824.575 rows=958 loops=3)
Filter: (to_tsvector('english'::regconfig, ((COALESCE(title, ''::text) || ' '::text) || COALESCE(description, ''::text))) @@ '''harri'' & ''potter'''::tsquery)
Rows Removed by Filter: 785927
Planning Time: 0.295 ms
Execution Time: 64863.263 ms
The `EXPLAIN` command helps analyze query performance by showing the query execution plan. In this case, `Parallel Seq Scan` shows that PostgreSQL performs a full table scan: it checks every row, generates a `tsvector` for it, and evaluates its match against the `tsquery`, which is slow for large datasets. Indexing solves this by allowing PostgreSQL to quickly identify rows containing the searched terms, drastically reducing query time.
Using GIN Indexes
PostgreSQL recommends using a GIN (Generalized Inverted Index) for Full Text Search. A GIN index stores a list of all lexemes across all documents, along with their corresponding row locations. For example, if “Harry” is a search term, PostgreSQL can quickly identify rows containing this word using the index, avoiding a full table scan.
Assuming we want to search both the `title` and `description` columns, here are some possible indexing strategies:
- Direct Indexing on the Search Expression
CREATE INDEX search_vector_idx ON books USING GIN (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(description, '')));
- Generated Column with Index
ALTER TABLE books ADD COLUMN search_vector tsvector GENERATED ALWAYS AS (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(description, ''))) STORED;
CREATE INDEX search_vector_idx ON books USING GIN (search_vector);
- Storing Documents in a Separate Column
In our case, this approach doesn’t make much sense because we are directly using the unmodified title and description columns. However, there are scenarios when creating the document column from other fields with SQL expressions may be challenging, such as when data requires parsing (e.g. stripping HTML tags) or involves complex structures like JSON. In such cases, you can generate the document column in your application code and store it separately:
ALTER TABLE books ADD COLUMN search_document TEXT;
CREATE INDEX search_vector_idx ON books USING GIN (to_tsvector('english', search_document));
Verifying Index
All of these methods will speed up our search; for the example above, the query will now feel almost instant. Using the `EXPLAIN` command again shows how the index improves performance:
postgres=# EXPLAIN ANALYZE SELECT count(*) FROM books WHERE to_tsvector('english', coalesce(title, '') || ' ' || coalesce(description, '')) @@ plainto_tsquery('english', 'harry potter');
                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=297.01..297.02 rows=1 width=8) (actual time=45.266..45.269 rows=1 loops=1)
   ->  Bitmap Heap Scan on books  (cost=47.41..296.87 rows=59 width=0) (actual time=2.987..44.881 rows=2874 loops=1)
         Recheck Cond: (to_tsvector('english'::regconfig, ((COALESCE(title, ''::text) || ' '::text) || COALESCE(description, ''::text))) @@ '''harri'' & ''potter'''::tsquery)
         Heap Blocks: exact=2790
         ->  Bitmap Index Scan on search_vector_idx  (cost=0.00..47.40 rows=59 width=0) (actual time=2.214..2.215 rows=2874 loops=1)
               Index Cond: (to_tsvector('english'::regconfig, ((COALESCE(title, ''::text) || ' '::text) || COALESCE(description, ''::text))) @@ '''harri'' & ''potter'''::tsquery)
 Planning Time: 13.629 ms
 Execution Time: 46.008 ms
We will use the generated column `search_vector` (second method) to make the next queries more readable.
5. Ordering Results by Relevance
Ranking is a highly useful feature of Full Text Search that allows us to sort search results based on their relevance to a given query.
Basic Ranking
PostgreSQL provides two built-in functions to rank results based on how well a document (`tsvector`) matches a query:
- `ts_rank`: Computes ranking based on the frequency of matching lexemes.
- `ts_rank_cd`: Extends `ts_rank` by also considering cover density, which accounts for the proximity of matching lexemes to one another.
Both functions take a `tsvector` and a `tsquery` as their primary arguments.
Example: Sorting Books by Relevance with `ts_rank`
SELECT
title,
ts_rank(
search_vector,
plainto_tsquery('english', 'Harry Potter')
) as rank
FROM books
WHERE search_vector @@ plainto_tsquery('english', 'Harry Potter')
ORDER BY rank DESC;
This query returns books ranked by relevance to the search query using the combined title and description fields. Results are sorted in order of the computed rank.
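To illustrate the cover-density idea, the following sketch (a standalone illustration, not drawn from the book dataset) compares `ts_rank_cd` scores for the same two lexemes placed next to each other versus spread apart:

```sql
SELECT
  ts_rank_cd(to_tsvector('english', 'Harry Potter'),
             plainto_tsquery('english', 'Harry Potter')) AS adjacent,
  ts_rank_cd(to_tsvector('english', 'Harry met a man called Potter'),
             plainto_tsquery('english', 'Harry Potter')) AS far_apart;
-- 'adjacent' scores higher, because its matching lexemes form a tight cover.
```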
Considering Document Length
When calculating rankings, the document’s size can also be considered. Both `ts_rank` and `ts_rank_cd` accept an optional third argument, the normalization option, which influences how document length affects ranking.
Examples of normalization values:
- 0 (default): Ignores document length.
- 2: Divides the rank by the document’s length.
- 8: Divides the rank by the number of unique words in the document.
For all possible values, refer to the PostgreSQL documentation.
Example: Using Normalization with ts_rank
SELECT
title,
ts_rank(
search_vector,
plainto_tsquery('english', 'Harry Potter'),
2
) as rank
FROM books
WHERE search_vector @@ plainto_tsquery('english', 'Harry Potter')
ORDER BY rank DESC;
This query adjusts rankings by dividing them by document length, making shorter documents more likely to rank higher if they match well. This is not very useful in our case, as we are dealing with relatively short documents (book titles and descriptions), but it is definitely more applicable in systems containing very long documents, such as news articles, research papers, or entire books, where document length varies significantly and could impact relevance.
Customizing Ranking with setweight
The `setweight` function allows assigning weights to lexemes. For example, a search term appearing in the `title` might be more important than one in the `description`. PostgreSQL provides four weights:
- A: 1.0 (Highest priority)
- B: 0.4
- C: 0.2
- D: 0.1 (Lowest priority)
Example: Applying `setweight` to a `tsvector`
SELECT setweight(to_tsvector('Harry Potter'), 'A') || setweight(to_tsvector('and The Philosopher Stone'), 'B');
                     ?column?
--------------------------------------------------
 'harri':1A 'philosoph':5B 'potter':2A 'stone':6B
Example: Applying `setweight` while querying
SELECT
*,
ts_rank(
setweight(to_tsvector('english', coalesce(title, '')), 'A') || setweight(to_tsvector('english', coalesce(description, '')), 'D'),
plainto_tsquery('english', 'Harry Potter')
) as rank
FROM books
WHERE search_vector @@ plainto_tsquery('english', 'Harry Potter')
ORDER BY rank DESC;
It is worth mentioning that the weights have no effect on the `@@` operator, which is why we do not use them in the `WHERE` clause; this way, PostgreSQL is able to take advantage of the previously created index for filtering. Additionally, to combine two vectors with different weights, we used the `||` operator, which works similarly to string concatenation.
In the example above, any match directly in the `title` will have 10 times more impact on the ranking than a match in the `description`.
Custom Modifiers for Ranking
Finally, it is worth noting that the value returned by `ts_rank` is simply a float, so we can modify it in any way we like, for example by multiplying it or adding another number to it. Each application, depending on its specific needs, may require the consideration of different factors when prioritizing some search results over others. News websites may want to display the latest news matching the search query first, while e-commerce websites may take into account sales or current advertising campaigns.
If you run the query from the previous section, you may notice that the results are not exactly what you might expect:
SELECT
title,
ts_rank(
setweight(to_tsvector('english', coalesce(title, '')), 'A') || setweight(to_tsvector('english', coalesce(description, '')), 'D'),
plainto_tsquery('english', 'Harry Potter')
) as rank
FROM books
WHERE search_vector @@ plainto_tsquery('english', 'Harry Potter')
ORDER BY rank DESC
LIMIT 10;
title | rank
--------------------------------------------------------------------------------------------------------+------
Harry Potter and the Chamber of Secrets - Harry Potter dan Kamar Rahasia (Harry Potter, #2) | 1
Harry Potter and the Goblet of Fire - Harry Potter dan Piala Api (Harry Potter, #4) | 1
Harry Potter and the Philosopher's Stone - Harry Potter dan Batu Bertuah (Harry Potter, #1) | 1
Harry Potter and the Order of the Phoenix - Harry Potter dan Orde Phoenix (Harry Potter, #5) | 1
Articles on Harry Potter Films, Including: Harry Potter and the Philosopher's Stone (Film), Harry P... | 1
Harry Potter and the Prisoner of Azkaban - Harry Potter dan Tawanan Azkaban (Harry Potter, #3) | 1
Harry Potter: Harry Potter a L'Ecole Des Sorciers / Harry Potter Et Le Chambre Des Secrets / Harry ... | 1
The Potterhead's Unofficial Harry Potter Cookbook: The Best Recipes from Harry Potter - Harry Potte... | 1
Harry Potter and The Half-Blood Prince - Harry Potter dan Pangeran Berdarah-Campuran (Harry Potter,... | 1
Harry Potter and the Deathly Hallows - Harry Potter dan Relikui Kematian (Harry Potter, #7) | 1
(10 rows)
In the case of our database, it seems like a good idea to consider both the popularity (reflected by the number of ratings) and the average rating of a book. There is no one-size-fits-all approach to determining such a relationship. It is best to determine it through trial and error, considering our dataset and, if available, user behavior data. A good starting point for tuning could be to consider the rating relative to the maximum possible rating and the number of ratings relative to the maximum number of ratings in our dataset.
SELECT
title,
rating_count,
rating,
ts_rank(
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(description, '')), 'D'),
plainto_tsquery('english', 'Harry Potter')
)
* rating / 5
* log(rating_count + 1) / log(4899965)
as total_rank
FROM books
WHERE search_vector @@ plainto_tsquery('english', 'Harry Potter')
ORDER BY total_rank DESC NULLS LAST
LIMIT 10;
title | rating_count | rating | total_rank
--------------------------------------------------------------+--------------+--------+--------------------
Harry Potter and the Sorcerer's Stone (Harry Potter, #1) | 4765497 | 4.45 | 0.8883911497250822
Harry Potter and the Deathly Hallows (Harry Potter, #7) | 1784684 | 4.62 | 0.8634184949841442
Harry Potter and the Prisoner of Azkaban (Harry Potter, #3) | 1876252 | 4.53 | 0.8495411470439224
Harry Potter and the Goblet of Fire (Harry Potter, #4) | 1792561 | 4.53 | 0.8468575628943454
Harry Potter and the Half-Blood Prince (Harry Potter, #6) | 1713866 | 4.54 | 0.8460755511189171
Harry Potter and the Order of the Phoenix (Harry Potter, #5) | 1766895 | 4.47 | 0.8348022743499233
Harry Potter and the Chamber of Secrets (Harry Potter, #2) | 1821802 | 4.38 | 0.8197347807858469
Harry Potter Boxset (Harry Potter, #1-7) | 193057 | 4.74 | 0.7489816444604352
The Harry Potter Collection 1-4 (Harry Potter, #1-4) | 44587 | 4.66 | 0.6476747171319697
Harry Potter Boxed Set, Books 1-5 (Harry Potter, #1-5) | 34349 | 4.77 | 0.6468081396747123
(10 rows)
We made the following modifications:
- `ts_rank(...)`: Base relevance score from the search query.
- `rating / 5`: Normalizes the book rating (out of 5) to a 0–1 scale.
- `log(rating_count + 1) / log(4899965)`: Adjusts the score based on popularity, comparing the number of ratings to the maximum in the dataset (4899965). In a real application, this value should be fetched more dynamically. Logarithmic scaling flattens extreme values so outliers don’t dominate the results, and adding 1 to `rating_count` handles the case where the value is 0.
By combining these factors, the query prioritizes popular and highly rated books that match the search query. This is just a suggestion that should be tested against many other queries and adjusted if necessary. If it turns out that one of the components (e.g., book popularity) outweighs the others, it is also possible to apply weights, for example giving 50% influence to `ts_rank` and 25% each to `rating` and `rating_count`.
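As a sketch of that weighting idea (the 50/25/25 split and the hard-coded maximum rating count are illustrative assumptions, not tuned values):

```sql
SELECT
  title,
  -- 50% text relevance, 25% average rating, 25% popularity:
  0.50 * ts_rank(
           setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
           setweight(to_tsvector('english', coalesce(description, '')), 'D'),
           plainto_tsquery('english', 'Harry Potter'))
+ 0.25 * (rating / 5)
+ 0.25 * (log(rating_count + 1) / log(4899965)) AS total_rank
FROM books
WHERE search_vector @@ plainto_tsquery('english', 'Harry Potter')
ORDER BY total_rank DESC NULLS LAST
LIMIT 10;
```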
Ranking Performance
Ranking search results with `ts_rank` and `ts_rank_cd` is computationally expensive because scores are calculated dynamically for every matching row. For queries that match many rows (e.g., ‘world’ or ‘life’ match several thousand results), this can lead to slow performance, especially in cloud-hosted environments.
To optimize ranking performance:
- Limit Rows Before Ranking: The simplest and most effective approach. Apply additional filters (e.g., exclude low-rated or irrelevant rows) to reduce the number of results PostgreSQL needs to rank. For instance, if our ranking system heavily weights average ratings, we could filter out items with low ratings early on, since it’s unlikely they will appear on the first page of results. Use indexes like GIN to efficiently filter rows before applying ranking calculations.
- Simplify Ranking for Generic Queries: For very generic queries matching many rows, it’s a good idea to fall back to simpler criteria (e.g., just sort by popularity or rating instead of `ts_rank`).
- Cache Frequent Queries: Store results of common searches in an external cache or materialized views to avoid recalculating ranks repeatedly.
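A sketch of the first strategy against the dataset from this article (the `rating_count` threshold is an arbitrary illustration): the indexed `@@` condition and a cheap numeric filter shrink the candidate set before `ts_rank` runs on it:

```sql
SELECT title,
       ts_rank(search_vector, plainto_tsquery('english', 'world')) AS rank
FROM books
WHERE search_vector @@ plainto_tsquery('english', 'world')
  AND rating_count >= 100   -- drop obscure books before ranking them
ORDER BY rank DESC
LIMIT 10;
```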
6. Other Tools
Beyond these foundational features, PostgreSQL offers additional tools to further enhance your search capabilities:
- `ts_headline`: This function highlights fragments matching the query, making it easier to show relevant excerpts in your search results.
- Synonym Dictionaries: Extensions like `xsyn` allow you to include predefined variations of the same word in your search results, improving flexibility and relevance.
- Similarity Matching: Tools like `pg_trgm` can complement FTS in cases where its exact-match behavior falls short.
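For example, `ts_headline` takes a raw document (not a `tsvector`) plus a query, and wraps the matching fragments in configurable markers:

```sql
SELECT ts_headline('english',
         'Full Text Searching provides the capability to identify natural-language documents.',
         plainto_tsquery('english', 'document'),
         'StartSel=<em>, StopSel=</em>');
-- The matching word comes back wrapped: ... <em>documents</em> ...
```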
7. Limitations
While PostgreSQL FTS is powerful and versatile, it has some limitations that are important to consider:
- No Semantic Understanding: FTS matches lexemes but doesn’t recognize that some words can be semantically related (e.g., ‘wizard’ and ‘sorcerer’). Although it’s possible to use a synonym dictionary, it is very difficult to cover all potential cases.
- No Built-In Support for Approximate Matching: Full Text Search requires exact lexeme matches, so queries with typos won’t return any results. To address this, it can be combined with PostgreSQL extensions like `pg_trgm`.
- Scaling Limitations: PostgreSQL FTS is tied to a single database instance, which can make scaling more difficult for large or distributed applications.
- Limited Ranking Features: Custom ranking needs manual setup; there is no built-in support for factors like popularity or recency.
- Multilingual Challenges: Configuring FTS for multiple languages can be complex.
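A sketch of the typo problem and the `pg_trgm` escape hatch (assuming the extension is available in your installation):

```sql
-- FTS finds nothing for a misspelled term:
SELECT to_tsvector('english', 'Harry Potter')
       @@ plainto_tsquery('english', 'Hary Poter');  -- false

-- pg_trgm's trigram similarity still recognizes the strings as close:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
SELECT similarity('Harry Potter', 'Hary Poter');     -- high despite the typos
```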
8. Conclusion
In this article, we explored how PostgreSQL’s Full Text Search (FTS) can be used to build a search system directly within your database. We covered core concepts like `tsvector` and `tsquery`, ranking results with functions like `ts_rank`, and optimizing performance using GIN indexes. Additionally, we demonstrated how to tune relevance and customize ranking to meet specific application needs.
PostgreSQL Full Text Search can be a robust and versatile tool that in some cases can eliminate the need for external search systems. With thoughtful indexing, proper configuration, and adequate relevance tuning, it’s able to deliver a powerful search experience while keeping infrastructure simple and costs low.