VillageSQL is a drop-in replacement for MySQL with extensions.
All examples in this guide work on VillageSQL. Install Now →
This guide requires VillageSQL 0.0.4 or later.
LIKE, SOUNDEX, LEVENSHTEIN — cover basic matching, but none of them give you a continuous similarity score you can sort or threshold on. VillageSQL’s vsql_trgm extension adds trigram-based similarity: a number between 0 and 1 that measures how alike two strings are, so you can rank results and filter by similarity in a single query.
How trigrams work
A trigram is a group of 3 consecutive characters.vsql_trgm lowercases the input, splits it on non-alphanumeric characters, pads each word with two leading spaces and one trailing space, then slides a 3-character window across each padded word. Similarity is how many trigrams the two inputs share, divided by the larger of the two trigram set sizes.
"Cat" and "cat" produce the same trigrams. They are also split on non-alphanumeric characters, so each word is padded and trigrammed independently — "hello world" produces trigrams from " hello " and " world " separately, not from a single padded string. There is no "o w" or " wo" trigram.
Similarity ranges from 0 (nothing in common) to 1 (identical).
Getting started
Core functions
trgm_similarity — continuous score
trgm_similar — threshold check
Returns 1 if similarity is at or above 0.3 (the PostgreSQL default), 0 otherwise:trgm_distance — complement of similarity
1 - similarity. Useful for ORDER BY when you want closest first:
Finding near-matches in a table
The most common use: find rows that resemble a search term.LIKE '%adidas%' would miss.
Word similarity
trgm_word_similarity finds the best match between a query string and any contiguous substring of the target. This is useful when the search term might be a word inside a longer string.
trgm_strict_word_similarity is a stricter variant that requires the match to align with word boundaries.
| Function | Threshold |
|---|---|
trgm_word_similar(a, b) | 0.6 |
trgm_strict_word_similar(a, b) | 0.5 |
Inspecting trigrams
trgm_show returns the sorted trigram set for a string as a JSON array:
Function reference
| Function | Returns | Notes |
|---|---|---|
trgm_similarity(a, b) | REAL | 0–1 similarity score |
trgm_distance(a, b) | REAL | 1 − similarity |
trgm_similar(a, b) | INT | 1 if similarity ≥ 0.3 |
trgm_similar_threshold(a, b, t) | INT | 1 if similarity ≥ t |
trgm_word_similarity(a, b) | REAL | best substring match score |
trgm_word_distance(a, b) | REAL | 1 − word_similarity |
trgm_word_similar(a, b) | INT | 1 if word_similarity ≥ 0.6 |
trgm_strict_word_similarity(a, b) | REAL | word-boundary-aligned score |
trgm_strict_word_distance(a, b) | REAL | 1 − strict_word_similarity |
trgm_strict_word_similar(a, b) | INT | 1 if strict_word_similarity ≥ 0.5 |
trgm_show(text) | TEXT | JSON array of trigrams |
Known limitations
No index acceleration. VEF does not expose a custom index API, soWHERE trgm_similarity(col, 'term') > 0.3 requires a full table scan. For large tables, pre-filter with LIKE or full-text search to reduce the candidate set, then re-score with trgm_similarity.
No session-level threshold. PostgreSQL’s set_limit() / show_limit() cannot be implemented in VEF (VEF functions are stateless). Use trgm_similar_threshold(a, b, t) to pass the threshold explicitly.
No array return. trgm_show returns a JSON string rather than a native array type.

