VillageSQL is a drop-in replacement for MySQL with extensions.
All examples in this guide work on VillageSQL. Install Now →
This guide requires VillageSQL 0.0.4 or later.
= and LIKE miss. VillageSQL’s vsql_fuzzystrmatch extension provides four approaches: Soundex (sounds-alike grouping), Levenshtein edit distance (character-by-character cost), Metaphone (pronunciation-based encoding), and Double Metaphone (handles ambiguous pronunciations from multiple linguistic origins).
Installation
vsql_fuzzystrmatch schema. All return NULL when any argument is NULL.
Soundex
Soundex reduces an English word to a 4-character code based on how it sounds. Words with the same code are considered phonetic matches.difference returns a 0–4 score for how many Soundex code characters two strings share. 4 = identical codes.
Finding phonetic matches in a table
Levenshtein edit distance
Levenshtein counts the minimum number of single-character edits — insertions, deletions, substitutions — needed to turn one string into another.Custom operation costs
levenshtein_cost lets you weight insertions, deletions, and substitutions differently:
Early-exit threshold
levenshtein_less_equal short-circuits once the distance exceeds a threshold — more efficient for filtering. When the actual distance exceeds max_d, it returns max_d + 1 instead of computing the full distance:
Deduplication by edit distance
Metaphone
Metaphone applies more English pronunciation rules than Soundex and produces variable-length output. Themax_output_length parameter caps the result.
Double Metaphone
Double Metaphone handles ambiguous pronunciations from multiple linguistic origins — names of German, Slavic, Spanish, Hebrew, and other origins that get anglicized in different ways. It produces two codes: primary and alternate.dmetaphone_alt falls back to the primary code.
Use Double Metaphone when your data contains names from multiple linguistic backgrounds, or when Metaphone misses matches that a human would consider equivalent.
Matching with both codes
Choosing an approach
| Situation | Recommended approach |
|---|---|
| English name sounds-alike grouping | Soundex |
| Typo tolerance with a distance budget | Levenshtein |
| Better English phonetics than Soundex | Metaphone |
| Names from mixed linguistic backgrounds | Double Metaphone |
| Similarity score rather than a threshold | trgm_similarity |
Function reference
| Function | Returns | Notes |
|---|---|---|
soundex(text) | TEXT | 4-char Soundex code |
difference(text, text) | INT | 0–4 Soundex overlap score |
levenshtein(s, t) | INT | Edit distance, all ops cost 1 |
levenshtein_cost(s, t, ins, del, sub) | INT | Edit distance with custom costs |
levenshtein_less_equal(s, t, max_d) | INT | Distance if ≤ max_d; else max_d+1 |
levenshtein_less_equal_cost(s, t, ins, del, sub, max_d) | INT | Custom costs + early exit |
metaphone(text, max_len) | TEXT | Metaphone code, truncated to max_len |
dmetaphone(text) | TEXT | Double Metaphone primary code |
dmetaphone_alt(text) | TEXT | Double Metaphone alternate code |
Known limitations
No function overloading. VEF registers functions by name only; two functions with the same name cannot coexist. Where PostgreSQL uses overloads (levenshtein with 2 vs 5 args), this extension uses distinct names (levenshtein vs levenshtein_cost).
String length limit. levenshtein inputs must be ≤ 255 characters.

