VillageSQL’s extension framework allows you to add custom functionality to the database server. Build custom extensions using the VEF SDK and extension template.
A VillageSQL extension is packaged as a VEB file (VillageSQL Extension Bundle) containing:
Manifest - Metadata about the extension (name, version, description)
Shared library - Compiled C++ code implementing the functionality
Optional metadata - Additional resources or configuration
Extensions are built using the VEF SDK (VillageSQL Extension Framework), which provides:
C++ API for defining types and functions
Automatic registration without SQL scripts
Type-safe function wrappers
Builder pattern for extension definition
VDFs vs Traditional UDFs: Functions registered through the VEF SDK are called VDFs (VillageSQL Defined Functions). VillageSQL also supports traditional MySQL UDFs registered via CREATE FUNCTION ... SONAME, but the VEF SDK approach is recommended for new extensions.
VDFs can be called with or without the extension prefix:
-- Unqualified (preferred for cleaner code)SELECT complex_abs(impedance) FROM signals;-- Qualified with extension name (explicit)SELECT vsql_complex.complex_abs(impedance) FROM signals;
Function Resolution Order:When you call a function without qualification, VillageSQL resolves it in this order:
System functions (built-in MySQL functions like NOW(), CONCAT())
UDFs (traditional MySQL user-defined functions)
VDFs (extension functions) - only if there’s exactly one function with that name
Stored functions (created with CREATE FUNCTION)
When to Use Qualified Names:
Use extension.function_name when multiple extensions provide functions with the same name
Use unqualified names for cleaner code when there’s no ambiguity
Qualification is never required if only one extension provides that function name
Extensions can add:
Custom functions (VDFs) - SQL functions with automatic type checking and validation
Custom data types - New column types like COMPLEX, UUID, or VECTOR that work with ORDER BY and indexes
Type operations - Encode, decode, compare, and hash functions for custom types
Before you begin, build VillageSQL from source — extensions link against the server’s SDK headers and build tree. Follow the Clone and Build from Source guide first.You also need:
Git - For cloning and version control
CMake 3.16 or higher - Build system
C++ Compiler - GCC 8+, Clang 8+, or MSVC 2019+ with C++11 support
Basic C++ knowledge - Understanding of C++ and function pointers
Unique identifier. Must match INSTALL EXTENSION name.
version
✅ Yes
MAJOR.MINOR.PATCH
Semantic version string
description
No
String
Brief explanation of functionality
author
No
String
Author name or organization
license
No
String
License identifier (GPL-2.0 recommended)
Validation rules:
name: Must start with a letter and end with a letter or digit. May contain lowercase letters, digits, underscores, and hyphens. Max 64 characters. Use underscores — hyphens require backtick quoting in SQL.
version: Must follow semantic versioning (e.g., 1.0.0, 0.2.1)
Invalid manifest will cause INSTALL EXTENSION to fail
Example:
-- manifest.json has "name": "my_awesome_extension"INSTALL EXTENSION my_awesome_extension; -- ✅ Correct: no quoting neededINSTALL EXTENSION `my-awesome-extension`; -- ⚠️ Works, but requires backtick quoting
.buffer_size(size_t) - Request specific output buffer size for STRING/CUSTOM returns
.deterministic(bool = true) - Declare that this function always returns the same output for the same inputs and has no side effects. Default is non-deterministic.
.prerun<func>() - Set per-statement setup function (optional)
.postrun<func>() - Set per-statement cleanup function (optional)
.build() - Finalize function registration
Parameter Limit: Functions support a maximum of 8 parameters (defined by kMaxParams). If you need more, consider using structured types or multiple functions.
A VDF can take and return custom type values using .param(TYPE_NAME) and
.returns(TYPE_NAME) in the builder. The implementation uses CustomArg
for input and CustomResult for output — the same wrappers used for type
operations:
void complex_conjugate_impl(CustomArg in, CustomResult out) { if (in.is_null()) { out.set_null(); return; } auto src = in.value(); // Span<const unsigned char> — raw binary auto dst = out.buffer(); // Span<unsigned char> // ... read src, write result to dst ... out.set_length(src.size());}
Register with .param(COMPLEX) and .returns(COMPLEX):
By default, VDFs are registered as non-deterministic. A non-deterministic function is blocked from three SQL contexts: generated columns, CHECK constraints, and expression default values (DEFAULT (expr) on a column) — using any of these features with a non-deterministic VDF returns an error. If your function always produces the same output for the same inputs and has no side effects, you can declare it deterministic by adding .deterministic() to the builder chain.The optimizer may use this information to evaluate the function once per statement and reuse the value across rows, rather than calling it per row. Incorrectly marking a non-deterministic function as deterministic can therefore cause the server to return the same result for inputs that should produce different outputs. Only add .deterministic() when your function truly has no dependency on external state, randomness, or time. See the MySQL UDF Characteristics documentation for more details on why and when determinism is required.Builder signature:.deterministic(bool d = true) — the zero-argument form defaults to true.Example:
Because complex_add is declared deterministic, it can be used in a generated column definition:
-- Deterministic VDFs can be used in generated columnsCREATE TABLE t ( a COMPLEX, b COMPLEX, result COMPLEX GENERATED ALWAYS AS (complex_add(a, b)) STORED);-- STORED vs VIRTUAL follows standard MySQL generated column rules
These contracts govern how VDF implementation functions interact with the
VEF runtime. Every function registered via make_func<> must follow them.
The types referenced below are defined in #include <villagesql/extension.h>.
Communicate success, NULL, or error exclusively through result->type.2. Set result->type to exactly one of the four result constants.Four constants exist in vef_return_value_type_t:
Constant
Value
Meaning
VEF_RESULT_VALUE
0
Success — the output is in the appropriate union field
VEF_RESULT_NULL
1
The result is SQL NULL
VEF_RESULT_WARNING
2
Row-level warning — execution continues, NULL is returned for this row, and a SQL warning is added. In strict mode, MySQL promotes this to an error on INSERT/UPDATE.
VEF_RESULT_ERROR
3
Fatal error — statement execution is aborted; message is in result->error_msg
There are no type-specific variants. VEF_RESULT_VALUE is the single success
constant for strings, integers, reals, and custom types alike. The type of
output is determined by which union field you write to (str_buf,
int_value, real_value, bin_buf).3. Check input->is_null before reading any other field on an input argument.If is_null is true, every other field on that vef_invalue_t is undefined.
Accessing str_value, int_value, or any other field without checking
is_null first is undefined behavior.
5. Write error messages to result->error_msg, not to result->str_buf. Use VEF_MAX_ERROR_LEN (512 bytes) as the snprintf size limit.error_msg is a separate, caller-provided buffer dedicated to error text.
It is independent of str_buf. The maximum size including the null
terminator is VEF_MAX_ERROR_LEN (512).
The bool return convention applies only to encode and decode function pointers used by custom types. It does not apply to VDF-name type operations (which use the standard VDF void convention) or to VDF implementation functions.
false = success (the operation completed without error)
true = error (the conversion failed)
The encode/decode return convention is inverted from what most developers
expect. false means success, true means error. This follows the general
convention of MySQL internal C/C++ APIs, which frequently return 0 (false)
for success. Double-check every return statement in your encode and decode
implementations.
Check for NULL via the is_null flag and return NULL by setting the result type:
void my_func_impl(vef_context_t* ctx, vef_invalue_t* input, vef_vdf_result_t* result) { // Check if input is NULL if (input->is_null) { result->type = VEF_RESULT_NULL; return; } // Access input data const char* str = input->str_value; size_t len = input->str_len; // Process and write result to result->str_buf // ... result->type = VEF_RESULT_VALUE; result->actual_len = output_length;}
NULL handling options:
Input NULL check:if (input->is_null)
Return NULL:result->type = VEF_RESULT_NULL
Return value:result->type = VEF_RESULT_VALUE + write to type-specific buffer
Return warning:result->type = VEF_RESULT_WARNING + warning message in result->error_msg (returns NULL for this row, adds SQL warning, continues execution; in strict mode, MySQL promotes this to an error on INSERT/UPDATE)
Return error:result->type = VEF_RESULT_ERROR + error message in result->error_msg
Every custom type needs encode, decode, and compare operations, and optionally a hash operation. There are two ways to provide these: VDF names (VEF_PROTOCOL_2 or later, recommended) or function pointers. For each operation, provide one or the other — not both.VDF names — reference VDFs from the same extension by name (see VDF-Name Type Operations):
Each VDF named via .encode(), .decode(), .compare(), or .hash() must be registered with the matching make_type_* builder in the same extension:
Operation
Builder
VDF SQL Signature
encode
make_type_encode<&f>("name", TYPE)
(STRING) -> CUSTOM(this type)
decode
make_type_decode<&f>("name", TYPE)
(CUSTOM(this type)) -> STRING
compare
make_type_compare<&f>("name", TYPE)
(CUSTOM(this type), CUSTOM(this type)) -> INT
hash
make_type_hash<&f>("name", TYPE)
(CUSTOM(this type)) -> INT
VDF-name operations use the bool return convention (false = success, true = error), same as function-pointer encode/decode. See development.mdx — Type Operation Builders for the full C++ signatures.
-- Convert string to custom typeSELECT mytype_from_string('(1.0,2.0)');-- Convert custom type to stringSELECT mytype_to_string(my_column) FROM my_table;-- Explicit conversion in INSERTINSERT INTO my_table (id, value)VALUES (1, mytype_from_string('(3.0,4.0)'));
The .from_string<>() and .to_string<>() builders automatically create SQL-callable conversion functions using your encode/decode implementations.
Here’s a complete example implementing a COMPLEX number type:
#include <cstdio>#include <cstring>// Encode complex number from "(real,imag)" to 16 bytesbool encode_complex(std::string_view from, villagesql::Span<unsigned char> buf, size_t *length) { if (buf.size() < 16) return true; double real, imag; if (sscanf(from.data(), "(%lf,%lf)", &real, &imag) != 2) return true; memcpy(buf.data(), &real, 8); memcpy(buf.data() + 8, &imag, 8); *length = 16; return false;}// Decode 16 bytes to "(real,imag)" stringbool decode_complex(villagesql::Span<const unsigned char> data, villagesql::Span<char> out, size_t *out_len) { if (data.size() < 16) return true; double real, imag; memcpy(&real, data.data(), 8); memcpy(&imag, data.data() + 8, 8); int len = snprintf(out.data(), out.size(), "(%.6f,%.6f)", real, imag); if (len < 0 || len >= (int)out.size()) return true; *out_len = len; return false;}// Compare for ORDER BY: compare real part first, then imaginaryint compare_complex(villagesql::Span<const unsigned char> a, villagesql::Span<const unsigned char> b) { if (a.size() < 16 || b.size() < 16) return 0; double a_real, a_imag, b_real, b_imag; memcpy(&a_real, a.data(), 8); memcpy(&a_imag, a.data() + 8, 8); memcpy(&b_real, b.data(), 8); memcpy(&b_imag, b.data() + 8, 8); if (a_real < b_real) return -1; if (a_real > b_real) return 1; if (a_imag < b_imag) return -1; if (a_imag > b_imag) return 1; return 0;}
After defining these operations, users can create tables with your custom type:
CREATE TABLE signals ( id INT PRIMARY KEY, impedance COMPLEX, frequency_response COMPLEX);INSERT INTO signals VALUES (1, '(50.0,10.0)', '(0.95,0.31)');-- ORDER BY works because we provided compare_complex!SELECT * FROM signals ORDER BY impedance;-- Prepared statements work with custom typesPREPARE stmt FROM 'SELECT * FROM signals WHERE impedance = ?';SET @val = '(50.0,10.0)';EXECUTE stmt USING @val;-- Aggregate operations work with custom typesSELECT COUNT(DISTINCT impedance), MIN(impedance), MAX(impedance), GROUP_CONCAT(impedance ORDER BY impedance) FROM signals;
VDFs can be used in generated column expressions. The VDF must be declared .deterministic() in the extension builder — the server blocks non-deterministic functions in this context.
CREATE TABLE signals ( id INT PRIMARY KEY, impedance COMPLEX, -- Generated column computed by a VDF magnitude DOUBLE GENERATED ALWAYS AS (complex_abs(impedance)) STORED);
complex_abs must be registered with .deterministic(). Traditional MySQL UDFs are not permitted in generated columns.
VDFs can be used in functional index expressions. The same .deterministic()
requirement from generated columns applies here because MySQL implements
functional indexes as hidden generated columns.
CREATE TABLE signals ( id INT PRIMARY KEY, sig COMPLEX, INDEX idx_magnitude ((COMPLEX_ABS(sig))));
The optimizer uses the index when the same VDF expression appears in WHERE,
ORDER BY, or GROUP BY. Cast the comparison value to the VDF’s return type
so the optimizer matches the expression:
SELECT id FROM signals WHERE COMPLEX_ABS(sig) > CAST(20.0 AS DOUBLE);
# Configure the buildcmake ..# Or, if building against VillageSQL source:cmake .. -DVillageSQL_BUILD_DIR=/path/to/villagesql/build# Or, to build against the unstable dev ABI headers:cmake .. -DVSQL_USE_DEV_ABI=ON# Build the extensionmake
This creates:
Compiled shared library (.so file)
VEB package (.veb file) - a tar archive containing manifest and library
For VEF SDK functions that need setup/cleanup per SQL statement (not per row):
// Called once before the first rowvoid my_prerun(vef_context_t* ctx, vef_prerun_args_t* args, vef_prerun_result_t* result) { // Allocate state, open connections, etc. void* state = malloc(sizeof(MyState)); result->user_data = state;}// Called once after all rows processedvoid my_postrun(vef_context_t* ctx, vef_postrun_args_t* args, vef_postrun_result_t* result) { // Clean up state allocated in prerun if (args->user_data) { free(args->user_data); }}// Register with extensionVEF_GENERATE_ENTRY_POINTS( make_extension("my_ext", "1.0.0") .func(make_func<&my_func_impl>("my_func") .returns(STRING) .param(STRING) .prerun(&my_prerun) .postrun(&my_postrun) .build()));
Most extensions don’t need prerun/postrun hooks. The VEF SDK automatically handles common cases like type checking and buffer allocation. Use prerun/postrun only when you need expensive per-statement setup (like opening connections) that shouldn’t happen per-row.If you find you need prerun/postrun for your use case, please share your scenario on the VillageSQL Discord - the team may be able to add SDK support to handle it automatically.
Only COUNT(DISTINCT), MIN, MAX, and GROUP_CONCAT are supported for custom types. SUM, AVG, and other aggregate functions are not.
Supported aggregate operations with custom types:
-- COUNT(DISTINCT) works with custom typesSELECT COUNT(DISTINCT impedance) FROM signals;-- MIN and MAX work with custom types (requires compare function)SELECT MIN(impedance), MAX(impedance) FROM signals;-- GROUP_CONCAT works with custom typesSELECT GROUP_CONCAT(impedance ORDER BY impedance SEPARATOR ', ') FROM signals;
Extension functions are called in a per-row execution model:
Each function call processes one row with its own result buffer (thread-safe)
prerun/postrun provide per-statement setup/teardown
Avoid global state - use function parameters and return values instead
If you must use global state, protect it with mutexes/locks
Best practice: Design functions to be stateless for simplicity and safety.
The following window functions work with custom types:
SELECT id, impedance, LAG(impedance) OVER (ORDER BY id) AS prev_impedance, LEAD(impedance) OVER (ORDER BY id) AS next_impedanceFROM signals;SELECT id, impedance, FIRST_VALUE(impedance) OVER w AS first_impedance, LAST_VALUE(impedance) OVER w AS last_impedance, NTH_VALUE(impedance, 2) OVER w AS second_impedanceFROM signalsWINDOW w AS (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
Triggers fire on tables with custom type columns. The trigger body can
reference non-custom-type columns from NEW and OLD. Accessing custom
type column values inside a trigger body is not yet supported.
CREATE TABLE signals ( id INT PRIMARY KEY, impedance COMPLEX, label VARCHAR(50));CREATE TABLE signal_log ( id INT, label VARCHAR(50), logged_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP);CREATE TRIGGER signals_after_insertAFTER INSERT ON signalsFOR EACH ROW INSERT INTO signal_log (id, label) VALUES (NEW.id, NEW.label);INSERT INTO signals VALUES (1, '(10,5)', 'sensor_a');SELECT id, label FROM signal_log;