MySQL in N-Dimensions: High-Complexity Data with vsql-cube

Share
MySQL in N-Dimensions: High-Complexity Data with vsql-cube

MySQL stores numbers well. What it can't do is treat a row's values as a multi-dimensional point in space. Standard MySQL spatial types are strictly limited to 2D (or 2D+Z) GIS data. It can’t check whether that point falls inside a bounding region, measure how far two rows are from each other across multiple attributes, or compute the envelope that spans a group of rows. If you need any of that, you write a pile of WHERE conditions or push the data out to application code.

The VillageSQL vsql-cube extension adds arbitrary multi-dimensional geometry to MySQL. Unlike MySQL's built-in GIS features designed for maps, vsql-cube adds a cube custom type representing n-dimensional points and boxes (up to 100 dimensions per value), plus 24 functions for constructing, comparing, and computing geometry between them. It’s a port of PostgreSQL's cube extension.

MySQL GIS vsql-cube
Dimensions 2D Up to 100D
Designed for Map coordinates Feature profiles, attribute ranges
Typical query Within 5 miles of a location Inside a multi-attribute envelope
Distance Geographic (Haversine) Euclidean, Manhattan, Chebyshev

vsql-cube makes MySQL capable of handling high-dimensional feature data—calculating Euclidean distances across dozens of attributes simultaneously—without the overhead of external application logic. It's a geometry toolkit for analytics and smaller datasets. You get containment, distance, and aggregation entirely in SQL.

To get started, install the extension with INSTALL EXTENSION vsql_cube;. Because cube is a native MySQL reserved word (used in GROUP BY CUBE), you must backtick-quote it in DDL and always specify an explicit dimension parameter:

CREATE TABLE products (
    id     INT PRIMARY KEY,
    name   VARCHAR(100),
    attrs  `cube`(5)    -- price, weight, rating, stock, lead_days
);

Storage is 8 + 2*n*8 bytes per row — a packed float64 binary layout. cube(3) uses 56 bytes; cube(32) uses 520. The dimension n isn't just a constraint: VEF TypeParameters persists it with the column definition and the extension reads it at decode time to size buffers and validate coordinates. Match n to your actual data to avoid waste. Values come in as string literals — a point is a comma-separated list of coordinates, a box is two corner points:

INSERT INTO products VALUES (1, 'Laptop Stand', '(29.99,0.5,4.2,150,3)');
INSERT INTO products VALUES (2, 'Keyboard',     '(79.99,0.9,4.6,40,5)');
INSERT INTO products VALUES (3, 'Mouse',        '(39.99,0.2,4.8,200,2)');

Each row is now a location in 5-dimensional attribute space. The Laptop Stand sits at (29.99, 0.5, 4.2, 150, 3)—$29.99, half a kilogram, 4.2 stars, 150 in stock, ships in 3 days. Once the data is modeled that way, geometric operations become available: draw a search box, measure distances between rows, compute the envelope that spans a group. That's what the rest of this post covers.

The main reason to reach for vsql-cube is multi-attribute containment. cube_contains takes a box (the search envelope) as the first argument and a point (the row) as the second, returning 1 if the point falls inside:

-- Price $20–$50, weight ≤1kg, rating ≥4.0, at least 1 in stock, ships within 30 days
SELECT id, name
FROM products
WHERE cube_contains(
    cube_from_string('(20.0,0.0,4.0,1,1),(50.0,1.0,5.0,999,30)'),
    attrs
);

That's 5 attribute filters in one call. Add or remove dimensions by changing the search envelope—no schema changes.

For interval-based data, a 1D cube is just a range with a start and end point. cube_overlaps returns 1 if two cubes share any region—for scheduling, that means two time slots overlap:

-- Find appointments that overlap a given time slot
SELECT id FROM appointments WHERE cube_overlaps(slot, cube_box(9.0, 11.0));

Containment also works for anomaly detection. A user's normal transaction behavior is a box: expected amount range, typical hours, usual distance from home, merchant categories. An incoming transaction is a point. cube_contains returns 0 if it falls outside the envelope. This example returns 0 so you’d flag it.

-- envelope dimensions: amount, hour of day, distance_km, merchant_category
SELECT cube_contains(
    cube_from_string('(5.00,6,0,1),(500.00,23,50,8)'),   -- user's normal profile
    cube_from_string('(1200.00,3,340,5)')                 -- incoming transaction
) AS within_profile;

Three distance metrics ship with the extension. Euclidean (L2) measures straight-line distance—the right choice when all dimensions contribute equally to similarity. Manhattan (L1) sums the per-dimension differences—better when dimensions are independent and you don't want one large gap on a single attribute to dominate the score. Chebyshev (L-infinity) takes the single largest gap across all dimensions—useful when any one attribute being far off is enough to consider two rows dissimilar. All three return 0 if the inputs overlap; otherwise distance is measured between the nearest faces of the two cubes. For nearest-neighbor lookups:

-- Find the 3 products most similar to product #1 by attribute distance
SELECT p.id, p.name, cube_distance(p.attrs, ref.attrs) AS dist
FROM products p
JOIN products ref ON ref.id = 1 AND p.id != 1
ORDER BY dist
LIMIT 3;

cube_agg answers: what range of values does this group actually cover? It computes the tightest bounding box containing every row in the group—minimum and maximum across all dimensions simultaneously, in a single pass. That's the operating envelope of a device's sensor readings, the price-and-rating range of a product category, the bounding region of a cluster:

SELECT device_id, cube_to_string(cube_agg(readings)) AS envelope
FROM sensor_data
GROUP BY device_id;

cube_scalar_agg does the same for plain REAL columns, returning a 1D [min, max] interval as a cube. Both return NULL for empty groups.

The extension also provides cube_union and cube_inter for merging and intersecting two cubes, cube_enlarge to expand or shrink a box by a radius, and cube_subset to extract and reorder specific dimensions. For building search windows dynamically—expand a center point into a search box of a given radius, or pull a 2D slice from a higher-dimensional cube for a simpler comparison:

-- Expand a product's attribute point into a search envelope of ±5 on all 3 dimensions
SELECT cube_to_string(cube_enlarge(attrs, 5.0, 3)) AS search_box FROM products WHERE id = 1;

There are two limitations to be aware of that we will address as VillageSQL gains additional functionality. First, there's no Generalized Search Tree (GiST) indexing. Range queries and nearest-neighbor lookups require a full table scan. For small tables or batch analytics that's usually fine; for large tables with frequent spatial queries, it may be a constraint. Second, there's no infix operator syntax—VEF doesn't have a custom operator API yet, so && becomes cube_overlaps, <-> becomes cube_distance, and so on. Code ported from PostgreSQL needs a mechanical substitution pass, nothing deeper.

Get Started

We built vsql-cube for MySQL users who need multi-attribute range queries, similarity scoring, or bounding box aggregation without spinning up application code to handle it. For search, analytics, and feature-data comparisons where a full scan is acceptable, vsql-cube brings the capability directly into SQL. As we expand VillageSQL's storage and indexing capabilities, we'll be introducing specialized storage and indexing extensions designed for low-latency vector similarity lookups at scale

Source, build instructions, and full documentation at vsql-cube on GitHub.

Get started with VillageSQL at villagesql.com