VillageSQL is a drop-in replacement for MySQL with extensions.
All examples in this guide work on VillageSQL. Install Now →
Types of Subqueries
An uncorrelated subquery runs once and its result is used by the outer query:EXPLAIN to see how MySQL executes them.
IN vs EXISTS vs JOIN
These three patterns often solve the same problem:| Approach | When to use |
|---|---|
IN (subquery) | Subquery returns a small, static list; readable for simple cases |
EXISTS | Checking existence only; stops at first match; safe with NULLs |
JOIN | Need columns from both tables; MySQL optimizer has more flexibility |
IN (subquery) as a semi-join internally, so the performance difference is frequently negligible. For large subquery result sets, EXISTS or a JOIN is more reliable.
The NULL Problem with NOT IN
NOT IN behaves unexpectedly when the subquery returns any NULL:
orders.customer_id has even one NULL row, the entire NOT IN returns empty. This is correct SQL behavior — NULL is unknown, so NOT IN (NULL, 1, 2) is UNKNOWN for every comparison. Use NOT EXISTS instead:
Subqueries in FROM (Derived Tables)
A subquery in theFROM clause creates a derived table:
Performance Considerations
The MySQL optimizer rewrites many subquery patterns. Before optimizing manually:- Run
EXPLAINon both the subquery and JOIN versions to see what MySQL actually executes - Ensure indexes exist on the join/correlation columns — that’s usually the bottleneck
- Only rewrite if
EXPLAINshows a significant difference in execution plan
| Pattern | Watch for |
|---|---|
| Correlated subquery in SELECT | Runs per row — can be O(n²) on large tables |
IN (subquery) returning large result set | Can be slow pre-8.0; usually optimized in 8.0+ |
NOT IN with possible NULLs | Wrong results — use NOT EXISTS |
| Derived table in FROM | Optimizer may or may not merge it; check EXPLAIN |
Frequently Asked Questions
Is JOIN always faster than a subquery?
Not necessarily. MySQL 8.0’s optimizer rewritesIN (subquery) to semi-joins in many cases, producing the same execution plan as an explicit JOIN. Measure with EXPLAIN rather than assuming one form is always faster.
When should I use EXISTS instead of IN?
UseEXISTS when you only care whether a match exists (not what the matched data is), when the subquery could return NULLs, or when the subquery returns a large result set. EXISTS short-circuits on the first match; IN must build the full list.
What’s a semi-join?
A semi-join returns rows from the left table that have at least one matching row on the right — without producing duplicate left-table rows for multiple matches. MySQL’s optimizer uses semi-joins internally to executeIN (subquery) and EXISTS efficiently. You can’t write a semi-join directly in SQL, but EXISTS and IN (subquery) both trigger the optimizer to consider it.
Troubleshooting
| Problem | Solution |
|---|---|
NOT IN returning empty results unexpectedly | NULL in the subquery result — switch to NOT EXISTS |
| Correlated subquery very slow | Add an index on the correlated column; consider rewriting as a JOIN |
| Same query faster as JOIN than as IN subquery | On MySQL 5.7, the optimizer sometimes fails to rewrite IN subqueries — upgrade or rewrite manually |
| Derived table query slow | Check if MySQL materialized the derived table — may benefit from CTE with SQL_NO_CACHE hints |

