VillageSQL is a drop-in replacement for MySQL with extensions.
All examples in this guide work on VillageSQL. Install Now →
First Normal Form (1NF)
A table is in 1NF when every column holds atomic values — one value per cell — and there are no repeating groups. Violates 1NF — multiple values packed into one column:Second Normal Form (2NF)
A table is in 2NF when it is in 1NF and every non-key column depends on the whole primary key — not just part of it. This only applies to tables with composite primary keys. Violates 2NF —product_name depends only on product_id, not on the full (order_id, product_id) key:
product_name to its own table:
Third Normal Form (3NF)
A table is in 3NF when it is in 2NF and no non-key column depends on another non-key column. These are called transitive dependencies. Violates 3NF —city and state depend on zip_code, not directly on customer_id:
When to Denormalize
Normalization improves write integrity and reduces redundancy, but adds join overhead on reads. Read-heavy or analytics workloads sometimes warrant deliberate denormalization.| Scenario | Normalize | Denormalize |
|---|---|---|
| Frequent writes, data integrity critical | Yes | — |
| Reporting queries joining 6+ tables | — | Consider |
| Small dataset (< 100K rows) | Yes | Not needed |
| Data warehouse or analytics replica | — | Yes |
| OLTP application with mixed read/write | Yes | Only hot paths |
Frequently Asked Questions
Do I need to go beyond 3NF?
For most applications, 3NF is enough. Boyce-Codd Normal Form (BCNF) addresses edge cases where overlapping candidate keys still allow anomalies in 3NF — situations that rarely arise outside academic exercises. Stop at 3NF unless you have a specific anomaly you’re trying to eliminate.Should I denormalize for performance?
Profile before you restructure. Most slow queries are caused by missing indexes, not by joins. Add the index first. If well-indexed queries are still slow on large tables, consider denormalization on a read replica or a materialized summary table rather than on the transactional database.What’s the difference between a 2NF violation and a 3NF violation?
A 2NF violation is a column that depends on part of the primary key (only possible with composite PKs). A 3NF violation is a column that depends on another non-key column. 2NF is about partial key dependencies; 3NF is about transitive dependencies through non-key columns.Troubleshooting
| Problem | Solution |
|---|---|
| Duplicate data appearing across rows | Likely a 2NF or 3NF violation — move the repeated data to its own table |
| Update anomalies (change one row, others become stale) | Transitive dependency — extract to a lookup table |
| Queries joining 5+ tables are slow | Check indexes on FK columns first; denormalize only if indexes don’t help |
| Normalized schema too complex to query | Consider a read-optimized view or a denormalized analytics replica |

