Skip to main content

VillageSQL is a drop-in replacement for MySQL with extensions.

All examples in this guide work on VillageSQL. Install Now →
Database normalization is the process of organizing tables to reduce redundancy and prevent update anomalies. Most applications should reach third normal form (3NF). Past that, the gains are mostly theoretical and the join complexity is real.

First Normal Form (1NF)

A table is in 1NF when every column holds atomic values — one value per cell — and there are no repeating groups. Violates 1NF — multiple values packed into one column:
CREATE TABLE orders (
    id          INT PRIMARY KEY,
    customer    VARCHAR(100),
    product_ids VARCHAR(255)  -- "1,4,7" — not atomic
);
Satisfies 1NF — one value per cell, repeating group extracted to a child table:
CREATE TABLE orders (
    id       INT PRIMARY KEY,
    customer VARCHAR(100)
);

CREATE TABLE order_items (
    order_id   INT,
    product_id INT,
    PRIMARY KEY (order_id, product_id)
);

Second Normal Form (2NF)

A table is in 2NF when it is in 1NF and every non-key column depends on the whole primary key — not just part of it. This only applies to tables with composite primary keys. Violates 2NFproduct_name depends only on product_id, not on the full (order_id, product_id) key:
CREATE TABLE order_items (
    order_id     INT,
    product_id   INT,
    product_name VARCHAR(100),  -- partial dependency: only needs product_id
    quantity     INT,
    PRIMARY KEY (order_id, product_id)
);
Satisfies 2NF — move product_name to its own table:
CREATE TABLE products (
    id   INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE order_items (
    order_id   INT,
    product_id INT,
    quantity   INT,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (product_id) REFERENCES products (id)
);

Third Normal Form (3NF)

A table is in 3NF when it is in 2NF and no non-key column depends on another non-key column. These are called transitive dependencies. Violates 3NFcity and state depend on zip_code, not directly on customer_id:
CREATE TABLE customers (
    id       INT PRIMARY KEY,
    name     VARCHAR(100),
    zip_code VARCHAR(10),
    city     VARCHAR(100),  -- transitively depends on zip_code
    state    CHAR(2)        -- transitively depends on zip_code
);
Satisfies 3NF — extract the transitive dependency:
CREATE TABLE zip_codes (
    zip_code VARCHAR(10) PRIMARY KEY,
    city     VARCHAR(100),
    state    CHAR(2)
);

CREATE TABLE customers (
    id       INT PRIMARY KEY,
    name     VARCHAR(100),
    zip_code VARCHAR(10),
    FOREIGN KEY (zip_code) REFERENCES zip_codes (zip_code)
);
In practice, extracting zip codes to a separate table is overkill for most applications. The principle matters more than mechanical compliance at every edge case.

When to Denormalize

Normalization improves write integrity and reduces redundancy, but adds join overhead on reads. Read-heavy or analytics workloads sometimes warrant deliberate denormalization.
ScenarioNormalizeDenormalize
Frequent writes, data integrity criticalYes
Reporting queries joining 6+ tablesConsider
Small dataset (< 100K rows)YesNot needed
Data warehouse or analytics replicaYes
OLTP application with mixed read/writeYesOnly hot paths
Check indexes before denormalizing — most slow-join problems are missing-index problems. For How MySQL Indexes Work, adding a covering index on the join columns is often all you need. Denormalization works best on read replicas or OLAP stores — not on the same tables your application writes to.

Frequently Asked Questions

Do I need to go beyond 3NF?

For most applications, 3NF is enough. Boyce-Codd Normal Form (BCNF) addresses edge cases where overlapping candidate keys still allow anomalies in 3NF — situations that rarely arise outside academic exercises. Stop at 3NF unless you have a specific anomaly you’re trying to eliminate.

Should I denormalize for performance?

Profile before you restructure. Most slow queries are caused by missing indexes, not by joins. Add the index first. If well-indexed queries are still slow on large tables, consider denormalization on a read replica or a materialized summary table rather than on the transactional database.

What’s the difference between a 2NF violation and a 3NF violation?

A 2NF violation is a column that depends on part of the primary key (only possible with composite PKs). A 3NF violation is a column that depends on another non-key column. 2NF is about partial key dependencies; 3NF is about transitive dependencies through non-key columns.

Troubleshooting

ProblemSolution
Duplicate data appearing across rowsLikely a 2NF or 3NF violation — move the repeated data to its own table
Update anomalies (change one row, others become stale)Transitive dependency — extract to a lookup table
Queries joining 5+ tables are slowCheck indexes on FK columns first; denormalize only if indexes don’t help
Normalized schema too complex to queryConsider a read-optimized view or a denormalized analytics replica