> ## Documentation Index
> Fetch the complete documentation index at: https://villagesql.com/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# Normalization in MySQL

> How to normalize a MySQL database schema: first, second, and third normal form explained with examples, plus when denormalization makes sense.

<Card title="VillageSQL is a drop-in replacement for MySQL with extensions." icon="database" href="/mysql-8.4/0.0.4/quickstart">
  All examples in this guide work on VillageSQL. Install Now →
</Card>

Database normalization is the process of organizing tables to reduce redundancy and prevent update anomalies. Most applications should reach third normal form (3NF). Past that, the gains are mostly theoretical and the join complexity is real.

## First Normal Form (1NF)

A table is in 1NF when every column holds atomic values — one value per cell — and there are no repeating groups.

**Violates 1NF** — multiple values packed into one column:

```sql theme={null}
CREATE TABLE orders (
    id          INT PRIMARY KEY,
    customer    VARCHAR(100),
    product_ids VARCHAR(255)  -- "1,4,7" — not atomic
);
```

**Satisfies 1NF** — one value per cell, repeating group extracted to a child table:

```sql theme={null}
CREATE TABLE orders (
    id       INT PRIMARY KEY,
    customer VARCHAR(100)
);

CREATE TABLE order_items (
    order_id   INT,
    product_id INT,
    PRIMARY KEY (order_id, product_id)
);
```

## Second Normal Form (2NF)

A table is in 2NF when it is in 1NF and every non-key column depends on the **whole** primary key — not just part of it. This only applies to tables with composite primary keys.

**Violates 2NF** — `product_name` depends only on `product_id`, not on the full `(order_id, product_id)` key:

```sql theme={null}
CREATE TABLE order_items (
    order_id     INT,
    product_id   INT,
    product_name VARCHAR(100),  -- partial dependency: only needs product_id
    quantity     INT,
    PRIMARY KEY (order_id, product_id)
);
```

**Satisfies 2NF** — move `product_name` to its own table:

```sql theme={null}
CREATE TABLE products (
    id   INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE order_items (
    order_id   INT,
    product_id INT,
    quantity   INT,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (product_id) REFERENCES products (id)
);
```

## Third Normal Form (3NF)

A table is in 3NF when it is in 2NF and no non-key column depends on another non-key column. These are called transitive dependencies.

**Violates 3NF** — `city` and `state` depend on `zip_code`, not directly on `customer_id`:

```sql theme={null}
CREATE TABLE customers (
    id       INT PRIMARY KEY,
    name     VARCHAR(100),
    zip_code VARCHAR(10),
    city     VARCHAR(100),  -- transitively depends on zip_code
    state    CHAR(2)        -- transitively depends on zip_code
);
```

**Satisfies 3NF** — extract the transitive dependency:

```sql theme={null}
CREATE TABLE zip_codes (
    zip_code VARCHAR(10) PRIMARY KEY,
    city     VARCHAR(100),
    state    CHAR(2)
);

CREATE TABLE customers (
    id       INT PRIMARY KEY,
    name     VARCHAR(100),
    zip_code VARCHAR(10),
    FOREIGN KEY (zip_code) REFERENCES zip_codes (zip_code)
);
```

In practice, extracting zip codes to a separate table is overkill for most applications. The principle matters more than mechanical compliance at every edge case.

## When to Denormalize

Normalization improves write integrity and reduces redundancy, but adds join overhead on reads. Read-heavy or analytics workloads sometimes warrant deliberate denormalization.

| Scenario                                 | Normalize | Denormalize    |
| :--------------------------------------- | :-------- | :------------- |
| Frequent writes, data integrity critical | Yes       | —              |
| Reporting queries joining 6+ tables      | —         | Consider       |
| Small dataset (\< 100K rows)             | Yes       | Not needed     |
| Data warehouse or analytics replica      | —         | Yes            |
| OLTP application with mixed read/write   | Yes       | Only hot paths |

Check indexes before denormalizing — most slow-join problems are missing-index problems. For [How MySQL Indexes Work](/guides/mysql-indexes), adding a covering index on the join columns is often all you need.

Denormalization works best on read replicas or OLAP stores — not on the same tables your application writes to.

## Frequently Asked Questions

#### Do I need to go beyond 3NF?

For most applications, 3NF is enough. Boyce-Codd Normal Form (BCNF) addresses edge cases where overlapping candidate keys still allow anomalies in 3NF — situations that rarely arise outside academic exercises. Stop at 3NF unless you have a specific anomaly you're trying to eliminate.

#### Should I denormalize for performance?

Profile before you restructure. Most slow queries are caused by missing indexes, not by joins. Add the index first. If well-indexed queries are still slow on large tables, consider denormalization on a read replica or a materialized summary table rather than on the transactional database.

#### What's the difference between a 2NF violation and a 3NF violation?

A 2NF violation is a column that depends on *part* of the primary key (only possible with composite PKs). A 3NF violation is a column that depends on *another non-key column*. 2NF is about partial key dependencies; 3NF is about transitive dependencies through non-key columns.

## Troubleshooting

| Problem                                                | Solution                                                                  |
| :----------------------------------------------------- | :------------------------------------------------------------------------ |
| Duplicate data appearing across rows                   | Likely a 2NF or 3NF violation — move the repeated data to its own table   |
| Update anomalies (change one row, others become stale) | Transitive dependency — extract to a lookup table                         |
| Queries joining 5+ tables are slow                     | Check indexes on FK columns first; denormalize only if indexes don't help |
| Normalized schema too complex to query                 | Consider a read-optimized view or a denormalized analytics replica        |

## See also

* [Choosing MySQL Data Types](/guides/choosing-data-types) — picking the right types once your schema is normalized
* [Foreign Keys in MySQL](/guides/foreign-keys) — enforcing the relationships that normalization creates
* [CHECK Constraints in MySQL](/guides/check-constraints) — additional rules beyond foreign key integrity
