Data is everywhere and can be sourced in numerous ways. In its raw form, data can be tedious to work with because it is often unstructured, inconsistent and unreliable. The process of arranging this raw data into a more structured and consistent format that enables efficient querying, storage optimization and data integrity is called database normalization.
Terms To Know
Primary Key: A column, or set of columns, that uniquely identifies each row in a table.
Composite Key: A combination of multiple columns used together to uniquely identify each row in a table.
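As a quick illustration, the SQL sketch below shows how each kind of key might be declared. The table and column names here are hypothetical and used only for this example.

```sql
-- Hypothetical tables used only to illustrate the two kinds of keys.

-- A single-column primary key: customer_id uniquely identifies each row.
CREATE TABLE customers (
    customer_id   INTEGER PRIMARY KEY,
    customer_name VARCHAR(100)
);

-- A composite key: no single column is unique on its own, but the
-- combination of order_id and product_id identifies each row.
CREATE TABLE order_items (
    order_id   INTEGER,
    product_id INTEGER,
    quantity   INTEGER,
    PRIMARY KEY (order_id, product_id)
);
```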
What is Database Normalization?
Database normalization is a design technique used to optimize databases by organizing data into a structured and consistent format. The process involves transforming data from lower normal forms (least structured and consistent) to higher normal forms (more structured and consistent).
Objectives of Database Normalization
Normalizing a database improves the general state of its data by:
Minimizing Redundancy: Database redundancy occurs when the same piece of data exists in multiple places within a database. In the table below, details like the customer's address and email are repeated several times. Duplicate data wastes storage space and can lead to inconsistencies.
product_name | customer_name | customer_email | customer_address | transaction_date |
Jigsaw | John Doe | johnnyd@mail | 10 Downing Str | 10/12/2024 |
Hammer | John Doe | johnnyd@mail | 10 Downing Str | 10/12/2024 |
Lotion | John Doe | johnnyd@mail | 10 Downing Str | 10/12/2024 |
Biscuit | Mary Graham | maryg@mail | Radisson Close | 8/10/2024 |
Lip Gloss | Nneka Obi | nenyo@mail | Upper Iweka | 6/10/2024 |
Improving Query Efficiency: Organized, structured data helps databases process queries more efficiently, leading to faster retrieval times and reducing the need for complex filtering or aggregation logic.
Ensuring Data Integrity: Normalization improves data integrity by ensuring that data stays consistent across all tables and fields.
Preventing Anomalies: Database redundancy can cause anomalies such as the following (a short SQL sketch after this list illustrates each one):
Update Anomaly: This occurs when updating a single piece of data requires changes in every place that data is stored.
In the table above, an update anomaly would arise if John Doe changed their address and we had to update every cell where that address appears. If, while updating the records, we mistakenly skip a cell, we introduce an inconsistency into the database.
Insertion Anomaly: This occurs when a new record cannot be inserted into a table without also inserting information about a different entity.
Example:
Consider a table that stores information about students and their courses.
In the table below, if we create a new course, Mathematics, we cannot add it directly to the table without first associating it with a student.
Student Id | Student Name | Course ID | Course Name |
1 | Ike Umeh | 103 | Chemistry |
2 | Sandra Drea | 104 | Physics |
3 | Nancy Owens | 105 | Biology |
Deletion Anomaly: This occurs when deleting a piece of data unintentionally removes information about a separate entity. For example, in the table above, if we remove Physics from our list of courses, the record for the student “Sandra Drea” is also deleted.
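To make these anomalies concrete, here is a minimal SQL sketch, assuming a single combined table for each example. The table names (sales, student_courses) and the new address are illustrative, not part of a real schema.

```sql
-- Update anomaly: the customer's address is repeated on every sales row,
-- so one change must be applied everywhere it appears; missing a row
-- leaves the data inconsistent.
UPDATE sales
SET customer_address = '12 New Street'      -- hypothetical new address
WHERE customer_name = 'John Doe';

-- Insertion anomaly: in a combined students/courses table we cannot
-- record the new Mathematics course without also naming a student,
-- because the student columns would otherwise be left empty.
-- INSERT INTO student_courses (course_id, course_name)
-- VALUES (106, 'Mathematics');              -- no student to attach it to

-- Deletion anomaly: removing the only row that mentions Physics also
-- deletes the last record of the student enrolled in it.
DELETE FROM student_courses
WHERE course_name = 'Physics';
```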
First Normal Form (1NF):
This is the lowest level of normalization and is built on the following principles:
Each row in the table must have a unique identifier, i.e. a primary key.
Each value in the table must be indivisible (atomic).
Example:
The table below records the purchase of items at a grocery store. It clearly does not adhere to first normal form: the items purchased by John are stored as a single list, which violates the rule that every value must be atomic.
Secondly, the rows of the table are not uniquely identifiable because it has no primary key.
Customer Name | Item Purchased | Contact Address | Customer email | Transaction Date |
John | Jigsaw, hammer, lotion | 10 Downing Str | johnnyd@mail | 10/12/2024 |
Mary | Biscuit | Radisson Close | maryg@mail | 8/12/2024 |
Nneka | Lip Gloss | Upper Iweka | nnenyg@mail | 6/10/2024 |
To organize the table above in 1NF, we first introduce a unique identifier in the form of a Customer ID, then split the Item Purchased column so that each individual item is saved in a cell of its own, as shown below:
Table In First Normal Form:
Customer id | Customer name | Item purchased | Contact address | Customer email | Transaction date |
1 | John | Jigsaw | 10 Downing Str | johnnyd@mail | 10/12/2024 |
1 | John | Hammer | 10 Downing Str | johnnyd@mail | 10/12/2024 |
1 | John | Lotion | 10 Downing Str | johnnyd@mail | 10/12/2024 |
2 | Mary | Biscuit | Radisson Close | maryg@mail | 8/12/2024 |
3 | Nneka | Lip Gloss | Upper Iweka | nnenyg@mail | 6/10/2024 |
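One possible way to declare this 1NF table in SQL is sketched below. The column types are assumptions, and the composite primary key reflects the fact that each customer appears once per item purchased.

```sql
-- A possible declaration of the 1NF table; types are assumptions.
CREATE TABLE purchases_1nf (
    customer_id      INTEGER,
    customer_name    VARCHAR(100),
    item_purchased   VARCHAR(100),      -- one atomic item per row, no lists
    contact_address  VARCHAR(200),
    customer_email   VARCHAR(100),
    transaction_date DATE,
    PRIMARY KEY (customer_id, item_purchased)
);
```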
Second Normal Form (2NF):
Second normal form seeks to eliminate partial dependencies.
Its rules are as follows:
The table must already be in first normal form (1NF).
Non-primary attributes (non-key columns) must depend on the entire primary key, not just part of it.
Example:
Now that we have a table in 1NF in the example above, we can check whether it satisfies the constraints of 2NF.
To normalize the table into 2NF, we need to identify the composite primary key and any partial dependencies.
Our 1NF table has a composite primary key of Customer ID and Item Purchased, since each transaction is uniquely identified by this combination.
Partial Dependency:
- Customer Name, Customer Email, and Customer Address depend only on Customer ID, not on the entire primary key (Customer ID and Item Purchased).
We can split the table into two separate tables (a Transactions table and a Customer table) to eliminate the partial dependency:
Transactions Table
Customer ID | Item Purchased | Transaction Date |
1 | Jigsaw | 10/12/2024 |
1 | Hammer | 10/12/2024 |
1 | Lotion | 10/12/2024 |
2 | Biscuit | 8/12/2024 |
3 | Lip Gloss | 6/10/2024 |
Customer Table
Customer ID | Customer Name | Customer Email | Customer Address |
1 | John | johnnyd@mail | 10 Downing Str |
2 | Mary | maryg@mail | Radisson Close |
3 | Nneka | nnenyg@mail | Upper Iweka |
We have now eliminated the partial dependency by ensuring that customer information depends only on the Customer ID.
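Sketched in SQL, the 2NF split might look like the following. Types are assumptions, and the foreign key reference simply makes the relationship between the two tables explicit.

```sql
-- Customer details now live in one place, keyed by customer_id alone.
CREATE TABLE customers (
    customer_id      INTEGER PRIMARY KEY,
    customer_name    VARCHAR(100),
    customer_email   VARCHAR(100),
    customer_address VARCHAR(200)
);

-- Each transaction row keeps only what depends on the full composite key.
CREATE TABLE transactions (
    customer_id      INTEGER REFERENCES customers (customer_id),
    item_purchased   VARCHAR(100),
    transaction_date DATE,
    PRIMARY KEY (customer_id, item_purchased)
);
```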
Third Normal Form (3NF):
Third normal form aims to eliminate transitive dependencies.
Its rules are as follows:
Must first be in second normal form
No transitive dependencies: non-primary attributes must not depend on other non-primary attributes.
Example:
The 2NF tables above are already in 3NF, as they contain no transitive dependencies.
Consider instead the Employee table below:
Employee ID | Name | Department | Manager |
1 | Chisom | Finance | Jake |
2 | Kene | IT | Jide |
Transitive Dependency:
Employee ID → Name: The Name attribute depends directly on Employee ID (the primary key), which is a valid dependency.
Employee ID → Department: The Department depends directly on Employee ID, which is also a valid dependency.
The Manager, however, does not depend directly on Employee ID. Instead, it depends on Department, which in turn depends on Employee ID. The dependency chain is Employee ID → Department → Manager, which makes it a transitive dependency.
To eliminate this transitive dependency, we split the table above into two separate tables: an Employee table and a Department table.
Employee Table
Employee ID | Name | Department |
1 | Chisom | Finance |
2 | Kene | IT |
Department Table
Department | Manager |
Finance | Jake |
IT | Jide |
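A corresponding SQL sketch of the 3NF split might look like this. Types are assumptions, and using the department name as the key is purely for illustration.

```sql
-- Manager now depends only on the department, not on any employee.
CREATE TABLE departments (
    department VARCHAR(50) PRIMARY KEY,
    manager    VARCHAR(100)
);

-- Each employee references a department; the manager is looked up through
-- the departments table rather than being repeated on every employee row.
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    name        VARCHAR(100),
    department  VARCHAR(50) REFERENCES departments (department)
);
```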
Conclusion
Database Normalization is a very important design technique for creating reliable and efficient databases. Normalization minimizes redundancy, ensures data integrity and prevents anomalies. From the foundational First Normal Form (1NF) to the more advanced Third Normal Form (3NF), each level of normalization builds on the previous to enhance database performance and reliability. While normalization brings significant advantages, it’s also essential to balance it with practical considerations, such as query complexity and performance in specific use cases.
By understanding and applying normalization principles, developers and data engineers can build databases that stand the test of time, supporting scalable and consistent data management for applications of any size.