Database Normalization

All right, this is for those of us who learned this stuff 20 years ago and haven’t used it much ever since. I recently came across a problem which required me to normalize some tables. Well OK, I still remember the hard facts, but it did take some reading up to get the data into 3rd Normal Form. So I guess I’m doing this post a bit for my own sake too. Here it goes:

Data Normalization is one of the key steps to be taken in database modeling and it touches base on the set theory in good old Algebra (now you remember the set theory, don’t you?).

Normalization is the process of organizing Data inside a Database. It is necessary in order to avoid:

(1) Data redundancy
(2) Inconsistent dependency

By normalizing the data we make sure that we can trust our database to perform as originally intended. Thus, the first step is to bring the data into 1st Normal Form (1NF). This means eliminating repeating groups of data within tables, creating separate tables for all data sets which are related to each other and marking every set of data with a unique primary key. Consider the following student database:

StID

Advisor

Room

Course1

Course2

Course3

1022 Jones 412 101-07 143-01 159-02
4123 Smith 216 201-01 211-02 214-01
7861 Grey 323 302-06 233-03 217-09

Having data like this would mean that if we were to take, say Course Number 159-02 out of the curriculum, we would delete the Student Jones and the Room 412 at the same time. This does not seem like a very good idea.

This table is said to be in non-normalized form, since there are repeating groups evident. We are running the risk of losing data each time we update our little database. Data should always be 2-dimensional within tables – which the above data clearly is not. The repeating groups are the course numbers Course1, Course2 and Course3 (since a student can be enrolled into more than one course, it does not make sense to have the courses sitting in rows. Thus, this table violates 1NF.

The first step is to put this table into 1NF: No repeating groups. One thing to remember here is that 1 and n are not supposed to be in the same table. Now consider this:

StID

Advisor

Room

CourseID

1022 Jones 412 101-07
1022 Jones 412 143-01
1022 Jones 412 159-02
4123 Smith 216 201-01
4123 Smith 216 211-02
4123 Smith 216 214-01
7861 Grey 323 302-06
7861 Grey 323 233-03
7861 Grey 323 217-09

Looks a bit better, right? The above table is now in 1NF, however, there are still some problems with it. Also, there seems to be no functional dependency between the Student ID and the Course ID. This means that the table is lacking 2NF. We will have to split this table in order to create functional dependencies. Thus we’ll create another entity called “Registration”.

Table Students:

StID

Advisor

Room

1022 Jones 412
4123 Smith 216
7861 Grey 323

Table Registration:

StID

CourseID

1022 101-07
1022 143-01
1022 159-02
4123 201-01
4123 211-02
4123 214-01
7861 302-06
7861 233-03
7861 217-09

Now this seems yet better, but we’re not quite there yet. In order to finish this properly, the database must be in 3NF, which means eliminating data which is not directly dependent on the primary key. For example, in the table “Students” above, “Room” (the advisor’s office number) is functionally dependent on the Advisor attribute. The solution is to create a third table “Faculty”:

Table Students:

StID

Advisor

1022 Jones
4123 Smith
7861 Grey

 Table Faculty:

Name

Room

Dept

Jones 412 42
Smith 216 42
Grey 323 42

Overall, these are the the steps to follow in database modeling:

(1) Identify candidate keys
(2) Select the primary key
(3) Apply normalization (NF1 thru NF3)
(4) Do an ERD (Entity Relationship Diagram)

The following is a summary of the Data Normalization Rules:

First Normal Form (1NF)
– Eliminate repeating groups in individual tables
– Create a separate table for each set of related data
– Identify each set of related data with a primary key

Second Normal Form (2NF)
– Create separate tables for sets of values that apply to multiple records
– Relate these tables with a foreign key

Third Normal Form (3NF)
– Eliminate fields that do not depend on the primary key

For the purpose of having a functioning database, going through Normal Forms 1-3 is sufficient. While there are several more Normal Forms, they do little more than offer database perfection which in some instances can be more an obstacle than a benefit. Thus, I will not discuss these at this point.

I hope that I was able to shed some light into this classical “motivation killer” for ongoing database students. I would also like to use this opportunity to give credit to the people who have helped refresh my gray brain cells on this subject. Here they are:

– Gerald V. Post: Database Management Systems 3rd Ed., McGraw Hill 2005
http://de.wikipedia.org/wiki/Normalisierung_(Datenbank)
– Microsoft Support KB283878
– Datamodel.Org – Rules of Data Normalization

And remember: Practice makes perfect!

Cheers!

Boris

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: