3 Primary Levels of Database Normalization

Bellow you will find a copy of the original table: Grade Report

Student_ID Student_Name Major Course_ID Course_Title Instructor_Name Instructor_Location Grade
168300458 Williams IT ITS420 Database Imp Codd B104 A
168300458 Williams IT ITP310 Programming Parsons B317 B
543291073 Baker Acctg ITD320 Database Imp Codd B104 C
543291073 Baker Acctg Acct201 Fund Acctg Miller H310 B
543291073 Baker Acctg Mktg300 Intro Mktg Bennett B212 A

This table is in an un-normalized form due to the fact that is has repeating groups. For instance the Student_ID = 168300458 shows here twice and Student_ID = 543291073 shows 3 times. So there is no single field that may be used as a primary key for this table.  We will bring this table to 1NF (first normal form) in a moment, but for now let’s look at the dependence of the attributes of this table.

The functional dependencies are shown below:

  • Student_ID -> Student_Name, Major
  • Course_ID -> Course_Title, Instructor_Name, Instructor_Location
  • Student_ID, Course_ID -> Grade
  • Instructor_Name -> Instructor_Location

The dependencies of the attributes can be shown in the following dependence diagram.

To decompose this table into the first normal form (1NF) we will need to expand the primary key to include the both the Student_ID and the Course_ID and will create a composite key. When the primary key is expanded we now have 5 distinct records in the table. The table would now look like this:

Grade Report (Student_ID, Student_Name, Major, Course_ID, Course_Title, Instructor_Name,                 Instructor_Location, Grade)

Next, in order to get the table from a 1NF to a 2NF we will need to get rid of all partial dependencies. These dependencies are found whenever you have a composite key (key made of more than one field). So in short we will create a couple of tables and these table will contain each part of the former composite key so there will no longer be a composite key.

Students

Student_ID Student_Name Major
168300458 Williams IT
168300458 Williams IT
543291073 Baker Acctg
543291073 Baker Acctg
543291073 Baker Acctg

Courses

Course_ID Course_Title Instructor_Name Instructor_Location
ITS420 Database Imp Codd B104
ITP310 Programming Parsons B317
ITD320 Database Imp Codd B104
Acct201 Fund Acctg Miller H310
Mktg300 Intro Mktg Bennett B212

Grades

Student_ID Course_ID Grade
168300458 ITS420 A
168300458 ITP310 B
543291073 ITD320 C
543291073 Acct201 B
543291073 Mktg300 A

These tables can also be shown as:

Students (Student_ID, Student_Name, Major)

Courses (Course_ID, Course_Title, Instructor_Name, Instructor_Location)

Grades (Student_ID, Course_ID, Grade)

Foreign key: Student_ID to Students table

Foreign key: Course_ID to Courses table

We still have an issue that were a user could enter the Instructor_Name or Instructor_Location incorrectly and there could be a data mismatch. This is referred to as a data anomaly. For instance if a user was to in one record saw the instructor Codd was in location B104 and in another record indicate that Codd was in location B317, there is nothing to prevent this to happen. To prevent this kind of anomaly we will need to pull the Instructor_Name and Instructor_Location out and create a 4th table. In the Course table I will create a new field called Instructor_ID and this will be a foreign key to the Instructors table; due to the fact that Instructor_Name is too vague and could reference two instructors with the same name. In that 4th table the Instructor_ID will be the primary key.  The newly created database will now look like this and are now in a 3NF form:

Students

Student_ID Student_Name Major
168300458 Williams IT
168300458 Williams IT
543291073 Baker Acctg
543291073 Baker Acctg
543291073 Baker Acctg

Courses

Course_ID Course_Title Instructor_ID
ITS420 Database Imp 1001
ITP310 Programming 1002
ITD320 Database Imp 1001
Acct201 Fund Acctg 1003
Mktg300 Intro Mktg 1004

Grades

Student_ID Course_ID Grade
168300458 ITS420 A
168300458 ITP310 B
543291073 ITD320 C
543291073 Acct201 B
543291073 Mktg300 A

Instructors

Instructor_ID Instructor_Name Instructor_Location
1001 Codd B104
1002 Parsons B317
1003 Miller H310
1004 Bennett B212

These tables can also be shown as such:

Students (Student_ID, Student_Name, Major)

Courses (Course_ID, Course_Title, Instructor_ID)

Foreign key: Instructor_ID to Instructors table

Grades (Student_ID, Course_ID, Grade)

Foreign key: Student_ID to Students table

Foreign key: Course_ID to Courses table

Instructors (Instructor_ID, Instructor_Name, Instructor_Location)

Bellow you will find a Chen’s Model ER diagram of the 3NF tables:

References:

Adamski, J., & Finnegan, K. (2008). Microsoft Office Access 2007. Boston: Course Technology.

http://www.sethi.org/classes/cet415/lab_notes/lab_03.html, retrieved May 1, 2009

Advertisements

7 Basic Steps to Designing a Relational Database

There are 7 basic steps to designing a relational database. There are several other steps that could be included in the database implementation and the database life cycle like, research, rollout and maintenance. However, I will be focusing strictly on the database design.

 

The seven basic steps are; first, determine the purpose of your system. Next, you will need to determine the entities for your system. These entities will become your table names. Next you will decide what attributes for those entities will be necessary for the purpose of the system. These attributes will become your fields within the tables. Next, you will need to determine what attributes/field names will help identify the records/entities as being unique. This field(s) will make up the primary key for each table. After you have determined the primary keys for the tables you will need to define the relationships between these tables. The best way to truly do this is to follow the standard steps in table normalization and at minimum get the table to a 3NF form. The process of normalization is the sixth step in the database design. The 7th, and final step, in the database design in the actual data population.

 

In short the key steps are;

  1. Determine the purpose of the system
  2. Determine what entities/tables will be included
  3. Determine what attributes/fields will be included
  4. Indentify unique fields (primary keys)
  5. Determine relationships between tables
  6. Refine the design (normalization)
  7. Populate the tables with raw data

 

References:

http://www.databasedev.co.uk/database_design_requirements.html, retrieved May 1, 2009

 

Adamski, J., & Finnegan, K. (2008). Microsoft Office Access 2007. Boston: Course Technology.

 

Database Table Relationships

The first type of relationship we will discuss is the one-to-one relationship. A one-to-one relationship is where an entity from within one table references one and only one entity within another table. For instance, within most offices you will find users that have laptops. This relation is typically found as a one-to-one relation as follows, one user per one laptop. Users do not typically share laptops like they may with standard desktops. The users would be within one table and the laptops would be found in another table if they were represented within a relational database. This relation can be seen as follows:

 

Users (User_ID, First_Name, Last_Name, Phone, Service_Tag)

Foreign key: Service_Tag to Laptops table

Laptops (Service_Tag, Brand, Model, CPU, Memory)

 

The second type of relationship you will find within a referential database is a one-to-many relationship. A one-to-many/many-to-one is all a matter of perspective. With a one-to-many relationship there is one entity within one table that relates to many entities within another table. My example here is something that you will commonly find within a small office, 5-10 users. Typically a small office will only have 1 networked printer and many users will use that printer. This relationship can be shown in the following diagram.

 

Users (User_ID, First_Name, Last_Name, Phone, PrinterID)

Foreign key: PrinterID to Printers table

Printers (PrinterID, Brand, Model, Location)

The last relationship I will show is the many-to-many relationship. With a many-to-many relationship you will find many entities within one table that relate to many entities in another table. To do this properly you need what is called a intersection table, junction table or a link table. I personally use the term link table. This link table is used to split up the many-to-many relation and will create two one-to-many relationships this greatly simplifies the logic within the database and can help prevent issues that may come up latter down the road. This helps enforce referential integrity. Let’s take the previous example with the small office and the single printer and many users and expand this to a larger office with many printers. For instance, with an office of 100 users there is likely going to be at least 1 printer for every 20 or 25 users. In addition to this each user may print to more that or printer depending on what they want to print, some printer may be color while others only black and white. This example can be show as such:

 

Users (User_ID, First_Name, Last_Name, Phone)

PrinterUserLink (User_ID, PrinterID)

Foreign key: User_ID to Users table

Foreign key: PrinterID to Printers table

Printers (PrinterID, Brand, Model, Location)

 

References:

Adamski, J., & Finnegan, K. (2008). Microsoft Office Access 2007. Boston: Course Technology.