Normalization
Normalization
Ø It is process of minimize the redundancy. Redundancy
means duplication of data.
Ø It divides larger table into smaller tables and links
them using relationship.
Ø Example
|
Empid |
Ename |
Sal |
Deptname |
Location |
|
101 |
Ramu |
10000 |
IT |
Hyd |
|
102 |
Kamal |
14500 |
Account |
Mumbai |
|
103 |
Satish |
15000 |
Account |
Mumbai |
|
104 |
Praveen |
16500 |
IT |
Hyd |
|
Empid |
Ename |
Sal |
Deptno |
|
101 |
Ramu |
10000 |
10 |
|
102 |
Kamal |
14500 |
20 |
|
103 |
Satish |
15000 |
20 |
|
104 |
Praveen |
16500 |
10 |
|
Deptno |
Deptname |
Location |
|
10 |
IT |
Hyd |
|
20 |
Account |
Mumbai |
Ø It removes anomaly, these are insert anomaly, update
anomaly, delete anomaly.
Insert anomaly: - Repetition of data and if we insert employee who does not allotted
dept. this could be insertion anomaly. Because data which is not having dept
details, after knowing dept for those employees it is difficult to insert all
rows so insertion anomaly is formed. To reduce it we can use
normalization.
Update anomaly: - in the above base table if Deptname is changed for employees. So, it
is difficult to update all thousands of rows. So update anomaly is formed, to
reduce it we can use normalization.
Delete anomaly: - In the above table if employees are transfer or promoted or resign,
if we delete that employee row all data deleted with dept details also. So here
delete anomaly is formed.
Ø Normal forms
1. 1NF
2. 2NF
3. 3NF
4. BCNF
5. 4NF
6. 5NF
7. 6NF
Ø 1NF: - 1st
normal form contains set of rules these are
·
Every record
contains only single values means atomic values.
·
Column data
type should not change, if we convert to new table.
·
Unique name of
each column means no two columns have same name.
·
No need to
follow order if new table is created.
|
Empid |
Ename |
Sal |
Deptname |
|
101 |
Ramu |
10000 |
IT, Manager |
|
102 |
Kamal |
14500 |
Account |
|
103 |
Satish |
15000 |
Account |
|
104 |
Praveen |
16500 |
IT |
We want to change this table as
|
Empid |
Ename |
Sal |
Deptname |
|
101 |
Ramu |
10000 |
IT |
|
101 |
Ramu |
10000 |
Manager |
|
102 |
Kamal |
14500 |
Account |
|
103 |
Satish |
15000 |
Account |
|
104 |
Praveen |
16500 |
IT |
Ø 2NF: - 2nd
normal form is having set of rules these are
·
The table must
be in 1NF.
·
No Partial
dependency.
|
Student ID |
Student Name |
Phone number |
|
1 |
Kamal |
1236549870 |
|
2 |
Ramesh |
1452369870 |
|
3 |
Sai |
1365984125 |
|
Subject ID |
Subject Name |
|
100 |
Maths |
|
101 |
Physics |
|
Marks ID |
Student ID |
Subject ID |
Marks |
Teacher |
|
1000 |
1 |
100 |
50 |
Raju |
|
1001 |
2 |
101 |
68 |
Ravi |
|
1002 |
3 |
100 |
85 |
Raju |
In the above table Marks ID, Student ID, Subject ID
are Primary keys. These are three table student table, subject table, marks
table. In the marks table Teacher column is dependent on only subject ID
primary key column so it is called Partial dependency.
Dependency means all non-key columns must
dependent on primary key or candidate key it is aloso called as functional
dependency.
So, in the above tables are marks table should not
follow 2NF rules so we changed that tables to as below.
|
Marks ID |
Student ID |
Subject ID |
Marks |
|
1000 |
1 |
100 |
50 |
|
1001 |
2 |
101 |
68 |
|
1002 |
3 |
100 |
85 |
|
Subject ID |
Subject Name |
Teacher |
|
100 |
Maths |
Raju |
|
101 |
Physics |
Ravi |
Ø 3NF: - 3rd
normal form is having set of rules these are
·
The must
qualify in 2NF.
·
No Transitive
dependency.
Transitive dependency means when one
non-primary key column depends on another non-primary key columns. Basically,
non primary key column also contains null values also so one column is
dependent on another null column so it causes error.
|
Marks ID |
Student ID |
Subject ID |
Marks |
Examination |
Total Marks |
|
1000 |
1 |
100 |
15 |
Internal |
20 |
|
1001 |
2 |
101 |
58 |
External |
80 |
|
1002 |
3 |
100 |
68 |
External |
80 |
Here we observe
that total marks and examination are non- primary keys, and total marks
dependent on Examination so here
Transitive dependency is formed. Then we remove this table and create two
tables as
|
Marks ID |
Student ID |
Subject ID |
Marks |
Examination ID |
|
1000 |
1 |
100 |
15 |
21 |
|
1001 |
2 |
101 |
58 |
22 |
|
1002 |
3 |
100 |
68 |
22 |
|
Examination ID |
Examination |
Total Marks |
|
21 |
Internal |
20 |
|
22 |
External |
80 |
Ø BCNF: - Boyce Codd
Normal form is having set of rules these are
·
The table must
be in 3NF.
·
If one column
is dependent on second column, the 1st column must be a super key.
It is advanced form of 3NF so it is called
as 3.5NF.
Super key means either it is primary key
or foreign key.
|
Student ID |
Subject |
Teacher |
|
1 |
Maths |
Arun |
|
1 |
Physics |
Babu |
|
2 |
Maths |
Chakri |
Here student id and subject both have
composite primary key. Because student enroll for any no of tuition subject and
any one subject have more than one student so it is many to many relations
that’s why we are given composite primary for both combinations. Teacher is non
key attribute.
Here teacher is determining which subject
is teaching. It means TeacheràSubject. But teacher
is not a super key, so it need normalization.
|
Student ID |
Teacher ID |
|
1 |
21 |
|
1 |
22 |
|
2 |
23 |
|
Teacher ID |
Teacher |
Subject |
|
21 |
Arun |
Maths |
|
22 |
Babu |
Physics |
|
23 |
Chakri |
Maths |
Ø 4NF: - 4th
normal form is containing set of rules these are
·
The table must
be in BCNF.
·
The table
should not have multi-valued dependency.
Multi-valued dependency means if a table
contain three columns, 1st column of a table determines 2nd
and 3rd column this relation is called multi-valued dependency. And
also 2nd and 3rd columns independent of each other.
|
Student ID |
Student name |
Fav Subject |
Fav Game |
|
1 |
Kamal |
Maths |
Cricket |
|
1 |
Kamal |
Physics |
Kabaddi |
|
2 |
Raj |
Social |
Cricket |
Here student have multiple favorite
subjects as well as favorite games. So, one person is determining multiple
values so it is called multi-valued dependency.
So we can change this table as
|
Student ID |
Fav Subject |
|
1 |
Maths |
|
1 |
Physics |
|
2 |
Social |
|
Student ID |
Fav Game |
|
1 |
Cricket |
|
1 |
Kabaddi |
|
2 |
Cricket |
|
Student ID |
Student name |
|
1 |
Kamal |
|
2 |
Raj |
Here student ID and Fav subject are composite primary
key and also student ID and Fav game also composite primary key. So no
duplicate values are enter here.
Comments
Post a Comment