2. Combine Schemas?
Suppose we combine instructor and department into inst_dept.
(No connection to relationship set inst_dept)
The result is possible repetition of information.
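The repetition can be seen in a small sketch. The attribute names (id, name, dept_name, building) and all sample values below are hypothetical and are not taken from the slides; only the schema names instructor, department and inst_dept come from the text above.

```python
# Hypothetical sample data; attribute names and values are illustrative only.
instructor = [
    {"id": 1, "name": "Avery", "dept_name": "Physics"},
    {"id": 2, "name": "Blake", "dept_name": "Physics"},
    {"id": 3, "name": "Casey", "dept_name": "History"},
]
department = {
    "Physics": {"building": "Building A"},
    "History": {"building": "Building B"},
}

# Combine the two schemas: every instructor row now carries department facts.
inst_dept = [{**row, **department[row["dept_name"]]} for row in instructor]

# Count how many times each department's building is stored in inst_dept.
copies = {}
for row in inst_dept:
    copies[row["dept_name"]] = copies.get(row["dept_name"], 0) + 1

print(copies)
```

Every department that has more than one instructor has its building stored more than once in inst_dept.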
3. Normalization
Database normalization is a technique for organizing the data in a
database. It is a systematic approach of decomposing
tables to eliminate data redundancy and undesirable characteristics
such as insertion, update, and deletion anomalies.
It is a multi-step process that puts data into tabular form by removing
duplicated data from the relation tables.
Normalization is used mainly for two purposes:
Eliminating redundant (useless) data.
Ensuring data dependencies make sense, i.e. data is logically
stored.
4. Problem Without Normalization
Without normalization, it becomes difficult to handle and update the
database without facing data loss. Insertion, update, and deletion
anomalies are very frequent if the database is not normalized.
To understand these anomalies, let us take the example of a Student
table.
5. Problem Without Normalization (Cont.)
Update Anomaly: To update the address of a student who occurs
twice or more in the table, we have to update the S_Address
column in all of those rows, or else the data will become inconsistent.
Insertion Anomaly: Suppose that for a new admission we have the
student id (S_id), name, and address of a student, but the student has not
opted for any subjects yet. We then have to insert NULL for the subject,
leading to an insertion anomaly.
Deletion Anomaly: If student (S_id) 401 has only one subject and
temporarily drops it, deleting that row deletes the entire student record
along with it.
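The three anomalies can be reproduced with a minimal sketch using Python's built-in sqlite3 module. Only S_id, S_Address and the id 401 come from the text; the columns S_Name and Subject, the table layout, and all sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout of the unnormalized Student table (one row per subject).
conn.execute(
    "CREATE TABLE Student (S_id INTEGER, S_Name TEXT, S_Address TEXT, Subject TEXT)"
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [
        (401, "Student A", "Old Town", "Biology"),
        (402, "Student B", "Riverside", "Maths"),
        (402, "Student B", "Riverside", "Physics"),
    ],
)

# Update anomaly: only one of student 402's two rows gets the new address.
conn.execute(
    "UPDATE Student SET S_Address = 'Hillside' WHERE S_id = 402 AND Subject = 'Maths'"
)
addresses_402 = {
    row[0] for row in conn.execute("SELECT S_Address FROM Student WHERE S_id = 402")
}

# Insertion anomaly: a student without a subject forces a NULL into the row.
conn.execute("INSERT INTO Student VALUES (403, 'Student C', 'Lakeside', NULL)")
null_subjects = conn.execute(
    "SELECT COUNT(*) FROM Student WHERE Subject IS NULL"
).fetchone()[0]

# Deletion anomaly: dropping 401's only subject removes the student entirely.
conn.execute("DELETE FROM Student WHERE S_id = 401 AND Subject = 'Biology'")
rows_401 = conn.execute(
    "SELECT COUNT(*) FROM Student WHERE S_id = 401"
).fetchone()[0]

print(addresses_402, null_subjects, rows_401)
```

After these statements student 402 has two different addresses, one row holds a NULL subject, and no trace of student 401 remains.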
6. Normalization Techniques
Normalization rules are divided into the following normal forms:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
BCNF
Atomic: A domain is atomic if its elements are considered to be indivisible units.
Examples of non-atomic domains: sets of names, composite attributes,
and identification numbers like CS101 that can be broken up into parts.
Non-atomic values complicate storage and encourage redundant (repeated)
storage of data. Example: a set of accounts stored with each customer, and a set of
owners stored with each account.
7. First Normal Form (Cont.)
A relational schema R is in first normal form if the domains of all
attributes of R are atomic.
As per First Normal Form, no row may contain a repeating
group of information: each column of a row holds a single value,
and no two rows are identical.
Each table should be organized into rows, and each row should have a
primary key that distinguishes it as unique.
The Primary key is usually a single column, but sometimes more than
one column can be combined to create a single primary key.
8. First Normal Form (Cont.)
For example, consider a table that is not in First Normal Form.
In First Normal Form, no row may have a column in which more
than one value is saved, such as values separated by commas. Instead,
we must separate such data into multiple rows.
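Splitting a comma-separated column into multiple rows can be sketched as follows. The function name to_first_normal_form, the column names, and the sample rows are hypothetical and only illustrate the idea.

```python
def to_first_normal_form(rows, multi_valued_column):
    """Return new rows in which multi_valued_column holds one value per row."""
    result = []
    for row in rows:
        # One output row per comma-separated value; other columns are copied.
        for value in row[multi_valued_column].split(","):
            new_row = dict(row)
            new_row[multi_valued_column] = value.strip()
            result.append(new_row)
    return result


# Hypothetical table that violates 1NF: "subject" holds several values.
unnormalized = [
    {"student": "Student A", "subject": "Biology, Maths"},
    {"student": "Student B", "subject": "Maths"},
]

normalized = to_first_normal_form(unnormalized, "subject")
for row in normalized:
    print(row)
```

The student value is repeated in every row produced from the same input row, which is the price of making each value atomic.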
9. First Normal Form (Cont.)
Using the First Normal Form, data redundancy increases, as the same
data will be repeated in some columns across multiple rows, but each
row as a whole will be unique.
10. Second Normal Form
Remove subsets of data that apply to multiple rows of a table and place
them in separate tables.
Create relationships between these new tables and their predecessors
through the use of foreign keys.
There are a few complex cases in which a table in Second
Normal Form still suffers from update anomalies; Third Normal Form
exists to handle those scenarios.
First Name  Last Name  Address         City       State  Zip
Lisa        Hestings   Bertha Street   Miami      FL     33157
Adam        Gabriel    Fleming Street  Miami      FL     33157
Lucy        Herts      Bridge Road     Sea Cliff  NY     11579
11. Second Normal Form
A brief look at this table reveals a small amount of redundant data.
Entries such as Sea Cliff, NY 11579 and Miami, FL 33157 are stored
once per customer, so they repeat whenever several customers share a
ZIP code; in the rows shown, Miami, FL 33157 appears twice.
Additionally, if the ZIP code for Miami were to change, we'd need to
make that change in many places throughout the database.
In a 2NF-compliant database structure, this redundant information
is extracted and stored in a separate table. Our new table (let's call
it ZIPs) might have the following columns:
Zip City State
We’ll need to use a foreign key to tie the two tables together.
We'll use the ZIP code (the primary key from the ZIPs table) to
create that relationship. Here's our new Customers table:
First Name  Last Name  Address  Zip
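The decomposition into ZIPs and Customers can be sketched with Python's built-in sqlite3 module. The column spellings (FirstName, LastName), the TEXT types, and the reading of the third customer row as Sea Cliff, NY are assumptions; the data values come from the customer table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# City and state are stored once per ZIP code.
conn.execute("CREATE TABLE ZIPs (Zip TEXT PRIMARY KEY, City TEXT, State TEXT)")
# Customers keep only the ZIP code, as a foreign key into ZIPs.
conn.execute(
    """
    CREATE TABLE Customers (
        FirstName TEXT,
        LastName  TEXT,
        Address   TEXT,
        Zip       TEXT REFERENCES ZIPs(Zip)
    )
    """
)

conn.executemany(
    "INSERT INTO ZIPs VALUES (?, ?, ?)",
    [("33157", "Miami", "FL"), ("11579", "Sea Cliff", "NY")],
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?)",
    [
        ("Lisa", "Hestings", "Bertha Street", "33157"),
        ("Adam", "Gabriel", "Fleming Street", "33157"),
        ("Lucy", "Herts", "Bridge Road", "11579"),
    ],
)

# A join rebuilds the original wide rows whenever they are needed.
joined = conn.execute(
    """
    SELECT c.FirstName, c.LastName, c.Address, z.City, z.State, z.Zip
    FROM Customers AS c JOIN ZIPs AS z ON c.Zip = z.Zip
    ORDER BY c.FirstName
    """
).fetchall()
for row in joined:
    print(row)
```

With this layout a change to a city name or state for a ZIP code is a single-row update in ZIPs, and inserting a customer with an unknown ZIP code is rejected by the foreign key.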
12. Third Normal Form
Third Normal Form requires that every non-prime attribute of a table
depend directly on the primary key.
Any transitive functional dependency should be removed from the table.
The table must also be in Second Normal Form. For example, consider a
table with the following fields.
Order No  Customer No  Unit Price  Quantity  Total
123J09    NY65031      500 $       2         1000 $
120J11    ST90452      300 $       1         300 $
123J09    NY65031      100 $       4         400 $
Now, are all of the columns fully dependent upon the primary key?
The customer number varies with the order number and it doesn't
appear to depend upon any of the other fields.
It appears that the same customer is sometimes charged different prices. The
quantity of items also varies from order to order. So, the unit price and
quantity are fully dependent upon the order number.
13. Third Normal Form
What about the total?
The total can be derived by multiplying the unit price by the quantity,
therefore it's not fully dependent upon the primary key. We must
remove it from the table to comply with the third normal form.
Order No  Customer No  Unit Price  Quantity  Total
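Because the total is derived, it can be computed when it is needed instead of being stored. The sketch below uses Python's built-in sqlite3 module; the table name Orders, the column spellings, and the integer types are assumptions, while the values come from the order table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The Total column is left out; only the attributes it is derived from are stored.
conn.execute(
    "CREATE TABLE Orders (OrderNo TEXT, CustomerNo TEXT, UnitPrice INTEGER, Quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?)",
    [
        ("123J09", "NY65031", 500, 2),
        ("120J11", "ST90452", 300, 1),
        ("123J09", "NY65031", 100, 4),
    ],
)

# Derive the total at query time from unit price and quantity.
totals = [
    row[0]
    for row in conn.execute(
        "SELECT UnitPrice * Quantity AS Total FROM Orders ORDER BY rowid"
    )
]
print(totals)  # [1000, 300, 400]
```

The computed values match the Total column of the original table, so nothing is lost by not storing it.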
14. Boyce-Codd Normal Form
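Boyce-Codd Normal Form (BCNF) is a stricter version of the
Third Normal Form.
This form deals with a certain type of anomaly that is not handled
by 3NF.
A 3NF table that does not have multiple overlapping candidate
keys is guaranteed to be in BCNF. More generally, a table is in BCNF
when the left-hand side of every non-trivial functional dependency is
a superkey.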
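The BCNF condition can be checked mechanically by computing attribute closures. The sketch below is one straightforward way to do it, not a method taken from the slides: the function names closure and is_bcnf, the representation of dependencies as (left, right) tuples, and the student/course/teacher example are all assumptions. It also assumes the dependencies mention only attributes of the relation being checked.

```python
def closure(attributes, fds):
    """Return every attribute functionally determined by `attributes`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(relation, fds):
    """True if the left side of every non-trivial dependency is a superkey."""
    relation = set(relation)
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency, always allowed
        if not closure(lhs, fds) >= relation:
            return False  # left side does not determine the whole relation
    return True


# Hypothetical relation with overlapping candidate keys
# {student, course} and {student, teacher}: in 3NF but not in BCNF.
enrolment = ("student", "course", "teacher")
enrolment_fds = [
    (("student", "course"), ("teacher",)),
    (("teacher",), ("course",)),
]
print(is_bcnf(enrolment, enrolment_fds))  # False

# A relation with a single key, shaped like a ZIP lookup table.
zips = ("zip", "city", "state")
zips_fds = [(("zip",), ("city", "state"))]
print(is_bcnf(zips, zips_fds))  # True
```

In the first example the dependency teacher -> course breaks BCNF because teacher alone is not a superkey, which is the kind of anomaly 3NF does not rule out.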