This document summarizes information about ER diagrams, schema refinement, and database normalization. It provides examples of ER diagrams and how they can be converted to tables. It discusses different normal forms including Boyce-Codd normal form (BCNF) and third normal form (3NF), and provides algorithms for decomposing a schema into BCNF and 3NF. The goal of normalization is to reduce data redundancy and avoid data anomalies.
9 normalization
1. ER Diagrams (Concluded),
Schema Refinement, and Normalization
Zachary G. Ives
University of Pennsylvania
CIS 550 - Database & Information Systems
October 6, 2005
Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan
2. 2
Examples of ER Diagrams
• Please interpret these ER diagrams:
[Figure: three ER diagrams, each connecting STUDENTS and COURSES through the relationship set Takes]
4. 4
1:1 Relationships
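If you borrow money or have credit, you might get:
[Figure: ER diagram relating the entity sets CreditReport and Borrower through the relationship set Describes; attributes shown: rid, delinquent?, ssn, name, debt]
What are the table options?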
5. 5
ISA Relationships: Subclassing
(Structurally)
• Inheritance states that one entity is a "special kind"
of another entity: the "subclass" should be a member of the
"base class"
[Figure: ER diagram in which Employees ISA People; attributes shown: id, name, salary]
6. 6
But How Does This Translate
into the Relational Model?
Compare these options:
• Two tables, disjoint tuples
• Two tables, disjoint attributes
• One table with NULLs
• Object-relational databases (allow subclassing of tables)
7. 7
Weak Entities
A weak entity can only be identified uniquely using the primary
key of another (owner) entity.
• Owner and weak entity sets are in a one-to-many relationship
set, 1 owner : many weak entities
• Weak entity set must have total participation
[Figure: ER diagram with owner entity set People (ssn, name), relationship set Feeds (weeklyCost), and weak entity set Pets (name, species)]
8. 8
Translating Weak Entity Sets
Weak entity set and identifying relationship set are translated
into a single table; when the owner entity is deleted, all
owned weak entities must also be deleted
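CREATE TABLE Feed_Pets (
  pname      VARCHAR(20),        -- pet name, the weak entity's partial key
  species    INTEGER,
  weeklyCost REAL,
  ssn        CHAR(11) NOT NULL,  -- owner's key; NOT NULL gives total participation
  PRIMARY KEY (pname, ssn),
  FOREIGN KEY (ssn) REFERENCES People
    ON DELETE CASCADE)           -- deleting an owner deletes its pets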
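The cascading delete can be tried out with a small sketch using Python's built-in sqlite3 module. It assumes People is the owner table (as in the weak-entity diagram), and the rows are made-up illustration data, not from the slides.

```python
import sqlite3

# In-memory database; People as the owner table is an assumption,
# and all inserted rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE People (ssn CHAR(11) PRIMARY KEY, name VARCHAR(20))")
conn.execute("""
    CREATE TABLE Feed_Pets (
        pname      VARCHAR(20),
        species    INTEGER,
        weeklyCost REAL,
        ssn        CHAR(11) NOT NULL,
        PRIMARY KEY (pname, ssn),
        FOREIGN KEY (ssn) REFERENCES People (ssn) ON DELETE CASCADE
    )
""")
conn.execute("INSERT INTO People VALUES ('111-11-1111', 'Sam')")
conn.execute("INSERT INTO People VALUES ('222-22-2222', 'Jill')")
conn.executemany("INSERT INTO Feed_Pets VALUES (?, ?, ?, ?)", [
    ("Rex", 1, 12.5, "111-11-1111"),
    ("Tom", 2, 8.0, "111-11-1111"),
    ("Rex", 1, 10.0, "222-22-2222"),  # same pet name, different owner
])

# Deleting the owner removes every pet that the owner identifies.
conn.execute("DELETE FROM People WHERE ssn = '111-11-1111'")
remaining = conn.execute("SELECT pname, ssn FROM Feed_Pets").fetchall()
print(remaining)  # [('Rex', '222-22-2222')]
```

Two pets may share the name Rex because the primary key combines the pet name with the owner's ssn.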
10. 10
Summary of ER Diagrams
• One of the primary ways of designing logical schemas
• CASE tools are built around ER (e.g., ERWin, PowerBuilder)
  • Translate the design automatically into DDL, XML, UML, etc.
  • Use a slightly different notation that is better suited to graphical displays
  • Some tools support constraints beyond what ER diagrams can capture
• Can you get different ER diagrams from the same data?
11. 11
Schema Refinement & Design Theory
• ER diagrams give us a start in logical schema design
• Sometimes we need to refine our designs further
• There's a system and theory for this
• Focus is on redundancy of data
⇒ Causes update, insertion, deletion anomalies
12. 12
Not All Designs are Equally Good
Why is this a poor schema design?
Stuff(sid, name, serno, subj, cid, exp-grade)
And why is this one better?
Student(sid, name)
Course(serno, cid)
Subject(cid, subj)
Takes(sid, serno, exp-grade)
13. 13
Focus on the Bad Design
• Certain items (e.g., name) get repeated
• Some information requires that a student be enrolled
(e.g., courses) due to the key
sid  name   serno   subj  cid  exp-grade
1    Sam    570103  AI    520  B
23   Nitin  550103  DB    550  A
45   Jill   505103  OS    505  A
1    Sam    505103  OS    505  C
14. 14
Functional Dependencies
Describe "Key-Like" Relationships
A key is a set of attributes where:
If keys match, then the tuples match
A functional dependency (FD) is a generalization:
If an attribute set X determines another set Y, written X → Y,
then if two tuples agree on attribute set X, they must
agree on Y:
sid → name
What other FDs are there in this data?
⇒ FDs are independent of our schema design choice
15. 15
Formal Definition of FD's
Def. Given a relation schema R and subsets X, Y of R:
An instance r of R satisfies FD X → Y if,
for any two tuples t1, t2 ∈ r,
t1[X] = t2[X] implies t1[Y] = t2[Y]
• For an FD to hold for schema R, it must hold for
every possible instance r of R
(Can a DBMS verify this? Can we determine this by looking
at an instance?)
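A minimal sketch of the definition in Python: it tests whether one instance satisfies X → Y. The function name `satisfies` is hypothetical, and the rows are the four Stuff tuples from the "bad design" slide.

```python
def satisfies(rows, lhs, rhs):
    """Return True if the instance `rows` (a list of dicts) satisfies lhs -> rhs."""
    seen = {}  # maps each X-value to the Y-value first seen with it
    for t in rows:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # two tuples agree on X but disagree on Y
    return True

cols = ["sid", "name", "serno", "subj", "cid", "exp-grade"]
stuff = [dict(zip(cols, row)) for row in [
    (1, "Sam", 570103, "AI", 520, "B"),
    (23, "Nitin", 550103, "DB", 550, "A"),
    (45, "Jill", 505103, "OS", 505, "A"),
    (1, "Sam", 505103, "OS", 505, "C"),
]]

print(satisfies(stuff, ["sid"], ["name"]))    # True
print(satisfies(stuff, ["name"], ["serno"]))  # False: Sam appears with two sernos
print(satisfies(stuff, ["name"], ["sid"]))    # True, but only for this instance
```

The last call hints at the answer to the slide's question: an instance can refute an FD, but it cannot establish that the FD holds for the schema, since another legal instance might contain two students with the same name.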
16. 16
General Thoughts on Good Schemas
We want all attributes in every tuple to be determined
by the tuple's key attributes, i.e., part of a superkey
(for key X → Y, a superkey is a "non-minimal" X)
What does this say about redundancy?
But:
• What about tuples that don't have keys (other than the entire
value)?
• What about the fact that every attribute determines itself?
17. 17
Armstrong's Axioms: Inferring FDs
Some FDs exist due to others; can compute using
Armstrong's axioms:
• Reflexivity: If Y ⊆ X then X → Y (trivial dependencies)
name, sid → name
• Augmentation: If X → Y then XW → YW
serno → subj so serno, exp-grade → subj, exp-grade
• Transitivity: If X → Y and Y → Z then X → Z
serno → cid and cid → subj
so serno → subj
18. 18
Armstrong's Axioms Lead to...
• Union: If X → Y and X → Z
then X → YZ
• Pseudotransitivity: If X → Y and WY → Z
then XW → Z
• Decomposition: If X → Y and Z ⊆ Y
then X → Z
Let's prove these from Armstrong's Axioms
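One possible set of derivations for the three rules, written out in LaTeX. Each step uses only reflexivity, augmentation, or transitivity; other derivations are equally valid.

```latex
\begin{align*}
\textbf{Union:}\quad
  & X \to Y \;\Rightarrow\; X \to XY
      && \text{(augment with } X\text{)} \\
  & X \to Z \;\Rightarrow\; XY \to YZ
      && \text{(augment with } Y\text{)} \\
  & X \to XY,\; XY \to YZ \;\Rightarrow\; X \to YZ
      && \text{(transitivity)} \\[6pt]
\textbf{Pseudotransitivity:}\quad
  & X \to Y \;\Rightarrow\; XW \to YW
      && \text{(augment with } W\text{)} \\
  & XW \to WY,\; WY \to Z \;\Rightarrow\; XW \to Z
      && \text{(transitivity)} \\[6pt]
\textbf{Decomposition:}\quad
  & Z \subseteq Y \;\Rightarrow\; Y \to Z
      && \text{(reflexivity)} \\
  & X \to Y,\; Y \to Z \;\Rightarrow\; X \to Z
      && \text{(transitivity)}
\end{align*}
```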
19. 19
Closure of a Set of FD's
Defn. Let F be a set of FD's.
Its closure, F+, is the set of all FD's:
{X → Y | X → Y is derivable from F by Armstrong's
Axioms}
Which of the following are in the closure of our Student-Course
FD's?
name → name
cid → subj
serno → subj
cid, sid → subj
cid → sid
20. 20
Attribute Closures: Is Something
Dependent on X?
Defn. The closure of an attribute set X, X+, is:
X+ = ∪ {Y | X → Y ∈ F+}
• This answers the question "is Y determined
(transitively) by X?"; compute X+ by:
closure := X;
repeat until no change {
  if there is an FD U → V in F
  such that U ⊆ closure
  then add V to closure}
• Does sid, serno → subj, exp-grade?
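The loop above translates almost line for line into Python. This is a sketch under one assumption: the FD set F combines the FDs named on earlier slides (sid → name, serno → cid, cid → subj) with sid, serno → exp-grade, which is assumed here from the key of Takes. The helper `implies` is a hypothetical name.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:                      # "repeat until no change"
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs           # U is inside the closure, so add V
                changed = True
    return result

def implies(lhs, rhs, fds):
    """True if lhs -> rhs is in F+, tested through the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)

# Assumed Student-Course FD set (the last FD is an assumption, see lead-in).
F = [
    ({"sid"}, {"name"}),
    ({"serno"}, {"cid"}),
    ({"cid"}, {"subj"}),
    ({"sid", "serno"}, {"exp-grade"}),
]

print(sorted(closure({"sid", "serno"}, F)))
# ['cid', 'exp-grade', 'name', 'serno', 'sid', 'subj']
print(implies({"sid", "serno"}, {"subj", "exp-grade"}, F))  # True
print(implies({"cid"}, {"sid"}, F))                         # False
```

Testing membership in F+ through X+ avoids ever materializing F+, which can be exponentially large.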
21. 21
Equivalence of FD sets
Defn. Two sets of FD's, F and G, are equivalent if
their closures are equal, F+ = G+
e.g., these two sets are equivalent:
{XY → Z, X → Y} and
{X → Z, X → Y}
• F+ contains a huge number of FD's
(exponential in the size of the schema)
• Would like to have the smallest "representative" FD
set
22. 22
Minimal Cover
Defn. An FD set F is minimal if:
1. Every FD in F is of the form X → A,
where A is a single attribute
2. For no X → A in F is:
F − {X → A} equivalent to F
3. For no X → A in F and Z ⊂ X is:
(F − {X → A}) ∪ {Z → A} equivalent to F
Defn. F is a minimal cover for G if F is minimal and is
equivalent to G.
e.g.,
{X → Z, X → Y} is a minimal cover for
{XY → Z, X → Z, X → Y}
Notes: in a sense, each FD is "essential" to the cover (condition 2),
and we express each FD in simplest form (conditions 1 and 3)
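The three conditions suggest a three-step procedure: split right-hand sides, shrink left-hand sides, then drop redundant FDs. The following is a sketch of that common approach, not an algorithm given on the slides; `minimal_cover` and `closure` are hypothetical helper names.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds):
    # Condition 1: one attribute on each right-hand side.
    g = [(frozenset(lhs), frozenset({a})) for lhs, rhs in fds for a in sorted(rhs)]
    # Condition 3: drop left-hand-side attributes that are not needed.
    reduced = []
    for lhs, rhs in g:
        for b in sorted(lhs):
            if rhs <= closure(lhs - {b}, g):
                lhs = lhs - {b}
        reduced.append((lhs, rhs))
    # Condition 2: drop FDs that follow from the remaining ones.
    cover = list(dict.fromkeys(reduced))  # remove duplicates, keep order
    for fd in list(cover):
        rest = [f for f in cover if f != fd]
        if fd[1] <= closure(fd[0], rest):
            cover = rest
    return cover

G = [({"X", "Y"}, {"Z"}), ({"X"}, {"Z"}), ({"X"}, {"Y"})]
cover = minimal_cover(G)
for lhs, rhs in cover:
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# X -> Z
# X -> Y
```

A minimal cover is not unique in general; a different attribute or FD order can yield a different but equivalent result.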
23. 23
More on Closures
If F is a set of FD's and X → Y ∉ F+
then for some attribute A ∈ Y, X → A ∉ F+
Proof by contradiction.
Assume otherwise and let Y = {A1, ..., An}
Since we assume X → A1, ..., X → An are in F+
then X → A1...An is in F+ by the union rule,
hence X → Y is in F+, which is a contradiction
24. 24
Why Armstrong's Axioms?
Why are Armstrong's axioms (or an equivalent rule
set) appropriate for FD's? They are:
• Consistent: any relation satisfying FD's in F will satisfy
those in F+
• Complete: if an FD X → Y cannot be derived by
Armstrong's axioms from F, then there exists some
relational instance satisfying F but not
X → Y
⇒ In other words, Armstrong's axioms derive all the
FD's that should hold
25. 25
Proving Consistency
We prove that the axioms' definitions must be true
for any instance, e.g.:
• For augmentation (if X → Y then XW → YW):
If an instance r satisfies X → Y, then:
• For any tuples t1, t2 ∈ r,
if t1[X] = t2[X] then t1[Y] = t2[Y] by defn.
• If, additionally, it is given that t1[W] = t2[W],
then t1[YW] = t2[YW], so XW → YW holds
26. 26
Proving Completeness
Suppose X → Y ∉ F+ and define a relational instance
r that satisfies F+ but not X → Y:
• Then for some attribute A ∈ Y, X → A ∉ F+
• Let some pair of tuples in r agree on X+ but disagree
everywhere else:
x1 x2 ... xn | a1,1 | v1 v2 ... vm | w1,1 w2,1 ...
x1 x2 ... xn | a1,2 | v1 v2 ... vm | w1,2 w2,2 ...
     X       |  A   |    X+ − X    | R − X+ − {A}
27. 27
Proof of Completeness cont'd
• Clearly this relation fails to satisfy X → A and X → Y.
We also have to check that it satisfies any FD in F+.
• The tuples agree on only X+.
Thus the only FD's that might be violated are of the form
X' → Y' where X' ⊆ X+ and Y' contains attributes in
R − X+ − {A}.
• But if X' → Y' ∈ F+ and X' ⊆ X+ then Y' ⊆ X+ (reflexivity
and augmentation).
Therefore X' → Y' is satisfied.
28. 28
Decomposition
• Consider our original "bad" attribute set
Stuff(sid, name, serno, subj, cid, exp-grade)
• We could decompose it into
Student(sid, name)
Course(serno, cid)
Subject(cid, subj)
• But this decomposition loses information about
the relationship between students and courses.
Why?
29. 29
Lossless Join Decomposition
R1, ..., Rk is a lossless join decomposition of R w.r.t. an FD set F if
for every instance r of R that satisfies F,
π_R1(r) ⋈ ... ⋈ π_Rk(r) = r
Consider:
sid  name   serno   subj  cid  exp-grade
1    Sam    570103  AI    570  B
23   Nitin  550103  DB    550  A
What if we decompose on
(sid, name) and (serno, subj, cid, exp-grade)?
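To see what goes wrong, the two-tuple instance above can be projected and re-joined. This is a small sketch with hypothetical helpers (`relation`, `project`, `natural_join`) that model a relation as a set of tuples; the second decomposition, which keeps sid on both sides, is added here only for contrast.

```python
def relation(dicts):
    """Model a relation as a set of tuples, each a frozenset of (attr, value) pairs."""
    return {frozenset(d.items()) for d in dicts}

def project(rel, attrs):
    """Projection onto `attrs`, with duplicate elimination."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}

def natural_join(r, s):
    """Combine every pair of tuples that agree on their shared attributes."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out

cols = ["sid", "name", "serno", "subj", "cid", "exp-grade"]
r = relation(dict(zip(cols, row)) for row in [
    (1, "Sam", 570103, "AI", 570, "B"),
    (23, "Nitin", 550103, "DB", 550, "A"),
])

# Decomposition from the slide: no shared attribute, so the join is a cross product.
bad = natural_join(project(r, {"sid", "name"}),
                   project(r, {"serno", "subj", "cid", "exp-grade"}))
print(len(bad), bad == r)    # 4 False

# Contrast (an assumption, not from the slide): keep sid on both sides.
good = natural_join(project(r, {"sid", "name"}),
                    project(r, {"sid", "serno", "subj", "cid", "exp-grade"}))
print(len(good), good == r)  # 2 True
```

The first join invents tuples such as Sam taking the DB course, so the original relation can no longer be recovered. Note that one instance can only show a decomposition to be lossy; showing it lossless requires reasoning over every instance that satisfies F.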
30. 30
Testing for Lossless Join
R1, R2 is a lossless join decomposition of R with respect to F
iff at least one of the following dependencies is in F+
(R1 ∩ R2) → R1 − R2
(R1 ∩ R2) → R2 − R1
So for the FD set:
sid → name
serno → cid, exp-grade
cid → subj
Is (sid, name) and (serno, subj, cid, exp-grade) a lossless
decomposition?
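The test reduces to one attribute closure of R1 ∩ R2. A minimal sketch, using the FD set from this slide; `is_lossless` is a hypothetical name, and the second call (sid kept in both schemas) is an added contrast case.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary test: (R1 ∩ R2) must determine R1 − R2 or R2 − R1."""
    common_closure = closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure

F = [
    ({"sid"}, {"name"}),
    ({"serno"}, {"cid", "exp-grade"}),
    ({"cid"}, {"subj"}),
]

# The slide's decomposition: the schemas share no attribute.
print(is_lossless({"sid", "name"},
                  {"serno", "subj", "cid", "exp-grade"}, F))         # False

# Contrast case (an assumption): sid appears in both schemas.
print(is_lossless({"sid", "name"},
                  {"sid", "serno", "subj", "cid", "exp-grade"}, F))  # True
```

This test covers only two-way decompositions, as the slide states.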
31. 31
Dependency Preservation
Ensures we can "easily" check whether an FD X → Y
is violated during an update to a database:
• The projection of an FD set F onto a set of attributes Z,
F_Z, is
{X → Y | X → Y ∈ F+, X ∪ Y ⊆ Z}
i.e., it is those FDs local to Z's attributes
• A decomposition R1, ..., Rk is dependency preserving if
F+ = (F_R1 ∪ ... ∪ F_Rk)+
The decomposition hasn't "lost" any essential FD's, so we
can check without doing a join
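Computing each projection F_Ri explicitly can be expensive, so a common textbook shortcut checks each FD of F by growing its left-hand side using only closures restricted to one schema at a time. The sketch below uses that shortcut, which is not taken from the slides; `preserves_dependencies` is a hypothetical name, and the two small schemes in the example are supplied for illustration.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves_dependencies(schemas, fds):
    """True if every FD in `fds` follows from the FDs local to the schemas."""
    for lhs, rhs in fds:
        reach = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in schemas:
                # Attributes derivable from reach using only FDs local to ri.
                gained = (closure(reach & ri, fds) & ri) - reach
                if gained:
                    reach |= gained
                    changed = True
        if not rhs <= reach:
            return False  # this FD can only be checked after a join
    return True

# R(sid, fid, subj) split into (sid, fid) and (fid, subj).
F1 = [({"fid"}, {"subj"}), ({"sid", "subj"}, {"fid"})]
print(preserves_dependencies([{"sid", "fid"}, {"fid", "subj"}], F1))  # False

# R(name, street, city, st, zip, item, price) split in two.
F2 = [
    ({"name"}, {"street", "city"}),
    ({"street", "city"}, {"st"}),
    ({"street", "city"}, {"zip"}),
    ({"name", "item"}, {"price"}),
]
print(preserves_dependencies(
    [{"name", "street", "city", "st", "zip"}, {"name", "item", "price"}], F2))  # True
```

In the first case sid, subj → fid spans both schemas, so no local check can enforce it.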
32. 32
Example of Lossless and
Dependency-Preserving Decompositions
Given relation scheme
R(name, street, city, st, zip, item, price)
And FD set:
name → street, city
street, city → st
street, city → zip
name, item → price
Consider the decomposition
R1(name, street, city, st, zip) and R2(name, item, price)
• Is it lossless?
• Is it dependency preserving?
What if we replaced the first FD by name, street → city?
33. 33
Another Example
Given scheme: R(sid, fid, subj)
and FD set:
fid → subj
sid, subj → fid
Consider the decomposition
R1(sid, fid) and R2(fid, subj)
• Is it lossless?
• Is it dependency preserving?
34. 34
FD's and Keys
• Ideally, we want a design s.t. for each nontrivial
dependency X → Y, X is a superkey for some
relation schema in R
• We just saw that this isn't always possible
• Hence we have two kinds of normal forms
35. 35
Two Important Normal Forms
Boyce-Codd Normal Form (BCNF). For every relation
scheme R and for every X → A that holds over R,
either A ∈ X (it is trivial), or
X is a superkey for R
Third Normal Form (3NF). For every relation scheme
R and for every X → A that holds over R,
either A ∈ X (it is trivial), or
X is a superkey for R, or
A is a member of some key for R
36. 36
Normal Forms Compared
• BCNF is preferable, but sometimes in conflict with
the goal of dependency preservation
• It's strictly stronger than 3NF
• Let's see algorithms to obtain:
  • A BCNF lossless join decomposition
  • A 3NF lossless join, dependency preserving decomposition
37. 37
BCNF Decomposition Algorithm
(from Korth et al.; our book gives recursive version)
result := {R}
compute F+
while there is a schema Ri in result that is not in BCNF
{
  let A → B be a nontrivial FD on Ri
  s.t. A → Ri is not in F+
  and A and B are disjoint
  result := (result − {Ri}) ∪ {(Ri − B), (A, B)}
}
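A runnable sketch of the loop above. Instead of computing F+, it searches attribute subsets of each Ri and uses attribute closures to find a violating FD, which is exponential in the schema size and meant only for small examples. The helper names and the choice of B as everything A determines inside Ri are assumptions of this sketch; the FD sid, serno → exp-grade is assumed from the key of Takes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(ri, fds):
    """Return (A, B): A -> B nontrivial on ri with A not a superkey of ri, else None."""
    for k in range(1, len(ri)):
        for combo in combinations(sorted(ri), k):
            a = frozenset(combo)
            a_plus = closure(a, fds)
            b = (a_plus & ri) - a          # disjoint from A by construction
            if b and not ri <= a_plus:     # nontrivial, and A -> Ri not in F+
                return a, frozenset(b)
    return None

def bcnf_decompose(r, fds):
    result = [frozenset(r)]
    while True:
        for ri in result:
            violation = find_violation(ri, fds)
            if violation:
                a, b = violation
                result.remove(ri)
                result += [ri - b, a | b]  # (Ri − B) and (A, B)
                break
        else:
            return result                  # every schema is in BCNF

F = [
    ({"sid"}, {"name"}),
    ({"serno"}, {"cid"}),
    ({"cid"}, {"subj"}),
    ({"sid", "serno"}, {"exp-grade"}),     # assumed, see lead-in
]
stuff = {"sid", "name", "serno", "subj", "cid", "exp-grade"}
for schema in bcnf_decompose(stuff, F):
    print(sorted(schema))
```

With this search order the result matches the four-table design shown earlier: (cid, subj), (serno, cid), (sid, serno, exp-grade), and (sid, name). A different choice of violating FD could produce a different decomposition.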
38. 38
3NF Decomposition Algorithm
by Phil Bernstein, now @ MS Research
Let F be a minimal cover
i := 0
// Build dependency-preserving decomposition
for each FD A → B in F {
  if none of the schemas Rj, 1 ≤ j ≤ i, contains AB
  {
    increment i
    Ri := (A, B)
  }
}
// Ensure lossless decomposition
if no schema Rj, 1 ≤ j ≤ i, contains a candidate key for R {
  increment i
  Ri := any candidate key for R
}
return (R1, ..., Ri)
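The synthesis steps can be sketched in Python as follows. The input cover is the FD set sid → name; serno → cid, exp-grade; cid → subj from the lossless-join slide, split by hand into single-attribute right-hand sides. The helper `candidate_key` (which greedily shrinks the full attribute set) is an assumption of this sketch, and "contains a candidate key" is tested as "is a superkey of R".

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_key(r, fds):
    """Shrink the full attribute set to one candidate key (one of possibly many)."""
    key = set(r)
    for a in sorted(r):
        if closure(key - {a}, fds) >= set(r):
            key.discard(a)
    return frozenset(key)

def synthesize_3nf(r, cover):
    schemas = []
    # Build a dependency-preserving decomposition: one schema per FD.
    for lhs, rhs in cover:
        ab = frozenset(lhs | rhs)
        if not any(ab <= s for s in schemas):
            schemas.append(ab)
    # Ensure losslessness: some schema must contain a candidate key of R.
    if not any(closure(s, cover) >= set(r) for s in schemas):
        schemas.append(candidate_key(r, cover))
    return schemas

R = {"sid", "name", "serno", "subj", "cid", "exp-grade"}
cover = [
    ({"sid"}, {"name"}),
    ({"serno"}, {"cid"}),
    ({"serno"}, {"exp-grade"}),
    ({"cid"}, {"subj"}),
]
for schema in synthesize_3nf(R, cover):
    print(sorted(schema))
```

Here no FD-derived schema is a superkey, so the final step adds (serno, sid). Because the cover has single-attribute right-hand sides, serno → cid and serno → exp-grade land in separate schemas; merging schemas that share a left-hand side is a common refinement that the slide's version does not perform.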
39. 39
Summary
• We can always decompose into 3NF and get:
  • Lossless join
  • Dependency preservation
• But with BCNF we are only guaranteed lossless joins
• BCNF is stronger than 3NF: every BCNF schema is
also in 3NF
• The BCNF algorithm is nondeterministic, so there is
not a unique decomposition for a given schema R