1. 2 December 2005
Introduction to Databases
Relational Database Design
Prof. Beat Signer
Department of Computer Science
Vrije Universiteit Brussel
http://www.beatsigner.com
2. Beat Signer - Department of Computer Science - bsigner@vub.ac.be 2March 10, 2017
Relational Database Design
There are two major relational database design
approaches
Top-down design
develop a conceptual model (e.g. ER model)
reduction (mapping) of the conceptual model to relation schemas
use normalisation as a validation technique to check the quality of
the resulting relation schemas
- a relational database schema resulting from the mapping of a good ER model
(with the correct entity sets) normally requires no further normalisation
Bottom-up design
design by decomposition
use normalisation to iteratively create (decompose) a set of
relations starting with a single relation
Relational Database Design ...
A relation schema might contain certain dependencies in
which case it should be decomposed (normalised) into
multiple smaller relation schemas
this normalisation process is based on functional dependencies
and multivalued dependencies
Sometimes multiple relations resulting from an ER to
relation schema reduction might be merged to save
some join query operations
we have to ensure that the resulting larger relation schema does
not introduce new undesirable dependencies
Reduction
A conceptual ER model can be reduced to a set of
relation schemas (relational database schema)
The quality of the resulting set of relation schemas
depends on the quality of the original ER design (there is
no magic)
In the following we discuss the reduction of the different
ER model concepts introduced earlier
Strong Entity Sets
A strong entity set E with only simple attributes a1,..., an is
mapped to a relation R with attributes a1,..., an
the primary key of the entity set E becomes the primary key of the
relation R
[ER diagram: entity set Employees with the attributes id and name]
Employee (id, name)
id name
1234 Beat Signer
1576 Lode Hoste
3212 Sandra Trullemans
... ...
relation schema
employee = (Employee)
Composite Attributes
For each component of a composite attribute, we create
an attribute ai in the relation R
no special attribute is created for the composite attribute itself
Employee (id, name, street, city)
[ER diagram: entity set Employees with the attributes id and name and the composite attribute address (street, city)]
Multivalued Attributes
Multivalued attributes are treated separately since a
relation should only contain attributes with atomic values
for each multivalued attribute ai of an entity set E, we create a
new relation S containing the attribute ai as well as the primary
key attributes of the relation R that is created for the entity set E
- define a foreign key constraint to the original relation R
[ER diagram: entity set Employees with the attributes id and name and the multivalued attribute phone]
Phones (id, phone)
id phone
1234 032 2 612 1337
1234 032 2 612 3123
1576 032 2 623 8765
... ...
phones = (Phones)
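The mapping of the multivalued phone attribute can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the column types, the composite primary key (id, phone) and the helper names phones_of and insert_phone are assumptions made for the example and are not prescribed by the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled

conn.executescript("""
CREATE TABLE Employee (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- one row per (employee, phone) pair keeps every attribute value atomic
CREATE TABLE Phones (
    id    INTEGER NOT NULL REFERENCES Employee(id),
    phone TEXT NOT NULL,
    PRIMARY KEY (id, phone)  -- assumed key for this sketch
);
""")

conn.executemany("INSERT INTO Employee VALUES (?, ?)",
                 [(1234, "Beat Signer"), (1576, "Lode Hoste")])
conn.executemany("INSERT INTO Phones VALUES (?, ?)",
                 [(1234, "032 2 612 1337"),
                  (1234, "032 2 612 3123"),
                  (1576, "032 2 623 8765")])
conn.commit()


def phones_of(employee_id):
    """Return all phone numbers stored for one employee."""
    rows = conn.execute(
        "SELECT phone FROM Phones WHERE id = ? ORDER BY phone", (employee_id,))
    return [phone for (phone,) in rows]


def insert_phone(employee_id, phone):
    """Try to add a phone number; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Phones VALUES (?, ?)", (employee_id, phone))
        return True
    except sqlite3.IntegrityError:
        return False
```

Calling insert_phone with an id that does not exist in Employee is rejected by the foreign key constraint, which is the behaviour the foreign key from S to R is meant to guarantee.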
Weak Entity Sets
A weak entity set E with attributes a1,..., an is mapped to a
relation R with attributes a1,..., an combined with the primary
key attributes b1,..., bm of the identifying entity set F
the primary key of R is defined by the primary key attributes of the
identifying entity set F combined with the discriminator of E
a foreign key constraint is defined from the attributes b1,..., bm to
the primary key of the relation that is created for the identifying
entity set F
Weak Entity Sets ...
Seat (id, number, colour)
id number colour
1 1 red
1 20 black
4 1 black
... ... ...
seat = (Seat)
Relationship Sets
A relationship set over the entity sets E1,..., En with the
optional descriptive attributes b1,..., bm is mapped to a
relation R with the primary key attributes of E1,..., En
combined with b1,..., bm
The primary key of relation R is defined as follows
binary many-to-many relationship
- union of all primary key attributes of E1 and E2
binary one-to-one relationship
- choose the primary key of E1 or E2
binary one-to-many or many-to-one relationship
- choose the primary key of the entity set on the "many" side
Relationship Sets ...
The primary key of relation R is defined as follows ...
n-ary relationship without cardinality constraints
- union of all primary key attributes of E1,..., En
n-ary relationship with one 0..1 or 1..1 cardinality
constraint over the entity set Ej
- union of all primary key attributes of E1,..., En , except the primary key of Ej
- note that we allow only one such 0..1 or 1..1 cardinality constraint for
n-ary relationships
A foreign key constraint is defined for each set of primary
key attributes (provided by the entity set Ei) to the
primary key of the corresponding relation that is defined
for Ei
Relationship Sets ...
LocatedAt (id, name, address, duration)
id name address duration
1234 10F721 Pleinlaan 2 1
1576 10F733 Pleinlaan 2 1
... ... ... ...
locatedAt = (LocatedAt)
[ER diagram: relationship set LocatedAt (descriptive attribute duration) between Employees (id, name) and Offices (name, address, size), both with cardinality 0..*]
Relationship Sets ...
LocatedAt (id, name, address, duration)
id name address duration
1234 10F721 Pleinlaan 2 1
1576 10F733 Pleinlaan 2 1
... ... ... ...
locatedAt = (LocatedAt)
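[ER diagram: relationship set LocatedAt (descriptive attribute duration) between Employees (id, name) and Offices (name, address, size), with the cardinality constraints 1..1 and 0..*]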
Weak Entity Existence Relationship
The special relationship set from a weak entity set to its
defining entity set is always a many-to-one relationship
the special weak entity existence relationship does not have to be
mapped to a separate relation since it is already covered by the
relation that is created for the weak entity set
- e.g. potential Offers relation schema already covered by Seat relation schema
Seat (id, number, colour)
Combination of Schemas
Relations resulting from the mapping of a relationship set
with a total participation constraint can be integrated with
the relation over which the constraint is defined
key of the relation with the constraint (1..1) used as primary key
also works for partial relationships (have to use null values)
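[ER diagram: relationship set LocatedAt (descriptive attribute duration) between Employees (id, name) and Offices (name, address, size), with the cardinality constraints 1..1 and 0..*]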
Employee (id, employeeName, duration, name, address)
Office (name, address, size)
Specialisation and Generalisation
Create a new relation R for each entity subset
combine the attributes of the entity set with the primary key
attributes of the superclass
[ER diagram: entity set Persons (id, name) with an ISA specialisation into Students (studentID) and Teachers (teachingHours)]
Person (id, name)
Student (id, studentID)
Teacher (id, teachingHours)
Specialisation and Generalisation ...
For a disjoint and total ISA constraint we might omit the
separate superclass relation
saves some join operations but it is no longer possible to define a
foreign key constraint on the id attribute (now at two places)
[ER diagram: entity set Persons (id, name) with a disjoint ISA specialisation into Students (studentID) and Teachers (teachingHours)]
Student (id, name, studentID)
Teacher (id, name, teachingHours)
Aggregations
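Like the regular relationship set mapping
note that the name attribute is the one from the Companies entity set
[ER diagram: aggregation over the relationship set WorksFor between Employees (id, name), Companies (name, address) and Durations (from, to); the aggregation is connected to Managers (mId, name) via the relationship set Manages]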
Manages (id, from, to, name, address, mId)
Relational Database Design
The goal of relational database design is to create a set
of relation schemas that
can be used to store information without unnecessary redundancy
allow us to easily retrieve information
The quality of the set of schemas resulting from a
reduction (top-down design) depends on how good the
original ER design was
In a design by decomposition approach (bottom-up
design) we need a way to reduce any redundancy via a
decomposition process
split large relation schemas into multiple smaller relation schemas
Update Anomalies
Insertion anomaly
redundant information has to be kept consistent
- e.g. insertion of a new order for an already existing CD
information about a CD can only be inserted if there is an order or
we have to populate the customer information (i.e. name and
street) with null values
id name street cdName price
1 Max Frisch Bahnhofstrasse 7 Falling into Place 17.90
2 Eddy Merckx Pleinlaan 25 Falling into Place 17.90
53 Albert Einstein Bergstrasse 18 Chromatic 16.50
5 Max Frisch Bahnhofstrasse 7 Carcassonne 15.50
Order (id, name, street, cdName, price)
order = (Order)
Update Anomalies ...
Modification anomaly
if we want to modify information about a particular CD, we have to
ensure that the information is updated in all redundant entries
- e.g. modification of the price of the CD named "Falling into Place"
Deletion anomaly
if we delete a customer who is the only buyer of a specific CD, we
also lose the information about that specific CD
- e.g. deletion of the customer "Albert Einstein"
id name street cdName price
1 Max Frisch Bahnhofstrasse 7 Falling into Place 17.90
2 Eddy Merckx Pleinlaan 25 Falling into Place 17.90
53 Albert Einstein Bergstrasse 18 Chromatic 16.50
5 Max Frisch Bahnhofstrasse 7 Carcassonne 15.50
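The modification and deletion anomalies can be reproduced on the Order relation with a few lines of Python. This is a small sketch: the helper names and the new price of 18.90 are made up for the illustration.

```python
# The Order relation from the slide: (id, name, street, cdName, price)
orders = [
    (1, "Max Frisch", "Bahnhofstrasse 7", "Falling into Place", 17.90),
    (2, "Eddy Merckx", "Pleinlaan 25", "Falling into Place", 17.90),
    (53, "Albert Einstein", "Bergstrasse 18", "Chromatic", 16.50),
    (5, "Max Frisch", "Bahnhofstrasse 7", "Carcassonne", 15.50),
]


def prices_per_cd(rows):
    """Collect every price that is stored for each CD name."""
    prices = {}
    for _id, _name, _street, cd_name, price in rows:
        prices.setdefault(cd_name, set()).add(price)
    return prices


def inconsistent_cds(rows):
    """CDs for which the redundant entries disagree on the price."""
    return {cd for cd, prices in prices_per_cd(rows).items() if len(prices) > 1}


def delete_customer(rows, customer_name):
    """Remove all orders of one customer."""
    return [row for row in rows if row[1] != customer_name]


# Modification anomaly: the price is changed in only one of the redundant rows
# (18.90 is a hypothetical new price).
partially_updated = list(orders)
partially_updated[0] = partially_updated[0][:4] + (18.90,)

# Deletion anomaly: removing the only buyer also removes the CD information.
remaining_cds = {row[3] for row in delete_customer(orders, "Albert Einstein")}
```

After the partial update, inconsistent_cds(partially_updated) reports "Falling into Place", and remaining_cds no longer contains "Chromatic".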
Normalisation
Normalisation is a formal method to analyse relation
schemas based on their keys, functional dependencies (FD)
as well as multivalued dependencies (MVD)
remove redundancy
prevent certain update anomalies
- insertion, modification and deletion
There exists a set of rules to check if a relation is in a
specific normal form
Normal forms, from strongest to weakest
Fifth Normal Form (5NF)
Fourth Normal Form (4NF)
Boyce-Codd Normal Form (BCNF)
Third Normal Form (3NF)
Second Normal Form (2NF)
First Normal Form (1NF)
the original normal forms (1NF, 2NF and 3NF) were described by Codd
Normalisation ...
A relation that does not conform to a certain degree of
normalisation can be decomposed (lossless-join
decomposition) into multiple relations that are in the
desired normal form
can be done automatically
Normalisation is often done in a stepwise manner
a higher normal form means a more restricted format and fewer
problems with update anomalies
note that only the first normal form (1NF) is mandatory for the
relational model and all the other normal forms are optional
First Normal Form (1NF)
As we have seen earlier, the ER model supports
complex attributes
composite attributes
multivalued attributes
In the reduction process, we remove this substructure
from attributes to create a relational model with atomic
attribute values only
A relation schema R is in first normal form (1NF) if the
domains D1,..., Dn of all attributes a1,..., an of R are atomic
no composite attributes or attributes with a set of values
the intersection of each row and column contains one and only
one value
Functional Dependencies
In this example, there are various sets of attributes that
uniquely identify a set of other attributes
teacherID → teacher
teacherID → salary
teacherID → {teacher, salary}
{teacherID, teacher} → {salary}
department → {building, budget}
...
We say that there is a functional dependency (→)
between these two sets of attributes
a functional dependency should always hold on a relation schema
and not just on a particular relation instance
TeacherDept (teacherID, teacher, salary, department, building, budget)
Functional Dependencies ...
A functional dependency can be used to express
constraints (generalisation of keys) over a set of
attributes (determinant) that uniquely identify a set of
other attributes (dependent attributes)
For a relation schema R with α ⊆ R and β ⊆ R the
functional dependency α → β holds on R, if for any r(R)
∀ t1, t2 ∈ r(R): t1[α] = t2[α] ⇒ t1[β] = t2[β]
Note that any K ⊆ R is a superkey if K → R
we can use functional dependencies to check whether K is a
superkey
Functional Dependencies ...
The relation r(R) contains the following
set F of functional dependencies
A → B
C → E
...
A functional dependency α → β is trivial if β ⊆ α
trivial dependencies are satisfied by all relations
A full functional dependency has a minimal determinant
if the determinant is not minimal, we talk about a partial functional
dependency (e.g. AD → B in the example)
For a relation r(R) with α → β and β → γ we say that γ is
transitively dependent on α via β
A B C D E
a1 b1 c1 d1 e1
a2 b2 c2 d1 e2
a2 b2 c3 d1 e3
a3 b2 c4 d3 e3
r(R)
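Whether a functional dependency is satisfied by a given relation instance can be checked mechanically. The following sketch tests dependencies against the instance r(R) shown above; the function name fd_holds and the tuple representation are choices made for this example. Keep in mind that a successful check only shows that the instance does not violate the dependency. It does not prove that the dependency holds on the schema.

```python
def fd_holds(attrs, rows, determinant, dependent):
    """Check t1[det] = t2[det] => t1[dep] = t2[dep] for all pairs of rows."""
    index = {name: position for position, name in enumerate(attrs)}
    seen = {}  # determinant values -> dependent values observed so far
    for row in rows:
        key = tuple(row[index[a]] for a in determinant)
        value = tuple(row[index[a]] for a in dependent)
        # setdefault returns the earlier value if this determinant was seen before
        if seen.setdefault(key, value) != value:
            return False
    return True


ATTRS = ("A", "B", "C", "D", "E")
ROWS = [
    ("a1", "b1", "c1", "d1", "e1"),
    ("a2", "b2", "c2", "d1", "e2"),
    ("a2", "b2", "c3", "d1", "e3"),
    ("a3", "b2", "c4", "d3", "e3"),
]

a_determines_b = fd_holds(ATTRS, ROWS, ["A"], ["B"])
c_determines_e = fd_holds(ATTRS, ROWS, ["C"], ["E"])
ad_determines_b = fd_holds(ATTRS, ROWS, ["A", "D"], ["B"])
b_determines_a = fd_holds(ATTRS, ROWS, ["B"], ["A"])
```

For this instance the first three checks succeed, while B → A is violated because b2 occurs together with both a2 and a3.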
Closure of Attributes
For a given relation schema R, a number of functional
dependencies and a set of attributes α ⊆ R, the closure
α+ is defined by all attributes Bi such that α → Bi
Computing the closure
Initialise the set s with the attributes of α
Repeat until the set s does not grow anymore {
if there is a functional dependency β → γ and β is in s, then
add γ to the set s
}
If the closure α+ contains all attributes of the relation
schema R, then the attributes α form a superkey of R
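The closure computation can be written down almost literally in Python. This is a minimal sketch; the schema R = (A, B, C, D, E) and the dependencies A → B, B → C and CD → E are hypothetical and only serve as test data.

```python
def closure(attributes, fds):
    """Return the attribute closure of `attributes` under `fds`.

    `fds` is a list of (determinant, dependent) pairs of attribute sets.
    """
    result = set(attributes)  # initialise the set with the given attributes
    changed = True
    while changed:            # repeat until the set does not grow anymore
        changed = False
        for determinant, dependent in fds:
            # the determinant is contained in the set: add the dependent attributes
            if set(determinant) <= result and not set(dependent) <= result:
                result |= set(dependent)
                changed = True
    return result


# Hypothetical example data (not taken from the slides)
R = {"A", "B", "C", "D", "E"}
F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C", "D"}, {"E"})]

closure_of_a = closure({"A"}, F)        # A, B and C
closure_of_ad = closure({"A", "D"}, F)  # all attributes of R
```

Since closure_of_ad equals R, the attribute set {A, D} is a superkey of this hypothetical schema, whereas {A} alone is not.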
Computation of Superkeys
We can test whether α is a superkey for a given relation
schema R by checking whether the closure α+ contains
all attributes of R
We can further use this approach to find all the
superkeys for a relation schema R and a given set of
functional dependencies
check for each set α ⊆ R of attributes whether the closure α+
contains all attributes
the search process can be slightly optimised by starting with the
smallest possible subsets
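The search for superkeys can be sketched as a brute-force enumeration of attribute subsets in order of increasing size. The function names and the example schema with the dependencies A → B, B → C and CD → E are assumptions for this illustration. The enumeration is exponential in the number of attributes, so it is only practical for small schemas.

```python
from itertools import combinations


def closure(attributes, fds):
    """Attribute closure under a list of (determinant, dependent) set pairs."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for determinant, dependent in fds:
            if set(determinant) <= result and not set(dependent) <= result:
                result |= set(dependent)
                changed = True
    return result


def is_superkey(attributes, schema, fds):
    """A set of attributes is a superkey if its closure covers the schema."""
    return closure(attributes, fds) == set(schema)


def candidate_keys(schema, fds):
    """Return the minimal superkeys, starting with the smallest subsets."""
    keys = []
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            candidate = set(subset)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(candidate, schema, fds):
                keys.append(candidate)
    return keys


# Hypothetical example data (not taken from the slides)
R = {"A", "B", "C", "D", "E"}
F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C", "D"}, {"E"})]
keys = candidate_keys(R, F)
```

Every superset of a returned key is a superkey as well, which is why the sketch skips those subsets instead of testing them again.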
Functional Dependency Inference
For a given set F of functional dependencies we can
derive new functional dependencies based on a set of
axioms to compute the closure F+ of F
the closure F+ includes all functional dependencies that are
logically implied by F
Three rules (Armstrong's axioms) can be used to
compute F+
reflexivity
- for a given set of attributes α and β ⊆ α, α → β holds (see trivial dependency)
augmentation
- for a given set of attributes γ; if α → β then γα → γβ holds
transitivity
- if α → β and β → γ, then α → γ holds
Functional Dependency Inference ...
Armstrong's axioms are sound (produce only elements
of F+) and complete (produce all elements in F+)
since it may take a lot of time to compute F+ with Armstrong's
axioms only, there exist some additional rules
Decomposition
if α → βγ, then α → β and α → γ hold
Union
if α → β and α → γ, then α → βγ holds
Trivial dependency rules
if α → β, then α → α ∩ β holds
if α → β, then α → α ∪ β holds
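The additional rules do not add expressive power, since each of them can be derived from Armstrong's axioms. As a worked example, the union rule follows from two augmentation steps and one transitivity step (writing the attribute sets as α, β and γ):

```latex
\begin{align*}
\alpha &\to \beta && \text{(given)} \\
\alpha &\to \alpha\beta && \text{(augmentation of } \alpha \to \beta \text{ with } \alpha\text{)} \\
\alpha &\to \gamma && \text{(given)} \\
\alpha\beta &\to \beta\gamma && \text{(augmentation of } \alpha \to \gamma \text{ with } \beta\text{)} \\
\alpha &\to \beta\gamma && \text{(transitivity of } \alpha \to \alpha\beta \text{ and } \alpha\beta \to \beta\gamma\text{)}
\end{align*}
```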
Second Normal Form (2NF)
A relation schema R is in second normal form (2NF)
if it is in 1NF and if there exists no non-prime attribute that
is functionally dependent on a part of a candidate key
every non-prime attribute has to be fully functionally dependent on
a candidate key
a non-prime attribute is an attribute that is not part of any candidate key
the Lecturer relation schema shown in the example is not in 2NF since the office
attribute functionally depends on the teacher attribute, which is only a part of
the candidate key (teacher, course)
teacher course office
Beat Signer Databases 10G731d
Beat Signer WIS 10G731d
Lode Hoste Databases 10F716
Lode Hoste ATIS 10F716
Sandra Trullemans WIS 10G731e
Lecturer (teacher, course, office)
lecturer = (Lecturer)
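A 2NF check for the Lecturer example can be sketched as follows. The sketch assumes that (teacher, course) is the only candidate key and that teacher → office is the only relevant functional dependency. It inspects the listed dependencies only and does not compute the closure F+.

```python
def partial_dependencies(candidate_keys, fds):
    """Return (determinant, non-prime attributes) pairs that violate 2NF.

    A violation is a dependency whose determinant is a proper subset of a
    candidate key and which determines at least one non-prime attribute.
    """
    prime = set().union(*candidate_keys)  # attributes that occur in some key
    violations = []
    for determinant, dependent in fds:
        determinant = set(determinant)
        non_prime = set(dependent) - prime - determinant
        is_part_of_key = any(determinant < set(key) for key in candidate_keys)
        if non_prime and is_part_of_key:
            violations.append((determinant, non_prime))
    return violations


# Lecturer (teacher, course, office) with the assumed key (teacher, course)
lecturer_violations = partial_dependencies(
    [{"teacher", "course"}], [({"teacher"}, {"office"})])

# After moving office into a relation with the key (teacher)
decomposed_violations = partial_dependencies(
    [{"teacher"}], [({"teacher"}, {"office"})])
```

The first call reports the partial dependency teacher → office, while the second call returns an empty list since a single-attribute key has no proper non-empty part.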
Second Normal Form (2NF) ...
2NF normalisation process
remove any partially dependent attributes from the relation and
put them in a new relation together with their determinant
The original Lecturer relation can be losslessly
decomposed into two relations which are both in 2NF
relations with single attribute keys are automatically in 2NF
Lecturer (teacher, office)
teacher office
Beat Signer 10G731d
Lode Hoste 10F716
Sandra Trullemans 10G731e
lecturer = (Lecturer)
Course (teacher, course)
teacher course
Beat Signer Databases
Beat Signer WIS
Lode Hoste Databases
Lode Hoste ATIS
Sandra Trullemans WIS
course = (Course)
Lossless Decomposition
Given a relation schema R and a decomposition of R into
R1 and R2, we say that R1 and R2 form a lossless
decomposition if π_R1(r) ⋈ π_R2(r) = r
Let F be a set of functional dependencies on R
R1 and R2 form a lossless decomposition of R if either R1 ∩ R2 → R1
or R1 ∩ R2 → R2 is in F+
- this means that R1 ∩ R2 is a superkey of R1 or R2
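The project-join condition can be tested directly on a relation instance. The sketch below uses the Lecturer relation from the 2NF example and represents each tuple as a frozenset of (attribute, value) pairs; the helper names are choices made for this illustration. Note that such a test only covers one instance, whereas the condition based on F+ covers all legal instances.

```python
def relation(attrs, tuples):
    """Build a relation as a set of frozensets of (attribute, value) pairs."""
    return {frozenset(zip(attrs, values)) for values in tuples}


def project(rows, attrs):
    """Projection onto attrs; duplicates vanish because the result is a set."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rows}


def natural_join(left, right):
    """Combine all pairs of rows that agree on their common attributes."""
    joined = set()
    for l_row in left:
        for r_row in right:
            merged = dict(l_row)
            if all(merged.get(a, v) == v for a, v in r_row):
                merged.update(r_row)
                joined.add(frozenset(merged.items()))
    return joined


def is_lossless(rows, attrs1, attrs2):
    """Check whether joining the two projections yields the original rows."""
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == rows


lecturer = relation(
    ("teacher", "course", "office"),
    [("Beat Signer", "Databases", "10G731d"),
     ("Beat Signer", "WIS", "10G731d"),
     ("Lode Hoste", "Databases", "10F716"),
     ("Lode Hoste", "ATIS", "10F716"),
     ("Sandra Trullemans", "WIS", "10G731e")])

lossless = is_lossless(lecturer, {"teacher", "office"}, {"teacher", "course"})
lossy = is_lossless(lecturer, {"course", "office"}, {"teacher", "course"})
```

The first decomposition shares the attribute teacher, which determines office, so the join reproduces the original relation. The second decomposition shares only course and the join produces spurious tuples such as (Beat Signer, Databases, 10F716).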
Third Normal Form (3NF)
A relation schema R is in third normal form (3NF) if it
is in 2NF and no non-prime attribute is transitively
dependent on a candidate key, i.e. for all functional
dependencies α → β in F+ one of the following has to
hold
α → β is a trivial functional dependency (i.e. β ⊆ α)
α is a superkey of R
each attribute Ai in β − α is contained in a candidate key of R
- note that each Ai can be in different candidate keys
Each non-key attribute "must provide a fact about the
key, the whole key, and nothing but the key" [Bill Kent]
Third Normal Form (3NF) ...
The Prize relation example schema is in 2NF
The Prize relation schema is not in 3NF since birthdate
is functionally dependent on winner and none of the three
conditions holds for this functional dependency
birthdate is transitively dependent on the key (award, year)
award year winner birthdate
ACM Turing Award 1981 Edgar F. Codd 23.08.1923
Nobel Peace Prize 1979 Mother Teresa 26.08.1910
ACM Turing Award 1984 Niklaus Wirth 15.02.1934
Nobel Peace Prize 1984 Desmond Tutu 07.10.1931
prize = (Prize)
Prize (award, year, winner, birthdate)
Third Normal Form (3NF) ...
3NF normalisation process
remove any transitively dependent attributes from the relation and
place them in a new relation together with their determinant
Decomposition of the Prize relation schema into two 3NF
relation schemas
Prize (award, year, winner)
award year winner
ACM Turing Award 1981 Edgar F. Codd
Nobel Peace Prize 1979 Mother Teresa
ACM Turing Award 1984 Niklaus Wirth
Nobel Peace Prize 1984 Desmond Tutu
prize = (Prize)
Birthdate (winner, birthdate)
winner birthdate
Edgar F. Codd 23.08.1923
Mother Teresa 26.08.1910
Niklaus Wirth 15.02.1934
Desmond Tutu 07.10.1931
bdate = (Birthdate)
Boyce-Codd Normal Form (BCNF)
The Boyce-Codd normal form is a stronger form of 3NF
A relation schema R is in Boyce-Codd Normal
Form (BCNF) if it is in 3NF and if every determinant is a
candidate key, i.e. for all functional dependencies α → β
in F+ one of the following holds
α → β is a trivial functional dependency (i.e. β ⊆ α)
a is a superkey of R
Any relation that is in BCNF is also in 3NF since the
BCNF conditions are equivalent to the first two 3NF
conditions
BCNF Decomposition
If a relation R is not in BCNF, then there exists at least
one nontrivial functional dependency α → β where α is
not a superkey of R
the relation R can then be decomposed into the two relation
schemas R1 (α ∪ β) and R2 (R − (β − α))
We can for example apply the BCNF decomposition to
the previous Prize relation schema example with the
functional dependency winner → birthdate
α ∪ β = (winner, birthdate)
(R − (β − α)) = (award, year, winner)
Further details about the algorithms for BCNF and 3NF
decomposition can be found in the course book
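A single BCNF decomposition step can be sketched in Python as shown below. The sketch assumes the two dependencies {award, year} → winner and winner → birthdate for the Prize schema and checks only the listed dependencies. It is not the complete algorithm from the course book, which repeats the step on the resulting schemas and has to consider the dependencies that hold on them.

```python
def closure(attributes, fds):
    """Attribute closure under a list of (determinant, dependent) set pairs."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for determinant, dependent in fds:
            if set(determinant) <= result and not set(dependent) <= result:
                result |= set(dependent)
                changed = True
    return result


def bcnf_violation(schema, fds):
    """Return the first nontrivial dependency whose determinant is no superkey."""
    for determinant, dependent in fds:
        determinant, dependent = set(determinant), set(dependent)
        nontrivial = not dependent <= determinant
        if nontrivial and closure(determinant, fds) != set(schema):
            return determinant, dependent
    return None


def bcnf_step(schema, fds):
    """Split the schema into (alpha ∪ beta) and (R − (beta − alpha)) if needed."""
    violation = bcnf_violation(schema, fds)
    if violation is None:
        return [set(schema)]
    alpha, beta = violation
    return [alpha | beta, set(schema) - (beta - alpha)]


PRIZE = {"award", "year", "winner", "birthdate"}
PRIZE_FDS = [({"award", "year"}, {"winner"}), ({"winner"}, {"birthdate"})]
decomposition = bcnf_step(PRIZE, PRIZE_FDS)
```

For the Prize schema the step yields the schemas (winner, birthdate) and (award, year, winner), matching the decomposition given above.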
Multivalued Dependencies
Some relation schemas that are in BCNF may still
contain redundant information
The fourth normal form (4NF) deals with some of these
problems based on multivalued dependencies
for a given relation schema R with α ⊆ R and β ⊆ R the
multivalued dependency α ↠ β holds if for all pairs of tuples t1 and
t2 in r(R) (with t1[α] = t2[α]) there exist tuples t3 and t4 in r(R) such
that
- t1[α] = t2[α] = t3[α] = t4[α]
- t3[β] = t1[β]
- t3[R − β] = t2[R − β]
- t4[β] = t2[β]
- t4[R − β] = t1[R − β]
     α          β            R − α − β
t1   a1...ai    ai+1...aj    aj+1...an
t2   a1...ai    bi+1...bj    bj+1...bn
t3   a1...ai    ai+1...aj    bj+1...bn
t4   a1...ai    bi+1...bj    aj+1...an
Multivalued Dependencies ...
Every functional dependency is also a multivalued
dependency, e.g. if α → β then α ↠ β
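The definition of a multivalued dependency can be turned into a check on a relation instance. The relation (teacher, course, phone) and its values are hypothetical data for this sketch, in which the courses and the phone numbers of a teacher are independent of each other. Iterating over all ordered pairs (t1, t2) covers the required tuple t4 as the t3 of the swapped pair.

```python
def mvd_holds(attrs, rows, alpha, beta):
    """Check alpha ->> beta on a relation instance given as a list of tuples."""
    index = {name: position for position, name in enumerate(attrs)}
    rowset = set(rows)
    for t1 in rows:
        for t2 in rows:
            if any(t1[index[a]] != t2[index[a]] for a in alpha):
                continue  # the definition only concerns pairs agreeing on alpha
            # required tuple t3: beta values from t1, all other values from t2
            t3 = tuple(
                t1[index[a]] if a in beta else t2[index[a]]
                for a in attrs
            )
            if t3 not in rowset:
                return False
    return True


ATTRS = ("teacher", "course", "phone")
ROWS = [
    ("Beat Signer", "Databases", "1337"),
    ("Beat Signer", "Databases", "3123"),
    ("Beat Signer", "WIS", "1337"),
    ("Beat Signer", "WIS", "3123"),
    ("Lode Hoste", "ATIS", "8765"),
]
# The same instance with one combination missing
INCOMPLETE = [row for row in ROWS if row != ("Beat Signer", "WIS", "3123")]

complete_ok = mvd_holds(ATTRS, ROWS, ["teacher"], ["course"])
incomplete_ok = mvd_holds(ATTRS, INCOMPLETE, ["teacher"], ["course"])
```

The full instance satisfies teacher ↠ course, whereas the incomplete instance does not, because the combination of the course WIS with the phone number 3123 is missing. The example also shows the redundancy that 4NF addresses: every course has to be repeated for every phone number.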
Fourth Normal Form (4NF)
A relation schema R is in fourth normal form (4NF) if
it is in BCNF and if any non-trivial multivalued
dependency is a dependency on a candidate key, i.e. for all
multivalued dependencies α ↠ β in D+ one of the
following has to hold
α ↠ β is a trivial multivalued dependency (i.e. β ⊆ α or α ∪ β = R)
α is a superkey of R
Note that the fourth normal form is very similar to BCNF
except that we use multivalued dependencies
4NF normalisation process
remove any multivalued attributes from the relation and
place them in a new relation together with their determinant
Fifth Normal Form (5NF)
There are some forms of constraints called join
dependencies that generalise multivalued dependencies
leads to the project-join normal form or fifth normal form (5NF)
not discussed in detail in this course
Normalisation Summary
Relations in higher normal forms are less vulnerable to
update anomalies
generally it is recommended that relations are at least in 3NF
Normal forms from weakest to strongest, each followed by the step that leads to the next form
Unnormalised (UN)
- remove repeating groups
First Normal Form (1NF)
- remove partial dependencies
Second Normal Form (2NF)
- remove transitive dependencies
Third Normal Form (3NF)
- every determinant has to be a candidate key
Boyce-Codd Normal Form (BCNF)
- remove multivalued dependencies
Fourth Normal Form (4NF)
- remove join dependencies
Fifth Normal Form (5NF)
Denormalisation
Sometimes a database designer decides to store
information in a redundant way to save join operations
and improve the performance
may result in additional work for insert, update and delete
operations
An alternative is to keep the normalised schema and
introduce additional materialised views
Homework
Study the following chapters of the
Database System Concepts book
chapter 7
- sections 7.6 and 7.8.6
- Reduction to Relation Schemas
chapter 8
- sections 8.1-8.9
- Relational Database Design
Exercise 4
Relational algebra
Relational database design
ER to relational model reduction
References
A. Silberschatz, H. Korth and S. Sudarshan,
Database System Concepts (Sixth Edition),
McGraw-Hill, 2010