2. Module Objective:
After completing this Module, you should:
Understand what a Database System is
Explain briefly the different types of Database Systems
Be able to create a Database environment with ER Modeling
Have a broad overview of Relational Database Management Systems
Have an introduction to Structured Query Language
Understand how the DBMS and its host computer system intercommunicate
Be aware of the new trends in Databases
3. Module Outline
1. What is a Database System
2. Types of Database Systems
3. Creating a Database Environment
4. Structured Query Language
5. Internal Management
6. Database Trends
4. 1.0 Database System
Learning Objective:
At the end of this Topic you will be able to:
• Understand what a Database System is
• Know how files are organized
• Appreciate the advantages of using a DBMS over a traditional file system
• Be aware of the Database Architecture
5. What is a Database System
A Database System is essentially a computerized record-keeping system.
A database-management system (DBMS) consists of a collection of interrelated data and a set of programs to access those data.
Database systems are designed to manage large volumes of information.
6. File Organization : Terms and
Concepts
Database: Group of related files
File: Group of records of the same type
Record: Group of related fields
Field: Group of words or a complete number
Byte: Group of bits that represents a single character
Bit: Smallest unit of data; binary digit (0, 1)
(Figure: Data Hierarchy in a Computer System)
7. File Organization : Terms and
Concepts
Entity: Person, place, thing or event about which information is maintained
Attribute: Description of a particular entity
Key Field: Identifier field used to retrieve, update or sort a record
8. File Organization : Terms and
Concepts
Problems with the Traditional File Environment
Data redundancy
Program-Data dependence
Lack of flexibility
Poor security
Lack of data sharing and availability
No concurrency control
(Figure: Traditional File Processing)
9. DBMS and its Advantages
• A Database Management System is a collection of programs that enables users to create and maintain a database. It is a general-purpose software system that facilitates the processes of defining, constructing and manipulating databases for various applications.
• Advantages of the database approach:
  • Controlling redundancy
  • Restricting unauthorized access
  • Providing persistent storage for program objects and data structures
  • Permitting inference and actions using deduction rules
  • Providing multiple user interfaces
  • Representing complex relationships among data
  • Enforcing integrity constraints
  • Providing backup and recovery
10. Database Management System
(DBMS)
Acts as an interface between application programs and physical data files.
Separates logical and physical views of data.
Reduces (controls) redundancy of data.
Creates and maintains databases.
Enforces security of data.
(Figure 7-4)
11. DBMS Architecture
• Internal Schema: Describes the physical storage structure of the database.
• Conceptual Schema: Describes the structure of the whole database for a community of users.
• External Schema: Each external schema (view) describes the part of the database that a particular user group requires, and hides the rest.
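The following is a minimal sketch of the three levels. It assumes Python's built-in sqlite3 module and a hypothetical EMPLOYEE table. The base table plays the role of the conceptual schema, and a view plays the role of one external schema that hides the salary column. SQLite manages the internal (storage) level by itself.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with one base table and one view."""
    conn = sqlite3.connect(":memory:")
    # Conceptual level: the full (hypothetical) EMPLOYEE table.
    conn.execute(
        "CREATE TABLE employee ("
        " emp_id INTEGER PRIMARY KEY,"
        " emp_name TEXT NOT NULL,"
        " dept_name TEXT,"
        " salary REAL)"
    )
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?, ?)",
        [(101, "John", "Sales", 5000.0), (102, "Nancy", "HR", 6200.0)],
    )
    # External level: a directory view for users who must not see salaries.
    conn.execute(
        "CREATE VIEW employee_directory AS "
        "SELECT emp_id, emp_name, dept_name FROM employee"
    )
    return conn


conn = build_demo()
cursor = conn.execute("SELECT * FROM employee_directory")
cols = [d[0] for d in cursor.description]  # column names visible through the view
print(cols)  # ['emp_id', 'emp_name', 'dept_name']
```

Users of the view never see the salary column, even though it is stored in the same base table.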
12. DBMS Architecture
• Data Independence
  Logical data independence: the capacity to change the conceptual schema without having to change the external schemas.
  Physical data independence: the capacity to change the internal schema without having to change the conceptual schema.
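The next example is a small illustration of both kinds of independence. It again assumes sqlite3, and all table, view and index names are hypothetical. The application query reads only the view, so two kinds of change leave its result unchanged:

- Adding a column to the base table is a conceptual-level change.
- Adding an index is a storage-level change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept_name TEXT)")
conn.execute("INSERT INTO employee VALUES (101, 'John', 'Sales')")
# External schema used by the application.
conn.execute(
    "CREATE VIEW sales_staff AS "
    "SELECT emp_id, emp_name FROM employee WHERE dept_name = 'Sales'"
)

QUERY = "SELECT emp_id, emp_name FROM sales_staff"  # the application's query
before = conn.execute(QUERY).fetchall()

# Conceptual-level change: a new column in the base table.
conn.execute("ALTER TABLE employee ADD COLUMN phone TEXT")
# Internal-level change: a new access path (index) on the stored data.
conn.execute("CREATE INDEX idx_emp_dept ON employee (dept_name)")

after = conn.execute(QUERY).fetchall()
print(before == after)  # True
```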
13. Functions of DBMS
• Data definition: Specifies the content and structure of the database and defines each data element
• Data manipulation: Manipulates data in a database
• Data dictionary: Stores definitions of data elements and data characteristics
• Data recovery and concurrency: Enforces certain controls for recovery and concurrency
• Data security and integrity: Monitors user requests and rejects any unauthorized attempts
• Performance: Functions should be performed efficiently
14. Requirements of a DBMS
Key elements in a database environment:
• Data Administration
• Data Planning and Modeling Methodology
• Database Technology and Management
• Users
15. Database System : Recap
• Why do businesses have trouble finding the information they need in their information systems?
• How does a database management system help businesses improve the organization of their information?
• What are the advantages of using a DBMS over a traditional file system?
• State the major functions and requirements of a DBMS.
16. Quiz
If an Employee database has the following fields: EmpId, EmpName, Salary and DeptName, what would be the ideal key field, and why?
• EmpId
• EmpName
• DeptName
• EmpId + DeptName
17. 2.0 Types of Databases
Learning Objective:
At the end of this Topic you will be able to:
• Explain briefly the various types of Database Systems:
  • Relational DBMS
  • Hierarchical DBMS
  • Network DBMS
  • Object-Oriented Databases
18. Relational Database Model
• Represents data as two-dimensional tables called relations
• Relates data across tables based on common data elements
• Examples: DB2, Oracle, MS SQL Server
19. Three Basic Operations in a
Relational Database
• Select: Creates a subset of rows that meet specific criteria
• Join: Combines relational tables to provide users with information
• Project: Enables users to create new tables containing only the relevant columns
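The next sketch shows the three operations expressed in SQL and run through Python's sqlite3 module. The tables, columns and values are hypothetical. WHERE restricts rows, the column list (with DISTINCT) projects, and JOIN ... ON combines the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, "
    "salary REAL, dept_no INTEGER)"
)
conn.executemany("INSERT INTO department VALUES (?, ?)", [(10, "Sales"), (20, "HR")])
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "John", 5000, 10), (2, "Nancy", 6200, 20), (3, "Mike", 4100, 10)],
)

# Select: only the rows that meet a condition.
selected = conn.execute(
    "SELECT * FROM employee WHERE salary > 4500 ORDER BY emp_id"
).fetchall()

# Project: only the columns of interest; DISTINCT removes duplicate rows.
projected = conn.execute(
    "SELECT DISTINCT dept_no FROM employee ORDER BY dept_no"
).fetchall()

# Join: combine rows of two tables on a common column.
joined = conn.execute(
    "SELECT e.emp_name, d.dept_name FROM employee e "
    "JOIN department d ON e.dept_no = d.dept_no ORDER BY e.emp_id"
).fetchall()

print(selected)   # [(1, 'John', 5000.0, 10), (2, 'Nancy', 6200.0, 20)]
print(projected)  # [(10,), (20,)]
print(joined)     # [('John', 'Sales'), ('Nancy', 'HR'), ('Mike', 'Sales')]
```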
21. Hierarchical Database Model
• It is a pointer-based model
• Organizes data in a tree-like structure
• Stores data as records and represents relationships as links (pointers)
• Supports one-to-many parent-child relationships
• Prevalent in large legacy systems
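The following toy sketch illustrates the pointer-based, one-to-many parent-child structure in plain Python. The Segment class and its fields are hypothetical and do not represent the API of any real hierarchical DBMS. Each record holds a pointer to its single parent and pointers to its children, and navigation means following those pointers.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """One record in a hypothetical hierarchical database."""
    name: str
    # Pointer up to the single parent (None for the root). Excluded from
    # repr/eq so the parent <-> child cycle does not recurse forever.
    parent: Optional[Segment] = field(default=None, repr=False, compare=False)
    # Pointers down to the children (one-to-many).
    children: List[Segment] = field(default_factory=list)

    def add_child(self, name: str) -> Segment:
        child = Segment(name, parent=self)
        self.children.append(child)
        return child


def path_from_root(seg: Segment) -> List[str]:
    """Follow parent pointers up to the root, then return the path root-first."""
    path: List[str] = []
    node: Optional[Segment] = seg
    while node is not None:
        path.append(node.name)
        node = node.parent
    return path[::-1]


root = Segment("Company")
sales = root.add_child("Dept: Sales")
john = sales.add_child("Employee: John")
print(path_from_root(john))  # ['Company', 'Dept: Sales', 'Employee: John']
```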
22. Network DBMS
Depicts data logically as many-to-many relationships
Stores data as records and represents relationships as links
It is also a pointer-based model
Organizes data in arbitrary graphs
23. Hierarchical and Network
DBMS
Some of the disadvantages:
Outdated
Complex pointer-based organization
Less flexible compared to RDBMS
Lack support for ad hoc and English-language-like queries
24. Object-Oriented Databases
Object-oriented DBMS: Stores data and
procedures as objects that can be retrieved
and shared automatically
Object-relational DBMS: Provides capabilities
of both object-oriented and relational DBMS
25. Types of Databases : Summary
• In a relational database the data is perceived by the user as tables (and nothing but tables).
• The relational operators are used to manipulate the data in the tables.
26. 3.0 Creating a DB environment
Learning Objective:
At the end of this Topic you will:
• Have the ability to model an application system based on the E-R Modeling approach.
• Understand relational database concepts such as Normalization, Data Integrity and Relational Operations (Union, Intersection, etc.).
• Be able to design relational databases based on E-R models or system requirements for an application.
27. Introduction to Data Modeling
What is Data Modeling?
A technique for analyzing requirements
and for identifying the information needs of
an organization
• Why is Data Modeling important?
We cannot build a good system without knowing what data needs to be captured and how it needs to be organized.
28. Introduction to Data Modeling
• An Overview:
  • Focuses on what data is required and how it should be organized
  • Conceptual representation of the data structures required by a database
  • Data structures include the data objects, the associations between data objects, and the rules which govern operations on the objects
  • Independent of hardware or software constraints
• Data Model and Database Design:
  • A Data Model is to a Database what a building plan or blueprint is to a Building
  • A Database Design translates a data model into a database
  • A Data Model is the conceptual design of a database
29. E-R Modeling
Originally proposed by Peter Chen (1976)
Views the real world as entities and relationships
Key component is the E-R Diagram
Most common model used for designing relational databases
• Entity: An identifiable object or concept of significance
• Attribute: A property of an entity or relationship
• Relationship: An association between entities
• Identifier: One or more attributes identifying an instance (occurrence) of an entity
32. E-R Modeling
• Entity
  • Any object or thing of significance about which data needs to be collected and maintained
  • Could be:
    • Concrete or tangible, like a person or a building
    • Abstract, like a concept or activity
  • Analogous to a table in a relational database
  • Examples: EMPLOYEES, PROJECTS, INVOICES
33. E-R Modeling
• Entity Rules
  • Any thing or object may be represented by only one entity. Entities are mutually exclusive in all cases.
  • Each entity must be uniquely identifiable. Each instance (occurrence) of an entity must be separate and distinctly identifiable from all other instances of that type of entity.
• Entity Classification and Types
  • Classified as dependent or independent
  • An independent entity is one that does not rely on another for identification
  • A dependent entity is one that relies on another for identification
  • In some methodologies, the terms used are strong and weak, respectively
34. E-R Modeling
• Entity Classification and Types
  • Fundamental entity: An entity that exists and is of interest in its own right. Generally, most entities in the data model are fundamental entities.
    Example: Department and Employee are both fundamental entities
• Special Entity Types
  • Associative entity: Used to associate two entities in order to reconcile a many-to-many relationship
  • Sub-type/super-type: Used in generalization hierarchies to represent a subset of the instances of their parent entity
35. E-R Modeling
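Example of an associative entity:
(Diagram: ORDER has ORDER LINEs, and each ORDER LINE belongs to an ORDER; each ORDER LINE is for an ITEM, and an ITEM appears on ORDER LINEs. ORDER LINE reconciles the many-to-many relationship between ORDER and ITEM.)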
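The next example is a hedged sketch of how the associative entity might become a table. It uses sqlite3, and all names and values are hypothetical. ORDER is a reserved word in SQL, so the table is named `orders`. ORDER_LINE carries a foreign key to each side, and its composite primary key allows one row per order/item pairing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE orders (order_no INTEGER PRIMARY KEY);
CREATE TABLE item (item_no INTEGER PRIMARY KEY, description TEXT);
-- Associative entity: one row per (order, item) pairing.
CREATE TABLE order_line (
    order_no INTEGER NOT NULL REFERENCES orders(order_no),
    item_no  INTEGER NOT NULL REFERENCES item(item_no),
    quantity INTEGER NOT NULL,
    PRIMARY KEY (order_no, item_no)
);
INSERT INTO orders VALUES (1), (2);
INSERT INTO item VALUES (100, 'Bolt'), (200, 'Nut');
INSERT INTO order_line VALUES (1, 100, 50), (1, 200, 50), (2, 100, 10);
""")

# The M:N relationship is navigated in both directions through ORDER_LINE.
items_on_order_1 = conn.execute(
    "SELECT item_no FROM order_line WHERE order_no = 1 ORDER BY item_no").fetchall()
orders_with_item_100 = conn.execute(
    "SELECT order_no FROM order_line WHERE item_no = 100 ORDER BY order_no").fetchall()
print(items_on_order_1)      # [(100,), (200,)]
print(orders_with_item_100)  # [(1,), (2,)]
```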
36. E-R Modeling
• Generalization Hierarchies
  • Generalization occurs when two or more entities represent categories of the same real-world object.
    Example: CAR and TRUCK represent categories of the same entity: VEHICLE is the super-type, and CAR and TRUCK are the sub-types.
37. E-R Modeling
• Generalization Hierarchies
  • A form of abstraction that specifies that two or more entities that share common attributes can be generalized into a higher-level entity type called a super-type or generic entity.
  • The lower-level entities become the sub-types, or categories, of the super-type. Sub-types are dependent entities.
38. E-R Modeling
• Generalization Hierarchies
  • Sub-types can be either mutually exclusive (disjoint) or overlapping (inclusive).
  • In an overlapping hierarchy, an entity instance can be part of multiple sub-types.
Example: Entity PERSON represents people at a university. It has three sub-types: FACULTY, STAFF and STUDENT. A STAFF member could also be registered as a STUDENT.
(Diagram: super-type PERSON with sub-types STUDENT, STAFF and FACULTY)
39. E-R Modeling
• Generalization Hierarchies
  • In a disjoint hierarchy, an entity instance can be in only one sub-type.
Example: Entity EMPLOYEE may have two sub-types, CLASSIFIED and WAGES. An employee may be one type or the other, but not both.
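The following sketch contrasts the two kinds of hierarchy in plain Python. Each sub-type is represented as a set of (hypothetical) instance ids. A hierarchy is disjoint exactly when no id appears in more than one sub-type.

```python
from typing import Dict, Set


def overlapping_instances(subtypes: Dict[str, Set[str]]) -> Set[str]:
    """Return the instance ids that appear in more than one sub-type."""
    seen: Set[str] = set()
    shared: Set[str] = set()
    for members in subtypes.values():
        shared |= seen & members  # already seen in an earlier sub-type
        seen |= members
    return shared


def is_disjoint(subtypes: Dict[str, Set[str]]) -> bool:
    return not overlapping_instances(subtypes)


# Hypothetical instance ids for the PERSON and EMPLOYEE examples.
person = {"FACULTY": {"p1"}, "STAFF": {"p2", "p3"}, "STUDENT": {"p3", "p4"}}
employee = {"CLASSIFIED": {"e1", "e2"}, "WAGES": {"e3"}}

print(overlapping_instances(person))  # {'p3'}: a staff member who is also a student
print(is_disjoint(employee))          # True
```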
41. E-R Modeling
• Attribute
  • Attributes describe a property or a characteristic of an entity.
  • A particular instance of an attribute is a value. For example, “John Doe” is one value of the attribute Name.
  • Simple attribute: contains only atomic values
  • Composite attribute: has component attributes
(Diagram: entity Student with simple attribute DOB and composite attribute Name, made up of FName, MI and LName)
42. E-R Modeling
• Attribute Classification
  • Single-valued attribute: has exactly one value per instance of an entity
  • Multi-valued attribute: contains repeating values per instance of an entity
(Diagram: entity Student with single-valued attribute Id and multi-valued attribute Module, e.g. Math, Physics)
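The next sketch shows one common way (assumed here, not the only one) to map these attribute kinds onto tables. It uses sqlite3, and the sample values are hypothetical:

- A composite attribute becomes one column per component.
- A single-valued attribute becomes a single column.
- A multi-valued attribute moves to its own table, with one row per value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Composite attribute Name -> one column per component; DOB is simple.
CREATE TABLE student (
    id    INTEGER PRIMARY KEY,   -- single-valued identifier
    fname TEXT NOT NULL,
    mi    TEXT,
    lname TEXT NOT NULL,
    dob   TEXT
);
-- Multi-valued attribute Module -> a separate table, one row per value.
CREATE TABLE student_module (
    student_id INTEGER NOT NULL REFERENCES student(id),
    module     TEXT NOT NULL,
    PRIMARY KEY (student_id, module)
);
INSERT INTO student VALUES (1, 'John', 'Q', 'Doe', '2001-05-14');
INSERT INTO student_module VALUES (1, 'Math'), (1, 'Physics');
""")

modules = [m for (m,) in conn.execute(
    "SELECT module FROM student_module WHERE student_id = 1 ORDER BY module")]
print(modules)  # ['Math', 'Physics']
```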
43. E-R Modeling
• Identifiers and Descriptors
  • Attributes can be classified as identifiers or descriptors.
  • Identifiers, more commonly called keys, uniquely identify an instance of an entity.
  • A descriptor describes a non-unique characteristic of an entity instance.
An Example:
Entity: Employee
Unique Identifier: Employee No.
Descriptors: Name, DOJ, DOB
44. E-R Modeling
• Relationship
• Represents an association between two or more entities
Examples
- Employees work for Departments
- Departments manage one or more projects
- Employees are assigned to projects
- Projects have sub-tasks
- Orders have line items
• Defined in terms of:
- Degree
- Connectivity
- Cardinality
- Direction
- Type
- Existence
45. E-R Modeling
• Degree
  • Number of entities associated with the relationship
  • Binary relationships, the association between two entities, are the most common type in the real world. N-ary is the general form for degree n.
• Connectivity
  • Mapping of associated entity instances in the relationship
  • The values of connectivity are "one" or "many"
  • The basic types of connectivity for relations are: one-to-one, one-to-many, and many-to-many
• Cardinality
  • Actual number of related occurrences for each of the two entities
46. E-R Modeling
• Connectivity and Cardinality
  • A one-to-one (1:1) relationship is when at most one instance of entity A is associated with one instance of entity B.
    For example: Employees in the company are each assigned their own office. For each employee there exists a unique office, and for each office there exists a unique employee.
  • A one-to-many (1:N) relationship is when, for one instance of entity A, there are zero, one, or many instances of entity B, but for one instance of entity B, there is only one instance of entity A.
    An example: A department has many employees; each employee is assigned to one department.
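The next example is a minimal sketch of how the two connectivities might be enforced in SQL. It uses sqlite3, and the names are hypothetical:

- A plain foreign key (dept_no) lets many employees share one department (1:N).
- Adding UNIQUE to the foreign key (office_no) lets each office belong to at most one employee (1:1).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE office (office_no INTEGER PRIMARY KEY);
CREATE TABLE employee (
    emp_id    INTEGER PRIMARY KEY,
    dept_no   INTEGER REFERENCES department(dept_no),       -- 1:N
    office_no INTEGER UNIQUE REFERENCES office(office_no)   -- 1:1
);
INSERT INTO department VALUES (10, 'Sales');
INSERT INTO office VALUES (1), (2);
INSERT INTO employee VALUES (101, 10, 1), (102, 10, 2);
""")


def try_insert(row) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert((103, 10, None)))  # True: a third employee in dept 10, no office yet
print(try_insert((104, 10, 1)))     # False: office 1 already belongs to employee 101
```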
47. E-R Modeling
• Connectivity and Cardinality
• A many-to-many (M:N) relationship is when, for one instance of entity A, there are zero, one, or many instances of entity B, and for one instance of entity B, there are zero, one, or many instances of entity A.
An example: Employees can be assigned to no more than two projects at the same time; a project must have at least three employees assigned.
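The following sketch checks the cardinality limits from the example against (employee, project) pairs in plain Python. The pair list stands in for a hypothetical associative EMP_PROJECT table, and the function name is illustrative.

```python
from collections import Counter
from typing import List, Tuple


def cardinality_violations(assignments: List[Tuple[str, str]],
                           max_projects_per_emp: int = 2,
                           min_emps_per_project: int = 3) -> List[str]:
    """Check the example's limits; only projects present in the data are checked."""
    per_emp = Counter(emp for emp, _ in assignments)
    per_proj = Counter(proj for _, proj in assignments)
    problems = [f"{e} is on {n} projects" for e, n in sorted(per_emp.items())
                if n > max_projects_per_emp]
    problems += [f"{p} has {n} employee(s)" for p, n in sorted(per_proj.items())
                 if n < min_emps_per_project]
    return problems


# Hypothetical assignment data.
pairs = [("e1", "P1"), ("e2", "P1"), ("e3", "P1"),
         ("e1", "P2"), ("e1", "P3"), ("e2", "P2")]
print(cardinality_violations(pairs))
# ['e1 is on 3 projects', 'P2 has 2 employee(s)', 'P3 has 1 employee(s)']
```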
48. E-R Modeling
• Direction
  • Indicates the originating entity of a binary relationship. The entity from which a relationship originates is the parent entity; the entity where the relationship terminates is the child entity.
  • The direction of a relationship is determined by its connectivity.
• Type: Identifying and Non-identifying
  • An identifying relationship is one in which the child entity is also a dependent entity.
  • A non-identifying relationship is one in which both entities are independent.
49. E-R Modeling
• Existence
  • Denotes whether the existence of an entity instance is dependent upon the existence of another, related, entity instance.
  • Defined as either mandatory or optional.
• Mandatory and optional relationships
  If an instance of an entity must always occur for an entity to be included in a relationship, then the relationship is mandatory. If the instance of the entity is not required, it is optional.
Example:
Mandatory: Every project must be managed by a single department
Optional: Employees may be assigned to work on projects
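The next example is a hedged sketch of how existence might be enforced in SQL, using sqlite3 and hypothetical names:

- Mandatory existence becomes a NOT NULL foreign key.
- Optional existence becomes a nullable one.

For brevity, the example simplifies so that an employee works on at most one project. The real employee/project relationship is many-to-many.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (dept_no INTEGER PRIMARY KEY);
-- Mandatory: every project must be managed by a department (NOT NULL).
CREATE TABLE project (
    proj_no INTEGER PRIMARY KEY,
    dept_no INTEGER NOT NULL REFERENCES department(dept_no)
);
-- Optional: an employee may or may not work on a project (NULL allowed).
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    proj_no INTEGER REFERENCES project(proj_no)
);
INSERT INTO department VALUES (10);
INSERT INTO project VALUES (1, 10);
""")

conn.execute("INSERT INTO employee VALUES (101, NULL)")  # accepted: optional side
try:
    conn.execute("INSERT INTO project VALUES (2, NULL)")  # rejected: mandatory side
    project_without_dept = True
except sqlite3.IntegrityError:
    project_without_dept = False
print(project_without_dept)  # False
```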
50. E-R Modeling
• E-R Notation
  • No standard notation
  • Original notation by Chen
  • Common notations are: Bachman, crow's foot, and IDEF1X
  • These styles represent entities as rectangular boxes and relationships as lines connecting boxes
  • Each style uses a special set of symbols to represent the cardinality of a connection
51. E-R Modeling
• Entities
• Represented by labeled rectangles
• The label is the name of the entity
• Entity names should be singular nouns.
• Relationships
• Represented by a solid line connecting two
entities.
• Name written above the line
• Relationship names should be verbs
(Example diagram: the Employee entity connected to the Department entity by a line labeled "Works for")
52. E-R Modeling
• Attributes
  • Listed inside the entity rectangle
  • The identifier is underlined
  • Names should be singular nouns
• Cardinality
  • Many is represented by a line ending in a crow's foot. If omitted, the cardinality is one.
• Existence
  • Represented by placing a circle or a perpendicular bar on the line
  • Mandatory existence is shown by the bar next to the entity for an instance that is required
  • Optional existence is shown by placing a circle next to the entity that is optional
(Diagram: entity Employee listing attributes EmpID and EmpName)
53. E-R Modeling : Assignment
How do you create an E-R Model from requirements?
Step 1: Identify Entities
• Entities are, by definition, things people talk about, record information about and do work on
• Any keyword (noun) is a candidate
• Identify the generic object from references to instances or occurrences
• Combine synonyms to represent a single entity
An Example: Purchase Order - System Requirements
A buyer creates a purchase order (PO) as and when the need arises. A PO is for a specific vendor. A PO has one or more line items. A buyer cannot create a PO of total value more than his approval limit. A PO can be sent to the vendor by mail, fax or EDI. A PO can be canceled before it is submitted. A PO can be linked to a sales order…
54. E-R Modeling
Step 1: Identify Entities
• Entities
  Purchase Order (PO)
  Buyer?
  Vendor
  Line Items
  Sales Order
  Approval Limit?
• Buyer characterizes a PO
• Approval Limit characterizes a Buyer
What does it tell us?
• Approval Limit is not an entity
• Buyer is an entity
• Approval Limit is an attribute of the entity Buyer
55. E-R Modeling
Step 2: Identify Relationships
Look for phrases describing a link between two things or objects.
Verbs relating two nouns often suggest relationships, e.g. "A buyer creates a purchase order", "A purchase order has one or more lines".
Requirements may or may not contain information regarding the degree, existence and cardinality of a relationship up front.
Further questioning may be needed to determine these.
56. E-R Modeling
Step 2: Identify Relationships
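Grid Technique
(Grid: the entities PO, Buyer, Vendor and Line are listed along both axes; each cell names the relationship, if any, between the row entity and the column entity, and "-" marks pairs with no relationship. Recoverable cell labels: "replaced by", "creates a", "is approver of", "supplies against a", "belongs to a", "created for", "item supplied by".)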
57. E-R Modeling
Step 2 : Identify Relationships
• Analyzing existing systems (files, databases)
• Look for:
  Foreign keys
  Repeating groups
  Pointers
  Structured codes
All of the above imply relationships.
58. E-R Modeling
• Step 3: Identify Attributes
• An attribute is any detail that serves to identify, classify, quantify or express the state of an entity.
• Ask the following question for each entity: "What information do you need to know or hold about …?"
• Potential attributes are easily found by examining paper forms.
59. E-R Modeling
• Step 3: Identify Attributes
Example Purchase Order Form:
  Purchase Order No __________
  Buyer _________    Vendor ___________
  Date Created ______
  No    Item          Quantity   Value
  ___   ___________   ______     __________
  ___   ___________   ______     __________
  ___   ___________   ______     __________
  Shipping Address
    Street _________
    City __________
    Zip _______
  Total Value ______
Candidate attributes:
• Purchase Order No
• Vendor
• Buyer
• Date Created
• Item?
• Address
• City
• State
• Zip
• Total Value?
60. E-R Modeling
E-R Model of the Purchase Order Example
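(Diagram: E-R model with the entities BUYER, PURCHASE ORDER, VENDOR, LINE and ITEM. Relationship labels shown: creates / created by (BUYER and PURCHASE ORDER), created for a, supplies against, has / belongs to (PURCHASE ORDER and LINE), exists on, created for.)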
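The next example is a hedged sketch of how the Purchase Order model might be turned into tables using sqlite3. The following are all assumptions:

- All table and column names and all sample values.
- The choice to derive Total Value from the lines rather than store it.

Approval Limit is kept as an attribute of BUYER, as concluded in Step 1. The Sales Order link is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE buyer  (buyer_id INTEGER PRIMARY KEY, name TEXT,
                     approval_limit REAL NOT NULL);   -- attribute of BUYER
CREATE TABLE vendor (vendor_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item   (item_no INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE purchase_order (
    po_no        INTEGER PRIMARY KEY,
    buyer_id     INTEGER NOT NULL REFERENCES buyer(buyer_id),    -- created by
    vendor_id    INTEGER NOT NULL REFERENCES vendor(vendor_id),  -- for a vendor
    date_created TEXT
);
CREATE TABLE po_line (
    po_no      INTEGER NOT NULL REFERENCES purchase_order(po_no), -- belongs to
    line_no    INTEGER NOT NULL,
    item_no    INTEGER NOT NULL REFERENCES item(item_no),
    quantity   INTEGER NOT NULL,
    line_value REAL NOT NULL,
    PRIMARY KEY (po_no, line_no)
);
INSERT INTO buyer VALUES (1, 'Asha', 1000.0);
INSERT INTO vendor VALUES (7, 'Acme');
INSERT INTO item VALUES (100, 'Bolt'), (200, 'Nut');
INSERT INTO purchase_order VALUES (500, 1, 7, '2024-01-15');
INSERT INTO po_line VALUES (500, 1, 100, 10, 400.0), (500, 2, 200, 20, 300.0);
""")


def within_approval_limit(po_no: int) -> bool:
    """Derive the PO's total from its lines and compare it to the buyer's limit."""
    total, limit = conn.execute("""
        SELECT SUM(l.line_value), b.approval_limit
        FROM purchase_order p
        JOIN buyer b   ON b.buyer_id = p.buyer_id
        JOIN po_line l ON l.po_no = p.po_no
        WHERE p.po_no = ?
        GROUP BY p.po_no, b.approval_limit""", (po_no,)).fetchone()
    return total <= limit


print(within_approval_limit(500))  # True (700.0 <= 1000.0)
```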
61. E-R Modeling
Major Modeling Techniques
Peter Chen's original entity/relationship diagrams
Information Engineering
Richard Barker's notation, used by Oracle Corporation
IDEF1X
Object Role Modeling
Unified Modeling Language (UML)
Extensible Markup Language (XML)
62. E-R Modeling
• Major Modeling Techniques
  • Data modeling has two audiences:
    • User community: uses the models to verify that the analysts understand their environment and their requirements.
    • Systems designers: use the business rules implied by the models as the basis for their design of computer systems.
  • Different techniques are better for one audience or the other.
  • All techniques are fundamentally the same.
  • The differences are mainly syntactic or notational.
63. Relational Model
Objective:
• To give an informal introduction to relational concepts, especially as they relate to relational database design issues.
What it is not:
• This does not give a complete description of relational theory.
64. Relational Model
Formally introduced by Dr. E. F. Codd in 1970
Represents data in the form of two-dimensional tables
A relational database is a collection of two-dimensional tables
A basic understanding of the model is needed to design and use relational databases
65. Relational Model
Tables, Columns and Rows
Relationships and Keys
Data Integrity
Normalization
What is a table?
• Represents some real-world person, place, thing, or event
• Two-dimensional:
  • Columns
  • Rows
Course No.   Course_Title      C_Hrs.   Dept.
CIS 120      Intro to CIS      4        CIS
MKT 333      Intro to Mkting   3        MKT
ECO 473      Labor Econ.       3        ECO
BA201        Intro to Stat.    5        ECO
CIS 345      Intro to Dbase    4        CIS
66. Relational Model
Table
• Columns represent a property of the person, place, thing or event that the table represents
• Rows represent an occurrence or instance of what the table represents
• A data value is stored in the intersection of a row and column
• Each named column has a domain, which is the set of values that may appear in that column
Empid    Employee Name   Level   DOJ       Manager
101412   John            M3      4/10/98   101667
102235   Nancy           M4      1/23/01   101412
101398   Mike            S1      8/15/95   101667
101667   Jeff            M2      6/2/96    100351
103893   Cindy           M3      7/17/95   101284
101116   Rahul           S2      2/20/00   101412
102739   Scott           C1      4/13/01   101667
67. Relational Model
Table - Terminology
In this document   Formal Terms   Many Database Manuals
Table              Relation       Table
Column             Attribute      Field
Row                Tuple          Record
68. Relational Model
• Salient features of a relational table
• Values are atomic (1NF)
• Column values are of the same kind (Domain)
• Each Row is unique (Primary Key)
• Sequence of columns is insignificant
• Sequence of rows is insignificant
• Each column must have a unique name
• Relationships and Keys
• Keys - Fundamental to the concept of relational
databases
• Relationship - An association between two or more
tables defined by means of keys
69. Relational Model
• Primary Key
  • Column or set of columns that uniquely identifies a row in a table
  • Must be unique and must have a value
• Foreign Key
  • Column or set of columns that references the primary key or a unique key of another table
  • Rows in two tables are linked by matching the values of the foreign key in one table with the values of the primary key in another
Examples:
• EMP_ID in table EMPLOYEE is the primary key
• DEPT_NO in table DEPARTMENT is the primary key
• DEPT_NO in table EMPLOYEE is a foreign key
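The following minimal sketch declares the EMPLOYEE/DEPARTMENT example in SQL using sqlite3. The employee ids come from the sample table above, while the names, departments and other values are hypothetical. It then reads the declared link back from SQLite's catalog and follows it with a join on matching key values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    dept_no   INTEGER PRIMARY KEY,                  -- primary key of DEPARTMENT
    dept_name TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id   INTEGER PRIMARY KEY,                   -- primary key of EMPLOYEE
    emp_name TEXT NOT NULL,
    dept_no  INTEGER REFERENCES department(dept_no) -- foreign key
);
INSERT INTO department VALUES (10, 'Sales'), (20, 'HR');
INSERT INTO employee VALUES (101412, 'John', 10), (102235, 'Nancy', 20);
""")

# The declared link as recorded by SQLite: (referenced table, from column, to column).
fk = conn.execute("PRAGMA foreign_key_list(employee)").fetchone()
print(fk[2:5])  # ('department', 'dept_no', 'dept_no')

# Rows are linked by matching foreign-key values to primary-key values.
rows = conn.execute("""
    SELECT e.emp_name, d.dept_name
    FROM employee e JOIN department d ON e.dept_no = d.dept_no
    ORDER BY e.emp_id""").fetchall()
print(rows)  # [('John', 'Sales'), ('Nancy', 'HR')]
```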
70. Relational Model
• Data Integrity
  • Ensures correct and consistent navigation and manipulation of relational tables
  • Two types of integrity rules:
    • Entity integrity
    • Referential integrity
  • The entity integrity rule states that the value of the primary key can never be null.
  • The referential integrity rule states that if a relational table has a foreign key, then every value of the foreign key must either be null or match a value in the relational table in which that foreign key is a primary key.
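The next sketch shows both rules being enforced by sqlite3, using hypothetical tables. Two SQLite-specific settings matter here:

- SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`.
- For historical reasons, SQLite allows NULL in most PRIMARY KEY columns unless NOT NULL is declared, so the sketch declares NOT NULL explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable referential-integrity checks
conn.executescript("""
CREATE TABLE department (dept_no TEXT NOT NULL PRIMARY KEY);
CREATE TABLE employee (
    emp_id  TEXT NOT NULL PRIMARY KEY,          -- entity integrity: no null key
    dept_no TEXT REFERENCES department(dept_no) -- referential integrity
);
INSERT INTO department VALUES ('D10');
""")


def accepted(sql: str, params: tuple) -> bool:
    """Return True if the statement succeeds, False if an integrity rule rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


insert = "INSERT INTO employee VALUES (?, ?)"
print(accepted(insert, ("E1", "D10")))  # True: FK value matches a primary key
print(accepted(insert, ("E2", None)))   # True: a null FK is allowed
print(accepted(insert, ("E3", "D99")))  # False: violates referential integrity
print(accepted(insert, (None, "D10")))  # False: violates entity integrity
```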
71. Relational Model
• Data Manipulation
  • Relational tables are equivalent to sets
  • Operations that can be performed on sets can be performed on relational tables
  • Relational operations such as:
    • Selection
    • Projection
    • Join
    • Union
    • Intersection
    • Difference
    • Product
    • Division
(Figure: diagrams illustrating INTERSECTION, UNION and DIFFERENCE)
72. Relational Model
• Selection
• The select operator, sometimes called restrict to prevent confusion with the SQL SELECT command, retrieves subsets of rows from a relational table based on the values in one or more columns.
A   B   C      D   E
1   A   212    Y   2
2   C   45     N   84
3   B   8656   N   4
4   D   324    N   56
5   C   5656   Y   34
6   A   445    N   4
7   B   546    Y   55
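The following sketch applies selection (restrict) to the sample table above in plain Python. The condition D = 'Y' is a hypothetical choice, because the original slide's highlighted rows are not recoverable. In SQL this corresponds to `SELECT * FROM t WHERE D = 'Y'`.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# The sample table above, one dict per row (column name -> value).
TABLE: List[Row] = [
    dict(zip("ABCDE", r)) for r in [
        (1, "A", 212, "Y", 2), (2, "C", 45, "N", 84), (3, "B", 8656, "N", 4),
        (4, "D", 324, "N", 56), (5, "C", 5656, "Y", 34), (6, "A", 445, "N", 4),
        (7, "B", 546, "Y", 55),
    ]
]


def restrict(table: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Relational selection: keep every row (all columns) that satisfies the predicate."""
    return [row for row in table if predicate(row)]


yes_rows = restrict(TABLE, lambda r: r["D"] == "Y")
print([r["A"] for r in yes_rows])  # [1, 5, 7]
```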
73. Relational Model
• Projection
• The project operator retrieves subsets of columns from a relational table, removing duplicate rows from the result.
A   B   C      D   E
1   A   212    Y   2
2   C   45     N   84
3   B   8656   N   4
4   D   324    N   56
5   C   5656   Y   34
6   A   445    N   4
7   B   546    Y   55
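The next example is a plain-Python sketch of projection on the same sample table. Projecting onto columns D and E (a hypothetical choice) shows duplicate removal: rows 3 and 6 both reduce to ('N', 4), so only one copy is kept. In SQL this corresponds to `SELECT DISTINCT D, E FROM t`.

```python
from typing import List, Sequence, Tuple

COLUMNS = ("A", "B", "C", "D", "E")
ROWS: List[Tuple] = [
    (1, "A", 212, "Y", 2), (2, "C", 45, "N", 84), (3, "B", 8656, "N", 4),
    (4, "D", 324, "N", 56), (5, "C", 5656, "Y", 34), (6, "A", 445, "N", 4),
    (7, "B", 546, "Y", 55),
]


def project(rows: List[Tuple], columns: Sequence[str], keep: Sequence[str]) -> List[Tuple]:
    """Relational projection: keep some columns and drop duplicate result rows."""
    idx = [columns.index(c) for c in keep]
    # dict.fromkeys removes duplicates while keeping first-occurrence order.
    return list(dict.fromkeys(tuple(row[i] for i in idx) for row in rows))


print(project(ROWS, COLUMNS, ["D", "E"]))
# [('Y', 2), ('N', 84), ('N', 4), ('N', 56), ('Y', 34), ('Y', 55)]
```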
74. Relational Model
• Product
• The product of two relational tables, also called the Cartesian product, is the concatenation of every row in one table with every row in the second.
• The product of table A (having m rows) and table B (having n rows) is the table C (having m x n rows). The product is denoted as A X B or A TIMES B.
Table A          Table B
k  x  y          k  x  y
1  A  2          1  A  2
2  B  4          4  D  8
3  C  6          5  E  10
A TIMES B
ak  ax  ay  bk  bx  by
1   A   2   1   A   2
1   A   2   4   D   8
1   A   2   5   E   10
2   B   4   1   A   2
2   B   4   4   D   8
2   B   4   5   E   10
3   C   6   1   A   2
3   C   6   4   D   8
3   C   6   5   E   10
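The following sketch computes A TIMES B with Python's `itertools.product`, using the tables above. Each result row is one row of A concatenated with one row of B, which gives 3 x 3 = 9 rows.

```python
from itertools import product

A = [(1, "A", 2), (2, "B", 4), (3, "C", 6)]
B = [(1, "A", 2), (4, "D", 8), (5, "E", 10)]


def times(a, b):
    """Cartesian product: concatenate every row of a with every row of b."""
    return [ra + rb for ra, rb in product(a, b)]


C = times(A, B)
print(len(C))  # 9 (= 3 x 3)
print(C[1])    # (1, 'A', 2, 4, 'D', 8)
```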
75. Relational Model
• Join
• Combines the product, selection and projection operations
• Combines (concatenates) data from one row of a table with rows from another or the same table
• The criteria involve a relationship among the columns in the joined tables
• If the join criterion is based on equality of column values, the result is called an equi-join
• A natural join is an equi-join with redundant columns removed
• Joins can also be done on criteria other than equality. Such joins are called non-equi joins
Table A          Table B
k  a  b          k  c
1  A  2          1  aa
2  B  4          3  bb
3  C  6          5  cc
Equi-Join (A.k = B.k)
k  a  b  k  c
1  A  2  1  aa
3  C  6  3  bb
Natural Join
k  a  b  c
1  A  2  aa
3  C  6  bb
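The next example is a plain-Python sketch of the two joins above, built from the primitive operations as the text describes:

- An equi-join is a product followed by a selection on equal join columns.
- A natural join additionally projects away the duplicate copy of each shared column.

The helper names are illustrative only.

```python
from itertools import product

A_COLS, A = ("k", "a", "b"), [(1, "A", 2), (2, "B", 4), (3, "C", 6)]
B_COLS, B = ("k", "c"), [(1, "aa"), (3, "bb"), (5, "cc")]


def equi_join(a, b, ia: int, ib: int):
    """Product followed by selection on equal values in column ia of a and ib of b."""
    return [ra + rb for ra, rb in product(a, b) if ra[ia] == rb[ib]]


def natural_join(a, a_cols, b, b_cols):
    """Equi-join on all shared column names, then project away b's copies of them."""
    shared = [c for c in a_cols if c in b_cols]
    rows = [ra + rb for ra, rb in product(a, b)
            if all(ra[a_cols.index(c)] == rb[b_cols.index(c)] for c in shared)]
    keep = list(range(len(a_cols))) + [
        len(a_cols) + i for i, c in enumerate(b_cols) if c not in shared]
    return [tuple(r[i] for i in keep) for r in rows]


print(equi_join(A, B, 0, 0))               # [(1, 'A', 2, 1, 'aa'), (3, 'C', 6, 3, 'bb')]
print(natural_join(A, A_COLS, B, B_COLS))  # [(1, 'A', 2, 'aa'), (3, 'C', 6, 'bb')]
```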
76. Relational Model
• Union
• The UNION operation of two tables is formed by appending rows from one table to those of a second to produce a third. Duplicate rows are eliminated.
• Tables in a UNION operation must have the same number of columns, and corresponding columns must come from the same domain.
Table A          Table B
k  x  y          k  x  y
1  A  2          1  A  2
2  B  4          4  D  8
3  C  6          5  E  10
A Union B
k  x  y
1  A  2
2  B  4
3  C  6
4  D  8
5  E  10
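The following sketch implements UNION in plain Python using the tables above. The compatibility check only compares the number of columns; matching domains is assumed rather than checked. SQL's `UNION` operator (without `ALL`) likewise eliminates duplicate rows.

```python
A = [(1, "A", 2), (2, "B", 4), (3, "C", 6)]
B = [(1, "A", 2), (4, "D", 8), (5, "E", 10)]


def union(a, b):
    """Append b's rows to a's rows, then eliminate duplicate rows (order kept)."""
    # Only the number of columns is checked here; domains are assumed to match.
    if a and b and len(a[0]) != len(b[0]):
        raise ValueError("tables are not union compatible")
    return list(dict.fromkeys(a + b))


print(union(A, B))
# [(1, 'A', 2), (2, 'B', 4), (3, 'C', 6), (4, 'D', 8), (5, 'E', 10)]
```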
78. Relational Model
• Intersection
• The intersection of two relational tables is a third table that contains the rows common to both. Both tables must be union compatible. The notation for the intersection of A and B is A [intersection] B = C or A INTERSECT B.
Table A          Table B
k  x  y          k  x  y
1  A  2          1  A  2
2  B  4          4  D  8
3  C  6          5  E  10
A Intersect B
k  x  y
1  A  2
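The next sketch runs the same example through SQL's `INTERSECT` operator, via Python's sqlite3 module. The table names `a` and `b` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("a", "b"):  # two union-compatible tables with identical columns
    conn.execute(f"CREATE TABLE {name} (k INTEGER, x TEXT, y INTEGER)")
conn.executemany("INSERT INTO a VALUES (?, ?, ?)", [(1, "A", 2), (2, "B", 4), (3, "C", 6)])
conn.executemany("INSERT INTO b VALUES (?, ?, ?)", [(1, "A", 2), (4, "D", 8), (5, "E", 10)])

# Rows that occur in both tables.
common = conn.execute("SELECT k, x, y FROM a INTERSECT SELECT k, x, y FROM b").fetchall()
print(common)  # [(1, 'A', 2)]
```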
79. Relational Model
• Difference
• The difference of two relational tables is a third table that contains those rows that occur in the first table but not in the second. The difference operation requires that the tables be union compatible.
The notation for difference is A MINUS B or A - B. As with arithmetic, the order of subtraction matters: A - B is not the same as B - A.
Table A          Table B
k  x  y          k  x  y
1  A  2          1  A  2
2  B  4          4  D  8
3  C  6          5  E  10
A MINUS B        B MINUS A
k  x  y          k  x  y
2  B  4          4  D  8
3  C  6          5  E  10
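The following sketch computes the difference with sqlite3, using the same hypothetical tables `a` and `b`. SQLite, like standard SQL, spells this operator `EXCEPT`; some products, such as Oracle, use `MINUS`. The two queries show that the order of the operands matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("a", "b"):  # two union-compatible tables with identical columns
    conn.execute(f"CREATE TABLE {name} (k INTEGER, x TEXT, y INTEGER)")
conn.executemany("INSERT INTO a VALUES (?, ?, ?)", [(1, "A", 2), (2, "B", 4), (3, "C", 6)])
conn.executemany("INSERT INTO b VALUES (?, ?, ?)", [(1, "A", 2), (4, "D", 8), (5, "E", 10)])

a_minus_b = conn.execute("SELECT k, x, y FROM a EXCEPT SELECT k, x, y FROM b ORDER BY k").fetchall()
b_minus_a = conn.execute("SELECT k, x, y FROM b EXCEPT SELECT k, x, y FROM a ORDER BY k").fetchall()
print(a_minus_b)  # [(2, 'B', 4), (3, 'C', 6)]
print(b_minus_a)  # [(4, 'D', 8), (5, 'E', 10)]
```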
80. Relational Model
• Division
• The division operator returns the column values in one table for which there are matching column values corresponding to every row in another table.
Table A          Table B
k  x  y          x  y
1  A  2          A  2
1  B  4          B  4
2  A  2
3  B  4
4  B  4
3  A  2
A DIV B
k
1
3
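The next example is a plain-Python sketch of division on the tables above. It groups Table A's (x, y) pairs by k and keeps a k only if its pairs include every row of Table B. In the data, k = 2 and k = 4 each lack one of B's rows, so only 1 and 3 remain.

```python
from collections import defaultdict

A = [(1, "A", 2), (1, "B", 4), (2, "A", 2), (3, "B", 4), (4, "B", 4), (3, "A", 2)]
B = [("A", 2), ("B", 4)]


def divide(a, b):
    """A DIV B: the k values of a that are paired with every (x, y) row of b."""
    pairs_by_k = defaultdict(set)
    for k, x, y in a:
        pairs_by_k[k].add((x, y))
    required = set(b)
    return sorted(k for k, pairs in pairs_by_k.items() if required <= pairs)


print(divide(A, B))  # [1, 3]
```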
81. Normalization
Normalization theory is based on the concept of normal forms. A relational table is said to be in a particular normal form if it satisfies a certain set of constraints.
We shall discuss four normal forms in this Module.
What is Functional Dependency?
The concept of functional dependency is the basis for the first three normal forms.
A column Y of a relational table is said to be functionally dependent upon column X when values of column Y are uniquely identified by values of column X.
Full functional dependence applies to tables with composite keys. Column Y in relational table R is fully functionally dependent on X of R, where X is a composite key, if it is functionally dependent on X and not functionally dependent upon any subset of X.
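The following sketch tests both definitions against sample rows in plain Python, using a few rows of the FIRST table from the example that follows. A sample can only refute a dependency, never prove one. Real functional dependencies come from the meaning of the data, so these helpers (names are illustrative) are a sanity check, not a proof.

```python
from itertools import combinations
from typing import Dict, List, Sequence

Row = Dict[str, object]


def holds(rows: List[Row], x: Sequence[str], y: str) -> bool:
    """True if no two rows agree on the columns x but differ on column y."""
    seen: Dict[tuple, object] = {}
    for row in rows:
        key = tuple(row[c] for c in x)
        if seen.setdefault(key, row[y]) != row[y]:
            return False
    return True


def fully_dependent(rows: List[Row], x: Sequence[str], y: str) -> bool:
    """y depends on all of x and on no proper, non-empty subset of x."""
    return holds(rows, x, y) and not any(
        holds(rows, list(sub), y)
        for n in range(1, len(x))
        for sub in combinations(x, n)
    )


rows = [
    {"s#": "s1", "status": 20, "city": "London", "p#": "p1", "qty": 300},
    {"s#": "s1", "status": 20, "city": "London", "p#": "p2", "qty": 100},
    {"s#": "s2", "status": 10, "city": "Paris", "p#": "p1", "qty": 250},
]
print(holds(rows, ["s#"], "city"))                   # True
print(fully_dependent(rows, ["s#", "p#"], "qty"))    # True
print(fully_dependent(rows, ["s#", "p#"], "city"))   # False: s# alone determines city
```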
83. Normalization
An Example: A company obtains parts from a number of suppliers. Each supplier is located in one city. A city can have more than one supplier located there, and each city has a status code associated with it. Each supplier may provide many parts.
The company creates a simple relational table to store this information:
FIRST (s#, status, city, p#, qty)
s#       Supplier identification number
status   Status code assigned to city
city     City where supplier is located
p#       Part number of part supplied
qty      Qty of parts supplied to date
Composite primary key is (s#, p#)
84. Normalization
• FIRST NORMAL FORM – 1NF
A relational table is said to be in first normal form if all values of the columns are atomic. That is, they contain no repeating values.
s#   city     status   p#   qty
s1   London   20       p1   300
s1   London   20       p2   100
s1   London   20       p3   200
s1   London   20       p4   100
s2   Paris    10       p1   250
s2   Paris    10       p3   100
s3   Tokyo    30       p2   300
s3   Tokyo    30       p4   200
85. Normalization
• SECOND NORMAL FORM – 2NF
• Table FIRST contains redundant data. Redundancy causes update anomalies.
• Update anomalies: problems that arise when information is inserted, deleted, or updated.
  • INSERT. The fact that a certain supplier (s5) is located in a particular city (Athens) cannot be added until the supplier has supplied a part.
  • DELETE. If a row is deleted, then not only is the information about quantity and part lost, but also the information about the supplier.
  • UPDATE. If supplier s1 moved from London to New York, then every row for s1 would have to be updated with this new information.
86. Normalization
A relational table is in second normal form (2NF) if it is in 1NF and every non-key column is fully dependent upon the primary key. That is, every non-key column must be dependent upon the entire primary key.
FIRST is in 1NF but not in 2NF because status and city are functionally dependent only upon the column s# of the composite key (s#, p#).
The steps for transforming a 1NF table to 2NF are:
1. Identify any determinants other than the composite key, and the columns they determine.
2. Create and name a new table for each determinant and the unique columns it determines.
3. Move the determined columns from the original table to the new table. The determinant becomes the primary key of the new table.
4. Delete the columns you just moved from the original table, except for the determinant, which will serve as a foreign key.
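The next example is a plain-Python sketch of these steps applied to the FIRST data. The determinant s# and the columns it determines (city, status) move to SECOND, while s#, p# and qty stay in PARTS. Joining the two tables back on s# should reproduce FIRST exactly. The lossless-join check is an addition for illustration and is not one of the listed steps.

```python
# FIRST as (s#, city, status, p#, qty), the 1NF table from the example.
FIRST = [
    ("s1", "London", 20, "p1", 300), ("s1", "London", 20, "p2", 100),
    ("s1", "London", 20, "p3", 200), ("s1", "London", 20, "p4", 100),
    ("s2", "Paris", 10, "p1", 250), ("s2", "Paris", 10, "p3", 100),
    ("s3", "Tokyo", 30, "p2", 300), ("s3", "Tokyo", 30, "p4", 200),
]

# Steps 1-3: the determinant s# and the columns it determines form a new table.
SECOND = sorted({(s, city, status) for s, city, status, _, _ in FIRST})
# Step 4: the original table keeps s# (now a foreign key), p# and qty.
PARTS = sorted({(s, p, qty) for s, _, _, p, qty in FIRST})

# Joining the two tables back on s# must reproduce FIRST (a lossless decomposition).
by_supplier = {s: (city, status) for s, city, status in SECOND}
rejoined = sorted((s, *by_supplier[s], p, qty) for s, p, qty in PARTS)

print(len(SECOND), len(PARTS))    # 3 8
print(rejoined == sorted(FIRST))  # True
```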
87. Normalization
• SECOND NORMAL FORM – 2NF
• Modification Anomalies
• Tables in 2NF but not in 3NF still contain modification
anomalies:
• INSERT. The fact that a particular city has a certain status
(Rome has a status of 50) cannot be inserted until there is a
supplier in the city.
• DELETE. Deleting any row in SUPPLIER destroys the
status information about the city as well as the association
between supplier and city.
88. Normalization
SECOND NORMAL FORM – 2NF
PARTS
s#   p#   qty
s1   p1   300
s1   p2   100
s1   p3   200
s1   p4   100
s2   p1   250
s2   p3   100
s3   p2   300
s3   p4   200
SECOND
s#   city     status
s1   London   20
s2   Paris    10
s3   Tokyo    30
89. Normalization
• THIRD NORMAL FORM – 3NF
A relational table is in third normal form (3NF) if it is already in 2NF and every non-key column is non-transitively dependent upon its primary key. In other words, all non-key attributes are functionally dependent only upon the primary key.
SUPPLIER
s#   city     status
s1   London   20
s2   Paris    10
s3   Tokyo    30
s4   Paris    10
The table SUPPLIER is in 2NF but not in 3NF because it contains a transitive dependency:
SUPPLIER.s# -> SUPPLIER.city
SUPPLIER.city -> SUPPLIER.status
SUPPLIER.s# -> SUPPLIER.status
90. Normalization
• The steps for transforming a table into 3NF are:
1. Identify any determinants, other than the primary key, and the columns they determine.
2. Create and name a new table for each determinant and the unique columns it determines.
3. Move the determined columns from the original table to the new table. The determinant becomes the primary key of the new table.
The transformation of SUPPLIER into 3NF:
SUPPLIER
s#   city
s1   London
s2   Paris
s3   Tokyo
s4   Paris
s5   London
CITY_STATUS
city     status
London   20
Paris    10
Tokyo    30
Rome     50
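The following sketch builds the 3NF tables above in sqlite3. The column is named `s_no` because `#` is not valid in an unquoted SQL identifier, and the new Paris status of 15 is a hypothetical value. A city's status now lives in exactly one row: changing it touches one row, yet every supplier in that city still sees the new status through a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE city_status (city TEXT PRIMARY KEY, status INTEGER NOT NULL);
CREATE TABLE supplier (
    s_no TEXT PRIMARY KEY,                            -- s# in the text
    city TEXT NOT NULL REFERENCES city_status(city)   -- s# -> city
);
INSERT INTO city_status VALUES ('London', 20), ('Paris', 10), ('Tokyo', 30), ('Rome', 50);
INSERT INTO supplier VALUES ('s1', 'London'), ('s2', 'Paris'), ('s3', 'Tokyo'),
                            ('s4', 'Paris'), ('s5', 'London');
""")

# One row changes, however many suppliers are located in Paris.
changed = conn.execute("UPDATE city_status SET status = 15 WHERE city = 'Paris'").rowcount

statuses = conn.execute("""
    SELECT s.s_no, c.status FROM supplier s JOIN city_status c ON s.city = c.city
    WHERE s.city = 'Paris' ORDER BY s.s_no""").fetchall()
print(changed)   # 1
print(statuses)  # [('s2', 15), ('s4', 15)]
```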
91. Normalization
Advantages of third normal form:
• Eliminates redundant data, which in turn saves space and reduces manipulation anomalies.
Example:
INSERT: Facts about the status of a city (Rome has a status of 50) can be added even though there is no supplier in that city.
DELETE: Information about a supplier can be deleted without destroying information about a city.
UPDATE: Changing the location of a supplier or the status of a city requires modifying only one row.
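The three advantages can be exercised against the decomposed tables with Python's sqlite3 module. This is a sketch under stated assumptions: s_no stands in for s#, the REFERENCES clause linking SUPPLIER.city to CITY_STATUS is an added assumption, and the Paris status change to 15 is hypothetical.

```python
import sqlite3

def run_3nf_demo():
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    con.executescript("""
        CREATE TABLE city_status (city TEXT PRIMARY KEY, status INTEGER NOT NULL);
        CREATE TABLE supplier (
            s_no TEXT PRIMARY KEY,                        -- stands in for s#
            city TEXT NOT NULL REFERENCES city_status(city)
        );
        INSERT INTO city_status VALUES ('London', 20), ('Paris', 10), ('Tokyo', 30);
        INSERT INTO supplier VALUES ('s1', 'London'), ('s2', 'Paris'),
                                    ('s3', 'Tokyo'), ('s4', 'Paris'), ('s5', 'London');
    """)
    # INSERT: Rome's status can be stored with no supplier in Rome.
    con.execute("INSERT INTO city_status VALUES ('Rome', 50)")
    # DELETE: removing the only Tokyo supplier keeps Tokyo's status.
    con.execute("DELETE FROM supplier WHERE s_no = 's3'")
    tokyo = con.execute("SELECT status FROM city_status WHERE city = 'Tokyo'").fetchone()
    # UPDATE: a (hypothetical) new status for Paris changes exactly one row...
    changed = con.execute("UPDATE city_status SET status = 15 WHERE city = 'Paris'").rowcount
    # ...yet both Paris suppliers see it through the join.
    paris = con.execute(
        "SELECT s.s_no, c.status FROM supplier s JOIN city_status c ON s.city = c.city "
        "WHERE s.city = 'Paris' ORDER BY s.s_no").fetchall()
    con.close()
    return tokyo, changed, paris

print(run_3nf_demo())  # ((30,), 1, [('s2', 15), ('s4', 15)])
```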
92. Normalization
• Advanced Forms: BOYCE-CODD NORMAL FORM
Many practitioners argue that placing entities in 3NF is generally
sufficient because it is rare that entities that are in 3NF are not
also in 4NF and 5NF. The advanced forms of normalization are:
Boyce-Codd Normal Form
Fourth Normal Form
Fifth Normal Form
Boyce-Codd normal form (BCNF) is a more rigorous version of
3NF.
BCNF is based on the concept of determinants. A determinant
column is one on which some of the columns are fully
functionally dependent.
A relational table is in BCNF if and only if every determinant is a
candidate key.
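The BCNF test can be sketched with attribute closure: a determinant X qualifies when everything it determines, directly or transitively, covers the whole table, i.e. X is a superkey (for a minimal determinant this is the same as being a candidate key). The helper names closure and bcnf_violations are hypothetical, and the dependencies are the SUPPLIER ones from the 3NF slides.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """Non-trivial dependencies whose determinant is not a superkey."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(all_attrs)
    ]

supplier_fds = [(("s#",), ("city",)), (("city",), ("status",))]
print(closure(("s#",), supplier_fds))  # {'s#', 'city', 'status'} (set order may vary)
print(bcnf_violations({"s#", "city", "status"}, supplier_fds))
# [(('city',), ('status',))]: city is a determinant but not a candidate key
```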
93. Database Design
This section presents and discusses:
• How to translate the E-R (conceptual) model (diagram) to an RDBMS (logical) schema
• Exercise on E-R Modeling and Database Design
• Some guidelines:
  • Entities: Create one table for each simple (not a sub-type or super-type) entity.
  • Attributes: Map each attribute to a candidate column with a more precise format.
    • Optional attributes become nullable columns.
    • Mandatory attributes become NOT NULL columns.
  • Unique Identifier: Convert the components of the unique identifier to the primary key of the table.
94. Database Design
• Sub-types: A sub-type entity is simply an entity with its own attributes
or relationships, but it also inherits any attributes and/or relationships
from its parent entity (super-type).
• 1:1 relationships: Merge the two entities into a single table, keeping
all attributes. Identify (add if needed) the primary key.
• 1:Many relationships: Create two tables, one for each entity. Add the
primary key of the 1 side to the attributes of the Many (N) side; the
posted attributes are a foreign key.
• M:N (Many:Many) relationships: Create a new (bridge) table and
post the primary keys from both entities as attributes in the new table.
The posted attributes are foreign keys.
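The 1:Many and M:N rules can be seen in SQL table definitions, run here through Python's sqlite3 module. This is a sketch: MODULE and DEPARTMENT follow the tables used in the SQL slides later in this document, while STUDENT, ENROLLMENT and the row "XYZ 101" are hypothetical additions that illustrate the bridge table and a rejected foreign key.

```python
import sqlite3

DDL = """
-- 1:Many: one department offers many modules, so the department's
-- primary key is posted to MODULE as a foreign key.
CREATE TABLE department (
    dept_code TEXT PRIMARY KEY,
    dept_name TEXT NOT NULL,              -- mandatory attribute
    office_no INTEGER                     -- optional attribute: nullable
);
CREATE TABLE module (
    course_no    TEXT PRIMARY KEY,
    course_title TEXT NOT NULL,
    c_hrs        INTEGER NOT NULL,
    dept_code    TEXT NOT NULL REFERENCES department (dept_code)
);
-- M:N (hypothetical): a student takes many modules and a module has
-- many students, so a bridge table holds both primary keys.
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student (student_id),
    course_no  TEXT    NOT NULL REFERENCES module (course_no),
    PRIMARY KEY (student_id, course_no)
);
"""

def build_schema():
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
    con.executescript(DDL)
    return con

def insert_module(con, row):
    """Return True if the module row is accepted, False if a constraint rejects it."""
    try:
        con.execute("INSERT INTO module VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

con = build_schema()
con.execute("INSERT INTO department VALUES ('CIS', 'Comp. Info. Sys.', 302)")
print(insert_module(con, ("CIS 120", "Intro to CIS", 4, "CIS")))  # True
print(insert_module(con, ("XYZ 101", "Orphan", 3, "XYZ")))        # False: no department XYZ
```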
95. Database Design
A few comments…
There are further rules that treat exceptions, but these
are good enough in most cases.
There may be reasons to violate the rules.
Always use common sense and expect iterative
development.
Use CASE tools such as Erwin wherever possible. Such tools
can automatically generate SQL table definitions
from drawn E-R diagrams.
97. Creating a DB environment: Summary
The first step in designing a database application is to
understand what information the database needs to store and
what integrity constraints or business rules apply to the data.
A data model is to a database what a building plan or
blueprint is to a building: it is the conceptual model of the
database.
Given a relational schema, we need to decide whether it is a
good design or whether we need to decompose it into smaller
relations. Normalization provides guidance for such
decomposition.
98. 4.0 Structured Query Language
Learning Objectives:
At the end of this Topic you will be able to –
• Write simple SQL queries
• Get familiar with the various relational operations
such as SELECT, PROJECT and JOIN
99. An Introduction
• Structured Query Language (SQL) is the most widely
used commercial relational database language. SQL
has several parts:
• DML – The Data Manipulation Language
• DDL – The Data Definition Language
• Embedded and dynamic SQL
• Security
• Transaction management
• Client-server execution and remote database access
Basic form of a query:
SELECT column-list FROM table-names WHERE condition(s)
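The difference between the DDL, the DML and transaction management shows up in just a few statements, run here through Python's built-in sqlite3 module. This is a minimal sketch: the department table is a cut-down, hypothetical version of the one used in the join slides, and commit() and rollback() are methods of Python's sqlite3 connection object that end the current transaction.

```python
import sqlite3

def ddl_dml_demo():
    con = sqlite3.connect(":memory:")
    # DDL: define the structure of a table.
    con.execute("CREATE TABLE department (dept_code TEXT PRIMARY KEY, dept_name TEXT)")
    # DML: add a row, then make the change permanent.
    con.execute("INSERT INTO department VALUES ('MKT', 'Marketing')")
    con.commit()
    # Transaction management: uncommitted work can be undone.
    con.execute("INSERT INTO department VALUES ('ECO', 'Economics')")
    con.rollback()
    # DML query in the basic SELECT ... FROM ... WHERE form.
    rows = con.execute(
        "SELECT dept_code, dept_name FROM department WHERE dept_code = 'MKT'").fetchall()
    total = con.execute("SELECT COUNT(*) FROM department").fetchone()[0]
    con.close()
    return rows, total

print(ddl_dml_demo())  # ([('MKT', 'Marketing')], 1)
```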
100. Query Processing
• Query in a high-level language (typically a 4GL)
• Parsing: The parser converts a query, submitted by a database user and
written in a high-level language, into an expression of algebraic operators.
• Optimization: This is the key component of query processing design. It receives the
expression and builds a good execution plan. The plan determines the order of
execution of the operators and selects suitable algorithms for implementing
the operators.
• Code generation for the query: Code implementing the plan is built with the aim of
retrieving the result of the query with high performance.
• Code execution by the database processor: The query plan is executed by the
execution engine, which delivers the result to the user.
• Result of the query
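SQLite exposes the optimizer's chosen plan through EXPLAIN QUERY PLAN, which makes the "selects suitable algorithms" step visible. In this sketch (the index name module_dept_idx is hypothetical), the same query is planned first as a full table scan and then, once an index exists, as an index search.

```python
import sqlite3

def plan(con, sql):
    """Return the 'detail' text of each row of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE module (course_no TEXT, course_title TEXT, "
            "c_hrs INTEGER, dept_c TEXT)")
query = "SELECT course_title FROM module WHERE dept_c = 'CIS'"

before = plan(con, query)  # no index yet: the optimizer can only scan the table
con.execute("CREATE INDEX module_dept_idx ON module (dept_c)")
after = plan(con, query)   # same query, but the plan can now search the index
print(before)
print(after)
```

With a recent SQLite the two plans read roughly "SCAN module" and "SEARCH module USING INDEX module_dept_idx (dept_c=?)"; the exact wording differs between versions.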
102. Query Processing
• The SQL SELECT statement performs three types of operations:
1. Projection (the SELECT column-list)
2. Join (the FROM table-names, with a join condition)
3. Selection (the WHERE condition(s))
SELECT column-list FROM table-names
WHERE condition(s)
103. Performing Projection
SELECT Course_Title, C_Hrs FROM Module
Module
Course_No  Course_Title     C_Hrs  Dept_C
CIS 120    Intro to CIS     4      CIS
MKT 333    Intro to Mkting  3      MKT
ECO 473    Labor Econ.      3      ECO
BA 201     Intro to Stat.   5      ECO
CIS 345    Intro to Dbase   4      CIS
Result Table
Course_Title     C_Hrs
Intro to CIS     4
Intro to Mkting  3
Labor Econ.      3
Intro to Stat.   5
Intro to Dbase   4
104. Performing a Selection Operation
SELECT * FROM Module WHERE C_Hrs = 4
Module
Course_No  Course_Title     C_Hrs  Dept_C
CIS 120    Intro to CIS     4      CIS
MKT 333    Intro to Mkting  3      MKT
ECO 473    Labor Econ.      3      ECO
BA 201     Intro to Stat.   5      ECO
CIS 345    Intro to Dbase   4      CIS
Result Table
Course_No  Course_Title     C_Hrs  Dept_C
CIS 120    Intro to CIS     4      CIS
CIS 345    Intro to Dbase   4      CIS
105. Performing both Projection and Selection
SELECT Course_Title, C_Hrs FROM Module WHERE Dept_C = 'CIS'
Module
Course_No  Course_Title     C_Hrs  Dept_C
CIS 120    Intro to CIS     4      CIS
MKT 333    Intro to Mkting  3      MKT
ECO 473    Labor Econ.      3      ECO
BA 201     Intro to Stat.   5      ECO
CIS 345    Intro to Dbase   4      CIS
Result Table
Course_Title     C_Hrs
Intro to CIS     4
Intro to Dbase   4
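The three queries on slides 103 to 105 can be run against the Module table with Python's sqlite3 module. This is a sketch; ORDER BY Course_No is added because SQL does not otherwise guarantee the order of result rows.

```python
import sqlite3

MODULE_ROWS = [
    ("CIS 120", "Intro to CIS", 4, "CIS"),
    ("MKT 333", "Intro to Mkting", 3, "MKT"),
    ("ECO 473", "Labor Econ.", 3, "ECO"),
    ("BA 201", "Intro to Stat.", 5, "ECO"),
    ("CIS 345", "Intro to Dbase", 4, "CIS"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Module (Course_No TEXT PRIMARY KEY, Course_Title TEXT, "
            "C_Hrs INTEGER, Dept_C TEXT)")
con.executemany("INSERT INTO Module VALUES (?, ?, ?, ?)", MODULE_ROWS)

# Projection: choose columns, keep every row.
projection = con.execute("SELECT Course_Title, C_Hrs FROM Module").fetchall()
# Selection: keep every column, choose rows.
selection = con.execute("SELECT * FROM Module WHERE C_Hrs = 4 ORDER BY Course_No").fetchall()
# Both: WHERE picks the rows, the SELECT list picks the columns.
both = con.execute("SELECT Course_Title, C_Hrs FROM Module WHERE Dept_C = 'CIS' "
                   "ORDER BY Course_No").fetchall()

print(len(projection))  # 5
print(both)             # [('Intro to CIS', 4), ('Intro to Dbase', 4)]
```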
106. Performing both Projection and Selection
• Basic SELECT statement: WHERE clause operators
• =, <, >, <=, >=
• IN (list)
  WHERE CODE IN ('ABC', 'DEF', 'HIJ') would return only rows with
  one of those 3 literal values for the CODE attribute.
• BETWEEN min_val AND max_val
  WHERE Qty_Ord BETWEEN 5 AND 15 would return rows where
  Qty_Ord is >= 5 and <= 15. BETWEEN also works on character data, using
  ascending alphabetical order.
• LIKE 'literal with wildcards': % matches any sequence of characters
  (including none), _ matches exactly one character.
  WHERE Name LIKE '_o%son' returns rows where Name has o as the
  2nd character and ends with son, e.g. Torgeson or Johnson.
• NOT
  WHERE NOT Name = 'Johnson' would return all rows where Name <>
  'Johnson'. NOT binds more loosely than the comparison operators, so this is
  evaluated as NOT (Name = 'Johnson').
• AND and OR: AND is evaluated before OR; use parentheses to control the order.
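Each operator above can be tried on a small table with Python's sqlite3 module. The orders table and its four rows are hypothetical, chosen so that each WHERE clause from the slide returns something; the helper names() pastes trusted literal text into the query and is for demonstration only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical order rows chosen to exercise each operator.
con.execute("CREATE TABLE orders (name TEXT, code TEXT, qty_ord INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("Johnson", "ABC", 5),
    ("Torgeson", "XYZ", 20),
    ("Smith", "DEF", 15),
    ("Jones", "HIJ", 2),
])

def names(where):
    """Run a fixed WHERE clause (trusted literal text only) and return sorted names."""
    sql = "SELECT name FROM orders WHERE " + where + " ORDER BY name"
    return [row[0] for row in con.execute(sql)]

print(names("code IN ('ABC', 'DEF', 'HIJ')"))   # ['Johnson', 'Jones', 'Smith']
print(names("qty_ord BETWEEN 5 AND 15"))        # ['Johnson', 'Smith']
print(names("name LIKE '_o%son'"))              # ['Johnson', 'Torgeson']
print(names("NOT name = 'Johnson'"))            # ['Jones', 'Smith', 'Torgeson']
# AND is applied before OR unless parentheses say otherwise:
print(names("code = 'ABC' OR code = 'DEF' AND qty_ord >= 15"))    # ['Johnson', 'Smith']
print(names("(code = 'ABC' OR code = 'DEF') AND qty_ord >= 15"))  # ['Smith']
```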
107. Joining Tables
• To join tables appropriately, the tables must be related, and we apply a
WHERE clause that equates the primary key column of the table on the one
side of the relationship with the corresponding foreign key column of the table
on the many side.
This type of join is called an equi-join.
Our example joins Module and Department, where Dept_Code is the
linking “key” column.
• The next series of slides takes you step by step through the process of
combining data rows from one table with data rows in another table.
• The first slide introduces the SQL SELECT statement, then shows the join
operation and a picture of the two tables that the join will operate on.
108. Joining Tables
Joining Two Tables - Select and Tables
SELECT * FROM Module C, department D WHERE D.Dept_Code = C.Dept_Code
Module
Course_No  Course_Title     C_Hrs  Dept_Code
CIS 120    Intro to CIS     4      CIS
MKT 333    Intro to Mkting  3      MKT
ECO 473    Labor Econ.      3      ECO
BA 201     Intro to Stat.   5      ECO
CIS 345    Intro to Dbase   4      CIS
Department
Dept_Code  Dept_Name         Office#
MKT        Marketing         244
CIS        Comp. Info. Sys.  302
ECO        Economics         244
Conceptually, SQL compares each row of the 1st table with every row of the
2nd table: the first Module row with each Department row in turn, then the
second Module row, and so on. Only rows where the condition is met are
placed in the result table.
109. Joining Tables
Joining Two Tables - Row 1 Module to Row 1 Dept
SELECT * FROM Module C, department D WHERE D.Dept_Code = C.Dept_Code
Module row 1 (CIS 120, Intro to CIS, 4, CIS) is compared with
Department row 1 (MKT, Marketing, 244).
No match, so the row is not placed in the results.
RESULT TABLE
Course_No  Course_Title  C_Hrs  Dept_Code  Dept_Name  Office#
(no rows yet)
110. Joining Tables
Joining Two Tables - Row 1 Module to Row 2 Dept
SELECT * FROM Module C, department D WHERE D.Dept_Code = C.Dept_Code
Module row 1 (CIS 120, Intro to CIS, 4, CIS) is compared with
Department row 2 (CIS, Comp. Info. Sys., 302).
A match on the condition causes a result row to be produced.
RESULT TABLE
Course_No  Course_Title  C_Hrs  Dept_Code  Dept_Name         Office#
CIS 120    Intro to CIS  4      CIS        Comp. Info. Sys.  302
111. Joining Tables
Joining Two Tables - Row 1 Module to Row 3 Dept
SELECT * FROM Module C, department D WHERE D.Dept_Code = C.Dept_Code
Module row 1 (CIS 120, Intro to CIS, 4, CIS) is compared with
Department row 3 (ECO, Economics, 244).
No match, so the result table is unchanged.
RESULT TABLE
Course_No  Course_Title  C_Hrs  Dept_Code  Dept_Name         Office#
CIS 120    Intro to CIS  4      CIS        Comp. Info. Sys.  302
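The row-by-row comparison walked through in slides 108 to 111 is a nested-loop join. The sketch below (hypothetical helpers nested_loop_equijoin and sql_equijoin; Office# is renamed Office_No because # is not valid in an unquoted SQL identifier) runs the loop by hand and checks it against SQLite's answer for the same SELECT. A real optimizer may pick another join algorithm, such as a hash or index-based join, but the result must be the same.

```python
import sqlite3

MODULE = [
    ("CIS 120", "Intro to CIS", 4, "CIS"),
    ("MKT 333", "Intro to Mkting", 3, "MKT"),
    ("ECO 473", "Labor Econ.", 3, "ECO"),
    ("BA 201", "Intro to Stat.", 5, "ECO"),
    ("CIS 345", "Intro to Dbase", 4, "CIS"),
]
DEPARTMENT = [
    ("MKT", "Marketing", 244),
    ("CIS", "Comp. Info. Sys.", 302),
    ("ECO", "Economics", 244),
]

def nested_loop_equijoin(modules, departments):
    """Compare each Module row with every Department row, as in the slides."""
    result = []
    for m in modules:
        for d in departments:
            if m[3] == d[0]:          # C.Dept_Code = D.Dept_Code
                result.append(m + d)  # SELECT * keeps all columns of both rows
    return result

def sql_equijoin():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Module (Course_No TEXT, Course_Title TEXT, "
                "C_Hrs INTEGER, Dept_Code TEXT)")
    con.execute("CREATE TABLE department (Dept_Code TEXT, Dept_Name TEXT, Office_No INTEGER)")
    con.executemany("INSERT INTO Module VALUES (?, ?, ?, ?)", MODULE)
    con.executemany("INSERT INTO department VALUES (?, ?, ?)", DEPARTMENT)
    return con.execute("SELECT * FROM Module C, department D "
                       "WHERE D.Dept_Code = C.Dept_Code").fetchall()

loop_rows = nested_loop_equijoin(MODULE, DEPARTMENT)
print(loop_rows[0])                                 # the result row from slide 110
print(len(loop_rows))                               # 5: every module has a department
print(sorted(loop_rows) == sorted(sql_equijoin()))  # True
```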
112. 5.0 Internal Management
Learning Objective
After completing this topic you will be able to :
Describe the various components of the computer
system that provide data storage facilities to a
DBMS
Understand how the DBMS communicates with the
host system
Outline some of the database tuning factors
113. Computer file management and DBMS
Computer files are stored on external media such as disks and
tapes, which support:
• Direct access
• Sequential access
Input/output of data and memory management are handled by the
operating system through:
• File manager
• Disk manager
[Figure: DBMS/Host inter-communication. DBMS -> file request -> File Manager -> logical page request -> Disk Manager -> physical page access]
114. Intercommunication
DBMS/Host communication:
• A file is a collection of pages. A page is the unit of input/output.
• The DBMS sends a file request to the file manager.
• The file manager has no idea where the requested page is
physically stored.
• The file manager therefore communicates with the disk
manager, which performs the physical page access.
• The file manager provides the database system with the
requested page.
• The database system converts the page into a logical form
that the user can understand.
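The division of labour between the DBMS, the file manager and the disk manager can be sketched as three layers. All class and method names here are hypothetical and the "disk" is an in-memory dictionary; the point is only that each layer has a different view of the same page: the DBMS sees records, the file manager sees numbered pages of a file, and only the disk manager knows physical addresses.

```python
class DiskManager:
    """Only this layer knows where a page physically lives."""
    def __init__(self):
        self._blocks = {}      # physical address -> raw page bytes
        self._page_map = {}    # (file name, page number) -> physical address
        self._next_addr = 0

    def write_page(self, file_name, page_no, data):
        self._blocks[self._next_addr] = data
        self._page_map[(file_name, page_no)] = self._next_addr
        self._next_addr += 1

    def read_page(self, file_name, page_no):
        # Physical page access: map the logical page to a physical address.
        return self._blocks[self._page_map[(file_name, page_no)]]


class FileManager:
    """Treats a file as a numbered collection of pages."""
    def __init__(self, disk):
        self._disk = disk

    def get_page(self, file_name, page_no):
        # Logical page request, passed on to the disk manager.
        return self._disk.read_page(file_name, page_no)


class DBMS:
    """Issues file requests and turns raw pages into logical records."""
    def __init__(self, files):
        self._files = files

    def fetch_records(self, file_name, page_no):
        page = self._files.get_page(file_name, page_no)
        return page.decode("utf-8").split(";")  # one record per ';'-separated entry


disk = DiskManager()
disk.write_page("supplier", 0, b"s1,London;s2,Paris")
dbms = DBMS(FileManager(disk))
print(dbms.fetch_records("supplier", 0))  # ['s1,London', 's2,Paris']
```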
115. Tuning at the internal level
Indexes
• Database indexes are an important means of speeding up access to
a set of records, especially in a relational database.
• An index is very useful in existence tests.
• Once an index is created, it is transparent to the user.
Hashing
• Hashing directly determines the page address for a given record,
without the overhead of creating indexes.
• The main problems associated with hashing are overflow and
underflow.
Clusters
• Intra-file clustering: physically storing related records of one file
together, as subsets within the file.
• Inter-file clustering: storing related records from different files in
the same physical page.
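Static hashing can be sketched in a few lines. The hash function here (key mod number of pages) and the class name HashedFile are assumptions for illustration: the function gives the page address directly, with no index. When a page fills up, further records that hash to it go to an overflow chain, which lengthens lookups; pages that receive few or no records waste space, which is one common reading of underflow.

```python
class HashedFile:
    """Static hashing: page number = key mod number of pages."""
    def __init__(self, num_pages, page_capacity):
        self.num_pages = num_pages
        self.page_capacity = page_capacity
        self.pages = [[] for _ in range(num_pages)]
        self.overflow = [[] for _ in range(num_pages)]  # one overflow chain per page

    def page_for(self, key):
        return key % self.num_pages  # page address computed directly, no index

    def insert(self, key, record):
        p = self.page_for(key)
        if len(self.pages[p]) < self.page_capacity:
            self.pages[p].append((key, record))
        else:
            self.overflow[p].append((key, record))  # home page full: overflow

    def lookup(self, key):
        p = self.page_for(key)
        # Home page first, then its overflow chain (the extra cost of overflow).
        for k, record in self.pages[p] + self.overflow[p]:
            if k == key:
                return record
        return None


f = HashedFile(num_pages=4, page_capacity=2)
for key in (3, 7, 11, 4):          # 3, 7 and 11 all hash to page 3
    f.insert(key, f"record {key}")
print(f.lookup(11))                # record 11 (found via the overflow chain)
print(len(f.overflow[3]))          # 1
print(sum(1 for p in f.pages if not p))  # 2 empty pages: wasted space
```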
116. Internal Management: Summary
Database files are stored as sets of logical pages.
The underlying physical files that store a database need not map directly
to the logical representation used by the DBMS.
Indexes are a useful means of speeding up data access in large
databases, but they incur overheads.
Hash functions speed up individual record access, but they
have overflow and underflow problems.
Intra-file and inter-file clustering of the physical records speed up
certain operations at the cost of other types of data
manipulation.
117. 6.0 Database Trends
Learning Objective
At the end of this Topic you will be:
• Familiar with various terms such as
  • OLAP
  • Data warehousing
  • Data mining
• Aware of the business needs that require data to be analyzed in
multiple dimensions
119. Types of databases
• Major types of databases:
  • Centralized databases
  • Distributed databases
  • Network databases
120. Centralized database
Used by a single central processor or by multiple processors in a
client/server network.
[Figure: a centralized computer system in which the CPU, memory (via a memory controller), disk, printer and tape drive (each via its own controller) share a system bus]
122. Multidimensional data model
On-line analytical processing (OLAP)
• Multidimensional data analysis
• Supports manipulation and analysis of large volumes
of data from multiple dimensions/perspectives
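Looking at data "from multiple dimensions" can be approximated in plain SQL with GROUP BY: the same facts are summed by two dimensions, rolled up to one, or sliced by fixing a dimension. This is a sketch using hypothetical sales figures and Python's sqlite3 module, not an OLAP server; dedicated OLAP tools precompute and navigate such aggregates, but the arithmetic is the same.

```python
import sqlite3

# Hypothetical sales facts: (region, product, year, units)
SALES = [
    ("East", "Nuts", 2023, 50), ("East", "Bolts", 2023, 30),
    ("West", "Nuts", 2023, 20), ("West", "Bolts", 2024, 60),
    ("East", "Nuts", 2024, 40),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, product TEXT, year INTEGER, units INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", SALES)

# Two dimensions at once: units by region and product.
by_region_product = con.execute(
    "SELECT region, product, SUM(units) FROM sales "
    "GROUP BY region, product ORDER BY region, product").fetchall()

# Roll up to a single dimension: units by region.
by_region = con.execute(
    "SELECT region, SUM(units) FROM sales GROUP BY region ORDER BY region").fetchall()

# Slice: fix one dimension (year = 2023) and aggregate the rest.
slice_2023 = con.execute(
    "SELECT product, SUM(units) FROM sales WHERE year = 2023 "
    "GROUP BY product ORDER BY product").fetchall()

print(by_region_product)  # [('East', 'Bolts', 30), ('East', 'Nuts', 90), ('West', 'Bolts', 60), ('West', 'Nuts', 20)]
print(by_region)          # [('East', 120), ('West', 80)]
print(slice_2023)         # [('Bolts', 30), ('Nuts', 70)]
```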
123. Data warehouse
Supports reporting and query tools
Stores current and historical data
Consolidates data for management analysis and
decision making
124. Data warehouse
Data mart
• Subset of a data warehouse
• Contains a summarized or highly focused portion of
data for a specified function or group of users
Data mining
• Tools for analyzing large pools of data
• Find hidden patterns and infer rules to predict trends
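One classic way data mining tools "infer rules" is association-rule mining, which counts how often items occur together. The sketch below uses four hypothetical shopping baskets and computes two standard measures: support (the fraction of baskets containing a pair) and confidence (how often a basket containing the first item also contains the second). It is a toy illustration, not a full algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def pair_support(transactions):
    """Fraction of transactions that contain each pair of items."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items()}

def confidence(transactions, antecedent, consequent):
    """Share of transactions with the antecedent that also have the consequent."""
    has_a = [t for t in transactions if antecedent in t]
    both = [t for t in has_a if consequent in t]
    return len(both) / len(has_a)

support = pair_support(TRANSACTIONS)
print(support[("bread", "milk")])                 # 0.5
print(confidence(TRANSACTIONS, "bread", "milk"))  # about 0.67: rule bread -> milk
```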
125. Databases and the web
Hypermedia database
• Organizes data as a network of nodes
• Links nodes in a pattern specified by the user
• Supports text, graphics, sound, video and executable
programs
126. Databases and the web
Database server
• A computer in a client/server environment that runs a
DBMS to process SQL statements and perform
database management tasks
Application server
• Software that handles all application operations
127. Database Trends: Summary
The database forms the back end of any kind
of application architecture, whether client/server
or a distributed system such as the web.
Users want to see data in as many
dimensions as possible, so it is important
to be aware of concepts such as data
warehousing, data mining and on-line
analytical processing (OLAP).
128. Database Fundamentals: Next Step
Resource Type  Description
Book           Case*Method: Entity Relationship Modeling – Richard Barker
Book           Data & Databases – Joe Celko
Book           An Introduction to Database Systems – C. J. Date
Book           The Data Modeling Handbook – Reingruber and Gregory
Book           Data Modeling for Information Professionals – Bob Schmidt
Book           Data Model Patterns – David C. Hay, Richard Barker