Christian Reina, CISSP, CISA
2010
MET CS669
Version 1.0
Table of Contents
Chapter 1: DB Systems
Chapter 2: Data Models
Chapter 3: Relational DB Model
Chapter 4: Entity Relationship (ER) Modeling
Chapter 5: Normalization of DB tables
Chapter 6: Advanced Data Modeling
Chapter 8: Advanced SQL
Chapter 9: Database Design
Chapter 10: Transaction Management and Concurrency Control
Chapter 11: DB Performance Tuning & Query Optimization
Chapter 12: Distributed database management system
Chapter 13: Business Intelligence and Data Warehouses
Chapter 1: DB Systems
(Section 1.1) Data vs. Info
Data: Raw facts; they constitute the building blocks of information.
Information: The result of processing raw data to reveal its meaning.
Knowledge: The body of info and facts about a specific subject. A key characteristic is that new knowledge
can be derived from old knowledge.
Data Management: A discipline that focuses on the proper generation, storage, and retrieval of data.
(Section 1.2) Introducing the Database and the DBMS
Database: A shared, integrated computer structure that stores a collection of:
 End-user data that is raw facts of interest to the end user.
 Metadata or data about data, through which the end-user data are integrated and managed.
For example, the metadata component stores info such as the name of each data element, the type of
values (numeric, dates, or text) stored in each data element, whether or not the data element can be
left empty, and so on.
 In this sense, a db is a self-describing collection of integrated data.
 A well-designed db facilitates data management and becomes a valuable info generator. A poorly
designed db is likely to lead to errors in processing data and to bad decisions.
The most popular way to classify dbs is by the use and timeliness of the data:
Production DB: Contains up-to-the-minute real world info
Data warehouse: stores data for making decisions
(Section 1.2.1) Role and advantages of the DBMS
Database Management System (DBMS): Collection of programs that manages the database structure and
controls access to the data stored in the database. If the db is an electronic filing cabinet, the DBMS helps
manage the cabinet's contents. Advantages are improved data sharing, data security, and data integration,
minimized data inconsistency, improved data access, improved decision making, and increased end-user
productivity.
Query: A specific request issued to the DBMS for data manipulation to read or update the data. A query is
a question.
Ad hoc query: A spur-of-the-moment question.
Query result set: The data the DBMS sends back to the application in answer to a query.
(Section 1.2.2) Types of databases
DBMS can support many different types of databases:
(Number of users)
Single-user database: supports only one user at a time.
Desktop Database: A single user database that runs on a personal computer.
Multi user database: supports multiple users at the same time.
Workgroup Database: A multi user database that supports a relatively small number of users (usually fewer
than 50) or a specific department within an org.
Enterprise Database: A multi user database used by the entire organization, supporting many users (usually
hundreds) across many departments.
(Db site location)
Centralized Database: It supports data located at a single site.
Distributed Database: It supports data distributed across several different sites.
(Db use)
Operational or Transactional or Production Database: A db that is designed primarily to support a
company's day-to-day operations.
Data warehouse: Focuses primarily on storing data used to generate information required to make tactical
or strategic decisions. Data warehouses derive most of their data from production dbs.
Unstructured data: Data that exist in their original raw state.
Structured data: The result of taking unstructured data and formatting or structuring such data to
facilitate storage, use, and the generation of info.
Semi-structured data: data that have already been processed to some extent or prearranged.
XML: Language used to represent and manipulate data elements in a textual format.
(Section 1.3) Why Database design is important
DB Design: Refers to the activities that focus on the design of the database structure that will be used to
store and manage end user data. A well designed database facilitates data management and generates
accurate and valuable info. A poorly designed db will likely become a breeding ground for redundant data
and data anomalies.
(Section 1.4) Historical roots: files and file systems
 An understanding of the relatively simple characteristics of file systems makes the complexity of
database design easier to understand.
 An awareness of the problems that plagued file systems can help you avoid those same pitfalls
with DBMS software.
 If you intend to convert an obsolete file system to a database system, knowledge of the file
system's basic limitations will be useful.
Data Processing (DP) Specialists: Created the necessary computer file structures, often wrote the software
that managed the data within those structures, and designed the app programs that produced reports based
on the file data. As the number of files grew, organizations had to hire more DP specialists to keep up, and
the original DP specialist typically became the DP manager.
(Section 1.5) Problems with files system and data management
Making changes in an existing structure can be difficult in a file system environment. To change a file
structure, a conversion program must:
1) Read a record from the original file,
2) Transform the original data to conform to the new structure's storage requirements,
3) Write the transformed data into the new file structure, and
4) Repeat steps 1 to 3 for each record in the original file.
 It requires extensive programming for pulling, deleting, and updating records.
 It cannot perform ad hoc queries.
 System administration can be complex and difficult as the number of records or files grows.
 It is difficult to make changes to existing structures.
 Security features are likely to be inadequate.
 Each file typically requires its own set of data management programs.
 Many files suffer from data redundancy, leading to inconsistencies, anomalies, and lack of
data integrity.
 A mature file-based data system might require hundreds of thousands of programs.
These limitations lead to problems of structural and data dependency.
(Section 1.5.1) Structural and data dependence
Structural dependence: A file system exhibits this, which means that access to the file is dependent on its
structure. For example, adding a customer date-of-birth field to the Customer file would require the four
steps described in Section 1.5.
Structural independence: Exists when it is possible to make changes in the file structure without affecting
the app programs ability to access the data.
Data dependence: Exists when data access programs are subject to change whenever any of the file's data
storage characteristics change (that is, when a data type changes). It makes the file system cumbersome.
Data independence: Exists when it is possible to make changes in the data storage characteristics without
affecting the application programs ability to access the data.
Logical data format: How the human views the data.
Physical data format: How the computer must work with the data.
Any program that accesses a file system's file must tell the computer what to do and how to do it.
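By contrast, a DBMS that provides structural and data independence lets the structure change without the
record-by-record conversion described in Section 1.5. A minimal sketch in SQL (the CUSTOMER table and
column names are illustrative, and ALTER TABLE syntax varies slightly by vendor):

    -- Add a date-of-birth column to an existing table.
    -- Programs that do not reference CUS_DOB keep working unchanged.
    ALTER TABLE customer ADD cus_dob DATE;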
(Section 1.5.2) Field definitions and naming conventions
Be descriptive in the field names, but be aware of DBMS character-length restrictions; for example, REN
should be CUS_RENEW_DATE.
(Section 1.5.3) Data Redundancy
Islands of Info: They contain different versions of the same data; that is, the same basic data are stored in
different locations.
Redundant Data: A source of difficult-to-trace info errors. Redundancy exists when the same data about the
same entity are kept in different locations, which can result in the storage of different values for the same
attribute of the same entity.
Data Redundancy: Exists when duplicated data are stored unnecessarily in different places. Redundancies
are the result of a poorly designed db and can lead to poor decision making.
Data Integrity: Condition in which all of the data in the db are consistent with the real-world events and
conditions. In other words, data integrity means that:
 Data are accurate: there are no data inconsistencies.
 Data are verifiable: the data will always yield consistent results.
Data Anomaly: Develops when not all of the required changes in the redundant data are made successfully.
(Section 1.6) Db systems
DBMS provides numerous advantages over file system management by making it possible to eliminate
most of the file system's data inconsistency, data anomaly, data dependency, and structural dependency
problems.
(Section 1.6.1) The Db system environment
Database System: An organization of components that define and regulate the collection, storage,
management, and use of data within a database environment. It's composed of five major parts: hardware,
software, people, procedures, and data.
(Jobs in the db field)
 DB Admin: Focused on individual db and DBMSs & strong technical skills in specific DBMSs.
 Data Admin: Plans for db and technology, sets standards for data (Privacy & risk of loss), works
with computerized and non-computerized dbs.
 DB Modeler/Analyst/Designer/Programmer: Responsible for design & implementation of the db
and the app systems that interface with a DBMS. The Modeler's primary responsibility is gathering
the data requirements and representing them in the data model. The Designer may participate in the
modeling, and translates the model into an operational db, often with the assistance of system and
storage admins. App Analysts gather, document, and coordinate the app and user requirements.
Programmers write the software apps, based on the application and data requirements.
(Section 1.6.2) DBMS functions
 Data Dictionary management: DBMS stores data elements & their relationships (metadata) in a
data dictionary. DBMS uses the data dictionary to look up the required data component structures
and relationships, thus relieving you from having to code such complex relationships in each
program. Any changes made in a db structure are automatically recorded in the data dictionary.
 Data storage management: DBMS provides storage not only for the data but also for related
data entry forms or screen definitions, report definitions, data validation rules, procedural code, and
structures to handle video and picture formats. It's also important for Performance Tuning, which
relates to the activities that make the db perform more efficiently in terms of storage and access
speed. (DBMS creates the complex structures required for data storage)
 Data transformations & presentation: The DBMS formats the physically retrieved data to
make it conform to the user's logical expectations. (DBMS transforms entered data to conform to
the data structures)
 Security management: The DBMS creates a security system that enforces user security and data
privacy. Security rules determine which users can access the db, which data items each user can
access, and which data operations (read, add, delete, or modify) the user can perform. This is
especially important in multi-user environments. (DBMS creates a security system and enforces
security within that system; see the SQL sketch after this list)
 Multi-user access control: To provide data integrity and data consistency, the DBMS uses
sophisticated algorithms to ensure that multiple users can access the db concurrently without
compromising the integrity of the db. (DBMS allows multiple users to have concurrent access to
the data)
 Backup & Recovery: The DBMS provides backup and data recovery to ensure data safety and
integrity. Current DBMS systems provide special utilities that allow the DBA to perform routine
and special backup and restore procedures. (DBMS performs backup and data recovery procedures
to ensure data safety)
 Data integrity management: The DBMS promotes and enforces integrity rules, thus minimizing
data redundancy and maximizing data consistency. The data relationships stored in the data
dictionary are used to enforce data integrity. Ensuring data integrity is especially important in
transaction-oriented db systems. (DBMS promotes and enforces integrity rules to eliminate data
integrity problems)
 Db access languages & application programming interfaces: The DBMS provides data access
through a query language. A Query Language is a nonprocedural language that lets the user
specify what must be done without having to specify how it is to be done. Structured Query
Language (SQL) is the de facto query language and data access standard supported by the
majority of DBMS vendors. (DBMS provides access to the data via utility programs and
programming language interfaces)
 Db comm Interfaces: DBMS accepts end-user requests via multiple, different network
environments. For example, a DBMS might provide access to the database via the Internet through
a browser such as Firefox or Internet Explorer. A DBMS can automatically publish predefined
reports on a website, and can connect to third-party systems to distribute info via email or other
apps. (DBMS provides access to data within a computer network environment)
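As a small illustration of the security management and access language functions above, most SQL DBMSs
control privileges with GRANT and REVOKE statements. A minimal sketch (the account name and table
are illustrative):

    -- Let the account 'clerk' read, but not change, the CUSTOMER table.
    GRANT SELECT ON customer TO clerk;

    -- Take back a privilege that was granted earlier.
    REVOKE INSERT ON customer FROM clerk;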
(Section 1.6.3) Managing the db system: A shift in focus
Db systems' significant disadvantages:
 Increased costs
 Management complexity
 Maintaining currency
 Vendor dependence
 Frequent upgrade/replacement cycles
---------------------------------------------------------------------------------------------------------------------------------
Chapter 2: Data Models
(Section 2.1) Data modeling and data models
Data modeling: The first step in db design refers to the process of creating a specific data model for a
determined problem domain. This is an iterative process.
Problem Domain: Is a clearly defined area within the real world environment.
Data Model: Collection of concepts that can be used to describe the structure of a db. Its main function is
to help us understand the complexities of the real-world environment. It facilitates communication between
users, db designers, and app programmers. There are 3 categories of data models:
 High level or conceptual data models, which are based on entities (objects) and relationships.
 Low level or physical data models, which are specific to a particular DBMS such as Oracle.
 Representational or implementation data models, which are also termed logical data models.
(Section 2.2) The importance of data models
When a good db blueprint is not available, problems are likely to happen. Data models are like blueprints,
and they are an abstraction.
(Section 2.3) Data model basic building blocks
The basic building blocks of all data models are entities, attributes, and relationships.
 Entity: Is anything, such as a person, place, thing, idea, or event, about which data are to be
collected and stored. Entities can be physical, such as customers or products, or abstractions, such
as flight routes or accounts.
 Attributes: are equivalent to fields for an entity such as a CUSTOMER. They can be Fname,
Lname, etc.
 Relationship: describes an association among two or more entities. For example, "An AGENT
can serve many CUSTOMERS, and each CUSTOMER may be served by one AGENT". Data
models use 3 types of relationship:
o One-to-many (1:M): a painter paints many different paintings, but each one is painted by
only one painter.
o Many-to-many (M:N): a student can take many classes and each class can be taken by
many students.
o One-to-one (1:1): each store is managed by a single employee, and each employee
manages only a single store.
Entity-relationship model (ERM): helps identify the db's main entities and their relationships. They are
graphically represented so they are more easily understood by users and designers.
Entity-relationship Diagram (ERD): The two main notations are the Chen model and the Crow's Foot model.
Constraints: Restrictions placed on the data. They help to ensure data integrity.
(Section 2.4) Business rules
Business rule: is a brief, precise, and unambiguous description of a policy, procedure, or principle within a
specific org. They are used to define entities, attributes, relationships, and constraints.
(Section 2.4.1) Discovering business rules
The process of identifying and documenting business rules is essential to db design for several reasons:
 They help standardize the company's view of data.
 They can be a communications tool between users and designers.
 They allow the designer to understand the nature, role, and scope of the data.
 They allow the designer to understand business processes.
 They allow the designer to develop appropriate relationship participation rules and constraints
and to create an accurate data model.
(Section 2.4.2) Translating business rules into data model components
A noun in a business rule translates into an entity in the model, and a verb (active or passive) associating
nouns translates into a relationship among the entities. For example, "a customer may generate many
invoices" contains two nouns (customer & invoices) and a verb "generate" that associates the nouns.
To identify the relationship type, you should ask two questions:
 How many instances of B are related to one instance of A?
 How many instances of A are related to one instance of B?
(Section 2.5) The evolution of data models
Evolution of Major Data Models Table 2.1 page 35
(Section 2.5.1) The hierarchical model
Hierarchical: model developed in 1960s to manage large amounts of data for complex manufacturing
projects such as the Apollo rocket that landed on the moon in 1969. Logical structure is depicted by an
upside-down tree. Disadvantages were: too complex to implement, it was difficult to manage, and it lacked
structural independence. No standards for how to implement the model. Record based model.
Segment: equivalent of a file system‘s record type.
(Section 2.5.2) Network model
Network model: was created to represent complex data relationships more effectively than the hierarchical
model, to improve db performance, and to impose a db standard. Its disadvantages were limited data
independence and lack of ad hoc query capability. Record based model class.
To help establish db standards, the Conference on Data Systems Languages (CODASYL) created the
Database Task Group (DBTG) in the late 1960s. The DBTG report contained specifications for 3 crucial
db components.
 Schema: It includes a definition of the db name, the record type for each record, and the
components that make up those records.
 Subschema: The existence of subschema definitions allows all app programs to simply invoke the
subschema required to access the appropriate db files.
 Data management language (DML): defines the environment in which data can be managed. To
produce the desired standardization for each of the three components, the DBTG specified three
distinct DML components:
o Schema Data definition language (DDL), which enables the db admin to define the schema
components.
o Subschema DDL, which allows the app program to define the db components that will be used
by the app.
o Data manipulation language, to work with the data in the db.
Network model allows a record to have more than one parent unlike the hierarchical model. In network db
terminology a relationship is called a Set and each Set is composed of at least 2 record types.
(Section 2.5.3) The relational model
Relational model: introduced in 1970 by E.F. Codd of IBM. You can think of a relation (or table) as a
matrix composed of intersecting rows and columns. Each row in a relation is called a tuple. It is a
record-based model.
Relational db management system (RDBMS): It performs the same basic functions provided by the
hierarchical and network DBMS systems, plus other functions that make the relational data model easier to
understand and implement. A noted disadvantage was that its structures could not be examined graphically.
 Most important advantage is its ability to hide the complexities of the relational model from
the user. It manages all the physical details as the user sees the relational db as a collection of
tables in which data are stored.
 RDBMS uses SQL to translate user queries into instructions for retrieving the requested data.
There is one crucial difference between a table and a file: the table yields complete data and structural
independence because it is a purely logical structure.
Any SQL based relational db app involves 3 parts:
 End-user interface: Allows the end user to interact with the data.
 Tables: They hold the data and are independent of each other.
 SQL engine: It is told what must be done but not how it must be done. It does all the work in the
background, such as executing queries or data requests.
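To make the "what, not how" point concrete, the query below only states which rows and columns are
wanted; the SQL engine decides how to retrieve them. A sketch with illustrative table and column names:

    -- Declarative request: no loops, pointers, or file handling.
    SELECT cus_lname, cus_fname
    FROM   customer
    WHERE  cus_state = 'FL'
    ORDER BY cus_lname;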
(Section 2.5.4) The entity relationship model
ERM: Peter Chen introduced the ERM in 1976. It's the graphical representation of entities and their
relationships in a db structure. The ERM is represented in an ERD and is an object-based model. The ER
model is based on the following components:
 Entity: each row in the relational table is known as an entity instance or entity occurrence in the
ER model. Each entity is described by a set of attributes that describes particular characteristics of
the entity.
 Relationships: they describe associations among data. Most relationships describe associations
between two entities. ER model uses the term connectivity to label the relationship types. The
name of the relationship is an active or a passive verb.
(Section 2.5.5) The Object Oriented (OO) Model
Object Oriented Data Model (OODM): both data and their relationships are contained in a single structure
known as an object. In turn, the OODM is the basis for the OODBMS. Unlike an entity, an object includes
info about relationships between the facts within the object, as well as info about its relationships with
other objects. The OODM is said to be a Semantic Data Model because semantic indicates meaning. The
OODM is based on the following components.
 An object is an abstraction of a real world entity.
 Attributes describe the properties of an object.
 Objects that share similar characteristics are grouped in classes. A class is a collection of similar
objects with shared structure (attributes) and behavior (methods). Methods represent a real-world
action such as finding a selected PERSON's name, changing a PERSON's name, or printing a
PERSON's address. Methods are like procedures. In OO, methods are defined as behaviors.
 Classes are organized in a class hierarchy which represents an upside-down tree where each class
only has one parent.
 Inheritance is the ability of an object within the class hierarchy to inherit the attributes and
methods of the classes above it.
 A disadvantage is a steeper learning curve.
(Section 2.5.6) The convergence of data models
Extended relational data model (ERDM): It's semantic and is described as an object/relational db
management system (O/RDBMS). It's primarily geared to business apps, while the OODM tends to focus
on very specialized engineering and scientific apps.
The traditional entity-relationship model and the most important features of object-oriented models have
been combined in the extended (or enhanced) entity-relationship model (EERM).
(Section 2.5.7) Database models and the internet
(Section 2.5.8) Data Models: Summary
Advantages and disadvantages of db modes depicted on page 47.
Data model basic terminology comparison on page 48.
(Section 2.6) Degrees of data abstraction
A db designer starts with an abstract view of the overall data environment and adds details as the design
comes closer to implementation. The design of a db can be divided into four models with decreasing
levels of abstraction: External, Conceptual, Internal, & Physical.
American National Standards Institute (ANSI): Standards Planning and Requirements Committee
(SPARC) defined a framework for data modeling based on degrees of data abstraction. They define three
levels of data abstraction: External, Conceptual, and Internal.
(Section 2.6.1) External model
External Model: is the end users view of the data environment.
External Schema: It's a specific representation of an external view.
External Views advantages:
 It makes it easy to identify the specific data required to support each business unit's operations.
 It makes the designer's job easy by providing feedback about the model's adequacy.
 It helps to ensure security constraints in the db design.
 It makes app program development much simpler.
(Section 2.6.2) The conceptual Model
Conceptual Model: represents a global view of the entire db as viewed by the entire org. That is, the
conceptual model integrates all external views (entities, relationships, constraints, and processes) into a
single global view of the entire data in the enterprise. Also known as the Conceptual Schema, it is the basis
for the identification and high-level description of the main data objects. The most widely used conceptual
model is the ER model, which is the basic db blueprint. The ERD is used to graphically represent the
conceptual schema.
Advantages of Conceptual Models:
 It provides a relatively easily understood bird's-eye view of the data environment.
 Is independent of both software and hardware.
Software independence: model does not depend on the DBMS software used to implement the model.
Hardware independence: model does not depend on the hardware used in the implementation of the
model.
Logical design: used to refer to the task of creating a conceptual data model that could be implemented in
any DBMS.
(Section 2.6.3) The internal model
Internal model: is the representation of the db as "seen" by the DBMS. It requires the designer to match
the conceptual model's characteristics and constraints to those of the selected implementation model.
Internal model is software-dependent but hardware-independent because it is unaffected by the choice of
the computer on which the software is installed.
Internal schema: depicts a specific representation of an internal model, using the db constructs supported
by the chosen db.
Logical independence: When you can change the internal model without affecting the conceptual model.
(Section 2.6.4) The physical model
Physical model: operates at the lowest level of abstraction, describing the way data are saved on storage
media. It is software and hardware dependent. Physical model is dependent on the DBMS.
Physical independence: when you can change the physical model without affecting the internal model.
Summary on page 52.
---------------------------------------------------------------------------------------------------------------------------------
Chapter 3: Relational DB Model
(Section 3.1) A Logical View of Data
The relational data model allows the designer to focus on the logical representation of the data and its
relationships, rather than on the physical storage details; like an automatic transmission, it hides the
mechanics from the user. The relational db provides the advantages of structural and data independence.
The relational model was introduced by E.F. (Ted) Codd of IBM Research in 1970. It is record based.
(Section 3.1.1) Tables and their characteristics
1) Table is perceived as a 2 dimensional structure of rows and columns,
2) Each table row represents a single entity occurrence within the entity set,
3) Each table column represents an attribute, and each column has a distinct name,
4) Each row/column intersection represents a single data value,
5) All values in a column must conform to the same data format,
6) Each column has a specific range of values known as the attribute domain,
7) The order of the rows and columns is immaterial to the DBMS,
8) Each table must have an attribute or a combination of attributes that uniquely id each row.
Most DBMSs support the following data types.
Numeric: Anything concerned with arithmetic.
Characters: Text data or string data.
Date: Date attributes contain calendar dates stored in a special format known as the Julian date format. It
allows you to do a special kind of arithmetic known as Julian date arithmetic.
Logical: Logical data can have only a true or false (yes or no) condition.
Domain: Is the column's range of permissible values.
Primary Key: Each table must have one. A PK is an attribute (or combination of attributes) that uniquely
identifies any given row. It's a unique identifier; no duplicate values are allowed, and a PK generally
cannot be changed.
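These table characteristics can be seen in a table definition. A minimal sketch in generic SQL (the
STUDENT table, its columns, and the domains are illustrative): each column carries one data type, the
CHECK clause narrows an attribute domain, and PRIMARY KEY enforces the unique-row requirement.

    CREATE TABLE student (
        stu_num   INTEGER      PRIMARY KEY,   -- uniquely identifies each row
        stu_lname VARCHAR(30)  NOT NULL,      -- character (string) data
        stu_dob   DATE,                       -- calendar date
        stu_gpa   NUMERIC(3,2)
            CHECK (stu_gpa BETWEEN 0 AND 4)   -- attribute domain (0,4)
    );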
(Section 3.2) Keys
Keys are important in a relational model because they are used to ensure that each row in a table is
uniquely identifiable. They are used to establish relationships among tables and to ensure the integrity of
the data. A key consists of one or more attributes that determine other attributes. Keys are used to identify
specific occurrences of entities within an entity group.
Determination: A key's role is based upon this. If A determines B, C, and D, then A → B, C, D; for
example, STU_NUM determines STU_LNAME. This principle is important because it is used in the
definition of a central relational db concept known as functional dependence.
Functional Dependence: The attribute B is functionally dependent on the attribute A if each value in
column A determines one and only one value in column B.
Composite Key: A key composed of more than one attribute.
Key Attribute: Any attribute that is part of a key.
Full Functional Dependence: If attribute (B) is functionally dependent on a composite key (A) but not on
any subset of that composite key, the attribute (B) is fully functionally dependent on (A).
Superkey: Any key that uniquely identifies each row. It functionally determines all of a row's attributes.
Candidate key: can be described as a superkey without unnecessary attributes, that is, a minimal superkey.
If a table had both STU_SSN and STU_NUM, both would be candidate keys because either one would
uniquely identify each student.
Entity Integrity: The condition in which every row in a table can be uniquely identified by a primary key.
To maintain entity integrity, a null (that is, no data entry at all) is not permitted in the primary key.
Null: no value at all. It does not mean 0 or a space. A null is created when you press the Enter key or the
Tab key to move to the next entry without making a prior entry of any kind. Nulls can never be part of a
primary key, and they should be avoided in other attributes. The existence of nulls in a table is often an
indication of poor db design. A null can represent an unknown attribute value, a known but missing
attribute value, or a "not applicable" condition.
Relational Schema: Textual representation of the database tables where each table is listed by its name
followed by the list of its attributes in parentheses.
Foreign Key: Attribute or combination of attributes in one table whose values match the primary key
values in the related table or be null. Example, VEND_CODE is the primary key in VENDOR table and it
occurs as a FK in the PRODUCT table. You can logically relate data from multiple tables using FK. FK are
based on data values and are purely logical, not physical, pointers. A FK value must match an existing PK
value or unique key value, or else be NULL.
Referential Integrity: If the FK contains a value, that value refers to an existing valid tuple (row) in
another relation. For example, it is maintained between the PRODUCT and VENDOR tables. To maintain
referential integrity, the FK must contain only values found in the other table, or null values to indicate that
the rows are not linked.
Secondary Key: Is used strictly for data retrieval purposes. It's not the customer number, but it can be a
combination of attributes, such as customer phone and last name, though it's not always entirely unique. It's
another way of narrowing down a search when you don't know the unique customer number.
(Section 3.3) Integrity Rules
Entity Integrity:
 Requirement: All primary key entries are unique, and no part of a primary key may be null.
 Purpose: each row will have a unique id, and foreign key values can properly reference primary
key values.
 Example: No invoice can have a duplicate number, nor can it be null. In short, all invoices are
uniquely identified by their invoice number.
Referential Integrity:
 Requirement: A foreign key may have either a null entry, as long as it is not a part of its table's
primary key, or an entry that matches the primary key value in a table to which it is related. Every
non-null foreign key value must reference an existing primary key value.
 Purpose: it is possible for an attribute not to have a corresponding value, but it will be impossible
to have an invalid entry. The enforcement of the referential integrity rule makes it impossible to
delete a row in one table whose primary key has mandatory matching foreign key values in
another table.
 Example: A customer might not have an assigned sales rep (number), but it will be impossible to
have an invalid sales rep (number).
Flags are used to indicate the absence of some values. They are a technique to avoid using nulls.
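Both integrity rules can be declared directly in SQL. A sketch using the VENDOR/PRODUCT example
from Section 3.2 (column names and types are illustrative):

    CREATE TABLE vendor (
        vend_code INTEGER     PRIMARY KEY,  -- entity integrity: unique and never null
        vend_name VARCHAR(50) NOT NULL
    );

    CREATE TABLE product (
        prod_code  INTEGER     PRIMARY KEY,
        prod_descr VARCHAR(50),
        -- Referential integrity: VEND_CODE must match an existing vendor
        -- or be NULL (the "not linked" case described above).
        vend_code  INTEGER     REFERENCES vendor (vend_code)
    );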
(Section 3.4) Relational set operators
Relational Algebra: Defines the theoretical way of manipulating table contents using the eight relational
operators: Select, Project, Join, Intersect, Union, Difference, Product, and Divide.
Closure: The use of relational algebra operators on existing tables (relations) produces new relations.
UNION: Combines all rows from two tables, excluding duplicate rows. The tables must have the same
attribute characteristics (the columns and domains must be identical) to be used in the UNION. When two
or more tables share the same number of columns, when the columns have the same names, and when they
share the same (or compatible) domains, they are said to be Union-Compatible.
INTERSECT: Yields only the rows that appear in both tables. Cannot intersect if one of the attributes is
numeric and one is character based. The tables must be union compatible.
DIFFERENCE: Yields all rows in one table that are not found in the other table. It subtracts one table
from the other. The tables must be union compatible.
PRODUCT: Yields all possible pairs of rows from two tables – also known as the Cartesian product. If
one table has 6 rows and the other table has 3 rows, the PRODUCT yields a list composed of 6 x 3 = 18 rows.
SELECT: AKA RESTRICT, yields values for all rows found in a table that satisfy a given condition. It's
used to list all the row values, or it can yield only those row values that match a specified criterion.
PROJECT: Yields all values for selected attributes. It yields a vertical subset of a table.
JOIN: It allows info to be combined from 2 or more tables. It's the real power behind the relational db,
allowing the use of independent tables linked by common attributes.
Natural Join: Links tables by selecting only the rows with common values in their common attributes. It's
the result of a three-stage process (a PRODUCT, followed by a SELECT, followed by a PROJECT).
Join Columns: or common columns.
Equijoin: another form of join that links tables on the basis of an equality condition that compares
specified columns of each table. Equijoin takes its name from the equality comparison operator (=) used in
the condition. If any other comparison operator is used, the join is called a Theta Join. Theta joins are less
common than equijoins, and they represent inequalities.
Outer Join: The matched pairs would be retained and any unmatched values in the other table would be
left null. If an outer join is produced, 2 scenarios are possible:
 Left Outer Join: Yields all of the rows in the CUSTOMER table, including those that do not have
a matching value in the AGENT table.
 Right Outer Join: Yields all of the rows in the AGENT table, including those that do not have
matching values in the CUSTOMER table.
DIVIDE: uses one single-column table as the divisor and one 2-column table as the dividend. The tables
must have a common column.
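In SQL, the join operators map directly onto query syntax. A sketch using the CUSTOMER and AGENT
example above (the shared AGENT_CODE column is illustrative, and NATURAL JOIN support varies by
vendor):

    -- Natural join: only rows with matching values in the common column(s).
    SELECT *
    FROM   customer NATURAL JOIN agent;

    -- Left outer join: every CUSTOMER row, with NULLs where no AGENT matches.
    SELECT *
    FROM   customer LEFT OUTER JOIN agent
           ON customer.agent_code = agent.agent_code;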
(Section 3.5) The data dictionary and the system catalog
Data Dictionary: Provides a detailed description of all tables found within the user/designer-created db. It
contains all attribute names and characteristics for each table. It contains metadata. Sometimes referred to
as "the db designer's db".
System Catalog: It contains metadata. It's a detailed system data dictionary that describes all objects
within the db. It contains more info than the data dictionary. It is created by the DBMS.
Homonyms: Similar or identically sounding words with different meanings, such as boar and bore. In db
terms, a homonym is the use of the same attribute name to label different attributes. This should be avoided.
Synonym: The use of different names to describe the same attribute; for example, car and auto refer to
the same thing. This should also be avoided in db design.
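Because the system catalog is itself stored in tables, it can be queried with ordinary SQL. Many, though not
all, DBMSs expose it through the standard INFORMATION_SCHEMA views; a sketch (the table name is
illustrative):

    -- List the tables visible in the current database.
    SELECT table_name
    FROM   information_schema.tables;

    -- Show the attribute metadata for one table.
    SELECT column_name, data_type, is_nullable
    FROM   information_schema.columns
    WHERE  table_name = 'customer';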
(Section 3.6) Relationships within the Relational db
1:M is the relational modeling ideal
1:1 should be rare in any relational db design.
M:N cannot be implemented as such in the relational model.
(Section 3.6.1) 1:M Relationship Page 80 review
(Section 3.6.2) 1:1 Relationship Page 82 review
1:1 sometimes means that the entity components were not defined properly. It could indicate that two
entities belong in the same table. They should be rare, but certain conditions require their use. There is a
great example of a 1:1 relationship regarding employees on page 84.
(Section 3.6.3) M:N Relationship Page 84 review
M:N is not supported directly in the relational environment. However, M:N relationships can be
implemented by creating a new entity in 1:M relationships with the original entities. The problems inherent
in M:N relationships can be avoided by creating a Composite entity (or bridge entity, or associative entity).
A Linking Table is the implementation of a composite entity.
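A sketch of a linking table for the STUDENT-takes-CLASS example used earlier (the ENROLL table and
its columns are illustrative). The composite entity carries a FK to each parent, and its composite PK
prevents duplicate pairings:

    CREATE TABLE student (
        stu_num   INTEGER PRIMARY KEY,
        stu_lname VARCHAR(30)
    );

    CREATE TABLE class (
        class_code INTEGER PRIMARY KEY,
        class_name VARCHAR(30)
    );

    -- Bridge entity: converts the M:N into two 1:M relationships.
    CREATE TABLE enroll (
        stu_num    INTEGER REFERENCES student (stu_num),
        class_code INTEGER REFERENCES class (class_code),
        PRIMARY KEY (stu_num, class_code)
    );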
(Section 3.7) Data Redundancy Revisited
Relational dbs make it possible to control data redundancies by using common attributes that are shared by
tables, called foreign keys. Although the use of FKs does not totally eliminate data redundancies, because the
FK values can be repeated many times, the proper use of FKs minimizes data redundancies, thus minimizing
the chance that destructive data anomalies will develop.
Db designers must reconcile three often contradictory requirements: Design elegance, Processing speed,
and Info requirements.
As important as data redundancy control is, there are times when the level of data redundancy must actually
be increased to make the db serve crucial info purposes. Illustrated on page 89.
(Section 3.8) Indexes
Index: An orderly arrangement used to logically access rows in a table. It is composed of an index key and
a set of pointers. An index is an ordered arrangement of keys and pointers; each key points to the location of
the data identified by the key. Indexes play an important role in DBMSs for the implementation of primary
keys. When you define a table's primary key, the DBMS automatically creates a unique index on the
primary key column you declared. A table can have many indexes, but each index is associated with only
one table. If a table is dropped, so is any index that was created for it.
Index Key: Is the index's reference point. It can be composed of one or more attributes.
Unique Index: is an index in which the index key can have only one pointer value (row) associated with it.
Unique indexes are used to enforce uniqueness constraints.
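Index creation in SQL, as a brief sketch (the index and column names are illustrative); the UNIQUE variant
is the mechanism described above for enforcing uniqueness constraints:

    -- Ordinary index: speeds lookups on a non-key column.
    CREATE INDEX cus_lname_dx ON customer (cus_lname);

    -- Unique index: at most one row per index key value.
    CREATE UNIQUE INDEX cus_email_dx ON customer (cus_email);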
(Section 3.9) CODD'S Relational db rules
In 1985, Dr. E.F. Codd published a list of 12 rules to define a relational db system. The 12 rules are located
on page 92. Not all db vendors fully support all 12 rules.
---------------------------------------------------------------------------------------------------------------------------------
Chapter 4: Entity Relationship (ER) Modeling
(Section 4.1) Entity Relationship Model (ERM)
Conceptual models are used in the conceptual design of dbs, while relational models are used in the logical
design of dbs. The ERM is a conceptual, object-based model.
An ERD should be developed prior to building a db, and it depicts the db's main components: entities,
attributes, relationships, and relationship types. There are various notations used with ERDs, such as
Chen notation, Crow's Foot notation, and UML notation.
 Chen notation favors conceptual modeling.
 The Crow's Foot notation favors a more implementation-oriented approach.
 The UML notation can be used for both conceptual and implementation modeling.
What role does the ER diagram play in the design process?
The ER diagram must reflect an organization's operations accurately if the database is to meet that
organization's data requirements. The completed ER diagram forms the basis for design review processes
that verify whether the included entities are appropriate and sufficient, whether the attributes found within
those entities are needed and correct, and whether the relationships between those entities are needed and
correctly represented. The ER diagram is also used as a final crosscheck against the proposed data
dictionary entries. The ER diagram helps the database designer communicate more precisely with those
who most completely understand the business data requirements. Finally, the completed ER diagram serves
as the implementation guide to those who create the actual database. Many ERD software tools can
generate the SQL statements to produce the tables represented in the ERD. In short, the ER diagrams are as
important to the database designer as blueprints are to architects and builders.
Why is data modeling so important to the database designer?
We are said to live in the information age, and data constitute the most basic information units employed by
an information system. Data modeling provides a way to reconcile the very different end-user views of the
nature and roles of data.
A data model is an abstraction that provides an easily understood representation of complex real-world data
structures. A data model helps us understand, communicate and document the complexities of a real-world
data environment. Such understanding yields useful solutions to the problems inherent in creating,
organizing, using, and managing data.
If a database is to be useful and flexible, it must be well designed. The database design process must be
based on an appropriate data model if it is to yield a proper database design blueprint.
(Section 4.1.1) Entities
The word entity in the ERM corresponds to a table, not to a row, in the relational environment.
The ERM refers to a table row as an entity instance or entity occurrence. An entity name is a noun written in
capital letters inside a rectangle. An entity is a person, place, thing, or shared idea about which data are
collected and stored. Entities are the basic building blocks of a relational db.
Entity Sets: Entities that are often grouped according to common attributes. They are stored in tables.
(Section 4.1.2) Attributes
Attributes are characteristics that describe entities. Example STUDENT entity includes STU_LNAME
STU_FNAME and so on. They are oval shaped in Chen model. They are columns that represent a
characteristic of the entity.
Required and Optional Attributes:
Required Attribute: is an attribute that must have a value. It cannot be left empty.
Optional Attribute: is an attribute that does not require a value; therefore, it can be left empty.
Domains: Attributes have a domain. A domain is the set of possible values for a given attribute. The
domain for the GPA attribute is written (0,4) because the lowest possible GPA value is 0 and the highest is
4. The domain for a sex attribute is M or F. Attributes may share a domain.
Identifiers: One or more attributes that uniquely identify each entity instance. They are underlined in the
ERD and are mapped to the PK.
Composite Identifiers: A PK composed of more than one attribute.
Composite and simple attributes: Attributes are classified as simple or composite.
 Composite attribute: An attribute that can be further sub-divided to yield additional attributes.
Example ADDRESS can be sub-divided into Street, City, & Zip.
 Simple Attribute: It cannot be sub-divided. Example Age, Sex, and Marital Status.
Single Valued Attributes: An attribute that can have only a single value. For example, a person can only
have one SSN. It's not always a simple attribute.
Multivalued Attributes: They can have many values; for example, a person can have several college
degrees. They are shown with a double line in Chen notation and are not identified in Crow's Foot notation.
Although the conceptual model can handle M:N relationships and multivalued attributes, you should not
implement them in the RDBMS.
Implementing Multivalued Attributes: They should not be implemented in an RDBMS. There are two
possible courses of action with multivalued attributes in a relational table:
 Splitting the multivalued attribute into new attributes, or
 Creating a new entity composed of the original multivalued attribute's components. This is preferred.
Derived Attributes (Computed Attributes): An attribute whose value is calculated (derived) from other
attributes. It's shown with a dashed line in Chen notation and is derived by using an algorithm; for example,
INT((DATE() – EMP_DOB)/365) calculates a person's age. Advantages & disadvantages of storing derived
attributes:
 Stored: saves CPU processing cycles and data access time; the data value is readily available and can
be used to keep track of historical data. However, it requires constant maintenance to ensure the
derived value stays current, especially if any values used in the calculation change.
 Not stored: saves storage space, and the computation always yields a current value. However, it uses
CPU processing cycles, increases data access time, and adds coding complexity to queries.
(Section 4.1.3) Relationships
A relationship is an association between entities. The entities that participate in a relationship are also
known as participants. A relationship's name is an active or passive verb; for example, a STUDENT takes a
CLASS. Relationships operate in both directions.
(Section 4.1.4) Connectivity and Cardinality
Connectivity: It's used to describe the relationship classification.
Cardinality: It expresses the min and max number of entity occurrences associated with one occurrence of
the related entity. The ERD depicts it as (1,4), 1 being the MIN and 4 being the MAX.
Connectivities and cardinalities are generally based on business rules and must consider the data
environment, transactions, and information requirements.
(Section 4.1.5) Existence Dependence
Existence-Dependent: an entity that can exist only when it is associated with another related entity
occurrence. For example, EMPLOYEE claims DEPENDENT. The entity DEPENDENT is clearly
existence-dependent on the EMPLOYEE entity because it is impossible for the dependent to exist apart
from the EMPLOYEE in the db.
Existence-Independent (strong or regular): An entity that can exist apart from one or more related entities.
(Section 4.1.6) Relationship Strength
Entities that are existence-independent of one another are said to have weak or non-identifying
relationships. The concept of relationship strength is based on how the PK of a related entity is defined.
Weak (Non-identifying) Relationship: Exists if the PK of the related entity does not contain a PK
component of the parent entity. For example: COURSE (CRS_CODE, DEPT_CODE) and
CLASS (CLASS_CODE, CRS_CODE). Shown as a dashed line in Crow's Foot notation.
Strong or identifying relationships exist when the entities are existence-dependent.
Strong (Identifying) Relationship: Exists when the PK of the related entity contains a PK component
of the parent entity. For example: COURSE (CRS_CODE, DEPT_CODE) and
CLASS (CRS_CODE, CLASS_SECTION). Shown as a solid line in Crow's Foot notation.
(Section 4.1.7) Weak Entities
A Weak Entity is one that meets two conditions: the entity is existence-dependent, and the entity has a
primary key that is partially or totally derived from the parent entity in the relationship. For example,
EMPLOYEE has DEPENDENT; DEPENDENT is weak because it cannot exist without
EMPLOYEE. Its shape is a double rectangle. A weak entity inherits part of its PK from its
strong counterpart: EMPLOYEE (EMP_NUM, ...)
DEPENDENT (EMP_NUM, DEP_NUM, ...)
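Implemented in SQL, the identifying relationship shows up as a composite PK that embeds the parent's
key. A sketch (column names and types are illustrative):

    CREATE TABLE employee (
        emp_num   INTEGER PRIMARY KEY,
        emp_lname VARCHAR(30)
    );

    -- Weak entity: its PK is partially derived from the parent's PK,
    -- so a DEPENDENT row cannot exist without its EMPLOYEE row.
    CREATE TABLE dependent (
        emp_num   INTEGER REFERENCES employee (emp_num),
        dep_num   INTEGER,
        dep_fname VARCHAR(30),
        PRIMARY KEY (emp_num, dep_num)
    );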
(Section 4.1.8) Relationship Participation
Participation is either optional or mandatory. Optional Participation means that one entity occurrence does
not require a corresponding entity occurrence in a particular relationship. Each entity is implemented as a
table. If COURSE generates CLASS, at least some courses do not generate a class. In other words, an entity
occurrence (row) in the COURSE table does not necessarily require the existence of a corresponding entity
occurrence in the CLASS table. Therefore, the CLASS entity is considered to be optional to the COURSE
entity. It's the O shape on the line of the Crow's Foot diagram.
Mandatory Participation: Means that one entity occurrence requires a corresponding entity occurrence in
a particular relationship. It indicates that the min cardinality is 1 for the mandatory entity. Typically
mandatory on the "1" side and optional on the "Many" side. Crow's Foot symbols on page 120.
(Section 4.1.9) Relationship Degree
Relationship Degree indicates the number of entities or participants associated with a relationship.
Unary Relationship exists when an association is maintained within a single entity.
 An employee within the EMPLOYEE entity is the manager for one or more employees within that
entity. In this case, the existence of the "manages" relationship means that EMPLOYEE requires
another EMPLOYEE to be the manager; that is, EMPLOYEE has a relationship with itself. Such a
relationship is known as a Recursive Relationship.
Binary Relationship exists when two entities are associated in a relationship. Most common.
Ternary relationship exists when three entities are associated.
(Section 4.1.10) Recursive Relationships
A Recursive Relationship is one in which a relationship can exist between occurrences of the same entity
set. Naturally, it's found in the Unary Relationship.
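A unary "manages" relationship is typically implemented with a FK that points back into the same table. A
sketch (names are illustrative):

    CREATE TABLE employee (
        emp_num   INTEGER PRIMARY KEY,
        emp_lname VARCHAR(30),
        -- Recursive FK: the manager is another row of EMPLOYEE;
        -- NULL marks the top of the reporting hierarchy.
        emp_mgr   INTEGER REFERENCES employee (emp_num)
    );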
(Section 4.1.11) Associative (Composite or Bridge) Entities
The relational model generally requires the use of 1:M relationships. (Also, recall that the 1:1 relationship
has its place, but it should be used with caution and proper justification.) If M:N relationships are
encountered, you must create a bridge between the entities that display such relationships. Associative
Entities are used to implement an M:N relationship between two or more entities.
(Section 4.2) Developing an ER diagram
Iterative process: Repetition of processes and procedures. The business rules define the ERD components.
Building an ERD involves the following:
 Create a detailed narrative of the organization's description of operations.
 Identify the business rules based on the description of operations.
 Identify the main entities and relationships from the business rules.
 Develop the initial ERD.
 Identify the attributes and PKs that adequately describe the entities.
 Revise and review the ERD.
(Section 4.3) DB design challenges: Conflicting goals
DB designers often must make design compromises that are triggered by conflicting goals, such as
adherence to design standards (elegance), processing speed, and info requirements.
 Design Standards: Guide you in developing logical structures that minimize data redundancies,
avoid nulls to the greatest extent possible, and allow you to work with well-defined components and
to evaluate the interaction of those components with some precision.
 Processing Speed: The top priority for high-volume transaction systems. It means minimal access
time, which may be achieved by minimizing the number and complexity of logically desirable
relationships. For example, a "perfect" design may use a 1:1 relationship to avoid nulls, while a
higher-transaction-speed design might combine the two tables to avoid the use of an additional
relationship, using dummy entries to avoid nulls.
 Information Requirements: Complex info requirements may dictate data transformations, and they
may expand the number of entities and attributes within the design. Therefore, the db may have to
sacrifice some of its "clean" design structures and/or some of its high transaction speed to ensure
max info generation.
Business rules are an important element of database design in every organization. Business rules drive all
business processes, and an organization's business rules must be correctly implemented by the
organization's IT systems, including the databases.
Business rules are precise statements, derived from a detailed description of the organization's operations.
When written properly, business rules define one or more of the following modeling components:
 entities
 relationships
 attributes
 connectivities
 cardinalities
 constraints
Because the business rules form the basis of the data-modeling process, precisely phrasing them is crucial
to the success of the database design. Because the business rules are derived from a precise description of
operations, much of the design's success depends on the accuracy of the description of operations.
Examples of business rules are:
 An invoice contains one or more invoice lines.
 Each invoice line is associated with a single invoice.
 A store employs many employees.
 Each employee is employed by only one store.
 A college has many departments.
 Each department belongs to a single college. (This business rule reflects a university that has
multiple colleges such as Business, Liberal Arts, Education, Engineering, etc.)
 A driver may be assigned to drive many different vehicles.
 Each vehicle can be driven by many drivers. (Note: Keep in mind that this business rule reflects
the assignment of drivers during some period of time.)
 A client may sign many contracts.
 A sales representative may write many sales contracts.
 Each sales contract is written by one sales representative.
 Each sale involves a sales representative, a customer, and one or more products.
Note that each relationship definition requires the definition of two business rules. For example, the
relationship between the INVOICE and (invoice) LINE entities is defined by the first two business rules in
the bulleted list. This two-way requirement exists because there is always a two-way relationship between
any two related entities. (This two-way relationship description also reflects the implementation by many of
the available CASE tools.) The last business rule above describes a three-way sale relationship between
sales representatives, customers, and products.
---------------------------------------------------------------------------------------------------------------------------------
Chapter 5: Normalization of DB tables
(Section 5.1) DB tables and Normalization
Normalization: answers how to recognize a poor table structure and how to produce a good table. It's a
process of evaluating and correcting table structures to minimize data redundancies, thereby reducing the
likelihood of data anomalies. It involves assigning attributes to tables based on the concept of
determination, and it is a sequence of tests applied to candidate entities and their attributes. It works
through a series of stages called Normal Forms: First Normal Form (1NF), Second Normal Form (2NF),
and Third Normal Form (3NF). 3NF is the highest of these. 3NF is not always the way to go because it can
affect performance.
Excessive normalization can result in less easily understood entities and slow processing speed.
Denormalization produces a lower normal form; that is, a 3NF will be converted to a 2NF through
denormalization. However, the price you pay for increased performance through denormalization is greater
data redundancy.
(Section 5.2) The Need for Normalization
The need is to decrease anomalies and eliminate data redundancies; it's critical to a successful db design.
The goal of normalization is to create tables such that all non-key attributes are dependent on the PK and
nothing but the PK.
(Section 5.3) The Normalization Process
The objective of Normalization is to ensure that each table conforms to the concept of well formed
relations, that is, tables that have the following characteristics.
 Each table represents a single subject. For example a student tables will contain only student data.
 No data item will be unnecessarily stored in more than one table. This is to ensure data is updated
in only one place.
 All nonprime attributes in a table are dependent on the PK. This is to ensure that the data are
uniquely identifiable by a primary key value.
 Each table is void of insertion, update, or deletion anomalies. This is to ensure the integrity and
consistency of the data.
 First Normal Form (1NF): Table format, all key attributes defined, no repeating groups, and PK identified. All remaining attributes are dependent on the PK. It may still contain partial dependencies, that is, dependencies based on only part of the PK. Removing all repeating groups means that each row in a table must define only a single entity; to do this, the appropriate entry must be added in the PK column.
 Second Normal Form (2NF): It's in 1NF and includes no partial dependencies. It may still contain transitive dependencies, which are based on attributes that are not part of the PK. A table is put into 2NF by ensuring that no attribute is dependent on only part of the primary key. If a partial dependency exists, a new table can be created with a primary key equal to the required portion of the original key, and the dependent attributes are moved to this table. If the 2NF table has any transitive dependencies, the dependencies can be eliminated by breaking them off and storing them in a separate table.
 Third Normal Form (3NF): It‘s in 2NF and includes no Transitive dependencies.
 Boyce-Codd Normal Form (BCNF): Every determinant is a candidate key (special case of 3NF).
If a 3NF table has only a single candidate key, it‘s automatically in BCNF. It can be violated only
if the table contains more than one candidate key.
 Fourth Normal Form (4NF): It's in 3NF or BCNF and has no independent multi-valued dependencies. Conversion involves splitting the table to remove all multi-valued dependencies.
 (5NF) & (DKNF) are not likely to be encountered in a business environment and are mainly of
theoretical interest.
 The normalization process works one relation at a time. It starts by identifying the dependencies
of a given relation and progressively breaking up the relation into a set of new relations based on
the identified dependencies.
 Update Anomaly: When you modify duplicate data in the system, you run the risk of the data not being properly modified in every location where it is stored.
 Insert Anomalies: You can‘t insert data due to missing info (especially a missing key!)
 Delete Anomaly: You can‘t delete data without deleting other essential data (Data that you don‘t
want to delete)
Functional Dependency: Before outlining normalization process, it is good to review the concepts of
determination and functional dependency. Check table 5.3.
(Section 5.3.1) Conversion to First Normal Form (1NF)
Repeating groups: Derives its name from the fact that a group of multiple entries of the same type can
exist for any single key attribute occurrence.
The normalization process starts with a simple three-step procedure.
 Step 1: Eliminate the Repeating Groups: Eliminate the nulls by making sure that each repeating
group attribute contains an appropriate data value. That change converts the table to 1NF.
 Step 2: Identify the PK:
 Step 3: Identify all Dependencies:
Dependency Diagram: Helpful in getting a bird‘s-eye view of all of the relationships among a table‘s
attributes, and their use makes it less likely that you will overlook an important dependency.
Partial Dependency: A dependency based on only a part of a composite primary key that determines other attributes. Partial dependencies are sometimes tolerated for performance reasons, but they should be used with caution, because a table that contains partial dependencies is still subject to data redundancies and to various anomalies.
Transitive Dependency: A dependency of one nonprime attribute on another nonprime attribute.
The problem is that they still yield data anomalies. It exists because a nonkey attribute determines the
values of another nonkey attribute.
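To make these dependency types concrete, here is a hypothetical project-assignment table (a sketch, not from the text; all names are assumed), with its dependencies noted in comments:

-- PK = (PROJ_NUM, EMP_NUM); Oracle-style SQL
CREATE TABLE ASSIGNMENT (
  PROJ_NUM   NUMBER,
  EMP_NUM    NUMBER,
  PROJ_NAME  VARCHAR2(50),   -- partial dependency:    PROJ_NUM -> PROJ_NAME
  EMP_NAME   VARCHAR2(50),   -- partial dependency:    EMP_NUM  -> EMP_NAME
  JOB_CLASS  VARCHAR2(20),   -- partial dependency:    EMP_NUM  -> JOB_CLASS
  CHG_HOUR   NUMBER,         -- transitive dependency: JOB_CLASS -> CHG_HOUR
  HOURS      NUMBER,         -- full dependency: (PROJ_NUM, EMP_NUM) -> HOURS
  PRIMARY KEY (PROJ_NUM, EMP_NUM)
);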
(Section 5.3.2) Conversion to Second Normal Form (2NF)
Converting to 2NF is done only when the 1NF has a composite PK. If the 1NF has a single attribute PK,
then the table is automatically in 2NF.
Prime or Key Attribute: Any attribute that is at least part of a key.
Nonprime or Nonkey attribute: is not part of any key.
(Section 5.3.3) Conversion to Third Normal Form (3NF)
 Step 1: Identify Each New Determinant: A determinant is any attribute whose value determines other values within a row. If there are three different transitive dependencies, you will have three different determinants.
 Step 2: Identify the Dependent Attributes:
 Step 3: Remove the Dependent Attributes from transitive dependencies.
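Continuing the hypothetical assignment sketch from Section 5.3, removing the partial dependencies (2NF) and then the transitive dependency (3NF) might yield the following; all names remain assumed:

CREATE TABLE PROJECT (
  PROJ_NUM  NUMBER PRIMARY KEY,
  PROJ_NAME VARCHAR2(50)
);

CREATE TABLE JOB (
  JOB_CLASS    VARCHAR2(20) PRIMARY KEY,
  JOB_CHG_HOUR NUMBER         -- the transitive dependency now rests on a PK
);

CREATE TABLE EMPLOYEE (
  EMP_NUM   NUMBER PRIMARY KEY,
  EMP_NAME  VARCHAR2(50),
  JOB_CLASS VARCHAR2(20) REFERENCES JOB (JOB_CLASS)
);

CREATE TABLE ASSIGNMENT (
  PROJ_NUM NUMBER REFERENCES PROJECT (PROJ_NUM),
  EMP_NUM  NUMBER REFERENCES EMPLOYEE (EMP_NUM),
  HOURS    NUMBER,            -- fully dependent on the whole composite PK
  PRIMARY KEY (PROJ_NUM, EMP_NUM)
);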
(Section 5.4) Improving the design
Evaluate PK Assignments:
Surrogate Key: An artificial PK introduced by the designer with the purpose of simplifying the assignment
of PK to tables. They are usually numeric, and often automatically generated by the DBMS.
Evaluate Naming Conventions: It is better to change the attribute name to reflect the table name. Table =
JOB, and attribute should be from CHG_HOUR to JOB_CHG_HOUR.
Refine Attribute Atomicity: Atomic Attribute is one that cannot be further subdivided or decomposed.
By improving the degree of atomicity, you also gain querying flexibility.
Identify new Attributes:
Identify New Relationships:
Refine PK as required for Data Granularity:
Granularity refers to the level of detail represented by the values stored in a table‘s row. Data stored at
their lowest level of granularity are said to be atomic data.
Maintain Historical Accuracy:
Evaluate Using derived attributes: The availability of the derived attribute will save reporting time.
(Section 5.5) Surrogate key considerations
At the implementation level a surrogate key is a system defined attribute generally created and managed via
the DBMS.
(Section 5.6) Higher Level Normal Forms
Tables in 3NF will perform suitably in business transactional db.
(Section 5.6.1) The Boyce-Codd Normal Form (BCNF)
This is a special case of the 3NF. A table is in BCNF when every determinant in the table is a candidate key.
A BCNF can be violated only when the table contains more than one candidate key. When a nonkey
attribute is the determinant of a key attribute, the condition does not violate 3NF, yet it fails to meet the
BCNF requirements because BCNF requires that every determinant in the table be a candidate key.
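A hypothetical illustration with assumed names: suppose a table has PK (STU_ID, STAFF_ID), with (STU_ID, STAFF_ID) -> CLASS_CODE and CLASS_CODE -> STAFF_ID. CLASS_CODE is a determinant but not a candidate key, so the table is in 3NF yet violates BCNF. A sketch of the split that fixes it:

-- Every determinant is now a key in its own table
CREATE TABLE CLASS_ASSIGNMENT (
  CLASS_CODE VARCHAR2(10) PRIMARY KEY,
  STAFF_ID   NUMBER NOT NULL          -- CLASS_CODE -> STAFF_ID, now key-based
);

CREATE TABLE STUDENT_CLASS (
  STU_ID       NUMBER,
  CLASS_CODE   VARCHAR2(10) REFERENCES CLASS_ASSIGNMENT (CLASS_CODE),
  ENROLL_GRADE CHAR(1),
  PRIMARY KEY (STU_ID, CLASS_CODE)
);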
(Section 5.6.2) Fourth Normal Form (4NF)
 All attributes must be dependent on the PK, but they must be independent of each other.
 No row may contain two or more multivalued facts about an entity.
A table is in 4NF when it is in 3NF and has no multiple sets of multivalued dependencies.
(Section 5.7) Normalizations and DB design
ER modeling and Normalization are difficult to separate and the two are used in an iterative and
incremental process. ER diagram looks at the "big picture" and normalization provides a "micro" view of
individual entities.
Normalization takes place in tandem with data modeling. The proper procedure is to follow these
steps:
1) Create a description of operations at an appropriate level of detail.
2) Derive appropriate business rules from the description of operations.
3) Model the data with the help of a tool such as Visio's Crow's Foot option to produce an initial
ERD. This ERD is the initial database blueprint.
4) Use the normalization procedures to identify and remove data redundancies. This process
may produce additional entities.
5) Revise the ERD created in step 3.
6) Use the normalization procedures to audit the revised ERD. If significant additional data
redundancies are discovered, repeat steps 4 and 5.
(Section 5.8) Denormalization
It is important to remember that the optimal relational db implementation requires that all tables be at least
in 3NF. The problem with normalization is that as tables are decomposed to conform to normalization
requirements, the number of db tables expands. Therefore, in order to generate info, data must be put
together from various tables. Joining a large number of tables requires additional input/output (I/O) operations
and processing logic, thereby reducing system speed. Rare and occasional circumstances may allow some
degree of denormalization so processing speed can be increased. The problem with denormalized relations
and redundant data is that the data integrity could be compromised due to the possibility of data anomalies
(insert, update, and deletion anomalies).
Unnormalized tables in a production db tend to suffer from the following defects.
 Data updates are less efficient because programs that read and update tables must deal with larger
tables.
 Indexing is more cumbersome. It simply is not practical to build all of the indexes required for the
many attributes that might be located in a single unnormalized table.
 Unnormalized tables yield no simple strategies for creating virtual tables known as views.
---------------------------------------------------------------------------------------------------------------------------------
Chapter 6: Advanced Data Modeling
(Section 6.1) The EERM
EERM: sometimes referred to as the Enhanced ERM is the result of adding more semantic constructs
(entity supertypes, entity subtypes, and entity clustering) to the original ERM. Entity-relationship modeling
is missing the ability to represent relationships based on specialization and generalization. For example,
you can't directly represent that students and faculty are people in an ERD. This shortcoming is addressed
in Extended Entity-Relationship Modeling, which includes specialization-generalization relationships.
Abstractions, Entities & Classes:
Abstraction means identifying the common characteristics of things and using those common
characteristics to classify or organize things. We used abstraction when we identified entities and produced
Entity-Relationship models.
(Section 6.1.1) Entity supertypes and subtypes
Example: EMPLOYEE is a supertype and PILOT is a subtype, because not all employees have the attributes of pilots; separating them prevents nulls.
Entity Supertype: a generic entity type that is related to one or more entity subtypes.
Entity Subtypes: the entity supertype contains the common characteristics, and the entity subtypes contain the unique characteristics of each entity subtype.
(Section 6.1.2) Specialization Hierarchy
Specialization Hierarchy: A hierarchy that is based on the top-down process of identifying lower-level,
more specific entity subtypes from a higher-level entity supertype. Specialization is based on grouping
unique characteristics and relationships of the subtypes. The relationships depicted within the specialization
hierarchy are sometimes described in terms of "is-a" relationships, for example, "a pilot is an employee." Within a specialization hierarchy, a subtype can exist only within the context of a supertype, and every subtype can have only one supertype to which it is directly related.
A specialization hierarchy provides the means to:
 Support attribute inheritance.
 Define a special supertype attribute known as the subtype discriminator.
 Define disjoint/overlapping constraints and complete/partial constraints.
In specialization Hierarchies with multiple levels of supertype/subtypes, a lower-level subtype inherits all
of the attributes and relationships from all of its upper-level supertypes.
(Section 6.1.3) Inheritance
Inheritance: enables an entity subtype to inherit the attributes and relationships of the supertype.
One important Inheritance characteristic is that all entity subtypes inherit their primary key attribute from
their supertype.
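A minimal sketch (Oracle-style SQL; column names are assumed) of a supertype/subtype pair in which the subtype inherits the supertype's PK:

CREATE TABLE EMPLOYEE (
  EMP_NUM  NUMBER PRIMARY KEY,
  EMP_NAME VARCHAR2(50),
  EMP_TYPE CHAR(1)            -- subtype discriminator; see the next section
);

CREATE TABLE PILOT (
  EMP_NUM     NUMBER PRIMARY KEY REFERENCES EMPLOYEE (EMP_NUM),  -- inherited PK
  PIL_LICENSE VARCHAR2(20),   -- attributes unique to the subtype
  PIL_RATINGS VARCHAR2(50)
);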
(Section 6.1.4) Subtype Discriminator
Subtype Discriminator is the attribute in the supertype entity that determines to which subtype the supertype occurrence is related. For example, EMP_TYPE is the subtype discriminator in the EMPLOYEE supertype: its value indicates the subtype (such as pilot) to which each employee occurrence belongs.
Note that the default comparison condition for the subtype discriminator attribute is the equality
comparison. However there are situations in which the subtype discriminator is not necessarily based on an
equality comparison.
(Section 6.1.5) Disjoint and overlapping constraints
Disjoint Subtypes or Non-overlapping subtypes are subtypes that contain a unique subset of the supertype
entity set; in other words, each entity instance of the supertype can appear in only one of the subtypes.
Overlapping Subtypes are subtypes that contain nonunique subsets of the supertype entity set; that is, each
entity instance for the supertype may appear in more than one subtype. For example a person can be an
employee, student, or both. In turn an employee may be a professor as well as an administrator. Because an
employee also may be a student, student and employee are overlapping subtypes of the supertype person,
just as professor and admin are overlapping subtypes of the supertype employee.
(Section 6.1.6) Completeness Constraint
Completeness Constraint specifies whether each entity supertype occurrence must also be a member of at
least one subtype. It can be partial or total.
Partial Completeness: (symbolized by a circle over a single line) means that not every supertype occurrence is a member of a subtype; that is, there may be some supertype occurrences that are not members of any subtype.
Total Completeness: (symbolized by a circle over a double line) means that every supertype occurrence
must be a member of at least one subtype.
(Section 6.1.7) Specialization and Generalization:
Specialization: is the top-down process of identifying lower-level, more specific entity subtypes from a higher-level entity supertype. Specialization is based on grouping unique characteristics and relationships of the subtypes. For example, we used specialization to identify multiple entity subtypes from the original EMPLOYEE supertype.
Generalization: is the bottom-up process of identifying a higher-level, more generic entity supertype from lower-level entity subtypes. Generalization is based on grouping common characteristics and relationships of the subtypes. For example, you might identify multiple types of musical instruments (piano, violin, and guitar) and generalize them into a higher-level STRING INSTRUMENT supertype.
(Section 6.2) Entity Clustering
Entity Cluster: is a "virtual" entity type used to represent multiple entities and relationships in the ERD. Entity clustering is a technique used to hide potentially confusing detail in an ERD. An entity cluster is formed by combining multiple interrelated entities into a single abstract entity object. It is considered virtual or abstract in the sense that it is not an actual entity in the final ERD. When using entity clusters, the key attributes of the combined entities are no longer available. Avoid displaying attributes when entity clusters are used; this prevents problems such as relationships changing from identifying to non-identifying (or vice versa) and the loss of FK attributes from some entities.
(Section 6.3) Entity Integrity: Selecting PK
Properly selecting the PK has a direct bearing on the efficiency and effectiveness of the db implementation.
(Section 6.3.1) Natural Keys and PK
Natural Key or Natural Identifier is a real-world, generally accepted identifier used to distinguish (that is, uniquely identify) real-world objects.
(Section 6.3.2) PK Guidelines
The function of the PK is to guarantee entity integrity, not to describe an entity.
PK and FK are used to implement relationships among entities.
Desirable primary key characteristics are UNIQUE VALUES, NONINTELLIGENT, NO CHANGE OVER TIME, PREFERABLY SINGLE-ATTRIBUTE, PREFERABLY NUMERIC, SECURITY COMPLIANT, MANIFESTNESS, IMMUTABILITY, and COMPACTNESS.
(Section 6.3.3) When to use composite PK
Composite PK‘s are useful in two cases.
 As identifiers of composite entities, where each PK combination is allowed only once in the M:N
relationship.
 As identifiers of weak entities, where the weak entity has a strong identifying relationship with the
parent entity.
The ENROLL entity mainly represents the many-to-many relationship between students and classes. Such
entities are termed association entities, bridge entities, or composite entities. Note that the table has foreign
keys to both STUDENT and CLASS, and that the primary key is the composite of those two foreign keys.
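A sketch of the ENROLL composite entity just described (column names are assumed; STUDENT and CLASS are presumed to already exist):

CREATE TABLE ENROLL (
  STU_NUM      NUMBER REFERENCES STUDENT (STU_NUM),
  CLASS_NUM    NUMBER REFERENCES CLASS (CLASS_NUM),
  ENROLL_GRADE CHAR(1),
  PRIMARY KEY (STU_NUM, CLASS_NUM)   -- each student/class pair can occur only once
);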
PK in Existence-Dependent Relationships:
If one entity depends for its existence within the database on one or more other entities, then the existence-
dependent table should include the primary key of all tables upon which its existence depends. As the text
indicates, these existence dependencies can be natural, such as the existence dependence of DEPENDENT
on EMPLOYEE or the existence dependence of GRADED_ITEM on CLASS. Existence dependence can also arise in a relational database as a result of the normalization process required to correctly represent composite entities.
(Section 6.3.4) When to use Surrogate PK
It can often be very difficult or impossible to identify correct primary keys for natural entities,
particularly natural events. In these situations the only solution is to have the computer or user create a
unique primary key for each entity that is inserted into the table that represents such an entity. These keys
are called synthetic primary keys or surrogate keys.
It is famously difficult to identify correct natural keys for people, and for security reasons it is undesirable, for example, to carry an ID card that uses your SSN as its ID number. This is why SSNs should not be used as PKs.
They are helpful when there is no natural key, when the selected candidate key has embedded semantic content, or when the selected candidate key is too long or cumbersome. If you use a surrogate key, you must ensure that the candidate key of the entity in question performs properly through the use of "unique index" and "not null" constraints.
Integer surrogate keys are the norm in large and high performance databases. This is because
integers are the most compact representation of an identifier that is unique for a number of unique entities,
and because it is usually most efficient for computers to store and operate on integers.
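A minimal sketch (assumed names) of a surrogate PK combined with the "unique index" and "not null" protection of the natural candidate key:

CREATE TABLE PERSON (
  PERSON_ID NUMBER PRIMARY KEY,              -- surrogate key, e.g., from a sequence
  PER_SSN   CHAR(9) NOT NULL,                -- candidate key, deliberately NOT the PK
  PER_NAME  VARCHAR2(50) NOT NULL,
  CONSTRAINT person_ssn_uq UNIQUE (PER_SSN)  -- candidate key still protected
);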
Tables that should not have PK’s:
The text assumes that all tables should have primary keys, but this is not always true. Tables that represent real-world entities or parts of entities should always have primary keys. However, most operational databases in financial or other sensitive applications include tables whose sole role is to preserve a record of events or
changes to the database. These history or audit tables often record a variety of internal events within the
system which may have no corresponding durable entity in the real world, and no natural key. Clients of
mine have tried in vain to develop a natural primary key for these tables, including as many as a dozen
columns, often with a timestamp to help assure uniqueness. After such an application has run for a while
they have discovered to their dismay that they occasionally have primary key uniqueness violations that
prevent the insertion of a history record.
The problem in these situations is not that they have selected incorrect columns for the natural primary key,
but that such tables usually have no reliable natural primary key, even including a timestamp. The solution
is to not have a primary key. Not having to maintain the unique index for the primary key speeds inserts
into history tables. You may want to index the tables so that you can efficiently retrieve the historic data. If
you do need a primary key, for example if a history table must be referenced from another table, then use a
synthetic (surrogate) primary key.
(Section 6.4) Design cases: Learning Flexible DB Design
Databases are the most long-lived of all software components. Many databases have been continuously
used and updated for decades. Most of the life cycle cost of databases is thus in the maintenance phase of
the database life cycle. Thus the most important characteristic of a database design is that it be easy to
modify as business needs change. There is an old saying in the computer industry: an easily maintained design that doesn't happen to be completely correct is not a problem, because you can just fix it; an un-maintainable design, however, is a disaster, because even if it works now, something will inevitably happen that requires you to change it, and then you have a big problem. In this section we describe the things that you can do to make sure that your designs are flexible enough that they can be easily maintained.
There are two more advanced topics in the design of foreign keys for 1:1 relationships. One consideration is that modern DBMSs, including Oracle, support clustering of tables that have 1:1 mandatory relationships and a common key identifying the rows that are related 1:1. Clustering combines the corresponding logical rows of the clustered tables into one physical row in storage. Because the clustered rows are actually one physical row, there is no need to repeat the shared columns: the columns in the common cluster key are stored only once, so the database is smaller. For that reason I often cluster tables that are related by a 1:1 mandatory relationship. Clustering also stores the related rows together in one physical row, so joining the tables is essentially free, and you can effectively ignore the performance consequences of joining both tables in requests.
(Section 6.4.1) Design Case #1: Implementing 1:1 Relationships
FKs work with PKs to properly implement relationships in the relational model. The basic rule is to put the PK of the "one" side (the parent entity) on the "many" side (the dependent entity) as a foreign key. However, where do you place the FK when you are working with a 1:1 relationship? There are two options:
 Place a FK in both entities, or
 Place a FK in one of the entities, which is the preferred solution.
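A sketch of the preferred option (all names are assumed): the FK is placed in one of the two entities and made unique, so the relationship stays 1:1:

CREATE TABLE EMPLOYEE (
  EMP_NUM  NUMBER PRIMARY KEY,
  EMP_NAME VARCHAR2(50)
);

-- Each parking space is assigned to at most one employee, and vice versa
CREATE TABLE PARKING_SPACE (
  SPACE_NUM NUMBER PRIMARY KEY,
  EMP_NUM   NUMBER UNIQUE REFERENCES EMPLOYEE (EMP_NUM)  -- UNIQUE keeps it 1:1
);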
(Section 6.4.2) Design Case #2: Maintaining History of Time-Variant Data
Time-variant data refer to data whose values change over time and for which you must keep a history of the data changes. Keeping the history of time-variant data is equivalent to having a multivalued attribute in your entity. To model time-variant data, you must create a new entity in a 1:M relationship with the original entity.
Representing History: One nice feature of designs with a current data table and a corresponding history
table is that the same queries can be run against the current status table (e.g., DEPARTMENT) or against
the corresponding history table (e.g., DEPARTMENT_HIST). Queries can be run against the history table
to return results corresponding to the state at any previous time. For example, we can run a report today as
if it were being run at the end of the previous quarter, reflecting the state of the DEPARTMENT table or
any number of additional tables at that time. This is very useful for many kinds of businesses.
Queries run against the history table will have at least one additional WHERE clause, and often a
subquery. The history table is typically much larger than the current data table. For these reasons queries
against the history table are not as fast as queries against the current data table. This is why it is convenient
to have both tables. The current data table supports current operational transactions, while the
corresponding history table supports historic analysis and reporting. Performance is not so important for
these historic functions, so the extra size and overhead of the history table is acceptable. Note that the
current data table redundantly stores the latest data in the history table, so care must be taken to assure that
these are always consistent. I usually encapsulate updates to the pair of tables in a stored procedure, and
write a stored procedure or script that checks that they are consistent. Triggers can also be used to maintain
the denormalized data. Triggers can be used to add history to an existing database and applications without
requiring changes to existing SQL.
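A minimal sketch (assumed columns, Oracle-style) of a current table, its history table, and a trigger that appends a history row whenever the current row is inserted or updated:

CREATE TABLE DEPARTMENT (
  DEPT_ID   NUMBER PRIMARY KEY,
  DEPT_NAME VARCHAR2(50),
  DEPT_MGR  NUMBER
);

-- Deliberately has no PK; see the discussion of history tables above
CREATE TABLE DEPARTMENT_HIST (
  DEPT_ID    NUMBER,
  DEPT_NAME  VARCHAR2(50),
  DEPT_MGR   NUMBER,
  CHANGED_ON DATE            -- when this version became current
);

CREATE OR REPLACE TRIGGER trg_department_hist
AFTER INSERT OR UPDATE ON DEPARTMENT
FOR EACH ROW
BEGIN
  INSERT INTO DEPARTMENT_HIST (DEPT_ID, DEPT_NAME, DEPT_MGR, CHANGED_ON)
  VALUES (:NEW.DEPT_ID, :NEW.DEPT_NAME, :NEW.DEPT_MGR, SYSDATE);
END;
/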
Designing DB History and Audit:
We often need to maintain a record of transactions, for some time after the transactions have been
completed, to support a review of the transactions or for internal or external audit. The requirements for this
history data are quite different from those of the operational database, and consequently the designs for
history and audit tables are correspondingly quite different. The differences in the requirements are summarized in the table in Lecture 6, section 3.11.
When history is kept in separate tables, those history tables are often quite denormalized, so that each record in the history table represents the entire event or transaction being recorded for later analysis or audit. Common denormalizations in history tables are summarized in the table in Lecture 6, section 3.11.
Designing History and Audit Tables:
History and audit tables may record more information than is required for operations. For example, let's
look at what happens at a financial services firm, such as a bank, when any change is made to a customer's
address. While the operational tables only store the new address, the audit tables will record who made the
change, when they made it, from where they made it, and references to any paper documents, voice
recordings or other audit data that may be outside the database.
Historic data is also stored in data warehouses and other decision support systems. Modern data
warehouses store fact data at the level of atomic business transactions, so it is sometimes feasible to use a
data warehouse as the longer-term history and audit repository, but this can be problematic. The following
table summarizes differences between audit database requirements and data warehouse requirements, as
well as the problems created when data warehouses are used as history and audit repositories.
(Section 6.4.3) Design Case #3: Fan Traps
Design Trap: Occurs when a relationship is improperly or incompletely identified and is therefore
represented in a way that is not consistent with the real world. The most common design trap is known as a Fan Trap: it occurs when you have one entity in two 1:M relationships to other entities, thus producing an association among the other entities that is not expressed in the model. Fan traps occur when fewer relationships are explicitly represented than matter in the real world, and the subset of relationships that is represented is not sufficient to infer the missing important relationships.
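A sketch of the classic trap and its fix (all names are assumed). With only DIVISION 1:M TEAM and DIVISION 1:M PLAYER, the model cannot say which team a player plays for; relating PLAYER directly to TEAM resolves it:

CREATE TABLE DIVISION (
  DIV_ID   NUMBER PRIMARY KEY,
  DIV_NAME VARCHAR2(50)
);

CREATE TABLE TEAM (
  TEAM_ID NUMBER PRIMARY KEY,
  DIV_ID  NUMBER REFERENCES DIVISION (DIV_ID)
);

-- Fix: PLAYER references TEAM directly; the division is reachable through TEAM
CREATE TABLE PLAYER (
  PLAYER_ID NUMBER PRIMARY KEY,
  TEAM_ID   NUMBER REFERENCES TEAM (TEAM_ID)
);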
(Section 6.4.4) Design Case #4: Redundant Relationships
Redundant relationships occur when there are multiple relationship paths between related entities. The
main concern with redundant relationships is that they remain consistent across the model. It is important to
note that some designs use redundant relationships as a way to simplify the design.
(Section 6.5) Data Modeling Checklist
This checklist helps ensure that the data modeling tasks are fulfilled successfully. The checklist is on page 212 of the text.
Some checklist for generalization-specialization not mentioned in the text:
 Verify that all attributes of the superclass are needed in all subclasses
 Verify that all common attributes of subclasses have been correctly migrated to the superclass.
 Verify that domain experts agree that the subclasses are really specializations of the superclass.
 Verify that the business rules associated with the superclass really do apply to all subclasses.
---------------------------------------------------------------------------------------------------------------------------------
Chapter 8: Advanced SQL
(Section 8.7) Procedural SQL
Persistent stored module (PSM) is a block of code containing standard SQL statements and procedural
extensions that is stored and executed at the DBMS server.
Procedural SQL (PL/SQL) is a language that makes it possible to use and store procedural code and SQL
statements within the db and to merge SQL and traditional programming constructs, such as variables,
conditional processing (IF-THEN-ELSE), basic loops (FOR and WHILE) and error trapping.
Anonymous PL/SQL block:
PL/SQL starts with a DECLARE section. Elements to know (a small example follows):
 CHAR, VARCHAR2, NUMBER, and DATE data types
 %TYPE, which inherits the data type of a table column
 WHILE loops, closed with END LOOP
 || to concatenate strings for display
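A minimal anonymous block tying these elements together; VENDOR, V_NAME, and V_CODE are hypothetical names:

DECLARE
  v_name  VENDOR.V_NAME%TYPE;   -- %TYPE inherits the column's declared data type
  v_count NUMBER := 1;
BEGIN
  SELECT V_NAME INTO v_name FROM VENDOR WHERE V_CODE = 21344;  -- hypothetical row
  WHILE v_count <= 3 LOOP
    DBMS_OUTPUT.PUT_LINE(v_name || ' pass ' || v_count);       -- || concatenates
    v_count := v_count + 1;
  END LOOP;
END;
/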
(Section 8.7.1) Triggers
Trigger is procedural SQL code that is automatically invoked by the RDBMS upon the occurrence of a
given data manipulation event.
 A trigger is invoked before or after a data row is inserted, updated, or deleted
 A trigger is associated with a db table
 Each db table may have one or more triggers
 A trigger is executed as part of the transaction that triggered it
Triggers are critical to proper db operations and management
 Triggers can be used to enforce constraints that cannot be enforced at the DBMS design and
implementation levels.
 Triggers add functionality by automating critical actions and providing appropriate warnings and
suggestions for remedial action. In fact, one of the most common uses for triggers is to facilitate
the enforcement of referential integrity.
 Triggers can be used to update table values, insert records in tables, and call other stored
procedures.
Triggers play a critical role in making the db truly useful; they also add processing power to the RDBMS
and to the db system as a whole. Oracle recommends triggers for:
 Auditing purposes creating audit logs.
 Automatic generation of derived column values
 Enforcement of business or security constraints
 Creation of replica tables for backup purposes.
Statement Level Triggers: Is assumed if you omit the FOR EACH ROW keywords. This trigger is
executed once, before or after the triggering statement is complete. This is the default case.
Row Level Trigger: Requires use of the FOR EACH ROW keywords. This type of trigger is executed
once for each row affected by the triggering statement. If you update 10 rows the trigger executes 10 times.
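A sketch of a row-level trigger that maintains a derived column value, one of the Oracle-recommended uses listed above; the PRODUCT table and its columns are assumed:

CREATE OR REPLACE TRIGGER trg_product_reorder
BEFORE INSERT OR UPDATE ON PRODUCT
FOR EACH ROW
BEGIN
  -- Automatically derive the reorder flag from quantity on hand vs. minimum
  IF :NEW.P_QOH <= :NEW.P_MIN THEN
    :NEW.P_REORDER := 1;
  ELSE
    :NEW.P_REORDER := 0;
  END IF;
END;
/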
---------------------------------------------------------------------------------------------------------------------------------
Chapter 9: Database Design
(Section 9.1) IS
Info System: performs data collection, storage, and retrieval. It also facilitates the transformation of data into info, and it allows for the management of both data and info.
System Analysis: is the process that establishes the need for and the extent of an info system.
System Development: is the process of creating an IS. Within an IS, apps transform data into info for decision making.
Every app is composed of two parts: Data and the code.
Performance of an IS depends on a triad of factors.
 DB design and Implementation
 App design and implementation
 Admin procedures
Db Development: The process of db design and implementation.
Db Design: Primary objective is to create complete, normalized, non-redundant (to the extent possible),
and fully integrated conceptual, logical, and physical db models.
(Section 9.2) System Development Life Cycle (SDLC)
SDLC: Traces the history life cycle of an IS. Provides the big picture.
It's divided into 5 phases: Planning, Analysis, Detailed Systems Design, Implementation, & Maintenance.
SDLC is an iterative process.
(Section 9.2.1) Planning
Planning: Yields a general overview of the company and its objectives.
 Should the existing system be continued?
 Should the existing system be modified?
 Should the existing system be replaced?
If the new system is necessary, the next question is whether it is feasible. The feasibility study must address
the following:
 The technical aspects of hardware and software requirements.
 The system cost.
 The operational cost.
(Section 9.2.2) Analysis
The problems defined during the planning phase are examined in greater detail during the analysis phase.
 What are the requirements of the current system‘s end users?
 Do those requirements fit into the overall info requirements?
 The analysis phase includes a thorough audit of user requirements.
 The existing hardware/software systems are also studied during the analysis phase.
 DB data modeling activities take place (e.g., DFDs and HIPO diagrams).
(Section 9.2.3) Detailed Systems Design
The designer completes the design of the system's processes. This includes screens, menus, reports, and other devices that might be used to help make the system a more efficient info generator.
(Section 9.2.4) Implementation
During this phase, the hardware, DBMS software, and app programs are installed, and the db design is
implemented. The system enters into a cycle of coding, testing, and debugging until it is ready to be
delivered.
(Section 9.2.5) Maintenance
 Corrective maintenance in response to systems errors.
 Adaptive maintenance due to changes in the business environment.
 Perfective maintenance to enhance the system.
Computer-aided systems engineering (CASE): technology such as System Architect or Visio that helps make it possible to produce better systems within a reasonable amount of time and at a reasonable cost.
(Section 9.3) The DB life cycle (DBLC)
The DBLC contains 6 phases: db initial study, db design, implementation & loading, testing & evaluation, operation, & maintenance and evolution.
(Section 9.3.1) The DB Initial Study
The purpose of the db initial study is to analyze the company situation, define problems and constraints, define objectives, and define scope & boundaries.
Analyze the company situation:
This describes the general conditions in which a company operates, its organizational structure, and its mission. These issues must be resolved:
 What is the org general operating environment, and what is its mission within that environment?
 What is the org structure?
Define Problems & Constraints:
 How does the existing system function?
 What input does the system require?
 What docs does the system generate?
 By whom and how is the system output used?
Define Objectives:
 What is the proposed system‘s initial objective?
 Will the system interface with other existing or future systems in the company?
 Will the system share the data with other systems or users?
Define Scope and Boundaries:
Scope defines the extent of the design according to operational requirements.
 Will the db design encompass the entire org, one or more departments within the org, or one or
more functions of a single department?
Boundaries: limits of the proposed system. They are external to the system.
The scope and boundaries become the factors that force the design into a specific mold, and the designer‘s
job is to design the best system possible within those constraints.
(Section 9.3.2) DB design
In the process of db design, you must concentrate on the data characteristics required to build the database model. At this point, there are two views of the data within the system: the business view of data as a source of info, and the designer's view of the data structure, its access, and the activities required to transform the data into info. Defining data is an integral part of the DBLC's second phase.
1: Conceptual Design: data modeling is used to create an abstract db structure that represents real world
objects in the most realistic way.
Four steps:
 Data analysis and requirements
 Entity relationship modeling and normalization
 Data model verification
 Distributed db design
Minimal data rule: all that is needed is there, all that is there is needed. Make sure that all data needed are
in the model and that all data in the model are needed.
Data analysis and requirements:
The first step in conceptual design is to discover the characteristics of the data elements. Designer is
focused on:
 Information needs
 Information users
 Information sources
 Information constitution such as what data elements are needed to produce the info?
The designer obtains the answers by the following:
 Developing and gathering end-user data views
 Directly observing the current system
 Interfacing with the systems design group
From a db point of view, the collection of data becomes meaningful only when business rules are defined.
Description of operations is a doc that provides a precise, up-to-date, and thoroughly reviewed description
of the activities that define an org's operating environment. To the db designer, the operating environment is both a source of data and a user of data.
Entity Relationship Modeling and Normalization:
The ER model is a communication tool as well as a design blueprint.
During the ER modeling process, the designer must:
 Define entities, attributes, pk, fk.
 Make decisions about adding new pk attributes to satisfy end-user and or processing requirements.
 Make decisions about the treatment of multi-valued attributes.
 Make decisions about adding derived attributes to satisfy processing requirements.
 Make decisions about the placement of fk in 1:1 relationships. Avoid unnecessary ternary
relationships.
 Draw the corresponding ER diagram.
 Include all data element definitions in the data dic.
 Make decisions about standard naming conventions.
Data Model Verification:
The ER model must be verified against the proposed system processes in order to corroborate that the
intended processes can be supported by the db model.
ER model verification process:
1) Identify the ER model's central entity.
2) Identify each module and its components.
3) Identify each module's transaction requirements. Internal: updates/inserts/deletes/queries/reports.
External: module interfaces.
4) Verify all processes against the ER model.
5) Make all necessary changes suggested in step 4.
6) Repeat steps 2-5 for all modules.
Module: IS component that handles a specific function, such as inventory, orders, payroll, and so on. At
the design level, a module is an ER segment that is an integrated part of the overall ER model. They speed
up development work, simplify the design work, and can be prototyped quickly. Think of this as a WBS.
The disadvantage is that modularization creates fragmentation, which creates a potential problem: the fragments might not include all of the ER model components and might not, therefore, be able to support all of the required processes. To avoid this issue, the modules must be verified against the complete ER model.
Within the central entity/module framework you must:
 A module must display high cohesivity, which describes the strength of the relationships found among the module's entities.
 Module coupling describes the extent to which modules are independent of one another. Modules
must display low coupling, indicating that they are independent of other modules.
Processes may be classified according to their:
Frequency: Daily, weekly, monthly, yearly, or exceptions.
Operational type: Insert or Add, Update or Change, Delete, queries and reports, batches, maintenance, and
backups.
2: DBMS Software Selection:
Some of the common factors affecting the purchasing decision are:
Cost:
DBMS features and tools:
Underlying model:
Portability:
DBMS hardware requirements:
3: Logical Design:
It translates the conceptual design into the internal model for a selected db management system. Therefore
the logical design is software dependent.
4: Physical Design:
Is the process of selecting the data storage and data access characteristics of the database. It becomes more complex when data are distributed at different locations, because performance is affected by the communication media's throughput.
(Section 9.3.3) Implementation and Loading
 Create db storage group. Sysadmin
 Create db within the storage group. Sysadmin
 Assign the rights to use the db to a db admin. DBA
 Create the table space within the db. DBA
 Create the table within the table space. DBA
 Assign access rights to the table spaces and to the tables within specified table spaces. DBA
 You also must address performance, security, backup and recovery, integrity, and company
standards.
Performance: DB size will affect performance. Important factors in db performance also include system
and db config parameters, such as data placement, access path definition, the use of indexes, and buffer
size.
Security: Data stored in the company db must be protected from access by unauthorized users.
 Physical Security allows only authorized personnel physical access to specific areas.
 Password Security allows the assignment of access rights to specific authorized users.
 Access rights can be established through the use of db software.
 Audit Trails Usually provided by the DBMS to check for access violations.
 Data Encryption can be used to render data useless to unauthorized users who might have
violated some of the db security layers.
 Diskless Workstations allows end users to access the db without being able to download the info
from their workstations.
Backup & Recovery:
 Full Backup or dump of the entire db.
 Differential Backup, in which only the last modifications to the db are copied. Only the objects
that have been updated since the last full backup are backed up.
Database Design and Implementation
Database Design and Implementation
Database Design and Implementation
Database Design and Implementation
Database Design and Implementation
Database Design and Implementation
Database Design and Implementation
Database Design and Implementation
Database Design and Implementation
Database Design and Implementation
Database Design and Implementation
Database Design and Implementation
Database Design and Implementation
Database Design and Implementation
Database Design and Implementation
Database Design and Implementation
Database Design and Implementation
Database Design and Implementation

Contenu connexe

Tendances

Entity Relationship design issues
Entity Relationship design issuesEntity Relationship design issues
Entity Relationship design issuesMegha Sharma
 
Dbms Concepts
Dbms ConceptsDbms Concepts
Dbms Conceptsadukkas
 
Architecture of dbms(lecture 3)
Architecture of dbms(lecture 3)Architecture of dbms(lecture 3)
Architecture of dbms(lecture 3)Ravinder Kamboj
 
1 introduction databases and database users
1 introduction databases and database users1 introduction databases and database users
1 introduction databases and database usersKumar
 
1. Introduction to DBMS
1. Introduction to DBMS1. Introduction to DBMS
1. Introduction to DBMSkoolkampus
 
Relational Database Design
Relational Database DesignRelational Database Design
Relational Database DesignArchit Saxena
 
Presentation on Database management system
Presentation on Database management systemPresentation on Database management system
Presentation on Database management systemPrerana Bhattarai
 
Database Management System, Lecture-1
Database Management System, Lecture-1Database Management System, Lecture-1
Database Management System, Lecture-1Sonia Mim
 
Introduction to DBMS and SQL Overview
Introduction to DBMS and SQL OverviewIntroduction to DBMS and SQL Overview
Introduction to DBMS and SQL OverviewPrabu U
 
Encapsulation of operations, methods & persistence
Encapsulation of operations, methods & persistenceEncapsulation of operations, methods & persistence
Encapsulation of operations, methods & persistencePrem Lamsal
 
Basic Concept Of Database Management System (DBMS) [Presentation Slide]
Basic Concept Of Database Management System (DBMS) [Presentation Slide]Basic Concept Of Database Management System (DBMS) [Presentation Slide]
Basic Concept Of Database Management System (DBMS) [Presentation Slide]Atik Israk
 

Tendances (20)

Elmasri Navathe DBMS Unit-1 ppt
Elmasri Navathe DBMS Unit-1 pptElmasri Navathe DBMS Unit-1 ppt
Elmasri Navathe DBMS Unit-1 ppt
 
Odbms concepts
Odbms conceptsOdbms concepts
Odbms concepts
 
Entity Relationship design issues
Entity Relationship design issuesEntity Relationship design issues
Entity Relationship design issues
 
Dbms Concepts
Dbms ConceptsDbms Concepts
Dbms Concepts
 
Architecture of dbms(lecture 3)
Architecture of dbms(lecture 3)Architecture of dbms(lecture 3)
Architecture of dbms(lecture 3)
 
1 introduction databases and database users
1 introduction databases and database users1 introduction databases and database users
1 introduction databases and database users
 
Ordbms
OrdbmsOrdbms
Ordbms
 
1. Introduction to DBMS
1. Introduction to DBMS1. Introduction to DBMS
1. Introduction to DBMS
 
Relational Database Design
Relational Database DesignRelational Database Design
Relational Database Design
 
Presentation on Database management system
Presentation on Database management systemPresentation on Database management system
Presentation on Database management system
 
Basic DBMS ppt
Basic DBMS pptBasic DBMS ppt
Basic DBMS ppt
 
RDBMS
RDBMSRDBMS
RDBMS
 
Conceptual Data Modeling
Conceptual Data ModelingConceptual Data Modeling
Conceptual Data Modeling
 
Chapt 1 odbms
Chapt 1 odbmsChapt 1 odbms
Chapt 1 odbms
 
Data models
Data modelsData models
Data models
 
Database Management System, Lecture-1
Database Management System, Lecture-1Database Management System, Lecture-1
Database Management System, Lecture-1
 
Introduction to DBMS and SQL Overview
Introduction to DBMS and SQL OverviewIntroduction to DBMS and SQL Overview
Introduction to DBMS and SQL Overview
 
Type constructor
Type constructorType constructor
Type constructor
 
Encapsulation of operations, methods & persistence
Encapsulation of operations, methods & persistenceEncapsulation of operations, methods & persistence
Encapsulation of operations, methods & persistence
 
Basic Concept Of Database Management System (DBMS) [Presentation Slide]
Basic Concept Of Database Management System (DBMS) [Presentation Slide]Basic Concept Of Database Management System (DBMS) [Presentation Slide]
Basic Concept Of Database Management System (DBMS) [Presentation Slide]
 

En vedette

Database design, implementation, and management -chapter02
Database design, implementation, and management -chapter02Database design, implementation, and management -chapter02
Database design, implementation, and management -chapter02Beni Krisbiantoro
 
Database design process
Database design processDatabase design process
Database design processTayyab Hameed
 
Database design, implementation, and management -chapter04
Database design, implementation, and management -chapter04Database design, implementation, and management -chapter04
Database design, implementation, and management -chapter04Beni Krisbiantoro
 
Database Design Process
Database Design ProcessDatabase Design Process
Database Design Processmussawir20
 
Database Design Slide 1
Database Design Slide 1Database Design Slide 1
Database Design Slide 1ahfiki
 
Bsc cs ii-dbms- u-i-database systems
Bsc cs ii-dbms- u-i-database systemsBsc cs ii-dbms- u-i-database systems
Bsc cs ii-dbms- u-i-database systemsRai University
 
Chapter02 succeeding as a systems analyst
Chapter02 succeeding as a systems analystChapter02 succeeding as a systems analyst
Chapter02 succeeding as a systems analystDhani Ahmad
 
Chapter06 initiating and planning systems development projects
Chapter06 initiating and planning systems development projectsChapter06 initiating and planning systems development projects
Chapter06 initiating and planning systems development projectsDhani Ahmad
 
Database systems
Database systemsDatabase systems
Database systemsDhani Ahmad
 
TID Chapter 10 Introduction To Database
TID Chapter 10 Introduction To DatabaseTID Chapter 10 Introduction To Database
TID Chapter 10 Introduction To DatabaseWanBK Leo
 
Strategic planning
Strategic planningStrategic planning
Strategic planningDhani Ahmad
 
Types of islamic institutions and records
Types of islamic institutions and recordsTypes of islamic institutions and records
Types of islamic institutions and recordsDhani Ahmad
 
Introduction to database
Introduction to databaseIntroduction to database
Introduction to databaselubna19
 
Information system
Information systemInformation system
Information systemDhani Ahmad
 

En vedette (20)

Database design, implementation, and management -chapter02
Database design, implementation, and management -chapter02Database design, implementation, and management -chapter02
Database design, implementation, and management -chapter02
 
Database - Design & Implementation - 1
Database - Design & Implementation - 1Database - Design & Implementation - 1
Database - Design & Implementation - 1
 
Database design
Database designDatabase design
Database design
 
Database design process
Database design processDatabase design process
Database design process
 
Database design, implementation, and management -chapter04
Database design, implementation, and management -chapter04Database design, implementation, and management -chapter04
Database design, implementation, and management -chapter04
 
Database Design Process
Database Design ProcessDatabase Design Process
Database Design Process
 
Database Proposal
Database ProposalDatabase Proposal
Database Proposal
 
Database Design Slide 1
Database Design Slide 1Database Design Slide 1
Database Design Slide 1
 
Database
DatabaseDatabase
Database
 
Bsc cs ii-dbms- u-i-database systems
Bsc cs ii-dbms- u-i-database systemsBsc cs ii-dbms- u-i-database systems
Bsc cs ii-dbms- u-i-database systems
 
database
databasedatabase
database
 
Chapter02 succeeding as a systems analyst
Chapter02 succeeding as a systems analystChapter02 succeeding as a systems analyst
Chapter02 succeeding as a systems analyst
 
Chapter01 1
Chapter01 1Chapter01 1
Chapter01 1
 
Chapter06 initiating and planning systems development projects
Chapter06 initiating and planning systems development projectsChapter06 initiating and planning systems development projects
Chapter06 initiating and planning systems development projects
 
Database systems
Database systemsDatabase systems
Database systems
 
TID Chapter 10 Introduction To Database
TID Chapter 10 Introduction To DatabaseTID Chapter 10 Introduction To Database
TID Chapter 10 Introduction To Database
 
Strategic planning
Strategic planningStrategic planning
Strategic planning
 
Types of islamic institutions and records
Types of islamic institutions and recordsTypes of islamic institutions and records
Types of islamic institutions and records
 
Introduction to database
Introduction to databaseIntroduction to database
Introduction to database
 
Information system
Information systemInformation system
Information system
 

Similaire à Database Design and Implementation

Chap1-Introduction to database systems.ppt
Chap1-Introduction to database systems.pptChap1-Introduction to database systems.ppt
Chap1-Introduction to database systems.pptLisaMalar
 
D I T211 Chapter 1
D I T211    Chapter 1D I T211    Chapter 1
D I T211 Chapter 1askme
 
D I T211 Chapter 1 1
D I T211    Chapter 1 1D I T211    Chapter 1 1
D I T211 Chapter 1 1askme
 
Data base management system
Data base management systemData base management system
Data base management systemSuneel Dogra
 
File systems versus a dbms
File systems versus a dbmsFile systems versus a dbms
File systems versus a dbmsRituBhargava7
 
Chapter 1 Database Systems.pptx
Chapter 1 Database Systems.pptxChapter 1 Database Systems.pptx
Chapter 1 Database Systems.pptxMaxamedAbiib1
 
Database management system
Database management systemDatabase management system
Database management systemRizwanHafeez
 
DBMS-INTRODUCTION.pptx
DBMS-INTRODUCTION.pptxDBMS-INTRODUCTION.pptx
DBMS-INTRODUCTION.pptxDivyaKS12
 
Mis chapter 4 database management - copy
Mis chapter 4   database management - copyMis chapter 4   database management - copy
Mis chapter 4 database management - copyAjay Khot
 
Database Systems
Database SystemsDatabase Systems
Database SystemsUsman Tariq
 

Similaire à Database Design and Implementation (20)

Chap1-Introduction to database systems.ppt
Chap1-Introduction to database systems.pptChap1-Introduction to database systems.ppt
Chap1-Introduction to database systems.ppt
 
D I T211 Chapter 1
D I T211    Chapter 1D I T211    Chapter 1
D I T211 Chapter 1
 
D I T211 Chapter 1 1
D I T211    Chapter 1 1D I T211    Chapter 1 1
D I T211 Chapter 1 1
 
Dbms mca-section a
Dbms mca-section aDbms mca-section a
Dbms mca-section a
 
Data base management system
Data base management systemData base management system
Data base management system
 
DataMgt - UNIT-I .PPT
DataMgt - UNIT-I .PPTDataMgt - UNIT-I .PPT
DataMgt - UNIT-I .PPT
 
File systems versus a dbms
File systems versus a dbmsFile systems versus a dbms
File systems versus a dbms
 
Chapter 1 Database Systems.pptx
Chapter 1 Database Systems.pptxChapter 1 Database Systems.pptx
Chapter 1 Database Systems.pptx
 
Database management system
Database management systemDatabase management system
Database management system
 
DBMS and its Models
DBMS and its ModelsDBMS and its Models
DBMS and its Models
 
Dbms models
Dbms modelsDbms models
Dbms models
 
DBMS-INTRODUCTION.pptx
DBMS-INTRODUCTION.pptxDBMS-INTRODUCTION.pptx
DBMS-INTRODUCTION.pptx
 
Mis chapter 4 database management - copy
Mis chapter 4   database management - copyMis chapter 4   database management - copy
Mis chapter 4 database management - copy
 
Database Systems
Database SystemsDatabase Systems
Database Systems
 
Dbms
DbmsDbms
Dbms
 
Database Management System 1
Database Management System 1Database Management System 1
Database Management System 1
 
Dbms Useful PPT
Dbms Useful PPTDbms Useful PPT
Dbms Useful PPT
 
Mis chapter 7 database systems
Mis chapter 7 database systemsMis chapter 7 database systems
Mis chapter 7 database systems
 
Assign 1
Assign 1Assign 1
Assign 1
 
Database
DatabaseDatabase
Database
 

Plus de Christian Reina

Focusing-on-Cost-Management
Focusing-on-Cost-ManagementFocusing-on-Cost-Management
Focusing-on-Cost-ManagementChristian Reina
 
Business-Models-Competitive-Strategies
Business-Models-Competitive-StrategiesBusiness-Models-Competitive-Strategies
Business-Models-Competitive-StrategiesChristian Reina
 
Business Data Communications and Networks
Business Data Communications and NetworksBusiness Data Communications and Networks
Business Data Communications and NetworksChristian Reina
 
IT Strategy and Management
IT Strategy and ManagementIT Strategy and Management
IT Strategy and ManagementChristian Reina
 
Information Systems Analysis and Design
Information Systems Analysis and DesignInformation Systems Analysis and Design
Information Systems Analysis and DesignChristian Reina
 

Plus de Christian Reina (7)

Focusing-on-Cost-Management
Focusing-on-Cost-ManagementFocusing-on-Cost-Management
Focusing-on-Cost-Management
 
PMP-Processes
PMP-ProcessesPMP-Processes
PMP-Processes
 
Cloud-Computing
Cloud-ComputingCloud-Computing
Cloud-Computing
 
Business-Models-Competitive-Strategies
Business-Models-Competitive-StrategiesBusiness-Models-Competitive-Strategies
Business-Models-Competitive-Strategies
 
Business Data Communications and Networks
Business Data Communications and NetworksBusiness Data Communications and Networks
Business Data Communications and Networks
 
IT Strategy and Management
IT Strategy and ManagementIT Strategy and Management
IT Strategy and Management
 
Information Systems Analysis and Design
Information Systems Analysis and DesignInformation Systems Analysis and Design
Information Systems Analysis and Design
 

Database Design and Implementation

  • 1. Christian Reina, CISSP, CISA 2010 MET CS669 Version 1.0
  • 2. Table of Contents Chapter 1: DB Systems....................................................................................................... 3 Chapter 2: Data Models ...................................................................................................... 7 Chapter 3: Relational DB Model ...................................................................................... 11 Chapter 4: Entity Relationship (ER) Modeling ................................................................ 14 Chapter 5: Normalization of DB tables ............................................................................ 18 Chapter 6: Advanced Data Modeling ............................................................................... 21 Chapter 8: Advanced SQL................................................................................................ 26 Chapter 9: Database Design.............................................................................................. 27 Chapter 10: Database Design............................................................................................ 32 Chapter 11 DB Performance Tuning & Query Optimization........................................... 37 Chapter 12: Distributed database management system .................................................... 42 Chapter 13: Business Intelligence and Data Warehouses................................................. 47
  • 3. Chapter 1: DB Systems (Section 1.1) Data vs. Info Data: Raw Facts, constitutes the building block of information. Information: The result of processing raw data to reveal its meaning. Knowledge: The body of info and facts about a specific subject. Key characteristic is new knowledge can be derived from old knowledge. Data Management: A discipline that focuses on the proper generation, storage, and retrieval of data. (Section 1.2) Introducing the Database and the DBMS Database: A shared, integrated computer structure that stores a collection of:  End-user data that is raw facts of interest to the end user.  Metadata or data about data, through which the end-user data are integrated and managed. Example, the metadata component stores info such as the name of each data element, the type of values (numeric, dates or text) stored on each data element, whether or not the data element can be left empty, and so on.  Collection of self-describing data:  A well-designed db facilitates data management and becomes a valuable info generator. A poorly designed db is likely to lead to errors in processing data and to bad decisions. The most popular way to classify db is by the use and timeliness of the data: Production DB: Contains up-to-the-minute real world info Data warehouse: stores data for making decisions (Section 1.2.1) Role and advantages of the DBMS Database Management System (DBMS): Collection of programs that manages the database structure and controls access to the data stored in the database. Helps manage the cabinet‘s contents. Advantages are improved data sharing, data security, and data integration, minimized data inconsistency, improved data access, improved decision making, and increased end-user productivity. Query: A specific request issued to the DBMS for data manipulation to read or update the data. A query is a question. Ad hoc query: is a spur of the moment questions. Query result set: is when the DBMS sends back an answer to the application. (Section 1.2.2) Types of databases DBMS can support many different types of databases: (Number of users) Single-user database: supports only one user at a time. Desktop Database: A single user database that runs on a personal computer. Multi user database: supports multiple users at the same time. Workgroup Database: Multi user database supports a relatively small number of users (usually fewer than 50) or a specific department within an org. Enterprise Database: Database is used by the entire organization and supports many users across many departments. Usually in the hundreds. (Db site location)
Centralized Database: Supports data located at a single site.
Distributed Database: Supports data distributed across several different sites.
(Db use)
Operational (or Transactional or Production) Database: A db designed primarily to support a company's day-to-day operations.
Data Warehouse: Focuses primarily on storing data used to generate information required to make tactical or strategic decisions. Data warehouses derive most of their data from production dbs.
Unstructured Data: Data that exist in their original, raw state.
Structured Data: The result of taking unstructured data and formatting or structuring it to facilitate storage, use, and the generation of info.
Semi-structured Data: Data that have already been processed to some extent or prearranged.
XML: A language used to represent and manipulate data elements in a textual format.
(Section 1.3) Why Database Design is Important
DB Design: Refers to the activities that focus on the design of the database structure that will be used to store and manage end-user data. A well-designed database facilitates data management and generates accurate and valuable info. A poorly designed db is likely to become a breeding ground for redundant data and data anomalies.
(Section 1.4) Historical Roots: Files and File Systems
 An understanding of the relatively simple characteristics of file systems makes the complexity of database design easier to understand.
 An awareness of the problems that plagued file systems can help you avoid those same pitfalls with DBMS software.
 If you intend to convert an obsolete file system to a database system, knowledge of the file system's basic limitations will be useful.
Data Processing (DP) Specialists: Created the necessary computer file structures, often wrote the software that managed the data within those structures, and designed the app programs that produced reports based on the file data. As files grew, more DP specialists were hired, and the original DP specialist typically became the DP manager.
(Section 1.5) Problems with File Systems and Data Management
Making changes to an existing structure can be difficult in a file system environment:
1) Read a record from the original file.
2) Transform the original data to conform to the new structure's storage requirements.
3) Write the transformed data into the new file structure.
4) Repeat steps 1 to 3 for each record in the original file.
 It requires extensive programming for pulling, deleting, and updating records.
 It cannot perform ad hoc queries.
 System admin can become complex and difficult as records or files expand.
 It is difficult to make changes to existing structures.
 Security features are likely to be inadequate.
 Each file typically requires its own set of data management programs.
 Many files suffer from data redundancy, leading to inconsistencies, anomalies, and lack of data integrity.
 A mature file-based data system might require hundreds of thousands of programs.
These limitations lead to problems of structural and data dependency.
(Section 1.5.1) Structural and Data Dependence
Structural Dependence: A file system exhibits structural dependence, which means that access to a file is dependent on its structure. For example, adding a customer DOB to the Customer file would require the 4 steps described in Section 1.5.
Structural Independence: Exists when it is possible to make changes in the file structure without affecting the app program's ability to access the data.
Data Dependence: Exists when data access programs are subject to change when any of the file's data storage characteristics change (that is, changing the data type). It makes the file system cumbersome.
Data Independence: Exists when it is possible to make changes in the data storage characteristics without affecting the app program's ability to access the data.
Logical Data Format: How the human views the data.
Physical Data Format: How the computer must work with the data.
Any program that accesses a file system's file must tell the computer what to do and how to do it.
(Section 1.5.2) Field Definitions and Naming Conventions
Be descriptive in the field names, but be aware of DBMS character-length restrictions. For example, REN should be CUS_RENEW_DATE.
(Section 1.5.3) Data Redundancy
Islands of Info: They contain different versions of the same data. It's the storage of the same basic data in different locations.
Redundant Data: A source of difficult-to-trace info errors. It's when the same data about the same entity are kept in different locations, which can result in the storage of different values for the same attribute of the same entity.
Data Redundancy: Exists when duplicated data are stored unnecessarily at different places. Redundancies are the result of a poorly designed db and can lead to poor decision making.
Data Integrity: The condition in which all of the data in the db are consistent with real-world events and conditions. In other words, data integrity means that:
 Data are accurate: there are no data inconsistencies.
 Data are verifiable: the data will always yield consistent results.
Data Anomaly: Develops when not all of the required changes to redundant data are made successfully.
(Section 1.6) Db Systems
The DBMS provides numerous advantages over file system management by making it possible to eliminate most of the file system's data inconsistency, data anomaly, data dependency, and structural dependency problems.
(Section 1.6.1) The Db System Environment
Database System: An organization of components that define and regulate the collection, storage, management, and use of data within a database environment. It's composed of 5 major parts: Hardware, Software, People, Procedures, and Data.
(Jobs in the db field)
 DB Admin: Focused on individual dbs and DBMSs; requires strong technical skills in specific DBMSs.
 Data Admin: Plans for db and technology, sets standards for data (privacy & risk of loss), works with computerized and non-computerized dbs.
 DB Modeler/Analyst/Designer/Programmer: Responsible for the design & implementation of the db and the app systems that interface with a DBMS. The Modeler's primary responsibility is gathering the data requirements and representing them in the data model. The Designer may participate in the modeling and translates the model into an operational db, often with the assistance of system and storage admins. App Analysts gather, document, and coordinate the app and user requirements. Programmers write the software apps, based on the application and data requirements.
(Section 1.6.2) DBMS Functions
 Data dictionary management: The DBMS stores data elements & their relationships (metadata) in a data dictionary. The DBMS uses the data dictionary to look up the required data component structures and relationships, thus relieving you from having to code such complex relationships in each program. Any changes made in a db structure are automatically recorded in the data dictionary.
 Data storage management: The DBMS provides storage not only for the data but also for related data entry forms or screen definitions, report definitions, data validation rules, procedural code, and structures to handle video and picture formats. It's also important for Performance Tuning, which relates to the activities that make the db perform more efficiently in terms of storage and access speed. (DBMS creates the complex structures required for data storage.)
 Data transformation & presentation: The DBMS formats the physically retrieved data to make it conform to the user's logical expectations. (DBMS transforms entered data to conform to the data structures.)
 Security management: The DBMS creates a security system that enforces user security and data privacy. Security rules determine which users can access the db, which data items each user can access, and which data operations (read, add, delete, or modify) the user can perform. This is especially important in multiuser mode. (DBMS creates a security system and enforces security within that system.)
 Multiuser access control: To provide data integrity and data consistency, the DBMS uses sophisticated algorithms to ensure that multiple users can access the db concurrently without compromising the integrity of the db. (DBMS allows multiple users to have concurrent access to the data.)
 Backup & recovery: The DBMS provides backup and data recovery to ensure data safety and integrity. Current DBMSs provide special utilities that allow the DBA to perform routine and special backup and restore procedures. (DBMS performs backup and data recovery procedures to ensure data safety.)
 Data integrity management: The DBMS promotes and enforces integrity rules, thus minimizing data redundancy and maximizing data consistency. The data relationships stored in the data dictionary are used to enforce data integrity. Ensuring data integrity is especially important in transaction-oriented db systems. (DBMS promotes and enforces integrity rules to eliminate data integrity problems.)
 Db access languages & application programming interfaces: The DBMS provides data access through a query language. A Query Language is a nonprocedural language that lets the user specify what must be done without having to specify how it is to be done. Structured Query Language (SQL) is the de facto query language and data access standard supported by the majority of DBMS vendors. (DBMS provides access to the data via utility programs and programming language interfaces.)
 Db communication interfaces: The DBMS accepts end-user requests via multiple, different network environments. For example, a DBMS might provide access to the database via the Internet through the use of Firefox or IE, automatically publish predefined reports on a Web site, or connect to third-party systems to distribute info via email or other apps. (DBMS provides access to data within a computer network environment.)
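To make the "nonprocedural" point concrete, here is a minimal sketch of a SQL query, assuming a hypothetical CUSTOMER table with the columns shown. The query states only what data are wanted; the DBMS decides how to retrieve them.

-- Declarative request: list Tennessee customers by last name.
-- The DBMS chooses the access path (index, table scan, etc.).
-- Table and column names here are hypothetical.
SELECT CUS_LNAME, CUS_PHONE
FROM   CUSTOMER
WHERE  CUS_STATE = 'TN'
ORDER  BY CUS_LNAME;

Nothing in the query says which file to open or how to loop over records, which is exactly the burden the old file systems placed on programmers.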
(Section 1.6.3) Managing the Db System: A Shift in Focus
Db systems' significant disadvantages:
 Increased costs
 Management complexity
 Maintaining currency
 Vendor dependence
 Frequent upgrade/replacement cycles
---------------------------------------------------------------------------------------------------------------------------------
Chapter 2: Data Models
(Section 2.1) Data Modeling and Data Models
Data Modeling: The first step in db design; refers to the process of creating a specific data model for a determined problem domain. This is an iterative process.
Problem Domain: A clearly defined area within the real-world environment.
Data Model: A collection of concepts that can be used to describe the structure of a db. Its main function is to help us understand the complexities of the real-world environment. It facilitates communication between users, db designers, and app programmers. There are 3 categories of data models:
 High-level or conceptual data models, which are based on entities (objects) and relationships.
 Low-level or physical data models, which are specific to a particular DBMS such as Oracle.
 Representational or implementation data models, which are also termed logical data models.
(Section 2.2) The Importance of Data Models
When a good db blueprint is not available, problems are likely to happen. Data models are like blueprints, and they are an abstraction.
(Section 2.3) Data Model Basic Building Blocks
The basic building blocks of all data models are entities, attributes, and relationships.
 Entity: Anything, such as a person, place, thing, idea, or event, about which data are to be collected and stored. Entities can be physical, such as customers or products, or abstractions, such as flight routes or accounts.
 Attributes: Equivalent to fields; they describe an entity such as CUSTOMER. They can be Fname, Lname, etc.
 Relationship: Describes an association among two or more entities. For example, "An AGENT can serve many CUSTOMERS, and each CUSTOMER may be served by one AGENT." Data models use 3 types of relationships:
o One-to-many (1:M): A painter paints many different paintings, but each one is painted by only one painter.
o Many-to-many (M:N): A student can take many classes, and each class can be taken by many students.
o One-to-one (1:1): Each store employee manages only one store.
Entity-Relationship Model (ERM): Helps identify the db's main entities and their relationships. They are graphically represented so they are more easily understood by users and designers.
Entity-Relationship Diagram (ERD): Chen model and Crow's Foot model.
Constraints: Restrictions placed on the data. They help to ensure data integrity.
(Section 2.4) Business Rules
Business Rule: A brief, precise, and unambiguous description of a policy, procedure, or principle within a specific org. Business rules are used to define entities, attributes, relationships, and constraints.
(Section 2.4.1) Discovering Business Rules
The process of identifying and documenting business rules is essential to db design for several reasons:
 They help standardize the company's view of data.
 They can be a communication tool between users and designers.
 They allow the designer to understand the nature, role, and scope of the data.
 They allow the designer to understand business processes.
 They allow the designer to develop appropriate relationship participation rules and constraints and to create an accurate data model.
(Section 2.4.2) Translating Business Rules into Data Model Components
A noun in a business rule translates into an entity in the model, and a verb (active or passive) associating nouns translates into a relationship among the entities. For example, "a customer may generate many invoices" contains two nouns (customer & invoices) and a verb ("generate") that associates the nouns. To identify the relationship type, you should ask two questions:
 How many instances of B are related to one instance of A?
 How many instances of A are related to one instance of B?
(Section 2.5) The Evolution of Data Models
Evolution of major data models: Table 2.1, page 35.
(Section 2.5.1) The Hierarchical Model
Hierarchical Model: Developed in the 1960s to manage large amounts of data for complex manufacturing projects, such as the Apollo rocket that landed on the moon in 1969. Its logical structure is depicted by an upside-down tree. Disadvantages: too complex to implement, difficult to manage, and lacking structural independence; there were also no standards for how to implement the model. It is a record-based model.
Segment: The equivalent of a file system's record type.
(Section 2.5.2) The Network Model
Network Model: Created to represent complex data relationships more effectively than the hierarchical model, to improve db performance, and to impose a db standard. Its disadvantages were limited data independence and a lack of ad hoc query capability. It belongs to the record-based model class.
To help establish db standards, the Conference on Data Systems Languages (CODASYL) created the Database Task Group (DBTG) in the late 1960s. The DBTG report contained specifications for 3 crucial db components:
 Schema: Includes a definition of the db name, the record type for each record, and the components that make up those records.
 Subschema: The existence of subschema definitions allows all app programs to simply invoke the subschema required to access the appropriate db files.
 Data Management Language (DML): Defines the environment in which data can be managed.
To produce the desired standardization for each of the three components, the DBTG specified three distinct data management language components:
o Schema data definition language (DDL), which enables the db admin to define the schema components.
o Subschema DDL, which allows the app program to define the db components that will be used by the app.
o Data manipulation language, to work with the data in the db.
The network model allows a record to have more than one parent, unlike the hierarchical model. In network db terminology, a relationship is called a Set, and each Set is composed of at least 2 record types.
(Section 2.5.3) The Relational Model
Relational Model: Introduced in 1970 by E.F. Codd of IBM. You can think of a Relation (or Table) as a matrix composed of intersecting rows and columns. Each row in a relation is called a Tuple. Relational models are record-based models.
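As a minimal sketch of how the business rule from Section 2.4.2 ("a customer may generate many invoices") lands in the relational model, consider the two hypothetical tables below. The table and column names are illustrative, not from the text.

-- Two relations linked by a foreign key implement the 1:M rule:
-- one customer, many invoices.
CREATE TABLE CUSTOMER (
    CUS_CODE  INTEGER PRIMARY KEY,
    CUS_LNAME VARCHAR(25) NOT NULL
);

CREATE TABLE INVOICE (
    INV_NUMBER INTEGER PRIMARY KEY,
    INV_DATE   DATE NOT NULL,
    CUS_CODE   INTEGER NOT NULL REFERENCES CUSTOMER (CUS_CODE)
);

Each INVOICE row (tuple) carries the CUS_CODE of the customer who generated it, so the nouns became tables and the verb became the FK link between them.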
Relational Db Management System (RDBMS): Performs the same basic functions provided by the hierarchical and network DBMSs, plus other functions that make the relational data model easier to understand and implement. Its disadvantage was that it did not allow db structures to be examined graphically.
 Its most important advantage is its ability to hide the complexities of the relational model from the user. The RDBMS manages all the physical details; the user sees the relational db as a collection of tables in which data are stored.
 The RDBMS uses SQL to translate user queries into instructions for retrieving the requested data.
There is one crucial difference between a table and a file: the table yields complete data and structural independence because it is a purely logical structure.
Any SQL-based relational db app involves 3 parts:
 End-user interface: Allows the end user to interact with the data.
 Tables: Independent of each other; they hold the data.
 SQL engine: Tells what must be done but not how it must be done. It does all the work in the background, such as executing queries or data requests.
(Section 2.5.4) The Entity-Relationship Model
ERM: Peter Chen introduced the ERM in 1976. It's the graphical representation of entities and their relationships in a db structure. The ERM is represented in an ERD. ER models are object-based models. The ER model is based on the following components:
 Entity: Each row in the relational table is known as an entity instance or entity occurrence in the ER model. Each entity is described by a set of attributes that describes particular characteristics of the entity.
 Relationships: They describe associations among data. Most relationships describe associations between two entities. The ER model uses the term connectivity to label the relationship types. The name of the relationship is an active or a passive verb.
(Section 2.5.5) The Object-Oriented (OO) Model
OODM: Both data and their relationships are contained in a single structure known as an object. In turn, the OODM is the basis for the OODBMS. Unlike an entity, an object includes info about relationships between the facts within the object, as well as info about its relationships with other objects. The OODM is said to be a Semantic Data Model because semantic indicates meaning. The OODM is based on the following components:
 An object is an abstraction of a real-world entity.
 Attributes describe the properties of an object.
 Objects that share similar characteristics are grouped in classes. A class is a collection of similar objects with a shared structure (attributes) and behavior (methods). Methods represent real-world actions, such as finding a selected PERSON's name, changing a PERSON's name, or printing a PERSON's address. Methods are like procedures. In OO, methods are defined as behaviors.
 Classes are organized in a class hierarchy, which resembles an upside-down tree where each class has only one parent.
 Inheritance is the ability of an object within the class hierarchy to inherit the attributes and methods of the classes above it.
 Its disadvantage is a steeper learning curve.
(Section 2.5.6) The Convergence of Data Models
Extended Relational Data Model (ERDM): It's semantic and is described as an object/relational db management system (O/RDBMS). It's primarily geared to business apps, while the OODM tends to focus on very specialized engineering and scientific apps.
The traditional entity-relationship model and the most important features of object-oriented models have been combined in the extended (or enhanced) entity-relationship model (EERM).
(Section 2.5.7) Database Models and the Internet
(Section 2.5.8) Data Models: Summary
Advantages and disadvantages of db models depicted on page 47. Data model basic terminology comparison on page 48.
(Section 2.6) Degrees of Data Abstraction
A db designer starts with an abstract view of the overall data environment and adds details as the design comes closer to implementation. The design of a db can be divided into four models with decreasing levels of abstraction: External, Conceptual, Internal, & Physical.
American National Standards Institute (ANSI): Its Standards Planning and Requirements Committee (SPARC) defined a framework for data modeling based on degrees of data abstraction. It defines three levels of data abstraction: External, Conceptual, and Internal.
(Section 2.6.1) The External Model
External Model: The end users' view of the data environment.
External Schema: A specific representation of an external view.
External views' advantages:
 It makes it easy to identify the specific data required to support each business unit's ops.
 It makes the designer's job easy by providing feedback about the model's adequacy.
 It helps to ensure security constraints in the db design.
 It makes app program development much simpler.
(Section 2.6.2) The Conceptual Model
Conceptual Model: Represents a global view of the entire db as viewed by the entire org. That is, the conceptual model integrates all external views (entities, relationships, constraints, and processes) into a single global view of the entire data in the enterprise. Also known as the Conceptual Schema, as it is the basis for the identification and high-level description of the main data objects. The most widely used conceptual model is the ER model, which is the basic db blueprint. The ERD is used to graphically represent the conceptual schema.
Advantages of conceptual models:
 It provides a relatively easily understood bird's-eye view of the data environment.
 It is independent of both software and hardware.
Software Independence: The model does not depend on the DBMS software used to implement the model.
Hardware Independence: The model does not depend on the hardware used in the implementation of the model.
Logical Design: Refers to the task of creating a conceptual data model that could be implemented in any DBMS.
(Section 2.6.3) The Internal Model
Internal Model: The representation of the db as "seen" by the DBMS. It requires the designer to match the conceptual model's characteristics and constraints to those of the selected implementation model. The internal model is software-dependent but hardware-independent, because it is unaffected by the choice of the computer on which the software is installed.
Internal Schema: Depicts a specific representation of an internal model, using the db constructs supported by the chosen db.
Logical Independence: When you can change the internal model without affecting the conceptual model.
(Section 2.6.4) The Physical Model
Physical Model: Operates at the lowest level of abstraction, describing the way data are saved on storage media. It is both software- and hardware-dependent. The physical model is dependent on the DBMS.
Physical Independence: When you can change the physical model without affecting the internal model.
Summary on page 52.
---------------------------------------------------------------------------------------------------------------------------------
Chapter 3: Relational DB Model
(Section 3.1) A Logical View of Data
The relational data model allows the designer to focus on the logical representation of the data and its relationships, rather than on the physical storage details. Like an automatic transmission, it hides the underlying mechanics. The relational db provides the advantages of structural and data independence. The relational model was introduced by Ted Codd of IBM Research in 1970. It is record-based.
(Section 3.1.1) Tables and Their Characteristics
1) A table is perceived as a 2-dimensional structure of rows and columns.
2) Each table row represents a single entity occurrence within the entity set.
3) Each table column represents an attribute, and each column has a distinct name.
4) Each row/column intersection represents a single data value.
5) All values in a column must conform to the same data format.
6) Each column has a specific range of values known as the attribute domain.
7) The order of the rows and columns is immaterial to the DBMS.
8) Each table must have an attribute or a combination of attributes that uniquely ids each row.
Most DBMSs support the following data types:
Numeric: Anything concerned with arithmetic.
Character: Text data or string data.
Date: Date attributes contain calendar dates stored in a special format known as the Julian date format. It allows you to do a special kind of arithmetic known as Julian date arithmetic.
Logical: Logical data can have only a true or false (yes or no) condition.
Domain: The column's range of permissible values.
Primary Key (PK): Each table must have one. A PK is an attribute (or combination of attributes) that uniquely ids any given row. It's a unique identifier: no duplicate values are allowed, and a PK generally cannot be changed.
(Section 3.2) Keys
Keys are important in a relational model because they are used to ensure that each row in a table is uniquely identifiable. They are used to establish relationships among tables and to ensure the integrity of the data. A key consists of one or more attributes that determine other attributes. Keys are used to id specific occurrences of entities within an entity group.
Determination: A key's role is based upon determination. If A determines B, C, and D, then A → B, C, D. For example, STU_NUM determines STU_LNAME. This principle is important because it is used in the definition of a central relational db concept known as functional dependence.
Functional Dependence: The attribute B is functionally dependent on the attribute A if each value in column A determines one and only one value in column B.
Composite Key: A multi-attribute key, composed of more than one attribute.
Key Attribute: Any attribute that is part of a key.
Full Functional Dependence: If attribute B is functionally dependent on a composite key A but not on any subset of that composite key, then attribute B is fully functionally dependent on A.
Superkey: Any key that uniquely ids each row. It functionally determines all of a row's attributes.
Candidate Key: Can be described as a superkey without unnecessary attributes, that is, a minimal superkey. If a table had both STU_SSN and STU_NUM, both would be candidate keys because either one would uniquely id each student.
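A minimal sketch of these key concepts, using a hypothetical STUDENT table (names and columns are illustrative): STU_NUM is chosen as the PK, and STU_SSN remains a candidate key, enforced with a UNIQUE constraint.

-- STU_NUM is the primary key: unique and never null.
-- STU_SSN is the other candidate key; UNIQUE enforces its minimality claim.
-- STU_LNAME is a nonkey attribute, functionally dependent on STU_NUM.
CREATE TABLE STUDENT (
    STU_NUM   INTEGER PRIMARY KEY,
    STU_SSN   CHAR(9) NOT NULL UNIQUE,
    STU_LNAME VARCHAR(25) NOT NULL
);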
Entity Integrity: When all the rows in a table can be uniquely identified by a primary key. To maintain entity integrity, a null (that is, no data entry at all) is not permitted in the primary key.
Null: No value at all. It does not mean 0 or a space. A null is created when you press the Enter key or the Tab key to move to the next entry without making a prior entry of any kind. Nulls can never be part of a primary key, and they should be avoided in other attributes. The existence of nulls in a table is often an indication of poor db design. A null can represent an unknown attribute value; a known, but missing, attribute value; or a "not applicable" condition.
Relational Schema: A textual representation of the database tables, where each table is listed by its name followed by the list of its attributes in parentheses.
Foreign Key (FK): An attribute or combination of attributes in one table whose values match the primary key values in the related table or are null. For example, VEND_CODE is the primary key in the VENDOR table, and it occurs as an FK in the PRODUCT table. You can logically relate data from multiple tables using FKs. FKs are based on data values and are purely logical, not physical, pointers. An FK value must match an existing PK value or unique key value, or else be null.
Referential Integrity: If the FK contains a value, that value refers to an existing, valid tuple (row) in another relation. For example, referential integrity is maintained between the PRODUCT and VENDOR tables. To maintain referential integrity, the FK must contain only values found in the related table, or null values to indicate that the rows are not linked.
Secondary Key: Used strictly for data retrieval purposes. It's not the customer number, but it can be a combination of attributes such as customer phone and last name, and it's not always entirely unique. It's another way of narrowing down a search when you don't know the unique customer number.
(Section 3.3) Integrity Rules
Entity Integrity:
 Requirement: All primary key entries are unique, and no part of a primary key may be null.
 Purpose: Each row will have a unique id, and foreign key values can properly reference primary key values.
 Example: No invoice can have a duplicate number, nor can it be null. In short, all invoices are uniquely identified by their invoice number.
Referential Integrity:
 Requirement: A foreign key may have either a null entry, as long as it is not a part of its table's primary key, or an entry that matches a primary key value in the table to which it is related. Every non-null foreign key value must reference an existing primary key value.
 Purpose: It is possible for an attribute not to have a corresponding value, but it is impossible to have an invalid entry. The enforcement of the referential integrity rule makes it impossible to delete a row in one table whose primary key has mandatory matching foreign key values in another table.
 Example: A customer might not have an assigned sales rep (number), but it will be impossible to have an invalid sales rep (number).
Flags are used to indicate the absence of some value. They are a trick to avoid using nulls.
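The sales-rep example above can be sketched in SQL as follows; the tables and columns are hypothetical, and the point is only how the constraints encode the two integrity rules.

-- Entity integrity: both PRIMARY KEY columns are unique and non-null.
-- Referential integrity: REP_NUM in CUSTOMER may be null (no rep
-- assigned), but a non-null value must match an existing SALES_REP row.
CREATE TABLE SALES_REP (
    REP_NUM INTEGER PRIMARY KEY
);

CREATE TABLE CUSTOMER (
    CUS_CODE INTEGER PRIMARY KEY,
    REP_NUM  INTEGER REFERENCES SALES_REP (REP_NUM)  -- nullable FK
);

-- The DBMS would reject this insert unless SALES_REP 99 already exists:
-- INSERT INTO CUSTOMER VALUES (1001, 99);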
(Section 3.4) Relational Set Operators
Relational Algebra: Defines the theoretical way of manipulating table contents using the eight relational operators: SELECT, PROJECT, JOIN, INTERSECT, UNION, DIFFERENCE, PRODUCT, and DIVIDE.
Closure: The use of relational algebra operators on existing tables (relations) produces new relations.
UNION: Combines all rows from two tables, excluding duplicate rows. The tables must have the same attribute characteristics (the columns and domains must be identical) to be used in the UNION. When two or more tables share the same number of columns, the columns have the same names, and they share the same (or compatible) domains, they are said to be Union-Compatible.
INTERSECT: Yields only the rows that appear in both tables. You cannot intersect if one of the attributes is numeric and the other is character based. The tables must be union-compatible.
DIFFERENCE: Yields all rows in one table that are not found in the other table; it subtracts one table from the other. The tables must be union-compatible.
PRODUCT: Yields all possible pairs of rows from two tables, also known as the Cartesian product. If one table has 6 rows and the other table has 3 rows, the PRODUCT yields a list composed of 6 x 3 = 18 rows.
SELECT: AKA RESTRICT; yields values for all rows found in a table that satisfy a given condition. It can be used to list all of the row values, or only those row values that match a specified criterion.
PROJECT: Yields all values for selected attributes. It yields a vertical subset of a table.
JOIN: Allows info to be combined from 2 or more tables. It's the real power behind the relational db, allowing the use of independent tables linked by common attributes.
Natural Join: Links tables by selecting only the rows with common values in their common attributes. It's the result of a three-stage process (PRODUCT, then SELECT, then PROJECT).
Join Columns: Also called common columns.
Equijoin: Another form of join that links tables on the basis of an equality condition comparing specified columns of each table. The equijoin takes its name from the equality comparison operator (=) used in the condition. If any other comparison operator is used, the join is called a Theta Join. Theta joins are less common than equijoins; they represent inequalities.
Outer Join: The matched pairs are retained, and any unmatched values in the other table are left null. If an outer join is produced, 2 scenarios are possible:
 Left Outer Join: Yields all of the rows in the CUSTOMER table, including those that do not have a matching value in the AGENT table.
 Right Outer Join: Yields all of the rows in the AGENT table, including those that do not have matching values in the CUSTOMER table.
DIVIDE: Uses one single-column table as the divisor and one 2-column table as the dividend. The tables must have a common column.
(Section 3.5) The Data Dictionary and the System Catalog
Data Dictionary: Provides a detailed description of all tables found within the user/designer-created db. It contains all attribute names and characteristics for each table. It contains metadata. Sometimes referred to as "the db designer's db."
System Catalog: Contains metadata. It's a detailed system data dictionary that describes all objects within the db, and it contains more info than the data dictionary. It is created by the DBMS.
Homonyms: Similar or identically sounding words with different meanings, such as boar and bore. In db design, a homonym is the use of the same attribute name to label different attributes. This should be avoided.
Synonym: The use of different names to describe the same attribute. For example, car and auto refer to the same thing. This should also be avoided in db design.
(Section 3.6) Relationships within the Relational Db
1:M is the relational modeling ideal. 1:1 should be rare in any relational db design. M:N cannot be implemented as such in the relational model.
(Section 3.6.1) The 1:M Relationship
Page 80 review.
(Section 3.6.2) The 1:1 Relationship
Page 82 review. A 1:1 relationship sometimes means that the entity components were not defined properly; it could indicate that two entities belong in the same table. They should be rare, but certain conditions require their use. There is a great example of a 1:1 relationship regarding employees on page 84.
(Section 3.6.3) The M:N Relationship
Page 84 review. M:N is not supported directly in the relational environment. However, M:N relationships can be implemented by creating a new entity in 1:M relationships with the original entities. The problems inherent in M:N relationships can be avoided by creating a Composite Entity (or bridge entity, or associative entity), as sketched below.
Linking Table: The implementation of a composite entity.
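A minimal sketch of such a linking table, assuming STUDENT and CLASS tables like the hypothetical ones sketched earlier (all names are illustrative):

-- ENROLL bridges the M:N between STUDENT and CLASS.
-- Its composite PK combines the two FKs, so each student/class
-- pairing can appear only once; each FK side is a plain 1:M.
CREATE TABLE ENROLL (
    STU_NUM      INTEGER REFERENCES STUDENT (STU_NUM),
    CLASS_CODE   INTEGER REFERENCES CLASS (CLASS_CODE),
    ENROLL_GRADE CHAR(1),
    PRIMARY KEY (STU_NUM, CLASS_CODE)
);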
(Section 3.7) Data Redundancy Revisited
The relational db makes it possible to control data redundancies by using common attributes that are shared by tables, called foreign keys. Although the use of FKs does not totally eliminate data redundancies, because the FK values can be repeated many times, the proper use of FKs minimizes data redundancies, thus minimizing the chance that destructive data anomalies will develop.
Db designers must reconcile three often contradictory requirements: design elegance, processing speed, and info requirements. As important as data redundancy control is, there are times when the level of data redundancy must actually be increased to make the db serve crucial info purposes. Illustrated on page 89.
(Section 3.8) Indexes
Index: An orderly arrangement used to logically access rows in a table. It is composed of an index key and a set of pointers; each key points to the location of the data identified by the key. Indexes play an important role in DBMSs in the implementation of primary keys: when you define a table's primary key, the DBMS automatically creates a unique index on the primary key column you declared. A table can have many indexes, but each index is associated with only one table. If a table is dropped, so is any index that was created for it.
Index Key: The index's reference point. It can be composed of one or more attributes.
Unique Index: An index in which the index key can have only one pointer value (row) associated with it. Unique indexes are used to enforce uniqueness constraints.
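A minimal sketch of index creation, against the hypothetical CUSTOMER table used earlier (index names and columns are illustrative):

-- A plain index speeds lookups on a frequently searched column.
CREATE INDEX CUS_LNAME_IDX ON CUSTOMER (CUS_LNAME);

-- A unique index both speeds lookups and enforces a uniqueness
-- constraint, as the DBMS does automatically for the PK.
CREATE UNIQUE INDEX CUS_SSN_IDX ON CUSTOMER (CUS_SSN);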
(Section 3.9) Codd's Relational Db Rules
In 1985, Dr. E.F. Codd published a list of 12 rules to define a relational db system. The 12 rules are located on page 92. Not all db vendors fully support all 12 rules.
---------------------------------------------------------------------------------------------------------------------------------
Chapter 4: Entity Relationship (ER) Modeling
(Section 4.1) The Entity Relationship Model (ERM)
Conceptual models are used in the conceptual design of dbs, while relational models are used in the logical design of dbs. The ERM is a conceptual, object-based model. An ERD should be developed prior to building a db; it depicts the db's main components, such as entities, attributes, relationships, and relationship types. There are various notations used with ERDs, such as Chen's notation, Crow's Foot, and UML notation:
 Chen's notation favors conceptual modeling.
 The Crow's Foot notation favors a more implementation-oriented approach.
 The UML notation can be used for both conceptual and implementation modeling.
What role does the ER diagram play in the design process? The ER diagram must reflect an organization's operations accurately if the database is to meet that organization's data requirements. The completed ER diagram forms the basis for design review processes that verify whether the included entities are appropriate and sufficient, whether the attributes found within those entities are needed and correct, and whether the relationships between those entities are needed and correctly represented. The ER diagram is also used as a final crosscheck against the proposed data dictionary entries. The ER diagram helps the database designer communicate more precisely with those who most completely understand the business data requirements. Finally, the completed ER diagram serves as the implementation guide to those who create the actual database. Many ERD software tools can generate the SQL statements to produce the tables represented in the ERD. In short, ER diagrams are as important to the database designer as blueprints are to architects and builders.
Why is data modeling so important to the database designer? We are said to live in the information age, and data constitute the most basic information units employed by an information system. Data modeling provides a way to reconcile the very different end-user views of the nature and roles of data. A data model is an abstraction that provides an easily understood representation of complex real-world data structures. A data model helps us understand, communicate, and document the complexities of a real-world data environment. Such understanding yields useful solutions to the problems inherent in creating, organizing, using, and managing data. If a database is to be useful and flexible, it must be well designed. The database design process must be based on an appropriate data model if it is to yield a proper database design blueprint.
(Section 4.1.1) Entities
The word entity in the ERM corresponds to a table, not to a row, in the relational environment. The ERM refers to a table row as an entity instance or entity occurrence. An entity name is a noun, written in capital letters inside a rectangle. An entity is a person, place, thing, or shared idea about which data are collected and stored. Entities are the basic building blocks of a relational db.
Entity Sets: Entities grouped according to common attributes. They are stored in tables.
(Section 4.1.2) Attributes
Attributes are characteristics that describe entities. For example, the STUDENT entity includes STU_LNAME, STU_FNAME, and so on. They are oval-shaped in the Chen model. They are the columns that represent characteristics of the entity.
Required and Optional Attributes:
Required Attribute: An attribute that must have a value; it cannot be left empty.
Optional Attribute: An attribute that does not require a value; therefore, it can be left empty.
Domains: Attributes have a domain. A domain is the set of possible values for a given attribute. The domain for the GPA attribute is written (0,4) because the lowest possible GPA value is 0 and the highest is 4. The domain for gender is M or F. Attributes may share a domain.
Identifiers: One or more attributes that uniquely id each entity instance. They are underlined in the ERD and are mapped to PKs.
Composite Identifier: A PK composed of more than one attribute.
Composite and Simple Attributes: Attributes are classified as simple or composite.
 Composite Attribute: An attribute that can be further subdivided to yield additional attributes. Example: ADDRESS can be subdivided into Street, City, & Zip.
 Simple Attribute: Cannot be subdivided. Examples: Age, Sex, and Marital Status.
Single-Valued Attribute: An attribute that can have only a single value. For example, a person can have only one SSN. It is not always a simple attribute.
Multivalued Attributes: Attributes that can have many values. For example, a person can have several college degrees. They are shown with a double line in the Chen notation and are not identified in the Crow's Foot notation. Although the conceptual model can handle M:N relationships and multivalued attributes, you should not implement them in the RDBMS.
Implementing Multivalued Attributes: They should not be implemented in the RDBMS. There are two possible courses of action with multivalued attributes in a relational table:
 Split the multivalued attribute into new attributes.
 Create a new entity composed of the original multivalued attribute's components. This is the preferred approach.
Derived Attribute (Computed Attribute): An attribute whose value is calculated (derived) from other attributes. It's shown with a dashed line in the Chen notation. It's derived by using an algorithm, for example, INT((DATE() – EMP_DOB)/365) to calculate the age of a person. A sketch follows the list below.
Advantages & disadvantages of storing derived attributes:
 Advantage, stored: Saves CPU processing cycles, saves data access time, the data value is readily available, and it can be used to keep track of historical data.
 Advantage, not stored: Saves storage space, and the computation always yields a current value.
 Disadvantage, stored: Requires constant maintenance to ensure the derived value is current, especially if any values used in the calculation change.
 Disadvantage, not stored: Uses CPU processing cycles, increases data access time, and adds coding complexity to queries.
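A minimal sketch of the "not stored" option, computing age at query time. The text's INT((DATE() – EMP_DOB)/365) expression is Access-style; the equivalent below uses Oracle-style date arithmetic, and the table and columns are hypothetical.

-- Derive the age at query time so it is always current; nothing is
-- stored, at the cost of CPU cycles on every query. Division by 365
-- approximates years, ignoring leap days, as in the text's formula.
SELECT EMP_LNAME,
       FLOOR((SYSDATE - EMP_DOB) / 365) AS EMP_AGE
FROM   EMPLOYEE;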
(Section 4.1.3) Relationships
A relationship is an association between entities. The entities that participate in a relationship are also known as participants. A relationship's name is an active or passive verb; for example, a STUDENT takes a CLASS. Relationships operate in both directions.
(Section 4.1.4) Connectivity and Cardinality
Connectivity: Used to describe the relationship classification.
Cardinality: Expresses the minimum and maximum number of entity occurrences associated with one occurrence of the related entity. The ERD depicts it as, for example, (1,4), 1 being the MIN and 4 being the MAX. Connectivities and cardinalities are generally based on business rules and must consider the data environment, transactions, and information requirements.
(Section 4.1.5) Existence Dependence
Existence-Dependent: An entity is existence-dependent if it can exist only when it is associated with another related entity occurrence. For example, EMPLOYEE claims DEPENDENT: the entity DEPENDENT is clearly existence-dependent on the EMPLOYEE entity because it is impossible for the dependent to exist apart from the EMPLOYEE in the db.
Existence-Independent (strong or regular): An entity that can exist apart from one or more related entities.
(Section 4.1.6) Relationship Strength
Entities that are existence-independent of another entity are said to have weak or non-identifying relationships. The concept of relationship strength is based on how the PK of a related entity is defined.
Weak (Non-Identifying) Relationship: Exists if the PK of the related entity does not contain a PK component of the parent entity. Depicted as a dashed line (------) in Crow's Foot notation.
COURSE (CRS_CODE, DEPT_CODE)
CLASS (CLASS_CODE, CRS_CODE)
Strong (Identifying) Relationship: Exists when the PK of the related entity contains a PK component of the parent entity. Strong relationships exist when the entities are existence-dependent. Depicted as a solid line (_____) in Crow's Foot notation.
COURSE (CRS_CODE, DEPT_CODE)
CLASS (CRS_CODE, CLASS_SECTION)
(Section 4.1.7) Weak Entities
Weak Entity: One that meets two conditions: the entity is existence-dependent, and the entity has a primary key that is partially or totally derived from the parent entity in the relationship. EMPLOYEE has DEPENDENT; DEPENDENT is weak because it cannot exist without EMPLOYEE. Its shape is a double rectangle. A weak entity inherits part of its PK from its strong counterpart:
EMPLOYEE (EMP_NUM)
DEPENDENT (EMP_NUM, DEP_NUM)
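A minimal sketch of how this weak entity might be implemented (hypothetical columns): DEPENDENT borrows EMP_NUM from its parent, so the FK is also part of the PK, which is what makes the relationship identifying.

-- The weak entity's composite PK is partially derived from the parent.
CREATE TABLE EMPLOYEE (
    EMP_NUM INTEGER PRIMARY KEY
);

CREATE TABLE DEPENDENT (
    EMP_NUM  INTEGER REFERENCES EMPLOYEE (EMP_NUM),
    DEP_NUM  INTEGER,
    DEP_NAME VARCHAR(25),
    PRIMARY KEY (EMP_NUM, DEP_NUM)  -- inherited EMP_NUM plus own DEP_NUM
);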
(Section 4.1.8) Relationship Participation
Participation is either optional or mandatory. Each entity is implemented as a table.
Optional Participation: Means that one entity occurrence does not require a corresponding entity occurrence in a particular relationship. In "COURSE generates CLASS," at least some courses do not generate a class. In other words, an entity occurrence (row) in the COURSE table does not necessarily require the existence of a corresponding entity occurrence in the CLASS table. Therefore, the CLASS entity is considered to be optional to the COURSE entity. It's the O shape on the line of the Crow's Foot diagram.
Mandatory Participation: Means that one entity occurrence requires a corresponding entity occurrence in a particular relationship. It indicates that the minimum cardinality is 1 for the mandatory entity. Participation is typically mandatory on the "1" side and optional on the "Many" side. Crow's Foot symbols on page 120.
(Section 4.1.9) Relationship Degree
Relationship Degree: Indicates the number of entities or participants associated with a relationship.
Unary Relationship: Exists when an association is maintained within a single entity.
 An employee within the EMPLOYEE entity is the manager for one or more employees within that entity. In this case, the existence of the "manages" relationship means that EMPLOYEE requires another EMPLOYEE to be the manager; that is, EMPLOYEE has a relationship with itself. Such a relationship is known as a Recursive Relationship.
Binary Relationship: Exists when two entities are associated in a relationship. It is the most common degree.
Ternary Relationship: Exists when three entities are associated.
(Section 4.1.10) Recursive Relationships
Recursive Relationship: One in which a relationship can exist between occurrences of the same entity set. Naturally, it is found in a unary relationship.
(Section 4.1.11) Associative (Composite or Bridge) Entities
The relational model generally requires the use of 1:M relationships. (Also, recall that the 1:1 relationship has its place, but it should be used with caution and proper justification.) If M:N relationships are encountered, you must create a bridge between the entities that display such relationships. Associative entities are used to implement M:N relationships between two or more entities.
(Section 4.2) Developing an ER Diagram
Iterative Process: Repetition of processes and procedures. The business rules define the ERD components. Building an ERD involves the following:
 Create a detailed narrative of the organization's description of operations.
 Id the business rules based on the description of operations.
 Id the main entities and relationships from the business rules.
 Develop the initial ERD.
 Id the attributes and PKs that adequately describe the entities.
 Revise and review the ERD.
(Section 4.3) DB Design Challenges: Conflicting Goals
DB designers often must make design compromises that are triggered by conflicting goals, such as adherence to design standards (elegance), processing speed, and info requirements.
 Design Standards: Guide you in developing logical structures that minimize data redundancies and avoid nulls to the greatest extent possible, and allow you to work with well-defined components and to evaluate the interaction of those components with some precision.
 Processing Speed: A top priority when there are large numbers of transactions. High processing speed means minimal access time, which may be achieved by minimizing the number and complexity of logically desirable relationships. A "perfect" design may use a 1:1 relationship to avoid nulls, while a higher-transaction-speed design might combine the two tables to avoid the use of an additional relationship, using dummy entries to avoid nulls.
 Information Requirements: Complex info requirements may dictate data transformations, and they may expand the number of entities and attributes within the design. Therefore, the db may have to sacrifice some of its "clean" design structures and/or some of its high transaction speed to ensure maximum info generation.
Business rules are an important element of database design in every organization. Business rules drive all business processes, and an organization's business rules must be correctly implemented by the organization's IT systems, including the databases. Business rules are precise statements, derived from a detailed description of the organization's operations. When written properly, business rules define one or more of the following modeling components:
 entities
 relationships
 attributes
 connectivities
 cardinalities
 constraints
Because the business rules form the basis of the data-modeling process, precisely phrasing them is crucial to the success of the database design. Because the business rules are derived from a precise description of operations, much of the design's success depends on the accuracy of the description of operations. Examples of business rules are:
 An invoice contains one or more invoice lines.
 Each invoice line is associated with a single invoice.
 A store employs many employees.
 Each employee is employed by only one store.
 A college has many departments.
 Each department belongs to a single college. (This business rule reflects a university that has multiple colleges such as Business, Liberal Arts, Education, Engineering, etc.)
 A driver may be assigned to drive many different vehicles.
 Each vehicle can be driven by many drivers. (Note: Keep in mind that this business rule reflects the assignment of drivers during some period of time.)
 A client may sign many contracts.
 A sales representative may write many sales contracts.
 Each sales contract is written by one sales representative.
 Each sale involves a sales representative, a customer, and one or more products.
Note that each relationship definition requires the definition of two business rules. For example, the relationship between the INVOICE and (invoice) LINE entities is defined by the first two business rules in the bulleted list. This two-way requirement exists because there is always a two-way relationship between any two related entities. (This two-way relationship description also reflects the implementation by many of the available CASE tools.) The last business rule above describes a three-way sale relationship between sales representatives, customers, and products.
---------------------------------------------------------------------------------------------------------------------------------
Chapter 5: Normalization of DB Tables
(Section 5.1) DB Tables and Normalization
Normalization: The answer to recognizing a poor table structure and producing a good table. It's a process of evaluating and correcting table structures to minimize data redundancies, thereby reducing the likelihood of data anomalies. It involves assigning attributes to tables based on the concept of determination, and it is a sequence of tests applied to candidate entities and their attributes. It works through a series of stages called normal forms: First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF). 3NF is the highest of the three, but 3NF is not always the way to go because it can affect performance: excessive normalization can result in less easily understood entities and slow processing speed.
Denormalization produces a lower normal form; that is, a 3NF table will be converted to 2NF through denormalization. However, the price you pay for increased performance through denormalization is greater data redundancy.
(Section 5.2) The Need for Normalization
Normalization decreases anomalies and eliminates data redundancies. It's critical to a successful db design. The goal of normalization is to create tables such that all non-key attributes are dependent on the PK and nothing but the PK.
(Section 5.3) The Normalization Process
The objective of normalization is to ensure that each table conforms to the concept of well-formed relations, that is, tables that have the following characteristics:
 Each table represents a single subject. For example, a student table will contain only student data.
 No data item will be unnecessarily stored in more than one table. This ensures that data are updated in only one place.
 All nonprime attributes in a table are dependent on the PK. This ensures that the data are uniquely identifiable by a primary key value.
 Each table is void of insertion, update, or deletion anomalies. This ensures the integrity and consistency of the data.
The normal forms:
 First Normal Form (1NF): Table format, all key attributes defined with no repeating groups, and the PK identified. All remaining attributes are dependent on the PK. It may still contain partial dependencies, that is, dependencies based on only part of the PK. "All repeating groups must be removed" means that each row in a table must define only a single entity; to do this, the appropriate entry must be added to the PK column.
 Second Normal Form (2NF): In 1NF and includes no partial dependencies. It may contain transitive dependencies, which are based on attributes that are not part of the PK. A table can be put into 2NF by ensuring that no attribute is dependent on only part of the primary key. If a partial dependency exists, a new table can be created with a primary key equal to the required portion of the original key, and the dependent attributes are moved to this table.
 Third Normal Form (3NF): In 2NF and includes no transitive dependencies. If a 2NF table has any transitive dependencies, they can be eliminated by breaking them off and storing them in a separate table.
 Boyce-Codd Normal Form (BCNF): Every determinant is a candidate key (a special case of 3NF). If a 3NF table has only a single candidate key, it's automatically in BCNF. BCNF can be violated only if the table contains more than one candidate key.
 Fourth Normal Form (4NF): In 3NF or BCNF with no independent multivalued dependencies; split the table to remove all multivalued dependencies.
 5NF & DKNF are not likely to be encountered in a business environment and are mainly of theoretical interest.
The normalization process works one relation at a time. It starts by identifying the dependencies of a given relation and progressively breaking up the relation into a set of new relations based on the identified dependencies.
 Update Anomaly: When you modify duplicate data in the system, you run the risk of the data not being properly modified everywhere.
 Insert Anomaly: You can't insert data due to missing info (especially a missing key!).
 Delete Anomaly: You can't delete data without deleting other essential data (data that you don't want to delete).
Functional Dependency: Before outlining the normalization process, it is good to review the concepts of determination and functional dependency. Check Table 5.3.
(Section 5.3.1) Conversion to First Normal Form (1NF)
Repeating Group: Derives its name from the fact that a group of multiple entries of the same type can exist for any single key attribute occurrence. The normalization process starts with a simple three-step procedure:
 Step 1: Eliminate the Repeating Groups. Eliminate the nulls by making sure that each repeating-group attribute contains an appropriate data value. That change converts the table to 1NF.
 Step 2: Identify the PK.
 Step 3: Identify All Dependencies.
Dependency Diagram: Helpful in getting a bird's-eye view of all of the relationships among a table's attributes, and its use makes it less likely that you will overlook an important dependency.
Partial Dependency: A dependency based on only a part of a composite primary key, where that part determines other attributes. (Partial dependencies are sometimes tolerated for performance reasons, but they should be treated with caution.) A table that contains partial dependencies is still subject to data redundancies and to various anomalies.
Transitive Dependency: A dependency of one nonprime attribute on another nonprime attribute. The problem is that transitive dependencies still yield data anomalies. A transitive dependency exists because a nonkey attribute determines the value of another nonkey attribute.
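A minimal sketch of removing a partial dependency (the next section formalizes this conversion); all table structures here are hypothetical. With PK (STU_NUM, CLASS_CODE), STU_LNAME depends on STU_NUM alone, so it is split into its own table:

-- Before: ENROLL (STU_NUM, CLASS_CODE, STU_LNAME, ENROLL_GRADE)
-- STU_LNAME is partially dependent on the composite PK.

-- After: the partially dependent attribute moves to a table keyed by
-- the portion of the PK it actually depends on.
CREATE TABLE STUDENT (
    STU_NUM   INTEGER PRIMARY KEY,
    STU_LNAME VARCHAR(25)
);

CREATE TABLE ENROLL (
    STU_NUM      INTEGER REFERENCES STUDENT (STU_NUM),
    CLASS_CODE   INTEGER,
    ENROLL_GRADE CHAR(1),
    PRIMARY KEY (STU_NUM, CLASS_CODE)
);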
(Section 5.3.2) Conversion to Second Normal Form (2NF)
Converting to 2NF is needed only when the 1NF table has a composite PK. If the 1NF table has a single-attribute PK, then the table is automatically in 2NF.
Prime (Key) Attribute: Any attribute that is at least part of a key.
Nonprime (Nonkey) Attribute: An attribute that is not part of any key.
(Section 5.3.3) Conversion to Third Normal Form (3NF)
 Step 1: Identify Each New Determinant. A Determinant is any attribute whose value determines other values within a row. If there are three different transitive dependencies, you will have three different determinants.
 Step 2: Identify the Dependent Attributes.
 Step 3: Remove the Dependent Attributes from Transitive Dependencies.
(Section 5.4) Improving the Design
Evaluate PK Assignments: A Surrogate Key is an artificial PK introduced by the designer with the purpose of simplifying the assignment of PKs to tables. Surrogate keys are usually numeric and are often automatically generated by the DBMS.
Evaluate Naming Conventions: It is better to change an attribute name to reflect the table name. For example, in the JOB table, the attribute should be renamed from CHG_HOUR to JOB_CHG_HOUR.
Refine Attribute Atomicity: An Atomic Attribute is one that cannot be further subdivided or decomposed. By improving the degree of atomicity, you also gain querying flexibility.
Identify New Attributes.
Identify New Relationships.
Refine PKs as Required for Data Granularity: Granularity refers to the level of detail represented by the values stored in a table's row. Data stored at their lowest level of granularity are said to be atomic data.
Maintain Historical Accuracy.
Evaluate Using Derived Attributes: The availability of a derived attribute will save reporting time.
(Section 5.5) Surrogate Key Considerations
At the implementation level, a surrogate key is a system-defined attribute generally created and managed via the DBMS.
(Section 5.6) Higher-Level Normal Forms
Tables in 3NF will perform suitably in business transactional dbs.
(Section 5.6.1) The Boyce-Codd Normal Form (BCNF)
A special case of 3NF. A table is in BCNF when every determinant in the table is a candidate key. BCNF can be violated only when the table contains more than one candidate key. When a nonkey attribute is the determinant of a key attribute, the condition does not violate 3NF, yet it fails to meet the BCNF requirements because BCNF requires that every determinant in the table be a candidate key.
(Section 5.6.2) Fourth Normal Form (4NF)
 All attributes must be dependent on the PK, but they must be independent of each other.
 No row may contain two or more multivalued facts about an entity.
A table is in 4NF when it is in 3NF and has no multiple sets of multivalued dependencies.
(Section 5.7) Normalization and DB Design
ER modeling and normalization are difficult to separate, and the two are used in an iterative and incremental process. The ER diagram looks at the "big picture," and normalization provides a "micro" view of individual entities.
Normalization takes place in tandem with data modeling. The proper procedure is to follow these steps:
1) Create a description of operations at an appropriate level of detail.
2) Derive appropriate business rules from the description of operations.
3) Model the data with the help of a tool such as Visio's Crow's Foot option to produce an initial ERD. This ERD is the initial database blueprint.
4) Use the normalization procedures to identify and remove data redundancies. This process may produce additional entities.
5) Revise the ERD created in step 3.
6) Use the normalization procedures to audit the revised ERD. If significant additional data redundancies are discovered, repeat steps 4 and 5.

(Section 5.8) Denormalization
It is important to remember that the optimal relational db implementation requires that all tables be at least in 3NF. The problem with normalization is that as tables are decomposed to conform to normalization requirements, the number of db tables expands. Therefore, in order to generate info, data must be put together from various tables. Joining a large number of tables takes additional input/output (I/O) operations and processing logic, thereby reducing system speed. Rare and occasional circumstances may allow some degree of denormalization so processing speed can be increased. The problem with denormalized relations and redundant data is that data integrity could be compromised due to the possibility of data anomalies (insert, update, and deletion anomalies). Unnormalized tables in a production db tend to suffer from the following defects:
• Data updates are less efficient because programs that read and update tables must deal with larger tables.
• Indexing is more cumbersome. It simply is not practical to build all of the indexes required for the many attributes that might be located in a single unnormalized table.
• Unnormalized tables yield no simple strategies for creating virtual tables known as views.
---------------------------------------------------------------------------------------------------------------------------------
Chapter 6: Advanced Data Modeling
(Section 6.1) The EERM
EERM: Sometimes referred to as the Enhanced ERM, the EERM is the result of adding more semantic constructs (entity supertypes, entity subtypes, and entity clustering) to the original ERM. Entity-relationship modeling is missing the ability to represent relationships based on specialization and generalization. For example, you can't directly represent that students and faculty are people in an ERD. This shortcoming is addressed in Extended Entity-Relationship Modeling, which includes specialization-generalization relationships.
Abstractions, Entities & Classes: Abstraction means identifying the common characteristics of things and using those common characteristics to classify or organize things. We used abstraction when we identified entities and produced Entity-Relationship models.

(Section 6.1.1) Entity supertypes and subtypes
Employee = Supertype. Pilot = Subtype, because not all employees have the attributes of pilots. This is to prevent nulls.
Entity Supertype: A generic entity type that is related to one or more entity subtypes.
Entity Subtypes: The entity supertype contains the common characteristics, and the entity subtypes contain the unique characteristics of each entity subtype.
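As a hedged sketch of one common way to implement a supertype/subtype pair in SQL (names are hypothetical, following the employee/pilot example above):

    -- Supertype: common attributes plus the subtype discriminator
    CREATE TABLE EMPLOYEE (
        EMP_NUM    NUMBER PRIMARY KEY,
        EMP_LNAME  VARCHAR2(30) NOT NULL,
        EMP_TYPE   CHAR(1)                  -- subtype discriminator, e.g. 'P' = pilot
    );

    -- Subtype: only pilot-specific attributes, so non-pilot employees carry
    -- no nulls; the subtype inherits its PK from the supertype
    CREATE TABLE PILOT (
        EMP_NUM      NUMBER PRIMARY KEY REFERENCES EMPLOYEE (EMP_NUM),
        PIL_LICENSE  VARCHAR2(25),
        PIL_RATINGS  VARCHAR2(30)
    );

Because PILOT's PK is also a FK to EMPLOYEE, every pilot row must correspond to exactly one employee row, which mirrors the "a pilot is an employee" relationship discussed next.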
(Section 6.1.2) Specialization Hierarchy
Specialization Hierarchy: A hierarchy that is based on the top-down process of identifying lower-level, more specific entity subtypes from a higher-level entity supertype.
Specialization is based on grouping unique characteristics and relationships of the subtypes. The relationships depicted within the specialization hierarchy are sometimes described in terms of "is-a" relationships, for example "a pilot is an employee." Within a specialization hierarchy, a subtype can exist only within the context of a supertype, and every subtype can have only one supertype to which it is directly related. A specialization hierarchy provides the means to:
• Support attribute inheritance.
• Define a special supertype attribute known as the subtype discriminator.
• Define disjoint/overlapping constraints and complete/partial constraints.
In specialization hierarchies with multiple levels of supertypes/subtypes, a lower-level subtype inherits all of the attributes and relationships from all of its upper-level supertypes.

(Section 6.1.3) Inheritance
Inheritance: Enables an entity subtype to inherit the attributes and relationships of the supertype. One important inheritance characteristic is that all entity subtypes inherit their primary key attribute from their supertype.

(Section 6.1.4) Subtype Discriminator
Subtype Discriminator: The attribute in the supertype entity that determines to which subtype the supertype occurrence is related. The EMP_TYPE attribute is the subtype discriminator because it's the attribute in the supertype that determines to which subtype the supertype occurrence is related. Note that the default comparison condition for the subtype discriminator attribute is the equality comparison. However, there are situations in which the subtype discriminator is not necessarily based on an equality comparison.

(Section 6.1.5) Disjoint and overlapping constraints
Disjoint Subtypes or Non-overlapping Subtypes: Subtypes that contain a unique subset of the supertype entity set; in other words, each entity instance of the supertype can appear in only one of the subtypes.
Overlapping Subtypes: Subtypes that contain nonunique subsets of the supertype entity set; that is, each entity instance of the supertype may appear in more than one subtype. For example, a person can be an employee, a student, or both. In turn, an employee may be a professor as well as an administrator. Because an employee also may be a student, STUDENT and EMPLOYEE are overlapping subtypes of the supertype PERSON, just as PROFESSOR and ADMINISTRATOR are overlapping subtypes of the supertype EMPLOYEE.

(Section 6.1.6) Completeness Constraint
Completeness Constraint: Specifies whether each entity supertype occurrence must also be a member of at least one subtype. It can be partial or total.
Partial Completeness (symbolized by a circle over a single line) means that not every supertype occurrence is a member of a subtype; that is, there may be some supertype occurrences that are not members of any subtype.
Total Completeness (symbolized by a circle over a double line) means that every supertype occurrence must be a member of at least one subtype.

(Section 6.1.7) Specialization and Generalization
Specialization: The top-down process of identifying lower-level, more specific entity subtypes from a higher-level entity supertype. Specialization is based on grouping unique characteristics and relationships of the subtypes. For example, we used specialization to identify multiple entity subtypes from the original EMPLOYEE supertype.
Generalization: The bottom-up process of identifying a higher-level, more generic entity supertype from lower-level entity subtypes.
Generalization is based on grouping common characteristics and relationships of the subtypes. For example, you might identify multiple types of musical instruments (piano, violin, and guitar) and generalize them into a higher-level supertype.

(Section 6.2) Entity Clustering
Entity Cluster: A "virtual" entity type used to represent multiple entities and relationships in the ERD. Entity clustering is a technique used to hide potentially confusing detail in an ERD. An entity cluster is formed by combining multiple interrelated entities into a single abstract entity object. It's considered virtual or abstract in the sense that it is not actually an entity in the final ERD. When using entity clusters, the key attributes of the combined entities are no longer available.
Avoid the display of attributes when entity clusters are used, to prevent problems such as relationships changing from identifying to non-identifying (or vice versa) and the loss of FK attributes from some entities.

(Section 6.3) Entity Integrity: Selecting PKs
The proper selection of the PK has a direct bearing on the efficiency and effectiveness of the db implementation.

(Section 6.3.1) Natural Keys and PKs
Natural Key or Natural Identifier: A real-world, generally accepted identifier used to distinguish, that is, uniquely identify, real-world objects.

(Section 6.3.2) PK Guidelines
The function of the PK is to guarantee entity integrity, not to describe an entity. PKs and FKs are used to implement relationships among entities. Desirable primary key characteristics: UNIQUE VALUES, NONINTELLIGENT, NO CHANGE OVER TIME, PREFERABLY SINGLE-ATTRIBUTE, PREFERABLY NUMERIC, SECURITY COMPLIANT, MANIFESTNESS, IMMUTABILITY, COMPACTNESS.

(Section 6.3.3) When to use composite PKs
Composite PKs are useful in two cases:
• As identifiers of composite entities, where each PK combination is allowed only once in the M:N relationship.
• As identifiers of weak entities, where the weak entity has a strong identifying relationship with the parent entity.
The ENROLL entity mainly represents the many-to-many relationship between students and classes. Such entities are termed association entities, bridge entities, or composite entities. Note that the table has foreign keys to both STUDENT and CLASS, and that the primary key is the composite of those two foreign keys.
PKs in Existence-Dependent Relationships: If one entity depends for its existence within the database on one or more other entities, then the existence-dependent table should include the primary key of all tables upon which its existence depends. As the text indicates, these existence dependencies can be natural, such as the existence dependence of DEPENDENT on EMPLOYEE or the existence dependence of GRADED_ITEM on CLASS. Existence dependence can also arise in a relational database as a result of the normalization process required to correctly represent composite entities.

(Section 6.3.4) When to use surrogate PKs
It can often be very difficult or impossible to identify correct primary keys for natural entities, particularly natural events. In these situations the only solution is to have the computer or user create a unique primary key for each entity that is inserted into the table that represents such an entity. These keys are called synthetic primary keys or surrogate keys. It is famously difficult to identify correct natural keys for people, and for security reasons it is not desirable to have an ID card that uses your SSN as your ID number. This is why SSNs are not used as PKs. Surrogate keys are helpful when there is no natural key, when the selected candidate key has embedded semantic content, or when the selected candidate key is too long or cumbersome. If you use a surrogate key, you must ensure that the candidate key of the entity in question performs properly through the use of "unique index" and "not null" constraints. Integer surrogate keys are the norm in large and high-performance databases. This is because integers are the most compact representation of an identifier that is unique for a number of unique entities, and because it is usually most efficient for computers to store and operate on integers.
Tables that should not have PKs: The text assumes that all tables should have primary keys, but this is not always true.
Tables that represent real-world entities or parts of entities should always have primary keys. Most operational databases in financial or other sensitive applications include tables whose sole role is to preserve a record of events or changes to the database. These history or audit tables often record a variety of internal events within the
system which may have no corresponding durable entity in the real world, and no natural key. Clients of mine have tried in vain to develop a natural primary key for these tables, including as many as a dozen columns, often with a timestamp to help assure uniqueness. After such an application has run for a while, they have discovered to their dismay that they occasionally have primary key uniqueness violations that prevent the insertion of a history record. The problem in these situations is not that they have selected incorrect columns for the natural primary key, but that such tables usually have no reliable natural primary key, even including a timestamp. The solution is to not have a primary key. Not having to maintain the unique index for the primary key speeds inserts into history tables. You may want to index the tables so that you can efficiently retrieve the historic data. If you do need a primary key, for example if a history table must be referenced from another table, then use a synthetic (surrogate) primary key.

(Section 6.4) Design cases: Learning Flexible DB Design
Databases are the most long-lived of all software components. Many databases have been continuously used and updated for decades. Most of the life cycle cost of databases is thus in the maintenance phase of the database life cycle. Thus the most important characteristic of a database design is that it be easy to modify as business needs change. There is an old saying in the computer industry that an easily maintained design that doesn't happen to be completely correct is not a problem, because you can just fix it, but that an unmaintainable design is a disaster: even if it works now, something will inevitably happen that will require you to change it, and then you have a big problem. In this section we describe the things that you can do to make sure that your designs are flexible enough that they can be easily maintained.
There are two more advanced topics in the design of foreign keys for 1:1 relationships. One consideration is that modern DBMSs, including Oracle, support clustering of tables which have 1:1 mandatory relationships and a common key that identifies the rows that are related 1:1. Clustering combines the corresponding logical rows of the clustered tables into one physical row in storage. Because the clustered rows are actually one physical row, there is no need to repeat the shared columns in the cluster key, so the database is smaller. I often cluster tables that are related by a 1:1 mandatory relationship. Clustering stores the related rows together in one physical row, so joining the tables is essentially free, and you can effectively ignore the performance consequences of joining both tables in requests. Clustering also reduces the table size, because there is only one primary key.

(Section 6.4.1) Design Case #1: Implementing 1:1 Relationships
FKs work with PKs to properly implement relationships in the relational model. The basic rule is to put the PK of the "one" side (the parent entity) on the "many" side (the dependent entity) as a foreign key. However, where do you place the FK when you are working with a 1:1 relationship? There are two options:
• Place a FK in both entities, or
• Place a FK in one of the entities, which is the preferred solution.
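A minimal sketch of the preferred option (names hypothetical): place the FK on one side only and declare it UNIQUE so the relationship stays 1:1.

    -- One department is managed by at most one employee, and a given employee
    -- can manage at most one department (UNIQUE enforces the 1:1 side)
    CREATE TABLE DEPARTMENT (
        DEPT_CODE  VARCHAR2(10) PRIMARY KEY,
        DEPT_NAME  VARCHAR2(40) NOT NULL,
        EMP_NUM    NUMBER UNIQUE REFERENCES EMPLOYEE (EMP_NUM)  -- FK to the manager
    );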
(Section 6.4.2) Design Case #2: Maintaining History of Time-Variant Data
Time-variant data refers to data whose values change over time and for which you must keep a history of the data changes. Keeping the history of time-variant data is equivalent to having a multivalued attribute in your entity. To model time-variant data, you must create a new entity in a 1:M relationship with the original entity.
Representing History: One nice feature of designs with a current data table and a corresponding history table is that the same queries can be run against the current status table (e.g., DEPARTMENT) or against the corresponding history table (e.g., DEPARTMENT_HIST). Queries can be run against the history table to return results corresponding to the state at any previous time. For example, we can run a report today as if it were being run at the end of the previous quarter, reflecting the state of the DEPARTMENT table, or any number of additional tables, at that time. This is very useful for many kinds of businesses. (A DDL sketch of such a current/history table pair appears at the end of this chapter.) Queries run against the history table will have at least one additional WHERE clause, and often a subquery. The history table is typically much larger than the current data table. For these reasons, queries against the history table are not as fast as queries against the current data table. This is why it is convenient to have both tables.
The current data table supports current operational transactions, while the corresponding history table supports historic analysis and reporting. Performance is not so important for these historic functions, so the extra size and overhead of the history table is acceptable. Note that the current data table redundantly stores the latest data in the history table, so care must be taken to assure that these are always consistent. I usually encapsulate updates to the pair of tables in a stored procedure, and write a stored procedure or script that checks that they are consistent. Triggers can also be used to maintain the denormalized data. Triggers can be used to add history to an existing database and applications without requiring changes to existing SQL.
Designing DB History and Audit: We often need to maintain a record of transactions for some time after the transactions have been completed, to support a review of the transactions or for internal or external audit. The requirements for this history data are quite different from those of the operational database, and consequently the designs for history and audit tables are correspondingly quite different. The differences in the requirements are summarized in the table in Lecture 6, section 3.11.
When history is kept in separate tables, those history tables are often quite denormalized, so that each record in the history table represents the entire event or transaction that is being recorded for later analysis or audit. Common denormalizations in history tables are summarized in Lecture 6, section 3.11.
Designing History and Audit Tables: History and audit tables may record more information than is required for operations. For example, let's look at what happens at a financial services firm, such as a bank, when any change is made to a customer's address. While the operational tables only store the new address, the audit tables will record who made the change, when they made it, from where they made it, and references to any paper documents, voice recordings, or other audit data that may be outside the database. Historic data is also stored in data warehouses and other decision support systems. Modern data warehouses store fact data at the level of atomic business transactions, so it is sometimes feasible to use a data warehouse as the longer-term history and audit repository, but this can be problematic. The accompanying lecture table summarizes differences between audit database requirements and data warehouse requirements, as well as the problems created when data warehouses are used as history and audit repositories.

(Section 6.4.3) Design Case #3: Fan Traps
Design Trap: Occurs when a relationship is improperly or incompletely identified and is therefore represented in a way that is not consistent with the real world. The most common design trap is known as a Fan Trap: it occurs when you have one entity in two 1:M relationships to other entities, thus producing an association among the other entities that is not expressed in the model. Fan traps occur when fewer relationships are explicitly represented than matter in the real world, and the subset of relationships that are represented is not sufficient to infer the missing important relationships.

(Section 6.4.4) Design Case #4: Redundant Relationships
Redundant relationships occur when there are multiple relationship paths between related entities.
The main concern with redundant relationships is that they remain consistent across the model. It is important to note that some designs use redundant relationships as a way to simplify the design.

(Section 6.5) Data Modeling Checklist
This is to ensure one fulfills data modeling tasks successfully. The checklist is on page 212 of the text. Some checklist items for generalization-specialization not mentioned in the text:
• Verify that all attributes of the superclass are needed in all subclasses.
• Verify that all common attributes of subclasses have been correctly migrated to the superclass.
• Verify that domain experts agree that the subclasses are really specializations of the superclass.
• Verify that the business rules associated with the superclass really do apply to all subclasses.
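As promised in Design Case #2, here is a hedged DDL sketch of a current/history table pair, independent of the earlier sketches (names, types, and the effective-date scheme are hypothetical):

    -- Current table: only the latest state of each department
    CREATE TABLE DEPARTMENT (
        DEPT_CODE   VARCHAR2(10) PRIMARY KEY,
        DEPT_NAME   VARCHAR2(40) NOT NULL,
        DEPT_BUDGET NUMBER(12,2)
    );

    -- History table: one row per state of each department over time; the
    -- effective-date range supports "as of" reporting
    CREATE TABLE DEPARTMENT_HIST (
        DEPT_CODE   VARCHAR2(10) NOT NULL,
        DEPT_NAME   VARCHAR2(40) NOT NULL,
        DEPT_BUDGET NUMBER(12,2),
        EFF_FROM    DATE NOT NULL,
        EFF_TO      DATE             -- NULL means the row is still current
    );

    -- The extra WHERE clause the notes describe: report as of a past date
    SELECT DEPT_CODE, DEPT_BUDGET
      FROM DEPARTMENT_HIST
     WHERE EFF_FROM <= DATE '2010-01-01'
       AND (EFF_TO IS NULL OR EFF_TO > DATE '2010-01-01');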
---------------------------------------------------------------------------------------------------------------------------------
Chapter 8: Advanced SQL
(Section 8.7) Procedural SQL
Persistent Stored Module (PSM): A block of code containing standard SQL statements and procedural extensions that is stored and executed at the DBMS server.
Procedural SQL (PL/SQL): A language that makes it possible to use and store procedural code and SQL statements within the db and to merge SQL and traditional programming constructs, such as variables, conditional processing (IF-THEN-ELSE), basic loops (FOR and WHILE), and error trapping.
Anonymous PL/SQL block: Starts with a DECLARE section. Notable keywords and types include CHAR, VARCHAR2, NUMBER, DATE, %TYPE, the WHILE loop with END LOOP, and || to concatenate output for display.

(Section 8.7.1) Triggers
Trigger: Procedural SQL code that is automatically invoked by the RDBMS upon the occurrence of a given data manipulation event.
• A trigger is invoked before or after a data row is inserted, updated, or deleted.
• A trigger is associated with a db table.
• Each db table may have one or more triggers.
• A trigger is executed as part of the transaction that triggered it.
Triggers are critical to proper db operations and management:
• Triggers can be used to enforce constraints that cannot be enforced at the DBMS design and implementation levels.
• Triggers add functionality by automating critical actions and providing appropriate warnings and suggestions for remedial action. In fact, one of the most common uses for triggers is to facilitate the enforcement of referential integrity.
• Triggers can be used to update table values, insert records in tables, and call other stored procedures.
Triggers play a critical role in making the db truly useful; they also add processing power to the RDBMS and to the db system as a whole. Oracle recommends triggers for:
• Auditing purposes, such as creating audit logs.
• Automatic generation of derived column values.
• Enforcement of business or security constraints.
• Creation of replica tables for backup purposes.
Statement-Level Trigger: Assumed if you omit the FOR EACH ROW keywords. This trigger is executed once, before or after the triggering statement is complete. This is the default case.
Row-Level Trigger: Requires use of the FOR EACH ROW keywords. This type of trigger is executed once for each row affected by the triggering statement. If you update 10 rows, the trigger executes 10 times.
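As a hedged Oracle-style sketch (table and column names hypothetical, and it assumes a DEPARTMENT_AUDIT table already exists), a row-level trigger that writes an audit row for every update ties together the audit-log use above and the history-table maintenance described in Chapter 6:

    CREATE OR REPLACE TRIGGER TRG_DEPARTMENT_AUDIT
    AFTER UPDATE ON DEPARTMENT
    FOR EACH ROW                        -- row-level: fires once per affected row
    BEGIN
        -- :OLD and :NEW expose the row's values before and after the update
        INSERT INTO DEPARTMENT_AUDIT
            (DEPT_CODE, OLD_BUDGET, NEW_BUDGET, CHANGED_BY, CHANGED_ON)
        VALUES
            (:OLD.DEPT_CODE, :OLD.DEPT_BUDGET, :NEW.DEPT_BUDGET, USER, SYSDATE);
    END;
    /

Omitting FOR EACH ROW would make this a statement-level trigger that fires once per UPDATE statement, regardless of how many rows it touches.
---------------------------------------------------------------------------------------------------------------------------------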
Chapter 9: Database Design
(Section 9.1) IS
Info System: Performs data collection, storage, and retrieval. It also facilitates the transformation of data into info, and it allows for the management of both data and info.
Systems Analysis: The process that establishes the need for and the extent of an info system.
Systems Development: The process of creating an IS. Applications transform data into info for decision making. Every app is composed of two parts: the data and the code.
Performance of an IS depends on a triad of factors:
• DB design and implementation
• App design and implementation
• Admin procedures
DB Development: The process of db design and implementation.
DB Design: The primary objective is to create complete, normalized, non-redundant (to the extent possible), and fully integrated conceptual, logical, and physical db models.

(Section 9.2) System Development Life Cycle (SDLC)
SDLC: Traces the history (life cycle) of an IS and provides the big picture. It's divided into 5 phases: Planning, Analysis, Detailed Systems Design, Implementation, and Maintenance. The SDLC is an iterative process.

(Section 9.2.1) Planning
Planning: Yields a general overview of the company and its objectives.
• Should the existing system be continued?
• Should the existing system be modified?
• Should the existing system be replaced?
If a new system is necessary, the next question is whether it is feasible. The feasibility study must address the following:
• The technical aspects of hardware and software requirements.
• The system cost.
• The operational cost.

(Section 9.2.2) Analysis
The problems defined during the planning phase are examined in greater detail during the analysis phase.
• What are the requirements of the current system's end users?
• Do those requirements fit into the overall info requirements?
• The analysis phase includes a thorough audit of user requirements.
• The existing hardware/software systems are also studied during the analysis phase.
• DB data modeling activities take place (DFD and HIPO diagrams).

(Section 9.2.3) Detailed Systems Design
The designer completes the design of the system's processes. This includes screens, menus, reports, and other devices that might be used to help make the system a more efficient info generator.

(Section 9.2.4) Implementation
During this phase, the hardware, DBMS software, and app programs are installed, and the db design is implemented. The system enters into a cycle of coding, testing, and debugging until it is ready to be delivered.

(Section 9.2.5) Maintenance
• Corrective maintenance in response to systems errors.
• Adaptive maintenance due to changes in the business environment.
• Perfective maintenance to enhance the system.
Computer-Aided Systems Engineering (CASE): Technology such as System Architect or Visio that helps make it possible to produce better systems within a reasonable amount of time and at a reasonable cost.

(Section 9.3) The DB Life Cycle (DBLC)
The DBLC contains 6 phases: db initial study, db design, implementation & loading, testing & evaluation, operation, and maintenance & evolution.

(Section 9.3.1) The DB Initial Study
The purpose of the db initial study is to analyze the company situation, define problems and constraints, and define objectives, scope, & boundaries.
Analyze the Company Situation: Describes the general conditions in which a company operates, its organizational structure, and its mission. These issues must be resolved:
• What is the org's general operating environment, and what is its mission within that environment?
• What is the org's structure?
Define Problems & Constraints:
• How does the existing system function?
• What input does the system require?
• What docs does the system generate?
• By whom and how is the system output used?
Define Objectives:
• What is the proposed system's initial objective?
• Will the system interface with other existing or future systems in the company?
• Will the system share the data with other systems or users?
Define Scope and Boundaries: Scope defines the extent of the design according to operational requirements.
• Will the db design encompass the entire org, one or more departments within the org, or one or more functions of a single department?
Boundaries: The limits of the proposed system; they are external to the system. The scope and boundaries become the factors that force the design into a specific mold, and the designer's job is to design the best system possible within those constraints.

(Section 9.3.2) DB design
In the process of db design, you must concentrate on the data characteristics required to build the database model. At this point there are two views of the data within the system: the business view of data as a source of info, and the designer's view of the data structure, its access, and the activities required to transform the data into info. Defining data is an integral part of the DBLC's second phase.
1: Conceptual Design: Data modeling is used to create an abstract db structure that represents real-world objects in the most realistic way. Four steps:
• Data analysis and requirements
• Entity relationship modeling and normalization
• Data model verification
• Distributed db design
Minimal Data Rule: All that is needed is there, and all that is there is needed. Make sure that all data needed are in the model and that all data in the model are needed.
Data Analysis and Requirements: The first step in conceptual design is to discover the characteristics of the data elements. The designer is focused on:
• Information needs
• Information users
• Information sources
• Information constitution, such as what data elements are needed to produce the info
The designer obtains the answers by:
• Developing and gathering end-user data views
• Directly observing the current system
• Interfacing with the systems design group
From a db point of view, the collection of data becomes meaningful only when business rules are defined.
Description of Operations: A doc that provides a precise, up-to-date, and thoroughly reviewed description of the activities that define an org's operating environment. To the db designer, the operating environment comprises both data sources and data users.
Entity Relationship Modeling and Normalization: The ER model is a communication tool as well as a design blueprint. During the ER modeling process, the designer must:
• Define entities, attributes, PKs, and FKs.
• Make decisions about adding new PK attributes to satisfy end-user and/or processing requirements.
• Make decisions about the treatment of multivalued attributes.
• Make decisions about adding derived attributes to satisfy processing requirements.
• Make decisions about the placement of FKs in 1:1 relationships. Avoid unnecessary ternary relationships.
• Draw the corresponding ER diagram.
• Include all data element definitions in the data dictionary.
• Make decisions about standard naming conventions.
Data Model Verification: The ER model must be verified against the proposed system processes in order to corroborate that the intended processes can be supported by the db model. The ER model verification process:
1) Identify the ER model's central entity.
2) Identify each module and its components.
3) Identify each module's transaction requirements. Internal: updates/inserts/deletes/queries/reports. External: module interfaces.
4) Verify all processes against the ER model.
5) Make all necessary changes suggested in step 4.
6) Repeat steps 2-5 for all modules.
Module: An IS component that handles a specific function, such as inventory, orders, payroll, and so on. At the design level, a module is an ER segment that is an integrated part of the overall ER model. Modules speed up development work, simplify the design work, and can be prototyped quickly. Think of this as a WBS. The disadvantage is that modularization creates fragmentation, which creates a potential problem: the fragments might not include all of the ER model components and might not, therefore, be able to support all of the required processes. To avoid this issue, the modules must be verified against the complete ER model. Within the central entity/module framework:
• A module must display high cohesivity, which describes the strength of the relationships found among the module's entities.
• Module coupling describes the extent to which modules are independent of one another. Modules must display low coupling, indicating that they are independent of other modules.
Processes may be classified according to their:
• Frequency: Daily, weekly, monthly, yearly, or exceptions.
• Operational type: Insert or add, update or change, delete, queries and reports, batches, maintenance, and backups.
2: DBMS Software Selection: Some of the common factors affecting the purchasing decision are cost, DBMS features and tools, the underlying model, portability, and DBMS hardware requirements.
3: Logical Design: Translates the conceptual design into the internal model for a selected db management system. Therefore the logical design is software dependent.
4: Physical Design: The process of selecting the data storage and data access characteristics of the database. It becomes more complex when data are distributed at different locations, because performance is affected by the communication media's throughput.

(Section 9.3.3) Implementation and Loading
• Create the db storage group. (Sysadmin)
• Create the db within the storage group. (Sysadmin)
• Assign the rights to use the db to a db admin. (DBA)
• Create the table space within the db. (DBA)
• Create the tables within the table space. (DBA)
• Assign access rights to the table spaces and to the tables within specified table spaces. (DBA)
You also must address performance, security, backup and recovery, integrity, and company standards.
Performance: DB size will affect performance. Important factors in db performance also include system and db config parameters, such as data placement, access path definition, the use of indexes, and buffer size.
Security: Data stored in the company db must be protected from access by unauthorized users.
• Physical Security allows only authorized personnel physical access to specific areas.
• Password Security allows the assignment of access rights to specific authorized users.
• Access Rights can be established through the use of db software.
• Audit Trails are usually provided by the DBMS to check for access violations.
• Data Encryption can be used to render data useless to unauthorized users who might have violated some of the db security layers.
• Diskless Workstations allow end users to access the db without being able to download the info from their workstations.
Backup & Recovery:
• Full Backup, or dump, of the entire db.
• Differential Backup, in which only the last modifications to the db are copied. Only the objects that have been updated since the last full backup are backed up.