DBMS Notes-ization 2014

Dev Sanskriti Vishwavidyalaya, Haridwar, UK | www.dsvv.ac.in
Paper: BCA-302
DATABASE
MANAGEMENT
SYSTEM
DEPARTMENT OF COMPUTER SCIENCE
DEV SANSKRITI VISHWAVIDYALAYA, SHANTIKUNJ,HARIDWAR (UK)
July-Dec 2014. Notes-ization @ DSVV.

PREAMBLE
ACKNOWLEDGEMENTS
The Department of Computer Science at Dev Sanskriti Vishwavidyalaya, Shantikunj,
Haridwar (Uttarakhand) was established in 2006. The department started the Bachelor
of Computer Applications (BCA) programme in 2012. The serene and vibrant
environment of the university is a boon for the students. Academically they learn
new things every day, and alongside that the curriculum of life management
instils the virtues of the humanities in them.
It was an initiative taken by students of the BCA (2013-2016) batch to work as a team
and, instead of doing revision only, to do a "prevision" of the subject. They named it
"Notes-ization". Everyone contributed to it according to his or her own calibre. In the
end, it was a sincere effort by Manan Singh (Student, BCA III Sem) that made the work
presentable and reliable, so that the effort of his teammates became fruitful and
worthwhile. Special thanks to all the web sources. Thank you, everyone, for this
inspirational work. We hope it will benefit one and all. Thanks again
for carrying the spirit of SHARE-CARE-PROSPER.

TABLE OF CONTENTS
UNIT TOPICS
UNIT 1 Introduction to Database: Definition of Database, Components
of DBMS, Three Level of Architecture proposal for DBMS,
Advantage & Disadvantage of DBMS, Data independence,
Purpose of Database Management Systems, Structure of DBMS,
DBA and its responsibilities, Data Dictionary, Advantages of
Data Dictionary.
UNIT 2 Data Models: Introduction to Data Models, Object Based
Logical Model, Record Based Logical Model- Relational Model,
Network Model, Hierarchical Model. Entity Relationship
Model, Entity Set, Attribute, Relationship Set. Entity
Relationship Diagram (ERD), Extended features of ERD.
UNIT 3.1 Relational Databases: Introduction to Relational Databases and
Terminology- Relation, Tuple, Attribute, Cardinality, Degree,
Domain. Keys- Super Key, Candidate Key, Primary Key,
Foreign Key.
UNIT 3.2 Relational Algebra: Operations, Select, Project, Union,
Difference, Intersection, Cartesian product, Join, Natural Join.
UNIT 4 Structured Query Language (SQL): Introduction to SQL,
History of SQL, Concept of SQL, DDL Commands, DML
Commands, DCL Commands, Simple Queries, Nested Queries,
Normalization: Benefits of Normalization, Normal Forms-
1NF, 2NF, 3NF, BCNF and Functional Dependency.
UNIT 5 Relational Database Design: Introduction to Relational
Database Design, DBMS v/s RDBMS. Integrity rule, Concept of
Concurrency Control and Database Security.

UNIT 1
INTRODUCTION TO DATABASE
Introduction to Database: Definition of Database, Components of DBMS, Three
Level of Architecture proposal for DBMS, Advantage & Disadvantage of DBMS,
Data independence, Purpose of Database Management Systems, Structure of
DBMS, DBA and its responsibilities, Data Dictionary, Advantages of Data
Dictionary.

DEFINITION OF DATABASE
A database can be summarily described as a repository for data: a database is a structured
collection of data. Thus, card indices, printed catalogues of archaeological artifacts, and
telephone directories are all examples of databases. A database may be stored on a computer and
examined using a program. Such programs are often called "databases", but strictly speaking they
are database management systems (DBMS).
Computer-based databases are usually organized into one or more tables. A table stores data in a
format similar to a published table and consists of a series of rows and columns. To carry the
analogy further, just as a published table will have a title at the top of each column, so each
column in a database table will have a name, often called a field name. The term field is often
used instead of column. Each row in a table will represent one example of the type of object
about which data has been collected.
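As a minimal sketch of the table idea described above, the following Python snippet uses the standard-library sqlite3 module to build a small table and read back its field names and rows. The table name student, its fields, and the sample rows are made up for illustration and do not come from these notes.

```python
import sqlite3


def build_student_table():
    """Create a small in-memory table and return its field names and rows."""
    conn = sqlite3.connect(":memory:")  # throwaway database held in RAM
    # Each column has a field name; each row describes one student.
    conn.execute("CREATE TABLE student (roll_no INTEGER, name TEXT, course TEXT)")
    conn.executemany(
        "INSERT INTO student VALUES (?, ?, ?)",
        [(1, "Asha", "BCA"), (2, "Ravi", "BCA")],  # hypothetical sample data
    )
    cursor = conn.execute("SELECT * FROM student ORDER BY roll_no")
    field_names = [column[0] for column in cursor.description]
    rows = cursor.fetchall()
    conn.close()
    return field_names, rows


field_names, rows = build_student_table()
print(field_names)
for row in rows:
    print(row)
```

Here the field names play the role of the column titles of a published table, and each tuple is one row.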

COMPONENTS OF DBMS
A database management system (DBMS) consists of several components. Each component plays a
very important role in the database management system environment. The major components of
a database management system are:
• Software
• Hardware
• Data
• Procedures
• Database Access Language
• Users
Software
The main component of a DBMS is the software. It is the set of programs used to handle the
database and to control and manage the overall computerized database. It includes:
1. The DBMS software itself, which is the most important software component in the overall system.
2. The operating system, including the network software used in a network to share the data of
the database among multiple users.
3. Application programs, developed in programming languages such as C++ or Visual Basic,
that are used to access the database through the database management system. Each program
contains statements that request the DBMS to perform operations on the database. The operations
may include retrieving, updating, and deleting data. Application programs may be run from
conventional or online workstations or terminals.
Hardware
Hardware consists of a set of physical electronic devices such as computers (together with
associated I/O devices like disk drives), storage devices, I/O channels, and electromechanical
devices that interface between computers and real-world systems. It is impossible
to implement a DBMS without hardware devices. In a network, a powerful computer with
high data processing speed and a storage device with large storage capacity are required as
the database server.
Characteristics:
It is helpful to categorize computer memory into two classes: internal memory and external
memory. Although some internal memory is permanent, such as ROM, we are interested here
only in memory that can be changed by programs. This memory is often known as RAM. This
memory is volatile, and any electrical interruption causes the loss of data.
By contrast, magnetic disks and tapes are common forms of external memory. They are
non-volatile memory, and they retain their content for practically unlimited amounts of time. The
physical characteristics of magnetic tapes force them to be accessed sequentially, making them
useful for backup purposes, but not for quick access to specific data.
In examining the memory needs of a DBMS, we need to consider the following issues:
• Data of a DBMS must have a persistent character; in other words, data must remain available
long after any program that is using it has completed its work. Also, data must remain intact even
if the system breaks down.
• A DBMS must access data at a relatively high rate.
• Such a large quantity of data needs to be stored that the storage medium must be low cost.
These requirements are satisfied at the present stage of technological development only by
magnetic disks.
Data
Data is the most important component of the DBMS. The main purpose of DBMS is to process
the data. In DBMS, databases are defined, constructed and then data is stored, updated and
retrieved to and from the databases. The database contains both the actual (or operational) data
and the metadata (data about data or description about data).
Procedures
Procedures refer to the instructions and rules that help to design the database and to use the
DBMS. The users who operate and manage the DBMS require documented procedures on how to use
or run the database management system. These may include procedures:
1. To install the new DBMS.
2. To log on to the DBMS.
3. To use the DBMS or application program.
4. To make backup copies of the database.
5. To change the structure of the database.
6. To generate reports of data retrieved from the database.
Database Access Language
The database access language is used to write data to and read data from the database. Users use
the database access language to enter new data, change the existing data in the database, and
retrieve required data from databases. The user writes a set of appropriate commands in a
database access language and submits these to the DBMS. The DBMS translates the user
commands and sends them to a specific part of the DBMS called the database engine. The
database engine generates a set of results according to the commands submitted by the user,
converts these into a user-readable form called an inquiry report, and then displays them on the
screen. Administrators may also use the database access language to create and maintain the
databases.
The most popular database access language is SQL (Structured Query Language). Relational
databases are required to have a database query language.
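The three uses of a database access language mentioned above (entering new data, changing existing data, and retrieving data) can be sketched with SQL commands submitted through Python's standard-library sqlite3 module. The book table and its contents are hypothetical and serve only as an illustration.

```python
import sqlite3


def run_access_language_demo():
    """Submit SQL commands to the DBMS and return the retrieved rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, copies INTEGER)"
    )
    # Enter new data.
    conn.execute("INSERT INTO book (id, title, copies) VALUES (1, 'DBMS Notes', 3)")
    conn.execute("INSERT INTO book (id, title, copies) VALUES (2, 'SQL Primer', 5)")
    # Change existing data.
    conn.execute("UPDATE book SET copies = copies + 2 WHERE id = 1")
    # Retrieve the required data.
    result = conn.execute(
        "SELECT title, copies FROM book WHERE copies >= 5 ORDER BY id"
    ).fetchall()
    conn.close()
    return result


retrieved = run_access_language_demo()
print(retrieved)
```

The program only states what it wants in SQL; how the rows are located and stored is left to the database engine.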
Users
The users are the people who manage the databases and perform different operations on the
databases in the database system. There are three kinds of people who play different roles in a
database system:
1. Application Programmers
2. Database Administrators
3. End-Users
Application Programmers
The people who write application programs in programming languages (such as Visual Basic,
Java, or C++) to interact with databases are called application programmers.
Database Administrators
A person who is responsible for managing the overall database management system is called a
database administrator, or simply DBA.
End-Users
The end-users are the people who interact with the database management system to perform
different operations on the database such as retrieving, updating, inserting, and deleting data.

3 LEVEL OF ARCHITECTURE PROPOSAL OF DBMS
The logical architecture, also known as the ANSI/SPARC architecture, was elaborated at the
beginning of the 1970s. It distinguishes three layers of data abstraction:
1. The physical layer contains specific and detailed information that describes how data are
stored: addresses of various data components, lengths in bytes, etc. DBMSs aim to
achieve data independence, which means that application programs should be unaffected
by the database organization at the physical level.
2. The logical layer describes data in a manner that is similar to, say, definitions of
structures in C. This layer has a conceptual character; it shields the user from the tedium
of details contained in the physical layer, but is essential in formulating queries for the
DBMS.
3. The user layer contains each user's perspective of the content of the database.
The logical architecture describes how data in the database is perceived by users. It is not
concerned with how the data is handled and processed by the DBMS, but only with how it looks.
The method of data storage on the underlying file system is not revealed, and the users can
manipulate the data without worrying about where it is located or how it is actually stored. This
results in the database having different levels of abstraction.
The majority of commercial database management systems available today are based on the
ANSI/SPARC generalized DBMS architecture, as proposed by the ANSI/SPARC Study Group
on Data Base Management Systems. Hence this is also called the ANSI/SPARC model. It
divides the system into three levels of abstraction: the internal or physical level, the conceptual
level, and the external or view level.
The External or View Level:
The external or view level is the highest level of abstraction of database. It provides a window on
the conceptual view, which allows the user to see only the data of interest to them. The user can
be either an application program or an end user. There can be many external views as any
number of external schemas can be defined and they can overlap each other. It consists of the
definition of logical records and relationships in the external view. It also contains the method of
deriving the objects such as entities, attributes and relationships in the external view from the
conceptual view.
The Conceptual Level or Global Level:
The conceptual level presents a logical view of the entire database as a unified whole. It allows
the user to bring all the data in the database together and see it in a consistent manner. Hence,
there is only one conceptual schema per database. The first stage in the design of a database is to
define the conceptual view, and a DBMS provides a data definition language for this purpose. It
describes all the records and relationships included in the database.
The data definition language used to create the conceptual level must not specify any physical
storage considerations, which should be handled by the physical level. It does not provide any
storage or access details, but defines the information content only.

The Internal or Physical Level:
The collection of files permanently stored on secondary storage devices is known as the physical
database. The physical or internal level is the one closest to the physical storage, and it provides
a low-level description of the physical database, and an interface between the operating system
file system and the record structures used in higher levels of abstraction. It is at this level that record
types and methods of storage are defined, as well as how stored fields are represented, what
physical sequence the stored records are in, and what other physical structures exist.
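The relationship between the conceptual and external levels can be sketched with an SQL view, here run through Python's standard-library sqlite3 module. This is only an illustration under assumed names: the employee table and the staff_directory view are hypothetical. The internal level is handled by SQLite's own file format and never appears in the code.

```python
import sqlite3


def three_level_demo():
    """Define a conceptual schema and one external view; return the view's rows."""
    conn = sqlite3.connect(":memory:")
    # Conceptual level: the logical definition of the whole database,
    # with no storage or access details.
    conn.execute(
        "CREATE TABLE employee ("
        "emp_id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary REAL)"
    )
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?, ?)",
        [(1, "Meera", "Sales", 42000.0), (2, "Arjun", "Accounts", 39000.0)],
    )
    # External level: a view that exposes only the data of interest to
    # one group of users (salary is hidden from them).
    conn.execute(
        "CREATE VIEW staff_directory AS SELECT name, department FROM employee"
    )
    directory = conn.execute(
        "SELECT * FROM staff_directory ORDER BY name"
    ).fetchall()
    conn.close()
    return directory


directory = three_level_demo()
print(directory)
```

Several such views can be defined over the same conceptual schema, and they may overlap.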

ADVANTAGES & DISADVANTAGES OF DBMS
Advantages of the DBMS:
The DBMS serves as the intermediary between the user and the database. The database structure
itself is stored as a collection of files, and the only way to access the data in those files is through
the DBMS. The DBMS receives all application requests and translates them into the complex
operations required to fulfill those requests. The DBMS hides much of the database's internal
complexity from the application programs and users.
The different advantages of DBMS are as follows:
1. Improved data sharing.
The DBMS helps create an environment in which end users have better access to more and
better-managed data. Such access makes it possible for end users to respond quickly to changes
in their environment.
2. Improved data security.
The more users access the data, the greater the risks of data security breaches. Corporations
invest considerable amounts of time, effort, and money to ensure that corporate data are used
properly. A DBMS provides a framework for better enforcement of data privacy and security
policies.
3. Better data integration.
Wider access to well-managed data promotes an integrated view of the organization's operations
and a clearer view of the big picture. It becomes much easier to see how actions in one segment
of the company affect other segments.
4. Minimized data inconsistency.
Data inconsistency exists when different versions of the same data appear in different places.
For example, data inconsistency exists when a company's sales department stores a sales
representative's name as "Bill Brown" and the company's personnel department stores that same
person's name as "William G. Brown," or when the company's regional sales office shows the
price of a product as $45.95 and its national sales office shows the same product's price as
$43.95. The probability of data inconsistency is greatly reduced in a properly designed database.
5. Improved data access.
The DBMS makes it possible to produce quick answers to ad hoc queries. From a database
perspective, a query is a specific request issued to the DBMS for data manipulation—for
example, to read or update the data. Simply put, a query is a question, and an ad hoc query is a
spur-of-the-moment question. The DBMS sends back an answer (called the query result set) to
the application. For example, end users, when dealing with large amounts of sales data, might
want quick answers to questions (ad hoc queries) such as:
- What was the dollar volume of sales by product during the past six months?
- What is the sales bonus figure for each of our salespeople during the past three months?
- How many of our customers have credit balances of $3,000 or more?
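As a rough sketch, the last of these ad hoc questions could be put to a DBMS as a single SQL query. The example below uses Python's standard-library sqlite3 module; the customer table, its columns, and the sample balances are assumptions made for illustration.

```python
import sqlite3


def customers_with_credit_at_least(conn, threshold):
    """Answer the ad hoc question: how many customers have at least this credit balance?"""
    row = conn.execute(
        "SELECT COUNT(*) FROM customer WHERE credit_balance >= ?", (threshold,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, credit_balance REAL)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "A", 3500.0), (2, "B", 1200.0), (3, "C", 3000.0)],  # hypothetical data
)

high_credit_count = customers_with_credit_at_least(conn, 3000)
print(high_credit_count)
```

No new program has to be written for each new question; only the query text changes.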

6. Improved decision making.
Better-managed data and improved data access make it possible to generate better-quality
information, on which better decisions are based. The quality of the information generated
depends on the quality of the underlying data. Data quality is a comprehensive approach to
promoting the accuracy, validity, and timeliness of the data. While the DBMS does not guarantee
data quality, it provides a framework to facilitate data quality initiatives.
7. Increased end-user productivity.
The availability of data, combined with the tools that transform data into usable information,
empowers end users to make quick, informed decisions that can make the difference between
success and failure in the global economy.
Disadvantages of Database:
Although the database system yields considerable advantages over previous data management
approaches, database systems do carry significant disadvantages. For example:
1. Increased costs.
Database systems require sophisticated hardware and software and highly skilled personnel. The
cost of maintaining the hardware, software, and personnel required to operate and manage a
database system can be substantial. Training, licensing, and regulation compliance costs are
often overlooked when database systems are implemented.
2. Management complexity.
Database systems interface with many different technologies and have a significant impact on a
company's resources and culture. The changes introduced by the adoption of a database system
must be properly managed to ensure that they help advance the company's objectives. Given the
fact that database systems hold crucial company data that are accessed from multiple sources,
security issues must be assessed constantly.
3. Maintaining currency.
To maximize the efficiency of the database system, you must keep your system current.
Therefore, you must perform frequent updates and apply the latest patches and security measures
to all components. Because database technology advances rapidly, personnel training costs tend
to be significant.
4. Vendor dependence.
Given the heavy investment in technology and personnel
training, companies might be reluctant to change database vendors. As a consequence, vendors
are less likely to offer pricing point advantages to existing customers, and those customers might
be limited in their choice of database system components.
5. Frequent upgrade/replacement cycles.
DBMS vendors frequently upgrade their products by adding new functionality. Such new
features often come bundled in new upgrade versions of the software. Some of these versions
require hardware upgrades. Not only do the upgrades themselves cost money, but it also costs
money to train database users and administrators to properly use and manage the new features.

DATA INDEPENDENCE
A major objective for three-level architecture is to provide data independence, which means that
upper levels are unaffected by changes in lower levels.
There are two kinds of data independence:
• Logical data independence
• Physical data independence
Logical Data Independence
Logical data independence indicates that the conceptual schema can be changed without
affecting the existing external schemas. The change would be absorbed by the mapping between
the external and conceptual levels. Logical data independence also insulates application
programs from operations such as combining two records into one or splitting an existing record
into two or more records. This would require a change in the external/conceptual mapping so as
to leave the external view unchanged.
Physical Data Independence
Physical data independence indicates that the physical storage structures or devices could be
changed without affecting the conceptual schema. The change would be absorbed by the mapping
between the conceptual and internal levels. Physical data independence is achieved by the
presence of the internal level of the database and the mapping or transformation from the
conceptual level of the database to the internal level. Conceptual level to internal level mapping,
therefore provides a means to go from the conceptual view (conceptual records) to the internal
view and hence to the stored data in the database (physical records).
If there is a need to change the file organization or the type of physical device used as a result of
growth in the database or new technology, a change is required in the conceptual/internal
mapping between the conceptual and internal levels. This change is necessary to maintain the
conceptual level invariant. The physical data independence criterion requires that the conceptual
level does not specify storage structures or the access methods (indexing, hashing etc.) used to
retrieve the data from the physical storage medium. Making the conceptual schema physically
data independent means that the external schema, which is defined on the conceptual schema, is
in turn physically data independent.
Logical data independence is more difficult to achieve than physical data independence, as it
requires flexibility in the design of the database, and the programmer has to foresee future
requirements or modifications in the design.
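Physical data independence can be illustrated with a small experiment: a change at the physical level (here, adding an index) should leave the answer to a query unchanged. The sketch below uses Python's standard-library sqlite3 module; the student table and the index name idx_student_course are hypothetical.

```python
import sqlite3

# The query is written once, against the conceptual schema only.
QUERY = "SELECT name FROM student WHERE course = 'BCA' ORDER BY name"


def query_before_and_after_index():
    """Run the same query before and after a physical-level change."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, course TEXT)"
    )
    conn.executemany(
        "INSERT INTO student VALUES (?, ?, ?)",
        [(1, "Asha", "BCA"), (2, "Ravi", "BCA"), (3, "Neha", "MCA")],
    )
    before = conn.execute(QUERY).fetchall()
    # Physical-level change: a new access structure is added.
    # The table definition and the query text stay exactly the same.
    conn.execute("CREATE INDEX idx_student_course ON student (course)")
    after = conn.execute(QUERY).fetchall()
    conn.close()
    return before, after


before, after = query_before_and_after_index()
print(before == after)
```

The index may change how the DBMS finds the rows, but not which rows the application sees.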

PURPOSE OF DBMS
Database management systems were developed to handle the following difficulties of typical
file-processing systems supported by conventional operating systems: data redundancy and
inconsistency, difficulty in accessing data, data isolation (multiple files and formats), integrity
problems, atomicity of updates, concurrent access by multiple users, and security problems.
• In the early days, database applications were built directly on top of the
file system.
• Drawbacks of using file systems to store data:
- Data redundancy and inconsistency: multiple file formats, duplication of information in
different files.
- Difficulty in accessing data: need to write a new program to carry out each new task.
- Data isolation: multiple files and formats.
- Integrity constraints: hard to add new constraints or change existing ones.
These problems and others led to the development of database management systems.
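To make the integrity point concrete, here is a minimal sketch in which the constraint is declared once in the schema and enforced by the DBMS, instead of being re-implemented in every program that writes to a file. It uses Python's standard-library sqlite3 module; the account table and the rule that a balance may not be negative are assumptions chosen for illustration.

```python
import sqlite3


def try_insert(conn, account_id, balance):
    """Attempt an insert; return True if the DBMS accepted the row."""
    try:
        conn.execute("INSERT INTO account VALUES (?, ?)", (account_id, balance))
        return True
    except sqlite3.IntegrityError:
        # The DBMS rejected the row because it violates a declared constraint.
        return False


conn = sqlite3.connect(":memory:")
# The integrity constraint lives in the schema, not in application code.
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, balance REAL NOT NULL CHECK (balance >= 0))"
)

accepted_valid = try_insert(conn, 1, 100.0)
accepted_invalid = try_insert(conn, 2, -50.0)
stored_rows = conn.execute("SELECT id, balance FROM account").fetchall()
print(accepted_valid, accepted_invalid, stored_rows)
```

Changing the rule later means changing one table definition, not every program that touches the data.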

STRUCTURE OF DBMS
The components in the structure of DBMS are described below:
DBA: - DBA means Database Administrator. He or she is the person responsible for the
installation, configuration, upgrading, administration, monitoring, maintenance, and security of
databases in an organization.
Database Schema: - A database schema defines the database's entities and the relationships among
them. It is a descriptive detail of the database, which can be depicted by means of
schema diagrams. These are prepared by the database designer to help programmers
understand all aspects of the database.
DDL Processor: - The DDL Processor or Compiler converts the data definition statements into a
set of tables. These tables contain the metadata concerning the database and are in a form that
can be used by other components of DBMS.
Data Dictionary: - Information pertaining to the structure and usage of data contained in the
database, the metadata, is maintained in a data dictionary. The term system catalog also describes
this metadata. The data dictionary, which is a database itself, documents the data. Each database
user can consult the data dictionary to learn what each piece of data and the various synonyms of
the data fields mean.
Integrity Checker: - It checks the integrity constraints so that only valid data can be entered into
the database.
User: - The users are either application programmers or on-line terminal users of any degree of
sophistication. Each user has a language at his or her disposal. For the application programmer it
will be a conventional programming language, such as COBOL or PL/I; for the terminal user it
will be either a query language or a special purpose language tailored to that user's requirements
and supported by an on-line application program.
Queries: - In a DBMS, a search question that instructs the program to locate records that meet
specific criteria is called a query.
Query Processor: - The query processor transforms user queries into a series of low level
instructions. It is used to interpret the online user's query and convert it into an efficient series of
operations in a form capable of being sent to the run time data manager for execution. The query
processor uses the data dictionary to find the structure of the relevant portion of the database and
uses this information in modifying the query and preparing an optimal plan to access the
database.
Programmer: - The programmer can manipulate the database in all possible ways.
Application Program: - A complete, self-contained computer program that performs a specific
useful task, other than system maintenance functions.
DML Processor: - The DML processor translates the data manipulation statements (such as select,
update, and delete) that are passed by the application programmer into code that performs the
task specified by the programmer, such as deleting rows from a table.
Authorization Control: - The authorization control module checks whether users hold the
privileges required for the operations they request.
Command Processor: - The command processor processes the queries passed by the authorization
control module.
Query Optimizer: - The query optimizer determines an optimal strategy for query
execution.
Transaction Manager: - The transaction manager ensures that the transaction properties are
maintained by the system.
Scheduler: - It provides an environment in which multiple users can work on the same piece of
data at the same time; in other words, it supports concurrency.
Buffer Manager: - The buffer manager is the software layer responsible for bringing pages from
disk to main memory as needed. The buffer manager manages the available main memory by
partitioning it into a collection of pages, which we collectively refer to as the buffer pool.
Recovery Manager: - The recovery manager is responsible for maintaining a log and
restoring the system to a consistent state after a crash. It is responsible for ensuring transaction
atomicity and durability.
Physical Database: - The physical database specifies additional storage details. We must decide
what file organization to use to store the relations, and create auxiliary data structures called
indexes.
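The atomicity that the transaction manager and recovery manager are said to ensure can be sketched as follows: either both updates of a transfer take effect, or neither does. The example uses Python's standard-library sqlite3 module; the account table, the transfer function, and the amounts are hypothetical.

```python
import sqlite3


def transfer(conn, source_id, target_id, amount):
    """Move amount between two accounts atomically; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, target_id),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, source_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit would make a balance negative, so the credit is undone too.
        return False


def balances(conn):
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

ok_transfer = transfer(conn, 1, 2, 30)       # enough funds: both updates are kept
failed_transfer = transfer(conn, 1, 2, 500)  # not enough funds: both are rolled back
final_balances = balances(conn)
print(ok_transfer, failed_transfer, final_balances)
```

In the failed transfer the credit to account 2 has already been executed when the debit fails, yet it does not survive, which is the all-or-nothing behaviour the text describes.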

DBA & ITS RESPONSIBILITIES
A Database Administrator (acronym: DBA) is an IT professional responsible for the installation,
configuration, upgrade, administration, monitoring, maintenance, and securing of databases in
an organization.
Database administrator responsibilities are as follows:-
1. Database Installation and upgrading
2. Database configuration including configuration of background Processes
3. Database performance optimization & fine tuning
4. Configuring the Database in Archive log mode
5. Maintaining Database in archive log mode
6. Devising Database backup strategy
7. Monitoring & checking the Database backup & recovery process
8. Database troubleshooting
9. Database recovery in case of crash
10. Database security
11. Enabling auditing features wherever required
12. Table space management
13. Database Analysis report
14. Database health monitoring
15. Centralized control
The skills required to become a database administrator are:-
• Communication skills
• Knowledge of database theory
• Knowledge of database design
• Knowledge about the RDBMS itself, e.g. Oracle Database, IBM DB2, Microsoft SQL
Server, Adaptive Server Enterprise, MaxDB, PostgreSQL
• Knowledge of Structured Query Language (SQL), e.g. SQL/PSM, Transact-SQL
• General understanding of distributed computing architectures, e.g. Client/Server,
Internet/Intranet, Enterprise
• General understanding of the underlying operating system, e.g. Windows, Unix, Linux
• General understanding of storage technologies, memory management, disk arrays,
NAS/SAN, networking
• General understanding of routine maintenance, recovery, and handling failover of a
database

DATA DICTIONARY & ITS ADVANTAGES
A data dictionary, or metadata repository, as defined in the Dictionary of Computing, is a
"centralized repository of information about data such as meaning, relationships to other data,
origin, usage, and format." The term may have one of several closely related meanings pertaining
to databases and database management systems (DBMS):
• a document describing a database or collection of databases.
• an integral component of a DBMS that is required to determine its structure.
• a piece of middleware that extends or supplants the native data dictionary of a DBMS.
The terms data dictionary and data repository are used to indicate a more general software
utility than a catalogue. A catalogue is closely coupled with the DBMS software. It provides the
information stored in it to the user and the DBA, but it is mainly accessed by the various
software modules of the DBMS itself, such as DDL and DML compilers, the query optimizer,
the transaction processor, report generators, and the constraint enforcer. On the other hand, a
data dictionary is a data structure that stores metadata, i.e., (structured) data about data.
Any well designed database will surely include a data dictionary as it gives database
administrators and other users easy access to the type of data that they should expect to see in
every table, row, and column of the database, without actually accessing the database.
Since a database is meant to be built and used by multiple users, making sure that everyone is
aware of the types of data each field will accept becomes a challenge, especially when there is a
lack of consistency when assigning data types to fields. A data dictionary is a simple yet
effective add-on to ensure data consistency.
Some of the typical components of a data dictionary entry are:
• Name of the table
• Name of the fields in each table
• Data type of the field (integer, date, text…)
• Brief description of the expected data for each field
• Length of the field
• Default value for that field
• Is the field nullable or not nullable?
• Constraints that apply to each field, if any
Not all of these fields (and many others) will apply to every single entry in the data dictionary.
For example, if the entry were about the root description of the table, it might not require any
information regarding fields. Some data dictionaries also include location details, such as each
field's current location, where it actually came from, and details of the physical location such as
the IP address or DNS name of the server.
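Several of the components listed above (table name, field name, data type, nullability, default value) can be read straight from a DBMS's own catalog. The following sketch does this for SQLite through Python's standard-library sqlite3 module, using SQLite's PRAGMA table_info command. The describe_table helper and the student table are hypothetical, and other DBMSs expose their catalogs differently.

```python
import sqlite3


def describe_table(conn, table_name):
    """Return one data dictionary entry per field, built from SQLite's catalog."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    # table_name is interpolated directly, so pass only trusted names here.
    entries = []
    for _cid, name, data_type, not_null, default, _pk in conn.execute(
        f"PRAGMA table_info({table_name})"
    ):
        entries.append(
            {
                "table": table_name,
                "field": name,
                "type": data_type,
                "nullable": not bool(not_null),
                "default": default,  # SQL text of the default, or None
            }
        )
    return entries


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "roll_no INTEGER NOT NULL, name TEXT NOT NULL, course TEXT DEFAULT 'BCA')"
)

dictionary = describe_table(conn, "student")
for entry in dictionary:
    print(entry)
```

A human-written description of the expected data for each field would still have to be added by hand; the catalog only knows the structure.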
Format and Storage
There exists no standard format for creating a data dictionary. Metadata differs from table to
table. Some database administrators prefer to create simple text files, while others use diagrams
and flow charts to display all their information. The only prerequisite for a data dictionary is that
it should be easily searchable.
Again, the only applicable rule for data dictionary storage is that it should be at a convenient
location that is easily accessible to all database users. The types of files used to store data
dictionaries range from text files, XML files, spreadsheets, an additional table in the database
itself, to handwritten notes. It is the database administrator's duty to make sure that this
document is always up to date, accurate, and easily accessible.
Creating the Data Dictionary
First, all the information required to create the data dictionary must be identified and recorded in
the design documents. If the design documents are in a compatible format, it should be possible
to directly export the data in them to the desired format for the data dictionary. For example,
applications like Microsoft Visio allow database creation directly from the design structure and
would make creation of the data dictionary simpler. Even without the use of such tools, scripts
can be deployed to export data from the database to the document. There is always the option of
manually creating these documents as well.
Advantages of a Data Dictionary
The primary advantage of creating an informative and well-designed data dictionary is that it
brings clarity to the rest of the database documentation. Also, when a new user is introduced to
the system or a new administrator takes over the system, identifying table structures and types
becomes simpler. In scenarios involving large databases where it is impossible for an
administrator to completely remember specific bits of information about thousands of fields, a
data dictionary becomes a crucial necessity.

UNIT 2
DATA MODELS
Data Models: Introduction to Data Models, Object Based Logical Model, Record
Base Logical Model- Relational Model, Network Model, Hierarchical Model.
Entity Relationship Model, Entity Set, Attribute, Relationship Set. Entity
Relationship Diagram (ERD), Extended features of ERD.

INTRODUCTION TO DATA MODELS
A data model can be defined as an integrated collection of concepts for describing and
manipulating data, relationships between data, and constraints on the data in an organization.
The importance of data models is that they facilitate interaction among the designer,
the application programmer and the end user. Also, a well-developed data model can even foster
improved understanding of the organization for which the database design is developed. Data
models are a communication tool as well.
A data model comprises three components:
• A structural part, consisting of a set of rules according to which databases can be constructed.
• A manipulative part, defining the types of operation that are allowed on the data (this includes
the operations that are used for updating or retrieving data from the database and for changing
the structure of the database).
• Possibly a set of integrity rules, which ensures that the data is accurate.
The purpose of a data model is to represent data and to make the data understandable. There
have been many data models proposed in the literature. They fall into three broad categories:
• Object Based Data Models
• Physical Data Models
• Record Based Data Models

OBJECT BASED LOGICAL MODEL
Object based data models use concepts such as entities, attributes, and relationships. An entity is a distinct
object (a person, place, concept, or event) in the organization that is to be represented in the database.
An attribute is a property that describes some aspect of the object that we wish to record, and a
relationship is an association between entities.
Some of the more common types of object based data model are:
• Entity-Relationship
• Object Oriented
• Semantic
• Functional

RECORD BASED LOGICAL MODEL & ITS TYPES
Record based logical models are used in describing data at the logical and view levels. In
contrast to object based data models, they are used to specify the overall logical structure of the
database and to provide a higher-level description of the implementation. Record based models
are so named because the database is structured in fixed format records of several types. Each
record type defines a fixed number of fields, or attributes, and each field is usually of a fixed
length.
The three most widely accepted record based data models are:
• Hierarchical Model
• Network Model
• Relational Model

RELATIONAL MODEL
The relational model for database is a database model based on first-order predicate logic, first
formulated and proposed in 1969 by Edgar F. Codd. In the relational model of a database, all
data is represented in terms of tuples, grouped into relations. A database organized in terms of
the relational model is a relational database.
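To make "tuples grouped into relations" concrete, here is a minimal sketch that models a relation as a Python set of named tuples. The Student relation and its values are hypothetical and serve only as an illustration.

```python
from typing import NamedTuple

class Student(NamedTuple):
    """Heading of the relation: attribute names and their types."""
    roll_no: int
    name: str
    course: str

# A relation is a set of tuples: it has no duplicates and no inherent order.
student = {
    Student(1, "Mira", "BCA"),
    Student(2, "Arjun", "BCA"),
    Student(1, "Mira", "BCA"),   # duplicate tuple, absorbed by the set
}

# A simple query: names of all tuples whose course is BCA.
bca_names = sorted(t.name for t in student if t.course == "BCA")
print(len(student), bca_names)
```

A real RDBMS stores and queries relations very differently; the set is only meant to show the logical view that the relational model offers.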
Advantages of Relational Model:
Conceptual Simplicity: The hierarchical and network models are both conceptually simple, but
the relational model is simpler than either of them.
Structural Independence: In the relational model, changes in the structure do not affect
data access.
Design Implementation: The relational model achieves both data independence and structural
independence.
Ad hoc query capability: The presence of a very powerful, flexible and easy-to-use query
capability is one of the main reasons for the immense popularity of the relational database model.
Disadvantages of Relational Model:
Hardware overheads: Relational database systems hide the implementation complexities and the
physical data storage details from the user. To do this, relational database systems need
more powerful computers and data storage devices.
Ease of design can lead to bad design: A relational database is easy to design and use. The
user need not know the complexities of the data storage. This ease of design and use can lead
to the development and implementation of very poorly designed database management
systems.

NETWORK MODEL
The network model is a database model conceived as a flexible way of representing objects and
their relationships. Its distinguishing feature is that the schema, viewed as a graph in which
object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or
lattice.
While the hierarchical database model structures data as a tree of records, with each record
having one parent record and many children, the network model allows each record to have
multiple parent and child records, forming a generalized graph structure.
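The difference between a tree and a generalized graph can be sketched with plain dictionaries. The record types and links below are hypothetical. The is_hierarchy helper checks only the single-parent rule and is not a full test for tree structure.

```python
from collections import defaultdict

# Each (owner, member) pair is one arc of the schema graph.
links = [
    ("Department", "Employee"),
    ("Project", "Employee"),     # Employee gets a second parent
    ("Employee", "Dependent"),
]

parents = defaultdict(set)
children = defaultdict(set)
for owner, member in links:
    parents[member].add(owner)
    children[owner].add(member)

def is_hierarchy(parent_map):
    """True if every record type has at most one parent (the tree rule)."""
    return all(len(owners) <= 1 for owners in parent_map.values())

employee_parents = sorted(parents["Employee"])
print(employee_parents)        # the two owners of Employee
print(is_hierarchy(parents))   # False: the schema is a network, not a tree
```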
Advantages of Network Model:
Conceptual Simplicity: Just like the hierarchical model, it is simple and easy to implement.
Capability to handle more relationship types: The network model can handle one-to-one (1:1) and
many-to-many (N:N) relationships.
Ease of data access: Data access is easier than in the hierarchical model.
Data Integrity: Since it is based on the parent-child relationship, there is always a link between
the parent segment and the child segment under it.
Data Independence: The network model is better than the hierarchical model in terms of data
independence.
Disadvantages of Network Model:
System Complexity: All the records have to be maintained using pointers, so the database structure
becomes more complex.
Operational Anomalies: Because the network model requires a large number of pointers,
insertion, deletion and updating become more complex.
Absence of Structural Independence: There is a lack of structural independence, because when we
change the structure it becomes compulsory to change the application too.

HIERARCHICAL MODEL
A hierarchical database model is a data model in which the data is organized into a tree-like
structure. The data is stored as records which are connected to one another through links. A
record is a collection of fields, with each field containing only one value. The entity type of a
record defines which fields the record contains.
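A minimal sketch of such a tree of records is given below. The Record class, its field names and the sample data are hypothetical. Each record holds its own fields and links to its child records, and the walk function visits the tree from the root downwards.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    fields: dict                                   # field name -> single value
    children: list = field(default_factory=list)   # links to child records

    def add_child(self, child):
        self.children.append(child)
        return child

def walk(record, depth=0):
    """Yield (depth, record) pairs in top-down, left-to-right order."""
    yield depth, record
    for child in record.children:
        yield from walk(child, depth + 1)

root = Record({"dept": "Computer Science"})
emp = root.add_child(Record({"name": "Mira"}))
emp.add_child(Record({"skill": "SQL"}))
root.add_child(Record({"name": "Arjun"}))

outline = [(depth, list(rec.fields.values())[0]) for depth, rec in walk(root)]
print(outline)
```

Every record other than the root is reachable through exactly one parent, which is why retrieval always starts at the root and follows the links downwards.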
Advantages of Hierarchical Model:
1. Simplicity: Since the database is based on the hierarchical structure, the relationship between
the various layers is logically simple.
2. Data Security: The hierarchical model was the first database model to offer data security
provided by the DBMS.
3. Data Integrity: Since it is based on the parent-child relationship, there is always a link
between the parent segment and the child segment under it.
4. Efficiency: It is very efficient when the database contains a large number of 1:N
relationships and when users require a large number of transactions.
Disadvantages of Hierarchical Model:
1. Implementation complexity: Although it is simple and easy to design, it is quite complex to
implement.
2. Database Management Problem: If you make any changes in the database structure, then you
need to make changes in all the application programs that access the database.
3. Lack of Structural Independence: There is a lack of structural independence, because when we
change the structure it becomes compulsory to change the application too.
4. Operational Anomalies: The hierarchical model suffers from insert, delete and update
anomalies; retrieval operations are also difficult.

ENTITY RELATIONSHIP MODEL
In DBMS, an entity–relationship model (ER model) is a data model for describing the data or
information aspects of a business domain or its process requirements, in an abstract way that
lends itself to ultimately being implemented in a database such as a relational database. The main
components of ER models are entities (things) and the relationships that can exist among them.
Entity–relationship modeling was developed by Peter Chen and published in a 1976 paper.
However, variants of the idea existed previously, and have been devised subsequently such as
supertype and subtype data entities and commonality relationships.
The ER model represents real-world situations using concepts that are commonly used by
people. It allows a representation of the real world to be defined at the logical level. The ER model has no
facilities to describe machine-related aspects.
In ER model the logical structure of data is captured by indicating the grouping of data into
entities. The ER model also supports a top-down approach by which details can be given in
successive stages.
Entity:- An entity is something which is described in the database by storing its data; it
may be a concrete entity or a conceptual entity.
Entity set:- An entity set is a collection of similar entities.
Attribute:- An attribute describes a property associated with entities. An attribute has a
name and a value for each entity.
Domain:- A domain defines the set of permitted values for an attribute.

ENTITY SET
Entity set:- An entity set is a collection of similar entities.
A database can be modeled as:
• a collection of entities,
• relationships among entities.
An entity is an object that exists and is distinguishable from other objects.
Ex:- specific person, company, event, plant
Entities have attributes.
Ex:- people have names and addresses.
An entity set is a set of entities of the same type that share the same properties.
Ex:- set of all persons, companies, trees, holidays.
An entity is a thing in the real world with an independent existence, and an entity set is the collection
of all entities of a particular entity type at any point of time. Take an example: a company has
many employees, and these employees are the entities (e1, e2, e3, ...). All these entities,
having the same attributes, are defined under the ENTITY TYPE employee, and the set {e1, e2, ...} is called
the entity set. We can also understand this by an analogy. An entity type is like fruit, which is a class. We
have never seen a "fruit" as such, though we have seen instances of fruit such as apple, banana and mango.
Hence:
fruit = entity type = EMPLOYEE
apple = entity = e1 or e2 or e3
bucket of apple, banana, mango etc. = entity set = {e1, e2, ...}
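The distinction between entity type, entity and entity set maps loosely onto class, object and collection in a programming language. The sketch below is only an analogy, and the attribute names and sample values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Employee:               # entity type: defines the shared attributes
    emp_id: int
    name: str

e1 = Employee(1, "Mira")      # entities: individual instances of the type
e2 = Employee(2, "Arjun")
e3 = Employee(3, "Kiran")

employee_set = {e1, e2, e3}   # entity set: all employees at this point in time

same_type = all(isinstance(e, Employee) for e in employee_set)
print(len(employee_set), same_type)
```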

ATTRIBUTE
In a database management system (DBMS), an attribute may describe a component of the
database, such as a table or a field, or may be used itself as another term for a field.
A table contains one or more columns; these columns are the attributes in a DBMS. For example,
say you have a table named "employee information" which has the columns
ID, NAME and ADDRESS. Then ID, NAME and ADDRESS are the attributes of employee.

RELATIONSHIP SET
The association among entities is called a relationship. For example, the employee entity has the relation
works at with department. Another example is a student who enrolls in some course. Here,
Works at and Enrolls are called relationships.
Relationship Set
Relationships of a similar type form a relationship set. Like entities, a relationship too can have
attributes. These attributes are called descriptive attributes.
Degree of Relationship
The number of participating entities in a relationship defines the degree of the relationship.
Binary = degree 2
Ternary = degree 3
n-ary = degree n
Mapping Cardinalities
Cardinality defines the number of entities in one entity set which can be associated with
the number of entities of the other set via a relationship set.
One-to-one: One entity from entity set A can be associated with at most one entity of
entity set B and vice versa.
One-to-many: One entity from entity set A can be associated with more than one entity of
entity set B, but one entity from entity set B can be associated with at most one entity of entity set A.
Many-to-one: More than one entity from entity set A can be associated with at most one entity
of entity set B; that is, one entity from entity set B can be associated with more than one entity from
entity set A.
Many-to-many: One entity from A can be associated with more than one entity from B and vice
versa.
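The four mapping cardinalities can be checked mechanically on a sample relationship set. The sketch below treats a relationship set as a collection of (a, b) pairs and classifies what it observes. The function name and the sample pairs are hypothetical. Note that it describes only the given instance, whereas in a design the cardinality is a constraint chosen by the designer.

```python
from collections import Counter

def mapping_cardinality(pairs):
    """Classify a relationship set given as (entity_from_A, entity_from_B) pairs."""
    pairs = set(pairs)
    a_counts = Counter(a for a, _ in pairs)   # number of B entities linked to each A
    b_counts = Counter(b for _, b in pairs)   # number of A entities linked to each B
    a_has_many = any(n > 1 for n in a_counts.values())
    b_has_many = any(n > 1 for n in b_counts.values())
    if a_has_many and b_has_many:
        return "many-to-many"
    if a_has_many:
        return "one-to-many"
    if b_has_many:
        return "many-to-one"
    return "one-to-one"

works_at = {("e1", "d1"), ("e2", "d1")}               # employees -> department
enrolls = {("s1", "c1"), ("s1", "c2"), ("s2", "c1")}  # students -> courses

print(mapping_cardinality(works_at))
print(mapping_cardinality(enrolls))
```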

ENTITY RELATIONSHIP DIAGRAM (ERD)
Definition: An entity-relationship (ER) diagram is a specialized graphic that illustrates the
relationships between entities in a database. ER diagrams often use symbols to represent three
different types of information. Boxes are commonly used to represent entities. Diamonds are
normally used to represent relationships and ovals are used to represent attributes.

Components of ER Diagram
The ER diagram has three main components:
1) Entity
An entity can be an object, place, person or class. In an ER diagram, an entity is represented using
a rectangle. Consider the example of an organization: Employee, Manager, Department, Product
and many more can be taken as entities of an organization.
Weak Entity
A weak entity is an entity that must be defined by a foreign key relationship with another entity, as it
cannot be uniquely identified by its own attributes alone. A weak entity depends on
another entity and does not have a key attribute of its own. A double rectangle represents
a weak entity.
2) Attribute
An attribute describes a property or characteristic of an entity. For example, Name, Age,
Address etc. can be attributes of a Student. Databases contain information about each entity. This
information is tracked in individual fields known as attributes, which normally correspond to the
columns of a database table. An attribute is represented using an ellipse.
Key Attribute
A key attribute is the unique, distinguishing characteristic of the entity. For example, an
employee's social security number might be the employee's key attribute. A key attribute
represents the main characteristic of an entity. It is used to represent the primary key. An ellipse
with the attribute name underlined represents a key attribute.
Composite Attribute
An attribute can also have its own attributes. Such an attribute is known as a composite
attribute. For example, an Address attribute may be composed of street, city and PIN code.
3) Relationship
Relationships illustrate how two entities share information in the database structure. A
relationship describes relations between entities. A relationship is represented using a diamond.
There are three types of relationship that exist between entities:
• Binary Relationship
• Recursive Relationship
• Ternary Relationship
Binary Relationship
A binary relationship is a relation between two entities. It is further divided into three types.
1. One to One: This type of relationship is rarely seen in the real world.
The above example describes that one student can enroll for only one course and a course
will also have only one student. This is not what you will usually see in a relationship.
2. One to Many: It reflects the business rule that one entity is associated with many
instances of another entity. For example, a Student enrolls for only one Course but a Course can have
many Students.
The arrows in the diagram describe that one student can enroll for only one course.
3. Many to Many:
The above diagram represents that many students can enroll for more than one course.
Recursive Relationship
In some cases, entities can be self-linked. For example, employees can supervise other
employees.
Ternary Relationship
Relationship of degree three is called Ternary relationship.

EXTENDED FEATURES OF ERD
The ER model has the power of expressing database entities in a conceptual hierarchical manner such
that, as we go up the hierarchy, it generalizes the view of entities, and as we go deeper into the
hierarchy, it gives us the detail of every entity included.
Going up in this structure is called generalization, where entities are clubbed together to
represent a more generalized view. For example, a particular student named Mira can be
generalized along with all the students; the entity is then student, and further, a student is a person.
The reverse is called specialization, where a person is a student, and that student is Mira.
Generalization
As mentioned above, generalization is the process of generalizing entities, where the generalized
entity contains the properties common to all the entities it generalizes. In generalization, a number
of entities are brought together into one generalized entity based on their similar characteristics.
For example, pigeon, house sparrow, crow and dove can all be generalized as Birds.
Specialization
Specialization is the opposite process to generalization, as mentioned above. In
specialization, a group of entities is divided into sub-groups based on their characteristics. Take the
group Person for example. A person has a name, date of birth, gender etc. These properties are
common to all persons. But in a company, a person can be identified as an employee,
employer, customer or vendor based on the role they play in the company.
Similarly, in a school database, a person can be specialized as teacher, student or staff, based on
the role they play in the school.

Inheritance
We use all the above features of the ER model in order to create classes of objects in object oriented
programming. This makes it easier for the programmer to concentrate on what she is
programming. Details of entities are generally hidden from the user; this process is known as
abstraction.
One of the important features of generalization and specialization is inheritance, that is, the
attributes of higher-level entities are inherited by the lower-level entities.
For example, attributes of a person like name, age, and gender can be inherited by lower-level
entities like student and teacher.
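Attribute inheritance can be sketched with classes, where the lower-level entities receive the attributes of the higher-level entity automatically. The roll_no and subject attributes and the sample values are hypothetical additions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Person:                 # higher-level (generalized) entity
    name: str
    age: int
    gender: str

@dataclass
class Student(Person):        # lower-level entity: inherits name, age, gender
    roll_no: int

@dataclass
class Teacher(Person):        # another specialization of Person
    subject: str

mira = Student(name="Mira", age=19, gender="F", roll_no=7)
print(isinstance(mira, Person), mira.name, mira.roll_no)
```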
Aggregation
The E-R model cannot express relationships among relationships.
When would we need such a thing?
Consider a DB with information about employees who work on a particular project and use a
number of machines doing that work. We get the E-R diagram shown in Figure below.

Figure 2.20: E-R diagram with redundant relationships
Relationship sets work and uses could be combined into a single set. However, they shouldn't be,
as this would obscure the logical structure of this scheme.
The solution is to use aggregation.
• An abstraction through which relationships are treated as higher-level entities.
• For our example, we treat the relationship set work and the entity sets employee and
project as a higher-level entity set called work.
• Figure below shows the E-R diagram with aggregation.
Figure 2.21: E-R diagram with aggregation
Transforming an E-R diagram with aggregation into tabular form is easy. We create a table for
each entity and relationship set as before.
The table for the relationship set uses contains a column for each attribute in the primary key of
machinery and work.
Aggregation is an abstraction in which relationship sets are treated as higher level entity sets.
Here a relationship set is embedded inside an entity set, and these entity sets can participate in
relationships.
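One possible tabular form of the employee, project and machinery example is sketched below, using SQLite through Python's sqlite3 module. All column names (emp_id, proj_id, mach_id and so on) are assumptions made for the illustration, since the notes do not list the attributes of these entity sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee  (emp_id  INTEGER PRIMARY KEY, name  TEXT);
CREATE TABLE project   (proj_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE machinery (mach_id INTEGER PRIMARY KEY, label TEXT);

-- relationship set "work" between employee and project
CREATE TABLE work (
    emp_id  INTEGER REFERENCES employee(emp_id),
    proj_id INTEGER REFERENCES project(proj_id),
    PRIMARY KEY (emp_id, proj_id)
);

-- "uses" relates machinery to the aggregated entity set work:
-- one column per primary-key attribute of machinery and of work
CREATE TABLE uses (
    mach_id INTEGER REFERENCES machinery(mach_id),
    emp_id  INTEGER,
    proj_id INTEGER,
    PRIMARY KEY (mach_id, emp_id, proj_id),
    FOREIGN KEY (emp_id, proj_id) REFERENCES work(emp_id, proj_id)
);
""")

uses_columns = [row[1] for row in conn.execute("PRAGMA table_info(uses)")]
print(uses_columns)
```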

UNIT 3.1
RELATIONAL DATABASES
Relational Databases: Introduction to Relational Databases and Terminology-
Relation, Tuple, Attribute, Cardinality, Degree, Domain. Keys- Super Key,
Candidate Key, Primary Key, Foreign Key.

INTRODUCTION TO RELATIONAL DATABASES
The relational database was proposed by Edgar Codd (of IBM Research) around 1969. It has since
become the dominant database model for commercial applications (in comparison with other
database models such as hierarchical, network and object models). Today, there are many
commercial Relational Database Management Systems (RDBMS), such as Oracle, IBM DB2 and
Microsoft SQL Server. There are also many free and open-source RDBMS, such as MySQL,
mSQL (mini-SQL) and the embedded JavaDB.
A relational database organizes data in tables (or relations). A table is made up of rows and
columns. A row is also called a record (or tuple). A column is also called a field (or attribute). A
database table is similar to a spreadsheet. However, the relationships that can be created among
the tables enable a relational database to efficiently store huge amounts of data, and effectively
retrieve selected data.
A language called SQL (Structured Query Language) was developed to work with relational
databases.
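A small sketch of tables, rows and a relationship between tables, written in SQL and run through Python's built-in sqlite3 module, is shown below. SQLite is used only because it needs no server. The course and student tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course  (course_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE student (roll_no   INTEGER PRIMARY KEY, name TEXT,
                      course_id INTEGER REFERENCES course(course_id));
INSERT INTO course  VALUES (1, 'BCA'), (2, 'MCA');
INSERT INTO student VALUES (1, 'Mira', 1), (2, 'Arjun', 1), (3, 'Kiran', 2);
""")

# Retrieve selected data by relating the two tables on course_id.
bca_students = conn.execute(
    """
    SELECT s.name
    FROM student AS s
    JOIN course AS c ON c.course_id = s.course_id
    WHERE c.title = ?
    ORDER BY s.name
    """,
    ("BCA",),
).fetchall()
print(bca_students)
```

Each row of student stores only the course_id, and the course title is kept once in the course table. The link between the two tables is what lets the query combine them.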
Features of RDBMS
Features and characteristics of an RDBMS can be best understood through Codd's 12 rules.
Codd's 12 Rules
Codd's twelve rules are a set of thirteen rules (numbered zero to twelve) proposed by Edgar F.
Codd, a pioneer of the relational model for databases, designed to define what is required from a
database management system in order for it to be considered relational, i.e., a relational database
management system (RDBMS). They are sometimes jokingly referred to as "Codd's Twelve
Commandments". They are as follows:
Rule 0: The Foundation rule:
A relational database management system must manage its stored data using only its
relational capabilities. The system must qualify as relational, as a database, and as a
management system. For a system to qualify as a relational database management system
(RDBMS), that system must use its relational facilities (exclusively) to manage the
database.
Rule 1: The information rule:
All information in a relational database (including table and column names) is
represented in only one way, namely as a value in a table.
Rule 2: The guaranteed access rule:
All data must be accessible. It says that every individual scalar value in the database must
be logically addressable by specifying the name of the containing table, the name of the
containing column and the primary key value of the containing row.
Rule 3: Systematic treatment of null values:
The DBMS must allow each field to remain null (or empty). Specifically, it must support
a representation of "missing information and inapplicable information" that is systematic,
distinct from all regular values (for example, "distinct from zero or any other number", in
the case of numeric values), and independent of data type. It is also implied that such
representations must be manipulated by the DBMS in a systematic way.
Rule 4: Active online catalog based on the relational model:
The system must support an online, inline, relational catalog that is accessible to
authorized users by means of their regular query language. That is, users must be able to
access the database's structure (catalog) using the same query language that they use to
access the database's data.
Rule 5: The comprehensive data sublanguage rule:
The system must support at least one relational language that
1. Has a linear syntax
2. Can be used both interactively and within application programs,
3. Supports data definition operations (including view definitions), data
manipulation operations (update as well as retrieval), security and integrity
constraints, and transaction management operations (begin, commit, and
rollback).
Rule 6: The view updating rule:
All views that are theoretically updatable must be updatable by the system.
Rule 7: High-level insert, update, and delete:
The system must support set-at-a-time insert, update, and delete operators. This means
that data can be retrieved from a relational database in sets constructed of data from
multiple rows and/or multiple tables. This rule states that insert, update, and delete
operations should be supported for any retrievable set rather than just for a single row in a
single table.
Rule 8: Physical data independence:
Changes to the physical level (how the data is stored, whether in arrays or linked lists
etc.) must not require a change to an application based on the structure.
Rule 9: Logical data independence:
Changes to the logical level (tables, columns, rows, and so on) must not require a change
to an application based on the structure. Logical data independence is more difficult to
achieve than physical data independence.
Rule 10: Integrity independence:
Integrity constraints must be specified separately from application programs and stored in
the catalog. It must be possible to change such constraints as and when appropriate
without unnecessarily affecting existing applications.
Rule 11: Distribution independence:
The distribution of portions of the database to various locations should be invisible to
users of the database. Existing applications should continue to operate successfully:
1. when a distributed version of the DBMS is first introduced; and
2. when existing distributed data are redistributed around the system.
Rule 12: The non-subversion rule:
If the system provides a low-level (record-at-a-time) interface, then that interface cannot
be used to subvert the system, for example, bypassing a relational security or integrity
constraint.
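A few of these rules can be observed directly in a small experiment. The sketch below uses SQLite through Python's sqlite3 module and touches Rule 3 (nulls), Rule 4 (catalog) and Rule 7 (set-at-a-time operations). It is an illustration under those assumptions, not a claim that SQLite satisfies all of Codd's rules. The employee table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, bonus REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Mira", 0.0), (2, "Arjun", None), (3, "Kiran", 100.0)],
)

# Rule 3: NULL is distinct from zero; it is found with IS NULL, not with "= 0".
missing = conn.execute("SELECT name FROM employee WHERE bonus IS NULL").fetchall()
zero = conn.execute("SELECT name FROM employee WHERE bonus = 0").fetchall()

# Rule 4: the catalog is queried with the same language as the data.
catalog = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

# Rule 7: a single statement updates a whole set of rows.
cursor = conn.execute("UPDATE employee SET bonus = bonus + 50 WHERE bonus IS NOT NULL")
updated = cursor.rowcount

print(missing, zero, catalog, updated)
```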
Advantages of RDBMS
An RDBMS offers an extremely structured way of managing data (although a good database design
is needed), as everything in an RDBMS is represented as values in relations (i.e. tables). Also,
many obvious advantages are visible within the 13 rules stated by Codd.
Disadvantages of RDBMS
An RDBMS is very good for related data, but unorganized and unrelated data creates only chaos
within an RDBMS. That is one reason why emerging trends such as Big Data (where a lot of data
from various sources is to be analyzed) often do not use an RDBMS, but non-relational (or non-SQL)
DBMSs for their purpose.

TERMINOLOGIES: (RELATION, TUPLE, ATTRIBUTE,
CARDINALITY, DEGREE, DOMAIN)
Relation:
Definition-
A database relation is a predefined row/column format for storing information in a relational
database. Relations are equivalent to tables.
Example-
Tuple:
Definition-
In the context of databases, a tuple is one record (one row).
Example-

Attribute:
Definition-
In general, an attribute is a characteristic. In a database management system (DBMS), an
attribute refers to a database component, such as a table. It also may refer to a database field.
Attributes describe the instances in the row of a database.
Example-
Degree:
Definition-
The degree of a relation is the number of attributes in its relation schema. For a relationship, which
is an association among two or more entities, the degree is the number of participating entities.
Example-

Cardinality:
Definition-
In the relational model, the cardinality of a relation is the number of tuples (rows) it contains.
In the context of database columns, cardinality also refers to the uniqueness of data values
contained in a column.
It is not common, but cardinality also sometimes refers to the relationships between tables.
Cardinality between tables can be one-to-one, one-to-many, many-to-one, or many-to-many.
Example-
Domain
Definition-
In database technology, domain refers to the description of an attribute's allowed values. The
physical description is a set of values the attribute can have, and the semantic, or logical,
description is the meaning of the attribute.
Example-
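The terms in this section can be tied together with one small, hypothetical student relation. The sketch below takes the cardinality of a relation to be its number of tuples, which is the usual relational-model sense. It also computes the number of distinct values in one column, the "uniqueness" sense of cardinality. The set course_domain is an assumed set of permitted values.

```python
attributes = ("roll_no", "name", "course")   # the relation's attributes (columns)
student = [                                  # the relation's tuples (rows)
    (1, "Mira", "BCA"),
    (2, "Arjun", "BCA"),
    (3, "Kiran", "MCA"),
    (4, "Tara", "BCA"),
]
course_domain = {"BCA", "MCA", "BSc"}        # permitted values for course

degree = len(attributes)                     # number of attributes
cardinality = len(student)                   # number of tuples
distinct_courses = len({row[2] for row in student})   # column cardinality
in_domain = all(row[2] in course_domain for row in student)

print(degree, cardinality, distinct_courses, in_domain)
```

Note that the domain lists every value an attribute is allowed to take, so it may contain values (here "BSc") that no tuple currently uses.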

KEYS: (SUPER KEYS, CANDIDATE KEY, PRIMARY
KEY, FOREIGN KEY)
Definition of a Key-
A key simply consists of one or more attributes that determine other attributes.
A key is a column (attribute), or a set of columns, of a database table whose values are used to
identify each record in the table. For example, if a table has id, name and address as the column
names, then id can serve as a key, provided that no two records share the same id.
The following are the various types of keys available in the DBMS system.
• Super key
• Candidate key
• Primary key
• Foreign key
Super Key-
A superkey is a combination of columns that uniquely identifies any row within a relational
database management system (RDBMS) table. A candidate key is a closely related concept
where the superkey is reduced to the minimum number of columns required to uniquely identify
each row.
For example, imagine a table used to store customer master details that contains columns such
as:
customer name
customer id
social security number (SSN)
address
date of birth
A certain set of columns may be extracted and guaranteed unique to each customer. Examples of
superkeys are as follows:
• Name, SSN, Birthdate
• ID, Name, SSN
However, this process may be further reduced. It can be assumed that each customer id is unique
to each customer. So, the superkey may be reduced to just one field, customer id, which is the
candidate key. Combining customer id with SSN would still identify each row uniquely, but that
combination is a superkey rather than a candidate key, because it is not minimal.
A primary key is a special term for candidate keys designated as unique identifiers for all table
rows. Until this point, only columns have been considered for suitability and are thus termed
candidate keys. Once a candidate key is decided, it may be defined as the primary key at the
point of table creation.
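The reduction from superkey to candidate key can be sketched as a brute-force search over column combinations. The helper names and the three sample customers are hypothetical. The result reflects only this sample data, whereas real keys are decided from the meaning of the data, not from whatever rows happen to be present.

```python
from itertools import combinations

def is_superkey(rows, columns):
    """True if no two rows agree on all of the given columns."""
    seen = {tuple(row[c] for c in columns) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows, all_columns):
    """Minimal superkeys, found by trying column sets from smallest to largest."""
    found = []
    for size in range(1, len(all_columns) + 1):
        for cols in combinations(all_columns, size):
            contains_smaller_key = any(set(key) <= set(cols) for key in found)
            if is_superkey(rows, cols) and not contains_smaller_key:
                found.append(cols)
    return found

customers = [
    {"id": 1, "name": "Andrew Smith", "ssn": "111", "dob": "1980-01-01"},
    {"id": 2, "name": "Andrew Smith", "ssn": "222", "dob": "1975-05-05"},
    {"id": 3, "name": "Priya Rao", "ssn": "333", "dob": "1980-01-01"},
]

print(is_superkey(customers, ("id", "name", "ssn")))   # a superkey, but not minimal
print(is_superkey(customers, ("name",)))               # names repeat, so not a key
keys = candidate_keys(customers, ("id", "name", "ssn", "dob"))
print(keys)
```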
Candidate key-
A candidate key is a column, or set of columns, in a table that can uniquely identify any database
record without referring to any other data. Each table may have one or more candidate keys, but
one candidate key is special, and it is called the primary key. This is usually the best among the
candidate keys.
When a key is composed of more than one column, it is known as a composite key.
The best way to define candidate keys is with an example. For example, a bank's database is
being designed. To uniquely define each customer's account, a combination of the customer's ID
or social security number (SSN) and a sequential number for each of his or her accounts can be
used. So, Mr. Andrew Smith's checking account can be numbered 223344-1, and his savings
account 223344-2. A candidate key has just been created.
In this case, the bank's database can issue unique account numbers that are guaranteed to prevent
the problem just highlighted. For good measure, these account numbers can have some built-in
logic. For example, checking accounts can begin with a 'C', followed by the year and month of
creation, and within that month, a sequential number.
Note that it was possible to uniquely identify each account using the aforementioned SSNs and a
sequential number (assuming no government mess-up, in which the same number is issued to
two people). So, this is a candidate key that can potentially be used to identify records. However,
a much better way of doing the same thing has just been demonstrated - creating a candidate key.
In fact, if the chosen candidate key is so good that it can certainly uniquely identify each and
every record, then it should be used as the primary key. All databases allow the definition of one,
and only one, primary key per table.
Primary key-
It is a candidate key that is chosen by the database designer to identify entities within an entity
set. A primary key is a minimal super key. In the ER diagram, the primary key is represented by
underlining the primary key attribute. Ideally a primary key is composed of only a single
attribute. But it is possible to have a primary key composed of more than one attribute.
A primary key is a special relational database table column (or combination of columns)
designated to uniquely identify all table records.

A primary key's main features are:
• It must contain a unique value for each row of data.
• It cannot contain null values.
A primary key is either an existing table column or a column that is specifically generated by the
database according to a defined sequence.
For example, students are routinely assigned unique identification (ID) numbers, and Social Security
numbers are likewise uniquely identifiable.
For example, a database must hold all of the data stored by a commercial bank. Two of the
database tables include the CUSTOMER_MASTER, which stores basic and static customer data
(e.g., name, date of birth, address and Social Security number, etc.) and the
ACCOUNTS_MASTER, which stores various bank account data (e.g., account creation date,
account type, withdrawal limits or corresponding account information, etc.).
To uniquely identify customers, a column or combination of columns is selected to guarantee
that two customers never have the same value. Thus, certain columns are immediately
eliminated, e.g., surname and date of birth, because they can repeat. A good primary key candidate is the
column that holds the unique, government-assigned Social Security number. However, some
account holders (e.g., children) may not have Social Security numbers, so this column's
candidacy is eliminated. The next logical option is to use a combination of columns such as the
surname, the date of birth and the email address, resulting in a long and cumbersome primary
key.
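One common way out of the cumbersome composite key is the generated key mentioned earlier. The following is a minimal sketch using Python's built-in sqlite3 module; the table layout, column names and sample values are assumptions made for illustration, not part of the bank example itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_master (
        customer_id INTEGER PRIMARY KEY,  -- generated key (assumed column)
        surname     TEXT NOT NULL,
        birth_date  TEXT NOT NULL,
        ssn         TEXT UNIQUE           -- may be NULL, e.g. for children
    )
""")

# The database assigns customer_id automatically when it is omitted.
conn.execute(
    "INSERT INTO customer_master (surname, birth_date, ssn) "
    "VALUES ('Sharma', '1990-01-15', '111-22-3333')"
)
conn.execute(
    "INSERT INTO customer_master (surname, birth_date, ssn) "
    "VALUES ('Sharma', '2010-06-01', NULL)"
)
ids = [row[0] for row in conn.execute(
    "SELECT customer_id FROM customer_master ORDER BY customer_id")]

# Re-using an existing primary key value violates uniqueness.
try:
    conn.execute(
        "INSERT INTO customer_master (customer_id, surname, birth_date) "
        "VALUES (1, 'Verma', '1985-03-09')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Two customers with the same surname can coexist because identity rests on customer_id alone, while the nullable ssn column stays available as an alternate key where it exists.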
Foreign Key-
A foreign key is a column or group of columns in a relational database table that provides a link
between data in two tables. It acts as a cross-reference between tables because it references the
primary key of another table, thereby establishing a link between them.
In complex databases, data in a domain must be spread across multiple tables while maintaining a
relationship between them. The concept of referential integrity is derived from foreign key
theory.
Foreign keys and their implementation are more complex than primary keys.
For any column acting as a foreign key, a corresponding value should exist in the linked (referenced) table.
Special care must be taken while inserting data into and removing data from the foreign key column,
as a careless deletion or insertion might destroy the relationship between the two tables.
For instance, if there are two tables, customer and order, a relationship can be created between
them by introducing a foreign key into the order table that refers to the customer ID in the
customer table. The customer ID column exists in both customer and order tables. The customer
ID in the order table becomes the foreign key, referring to the primary key in the customer table.
To insert an entry into the order table, the foreign key constraint must be satisfied.
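The customer and order relationship can be sketched with Python's built-in sqlite3 module. The table is named orders here because ORDER is a reserved word in SQL, and the sample rows are assumed for illustration. Note that SQLite enforces foreign keys only after the PRAGMA shown below is issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        item        TEXT
    )
""")

conn.execute("INSERT INTO customer VALUES (1, 'Rita')")
conn.execute("INSERT INTO orders VALUES (100, 1, 'book')")  # customer 1 exists: accepted

# Customer 99 does not exist, so the foreign key constraint rejects the row.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99, 'pen')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```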

Some referential actions associated with a foreign key include the following:
• Cascade: When rows in the parent table are deleted, the matching rows in the
child table are also deleted, creating a cascading delete.
• Set Null: When a referenced row in the parent table is deleted or updated, the foreign key values
in the referencing row are set to null to maintain referential integrity.
• Triggers: Referential actions are normally implemented as triggers. In many ways foreign key
actions are similar to user-defined triggers. To ensure proper execution, ordered referential
actions are sometimes replaced with their equivalent user-defined triggers.
• Set Default: This referential action is similar to "set null". The foreign key values in the child
table are set to the default column value when the referenced row in the parent table is deleted or
updated.
• Restrict: This is the normal referential action associated with a foreign key. A value in the parent
table cannot be deleted or updated as long as it is referred to by a foreign key in another table.
• No Action: This referential action is similar in function to the "restrict" action, except that the
check is performed only after the statement has tried to alter the table, not before.
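The Cascade and Set Null actions can be observed directly. The sketch below uses Python's built-in sqlite3 module; the orders and reviews tables and their rows are hypothetical and exist only to show the two behaviours side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to apply the actions

conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id) ON DELETE CASCADE
    )
""")
conn.execute("""
    CREATE TABLE reviews (
        review_id   INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id) ON DELETE SET NULL
    )
""")

conn.execute("INSERT INTO customer VALUES (1)")
conn.execute("INSERT INTO customer VALUES (2)")
conn.execute("INSERT INTO orders VALUES (10, 1)")
conn.execute("INSERT INTO orders VALUES (11, 2)")
conn.execute("INSERT INTO reviews VALUES (20, 1)")

# Deleting the parent row triggers both referential actions.
conn.execute("DELETE FROM customer WHERE customer_id = 1")

remaining_orders = conn.execute(
    "SELECT order_id, customer_id FROM orders").fetchall()
reviews = conn.execute(
    "SELECT review_id, customer_id FROM reviews").fetchall()
```

After the delete, order 10 has disappeared with its customer (cascade), while review 20 survives with a null customer_id (set null).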

UNIT 3.2
RELATIONAL ALGEBRA
Relational Algebra: Operations, Select, Project, Union, Difference, Intersection,
Cartesian product, Join, Natural Join.

INTRODUCTION
Relational algebra, first described by E.F. Codd while at IBM, is a family of algebras with
well-founded semantics, used for modeling the data stored in relational databases and defining
queries on it.
In relational algebra, queries are composed using a collection of operators, and each query
describes a step-by-step procedure for computing the desired result, based on the order in which
the operators are applied.
Because queries are specified in this operational, procedural manner, relational algebra is also
called a procedural language.
The procedural nature of the algebra allows us to think of an algebra expression as a recipe, or a
plan, for evaluating a query, and relational systems in fact use algebra expressions to represent
query evaluation plans.
Relational algebra expression
A relational algebra expression is a composition of operators that forms a complex query.
A unary operator is applied to a single expression, and a binary operator is applied to
two expressions.
Fundamental operations of Relational algebra:
• Select
• Project
• Union
• Set difference
• Cartesian product
• Rename

SELECT
The SELECT operation, denoted by σ (sigma), is used to select a subset of the tuples from a
relation based on a selection condition.
• The selection condition acts as a filter.
• It keeps only those tuples that satisfy the qualifying condition.
• Tuples satisfying the condition are selected, whereas the other tuples are discarded
(filtered out).
Examples, using the STUDENT relation instance below:
A. Select the STUDENT tuples whose age is 18:
σ_{age = 18}(STUDENT)
B. Select the STUDENT tuples whose course is BCA:
σ_{course = 'BCA'}(STUDENT)
C. Select the STUDENT tuples whose gender is male:
σ_{gender = 'M'}(STUDENT)

Student name   Age   Gender   Course
Ritika         18    F        BCA
Prerna         19    F        Bsc.
Ankush         20    M        BA
Preeti         18    F        Bsc.
Pragyan        20    M        BA
Ritu           18    F        BCA
Janvi          20    F        BCA

Answer of the first select statement (A):
Student name   Age   Gender   Course
Ritika         18    F        BCA
Preeti         18    F        Bsc.
Ritu           18    F        BCA
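The filtering behaviour of SELECT can be sketched in a few lines of Python. Representing a relation as a list of dictionaries and the helper name select are assumptions made for this illustration; the data is the STUDENT instance above.

```python
# The STUDENT relation instance: one dictionary per tuple.
STUDENT = [
    {"name": "Ritika",  "age": 18, "gender": "F", "course": "BCA"},
    {"name": "Prerna",  "age": 19, "gender": "F", "course": "Bsc."},
    {"name": "Ankush",  "age": 20, "gender": "M", "course": "BA"},
    {"name": "Preeti",  "age": 18, "gender": "F", "course": "Bsc."},
    {"name": "Pragyan", "age": 20, "gender": "M", "course": "BA"},
    {"name": "Ritu",    "age": 18, "gender": "F", "course": "BCA"},
    {"name": "Janvi",   "age": 20, "gender": "F", "course": "BCA"},
]

def select(relation, condition):
    """Keep only the tuples for which the selection condition holds."""
    return [t for t in relation if condition(t)]

age_18 = select(STUDENT, lambda t: t["age"] == 18)        # example A
bca = select(STUDENT, lambda t: t["course"] == "BCA")     # example B

names_age_18 = [t["name"] for t in age_18]
names_bca = [t["name"] for t in bca]
```

Every selected tuple keeps all of its attributes; SELECT changes the number of rows, never the number of columns.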

PROJECT
The PROJECT operation is denoted by π (pi).
If we are interested in only certain attributes of a relation, we use PROJECT.
This operation keeps certain columns (attributes) from a relation and discards the other columns.
Example:
To list only the name and course of all the students in the STUDENT relation:
π_{student_name, course}(STUDENT)
(Output, computed from the STUDENT table shown under SELECT)
Student name   Course
Ritika         BCA
Prerna         Bsc.
Ankush         BA
Preeti         Bsc.
Pragyan        BA
Ritu           BCA
Janvi          BCA
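A small Python sketch of PROJECT follows. The helper name project and the tuple-based representation are assumptions for illustration. Because a relation is a set, the sketch also removes duplicate rows, which becomes visible when projecting on course alone.

```python
# (name, age, gender, course) rows of the STUDENT relation.
ROWS = [
    ("Ritika", 18, "F", "BCA"), ("Prerna", 19, "F", "Bsc."),
    ("Ankush", 20, "M", "BA"),  ("Preeti", 18, "F", "Bsc."),
    ("Pragyan", 20, "M", "BA"), ("Ritu", 18, "F", "BCA"),
    ("Janvi", 20, "F", "BCA"),
]
ATTRIBUTES = ("name", "age", "gender", "course")
STUDENT = [dict(zip(ATTRIBUTES, row)) for row in ROWS]

def project(relation, attributes):
    """Keep only the listed attributes and drop duplicate result rows."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:          # a relation holds no duplicate tuples
            seen.add(row)
            result.append(row)
    return result

name_course = project(STUDENT, ["name", "course"])
courses = project(STUDENT, ["course"])
```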

UNION
UNION is a binary operation, denoted by ∪ as in set theory. The result of R ∪ S is a
relation that includes all tuples that are either in R or in S or in both R and S. Duplicate tuples are
eliminated.
The two operand relations R and S must be "type compatible" (or UNION compatible): R and
S must have the same number of attributes, and
each pair of corresponding attributes must be type compatible (have the same or compatible
domains). E.g., in the bank enterprise the depositor and borrower relations have almost the same
attributes and types.

Depositor relation (a):
Customer name   Id no.
Rita            301
Gita            302
Ram             303

Borrower relation (b):
Customer name   Id no.
Sham            300
Surbhi          304
Rita            301
Ram             303

Output: a ∪ b
Customer name   Id no.
Rita            301
Gita            302
Ram             303
Sham            300
Surbhi          304
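Python's built-in set type mirrors the behaviour of UNION closely. The sketch below models each relation as a set of (customer name, id) tuples; the helper name union and its compatibility check (same number of attributes only) are simplifying assumptions.

```python
depositor = {("Rita", 301), ("Gita", 302), ("Ram", 303)}
borrower = {("Sham", 300), ("Surbhi", 304), ("Rita", 301), ("Ram", 303)}

def union(r, s):
    """Return R ∪ S after a basic union-compatibility check."""
    arities = {len(t) for t in r | s}
    if len(arities) > 1:
        raise ValueError("relations are not union compatible")
    return r | s   # a set keeps each tuple once, so duplicates vanish

result = union(depositor, borrower)
by_id = sorted(result, key=lambda t: t[1])
```

Rita and Ram appear in both inputs but only once in the result, which is why the output has five tuples rather than seven.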

DIFFERENCE
SET DIFFERENCE (also called MINUS or EXCEPT) is denoted by –. The result of R – S is a
relation that includes all tuples that are in R but not in S. The attribute names in the result will be
the same as the attribute names in R. The two operand relations R and S must be "type
compatible".
Output: a – b (depositor minus borrower, using the relations from the UNION example)
Customer name   Id no.
Gita            302
Only one tuple of a does not also belong to b, so the result contains a single tuple.
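With relations modelled as Python sets of tuples (an assumption of this sketch), set difference is the `-` operator. The example reuses the depositor and borrower data and also shows that the operation is not commutative.

```python
depositor = {("Rita", 301), ("Gita", 302), ("Ram", 303)}
borrower = {("Sham", 300), ("Surbhi", 304), ("Rita", 301), ("Ram", 303)}

# Tuples in depositor that are not in borrower.
only_depositors = depositor - borrower

# Swapping the operands gives a different relation.
only_borrowers = borrower - depositor
```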

INTERSECTION
INTERSECTION: The result of the operation R ∩ S is a relation that includes all
tuples that are in both R and S.
• The attribute names in the result will be the same as the attribute names in R.
• The two operand relations R and S must be "type compatible".
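Intersection does not appear in the list of fundamental operations because it can be derived from set difference: R ∩ S = R – (R – S). The following sketch checks that identity on the depositor and borrower data, again modelling relations as Python sets (an assumption of this illustration).

```python
depositor = {("Rita", 301), ("Gita", 302), ("Ram", 303)}
borrower = {("Sham", 300), ("Surbhi", 304), ("Rita", 301), ("Ram", 303)}

def intersection(r, s):
    """R ∩ S expressed with set difference only: R - (R - S)."""
    return r - (r - s)

both = intersection(depositor, borrower)
```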

CARTESIAN PRODUCT
The CARTESIAN PRODUCT (or cross product) of two relations R and S is denoted by R × S.
The resulting relation state has one tuple for each combination of tuples, one from R and one
from S. Hence, if R has nR tuples (denoted as |R| = nR), and S has nS tuples, then R × S will have
nR * nS tuples.
The two operands do NOT have to be "type compatible".
Example:
R:
A   1
B   2
D   3
F   4

S:
D   3
E   4

Output: R × S (4 * 2 = 8 tuples)
A   1   D   3
A   1   E   4
B   2   D   3
B   2   E   4
D   3   D   3
D   3   E   4
F   4   D   3
F   4   E   4
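The same product can be computed with itertools.product from the Python standard library. Modelling each tuple of R and S as a Python tuple is an assumption of this sketch.

```python
from itertools import product

R = [("A", 1), ("B", 2), ("D", 3), ("F", 4)]
S = [("D", 3), ("E", 4)]

# Pair every tuple of R with every tuple of S and concatenate them.
cross = [r + s for r, s in product(R, S)]
```

The result has len(R) * len(S) tuples, each carrying the attributes of R followed by those of S.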

JOIN
A JOIN, denoted by ⋈, is a cross product of two relations followed by a selection on a join condition.
• Join allows you to evaluate a join condition between the attributes of the relations on
which the join operation is undertaken.
• It is used to combine related tuples from two relations.
• The join condition is called theta.
Notation:
R ⋈_{join condition} S
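As an illustration, a theta join can be sketched in Python as a cross product filtered by a condition. The helper name theta_join is an assumption, and the relations R and S are the two small relations from the Cartesian product example.

```python
from itertools import product

R = [("A", 1), ("B", 2), ("D", 3), ("F", 4)]
S = [("D", 3), ("E", 4)]

def theta_join(r, s, theta):
    """Cross product of r and s, keeping only pairs that satisfy theta."""
    return [a + b for a, b in product(r, s) if theta(a, b)]

# Equality on the numeric attribute (an equi-join).
equal_numbers = theta_join(R, S, lambda a, b: a[1] == b[1])

# Any comparison can serve as theta, for example "less than".
smaller = theta_join(R, S, lambda a, b: a[1] < b[1])
```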

NATURAL JOIN
Another variation of JOIN, called NATURAL JOIN, is denoted by *.
Very often the JOIN involves an equality test, and thus is described as an equi-join. Such
joins result in two attributes in the resulting relation having exactly the same value. A 'natural
join' will remove the duplicate attribute(s).
• In most systems a natural join will require that the attributes have the same name to
identify the attribute(s) to be used in the join. This may require a renaming mechanism.
• If you do use natural joins, make sure that the relations do not have two attributes with the
same name by accident.
Example:
[Figures not reproduced: a simple sample database state, and example natural join operations on that database.]
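A natural join can be sketched in Python by matching rows on every attribute name the two relations share. The customers and orders relations below are hypothetical sample data, and the helper name natural_join is an assumption of this sketch.

```python
customers = [
    {"customer_id": 1, "name": "Rita"},
    {"customer_id": 2, "name": "Gita"},
]
orders = [
    {"order_id": 100, "customer_id": 1},
    {"order_id": 101, "customer_id": 1},
    {"order_id": 102, "customer_id": 3},
]

def natural_join(r, s):
    """Join on all attributes with the same name; keep one copy of each."""
    result = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                result.append({**a, **b})   # shared attributes appear once
    return result

joined = natural_join(customers, orders)
```

Order 102 refers to a customer that does not exist and Gita has no orders, so neither appears in the result. If the two relations shared a second attribute name by accident, it would silently become part of the join condition, which is the risk noted above.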

SUMMARY OF OPERATIONS

UNIT 4
STRUCTURED QUERY LANGUAGE (SQL)
&
NORMALIZATION
Structured Query Language (SQL): Introduction to SQL, History of SQL,
Concept of SQL, DDL Commands, DML Commands, DCL Commands, Simple
Queries, Nested Queries.
Normalization: Benefits of Normalization, Normal Forms- 1NF, 2NF, 3NF,
BCNF and Functional Dependency.

INTRODUCTION TO SQL
Introduction & Brief History:
SQL is a special-purpose programming language designed for managing data held in a relational
database management system (RDBMS). Originally based upon relational algebra and tuple
relational calculus, SQL consists of a data definition language and a data manipulation language.
The scope of SQL includes data insert, query, update and delete, schema creation and
modification, and data access control.
SQL was one of the first commercial languages for Edgar F. Codd's relational model, as
described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data
Banks." Despite not entirely adhering to the relational model as described by Codd, it became the
most widely used database language.
SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the
International Organization for Standardization (ISO) in 1987. Since then, the standard has been
revised to include a larger set of features.
Why SQL?
• Allows users to access data in relational database management systems.
• Allows users to describe the data.
• Allows users to define the data in a database and manipulate that data.
• Allows embedding within other languages using SQL modules, libraries & pre-compilers.
• Allows users to create and drop databases and tables.
• Allows users to create views, stored procedures and functions in a database.
• Allows users to set permissions on tables, procedures and views.
Advantages of SQL:
• High Speed: SQL queries can be used to retrieve large amounts of records from a
database quickly and efficiently.
• Well Defined Standards Exist: SQL databases use a long-established standard, which has
been adopted by ANSI & ISO. Non-SQL databases do not adhere to any clear standard.
• No Coding Required: Using standard SQL it is easier to manage database systems
without having to write a substantial amount of code.
• Emergence of ORDBMS: Previously SQL databases were synonymous with relational
databases. With the emergence of object-oriented DBMS, object storage capabilities have been
extended to relational databases.
Disadvantages of SQL:
• Difficulty in Interfacing: Interfacing with an SQL database is more complex than adding a few
lines of code.
• More Features Implemented in a Proprietary Way: Although SQL databases conform to
ANSI & ISO standards, some databases go for proprietary extensions to standard SQL to
ensure vendor lock-in.

HISTORY OF SQL
• In 1970 Edgar F. Codd, a member of the IBM research lab, published the classic paper "A Relational
Model of Data for Large Shared Data Banks".
• Codd's paper started a great deal of research and experimentation, which led to the design
and prototype implementation of a number of relational languages.
• One such language was Structured English Query Language (SEQUEL), defined by
Donald D. Chamberlin and Raymond F. Boyce.
• The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark
of the UK-based Hawker Siddeley aircraft company.
• A revised version of SEQUEL, called SEQUEL/2 or SQL, was released in 1976-77.
• During the 1970s, IBM worked to develop Codd's ideas in a prototype relational database
named System R.
• The first commercial relational database was released by Relational Software, which later
became Oracle.
• In 1986 ANSI published an SQL standard called SQL-86, which ISO adopted in 1987.
• The next versions of the standard were SQL-89 and SQL-92, followed by SQL-1999, SQL-2003,
SQL-2006 and SQL-2008.
According to industry trends, the relational model and SQL will continue
to enhance their position in the near future.

CONCEPT BEHIND SQL
SQL Process
When you are executing an SQL command for any RDBMS, the system determines the best way
to carry out your request and SQL engine figures out how to interpret the task.
There are various components included in the process. These components are:
• Query Dispatcher
• Optimization Engines
• Classic Query Engine
• SQL Query Engine
The classic query engine handles all non-SQL queries, but the SQL query engine won't handle logical
files.
SQL Architecture
Types of SQL Commands
The following sections discuss the basic categories of commands used in SQL to perform various
functions. The main categories are:
• DDL (Data Definition Language)
• DML (Data Manipulation Language)
• DQL (Data Query Language)
• DCL (Data Control Language)
• TCL (Transaction Control Language)

DDL COMMANDS
DDL (Data Definition Language) Commands of SQL allow the Data Definition functions like
creating, altering and dropping the tables.
The following are the various DDL Commands, along with their syntax, use and examples:
#1. CREATE
USE: creates a new table, view of a table, or other objects in database.
SYNTAX:
CREATE TABLE table_name(
Column_name1 data_type(size),
Column_name2 data_type(size),
….
);
EXAMPLE :
CREATE TABLE Persons
(PersonID int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255)
);
#2. ALTER
USE : modifies an existing database object such as a table.
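SYNTAX :
ALTER TABLE table_name
ADD column_name datatype;
or
ALTER TABLE table_name
DROP COLUMN column_name;
or
ALTER TABLE table_name
MODIFY COLUMN column_name datatype;
(The keyword for changing a column's type varies by system: MySQL uses MODIFY COLUMN,
SQL Server uses ALTER COLUMN.)
EXAMPLE :
ALTER TABLE Persons
ADD DateOfBirth date;
or
ALTER TABLE Persons
DROP COLUMN DateOfBirth;
or
ALTER TABLE Persons
MODIFY COLUMN DateOfBirth year;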
#3. DROP
USE : deletes an entire table, a view of a table, or other object in the database.
SYNTAX : DROP TABLE table_name;
EXAMPLE : DROP TABLE Persons;
#4. TRUNCATE
USE : removes all records from a table and releases the space allocated for those
records; in many systems it also resets any auto-generated key counter.
SYNTAX : TRUNCATE TABLE table_name;
EXAMPLE : TRUNCATE TABLE persons;
#5. COMMENT
USE : Add comments to the data dictionary.

DML COMMANDS
DML (Data Manipulation Language) Commands of SQL allow the Data Manipulation functions
like inserting, updating and deleting data values in the tables created using DDL Commands.
The following are the various DML Commands, along with their syntax, use and examples:
#1. INSERT
USE : creates a record.
SYNTAX :
INSERT INTO table_name
VALUES (value1,value2,value3,...);
or
INSERT INTO table_name (column1,column2,column3,...)
VALUES (value1,value2,value3,...);
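EXAMPLE (assuming the Persons table has a DateOfBirth column; the number of values must match
the number of columns listed):
INSERT INTO Persons (PersonID, FirstName, DateOfBirth)
VALUES (1, 'manan', '07-08-1994');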
#2. UPDATE
USE : modifies records.
SYNTAX :
UPDATE table_name
SET column1=value1,column2=value2,...
WHERE some_column=some_value;
EXAMPLE :
UPDATE Students
SET Fine=0
WHERE Stu_ID=404;
#3. DELETE
USE : deletes records (but the table structure remains intact).
SYNTAX :
DELETE FROM table_name
WHERE some_column=some_value;
EXAMPLE :
DELETE FROM Students
WHERE Stu_ID=21;

#4. CALL
USE : call a PL/SQL or java subprogram.
#5. EXPLAIN PLAN
USE : explain access path to data.
SYNTAX :
EXPLAIN PLAN FOR
SQL_Statement;
EXAMPLE :
EXPLAIN PLAN FOR
SELECT last_name FROM employees;
#6. LOCK TABLE
USE : control concurrency.
SYNTAX :
LOCK TABLE table_name
IN EXCLUSIVE MODE
NOWAIT;
This locks the table in exclusive mode but does not wait if another user has already locked the table.
EXAMPLE :
LOCK TABLE employees
IN EXCLUSIVE MODE
NOWAIT;
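The three core DML commands (INSERT, UPDATE, DELETE) can be run end to end with Python's built-in sqlite3 module. The Students table layout and its rows are assumptions made for this sketch; CALL, EXPLAIN PLAN and LOCK TABLE are Oracle-style statements and are not covered by it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (Stu_ID INTEGER PRIMARY KEY, Name TEXT, Fine INTEGER)")

# INSERT: create records (placeholders keep values separate from the SQL text).
conn.executemany(
    "INSERT INTO Students (Stu_ID, Name, Fine) VALUES (?, ?, ?)",
    [(21, "Alex", 0), (404, "Adam", 50)],
)

# UPDATE: modify the rows matched by the WHERE clause.
conn.execute("UPDATE Students SET Fine = 0 WHERE Stu_ID = ?", (404,))

# DELETE: remove the matching rows; the table itself stays in place.
conn.execute("DELETE FROM Students WHERE Stu_ID = ?", (21,))

rows = conn.execute("SELECT Stu_ID, Name, Fine FROM Students").fetchall()
```

Leaving out the WHERE clause in UPDATE or DELETE would affect every row in the table.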

DCL COMMANDS
DCL (Data Control Language) Commands of SQL allow the Data Control functions like
granting and revoking permissions. The transaction commands for committing changes, rolling
back, etc., which are often classed separately as TCL, are also listed here.
The following are the various DCL Commands, along with their syntax, use and examples:
#1. GRANT
USE : gives a privilege to user(s).
SYNTAX :
GRANT permission [, ...]
ON [schema_name.]object_name [(column [, ...])]
TO database_principal[, ...]
[WITH GRANT OPTION]
EXAMPLE :
GRANT SELECT
ON Invoices
TO AnneRoberts;
#2. REVOKE
USE : takes back privileges/grants from users.
SYNTAX :
REVOKE [GRANT OPTION FOR] permission [, ...]
ON [schema_name.]object_name [(column [, ...])]
FROM database_principal[, ...]
[CASCADE]
EXAMPLE :
REVOKE SELECT
ON Invoices
FROM AnneRoberts;
#3. COMMIT
USE : save work done.
SYNTAX : COMMIT;

#4. ROLLBACK
USE : restores the database to its state as of the last COMMIT.
SYNTAX : ROLLBACK;
#5. SAVEPOINT
USE : identifies a point in a transaction to which you can later roll back.
SYNTAX :
SAVEPOINT SAVEPOINT_NAME;
& then,
ROLLBACK TO SAVEPOINT_NAME;
RELEASE SAVEPOINT SAVEPOINT_NAME;
#6. SET TRANSACTION
USE : sets transaction options, such as the access mode or which rollback
segment to use.
SYNTAX : SET TRANSACTION [ READ WRITE | READ ONLY ];
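COMMIT, ROLLBACK and SAVEPOINT can be observed with Python's built-in sqlite3 module. The accounts table and its values are hypothetical. The connection is opened with isolation_level=None so that the sketch issues BEGIN and COMMIT itself; SQLite does not implement GRANT, REVOKE or SET TRANSACTION, so those are not shown.

```python
import sqlite3

# isolation_level=None: the module does not open transactions on its own.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")

conn.execute("BEGIN")
conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
conn.execute("SAVEPOINT before_bonus")
conn.execute("UPDATE accounts SET balance = balance + 1000 WHERE id = 1")
conn.execute("ROLLBACK TO SAVEPOINT before_bonus")  # undo only the bonus
conn.execute("COMMIT")                               # keep the withdrawal
after_commit = conn.execute(
    "SELECT balance FROM accounts WHERE id = 1").fetchone()[0]

conn.execute("BEGIN")
conn.execute("DELETE FROM accounts")
conn.execute("ROLLBACK")                             # back to the last COMMIT
after_rollback = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
```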

SIMPLE QUERIES & NESTED QUERIES
A Simple Query is a query that searches using just one parameter. A simple query might use all
of the fields in a table, or just the fields for which information is required, but it will still
use just one parameter (search criterion).
The following are some types of queries:
• A select query retrieves data from one or more of the tables in your database, or other
queries there, and displays the results in a datasheet. You can also use a select query to
group data, and to calculate sums, averages, counts, and other types of totals.
• A parameter query is a type of select query that prompts you for input before it runs. The
query then uses your input as criteria that control your results. For example, a typical
parameter query asks you for low and high values, and only returns records that
fall between those values.
• A cross-tab query uses row headings and column headings so you can see your data in
terms of two categories at once.
• An action query alters your data or your database. For example, you can use an action
query to create a new table, or add, delete, or change your data.
A Nested Query (also called a subquery, inner query or inner select) is a query within a query. The
statement containing the subquery is called the outer query or outer select.
A subquery is enclosed in parentheses and is usually placed in the WHERE clause of an SQL
statement; it can also be placed in the HAVING clause. Most of the time, a
subquery is used when you know how to search for a value using a SELECT statement, but do
not know the exact value.
When a query result is used in a condition of a WHERE clause, the inner query is called a
subquery and the complete SELECT statement is called a nested query. In many systems a
subquery cannot itself contain an ORDER BY clause.
A subquery can be nested inside the WHERE or HAVING clause of an outer SELECT, INSERT,
UPDATE, or DELETE statement, or inside another subquery.

A subquery can appear anywhere an expression can be used, if it returns a single value.
Statements that include a subquery usually take one of these formats:
 WHERE expression [NOT] IN (subquery).
 WHERE expression comparison_operator [ANY | ALL] (subquery).
 WHERE [NOT] EXISTS (subquery).
Following are the TYPES of Nested Queries:
Single - Row Subqueries
The single-row subquery returns one row. A special case is the scalar subquery, which returns a
single row with one column. Scalar subqueries are acceptable (and often very useful) in virtually
any situation where you could use a literal value, a constant, or an expression. A single-row
subquery is used with the single-row comparison operators (=, <=, >=, <>, <, >). If any of these
operators is used with a subquery that returns more than one row, the query will fail.
Multiple-row subqueries
Multiple-row subqueries return sets of rows. These queries are commonly used to generate result
sets that will be passed to a DML or SELECT statement for further processing. Both single-row
and multiple-row subqueries are evaluated once, before the parent query is run. Since a
multiple-row subquery returns multiple values, the outer query must use the set comparison
operators (IN, ALL, ANY). If you use a multiple-row subquery with the equals comparison operator,
the database will return an error if more than one row is returned. The operators in the following
table can be used with multiple-row subqueries:

Symbol   Meaning
IN       equal to any member in the list
ANY      true if the comparison holds for at least one value in the list
ALL      true if the comparison holds for every value in the list
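The difference between a single-row and a multiple-row subquery can be tried with Python's built-in sqlite3 module. The student table reuses the STUDENT data from the relational algebra unit. SQLite supports IN but not the ANY and ALL operators, so only IN is shown; this is a limitation of the sketch, not of SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, age INTEGER, course TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [
    ("Ritika", 18, "BCA"), ("Prerna", 19, "Bsc."), ("Ankush", 20, "BA"),
    ("Preeti", 18, "Bsc."), ("Pragyan", 20, "BA"), ("Ritu", 18, "BCA"),
    ("Janvi", 20, "BCA"),
])

# Single-row (scalar) subquery: the inner query returns exactly one value,
# so it can be compared with "=".
youngest = [row[0] for row in conn.execute("""
    SELECT name FROM student
    WHERE age = (SELECT MIN(age) FROM student)
    ORDER BY name
""")]

# Multiple-row subquery: the inner query returns several courses,
# so the outer query must use IN rather than "=".
in_courses_with_20 = [row[0] for row in conn.execute("""
    SELECT name FROM student
    WHERE course IN (SELECT course FROM student WHERE age = 20)
    ORDER BY name
""")]
```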
Multiple–Column Subquery
A subquery that compares more than one column between the parent query and the subquery is
called a multiple-column subquery. In multiple-column subqueries, rows in the subquery
results are evaluated in the main query in pair-wise comparison, that is, column-to-column
comparison and row-to-row comparison.

Correlated Subquery
A correlated subquery has a more complex method of execution than single- and multiple-row
subqueries and is potentially much more powerful. If a subquery references columns in the
parent query, then its result will be dependent on the parent query. This makes it impossible to
evaluate the subquery before evaluating the parent query.
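A correlated subquery can be sketched with Python's built-in sqlite3 module. The query below finds the oldest student(s) in each course; the inner query refers to s.course from the outer query, so it is conceptually re-evaluated for each outer row. The table reuses the STUDENT data from the relational algebra unit, and the aliases s and t are choices made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, age INTEGER, course TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [
    ("Ritika", 18, "BCA"), ("Prerna", 19, "Bsc."), ("Ankush", 20, "BA"),
    ("Preeti", 18, "Bsc."), ("Pragyan", 20, "BA"), ("Ritu", 18, "BCA"),
    ("Janvi", 20, "BCA"),
])

# The subquery depends on the current outer row through s.course.
oldest_per_course = [row[0] for row in conn.execute("""
    SELECT s.name
    FROM student AS s
    WHERE s.age = (SELECT MAX(t.age)
                   FROM student AS t
                   WHERE t.course = s.course)
    ORDER BY s.name
""")]
```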
Some points to remember about the subquery are:
• Subqueries are queries nested inside other queries, marked off with parentheses.
• The result of the inner query is passed to the outer query for the preparation of the final result.
• In many systems an ORDER BY clause is not allowed inside a subquery.
• The BETWEEN operator cannot be used with a subquery, although it can be used inside the subquery.
• A subquery used with a single-row comparison operator must return only a single value to the
outer query.
• A subquery must be placed on the right-hand side of the comparison operator.
• A query can contain more than one subquery.

NORMALIZATION
Normalization is the process of efficiently organizing data in a database. There are two goals of
the normalization process: eliminating redundant data (for example, storing the same data in
more than one table) and ensuring data dependencies make sense (only storing related data in a
table). Both of these are worthy goals as they reduce the amount of space a database consumes
and ensure that data is logically stored.
Normalization is a process, in which we systematically examine relations for anomalies and,
when detected, remove those anomalies by splitting up the relation into two new, related
relations.
Normalization is an important part of the database development process: Often during
normalization, the database designers get their first real look into how the data are going to
interact in the database.
Finding problems with the database structure at this stage is strongly preferred to finding
problems further along in the development process because at this point it is fairly easy to cycle
back to the conceptual model (Entity Relationship model) and make changes. Normalization can
also be thought of as a trade-off between data redundancy and performance. Normalizing a
relation reduces data redundancy but introduces the need for joins when all of the data is required
by an application such as a report query.
• Problems without Normalization
Without normalization it becomes difficult to handle and update the database without facing
data loss. Insertion, updation and deletion anomalies are very frequent if the database is not
normalized. To understand these anomalies, let us take the example of a student table.

S_id   S_name   S_address   Subject_opted
401    Adam     Noida       Bio
402    Alex     Panipat     Maths
403    Stuart   Jammu       Maths
404    Adam     Noida       Physics

• Updation Anomaly:
To update the address of a student who occurs twice or more in the table, we will have to
update the S_address column in all of those rows, else the data will become inconsistent.
• Insertion Anomaly:
Suppose for a new admission we have the S_id (student id), name and address of the student, but
the student has not opted for any subject yet. Then we have to insert NULL there, leading to an
insertion anomaly.
• Deletion Anomaly:
If the student with S_id 401 has only one subject and temporarily drops it, then when we delete
that row the entire student record will be deleted along with it.
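The anomalies, and how splitting the table avoids them, can be sketched in plain Python. The sketch assumes that the two Adam rows describe the same student and therefore gives both the id 401; the new address "Delhi" and the variable names are also hypothetical.

```python
# Unnormalized rows: (S_id, S_name, S_address, Subject_opted).
flat = [
    (401, "Adam", "Noida", "Bio"),
    (402, "Alex", "Panipat", "Maths"),
    (403, "Stuart", "Jammu", "Maths"),
    (401, "Adam", "Noida", "Physics"),   # assumed to be the same student
]

# Update anomaly: changing the address in only one of Adam's rows
# leaves the table with two different addresses for student 401.
partially_updated = [list(row) for row in flat]
partially_updated[0][2] = "Delhi"
addresses_401 = {row[2] for row in partially_updated if row[0] == 401}

# Decomposition: student details in one relation, subject choices in another.
students = {sid: (name, address) for sid, name, address, _ in flat}
opted = {(sid, subject) for sid, _, _, subject in flat}

# The address is now stored once, so a single update is enough.
students[401] = (students[401][0], "Delhi")

# Dropping Alex's only subject no longer deletes Alex's own record.
opted.discard((402, "Maths"))
alex_still_known = 402 in students
```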

BENEFITS OF NORMALIZATION
Normalization produces smaller tables with smaller rows:
• More rows per page (less logical I/O)
• More rows per I/O (more efficient)
• More rows fit in cache (less physical I/O)
The benefits of normalization include:
• Searching, sorting, and creating indexes is faster, since tables are narrower, and more
rows fit on a data page.
• You usually have more tables.
• You can have more clustered indexes (one per table), so you get more flexibility in tuning
queries.
• Index searching is often faster, since indexes tend to be narrower and shorter.
• More tables allow better use of segments to control physical placement of data.
• You usually have fewer indexes per table, so data modification commands are faster.
• Fewer null values and less redundant data, making your database more compact.
• Triggers execute more quickly if you are not maintaining redundant data.
• Data modification anomalies are reduced.
• Normalization is conceptually cleaner and easier to maintain and change as your needs
change.

NORMAL FORMS (1NF, 2NF, 3NF, BCNF)
Relations can fall into one or more categories (or classes) called Normal Forms .
Normal Form: A class of relations free from a certain set of modification anomalies.
Normal forms are given names such as:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
These forms are cumulative. A relation in Third Normal Form is also in 2NF and 1NF.
The Normalization Process for a given relation consists of:
• Apply the definition of each normal form (starting with 1NF).
• If a relation fails to meet the definition of a normal form, change the relation (most often by
splitting the relation into two new relations) until it meets the definition.
• Re-test the modified/new relations to ensure they meet the definitions of each normal form.
First Normal Form (1NF)
• A relation is in first normal form if it meets the definition of a relation:
1. Each attribute (column) value must be a single value only.
2. All values for a given attribute (column) must be of the same type.
3. Each attribute (column) name must be unique.
4. The order of attributes (columns) is insignificant.
5. No two tuples (rows) in a relation can be identical.
6. The order of the tuples (rows) is insignificant.
Each table should be organized into rows, and each row should have a primary key that
distinguishes it as unique. The primary key is usually a single column, but sometimes more than
one column can be combined to create a single primary key.
For example, consider a table that is not in first normal form.
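As a hypothetical illustration (the rows below are assumed sample data, not taken from these notes), a table whose Subject_opted column holds several subjects in one value violates rule 1 of 1NF. The Python sketch splits each multi-valued entry into one row per subject.

```python
# Not in 1NF: Subject_opted holds more than one value for student 401.
unnormalized = [
    {"S_id": 401, "S_name": "Adam", "Subject_opted": "Bio, Physics"},
    {"S_id": 402, "S_name": "Alex", "Subject_opted": "Maths"},
]

# In 1NF: one single-valued Subject_opted per row.
first_normal_form = [
    {"S_id": row["S_id"], "S_name": row["S_name"], "Subject_opted": subject.strip()}
    for row in unnormalized
    for subject in row["Subject_opted"].split(",")
]

# (S_id, Subject_opted) now identifies each row uniquely.
keys = [(row["S_id"], row["Subject_opted"]) for row in first_normal_form]
```

The student name is now repeated for each subject, which is the kind of redundancy the higher normal forms go on to remove.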
