RDBMS Fundamentals
1. Cloud IT Solution Page 224
RDBMS
A relational database management system (RDBMS) is a database management system (DBMS)
that is based on the relational model.
A database management system (DBMS) is system software used to manage the organization, storage, access, modification, and integrity of data in a structured database.
A DBMS makes it possible for end users to create, read, update and delete data in a database
systematically. The DBMS essentially serves as an interface between the database and end users,
ensuring that data is consistently organized and remains easily accessible.
Students
ID#  Name  Phone     DOB
500  Matt  555-4141  01/09/1989
501  Jery  867-5309  3/15/1981
502  Sean  876-9123  10/31/1982

Courses
ClassID  Title                 Class Num
1001     Intro to Informatics  1101
1002     Data mining           1400
1003     Internet and society  1400

Takes_Course
ID#  ClassID  Sem
500  1001     Fall02
501  1002     Fall02
501  1002     Spr03
502  1003     S203
DBMS vs RDBMS
1. DBMS applications store data as files; RDBMS applications store data in tabular form.
2. In DBMS, data is generally stored in either a hierarchical or a navigational form; in RDBMS, tables have an identifier called a primary key, and the data values are stored in the form of tables.
3. Normalization is not present in DBMS; normalization is present in RDBMS.
4. DBMS does not apply any security with regard to data manipulation; RDBMS defines integrity constraints for the purpose of the ACID (Atomicity, Consistency, Isolation, and Durability) properties.
5. DBMS uses a file system to store data, so there is no relation between the tables; in RDBMS, data values are stored in the form of tables, so relationships between these data values are stored in the form of tables as well.
6. DBMS has to provide some uniform methods to access the stored information; RDBMS supports a tabular structure of the data and relationships between the tables to access the stored information.
7. DBMS does not support distributed databases; RDBMS supports distributed databases.
8. DBMS is meant for small organizations dealing with small amounts of data and supports a single user; RDBMS is designed to handle large amounts of data and supports multiple users.
9. Examples of DBMS are file systems, XML, etc.; examples of RDBMS are MySQL, PostgreSQL, SQL Server, Oracle, etc.
Steps in Database Design
1. Requirements Analysis
User needs: what must the database do?
2. Conceptual Design
High-level description (often done with the ER model).
3. Logical Design
Translate the ER design into the DBMS data model.
4. Schema Refinement
Consistency, normalization.
5. Physical DB Design
Indexes, disk layout.
6. Security Design
Who accesses what, and how?
Table
In the relational database model, a table is a collection of data elements organized in terms of rows and columns. A table is also considered a convenient representation of a relation, but a table can have duplicate rows of data while a true relation cannot have duplicate rows. A table is the simplest form of data storage. Below is an example of an Employee table.
ID Name Age Salary
1 Adam 34 13000
2 Alex 28 15000
3 Stuart 20 18000
4 Ross 42 19020
SQL term            Relational database term   Description
Row                 Tuple or record            A data set representing a single item
Column              Attribute or field         A labeled element of a tuple, e.g. Address or Date of birth
Table               Relation or base relvar    A set of tuples sharing the same attributes; a set of columns and rows
View or result set  Derived relvar             Any set of tuples; a data report from the RDBMS in response to a query
SQL Command
SQL defines the following sublanguages to manipulate data in an RDBMS.
DDL: Data Definition Language
All DDL commands are auto-committed. That means it saves all the changes permanently in the
database.
Command   Description
create    to create a new table or database
alter     to modify an existing table or database object
truncate  to delete all data from a table
drop      to drop a table
rename    to rename a table
DML: Data Manipulation Language
DML commands are not auto-committed. It means changes are not permanent to database, they
can be rolled back.
Command  Description
insert   to insert a new row
update   to update an existing row
delete   to delete a row
merge    to merge (upsert) rows from one table into another
TCL: Transaction Control Language
These commands are used to keep a check on other commands and their effect on the database. They can annul changes made by other commands by rolling back to the original state; they can also make changes permanent.
Command Description
commit to permanently save
rollback to undo change
savepoint to save temporarily
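As a runnable sketch (not from the original text), the TCL commands above can be exercised with SQLite through Python's sqlite3 module; the accounts table and its values are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
con.execute("INSERT INTO accounts VALUES (1, 100)")
con.commit()                                   # COMMIT: changes are now permanent

con.execute("UPDATE accounts SET balance = 0 WHERE id = 1")
con.execute("SAVEPOINT before_bonus")          # SAVEPOINT: temporary marker
con.execute("UPDATE accounts SET balance = 999 WHERE id = 1")
con.execute("ROLLBACK TO before_bonus")        # undo back to the savepoint only
con.rollback()                                 # ROLLBACK: undo the whole transaction

balance = con.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
```

ROLLBACK TO undoes work back to the savepoint without ending the transaction; the final ROLLBACK discards the whole uncommitted transaction, so the last committed value (100) survives.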
DCL: Data Control Language
The data control language provides commands to grant and revoke access rights.
Command  Description
grant    to grant privileges to a user
revoke   to take back privileges
DQL: Data Query Language
Command Description
select retrieve records from one or more tables
SQL Command Examples:
CREATE TABLE table_name (
    column1 datatype,
    column2 datatype,
    column3 datatype
);

CREATE TABLE Persons (
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    CONSTRAINT PK_Person PRIMARY KEY (ID, LastName)
);

ALTER TABLE Persons
ADD CONSTRAINT PK_Person PRIMARY KEY (ID, LastName);

ALTER TABLE Persons
DROP CONSTRAINT PK_Person;

CREATE TABLE Orders (
    OrderID int NOT NULL,
    OrderNumber int NOT NULL,
    PersonID int,
    PRIMARY KEY (OrderID),
    CONSTRAINT FK_PersonOrder FOREIGN KEY (PersonID)
    REFERENCES Persons(PersonID)
);
SQL Commands:
DDL: Create, Alter, Drop, Truncate, Comment, Rename
DML: Select, Insert, Update, Delete, Merge, Call, Explain Plan, Lock Table
DCL: Grant, Revoke
TCL: Commit, Rollback, Savepoint, Set Transaction
ALTER TABLE Orders
ADD FOREIGN KEY (PersonID) REFERENCES Persons(PersonID);

ALTER TABLE Orders
DROP CONSTRAINT FK_PersonOrder;

DROP TABLE Shippers;

TRUNCATE TABLE table_name;

ALTER TABLE table_name
ADD column_name datatype;

ALTER TABLE table_name
DROP COLUMN column_name;

ALTER TABLE table_name
ALTER COLUMN column_name datatype;

ALTER TABLE table_name
MODIFY COLUMN column_name datatype;

INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);

UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
Built-in Functions (DECODE, CASE, NVL, etc.)
DECODE:
DECODE(supplier_id, 10000, 'IBM',
                    10001, 'Microsoft',
                    10002, 'Hewlett Packard',
                    'Gateway') result
CASE:
CASE supplier_id
    WHEN '10000' THEN 'IBM'
    WHEN '10001' THEN 'Microsoft'
    WHEN '10002' THEN 'Hewlett Packard'
    ELSE 'Gateway'
END
NVL: NVL(commission_pct, ' ') or NVL(commission_pct, 1)
LENGTH: LENGTH('CANDIDE') -- length in characters
TO_DATE: TO_DATE('11-10-2001', 'dd-mm-yyyy')
TO_NUMBER: TO_NUMBER('100')
TO_CHAR: TO_CHAR(199)
TRIM: TRIM('my name is mehedi')
LOWER: LOWER('MehedI')
UPPER: UPPER('MehedI')
SUBSTR: SUBSTR(string, start_position, length), e.g. SUBSTR('This is a test', 6, 2)
REPLACE: REPLACE(string1, string_to_replace, replacement_string), e.g. REPLACE('222tech', '2', '3')
LIKE Operator                     Description
WHERE CustomerName LIKE 'a%'      Finds any values that start with "a"
WHERE CustomerName LIKE '%a'      Finds any values that end with "a"
WHERE CustomerName LIKE '%or%'    Finds any values that have "or" in any position
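These LIKE patterns can be tried directly in SQLite via Python's sqlite3 module; this is a sketch with invented sample names (note that SQLite's LIKE is case-insensitive for ASCII letters by default):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Customers (CustomerName TEXT)")
con.executemany("INSERT INTO Customers VALUES (?)",
                [("Alfreds",), ("Berglunds",), ("Antonio",), ("Maria",)])

def names_like(pattern):
    # Run a LIKE query and return the matching names, sorted for stable output.
    rows = con.execute(
        "SELECT CustomerName FROM Customers WHERE CustomerName LIKE ?",
        (pattern,))
    return sorted(r[0] for r in rows)

starts_a = names_like("a%")    # values starting with "a"
ends_s   = names_like("%s")    # values ending with "s"
has_on   = names_like("%on%")  # values containing "on"
```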
Normalization of Database
Database Normalization is a technique of organizing the data in the database. Normalization is a
systematic approach of decomposing tables to eliminate data redundancy and undesirable
characteristics like Insertion, Update and Deletion Anomalies. It is a multi-step process that puts
data into tabular form by removing duplicated data from the relation tables.
Normalization is used mainly for two purposes:
Eliminating redundant (useless) data.
Ensuring data dependencies make sense, i.e., that data is logically stored.
Problem without Normalization
Update Anomaly: To update the address of a student who occurs twice or more in a table, we have to update the S_Address column in all those rows; otherwise the data becomes inconsistent.
Insertion Anomaly: Suppose for a new admission we have a student's id (S_id), name, and address, but the student has not opted for any subject yet; then we have to insert NULL there, leading to an insertion anomaly.
Deletion Anomaly: If student (S_id) 401 has only one subject and temporarily drops it, then when we delete that row, the entire student record is deleted along with it.
Normalization Rule
Normalization rules are divided into the following normal forms.
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
First Normal Form (1NF)
The following criteria must be satisfied for 1NF.
Values of each attribute are atomic.
No composite values
All entries in any column must be of the same kind
Each column must have a unique name
No two rows are identical
Student Table
Student Age Subject
Adam 15 Biology, Maths
Alex 14 Maths
Stuart 17 Maths
In First Normal Form, no row may have a column in which more than one value is stored (e.g., values separated with commas). Instead, we must separate such data into multiple rows.
Student Table following 1NF will be-
Student Age Subject
Adam 15 Biology
Adam 15 Maths
Alex 14 Maths
Stuart 17 Maths
Second Normal Form (2NF)
1NF + all non-key attributes are fully functionally dependent on the primary key.
As per the Second Normal Form there must not be any partial dependency of any column on
primary key. It means that for a table that has concatenated primary key, each column in the table
that is not part of the primary key must depend upon the entire concatenated key for its existence.
If any column depends only on one part of the concatenated key, then the table fails Second
normal form.
New Student Table following 2NF will be:
Student Age
Adam 15
Alex 14
Stuart 17
In the Student table, the candidate key is the Student column, because the only other column, Age, depends on it.
New Subject Table introduced for 2NF will be:
Student Subject
Adam Biology
Adam Maths
Alex Maths
Stuart Maths
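The 2NF decomposition above is lossless: joining the two tables reconstructs the original 1NF table. A runnable sketch using SQLite via Python's sqlite3 module (table and column names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student (name TEXT PRIMARY KEY, age INTEGER)")
con.execute("CREATE TABLE Subject (name TEXT, subject TEXT)")
con.executemany("INSERT INTO Student VALUES (?, ?)",
                [("Adam", 15), ("Alex", 14), ("Stuart", 17)])
con.executemany("INSERT INTO Subject VALUES (?, ?)",
                [("Adam", "Biology"), ("Adam", "Maths"),
                 ("Alex", "Maths"), ("Stuart", "Maths")])

# Joining the two 2NF tables reconstructs the original 1NF rows:
# each student's age is stored once, yet every (student, subject) pair survives.
rows = con.execute("""
    SELECT st.name, st.age, su.subject
    FROM Student st JOIN Subject su ON st.name = su.name
    ORDER BY st.name, su.subject
""").fetchall()
```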
Third Normal Form (3NF)
2NF + there is no transitive functional dependency (A→B and B→C implies A→C).
Transitive functional dependency should be removed from the table and also the table must be in
Second Normal form. For example, consider a table with following fields.
Student_Detail Table
Student_id Student_name DOB Street city State Zip
In this table, Student_id is the primary key, but street, city, and state depend on Zip. The dependency between Zip and the other fields is called a transitive dependency. Hence, to apply 3NF, we need to move street, city, and state to a new table, with Zip as its primary key.
New Student_Detail Table:
Student_id Student_name DOB Zip
Address Table :
Zip Street city state
The advantages of removing transitive dependencies are:
The amount of data duplication is reduced.
Data integrity is achieved.
The Boyce-Codd Normal Form
A relational schema R is in Boyce–Codd normal form (BCNF) if, for every one of its
dependencies X → Y, one of the following conditions holds true:
X → Y is a trivial functional dependency (i.e., Y is a subset of X)
X is a super key for schema R
ACID Properties
A database system handles many different types of transactions, and all transactions share certain characteristics, known as the ACID properties. The ACID properties guarantee that every database transaction is processed reliably.
1. Atomicity: Either all of a transaction commits or none of it does.
2. Consistency: A transaction takes the database from one valid state to another valid state.
3. Isolation: Concurrently running transactions do not interfere with one another.
4. Durability: Once committed, data persists permanently.
Key types
1. Candidate Key
2. Primary Key
3. Unique Key
4. Alternate Key
5. Composite Key
6. Super Key
7. Foreign Key
8. Surrogate Key
Example
STUDENT {SID, FNAME, LNAME, COURSEID}
Here, in the STUDENT table, the keys are:
Super keys: SID; FNAME+LNAME; and any superset of these, e.g. SID+COURSEID or FNAME+LNAME+COURSEID
Candidate keys: SID and FNAME+LNAME
Primary key: SID
Foreign key: COURSEID
Alternate key: FNAME+LNAME
Composite key: FNAME+LNAME
Primary key
A primary key (PK) is a column or set of columns that uniquely identifies each row in the table.
If you want to create a primary key, you should define a PRIMARY KEY constraint when you
create or modify a table.
When multiple columns are used as a primary key, it is known as composite primary key.
Points to remember for primary key
Primary key enforces the entity integrity of the table.
Primary key always has unique data.
A primary key cannot exceed 900 bytes in length (a SQL Server limit).
A primary key cannot have null value.
There can be no duplicate value for a primary key.
A table can contain only one primary key constraint.
Main advantage of primary key
The main advantage of this uniqueness is that we get fast access.
Foreign key
In relational databases, a foreign key is a field or column that is used to establish a link between two tables.
In simple words, a foreign key in one table points to the primary key of another table.
(Diagram: primary key ⊆ candidate key ⊆ super key.)
Super key
Super key=candidate key +zero/more attributes.
Every Candidate key is a super key.
But every super key is not a candidate key.
An attribute or set of attributes that uniquely identifies a tuple within a relation. However, a super key may contain additional attributes that are not necessary for unique identification.
Candidate key
A super key such that no proper subset of it is a super key within the relation. A candidate key basically has two properties: it uniquely identifies each tuple in the relation, and no proper subset of it has the uniqueness property.
Alternate key
Any candidate key that has not been selected as the primary key.
An alternate key is just a candidate key that has not been selected as the primary key.
Composite key
When a candidate key consists of more than one attribute.
It may be a candidate key or primary key.
Surrogate Key
Surrogate keys are keys that have no business meaning and are used solely to identify a record in the table. A surrogate key is not derived from application data; it is generated internally by the system and is invisible to the user or application.
JOIN
A JOIN clause is used to combine rows from two or more tables or views, based on a related column between them.
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;
Different Types of SQL JOINs
(INNER) JOIN: Returns records that have matching values in both tables
LEFT (OUTER) JOIN: Return all records from the left table, and the matched records from the
right table
RIGHT (OUTER) JOIN: Return all records from the right table, and the matched records from
the left table
FULL (OUTER) JOIN: Return all records when there is a match in either left or right table
SELECT column_name(s)
FROM table1
INNER JOIN table2 ON table1.column_name = table2.column_name;

SELECT column_name(s)
FROM table1
LEFT JOIN table2 ON table1.column_name = table2.column_name;

SELECT column_name(s)
FROM table1
RIGHT JOIN table2 ON table1.column_name = table2.column_name;

SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2 ON table1.column_name = table2.column_name;
Self Join
A self-join is a query in which a table is joined (compared) to itself. Self-joins are used to
compare values in a column with other values in the same column in the same table. One
practical use for self-joins: obtaining running counts and running totals in an SQL query.
Example
SELECT a.ID, b.NAME, a.SALARY
FROM CUSTOMERS a, CUSTOMERS b
WHERE a.SALARY < b.SALARY;
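The self-join above runs unchanged on SQLite; in this sketch (sample CUSTOMERS data invented), each row of alias a pairs with every row of alias b that has a higher salary:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS (ID INTEGER, NAME TEXT, SALARY INTEGER)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?)",
                [(1, "Ramesh", 2000), (2, "Khilan", 1500), (3, "Kaushik", 2000)])

# Self-join: the table is compared against itself under two aliases.
# Only Khilan (1500) earns less than someone, so he pairs with both 2000 earners.
pairs = con.execute("""
    SELECT a.ID, b.NAME, a.SALARY
    FROM CUSTOMERS a, CUSTOMERS b
    WHERE a.SALARY < b.SALARY
""").fetchall()
```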
Aggregate functions
The GROUP BY statement is often used with aggregate functions (COUNT, MAX, MIN, SUM,
AVG) to group the result-set by one or more columns.
Sequence of Clause
1. where
2. group by
3. having
4. order by
Example
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
ORDER BY COUNT(CustomerID) DESC;

SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
HAVING COUNT(CustomerID) > 5
ORDER BY COUNT(CustomerID) DESC;
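The GROUP BY / HAVING / ORDER BY sequence can be verified with SQLite via Python's sqlite3 module; this sketch uses invented customer rows and a HAVING threshold of 1 so the small sample produces output:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Customers (CustomerID INTEGER, Country TEXT)")
con.executemany("INSERT INTO Customers VALUES (?, ?)",
                [(1, "USA"), (2, "USA"), (3, "USA"),
                 (4, "UK"), (5, "UK"), (6, "Mexico")])

# GROUP BY forms one group per country, HAVING filters whole groups
# (Mexico's single customer is dropped), ORDER BY sorts the surviving groups.
counts = con.execute("""
    SELECT Country, COUNT(CustomerID)
    FROM Customers
    GROUP BY Country
    HAVING COUNT(CustomerID) > 1
    ORDER BY COUNT(CustomerID) DESC
""").fetchall()
```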
SQL View
A view in SQL is a logical subset of data from one or more tables. View is used to restrict data
access.
Syntax for creating a view
CREATE OR REPLACE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition;
Types of view
There are two types of view
1. Simple View
2. Complex View
Simple View                      Complex View
Created from one table           Created from one or more tables
Does not contain functions       Contains functions
Does not contain groups of data  Contains groups of data
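A simple view restricting data access can be sketched in SQLite via Python's sqlite3 module (table, view, and threshold are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (ID INTEGER, Name TEXT, Salary INTEGER)")
con.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                [(1, "Adam", 13000), (2, "Alex", 15000), (3, "Stuart", 18000)])

# The view exposes only ID and Name for high earners:
# both the Salary column and low-salary rows are hidden from its users.
con.execute("""
    CREATE VIEW high_earners AS
    SELECT ID, Name FROM Employee WHERE Salary > 14000
""")
names = [r[1] for r in con.execute("SELECT * FROM high_earners ORDER BY ID")]
```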
SQL Sequence
Sequence is a feature supported by some database systems to produce unique values on demand.
Some DBMS like MySQL supports AUTO_INCREMENT in place of Sequence.
AUTO_INCREMENT is applied to columns; it automatically increments the column value by 1 each time a new record is entered into the table. Sequence is also somewhat like AUTO_INCREMENT, but it has some extra features.
Creating Sequence
Syntax to create sequences is,
CREATE SEQUENCE sequence_name
START WITH initial_value
INCREMENT BY increment_value
MAXVALUE maximum_value
CYCLE | NOCYCLE;
Query Example
Max salary (the first two queries return the second-highest salary; the third generalizes to the Nth highest)
SELECT MAX(Salary) FROM Employee
WHERE Salary NOT IN (SELECT MAX(Salary) FROM Employee);

SELECT MAX(Salary) FROM Employee
WHERE Salary < (SELECT MAX(Salary) FROM Employee);

SELECT Id, Salary FROM Employee e
WHERE N-1 = (SELECT COUNT(DISTINCT p.Salary)
             FROM Employee p WHERE p.Salary > e.Salary);
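Both second-highest-salary patterns can be checked against SQLite with Python's sqlite3 module; the sample salaries are invented, and N is fixed at 2 for the correlated-subquery form:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (Id INTEGER, Salary INTEGER)")
con.executemany("INSERT INTO Employee VALUES (?, ?)",
                [(1, 13000), (2, 15000), (3, 18000), (4, 18000)])

# Pattern 1: exclude the maximum, then take the maximum of what remains.
second_a = con.execute("""
    SELECT MAX(Salary) FROM Employee
    WHERE Salary NOT IN (SELECT MAX(Salary) FROM Employee)
""").fetchone()[0]

# Pattern 2 (N = 2): a salary is Nth-highest when exactly N-1 distinct
# salaries are strictly greater than it.
second_b = con.execute("""
    SELECT DISTINCT Salary FROM Employee e
    WHERE 1 = (SELECT COUNT(DISTINCT p.Salary) FROM Employee p
               WHERE p.Salary > e.Salary)
""").fetchone()[0]
```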
Find Duplicate Values
SELECT NAME, COUNT(NAME) FROM STUD GROUP BY NAME
HAVING COUNT(NAME) > 1;
Positive/Negative Value
SELECT
    (SELECT COUNT(roll_no) FROM stud WHERE roll_no > 0) PositiveValue,
    (SELECT COUNT(roll_no) FROM stud WHERE roll_no < 0) NegativeValue;
Current Date
SELECT CURDATE();
Difference between Truncate and Delete

Truncate:
We can't roll back after performing a truncate.
Truncate resets the identity of the table.
It locks the entire table.
It is a DDL (Data Definition Language) command.
We can't use a WHERE clause with it.
Triggers are not fired on truncate.
Syntax: TRUNCATE TABLE tablename;
Example:
BEGIN TRAN;
TRUNCATE TABLE tranTest;
SELECT * FROM tranTest;
ROLLBACK;
SELECT * FROM tranTest;

Delete:
We can roll back after a delete.
Delete does not reset the identity of the table.
It locks table rows.
It is a DML (Data Manipulation Language) command.
We can use WHERE to filter the data to delete.
Triggers are fired.
Syntax: DELETE FROM tablename; or DELETE FROM tablename WHERE columnname = condition;
Example:
BEGIN TRAN;
DELETE FROM tranTest;
SELECT * FROM tranTest;
ROLLBACK;
SELECT * FROM tranTest;
Difference between Primary Key and Unique Key

Primary Key:
Cannot accept a null value.
We can have only one primary key in a table.
Can be made a foreign key in another table.
By default it adds a clustered index.

Unique Key:
Can accept only one null value.
We can have more than one unique key.
Can be made a foreign key in another table.
By default it adds a unique non-clustered index.
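Key-constraint enforcement can be demonstrated with SQLite via Python's sqlite3 module; this sketch (table and values invented) shows duplicates being rejected for both key types, while a NULL passes the UNIQUE constraint:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE Emp (
    id   INTEGER PRIMARY KEY,   -- primary key: unique, non-null
    code TEXT UNIQUE,           -- unique key: unique, but NULL is allowed
    name TEXT)""")
con.execute("INSERT INTO Emp VALUES (1, 'E01', 'Adam')")

# A duplicate primary-key value is rejected...
try:
    con.execute("INSERT INTO Emp VALUES (1, 'E02', 'Alex')")
    pk_dup_ok = True
except sqlite3.IntegrityError:
    pk_dup_ok = False

# ...and so is a duplicate unique-key value...
try:
    con.execute("INSERT INTO Emp VALUES (2, 'E01', 'Alex')")
    uq_dup_ok = True
except sqlite3.IntegrityError:
    uq_dup_ok = False

# ...but a NULL in the unique column is accepted.
con.execute("INSERT INTO Emp VALUES (3, NULL, 'Stuart')")
n_rows = con.execute("SELECT COUNT(*) FROM Emp").fetchone()[0]
```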
Difference among Delete, Drop and Truncate

Usage:
Delete: removes rows from a table.
Drop: deletes a table from the database/data dictionary.
Truncate: removes all rows from a table.

Command type:
Delete: DML. Drop: DDL. Truncate: DDL.

Rollback:
Delete: can be rolled back. Drop: can't be rolled back. Truncate: can't be rolled back.

Rows, indexes and privileges:
Delete: only table rows are deleted.
Drop: table rows, indexes and privileges are deleted.
Truncate: only table rows are deleted.

DML trigger firing:
Delete: triggers are fired. Drop: no triggers are fired. Truncate: no triggers are fired.

Performance:
Delete: slower than truncate.
Drop: quick, but could lead to complications.
Truncate: faster than delete.

Undo space:
Delete: uses undo space.
Drop: does not use undo space.
Truncate: uses undo space, but not as much as delete.

Permanent deletion:
Delete: does not remove the records permanently.
Drop: removes all records, indexes and privileges permanently.
Truncate: removes the records permanently.

WHERE clause:
Delete: yes. Drop: no. Truncate: no.

Row deletion:
Delete: deletes all rows or some rows.
Drop: deletes all rows.
Truncate: deletes all rows.
Difference between Procedure and Function

Procedure:
A procedure does not return a value through a return statement.
A return statement may or may not be present in a procedure.
In a procedure, the return statement appears without any expression; it is used only to transfer control back to the calling program.
The return data type is not required for a procedure.
A procedure is called as a standalone statement, like a command, from any other procedure, function, or trigger.
A procedure may return no value, or multiple values through OUT-mode parameters.
A procedure does not have a purity level.
Procedures are mainly written to manipulate and process the data in tables.

Function:
A function returns a value through a return statement.
A return statement must be present in a function.
The return statement in a function must contain an expression; the expression can be a variable, a hard-coded value, or an arithmetic expression involving columns.
The return data type must be declared for a function.
A function has to be called as part of an SQL statement or as part of an expression.
A function returns a single value.
A function has a purity level.
Functions normally should not be used to manipulate data.
Advantage of subprogram (Procedures & Functions)
Extensibility
Modularity
Reusability
Maintainability
Abstraction & Data Hiding
Security
PL/SQL
Cursor
When Oracle processes an SQL statement, it creates a memory area known as the context area; a cursor is a pointer to this context area. PL/SQL controls the context area through a cursor. A cursor holds the rows (one or more) returned by a SQL statement. The set of rows the cursor holds is referred to as the active set.
You can name a cursor so that it can be referred to in a program to fetch and process the rows returned by the SQL statement, one at a time. There are two types of cursors −
1. Implicit cursors
2. Explicit cursors
Attribute Description
%FOUND Returns TRUE if an INSERT, UPDATE, or DELETE statement
affected one or more rows or a SELECT INTO statement returned
one or more rows. Otherwise, it returns FALSE.
%NOTFOUND The logical opposite of %FOUND. It returns TRUE if an INSERT,
UPDATE, or DELETE statement affected no rows, or a SELECT
INTO statement returned no rows. Otherwise, it returns FALSE.
%ISOPEN Always returns FALSE for implicit cursors, because Oracle closes the
SQL cursor automatically after executing its associated SQL
statement.
%ROWCOUNT Returns the number of rows affected by an INSERT, UPDATE, or
DELETE statement, or returned by a SELECT INTO statement.
To create and use an explicit cursor, you need to follow 5 steps.
1. Declare
2. Open
3. Fetch
4. Close
5. Deallocate
Triggers
Triggers are stored programs, which are automatically executed or fired when some events occur.
Triggers are, in fact, written to be executed in response to any of the following events –
A database manipulation (DML) statement (DELETE, INSERT, or UPDATE)
A database definition (DDL) statement (CREATE, ALTER, or DROP).
A database operation (SERVERERROR, LOGON, LOGOFF, STARTUP, or
SHUTDOWN)
Example
CREATE OR REPLACE TRIGGER orders_after_insert
AFTER INSERT
ON orders
FOR EACH ROW
DECLARE
v_username varchar2(10);
BEGIN
-- Find username of person performing the INSERT into the table
SELECT user INTO v_username
FROM dual;
-- Insert record into audit table
INSERT INTO orders_audit
( order_id,
quantity,
cost_per_item,
total_cost,
username )
VALUES
( :new.order_id,
:new.quantity,
:new.cost_per_item,
:new.total_cost,
v_username );
END;
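The same AFTER INSERT audit-table pattern works in SQLite, which makes it easy to run; this sketch (table names shortened, and without the Oracle-specific user lookup) logs every insert automatically:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (order_id INTEGER, quantity INTEGER);
    CREATE TABLE orders_audit (order_id INTEGER, quantity INTEGER, logged_at TEXT);

    -- AFTER INSERT trigger: copy every new order row into the audit table.
    -- SQLite exposes the inserted row through NEW, like Oracle's :new.
    CREATE TRIGGER orders_after_insert
    AFTER INSERT ON orders
    FOR EACH ROW
    BEGIN
        INSERT INTO orders_audit
        VALUES (NEW.order_id, NEW.quantity, datetime('now'));
    END;
""")

con.execute("INSERT INTO orders VALUES (101, 3)")   # fires the trigger
audit_rows = con.execute(
    "SELECT order_id, quantity FROM orders_audit").fetchall()
```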
Triggers can be defined on the table, view, schema, or database with which the event is
associated.
Benefits of Triggers
Triggers can be written for the following purposes –
Generating some derived column values automatically
Enforcing referential integrity
Event logging and storing information on table access
Auditing
Synchronous replication of tables
Imposing security authorizations
Preventing invalid transactions
Package
A package is a schema object that groups logically related PL/SQL types, variables,
constants, subprograms, cursors, and exceptions. A package is compiled and stored in the
database, where many applications can share its contents.
Advantage of Package
Less I/O, More efficiency
Program overloading is available only for package subprograms, whereas standalone subprograms cannot be overloaded.
Avoids dependencies.
A variable declared in the package specification is global.
E-R Diagram
ER-Diagram is a visual representation of data that describes how data is related to each other.
Symbols and Notations
Components of E-R Diagram
The E-R diagram has three main components.
Entity/Data Object
An Entity can be any object, place, person or class. In E-R Diagram, an entity is represented
using rectangles. Consider an example of an Organization. Employee, Manager, Department,
Product and many more can be taken as entities from an Organization.
Weak Entity
A weak entity is an entity that depends on another entity. A weak entity doesn't have a key attribute of its own. A double rectangle represents a weak entity.
Attribute
An attribute describes a property or characteristic of an entity. For example, Name, Age, Address, etc. can be attributes of a Student. An attribute is represented using an ellipse.
Key Attribute
A key attribute represents the main characteristic of an entity. It is used to represent the primary key. An ellipse with an underlined label represents a key attribute.
Composite Attribute
An attribute can also have its own attributes. These attributes are known as composite attributes.
Relationship
A relationship describes relations between entities. A relationship is represented using a diamond. There are three types of relationships that exist between entities.
1. Binary Relationship
2. Recursive Relationship
3. Ternary Relationship
Binary Relationship
Binary Relationship means relation between two Entities. This is further divided into three types.
1. One to One : This type of relationship is rarely seen in real world.
The above example describes that one student can enroll in only one course, and a course will also have only one student. This is not what you will usually see in a relationship.
2. One to Many: It reflects the business rule that one entity is associated with many instances of another entity. The example for this relation might sound a little weird, but it means that one student can enroll in many courses, while each course has only one student. The arrow in the diagram marks the "one" side of the relationship.
3. Many to One: It reflects the business rule that many entities can be associated with just one entity. For example, a student enrolls in only one course, but a course can have many students.
4. Many to Many :
The above diagram represents that many students can enroll for more than one course.
Recursive Relationship
When an Entity is related with itself it is known as Recursive Relationship.
Ternary Relationship
Relationship of degree three is called Ternary relationship.
Generalization
Generalization is a bottom-up approach in which two lower-level entities combine to form a higher-level entity. In generalization, the higher-level entity can also combine with other lower-level entities to make a further higher-level entity.
Specialization
Specialization is the opposite of generalization. It is a top-down approach in which one higher-level entity can be broken down into two lower-level entities. In specialization, some higher-level entities may not have lower-level entity sets at all.
Aggregation
Aggregation is a process in which a relationship between two entities is treated as a single entity. Here the relationship between Center and Course acts as an entity in a relationship with Visitor.
Weak entity
An entity set that does not possess sufficient attributes to form a primary key is called a weak
entity set.
Strong entity set
One that does have a primary key is called a strong entity set.
Data Flow Diagram
A data flow diagram (DFD) is a graphical representation of the flow of data in an information system. It is capable of depicting incoming data flow, outgoing data flow, and stored data. The DFD does not mention anything about how data flows through the system.
Types of DFD
Logical DFD - This type of DFD concentrates on the system process and flow of data in the
system. For example in a Banking software system, how data is moved between different entities.
Physical DFD - This type of DFD shows how the data flow is actually implemented in the
system. It is more specific and close to the implementation.
DFD Components
Entities - Entities are the sources and destinations of information. Entities are represented by rectangles with their respective names.
Process - Activities and actions taken on the data are represented by circles or round-edged rectangles.
Data Storage - There are two variants of data storage: it can be represented either as a rectangle with both smaller sides absent or as an open-sided rectangle with only one side missing.
Data Flow - Movement of data is shown by pointed arrows. Data movement is shown from the base of the arrow (its source) toward the head of the arrow (its destination).
Symbol Name             Meaning
Data flow               Represents flows of data
Process                 Represents activities and processes, including data processing/conversion
Data store              Represents stored data (e.g., ledgers, files, databases)
Data source (external)  Represents the origin (i.e., source) or destination (i.e., sink) of data
UML (Unified Modeling Language)
UML is a standard unified modeling language approved by the OMG (Object Management
Group) (a standardization body for object-oriented technologies). It is used in the notation of
deliverables (e.g., specification documents) in object-oriented development, from analysis to
design, implementation, and testing.
1. Class diagram
2. Use case diagram
3. Sequence diagram
4. Communication diagram (collaboration diagram)
5. State machine diagram (state chart diagram)
6. Activity diagram
7. Component diagram
8. Object diagram
9. Package diagram
10. Timing diagram
Use case diagram
A use case diagram at its simplest is a representation of a user's interaction with the system that
shows the relationship between the user and the different use cases in which the user is involved.
A use case diagram can identify the different types of users of a system and the different use
cases and will often be accompanied by other types of diagrams as well.
ER-Diagram ATM:
Indexing
Indexing is a data structure technique to efficiently retrieve records from the database files based
on some attributes on which the indexing has been done. Indexing in database systems is similar
to what we see in books.
Benefits
Improves search efficiency.
An index entry consists of two fields (search key and block pointer).
An index is an ordered file.
Searching can be binary.
The average number of block accesses to reach a record is log2 B, where B is the number of index blocks.
Indexing is defined based on its indexing attributes. Indexing can be of the following types −
Primary Index − Primary index is defined on an ordered data file. The data file is
ordered on a key field. The key field is generally the primary key of the relation.
Secondary Index − Secondary index may be generated from a field which is a candidate
key and has a unique value in every record, or a non-key with duplicate values.
Clustering Index − Clustering index is defined on an ordered data file. The data file is
ordered on a non-key field.
Ordered Indexing is of two types −
1. Dense Index
2. Sparse Index
Employee Id  Employee Name  Block
1            A              B1
2            B              B1
3            C              B2
4            D              B2
5            E              B3
6            F              B3
Dense Index
In dense index, there is an index record for every search key value in the database. This makes
searching faster but requires more space to store index records itself. Index records contain search
key value and a pointer to the actual record on the disk.
Search Key Block Pointer
1 B1
2 B1
3 B2
4 B2
5 B3
6 B3
Sparse Index
In a sparse index, index records are not created for every search key. An index record here contains
a search key and a pointer to the actual data on the disk. To search for a record, we first follow the
index to the closest preceding entry and jump to the location it points to. If the data we are looking
for is not at that location, the system performs a sequential search from there until the desired data
is found.
Search Key Block Pointer
1 B1
3 B2
5 B3
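The sparse-index lookup just described can be sketched in Python. This is a minimal illustration, not a real DBMS API; the six-record, three-block file mirrors the hypothetical tables above.

```python
from bisect import bisect_right

# Hypothetical data file: two records per block, as in the tables above.
blocks = {"B1": [1, 2], "B2": [3, 4], "B3": [5, 6]}
# Sparse index: one (first search key, block pointer) entry per block.
sparse_index = [(1, "B1"), (3, "B2"), (5, "B3")]

def lookup(key):
    # Binary-search the index for the last entry whose key <= the target,
    # then scan that block sequentially for the record.
    keys = [k for k, _ in sparse_index]
    pos = bisect_right(keys, key) - 1
    if pos < 0:
        return None
    block = sparse_index[pos][1]
    return block if key in blocks[block] else None

print(lookup(4))  # record 4 lives in block B2
```

A dense index would instead carry one entry per record, trading extra index space for avoiding the final sequential scan.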
Multilevel Index
Index records comprise search-key values and data pointers. The multilevel index is stored on the disk
along with the actual database files. As the size of the database grows, so does the size of the
indices. There is an immense need to keep the index records in main memory so as to speed
up search operations, but if a single-level index is used, a large index cannot be kept in
memory, which leads to multiple disk accesses. A multilevel index solves this by breaking the
index into several smaller levels, so that the outermost level is small enough to stay in memory.
Hash Organization
Bucket − A hash file stores data in bucket format. Bucket is considered a unit of storage. A
bucket typically stores one complete disk block, which in turn can store one or more records.
Hash Function − A hash function, h, is a mapping function that maps all the set of search-keys
K to the address where actual records are placed. It is a function from search keys to bucket
addresses.
There are two types of hash file organizations –
1. Static Hashing.
2. Dynamic Hashing
Static Hashing
In this method of hashing, the resultant data-bucket address is always the same. That means, if
we want to generate the address for EMP_ID = 103 using a mod(5) hash function, it always results in
the same bucket address, 3. The bucket address never changes, so the number of data buckets in
memory remains constant throughout for static hashing. In our example, we will have five data
buckets in memory used to store the data.
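The mod(5) example can be sketched as follows. The employee ID and record value are hypothetical; this only illustrates that the same hash function is used for both insertion and retrieval.

```python
NUM_BUCKETS = 5  # fixed for the life of the file in static hashing

def h(key):
    # The mod(5) hash function from the example above.
    return key % NUM_BUCKETS

buckets = {i: [] for i in range(NUM_BUCKETS)}

def insert(emp_id, record):
    # Bucket address = h(K); the record is stored in that bucket.
    buckets[h(emp_id)].append((emp_id, record))

def search(emp_id):
    # The same hash function locates the bucket on retrieval.
    for k, rec in buckets[h(emp_id)]:
        if k == emp_id:
            return rec
    return None

insert(103, "Alice")
print(h(103))       # 103 mod 5 -> bucket address 3, always
print(search(103))  # Alice
```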
Operation
Insertion − When a record is required to be entered using static hash, the hash function h
computes the bucket address for search key K, where the record will be stored.
Bucket address = h(K)
Search − When a record needs to be retrieved, the same hash function can be used to retrieve the
address of the bucket where the data is stored.
Delete − This is simply a search followed by a deletion operation.
Bucket Overflow
The condition of bucket overflow is known as a collision. This is a fatal state for any static hash
function. In this case, overflow chaining can be used.
Overflow Chaining − When a bucket is full, a new bucket is allocated for the same hash result
and is linked after the previous one. This mechanism is called closed hashing.
Linear Probing − When the hash function generates an address at which data is already stored, the
next free bucket is allocated to the record. This mechanism is called open hashing.
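Linear probing can be sketched like this. The five-slot table, keys, and record values are hypothetical; each slot holds a single record for simplicity.

```python
SIZE = 5
table = [None] * SIZE  # one record per slot, for simplicity

def insert(key, value):
    # Linear probing: start at key % SIZE and walk forward
    # until a free slot (or the key itself) is found.
    start = key % SIZE
    for step in range(SIZE):
        slot = (start + step) % SIZE
        if table[slot] is None or table[slot][0] == key:
            table[slot] = (key, value)
            return slot
    raise RuntimeError("hash table full")

print(insert(103, "rec-A"))  # 103 % 5 == 3, slot 3 is free
print(insert(108, "rec-B"))  # 108 % 5 == 3 collides, probes to slot 4
```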
Dynamic Hashing
The problem with static hashing is that it does not expand or shrink dynamically as the
database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets are
added and removed dynamically and on demand. Dynamic hashing is also known as extendible
hashing.
In dynamic hashing, the hash function is made to produce a large number of values, of which
only a few are used initially.
Data center
A data center is a facility used to house computer systems and associated components, such as
telecommunications and storage systems. It generally includes redundant or backup power
supplies, redundant data communications connections, environmental controls (e.g. air
conditioning, fire suppression) and various security devices. A large data center is an industrial
scale operation using as much electricity as a small town.
The Tier 1 to Tier 4 classification is a standardized methodology used to define the uptime of a data
center. It is useful for measuring:
a) Data center performance
b) Investment
c) ROI (return on investment)
A Tier 4 data center is considered the most robust and the least prone to failure. Tier 4 is designed to host
mission-critical servers and computer systems, with fully redundant subsystems (cooling, power,
network links, storage, etc.) and compartmentalized security zones controlled by biometric access
control methods. Naturally, the simplest is a Tier 1 data center, used by small businesses or shops.
• Tier 1 = Non-redundant capacity components (single uplink and servers).
• Tier 2 = Tier 1 + Redundant capacity components.
• Tier 3 = Tier 1 + Tier 2 + Dual-powered equipment and multiple uplinks.
• Tier 4 = Tier 1 + Tier 2 + Tier 3 + all components are fully fault-tolerant including
uplinks, storage, chillers, HVAC systems, servers etc. Everything is dual-powered.
Data Center Availability According To Tiers
The levels also describe the availability of data from the hardware at a location as follows:
Tier 1: Guaranteeing 99.671% availability.
Tier 2: Guaranteeing 99.741% availability.
Tier 3: Guaranteeing 99.982% availability.
Tier 4: Guaranteeing 99.995% availability.
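The guaranteed availability percentages above translate directly into annual downtime. A quick back-of-the-envelope calculation:

```python
# Annual downtime implied by each tier's guaranteed availability.
HOURS_PER_YEAR = 24 * 365  # 8760

tiers = {"Tier 1": 99.671, "Tier 2": 99.741, "Tier 3": 99.982, "Tier 4": 99.995}

for tier, availability in tiers.items():
    downtime_hours = HOURS_PER_YEAR * (1 - availability / 100)
    print(f"{tier}: about {downtime_hours:.1f} hours of downtime per year")
```

Tier 1's 99.671% works out to roughly 28.8 hours of downtime per year, while Tier 4's 99.995% allows under half an hour.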
Blade Server
A blade server is a stripped-down server computer with a modular design optimized to minimize
the use of physical space and energy. Blade servers have many components removed to save
space, minimize power consumption and other considerations, while still having all the functional
components to be considered a computer. Unlike a rack-mount server, a blade server needs a
blade enclosure, which can hold multiple blade servers, providing services such as power,
cooling, networking, various interconnects and management. Together, blades and the blade
enclosure, form a blade system. Different blade providers have differing principles regarding
what to include in the blade itself, and in the blade system as a whole.
RAID
RAID is a technology that is used to increase the performance and/or reliability of data storage.
The abbreviation stands for Redundant Array of Inexpensive Disks. A RAID system consists of
two or more drives working in parallel.
This section covers the following RAID levels:
RAID 0 – striping
RAID 1 – mirroring
RAID 5 – striping with parity
RAID 6 – striping with double parity
RAID 10 – combining mirroring and striping
RAID level 0 – Striping
Advantages
RAID 0 offers great performance, in both read and write operations. There is no
overhead caused by parity controls.
All storage capacity is used; there is no capacity overhead.
The technology is easy to implement.
Disadvantages
RAID 0 is not fault-tolerant. If one drive fails, all data in the RAID 0 array are lost. It
should not be used for mission-critical systems.
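Striping can be pictured as dealing fixed-size chunks round-robin across the drives in the array. A minimal sketch; the chunk size, drive count, and data are hypothetical:

```python
def stripe(data, num_drives, chunk=4):
    # Split the data into fixed-size chunks and deal them
    # round-robin across the drives (RAID 0 striping).
    drives = [bytearray() for _ in range(num_drives)]
    for i in range(0, len(data), chunk):
        drives[(i // chunk) % num_drives].extend(data[i:i + chunk])
    return drives

d = stripe(b"ABCDEFGHIJKLMNOP", 2)
print(d)  # drive 0 holds ABCD+IJKL, drive 1 holds EFGH+MNOP
```

Because every drive holds an irreplaceable share of the data, losing any one drive destroys the whole array, which is exactly the fault-tolerance problem noted above.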
RAID level 1 – Mirroring
Advantages
RAID 1 offers excellent read speed and a write-speed that is comparable to that of a
single drive.
In case a drive fails, data do not have to be rebuilt, they just have to be copied to the
replacement drive.
RAID 1 is a very simple technology.
Disadvantages
The main disadvantage is that the effective storage capacity is only half of the total drive
capacity because all data get written twice.
Software RAID 1 solutions do not always allow a hot swap of a failed drive, which means
the failed drive can only be replaced after powering down the computer it is attached to.
For servers that are used simultaneously by many people, this may not be acceptable.
Such systems typically use hardware controllers that do support hot swapping.
RAID level 5
Advantages
Read data transactions are very fast while write data transactions are somewhat slower
(due to the parity that has to be calculated).
If a drive fails, you still have access to all data, even while the failed drive is being
replaced and the storage controller rebuilds the data on the new drive.
Disadvantages
Drive failures have an effect on throughput, although this is still acceptable.
This is complex technology. If one of the disks in an array using 4TB disks fails and is
replaced, restoring the data (the rebuild time) may take a day or longer, depending on the
load on the array and the speed of the controller. If another disk goes bad during that
time, data are lost forever.
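The parity that RAID 5 calculates is a bytewise XOR of the data blocks, which is what makes rebuilding a failed drive possible. A minimal sketch with two hypothetical data blocks:

```python
def xor_blocks(*blocks):
    # Parity is the bytewise XOR of the data blocks; XORing the parity
    # with the surviving blocks reconstructs a lost block.
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

d1, d2 = b"\x0f\x0f", b"\xf0\x01"
parity = xor_blocks(d1, d2)

# Simulate losing d1: rebuild it from d2 and the parity.
assert xor_blocks(d2, parity) == d1
```

This also shows why the rebuild is expensive: reconstructing one lost block requires reading every surviving block in the stripe, across the whole array.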
RAID level 6 – Striping with double parity
Advantages
Like with RAID 5, read data transactions are very fast.
If two drives fail, you still have access to all data, even while the failed drives are being
replaced. So RAID 6 is more secure than RAID 5.
Disadvantages
Write data transactions are slower than with RAID 5 due to the additional parity data that
have to be calculated; according to one report, write performance was 20% lower.
Drive failures have an effect on throughput, although this is still acceptable.
This is complex technology. Rebuilding an array in which one drive failed can take a
long time.
RAID level 10 – combining RAID 1 & RAID 0
Advantages
If something goes wrong with one of the disks in a RAID 10 configuration, the rebuild
time is very fast since all that is needed is copying all the data from the surviving mirror
to a new drive. This can take as little as 30 minutes for drives of 1 TB.
Disadvantages
Half of the storage capacity goes to mirroring, so compared to large RAID 5 or RAID 6
arrays, this is an expensive way to have redundancy.
Big Data
Big data is a term that describes the large volume of data – both structured and
unstructured – that inundates a business on a day-to-day basis.
Why Big Data
Increase of storage capacities
Increase of processing power
Availability of data
Every day we create 2.5 quintillion bytes of data; 90% of the data in the world today has
been created in the last two years alone
Sources of Big Data
Social networking sites: Facebook, Google, and LinkedIn generate huge amounts of data
on a day-to-day basis, as they have billions of users worldwide.
E-commerce sites: Sites like Amazon, Flipkart, and Alibaba generate huge amounts of logs
from which users' buying trends can be traced.
Weather stations: All the weather stations and satellites produce very large volumes of data,
which are stored and processed to forecast the weather.
Telecom companies: Telecom giants like Airtel and Vodafone study user trends and publish
their plans accordingly, and for this they store the data of their millions of users.
Share market: Stock exchanges across the world generate huge amounts of data through
their daily transactions.
3V's of Big Data
Velocity: Data is increasing at a very fast rate. It is estimated that the volume of data
will double every 2 years.
Variety: Nowadays data is not stored only in rows and columns. Data is structured as well
as unstructured. Log files and CCTV footage are unstructured data; data which can be saved in
tables, such as a bank's transaction data, is structured.
Volume: The amount of data we deal with is very large, on the order of petabytes.
Issues
Huge amount of unstructured data which needs to be stored, processed and analyzed
Solution
Storage: To store this huge amount of data, Hadoop uses HDFS (Hadoop Distributed File
System), which uses commodity hardware to form clusters and stores data in a
distributed fashion. It works on the write-once, read-many-times principle.
Processing: The MapReduce paradigm is applied to data distributed over the network to find
the required output.
Analysis: Pig and Hive can be used to analyze the data.
Cost: Hadoop is open source, so cost is no longer an issue.
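The MapReduce paradigm can be sketched with the canonical word-count example: a map phase emits (key, 1) pairs from each input split, and a reduce phase groups by key and sums. The input splits here are hypothetical, and real Hadoop distributes these phases across a cluster.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split.
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    # Shuffle/reduce: group the pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

splits = ["big data big", "data lake"]
pairs = [p for s in splits for p in map_phase(s)]
print(reduce_phase(pairs))  # {'big': 2, 'data': 2, 'lake': 1}
```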
SQL Vs NoSQL
SQL: Databases are categorized as Relational Database Management Systems (RDBMS).
NoSQL: Databases are categorized as non-relational or distributed database systems.

SQL: Databases have a fixed, static, predefined schema.
NoSQL: Databases have a dynamic schema.

SQL: Databases display data in the form of tables, so they are known as table-based databases.
NoSQL: Databases display data as collections of key-value pairs, documents, graphs, or wide-column stores.

SQL: Databases are vertically scalable.
NoSQL: Databases are horizontally scalable.

SQL: Databases use a powerful language, Structured Query Language, to define and manipulate the data.
NoSQL: Collections of documents are used to query the data. It is also called unstructured query language, and it varies from database to database.

SQL: Databases are best suited for complex queries.
NoSQL: Databases are not so good for complex queries, because their query facilities are not as powerful as SQL.

SQL: Databases are not best suited for hierarchical data storage.
NoSQL: Databases are best suited for hierarchical data storage.

SQL: MySQL, Oracle, SQLite, PostgreSQL, MS SQL, etc. are examples of SQL databases.
NoSQL: MongoDB, BigTable, Redis, RavenDB, Cassandra, HBase, Neo4j, CouchDB, etc. are examples of NoSQL databases.
Oracle vs MySQL

Strengths
Oracle: An "aircraft carrier" database capable of running large OLTP systems and VLDBs.
MySQL: Price/performance; great performance when applications leverage its architecture.

Database Product
Oracle: Enterprise ($$$$), Standard ($$), Standard One ($), Express (free, up to 4 GB).
MySQL: Enterprise ($, supported, more stable), Community (free).

Application Perspective
Oracle: The more you do in the database, the more you will love Oracle, with compiled PL/SQL, XML, APEX, Java, etc.
MySQL: Web applications often don't leverage database server functionality; web apps are more concerned with fast reads.

Administration
Oracle: Requires lots of in-depth knowledge and skill to manage large environments. Can get extremely complex but also very powerful.
MySQL: Can be trivial to get set up and running. Large and advanced configurations can get complex.

Popularity
Oracle: Extremely popular in the Fortune 100, medium/large enterprise business applications, and medium/large data warehouses.
MySQL: Extremely popular with web companies, startups, small/medium businesses, and small/medium projects.

Application Domain
Oracle: Medium/large OLTP and enterprise applications; Oracle excels in large business applications. Medium/large data warehouses.
MySQL: Web (where MySQL excels), data warehouses, gaming, small/medium OLTP environments.

Development Environments
Oracle: 1) Java 2) .NET 3) APEX 4) Ruby on Rails 5) PHP
MySQL: 1) PHP 2) Java 3) Ruby on Rails 4) .NET 5) Perl

Database Server (Instance)
Oracle: A database instance has numerous background processes depending on configuration. The System Global Area is shared memory for SMON, PMON, DBWR, LGWR, ARCH, RECO, etc. Sessions are managed through server processes.
MySQL: A database instance stores global memory in the mysqld background process. User sessions are managed through threads.

Database Server
Oracle: Uses tablespaces for system metadata, user data, and indexes.
MySQL: Made up of database schemas.

Partitioning
Oracle: $$$, with lots of options.
MySQL: Free, basic features.

Replication
Oracle: $$$, lots of features and options. Much higher complexity, with a lot of features. Allows a lot of data filtering and manipulation.
MySQL: Free, relatively easy to set up and manage. Basic features, but works great. Great horizontal scalability.

Transactions
Oracle: Regular and index-organized tables support transactions.
MySQL: InnoDB and the upcoming Falcon and Maria storage engines.

Backup/Recovery
Oracle: Recovery Manager (RMAN) supports hot backups and runs as a separate central repository for multiple Oracle database servers.
MySQL: No built-in online backup.

Export/Import
Oracle: More features.
MySQL: Easy, very basic.

Data Dictionary (catalog)
Oracle: The data dictionary offers lots of detailed information for tuning. Oracle is starting to charge for use of new metadata structures.
MySQL: The information_schema and mysql database schemas offer basic metadata.

Management/Monitoring
Oracle: $$$$; Grid Control offers lots of functionality, and there are lots of third-party options such as Quest.
MySQL: $; MySQL Enterprise Monitor offers basic functionality, plus additional open-source solutions. Admin scripts may also be used.

Storage
Oracle: Tables are managed in tablespaces; ASM offers striping and mirroring using cheap, fast disks.
MySQL: Each storage engine uses different storage, varying from individual files to tablespaces.

Stored Procedures
Oracle: Advanced features; runs interpreted or compiled. Lots of built-in packages add significant functionality. Extremely scalable.
MySQL: Very basic features; runs interpreted in session threads. Limited scalability.
Comparison between Rack and Blade Servers
Definition
Rack servers: Also known as traditional servers, they are essentially stand-alone computers on which applications are run. All the components, like hard drives and a network card, are contained in a case.
Blade servers: A blade server is a stripped-down computer server based on a modular design. It minimizes the use of physical space.

Origin
Rack servers: Specially designed to be stored in racks, hence the name rack server.
Blade servers: The name comes from the thin, restricted "blade" form factor.

Expandability
Rack servers: Very expandable.
Blade servers: Comparatively less.

Power Demand
Rack servers: More. Blade servers: Less.

Maintenance
Rack servers: More. Blade servers: Less.

Cost
Rack servers: More. Blade servers: Less.

Size
Rack servers: Comparatively large. Blade servers: Compact.

Cabling
Rack servers: More. Blade servers: Less.

Suitable for
Rack servers: Small businesses. Blade servers: Extended organizations.

Benefits
Rack servers:
• Make it easy to keep things neat and orderly (most include some kind of cable management)
• Known to be very expandable
• Many rack servers support large amounts of RAM
Blade servers:
• Lower acquisition cost
• Lower operational cost for deployment
• Lower cost for troubleshooting and repair
• Lower power requirements
• Lower space and cooling requirements
• Reduced cabling requirements
• Very efficient for out-of-band management
• Faster server-to-server communication
• Greater flexibility

Configurations
Rack servers: Available in multiple U iterations. Blade servers: Only available in 2U configurations.

Example
Rack servers: Dell PowerEdge R320, R420 and R520. Blade servers: Dell PowerEdge M series.

Design
Rack servers: Stand-alone. Blade servers: Modular.

Disadvantages
Rack servers: Consume more physical rack space. Blade servers: Reliance on the chassis.

Mounting
Rack servers: Mount inside a special rack. Blade servers: Mount inside a chassis.
Model Test
1. Data security threats include_
a.hardware failure b. privacy invasion c. fraudulent manipulation of data d. All of these
2. An operation that will increase the length of a list is-
a. Insert b. Look-up c. Modify d. All of these
3. In SQL, which command is used to add a column/ integrity constraint to a table-
a. ADD COLUMN b. INSERT COLUMN c. MODIFY TABLE d. ALTER TABLE
4. In SQL, which command(s) is/are used to enable/disable a database trigger?
a. ALTER TRIGGER b. ALTER DATABASE c. ALTER TABLE d. MODIFY TRIGGER
5. In a relational schema, each tuple is divided into fields called-
a. Relations b. Domains c. Queries d. All of the above
6. In SQL, which command is used to change data in a table?
a. UPDATE b. INSERT c. BROWSE d. APPEND
7. What name is given to the collection of facts, items of information or data which are
related in some way?
a. Database b. Directory information c. Information tree d. Information provider
8. In a large DBMS_
a. each user can “see” only a small part of the entire database
b. each user can access every sub-schema
c. each subschema contains every field in the logical schema
d. All of the above
9. Which of the following command(s) is used to recompile a stored procedure in
SQL?
a. COMPILE PROCEDURE b. ALTER PROCEDURE
c. MODIFY PROCEDURE d. All of the above
10. Internal auditors should review data system design before they are-
a. developed b. implemented c. modified d. All of the above
11. A____ means that one record in a particular record type may be related to more
than one record of another record type.
a. One-to-one relationship b. One-to-many relationship
c. Many-to-one relationship d. Many-to-many relationship
12. Which command is used to redefine a column of the table in SQL?
a. ALTER TABLE b. DEFINE TABLE c. MODIFY TABLE d. All of the above
13. Which command is used to enable/disable/drop an integrity constraint in SQL?
a. DEFINE TABLE b. MODIFY TABLE c. ALTER TABLE d. All of the above
14. In SQL, the ALTER TABLESPACE command is used-
a. to add/rename data files b. to change storage characteristics
c. to take a table space online/offline d. to begin/end a backup
15. The language used in application programs to request data from the DBMS is
referred to as the-
a. DML b. DDL c. query language d. All of these above
16. A database management system might consist of application programs and a
software package called
a. FORTRAN b. AUTOFLOW c. BPL d. TOTAL
17. An audit trail
a. is used to make back-up copies b. is the recorded history of operations performed on a file
c. can be used to restore lost information d. All of the above
18. A race condition occurs when
a. Two concurrent activities interact to cause a processing error
b. two users of the DBMS are interacting with different files at the same time
c. both (a) and (b)
d. None of the above
19. An indexing operation
a. sorts a file using a single key b. sorts using two keys
c. establishes an index for a file d. both (b) and (c)
20. The on-line softcopy display of a customer's charge account in response to an inquiry
is an example of a
a. forecasting report b. exception report
c. regularly scheduled report d. on demand report
21. In SQL, which command is used to create a synonym for a schema object?
a. CREATE SCHEMA b. CREATE SYNONYM
c. CREATE SAME d. All of the above
22. If you want your database to include methods, you should use a _______database.
a. Network b. Distributed c. Hierarchical d. Object-Oriented
23. In SQL, which of the following is not a Data Manipulation Language command?
a. DELETE b. SELECT c. UPDATE d. CREATE
24. Which of the following is not characteristic of a relational database model?
a. tables b. treelike structure c. complex logical relationships d. records
25. A computer file contains several records. What does each record contain?
a. Bytes b. Words c. Fields d. Database
26. In SQL, the CREATE VIEW command is used
a. to recompile view b. to define a view of one or more tables or views
c. to recompile a table d. to create a trigger
27. A ______ contains the smallest unit of meaningful data, so you might call it the
basic building block for a data file.
a. File Structure b. Records c. Fields d. Database
28. In the DBMS approach, application programs perform the
a. storage function b. processing functions c. access control d. All of the above
29. In SQL, which command is used to create a database user?
a. ADD USER TO DATABASE b. MK USER
c. CREATE USER d. All of the above
30. A _____ means that one record in a particular record type is related to only one
record of another record type.
a. One-to-one relationship b. One-to-many relationship
c. Many-to-one relationship d. Many-to-many relationship
Model Test Answer
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
b a d a b a a a b d b a c e a
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
c b a c d b d d b c b c b c a