4. • Recommended Book
1. Database Systems: A Practical Approach to Design, Implementation, and Management, 6th Edition, by Thomas Connolly and Carolyn Begg
• Reference Material
1. Database Systems: The Complete Book, 2nd Edition, by Hector Garcia-Molina, Jeffrey D. Ullman and Jennifer Widom
2. Database System Concepts, 6th Edition, by Avi Silberschatz, Henry F. Korth and S. Sudarshan
3. Database Management Systems, 3rd Edition, by Raghu Ramakrishnan and Johannes Gehrke
6. Objectives
• Introduction to database systems
• Why we need databases
• History of databases
• Types of databases
• Database users
• DBMS
7. Why Study Databases?
• Databases are useful
– Many computing applications deal with large amounts of information
– Database systems give a set of tools for storing, searching and managing this information
• Databases in CS
– Databases are a ‘core topic’ in computer science
– Basic concepts and skills with database systems are part of the skill set you will be assumed to have as a CS graduate
8. What is a Database?
• “A set of information held in a computer”
Oxford English Dictionary
• “One or more large structured sets of
persistent data, usually associated with
software to update and query the data”
Free On-Line Dictionary of Computing
• “A collection of data arranged for ease and
speed of search and retrieval”
Dictionary.com
9. Database
• Definition
– A collection of self-describing, integrated data files
• System catalog
– Metadata
– Data dictionary
• Data abstraction
10. Databases
• Web indexes
• Library catalogues
• Medical records
• Bank accounts
• Stock control
• Personnel systems
• Product catalogues
• Telephone directories
• Train timetables
• Airline bookings
• Credit card details
• Student records
• Customer histories
• Stock market prices
• Discussion boards
• and so on…
11. File-Based Systems
• An early attempt to computerize the manual filing system.
• A collection of application programs that perform services for the end users (e.g. reports).
• Each program defines and manages its own data.
12. Manual Filing Systems
• Works well
– while the number of items to be stored is small
– even for large numbers of items, if only storage and retrieval are needed
14. Limitations of File-Based Approach
• Separation and isolation of data
– Each program maintains its own set of data.
– Users of one program may be unaware of potentially useful data held by other programs.
– For example, producing a list of all houses that match the clients' requirements means combining data held in separate client and property files.
• Duplication of data
– A decentralized approach is taken by each department.
– The same data is held by different programs.
– This wastes space and can lead to different values and/or different formats for the same item.
15. Limitations of File-Based Approach..
• Data dependence
– File structure is defined in the program code.
• Incompatible file formats
– Programs are written in different languages, and so cannot easily access each other's files.
• Fixed queries/proliferation of application programs
– Programs are written to satisfy particular functions.
– Any new requirement needs a new program.
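Data dependence is easier to see in code. The following is a minimal Python sketch of a file-based program. The fixed-width layout and the names `RECORD_LAYOUT`, `write_clients` and `read_clients` are hypothetical and made up for illustration, as are the sample values. The point is that the record format lives inside the program itself.

```python
import io

# Hypothetical fixed-width layout, hard-coded in this program:
# client number (5 chars), name (20 chars), preferred property type (10 chars).
RECORD_LAYOUT = [("client_no", 5), ("name", 20), ("pref_type", 10)]


def write_clients(stream, clients):
    """Write each client as one fixed-width line using RECORD_LAYOUT."""
    for client in clients:
        line = "".join(str(client[field]).ljust(width)[:width]
                       for field, width in RECORD_LAYOUT)
        stream.write(line + "\n")


def read_clients(stream):
    """Parse fixed-width lines back into dicts; only correct if the layout matches."""
    clients = []
    for line in stream:
        line = line.rstrip("\n")
        record, pos = {}, 0
        for field, width in RECORD_LAYOUT:
            record[field] = line[pos:pos + width].strip()
            pos += width
        clients.append(record)
    return clients


buffer = io.StringIO()  # stands in for a data file on disk
write_clients(buffer, [{"client_no": "CR76", "name": "John Kay", "pref_type": "Flat"}])
buffer.seek(0)
print(read_clients(buffer))  # [{'client_no': 'CR76', 'name': 'John Kay', 'pref_type': 'Flat'}]
```

Suppose another department widens the `name` field to 30 characters. Every program that hard-codes this layout must then be found and rewritten. A DBMS avoids this by keeping the structure in its catalog rather than in each program.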
16. Database Approach
• Arose because:
– The definition of data was embedded in application programs, rather than being stored separately and independently.
– There was no control over access to and manipulation of data beyond that imposed by the application programs.
• Result:
– the database and the Database Management System (DBMS).
17. History of Database Systems
• Roots of the DBMS
– Apollo moon-landing project, 1960s
– NAA (North American Aviation), prime contractor for the project
– Developed GUAM (Generalized Update Access Method), a hierarchical system
– In the mid-1960s IBM joined NAA; the result was IMS (Information Management System)
18. History of Database Systems..
• IDS (Integrated Data Store)
– Developed by General Electric in the mid-1960s; network model
• CODASYL (Conference on Data Systems Languages)
• DBTG (Data Base Task Group)
19. History of Database Systems..
• The DBTG proposal of 1971 identified three components:
– The network schema: the logical organization of the entire database as seen by the DBA, which includes a definition of the database name, the type of each record, and the components of each record type.
– The subschema: the part of the database as seen by the user or application program.
– A data management language to define the data characteristics and the data structure, and to manipulate the data.
20. History of Database Systems..
• DBTG specified three languages:
– A schema Data Definition Language (DDL), which enables the DBA to define the schema.
– A subschema DDL, which allows application programs to define the parts of the database they require.
– A Data Manipulation Language (DML), to manipulate the data.
21. History of Database Systems..
• E. F. Codd, 1970, IBM Research Laboratory
– Relational model
• System R project at IBM's San Jose Research Laboratory, California
– Results of this project:
– Development of SQL
– Commercial relational DBMS products, e.g. DB2 and SQL/DS from IBM, and Oracle from Oracle Corp.
22. History of Database Systems
• First generation
– Hierarchical model
• Information Management System (IMS)
– Network model
• Conference on Data Systems Languages (CODASYL)
• Data Base Task Group (DBTG)
– Limitations
• Complex programs even for simple queries
• Minimal data independence
• No theoretical foundation
• Second generation
– Relational model
• E. F. Codd
• DB2, Oracle
– Limitation
• Limited data modeling
• Third generation
– Object-relational DBMS
– Object-oriented DBMS
24. History of Database Systems
• File based systems
– File-based systems appeared in the 1960s and were widely used. They store information in files and organize those files on storage devices such as a hard disk, CD-ROM, USB drive, SSD or floppy disk.
• Relational Model
– E. F. Codd first described the relational model in a 1969 IBM research report and published it in 1970. The model represents data as tuples and groups data into one or more tables (relations). These tables are related to each other through common attribute values (keys).
• dBase
– dBase went on sale in the 1980s. It was one of the first database management systems for microcomputers and was developed by Cecil Wayne Ratliff.
• Centralized DBMS and Data Warehousing
– In the 1990s, centralized DBMS servers were widely used. The period also saw the introduction of Microsoft Access, growing use of the Internet, and the introduction of data warehousing.
• NoSQL
– NoSQL and Big Data came to prominence around 2008. Big Data describes very large volumes of both structured and unstructured data, so large that traditional databases cannot process it efficiently.
• Hadoop and MongoDB
– Hadoop was first released in 2006 and MongoDB in 2009. Hadoop uses a distributed file system (HDFS) to store big data and MapReduce to process it. Hadoop excels at storing and processing huge amounts of data in various formats: structured, semi-structured and unstructured. MongoDB is a cross-platform, document-oriented database that provides high performance, high availability and easy scalability. It works on the concept of collections and documents.
• HBase
– HBase, which became a top-level Apache project in 2010, is a database built on top of HDFS. It provides fast lookups in large tables.
25. Database Systems
• A database system consists of
– Data (the database)
– Software
– Hardware
– Users
• We focus mainly on the software
• Database systems allow users to
– Store
– Update
– Retrieve
– Organise
– Protect
their data.
26. Database Users
• End users
– Use the database system to achieve some goal
• Application developers
– Write software to allow end users to interface with the database system
• Data Administrator (DA)
– Database planning
– Development and maintenance of standards, policies and procedures
• Database Administrator (DBA)
– Designs & manages the database system
• Database systems programmer
– Writes the database software itself
27. Database Management Systems
• A database is a collection of information
• A database management system (DBMS) is the software that controls access to that information
• Used to create, maintain, and access databases
• Examples:
– Oracle
– DB2 (IBM)
– MS SQL Server
– MS Access
– Ingres
– PostgreSQL
– MySQL
– OpenOffice Base
– Corel Paradox
28. What Does the DBMS Do?
• Provides users with
– Data definition language (DDL): permits specification of data types, structures and any data constraints
– Data manipulation language (DML): a general enquiry facility (query language) for the data
– Data control language (DCL): controls access rights, e.g. granting and revoking privileges
• Often these are all the same language (e.g. SQL)
• DBMS provides
– Concurrency
– Integrity
– Security
– Data independence
– Backup & recovery system
• Data Dictionary
– Describes the database itself
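A minimal Python sketch can show SQL acting as both DDL and DML, with the DBMS enforcing a declared constraint itself. It uses the standard-library `sqlite3` module only because that module ships with Python. The `staff` table and its rows are made up for illustration. SQLite has no DCL (no `GRANT`/`REVOKE`), so that part is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: define the table's structure, data types and constraints.
conn.execute("""
    CREATE TABLE staff (
        staff_no TEXT PRIMARY KEY,
        name     TEXT NOT NULL,
        salary   REAL CHECK (salary > 0)
    )
""")

# DML: insert data, then query it.
conn.executemany(
    "INSERT INTO staff (staff_no, name, salary) VALUES (?, ?, ?)",
    [("SG37", "Ann Beech", 12000.0), ("SL21", "John White", 30000.0)],
)
rows = conn.execute(
    "SELECT name FROM staff WHERE salary > ? ORDER BY name", (20000,)
).fetchall()
print(rows)  # [('John White',)]

# Integrity: the DBMS, not the application, rejects data that breaks a constraint.
try:
    conn.execute("INSERT INTO staff VALUES ('SA9', 'Mary Howe', -5)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

Because the constraint is stored with the table definition, every program that uses this database gets the same check without re-implementing it.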
29. Views
• Allow each user to have his or her own view of the database.
• A view is essentially some subset of the database, defined by a query over the underlying tables.
30. Views - Benefits
• Reduce complexity
• Provide a level of security
• Provide a mechanism to customize the appearance of the database
• Present a consistent, unchanging picture of the structure of the database, even if the underlying database is changed
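The benefits above can be sketched with a view in SQLite through Python's `sqlite3` module. The table, the branch codes and the view name `branch_b003_staff` are hypothetical. The view exposes one branch's staff and hides the salary column, which illustrates both reduced complexity and a level of security.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (
        staff_no  TEXT PRIMARY KEY,
        name      TEXT NOT NULL,
        branch_no TEXT NOT NULL,
        salary    REAL
    );
    INSERT INTO staff VALUES
        ('SG37', 'Ann Beech',  'B003', 12000),
        ('SG14', 'David Ford', 'B003', 18000),
        ('SL21', 'John White', 'B005', 30000);

    -- A view: a stored query that shows one branch and hides salaries.
    CREATE VIEW branch_b003_staff AS
        SELECT staff_no, name FROM staff WHERE branch_no = 'B003';
""")

# Users query the view like a table but never see the salary column.
view_rows = conn.execute(
    "SELECT staff_no, name FROM branch_b003_staff ORDER BY staff_no"
).fetchall()
print(view_rows)  # [('SG14', 'David Ford'), ('SG37', 'Ann Beech')]

# The view is computed from the base table, so new rows show through it.
conn.execute("INSERT INTO staff VALUES ('SG5', 'Susan Brand', 'B003', 24000)")
count = conn.execute("SELECT COUNT(*) FROM branch_b003_staff").fetchone()[0]
print(count)  # 3
```

The view stores no data of its own. It is re-evaluated against the base table each time it is queried, which is why the new row appears without any change to the view.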
32. • Hardware
– Client-server architecture
– Can range from a PC to a network of computers
• Software
– DBMS, operating system, network software, application programs
• Data
– Schema, subschema, table, attribute
• People
– Data administrator & database administrator
– Database designer: logical & physical
– Application programmer
– End-user: naive & sophisticated
• Procedure
– Start, stop, log on, log off, back up, recovery
33. Advantages of DBMS
• Control of data redundancy
• Consistency
• Integrity
• Security
• Concurrency control
• Backup & recovery
• Enforcement of data standards
• More information from the same amount of data
• Data sharing & balancing of conflicting requirements
• Productivity & accessibility
• Economy of scale
• Improved maintenance through data independence
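Consistency, integrity and recovery all rely on transactions. Below is a minimal sketch using Python's `sqlite3` module, with a hypothetical `account` table and a hypothetical `transfer` helper. Using the connection as a context manager commits the block on success and rolls it back if an exception is raised. The sketch shows only atomicity (all or nothing). It does not simulate concurrent users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance REAL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100.0), ("B", 50.0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    # 'with conn' commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                     (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                     (amount, src))


def balances(conn):
    return dict(conn.execute("SELECT acc_no, balance FROM account ORDER BY acc_no"))


transfer(conn, "A", "B", 30.0)
print(balances(conn))  # {'A': 70.0, 'B': 80.0}

try:
    transfer(conn, "A", "B", 500.0)  # debit would make A negative, so the CHECK fails
except sqlite3.IntegrityError:
    pass
print(balances(conn))  # {'A': 70.0, 'B': 80.0}: the credit to B was rolled back
```

The credit is applied before the failing debit on purpose. The second call shows that the DBMS undoes the half-finished work instead of leaving the accounts inconsistent.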
35. Data Dictionary - Metadata
• The dictionary or catalog stores information about the database itself
• This is data about data, or ‘metadata’
• Almost every aspect of the DBMS uses the dictionary
• The dictionary holds
– Descriptions of database objects (tables, users, rules, views, indexes, …)
– Information about who is using which data (locks)
– Schemas and mappings
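In SQLite the catalog can be queried like any other table. Here is a minimal sketch with Python's `sqlite3` module; the `branch` table, index and view are hypothetical. It reads the `sqlite_master` catalog table and the `PRAGMA table_info` output. SQLite's catalog does not record lock information, so that part of the slide is not shown. Other systems expose similar catalogs, for example through the SQL-standard `INFORMATION_SCHEMA` views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE branch (branch_no TEXT PRIMARY KEY, city TEXT NOT NULL);
    CREATE INDEX branch_city_idx ON branch (city);
    CREATE VIEW london_branches AS SELECT * FROM branch WHERE city = 'London';
""")

# sqlite_master is SQLite's catalog: one row per table, index, view or trigger.
# Internal objects (e.g. the automatic primary-key index) are filtered out.
objects = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE name NOT LIKE 'sqlite_%' ORDER BY name"
).fetchall()
print(objects)  # [('table', 'branch'), ('index', 'branch_city_idx'), ('view', 'london_branches')]

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(branch)")]
print(columns)  # [('branch_no', 'TEXT'), ('city', 'TEXT')]
```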
36. File Based Systems
• File based systems
– Data is stored in files
– Each file has a specific format
– Programs that use these files depend on knowledge about that format
• Problems:
– No standards
– Data duplication
– Data dependence
– No way to generate ad hoc queries
– No provision for security, recovery, concurrency, etc.
37. Relational Systems
• Problems with early databases
– Navigating the records requires complex programs
– There is minimal data independence
– No theoretical foundations
• Then, in 1970, E. F. Codd wrote “A Relational Model of Data for Large Shared Data Banks” and introduced the relational model
38. Relational Systems
• Information is stored as tuples or records in relations or tables
• There is a sound mathematical theory of relations
• Most modern DBMSs are based on the relational model
• The relational model covers 3 areas:
– Data structure
– Data integrity
– Data manipulation
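The "mathematical theory of relations" can be illustrated with a toy model. In this minimal Python sketch, a relation is a set of tuples plus its attribute names, and selection, projection and natural join are ordinary set operations. The type alias `Relation`, the helper functions and the sample rows are all hypothetical. Real DBMSs use SQL, which by default keeps duplicate rows (bag semantics) rather than the pure set semantics shown here.

```python
from typing import Callable, FrozenSet, Tuple

# A relation is modelled here as (attribute names, set of tuples).
Relation = Tuple[Tuple[str, ...], FrozenSet[tuple]]


def select(rel: Relation, pred: Callable[[dict], bool]) -> Relation:
    """Selection (sigma): keep the tuples that satisfy the predicate."""
    attrs, rows = rel
    return attrs, frozenset(r for r in rows if pred(dict(zip(attrs, r))))


def project(rel: Relation, keep: Tuple[str, ...]) -> Relation:
    """Projection (pi): keep some attributes; duplicates vanish because rows form a set."""
    attrs, rows = rel
    idx = [attrs.index(a) for a in keep]
    return keep, frozenset(tuple(r[i] for i in idx) for r in rows)


def natural_join(left: Relation, right: Relation) -> Relation:
    """Natural join: combine tuples that agree on every shared attribute."""
    l_attrs, l_rows = left
    r_attrs, r_rows = right
    shared = [a for a in l_attrs if a in r_attrs]
    extra = [a for a in r_attrs if a not in l_attrs]
    out = set()
    for lr in l_rows:
        ld = dict(zip(l_attrs, lr))
        for rr in r_rows:
            rd = dict(zip(r_attrs, rr))
            if all(ld[a] == rd[a] for a in shared):
                out.add(lr + tuple(rd[a] for a in extra))
    return l_attrs + tuple(extra), frozenset(out)


# Made-up sample relations; branch_no is the common attribute that links them.
staff: Relation = (("staff_no", "name", "branch_no"),
                   frozenset({("SG37", "Ann Beech", "B003"), ("SL21", "John White", "B005")}))
branch: Relation = (("branch_no", "city"),
                    frozenset({("B003", "Glasgow"), ("B005", "London")}))

joined = natural_join(staff, branch)
glasgow_names = project(select(joined, lambda t: t["city"] == "Glasgow"), ("name",))
print(glasgow_names)  # (('name',), frozenset({('Ann Beech',)}))
```

The relations are linked only by matching `branch_no` values, not by stored pointers. This is how the relational model avoids the record navigation that made early databases hard to program.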
39. DBMS vs File System
• The main differences between a DBMS and a file system are as follows: