3. Database Approach to Data Management
• Database
• A collection of related files (entities) containing records on people, places, or
things
• Entity
• A person, place, thing, or event about which information must be kept.
• Has relationships to other entities (e.g., the entity Student has a relationship to
the entity Grades in a University Student database)
• Also called a table or file
• Attribute
• Pieces of information describing a particular entity (e.g., Student ID and Name
for the entity Student)
• Also called a field or column
4. Database Approach to Data Management
• Relational Database
• Organizes data into two-dimensional tables with rows and
columns.
• A column is also called an attribute or field. A row is a group
of attributes that describe a single instance of an entity. A row
is also called a record or tuple.
• Can relate data stored in one table to data stored in another as
long as the two tables share a common data element
(attribute).
5. Database Approach to Data Management
• Primary Key
• An attribute (or combination of attributes) whose value uniquely identifies a
single instance (row) of an entity.
• Allows each record to be retrieved, updated, or sorted.
• Foreign Key
• An attribute that appears as a primary key in one entity (table)
and as a non-primary key attribute in another entity (table).
Used to link two tables together.
6. A Relational Database Table
A relational database organizes data in the form of two-dimensional tables.
Illustrated here is a table for the entity SUPPLIER showing how it represents the
entity and its attributes. Supplier_Number is the key field.
Figure 6.2
7. Primary vs. Foreign Key
Data for the entity PART have their own separate table. Part_Number is the primary key
and Supplier_Number is the foreign key, enabling users to find related information from
the SUPPLIER table about the supplier for each part.
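The SUPPLIER and PART tables described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the textbook's figure: the column names other than Supplier_Number and Part_Number, the sample rows, and the helper supplier_for_part are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE supplier (
    supplier_number INTEGER PRIMARY KEY,   -- primary key of SUPPLIER
    supplier_name   TEXT NOT NULL          -- hypothetical attribute
);
CREATE TABLE part (
    part_number     INTEGER PRIMARY KEY,   -- primary key of PART
    part_name       TEXT NOT NULL,         -- hypothetical attribute
    supplier_number INTEGER REFERENCES supplier(supplier_number)  -- foreign key
);
-- Made-up sample rows
INSERT INTO supplier VALUES (1, 'Example Supplier A'), (2, 'Example Supplier B');
INSERT INTO part VALUES (101, 'Example Part X', 1), (102, 'Example Part Y', 2);
""")

def supplier_for_part(part_number):
    """Follow the foreign key from PART to SUPPLIER; return None if no match."""
    row = conn.execute(
        "SELECT s.supplier_name FROM part p "
        "JOIN supplier s ON s.supplier_number = p.supplier_number "
        "WHERE p.part_number = ?",
        (part_number,),
    ).fetchone()
    return row[0] if row else None
```

Calling supplier_for_part(101) looks up the part by its primary key and then uses Supplier_Number to reach the matching SUPPLIER row.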
8. Database Approach to Data Management
• Entity-relationship diagram (ERD)
• Diagramming tool used to express entity relationships
• Very useful in developing complex databases
• Associations
• Define the relationships one entity has to another
• Determine necessary key structures to access data
• Come in three relationship types:
• One-to-One
• One-to-Many
• Many-to-Many
9. Sample ERD
• Example
• Each Team has one Mascot (One-to-One)
• Each Team has Players (One-to-Many)
• Each Team Participates in Games (Many-to-Many)
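One possible way to turn the three relationship types in this example into key structures, sketched with Python's sqlite3 module. The table layouts, column names, and sample rows are assumptions for illustration; the slide only names the entities and their relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE team (
    team_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
-- One-to-One: UNIQUE on the foreign key permits at most one mascot per team.
CREATE TABLE mascot (
    mascot_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    team_id   INTEGER NOT NULL UNIQUE REFERENCES team(team_id)
);
-- One-to-Many: many player rows may carry the same team_id.
CREATE TABLE player (
    player_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    team_id   INTEGER NOT NULL REFERENCES team(team_id)
);
CREATE TABLE game (
    game_id   INTEGER PRIMARY KEY,
    played_on TEXT
);
-- Many-to-Many: a junction table pairs teams with games.
CREATE TABLE team_game (
    team_id INTEGER NOT NULL REFERENCES team(team_id),
    game_id INTEGER NOT NULL REFERENCES game(game_id),
    PRIMARY KEY (team_id, game_id)
);
-- Made-up sample rows
INSERT INTO team VALUES (1, 'Example Team A'), (2, 'Example Team B');
INSERT INTO mascot VALUES (1, 'Example Mascot', 1);
INSERT INTO player VALUES (1, 'Player One', 1), (2, 'Player Two', 1);
INSERT INTO game VALUES (1, '2024-01-01');
INSERT INTO team_game VALUES (1, 1), (2, 1);
""")

players_on_team_1 = conn.execute(
    "SELECT COUNT(*) FROM player WHERE team_id = 1").fetchone()[0]
teams_in_game_1 = conn.execute(
    "SELECT COUNT(*) FROM team_game WHERE game_id = 1").fetchone()[0]
```

The many-to-many case is the one that needs an extra table: neither TEAM nor GAME can hold a single foreign key to the other, so the pairing lives in team_game.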
10. Database Approach to Data Management
• Normalization
• A technique to make complex databases more efficient by eliminating as much
redundant data as possible
• Referential Integrity
• Used by relational databases to ensure that relationships between coupled tables
remain consistent.
• For example: when one table has a foreign key that points to another table, you
may not add a record to the table with the foreign key unless there is a
corresponding record in the linked table.
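The referential integrity rule above can be observed directly in SQLite through Python's sqlite3 module. This is a small sketch: the supplier and part tables and the helper try_add_part are assumptions for illustration. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key enforcement off unless it is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE supplier (supplier_number INTEGER PRIMARY KEY, supplier_name TEXT)")
conn.execute(
    "CREATE TABLE part ("
    " part_number INTEGER PRIMARY KEY,"
    " part_name TEXT,"
    " supplier_number INTEGER NOT NULL REFERENCES supplier(supplier_number))")
conn.execute("INSERT INTO supplier VALUES (1, 'Example Supplier')")

def try_add_part(part_number, part_name, supplier_number):
    """Return True if the row was accepted, False if the DBMS rejected it."""
    try:
        conn.execute("INSERT INTO part VALUES (?, ?, ?)",
                     (part_number, part_name, supplier_number))
        return True
    except sqlite3.IntegrityError:
        # No supplier with that number exists, so the insert is refused.
        return False
```

A part that points at supplier 1 is accepted; a part that points at a supplier number with no row in the supplier table is refused by the DBMS itself, not by application code.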
13. Database Management System
• A specific type of software for creating, storing, organizing, and
accessing data from a database
• Separates the logical and physical views of the data
• Logical view: how end users view data
• Physical view: how data are actually structured and organized
• Examples of relational DBMS: Microsoft Access, DB2, Oracle
Database, Microsoft SQL Server, MySQL
14. Database Management System
A single human resources database provides many different views of data, depending on
the information requirements of the user. Illustrated here are two possible views, one of
interest to a benefits specialist and one of interest to a member of the company’s
payroll department.
Figure 6.8
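The idea of several logical views over one physical table can be sketched with SQL views in Python's sqlite3 module. The employee table, its columns, and the two view names are assumptions for illustration and are not taken from the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One physical table (hypothetical columns and sample row)
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT,
    health_plan TEXT,
    gross_pay   REAL,
    net_pay     REAL
);
INSERT INTO employee VALUES (1, 'Example Employee', 'Plan A', 5000.0, 3800.0);

-- Two logical views over the same stored data
CREATE VIEW benefits_view AS
    SELECT employee_id, name, health_plan FROM employee;
CREATE VIEW payroll_view AS
    SELECT employee_id, name, gross_pay, net_pay FROM employee;
""")

def columns(view_name):
    """List the column names a user of the given view would see."""
    cursor = conn.execute(f"SELECT * FROM {view_name}")
    return [description[0] for description in cursor.description]
```

Each user group queries only its own view, while the data is stored once in the employee table.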
15. Database Management Systems
• Operations of a Relational DBMS
• Select: creates a subset of records based on stated criteria
• Join: combines relational tables to present the user with more
information than is available from individual tables
• Project: creates a subset consisting of columns in a table,
permitting the user to create new tables that contain only the
information required
16. Operations of a Relational DBMS
The select, project, and join operations enable data from two
different tables to be combined and only selected attributes to
be displayed.
Fig. 6.9
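The three operations can be sketched in plain Python over lists of dictionaries, with each dictionary standing in for a row. The function names mirror the operations, but the sample rows and column names are made up for illustration, and a real DBMS implements these operations far more efficiently.

```python
# Hypothetical sample tables: each dict is one row.
suppliers = [
    {"supplier_number": 1, "supplier_name": "Example Supplier A"},
    {"supplier_number": 2, "supplier_name": "Example Supplier B"},
]
parts = [
    {"part_number": 101, "part_name": "Part X", "supplier_number": 1},
    {"part_number": 102, "part_name": "Part Y", "supplier_number": 2},
    {"part_number": 103, "part_name": "Part Z", "supplier_number": 1},
]

def select(rows, predicate):
    """Select: keep only the rows that meet the stated criteria."""
    return [row for row in rows if predicate(row)]

def join(left, right, key):
    """Join: combine rows from two tables that share the same key value."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

def project(rows, columns):
    """Project: keep only the named columns of each row."""
    return [{c: row[c] for c in columns} for row in rows]

wanted = select(parts, lambda row: row["part_number"] in (101, 103))
combined = join(wanted, suppliers, "supplier_number")
result = project(combined, ["part_number", "part_name", "supplier_name"])
```

Here result holds two rows, for parts 101 and 103, each carrying only the part number, part name, and supplier name.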
17. Database Management Systems
• Capabilities of a DBMS
• Data Definition: information about the structure of the content of the
database
• data elements (entities) and their characteristics (fields, etc.)
• ownership (who maintains db)
• authorization (who can access db)
• Data Dictionary: an automated or manual file that stores data definitions.
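Many DBMS products keep an automated data dictionary that can itself be queried. As a rough illustration, SQLite exposes column definitions through PRAGMA table_info, shown here via Python's sqlite3 module. The student table and the helper describe are assumptions for the example; ownership and authorization details are not covered by this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

def describe(table):
    """Return (column name, declared type, is part of primary key) per column."""
    # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
    rows = conn.execute(f"PRAGMA table_info({table})")
    return [(name, col_type, bool(pk))
            for _cid, name, col_type, _notnull, _default, pk in rows]
```

describe("student") reports the definitions stored by the DBMS rather than anything written down by hand.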
18. Database Management Systems
• Querying a database
• A query is a request for information from a database given certain selection
parameters.
• Data Manipulation Language
• A specialized language that is used to add, change, delete, and retrieve
data in the database.
• Structured Query Language (SQL) is the most prominent data
manipulation language used today. It is the industry standard language
for relational databases.
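The four data manipulation actions named above (add, change, delete, retrieve) map onto the SQL statements INSERT, UPDATE, DELETE, and SELECT. A minimal sketch using Python's sqlite3 module follows; the student table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CREATE TABLE is data definition, not data manipulation; it only sets up the example.
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")

# Add
conn.execute("INSERT INTO student VALUES (?, ?)", (1, "Example Student"))
# Change
conn.execute("UPDATE student SET name = ? WHERE student_id = ?",
             ("Renamed Student", 1))
# Retrieve
rows_after_update = conn.execute(
    "SELECT student_id, name FROM student").fetchall()
# Delete
conn.execute("DELETE FROM student WHERE student_id = ?", (1,))
rows_after_delete = conn.execute(
    "SELECT student_id, name FROM student").fetchall()
```

The ? placeholders let the library pass values separately from the SQL text, which is the usual way to avoid building queries by string concatenation.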
19. Database Management Systems
• Non-relational DBMS
• A more flexible data model used as an alternative to the traditional
relational model of organizing data
• Used for data that is not easily organized into rows and columns
(e.g., social media, graphics, emails)
• Useful for querying large volumes of data (i.e., big data) that may be
distributed across many machines
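To illustrate why such data does not fit neatly into rows and columns, the sketch below stores records as free-form documents (Python dictionaries) whose fields differ from one record to the next. This is only a toy model of a document-style store: the sample documents and the helper find are assumptions, and no particular non-relational product is implied.

```python
# Hypothetical documents: each one carries whatever fields it needs.
documents = [
    {"type": "email", "sender": "a@example.com", "subject": "Hello"},
    {"type": "post", "author": "user1", "text": "Example post", "tags": ["x", "y"]},
    {"type": "post", "author": "user2", "text": "Another post"},
]

def find(docs, **criteria):
    """Return documents whose fields match every given criterion.

    Documents that lack a field simply do not match; no fixed schema is needed.
    """
    return [doc for doc in docs
            if all(doc.get(field) == value for field, value in criteria.items())]

posts = find(documents, type="post")
by_user1 = find(documents, type="post", author="user1")
```

In a relational table every row would need the same columns, so fields such as tags or subject would be empty for most rows.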
20. Big Data
• A term used to describe datasets with volumes so huge that they are beyond
the ability of typical DBMS to capture, store, and analyze.
• Characterized by the “3Vs” – volume of data, variety of data, and the
velocity at which the data must be processed
• Big data sets can reveal patterns and insights that smaller datasets may not show
• Requires new technologies and tools
21. Business Intelligence
• Applications and technologies to help users obtain useful information from all
different types of data in order to make better business decisions. Consists of:
Tools for capturing and organizing data:
• Data warehouses
• Data marts
• Hadoop
• In-memory computing
• Analytical platforms
Tools for analyzing data
• OLAP
• Data mining
• Text mining and web mining
22. Tools for Capturing & Organizing Data
• Data Warehouse
• A database that stores current and historical data that may be of interest to
decision makers
• Integrates multiple large databases and other information sources into a
single repository
• Data Mart
• Subsets of data warehouses that are highly focused (customized) and
isolated for a specific population of users
23. Tools for Capturing & Organizing Data
• Hadoop
• Open-source software framework from Apache
• Designed for big data
• Breaks a big data problem into sub-problems and distributes the processing to
many inexpensive computer processing nodes
• Combines the results into a smaller data set that is easier to analyze
• Key services
• Hadoop Distributed File System (HDFS)
• MapReduce
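The split, process, and combine pattern behind MapReduce can be sketched in a single Python process with a word count. This is not the Hadoop API: the function names, the sample chunks, and the in-memory shuffle step are assumptions, and a real cluster would run the map and reduce steps on many nodes in parallel.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each group into one small result per key."""
    return {key: sum(values) for key, values in groups.items()}

# Each chunk stands in for the slice of data handled by one node.
chunks = ["big data big", "data tools"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
```

The final dictionary is much smaller than the input text, which mirrors the point that the combined result is easier to analyze.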
24. Tools for Capturing & Organizing Data
• In-Memory Computing
• Relies on computer’s main memory (RAM) for data storage
• Eliminates bottlenecks in retrieving and reading data from hard-disk based
databases
• Dramatically shortens query response times
• Enabled by
• High-speed processors
• Multicore processing
• Falling computer memory prices
25. Tools for Capturing & Organizing Data
• Analytic Platforms
• Preconfigured hardware-software systems
• Designed for query processing and analytics
• Can use both relational and non-relational technology to analyze large data
sets
• Include in-memory systems, NoSQL DBMS
• Example: IBM PureData System for Analytics
• Integrated database, server, storage components
26. Tools for Analyzing Data
• Online Analytical Processing (OLAP)
• Supports multidimensional data analysis, enabling users to view
the same data in different ways using multiple dimensions
(e.g., product, pricing, cost, region, time period).
• Each aspect of information—product, pricing, cost, region, or time period—represents a
different dimension
• E.g., how many bolts did we sell in each sales region in the month of June and how does
it compare with projected sales
• Need to have a good idea about what information you are
looking for
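The bolt-sales question above can be sketched as a roll-up over a small fact table, where each record carries several dimensions and one measure. The sales figures, the dimension names, and the helper rollup are made up for illustration; OLAP tools do this over far larger data and also compare against projections, which this sketch omits.

```python
from collections import defaultdict

# Hypothetical fact table: dimensions (product, region, month) and a measure (units).
sales = [
    {"product": "bolts", "region": "East", "month": "June", "units": 100},
    {"product": "bolts", "region": "West", "month": "June", "units": 150},
    {"product": "bolts", "region": "East", "month": "July", "units": 120},
    {"product": "nuts", "region": "East", "month": "June", "units": 80},
]

def rollup(facts, dimensions, measure="units"):
    """Total the measure for every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        totals[tuple(fact[d] for d in dimensions)] += fact[measure]
    return dict(totals)

# "How many bolts did we sell in each sales region in June?"
june_bolts = [f for f in sales if f["product"] == "bolts" and f["month"] == "June"]
by_region = rollup(june_bolts, ["region"])

# The same data viewed along different dimensions.
by_product_month = rollup(sales, ["product", "month"])
```

Changing the list of dimensions passed to rollup is the code equivalent of viewing the same data in a different way.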
27. Tools for Analyzing Data
• Data Mining
• Provides insights into corporate data by finding hidden patterns
and relationships in large databases and inferring rules from
them to predict future behavior
• Patterns and rules are used to guide decision making and
forecast the effect of those decisions
• Popular use of data mining is to provide detailed analyses of
patterns in customer data for one-to-one marketing campaigns
or for identifying profitable customers
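One very simple form of pattern discovery is counting which items tend to appear together in customer transactions. The sketch below computes the share of transactions containing each pair of items; the baskets and the helper pair_support are assumptions for illustration, and real data mining tools use far more sophisticated techniques on far larger data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical customer transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def pair_support(transactions):
    """Fraction of transactions in which each pair of items occurs together."""
    counts = Counter()
    for items in transactions:
        # sorted() gives each pair one canonical order, e.g. ("bread", "milk")
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: n / len(transactions) for pair, n in counts.items()}

support = pair_support(baskets)
```

Pairs with high support are candidates for rules such as "customers who buy bread often buy milk", which could then guide a marketing decision.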
28. Tools for Analyzing Data
• Text Mining (aka: Text Analytics)
• Unstructured data (mostly text files) is believed to account for more than
80% of an organization’s useful information.
• Text mining allows businesses to extract key elements from, discover
patterns in, and summarize large unstructured data sets.
• Web Mining
• Discovery and analysis of useful patterns and information from the
Web
• Includes content mining, structure mining, and usage mining
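A first step in extracting key elements from unstructured text is often counting the most frequent meaningful words. The sketch below does this with Python's standard library; the stopword list, the sample comment, and the helper key_terms are assumptions for illustration, and commercial text mining tools go well beyond word counts.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real tools use much larger ones.
STOPWORDS = {"the", "a", "is", "and", "to", "was", "but"}

def key_terms(text, top_n=3):
    """Return the most frequent non-stopword terms with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(top_n)

comment = "The delivery was late. Late delivery again, and the support was slow."
top_terms = key_terms(comment, top_n=2)
```

Applied to thousands of customer comments, the same idea would surface recurring complaints without anyone reading each message.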
29. Contemporary Business Intelligence Infrastructure
A contemporary business intelligence infrastructure features capabilities and tools to
manage and analyze large quantities and different types of data from multiple sources.
Easy-to-use query and reporting tools for casual business users and more sophisticated
analytical toolsets for power users are included.
Figure 6.13
30. Managing Data Resources
• Need to have policies and procedures in place to ensure that
data is accurate, reliable, and available. This includes:
• Establishing an Information Policy
• Identifies which users and organizational units can share information, where
information can be distributed, and who is responsible for updating and
maintaining information
• Ensuring Data Quality
• Data quality problems can be caused by redundant and inconsistent data
produced by multiple systems
• Data input errors are the cause of many data quality problems
31. Managing Data Resources
• How to Ensure Data Quality
• Data Quality Audit
• A structured survey of the accuracy and level of completeness of the
data in a database
• Can survey entire data files, samples from data files, or perceptions of
end users
• Data Cleansing (AKA: Data Scrubbing)
• Activities for detecting and correcting data in a database that are
incorrect, incomplete, improperly formatted, or redundant.
• Can use specialized data-cleansing software to perform data cleansing
activities
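The cleansing activities listed above (fixing formatting, flagging incomplete records, removing redundant ones) can be sketched in a few lines of Python. The record layout, the validity rule for emails, and the helper cleanse are assumptions for illustration; specialized data-cleansing software applies many more rules.

```python
def cleanse(records):
    """Return (cleaned, rejected) lists from raw name/email records."""
    cleaned, rejected, seen_emails = [], [], set()
    for record in records:
        # Fix formatting: collapse extra spaces, normalize capitalization.
        name = " ".join(record.get("name", "").split()).title()
        email = record.get("email", "").strip().lower()
        # Incomplete or obviously invalid records are set aside for review.
        if not name or "@" not in email:
            rejected.append(record)
            continue
        # Redundant records (same email) are dropped.
        if email in seen_emails:
            continue
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned, rejected

raw = [
    {"name": "  ann  lee ", "email": "Ann@Example.com"},
    {"name": "Ann Lee", "email": "ann@example.com "},
    {"name": "Bob", "email": ""},
]
cleaned, rejected = cleanse(raw)
```

The two Ann Lee rows collapse into one corrected record, and the record with no email is returned separately so that someone can complete it rather than having it silently discarded.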
32. ERD Exercise – Open on Blackboard
• The project manager at ABC Consulting would like a database so that he can
keep track of employees, their skills, which project(s) they are working on and
the client associated with each project. Here is the information he has
provided you regarding the relationships between these entities:
• An employee has many different skills, and there may be multiple employees
with the same skill. An employee may work on more than one project at a
time, and a project may have more than one employee working on it. A project
belongs to only one client; however, a client may have multiple projects being
worked on at any given time.
• Create an entity-relationship diagram illustrating the associations between
these entities.
34. Homework
• Study for 1st Exam
• L&L Chapters 1, 2, 3, 5, and 6
• Have MyITLab Access Code purchased for next class [will intro Access Lab
before Exam]