Reading: Chapter 5. Management Information Systems, James O'Brien
CHAPTER 5
Module II: Information Technologies
[Module diagram: Management Challenges, Business Applications, Information Technologies, Development Processes, Foundation Concepts]
DATA RESOURCE MANAGEMENT
Chapter Highlights
Section I
Technical Foundations of Database Management
Real World Case: Harrah’s Entertainment and Others: Protecting the Data Jewels
Database Management
Fundamental Data Concepts
Database Structures
Database Development
Section II
Managing Data Resources
Real World Case: Emerson and Sanofi: Data Stewards Seek Data Conformity
Data Resource Management
Types of Databases
Data Warehouses and Data Mining
Traditional File Processing
The Database Management Approach
Real World Case: Acxiom Corporation: Data Demands Respect

Learning Objectives
After reading and studying this chapter, you should be able to:
1. Explain the business value of implementing data resource management processes and technologies in an organization.
2. Outline the advantages of a database management approach to managing the data resources of a business, compared to a file processing approach.
3. Explain how database management software helps business professionals and supports the operations and management of a business.
4. Provide examples to illustrate each of the following concepts:
a. Major types of databases.
b. Data warehouses and data mining.
c. Logical data elements.
d. Fundamental database structures.
e. Database development.
2. 150 ● Module II / Information Technologies
SECTION I Technical Foundations of
Database Management
Database Management
Just imagine how difficult it would be to get any information from an information system if data were stored in an unorganized way, or if there were no systematic way to retrieve them. Therefore, in all information systems, data resources must be organized and structured in some logical manner so that they can be accessed easily, processed efficiently, retrieved quickly, and managed effectively. Data structures and access methods ranging from simple to complex have been devised to efficiently organize and access data stored by information systems. In this chapter, we will explore these concepts, as well as the managerial implications and value of data resource management. See Figure 5.1.
Read the Real World Case on data resources in the casino gaming and hospitality
industry. We can learn a lot from this case about the importance of protecting the data
resources of the organization.
Fundamental Data Concepts
Before we go any further, let’s discuss some fundamental concepts about how data are organized in information systems. A conceptual framework of several levels of data has been devised that differentiates between different groupings, or elements, of data. Thus, data may be logically organized into characters, fields, records, files, and databases, just as writing can be organized in letters, words, sentences, paragraphs, and documents. Examples of these logical data elements are shown in Figure 5.2.
Character
The most basic logical data element is the character, which consists of a single alphabetic, numeric, or other symbol. You might argue that the bit or byte is a more elementary data element, but remember that those terms refer to the physical storage elements provided by the computer hardware, discussed in Chapter 3. Using that understanding, one way to think of a character is that it is a byte used to represent a particular character. From a user’s point of view (that is, from a logical as opposed to a physical or hardware view of data), a character is the most basic element of data that can be observed and manipulated.
Field
The next higher level of data is the field, or data item. A field consists of a grouping of related characters. For example, the grouping of alphabetic characters in a person’s name may form a name field (or typically, last name, first name, and middle initial fields), and the grouping of numbers in a sales amount forms a sales amount field. Specifically, a data field represents an attribute (a characteristic or quality) of some entity (object, person, place, or event). For example, an employee’s salary is an attribute that is a typical data field used to describe an entity who is an employee of a business. Generally speaking, fields are organized such that they represent some logical order. For example, last_name, first_name, address, city, state, zipcode, and so on.
Record
All of the fields used to describe the attributes of an entity are grouped to form a record. Thus, a record represents a collection of attributes that describe an entity. An example is a person’s payroll record, which consists of data fields describing attributes such as the person’s name, Social Security number, and rate of pay. Fixed-length records contain a fixed number of fixed-length data fields. Variable-length records contain a variable number of fields and field lengths. Another way of looking at a record is that it represents a single instance of an entity. Each record in an employee file describes one specific employee.
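
The idea of a fixed-length record can be made concrete with a short sketch. The field widths below (20 bytes for the name, 11 for the Social Security number, 4 for the salary) are assumptions chosen for illustration, not a layout taken from the text; the sample values come from Figure 5.2.

```python
import struct
from dataclasses import dataclass

# Hypothetical fixed-length layout (not from the text):
# 20-byte name field, 11-byte Social Security number field, 4-byte salary field.
RECORD_FORMAT = "<20s11sI"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 20 + 11 + 4 = 35 bytes


@dataclass
class PayrollRecord:
    """One record: the fields that describe a single employee entity."""
    name: str
    ss_no: str
    salary: int

    def pack(self) -> bytes:
        # Short strings are padded with zero bytes up to the field width.
        return struct.pack(RECORD_FORMAT, self.name.encode("ascii"),
                           self.ss_no.encode("ascii"), self.salary)

    @classmethod
    def unpack(cls, raw: bytes) -> "PayrollRecord":
        name, ss_no, salary = struct.unpack(RECORD_FORMAT, raw)
        return cls(name.rstrip(b"\x00").decode("ascii"),
                   ss_no.rstrip(b"\x00").decode("ascii"), salary)


records = [PayrollRecord("Jones T. A.", "275-32-3874", 20000),
           PayrollRecord("Klugman J. L.", "349-88-7913", 28000)]

# A group of such records stored back to back; record i starts at i * RECORD_SIZE.
payroll_bytes = b"".join(r.pack() for r in records)
second = PayrollRecord.unpack(payroll_bytes[RECORD_SIZE:2 * RECORD_SIZE])
```

Because every record occupies the same number of bytes, any record can be located by simple arithmetic on its position, which is one practical reason fixed-length layouts were common.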
File
A group of related records is a data file, or table. Thus, an employee file would contain the records of the employees of a firm. Files are frequently classified by the application
3. Chapter 5 / Data Resource Management ● 151
REAL WORLD CASE 1
Harrah’s Entertainment and Others: Protecting the Data Jewels

In the casino industry, one of the most valuable assets is the dossier that casinos keep on their affluent customers, the high rollers. But in 2003, casino operator Harrah’s Entertainment Inc. filed a lawsuit in Placer County, California, Superior Court charging that a former employee had copied the records of up to 450 wealthy customers before leaving the company to work at competitor Thunder Valley Casino in Lincoln, California.

The complaint said the employee was seen printing the list, which included names, contact information, and credit and account histories, from a Harrah’s database. It also alleged that he tried to lure those players to Thunder Valley. The employee denies the charge of stealing Harrah’s trade secrets, and the case was still pending at this writing, but many similar cases have been filed in the past 20 years, legal experts say.

While savvy companies are using business intelligence and customer relationship management systems to identify their most profitable customers, there’s a genuine danger of that information falling into the wrong hands. Broader access to those applications and the trend toward employees switching jobs more frequently have made protecting customer lists an even greater priority.

Fortunately, there are managerial, legal, and technological steps you can take to help prevent, or at least discourage, departing employees from walking out the door with this vital information.

For starters, organizations should make sure that certain employees, particularly those with frequent access to customer information, sign nondisclosure, noncompete, and nonsolicitation agreements that specifically mention customer lists. Through these documents, employees “acknowledge that they will be introduced to this information and agree not to disclose it on departure from the company,” says Suzanne Labrit, a partner at law firm Shutts & Bowen LLP in West Palm Beach, Florida.

Although most states have enacted trade-secrets laws, Labrit says they have different attitudes about enforcing these laws with regard to customer lists. “But as a starting point, at least you have this understanding [with employees] that the customer information is being treated as confidential,” Labrit says. Then if an employee leaves to work for a competitor and uses this protected customer data, the employer will more likely be able to take legal action to stop the activity. “If you don’t treat it as confidential information internally,” she says, “the court will not treat it as confidential information, either.”

It’s also important to educate employees about the confidentiality of customer lists, because many people wrongly assume they’re public information, says Tim Headley, a partner at the Houston law firm of Gardere Wynne Sewell LLP. “Most people think they can take the lists with them,” he says. “You have to show that you’ve kept it a secret and told employees it’s a valuable secret. [Customer lists] are at the core of how you bring revenue into the company. These are the decision-makers who are willing to buy your product.”

From a management and process standpoint, organizations should try to limit access to customer lists to only employees, such as sales representatives, who need the information to do their jobs. “If you make it broadly available to employees, then it’s not considered confidential,” says Labrit.

Physical security should also be considered, Labrit says. Visitors such as vendors shouldn’t be permitted to roam free in the hallways or into conference rooms. And security policies, such as a requirement that all computer systems have strong password protection, should be strictly enforced.

Companies should instantly shut down access to computers and networks when employees leave, whether the reason is a layoff or a move to a new job. At the exit interview, the employee should be reminded of any signed agreements and corporate policies regarding customer lists and other confidential information. Employees should be told to turn over anything, including data that belongs to the company.

In addition, employers should track the activities of employees who’ve given notice but will be around for a while. This includes monitoring systems to see if the employee is e-mailing company-owned documents outside the company.

Some organizations rely on technology to help prevent the loss of customer lists and other critical data. Inflow Inc., a Denver-based provider of managed Web hosting services, uses a product from Opsware Inc. in Sunnyvale, California, that lets managers control access to specific systems, such as databases, from a central location.

FIGURE 5.1 While data management is a strategic initiative in every modern organization, those in the gaming industry believe their success lies in the protection and strategic management of their data resources.
Source: Jose Luis Palaez, Inc./Corbis.
FIGURE 5.2 Examples of the logical data elements in information systems. Note especially the examples of how data fields, records, files, and databases are related.
Human Resource Database
  Payroll File
    Employee Record 1: Name Field = Jones T. A.; SS No. Field = 275-32-3874; Salary Field = 20,000
    Employee Record 2: Name Field = Klugman J. L.; SS No. Field = 349-88-7913; Salary Field = 28,000
  Benefits File
    Employee Record 3: Name Field = Alvarez J.S.; SS No. Field = 542-40-3718; Insurance Field = 100,000
    Employee Record 4: Name Field = Porter M.L.; SS No. Field = 617-87-7915; Insurance Field = 50,000
for which they are primarily used, such as a payroll file or an inventory file, or the type
of data they contain, such as a document file or a graphical image file. Files are also classified
by their permanence, for example, a payroll master file versus a payroll weekly transac-
tion file. A transaction file, therefore, would contain records of all transactions occur-
ring during a period and might be used periodically to update the permanent records
contained in a master file. A history file is an obsolete transaction or master file retained
for backup purposes or for long-term historical storage called archival storage.
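
The relationship between a transaction file and a master file can be sketched in a few lines. This is a minimal illustration under stated assumptions: the record keys reuse the Social Security numbers from Figure 5.2, while the weekly pay amounts and the `ytd_pay` field are hypothetical.

```python
# Master file: permanent records, keyed here by Social Security number.
master_file = {
    "275-32-3874": {"name": "Jones T. A.", "ytd_pay": 0},
    "349-88-7913": {"name": "Klugman J. L.", "ytd_pay": 0},
}

# Transaction file: all pay transactions for a period (amounts are made up).
transaction_file = [
    ("275-32-3874", 385),
    ("349-88-7913", 538),
    ("275-32-3874", 385),
]


def update_master(master, transactions):
    """Apply each transaction to its master record; return unmatched ones."""
    rejected = []
    for ss_no, amount in transactions:
        if ss_no in master:
            master[ss_no]["ytd_pay"] += amount
        else:
            rejected.append((ss_no, amount))
    return rejected


rejected = update_master(master_file, transaction_file)

# Once applied, the transaction file can be retained as a history file.
history_file = list(transaction_file)
```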
Database
A database is an integrated collection of logically related data elements. A database consolidates records previously stored in separate files into a common pool of data elements that provides data for many applications. The data stored in a database are independent of the application programs using them and of the type of storage devices on which they are stored.
Thus, databases contain data elements describing entities and relationships among
entities. For example, Figure 5.3 outlines some of the entities and relationships in a
FIGURE 5.3 Some of the entities and relationships in a simplified electric utility database. Note a few of the business applications that access the data in the database.
Electric Utility Database
  Entities: Customers, meters, bills, payments, meter readings
  Relationships: Bills sent to customers, customers make payments, customers use meters, . . .
  Business applications: Billing, Payment processing, Meter reading, Service start / stop
Source: Adapted from Michael V. Mannino, Database Application Development and Design (Burr Ridge, IL: McGraw-Hill/Irwin, 2001), p. 6.
database for an electric utility. Also shown are some of the business applications (billing,
payment processing) that depend on access to the data elements in the database.
Database Structures
The relationships among the many individual data elements stored in databases are based on one of several logical data structures, or models. Database management system packages are designed to use a specific data structure to provide end users with quick, easy access to information stored in databases. Five fundamental database structures are the hierarchical, network, relational, object-oriented, and multidimensional models. Simplified illustrations of the first three database structures are shown in Figure 5.4.
Hierarchical Structure
Early mainframe DBMS packages used the hierarchical structure, in which the relationships between records form a hierarchy or treelike structure. In the traditional hierarchical model, all records are dependent and arranged in multilevel structures,
FIGURE 5.4 Example of three fundamental database structures. They represent three basic ways to develop and express the relationships among the data elements in a database.

Hierarchical Structure
  Level 1: Department Data Element
  Level 2: Project A Data Element, Project B Data Element
  Level 3: Employee 1 Data Element, Employee 2 Data Element

Network Structure
  Departments: Department A, Department B
  Employees: Employee 1, Employee 2, Employee 3
  Projects: Project A, Project B

Relational Structure
  Department Table
  Deptno   Dname   Dloc   Dmgr
  Dept A
  Dept B
  Dept C

  Employee Table
  Empno   Ename   Etitle   Esalary   Deptno
  Emp 1                              Dept A
  Emp 2                              Dept A
  Emp 3                              Dept B
  Emp 4                              Dept B
  Emp 5                              Dept C
  Emp 6                              Dept B
consisting of one root record and any number of subordinate levels. Thus, all of the
relationships among records are one-to-many, since each data element is related to only
one element above it. The data element or record at the highest level of the hierarchy
(the department data element in this illustration) is called the root element. Any data
element can be accessed by moving progressively downward from a root and along the
branches of the tree until the desired record (for example, the employee data element)
is located.
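
A minimal sketch of this root-downward access follows. The element names come from Figure 5.4, but which project each employee sits under is an assumption made for the example.

```python
# Each element has exactly one parent, so the whole structure is a tree.
# Placing both employees under Project A is a hypothetical arrangement.
department_tree = {
    "name": "Department",
    "children": [
        {"name": "Project A", "children": [
            {"name": "Employee 1", "children": []},
            {"name": "Employee 2", "children": []},
        ]},
        {"name": "Project B", "children": []},
    ],
}


def find_path(node, target, path=()):
    """Walk downward from the root; return the branch leading to target."""
    path = path + (node["name"],)
    if node["name"] == target:
        return list(path)
    for child in node["children"]:
        found = find_path(child, target, path)
        if found is not None:
            return found
    return None


path_to_employee = find_path(department_tree, "Employee 2")
```

Every lookup starts at the root element, which is why requests that do not follow the predefined branches are awkward in this model.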
Network Structure
The network structure can represent more complex logical relationships and is still used by some mainframe DBMS packages. It allows many-to-many relationships among records; that is, the network model can access a data element by following one of several paths, because any data element or record can be related to any number of other data elements. For example, in Figure 5.4, departmental records can be related to more than one employee record, and employee records can be related to more than one project record. Thus, you could locate all employee records for a particular department, or all project records related to a particular employee.
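
The many-to-many links of a network structure can be sketched with sets of related records. The record names follow Figure 5.4; the specific links between them are hypothetical.

```python
# Hypothetical links: a record may be related to any number of other records.
department_employees = {
    "Department A": {"Employee 1", "Employee 2"},
    "Department B": {"Employee 2", "Employee 3"},
}
employee_projects = {
    "Employee 1": {"Project A"},
    "Employee 2": {"Project A", "Project B"},
    "Employee 3": {"Project B"},
}


def employees_in(department):
    """All employee records related to one department record."""
    return department_employees[department]


def projects_of(employee):
    """All project records related to one employee record."""
    return employee_projects[employee]


def employees_on(project):
    """Follow the employee-project links in the opposite direction."""
    return {emp for emp, projects in employee_projects.items()
            if project in projects}
```

Employee 2 can be reached through either department and leads to either project, which is the "one of several paths" property described above.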
Relational Structure
The relational model is the most widely used of the three database structures. It is used by most microcomputer DBMS packages, as well as by most midrange and mainframe systems. In the relational model, all data elements within the database are viewed as being stored in the form of simple two-dimensional tables sometimes referred to as relations. The tables in a relational database have rows and columns. Each row represents a single record in the file, and each column represents a field.
Figure 5.4 illustrates the relational database model with two tables representing some
of the relationships among departmental and employee records. Other tables, or rela-
tions, for this organization’s database might represent the data element relationships
among projects, divisions, product lines, and so on. Database management system pack-
ages based on the relational model can link data elements from various tables to provide
information to users. For example, a manager might want to retrieve and display an
employee’s name and salary from the employee table in Figure 5.4, and the name of the
employee’s department from the department table, by using their common department
number field (Deptno) to link or join the two tables. See Figure 5.5. The relational
model can relate data in any one file with data in another file if both files share a com-
mon data element or field. Because of this, information can be created by retrieving data
from multiple files even if they are not all stored in the same physical location.
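
The manager's request described above can be sketched with Python's built-in `sqlite3` module. The table and column names follow Figure 5.4; the employee names, salaries, and department names are invented for the example, and SQLite simply stands in for whatever relational DBMS an organization uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    deptno TEXT PRIMARY KEY,
    dname  TEXT
);
CREATE TABLE employee (
    empno   TEXT PRIMARY KEY,
    ename   TEXT,
    esalary INTEGER,
    deptno  TEXT REFERENCES department(deptno)
);
-- Sample rows are hypothetical.
INSERT INTO department VALUES
    ('Dept A', 'Accounting'), ('Dept B', 'Marketing'), ('Dept C', 'Research');
INSERT INTO employee VALUES
    ('Emp 1', 'Adams', 28000, 'Dept A'),
    ('Emp 2', 'Baker', 35000, 'Dept A'),
    ('Emp 3', 'Cruz',  42000, 'Dept B');
""")

# Link the two tables through their common Deptno field.
rows = conn.execute(
    """
    SELECT e.ename, e.esalary, d.dname
    FROM employee AS e
    JOIN department AS d ON e.deptno = d.deptno
    WHERE e.empno = ?
    """,
    ("Emp 3",),
).fetchall()
```

The query never states where either table is physically stored; the shared field is all that is needed to relate them.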
Relational Operations
Three basic operations can be performed on a relational database to create useful sets of data. The select operation is used to create a subset of records that meet a stated criterion. For example, a select operation might be used on an employee database to create a subset of records that contain all employees who make more than $30,000 per year and who have been with the company more than three years. Another way to think of the select operation is that it temporarily creates a table whose rows have records that meet the selection criteria.
FIGURE 5.5 Joining the Employee and Department tables in a relational database enables you to selectively access data in both tables at the same time.

  Department Table
  Deptno   Dname   Dloc   Dmgr
  Dept A
  Dept B
  Dept C

  Employee Table
  Empno   Ename   Etitle   Esalary   Deptno
  Emp 1                              Dept A
  Emp 2                              Dept A
  Emp 3                              Dept B
  Emp 4                              Dept B
  Emp 5                              Dept C
  Emp 6                              Dept B
The join operation can be used to temporarily combine two or more tables so that a
user can see relevant data in a form that looks like it is all in one big table. Using this
operation, a user can ask for data to be retrieved from multiple files or databases without
having to go to each one separately.
Finally, the project operation is used to create a subset of the columns contained in
the temporary tables created by the select and join operations. Just as the select oper-
ation creates a subset of records that meet stated criteria, the project operation creates
a subset of the columns, or fields, that the user wants to see. Using a project operation,
the user can decide not to view all of the columns in the table but only those that have
data necessary to answer a particular question or to construct a specific report.
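
The three operations can be sketched as small functions over tables held as lists of rows. This is an illustration of the ideas only, not how a DBMS implements them; the sample data and the `years` column are hypothetical.

```python
def select(table, predicate):
    """Subset of rows (records) that meet the stated criteria."""
    return [row for row in table if predicate(row)]


def join(left, right, key):
    """Combine rows from two tables that share the same value for key."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def project(table, columns):
    """Subset of columns (fields) the user wants to see."""
    return [{col: row[col] for col in columns} for row in table]


employees = [
    {"ename": "Adams", "esalary": 28000, "years": 5, "deptno": "Dept A"},
    {"ename": "Baker", "esalary": 35000, "years": 4, "deptno": "Dept A"},
    {"ename": "Cruz", "esalary": 42000, "years": 2, "deptno": "Dept B"},
    {"ename": "Diaz", "esalary": 31000, "years": 6, "deptno": "Dept B"},
]
departments = [
    {"deptno": "Dept A", "dname": "Accounting"},
    {"deptno": "Dept B", "dname": "Marketing"},
]

# Employees earning more than $30,000 with more than three years of service.
selected = select(employees, lambda r: r["esalary"] > 30000 and r["years"] > 3)
combined = join(selected, departments, "deptno")
report = project(combined, ["ename", "dname"])
```

Each function returns a new temporary table and leaves its inputs untouched, which mirrors the way the text describes select and join as creating temporary tables.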
Because of the widespread use of the relational model, an abundance of commercial products exists to create and manage relational databases. Leading mainframe relational database applications include Oracle 10g from Oracle Corp. and DB2 from IBM. A very popular midrange database application is SQL Server from Microsoft. The most commonly used database application for the PC is Microsoft Access.
Multidimensional Structure
The multidimensional model is a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data. You can visualize multidimensional structures as cubes of data and cubes within cubes of data. Each side of the cube is considered a dimension of the data. Figure 5.6 is an example that shows that each dimension can represent a different category, such as product type, region, sales channel, and time [5].
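
A data cube can be sketched as a mapping from coordinates (one value per dimension) to a measure. The dimension names echo the categories mentioned above, but the sales figures are invented, and real multidimensional databases use far more elaborate storage than a dictionary.

```python
from collections import defaultdict

DIMENSIONS = ("product", "region", "month")

# Each cell of the cube holds a sales figure (hypothetical values).
sales_cube = {
    ("TV", "East", "January"): 100,
    ("TV", "West", "January"): 80,
    ("TV", "East", "February"): 110,
    ("VCR", "East", "January"): 40,
    ("VCR", "West", "February"): 60,
}


def roll_up(cube, keep):
    """Total the measure over every dimension not listed in keep."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for coordinates, value in cube.items():
        totals[tuple(coordinates[i] for i in positions)] += value
    return dict(totals)


sales_by_product = roll_up(sales_cube, ("product",))
sales_by_region_month = roll_up(sales_cube, ("region", "month"))
```

Choosing which dimensions to keep is what lets the same data be viewed by product, by region and month, or by any other combination.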
FIGURE 5.6 An example of the different dimensions of a multidimensional database.
[Figure: several data cubes. Dimension values shown include products (Camera, TV, VCR, Audio), regions (East, West, South, Total) and cities (Denver, Los Angeles, San Francisco), time periods (January, February, March, April, Qtr 1), measures (Sales, COGS, Margin, Total Expenses, Profit), and scenarios (Actual, Budget, Forecast, Variance).]
FIGURE 5.8 This claims analysis graphics display provided by the CleverPath enterprise portal is powered by the Jasmine ii object-oriented database management system of Computer Associates.
Source: Courtesy of Computer Associates.
adding object-oriented modules to their relational software. Examples include multi-
media object extensions to IBM’s DB2, and Oracle’s object-based “cartridges” for
Oracle 10g. See Figure 5.8.
Evaluation of Database Structures
The hierarchical data structure was a natural model for the databases used for the structured, routine types of transaction processing characteristic of many business operations in the early years of data processing and computing. Data for these operations can easily be represented by groups of records in a hierarchical relationship. However, as time progressed, there were many cases where information was needed about records that did not have hierarchical relationships. For example, in some organizations, employees from more than one department can work on more than one project (refer back to Figure 5.4). A network data structure could easily handle this many-to-many relationship, whereas a hierarchical model could not. As such, the more flexible network structure became popular for these types of business operations. However, like the hierarchical structure, because its relationships must be specified in advance, the network model was unable to easily handle ad hoc requests for information, thus pointing out the need for the relational model.
Relational databases allow an end user to easily receive information in response to
ad hoc requests. That’s because not all of the relationships between the data elements
in a relationally organized database need to be specified when the database is created.
Database management software (such as Oracle 10g, DB2, Access, and Approach) cre-
ates new tables of data relationships by using parts of the data from several tables.
Thus, relational databases are easier for programmers to work with and easier to main-
tain than the hierarchical and network models.
The major limitation of the relational model is that relational database manage-
ment systems cannot process large amounts of business transactions as quickly and
efficiently as those based on the hierarchical and network models, or process com-
plex, high-volume applications as well as the object-oriented model. This performance
gap has narrowed with the development of advanced relational database software
with object-oriented extensions. The use of database management software based on
the object-oriented and multidimensional models is growing steadily, as these tech-
nologies are playing a greater role for OLAP and Web-based applications.
Experian Automotive: The Business Value of Relational Database Management
Experian Inc. (www.experian.com), a unit of London-based GUS PLC, runs one of the largest credit reporting agencies in the United States. But Experian wanted to expand its business beyond credit checks for automobile loans. If it could collect vehicle data from the various motor-vehicle departments in the United States and blend that with other data, such as change-of-address records, then its Experian Automotive division could sell the enhanced data to a variety of customers. For example, car dealers could use the data to make sure their inventory matches local buying preferences. And toll collectors could match license plates to addresses to find motorists who sail past tollbooths without paying.
But to offer new services, Experian first needed a way to extract, transfer, and
load data from the systems of 50 different U.S. state departments of motor vehicles
(DMVs), plus Puerto Rico, into a single database. That was a big challenge. “Unlike
the credit industry that writes to a common format, the DMVs do not,” says Ken
Kauppila, vice president of IT at Experian Automotive in Costa Mesa, California.
Of course, Experian didn’t want to replicate the hodgepodge of file formats it
inherited when the project began in January 1999—175 formats among 18,000
files. So Kauppila decided to transform and map the data to a common relational
database format.
Fortunately, off-the-shelf software tools for extracting, transforming, and loading
data (called ETL tools) make it economical to combine very large data repositories.
Using ETL Extract from Evolutionary Technologies, Experian created a database
that can incorporate vehicle information within 48 hours of its entry into any of the
nation’s DMV computers. This is one of the areas in which data management soft-
ware tools can excel, says Guy Creese, analyst at Aberdeen Group in Boston. “It can
simplify the mechanics of multiple data feeds, and it can add to data quality, making
fixes possible before errors are propagated to data warehouses,” he says.
Using the ETL extraction and transformation tools along with IBM’s DB2 data-
base system, Experian Automotive created a database that processes 175 million
transactions per month and has created a variety of profitable new revenue streams.
Experian’s automotive database is the 10th largest database in the world—now, with
up to 16 billion rows of data. But the company says the relational database is man-
aged by just three IT professionals. Experian says this demonstrates how efficiently
database software like DB2 and the ETL tools can work with a large database to
handle vast amounts of data quickly.
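
The kind of transformation described in this case, mapping many source formats onto one common record layout, can be sketched as follows. Both source layouts, the field names, and the sample values are hypothetical; they are not the actual DMV formats or the behavior of any ETL product named above.

```python
def parse_delimited(line):
    """Hypothetical source format 1: plate,vin,zip separated by commas."""
    plate, vin, zipcode = line.strip().split(",")
    return {"vin": vin, "plate": plate, "zip": zipcode}


def parse_fixed_width(line):
    """Hypothetical source format 2: VIN in columns 0-16, plate 17-24, ZIP 25-29."""
    return {
        "vin": line[0:17].strip(),
        "plate": line[17:25].strip(),
        "zip": line[25:30].strip(),
    }


# Extract: raw lines as they might arrive from two different sources.
source_a = ["ABC123,VIN00000000000001,92626"]
source_b = ["VIN00000000000002".ljust(17) + "XYZ789".ljust(8) + "80202"]

# Transform each feed with its own parser, then load into one common table.
vehicle_table = ([parse_delimited(line) for line in source_a]
                 + [parse_fixed_width(line) for line in source_b])
```

Once every feed has been mapped to the same fields, the combined table can be queried without regard to where each row came from.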
Database Development
Database management packages like Microsoft Access or Lotus Approach allow end users to easily develop the databases they need. See Figure 5.9. However, large organizations usually place control of enterprisewide database development in the hands of database administrators (DBAs) and other database specialists. This improves the integrity and security of organizational databases. Database developers use the data definition language (DDL) in database management systems like Oracle 10g or IBM’s DB2 to develop and specify the data contents, relationships, and structure of each database, and to modify these database specifications when necessary. Such information is cataloged and stored in a database of data definitions and specifications called a data dictionary, or metadata repository, which is managed by the database management software and maintained by the DBA.
A data dictionary is a database management catalog or directory containing
metadata, that is, data about data. A data dictionary relies on a specialized database
software component to manage a database of data definitions, that is, metadata about
the structure, data elements, and other characteristics of an organization’s databases.
For example, it contains the names and descriptions of all types of data records and
their interrelationships, as well as information outlining requirements for end users’
access and use of application programs, and database maintenance and security.
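
A small sketch can show DDL statements and the metadata they produce. It uses Python's built-in `sqlite3` module, whose `sqlite_master` catalog and `PRAGMA table_info` command expose stored definitions; this is only a modest analogue of a full data dictionary, and the `customer` table is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the contents and structure of a (hypothetical) customer table.
conn.execute("""
    CREATE TABLE customer (
        cust_id      INTEGER PRIMARY KEY,
        cust_name    TEXT NOT NULL,
        credit_limit INTEGER
    )
""")

# DDL again: modify the specification when requirements change.
conn.execute("ALTER TABLE customer ADD COLUMN region TEXT")

# Metadata (data about data): which tables exist, and how each is defined.
table_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Each table_info row is (cid, name, type, notnull, dflt_value, pk).
column_definitions = [(row[1], row[2]) for row in conn.execute(
    "PRAGMA table_info(customer)")]
```

The catalog is itself queried like any other data, which is the sense in which a data dictionary is described as a database of data definitions.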
FIGURE 5.9
Creating a database table
using the Table Wizard
of Microsoft Access.
Source: Courtesy of Microsoft Corp.
Data dictionaries can be queried by the database administrator to report the status
of any aspect of a firm’s metadata. The administrator can then make changes to the
definitions of selected data elements. Some active (versus passive) data dictionaries
automatically enforce standard data element definitions whenever end users and ap-
plication programs access an organization’s databases. For example, an active data dic-
tionary would not allow a data entry program to use a nonstandard definition of
a customer record, nor would it allow an employee to enter a name of a customer that
exceeded the defined size of that data element.
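
The enforcement behavior of an active data dictionary can be sketched as a validation step that runs before any record is accepted. The element names, types, and the 30-character limit are assumptions chosen for the example.

```python
# Hypothetical standard definitions held in the data dictionary.
DATA_DICTIONARY = {
    "customer_name": {"type": str, "max_length": 30},
    "credit_limit": {"type": int, "max_length": None},
}


def validate(record):
    """Return a list of violations; an empty list means the record conforms."""
    errors = []
    for field, value in record.items():
        definition = DATA_DICTIONARY.get(field)
        if definition is None:
            errors.append(f"{field}: not a standard data element")
        elif not isinstance(value, definition["type"]):
            errors.append(f"{field}: wrong data type")
        elif (definition["max_length"] is not None
              and len(value) > definition["max_length"]):
            errors.append(f"{field}: exceeds defined size")
    return errors


ok = validate({"customer_name": "Acxiom Corporation", "credit_limit": 5000})
too_long = validate({"customer_name": "X" * 31, "credit_limit": 5000})
nonstandard = validate({"cust_nm": "Harrah's"})
```

A passive dictionary would only document these definitions; the active variety applies them on every access, as the check above does.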
Developing a large database of complex data types can be a complicated task. Data-
base administrators and database design analysts work with end users and systems
analysts to model business processes and the data they require. Then they determine
(1) what data definitions should be included in the database and (2) what structure or
relationships should exist among the data elements.
Data Planning and Database Design
As Figure 5.10 illustrates, database development may start with a top-down data planning process. Database administrators and designers work with corporate and end user management to develop an enterprise model that defines the basic business process of the enterprise. Then they define the information needs of end users in a business process, such as the purchasing/receiving process that all businesses have.
Next, end users must identify the key data elements that are needed to perform
their specific business activities. This frequently involves developing entity relationship
diagrams (ERDs) that model the relationships among the many entities involved in
business processes. For example, Figure 5.11 illustrates some of the relationships
in a purchasing/receiving process. ERDs are simply graphical models of the various
files and their relationships contained within a database system. End users and data-
base designers could use database management or business modeling software
to help them develop ERD models for the purchasing/receiving process. This would
help identify what supplier and product data are required to automate their purchasing/
receiving and other business processes using enterprise resource management (ERM)
or supply chain management (SCM) software. You will learn about ERDs and other
data modeling tools in much greater detail if you ever take a course in systems analysis
and design.
FIGURE 5.10 Database development involves data planning and database design activities. Data models that support business processes are used to develop databases that meet the information needs of users.
1. Data Planning: Develops a model of business processes.
   Result: Enterprise model of business processes with documentation.
2. Requirements Specification: Defines information needs of end users in a business process.
   Result: Description of users’ needs may be represented in natural language or using the tools of a particular design methodology.
3. Conceptual Design: Expresses all information requirements in the form of a high-level model.
   Result: Conceptual Data Models. Often expressed as entity relationship models.
4. Logical Design: Translates the conceptual models into the data model of a DBMS.
   Result: Logical Data Models. E.g., relational, network, hierarchical, multidimensional, or object-oriented models.
5. Physical Design: Determines the data storage structures and access methods.
   Result: Physical Data Models. Storage representations and access methods.
Such user views are a major part of a data modeling process where the relation-
ships between data elements are identified. Each data model defines the logical rela-
tionships among the data elements needed to support a basic business process. For
example, can a supplier provide more than one type of product to us? Can a customer
have more than one type of account with us? Can an employee have several pay rates
or be assigned to several project workgroups?
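
The first of these questions can be made concrete with a short sketch. If a supplier may provide several products and a product may come from several suppliers, the relationship is many-to-many and, in a relational design, is usually represented by a separate linking table. The table names and rows below are hypothetical, and `sqlite3` is used only as a convenient stand-in for a DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT);
-- Linking table: one row per (supplier, product) pair.
CREATE TABLE supplies (
    supplier_id INTEGER REFERENCES supplier(supplier_id),
    product_id  INTEGER REFERENCES product(product_id),
    PRIMARY KEY (supplier_id, product_id)
);
INSERT INTO supplier VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO product  VALUES (10, 'Bolt'), (11, 'Nut');
INSERT INTO supplies VALUES (1, 10), (1, 11), (2, 10);
""")

# How many products does each supplier provide?
products_per_supplier = dict(conn.execute("""
    SELECT s.name, COUNT(*)
    FROM supplier AS s JOIN supplies AS sp ON sp.supplier_id = s.supplier_id
    GROUP BY s.name
"""))

# How many suppliers provide each product?
suppliers_per_product = dict(conn.execute("""
    SELECT p.name, COUNT(*)
    FROM product AS p JOIN supplies AS sp ON sp.product_id = p.product_id
    GROUP BY p.name
"""))
```

Had the answer been "each product comes from exactly one supplier," a single supplier column in the product table would have been enough; the business rule determines the structure.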
Answering such questions will identify data relationships that have to be repre-
sented in a data model that supports a business process. These data models then serve
as logical frameworks (called schemas and subschemas) on which to base the physical de-
sign of databases and the development of application programs to support the business
processes of the organization. A schema is an overall logical view of the relationships
among the data elements in a database, while the subschema is a logical view of the
data relationships needed to support specific end user application programs that will
access that database.
FIGURE 5.11 This entity relationship diagram illustrates some of the relationships among the
entities (product, supplier, warehouse, etc.) in a purchasing/receiving business process.
[Figure: entities shown are Purchase Order, Purchase Order Item, Product, Supplier,
Product Stock, and Warehouse. Relationship labels shown are Contains, Ordered on,
Supplies, Stocked as, and Holds.]
FIGURE 5.12 Example of the logical and physical database views and the software interface of
a banking services information system.
[Figure layers, top to bottom:
Applications: Checking Application, Savings Application, Installment Loan Application.
Logical User Views: Checking and Savings Data Model, Installment Loan Data Model. Data
elements and relationships (the subschemas) needed for checking, savings, or installment
loan processing.
Banking Services Data Model: Data elements and relationships (the schema) needed for the
support of all bank services.
Software Interface: Database Management System. The DBMS provides access to the bank’s
databases.
Physical Data Views: Bank Databases. Organization and location of data on the storage media.]
Remember that data models represent logical views of the data and relationships of
the database. Physical database design takes a physical view of the data (also called the
internal view) that describes how data are to be physically stored and accessed on the
storage devices of a computer system. For example, Figure 5.12 illustrates these dif-
ferent database views and the software interface of a bank database processing system.
In this example, checking, savings, and installment lending are the business processes
whose data models are part of a banking services data model that serves as a logical
data framework for all bank services.
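The distinction between a schema and its subschemas can be sketched with Python's built-in `sqlite3` module. This is a simplified illustration under stated assumptions, not the design of any real bank system: the table, column, and view names are hypothetical. The tables stand in for the banking services schema, and each SQL view stands in for a subschema that exposes only the data one application needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

conn.executescript("""
-- Schema: overall logical view for all bank services (hypothetical design).
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE account (
    account_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    kind        TEXT NOT NULL
                CHECK (kind IN ('checking', 'savings', 'installment_loan')),
    balance     REAL NOT NULL
);

-- Subschema for the checking and savings applications.
CREATE VIEW checking_savings_view AS
    SELECT c.name, a.account_id, a.kind, a.balance
    FROM account a JOIN customer c USING (customer_id)
    WHERE a.kind IN ('checking', 'savings');

-- Subschema for the installment loan application.
CREATE VIEW installment_loan_view AS
    SELECT c.name, a.account_id, a.balance AS amount_owed
    FROM account a JOIN customer c USING (customer_id)
    WHERE a.kind = 'installment_loan';
""")

# Made-up sample rows.
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Raj")])
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?, ?)",
    [
        (10, 1, "checking", 500.0),
        (11, 1, "savings", 1200.0),
        (12, 2, "installment_loan", 8000.0),
    ],
)

# Each application queries only its own logical view.
deposit_rows = conn.execute(
    "SELECT name, kind, balance FROM checking_savings_view ORDER BY account_id"
).fetchall()
loan_rows = conn.execute(
    "SELECT name, amount_owed FROM installment_loan_view"
).fetchall()

print(deposit_rows)
print(loan_rows)
```

Nothing in this code says how the rows are laid out on disk. That physical view is left to the DBMS (here, SQLite), which is the separation between logical and physical views that the text describes.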
Aetna: Insuring Tons of Data
On a daily basis the operational services central support area at Aetna Inc. is
responsible for 21.8 tons of data (174.6 terabytes [TB]). Over 119.2TB reside
on mainframe-connected disk drives, while the remaining 55.4TB sit on disks
attached to midrange computers. Almost all of these data are located in the
company’s headquarters in Hartford, Connecticut, with most of the information in
relational databases. To make matters even more interesting, outside customers
have access to about 20TB of the information. Four interconnected data centers
containing 14 mainframes and more than 1,000 midrange servers process the
data. It takes more than 4,100 direct-access storage devices to hold Aetna’s key
databases.
Most of Aetna’s ever-growing mountain of data is health care information. The
insurance company maintains records for both health maintenance organization
participants and customers covered by insurance policies. Aetna has detailed
records of providers, such as doctors, hospitals, dentists, and pharmacies, and it
keeps track of all the claims it has processed. Some of Aetna’s larger customers send
tapes containing insured employee data; the firm is moving toward using the Internet
to collect such data.
If managing gigabytes of data is like flying a hang glider, managing multiple
terabytes of data is like piloting a space shuttle: a thousand times more complex.
You can’t just extrapolate from experiences with small and medium data stores to
understand how to successfully manage tons of data. Even an otherwise mundane
operation such as backing up a database can be daunting if the time needed to finish
copying the data exceeds the time available.
Data integrity, backup, security, and availability are collectively the Holy
Grail of dealing with large data stores. The sheer volume of data makes these
goals a challenge, and a highly decentralized environment complicates matters
even more. Developing and adhering to standardized data maintenance procedures
always provide an organization with the best return on its data dollar
investment [9, 11].
SECTION II Managing Data Resources
Data Resource Management
Data are a vital organizational resource that needs to be managed like other important
business assets. Today’s business enterprises cannot survive or succeed without quality
data about their internal operations and external environment.
With each online mouse click, either a fresh bit of data is created or already-stored data are
retrieved from all those business websites. All that’s on top of the heavy demand for indus-
trial-strength data storage already in use by scores of big corporations. What’s driving the
growth is a crushing imperative for corporations to analyze every bit of information they can
extract from their huge data warehouses for competitive advantage. That has turned the
data storage and management function into a key strategic role of the information age [8].
That’s why organizations and their managers need to practice data resource man-
agement, a managerial activity that applies information systems technologies like data-
base management, data warehousing, and other data management tools to the task of
managing an organization’s data resources to meet the information needs of their busi-
ness stakeholders. This chapter will show you the managerial implications of using
data resource management technologies and methods to manage an organization’s data
assets to meet business information requirements.
Read the Real World Case on data administration. We can learn a lot from this case
about the challenges of managing the data within an organization. See Figure 5.13.
Types of Databases
Continuing developments in information technology and its business applications
have resulted in the evolution of several major types of databases. Figure 5.14
illustrates several major conceptual categories of databases that may be found in many
organizations. Let’s take a brief look at some of them now.
Operational Databases
Operational databases store detailed data needed to support the business processes
and operations of a company. They are also called subject area databases (SADB),
transaction databases, and production databases. Examples are a customer database, human
resource database, inventory database, and other databases containing data generated by
business operations. For example, a human resource database like that shown earlier in
Figure 5.2 would include data identifying each employee and his or her time worked,
compensation, benefits, performance appraisals, training and development status, and
other related human resource data. Figure 5.15 illustrates some of the common
operational databases that can be created and managed for a small business using Microsoft
Access database management software.
Distributed Databases
Many organizations replicate and distribute copies or parts of databases to network
servers at a variety of sites. These distributed databases can reside on network servers
on the World Wide Web, on corporate intranets or extranets, or on other company
networks. Distributed databases may be copies of operational or analytical databases,
hypermedia or discussion databases, or any other type of database. Replication and
distribution of databases are done to improve database performance at end user worksites.
Ensuring that the data in an organization’s distributed databases are consistently and
concurrently updated is a major challenge of distributed database management.
Distributed databases have both advantages and disadvantages. One primary ad-
vantage of a distributed database lies with the protection of valuable data. If all of an
organization’s data reside in a single physical location, any catastrophic event like a fire
or damage to the media holding the data would result in an equally catastrophic loss
of use of that data. By having databases distributed in multiple locations, the negative
impact of such an event can be minimized.
REAL WORLD CASE 2
Emerson and Sanofi: Data Stewards Seek Data Conformity
FIGURE 5.13
Source: Flying Colours Ltd./Digital Vision/Getty Images
A customer is a customer is a customer, right? Actually, it’s not that simple. Just ask
Emerson Process Management, an Emerson Electric Co. unit in Austin that supplies process
automation products. In 2000 the company attempted to build a data warehouse to store
customer information from over 85 countries. The effort failed in large part because the
structure of the warehouse couldn’t accommodate the many variations on customers’ names.
For instance, different users in different parts of the world might identify Exxon as
Exxon, Mobil, Esso, or ExxonMobil, to name a few variations. The warehouse would see them
as separate customers, and that would lead to inaccurate results when business users
performed queries.
That’s when the company hired Nancy Rybeck as data administrator. Rybeck is now leading a
renewed data warehouse project that ensures not only the standardization of customer names,
but also the quality and accuracy of customer data, including postal addresses, shipping
addresses, and province codes.
To accomplish this, Emerson has done something unusual: It has started to build a
department with 6 to 10 full-time “data stewards” dedicated to establishing and maintaining
the quality of data entered into the operational systems that feed the data warehouse.
The practice of having formal data stewards is uncommon. Most companies recognize the
importance of data quality, but many treat it as a “find-and-fix” effort, to be conducted
at the end of a project by someone in IT. Others casually assign the job to the business
users who deal with the data head-on. Still others may throw resources at improving data
only when a major problem occurs.
“It’s usually a seesaw effect,” says Chris Enger, formerly manager of information
management at Philip Morris USA Inc. “When something goes wrong, they put someone in
charge of data quality, and when things get better, they pull those resources away.”
Creating a data quality team requires gathering people with an unusual mix of business,
technology, and diplomatic skills. It’s even difficult to agree on a job title. In Rybeck’s
department, they’re called “data analysts,” but titles at other companies include “data
quality control supervisor,” “data coordinator,” or “data quality manager.”
“When you say you want a data analyst, they’ll come back with a DBA [database
administrator]. But it’s not the same at all,” Rybeck says. “It’s not the data structure,
it’s the content.”
At Emerson, data analysts in each business unit review data and correct errors before it’s
put into the operational systems. They also research customer relationships, locations, and
corporate hierarchies; train overseas workers to fix data in their native languages; and
serve as the main contact with the data administrator and database architect for new
requirements and bug fixes.
As the leader of the group, Rybeck plays a role that includes establishing and
communicating data standards, ensuring data integrity is maintained during database
conversions, and doing the logical design for the data warehouse tables.
The stewards have their work cut out for them. Bringing together customer records from the
75 business units yielded a 75 percent duplication rate, misspellings, and fields with
incorrect or missing data.
“Most of the divisions would have sworn they had great processes and standards in place,”
Rybeck says. “But when you show them they entered the customer name 17 different ways, or
someone had entered, ‘Loading dock open 8:00–4:00’ into the address field, they realize
it’s not as clean as they thought.”
Although the data steward may report to IT, as is the case at Emerson and at
pharmaceuticals company Sanofi-Synthelabo Inc., it’s not a job for someone steeped in
technical knowledge. Yet it’s not right for a businessperson who’s a technophobe, either.
Seth Cohen is the first data quality control supervisor at Sanofi in New York. He was hired
in 2003 to help design automated processes to ensure the data quality of the customer
knowledge base that Sanofi was beginning to build.
Data stewards at Sanofi need to have business knowledge because they need to make frequent
judgment calls, Cohen says. Indeed, judgment is a big part of the data steward’s job,
including the ability to determine where you don’t need 100 percent perfection.
Cohen says that task is one of the biggest challenges of the job. “One-hundred percent
accuracy is just not achievable,”
FIGURE 5.14 Examples of some of the major types of databases used by organizations and end users.
[Figure: a client PC and a network server connected to External Databases on the Internet
and Online Services, Distributed Databases on Intranets and Other Networks, Operational
Databases of the Organization, End User Databases, a Data Warehouse, and Data Marts.]
Another advantage of distributed databases is found in their storage requirements.
Often, a large database system may be distributed into smaller databases based on
some logical relationship between the data and the location. For example, a company
with several branch operations may distribute its data so that each branch operation
location is also the location of its branch database. Because multiple databases in a
distributed system can be joined together, each location has control of its local data
while all other locations can access any database in the company if so desired.
FIGURE 5.15 Examples of operational databases that can be created and managed for a small
business by microcomputer database management software like Microsoft Access.
Source: Courtesy of Microsoft Corp.
Distributed databases are not without some challenges, however. The primary challenge
is the maintenance of data accuracy. If a company distributes its database to
multiple locations, any change to the data in one location must somehow be updated in
all other locations. This can be accomplished in one of two ways: replication or duplication.
Updating a distributed database using replication involves using a specialized soft-
ware application that looks at each distributed database and then finds the changes
made to it. Once these changes have been identified, the replication process makes all
of the distributed databases look the same by making the appropriate changes to each
one. The replication process is very complex and, depending upon the number and
size of the distributed databases, can consume a lot of time and computer resources.
The duplication process, in contrast, is much less complicated. It basically identi-
fies one database as a master and then duplicates that database at a prescribed time af-
ter hours so that each distributed location has the same data. One drawback to the
duplication process is that no changes can ever be made to any database other than the
master to avoid having local changes overwritten during the duplication process.
Nonetheless, properly used, duplication and replication can keep all distributed
locations current with the latest data.
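The contrast between the two approaches can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any commercial product works: each database is a plain dictionary, every record carries a version number, and conflicting changes are settled by letting the highest version win. That conflict rule, the function names, and the sample records are all hypothetical.

```python
import copy


def replicate(databases):
    """Replication: find the changes in every copy, then make all copies match.

    Assumption for this sketch: records are (value, version) tuples and
    the highest version number wins when copies disagree.
    """
    merged = {}
    for db in databases.values():
        for key, (value, version) in db.items():
            if key not in merged or version > merged[key][1]:
                merged[key] = (value, version)
    for site in databases:
        databases[site] = dict(merged)


def duplicate(master, sites):
    """Duplication: overwrite every site with a copy of the master."""
    for site in sites:
        sites[site] = copy.deepcopy(master)


# Replication: changes made at either site survive and reach the other site.
sites = {
    "east": {"cust-1": ("Ann", 1), "cust-2": ("Raj", 1)},
    "west": {"cust-1": ("Ann B.", 2)},
}
replicate(sites)
print(sites["east"] == sites["west"])  # both sites now hold the same records

# Duplication: a change made locally at a branch is overwritten by the master.
master = {"cust-1": ("Ann", 1)}
branches = {"east": {"cust-9": ("local edit", 1)}, "west": {}}
duplicate(master, branches)
print("cust-9" in branches["east"])  # the local edit is gone
```

Replication has to examine every copy, which is why it costs more time and computing resources as the number and size of the databases grow. Duplication is a simple overwrite, which is why changes may be made only to the master.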
One additional challenge associated with distributed databases is the extra com-
puting power and bandwidth necessary to access multiple databases in multiple loca-
tions. We will look more closely at the issue of bandwidth in Chapter 6 when we focus
on telecommunications and networks.
External Databases
Access to a wealth of information from external databases is available for a fee from
commercial online services, and with or without charge from many sources on the
World Wide Web. Websites provide an endless variety of hyperlinked pages of
multimedia documents in hypermedia databases for you to access. Data are available in the
form of statistics on economic and demographic activity from statistical databanks. Or
you can view or download abstracts or complete copies of hundreds of newspapers,
magazines, newsletters, research papers, and other periodicals and published material
from bibliographic and full text databases. Whenever you use a search engine like
Google or Yahoo to look up something on the Internet, you are using an external
database, and a very, very large one!
Hypermedia Databases
The rapid growth of websites on the Internet and corporate intranets and extranets has
dramatically increased the use of databases of hypertext and hypermedia documents.
A website stores such information in a hypermedia database consisting of
hyperlinked pages of multimedia (text, graphic, and photographic images, video clips, audio
segments, and so on). That is, from a database management point of view, the set of
interconnected multimedia pages at a website is a database of interrelated hypermedia
page elements, rather than interrelated data records [2].
Figure 5.16 shows how you might use a Web browser on your client PC to connect
with a Web network server. This server runs Web server software to access and transfer the
Web pages you request. The website illustrated in Figure 5.16 uses a hypermedia database
consisting of Web page content described by HTML (Hypertext Markup Language)
code or XML (Extensible Markup Language) labels, image files, video files, and audio.
The Web server software acts as a database management system to manage the transfer of
hypermedia files for downloading by the multimedia plug-ins of your Web browser.
FIGURE 5.16 The components of a Web-based information system include Web browsers,
servers, and hypermedia databases.
[Figure: client PCs running a Web browser connect over the Internet, intranets, and
extranets to a network server running Web server software, which accesses a hypermedia
database of HTML and XML Web pages, image files, video files, and audio files.]
FIGURE 5.17 The components of a complete data warehouse system.
[Figure components: Operational, External, and Other Databases; Data Acquisition (capture,
clean, transform, transport, load/apply); Data Management; Enterprise Warehouse; Analytical
Data Store; Data Marts; Data Analysis (query, report, analyze, mine, deliver); Metadata
Management; Metadata Directory; Metadata Repository; Warehouse Design; Web Information
Systems.]
Source: Adapted courtesy of Hewlett-Packard.
Data Warehouses and Data Mining
A data warehouse stores data that have been extracted from the various operational,
external, and other databases of an organization. It is a central source of the data that
have been cleaned, transformed, and cataloged so they can be used by managers and
other business professionals for data mining, online analytical processing, and other
forms of business analysis, market research, and decision support. (We’ll talk in depth
about all of these activities in Chapter 9.) Data warehouses may be subdivided into
data marts, which hold subsets of data from the warehouse that focus on specific
aspects of a company, such as a department or a business process.
Figure 5.17 illustrates the components of a complete data warehouse system. No-
tice how data from various operational and external databases are captured, cleaned,
and transformed into data that can be better used for analysis. This acquisition process
might include activities like consolidating data from several sources, filtering out un-
wanted data, correcting incorrect data, converting data to new data elements, and
aggregating data into new data subsets.
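The acquisition activities just listed can be sketched as a small Python routine. This is an illustrative sketch only: the function name `acquire`, the field names, the alias table, and the `usd_rate` conversion field are all hypothetical, and the customer name variants simply echo those mentioned in the Emerson case earlier in the chapter.

```python
from collections import defaultdict

# Hypothetical alias table used to correct inconsistent customer names.
CANONICAL_NAMES = {
    "exxon": "ExxonMobil",
    "mobil": "ExxonMobil",
    "esso": "ExxonMobil",
    "exxonmobil": "ExxonMobil",
}


def acquire(*sources):
    """Turn raw order rows from several sources into warehouse totals."""
    totals = defaultdict(float)
    for source in sources:                      # consolidate several sources
        for row in source:
            if row.get("amount") is None:       # filter out unwanted rows
                continue
            raw_name = row["customer"].strip()
            # Correct inconsistent names; unknown names pass through as-is.
            name = CANONICAL_NAMES.get(raw_name.lower(), raw_name)
            # Convert to a new data element (amount in a common currency).
            amount_usd = row["amount"] * row.get("usd_rate", 1.0)
            totals[name] += amount_usd          # aggregate per customer
    return dict(totals)


# Made-up rows from two operational systems.
orders_eu = [
    {"customer": "Esso", "amount": 100.0, "usd_rate": 1.5},
    {"customer": "Mobil ", "amount": None},     # incomplete row, dropped
]
orders_us = [
    {"customer": "Exxon", "amount": 50.0},
    {"customer": "Acme", "amount": 20.0},
]

warehouse = acquire(orders_eu, orders_us)
print(warehouse)
```

Without the name correction step, the three spellings would be totaled as three separate customers, which is the kind of inaccurate query result the acquisition process is meant to prevent.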
This data is then stored in the enterprise data warehouse, from where it can be moved
into data marts or to an analytical data store that holds data in a more useful form for cer-
tain types of analysis. Metadata (data that defines the data in the data warehouse) is stored
in a metadata repository and cataloged by a metadata directory. Finally, a variety of ana-
lytical software tools can be provided to query, report, mine, and analyze the data for
delivery via Internet and intranet Web systems to business end users. See Figure 5.18.
FIGURE 5.18 A data warehouse and its data mart subsets hold data that have been extracted
from various operational databases for business analysis, market research, decision
support, and data mining applications.
[Figure: Applications (ERP, inventory control, logistics, shipping, purchasing, CRM) on one
side of the data warehouse; Data Marts (finance, marketing, sales, accounting, management
reporting) on the other.]
Revenue: Closing the Gap with a Data Warehouse
In the late 1990s the state of Iowa had a tax gap, a polite way of describing companies
and individuals who either didn’t file state tax returns or who underreported
their earnings. To identify noncompliant taxpayers, the Iowa Department of
Revenue and Finance (IDRF) relied on a jumble of nonintegrated mainframe
applications, file extracts, and over 20 disparate stand-alone systems (databases,
mainframe data, and information on individual spreadsheets, to name a few).
The real problem was that none of these systems could communicate with each
other. What was needed was a central data warehouse to pull together information
from all those systems for analysis. But getting funding from the state for such a
large-scale project wasn’t an option.
So the IDRF came up with a plan the Iowa Legislature couldn’t help but ap-
prove. The plan was simple: Build a data warehouse that would be entirely funded
using the additional tax revenue it generated by catching tax scofflaws.
Development of the data warehouse began in November 1999, and it became
operational five months later. The system combines data from the department’s
own tax and accounts receivable systems, tax files shared by the federal Internal
Revenue Service, the Iowa Workforce Development Agency, and a number of other
sources. Revenue- and finance-department employees analyze the data using com-
mercially available reporting software.
In the three years since it went live, the IDRF data warehouse has generated $28
million in tax revenue and is expected to generate $10 million each year from now
on. There’s no question the project has paid for itself many times over, and the state
of Iowa is sold on the value of data warehousing. The next step is to use the data
warehouse to better understand why taxpayers might be in noncompliance. That will
involve analyzing taxpayer demographics and changes in tax laws and policies. This
phase of the project is also expected to generate revenues for the state while simulta-
neously helping to improve the tax laws for the citizens of Iowa [12, 13].