This document covers data warehouse tools and concepts, including:
OLAP (On-Line Analytical Processing)
OLTP (On-Line Transaction Processing)
Business Intelligence
Driving Force
Data Mart
Meta Data
OLAP (On-Line Analytical Processing):
Data warehouse systems serve users and knowledge workers in the role of data
analysis and decision making. Such systems can organize and present data in
various formats to meet the needs of different users. These systems are known as
on-line analytical processing (OLAP) systems.
OLAP tools let users work with data from the DW or data marts without concern
for how or where the data are stored.
OLAP is specifically designed to support and operate on multi-dimensional data
structures.
OLAP servers must consider data storage issues.
Data can be analyzed along dimensions such as time, geography, gender, product,
etc.
Implementations of a warehouse server for OLAP processing include relational
OLAP (ROLAP), multidimensional OLAP (MOLAP), and hybrid OLAP (HOLAP).
OLAP Types:
1. Relational OLAP (ROLAP) servers
2. Multidimensional OLAP (MOLAP) servers
3. Hybrid OLAP (HOLAP) servers
Need for data warehousing and OLAP:
Data warehousing developed, despite the presence of operational databases, for
the following reasons:
An operational database is designed and tuned for known tasks and workloads,
such as indexing using primary keys, searching for particular records and
optimizing ‘canned queries’. Data warehouse queries, by contrast, are often
complex. They involve the computation of large groups of data at summarized
levels and may require the use of special data organization, access and
implementation methods based on multidimensional views.
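The difference between the two kinds of query can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module and a made-up `sales` table; the table, its columns and the sample rows are assumptions for illustration only and do not come from any real system.

```python
import sqlite3

# In-memory database with a hypothetical sales table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)",
    [("East", 2023, 100.0), ("East", 2024, 150.0),
     ("West", 2023, 80.0), ("West", 2024, 120.0)],
)

# Operational-style "canned query": fetch one particular record by primary key.
one_record = conn.execute(
    "SELECT region, year, amount FROM sales WHERE id = ?", (1,)
).fetchone()

# Warehouse-style query: summarize a large group of records along a dimension.
totals_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```

The first query touches a single row through the primary key index, while the second has to read every row in order to compute the summary.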
Processing OLAP queries in operational databases would substantially degrade the
performance of operational tasks. Concurrency control and recovery mechanisms,
such as locking and logging, are required to ensure the consistency and
robustness of transactions. An OLAP query, however, often needs only read-only
access to data records for summarization and aggregation. Concurrency control
and recovery mechanisms, if applied to such OLAP operations, may jeopardize the
execution of concurrent transactions.
Case Study: Quasi real-time OLAP cubes
OLAP cubes are used to quickly analyze and retrieve data from different
perspectives.
OLAP Facts and dimensions:
Every "cell" in an OLAP cube contains numeric data, a.k.a. "measures".
Every "cell" may contain more than one measure, e.g. forecast and outcome.
Every "cell" has a unique combination of dimension values.
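The three statements about cells can be sketched as a small data structure. This is only an illustration: the dimension names (time, geography, product), the sample values and the `roll_up` helper are assumptions, not part of any OLAP product.

```python
from typing import Dict, Optional, Tuple

# Each key is a unique combination of dimension values: (time, geography, product).
# Each cell holds one or more numeric measures.
Cube = Dict[Tuple[str, str, str], Dict[str, float]]

cube: Cube = {
    ("2024-Q1", "Europe", "Camera"): {"forecast": 120.0, "outcome": 110.0},
    ("2024-Q1", "Asia", "Camera"): {"forecast": 90.0, "outcome": 95.0},
    ("2024-Q2", "Europe", "Camera"): {"forecast": 130.0, "outcome": 140.0},
}


def roll_up(cube: Cube, measure: str, time: Optional[str] = None,
            geography: Optional[str] = None, product: Optional[str] = None) -> float:
    """Sum one measure over all cells that match the given dimension values.

    A dimension left as None is not filtered, i.e. it is aggregated away.
    """
    wanted = (time, geography, product)
    return sum(
        cell[measure]
        for key, cell in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    )
```

For example, `roll_up(cube, "outcome", geography="Europe")` adds the outcome measure of the two European cells. Because the dictionary key is the dimension combination, two cells can never share the same combination.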
MS OLAP cube partitioning – details:
Every cube partition has its own query to define the data set fetched from
the data source.
The SQL statements of the partitions define non-overlapping data sets.
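One way to picture per-partition queries is a sketch in which each partition covers a half-open date range. The partition names, the `fact_sales` table, the `order_date` column and both helper functions are hypothetical; this is not the configuration syntax of any particular OLAP server.

```python
from datetime import date

# Hypothetical partitions: name -> half-open date range [start, end).
partitions = {
    "sales_2023": (date(2023, 1, 1), date(2024, 1, 1)),
    "sales_2024": (date(2024, 1, 1), date(2025, 1, 1)),
}


def partition_query(start: date, end: date) -> str:
    """Build the source query for one partition (table and column are made up)."""
    return (
        "SELECT * FROM fact_sales "
        f"WHERE order_date >= '{start.isoformat()}' "
        f"AND order_date < '{end.isoformat()}'"
    )


def overlaps(ranges) -> bool:
    """Return True if any two half-open ranges share at least one day."""
    ordered = sorted(ranges)
    return any(
        prev_end > next_start
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    )


queries = {name: partition_query(*bounds) for name, bounds in partitions.items()}
```

Using `>=` for the lower bound and `<` for the upper bound means a row dated exactly on a boundary falls into one partition only, so no row is counted twice.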
OLTP (On-Line Transaction Processing):
An online operational database system that performs online transaction and
query processing is called an on-line transaction processing (OLTP) system.
Examples are the day-to-day operations of organizations, such as purchasing,
inventory, manufacturing, banking, payroll, registration, and accounting.
OLTP is characterized by a large number of short on-line transactions (INSERT,
UPDATE, and DELETE). The main emphasis for OLTP systems is on very fast query
processing, maintaining data integrity in multi-access environments, and
effectiveness measured by the number of transactions per second. An OLTP
database holds detailed and current data, and the schema used to store
transactional data is the entity model (usually 3NF).
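A short transaction of this kind can be sketched with Python's built-in sqlite3 module. The `inventory` and `orders` tables and the `place_order` function are assumptions made for illustration; a production OLTP system would run on a multi-user database server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: stock may never go negative (integrity rule).
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, qty INTEGER CHECK (qty >= 0))")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('camera', 5)")
conn.commit()


def place_order(conn: sqlite3.Connection, item: str, qty: int) -> bool:
    """Record an order and decrement stock as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty))
            conn.execute("UPDATE inventory SET qty = qty - ? WHERE item = ?", (qty, item))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the INSERT above was rolled back too.
        return False
```

If the stock update would violate the integrity rule, the order row is rolled back with it, so the two tables stay consistent.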
The job of earlier on-line operational systems was to perform transaction and
query processing, so they are also termed on-line transaction processing (OLTP)
systems. Data warehouse systems, by contrast, serve users or knowledge workers
in the role of data analysis and decision-making.
Case Study: An OLTP Application on an SMP Platform
Diversified Electronics is engaged in e-commerce and sells electronics products
over the Web. Their product line ranges from cameras to camcorders to audio-
visual equipment and accessories. The company maintains detailed technical
information on hundreds of products from hundreds of manufacturers. Customers
log on to the web site, pick items to buy, choose a method of delivery, and make
payment by providing a credit card number in a secured environment. The
database is the backbone of the web-based ordering system. At peak times,
between 200 and 400 users access the database. The ordering system processes
about 10,000 orders in an average month. The current size of the database is 20
GB.
The customer base of Diversified Electronics is growing at a rapid rate. Because
of this growth, the company can’t afford to thrash around and re-architect its
existing database platform. Instead, Diversified Electronics wants to maintain
its database application in a scalable environment that can grow to match the
anticipated growth in load for the next few years.
Business Intelligence:
Business Intelligence refers to a set of methods and techniques that are used by
organizations for tactical and strategic decision making. It leverages methods and
technologies that focus on counts, statistics and business objectives to improve
business performance.
The objective of Business Intelligence is to better understand customers and
improve customer service, make the supply and distribution chain more efficient,
and identify and address business problems and opportunities quickly.
Case Study: Infosys - Service Offerings Business Intelligence
The exponential growth of information, heterogeneous silos, unstructured formats
and poor data quality pose challenges in information management. They prevent
businesses from utilizing information effectively.
Business Intelligence (BI) and Data Warehousing (DW) address these challenges
by unearthing the hidden value in information assets to facilitate informed
decisions.
Infosys offers end-to-end BI and DW services – Reporting and Analytics,
Maintenance and Support. Our services cover Business Intelligence road map, data
warehousing implementation, analytics, data mining, data quality and Master Data
Management. Our business result-oriented approach ensures return on
information.
Infosys offerings include:
BI strategy development
BI and DW governance consulting
BI and DW architecture development
Strategic audit
Taxonomy implementation consulting
Migration strategy and planning, cross-platform migration and version
upgrades
BI services in a SaaS model
SOA-enabled BI framework
Driving Force:
Case Study: Database Market: Growing with Data Warehouse a Driving Force
The 2005 worldwide relational database (RDBMS) market was approximately $14
billion in revenue (IDC estimates $14.6 billion, while Gartner, Inc. believes it
was $13.8 billion) and experienced healthy upper single-digit growth (IDC: 9.4%
increase; Gartner: 8.3% increase). It has been said many times over the last
decade that RDBMS growth would slow significantly and that the market was
saturated, but the industry continues to underestimate the demand created by
business, competitive and governmental data-driven initiatives. Businesses crave
data (actually information) to monitor, measure and engage in performance
management. It is no longer a nice-to-have, but a must-have in order to compete
and operate in today’s business climate.
Colleen Graham, principal analyst at Gartner, says “The market will also
experience increased demand from organizations buying relational database
management systems for business intelligence and data warehousing activities.”
IDC states that "Data warehousing remains a major driver of RDBMS growth."
The top RDBMS vendors by market share are Oracle, IBM and Microsoft. The
latter’s share had been increasing before its latest release, SQL Server 2005.
Microsoft is “moving
up the food chain” (paraphrased from IDC) by being able to handle increasing
workload and sophistication. Microsoft is enabling RDBMS to be pervasive in the
SMB (small-to-medium business) market, as well as putting pricing pressure
downward in the Fortune 500 market. Open source RDBMS is attracting attention
but has yet to make a significant dent in the marketplace other than adding to the
pricing pressure from Microsoft on the other database giants.
Data Mart:
A Data Mart is a subset of data from a Data Warehouse. Data Marts are built for
specific user groups. They contain a subset of rows and columns that are of
interest to the particular audience. By providing decision makers with only a
subset of the data from the Data Warehouse, privacy, performance and clarity
objectives can be attained.
There are different types of Data Marts. A Data Mart can be a physically separate
data store from the Corporate Data Warehouse, or it can be a logical "view" of
rows and columns from the Warehouse. Data Marts can be architected to support
online queries and data mining (i.e. dimensional design), or they can be designed
to support more conventional reporting needs (i.e. relational design).
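The logical "view" variant can be sketched with Python's built-in sqlite3 module. The `warehouse_sales` table, its columns and the `audio_mart` view are made-up names used only to show a subset of rows and columns; they are not taken from any real warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table holding all departments and a sensitive column.
conn.execute(
    "CREATE TABLE warehouse_sales "
    "(customer TEXT, card_number TEXT, department TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [("Ann", "1111", "audio", 40.0),
     ("Bob", "2222", "video", 75.0),
     ("Cy", "3333", "audio", 20.0)],
)

# Logical data mart: only the audio rows, and without the sensitive column.
conn.execute(
    "CREATE VIEW audio_mart AS "
    "SELECT customer, amount FROM warehouse_sales WHERE department = 'audio'"
)

mart_rows = conn.execute("SELECT * FROM audio_mart ORDER BY customer").fetchall()
mart_columns = [col[1] for col in conn.execute("PRAGMA table_info(audio_mart)")]
```

Users of `audio_mart` see fewer rows (performance and clarity) and never see the card number (privacy), while the data itself stays in the warehouse table.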
In some organizations, a Data Warehouse might not physically exist. Logically,
however, it exists as the sum of all the "conformed" Data Marts.
Case Study: Customer Services Example of Data Mart
Meta Data:
Meta data is “data about data”, or “the data used to define other data”. It
specifies the source, values, usage and features of DWH data and defines how
data can be changed and processed at every architecture layer. Meta data is
stored in a Meta data repository which all the other architecture components can
access. Meta data is critical for using, building, and administering the data
warehouse. For end users, Meta data is like a roadmap to the data warehouse
contents. A Meta data repository is like a general-purpose information directory
that includes several enhancing functions.
According to Kelly, a tool for Meta data management should:
[...] security.
[...] to navigate and query Meta data.
[...] formats.
Categories of Metadata:
Metadata can be broadly categorized into three categories:
Business Metadata - It has the data ownership information, business
definition, and changing policies.
Technical Metadata - It includes database system names, table and column
names and sizes, data types and allowed values. Technical metadata also
includes structural information such as primary and foreign key attributes
and indices.
Operational Metadata - It includes currency of data and data lineage.
Currency of data means whether the data is active, archived, or purged.
Lineage of data means the history of the data's migration and the
transformations applied to it.
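The three categories can be sketched as a small record structure. All class and field names below are assumptions chosen to mirror the descriptions above; they do not follow any metadata standard or tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BusinessMetadata:
    owner: str           # data ownership information
    definition: str      # business definition
    change_policy: str   # changing policy


@dataclass
class TechnicalMetadata:
    table: str
    column: str
    data_type: str
    primary_key: bool = False


@dataclass
class OperationalMetadata:
    currency: str                                     # "active", "archived", or "purged"
    lineage: List[str] = field(default_factory=list)  # ordered history of transformations

    def __post_init__(self) -> None:
        if self.currency not in {"active", "archived", "purged"}:
            raise ValueError(f"unknown currency: {self.currency}")


@dataclass
class ColumnMetadata:
    business: BusinessMetadata
    technical: TechnicalMetadata
    operational: OperationalMetadata


# Illustrative entry for one hypothetical warehouse column.
order_total = ColumnMetadata(
    business=BusinessMetadata("Sales department", "Order value incl. tax", "Owner approval"),
    technical=TechnicalMetadata("fact_orders", "order_total", "DECIMAL(10,2)"),
    operational=OperationalMetadata("active", ["extracted from OLTP", "converted to EUR"]),
)
```

Keeping the three categories as separate parts of one record makes it clear which group maintains which information.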
Challenges for Metadata Management:
The importance of metadata cannot be overstated. Metadata helps in
driving the accuracy of reports, validates data transformation, and ensures
the accuracy of calculations. Metadata also enforces the definition of
business terms to business end-users.
Metadata in a big organization is scattered across the organization. This
metadata is spread in spreadsheets, databases, and applications.
Metadata could be present in text files or multimedia files. To use this data
for information management solutions, it has to be correctly defined.
There are no industry-wide accepted standards. Data management solution
vendors have a narrow focus.
There are no easy and accepted methods of passing metadata.
Case Study: Improving Web Search Using Metadata
The majority of Web pages today are generated from databases, and Web site
owners are increasingly providing APIs to this data or embedding information
inside their HTML pages with microformats, eRDF, or RDFa. In other cases,
structured data can be extracted with relative ease from Web pages that follow a
template, using XSLT style sheets.
Architecture - The high-level architecture of the system can be almost entirely
reconstructed from the description. The user’s applications trigger on URLs in
the search result page, transforming the search results. The inputs of the
system are as follows:
Metadata embedded inside HTML pages (microformats, eRDF, RDFa) and
collected by Yahoo Slurp, the Yahoo crawler, during the regular crawling
process.
Custom data services extract metadata from HTML pages using XSLT or
they wrap APIs implemented as Web Services.
Metadata can be submitted by publishers. Feeds are polled at regular
intervals.
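The first input, metadata embedded in HTML, can be illustrated with a heavily simplified sketch that uses only Python's standard html.parser module. It collects values from elements carrying an RDFa `property` attribute; the class name and the sample page are assumptions, and this is neither a full RDFa processor nor a description of how the Yahoo crawler works.

```python
from html.parser import HTMLParser


class RDFaPropertyParser(HTMLParser):
    """Collect (property, value) pairs from elements with an RDFa `property` attribute.

    Simplification: the value is either the `content` attribute or the first
    non-empty text that follows the start tag; nesting is ignored.
    """

    def __init__(self) -> None:
        super().__init__()
        self.pairs = []
        self._pending = None  # property name still waiting for its text value

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        prop = attr.get("property")
        if prop is None:
            return
        if "content" in attr:
            self.pairs.append((prop, attr["content"]))
        else:
            self._pending = prop

    def handle_data(self, data):
        if self._pending and data.strip():
            self.pairs.append((self._pending, data.strip()))
            self._pending = None


# Made-up page fragment with embedded product metadata.
page = (
    '<div vocab="http://schema.org/" typeof="Product">'
    '<span property="name">Camcorder X</span>'
    '<meta property="price" content="299.00">'
    '</div>'
)

parser = RDFaPropertyParser()
parser.feed(page)
parser.close()
```

After parsing, `parser.pairs` holds the name and price found in the fragment, which a search application could then use to enrich the corresponding search result.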