Are OLAP cubes "large monsters" that deliver quick data retrieval at the expense of long load times? This presentation shows one way to kill this myth.
1. Case study:
Quasi real-time OLAP cubes
by Ziemowit Jankowski
Database Architect
2. OLAP Cubes - what are they?
• Used to quickly analyze and retrieve data from different
perspectives
• Numeric data
• Structured data:
o can be represented as numeric values (or sets
thereof) accessed by a composite key
o each of the parts of the composite key belongs to a
well-defined set of values
• Facts = numeric values
• Dimensions = parts of the composite key
• Source = usually a star or snowflake schema in a relational DB (other sources possible)
3. OLAP Cubes - data sources
(diagrams: star schema vs. snowflake schema)
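A minimal sketch of what such a star-schema source could look like, with hypothetical table and column names (Oracle-flavoured SQL; the fact table holds the measures, addressed by a composite key of dimension foreign keys):

  -- Hypothetical star schema: one fact table plus dimension tables.
  CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, calendar_date DATE,
                            calendar_year INTEGER, calendar_month INTEGER);
  CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name VARCHAR2(100),
                            product_group VARCHAR2(100));
  CREATE TABLE dim_site    (site_key    INTEGER PRIMARY KEY, site_name VARCHAR2(100));

  -- Facts: measures addressed by the composite key (date_key, product_key, site_key).
  CREATE TABLE fact_production (
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    site_key     INTEGER NOT NULL REFERENCES dim_site(site_key),
    produced_qty NUMBER,   -- outcome measure
    forecast_qty NUMBER,   -- forecast measure
    PRIMARY KEY (date_key, product_key, site_key)
  );
  -- In a snowflake schema, dim_product would instead reference a normalized
  -- product-group table rather than carrying product_group inline.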
4. OLAP Facts and dimensions
• Every "cell" in an OLAP cube
contains numeric data a.k.a
"measures".
• Every "cell" may contain more than
one measure, e.g. forecast and
outcome.
• Every "cell" has a unique
combination of dimension values.
5. OLAP Cubes - operations
• Slice = choose the values corresponding to ONE value on one or more dimensions
• Dice = choose the values corresponding to one slice, or to a number of consecutive slices, on two or more dimensions of the cube (examples below)
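As plain SQL against the hypothetical fact_production table from the earlier sketch, the two operations roughly correspond to the queries below (key values are made up; in the cube itself these would be MDX queries, SQL is used here only to illustrate the idea):

  -- Slice: fix ONE value on a dimension (here a single date).
  SELECT product_key, site_key, SUM(produced_qty) AS produced_qty
  FROM   fact_production
  WHERE  date_key = 20090502
  GROUP BY product_key, site_key;

  -- Dice: restrict consecutive values on two or more dimensions at once,
  -- yielding a smaller sub-cube.
  SELECT date_key, product_key, SUM(produced_qty) AS produced_qty
  FROM   fact_production
  WHERE  date_key BETWEEN 20090501 AND 20090531
  AND    product_key IN (10, 11, 12)
  GROUP BY date_key, product_key;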
6. OLAP Cubes - operations (cont'd)
• Drill down/up = choose lower/higher
level details. Used in context of
hierarchical dimensions.
• Pivot = rotate the orientation of the
data for reporting purposes
• Roll-up = summarize data by aggregating along a dimension hierarchy (see the sketch below)
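A rough SQL illustration of rolling day-level facts up to month level along the date hierarchy, again using the hypothetical tables from the star-schema sketch:

  -- Roll-up: aggregate day-level facts to month level along the
  -- date hierarchy (day -> month -> year).
  SELECT d.calendar_year, d.calendar_month,
         SUM(f.produced_qty) AS produced_qty
  FROM   fact_production f
  JOIN   dim_date d ON d.date_key = f.date_key
  GROUP BY d.calendar_year, d.calendar_month;
  -- Drill-down is the inverse: group by d.calendar_date again to get
  -- back to the daily detail.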
7. OLAP Cubes - refresh methods
• Incremental:
o possible when cubes grow
"outwards", i.e. no "scattered"
changes in data
o only delta data need to be read
o refresh may be fast if delta is small
• Full:
o possible for all cubes, even when changes are "scattered" all over the data
o all data need to be re-read with every refresh
o a refresh may take a long time (hours)
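In terms of the source queries, the difference can be sketched like this (load_ts and the watermark parameter are hypothetical; they stand for whatever marks the "growing edge" of the data):

  -- Full refresh: the partition query re-reads the whole data set.
  SELECT * FROM fact_production;

  -- Incremental refresh: only the delta is read. This works only when the
  -- cube grows "outwards", i.e. old rows are never changed.
  SELECT * FROM fact_production
  WHERE  load_ts > :last_processed_ts;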
8. The situation on hand
• Business operating on 24*6 basis (Sun-Fri)
• Events from production systems are aggregated into flows
and production units
• Production figures may be adjusted manually long after
production date
• Daily production figures are the basis for daily forecasts, with the simplified formula (see the sketch after this list):
forecast(yearX) = production(yearX-1) * trend(yearX) + manualFcastAdjustm
• Adjustments in production figures will alter forecast
figures
• Outcome and forecast should be stored in MS OLAP cubes, as required by the software architecture
• The system should simplify comparisons between forecast
and outcome figures
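A sketch of the simplified forecast formula in SQL, with hypothetical table and column names (production_by_year, yearly_trend, manual_forecast_adjustment; yearly granularity is used here only for brevity). Note how a later correction of the production figures automatically changes the forecast, which is why the cubes have to be refreshed continuously:

  -- forecast(yearX) = production(yearX-1) * trend(yearX) + manual adjustment
  SELECT p.product_key,
         t.calendar_year AS forecast_year,
         p.produced_qty * t.trend_factor
           + COALESCE(a.manual_adjustment, 0) AS forecast_qty
  FROM   production_by_year p                          -- production for yearX-1
  JOIN   yearly_trend t
         ON t.calendar_year = p.calendar_year + 1      -- trend for yearX
  LEFT JOIN manual_forecast_adjustment a
         ON a.product_key   = p.product_key
        AND a.calendar_year = t.calendar_year;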
9. Software
• Source of data:
o Relational database
o Oracle 10g database
o extensive use of PL/SQL in database
• Destination of data:
o OLAP cubes - MS SQL Server Analysis Services (version
2005 and 2008)
• Other software:
o MS SQL Server database
10. QUESTION
Can we get almost real-time reports from MS OLAP cubes?
ANSWER
YES! The answer lies in "cube partitioning".
11. Cube partitioning - the basics
• Cube partitions may be updated independently
• Cube partitions must not overlap (overlapping partitions would produce duplicate values)
• Time is a good dimension to partition on
12. MS OLAP cube partitioning - details
• Every cube partition has its own query to define the data
set fetched from the data source
• The SQL statements define the non-overlapping data sets
14. How to partition? - theory
• Partitions with different lengths and different update
frequencies:
o current data = very small partition, very short update
times, updated often
o "not very current" data = a bit larger partition, longer
update times, updated less often
o historical data = large partition, long update times,
updated seldom
• The 24*6 operation provides the "seldom" update window (see the partition sketch below)
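Putting slides 12 and 14 together, the per-partition SQL statements could look roughly like this (production_date and the boundary dates are hypothetical; the >= / < pattern keeps the partitions non-overlapping):

  -- "Now" partition: very small, re-processed every few minutes.
  SELECT * FROM fact_production
  WHERE  production_date >= DATE '2009-05-02';

  -- "Recent" partition: larger, processed less often.
  SELECT * FROM fact_production
  WHERE  production_date >= DATE '2009-01-03'
  AND    production_date <  DATE '2009-05-02';

  -- "History" partition: large, processed seldom (e.g. once a week).
  SELECT * FROM fact_production
  WHERE  production_date <  DATE '2009-01-03';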
15. How to partition? - theory cont'd
• One cube for both forecast and outcome
16. Solution - approach one
Decisions:
• Cubes partitioned on date boundaries
• MOLAP cubes (for better query performance)
• Use SSIS to populate cubes
o dimensions populated by incremental processing
o facts populated by full processing
o jobs for historical data must be run after midnight to
compensate for date change
Actions:
• Cubes built
• SSIS packages deployed inside SQL Server (rather than on the filesystem)
• SSIS set up as scheduled database jobs
17. Did it work?
No!
Malfunctions:
• Simultaneous updates of cube partitions could lead to
deadlocks
• Deadlocks left cube partitions in an unprocessed state
Amendment:
• Cube partitions must not be updated simultaneously
18. Solution - approach two
Decisions:
• Cube processing must be ONE partition at a time
• Scheduling done by an SSIS "super package" (sketched below):
o a SQL Server table contains approximate run frequencies and package names
o the "super package" executes SSIS packages as indicated by the table
Actions:
• Scheduling table created
• "Super package" created to be self-modifying
19. Did it work?
Not really!
Malfunctions:
• Historical data had to be updated after midnight, and real-time updates of the "Now" partition were postponed meanwhile. This was done to avoid "gaps" in outcome data and "overlaps" in forecast data.
• Real-time updates ended soon after midnight and were
resumed a few hours later. (That was NOT acceptable.)
Amendment:
• Re-think!
20. Solution - approach three
Decisions:
• Take advantage of 6*24 cycle (as opposed to 7*24)
• Switch dates on Saturdays only
o the "Now" partition had to stretch from Saturday to
Saturday
o all other partitions had to stretch from a Saturday to
another Saturday
• Re-process all time-consuming partitions on Saturday after
switch of date
21. Solution - approach three cont'd
Actions:
• Create logic in the Oracle database to do date calculations "modulo week", i.e. based on Saturdays. The logic is implemented as a function (a sketch follows after this list).
• Rewrite SQL statements for cube partitions so that they
employ the Oracle function (as above) instead of current
date +/- given number of days.
• Reschedule the time-consuming updates so that they run every 7th day.
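A minimal sketch of such a "modulo week" helper in Oracle PL/SQL (the function name is hypothetical): it returns the most recent Saturday on or before a given date, so every partition boundary can be expressed as whole weeks relative to last_saturday(SYSDATE) and only "moves" once a week:

  -- Hypothetical helper: most recent Saturday on or before p_date.
  CREATE OR REPLACE FUNCTION last_saturday(p_date IN DATE DEFAULT SYSDATE)
    RETURN DATE
  IS
  BEGIN
    -- TRUNC(d, 'IW') is the ISO-week Monday; shifting by two days turns it
    -- into a Saturday boundary, independent of NLS day-name settings.
    RETURN TRUNC(p_date + 2, 'IW') - 2;
  END last_saturday;
  /

The partition SQL statements can then use, for example, production_date >= last_saturday(SYSDATE) instead of current-date-plus-or-minus-days arithmetic.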
23. Lessons learned
• It is possible to build real-time OLAP cubes in MS
technology
• It is possible to make the partitions self-maintaining in
terms of partition boundaries
• The concept needs careful engineering, as there are pitfalls along the way.
24. Omitted details
Some details have been omitted:
• the quasi real-time updates are scheduled to occur every
2nd or 3rd minute
• scheduling is not exact: the Super-job keeps track of what is to be run and when, and executes SSIS packages based on their "scheduled-to-run" state, their priority, and a few other criteria
• the source of data is not a proper star schema; rather, it is an emulation of facts and dimensions by means of data tables and views in Oracle.