Case study:
Quasi real-time OLAP cubes

by Ziemowit Jankowski
Database Architect
OLAP Cubes - what are they?

• Used to quickly analyze and retrieve data from different
  perspectives
• Numeric data
• Structured data:
   o can be represented as numeric values (or sets
     thereof) accessed by a composite key
   o each of the parts of the composite key belongs to a
     well-defined set of values
• Facts = numeric values
• Dimensions = parts of the composite key
• Source = usually a star or snowflake schema in a
  relational DB (other sources are possible)
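The definitions above (facts addressed by a composite key whose parts come from well-defined sets) can be sketched as a plain dictionary. This is only an illustration: the dimension names `date`, `flow` and `unit` and their members are hypothetical, and a real cube engine stores and indexes cells very differently.

```python
from typing import Dict, Tuple

# Hypothetical dimensions: each part of the composite key belongs to a well-defined set.
DIMENSIONS = {
    "date": {"2010-03-01", "2010-03-02"},
    "flow": {"north", "south"},
    "unit": {"U1", "U2"},
}
DIM_ORDER = ("date", "flow", "unit")

Key = Tuple[str, str, str]


def make_key(date: str, flow: str, unit: str) -> Key:
    """Build a composite key, rejecting values outside each dimension's set."""
    key = (date, flow, unit)
    for name, value in zip(DIM_ORDER, key):
        if value not in DIMENSIONS[name]:
            raise ValueError(f"{value!r} is not a member of dimension {name!r}")
    return key


# The cube itself: composite key (dimensions) -> numeric value (fact).
cube: Dict[Key, float] = {
    make_key("2010-03-01", "north", "U1"): 120.0,
    make_key("2010-03-01", "south", "U2"): 80.5,
    make_key("2010-03-02", "north", "U1"): 131.0,
}

print(cube[("2010-03-01", "north", "U1")])  # 120.0
```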
OLAP Cubes - data sources

[Figures: star schema and snowflake schema diagrams]
OLAP Facts and dimensions

• Every "cell" in an OLAP cube
  contains numeric data, a.k.a.
  "measures".
• Every "cell" may contain more than
  one measure, e.g. forecast and
  outcome.
• Every "cell" has a unique
  combination of dimension values.
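A minimal sketch of cells holding more than one measure, assuming the two measures are named `forecast` and `outcome` as in the slide's example. Using the dimension tuple as a dictionary key is what guarantees that each combination of dimension values maps to exactly one cell.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Cell:
    """One cube cell carrying two measures (names taken from the slide's example)."""
    forecast: float = 0.0
    outcome: float = 0.0


# Dict keys enforce a unique combination of dimension values per cell.
cells: Dict[Tuple[str, str], Cell] = {}


def add_measure(date: str, unit: str, measure: str, value: float) -> None:
    """Add a value to one measure of the cell at (date, unit)."""
    cell = cells.setdefault((date, unit), Cell())
    setattr(cell, measure, getattr(cell, measure) + value)


add_measure("2010-03-01", "U1", "forecast", 100.0)
add_measure("2010-03-01", "U1", "outcome", 96.0)
add_measure("2010-03-01", "U1", "outcome", 2.0)   # same key, so same cell

print(cells[("2010-03-01", "U1")])  # Cell(forecast=100.0, outcome=98.0)
```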
OLAP Cubes - operations

• Slice = choose the values corresponding to
  ONE value on one or more dimensions

• Dice = choose the values corresponding to
  one slice or a number of consecutive
  slices on two or more dimensions of
  the cube
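Slice and dice can be illustrated as filters over a cube stored as a dictionary keyed by `(date, flow, unit)`. The data and dimension names are hypothetical; the point is only that a slice fixes a single value, while a dice keeps a chosen range or set of values on several dimensions at once.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str, str]  # (date, flow, unit), hypothetical dimensions
DIM_INDEX = {"date": 0, "flow": 1, "unit": 2}

cube: Dict[Key, float] = {
    ("2010-03-01", "north", "U1"): 120.0,
    ("2010-03-01", "south", "U2"): 80.0,
    ("2010-03-02", "north", "U1"): 130.0,
    ("2010-03-02", "north", "U2"): 40.0,
    ("2010-03-03", "south", "U1"): 70.0,
}


def slice_cube(cube: Dict[Key, float], dim: str, value: str) -> Dict[Key, float]:
    """Slice: keep only the cells where one dimension has ONE fixed value."""
    i = DIM_INDEX[dim]
    return {k: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube: Dict[Key, float], **allowed: Iterable[str]) -> Dict[Key, float]:
    """Dice: keep the cells whose values lie in a chosen set on each named dimension."""
    sets = {DIM_INDEX[d]: set(vals) for d, vals in allowed.items()}
    return {k: v for k, v in cube.items() if all(k[i] in s for i, s in sets.items())}


north = slice_cube(cube, "flow", "north")
sub = dice_cube(cube, date=["2010-03-01", "2010-03-02"], unit=["U1"])
print(sorted(north.values()))  # [40.0, 120.0, 130.0]
print(sorted(sub.values()))    # [120.0, 130.0]
```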
OLAP Cubes - operations (cont'd)

• Drill down/up = choose a lower/higher
  level of detail. Used in the context of
  hierarchical dimensions.

• Pivot = rotate the orientation of the
  data for reporting purposes

• Roll-up = aggregate measures to a higher
  level of a dimension hierarchy
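The hierarchical operations can be sketched on a hypothetical two-level date hierarchy (day, then month, taken as the `YYYY-MM` prefix). Roll-up sums the lower level into the higher one, drill-down returns the detail rows behind one higher-level member, and pivot only swaps which dimension forms the rows.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Facts at the lowest level of a hypothetical date hierarchy (day -> month).
daily: Dict[Tuple[str, str], float] = {
    ("2010-03-01", "U1"): 10.0,
    ("2010-03-02", "U1"): 12.0,
    ("2010-04-01", "U1"): 9.0,
    ("2010-03-01", "U2"): 5.0,
}


def roll_up_to_month(facts: Dict[Tuple[str, str], float]) -> Dict[Tuple[str, str], float]:
    """Roll-up / drill up: sum the day level into the month level ("YYYY-MM")."""
    monthly: Dict[Tuple[str, str], float] = defaultdict(float)
    for (day, unit), value in facts.items():
        monthly[(day[:7], unit)] += value
    return dict(monthly)


def drill_down(facts: Dict[Tuple[str, str], float], month: str) -> Dict[Tuple[str, str], float]:
    """Drill down: return the day-level cells that make up one month."""
    return {k: v for k, v in facts.items() if k[0].startswith(month)}


def pivot(facts: Dict[Tuple[str, str], float]) -> Dict[str, Dict[str, float]]:
    """Pivot: rotate so that units become rows and periods become columns."""
    table: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (period, unit), value in facts.items():
        table[unit][period] = value
    return dict(table)


monthly = roll_up_to_month(daily)
print(monthly[("2010-03", "U1")])  # 22.0
print(pivot(monthly)["U1"])        # {'2010-03': 22.0, '2010-04': 9.0}
```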
OLAP Cubes - refresh methods
• Incremental:
   o possible when cubes grow
     "outwards", i.e. no "scattered"
     changes in data
   o only delta data need to be read
   o refresh may be fast if delta is small
• Full:
   o possible for all cubes, even when
     changes are "scattered" all over
     the data
   o all data need to be re-read with
     every refresh
   o refresh may take a long time (hours)
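The difference between the two refresh methods can be shown with a toy aggregate. This sketch assumes, hypothetically, that source rows carry an increasing `row_id` used as the incremental watermark. New rows (growth "outwards") are picked up incrementally, but a later manual edit of an old row is only reflected by a full refresh.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, float]  # (row_id, date, value); a hypothetical source layout


def full_refresh(source: List[Row]) -> Dict[str, float]:
    """Full refresh: re-read every source row and rebuild all aggregates."""
    totals: Dict[str, float] = {}
    for _, date, value in source:
        totals[date] = totals.get(date, 0.0) + value
    return totals


def incremental_refresh(totals: Dict[str, float], source: List[Row],
                        last_id: int) -> Tuple[Dict[str, float], int]:
    """Incremental refresh: read only the rows added after the last processed id.

    Correct only while the data grows "outwards"; edits to old rows are missed.
    """
    new_totals = dict(totals)
    new_last_id = last_id
    for row_id, date, value in source:
        if row_id > last_id:
            new_totals[date] = new_totals.get(date, 0.0) + value
            new_last_id = max(new_last_id, row_id)
    return new_totals, new_last_id


source = [(1, "2010-03-01", 10.0), (2, "2010-03-02", 20.0)]
cube, watermark = incremental_refresh({}, source, 0)

source.append((3, "2010-03-03", 5.0))   # growth "outwards": handled incrementally
source[0] = (1, "2010-03-01", 12.0)     # "scattered" manual adjustment of an old row
cube, watermark = incremental_refresh(cube, source, watermark)

print(cube["2010-03-01"], full_refresh(source)["2010-03-01"])  # 10.0 12.0
```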
The situation at hand

• Business operating on 24*6 basis (Sun-Fri)
• Events from production systems are aggregated into flows
  and production units
• Production figures may be adjusted manually long after
  production date
• Daily production figures are the basis for daily forecasts,
  using the simplified formula:
  forecast(yearX) = production(yearX-1) * trend(yearX) + manualFcastAdjustm
• Adjustments in production figures will alter forecast
  figures
• Outcome and forecast should be stored in MS OLAP cubes
  as per software architecture demands
• The system should simplify comparisons between forecast
  and outcome figures
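The simplified forecast formula and the forecast/outcome comparison can be sketched as below. All figures are made up; `trend` is assumed to be a single factor for yearX, and the `adjustment` mapping stands in for the manual forecast adjustment term. The example shows why a late manual correction of production changes the forecast.

```python
from typing import Dict


def forecast(production_prev_year: float, trend: float,
             manual_adjustment: float = 0.0) -> float:
    """forecast(yearX) = production(yearX-1) * trend(yearX) + manual adjustment."""
    return production_prev_year * trend + manual_adjustment


# Hypothetical daily production for the previous year, keyed by day of year.
production_prev: Dict[str, float] = {"03-01": 100.0, "03-02": 80.0}
trend = 1.5                       # assumed trend factor for yearX
adjustment = {"03-01": 10.0}      # stands in for the manual forecast adjustment


def daily_forecasts() -> Dict[str, float]:
    return {d: forecast(p, trend, adjustment.get(d, 0.0))
            for d, p in production_prev.items()}


before = daily_forecasts()
production_prev["03-01"] = 120.0  # production figure adjusted long after the fact
after = daily_forecasts()

outcome = {"03-01": 175.0, "03-02": 120.0}
deviation = {d: outcome[d] - after[d] for d in after}
print(before["03-01"], after["03-01"])  # 160.0 190.0
print(deviation)                        # {'03-01': -15.0, '03-02': 0.0}
```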
Software

• Source of data:
   o Relational database
   o Oracle 10g database
   o extensive use of PL/SQL in database
• Destination of data:
   o OLAP cubes - MS SQL Server Analysis Services (version
     2005 and 2008)
• Other software:
   o MS SQL Server database
QUESTION

Can we get almost real-time reports from MS OLAP cubes?




ANSWER

YES! The answer lies in "cube partitioning".
Cube partitioning - the basics

• Cube partitions may be updated independently
• Cube partitions must not overlap (otherwise duplicate
  values may occur)
• Time is a good dimension to partition on
MS OLAP cube partitioning - details

• Every cube partition has its own query to define the data
  set fetched from the data source
• The SQL statements define the non-overlapping data sets
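One way to keep per-partition queries non-overlapping is to derive them from a single sorted list of date boundaries and use half-open ranges. This is a sketch, not the deck's actual queries: the table `production_facts` and column `prod_date` are hypothetical names, and the `DATE 'YYYY-MM-DD'` literal is the ANSI form that Oracle accepts.

```python
from datetime import date
from typing import List

# Hypothetical source table and date column; the real names are not in the slides.
TABLE, DATE_COL = "production_facts", "prod_date"


def partition_queries(boundaries: List[date]) -> List[str]:
    """Build one query per partition using half-open ranges [start, end).

    Adjacent partitions share a boundary value but never a row, so the data
    sets cannot overlap and no day falls into a gap between them.
    """
    if boundaries != sorted(set(boundaries)):
        raise ValueError("boundaries must be strictly increasing")
    queries = []
    for start, end in zip(boundaries, boundaries[1:]):
        queries.append(
            f"SELECT * FROM {TABLE} "
            f"WHERE {DATE_COL} >= DATE '{start.isoformat()}' "
            f"AND {DATE_COL} < DATE '{end.isoformat()}'"
        )
    return queries


for q in partition_queries([date(2009, 1, 1), date(2010, 1, 1), date(2010, 3, 6)]):
    print(q)
```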
How to partition? - theory

• Partitions with different lengths and different update
  frequencies:
   o current data = very small partition, very short update
     times, updated often
   o "not very current" data = a bit larger partition, longer
     update times, updated less often
   o historical data = large partition, long update times,
     updated seldom
• The 24x6 operation provides a weekly window for the
  "seldom" updates
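The tiered idea can be expressed as a small table of partitions, each with its own refresh interval. The three-minute interval for "now" follows the deck's later note of updates every 2nd or 3rd minute and the weekly interval follows the 24x6 window; the hourly interval for the middle tier is purely an assumption for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PartitionTier:
    name: str
    refresh_every: timedelta
    last_processed: Optional[datetime] = None

    def is_due(self, now: datetime) -> bool:
        """A tier is due if it was never processed or its interval has elapsed."""
        return self.last_processed is None or now - self.last_processed >= self.refresh_every


# Small partitions are refreshed often, large ones seldom (middle interval assumed).
tiers = [
    PartitionTier("now", timedelta(minutes=3)),
    PartitionTier("recent", timedelta(hours=1)),
    PartitionTier("history", timedelta(days=7)),
]

now = datetime(2010, 3, 3, 12, 0)
for t in tiers:
    t.last_processed = now - timedelta(minutes=30)

print([t.name for t in tiers if t.is_due(now)])  # ['now']
```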
How to partition? - theory cont'd

• One cube for both forecast and outcome
Solution - approach one

Decisions:
 • Cubes partitioned on date boundaries
 • MOLAP cubes (for better query performance)
 • Use SSIS to populate cubes
    o dimensions populated by incremental processing
    o facts populated by full processing
    o jobs for historical data must be run after midnight to
      compensate for date change

Actions:
 • Cubes built
 • SSIS packages deployed inside SQL Server (not to the file system)
 • SSIS set up as scheduled database jobs
Did it work?


No!
Malfunctions:
• Simultaneous updates of cube partitions could lead to
  deadlocks
• Deadlocks left cube partitions in unprocessed state

Amendment:
 • Cube partitions must not be updated simultaneously
Solution - approach two

Decisions:
 • Cube processing must be ONE partition at a time
 • Scheduling done by SSIS "super package":
    o a SQL Server table contains the package names and
      their approximate run frequencies
    o "super package" executes SSIS packages as indicated by
      the table

Actions:
 • Scheduling table created
 • "Super package" created to be self-modifying
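The "super package" approach can be approximated as a loop that consults a scheduling table and runs at most one package at a time. This is a hypothetical Python model, not SSIS: the package names, the `priority` column and minute-based timestamps are assumptions, and "self-modifying" is modelled simply as writing the last run time back into the table.

```python
import threading
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ScheduleRow:
    """One row of a hypothetical scheduling table."""
    package: str
    every_minutes: int               # approximate run frequency
    priority: int                    # higher runs first when several are due
    last_run: Optional[int] = None   # minutes since an arbitrary epoch


class SuperPackage:
    """Runs at most one package (i.e. one partition) at a time."""

    def __init__(self, table: List[ScheduleRow], run: Callable[[str], None]):
        self.table = table
        self.run = run
        self._lock = threading.Lock()  # prevents simultaneous partition processing

    def due(self, now: int) -> List[ScheduleRow]:
        rows = [r for r in self.table
                if r.last_run is None or now - r.last_run >= r.every_minutes]
        return sorted(rows, key=lambda r: -r.priority)

    def tick(self, now: int) -> Optional[str]:
        """Run the single most urgent due package, then record the run in the table."""
        with self._lock:
            candidates = self.due(now)
            if not candidates:
                return None
            row = candidates[0]
            self.run(row.package)
            row.last_run = now
            return row.package


log: List[str] = []
sp = SuperPackage([
    ScheduleRow("process_now_partition", 3, priority=10),
    ScheduleRow("process_history_partition", 7 * 24 * 60, priority=1),
], log.append)

for minute in range(7):
    sp.tick(minute)
print(log)
```

Serialising everything through one lock trades throughput for safety: it removes the deadlocks seen in approach one, but a long-running package delays everything queued behind it.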
Did it work?


Not really!
Malfunctions:
• Historical data had to be updated after midnight, and real-time
  updates for the "Now" partition were postponed meanwhile. This
  was done to avoid "gaps" in outcome data and "overlaps" in
  forecast data.
• Real-time updates stopped soon after midnight and were
  resumed a few hours later. (That was NOT acceptable.)

Amendment:
 • Re-think!
Solution - approach three

Decisions:
 • Take advantage of the 24*6 cycle (as opposed to 24*7)
 • Switch dates on Saturdays only
    o the "Now" partition had to stretch from Saturday to
      Saturday
    o all other partitions had to stretch from a Saturday to
      another Saturday
 • Re-process all time-consuming partitions on Saturday after
   the date switch
Solution - approach three cont'd

Actions:
 • Create logic in the Oracle database to do date calculations
   "modulo week", i.e. anchored on Saturday. The logic is
   implemented as a function.
 • Rewrite the SQL statements for cube partitions so that they
   use the Oracle function (as above) instead of the current
   date +/- a given number of days.
 • Reschedule the time-consuming updates so that they run every
   7th day.
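The "modulo week" calculation can be sketched in Python; the deck's actual Oracle function is not shown, so this only mirrors the idea. Every boundary is derived from the most recent Saturday, so partition ranges stay constant from Saturday to Friday and move only on the Saturday switch. The `recent_weeks` width of the middle partition is a hypothetical parameter.

```python
from datetime import date, timedelta
from typing import Dict, Tuple

SATURDAY = 5  # date.weekday(): Monday == 0 ... Sunday == 6


def week_start(d: date) -> date:
    """Most recent Saturday on or before d (the "modulo week" anchor)."""
    return d - timedelta(days=(d.weekday() - SATURDAY) % 7)


def partition_ranges(today: date, recent_weeks: int = 4) -> Dict[str, Tuple[date, date]]:
    """Half-open [start, end) ranges that only change when a new week starts."""
    now_start = week_start(today)
    recent_start = now_start - timedelta(weeks=recent_weeks)
    return {
        "history": (date.min, recent_start),
        "recent": (recent_start, now_start),
        "now": (now_start, now_start + timedelta(weeks=1)),
    }


print(partition_ranges(date(2010, 3, 5))["now"])
# (datetime.date(2010, 2, 27), datetime.date(2010, 3, 6))
```

Because a midnight on Monday to Friday no longer moves any boundary, the "Now" partition can keep receiving quasi real-time updates through the night, and only the Saturday run has to reprocess the large partitions.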
Did it work?


Yes!
Malfunctions:
• None, really.
Lessons learned

• It is possible to build real-time OLAP cubes in MS
  technology
• It is possible to make the partitions self-maintaining in
  terms of partition boundaries
• The concept needs careful engineering, as there are
  pitfalls along the way.
Omitted details

Some details have been omitted:
 • the quasi real-time updates are scheduled to occur every
   2nd or 3rd minute
 • scheduling is not exact, as the Super-job keeps track of
   what is to be run and when and executes SSIS packages
   based on "scheduled-to-run" state, their priority and a few
   other criteria
 • the source of data is not a proper star schema, it is rather
   an emulation of facts and dimensions by means of data
   tables and views in Oracle.
