BIUG

Itay Braun
CTO
itay@twingo.co.il
Agenda

• Dimension Design
• SSAS Best Practices
• MDX

• Inspired by Vincent Rainardi (http://sqlbits.com/) and Mosha Pasumansky
[Company overview slides (text not recoverable); surviving keywords: BI, DB, DWH, MicroStrategy, SQL Server, Sybase, OEM, Web, Silverlight, .NET]
White Papers
• Analysis Services 2008 R2 Performance Guide
• Analysis Services 2008 Operation Guide
• Performance Improvements for MDX in SQL Server 2008 Analysis Services
• OLAP Design Best Practices
1 or 2 dimensions
a) One Dimension
   [Diagram: Fact Table → Dim Account (which also holds the customer attributes)]
   • Simplicity: 1 dim
   • Hierarchy built from customer attributes and account attributes
   • Use when we don't have fact tables requiring customer grain.
b) Two Dimensions
   [Diagram: Fact Table → Dim Account; Fact Table → Dim Customer]
   • We can get the customer attributes without knowing the account key.
   • Disadvantage: we can't go from account to customer without going through the fact table (performance).
1 or 2 dimensions
c) Snowflake
   [Diagram: Fact Table → Dim Account → Dim Customer]
   • Dim Customer is needed by another fact table.
   • Modular: 2 separate dim tables, but we can easily combine them to create a bigger dimension.
   • Getting the breakdown of a measure by a customer attribute is a bit more complicated than in a):

 select c.attribute, sum(f.measure1) from fact1 f
 inner join dim_account a on f.account_key = a.account_key
 inner join dim_customer c on a.customer_key = c.customer_key
 group by c.attribute
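A minimal runnable sketch of the snowflake query above, using Python's built-in sqlite3 module with an in-memory database as a stand-in for the DW (an assumption; the slides target SQL Server). The table and column names follow the slide; the sample rows are invented.

```python
import sqlite3

# In-memory SQLite database; table and column names follow the slide's query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, attribute TEXT);
CREATE TABLE dim_account  (account_key  INTEGER PRIMARY KEY, customer_key INTEGER);
CREATE TABLE fact1        (account_key  INTEGER, measure1 REAL);
""")

# Hypothetical sample rows: two customers, three accounts.
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", [(1, "Retail"), (2, "Corporate")])
conn.executemany("INSERT INTO dim_account VALUES (?, ?)", [(10, 1), (11, 1), (20, 2)])
conn.executemany("INSERT INTO fact1 VALUES (?, ?)", [(10, 5.0), (11, 7.0), (20, 100.0), (20, 1.0)])

# Snowflake: the fact table reaches the customer only through dim_account.
rows = conn.execute("""
SELECT c.attribute, SUM(f.measure1)
FROM fact1 f
INNER JOIN dim_account  a ON f.account_key  = a.account_key
INNER JOIN dim_customer c ON a.customer_key = c.customer_key
GROUP BY c.attribute
ORDER BY c.attribute
""").fetchall()
print(rows)  # [('Corporate', 101.0), ('Retail', 12.0)]
```

The extra hop through dim_account is exactly the "bit more complicated" part: option a) would need only one join.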
When to Snowflake
1. When the sub dim is used by several dims
   City-Country-Region columns exist in DimBroker, DimPolicy, DimOffice and DimInsured.
   They are replaced by a Location/Geo key pointing to DimLocation / DimGeography.
 Advantage: a consistent hierarchy, i.e. the same relationship between City, Country & Region everywhere.
 Weakness: we lose flexibility. City-to-Country relationships are more or less fixed, but the grouping of countries might differ between dimensions.
When to Snowflake
2. When the sub dim is used by both the main dim and the fact table(s)
   • DimCustomer is used in DimAccount, and is also used in the fact table.
   • DimManufacturer is used in DimProduct, and is also used in the fact table.
   • DimProductGroup is used in DimProduct, and is also used in some fact tables.

   The alternative is maintaining two full dimensions (classic star).
When to Snowflake
3. To make a “base dim” and “detail dims”
Examples: insurance classes, account types (banking), product lines, diagnosis and treatment (health care).
Policies for the marine, aviation & property classes have different attributes.
Pull the common attributes into 1 dim: DimBasePolicy.
Put the class-specific attributes into DimMarine, DimProperty, DimAviation.
Ref: Kimball, The Data Warehouse Toolkit, 2nd edition, page 213
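A sketch of the base/detail split, again using Python's sqlite3 with an in-memory database. DimBasePolicy, DimMarine and DimAviation come from the slide; the columns VesselType and AircraftType and all rows are hypothetical, and DimProperty is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base dim: attributes common to every policy class.
CREATE TABLE DimBasePolicy (PolicyKey INTEGER PRIMARY KEY, PolicyNo TEXT, Class TEXT);
-- Detail dims: class-specific attributes (VesselType/AircraftType are hypothetical).
CREATE TABLE DimMarine   (PolicyKey INTEGER PRIMARY KEY, VesselType TEXT);
CREATE TABLE DimAviation (PolicyKey INTEGER PRIMARY KEY, AircraftType TEXT);

INSERT INTO DimBasePolicy VALUES (1, 'P-001', 'Marine'), (2, 'P-002', 'Aviation'), (3, 'P-003', 'Property');
INSERT INTO DimMarine   VALUES (1, 'Tanker');
INSERT INTO DimAviation VALUES (2, 'Jet');
""")

# Cross-class analysis needs only the base dim.
by_class = conn.execute(
    "SELECT Class, COUNT(*) FROM DimBasePolicy GROUP BY Class ORDER BY Class"
).fetchall()

# Class-specific analysis joins the base dim to one detail dim on the shared key.
marine = conn.execute("""
SELECT b.PolicyNo, m.VesselType
FROM DimBasePolicy b
JOIN DimMarine m ON m.PolicyKey = b.PolicyKey
""").fetchall()

print(by_class)  # [('Aviation', 1), ('Marine', 1), ('Property', 1)]
print(marine)    # [('P-001', 'Tanker')]
```

Sharing one PolicyKey between the base dim and each detail dim keeps the detail tables narrow: a marine policy has no empty aviation columns.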
A dimension with only 1 attribute

             Should we put the attribute in the fact table
             (like a DD, degenerate dimension)?
             Probably, if its grain equals the fact table grain
             and the value is short or a number.
Reasons for putting a single attribute in its own dim:
– Keep the fact table slim (a 4-byte int instead of a 100-byte varchar)
– When the value changes, we don't have to update the BIG fact table (ETL performance)
– Its grain is much lower than the fact table's, so the dim stays small
– Yes, it's only 1 attribute today, but in the future there could be another attribute.
Fact Table Primary Key
Should we have a PK? (Some experts strongly disagree.)

Yes, if we need to be able to identify each fact row:
1. We need to refer to a fact row from another fact row, e.g. a chain of events
2. There are many identical fact rows and we need to update/delete only one of them
3. We need to link the fact table to another fact table

[Diagram: Related transactions: PK, plus an FK (not enforced) to the previous/next transaction · Header-Detail: PK on the header, FK (no RI) on the detail · Uniqueness: PK]
Fact Table Primary Key
Single or multi-column?
  Single column: generated identity
  Multi-column: dimension keys
A single-column PK is better than a multi-column PK because:
1) The combination of dimension keys may not be unique. A single-column PK is guaranteed to be unique because it is an identity column.
2) A single-column PK is slimmer than a multi-column PK, which gives better query performance. To do a self join in the fact table (e.g. to link the current fact row to the previous fact row), we join on a single integer column.
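A sketch of the self join on a single-column PK, using Python's sqlite3. In SQLite an INTEGER PRIMARY KEY column is assigned automatically, which here stands in for a SQL Server identity column; the FactEvent table, its PrevFactKey column and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE FactEvent (
    FactKey     INTEGER PRIMARY KEY,  -- single-column surrogate PK, auto-assigned like an identity
    AccountKey  INTEGER,
    DateKey     INTEGER,
    Amount      REAL,
    PrevFactKey INTEGER               -- hypothetical link to the previous event in the chain
)""")
conn.executemany(
    "INSERT INTO FactEvent (AccountKey, DateKey, Amount, PrevFactKey) VALUES (?, ?, ?, ?)",
    [(7, 20110301, 100.0, None),   # gets FactKey 1
     (7, 20110305, 150.0, 1),      # gets FactKey 2
     (7, 20110309, 120.0, 2)])     # gets FactKey 3

# The self join matches one integer column instead of several dimension keys.
deltas = conn.execute("""
SELECT cur.FactKey, cur.Amount - prev.Amount
FROM FactEvent cur
JOIN FactEvent prev ON prev.FactKey = cur.PrevFactKey
ORDER BY cur.FactKey
""").fetchall()
print(deltas)  # [(2, 50.0), (3, -30.0)]
```

With a composite key the same join would have to repeat every dimension key column in the ON clause, and could still match more than one row.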
Fact Table Primary Key
• Advantages: prevents duplicate rows; query performance
• Disadvantage: loading performance
• Indexing the PK: clustered or not?
    – Cluster the PK if the PK is an identity column.
    – Don't cluster the PK if the PK is composite, or when you need the clustered index for query performance (with partitioning).

Example of not having a PK
 When duplicate fact rows are allowed,
 e.g. a retail DW with Store Key, Date Key, Product Key, Customer Key:
 the same customer buys the same milk in the same shop twice on the same day.
Aggregate Fact Tables
What are they?
• High-level aggregations of base fact tables
• A “select ... group by” query on a 2-billion-row fact table can take 30 minutes if it joins with two big fact tables, even with indexes in place.
• So we run this query in advance, as part of the DW load, and store the result as an Aggregate Fact Table.
• The report then takes only 1 second to run.

[Diagram: Base Fact Tables → (30 mins) → Aggregate Fact Table → (1 sec) → Report]
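A minimal sketch of building an aggregate fact table during the load and reporting from it, using Python's sqlite3. The FactSales and AggSalesByProduct tables and their data are hypothetical, and at this toy size there is no 30-minute versus 1-second difference to observe; the point is only where the GROUP BY runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FactSales (DateKey INTEGER, ProductKey INTEGER, StoreKey INTEGER, Amount REAL);
INSERT INTO FactSales VALUES
  (20110301, 1, 1, 10.0), (20110301, 1, 2, 5.0),
  (20110302, 1, 1, 7.0),  (20110302, 2, 1, 3.0);
""")

# Run once during the DW load: pre-compute the GROUP BY and persist the result.
conn.executescript("""
CREATE TABLE AggSalesByProduct AS
SELECT ProductKey, SUM(Amount) AS Amount, COUNT(*) AS TxnCount
FROM FactSales
GROUP BY ProductKey;
""")

# At report time, read the small aggregate table instead of scanning the base fact table.
report = conn.execute(
    "SELECT ProductKey, Amount, TxnCount FROM AggSalesByProduct ORDER BY ProductKey"
).fetchall()
print(report)  # [(1, 22.0, 3), (2, 3.0, 1)]
```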
Rapidly Changing Dimension
• Why is it a problem?
   – Large SCD2 dim whose attributes change every day
   – Slow queries when joined with large fact tables
• What to do
   – Put the changing attributes into a separate dim, linked directly to the fact table.
   – Store only the latest values, as type 1 attributes (or dual type 1/type 2).
   – Store them in the fact table (for a small attribute, e.g. an indicator).

[Diagram labels: Type2, Type2, Type2 / Type2, Type1]
Very Large Dimension
Why is it a problem?
  – SSAS: 4 GB string store limit for a dimension
  – SSAS: processing runs a “select distinct” for each attribute, so processing takes a long time
  – High-cardinality attributes are difficult to browse
  – Joins with fact tables perform poorly
Very Large Dimension
What to do
– Split into 2 dims with the same grain. Always cut vertically (by columns).
– Remove SCD2, or at least keep it only on certain columns.
– Most common: separate out the attributes with high cardinality or high change frequency.

[Diagram: VLD (very large dimension) split vertically]
Real Time Fact Table
•   Reporting on the transaction system in real time
•   A view that unions it with the normal fact table, or partitions
•   Freeze the dims for key lookup; use the -3 unknown key
•   Correct the keys the next day

[Diagram: Dims as of yesterday → dim key lookup → Main partition (up to last night) + Real time partition (intraday today)]

Unknown keys:
-1  null in source
-2  not in dim table
-3  not in dim table because the dim was frozen; to be resolved in the next batch
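A sketch of the key-lookup rule for the real-time partition, in Python. The -1/-2/-3 codes are the slide's; the function names, the dict-based dimension and the customer codes are hypothetical.

```python
from typing import Dict, List, Optional

# Unknown-member keys as listed on the slide.
NULL_IN_SOURCE = -1
NOT_IN_DIM = -2
DIM_FROZEN = -3


def lookup_key(natural_key: Optional[str], dim: Dict[str, int], realtime: bool) -> int:
    """Map a source natural key to a surrogate key, or to one of the unknown keys."""
    if natural_key is None:
        return NULL_IN_SOURCE
    if natural_key in dim:
        return dim[natural_key]
    # Intraday, the dim is frozen as of yesterday, so a new member may simply not be loaded yet.
    return DIM_FROZEN if realtime else NOT_IN_DIM


def resolve_frozen(fact_rows: List[dict], refreshed_dim: Dict[str, int]) -> None:
    """Next batch: re-resolve, in place, the rows that were loaded with the -3 key."""
    for row in fact_rows:
        if row["dim_key"] == DIM_FROZEN:
            row["dim_key"] = refreshed_dim.get(row["natural_key"], NOT_IN_DIM)


# Usage with hypothetical customer codes.
dim_as_of_yesterday = {"C001": 1, "C002": 2}
intraday = [{"natural_key": nk, "dim_key": lookup_key(nk, dim_as_of_yesterday, realtime=True)}
            for nk in ["C001", None, "C003"]]
print([r["dim_key"] for r in intraday])  # [1, -1, -3]

dim_after_batch = {"C001": 1, "C002": 2, "C003": 3}
resolve_frozen(intraday, dim_after_batch)
print([r["dim_key"] for r in intraday])  # [1, -1, 3]
```

Keeping -3 separate from -2 is what makes the next-day correction cheap: only the -3 rows need another lookup.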
Dealing with Currency Rates
What for / background / requirements
– Report in 3 reporting currencies, using today's rates or past rates
– Analyse over time without the impact of currency rates (using fixed currency rates, e.g. 2010 EOY rates)
– What-if: as if the transactions had happened today
– Historical analysis of currency rates

[Diagram: Transaction Currency (100 countries, 40 currencies) → Transaction Rates (many transaction dates) → DW Currency (1 currency, e.g. GBP) → Reporting Rates (1 reporting date) → Reporting Currency (3-4 currencies: GBP, USD, EUR, Original)]
Dealing with Currency Rates
• A good example can be found here.
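A sketch of the two-step conversion in the diagram (transaction currency to DW currency at the transaction-date rate, then DW currency to a reporting currency at a single reporting-date rate), in Python with Decimal arithmetic. GBP as the DW currency follows the slide's example; the rate tables, dates and function names are hypothetical, and the rates are invented, not market data.

```python
from decimal import Decimal, ROUND_HALF_UP

DW_CURRENCY = "GBP"

# Hypothetical transaction rates: GBP per 1 unit of the transaction currency, by transaction date.
TRANSACTION_RATES = {
    ("EUR", "2011-03-01"): Decimal("0.85"),
    ("USD", "2011-03-01"): Decimal("0.62"),
}
# Hypothetical reporting rates: units of the reporting currency per 1 GBP on one reporting
# date (e.g. fixed 2010 year-end rates).
REPORTING_RATES = {
    ("GBP", "2010-12-31"): Decimal("1"),
    ("USD", "2010-12-31"): Decimal("1.56"),
    ("EUR", "2010-12-31"): Decimal("1.17"),
}


def to_dw_currency(amount: Decimal, currency: str, txn_date: str) -> Decimal:
    """Step 1: transaction currency -> DW currency, at the rate of the transaction date."""
    if currency == DW_CURRENCY:
        return amount
    return amount * TRANSACTION_RATES[(currency, txn_date)]


def to_reporting_currency(dw_amount: Decimal, reporting_currency: str, reporting_date: str) -> Decimal:
    """Step 2: DW currency -> reporting currency, at the rate of a single reporting date."""
    value = dw_amount * REPORTING_RATES[(reporting_currency, reporting_date)]
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


gbp = to_dw_currency(Decimal("100"), "EUR", "2011-03-01")
print(gbp)                                              # 85.00
print(to_reporting_currency(gbp, "USD", "2010-12-31"))  # 132.60
```

Because step 2 uses one reporting date for every transaction, changing that date (today, or a fixed year end) changes all reported values consistently, which is what the "without the impact of currency rates" requirement needs.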
Dealing with Status
What / background
  – Workflow (policies, contracts, documents)
  – Bottleneck analysis (number of days between stages)
  – How many are at each stage

[Diagram: Status 1 (date1) → Status 2 (date2) → Status 4 (date3) → Status 6 (date4), with Status 3 and Status 5 shown off the main path]
Dealing with Status
Approaches
– Accumulating snapshot fact, 1 row per application
– SCD2 on DimApp
– App status fact table

App status fact table:
 AppKey   StsKey   StsDateKey
 1        1        61
 1        2        63
 1        3        67
 2        1        66
 2        2        67

SCD2 on DimApp:
 AppKey  AppID  StsKey  StsDate  Current
 1       1      1       1/3/11   N
 2       1      2       3/3/11   N
 3       1      3       7/3/11   Y
 4       2      1       6/3/11   N
 5       2      2       7/3/11   Y

Accumulating snapshot fact:
 AppKey  Sts1Date  Sts1Ind  Sts2Date  Sts2Ind  Sts3Date  Sts3Ind
 1       1/3/11    1        3/3/11    1        7/3/11    1
 2       6/3/11    1        7/3/11    1                  0
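A sketch of deriving the accumulating snapshot, the bottleneck metric and the stage counts from the status fact table above, in Python. It reads the slide's dates as day/month/year and maps each StsDateKey to the matching SCD2 date (both assumptions); the function names are hypothetical.

```python
from collections import Counter
from datetime import date

# The "App status fact table" rows above as (AppKey, StsKey, status date).
status_fact = [
    (1, 1, date(2011, 3, 1)),
    (1, 2, date(2011, 3, 3)),
    (1, 3, date(2011, 3, 7)),
    (2, 1, date(2011, 3, 6)),
    (2, 2, date(2011, 3, 7)),
]


def accumulating_snapshot(rows, statuses=(1, 2, 3)):
    """Pivot one row per status change into one row per application (Sts<n>Date, Sts<n>Ind)."""
    snapshot = {}
    for app_key, sts_key, sts_date in rows:
        if app_key not in snapshot:
            snapshot[app_key] = {}
            for s in statuses:
                snapshot[app_key][f"Sts{s}Date"] = None
                snapshot[app_key][f"Sts{s}Ind"] = 0
        snapshot[app_key][f"Sts{sts_key}Date"] = sts_date
        snapshot[app_key][f"Sts{sts_key}Ind"] = 1
    return snapshot


def days_between(row, from_sts, to_sts):
    """Bottleneck analysis: days between two stages, or None if a stage is not reached yet."""
    start, end = row[f"Sts{from_sts}Date"], row[f"Sts{to_sts}Date"]
    return None if start is None or end is None else (end - start).days


def stage_counts(rows):
    """How many applications are at each stage (latest status; ties go to the higher StsKey)."""
    latest = {}
    for app_key, sts_key, sts_date in rows:
        if app_key not in latest or (sts_date, sts_key) > latest[app_key]:
            latest[app_key] = (sts_date, sts_key)
    return Counter(sts for _, sts in latest.values())


snap = accumulating_snapshot(status_fact)
print(days_between(snap[1], 1, 2), days_between(snap[1], 2, 3))  # 2 4
print(snap[2]["Sts3Ind"], days_between(snap[2], 2, 3))          # 0 None
print(dict(stage_counts(status_fact)))                          # {3: 1, 2: 1}
```

The snapshot rows match the slide's third table; the status fact table stays the source, and the snapshot is rebuilt from it.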
Referenced Dimensions
• Enable using one “master” member
• Not the same as a snowflake dimension
  – For example:
    • Dim Customers: UK, London, Roman Avramovich
    • Dim Stores: UK, London, Friendly Bikes Store
  – What is the total revenue from Internet customers and stores in London?
MDX Optimization Methodology
•   Rewrite the MDX code
•   Add Aggregations
•   Add pre-calculated Measure Groups (ETL)
•   Solve the problem using the Relational Engine
•   Use .NET Stored Procedures
     – Only rarely can the problem be solved with better hardware.
• Column-based databases
• Optimizing MDX
  – Baselining query speeds
    • Clearing the Analysis Services caches
    • Clearing the operating system caches using fsutil.exe or an SSAS stored procedure (CodePlex)
    • Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services
    • Configuring the Analysis Services Query Log
• Cell-by-Cell Mode vs. Subspace Mode
Almost always, the performance obtained by using subspace (or block computation) mode is superior to that obtained by using cell-by-cell (or naïve) mode.
Using Profiler
[Screenshot annotations: “So far so good”; “Doesn't use the cache”]
Subcube
• Granularity
• Slice
Granularity
• Single grain
  – List of GROUP BY attributes in SQL SELECT
• Mixed grain
  – Both Attribute.[All] and Attribute.MEMBERS
Granularity
[Diagram: granularity grid. Columns: All Country, All City | Countries, All City | Countries, Cities. Rows: All Products | Products]
Slice
• Single member
  – SQL: WHERE City = 'Redmond'
  – MDX: [City].[Redmond]
• Multiple members
  – SQL: WHERE City IN ('Redmond', 'Seattle')
  – MDX: { [City].[Redmond], [City].[Seattle] }
Slice at granularity
SQL
SELECT Sum(Sales), City FROM Sales_Table
WHERE City IN ('Redmond', 'Seattle')
GROUP BY City
MDX
SELECT Measures.Sales ON 0,
NON EMPTY { [City].[Redmond], [City].[Seattle] } ON 1
FROM Sales_Cube
Slice below granularity

SQL
SELECT Sum(Sales) FROM Sales_Table
WHERE City IN ('Redmond', 'Seattle')
MDX
SELECT Measures.Sales ON 0
FROM Sales_Cube
WHERE { [City].[Redmond], [City].[Seattle] }
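To make the SQL side of the two slices concrete, here is a minimal sketch with Python's sqlite3. Sales_Table follows the slides; the Year column and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales_Table (City TEXT, Year INTEGER, Sales REAL);
INSERT INTO Sales_Table VALUES
  ('Redmond', 2005, 10), ('Redmond', 2006, 20),
  ('Seattle', 2005, 5),  ('London', 2005, 100);
""")

# Slice at granularity: City is both filtered and grouped, so there is one row per city.
at_grain = conn.execute("""
SELECT City, SUM(Sales) FROM Sales_Table
WHERE City IN ('Redmond', 'Seattle')
GROUP BY City
ORDER BY City
""").fetchall()

# Slice below granularity: City is filtered but not grouped, so the cities collapse into one row.
below_grain = conn.execute("""
SELECT SUM(Sales) FROM Sales_Table
WHERE City IN ('Redmond', 'Seattle')
""").fetchall()

print(at_grain)     # [('Redmond', 30.0), ('Seattle', 5.0)]
print(below_grain)  # [(35.0,)]
```

In the MDX versions the same distinction shows up as the set being on an axis (at granularity) versus in the WHERE clause (below granularity).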
Examples

             All Years  2005  2006  2007  2008
All Cities       .       .     .     .     .
Redmond          .       .     .     .     .
Seattle          .       .     .     .     .
New York         .       .     .     .     .
London           .       .     .     .     .
Examples

             All Years  2005  2006  2007  2008
All Cities       .       .     .     .     .
Redmond          .       .     .     .     .
Seattle          .       X     X     X     X
New York         .       .     .     .     .
London           .       .     .     .     .
                         (Seattle, Year.Year.MEMBERS)
Examples

             All Years  2005  2006  2007  2008
All Cities       .       .     .     .     .
Redmond          .       .     .     .     .
Seattle          X       X     X     X     X
New York         .       .     .     .     .
London           .       .     .     .     .
                         (Seattle, Year.MEMBERS)
Examples

             All Years  2005  2006  2007  2008
All Cities       .       .     .     .     .
Redmond          X       X     X     X     X
Seattle          X       X     X     X     X
New York         .       .     .     .     .
London           X       X     X     X     X
                 ({Redmond, Seattle, London}, Year.MEMBERS)
Examples

             All Years  2005  2006  2007  2008
All Cities       .       .     .     .     .
Redmond          .       X     X     X     .
Seattle          .       X     X     X     .
New York         .       .     .     .     .
London           .       .     .     .     .
                 ({Redmond, Seattle}, {2005, 2006, 2007})
Arbitrary shaped subcubes
•   What is it?
•   How can it happen?
•   Why is it so bad?
•   How to avoid them?
Arbitrary shaped subcubes

             All Years  2005  2006  2007  2008
All Cities       .       .     .     .     .
Redmond          .       X     X     X     X
Seattle          .       X     .     .     .
New York         .       X     .     .     .
London           .       X     .     .     .
 Union((Redmond, Year.Year.MEMBERS), (City.City.MEMBERS, 2005))
Arbitrary shaped subcubes

             All Years  2005  2006  2007  2008
All Cities       .       .     .     .     .
Redmond          .       X     X     X     X
Seattle          .       X     X     .     X
SF               .       X     X     X     X
Denver           .       X     X     X     X

          CrossJoin(City.City.MEMBERS, Year.Year.MEMBERS) - {(Seattle, 2007)}
Arbitrary shaped subcubes

             All Years  2005  2006  2007  2008
All Cities       .       .     .     .     .
Redmond          .       X     .     .     .
Seattle          .       .     X     .     .
New York         .       .     .     X     .
London           .       .     .     .     X
{(Redmond, 2005), (Seattle, 2006), (New York, 2007), (London, 2008)}
Arbitrary shaped subcubes

             All Years  2005  2006  2007  2008
All Cities       X       X     X     X     X
Redmond          X       .     .     .     .
Seattle          X       .     .     .     .
New York         X       .     .     .     .
London           X       .     .     .     .
   Union(([All Cities], Year.MEMBERS), (City.MEMBERS, [All Years]))
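One way to see what makes a two-attribute set of cells "regular": it is one exactly when it equals the CrossJoin of its own city and year projections. This Python sketch applies that test to several of the examples above. It treats members as plain strings and ignores hierarchies, so it illustrates the shapes only and is not how the Analysis Services engine classifies subcubes; is_crossjoin is a hypothetical helper.

```python
from itertools import product


def is_crossjoin(cells):
    """True if a set of (city, year) cells equals the CrossJoin of its own projections,
    i.e. it can be described by one set of members per attribute."""
    cities = {c for c, _ in cells}
    years = {y for _, y in cells}
    return set(cells) == set(product(cities, years))


YEARS = ["2005", "2006", "2007", "2008"]
CITIES = ["Redmond", "Seattle", "New York", "London"]

# ({Redmond, Seattle}, {2005, 2006, 2007})
regular = set(product(["Redmond", "Seattle"], ["2005", "2006", "2007"]))
# Union((Redmond, Year.Year.MEMBERS), (City.City.MEMBERS, 2005))
union_shape = set(product(["Redmond"], YEARS)) | set(product(CITIES, ["2005"]))
# CrossJoin(City.City.MEMBERS, Year.Year.MEMBERS) - {(Seattle, 2007)}
minus_one = set(product(["Redmond", "Seattle", "SF", "Denver"], YEARS)) - {("Seattle", "2007")}
# The diagonal set of four tuples
diagonal = {("Redmond", "2005"), ("Seattle", "2006"), ("New York", "2007"), ("London", "2008")}

for name, shape in [("regular", regular), ("union", union_shape),
                    ("minus_one", minus_one), ("diagonal", diagonal)]:
    print(name, is_crossjoin(shape))
# regular True; union, minus_one and diagonal False
```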
Arbitrary shapes
• WHERE/Subselect/Aggregate
• Unnatural hierarchies
• Parent-Child (visual totals)
• “Non Leaves” subcube
• Conditional logic (IIF, IF, CASE,
  CoalesceEmpty etc)
• NonEmpty, Exists
WHERE/Subselect
• Severity = '1' OR Priority = '1'
• Multiselect
  – {USA, London}
Mixed grain slicer
All
  USA
    Seattle
    New York
  UK
    London
    Bristol
Mixed grain slicer
All
  USA
    Seattle
    New York
  UK
    London
    Bristol

                All Cities  Seattle  New York  London  Bristol
All Countries
USA
UK
Parent-child
Leaves vs. Non Leaves
[Diagram: granularity grid. Columns: All Country, All City | Countries, All City | Countries, Cities. Rows: All Products | Products (Leaves)]
Problems with arbitrary shapes
•    Caching
•    Partition slices
•    Indexes
•    SCOPEs
•    Matching calculations
•    Many more
(For every topic we discuss, just ask “What will happen with arbitrary shapes?” and I am in trouble.)
SCOPE
SCOPE ( [Date].[Month of Year].[All Periods],
         [Date].[Month Name].[All],
         Except( [DateTool].[Aggregation].Members * [DateTool].[Comparison].Members,
                 { ( [DateTool].[Aggregation].DefaultMember, [DateTool].[Comparison].DefaultMember ) } )
);
    ...;
END SCOPE;
Subcube decomposition
SCOPE ( [Date].[Month of Year].[All Periods],
         [Date].[Month Name].[All],
         Except( [DateTool].[Aggregation].Members * [DateTool].[Comparison].Members,
                 { ( [DateTool].[Aggregation].DefaultMember, [DateTool].[Comparison].DefaultMember ) } )
);
    ...;
END SCOPE;




[Diagram: the Except() region split into three rectangular pieces, labelled Scope 1, Scope 2 and Scope 3]
Subcube decomposition
SCOPE ( [Date].[Month of Year].[All Periods],
         [Date].[Month Name].[All],
         Except( [DateTool].[Aggregation].Members, [DateTool].[Aggregation].DefaultMember ),
         Except( [DateTool].[Comparison].Members, [DateTool].[Comparison].DefaultMember )
);
    ...;
END SCOPE;
SCOPE ( [Date].[Month of Year].[All Periods],
         [Date].[Month Name].[All],
         [DateTool].[Aggregation].DefaultMember,
         Except( [DateTool].[Comparison].Members, [DateTool].[Comparison].DefaultMember )
);
    ...;
END SCOPE;
SCOPE ( [Date].[Month of Year].[All Periods],
         [Date].[Month Name].[All],
         Except( [DateTool].[Aggregation].Members, [DateTool].[Aggregation].DefaultMember ),
         [DateTool].[Comparison].DefaultMember
);
    ...;
END SCOPE;
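A small Python check that the three rectangular SCOPEs above cover exactly the same (Aggregation, Comparison) cells as the single Except() scope, with no overlap. The DateTool member names are hypothetical, and the first member of each list stands in for DefaultMember.

```python
from itertools import product

# Hypothetical DateTool members; the first one in each list plays the DefaultMember role.
aggregation = ["Current", "YTD", "QTD", "MTD"]
comparison = ["Current", "PriorYear", "PriorYearChange"]
agg_default, cmp_default = aggregation[0], comparison[0]

# Original SCOPE: Except(Aggregation * Comparison, {(default, default)}), an arbitrary shape.
original = set(product(aggregation, comparison)) - {(agg_default, cmp_default)}

agg_rest = [a for a in aggregation if a != agg_default]
cmp_rest = [c for c in comparison if c != cmp_default]
part1 = set(product(agg_rest, cmp_rest))         # first SCOPE: both non-default
part2 = set(product([agg_default], cmp_rest))    # second SCOPE: default Aggregation only
part3 = set(product(agg_rest, [cmp_default]))    # third SCOPE: default Comparison only

# Same cells, and each cell is assigned by exactly one of the three SCOPEs.
assert part1 | part2 | part3 == original
assert not (part1 & part2) and not (part1 & part3) and not (part2 & part3)
print(len(original), len(part1), len(part2), len(part3))  # 11 6 2 3
```

Each part is a plain crossjoin of one member set per attribute, which is the regular shape the engine handles well.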
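MDX Optimization - Tips
• Partial expressions are not cached, so in this calculation the expensive expression is computed twice:

 this = iif(<expensive expression> >= 0, 1/<expensive expression>, null);

• Moving it into its own hidden calculated member lets the result be cached and reused:

 create member currentcube.measures.MyPartialExpression as <expensive expression>, visible = 0;
 this = iif(measures.MyPartialExpression >= 0, 1/measures.MyPartialExpression, null);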
Demo
SSAS Denali
• Coming in the first half of 2012
• SSAS Tabular Mode
  – Cheaper
  – Not best of breed
  – Uses DAX or MDX
• Have you started working with it?
Mobile BI
[Slide text not recoverable; surviving keywords: BI, Smart Phone, Mobile BI. Source: Gartner]
Social BI


• Discover New Insights - Analyze the
  demographic and psychographic profiles
  of your Facebook application users.
• Analyze Facebook Data - Analyze the full
  spectrum of Facebook data: profiles,
  interests, check-ins, and more
• Instantly Available via Cloud
Social BI
• Deep Personalization



• Enterprise Data Integration
Survey
• SQL / SSAS Denali
• Mobile BI
• Social BI
Biug 20112026   dimensional modeling and mdx best practices

Contenu connexe

En vedette (7)

Microstrategy Overview (Hebrew)
Microstrategy Overview (Hebrew)Microstrategy Overview (Hebrew)
Microstrategy Overview (Hebrew)
 
Journées SQL Server 2012 - DAX pour les fans de MDX
Journées SQL Server 2012 - DAX pour les fans de MDXJournées SQL Server 2012 - DAX pour les fans de MDX
Journées SQL Server 2012 - DAX pour les fans de MDX
 
Le reporting BI dans tous ses états / quel outil pour quel usage
Le reporting BI dans tous ses états / quel outil pour quel usage Le reporting BI dans tous ses états / quel outil pour quel usage
Le reporting BI dans tous ses états / quel outil pour quel usage
 
Extreme SSAS - Part II
Extreme SSAS - Part IIExtreme SSAS - Part II
Extreme SSAS - Part II
 
Extreme SSAS- SQL 2011
Extreme SSAS- SQL 2011Extreme SSAS- SQL 2011
Extreme SSAS- SQL 2011
 
JSS2014 – Azure ML et Data Mining SSAS
JSS2014 – Azure ML et Data Mining SSASJSS2014 – Azure ML et Data Mining SSAS
JSS2014 – Azure ML et Data Mining SSAS
 
[JSS2015] Nouveautés SSIS SSRS 2016
[JSS2015] Nouveautés SSIS SSRS 2016[JSS2015] Nouveautés SSIS SSRS 2016
[JSS2015] Nouveautés SSIS SSRS 2016
 

Similaire à Biug 20112026 dimensional modeling and mdx best practices

Solutions for Sage Customers from Robert Lavery
Solutions for Sage Customers from Robert LaverySolutions for Sage Customers from Robert Lavery
Solutions for Sage Customers from Robert Lavery
Suzanne Spear
 
Big Data presentation at GITPRO 2013
Big Data presentation at GITPRO 2013Big Data presentation at GITPRO 2013
Big Data presentation at GITPRO 2013
Sameer Wadkar
 
The final frontier
The final frontierThe final frontier
The final frontier
Terry Bunio
 
Building a highly scalable and available cloud application
Building a highly scalable and available cloud applicationBuilding a highly scalable and available cloud application
Building a highly scalable and available cloud application
Noam Sheffer
 
Power View: Analysis and Visualization for Your Application’s Data
Power View: Analysis and Visualization for Your Application’s DataPower View: Analysis and Visualization for Your Application’s Data
Power View: Analysis and Visualization for Your Application’s Data
Andrew Brust
 
BigQuery at AppsFlyer - past, present and future
BigQuery at AppsFlyer - past, present and futureBigQuery at AppsFlyer - past, present and future
BigQuery at AppsFlyer - past, present and future
Nir Rubinstein
 

Similaire à Biug 20112026 dimensional modeling and mdx best practices (20)

Solutions for Sage Customers from Robert Lavery
Solutions for Sage Customers from Robert LaverySolutions for Sage Customers from Robert Lavery
Solutions for Sage Customers from Robert Lavery
 
Advanced dimensional modelling
Advanced dimensional modellingAdvanced dimensional modelling
Advanced dimensional modelling
 
Advanced Dimensional Modelling
Advanced Dimensional ModellingAdvanced Dimensional Modelling
Advanced Dimensional Modelling
 
6910 week 3 - web metircs and tools
6910   week 3 - web metircs and tools6910   week 3 - web metircs and tools
6910 week 3 - web metircs and tools
 
Big Data presentation at GITPRO 2013
Big Data presentation at GITPRO 2013Big Data presentation at GITPRO 2013
Big Data presentation at GITPRO 2013
 
AWS User Group October
AWS User Group OctoberAWS User Group October
AWS User Group October
 
Temporal Snapshot Fact Tables
Temporal Snapshot Fact TablesTemporal Snapshot Fact Tables
Temporal Snapshot Fact Tables
 
Microsoft SQL Server - How to Collaboratively Manage Excel Data
Microsoft SQL Server - How to Collaboratively Manage Excel DataMicrosoft SQL Server - How to Collaboratively Manage Excel Data
Microsoft SQL Server - How to Collaboratively Manage Excel Data
 
The final frontier
The final frontierThe final frontier
The final frontier
 
Building a highly scalable and available cloud application
Building a highly scalable and available cloud applicationBuilding a highly scalable and available cloud application
Building a highly scalable and available cloud application
 
Database Virtualization: The Next Wave of Big Data
Database Virtualization: The Next Wave of Big DataDatabase Virtualization: The Next Wave of Big Data
Database Virtualization: The Next Wave of Big Data
 
Performance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ ApplicationsPerformance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ Applications
 
Svccg nosql 2011_v4
Svccg nosql 2011_v4Svccg nosql 2011_v4
Svccg nosql 2011_v4
 
Everything You Need to Know About Oracle 12c Indexes
Everything You Need to Know About Oracle 12c IndexesEverything You Need to Know About Oracle 12c Indexes
Everything You Need to Know About Oracle 12c Indexes
 
Power View: Analysis and Visualization for Your Application’s Data
Power View: Analysis and Visualization for Your Application’s DataPower View: Analysis and Visualization for Your Application’s Data
Power View: Analysis and Visualization for Your Application’s Data
 
Oracle 12.2 - My Favorite Top 5 New or Improved Features
Oracle 12.2 - My Favorite Top 5 New or Improved FeaturesOracle 12.2 - My Favorite Top 5 New or Improved Features
Oracle 12.2 - My Favorite Top 5 New or Improved Features
 
BigQuery at AppsFlyer - past, present and future
BigQuery at AppsFlyer - past, present and futureBigQuery at AppsFlyer - past, present and future
BigQuery at AppsFlyer - past, present and future
 
Microsoft Dynamics NAV data integration
Microsoft Dynamics NAV data integrationMicrosoft Dynamics NAV data integration
Microsoft Dynamics NAV data integration
 
Convergent Replicated Data Types in Riak 2.0
Convergent Replicated Data Types in Riak 2.0Convergent Replicated Data Types in Riak 2.0
Convergent Replicated Data Types in Riak 2.0
 
Cloud dwh
Cloud dwhCloud dwh
Cloud dwh
 

Dernier

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Dernier (20)

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 

Biug 20112026 dimensional modeling and mdx best practices

  • 2. Agenda l Dimension Design l SSAS Best Practices l MDX l Inspired by Vincent Rainardi (http://sqlbits.com/ ) and Mosha Pumanski
  • 3. l l l BI- DB l l l DB- BI- l
  • 4. Microstrategy- BI • DWH • • BI SQL SERVER- SYBASE- • • DB P T • DB OEM • • WEB SILVERLIGHT C • • .NET
  • 5.
  • 6. White Papers • Analysis Services 2008 R2 Performance Guide • Analysis Services 2008 Operation Guide • Performance Improvements for MDX in SQL Server 2008 Analysis Services • OLAP Design Best Practices
  • 7. 1 or 2 dimensions a) One Dimension b) Two Dimensions Dim Dim Account Account Fact Fact Table Table customer Dim attributes Customer • We can get the customer • Simplicity, 1 dim attributes without knowing the • Hierarchy from customer account key attribute &account attribute • Disadvantage: can‟t go from • Use when we don‟t have fact account to customer without tables requiring customer grain. going through the fact table - performance
  • 8. 1 or 2 dimensions c) Snowflake Dim Dim Account Customer Fact Table • Dim customer is needed by another fact table • Modular: 2 separate dim tables but we can combine them easily to create a bigger dimension • To get the breakdown of a measure by a customer attribute is a bit more complicated than a) select c. attribute, sum(f.measure1) from fact1 f inner join dim_account a on f.account_key = a.account_key inner join dim_customer c on a.customer_key = c.customer_key group by c. attribute
  • 9. When to Snowflake
      1. When the sub dim is used by several dims
      City-Country-Region columns exist in DimBroker, DimPolicy, DimOffice and DimInsured. They are replaced by a Location/GeoKey pointing to DimLocation / DimGeography.
      Advantage: consistent hierarchy, i.e. a consistent relationship between City, Country & Region.
      Weakness: we lose flexibility. The City-to-Country mapping is more or less fixed, but the grouping of countries might differ between dimensions.
  • 10. When to Snowflake
      2. When the sub dim is used by both the main dim and the fact table(s)
      • DimCustomer is used in DimAccount, and also in the fact table.
      • DimManufacturer is used in DimProduct, and also in the fact table.
      • DimProductGroup is used in DimProduct, and also in some fact tables.
      The alternative is maintaining two full dimensions (classic star).
  • 11. When to Snowflake
      3. To make a "base dim" and "detail dims"
      Examples: insurance classes, account types (banking), product lines, diagnosis and treatment (health care).
      Policies for the marine, aviation & property classes have different attributes:
      – Pull the common attributes into 1 dim: DimBasePolicy
      – Put the class-specific attributes into DimMarine, DimProperty, DimAviation
      Ref: Kimball, The Data Warehouse Toolkit, 2nd edition, page 213
  • 12. A dimension with only 1 attribute
      Should we put the attribute in the fact table (like a DD = Degenerate Dim)? Probably, if its grain = the fact table grain and it's short or it's a number.
      Reasons for putting a single attribute in its own dim:
      – Keep the fact table slim (a 4-byte int, not a 100-byte varchar)
      – When the value changes, we don't have to update the BIG fact table
      – ETL performance
      – Grain is much lower than the fact table: small dim
      – Yes, it's only 1 attribute today, but in the future there could be another attribute.
  • 13. Fact Table Primary Key
      Should we have a PK? Some experts totally disagree.
      Yes, if we need to be able to identify each fact row:
      1. Need to refer to a fact row from another fact row, e.g. a chain of events (Related Trans: PK + FK to the previous/next transaction)
      2. Many identical fact rows and we need to update/delete only one (Uniqueness: PK, not enforced)
      3. To link the fact table to another fact table (Header - Detail: PK + FK, no RI)
  • 14. Fact Table Primary Key
      Single or multi column? Single column: generated identity. Multi column: dimension keys.
      A single-column PK is better than a multi-column PK because:
      1) A multi-column PK may not be unique. A single-column PK guarantees uniqueness, because it is an identity column.
      2) A single-column PK is slimmer than a multi-column PK, which gives better query performance. To do a self join in the fact table (e.g. to link the current fact row to the previous fact row), we join on a single integer column.
  • 15. Fact Table Primary Key
      • Advantage: prevents duplicate rows; query performance
      • Disadvantage: loading performance
      • Indexing the PK: clustered or not?
        – Cluster the PK if the PK is an identity column
        – Don't cluster the PK if the PK is a composite, or when you need the clustered index for query performance (with partitioning)
      Example of not having a PK: when duplicate fact rows are allowed, e.g. a retail DW with Store Key, Date Key, Product Key, Customer Key, where the same customer buys the same milk in the same shop on the same day twice.
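Here is a minimal sketch of the single-column PK point, using SQLite through Python's sqlite3 module. An INTEGER PRIMARY KEY column stands in for an identity column. The table name fact_txn and its rows are hypothetical. "Previous row" is simplified to the highest earlier fact_id for the same account.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# fact_id is a single-column surrogate PK; in SQLite an INTEGER PRIMARY KEY
# is auto-assigned increasing values, playing the role of an identity column.
conn.executescript("""
    CREATE TABLE fact_txn (
        fact_id     INTEGER PRIMARY KEY,
        account_key INTEGER,
        date_key    INTEGER,
        amount      REAL
    );
    INSERT INTO fact_txn (account_key, date_key, amount) VALUES
        (1, 20110301, 100), (1, 20110303, 40), (2, 20110303, 70), (1, 20110307, 10);
""")

# Self join: link each fact row to the previous row of the same account.
# The join itself is on the single integer column fact_id.
rows = conn.execute("""
    SELECT cur.fact_id, prev.fact_id AS prev_fact_id
    FROM fact_txn cur
    LEFT JOIN fact_txn prev
      ON prev.fact_id = (SELECT MAX(p.fact_id) FROM fact_txn p
                         WHERE p.account_key = cur.account_key
                           AND p.fact_id < cur.fact_id)
    ORDER BY cur.fact_id
""").fetchall()
print(rows)  # [(1, None), (2, 1), (3, None), (4, 2)]
```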
  • 16. Aggregate Fact Tables
      What are they?
      • High-level aggregations of base fact tables
      • A "select ... group by" query on a 2-billion-row fact table can take 30 mins if it joins with two big fact tables, even with indexes in place
      • So we run this query in advance as part of the DW load and store the result as an Aggregate Fact Table
      • The report then takes only 1 second to run
      [diagram: Base Fact Tables → 30 mins → Aggregate Fact Table → 1 sec → Report]
  • 17. Rapidly Changing Dimension
      • Why is it a problem
        – Large SCD2 dim
        – Attributes change every day
        – Slow queries when joined with large fact tables
      • What to do
        – Put the rapidly changing attributes into a separate dim, linked directly to the fact table
        – Store just the latest values, as type 1 attributes (or dual)
        – Store them in the fact table (for small attributes, e.g. an indicator)
      [diagram: a dimension with four Type 2 attributes and one Type 1 attribute]
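Here is the aggregate fact table idea as a sketch in SQLite via Python's sqlite3. The GROUP BY runs once at load time, and the report reads the small result. The table names fact_sales and agg_sales_by_product and the data are hypothetical. At this size there is no timing difference, so the sketch only shows the structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (date_key INTEGER, product_key INTEGER,
                             store_key INTEGER, amount REAL);
    INSERT INTO fact_sales VALUES
        (20110301, 1, 1, 10), (20110301, 1, 2, 5),
        (20110302, 1, 1, 7),  (20110302, 2, 1, 3);

    -- Run once as part of the DW load: persist the GROUP BY result.
    CREATE TABLE agg_sales_by_product AS
        SELECT product_key, SUM(amount) AS amount, COUNT(*) AS row_count
        FROM fact_sales
        GROUP BY product_key;
""")

# The report reads the small aggregate table instead of the base fact table.
report = conn.execute(
    "SELECT product_key, amount FROM agg_sales_by_product ORDER BY product_key"
).fetchall()
print(report)  # [(1, 22.0), (2, 3.0)]
```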
  • 18. Very Large Dimension Why is it a problem – SSAS: 4 GB string store limit for dimension – SSAS: dim is “select distinct” on each attribute – long processing time – Difficult to browse high cardinality attribute – Join with fact tables – performance
  • 19. Very Large Dimension
      What to do
      – Split into 2 dims with the same grain. Always cut vertically.
      – Remove SCD2, or at least limit it to certain columns.
      – Most common: separate out the attributes with high cardinality/change frequency.
  • 20. Real Time Fact Table
      • Reporting the transaction system in real time
      • A view that unions the real-time data with the normal fact table, or use partitions
      • Freezing the dims for key lookup: -3 unknown key
      • Key corrections next day
      [diagram: dims as of yesterday; main partition (up to last night); real-time partition (intraday, today) looked up against the frozen dims]
      Unknown keys: -1 = null in source; -2 = not in dim table; -3 = not in dim table because the dim was frozen, to be resolved in the next batch
  • 21. Dealing with Currency Rates
      What for / background / requirements:
      – Report in 3 reporting currencies, using today's rates or past rates
      – Analyse over time without the impact of currency rates (using fixed currency rates, e.g. 2010 EOY rates)
      – Had the transactions happened today
      – Currency rates historical analysis
      Flow:
        Transaction (100 countries, 40 currencies)
        → Transaction Currency Rates (many transaction dates)
        → DW (1 currency, e.g. GBP)
        → Reporting Currency Rates (1 reporting date)
        → Reporting Currency (3-4 currencies: GBP, USD, EUR, Original)
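Here is a sketch of the key lookup rules above. The function names lookup_key and resolve_frozen_keys and the dict-based dimension are hypothetical. In a real ETL this would be a lookup against the dimension table. The sketch shows how -1, -2 and -3 are assigned and how the next batch re-keys the -3 rows.

```python
from typing import Optional

# Reserved "unknown" surrogate keys, as listed on the slide.
NULL_IN_SOURCE = -1   # the source value is null
NOT_IN_DIM = -2       # the natural key is not in the dim table
DIM_FROZEN = -3       # not found because the dim was frozen; resolve next batch


def lookup_key(natural_key: Optional[str], dim: dict, realtime: bool) -> int:
    """Hypothetical surrogate-key lookup for a fact row.

    dim maps natural key -> surrogate key. For the real-time partition it is
    the dim as of last night, so a miss may simply be a member created today.
    """
    if natural_key is None:
        return NULL_IN_SOURCE
    if natural_key in dim:
        return dim[natural_key]
    return DIM_FROZEN if realtime else NOT_IN_DIM


def resolve_frozen_keys(rows, dim):
    """Next-day key correction: re-look-up rows loaded with -3 against the
    refreshed dim; a key that is still missing becomes a genuine -2."""
    return [
        (nk, dim.get(nk, NOT_IN_DIM) if key == DIM_FROZEN else key)
        for nk, key in rows
    ]


frozen_dim = {"C001": 1, "C002": 2}
print(lookup_key("C003", frozen_dim, realtime=True))  # -3
```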
  • 22. Dealing with Currency Rates • A good example can be found here.
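Here is a sketch of conversion through a single DW currency, assuming GBP as the pivot as on slide 21. All rates are invented, and RATES_TO_GBP, convert and report_amounts are hypothetical names. The rate date is passed separately from the transaction date. That covers the slide's cases: transaction-date rates, today's rates ("had the transactions happened today"), or a fixed set such as the 2010 EOY rates.

```python
from datetime import date

# Invented rates: GBP per 1 unit of each currency, by rate date.
# In the DW these would come from a currency-rates table.
RATES_TO_GBP = {
    ("GBP", date(2010, 12, 31)): 1.0,
    ("USD", date(2010, 12, 31)): 0.64,
    ("EUR", date(2010, 12, 31)): 0.86,
    ("GBP", date(2011, 3, 7)): 1.0,
    ("USD", date(2011, 3, 7)): 0.62,
    ("EUR", date(2011, 3, 7)): 0.86,
}


def convert(amount, from_ccy, to_ccy, rate_date):
    """Convert via the single DW currency (GBP) using rates as of rate_date."""
    in_gbp = amount * RATES_TO_GBP[(from_ccy, rate_date)]
    return in_gbp / RATES_TO_GBP[(to_ccy, rate_date)]


def report_amounts(amount, txn_ccy, rate_date, reporting=("GBP", "USD", "EUR")):
    """Value one transaction in each reporting currency, rounded to 2 dp."""
    return {ccy: round(convert(amount, txn_ccy, ccy, rate_date), 2) for ccy in reporting}


# Fixed 2010 EOY rates vs. hypothetical "today's" rates (7 March 2011).
print(report_amounts(100, "USD", date(2010, 12, 31)))
print(report_amounts(100, "USD", date(2011, 3, 7)))
```

For example, the first call values a USD 100 transaction at the fixed 2010 EOY rates in all three reporting currencies. Rerunning it with another rate date needs no change to the stored transaction.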
  • 23. Dealing with Status
      What/background:
      – Workflow (policies, contracts, documents)
      – Bottleneck analysis (number of days between stages)
      – How many items are at each stage
      [diagram: an item moves through Status 1, 2, 4 and 6 on date1 to date4; Status 3 and Status 5 are alternative stages]
  • 24. Dealing with Status
      Approaches:
      – Accumulating Snapshot Fact, 1 row per application
      – SCD2 on DimApp
      – App Status fact table
      SCD2 on DimApp:
        AppKey  AppID  StsKey  StsDate  Current
        1       1      1       1/3/11   N
        2       1      2       3/3/11   N
        3       1      3       7/3/11   Y
        4       2      1       6/3/11   N
        5       2      2       7/3/11   Y
      App Status fact table:
        AppKey  StsKey  StsDateKey
        1       1       61
        1       2       63
        1       3       67
        2       1       66
        2       2       67
      Accumulating Snapshot Fact:
        AppKey  Sts1Date  Sts1Ind  Sts2Date  Sts2Ind  Sts3Date  Sts3Ind
        1       1/3/11    1        3/3/11    1        7/3/11    1
        2       6/3/11    1        7/3/11    1                  0
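Here is a Python sketch of how the status approaches relate. It pivots status rows into the accumulating snapshot (one row per application), then answers the slide 23 questions: days between stages, and how many applications are at each stage. The status dates come from the slide's SCD2 example, read as d/m/yy. The function names are hypothetical, and the Ind columns are omitted for brevity.

```python
from datetime import date

# Status rows from the slide: (AppKey, StsKey, status date).
status_fact = [
    (1, 1, date(2011, 3, 1)),
    (1, 2, date(2011, 3, 3)),
    (1, 3, date(2011, 3, 7)),
    (2, 1, date(2011, 3, 6)),
    (2, 2, date(2011, 3, 7)),
]


def accumulating_snapshot(rows, n_statuses=3):
    """Pivot to one row per application: [Sts1Date, Sts2Date, ...]."""
    snap = {}
    for app, sts, d in rows:
        snap.setdefault(app, [None] * n_statuses)[sts - 1] = d
    return snap


def days_between(snap, from_sts, to_sts):
    """Bottleneck analysis: days between two stages, for apps that reached both."""
    return {
        app: (dates[to_sts - 1] - dates[from_sts - 1]).days
        for app, dates in snap.items()
        if dates[from_sts - 1] is not None and dates[to_sts - 1] is not None
    }


def count_in_stage(snap):
    """How many applications are at each stage (their latest status)."""
    counts = {}
    for dates in snap.values():
        current = max(i for i, d in enumerate(dates) if d is not None) + 1
        counts[current] = counts.get(current, 0) + 1
    return counts


snap = accumulating_snapshot(status_fact)
print(days_between(snap, 1, 2))  # {1: 2, 2: 1}
print(count_in_stage(snap))      # {3: 1, 2: 1}
```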
  • 25. Referenced Dimensions
      • Enable using one "master" member
      • Not a snowflake dimension
        – For example:
          • Dim Customers: UK, London, Roman Avramovich
          • Dim Stores: UK, London, Friendly Bikes Store
        – What is the total revenue from Internet customers and stores in London?
  • 26. MDX optimization methodology
      • Rewrite the MDX code
      • Add aggregations
      • Add pre-calculated measure groups (ETL)
      • Solve the problem using the relational engine
      • Use .NET stored procedures
        – Only rarely can the problem be solved with better hardware.
      • Column-based databases
  • 27. • Optimizing MDX – Baselining Query Speeds • Clearing the Analysis Services Caches • Clearing the Operating System Caches using fsutil.exe or SSAS Stored Proc (codeplex) • Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services • Configuring the Analysis Services Query Log
  • 28. Cell-by-Cell Mode vs. Subspace Mode: Almost always, the performance obtained with subspace (or block computation) mode is superior to that obtained with cell-by-cell (or naïve) mode.
  • 29. Using Profiler • So far so good
  • 32. Granularity • Single grain – List of GROUP BY attributes in SQL SELECT • Mixed grain – Both Attribute.[All] and Attribute.MEMBERS
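Here is how granularity maps to SQL, sketched in SQLite via Python's sqlite3 with invented data. Mixed grain means a set such as City.MEMBERS, which contains the [All] member as well as each city. The sketch emulates it with UNION ALL, with NULL standing in for the All member. The union is needed because SQLite has no GROUPING SETS or ROLLUP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales_Table (City TEXT, Sales REAL);
    INSERT INTO Sales_Table VALUES ('Redmond', 10), ('Seattle', 20), ('London', 7);
""")

# Single grain: the GROUP BY list (City) defines the granularity.
single = conn.execute(
    "SELECT City, SUM(Sales) FROM Sales_Table GROUP BY City ORDER BY City"
).fetchall()

# Mixed grain: per-city rows (City members) plus a grand total (the All
# member, represented here by NULL) in one result.
mixed = conn.execute("""
    SELECT City, SUM(Sales) FROM Sales_Table GROUP BY City
    UNION ALL
    SELECT NULL, SUM(Sales) FROM Sales_Table
""").fetchall()

print(single)  # [('London', 7.0), ('Redmond', 10.0), ('Seattle', 20.0)]
```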
  • 33. Granularity [diagram: granularity grid combining Country (All Countries / Countries), City (All Cities / Cities) and Product (All Products / Products)]
  • 34. Slice
      • Single member – SQL: WHERE City = 'Redmond' – MDX: [City].[Redmond]
      • Multiple members – SQL: WHERE City IN ('Redmond', 'Seattle') – MDX: { [City].[Redmond], [City].[Seattle] }
  • 35. Slice at granularity
      SQL:
        SELECT Sum(Sales), City FROM Sales_Table
        WHERE City IN ('Redmond', 'Seattle')
        GROUP BY City
      MDX:
        SELECT Measures.Sales ON 0,
               NON EMPTY {Redmond, Seattle} ON 1
        FROM Sales_Cube
  • 36. Slice below granularity
      SQL:
        SELECT Sum(Sales) FROM Sales_Table
        WHERE City IN ('Redmond', 'Seattle')
      MDX:
        SELECT Measures.Sales ON 0
        FROM Sales_Cube
        WHERE {Redmond, Seattle}
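The two SQL statements on slides 35 and 36 can be run side by side in SQLite through Python's sqlite3, with invented rows in Sales_Table. Grouping by City keeps one row per sliced city (slice at granularity). Dropping the GROUP BY collapses the slice into a single total (slice below granularity), which is what the MDX WHERE clause does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales_Table (City TEXT, Sales REAL);
    INSERT INTO Sales_Table VALUES
        ('Redmond', 10), ('Redmond', 5), ('Seattle', 20), ('London', 7);
""")

# Slice at granularity: City is filtered AND grouped -> one row per city.
at_grain = conn.execute("""
    SELECT City, SUM(Sales) FROM Sales_Table
    WHERE City IN ('Redmond', 'Seattle')
    GROUP BY City
    ORDER BY City
""").fetchall()

# Slice below granularity: City is filtered but not grouped -> a single total.
below_grain = conn.execute("""
    SELECT SUM(Sales) FROM Sales_Table
    WHERE City IN ('Redmond', 'Seattle')
""").fetchall()

print(at_grain)     # [('Redmond', 15.0), ('Seattle', 20.0)]
print(below_grain)  # [(35.0,)]
```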
  • 37. Examples [grid: Years (All Years, 2005, 2006, 2007, 2008) × Cities (All Cities, Redmond, Seattle, New York, London)]
  • 38. Examples: (Seattle, Year.Year.MEMBERS)
  • 39. Examples: (Seattle, Year.MEMBERS)
  • 40. Examples: ({Redmond, Seattle, London}, Year.MEMBERS)
  • 41. Examples: ({Redmond, Seattle}, {2005, 2006, 2007})
  • 42. Arbitrary shaped subcubes • What is it ? • How can it happen ? • Why is it so bad ? • How to avoid them ?
  • 43. Arbitrary shaped subcubes: Union((Redmond, Year.Year.MEMBERS), (City.City.MEMBERS, 2005))
  • 44. Arbitrary shaped subcubes: CrossJoin(City.City.MEMBERS, Year.Year.MEMBERS) - (Seattle, 2007) [grid cities: Redmond, Seattle, SF, Denver]
  • 45. Arbitrary shaped subcubes: {(Redmond, 2005), (Seattle, 2006), (New York, 2007), (London, 2008)}
  • 46. Arbitrary shaped subcubes: Union(([All Cities], Year.MEMBERS), (City.MEMBERS, [All Years]))
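Here is a rough way to tell a regular subcube from an arbitrary shape. It assumes a simplified two-attribute space (city, year) and ignores All members. Under that assumption, a set of tuples is a regular subcube exactly when it equals the CrossJoin of its own projections. is_crossjoin is a hypothetical helper, not an SSAS API.

```python
from itertools import product


def is_crossjoin(tuples):
    """True if the (city, year) tuples form a regular subcube, i.e. they equal
    the CrossJoin of their own city and year projections."""
    cells = set(tuples)
    cities = {city for city, _ in cells}
    years = {year for _, year in cells}
    return cells == set(product(cities, years))


cities = ["Redmond", "Seattle", "New York", "London"]
years = [2005, 2006, 2007, 2008]

# Slide 41: ({Redmond, Seattle}, {2005, 2006, 2007}) is a regular subcube.
print(is_crossjoin(product(["Redmond", "Seattle"], [2005, 2006, 2007])))  # True
# Slide 43: Union((Redmond, all years), (all cities, 2005)) is L-shaped.
l_shape = {("Redmond", y) for y in years} | {(c, 2005) for c in cities}
print(is_crossjoin(l_shape))  # False
```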
  • 47. Arbitrary shapes • WHERE/Subselect/Aggregate • Unnatural hierarchies • Parent-Child (visual totals) • “Non Leaves” subcube • Conditional logic (IIF, IF, CASE, CoalesceEmpty etc) • NonEmpty, Exists
  • 48. WHERE/Subselect • Severity = '1' OR Priority = '1' • Multi-select – {USA, London}
  • 49. Mixed grain slicer [diagram: hierarchy All > USA (New York, Seattle) and UK (London, Bristol)]
  • 50. Mixed grain slicer [diagram: the same members as separate attribute hierarchies: All Cities (Seattle, New York, London, Bristol) and All Countries (USA, UK)]
  • 52. Leaves vs. Non Leaves [diagram: the granularity grid of Country, City and Product, with the leaf-level cells marked "Leaves"]
  • 53. Problems with arbitrary shapes • Caching • Partition slices • Indexes • SCOPEs • Matching calculations • Many more (for every topic we discuss – just ask “What will happen with arbitrary shapes”, and I am in trouble)
  • 54. SCOPE
      SCOPE (
          [Date].[Month of Year].[All Periods],
          [Date].[Month Name].[All],
          Except(
              [DateTool].[Aggregation].Members * [DateTool].[Comparison].Members,
              { ( [DateTool].[Aggregation].DefaultMember,
                  [DateTool].[Comparison].DefaultMember ) }
          )
      );
      ...;
      END SCOPE;
  • 55. Subcube decomposition: the SCOPE subcube from slide 54 is not a crossjoin, so it is decomposed into three regular subcubes, labelled Scope 1, Scope 2 and Scope 3 on the slide.
  • 56. Subcube decomposition
      SCOPE (
          [Date].[Month of Year].[All Periods],
          [Date].[Month Name].[All],
          Except( [DateTool].[Aggregation].Members, [DateTool].[Aggregation].DefaultMember ),
          Except( [DateTool].[Comparison].Members, [DateTool].[Comparison].DefaultMember )
      );
      ...;
      END SCOPE;
      SCOPE (
          [Date].[Month of Year].[All Periods],
          [Date].[Month Name].[All],
          [DateTool].[Aggregation].DefaultMember,
          Except( [DateTool].[Comparison].Members, [DateTool].[Comparison].DefaultMember )
      );
      ...;
      END SCOPE;
      SCOPE (
          [Date].[Month of Year].[All Periods],
          [Date].[Month Name].[All],
          Except( [DateTool].[Aggregation].Members, [DateTool].[Aggregation].DefaultMember ),
          [DateTool].[Comparison].DefaultMember
      );
      ...;
      END SCOPE;
  • 57. MDX Optimization - Tips
      • Partial expressions are not cached. Instead of
          this = iif(<expensive expression> >= 0, 1/<expensive expression>, null);
        move the expensive part into a hidden calculated member and reference it:
          create member currentcube.measures.MyPartialExpression as <expensive expression>, visible=0;
          this = iif(measures.MyPartialExpression >= 0, 1/measures.MyPartialExpression, null);
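The decomposition on slide 56 can be checked with plain Python sets. The three SCOPE subcubes are disjoint crossjoins, and their union is exactly the Except(...) set from slide 54. The member names below are hypothetical stand-ins for the DateTool attributes, and the first member of each list plays the DefaultMember role.

```python
from itertools import product


def decompose(members_a, default_a, members_b, default_b):
    """Split CrossJoin(A, B) minus {(default_a, default_b)} into three
    regular subcubes, mirroring the three SCOPEs on slide 56."""
    rest_a = [m for m in members_a if m != default_a]
    rest_b = [m for m in members_b if m != default_b]
    return [
        set(product(rest_a, rest_b)),       # first SCOPE: Except(A) x Except(B)
        set(product([default_a], rest_b)),  # second SCOPE: default A x Except(B)
        set(product(rest_a, [default_b])),  # third SCOPE: Except(A) x default B
    ]


# Hypothetical DateTool members; the first of each list is the default member.
aggregation = ["Current Period", "YTD", "QTD", "MTD"]
comparison = ["Current Value", "Previous Year", "Diff", "Diff %"]

# Slide 54: Except(Aggregation.Members * Comparison.Members, {(default, default)})
arbitrary = set(product(aggregation, comparison)) - {(aggregation[0], comparison[0])}
scopes = decompose(aggregation, aggregation[0], comparison, comparison[0])
print(set().union(*scopes) == arbitrary)  # True
```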
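The effect of this rewrite can be mimicked in Python by counting evaluations. This is only an analogy, not SSAS behavior. CountingExpr is a hypothetical stand-in for the expensive sub-expression. The sketch tests > 0 rather than >= 0 so that Python never divides by zero.

```python
class CountingExpr:
    """Hypothetical stand-in for an expensive sub-expression; counts calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return x - 10


def naive(x, expr):
    # Mirrors iif(<expr> > 0, 1/<expr>, null): expr is evaluated twice.
    return 1 / expr(x) if expr(x) > 0 else None


def factored(x, expr):
    # Mirrors the hidden calculated member: evaluate once, reuse the value.
    value = expr(x)
    return 1 / value if value > 0 else None


e1, e2 = CountingExpr(), CountingExpr()
print(naive(14, e1), e1.calls)     # 0.25 2
print(factored(14, e2), e2.calls)  # 0.25 1
```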
  • 58. Demo
  • 59. SSAS Denali • Coming in the first half of 2012 • SSAS Tabular Mode – Cheaper – Not best of breed – Uses DAX or MDX • Have you started working with it?
  • 60. Mobile BI • Smart phones • Gartner
  • 61. Social BI • Discover New Insights - Analyze the demographic and psychographic profiles of your Facebook application users. • Analyze Facebook Data - Analyze the full spectrum of Facebook data: profiles, interests, check-ins, and more • Instantly Available via Cloud
  • 62. Social BI • Deep Personalization • Enterprise Data Integration
  • 63. Survey • SQL / SSAS Denali • Mobile BI • Social BI