Nauzad Kapadia
Quartz Systems
nauzadk@quartzsystems.com
Key Takeaways
• How to design your cubes efficiently
• How to effectively partition your facts
• How to optimize cube and query processing
• How to ensure that your solution scales well
Agenda
• Cube Design
• Storage and Partitioning
• Aggregations
• Processing
• Scalability
Tips for Designing Dimensions and Facts
• Base fact data sources on views
    Can use query hints
    Can facilitate write-back partitions for measure groups containing semi-additive measures
• Avoid Linked Dimensions
• Use the Unknown Member
Tips for Designing Attributes
• Avoid unnecessary attributes
• Use AttributeHierarchyEnabled property with care
• Use Key Columns appropriately
Benefits of Attribute Relationships
• Query performance
    Dimension storage access is faster
    Produces better execution plans
• Aggregation design
    Enables the aggregation design algorithm to produce an effective set of aggregations
• Dimension security
    DeniedSet = {State.WA} should deny cities and customers in WA; this requires attribute relationships
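The dimension security point can be illustrated with a small sketch. It is not how the server enforces security; it only shows why denying a state can reach cities and customers once the relationships Customer → City → State exist. The member names and the `parent_of` mapping are hypothetical.

```python
def expand_denied(denied, parent_of):
    """Return the denied members plus every member that rolls up to one of them.

    parent_of maps a member to its parent along the attribute relationships
    (hypothetical data: customer -> city -> state).
    """
    result = set(denied)
    for member in parent_of:
        node = member
        # Walk up the relationship chain until it ends or hits a denied member.
        while node in parent_of:
            node = parent_of[node]
            if node in denied:
                result.add(member)
                break
    return result


parent_of = {
    "Seattle": "WA", "Spokane": "WA", "Portland": "OR",
    "Alice": "Seattle", "Bob": "Portland",
}
denied_members = expand_denied({"WA"}, parent_of)
print(sorted(denied_members))
```

Without the Customer → City and City → State relationships there would be no chain to walk, so a deny on WA could not be carried down to its cities and customers.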
How attribute relationships affect performance
After adding attribute relationships…
Don’t forget to remove redundant relationships!
All attributes have an implicit relationship to the key
Examples:
   Customer → City (not redundant)
   Customer → State (redundant)
   Customer → Country (redundant)
   Date → Month (not redundant)
   Date → Quarter (redundant)
   Date → Year (redundant)
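As a minimal sketch of the idea, a relationship can be treated as redundant when its target is still reachable through a chain of other relationships. The function below is illustrative only (it is not a feature of the product), and the relationship list is a hypothetical Customer dimension.

```python
def redundant_relationships(relationships):
    """Return the (source, target) pairs already implied by a chain of other pairs."""
    direct = {}
    for source, target in relationships:
        direct.setdefault(source, set()).add(target)

    def reachable(start, skip_edge):
        # Depth-first walk that ignores the relationship being tested.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in direct.get(node, ()):
                if (node, nxt) == skip_edge or nxt in seen:
                    continue
                seen.add(nxt)
                stack.append(nxt)
        return seen

    return sorted((s, t) for s, t in relationships if t in reachable(s, (s, t)))


customer = [
    ("Customer", "City"), ("City", "State"), ("State", "Country"),
    ("Customer", "State"), ("Customer", "Country"),
]
print(redundant_relationships(customer))
# [('Customer', 'Country'), ('Customer', 'State')]
```

Customer → City stays because no other path leads from Customer to City; the other two key relationships are implied by the chain City → State → Country.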
User Defined Hierarchies
• Pre-defined navigation paths through dimensional space defined by attributes
• Why create user defined hierarchies?
    Guide end users to interesting navigation paths
    Existing client tools are not “attribute-aware”
    Performance
      Optimize navigation path at processing time
      Materialization of hierarchy tree on disk
      Aggregation designer favors user defined hierarchies
Natural Hierarchies
• 1:M relation (via attribute relationships) between every pair of adjacent levels
• Examples:
    Country-State-City-Customer (natural)
    Country-City (natural)
    State-Customer (natural)
    Age-Gender-Customer (unnatural)
    Year-Quarter-Month (depends on key columns)
      How many quarters and months?
      4 & 12 across all years (unnatural)
      4 & 12 for each year (natural)
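The Year-Quarter-Month case can be checked with a short sketch: a hierarchy is natural when every member of a lower level has exactly one parent in the level above. The row layout and the sample keys below are assumptions made for illustration.

```python
def is_natural(rows, levels):
    """True if each member of a lower level maps to exactly one parent member."""
    for parent_level, child_level in zip(levels, levels[1:]):
        parents = {}
        for row in rows:
            child, parent = row[child_level], row[parent_level]
            # setdefault records the first parent seen; a different one breaks 1:M.
            if parents.setdefault(child, parent) != parent:
                return False
    return True


# Quarter keyed by quarter number only: quarter 1 belongs to both 2005 and 2006.
by_number = [
    {"Year": 2005, "Quarter": 1},
    {"Year": 2006, "Quarter": 1},
]
# Quarter keyed by (year, quarter): each quarter member has a single year.
by_composite = [
    {"Year": 2005, "Quarter": (2005, 1)},
    {"Year": 2006, "Quarter": (2006, 1)},
]
print(is_natural(by_number, ["Year", "Quarter"]))     # False
print(is_natural(by_composite, ["Year", "Quarter"]))  # True
```

This is the "4 & 12 across all years" versus "4 & 12 for each year" distinction: the composite key columns are what make the hierarchy natural.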
Natural Hierarchies (continued)
• Performance implications
    Only natural hierarchies are materialized on disk during processing
    Unnatural hierarchies are built on the fly during queries (and cached in memory)
• Create natural hierarchies where possible
    Using attribute relationships
    Not always appropriate (e.g., Age-Gender)
Benefits of Partitioning
• Partitions can be added, processed, deleted independently
    Update to last month’s data does not affect prior months’ partitions
    Sliding window scenario easy to implement
      e.g., 24 month window → add June 2006 partition and delete June 2004
• Partitions can have different storage settings
    Storage mode (MOLAP, ROLAP, HOLAP)
    Aggregation design
    Alternate disk drive
    Remote server
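The sliding window bookkeeping can be sketched as follows. The helper names are hypothetical, and the sketch assumes one partition per calendar month; it only computes which month to add and which to delete, not the processing itself.

```python
from datetime import date


def shift_months(month, delta):
    """Return the first day of the month that is `delta` months away."""
    index = month.year * 12 + (month.month - 1) + delta
    return date(index // 12, index % 12 + 1, 1)


def slide_window(newest_month, window_months=24):
    """Return (partition_to_add, partition_to_delete) for a monthly sliding window."""
    return newest_month, shift_months(newest_month, -window_months)


to_add, to_delete = slide_window(date(2006, 6, 1))
print(to_add, to_delete)  # 2006-06-01 2004-06-01
```

After the swap the cube holds July 2004 through June 2006, which is the 24 month window from the example above.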
Benefits of Partitioning (continued)
• Partitions can be processed and queried in parallel
    Better utilization of server resources
    Reduced data warehouse load times
• Queries are isolated to relevant partitions → less data to scan
    SELECT … FROM … WHERE [Time].[Year].[2006]
    Queries only 2006 partitions
• Bottom line → partitions enable:
    Manageability
    Performance
    Scalability
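A rough sketch of why a query on 2006 touches only the 2006 partitions: if each partition records the slice of data it holds, any partition whose slice cannot match the query can be skipped. The partition names and the slice representation are hypothetical, and the real engine's logic is more involved.

```python
def partitions_to_scan(partitions, year):
    """partitions maps a partition name to the set of years in its slice."""
    return sorted(name for name, years in partitions.items() if year in years)


partitions = {
    "Sales_2004": {2004},
    "Sales_2005": {2005},
    "Sales_2006_H1": {2006},
    "Sales_2006_H2": {2006},
}
print(partitions_to_scan(partitions, 2006))
# ['Sales_2006_H1', 'Sales_2006_H2']
```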
Best Practices for Partitioning
• No more than 20M rows per partition
• Specify partition / data slice
    Optional (but still recommended) for MOLAP: server auto-detects the slice and validates against user-specified slice (if any)
    Should reflect, as closely as possible, the data in the partition
    Must be specified for ROLAP
• Remote partitions for scale out
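One way to apply the 20M row guideline is to pick the coarsest time grain whose partitions stay under the limit. The sketch below assumes a roughly even number of fact rows per month; the function name and the grain labels are illustrative.

```python
def pick_grain(rows_per_month, max_rows=20_000_000):
    """Return the coarsest time grain that keeps each partition within max_rows."""
    if rows_per_month * 12 <= max_rows:
        return "year"
    if rows_per_month * 3 <= max_rows:
        return "quarter"
    if rows_per_month <= max_rows:
        return "month"
    return "finer than month"


for monthly_rows in (1_000_000, 5_000_000, 10_000_000, 25_000_000):
    print(monthly_rows, pick_grain(monthly_rows))
```

Whatever grain is chosen, the slice specified on each partition should describe exactly that period.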
Best Practices for Designing Partitions
• Design from the start
• Partition boundary and intervals
• Determine what storage model and aggregation level fits best
     Frequently queried → MOLAP with lots of aggs
     Periodically queried → MOLAP with fewer or no aggs
     Real-time → ROLAP with no aggs
• Pick efficient data types in fact table
What is Proactive Caching?
• Benefits of proactive caching
• Considerations for using proactive caching
    Use correct latency and silence settings
    Useful in a transaction-oriented system in which changes are unpredictable
What are Aggregations?
• Benefits of Aggregations
• Aggregating data in partitions
Aggregation Design Algorithm
• Evaluate cost/benefit of aggregations
     Relative to other aggregations
     Designed in “waves” from top of pyramid
     Cost is related to aggregation size
     Benefit is related to “distance” from another aggregation
• Storage design wizard
     Assumes all combinations of attributes are equally likely
     Can be done before you know the query load
• Usage based optimization wizard
     Assumes query pattern resembles your selection from the query log
     Representative history is needed
Aggregation Design Algorithm (continued)
• Examines the AggregationUsage property to build list of candidate attributes
    Full: Every agg must include the attribute
    None: No agg can include the attribute
    Unrestricted: No restrictions on the algorithm
    Default: Unrestricted if attribute is All, key or belongs to a natural hierarchy, None otherwise
• Builds the attribute lattice
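The Default rule above can be written out as a small sketch. The property values are the ones listed on the slide; the function names, the attribute dictionaries, and the sample dimension are hypothetical, and the sketch does not model the lattice itself.

```python
def effective_usage(setting, is_all=False, is_key=False, in_natural_hierarchy=False):
    """Resolve an AggregationUsage setting, applying the Default rule from the slide."""
    if setting != "Default":
        return setting
    if is_all or is_key or in_natural_hierarchy:
        return "Unrestricted"
    return "None"


def candidate_attributes(attributes):
    """Keep every attribute whose resolved usage is not None."""
    result = []
    for attr in attributes:
        usage = effective_usage(
            attr["usage"],
            attr.get("is_all", False),
            attr.get("is_key", False),
            attr.get("in_natural_hierarchy", False),
        )
        if usage != "None":
            result.append(attr["name"])
    return result


attributes = [
    {"name": "Customer", "usage": "Default", "is_key": True},
    {"name": "City", "usage": "Default", "in_natural_hierarchy": True},
    {"name": "Phone", "usage": "Default"},
    {"name": "Gender", "usage": "Unrestricted"},
    {"name": "Birth Date", "usage": "None"},
]
print(candidate_attributes(attributes))
# ['Customer', 'City', 'Gender']
```

Phone drops out because it is left at Default and is neither the key nor part of a natural hierarchy, which is one reason defining natural hierarchies affects the aggregation design.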
How to Monitor Aggregation Usage?
[Figure: Profiler trace showing an aggregation hit]
[Figure: Profiler trace showing an aggregation miss]
Best Practices for Aggregations
• Define all possible attribute relationships
• Set accurate attribute member counts and fact table counts
• Set AggregationUsage to guide agg designer
    Set rarely queried attributes to None
    Set commonly queried attributes to Unrestricted
• Do not build too many aggregations
    In the 100s, not 1000s!
• Do not build aggregations larger than 30% of fact table size (aggregation design algorithm doesn’t)
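When aggregations are designed by hand, the 30% guideline can be applied as a simple filter. This is a sketch under assumed numbers: the candidate names, their estimated row counts, and the fact table size are all hypothetical.

```python
def within_size_limit(estimated_rows, fact_rows, limit=0.30):
    """True if the aggregation is no larger than `limit` of the fact table."""
    return estimated_rows <= limit * fact_rows


fact_rows = 10_000_000
candidates = {
    "Year x Country": 400,
    "Month x City": 250_000,
    "Date x Customer": 6_000_000,
}
kept = [name for name, rows in candidates.items() if within_size_limit(rows, fact_rows)]
print(kept)  # ['Year x Country', 'Month x City']
```

An aggregation close to the size of the fact table saves little scanning, which is why the estimate depends on accurate member and fact table counts.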
Best Practices for Aggregations (continued)
• Aggregation design cycle
    Use Storage Design Wizard (~20% perf gain) to design initial set of aggregations
    Enable query log and run pilot workload (beta test with limited set of users)
    Use Usage Based Optimization (UBO) Wizard to refine aggregations
    Use a larger perf gain target (70-80%)
    Reprocess partitions for new aggregations to take effect
    Periodically use UBO to refine aggregations
Processing Options
• ProcessFull
    Fully processes the object from scratch
• ProcessClear
    Clears all data and returns the object to the unprocessed state
• ProcessData
    Reads and stores fact data only (no aggs or indexes)
• ProcessIndexes
    Builds aggs and indexes
• ProcessUpdate
    Incremental update of a dimension (preserves fact data)
• ProcessAdd
    Adds new rows to a dimension or partition
Best Practices for Processing
• Use XMLA scripts in large production systems
    Automation (e.g., using ascmd)
    Finer control over parallelism, transactions, memory usage, etc.
    Don’t just process the entire database!
• Dimension processing
    Performance is limited by attribute relationships
    Key attribute is a big bottleneck
    Define all possible attribute relationships
    Eliminate redundant relationships, especially on the key!
    Bind dimension data sources to views instead of tables or named queries
Best Practices for Processing (continued)
• Partition processing
    Split ProcessFull into ProcessData + ProcessIndexes for large partitions; this consumes less memory
    Monitor aggregation processing spilling to disk (perfmon counters for temp file usage)
       Add memory, turn on /3GB, move to x64/ia64
    Fully process partitions periodically
       Achieves better compression than repeated incremental processing
• Data sources
    Avoid using .NET data sources; OLEDB is an order of magnitude faster for processing
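As a sketch of the ProcessData + ProcessIndexes split, the snippet below builds an XMLA batch with two Process commands for one partition. The element names follow the XMLA Process command as I understand it, and the object IDs are hypothetical placeholders; check the generated script against your own server before relying on it.

```python
import xml.etree.ElementTree as ET

# Namespace used by Analysis Services XMLA commands.
ENGINE_NS = "http://schemas.microsoft.com/analysisservices/2003/engine"


def process_command(object_ids, process_type):
    """Build one <Process> element for the object identified by object_ids."""
    process = ET.Element("Process")
    obj = ET.SubElement(process, "Object")
    for tag, value in object_ids:
        ET.SubElement(obj, tag).text = value
    ET.SubElement(process, "Type").text = process_type
    return process


def two_step_batch(object_ids):
    """Return an XMLA batch that runs ProcessData, then ProcessIndexes."""
    batch = ET.Element("Batch", {"xmlns": ENGINE_NS})
    for step in ("ProcessData", "ProcessIndexes"):
        batch.append(process_command(object_ids, step))
    return ET.tostring(batch, encoding="unicode")


# Hypothetical object IDs, for illustration only.
partition = [
    ("DatabaseID", "SalesDW"),
    ("CubeID", "Sales"),
    ("MeasureGroupID", "Internet Sales"),
    ("PartitionID", "Internet_Sales_2006"),
]
xmla_script = two_step_batch(partition)
print(xmla_script)
```

The resulting text can be saved to a file and run by whatever automation is in place, such as the ascmd utility mentioned earlier in the deck.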
Improving multi-user performance
• Increase query parallelism
• Block long running queries
• Use a load balancing cluster
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

How to Design Cubes Efficiently and Optimize Performance

  • 2. Key Takeaways
      • How to design your cubes efficiently
      • How to effectively partition your facts
      • How to optimize cube and query processing
      • How to ensure that your solution scales well
  • 3. Agenda
      • Cube Design
      • Storage and Partitioning
      • Aggregations
      • Processing
      • Scalability
  • 4. Tips for Designing Dimensions and Facts
      • Base fact data sources on views
          • Can use query hints
          • Can facilitate write-back partitions for measure groups containing semi-additive measures
      • Avoid linked dimensions
      • Use the Unknown Member
  • 5. Tips for designing Attributes
      • Avoid unnecessary attributes
      • Use the AttributeHierarchyEnabled property with care
      • Use Key Columns appropriately
  • 6.
      • Query performance
          • Dimension storage access is faster
          • Produces more optimal execution plans
      • Aggregation design
          • Enables the aggregation design algorithm to produce an effective set of aggregations
      • Dimension security
          • DeniedSet = {State.WA} should deny cities and customers in WA; this requires attribute relationships
  • 7. How attribute relationships affect performance
  • 8. After adding attribute relationships…
      • Don’t forget to remove redundant relationships!
      • All attributes have an implicit relationship to the key
      • Examples:
          • Customer → City (not redundant)
          • Customer → State (redundant)
          • Customer → Country (redundant)
          • Date → Month (not redundant)
          • Date → Quarter (redundant)
          • Date → Year (redundant)
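One way to see why Customer → State is redundant: it is already implied by a longer chain such as Customer → City → State. The sketch below is a minimal illustration in plain Python, not an Analysis Services API. The function name and the chains City → State → Country and Month → Quarter → Year are assumptions made for the example.

```python
from collections import defaultdict

def redundant_relationships(relationships):
    """Return relationships that are implied by a longer chain of other relationships."""
    children = defaultdict(set)
    for source, target in relationships:
        children[source].add(target)

    def reachable(start, goal, skip_edge):
        # Depth-first search that ignores the direct edge being tested.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            for nxt in children[node]:
                if (node, nxt) == skip_edge or nxt in seen:
                    continue
                if nxt == goal:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    return sorted(e for e in relationships if reachable(e[0], e[1], e))

# Hypothetical Customer dimension: the key attribute is related to everything.
customer = [
    ("Customer", "City"), ("City", "State"), ("State", "Country"),
    ("Customer", "State"), ("Customer", "Country"),
]
print(redundant_relationships(customer))
```

With the relationships listed, only the two direct links from Customer to State and Country are reported, which matches the examples on the slide.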
  • 9. User Defined Hierarchies
      • Pre-defined navigation paths through dimensional space defined by attributes
      • Why create user defined hierarchies?
          • Guide end users to interesting navigation paths
          • Existing client tools are not “attribute-aware”
          • Performance
              • Optimize navigation path at processing time
              • Materialization of hierarchy tree on disk
              • Aggregation designer favors user defined hierarchies
  • 10. Natural Hierarchies
      • 1:M relation (via attribute relationships) between every pair of adjacent levels
      • Examples:
          • Country-State-City-Customer (natural)
          • Country-City (natural)
          • State-Customer (natural)
          • Age-Gender-Customer (unnatural)
          • Year-Quarter-Month (depends on key columns)
              • How many quarters and months?
              • 4 & 12 across all years (unnatural)
              • 4 & 12 for each year (natural)
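The Year-Quarter-Month case can be checked mechanically: a hierarchy is natural when every member of a level has exactly one parent at the level above. The following is a rough sketch over in-memory rows. The function name and the sample rows are made up for illustration, and the composite keys stand in for multi-column Key Columns.

```python
def is_natural(rows, levels):
    """True if each member of every level maps to exactly one parent member."""
    for parent_level, child_level in zip(levels, levels[1:]):
        parent_of = {}
        for row in rows:
            child, parent = row[child_level], row[parent_level]
            # setdefault records the first parent seen; a different one later
            # means the child has several parents, so the level pair is not 1:M.
            if parent_of.setdefault(child, parent) != parent:
                return False
    return True

levels = ["Year", "Quarter", "Month"]

# Quarter and Month keyed by number only: 4 quarters and 12 months across all years.
simple_keys = [
    {"Year": 2005, "Quarter": 1, "Month": 1},
    {"Year": 2006, "Quarter": 1, "Month": 1},
]

# Quarter and Month keyed together with the year: 4 and 12 for each year.
composite_keys = [
    {"Year": 2005, "Quarter": (2005, 1), "Month": (2005, 1)},
    {"Year": 2006, "Quarter": (2006, 1), "Month": (2006, 1)},
]

print(is_natural(simple_keys, levels), is_natural(composite_keys, levels))
```

With number-only keys, quarter 1 belongs to both 2005 and 2006, so the check fails. With year-qualified keys each quarter has a single parent year.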
  • 11. Natural Hierarchies
      • Performance implications
          • Only natural hierarchies are materialized on disk during processing
          • Unnatural hierarchies are built on the fly during queries (and cached in memory)
      • Create natural hierarchies where possible
          • Using attribute relationships
          • Not always appropriate (e.g., Age-Gender)
  • 12. Benefits of Partitioning
      • Partitions can be added, processed, deleted independently
          • Update to last month’s data does not affect prior months’ partitions
          • Sliding window scenario easy to implement
          • e.g., 24 month window → add June 2006 partition and delete June 2004
      • Partitions can have different storage settings
          • Storage mode (MOLAP, ROLAP, HOLAP)
          • Aggregation design
          • Alternate disk drive
          • Remote server
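The sliding window arithmetic can be sketched as follows, assuming one partition per calendar month identified by a (year, month) tuple. The function names are hypothetical, and the sketch only decides which partitions to add and delete. It does not talk to a server.

```python
def month_index(year, month):
    """Map a calendar month to a running integer so months can be compared."""
    return year * 12 + (month - 1)

def sliding_window(existing, newest, window_months=24):
    """Return (partitions to add, partitions to delete) after `newest` arrives."""
    oldest_allowed = month_index(*newest) - window_months + 1
    to_add = [] if newest in existing else [newest]
    to_delete = sorted(p for p in existing if month_index(*p) < oldest_allowed)
    return to_add, to_delete

# 24 existing monthly partitions: June 2004 through May 2006.
existing = (
    [(2004, m) for m in range(6, 13)]
    + [(2005, m) for m in range(1, 13)]
    + [(2006, m) for m in range(1, 6)]
)

print(sliding_window(existing, (2006, 6)))  # ([(2006, 6)], [(2004, 6)])
```

Because each month is its own partition, the prior months are untouched by the operation.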
  • 13. Benefits of Partitioning
      • Partitions can be processed and queried in parallel
          • Better utilization of server resources
          • Reduced data warehouse load times
      • Queries are isolated to relevant partitions → less data to scan
          • SELECT … FROM … WHERE [Time].[Year].[2006]
          • Queries only 2006 partitions
      • Bottom line → partitions enable:
          • Manageability
          • Performance
          • Scalability
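As a toy model of partition elimination, the sketch below keeps a data slice (here just a year) next to each partition and scans only the partitions whose slice matches the query filter. The partition names and row counts are invented for the example, and the real server's slice matching is more general than a single year.

```python
def partitions_for_year(partitions, year):
    """Names of the partitions whose data slice matches the queried year."""
    return [p["name"] for p in partitions if p["slice_year"] == year]

def rows_scanned(partitions, year):
    """Total rows in the partitions that a query on `year` has to touch."""
    return sum(p["rows"] for p in partitions if p["slice_year"] == year)

# Hypothetical measure group partitioned by year, with 2006 split in two.
partitions = [
    {"name": "Sales_2004", "slice_year": 2004, "rows": 18_000_000},
    {"name": "Sales_2005", "slice_year": 2005, "rows": 19_000_000},
    {"name": "Sales_2006_H1", "slice_year": 2006, "rows": 10_000_000},
    {"name": "Sales_2006_H2", "slice_year": 2006, "rows": 9_000_000},
]

print(partitions_for_year(partitions, 2006))
print(rows_scanned(partitions, 2006), "of", sum(p["rows"] for p in partitions))
```

In this example a query filtered to 2006 touches 19 million of 56 million rows, and the two 2006 partitions could be scanned in parallel.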
  • 14. Best Practices for Partitioning
      • No more than 20M rows per partition
      • Specify partition / data slice
          • Optional (but still recommended) for MOLAP: server auto-detects the slice and validates against user-specified slice (if any)
          • Should reflect, as closely as possible, the data in the partition
          • Must be specified for ROLAP
      • Remote partitions for scale out
  • 15. Best Practices for Designing Partitions
      • Design from the start
          • Partition boundary and intervals
      • Determine what storage model and aggregation level fits best
          • Frequently queried → MOLAP with lots of aggs
          • Periodically queried → MOLAP with fewer or no aggs
          • Real-time → ROLAP with no aggs
      • Pick efficient data types in fact table
  • 16. What is Proactive Caching
      • Benefits of proactive caching
      • Considerations for using proactive caching
          • Use correct latency and silence settings
          • Useful in a transaction-oriented system in which changes are unpredictable
  • 17. What are Aggregations
      • Benefits of aggregations
      • Aggregating data in partitions
  • 18. Aggregation Design Algorithm
      • Evaluate cost/benefit of aggregations
          • Relative to other aggregations
          • Designed in “waves” from top of pyramid
          • Cost is related to aggregation size
          • Benefit is related to “distance” from another aggregation
      • Storage design wizard
          • Assumes all combinations of attributes are equally likely
          • Can be done before you know the query load
      • Usage based optimization wizard
          • Assumes query pattern resembles your selection from the query log
          • Representative history is needed
  • 19. Aggregation Design Algorithm
      • Examines the AggregationUsage property to build list of candidate attributes
          • Full: Every agg must include the attribute
          • None: No agg can include the attribute
          • Unrestricted: No restrictions on the algorithm
          • Default: Unrestricted if attribute is All, key or belongs to a natural hierarchy, None otherwise
      • Builds the attribute lattice
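The Default rule can be written out as a small function. This is only a restatement of the rule as worded on the slide, not the designer's actual code. The function names, the dictionary layout, and the attribute names are assumptions for the example.

```python
def effective_aggregation_usage(setting, is_all, is_key, in_natural_hierarchy):
    """Resolve an AggregationUsage setting, expanding 'Default' per the slide's rule."""
    if setting != "Default":
        return setting  # Full, None and Unrestricted are taken as given.
    if is_all or is_key or in_natural_hierarchy:
        return "Unrestricted"
    return "None"

def candidate_attributes(attributes):
    """Attributes the aggregation designer may consider (anything not resolved to None)."""
    return [
        name
        for name, a in attributes.items()
        if effective_aggregation_usage(
            a["usage"], a["is_all"], a["is_key"], a["in_natural_hierarchy"]
        ) != "None"
    ]

# Hypothetical attributes of a Customer dimension.
attributes = {
    "Customer": {"usage": "Default", "is_all": False, "is_key": True, "in_natural_hierarchy": True},
    "City": {"usage": "Default", "is_all": False, "is_key": False, "in_natural_hierarchy": True},
    "Phone": {"usage": "Default", "is_all": False, "is_key": False, "in_natural_hierarchy": False},
    "Gender": {"usage": "Unrestricted", "is_all": False, "is_key": False, "in_natural_hierarchy": False},
}

print(candidate_attributes(attributes))  # ['Customer', 'City', 'Gender']
```

An attribute left at Default outside any natural hierarchy (Phone here) silently drops out of the candidate list, which is why commonly queried attributes may need an explicit Unrestricted setting.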
  • 20. How to Monitor Aggregation Usage? [Figure: Profiler, aggregation hit]
  • 21. How to Monitor Aggregation Usage? [Figure: Profiler, aggregation miss]
  • 22. Best Practices for Aggregations
      • Define all possible attribute relationships
      • Set accurate attribute member counts and fact table counts
      • Set AggregationUsage to guide agg designer
          • Set rarely queried attributes to None
          • Set commonly queried attributes to Unrestricted
      • Do not build too many aggregations
          • In the 100s, not 1000s!
      • Do not build aggregations larger than 30% of fact table size (aggregation design algorithm doesn’t)
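A quick sanity check for the 30% guideline, assuming estimated row counts are available for each candidate aggregation. The function name, aggregation names, and counts are illustrative only.

```python
def oversized_aggregations(estimated_rows, fact_rows, max_percent=30):
    """Names of aggregations whose estimated size exceeds max_percent of the fact table."""
    # Integer arithmetic avoids floating-point surprises right at the threshold.
    return sorted(
        name
        for name, rows in estimated_rows.items()
        if rows * 100 > max_percent * fact_rows
    )

# Hypothetical estimates for a 10-million-row fact table.
estimated_rows = {
    "Year x Country": 40,
    "Month x City": 250_000,
    "Date x Customer": 6_500_000,
}

print(oversized_aggregations(estimated_rows, fact_rows=10_000_000))  # ['Date x Customer']
```

An aggregation close to the fact table in size saves little scanning, so it mostly adds processing time and storage.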
  • 23. Best Practices for Aggregations
      • Aggregation design cycle
          • Use Storage Design Wizard (~20% perf gain) to design initial set of aggregations
          • Enable query log and run pilot workload (beta test with limited set of users)
          • Use Usage Based Optimization (UBO) Wizard to refine aggregations
              • Use larger perf gain (70-80%)
          • Reprocess partitions for new aggregations to take effect
          • Periodically use UBO to refine aggregations
  • 24. Processing Options
      • ProcessFull: Fully processes the object from scratch
      • ProcessClear: Clears all data and brings the object to an unprocessed state
      • ProcessData: Reads and stores fact data only (no aggs or indexes)
      • ProcessIndexes: Builds aggs and indexes
      • ProcessUpdate: Incremental update of dimension (preserves fact data)
      • ProcessAdd: Adds new rows to dimension or partition
  • 25. Best Practices for Processing
      • Use XMLA scripts in large production systems
          • Automation (e.g., using ascmd)
          • Finer control over parallelism, transactions, memory usage, etc.
          • Don’t just process the entire database!
      • Dimension processing
          • Performance is limited by attribute relationships
          • Key attribute is a big bottleneck
          • Define all possible attribute relationships
          • Eliminate redundant relationships, especially on the key!
          • Bind dimension data sources to views instead of tables or named queries
  • 26. Best Practices for Processing
      • Partition processing
          • Split ProcessFull into ProcessData + ProcessIndexes for large partitions (consumes less memory)
          • Monitor aggregation processing spilling to disk (perfmon counters for temp file usage)
              • Add memory, turn on /3GB, move to x64/ia64
          • Fully process partitions periodically
              • Achieves better compression over repeated incremental processing
      • Data sources
          • Avoid using .NET data sources; OLEDB is an order of magnitude faster for processing
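To illustrate the ProcessData + ProcessIndexes split, the sketch below uses Python's standard library to generate an XMLA batch with two Process commands for one partition. The database, cube, measure group, and partition IDs are placeholders. Whether the two steps should share one batch or run as separate batches is not covered by the slides and is left as an assumption here.

```python
import xml.etree.ElementTree as ET

XMLA_NS = "http://schemas.microsoft.com/analysisservices/2003/engine"

def add_process(batch, object_ids, process_type):
    """Append one <Process> command for the given object and processing type."""
    process = ET.SubElement(batch, "Process")
    obj = ET.SubElement(process, "Object")
    for tag, value in object_ids.items():
        ET.SubElement(obj, tag).text = value
    ET.SubElement(process, "Type").text = process_type

def split_process_full(object_ids):
    """Build a batch that loads fact data first, then builds aggs and indexes."""
    batch = ET.Element("Batch", {"xmlns": XMLA_NS})
    add_process(batch, object_ids, "ProcessData")
    add_process(batch, object_ids, "ProcessIndexes")
    return ET.tostring(batch, encoding="unicode")

# Placeholder object IDs; substitute the IDs from your own database.
partition = {
    "DatabaseID": "SalesDW",
    "CubeID": "Sales",
    "MeasureGroupID": "Fact Sales",
    "PartitionID": "Sales_2006",
}

print(split_process_full(partition))
```

The resulting script could be saved to a file and run with a tool such as ascmd, as the previous slide suggests for automation.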
  • 27. Improving multi-user performance
      • Increase query parallelism
      • Block long running queries
      • Use a load balancing cluster
  • 30. © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.