Best Practices for Implementing Enterprise BI Solution
Teo Lachev, Prologika
teo.lachev@prologika.com
Why BI projects fail
• 70-80% of corporate BI projects fail (Gartner http://bit.ly/YRi028)
• Top reasons
 Poor communication between IT and Business
 Failure to ask the right questions
 Other reasons from my experience
   Business doesn’t know about BI
   Inexperience and lack of technical knowledge
   “When all you have is a hammer…”
   Data inaccuracy
   Performance degradation with large datasets
Agenda
• Share best practices and lessons learned
 BI architecture
 Data warehouse design
 ETL
 Semantic layer
 Presentation layer

• Assumptions

 Experience with Microsoft BI and database design

• Microsoft case study

 Records Management Firm Saves $1 Million
http://bit.ly/15exUpM
 Most performance practices around biggish data
Ground rules
• Ask questions
• Turn cellphones off
• Tweet away (@tlachev #BestBI)
About me
• Consultant, author, and mentor with focus on Microsoft BI
• Owner of Prologika – BI consulting and training
company based in Atlanta (www.prologika.com)
• Microsoft SQL Server MVP for 10 years
• Leader of Atlanta BI group (atlantabi.sqlpass.org)
Use phased approach
• Identify critical success factors
• Break project into phases
• Phase 1
• Most important
• Scope it relatively small
• Sets foundation
• Business process to model
• First conformed dimensions
• A few fact tables
Use classic BI solution architecture
[Diagram: Data Sources → ETL → Data Warehouse → Semantic Layer → Presentation Layer]
• Data Sources
• ETL (Integration Services)
 Data is extracted from data sources, transformed, and loaded into DW
• Data Warehouse (dimension tables, fact tables)
 Data is stored in dimensional schema consisting of dimension and fact tables
• Semantic Layer (Multidimensional OR Tabular)
 Great performance
 Business calculations
 Single version of truth
 Client support
 Security
 Isolation
• Presentation Layer
 Standard reporting, ad-hoc reporting, dashboards
 Clients: ad-hoc reports, operational reports, dashboards, third-party tools
• Other diagram labels: transactional reporting; historical & trend reporting
Keep it simple!
[Diagram labels: Europe, NA, Asia]
Teo’s insight: Remove complexity until it cannot be simplified anymore
Consider active-active clustering
[Diagram: cluster with a database server and an SSAS server]
Check your environment
• I/O
 BACKUP DATABASE [ContosoRetailDW] TO DISK='NUL';

 Or use tools such as IOMeter or CrystalMark
 I/O should be above 500 MB/sec

• Network speed

 select * from <some fact table>
(consider discarding query results)
 Num rows/sec = row count/execution time (sec)
 Aim for > 100K rows/sec

• Virtualization

 Disk pass-through enabled
 Dedicated resources
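
The I/O and network checks above can be sketched in T-SQL as follows. This is a minimal sketch, not a benchmark harness. COPY_ONLY and STATS are additions not shown on the slide (COPY_ONLY keeps a test backup from resetting the differential base), and the row count and elapsed time in the second part are hypothetical placeholders for values measured on the client.

```sql
-- I/O read throughput: back up to the NUL device so nothing is written to disk.
-- COPY_ONLY (an addition to the slide's command) leaves the differential base untouched.
BACKUP DATABASE [ContosoRetailDW]
TO DISK = 'NUL'
WITH COPY_ONLY, STATS = 10;
-- The completion message reports pages processed, elapsed seconds, and MB/sec.

-- Network throughput: rows/sec = row count / execution time (sec).
-- Both values are hypothetical; obtain them by running the SELECT from a client
-- with query results discarded.
DECLARE @row_count   bigint        = 12000000;
DECLARE @elapsed_sec decimal(9, 2) = 95.0;

SELECT @row_count / @elapsed_sec AS rows_per_sec;  -- compare with the 100K rows/sec target
```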
Agenda
BI architecture
Data warehouse design
ETL
Semantic Layer
Presentation layer
Star schema is your best friend
• Your dimensional model is the foundation
• Design it with the end user in mind
• Avoid normalization
• Avoid summarized tables
• Use smart keys (YYYYMMDD) or [date] keys for Date tables
• Use referential integrity

Teo’s insight: The fact that Tabular supports more
flexible relationships doesn’t mean that star
schema is obsolete - just the opposite.
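
A minimal sketch of a smart-keyed date dimension with referential integrity might look like this. The table and column names (dbo.DimDate, dbo.FactSales, CalendarYear, ProductKey, SalesAmount) are hypothetical and not taken from the deck.

```sql
-- Hypothetical date dimension keyed by a YYYYMMDD smart key.
CREATE TABLE dbo.DimDate
(
    DateKey      int      NOT NULL CONSTRAINT PK_DimDate PRIMARY KEY,  -- e.g. 20120309
    [Date]       date     NOT NULL,
    CalendarYear smallint NOT NULL
);

-- Hypothetical fact table; the foreign key enforces referential integrity.
CREATE TABLE dbo.FactSales
(
    DateKey     int   NOT NULL
        CONSTRAINT FK_FactSales_DimDate REFERENCES dbo.DimDate (DateKey),
    ProductKey  int   NOT NULL,
    SalesAmount money NOT NULL
);

-- Deriving the smart key from a date: style 112 formats the date as yyyymmdd.
SELECT CONVERT(int, CONVERT(char(8), CAST('2012-03-09' AS date), 112)) AS DateKey;  -- 20120309
```

An integer smart key stays readable in ad-hoc queries and lines up naturally with date-based partition boundaries.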
Optimize physical storage
• Set database recovery to Simple
• Index considerations
 Clustered index on DateKey column in fact tables
 Other indexes as needed
• File groups
 File group per each large table
 Files on different drives
 Avoid using Primary file group
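
The storage settings above could be applied roughly as follows. This is a sketch under assumptions: the filegroup name, logical file name, file path, sizes, and the dbo.FactSales table are hypothetical.

```sql
-- Simple recovery model for the data warehouse database.
ALTER DATABASE [ContosoRetailDW] SET RECOVERY SIMPLE;

-- Dedicated filegroup for one large fact table (names and path are hypothetical).
ALTER DATABASE [ContosoRetailDW] ADD FILEGROUP FG_FactSales;

ALTER DATABASE [ContosoRetailDW]
ADD FILE
(
    NAME = N'FactSales_01',
    FILENAME = N'E:\Data\FactSales_01.ndf',
    SIZE = 10GB,
    FILEGROWTH = 1GB
)
TO FILEGROUP FG_FactSales;

-- Clustered index on DateKey, placed on the dedicated filegroup instead of PRIMARY.
-- Building the clustered index on a filegroup moves the table's data there.
CREATE CLUSTERED INDEX CIX_FactSales_DateKey
ON dbo.FactSales (DateKey)
ON FG_FactSales;
```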
Use partitioning
• Partition large tables (above 50 GB)
 Partition switching
 Better manageability
 Partition elimination when querying data

Good read: “Partitioned Table and Index Strategies Using SQL
Server 2008” whitepaper by Ron Talmage
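
As an illustration, a yearly partitioning layout on a YYYYMMDD key could be sketched like this. The function, scheme, table, and filegroup names are hypothetical, and the FG_FactSales filegroup is assumed to exist already.

```sql
-- Yearly boundaries on an integer YYYYMMDD key; RANGE RIGHT puts each
-- boundary value into the partition to its right.
CREATE PARTITION FUNCTION pfDateKey (int)
AS RANGE RIGHT FOR VALUES (20110101, 20120101, 20130101);

-- Map every partition to one (hypothetical, pre-existing) filegroup.
CREATE PARTITION SCHEME psDateKey
AS PARTITION pfDateKey ALL TO (FG_FactSales);

-- Hypothetical fact table created on the partition scheme.
CREATE TABLE dbo.FactSalesPartitioned
(
    DateKey     int   NOT NULL,
    ProductKey  int   NOT NULL,
    SalesAmount money NOT NULL
) ON psDateKey (DateKey);

-- Which partition would a given key land in?
SELECT $PARTITION.pfDateKey(20120615) AS PartitionNumber;  -- 3

-- A filter on the partitioning column lets the optimizer skip other partitions.
SELECT SUM(SalesAmount) AS SalesAmount
FROM dbo.FactSalesPartitioned
WHERE DateKey >= 20120101 AND DateKey < 20130101;
```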
Use compression
• Consider page compression above 1 TB
• 50-80% saving in disk space
• To estimate storage savings:

 Use SSMS Data Compression Wizard
 sp_estimate_data_compression_savings stored procedure

EXEC sp_estimate_data_compression_savings 'dbo', 'FactResellerSales', 1, NULL, 'PAGE'

Good read: “Data Compression: Strategy, Capacity Planning and
Best Practices” whitepaper by Sanjay Mishra
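
The estimate call from the slide, written with named parameters, followed by one way to apply page compression afterwards. The rebuild statement is an assumption about the next step and is not shown in the deck.

```sql
-- Same estimate as on the slide, with named parameters for readability.
-- @index_id = 1 targets the clustered index; NULL covers all partitions.
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'FactResellerSales',
    @index_id         = 1,
    @partition_number = NULL,
    @data_compression = 'PAGE';

-- If the estimate looks worthwhile, rebuild the table with page compression.
-- This compresses the heap or clustered index only; nonclustered indexes
-- need their own ALTER INDEX ... REBUILD WITH (DATA_COMPRESSION = PAGE).
ALTER TABLE dbo.FactResellerSales
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE);
```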
Agenda
BI architecture
Data warehouse design
ETL
Semantic Layer
Presentation layer
Consider merge design pattern
• More efficient than SSIS transforms
• More flexible than SSIS lookups
• Easier to maintain
[Diagram: Data Sources → Staging Database → Data Warehouse]
• Data Sources: LOB, Staging Database, Files (incremental extraction)
• Staging Database: select a,b from st1 inner join st2 where... → work table
• Data Warehouse: stored procedure with T-SQL merge statement → dimension or fact table
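
The final step of the pattern, a stored procedure that merges a staging work table into a dimension, might be sketched like this. All object names (dbo.usp_MergeDimProduct, staging.wrkProduct, dbo.DimProduct and its columns) are hypothetical, and the compared columns are assumed to be NOT NULL.

```sql
-- Hypothetical procedure: upsert a dimension from a staging work table.
CREATE PROCEDURE dbo.usp_MergeDimProduct
AS
BEGIN
    SET NOCOUNT ON;

    MERGE dbo.DimProduct AS tgt
    USING staging.wrkProduct AS src
        ON tgt.ProductCode = src.ProductCode          -- match on the business key
    WHEN MATCHED AND (tgt.ProductName <> src.ProductName
                   OR tgt.ListPrice   <> src.ListPrice) THEN
        -- Update only rows whose attributes actually changed.
        UPDATE SET tgt.ProductName = src.ProductName,
                   tgt.ListPrice   = src.ListPrice
    WHEN NOT MATCHED BY TARGET THEN
        -- New business key: insert a new dimension row.
        INSERT (ProductCode, ProductName, ListPrice)
        VALUES (src.ProductCode, src.ProductName, src.ListPrice);
END;
```

An SSIS package would then only need an Execute SQL task calling the procedure, which keeps the set-based logic in one place.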
Consider Operational Data Store
• ODS advantages
• Offloads transactional data
• Maintains data history
• Smarter “staging” database
Start_Date   End_Date     Store      Product
1/1/2010     5/1/2010     Atlanta    Mountain Bike 1
5/2/2010     3/8/2012     Atlanta    Mountain Bike 2
3/9/2012     12/31/9999   Norcross   Mountain Bike 2
…
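
With history kept as validity ranges like the rows above, a point-in-time lookup reduces to a range filter. A minimal sketch, assuming a hypothetical table ods.StoreProductHistory with date-typed Start_Date and End_Date columns:

```sql
-- Which row was in effect on a given date? (@as_of is an arbitrary example date.)
DECLARE @as_of date = '2011-06-15';

SELECT Store, Product
FROM ods.StoreProductHistory
WHERE @as_of BETWEEN Start_Date AND End_Date;

-- Current rows carry the open-ended End_Date of 12/31/9999.
SELECT Store, Product
FROM ods.StoreProductHistory
WHERE End_Date = '9999-12-31';
```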
Index considerations
• Eliminate read locks
• Indexes: ALLOW_PAGE_LOCKS = OFF and ALLOW_ROW_LOCKS = OFF

• View hints WITH (NOLOCK) or SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

• Drop non-clustered indexes and constraints
 With massive updates (10% or more)
 Enables minimally logged load

 Consider COLUMNSTORE indexes when queries aggregate data
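
The options above could look like this in T-SQL. This is a sketch with hypothetical index and table names. It disables and rebuilds a nonclustered index rather than dropping it, which is a variation on the slide's advice, not the author's stated method.

```sql
-- Read without taking shared locks (dirty reads become possible).
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT_BIG(*) AS row_count FROM dbo.FactSales;

-- Per-table equivalent using a hint.
SELECT COUNT_BIG(*) AS row_count FROM dbo.FactSales WITH (NOLOCK);

-- Turn off row and page locks on an index.
ALTER INDEX CIX_FactSales_DateKey ON dbo.FactSales
SET (ALLOW_ROW_LOCKS = OFF, ALLOW_PAGE_LOCKS = OFF);

-- Before a massive load: disable the nonclustered index, load, then rebuild it.
ALTER INDEX IX_FactSales_ProductKey ON dbo.FactSales DISABLE;
-- ... bulk load runs here ...
ALTER INDEX IX_FactSales_ProductKey ON dbo.FactSales REBUILD;

-- Columnstore index for aggregate-heavy queries (SQL Server 2012 or later).
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales
ON dbo.FactSales (DateKey, ProductKey, SalesAmount);
```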
Take advantage of partitioning
• Consider partition switching
 Fast incremental load
 Parallel partition load
 Faster updates

• Use Manage Partition Wizard to generate
 Switch in/out scripts
 Staging table
 Sliding window
For parallel partition load, change the table lock escalation:
ALTER TABLE … SET (LOCK_ESCALATION = AUTO)
To find the table lock escalation:
SELECT lock_escalation_desc FROM sys.tables WHERE name = '<table name>'
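
A switch-in load could be sketched as follows. The table names and partition number are hypothetical. The staging table is assumed to match the target's structure and filegroup and to carry a CHECK constraint limiting DateKey to the target partition's range, and the target partition is assumed to be empty.

```sql
-- Let lock escalation stop at the partition level so parallel loads
-- into different partitions do not block each other.
ALTER TABLE dbo.FactSalesPartitioned SET (LOCK_ESCALATION = AUTO);

-- Confirm the setting.
SELECT name, lock_escalation_desc
FROM sys.tables
WHERE name = 'FactSalesPartitioned';

-- Switch the fully loaded staging table into partition 3 of the fact table.
-- This is a metadata-only operation, so it completes quickly regardless of row count.
ALTER TABLE staging.FactSales_2012
SWITCH TO dbo.FactSalesPartitioned PARTITION 3;
```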
Optimize big joins
• Set OPTION (HASH JOIN or LOOP JOIN)
http://bit.ly/108HuHR
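
For example, a join hint is added as a query-level OPTION clause. The tables and columns here are hypothetical. The hint applies to every join in the query, so it is worth comparing plans and timings before keeping it.

```sql
-- Force hash joins for a large fact-to-dimension aggregation.
SELECT d.CalendarYear,
       SUM(f.SalesAmount) AS SalesAmount
FROM dbo.FactSales AS f
INNER JOIN dbo.DimDate AS d
    ON d.DateKey = f.DateKey
GROUP BY d.CalendarYear
OPTION (HASH JOIN);  -- or OPTION (LOOP JOIN)
```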
Agenda
BI architecture
Data warehouse design
ETL
Semantic Layer
Presentation layer
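BI Semantic Layer
[Diagram: clients on top, semantic model in the middle, data sources at the bottom]
• Clients: Third-Party BI Applications, Reporting Services Reports, PowerPivot Applications, Excel Workbooks, SharePoint Dashboards & Scorecards
• Client query languages: MDX, DAX
• Multidimensional: MDX; MOLAP, ROLAP
• Tabular: DAX; xVelocity (VertiPaq), DirectQuery
• Data sources: Files, OData Feeds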
Choose semantic layer wisely
• Decision checkpoints
• Data volumes
• Complexity

• Scenarios for considering Multidimensional
 Data warehousing
 Large data volumes
 Complex models

• Scenarios for considering Tabular
 Promoting PowerPivot models to organizational models
 Rapid development for simple models
 Transactional reporting? (be careful)
Optimize Multidimensional
• Don’t be afraid of biggish data
• Avoid complex scope assignments
• Centralize business logic
• Consider fast storage
• Consider single cube
Tabular Considerations
• Improve your design experience http://bit.ly/106iKjt
• Small dataset during dev
• Disable automatic calculation

• Remove unnecessary columns
• Be careful about transactional reporting
• No cross-fact table support
• Performance degradation with
big data - http://bit.ly/136h60U

[Diagram labels: Dim Date, Fact Orders, Fact Receipts]
Partition when it makes sense
• Partition large measure groups (above 100 million rows)
 Mostly management technique
 Useful for incremental processing
 Partition slice: ~50 million

• Automate with partition generator
http://bit.ly/partitiongenerator
• Use SQL views to wrap tables
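
A view that wraps a fact table, plus a partition source query against it, might look like this. The view, table, and column names are hypothetical, and the date range is only an example of a yearly partition.

```sql
-- Wrapper view: the cube binds to the view, so the underlying table can
-- change without touching the semantic layer.
CREATE VIEW dbo.vFactSales
AS
SELECT DateKey, ProductKey, SalesAmount
FROM dbo.FactSales;
GO  -- batch separator used by SSMS and sqlcmd

-- Source query for a hypothetical 2012 measure group partition.
SELECT DateKey, ProductKey, SalesAmount
FROM dbo.vFactSales
WHERE DateKey >= 20120101 AND DateKey < 20130101;
```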
When to use self-service BI?
• Know your end users
 Power users
 Financial analysts

• When does self-service BI make sense?
 Waiting for organizational BI to happen
 Ideate and promote lateral thinking
• Consider 80/20 rule
 80% organizational BI
 20% self-service BI
Agenda
BI architecture
Data warehouse design
ETL
Semantic Layer
Presentation layer
Dashboards
“A dashboard is a visual display of the most important information needed to achieve one or more objectives;
consolidated and arranged on a single screen so the information can be monitored at a glance.”
Stephen Few, “Information Dashboard Design” book

From “Information Dashboard Design” book
PerformancePoint in real life
Power View in real life
Excel Services in SharePoint 2013
Consider your dashboard options
• PerformancePoint
 Pros: Designed for scorecards and KPIs; supporting views (reports, Excel spreadsheets, PP reports); decomposition tree; customizable
 Cons: BI pro-oriented; no “wow” effect
• Power View
 Pros: Highly interactive; easy to implement; end user-oriented
 Cons: No extensibility; no mobile support yet (but promised); currently requires Silverlight (MS working on HTML5)
• Excel Services
 Pros: Use Excel pivot reports; easy to implement; reports updatable in SP 2013
 Cons: Reports not updatable in SP 2010; no “wow” effect
• Reporting Services reports
 Pros: Highly customizable; rich visualizations
 Cons: Require report experience; reports not updatable; drillthrough requires new reports
Summary

• I shared proven practices and tips from past experience
• Keep things simple but have sound design
• How to contact me:
 Email: teo.lachev@prologika.com
 Web: www.prologika.com
 Blog: http://prologika.com/cs/blogs/
 Newsletter: http://prologika.com/Newsroom/News.aspx
