3. Agenda
Motivation
Fast Track Offering
– Balanced Architecture Approach for DW
– Example FastTrack Reference Architectures
– Optimizing Storage, Load and Maintenance
– Case Studies
Parallel Data Warehouse Offering Overview
4. Some SQL Data Warehouses today
Big SAN
Big SMP Server
Connected together
What’s wrong with this picture?
5. Answer: system out of balance
This server can consume 16 GB/Sec of IO, but the SAN can
only deliver 2 GB/Sec
– Even when the SAN is dedicated to the SQL Data Warehouse, which
it often isn’t
– Lots of disks for random IOPS, BUT
– Limited controllers, so limited IO bandwidth
System is typically IO bound
Queries are slow
Result: significant investment, not delivering performance
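The imbalance on this slide can be put in numbers. The sketch below uses only the two figures quoted above (16 GB/Sec consumable by the server, 2 GB/Sec deliverable by the SAN); the function name and the returned fields are illustrative, not part of any tool.

```python
def io_balance(cpu_consume_gb_s: float, storage_deliver_gb_s: float) -> dict:
    """Compare what the server can consume with what storage can deliver."""
    # The slower side of the pair sets the effective throughput.
    effective = min(cpu_consume_gb_s, storage_deliver_gb_s)
    return {
        "effective_gb_s": effective,
        "cpu_fed_fraction": effective / cpu_consume_gb_s,
        "io_bound": storage_deliver_gb_s < cpu_consume_gb_s,
    }

# Figures from the slide: 16 GB/Sec server, 2 GB/Sec SAN.
result = io_balance(16.0, 2.0)
print(result)
```

With these inputs the CPUs receive one eighth of the data they could process, which is why the system is IO bound.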
6. You can get more sophisticated…
Realize that queries performing complex calculations,
format conversions, multi-dimension hash joins, etc. will be
more CPU-intensive than others
– Complex queries will consume data at a slower per-core rate
than simpler queries
Alternative: Measure per-core data consumption for a
variety of queries, and take the weighted average
– A standard approach to capacity planning
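A minimal sketch of the weighted-average approach described on this slide. The per-core rates and the weights (how often each query type runs) are hypothetical values chosen for illustration; real inputs would come from measuring your own workload.

```python
def weighted_consumption_rate(measurements):
    """Weighted average of per-core consumption rates.

    measurements: list of (mb_per_sec_per_core, weight) pairs, where the
    weight reflects how much of the workload that query type represents.
    """
    total_weight = sum(weight for _, weight in measurements)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(rate * weight for rate, weight in measurements) / total_weight

# Hypothetical workload: simple scans, mid-weight joins, complex calculations.
workload = [(300.0, 5), (200.0, 3), (100.0, 2)]
average_rate = weighted_consumption_rate(workload)
print(f"Weighted per-core consumption: {average_rate:.0f} MB/Sec")
```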
7. Or you can leave it to us…
We’ve measured a mix of TPC-H queries that reflect a
‘prototype’ Data Warehouse workload
Concluded that SQL Server 2008 R2 on current x64 cores
consumes ~200 MB/Sec per core on average for this
workload
We use this as a basis for the published reference
architectures
Your mileage will vary!
– For precise system sizing, measure your own workload
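As a rough illustration of how the ~200 MB/Sec per core figure feeds into sizing, the sketch below multiplies it by a core count. The core counts are example values only, and the per-core rate should be replaced with your own measurement where precision matters.

```python
MB_PER_SEC_PER_CORE = 200  # average from the slide; your mileage will vary

def required_io_bandwidth_mb_s(cores: int, per_core: float = MB_PER_SEC_PER_CORE) -> float:
    """IO bandwidth the storage must sustain to keep every core fed."""
    return cores * per_core

# Example core counts (illustrative).
for cores in (12, 32, 64):
    print(f"{cores} cores -> {required_io_bandwidth_mb_s(cores):.0f} MB/Sec")
```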
8. Potential Performance Bottlenecks
[Diagram: IO path between CPU and disk. Components: CPU cores, SQL Server, Windows, server, cache, FC HBAs (A, B), FC switch, storage controllers (A, B), LUNs, disks]
Rates along the path: CPU Feed Rate, SQL Server Read Ahead Rate, HBA Port Rate, Switch Port Rate, SP Port Rate, LUN Read Rate, Disk Feed Rate
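One way to read this slide: the end-to-end scan rate is capped by the slowest stage in the path. The sketch below finds that stage; the stage names are the rates listed on the slide, while every number is a hypothetical aggregate rate in MB/Sec made up for illustration.

```python
def find_bottleneck(stage_rates_mb_s: dict) -> tuple:
    """Return (stage name, rate) for the slowest stage in the IO path."""
    slowest = min(stage_rates_mb_s, key=stage_rates_mb_s.get)
    return slowest, stage_rates_mb_s[slowest]

# Hypothetical aggregate rates for one server + storage configuration.
rates = {
    "CPU Feed Rate": 3200,
    "SQL Server Read Ahead Rate": 3000,
    "HBA Port Rate": 1600,
    "Switch Port Rate": 3200,
    "SP Port Rate": 1800,
    "LUN Read Rate": 2000,
    "Disk Feed Rate": 2400,
}

stage, rate = find_bottleneck(rates)
print(f"Bottleneck: {stage} at {rate} MB/Sec")
```

In this made-up configuration, adding disks would not help, because the HBA ports saturate first.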
9. The Alternative: A Balanced System
Design a server + storage configuration that can deliver all the IO
bandwidth that CPUs can consume when executing a SQL Relational
DW workload
Avoid sharing storage devices among servers
Avoid overinvesting in disk drives
– Focus on scan performance, not IOPS
Lay out and manage data to maximize range scan performance and
minimize fragmentation
10. Microsoft Data Warehousing – Product Offering
[Diagram: four offerings positioned by Scale, Complexity, HA by default and SW-HW integration]
1. SQL Server 2008 R2: Minimal HW tune up/optimization. Supports mixed workloads.
2. SQL Server 2008 R2 with Fast Track Reference Architecture: Balanced solution for mostly scan centric workloads.
3. PDW: Max HW tune up for most DW scenarios.
4. PDW with Hub-and-spoke: Most flexible Architecture for handling all DW scenarios.
11. Agenda
Motivation
Fast Track Offering
– Balanced Architecture Approach for DW
– Example FastTrack Reference Architectures
– Optimizing Storage, Load and Maintenance
Parallel Data Warehouse Offering Overview
12. SQL Server Fast Track Data Warehouse
Solution to help customers and partners accelerate their data warehouse deployments
A method for designing a cost-effective, balanced system for Data
Warehouse workloads
Reference hardware configurations developed in conjunction with
hardware partners using this method
Best practices for data layout, loading and management
Relational Database Only – Not SSAS, IS, RS
13. Fast Track Scope
[Diagram: BI architecture in three tiers: Supporting Systems, BI Data Storage Systems, Presentation Layer Systems]
Components shown: Integration Services ETL; SAN, Storage Array; Data Staging, Bulk Loading; Data Warehouse; Subject Area Data Marts; Analysis Services Cubes; Web Analytic Tools; Reporting Services; SharePoint Services; Microsoft Office SharePoint; PerformancePoint; Excel Services
Arrows are labeled Data Path and Presentation Data
Reference Architecture Scope (dashed)
15. Fast Track SQL DW Architecture vs. Traditional DW
Traditional SQL DW Architecture: Shared Infrastructure
– Enterprise Shared SAN Storage
– Shared Network Bandwidth
– SQL 2008 Data Warehouse: 4 Processor, 16+ Core Server
– OLTP Applications
Fast Track SQL DW Architecture: Dedicated DW Infrastructure
– Architecture modeled after DW Appliances
– Scalability from 4TB to 80TB
– Dedicated Network Bandwidth
– Dedicated Low Cost SAN Arrays, 1 for every 4 CPU Cores
Benefits:
– Lower TCO
– Balanced CPU to I/O Channel, Optimized for DW
– Modular Building Block Approach
– Scale Out or Up within limits of Server and SAN
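The building-block ratio on this slide (one dedicated SAN array for every 4 CPU cores) translates into a simple count. This is only a sketch of that arithmetic; the function name is illustrative, and actual reference configurations should be taken from the published architectures.

```python
import math

CORES_PER_ARRAY = 4  # "1 for every 4 CPU cores" on the slide

def san_arrays_needed(cpu_cores: int) -> int:
    """Number of dedicated SAN arrays for a given core count, rounded up."""
    return math.ceil(cpu_cores / CORES_PER_ARRAY)

for cores in (8, 16, 32):
    print(f"{cores} cores -> {san_arrays_needed(cores)} arrays")
```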
16. HP SQL Server Fast Track Data Warehousing
Fast Track G7 Configurations
Coming soon
Scales from SMB to Enterprise
– Prescriptive guidance and optimized methodology for deploying a data warehouse
– Targeted at query workloads patterned for large sequential data reads
– Balanced hardware approach
HP provides
– Configurations, tested performance and guidance
– Best practices for deploying/operating/managing
– Packaged and custom support
Basic: 8 – 16TB, DL38x G7 w/ MSA2000 G3
Mainstream: 8 – 16TB, DL38x G7 w/ MSA P2000 G3
Mainstream: 20 – 60TB, DL58x G7 w/ MSA P2000 G3
Premium: 40 – 80TB, DL980 G7 w/ MSA P2000 G3
17. HP SQL Server Fast Track Data Warehousing
Fast Track G7 configurations in test (coming soon)
Small SMP: 2-Socket Processor Configuration
– Server: HP ProLiant DL380 G7 with 2x 6-core Intel Xeon
– Storage: HP P2000 G3
– Scalability: 8 – 16TB
– 2p; 12 core, 64-192GB RAM
Medium SMP: 4-Socket Processor Configuration
– Server: HP ProLiant DL580 G7 with 4x 8-core Intel Xeon
– Storage: HP P2000 G3
– Scalability: 20 – 40TB
– 4p; 32 core, 144-512GB RAM
Large SMP: 8-Socket Processor Configuration
– Server: HP ProLiant DL980 G7 with 8x 8-core Intel Xeon
– Storage: HP P2000 G3
– Scalability: 40 – 80TB
– 8p; 64 core, 2TB RAM
18. Fast Track Component Architecture
Server: SQL Server, Windows Server OS, CPU, Host Storage Adaptor
Storage Interconnect
Storage Enclosure: Storage Processor, Disk Array
19. Core Evaluation Metrics
These metrics are used to both validate and position Fast
Track Reference Architectures
– Maximum Consumption Rate – Ability of SQL Server to process data for a
specific CPU and Server combination and a standard SQL query.
– Benchmark Consumption Rate – Ability of SQL Server to process data for a
specific CPU and Server combination and a user workload or query.
– User Data Capacity – Maximum available SQL Server storage for a specific
Fast Track RA assuming 2.5:1 page compression factor.
21. User Data Capacity
UDC is the data capacity required
– Plan for projected growth
• Based on your projections
• Needs to be allocated up-front
– Allocate for data management needs
• Staging database requirements
• Temporary objects
– Allocate for TempDB
• Typically 20-30% of primary data space
• Tempdb is not compressed
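The allocation items on this slide can be combined into a rough capacity estimate. The sketch below is one possible reading, not an official sizing formula: it assumes the 2.5:1 page compression factor quoted for User Data Capacity applies to primary and staging data, that TempDB (uncompressed) is sized as a fraction of the compressed primary data space, and that the growth factor and staging fraction are hypothetical inputs you supply.

```python
def storage_plan_tb(raw_tb, growth_factor, staging_fraction,
                    tempdb_fraction=0.25, compression_ratio=2.5):
    """Rough storage estimate in TB. All parameters are planning assumptions."""
    # Primary data after projected growth, stored page-compressed.
    primary = raw_tb * growth_factor / compression_ratio
    # Staging and temporary objects, expressed as a fraction of primary space.
    staging = primary * staging_fraction
    # TempDB is not compressed; the slide suggests 20-30% of primary data space.
    tempdb = primary * tempdb_fraction
    return {
        "primary": primary,
        "staging": staging,
        "tempdb": tempdb,
        "total": primary + staging + tempdb,
    }

# Hypothetical inputs: 50 TB raw today, 50% growth, staging at 10% of primary.
plan = storage_plan_tb(raw_tb=50, growth_factor=1.5, staging_fraction=0.10)
for name, tb in plan.items():
    print(f"{name:8s} {tb:6.1f} TB")
```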
23. Storage Layout Implications for SQL Server
Data LUNs: LUN 1, LUN 2, LUN 3 … LUN 16 (one file per filegroup on each LUN)
Permanent_DB, Permanent FG: Permanent_1.ndf, Permanent_2.ndf, Permanent_3.ndf … Permanent_16.ndf
Stage Database, Stage FG: Stage_1.ndf, Stage_2.ndf, Stage_3.ndf … Stage_16.ndf
Local Drive 1, TempDB: TempDB.mdf (25GB), TempDB_02.ndf (25GB), TempDB_03.ndf (25GB) … TempDB_16.ndf (25GB)
Log LUN 1: Permanent DB Log, Stage DB Log
24. Sequential Scan Components
ARY01D1v01 (4MB): DB1-1.ndf
ARY01D2v02 (4MB): DB1-2.ndf
ARY02D1v03 (4MB): DB1-3.ndf
ARY02D2v04 (4MB): DB1-4.ndf
ARY03D1v05 (4MB): DB1-5.ndf
ARY03D2v06 (4MB): DB1-6.ndf
ARY04D1v07 (4MB): DB1-7.ndf
ARY04D2v08 (4MB): DB1-8.ndf
Contiguous allocation, data striping, pre-fetch, and read-ahead work to create efficient
Sequential IO
– Data stripe width is balanced against read-ahead “Depth”
– Combined, these elements provide effective access to the full data stripe from a single thread
Each element is necessary to maximize efficiency
25. Loading
One of the important topics
I hope you saw the session yesterday
If not, you can watch the video
OR
There is an Appendix to this presentation
26. Minimizing File fragmentation
Pre-allocate database files
• Size files correctly to prevent growth
• Do not shrink files
Do not use NTFS file defragmentation tools
– Rebuild table to ensure disk block level optimal organization
Writing data
– Concurrent load operations to the same file will induce fragmentation
– DML change operations (Update/Delete) may induce fragmentation
Use Filegroups and Partitioning to manage concurrent writes
for large tables
29. Agenda
Motivation
Fast Track Offering
– Balanced Architecture Approach for DW
– Example FastTrack Reference Architectures
– Optimizing Storage, Load and Maintenance
Parallel Data Warehouse Offering Overview
– Scale Out Architecture Approach for DW
– SQL Server in Scale Out Story
30. HP Enterprise Data Warehouse Appliance
Transforming today’s SQL
BEFORE AFTER
The world’s most scalable,
easy-to-manage enterprise
data warehousing solution
31. HP Enterprise Data Warehouse Appliance
COMPLETE SIMPLIFIED FOR ANY SCALE
32. HP Enterprise Data Warehouse Appliance
Description
Scale-Out of SQL Server: 10s TB ►100s TB ►PB
Uses massively parallel processing (MPP)
Highly optimised for DW workload at each layer of the
stack
Uses index-Light
Delivers predictable performance at low cost
Simplified deployment and maintenance via appliance
model
Integration with existing SQL Server 2008 DW via Hub &
Spoke Architecture
Lower total cost of ownership
33. HP Parallel Data Warehouse Appliance - Hardware Architecture
Control Rack
Control node (Control Nodes, Active / Passive, HP ProLiant DL)
– Where client apps connect (Client Drivers)
– MPP engine runs here
– Controls DMS on all nodes
– Central point for all HW monitoring
Management node (Management Servers)
– S/W upgrades and patch deployment staging place
– Holds S/W images in case a node needs reimaging
– Data Center Monitoring
Landing Zone
– ETL Load Interface
– Staging place for data loading
– Accessible to outside world
Backup node
– Backup file storage
– Corporate Backup Solution
– Accessible to outside world
Data Rack
Compute nodes (Database Nodes: HP ProLiant DL; Storage Nodes: HP MSA P2000 G3)
– Store user data
– Perform local query processing
– Run data movement service
– Not accessible to outside world
Spare Database Node
Interconnects: Dual Fiber Channel, Dual Infiniband
Networks: Corporate Network, Private Network
34. Symmetric Multi-Processing vs. Massively Parallel Processing
SMP (SQL Server, Fast Track): OLTP, Transactional, Data Warehousing
MPP (PDW): Parallel Data Warehousing (esp. VLDB, complex workloads)
35. HP Enterprise Parallel Data Warehouse – Impressive live demo
Massively parallel query processing
106 billion rows; 10 TB table
High performance report without indexing and aggregations
36. Agenda
Motivation
Fast Track Offering
– Balanced Architecture Approach for DW
– Example FastTrack Reference Architectures
– Optimizing Storage, Load and Maintenance
Parallel Data Warehouse Offering Overview
– Scale Out Architecture Approach for DW
– SQL Server in Scale Out Story
37. Data Distribution with replication
Database tables shown in the diagram:
Date Dim: D_DATE_SK, D_DATE_ID, D_DATE, D_MONTH, …
Customer: C_CUSTOMER_SK, C_CUSTOMER_ID, C_CURRENT_ADDR, …
Item: I_ITEM_SK, I_ITEM_ID, I_REC_START_DATE, I_ITEM_DESC, …
Store Sales, shown as SS[1], SS[2], SS[3], SS[4]: Ss_sold_date_sk, Ss_item_sk, Ss_customer_sk, Ss_cdemo_sk, Ss_store_sk, Ss_promo_sk, Ss_quantity, …
Promotion: P_PROMO_SK, P_PROMO_ID, P_START_DATE_SK, P_END_DATE_SK, …
Customer Demographics: CD_DEMO_SK, CD_GENDER, CD_MARITAL_STATUS, CD_EDUCATION, …
Store: S_STORE_SK, S_STORE_ID, S_REC_START_DATE, S_REC_END_DATE, S_STORE_NAME, …
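A minimal sketch of the idea behind this slide, assuming (as the title and the SS[1] to SS[4] labels suggest) that the Store Sales fact table is split across four distributions while a small dimension table such as Item is replicated to every node. The choice of ss_item_sk as the distribution key, the modulo used in place of a real hash function, and all sample rows are assumptions made for illustration; this is not PDW's actual implementation.

```python
from collections import defaultdict

NUM_DISTRIBUTIONS = 4  # SS[1] .. SS[4] on the slide

def distribution_of(key: int, n: int = NUM_DISTRIBUTIONS) -> int:
    """Map a distribution-key value to 1..n (modulo stands in for a hash)."""
    return key % n + 1

# Replicated dimension: every node holds a full copy (hypothetical rows).
item_dim = {101: "widget", 102: "gadget", 103: "gizmo", 104: "doohickey"}

# Fact rows as (ss_item_sk, ss_quantity); hypothetical data.
store_sales = [(101, 2), (102, 1), (103, 4), (104, 2),
               (101, 3), (102, 5), (103, 1)]

# Place each fact row on exactly one node; copy the dimension to each node.
nodes = defaultdict(lambda: {"item_dim": dict(item_dim), "store_sales": []})
for row in store_sales:
    nodes[distribution_of(row[0])]["store_sales"].append(row)

def local_join_totals(node: dict) -> dict:
    """Join the local fact slice to the local dimension copy and aggregate."""
    totals = defaultdict(int)
    for item_sk, quantity in node["store_sales"]:
        totals[node["item_dim"][item_sk]] += quantity
    return dict(totals)

# Combine the per-node partial results into the final answer.
combined = defaultdict(int)
for node in nodes.values():
    for item_name, quantity in local_join_totals(node).items():
        combined[item_name] += quantity

print(dict(combined))
```

Because each node already holds the whole dimension, the join needs no data movement between nodes; only the small partial aggregates are combined at the end.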
38. Distributed Data Warehouse Architecture
[Diagram: a Central Enterprise DW Hub connected to Departmental Reporting, Regional Reporting, MS Office 2010 and ETL Tools]
Enterprise data can be maintained on a PDW hub
Hub = unified EDW
Spoke = federated data marts
39. Distributed Data Warehouse Approach
Hub & Spoke model
Enables DW architecture to more closely match the
structure of large enterprises.
Separates user and data workloads eliminating traditional
process and resource conflicts
Integrate both SMP and MPP systems with “Shared
Nothing”
All systems connect via a dedicated high speed network
Dual high speed Infiniband
Supports multiple workloads on different systems
41. Microsoft Data Warehousing – Product Offering
[Diagram: four offerings positioned by Scale, Complexity, HA by default and SW-HW integration]
1. SQL Server 2008 R2: Minimal HW tune up/optimization. Supports mixed workloads.
2. SQL Server 2008 R2 with Fast Track Reference Architecture: Balanced solution for mostly scan centric workloads.
3. PDW: Max HW tune up for most DW scenarios.
4. PDW with Hub-and-spoke: Most flexible Architecture for handling all DW scenarios.
42. Resources
SQL Server Fast Track DW Home Page
– http://www.microsoft.com/sqlserver/2008/en/us/fasttrack.aspx
Fast Track DW 2.0 Architecture Whitepaper
– http://msdn.microsoft.com/en-us/library/dd459178.aspx
Use minimally logged BULK operations (Trace Flag -T610)
– http://msdn.microsoft.com/en-us/library/dd425070.aspx
45. Let’s Party!
Dinner: between 18:30 and 20:30
Transportation to the party: shuttles starting at 20:30
Entrance wristbands: handed out in the envelopes at room check-in
47. Alternatives for loading
Use a heap
– Practical if queries need to scan whole partitions
or…Use a batchsize = 0
– Fine if no parallelism is needed during load
or…Use a Two-Step Load
1. Load to a Staging Table (heap) with constraint for Deltas
2. INSERT-SELECT from Staging Table into Target CI
Resulting rows are not fragmented
Can use Parallelism in step 1 – essential for large data volumes
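The Two-Step Load can be illustrated end to end with a small runnable sketch. Note the substitutions: Python's built-in sqlite3 module stands in for SQL Server purely so the example runs anywhere, a WITHOUT ROWID table stands in for a table with a clustered index (CI), and the table names, columns and rows are hypothetical. SQL Server specifics such as minimal logging, batch sizes and parallel load streams are not modeled here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Target table: WITHOUT ROWID stores rows in primary-key order,
# used here as a stand-in for a clustered index.
cur.execute("""
    CREATE TABLE sales_target (
        sale_id   INTEGER NOT NULL PRIMARY KEY,
        sold_date TEXT    NOT NULL,
        quantity  INTEGER NOT NULL
    ) WITHOUT ROWID
""")

# Step 1: staging table with no indexes (a heap), plus a constraint that
# restricts it to the delta being loaded (here: January 2010).
cur.execute("""
    CREATE TABLE sales_stage (
        sale_id   INTEGER NOT NULL,
        sold_date TEXT    NOT NULL
            CHECK (sold_date >= '2010-01-01' AND sold_date < '2010-02-01'),
        quantity  INTEGER NOT NULL
    )
""")
delta_rows = [(3, "2010-01-15", 2), (1, "2010-01-03", 5), (2, "2010-01-09", 1)]
cur.executemany("INSERT INTO sales_stage VALUES (?, ?, ?)", delta_rows)

# Step 2: one INSERT-SELECT from staging into the target, in key order.
cur.execute("""
    INSERT INTO sales_target (sale_id, sold_date, quantity)
    SELECT sale_id, sold_date, quantity
    FROM sales_stage
    ORDER BY sale_id
""")
conn.commit()

loaded = cur.execute(
    "SELECT sale_id, quantity FROM sales_target ORDER BY sale_id"
).fetchall()
print(loaded)
```

In the pattern described on the slide, step 1 is where multiple concurrent load streams would be used, and step 2 writes the rows into the target in index order so they are not fragmented.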
48. Two-Step Load Variations
To achieve high parallelism during historical load
– Typically into a partitioned table
– Use a Staging Table (heap) that is partitioned identically to the Target
Table
– Use multiple concurrent streams to load the Staging Table with
moderate batchsize (SSIS, Bulk Insert, etc)
– INSERT-SELECT separate partitions into the Target Table –
potentially in parallel
• Use ALTER TABLE ... SET (LOCK_ESCALATION = AUTO)
– Note: If memory is limited, TempDB could be heavily used for sorting
49. Two-Step Load Variations (cont.)
To avoid most TempDB space and TempDB IO during load
– Use a partitioned Staging Table that is also indexed identically to
Target Table
– Load Staging Table using moderate batchsize (< 1M rows)
– Final INSERT-SELECTs will avoid any sort!
• However the staging loads will be logged
– Note: Parallelism will be limited if load batches overlap
50. Loading Data
Goal: maximize read performance
– Minimizes Disk head movement
– Maintains high average request size (Think ~400k not 8k)
– Sustain high average scan rates
Key considerations for a Fast Track data load
– Data Architecture: Destination table, partitioning, and filegroup
– Source Data: Format & size
– System Resources: CPU & Memory
Use minimally logged BULK operations (Trace Flag -T610)
– http://msdn.microsoft.com/en-us/library/dd425070.aspx
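To see why the average request size on the Loading Data slide matters ("~400k not 8k"), the sketch below counts the read requests needed to scan the same amount of data at both sizes. It assumes the slide's figures mean 400 KB and 8 KB, and the 1 GiB scan size is an arbitrary example.

```python
import math

def requests_needed(scan_bytes: int, request_bytes: int) -> int:
    """Number of read requests needed to scan scan_bytes of data."""
    return math.ceil(scan_bytes / request_bytes)

GIB = 1024 ** 3  # example scan size

small_requests = requests_needed(GIB, 8 * 1024)    # 8 KB per request
large_requests = requests_needed(GIB, 400 * 1024)  # ~400 KB per request

print(f"8 KB requests:   {small_requests}")
print(f"400 KB requests: {large_requests}")
print(f"Reduction:       {small_requests / large_requests:.0f}x fewer requests")
```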