9. Data Warehouse Scope
[Diagram: Data Warehouse Scope. Layers: Supporting Systems, BI Data Storage Systems, and Presentation Layer Systems. Components shown: DataPath, Data Warehouse, Analysis Services Cubes, PerformancePoint, Dedicated SAN Storage Array, Reporting Services, Web Analytic Tools, Integration Services ETL, SharePoint Services, Microsoft Office SharePoint, and Data Staging and Bulk Loading.]
10. Data Warehouse Scenarios
• No longer exclusive to large enterprises and specialist analysts
• Growth of affordable self-service BI tools such as PowerPivot and Reporting Services has created a DW requirement for smaller businesses and individual departments
11. Microsoft Data Warehousing Offerings
Enterprise
• Software only; Scale-Up DW; 10s of terabytes
• Scalable and reliable SMP platform for data warehousing on any hardware
• Ideal for data marts or small to mid-sized enterprise data warehouses (EDWs)
• Support: Software Assurance; Premier Mission Critical Support
BDW Appliance
• Integrated Appliance (Software and Hardware); Scale-Up DW; <5 terabytes
• Scalable and reliable platform for data warehousing on any hardware
• Ideal for large data marts or mid-sized EDWs
• Support: 3-Year Support Plus 24
Fast Track Data Warehouse RA
• Reference Architectures (Software and Hardware); Scale-Up DW; 5–80 terabytes
• Reference architectures offering best price performance for data warehousing
• Ideal for data marts or small to mid-sized data warehouses with scan-centric workloads
• Support: Software Assurance; Premier Mission Critical Support
Parallel Data Warehouse
• DW Appliance (Fully integrated Software and Hardware); Scale-Out DW with MPP; 10s–100s of terabytes
• Appliance for high end MPP data warehousing delivering highest scalability and performance
• Ideal for high scale or high performance data marts and EDWs
• Support: Mission Critical Advantage Program
12. Microsoft Data Warehouse Offerings
Effort to Build: Very High | Very Low | Moderate | Moderate | Moderate | Moderate | Very Low
Capacity: Variable | 5 TB | 14 TB | 20 TB | 40 TB | 40 TB | 500 TB
Concurrency: Variable | Light | Light | Medium | Medium | High | Very High
Query Complexity: Variable | Medium | Medium | Medium | Medium | High | Very High
14. Business Data Warehouse Appliance
Agile
• Deploy in hours or days, not months
• Easy to use through built-in, dedicated tools to load and manage your data warehouse
• Designed for data warehouses of up to 5 TB
• Fast Track 3.0 compliant, with a license path to Fast Track
Complete
• Hardware + Software + Services
• Pre-tuned, pre-configured, pre-installed. Turn on and go!
• Single point of contact for support
Optimized
• Specifically for small to medium data warehouse workloads
• Designed for performance, energy efficiency, and value by the best engineers from HP and Microsoft
• Security and reliability built in
17. Fast Track Data Warehouse Components
Software:
• SQL Server 2008 R2
Enterprise
• Windows Server 2008 R2
Configuration guidelines:
• Physical table structures
• Indexes
• Compression
• SQL Server settings
• Windows Server settings
• Loading
Hardware:
• Tight specifications for servers, storage, and networking
• ‘Per core’ building block
19. SQL Server Parallel Data Warehouse
• Tier-1 Enterprise Data Warehouse Appliance Offering
– High scalability from tens to hundreds of terabytes
– High performance through the MPP system
• Flexibility and Choice
– Choice of deployment options through distributed architecture
• Most Comprehensive Solution
– Complete data warehouse solution spanning desktop, enterprise data warehouse, and data marts
22. Control Node
Client connections always go through the control node
Contains no persistent user data
Parallel Data Warehouse advantages:
– Processes SQL requests
– Prepares the execution plan
– Orchestrates distributed execution
Local SQL Server processes the final query plan and aggregates results
Client drivers provided by DataDirect:
– Open Database Connectivity (ODBC), Object Linking and Embedding Database (OLE DB), Java Database Connectivity (JDBC), and ActiveX® Data Objects (ADO.NET) client drivers
– Wire protocol (SequeLink)
– Drivers are available for 32-bit and 64-bit systems
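The control node's role in a query (prepare the plan, orchestrate distributed execution, then let its local SQL Server aggregate the results) follows a scatter-gather pattern. The Python sketch below is a toy model, not the appliance's implementation: the function names and the `(region, amount)` row layout are hypothetical, and each inner list stands in for the rows one compute node holds.

```python
from collections import defaultdict

# Toy model only: each inner list stands in for the rows stored on one compute node.
# Row layout (region, amount) and all names are hypothetical.


def compute_node_partial_sum(local_rows):
    """Compute-node step: aggregate only the rows this node stores."""
    partial = defaultdict(int)
    for region, amount in local_rows:
        partial[region] += amount
    return dict(partial)


def control_node_sum_by_region(nodes):
    """Control-node step: run the partial step on every node, then merge the results."""
    # Scatter: each compute node aggregates its own distribution independently.
    partials = [compute_node_partial_sum(rows) for rows in nodes]
    # Gather: combine the partial sums, as the control node's local SQL Server would.
    final = defaultdict(int)
    for partial in partials:
        for region, subtotal in partial.items():
            final[region] += subtotal
    return dict(final)


nodes = [
    [("East", 10), ("West", 5)],
    [("East", 7)],
    [("West", 3), ("North", 1)],
]
print(control_node_sum_by_region(nodes))  # {'East': 17, 'West': 8, 'North': 1}
```

The merge step works because SUM is decomposable; an average, for example, would need each node to return both a sum and a count so that the final division happens once, after the gather.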
24. Landing Zone
Provides high-capacity storage for data files from ETL processes
Is available as a sandbox for other applications and scripts that run on the internal network
Provides SQL Server Integration Services
[Diagram: Source → Landing Zone Files → Data Loader (DWLoader or SQL Server Integration Services) → Compute Nodes]
26. Data Rack
• Consists of compute nodes and storage nodes
• Servers: 10 active + 1 passive
• HP ProLiant DL360 G7 compute nodes
• InfiniBand, FC, and Ethernet switching; 42U rack
• Expansion: grow from 1–4 data racks; storage options; test/dev system
• Storage: 10x HP StorageWorks MSA P2000 G3
27. Compute Node
Each MPP node is a highly tuned symmetric multi-processing (SMP) node with standard interfaces
Provides dedicated hardware, database, and storage
Runs SQL Server
Spare node provides failover in case of node failure
Drives are configured as RAID 1
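The 10 active + 1 passive layout and the spare node's role can be illustrated with a small bookkeeping model. This is a hypothetical sketch: the `DataRack` class, the node names, and the error raised when no spare remains are assumptions, and on the appliance itself failover is handled by failover clustering rather than by application code like this.

```python
class DataRack:
    """Hypothetical model of a data rack: active compute nodes plus passive spares."""

    def __init__(self, active_count=10, spare_count=1):
        self.active = [f"compute-{i}" for i in range(1, active_count + 1)]
        self.spares = [f"spare-{i}" for i in range(1, spare_count + 1)]
        self.failed = []

    def fail(self, node):
        """Promote a spare into the failed node's slot so the active count is unchanged."""
        if node not in self.active:
            raise ValueError(f"{node} is not an active node")
        if not self.spares:
            # Assumption: with no spare left, this sketch simply refuses the failover.
            raise RuntimeError("no spare node available")
        spare = self.spares.pop(0)
        self.active[self.active.index(node)] = spare  # spare takes over the same slot
        self.failed.append(node)
        return spare


rack = DataRack()
print(rack.fail("compute-3"))  # spare-1
print(len(rack.active))        # 10
```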
30. Data Layout Approaches
Replicated
A table structure exists as a full copy within each discrete Parallel Data Warehouse node.
Distributed
A table structure is hashed on a single column and uniformly distributed across all nodes on the appliance. Each distribution is a separate physical table in the database management system (DBMS).
Ultra Shared-Nothing
Provides the ability to design a schema of both distributed and replicated tables to minimize data movement between nodes.
Small sets of data can be more efficiently stored in full (replicated).
Certain set operations (such as single-node operations) are more efficient against full sets of data.
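The difference between the replicated and distributed layouts can be sketched in a few lines of Python. The sketch assumes a hypothetical four-node appliance and uses `zlib.crc32` purely as a stand-in deterministic hash; the hash function Parallel Data Warehouse actually uses is not described here.

```python
import zlib

NUM_NODES = 4  # hypothetical appliance size


def node_for(key, num_nodes=NUM_NODES):
    """Pick the node for a distribution-column value (stand-in hash, not PDW's)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute(rows, key_index, num_nodes=NUM_NODES):
    """Distributed layout: each row lands on exactly one node, chosen by hashing one column."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row[key_index], num_nodes)].append(row)
    return nodes


def replicate(rows, num_nodes=NUM_NODES):
    """Replicated layout: every node keeps its own full copy of the table."""
    return [list(rows) for _ in range(num_nodes)]


# Hypothetical fact rows as (customer_id, amount) and a small dimension table.
fact_sales = [(i % 7, i) for i in range(100)]
dim_region = [(1, "East"), (2, "West")]

distributed = distribute(fact_sales, key_index=0)
replicated = replicate(dim_region)
print([len(part) for part in distributed])             # rows per node; sums to 100
print(all(copy == dim_region for copy in replicated))  # True
```

Because rows with the same key always hash to the same node, work on the distribution column can run locally on each node. With only seven distinct keys, the per-node counts here will be uneven; a distribution column with many distinct, well-spread values is what brings the spread close to uniform.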
31. Ultra Shared-Nothing Architecture
Extends Traditional Shared-Nothing Design
Pushes the shared-nothing architecture into the SMP node; there is I/O and CPU affinity within SMP nodes
– Eliminates contention for user queries
– Uses full resources for each user query
Provides multiple physical instances of tables
– Distributes large tables
– Replicates small tables
Redistributes rows as needed
Provides Fault Tolerance
All hardware components have redundancy (including CPUs, disks, networks, power, and storage processors)
Control and compute nodes use failover clustering
Management nodes have active and standby states
33. Parallel Data Warehouse Configuration Manager
Appliance topology
Services status
Network configuration
Privileges
34. Flexible Business Alignment
A distributed architecture gives you the flexibility to add or change diverse workloads or user groups while maintaining data consistency across the enterprise
Parallel database copy technology enables rapid data movement and consistency between EDW and data marts
Create SQL Server 2008 R2, Fast Track Data Warehouse, and SQL Server Analysis Services Data Marts
Supports user groups with very different service-level agreements (SLAs):
• Performance
• Capacity
• Loading
• Concurrency
35. Distributed Data Warehouse Architectures
[Diagram: Central EDW Hub with a Landing Zone and ETL Tools, connected to spokes for Departmental Reporting, Regional Reporting, High-Performance Reporting, Regional Reporting with Business Decision Appliance, Third-Party RDBMS, Third-Party Data Integration, and Mobile Applications.]
36. Determining the Right Solution
What is the workload?
Number of concurrent users
Query complexity
Query mix
Load processing
Performance requirements
What is the customer looking for in a solution?
Simplicity in the appliance
100 percent compatibility with SQL Server 2008 R2
Enterprise scalability
Economical hardware
Incremental expansion and high availability by default
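As a very rough first pass, the capacity bands from the offerings comparison (under 5 TB for the BDW appliance, 5–80 TB for Fast Track, tens to hundreds of terabytes for Parallel Data Warehouse) can be combined with a concurrency level. The sketch below is hypothetical: the function name, the three concurrency levels, and the cutoffs between them are assumptions, and it ignores the other questions listed, such as query complexity, query mix, and load processing.

```python
def suggest_offering(data_tb: float, concurrency: str) -> str:
    """Hypothetical first-pass sizing helper; not an official Microsoft or HP rule."""
    levels = ("light", "medium", "high")
    if concurrency not in levels:
        raise ValueError(f"concurrency must be one of {levels}")
    if data_tb < 0:
        raise ValueError("data volume cannot be negative")
    # BDW appliance: under 5 TB with light concurrency.
    if data_tb < 5 and concurrency == "light":
        return "Business Data Warehouse Appliance"
    # Fast Track: up to 80 TB; assumed suitable up to medium concurrency.
    if data_tb <= 80 and concurrency in ("light", "medium"):
        return "Fast Track Data Warehouse"
    # Parallel Data Warehouse: larger volumes or high concurrency (scale-out MPP).
    return "Parallel Data Warehouse"


print(suggest_offering(2, "light"))    # Business Data Warehouse Appliance
print(suggest_offering(40, "medium"))  # Fast Track Data Warehouse
print(suggest_offering(300, "light"))  # Parallel Data Warehouse
```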
37. Parallel Data Warehouse
Value to Customer
Enterprise-class scalability to hundreds of terabytes
High performance
Interoperability with leading BI products
Mission-critical support and maintenance
Mature SQL Server platform with high security and robust engineering process
Strong data warehouse vision and roadmap that includes industry-leading technologies
Supporting Features
MPP with ultra shared-nothing architecture
Distributed query optimization
Balanced hardware with pre-tested and pre-tuned appliances optimized for data warehousing
Third-party product integration (for example, MicroStrategy, Business Objects, and Informatica)
Mission-critical support and maintenance
Roadmap includes column store, petabyte scalability, real-time data warehousing, MDM,
Speaker notes
The HP Business Data Warehouse Appliance is a great solution for data warehouse environments with light concurrency requirements and relatively low data volumes. This workload profile is becoming increasingly common as organizations recognize the business value of using data marts and departmental data warehouses as a platform for the growing use of business analysis tools by information workers at all levels of the business. Data warehouses and BI solutions are no longer the exclusive domain of huge enterprises; they are now an increasingly important capability for small to medium businesses and decentralized departments. A growing number of businesses do not have the concurrency requirements, data volumes, or budgets of large enterprises, but still want to create a data warehouse for better reporting, analysis, and decision making.
The HP Business Data Warehouse offers a solution for the customers discussed on the previous slide. It’s a solution that is:
Complete – the appliance comes with all the hardware and software you need, pre-configured for a data warehouse workload based on expertise from HP and Microsoft, and includes support services from a single source.
Optimized – Experts from Microsoft and HP have designed and tuned the appliance specifically for data warehouse workloads, so you can be sure it will meet your data warehouse requirements with efficient power utilization and built-in security and reliability features.
Agile – Because the BDW is a single hardware appliance, you can just plug it in, switch it on, and within a very short period you’ll have a working data warehouse. The easy-to-use wizards included in the appliance simplify configuration and loading, enabling your business to start taking advantage of your data warehouse sooner than with a “self-build” solution. And while the BDW is optimized for relatively low data volumes and concurrency, if your business grows significantly you can transfer your BDW software licenses to a Fast Track solution.
There are two key scenarios for using the HP Business Data Warehouse appliance:
A small business or departmental data warehouse for a small group of concurrent users who need to store and analyse up to 5 TB of data.
A spoke in an Enterprise Data Warehouse “hub and spoke” architecture, where the BDW is used to deliver a subset of the corporate data warehouse to a specific set of users.
The EDW appliance is a complete solution with the hardware, software, and services needed for a mission-critical data warehouse. The database is highly scalable and can handle workloads of hundreds of terabytes while maintaining performance. The EDW appliance also works with your existing data warehouses and data marts, so you do not have to rip and replace your current investments. You can also use familiar tools such as Microsoft Excel to analyze the data in your data warehouse.
Customers will purchase at least two racks for a complete EDW Appliance system.
[Click]
The control rack will have control nodes, management nodes, the landing zone, and backup nodes. The data rack will have servers that are compute nodes and storage nodes. Each of these racks and node types will be discussed in more detail.
Data layout options:
Dimension tables are typically replicated
Parallel Data Warehouse maintains data integrity across all nodes
Fact tables are typically distributed
The data model, table sizes, and workloads must all be considered when choosing between replicated and distributed tables
The following join types are used to achieve distribution compatibility:
Shared-nothing join: Achieves distribution compatibility by using compatible distribution keys in the SQL join criteria
Ultra shared-nothing join: Achieves distribution compatibility through a replicated table; no data movement between nodes is required
Redistribution join: Requires data to be dynamically distributed between compute nodes to achieve distribution compatibility
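The choice among these three join types can be expressed as a simple decision rule. This is a hypothetical sketch, not the Parallel Data Warehouse query optimizer: it assumes inner joins only and ignores details such as column data types and distribution counts, which a real optimizer must also check for compatibility. The table and column names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Table:
    """Hypothetical table metadata: replicated, or distributed on one column."""
    name: str
    replicated: bool
    distribution_column: Optional[str] = None  # None for replicated tables


def join_type(left: Table, right: Table, left_col: str, right_col: str) -> str:
    """Classify an inner join on left.left_col = right.right_col."""
    # A full copy of a replicated table exists on every node, so no rows move.
    if left.replicated or right.replicated:
        return "ultra shared-nothing join"
    # Both sides hashed on their join columns: matching rows are already co-located.
    if left.distribution_column == left_col and right.distribution_column == right_col:
        return "shared-nothing join"
    # Otherwise rows must be redistributed between compute nodes at query time.
    return "redistribution join"


sales = Table("FactSales", replicated=False, distribution_column="customer_id")
returns = Table("FactReturns", replicated=False, distribution_column="customer_id")
orders = Table("FactOrders", replicated=False, distribution_column="order_id")
customer = Table("DimCustomer", replicated=True)

print(join_type(sales, customer, "customer_id", "customer_id"))  # ultra shared-nothing join
print(join_type(sales, returns, "customer_id", "customer_id"))   # shared-nothing join
print(join_type(sales, orders, "customer_id", "customer_id"))    # redistribution join
```

This is why replicating dimension tables and distributing fact tables works well together: fact-to-dimension joins fall into the first branch and need no data movement.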
By taking the traditional idea of a shared-nothing architecture a step further, the EDW appliance does not need to share any information between compute nodes. Each table is duplicated in several places to help with load balancing and fault tolerance. The hardware is redundant and supports automatic failover to standby hardware to increase overall system uptime.
The Administrative Console is an Internet Information Services (IIS) web application for SQL Server Parallel Data Warehouse that displays the appliance’s state information. Users connect to the Administrative Console through Microsoft Internet Explorer.
The Configuration Manager is an appliance administration tool that SQL Server Parallel Data Warehouse system administrators use to perform appliance-level operations and to change appliance-level settings. For example, use the Configuration Manager to reset passwords, set the time zone, change IP addresses, configure SSL certificates, enable remote access through the firewall, start or stop the appliance, and set Instant File Initialization.
A distributed data warehouse solution, such as that supported by SQL Server Parallel Data Warehouse, comprises a centralized EDW and a set of loosely coupled data marts. For many years, this has been the preferred approach for enterprise-wide data warehousing, and numerous studies since 2003 confirm that hub and spoke is the most popular data warehouse architecture among DW professionals. Traditionally, implementing a hub and spoke architecture has been challenging due to practical limitations of the database engine and network resources.
[Click to display types of spoke]
With SQL Server Parallel Data Warehouse, you can create a diverse range of spoke types: SQL Server Parallel Data Warehouse MPP appliances for user groups with extreme scalability requirements, Fast Track data warehouse implementations, SQL Server 2008 Enterprise data warehouses, and even SQL Server 2008 Analysis Services OLAP databases.
[Click to display parallel database copy point]
However, the SQL Server Parallel Data Warehouse parallel database copy technology enables rapid data integration between spokes and the SQL Server Parallel Data Warehouse hub, making it easier to build hub and spoke solutions that integrate your diverse data marts and the enterprise data warehouse.
[Click to display multiple-user SLA point]
The SQL Server Parallel Data Warehouse hub-and-spoke architecture lets you support user groups with very different SLAs; hot, warm, and cold data; different data-loading requirements; and more.
The EDW appliance can be the central hub in this architecture. The spokes can be anything from a SQL Server departmental data mart to a Fast Track reference implementation, a business decision appliance, or a SQL Server Analysis Services system. EDW is not restricted to any particular model, and the high-speed data copy features enable multiple clients.
With so many choices, there are always questions about which solution is right for the organization. These questions help you to determine the correct solution. While there is rarely any one deciding factor, you can find a solution that is optimized for the things that are most important to you.
The EDW appliance fits in with your existing data warehouse solutions and will enable you to query and report on the large amount of data stored in the appliance.