Data sources
Non-Relational Data
Hadoop
Relational Data Warehouse
Data Platform
Analytics Platform System
SQL Server 2014
Azure HDInsight
Keep legacy investment
Buy new tier-one hardware appliance
Acquire Big Data solution
Acquire business intelligence
Roadblocks to evolving to a modern data warehouse
Limited scalability and ability to handle new data types
Significant training and data silos
High acquisition and migration costs
Complex with low adoption
Introducing the Microsoft Analytics Platform System
The turnkey modern data warehouse appliance
• Relational and non-relational data in a single appliance
• Enterprise-ready Hadoop
• Integrated querying across Hadoop and PDW using T-SQL
• Direct integration with Microsoft BI tools such as Microsoft Excel
• Near real-time performance with In-Memory Columnstore
• Ability to scale out to accommodate growing data
• Removal of data warehouse bottlenecks with MPP SQL Server
• Concurrency that fuels rapid adoption
• Industry’s lowest data warehouse appliance price per terabyte
• Value through a single appliance solution
• Value with flexible hardware options using commodity hardware
Move HDFS into the warehouse before analysis
ETL
Learn new skills
T-SQL
Build, integrate, manage, maintain, support
Hadoop alone is not the answer to all Big Data challenges
Steep learning curve, slow and inefficient
Hadoop ecosystem
New data sources
• Provides a single T-SQL query model for PDW and Hadoop with the rich features of T-SQL, including joins without ETL
• Uses the power of MPP to enhance query execution performance
• Supports Microsoft Azure HDInsight to enable new hybrid cloud scenarios
• Provides the ability to query non-Microsoft Hadoop distributions, such as Hortonworks and Cloudera
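To make "joins without ETL" concrete, here is a minimal Python sketch of the general idea: rows from a relational table are joined with rows read directly from a delimited file at query time, with no separate load step first. This is not PolyBase and does not use T-SQL; the table contents, the file contents, and the function names (`scan_external`, `join_on_customer`) are hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical relational table (stands in for a table in PDW).
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]

# Hypothetical delimited file contents (stands in for a file stored in Hadoop).
HADOOP_CLICKS = "customer_id,url\n1,/home\n2,/cart\n1,/checkout\n"


def scan_external(text):
    """Yield typed rows from delimited text at query time (no prior load step)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"customer_id": int(row["customer_id"]), "url": row["url"]}


def join_on_customer(relational_rows, external_rows):
    """Hash join: index the relational side, then stream the external side."""
    by_id = {r["customer_id"]: r for r in relational_rows}
    for ext in external_rows:
        match = by_id.get(ext["customer_id"])
        if match is not None:
            yield {"name": match["name"], "url": ext["url"]}


result = list(join_on_customer(customers, scan_external(HADOOP_CLICKS)))
```

The external rows are streamed through a generator, so the file is never copied into the relational side before the join runs; that is the only property the sketch is meant to show.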
SQL Server Parallel Data Warehouse
Microsoft Azure HDInsight
PolyBase
Microsoft HDInsight
Hortonworks for Windows and Linux
Cloudera
Connecting islands of data with PolyBase
Bringing Hadoop point solutions and the data warehouse together for users and IT
Select…
Result set
Use cases where PolyBase simplifies using Hadoop data
Bringing islands of Hadoop data together
Running high performance queries against Hadoop data
Archiving data warehouse data to Hadoop (move)
Exporting relational data to Hadoop (copy)
Importing Hadoop data into a data warehouse (copy)
Big Data insights for anyone
New insights with familiar tools through native Microsoft BI integration
Minimizes IT intervention for discovering data with tools such as Microsoft Excel
Enables DBAs and power users to join relational and Hadoop data with T-SQL
Offers Hadoop tools like MapReduce, Hive, and Pig for data scientists
Takes advantage of high adoption of Excel, Power View, PowerPivot, and SQL Server Analysis Services
Power users
Data scientists
Everyone else using Microsoft BI tools
Shinsegae Corporation, a major department store chain
in Korea, needed better performance for customer data
mining and basket purchase analysis. Shinsegae took
advantage of the integration of PDW and Hadoop to
combine 450 terabytes of data, and was pleased to see
PolyBase performing nearly twice as fast as their best
Hive/Hadoop environment.
#1 Retail company in Korea
We are really satisfied with the performance of
PolyBase to allow us to join relational and Hadoop
data (weather data, board data, text data) faster and
easier. PolyBase is a really powerful feature of PDW to
deploy a Big Data system. PolyBase is one of the
reasons we selected PDW as our Big Data platform.
The Royal Bank of Scotland—the leading UK provider of
corporate banking services—needed a powerful
analytics platform to improve performance and
customer services. The bank implemented a Microsoft
SQL Server Parallel Data Warehouse appliance to
increase productivity by 40 percent for faster response to
business needs.
I knew that it would be easy for my team
to transition from managing SQL Server
databases to SQL Server PDW, and the
solution cost about 85 percent less than
products from other vendors.
Microsoft Analytics Platform System
No-compromise modern data warehouse solution
Meeting today’s Big Data analytics requirements
Enterprise-ready Hadoop with HDInsight and the simplicity of PolyBase
Optimized performance with MPP technology and In-Memory Columnstore
Providing value with a low TCO
Data Needs to Be Moved to Be Useful
80% of the work that data scientists put into big data projects is spent on data integration and resolving data quality issues.
Source: “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights,” by Steve Lohr, New York Times, August 17, 2014
Data Integration Remains a Major Challenge
1. Long rollout
2. Lots of personnel
3. Mixed systems
4. Hard to maintain
5. Not real-time
Attunity Replicate for Microsoft APS
More Data
Less Time
Less Cost
Data Value
• Easy, no coding, less complexity
• Pre-automated, optimized process
• Fast, high performance integration
• Real-time CDC with low overhead
• Optimized for large volumes in LAN and WAN
Use Cases
Getting Data into Microsoft APS and SQL Server
1. ELT: accelerate new data feeds to your data warehouse
2. CDC: load data in real time for operational analytics
3. Query offload: into an ODS and for BI on SQL Server
4. Migration: move from another database or data warehouse
5. Hadoop: load data into and out of Hadoop
Attunity Replicate for Microsoft APS
Monitoring and Control.
Complete confidence at a glance
Turbo-Stream CDC and Optimizations for loading Microsoft APS
High performance, low latency, low impact, and scalability
Zero Footprint Architecture.
Nothing to install on the source database for Oracle, SQL Server, DB2, Sybase, MySQL
Click-2-Load.
Drag. Drop. Done.
Complete, Heterogeneous Data Loading/Replication.
Automating Schema Generation, Full Load and Change Data Capture
Attunity Replicate for Microsoft APS
High Level Architecture
Components shown in the architecture diagram:
• Web-based Designer and Management Console
• Replication Server: In-Memory Stream Processing, Persistent Store
• Source side: Source Database, Transaction Log, Bulk Reader, CDC (Data / Metadata)
• Processing: Transform, Filter
• Target side: Bulk Loader, Stream Loader (Data / Metadata), Optimized Integration
Sources:
• Oracle
• SQL Server
• DB2
• DB2 for iSeries & z/OS
• Sybase
• MySQL
• Informix
• Files (CSV)
• Mainframe VSAM, IMS
• ODBC (e.g. Teradata, other DW)
Target (Microsoft APS):
• PDW
• HDInsight
• PolyBase
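The Bulk Reader, Transform, Filter, and Bulk Loader stages named in the diagram can be pictured as a chain of streaming steps. The following is only a minimal Python sketch under that assumption. It is not Attunity Replicate code, and every function name, the transformation rule, the filter rule, and the sample rows are hypothetical.

```python
def bulk_reader(source_rows):
    """Read rows from the source, copying each so the source is left untouched."""
    for row in source_rows:
        yield dict(row)


def transform(rows):
    """Hypothetical transformation: normalize the name column."""
    for row in rows:
        row["name"] = row["name"].strip().upper()
        yield row


def keep_active(rows):
    """Hypothetical filter: pass through only rows flagged as active."""
    for row in rows:
        if row["active"]:
            yield row


def bulk_loader(rows, target):
    """Append rows to the target and return how many were loaded."""
    count = 0
    for row in rows:
        target.append(row)
        count += 1
    return count


source = [
    {"id": 1, "name": " ada ", "active": True},
    {"id": 2, "name": "bob", "active": False},
    {"id": 3, "name": "cy", "active": True},
]
target = []
loaded = bulk_loader(keep_active(transform(bulk_reader(source))), target)
```

Because each stage is a generator, rows flow through one at a time in memory rather than being staged between steps, which loosely mirrors the "In-Memory Stream Processing" label in the diagram.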
Optimized Performance
Attunity TurboStream CDC for DW and Microsoft APS
In-Memory Stream Processing
Attunity TurboStream CDC
Transactional CDC: transactions applied in real time, in order
High-Volume CDC (CDC DW Loader): consolidation of change records to minimize transactions applied to target
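Consolidating change records can be sketched as collapsing an ordered change stream into one net operation per row before anything is applied to the target. The Python below is a minimal illustration of that general technique, not Attunity's implementation. It assumes each change carries a full row image, and the `consolidate` function, the operation names, and the row keys are hypothetical.

```python
def consolidate(changes):
    """Collapse an ordered change stream into one net operation per key.

    Each change is (op, key, row), where op is "insert", "update", or "delete"
    and row is the full row image (None for deletes).
    """
    net = {}
    for op, key, row in changes:
        prev = net.get(key)
        if op == "insert":
            # Delete followed by insert in the same batch nets out to an update.
            if prev and prev[0] == "delete":
                net[key] = ("update", row)
            else:
                net[key] = ("insert", row)
        elif op == "update":
            # A row inserted in this batch stays an insert with the latest values.
            kind = "insert" if prev and prev[0] == "insert" else "update"
            net[key] = (kind, row)
        elif op == "delete":
            if prev and prev[0] == "insert":
                del net[key]  # inserted and deleted in the same batch: nothing to apply
            else:
                net[key] = ("delete", None)
    return net


changes = [
    ("insert", "R1", {"qty": 1}),
    ("update", "R1", {"qty": 2}),
    ("update", "R2", {"qty": 5}),
    ("update", "R1", {"qty": 3}),
    ("insert", "R3", {"qty": 9}),
    ("delete", "R3", None),
    ("update", "R2", {"qty": 7}),
]
net = consolidate(changes)
```

In this example seven change records reduce to two operations against the target. The trade-off is that intermediate states are not replayed, which is why it is presented separately from the transactional mode that applies every transaction in order.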
PDW
HDInsight
PolyBase
Attunity Replicate for Hadoop
Ad-hoc Analytics
Bulk Load
Change Data
Click-2-Replicate Design.
Drag. Drop. Done.
Databases
Data Feed Sources
CSV
BI
Reporting
Visualization & Analytics
DB/DW
Data Refresh
Data Append
Attunity Replicate for Microsoft APS Benefits
1. High Performance – high volumes, low latency, low impact
2. Heterogeneous – supports many source databases
3. Fast time-to-value – automated turnkey solution
4. Less impact on IT – fewer development resources required
5. Lower TCO – for both software licenses and implementation services
For more information, go to:
www.Attunity.com/aps
• Read the Attunity Replicate for Microsoft APS Solution Sheet
• Download the "Accelerating Big Data Analytics with Microsoft APS and Attunity Replicate" Whitepaper
• Watch the "Accelerating Big Data Analytics with Microsoft APS and Attunity Replicate" Webinar