Accelerating Big Data Analytics

  1. Accelerating Big Data Analytics with Microsoft APS and Attunity Replicate
  2. Data sources; The traditional data warehouse
  3. Data sources; Non-relational data; The traditional data warehouse
  4. Data sources Non-Relational Data Hadoop Relational Data Warehouse Data Platform Analytics Platform System SQL Server 2014 Azure HDInsight
  5. Roadblocks to evolving to a modern data warehouse • Keep legacy investment • Buy new tier-one hardware appliance • Acquire Big Data solution • Acquire business intelligence • Limited scalability and ability to handle new data types • Significant training and data silos • High acquisition and migration costs • Complex with low adoption
  6. Introducing the Microsoft Analytics Platform System: The turnkey modern data warehouse appliance • Relational and non-relational data in a single appliance • Enterprise-ready Hadoop • Integrated querying across Hadoop and PDW using T-SQL • Direct integration with Microsoft BI tools such as Microsoft Excel • Near real-time performance with In-Memory Columnstore • Ability to scale out to accommodate growing data • Removal of data warehouse bottlenecks with MPP SQL Server • Concurrency that fuels rapid adoption • Industry’s lowest data warehouse appliance price per terabyte • Value through a single appliance solution • Value with flexible hardware options using commodity hardware
  7. Microsoft Analytics Platform System The turnkey modern data warehouse appliance
  8. Hadoop alone is not the answer to all Big Data challenges: Steep learning curve, slow and inefficient • Move HDFS into the warehouse before analysis • ETL • Learn new skills • T-SQL • Build • Integrate • Manage • Maintain • Support • Hadoop ecosystem • New data sources
  9. Connecting islands of data with PolyBase: Bringing Hadoop point solutions and the data warehouse together for users and IT • Provides a single T-SQL query model for PDW and Hadoop with rich features of T-SQL, including joins without ETL • Uses the power of MPP to enhance query execution performance • Supports Windows Azure HDInsight to enable new hybrid cloud scenarios • Provides the ability to query non-Microsoft Hadoop distributions, such as Hortonworks and Cloudera [Diagram labels: SQL Server Parallel Data Warehouse; Microsoft Azure HDInsight; PolyBase; Microsoft HDInsight; Hortonworks for Windows and Linux; Cloudera; Select…; Result set]
  10. Use cases where PolyBase simplifies using Hadoop data • Bringing islands of Hadoop data together • Running high performance queries against Hadoop data • Archiving data warehouse data to Hadoop (move) • Exporting relational data to Hadoop (copy) • Importing Hadoop data into a data warehouse (copy)
  11. Big Data insights for anyone: New insights with familiar tools through native Microsoft BI integration • Minimizes IT intervention for discovering data with tools such as Microsoft Excel • Enables DBAs and power users to join relational and Hadoop data with T-SQL • Offers Hadoop tools like MapReduce, Hive, and Pig for data scientists • Takes advantage of high adoption of Excel, Power View, PowerPivot, and SQL Server Analysis Services [Diagram labels: Power users; Data scientist; Everyone else using Microsoft BI tools]
  12. Shinsegae Corporation, a major department store chain in Korea, needed better performance for customer data mining and basket purchase analysis. Shinsegae took advantage of the integration of PDW and Hadoop to combine 450 terabytes of data, and was pleased to see PolyBase performing nearly twice as fast as their best Hive/Hadoop environment. #1 Retail company in Korea. “We are really satisfied with the performance of PolyBase to allow us to join relational and Hadoop data (weather data, board data, text data) faster and easier. PolyBase is a really powerful feature of PDW to deploy a Big Data system. PolyBase is one of the reasons we selected PDW as our Big Data platform.”
  13. The Royal Bank of Scotland, the leading UK provider of corporate banking services, needed a powerful analytics platform to improve performance and customer services. The bank implemented a Microsoft SQL Server Parallel Data Warehouse appliance to increase productivity by 40 percent for faster response to business needs. “I knew that it would be easy for my team to transition from managing SQL Server databases to SQL Server PDW, and the solution cost about 85 percent less than products from other vendors.”
  14. Microsoft Analytics Platform System: No-compromise modern data warehouse solution • Meeting today’s Big Data analytics requirements • Enterprise-ready Hadoop with HDInsight and the simplicity of PolyBase • Optimized performance with MPP technology and In-Memory Columnstore • Providing value with a low TCO
  15. Accelerating Big Data Analytics with Microsoft APS and Attunity Replicate
  16. To Use Data, You Must Move it!
  17. Data Needs to Be Moved to Be Useful: 80% of the work that data scientists put into big data projects is spent on data integration and resolving data quality issues. Source: “For Big Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights,” by Steve Lohr, New York Times, August 17, 2014
  18. Data Integration Remains a Major Challenge 1. Long rollout 2. Lots of personnel 3. Mixed systems 4. Hard to maintain 5. Not real-time
  19. Attunity Replicate for Microsoft APS: More Data, Less Time, Less Cost, Data Value • Easy, no coding, less complexity • Pre-automated, optimized process • Fast, high performance integration • Real-time CDC with low overhead • Optimized for large volumes in LAN and WAN
  20. Use Cases: Getting Data into Microsoft APS and SQL Server 1. ELT: accelerate new data feeds to your data warehouse 2. CDC: load data in real time for operational analytics 3. Query Offload into ODS and for BI on SQL Server 4. Migrate from another database or data warehouse 5. Hadoop: load data into and out of Hadoop
  21. Attunity Replicate for Microsoft APS • Monitoring and Control. Complete confidence at a glance • Turbo-Stream CDC and Optimizations for loading Microsoft APS. High performance, low-latency, low-impact, and scalability • Zero Footprint Architecture. Nothing to install on source database for Oracle, SQL Server, DB2, Sybase, MySQL • Click-2-Load. Drag. Drop. Done. • Complete, Heterogeneous Data Loading/Replication. Automating Schema Generation, Full Load and Change Data Capture
  22. Attunity Replicate for Microsoft APS: High Level Architecture [Diagram labels: Web-based Designer and Management Console; Replication Server; In Memory Stream Processing; Persistent Store; Source Database; Transaction Log; Data / Metadata; CDC; Bulk Loader; Stream Loader; Bulk Reader; Transform; Filter; Optimized Integration; PDW; HDInsight; PolyBase] • Oracle • SQL Server • DB2 • DB2 for iSeries & z/OS • Sybase • MySQL • Informix • Files (CSV) • Mainframe VSAM, IMS • ODBC (e.g. Teradata, other DW)
  23. Optimized Performance: Attunity TurboStream CDC for DW and Microsoft APS • In Memory Stream Processing • Transactional CDC: Transactions applied in real-time, in order • High-Volume CDC: Consolidation of change records to minimize transactions applied to target [Diagram labels: Attunity TurboStream CDC; CDC; DW Loader; SQL 1, 2, n; PDW; HDInsight; PolyBase]
  24. Attunity Replicate for Hadoop • Click-2-Replicate Design. Drag. Drop. Done. [Diagram labels: Ad-hoc Analytics; Bulk Load; Change Data; Databases; Data Feed Sources; CSV; BI Reporting; Visualization & Analytics; DB/DW; Data Refresh; Data Append]
  25. Attunity Replicate for Microsoft APS Benefits 1. High Performance – high volumes, low latency, low impact 2. Heterogeneous – supports many source databases 3. Fast time-to-value – automated turnkey solution 4. Less impact on IT – less development resources required 5. Lower TCO – for both software licenses and implementation services
  26. For more information: • Read the Attunity Replicate for Microsoft APS Solution Sheet • Download the "Accelerating Big Data Analytics with Microsoft APS and Attunity Replicate" Whitepaper • Watch the Accelerating Big Data Analytics with Microsoft APS and Attunity Replicate Webinar
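
Slide 23 contrasts transactional CDC (every transaction applied in order) with high-volume CDC, which consolidates change records so that fewer statements reach the target. The deck does not describe how that consolidation is done, so the Python sketch below only illustrates the general idea and is not Attunity's implementation. The `Change` record, the `consolidate` function, and the assumption that each insert or update carries a full row image are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, List, Optional


@dataclass(frozen=True)
class Change:
    """One captured change record (hypothetical shape, for illustration only)."""
    op: str                      # "insert", "update", or "delete"
    key: Hashable                # primary key of the affected row
    row: Optional[dict] = None   # assumed full row image; None for deletes


def consolidate(changes: Iterable[Change]) -> List[Change]:
    """Collapse an ordered change stream into at most one net change per key."""
    net: Dict[Hashable, Change] = {}
    for change in changes:
        previous = net.get(change.key)
        if previous is None:
            # First change seen for this key in the batch.
            net[change.key] = change
        elif previous.op == "insert" and change.op == "update":
            # Row is new to the target: insert it with its latest image.
            net[change.key] = Change("insert", change.key, change.row)
        elif previous.op == "insert" and change.op == "delete":
            # Row was created and removed within the batch: nothing to apply.
            del net[change.key]
        elif previous.op == "delete" and change.op == "insert":
            # Row already exists on the target: overwrite it instead.
            net[change.key] = Change("update", change.key, change.row)
        else:
            # update->update, update->delete, and similar: the latest change wins.
            net[change.key] = change
    return list(net.values())


# Six captured changes touching three rows.
stream = [
    Change("insert", 1, {"id": 1, "qty": 5}),
    Change("update", 1, {"id": 1, "qty": 7}),
    Change("insert", 2, {"id": 2, "qty": 1}),
    Change("delete", 2),
    Change("update", 3, {"id": 3, "qty": 9}),
    Change("update", 3, {"id": 3, "qty": 4}),
]
batch = consolidate(stream)
for change in batch:
    print(change.op, change.key, change.row)
```

In this example the six captured changes reduce to two statements for the target: one insert for row 1 and one update for row 3. The trade-off is that intermediate states and transaction boundaries are lost, which is why a strictly ordered transactional mode is presented as a separate option on the slide.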