Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enterprise Analytic Systems - Tasso Argyros, Teradata

Recent research has pointed out the complementary nature of Hadoop and other data management solutions and the importance of leveraging existing systems, SQL, engineering, and operational skills, as well as incorporating novel uses of MapReduce to improve analytic processing. Come to this session to learn how companies optimize the use of Hadoop with other enterprise systems to improve overall analytical throughput and build new data-driven products. This session covers: ways to achieve high-performance integration between Hadoop and relational-based systems; Hadoop+NoSQL vs Hadoop+SQL architectures; high-speed, massively parallel data transfer to analytical platforms that can aggregate web log data with granular fact data; and strategies for freeing up capacity for more explorative, iterative analytics and ad hoc queries.

  1. Big Data Architecture • Tasso Argyros | co-President | Teradata Aster | Twitter: @targyros | November 2011
  2. What We’re Covering Today • Data Science in Enterprise (vs the Valley) • Quick Overview of Teradata Aster’s Technology • Hybrid Hadoop Architectures • Connecting Hadoop to Other Systems • MapReduce Enterprise Use Cases. Teradata Confidential and Proprietary
  3. About Aster Data • Aster has been a Big Data & Big Analytics pioneer since 2005, developing an MPP SQL+MapReduce platform • Acquisition of Aster Data by Teradata completed on April 6, 2011 • Opportunity for Teradata to expand its business in the Big Data analytics market to include multi-structured data and new analytical capabilities • Intense focus on the enterprise
  4. The Nature of Data Scientist Analytics in the Enterprise
  5. What is Data Science? [Diagram labels: Data Scientists; Curiosity/Cleverness; Technical Expertise; Business Acumen]
  6. Data Science is Exploding
  7. What is Making Data Science Popular? 1. Proliferation of Data-Driven Products & Businesses 2. Consumer Interactions with Web & Social Channels 3. Breadth of Tools Available 4. Wealth of Machine-Generated Data
  8. A Day in the Life of a Data Scientist – “Investigative Analytics”: Integrate, Investigate, Implement
  9. Data Scientists in the Enterprise are Not Only Developers • SQL Analysts • SAS/R Analysts • DBMS Power Users • Java Coders • … [Diagram labels: Curiosity/Cleverness; Technical Expertise; Business Acumen]
  10. Data Scientists Have Different Skills • Combination of: Analysts, Coders, Sys admins / EngOps • Hard to find & expensive [Diagram labels: Enterprises; Web Startups]
  11. Data Scientists and MapReduce Platforms
  12. A Brief History of MapReduce & Hadoop • 2004: Google publishes MapReduce paper at OSDI Conference • 2006: Hadoop becomes the first open-source implementation of MapReduce • 2008: Aster Data becomes the first DBMS vendor to incorporate MapReduce; Aster Data tightly coupled: embedded MapReduce with SQL to bring MapReduce to enterprises – SQL-MapReduce® • 2009-2011: Follow-on vendors announce connectors to Hadoop; Hadoop Distributions/Platforms emerge: Amazon, Cloudera, Hortonworks, DataStax, MapR, …
  13. MapReduce is the SQL of Big Analytics • MapReduce is a parallel programming framework - “J2EE for Big Data Analytics” • MapReduce provides - Automatic parallelization - Fault tolerance - Monitoring & status updates • Hadoop - Open source MapReduce • Aster - Commercial implementation of MapReduce + SQL [Diagram labels: Map Function; Scheduler; map; shuffle; reduce; Results]
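
The map, shuffle, and reduce stages named on slide 13 can be illustrated with a small single-process sketch. This is not Hadoop or Aster code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_job`) and the word-count task are assumptions chosen for illustration, and the sketch leaves out the parallelization, fault tolerance, and monitoring that a real framework provides.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit one (word, 1) pair per word in a single input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    pairs = (pair for record in records for pair in map_phase(record))
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(run_job(["big data", "big analytics"]))
```

In a real cluster each map call and each reduce call can run on a different node, which is why the programmer only supplies the two functions and the framework handles the grouping in between.
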
  14. [No text content]
  15. The Technology Gap • SQL-MR: Analyst-friendly; Iterative & Fast; Integrates well with BI/Viz Tools • Hadoop-MR: Developer-friendly; Batch-oriented; Requires lots of coding • But what if you need both?
  16. Quick Aster & SQL-MapReduce Overview
  17. Filling the Gap: SQL-MapReduce
  18. Enabling Analysis of Diverse Data: Aster capabilities for processing and analyzing multi-structured, raw data [Diagram labels: Multi-structured raw data; Structured Data (DW, DBMS); Aster Analytic Platform; SQL-MapReduce: tokenize, unpack, sessionize, …; Output: Col1, Col2, Col3, Col4] • Integrate Data: Load raw data directly into Aster Database; Bypass complex ETL pipeline via ELT • Process and Explore: Use SQL-MapReduce functions to interpret & analyze raw data; Leverage flexible, dynamically-created schema at runtime • Leverage Results: Structured output of SQL-MapReduce processing available for further use or output to data warehouse
  19. SQL-MapReduce for Big Data Analytics. Example: Pattern Matching, Time Series Analysis. Discover patterns in rows of sequential data • Weblogs {user, page, time}: Click 1 … Click 4 • Smart Meters {device, value, time}: Reading 1 … Reading 4 • Sales Transactions {user, product, time}: Purchase 1 … Purchase 4 • Stock Tick Data {stock, price, time}: Tick 1 … Tick 4 • Call Data Records {user, number, time}: Call 1 … Call 4 • Aster SQL-MapReduce Approach: Single pass of data; Linked list sequential analysis; Gap recognition • Traditional SQL Approach: Full Table Scans; Self-Joins for sequencing; Limited operators for ordered data • eBusiness: Sessionization, Click Analysis, Golden Path, Rev Attribution • Telecomm: Calling Patterns, Signal Processing, Forecasting • Financial: Trade Sequences, Pairs Trading, Fraud Detection • Federal: Pattern Detection, Fuzzy Matching, Inference Analysis, Inexact linking
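
The single-pass idea on slide 19 can be sketched for the weblog case `{user, page, time}`. This is an illustrative Python stand-in, not Aster's Sessionization function: the `sessionize` name, the tuple layout, and the 1800-second inactivity timeout are all assumptions. After ordering the rows by user and time, one pass over the data is enough to assign session numbers, with no self-join.

```python
from itertools import groupby
from operator import itemgetter


def sessionize(clicks, timeout=1800):
    """Assign a per-user session number to each (user, page, time) click.

    `time` is assumed to be in seconds. A gap longer than `timeout`
    between consecutive clicks of the same user starts a new session.
    """
    result = []
    # Order rows so that each user's clicks are contiguous and chronological.
    ordered = sorted(clicks, key=lambda click: (click[0], click[2]))
    for user, rows in groupby(ordered, key=itemgetter(0)):
        session_id = 0
        previous_time = None
        for _, page, time in rows:
            # Gap recognition: compare only with the previous row.
            if previous_time is not None and time - previous_time > timeout:
                session_id += 1
            result.append((user, page, time, session_id))
            previous_time = time
    return result


if __name__ == "__main__":
    sample = [
        ("u1", "/home", 0),
        ("u1", "/search", 100),
        ("u1", "/cart", 5000),
        ("u2", "/home", 50),
    ]
    for row in sessionize(sample):
        print(row)
```

The same pattern applies to the other row shapes on the slide (meter readings, purchases, ticks, calls): partition by the entity, order by time, and carry a small amount of state from one row to the next.
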
  20. Sample SQL-MapReduce Packaged Functions (* = New) • Path Analysis (discover patterns in rows of sequential data): nPath: complex sequential analysis for time series and behavioral patterns; nPath Extensions: count entrants, track exit paths, count children, and generate subsequences; Sessionization: identifies sessions from time series data in a single pass • Graph and Relational Analysis (analyze patterns across rows of data): Graph analysis: finds shortest path from distinct node to all other nodes in graph; nTree: new function for performing operations on tree hierarchies *; Other: triangle finding, square finding, clustering coefficient * • Text Analysis (derive patterns in textual data): Sentiment Analysis: classifies content as positive or negative (for product reviews, customer feedback) *; Text Categorization: used to label content as spam/not spam *; Entity Extraction/Rules Engine: identifies addresses, phone numbers, names from textual data *; Text Processing: counts occurrences of words, identifies roots, & tracks relative positions of words & multi-word phrases; nGram: splits an input stream of text into individual words and phrases; Levenshtein Distance: computes the distance between two words • Data Transformation (transform data for more advanced analysis): Pivot: converts columns to rows or rows to columns *; Log parser: generalized tool for parsing Apache logs *; Unpack: extracts nested data for further analysis; Pack: compresses multi-column data into a single column; Antiselect: returns all columns except for specified column; Multicase: case statement that supports row match for multiple cases
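
As an illustration of one entry in the list on slide 20, the Levenshtein distance between two words can be computed with the standard dynamic-programming recurrence. The sketch below is plain Python and says nothing about how Aster's packaged function is implemented; the `levenshtein` name and signature are assumptions.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in `b` so the working rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete from a
                current[j - 1] + 1,                   # insert into a
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

A distance like this is a common building block for the fuzzy matching and inexact linking use cases, where records that differ by a few characters should still be treated as candidates for the same entity.
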
  21. Complementing Hadoop in the Enterprise
  22. You Need Hybrid Architectures • Hadoop (Batch): Engineers; 5-10 concurrent users; Ingest, Transform, Archive; Fast data loading, ELT/ETL, Image processing, Online archival • Aster (Interactive): Data Scientists; 50+ concurrent users; Discover and explore; Path & pattern analysis, Graph analysis, Fraud detection, Text analysis • Teradata (Active): Business Analysts; 5000+ concurrent users; Analyze and Report; Operational analysis, Transactional analysis, High volume ad-hoc, Elastic data marts
  23. Complementary and Overlapping Use Cases • Batch processing use cases: Data preprocessing, Image processing, Search indexes, Web crawling • Overlapping use cases: Web log analysis, Text processing, Genomic, Astronomical, Geo-Spatial, scientific • Fast/interactive use cases: Pattern matching, Visitor behavior, Graph & relationship analysis, Investigative analytics
  24. An Example of an Enterprise Hybrid Architecture [Diagram labels: Data Scientists; BI; Business Analysts; Data Apps; Teradata | Aster; Hadoop; Teradata | EDW; Multi-Structured Data; Structured Data] • Batch Processing • Data Archival • Data Transformations • Weblogs • Machine data • Customer Interaction data • Call center text data • Financial data • SAP, ERP • Address, phones, … • Customer addresses, phones, etc. … • Integration with financial, operational data
  25. Connecting Hadoop With Other Systems
  26. 3 Ways to Connect Hadoop to Databases • Purpose-Built Connectors • Hadoop Front-End (Pig/Hive) • HDFS Scripts [Chart axis labels: Batch; Ad-Hoc; Ease of Use]
  27. Using Aster Data and Hadoop Together: Aster Data for rich, ultra-fast analytics [Diagram labels: Diverse Data Sources (Web data, NetFlow data, Log files, Text files); Hadoop (MapReduce, HDFS); HDFS Connector; Aster Database (SQL + SQL/MR)] • 1: Non-relational data loaded into Hadoop cluster • 2: Hadoop processes data transformation • 3: Data from HDFS loaded into Aster using HDFS connector • 4: Data used for interactive analytics inside Aster Database
  28. The Aster-Hadoop Data Connector: Enable users to analyze data where it makes the most sense • Why Is It Needed? - Hadoop can be used for batch ETL and batch data processing - Aster for fast, interactive analysis - Challenge: slow, tedious manual operations required to transfer data from Hadoop into Aster Database • What Is It? - A set of 2 SQL-MapReduce functions developed by Aster Data - LoadFromHadoop: Parallel data loading from HDFS to Aster nCluster - LoadToHadoop: Parallel data loading from Aster nCluster to HDFS - Advantages: Parallel performance, Seamless (SQL), Consistency (ACID) • Example: insert into mytable select * from load_from_hadoop( on mytable host('10.10.3.22') port(9000) delimiter(',') nullstring('') files('hdfs_input_filepaths.txt') );
  29. MapReduce Enterprise Use Cases
  30. Example #1: SQL-MapReduce for Data Scientist Investigative Analytics: Data Scientist Discovery of Bot Detection Algos • Business Goal: Update bot detection algos with new markers of suspect traffic for potential fraud or spam attacks • Aster Data Differentiated Solution: Investigative analysis to identify new attributes that increase the predictive accuracy of bot detection; Correlate data within/across sessions from complex URLs; Use nPath to quickly identify and iteratively explore site activity patterns • Business Impact: Site integrity: identify bot traffic which can degrade performance and security of www.book.com (B&N); Improved customer experience: detect and prevent spam and other automated nuisances to B&N members • “We’ve always wanted to examine search sub-sessions to really understand what behaviors come from specific searches… All of this requires cursors and external programming in Oracle, but can be easily parallelized in Aster Data even with non-programmers.” Michael Wexler, VP of Analytics, Barnes & Noble • Other Aster Data Applications at Barnes & Noble: Online marketing attribution – across search, device, features; Customer personalized recommendations - ever-changing
  31. Example #2: Enabling Creation of Data-Driven Products / “Cards that fit you” • Personalized recommendations of credit cards that would provide best fit for customer • Uses clickstream analysis + text analysis to process data about customer interests and spending patterns • Business Impact: delivers referral revenue related to click-throughs on specific card offers
  32. Example #3: Better Visibility to Marketing Impact • “Aster gives us the analytic capability to provide best-in-class digital marketing optimization for our clients, enabling more accurate marketing attribution. With Aster, we can help our clients understand every marketing interaction with consumers over time and across their entire online market ecosystem, knowing the impact of every marketing dollar spent.” Sunil Kavi, Director of Technology, Razorfish
  33. Visualization Example: Aster Data Tableau Integration with SQL-MapReduce®
  34. Summary - MapReduce for the Rest of Us • 1. Data Science is Growing Fast but Big Enterprise is not Facebook • 2. There is a Gap Between Existing Enterprise Skills and Technology Capabilities • 3. To Solve this Problem Look at Utilizing the Right Technology for the Right Problem
  35. Thank You! ... Questions? Learn More About SQL-MapReduce • MapReduce Resource Center - www.asterdata.com/mapreduce • Aster Developer Express IDE trial - www.asterdata.com/ide • Download white paper at www.asterdata.com • See it in action tonight!! – Aster & Tableau Happy Hour, Eventi Hotel, 851 Avenue of the Americas (6th Avenue), New York, NY 10001, 7-9PM