
The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics


Hortonworks and Revolution Analytics have teamed up to bring the predictive analytics power of R to Hortonworks Data Platform.

Hadoop is a disruptive data processing framework that has had a large impact on today's data ecosystems. Enabling business users to carry their existing skills over to Hadoop is necessary to encourage adoption and to let businesses get value from their Hadoop investment quickly. R, a widely used and rapidly growing data analysis language, now has a place in the Hadoop ecosystem.

This presentation covers:
- Trends and business drivers for Hadoop
- How Hortonworks and Revolution Analytics play a role in the modern data architecture
- How you can run R natively in Hortonworks Data Platform to simply move your R-powered analytics to Hadoop

Presentation replay at:
http://www.revolutionanalytics.com/news-events/free-webinars/2013/modern-data-architecture-revolution-hortonworks/

The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics

  1. 1. © Hortonworks Inc. 2013 Modern Data Architecture …for Predictive Analytics David Smith VP Marketing and Community - Revolution Analytics John Kreisa VP Strategic Marketing- Hortonworks Page 1
  2. 2. © Hortonworks Inc. 2013 Your Presenters • David Smith (@revodavid) –VP Marketing and Community at Revolution Analytics –Data Scientist, Blogger and co-author of An Introduction to R • John Kreisa (@marked_man) –VP Strategic Marketing, Hortonworks –Over 20 years in data management as a developer and a marketer –Avid camper Page 2
  3. 3. © Hortonworks Inc. 2013 Today’s Topics • Introduction • Drivers for the Modern Data Architecture (MDA) • Apache Hadoop in the MDA • R’s role in the MDA • Q&A Page 3
  4. 4. © Hortonworks Inc. 2013 Poll #1: What stage are you at in looking into Hadoop? •Research •Evaluation •Trial •Haven’t started research Page 4
  5. 5. © Hortonworks Inc. 2013 Existing Data Architecture Page 5 APPLICATIONS DATA SYSTEM REPOSITORIES SOURCES Existing Sources (CRM, ERP, Clickstream, Logs) RDBMS EDW MPP OPERATIONAL TOOLS MANAGE & MONITOR DEV & DATA TOOLS BUILD & TEST Business Analytics Custom Applications Packaged Applications
  6. 6. © Hortonworks Inc. 2013 Existing Data Architecture Page 6 APPLICATIONS DATA SYSTEM REPOSITORIES SOURCES Existing Sources (CRM, ERP, Clickstream, Logs) RDBMS EDW MPP Business Analytics Custom Applications Packaged Applications Source: IDC • 2.8 ZB in 2012 • 85% from New Data Types • 15x Machine Data by 2020 • 40 ZB by 2020
  7. 7. © Hortonworks Inc. 2013 - Confidential Modern Data Architecture Enabled Page 7 APPLICATIONS DATA SYSTEM REPOSITORIES SOURCES Existing Sources (CRM, ERP, Clickstream, Logs) RDBMS EDW MPP Emerging Sources (Sensor, Sentiment, Geo, Unstructured) OPERATIONAL TOOLS MANAGE & MONITOR DEV & DATA TOOLS BUILD & TEST Business Analytics Custom Applications Packaged Applications
  8. 8. © Hortonworks Inc. 2013 - Confidential Hadoop Powers Modern Data Architecture Page 8 Apache Hadoop is an open source project governed by the Apache Software Foundation (ASF) that allows you to gain insight from massive amounts of structured and unstructured data quickly and without significant investment. Hadoop Cluster compute & storage . . . . . . . . compute & storage . . Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware
  9. 9. © Hortonworks Inc. 2013 - Confidential Driving Efficiency Driving Opportunity Drivers for Hadoop Adoption Modern Data Architecture Hadoop has a central role in next generation data architectures while integrating with existing data systems Business Applications Use Hadoop to extract insights that enable new customer value and competitive edge Existing Traditional Server log Clickstream Big Data Sets Emerging Sentiment/Social Machine/Sensor Geo-locations
  10. 10. © Hortonworks Inc. 2013 - Confidential Opportunity in types of data 1. Sentiment Understand how your customers feel about your brand and products – right now 2. Clickstream Capture and analyze website visitors’ data trails and optimize your website 3. Sensor/Machine Discover patterns in data streaming automatically from remote sensors and machines 4. Geographic Analyze location-based data to manage operations where they occur 5. Server Logs Research logs to diagnose process failures and prevent security breaches 6. Unstructured (text, video, pictures, etc.) Understand patterns in files across millions of web pages, emails, and documents Value Page 10
  11. 11. © Hortonworks Inc. 2013 - Confidential Efficiency in the Modern Data Architecture Page 11 APPLICATIONS DATA SYSTEM REPOSITORIES SOURCES Existing Sources (CRM, ERP, Clickstream, Logs) RDBMS EDW MPP Emerging Sources (Sensor, Sentiment, Geo, Unstructured) Business Analytics Custom Applications Packaged Applications • Drive efficiency via modern data architecture • Store data once and access it in many ways • Often referred to as a data lake or data repository • Infrastructure platform driven • IT-oriented, TCO based
  12. 12. © Hortonworks Inc. 2013 - Confidential Engineered for Interoperability Page 12 APPLICATIONS DATA SYSTEM SOURCES RDBMS EDW MPP Emerging Sources (Sensor, Sentiment, Geo, Unstructured) HANA BusinessObjects BI OPERATIONAL TOOLS DEV & DATA TOOLS Existing Sources (CRM, ERP, Clickstream, Logs) INFRASTRUCTURE
  13. 13. © Hortonworks Inc. 2013 - Confidential Integrated Interoperable with existing data center investments Skills Leverage your existing skills: development, operations, analytics Requirements for Hadoop Adoption Page 13 Key Services Platform, operational and data services essential for the enterprise Requirements for Hadoop’s Role in the Modern Data Architecture
  14. 14. © Hortonworks Inc. 2013 - Confidential Revolution R Enterprise Architecture Page 14 APPLICATIONS DATA SYSTEM REPOSITORIES SOURCES Existing Sources (CRM, ERP, Clickstream, Logs) RDBMS EDW MPP Emerging Sources (Sensor, Sentiment, Geo, Unstructured) OPERATIONAL TOOLS MANAGE & MONITOR DEV & DATA TOOLS BUILD & TEST Business Analytics Custom Applications Packaged Applications = Revolution R Enterprise
  15. 15. © Hortonworks Inc. 2013 Today’s Topics • Introduction • Drivers for the Modern Data Architecture (MDA) • Apache Hadoop’s role in the MDA • R’s role in the MDA • Q&A Page 15
  16. 16. © Hortonworks Inc. 2013 Poll #2: Which of the following best describes your use of R and Hadoop? •We have R + Hadoop in production •We are testing R + Hadoop •We have started to investigate but nothing is implemented •No current plans Page 16
  17. 17. Revolution Confidential What is the Open Source R Project? • The R Language: – Object-Oriented Language for Stats, Math and Data Science – Comprehensive data visualization and statistical modeling capabilities • The R Community: – 2M+ Users with the Skill to Tackle Big Data Statistical and Numerical Analysis and Machine Learning Projects – New graduates with data skills learn R • The R Ecosystem: – 5000+ Freely Available Algorithms in CRAN – Specialized methods for finance, economics, genomics, linguistics, and every data-driven domain 17
  18. 18. Revolution Confidential R is open source and drives analytic innovation but has some limitations for Enterprises • Bigger data sizes: Memory Bound → Big Data • Speed of analysis: Single Threaded → Scale out, parallel processing, high speed • Production support: Community Support → Commercial production support • Innovation and scale: Innovative 5000+ packages, Exponential growth → Combines with open source R packages where needed
  19. 19. Revolution Confidential Revolution R Enterprise 19 Enterprise-Ready Revolution R Enterprise is the only commercial big data analytics platform based on the open source R statistical computing language Cross-Platform Big Data Analytics High Performance Analytics Easier Build & Deploy
  20. 20. Modern Data Architecture Extract and Analyze • Ad-hoc Data Distillation • Exploratory Data Analysis / Data Visualization • Model Development AMBARI MAPREDUCE YARN HDFS REST DATA REFINEMENT HIVE PIG CUSTOM HTTP STREAM LOAD SQOOP FLUME WebHDFS NFS STRUCTURE HCATALOG (metadata services) Query/Visualization/Reporting/Analytical Tools and Apps SOURCE DATA - Sensor Logs - Clickstream - Flat Files - Unstructured - Sentiment - Customer - Inventory DBs JMS Queues Files LOAD SQOOP/Hive WebHDFS Data Sources CSV DATABASES INTERACTIVE HIVE Server2 Analytical Tools ANALYTICAL rHadoop
  21. 21. Revolution Confidential The Data Scientist’s Big Data Toolkit 21 Statistical Tests Machine Learning Simulation Descriptive Statistics Data Visualization R Data Step Predictive Models Sampling
  22. 22. Parallel External-Memory Algorithms 22 CPU CPU CPU SMP SERVER
  23. 23. Parallel External-Memory Algorithms 23 HADOOP NODE HADOOP NODE HADOOP NODE HADOOP CLUSTER
  24. 24. Revolution Confidential Modern Data Architecture with RRE7 In-Hadoop Predictive Analytics • Production Data Distillation (e.g. Semantic Analysis) • Production Model Processing / Re-Estimation • Production Model Scoring AMBARI MAPREDUCE YARN HDFS REST DATA REFINEMENT HIVE PIG CUSTOM DISTILLED DATA FILES HTTP STREAM LOAD SQOOP FLUME WebHDFS NFS STRUCTURE HCATALOG (metadata services) Query/Visualization/Reporting/Analytical Tools and Apps SOURCE DATA - Sensor Logs - Clickstream - Flat Files - Unstructured - Sentiment - Customer - Inventory DBs JMS Queues Files LOAD SQOOP/Hive WebHDFS Data Sources CSV DATABASES INTERACTIVE HIVE Server2 Analytical Tools ANALYTICAL Revolution R Enterprise
  25. 25. Revolution Confidential Hadoop As An R Engine • Use Revolution R Enterprise PEMAs in Hadoop • No need to change existing R code • Simple R programming • No need to “Think In MapReduce” • Eliminate data movement to slash latencies • Use Hadoop nodes as parallel R computation engines 25 Hadoop
  26. 26. © Hortonworks Inc. 2013 Integrated Interoperable with existing data center investments Skills Leverage your existing skills: development, operations, analytics Requirements for Hadoop Adoption Page 26 Key Services Platform, operational and data services essential for the enterprise Requirements for Hadoop’s Role in the Modern Data Architecture
  27. 27. © Hortonworks Inc. 2013 Poll #3: Which of the following would you most like to accomplish with R + Hadoop? •Build a model to be put into production in Hadoop •Build a model to be put into production elsewhere •Create new data from Hadoop to supplement an existing analytics process •Something else Page 27
  28. 28. © Hortonworks Inc. 2013 Next Steps: Page 28 More about Revolution Analytics and Hadoop http://www.revolutionanalytics.com/products/r-for-hadoop.php Get started on Hadoop with Hortonworks Sandbox http://hortonworks.com/sandbox Follow us: @hortonworks @RevolutionR
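
Slides 22, 23 and 25 refer to parallel external-memory algorithms (PEMAs) without showing what one looks like. The sketch below is only an illustration of the general idea, not Revolution R Enterprise code: data is processed one chunk at a time, each chunk is reduced to a small summary, and the summaries are merged. It is written in Python rather than R, and the names `Moments`, `chunk_moments`, `combine`, `chunks` and `summarize` are hypothetical. The merge step uses the well-known pairwise update for mean and variance (Chan et al.). Because each chunk is summarized independently, the per-chunk step could in principle run on separate CPUs or Hadoop nodes.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Moments:
    """Summary of one chunk: count, mean, and sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0


def chunk_moments(chunk: Iterable[float]) -> Moments:
    """Summarize a single chunk that fits in memory (Welford's update)."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in chunk:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return Moments(n, mean, m2)


def combine(a: Moments, b: Moments) -> Moments:
    """Merge two partial summaries without revisiting the raw data."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Moments(n, mean, m2)


def chunks(values: List[float], size: int) -> Iterator[List[float]]:
    """Stand-in for reading blocks from disk or HDFS (hypothetical helper)."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def summarize(values: List[float], chunk_size: int) -> Moments:
    """Summarize each chunk independently, then merge the partial results."""
    partials = map(chunk_moments, chunks(values, chunk_size))
    return reduce(combine, partials, Moments())


if __name__ == "__main__":
    result = summarize([float(i) for i in range(1, 11)], chunk_size=3)
    sample_variance = result.m2 / (result.n - 1)
    print(result.n, result.mean, sample_variance)
```

The merged result does not depend on how the data is split into chunks, which is what allows the same calculation to run on one SMP server or across many nodes.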

Editor's notes

  • Remember that CRAN is a new term to IT professionals and to anyone who hasn’t learned much about R. Spend some time on it. The acronym stands for Comprehensive R Archive Network: a single repository of R algorithms, test data, and evaluations. Used by nearly all R programmers.
