Rhat OSS - Cloudera - Mike Olson - Hadoop Data Analytics In The Cloud

Cloudera, Inc.

Mike Olson's talk on Hadoop Data Analytics at the O'Reilly Open Source Convention

Hadoop
Data Analytics in the Cloud

Mike Olson
Chief Executive Officer

Friday, July 17, 2009

Hadoop History

▪ Doug Cutting worked on Nutch (web-scale crawler-based
search), 2002-2004
▪ Google published MapReduce paper in 2004
▪ Cutting added DFS & MapReduce support to Nutch
▪ Joined by Mike Cafarella
▪ 2006: Yahoo! hires Cutting, Hadoop spins out of Nutch
▪ Web-scale deployments in 2007, 2008 at Y!, Facebook, others
▪ Today: 22 committers to core project
▪ Related projects: HBase, Hive, Pig, Mahout, Hama and others

Why Hadoop?

▪ Large web properties invented MapReduce for large-scale,
reliable, inexpensive analytics
▪ Enterprises generally need these techniques
▪ Retail, financial services, oil and gas, health care, green
technologies and more
▪ Hardware trends driving toward long-term retention of valuable
source data
▪ New analytical tools are required
▪ Hadoop complements current-generation data warehousing and
analytical products

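Where Does Data Come From?
Many Sources Provide Deeper Insight

▪ Simulations and Scientific/Experimental Data
• genome sequencing, medical imaging, wireless sensors
▪ Existing Databases
• product catalogs, historical sales data, transaction histories
▪ User Data
• web logs, clicks on website, pictures, videos, bbs, etc.
▪ System Generated Data
• 1000s of systems reporting status every second
▪ Data Comes in All Shapes, Sizes, Schemas and Structures
• Hadoop combines many sources regardless of format and structure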

Hadoop Technical Overview: HDFS
Storing Data: Distributed Over Many Machines

Commodity Servers

Files are broken into blocks and distributed across all
servers. Replication protects data from hardware failure.

HDFS: Hadoop Distributed File System
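
The storage idea on this slide can be sketched in a few lines of Python. This is a toy model, not HDFS code: the function names, the tiny block size, the server names and the round-robin placement are all assumptions made for illustration, and a real cluster chooses replica locations with its own policy.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, servers: list[str], replication: int) -> dict[int, list[str]]:
    """Assign every block to `replication` distinct servers, round-robin.

    Round-robin is a simplification chosen for this sketch only.
    """
    if replication > len(servers):
        raise ValueError("need at least as many servers as replicas")
    return {
        block_id: [servers[(block_id + r) % len(servers)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def blocks_still_readable(placement: dict[int, list[str]], failed: str) -> bool:
    """True if every block keeps at least one replica after `failed` is lost."""
    return all(any(s != failed for s in holders) for holders in placement.values())


# Hypothetical file and cluster: 1000 bytes, 256-byte blocks, four servers.
servers = ["s1", "s2", "s3", "s4"]
blocks = split_into_blocks(b"x" * 1000, block_size=256)
placement = place_replicas(len(blocks), servers, replication=3)
```

With three copies of each block on distinct servers, `blocks_still_readable(placement, "s2")` holds for any single failed server, which is the property the slide attributes to replication.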

Hadoop Technical Overview: MapReduce
Processing Data: Leveraging Data Locality

Data elements processed locally, in parallel
Reliable computation implicitly managed by Hadoop

MapReduce
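
A word count is a common way to illustrate the map, shuffle and reduce steps. The sketch below runs in a single Python process, so it only models the data flow. The function names and sample blocks are assumptions for illustration and do not use the Hadoop API; on a cluster each block would be mapped in parallel on a server that already stores it.

```python
from collections import defaultdict


def map_phase(block: str) -> list[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key, as happens between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def run_job(blocks: list[str]) -> dict[str, int]:
    mapped: list[tuple[str, int]] = []
    for block in blocks:
        # Sequential here; the framework would run these map calls in
        # parallel, each next to the block it reads.
        mapped.extend(map_phase(block))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Two hypothetical blocks of one input file.
counts = run_job(["hadoop stores data", "hadoop processes data locally"])
```

The user-supplied parts are only `map_phase` and `reduce_phase`; scheduling, grouping and recovery are what the slide describes as implicitly managed by Hadoop.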

Hadoop Technical Overview: Reliability
Fault Tolerance: Handled with Software

Data loss prevented through automatic replication and rebalancing
Computation is restarted automatically without user intervention

Software Fault Tolerance
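
Both recovery behaviours named on this slide can be modelled with a small simulation. This is a sketch under stated assumptions, not Hadoop's scheduler or replication logic: the server names, the `WorkerFailed` exception and the helper functions are hypothetical, and failures are simulated with a set of healthy servers.

```python
class WorkerFailed(Exception):
    """Raised by the simulation when a task lands on a dead server."""


def run_on(worker: str, healthy: set[str], task):
    if worker not in healthy:
        raise WorkerFailed(worker)
    return task()


def run_with_restart(task, workers: list[str], healthy: set[str]):
    """Restart the task on the next worker until one succeeds.

    Returns the result and the list of workers that were tried.
    """
    attempted = []
    for worker in workers:
        attempted.append(worker)
        try:
            return run_on(worker, healthy, task), attempted
        except WorkerFailed:
            continue  # no user intervention: just try the next worker
    raise RuntimeError("no healthy worker left")


def re_replicate(placement: dict[int, list[str]], failed: str,
                 servers: list[str], replication: int) -> dict[int, list[str]]:
    """Copy blocks that lost a replica onto surviving servers."""
    live = [s for s in servers if s != failed]
    repaired = {}
    for block_id, holders in placement.items():
        kept = [s for s in holders if s != failed]
        spares = [s for s in live if s not in kept]
        repaired[block_id] = kept + spares[:replication - len(kept)]
    return repaired


result, attempted = run_with_restart(lambda: sum(range(10)), ["s1", "s2", "s3"], healthy={"s3"})
repaired = re_replicate({0: ["s1", "s2"], 1: ["s2", "s3"]}, failed="s2",
                        servers=["s1", "s2", "s3"], replication=2)
```

In this run the task fails on `s1` and `s2` and completes on `s3`, and every block is back to two replicas after `s2` is lost.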

Cloud Deployment Options for Hadoop

▪ In your data center
• Acquire, provision, administer servers
• Choose a virtualization infrastructure?
▪ On dedicated, hosted services
• Scale up or down by coordinating with your MSP
▪ On dynamic web services (AWS and others)
• Spin up, use, shut down a cluster
▪ Issues:
• Data persistence and location, organizational control

(c) 2009 Cloudera, Inc. or its licensors. "Cloudera" is a registered trademark of Cloudera, Inc. All rights reserved.


