A Hadoop Primer
Feb 2011
10.20.2005
http://redmonk.com/public/hadoop.pdf

The Background

October, 2003

December, 2004

Map::Reduce

Job::Map       Reduce::Output

Counting Shakespeare
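
Counting the words in a body of text such as Shakespeare's plays is the classic introductory MapReduce job. The Python sketch below simulates its three stages in a single process: a map step that emits (word, 1) pairs, a shuffle that groups those pairs by word, and a reduce step that sums each group. It illustrates the technique only and is not Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the crude regex tokenizer are assumptions, and on a real cluster the shuffle is carried out by the framework across many machines.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical helper names for illustration; not part of any Hadoop API.


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one line (crude lowercase tokenizer)."""
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (the framework does this between map and reduce)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the list of 1s for one word into its total."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    emitted = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(emitted).items())


if __name__ == "__main__":
    text = [
        "To be, or not to be: that is the question:",
        "Whether 'tis nobler in the mind to suffer",
    ]
    counts = word_count(text)
    print(counts["to"], counts["be"])  # prints: 3 2
```

Because each map call looks at one line only and each reduce call looks at one word only, both stages can be spread across machines independently; only the shuffle needs to move data between them.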

The Birth of Hadoop

Project Architecture

       Source: Running Hadoop On Ubuntu Linux, Michael G. Noll, 8.8.07

Project Traction

Employment Potential

Hadoop Users

Why Hadoop?

More Machines = More Faster
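
Part of the reason more machines help is that a MapReduce job splits its intermediate keys across many reduce tasks, each of which can run on a different machine. Hadoop's default partitioner takes a hash of each key modulo the number of reduce tasks; the sketch below imitates that idea with `zlib.crc32` (chosen because Python's built-in `hash()` for strings is randomized per process). The names `partition` and `assign` are hypothetical, and the sketch ignores the real costs of adding machines, such as network transfer, job startup and skewed keys.

```python
import zlib
from typing import Dict, List


def partition(key: str, num_reducers: int) -> int:
    """Pick a reduce task for a key: stable hash modulo the number of reducers.

    crc32 is a stand-in for Hadoop's key.hashCode(); unlike str hash() it is stable across runs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def assign(keys: List[str], num_reducers: int) -> Dict[int, List[str]]:
    """Group keys into one bucket per (hypothetical) reduce task."""
    buckets: Dict[int, List[str]] = {i: [] for i in range(num_reducers)}
    for key in keys:
        buckets[partition(key, num_reducers)].append(key)
    return buckets


if __name__ == "__main__":
    keys = [f"word{i}" for i in range(10_000)]
    for n in (1, 4, 16):
        largest = max(len(bucket) for bucket in assign(keys, n).values())
        print(f"{n} reducers -> largest bucket holds {largest} keys")
```

With evenly spread keys, the largest bucket should shrink roughly in proportion to the number of reducers, which is the intuition behind the slide; a few very frequent keys would break that.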

The reason everyone knows

BIG DATA

“The big issue is not that everyone will
suddenly operate at petabyte scale; a lot of
folks do not have that much data.

The more important topics are the specifics
of the storage and processing infrastructure
and what approaches best suit each
problem.”
         - Bradford Cross, Flightcaster/Woven

The reason not everyone knows

Unstructured Data

What Hadoop Is

“build Amazon's product search indices”
“build the recommender system for behavioral targeting”
“ETL style processing and statistics generation”
“information extraction & search”
“searching and analysis of millions of rental bookings”
“we use Hadoop to summarize of user's tracking data”
“we use Hadoop to store ad serving logs”
“the freedom to query the data in an ad-hoc manner”
“generating web graphs on 100 nodes”
“we use Hadoop for batch-processing large RDF datasets”
“facial similarity and recognition across large datasets”
“We are using Hadoop and Nutch to crawl Blog posts”
“Used for ETL & data analysis on terascale datasets”
                                       Source: http://wiki.apache.org/hadoop/PoweredBy

What Hadoop Isn't

A relational database killer
   No                Yes

Beyond Hadoop

The Hadoop Ecosystem

What We Use Hadoop For

Crawling Largeish Unstructured Datasets

Like 1.3M StackOverflow Questions

Or 1.7M HackerNews Entries

Or Years of Apache Log Files
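
The deck does not show the log-processing jobs themselves. As a hypothetical illustration, the script below is a Hadoop Streaming-style mapper and reducer that count HTTP status codes in Apache Common Log Format files. Hadoop Streaming runs ordinary executables that read lines on stdin and write tab-separated key/value lines on stdout, and it sorts mapper output by key before the reducer sees it. The `map`/`reduce` command-line modes, the function names, the sample lines and the regex are assumptions of this sketch, not RedMonk's code.

```python
import re
import sys
from itertools import groupby
from typing import Iterable, Iterator

# Apache Common Log Format: host ident user [timestamp] "request" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')

# Hypothetical sample input, used only by the demo mode below.
SAMPLE = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /missing HTTP/1.0" 404 209',
    '10.0.0.3 - - [10/Oct/2000:13:57:12 -0700] "GET / HTTP/1.0" 200 5120',
    'not a log line',
]


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'status<TAB>1' for each parseable log line; malformed lines are skipped."""
    for line in lines:
        match = CLF.match(line)
        if match:
            yield f"{match.group(4)}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per status code. Input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines if line.strip())
    for status, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{status}\t{sum(int(count) for _, count in group)}"


def main() -> None:
    mode = sys.argv[1] if len(sys.argv) > 1 else "demo"
    if mode == "map":
        for out in mapper(sys.stdin):
            print(out)
    elif mode == "reduce":
        for out in reducer(sys.stdin):
            print(out)
    else:
        # Local stand-in for map -> sort (shuffle) -> reduce.
        for out in reducer(sorted(mapper(SAMPLE))):
            print(out)  # prints "200\t2", then "404\t1"


if __name__ == "__main__":
    main()
```

Locally, `cat access.log | python3 status_count.py map | sort | python3 status_count.py reduce` mimics the map, shuffle and reduce stages (the script and log file names are hypothetical); under Hadoop Streaming the same two modes would be passed as the mapper and reducer commands.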

How to Get Started

We use Cloudera

Mostly because it's easy

This easy

Or if you prefer

Or maybe this

QUESTIONS

Student? Talk to us
