SlideShare une entreprise Scribd logo
1  sur  28
Télécharger pour lire hors ligne
TACKLING BIG DATA WITH
                     HADOOP
                           David Howell




Sunday, September 11, 11
WHAT IS BIG DATA?




Sunday, September 11, 11
WHAT IS BIG DATA?
                               Google web crawl




Sunday, September 11, 11
WHAT IS BIG DATA?
                             stream of Twitter messages




Sunday, September 11, 11
WHAT IS BIG DATA?
                           Annoying Farmville requests on Facebook




Sunday, September 11, 11
WHAT IS BIG DATA?
                               terabyte-scale data sets
                      awkward to work with using traditional tools




Sunday, September 11, 11
WHAT IS BIG DATA?
                            requires distributed computing




Sunday, September 11, 11
MEDIUM DATA
                                dozens to hundreds of gigabytes
                       still awkward to work with using traditional tools




Sunday, September 11, 11
MAP-REDUCE
                      http://labs.google.com/papers/mapreduce.html




Sunday, September 11, 11
Sunday, September 11, 11
Sunday, September 11, 11
COUNTING AT SCALE




Sunday, September 11, 11
function map_1(t, search_phrase)
                             emit(search_phrase, 1)

                                          sort and shuffle

                           function reduce_1(search_phrase, counts)
                             total = 0
                             for count in counts
                               total += count
                             emit(search_phrase, total)



                           function map_2(search_phrase, total)
                             emit(total, search_phrase)

                                          sort and shuffle

                           function reduce_2(total, search_phrases)
                             for search_phrase in search_phrases
                               emit(search_phrase, total)



Sunday, September 11, 11
map    shuffle     reduce
                      cat IN | sort | uniq -c > OUT



               map            shuffle reduce
    awk ‘{print $2,$1}’ OUT | sort > FINAL



Sunday, September 11, 11
WHY BOTHER?




Sunday, September 11, 11
HADOOP




Sunday, September 11, 11
DISTRIBUTED COMPUTING
                    PLATFORM




Sunday, September 11, 11
TOOLS IN THE PLATFORM

                           Map-Reduce APIs   Higher Level APIs
                           •Java             •Hive
                           •C++              •Cascading
                           •UNIX pipes       •Pig




Sunday, September 11, 11
THE ORIGIN STORY




Sunday, September 11, 11
WHO’S USING IT?




Sunday, September 11, 11
HADOOP
                           How does it work?




Sunday, September 11, 11
Sunday, September 11, 11
Sunday, September 11, 11
Sunday, September 11, 11
Sunday, September 11, 11
DEMO!




Sunday, September 11, 11
YOUR DATA PLATFORM
                               ad hoc
                            unstructured
                            prototyping
                             experiment
                             data-driven
                              curiosity
                                play




Sunday, September 11, 11
LEARN MORE
                            http://hadoop.apache.org/
                           http://www.cloudera.com/
                           Hadoop: The Definitive Guide

                                   @dehowell
                            dave@poorlytrainedape.com
                http://github.com/dehowell/hadoop-crypto-demo




Sunday, September 11, 11

Contenu connexe

Similaire à Tackling Big Data with Hadoop

Rcos presentation
Rcos presentationRcos presentation
Rcos presentationmskmoorthy
 
Ruby + Rails
Ruby + RailsRuby + Rails
Ruby + Railsbetabeers
 
Acceptance & Integration Testing With Behat (PHPNw2011)
Acceptance & Integration Testing With Behat (PHPNw2011)Acceptance & Integration Testing With Behat (PHPNw2011)
Acceptance & Integration Testing With Behat (PHPNw2011)benwaine
 
Optimizing your site for contextual ads: SEO, Design and Content
Optimizing your site for contextual ads: SEO, Design and ContentOptimizing your site for contextual ads: SEO, Design and Content
Optimizing your site for contextual ads: SEO, Design and ContentRaven Tools
 
De vuelta al pasado con SQL y stored procedures
De vuelta al pasado con SQL y stored proceduresDe vuelta al pasado con SQL y stored procedures
De vuelta al pasado con SQL y stored proceduresNorman Clarke
 
soft-shake.ch - Data grids and Data Grids
soft-shake.ch - Data grids and Data Gridssoft-shake.ch - Data grids and Data Grids
soft-shake.ch - Data grids and Data Gridssoft-shake.ch
 
How I Learned To Stop Worrying & Love HTML5
How I Learned To Stop Worrying & Love HTML5How I Learned To Stop Worrying & Love HTML5
How I Learned To Stop Worrying & Love HTML5Dale Cruse
 
Your first rails app - 2
 Your first rails app - 2 Your first rails app - 2
Your first rails app - 2Blazing Cloud
 
Localizing iOS Apps
Localizing iOS AppsLocalizing iOS Apps
Localizing iOS Appsweissazool
 
Coding, Scaling, and Deploys... Oh My!
Coding, Scaling, and Deploys... Oh My!Coding, Scaling, and Deploys... Oh My!
Coding, Scaling, and Deploys... Oh My!Mark Jaquith
 
soft-shake.ch - Data grids and Data Caching
soft-shake.ch - Data grids and Data Cachingsoft-shake.ch - Data grids and Data Caching
soft-shake.ch - Data grids and Data Cachingsoft-shake.ch
 
Active Record Introduction - 3
Active Record Introduction - 3Active Record Introduction - 3
Active Record Introduction - 3Blazing Cloud
 
Fast & Furious: Speed in the Opera browser
Fast & Furious: Speed in the Opera browserFast & Furious: Speed in the Opera browser
Fast & Furious: Speed in the Opera browserAndreas Bovens
 
Breaking the Mobile Web with HTML5
Breaking the Mobile Web with HTML5 Breaking the Mobile Web with HTML5
Breaking the Mobile Web with HTML5 Maximiliano Firtman
 
Modern HTML & CSS Coding: Speed, Semantics & Structure
Modern HTML & CSS Coding: Speed, Semantics & StructureModern HTML & CSS Coding: Speed, Semantics & Structure
Modern HTML & CSS Coding: Speed, Semantics & StructureRaven Tools
 
Orientacao a objetos e design patterns - Secomp Londrina
Orientacao a objetos e design patterns - Secomp LondrinaOrientacao a objetos e design patterns - Secomp Londrina
Orientacao a objetos e design patterns - Secomp LondrinaVinicius Quaiato
 

Similaire à Tackling Big Data with Hadoop (20)

Rcos presentation
Rcos presentationRcos presentation
Rcos presentation
 
Mobile? WT... F?
Mobile? WT... F?Mobile? WT... F?
Mobile? WT... F?
 
Ruby + Rails
Ruby + RailsRuby + Rails
Ruby + Rails
 
Acceptance & Integration Testing With Behat (PHPNw2011)
Acceptance & Integration Testing With Behat (PHPNw2011)Acceptance & Integration Testing With Behat (PHPNw2011)
Acceptance & Integration Testing With Behat (PHPNw2011)
 
Optimizing your site for contextual ads: SEO, Design and Content
Optimizing your site for contextual ads: SEO, Design and ContentOptimizing your site for contextual ads: SEO, Design and Content
Optimizing your site for contextual ads: SEO, Design and Content
 
De vuelta al pasado con SQL y stored procedures
De vuelta al pasado con SQL y stored proceduresDe vuelta al pasado con SQL y stored procedures
De vuelta al pasado con SQL y stored procedures
 
soft-shake.ch - Data grids and Data Grids
soft-shake.ch - Data grids and Data Gridssoft-shake.ch - Data grids and Data Grids
soft-shake.ch - Data grids and Data Grids
 
How I Learned To Stop Worrying & Love HTML5
How I Learned To Stop Worrying & Love HTML5How I Learned To Stop Worrying & Love HTML5
How I Learned To Stop Worrying & Love HTML5
 
Your first rails app - 2
 Your first rails app - 2 Your first rails app - 2
Your first rails app - 2
 
Localizing iOS Apps
Localizing iOS AppsLocalizing iOS Apps
Localizing iOS Apps
 
Coding, Scaling, and Deploys... Oh My!
Coding, Scaling, and Deploys... Oh My!Coding, Scaling, and Deploys... Oh My!
Coding, Scaling, and Deploys... Oh My!
 
Reviving RIM
Reviving RIMReviving RIM
Reviving RIM
 
soft-shake.ch - Data grids and Data Caching
soft-shake.ch - Data grids and Data Cachingsoft-shake.ch - Data grids and Data Caching
soft-shake.ch - Data grids and Data Caching
 
Active Record Introduction - 3
Active Record Introduction - 3Active Record Introduction - 3
Active Record Introduction - 3
 
Fast & Furious: Speed in the Opera browser
Fast & Furious: Speed in the Opera browserFast & Furious: Speed in the Opera browser
Fast & Furious: Speed in the Opera browser
 
Cartoset
CartosetCartoset
Cartoset
 
Breaking the Mobile Web with HTML5
Breaking the Mobile Web with HTML5 Breaking the Mobile Web with HTML5
Breaking the Mobile Web with HTML5
 
Modern HTML & CSS Coding: Speed, Semantics & Structure
Modern HTML & CSS Coding: Speed, Semantics & StructureModern HTML & CSS Coding: Speed, Semantics & Structure
Modern HTML & CSS Coding: Speed, Semantics & Structure
 
Embedjs
EmbedjsEmbedjs
Embedjs
 
Orientacao a objetos e design patterns - Secomp Londrina
Orientacao a objetos e design patterns - Secomp LondrinaOrientacao a objetos e design patterns - Secomp Londrina
Orientacao a objetos e design patterns - Secomp Londrina
 

Dernier

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 

Dernier (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 

Tackling Big Data with Hadoop