SlideShare une entreprise Scribd logo
1  sur  21
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC.
                         Copyright for all other & referenced work is retained by their respective owners.




Introducing Hadoop
Mastering Hadoop Map-reduce for Data Analysis


Shashank Tiwari
blog: shanky.org | twitter: @tshanky
st@treasuryofideas.com
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                        All other & referenced work is copyrighted to their respective owners




What is Hadoop
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                        All other & referenced work is copyrighted to their respective owners




HDFS Architecture
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                        All other & referenced work is copyrighted to their respective owners




Namenode/Datanode, JobTracker/TaskTracker
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                      All other & referenced work is copyrighted to their respective owners




MapReduce
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                      All other & referenced work is copyrighted to their respective owners




ZK Namespace
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                        All other & referenced work is copyrighted to their respective owners




Essential HBase Schema
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                         All other & referenced work is copyrighted to their respective owners




Multi-dimensional View
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                                  All other & referenced work is copyrighted to their respective owners




A Map/Hash View

•{


• "row_key_1" : { "name" : {


•     "first_name" : "Jolly", "last_name" : "Goodfellow"


•     } } },


•    "location" : { "zip": "94301" },
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                         All other & referenced work is copyrighted to their respective owners




Architectural View (HBase)
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                       All other & referenced work is copyrighted to their respective owners




The Persistence Mechanism
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                         All other & referenced work is copyrighted to their respective owners




The underlying file format
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                                 All other & referenced work is copyrighted to their respective owners




Installing & Setting up Hadoop

• Required software: Java 1.6.x, ssh + sshd


• Download


• Install


• Configure


   • single-node


   • pseudo-distributed


   • cluster
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                                All other & referenced work is copyrighted to their respective owners




Download

• Source: http://hadoop.apache.org/


• Version:


   • 0.20.203.x -- current stable


   • 0.20.x -- previous stable


• Includes


   • Hadoop Common -- common utilities, HDFS, MapReduce
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                                All other & referenced work is copyrighted to their respective owners




Install

• Extract: tar zxvf hadoop-0.20.203.0rc1.tar.gz


• Move & Create Symbolic Link


   • ln -s hadoop-0.20.203.0 hadoop


• On Windows


   • http://developer.yahoo.com/hadoop/tutorial/module3.html
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                               All other & referenced work is copyrighted to their respective owners




Configure -- single-node

• Edit: conf/hadoop-env.sh


  • Set JAVA_HOME


• Default configuration is single-node


• Start bin/hadoop (for command options)


• Reference: http://hadoop.apache.org/common/docs/r0.20.203.0/
  single_node_setup.html
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                                All other & referenced work is copyrighted to their respective owners




Configure -- pseduo-distributed

• Edit: conf/core-site.xml (configure HDFS daemon)


• Edit: conf/hdfs-site.xml (configure HDFS replication factor)


• Edit: conf/mapred-site.xml (configure MapReduce JobTracker daemon)


• Enable ssh to localhost (without passphrase)


• Reference: http://hadoop.apache.org/common/docs/r0.20.203.0/
  single_node_setup.html
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                                 All other & referenced work is copyrighted to their respective owners




Start Hadoop
• Format HDFS: bin/hadoop namenode -format


• Start all daemons: bin/start-all.sh


• Verify logs


• Browse the web interface:


   • Namenode: http://localhost:50070/


   • JobTracker: http://localhost:50030/
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                                All other & referenced work is copyrighted to their respective owners




Take Hadoop for a test-drive
• Run examples (hadoop-examples-0.20.203.0.jar)


• Grep using regular expressions


  • Copy files to HDFS: bin/hadoop fs -put bin input


  • Grep for files which have text beginning with ‘start’


  • Verify output on HDFS: bin/hadoop fs -cat output/*


  • Copy output to local filesystem & verify: bin/hadoop fs -get output output
    && cat output/*
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                               All other & referenced work is copyrighted to their respective owners




Configure -- cluster
• References:


• http://hadoop.apache.org/common/docs/r0.20.203.0/cluster_setup.html
  (official documentation)


• http://developer.yahoo.com/hadoop/tutorial/module7.html (Managing a
  Hadoop Cluster. Source: YDN)


• http://wiki.datameer.com/display/DAS1/Hadoop+Cluster+Configuration+Tips
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                                All other & referenced work is copyrighted to their respective owners




Questions?




• blog: shanky.org | twitter: @tshanky


• st@treasuryofideas.com

Contenu connexe

Tendances

Elasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English versionElasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English versionDavid Pilato
 
An Introduction to Apache Pig
An Introduction to Apache PigAn Introduction to Apache Pig
An Introduction to Apache PigSachin Vakkund
 
Apache Pig: Making data transformation easy
Apache Pig: Making data transformation easyApache Pig: Making data transformation easy
Apache Pig: Making data transformation easyVictor Sanchez Anguix
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystemtfmailru
 
Introduction to pig & pig latin
Introduction to pig & pig latinIntroduction to pig & pig latin
Introduction to pig & pig latinknowbigdata
 
Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013Gera Shegalov
 
Asset Pipeline
Asset PipelineAsset Pipeline
Asset PipelineEric Berry
 
Oozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY WayOozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY WayDataWorks Summit
 
Apache Hive micro guide - ConfusedCoders
Apache Hive micro guide - ConfusedCodersApache Hive micro guide - ConfusedCoders
Apache Hive micro guide - ConfusedCodersYash Sharma
 
API Design Antipatterns - APICon SF
API Design Antipatterns - APICon SFAPI Design Antipatterns - APICon SF
API Design Antipatterns - APICon SFManish Pandit
 
Workshop: Learning Elasticsearch
Workshop: Learning ElasticsearchWorkshop: Learning Elasticsearch
Workshop: Learning ElasticsearchAnurag Patel
 
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)Sematext Group, Inc.
 
Puppet for Everybody: Federated and Hierarchical Puppet Enterprise
Puppet for Everybody: Federated and Hierarchical Puppet EnterprisePuppet for Everybody: Federated and Hierarchical Puppet Enterprise
Puppet for Everybody: Federated and Hierarchical Puppet EnterprisecbowlesUT
 
Battle of the Giants round 2
Battle of the Giants round 2Battle of the Giants round 2
Battle of the Giants round 2Rafał Kuć
 

Tendances (18)

Elasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English versionElasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English version
 
An Introduction to Apache Pig
An Introduction to Apache PigAn Introduction to Apache Pig
An Introduction to Apache Pig
 
Beginning hive and_apache_pig
Beginning hive and_apache_pigBeginning hive and_apache_pig
Beginning hive and_apache_pig
 
Apache Pig: Making data transformation easy
Apache Pig: Making data transformation easyApache Pig: Making data transformation easy
Apache Pig: Making data transformation easy
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Introduction to pig & pig latin
Introduction to pig & pig latinIntroduction to pig & pig latin
Introduction to pig & pig latin
 
06 pig-01-intro
06 pig-01-intro06 pig-01-intro
06 pig-01-intro
 
Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013
 
Asset Pipeline
Asset PipelineAsset Pipeline
Asset Pipeline
 
Oozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY WayOozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY Way
 
Apache Hive micro guide - ConfusedCoders
Apache Hive micro guide - ConfusedCodersApache Hive micro guide - ConfusedCoders
Apache Hive micro guide - ConfusedCoders
 
API Design Antipatterns - APICon SF
API Design Antipatterns - APICon SFAPI Design Antipatterns - APICon SF
API Design Antipatterns - APICon SF
 
Apache Drill
Apache DrillApache Drill
Apache Drill
 
API Design
API DesignAPI Design
API Design
 
Workshop: Learning Elasticsearch
Workshop: Learning ElasticsearchWorkshop: Learning Elasticsearch
Workshop: Learning Elasticsearch
 
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
 
Puppet for Everybody: Federated and Hierarchical Puppet Enterprise
Puppet for Everybody: Federated and Hierarchical Puppet EnterprisePuppet for Everybody: Federated and Hierarchical Puppet Enterprise
Puppet for Everybody: Federated and Hierarchical Puppet Enterprise
 
Battle of the Giants round 2
Battle of the Giants round 2Battle of the Giants round 2
Battle of the Giants round 2
 

En vedette

รับสมัครครูนาฏศิลป์
รับสมัครครูนาฏศิลป์รับสมัครครูนาฏศิลป์
รับสมัครครูนาฏศิลป์Orapan Chamnan
 
SDEC2011 Introducing Hadoop
SDEC2011 Introducing HadoopSDEC2011 Introducing Hadoop
SDEC2011 Introducing HadoopKorea Sdec
 
SDEC2011 Arcus NHN memcached cloud
SDEC2011 Arcus NHN memcached cloudSDEC2011 Arcus NHN memcached cloud
SDEC2011 Arcus NHN memcached cloudKorea Sdec
 
SDEC2011 Big engineer vs small entreprenuer
SDEC2011 Big engineer vs small entreprenuerSDEC2011 Big engineer vs small entreprenuer
SDEC2011 Big engineer vs small entreprenuerKorea Sdec
 
SDEC2011 Implementing me2day friend suggestion
SDEC2011 Implementing me2day friend suggestionSDEC2011 Implementing me2day friend suggestion
SDEC2011 Implementing me2day friend suggestionKorea Sdec
 
SDEC2011 Essentials of Pig
SDEC2011 Essentials of PigSDEC2011 Essentials of Pig
SDEC2011 Essentials of PigKorea Sdec
 

En vedette (6)

รับสมัครครูนาฏศิลป์
รับสมัครครูนาฏศิลป์รับสมัครครูนาฏศิลป์
รับสมัครครูนาฏศิลป์
 
SDEC2011 Introducing Hadoop
SDEC2011 Introducing HadoopSDEC2011 Introducing Hadoop
SDEC2011 Introducing Hadoop
 
SDEC2011 Arcus NHN memcached cloud
SDEC2011 Arcus NHN memcached cloudSDEC2011 Arcus NHN memcached cloud
SDEC2011 Arcus NHN memcached cloud
 
SDEC2011 Big engineer vs small entreprenuer
SDEC2011 Big engineer vs small entreprenuerSDEC2011 Big engineer vs small entreprenuer
SDEC2011 Big engineer vs small entreprenuer
 
SDEC2011 Implementing me2day friend suggestion
SDEC2011 Implementing me2day friend suggestionSDEC2011 Implementing me2day friend suggestion
SDEC2011 Implementing me2day friend suggestion
 
SDEC2011 Essentials of Pig
SDEC2011 Essentials of PigSDEC2011 Essentials of Pig
SDEC2011 Essentials of Pig
 

Similaire à Sdec2011 Introducing Hadoop

Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...rhatr
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Jonathan Seidman
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014hadooparchbook
 
Hw09 Security And Api Compatibility
Hw09   Security And Api CompatibilityHw09   Security And Api Compatibility
Hw09 Security And Api CompatibilityCloudera, Inc.
 
Plugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in HadoopPlugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in HadoopOwen O'Malley
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Data Con LA
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingDataWorks Summit
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slidesryancox
 
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopYahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopHortonworks
 
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010Jonathan Seidman
 
02 Hadoop deployment and configuration
02 Hadoop deployment and configuration02 Hadoop deployment and configuration
02 Hadoop deployment and configurationSubhas Kumar Ghosh
 
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCloudIDSummit
 
Sept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionSept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionAdam Muise
 
Implementing Hadoop on a single cluster
Implementing Hadoop on a single clusterImplementing Hadoop on a single cluster
Implementing Hadoop on a single clusterSalil Navgire
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthyhuguk
 
Web Services Hadoop Summit 2012
Web Services Hadoop Summit 2012Web Services Hadoop Summit 2012
Web Services Hadoop Summit 2012Hortonworks
 
SDEC2011 Essentials of Mahout
SDEC2011 Essentials of MahoutSDEC2011 Essentials of Mahout
SDEC2011 Essentials of MahoutKorea Sdec
 
Php Dependency Management with Composer ZendCon 2016
Php Dependency Management with Composer ZendCon 2016Php Dependency Management with Composer ZendCon 2016
Php Dependency Management with Composer ZendCon 2016Clark Everetts
 
Hadoop online training
Hadoop online training Hadoop online training
Hadoop online training Keylabs
 

Similaire à Sdec2011 Introducing Hadoop (20)

Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014
 
Hw09 Security And Api Compatibility
Hw09   Security And Api CompatibilityHw09   Security And Api Compatibility
Hw09 Security And Api Compatibility
 
Plugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in HadoopPlugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in Hadoop
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
 
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopYahoo! Hack Europe Workshop
Yahoo! Hack Europe Workshop
 
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
 
Presentation
PresentationPresentation
Presentation
 
02 Hadoop deployment and configuration
02 Hadoop deployment and configuration02 Hadoop deployment and configuration
02 Hadoop deployment and configuration
 
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
 
Sept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionSept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical Introduction
 
Implementing Hadoop on a single cluster
Implementing Hadoop on a single clusterImplementing Hadoop on a single cluster
Implementing Hadoop on a single cluster
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
 
Web Services Hadoop Summit 2012
Web Services Hadoop Summit 2012Web Services Hadoop Summit 2012
Web Services Hadoop Summit 2012
 
SDEC2011 Essentials of Mahout
SDEC2011 Essentials of MahoutSDEC2011 Essentials of Mahout
SDEC2011 Essentials of Mahout
 
Php Dependency Management with Composer ZendCon 2016
Php Dependency Management with Composer ZendCon 2016Php Dependency Management with Composer ZendCon 2016
Php Dependency Management with Composer ZendCon 2016
 
Hadoop online training
Hadoop online training Hadoop online training
Hadoop online training
 

Plus de Korea Sdec

SDEC2011 NoSQL Data modelling
SDEC2011 NoSQL Data modellingSDEC2011 NoSQL Data modelling
SDEC2011 NoSQL Data modellingKorea Sdec
 
SDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsSDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsKorea Sdec
 
SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive
SDEC2011 Replacing legacy Telco DB/DW to Hadoop and HiveSDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive
SDEC2011 Replacing legacy Telco DB/DW to Hadoop and HiveKorea Sdec
 
SDEC2011 Rapidant
SDEC2011 RapidantSDEC2011 Rapidant
SDEC2011 RapidantKorea Sdec
 
SDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whySDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whyKorea Sdec
 
SDEC2011 Going by TACC
SDEC2011 Going by TACCSDEC2011 Going by TACC
SDEC2011 Going by TACCKorea Sdec
 
SDEC2011 Glory-FS development & Experiences
SDEC2011 Glory-FS development & ExperiencesSDEC2011 Glory-FS development & Experiences
SDEC2011 Glory-FS development & ExperiencesKorea Sdec
 
SDEC2011 Using Couchbase for social game scaling and speed
SDEC2011 Using Couchbase for social game scaling and speedSDEC2011 Using Couchbase for social game scaling and speed
SDEC2011 Using Couchbase for social game scaling and speedKorea Sdec
 

Plus de Korea Sdec (8)

SDEC2011 NoSQL Data modelling
SDEC2011 NoSQL Data modellingSDEC2011 NoSQL Data modelling
SDEC2011 NoSQL Data modelling
 
SDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsSDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and models
 
SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive
SDEC2011 Replacing legacy Telco DB/DW to Hadoop and HiveSDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive
SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive
 
SDEC2011 Rapidant
SDEC2011 RapidantSDEC2011 Rapidant
SDEC2011 Rapidant
 
SDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whySDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the why
 
SDEC2011 Going by TACC
SDEC2011 Going by TACCSDEC2011 Going by TACC
SDEC2011 Going by TACC
 
SDEC2011 Glory-FS development & Experiences
SDEC2011 Glory-FS development & ExperiencesSDEC2011 Glory-FS development & Experiences
SDEC2011 Glory-FS development & Experiences
 
SDEC2011 Using Couchbase for social game scaling and speed
SDEC2011 Using Couchbase for social game scaling and speedSDEC2011 Using Couchbase for social game scaling and speed
SDEC2011 Using Couchbase for social game scaling and speed
 

Dernier

Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 

Dernier (20)

Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 

Sdec2011 Introducing Hadoop

  • 1. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC. Copyright for all other & referenced work is retained by their respective owners. Introducing Hadoop Mastering Hadoop Map-reduce for Data Analysis Shashank Tiwari blog: shanky.org | twitter: @tshanky st@treasuryofideas.com
  • 2. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners What is Hadoop
  • 3. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners HDFS Architecture
  • 4. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Namenode/Datanode, JobTracker/TaskTracker
  • 5. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners MapReduce
  • 6. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners ZK Namespace
  • 7. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Essential HBase Schema
  • 8. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Multi-dimensional View
  • 9. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners A Map/Hash View •{ • "row_key_1" : { "name" : { • "first_name" : "Jolly", "last_name" : "Goodfellow" • } } }, • "location" : { "zip": "94301" },
  • 10. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Architectural View (HBase)
  • 11. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners The Persistence Mechanism
  • 12. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners The underlying file format
  • 13. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Installing & Setting up Hadoop • Required software: Java 1.6.x, ssh + sshd • Download • Install • Configure • single-node • pseudo-distributed • cluster
  • 14. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Download • Source: http://hadoop.apache.org/ • Version: • 0.20.203.x -- current stable • 0.20.x -- previous stable • Includes • Hadoop Common -- common utilities, HDFS, MapReduce
  • 15. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Install • Extract: tar zxvf hadoop-0.20.203.0rc1.tar.gz • Move & Create Symbolic Link • ln -s hadoop-0.20.203.0 hadoop • On Windows • http://developer.yahoo.com/hadoop/tutorial/module3.html
  • 16. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Configure -- single-node • Edit: conf/hadoop-env.sh • Set JAVA_HOME • Default configuration is single-node • Start bin/hadoop (for command options) • Reference: http://hadoop.apache.org/common/docs/r0.20.203.0/ single_node_setup.html
  • 17. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Configure -- pseduo-distributed • Edit: conf/core-site.xml (configure HDFS daemon) • Edit: conf/hdfs-site.xml (configure HDFS replication factor) • Edit: conf/mapred-site.xml (configure MapReduce JobTracker daemon) • Enable ssh to localhost (without passphrase) • Reference: http://hadoop.apache.org/common/docs/r0.20.203.0/ single_node_setup.html
  • 18. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Start Hadoop • Format HDFS: bin/hadoop namenode -format • Start all daemons: bin/start-all.sh • Verify logs • Browse the web interface: • Namenode: http://localhost:50070/ • JobTracker: http://localhost:50030/
  • 19. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Take Hadoop for a test-drive • Run examples (hadoop-examples-0.20.203.0.jar) • Grep using regular expressions • Copy files to HDFS: bin/hadoop fs -put bin input • Grep for files which have text beginning with ‘start’ • Verify output on HDFS: bin/hadoop fs -cat output/* • Copy output to local filesystem & verify: bin/hadoop fs -get output output && cat output/*
  • 20. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Configure -- cluster • References: • http://hadoop.apache.org/common/docs/r0.20.203.0/cluster_setup.html (official documentation) • http://developer.yahoo.com/hadoop/tutorial/module7.html (Managing a Hadoop Cluster. Source: YDN) • http://wiki.datameer.com/display/DAS1/Hadoop+Cluster+Configuration+Tips
  • 21. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Questions? • blog: shanky.org | twitter: @tshanky • st@treasuryofideas.com