Introduction to Apache Hadoop
Presenter: Prem Chand Mali, Mindfire Solutions
Date: 30/01/2014
About Me
SCJP/OCJP - Oracle Certified Java Programmer
MCP 70-480 - Specialist certification: Programming in HTML5 with JavaScript and CSS3
Skills: Java, Swing, Spring, Hibernate, JavaFX, jQuery, Prototype.js, Ext JS.
Connect with me:
https://www.facebook.com/prem.c.mali
http://www.linkedin.com/in/premmali
https://twitter.com/prem_mali
https://plus.google.com/106150245941317924019/about/p/pub
Contact me:
premchandm@mindfiresolutions.com / prem.c.mali@gmail.com
mfsi_premchandm
Agenda
History
What is Apache Hadoop
Why Apache Hadoop
HDFS
MapReduce
Q&A

History
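• Hadoop grew out of Apache Nutch, an open-source, crawler-based web search engine co-created by Doug Cutting.
• Google published its GFS (2003) and MapReduce (2004) papers.
• Yahoo! hired Doug Cutting (2006) and gave him a dedicated team.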

What is Apache Hadoop?
• Apache Hadoop is an open-source software framework, licensed under the Apache License 2.0, that supports data-intensive distributed applications. It runs applications on large clusters of commodity hardware.
• Hadoop is designed with the fundamental assumption that hardware failures (of individual machines or of whole racks) are common and should therefore be handled automatically, in software, by the framework.
• Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and Google File System (GFS) papers, respectively.
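A small calculation shows why failures are treated as routine rather than exceptional. Assuming, purely for illustration, that each machine independently fails on a given day with probability p, the chance that at least one of N machines fails that day is 1 - (1 - p)^N. The function name and the 0.1% figure below are hypothetical, not measurements.

```python
def prob_any_failure(p_per_machine: float, n_machines: int) -> float:
    """Probability that at least one of n_machines fails in the period considered.

    Assumes (as a simplification) that failures are independent and that
    every machine fails with the same probability p_per_machine.
    """
    if not 0.0 <= p_per_machine <= 1.0:
        raise ValueError("p_per_machine must be in [0, 1]")
    return 1.0 - (1.0 - p_per_machine) ** n_machines


if __name__ == "__main__":
    # Hypothetical daily failure probability of 0.1% per machine
    for n in (1, 100, 1000, 4000):
        chance = prob_any_failure(0.001, n)
        print(f"{n:>5} machines: {chance:.1%} chance of at least one failure per day")
```

With these assumed numbers, a 1,000-node cluster would see at least one machine failure on roughly 63% of days, which is why Hadoop handles recovery in the framework instead of relying on reliable hardware.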

What is Apache Hadoop?
• The Apache Hadoop framework is composed of the following modules:
– Hadoop Distributed File System (HDFS) - a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
– Hadoop MapReduce - a programming model for large-scale data processing.
– Hadoop Common - the libraries and utilities needed by the other Hadoop modules.
– Hadoop YARN - a resource-management platform that manages the cluster's compute resources and uses them to schedule users' applications.

Why Apache Hadoop?
• State of data
– About 90% of the world's data was generated in the past three years.
– Types of data:
• Unstructured
• Semi-structured
• Relational
– Traditional relational systems are built to handle gigabytes of data.
• Hadoop, by contrast, is:
– Distributed
– Scalable
– Flexible
– Fault-tolerant
– Intelligent
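The "Distributed" and "Scalable" points come down to simple arithmetic: reading data spread over many disks in parallel is far faster than reading it from one. The sketch below assumes an illustrative sustained read rate of 100 MB/s per disk and a perfectly even spread of data; real throughput depends on hardware, network and workload, and `scan_seconds` is a hypothetical helper.

```python
def scan_seconds(data_mb: float, mb_per_sec_per_disk: float, disks: int) -> float:
    """Time to read data_mb when it is spread evenly over `disks` disks read in parallel.

    Idealized: ignores network transfer, coordination overhead and slow nodes.
    """
    if disks < 1 or mb_per_sec_per_disk <= 0:
        raise ValueError("need at least one disk and a positive read rate")
    return data_mb / (mb_per_sec_per_disk * disks)


if __name__ == "__main__":
    one_tb_mb = 1_000_000  # 1 TB expressed in (decimal) megabytes
    for disks in (1, 10, 100):
        secs = scan_seconds(one_tb_mb, 100, disks)
        print(f"{disks:>3} disk(s): {secs:,.0f} s ({secs / 3600:.2f} h)")
```

Under these assumptions, scanning 1 TB takes about 10,000 s (close to three hours) on a single disk but about 100 s when the data is spread across 100 disks.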

HDFS
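• HDFS is the primary distributed storage system used by Hadoop applications. It consists of two types of components:
– NameNode: manages the file-system namespace and metadata, i.e. which blocks make up each file and which DataNodes hold them.
– DataNode: stores the actual data blocks and serves read and write requests.
• HDFS is well suited to distributed storage and distributed processing on commodity hardware.
• Hadoop provides shell-like commands (e.g. hadoop fs -ls, hadoop fs -put) to interact with HDFS directly.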
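The division of labor between the NameNode and the DataNodes can be sketched as a toy, in-memory model. It assumes a 128 MB block size (the Hadoop 2.x default; Hadoop 1.x used 64 MB), the default replication factor of 3, and simple round-robin replica placement. Real HDFS uses rack-aware placement and much more bookkeeping; `ToyNameNode`, its methods and the file path are hypothetical, not Hadoop APIs.

```python
import math

BLOCK_SIZE_MB = 128  # assumed block size (Hadoop 2.x default; 1.x used 64 MB)
REPLICATION = 3      # assumed replication factor (the HDFS default)


class ToyNameNode:
    """Hypothetical model: holds only metadata, never the file bytes themselves."""

    def __init__(self, datanodes, replication=REPLICATION):
        if replication > len(datanodes):
            raise ValueError("replication factor exceeds the number of DataNodes")
        self.datanodes = list(datanodes)
        self.replication = replication
        self.block_map = {}  # file path -> list of (block_id, [DataNode, ...])
        self._next_block_id = 0
        self._next_start = 0  # rotating start index for round-robin placement

    def add_file(self, path, size_mb):
        """Split a file into fixed-size blocks and choose DataNodes for each replica."""
        n_blocks = max(1, math.ceil(size_mb / BLOCK_SIZE_MB))
        blocks = []
        n = len(self.datanodes)
        for _ in range(n_blocks):
            # `replication` distinct DataNodes, starting one node further along each time
            replicas = [self.datanodes[(self._next_start + i) % n]
                        for i in range(self.replication)]
            self._next_start = (self._next_start + 1) % n
            blocks.append((f"blk_{self._next_block_id}", replicas))
            self._next_block_id += 1
        self.block_map[path] = blocks
        return blocks

    def readable_after_failure(self, path, failed):
        """True if every block of `path` still has at least one replica on a live DataNode."""
        return all(any(dn not in failed for dn in replicas)
                   for _, replicas in self.block_map[path])


if __name__ == "__main__":
    nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
    for block_id, replicas in nn.add_file("/logs/day1.log", 300):
        print(block_id, replicas)
    print(nn.readable_after_failure("/logs/day1.log", failed={"dn2"}))
```

A 300 MB file becomes three blocks, each stored on three different DataNodes, so losing any single DataNode leaves every block readable. Keeping only metadata on the NameNode is what lets one coordinator track a very large number of blocks.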

MapReduce
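• MapReduce is a combination of the following three phases:
– Map: transforms input records into intermediate key/value pairs.
– Shuffle: sorts and groups the intermediate pairs by key and delivers them to the reducers.
– Reduce: aggregates the values for each key into the final output.
• In Hadoop 1.x, it does its work through a JobTracker, which schedules jobs and tasks, and TaskTrackers, which run those tasks on the cluster nodes. (In Hadoop 2, YARN takes over this role.)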
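The Map, Shuffle and Reduce phases can be illustrated with the classic word-count example. This is a single-process Python simulation written for clarity: it does not use Hadoop's Java API (its Mapper and Reducer classes), and it leaves out how the JobTracker and TaskTrackers distribute tasks across machines. The function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit an intermediate (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group intermediate values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle_phase(intermediate))


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog"]))
```

In a real cluster, each map task would run close to the HDFS block holding its input, and the framework, not user code, performs the shuffle over the network before the reducers run.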

Question and Answer

Thank you

www.mindfiresolutions.com
https://www.facebook.com/MindfireSolutions
http://www.linkedin.com/company/mindfire-solutions
http://twitter.com/mindfires
