INTRODUCTION TO HADOOP
Presented By
www.zenithit.co.uk
WHAT IS HADOOP?
 Distributed computing framework
 For clusters of computers
 Thousands of Compute Nodes
 Petabytes of data
 Open source, Java
 Google’s MapReduce inspired Yahoo’s Hadoop.
 Now a project of the Apache Software Foundation
WHAT IS HADOOP?
 The Apache Hadoop project develops open-source
software for reliable, scalable, distributed
computing. Hadoop includes:
 Hadoop Common utilities
 Avro: A data serialization system that integrates with scripting
languages.
 Chukwa: A data collection system for managing large distributed systems.
 HBase: A scalable, distributed database for large tables.
 HDFS: A distributed file system.
 Hive: A data warehouse layer providing data summarization and ad hoc querying.
 MapReduce: distributed processing on compute clusters.
 Pig: A high-level data-flow language for parallel
computation.
 ZooKeeper: coordination service for distributed
applications.
THE IDEA OF MAP REDUCE
MAP AND REDUCE
 The ideas of Map and Reduce are more than 40 years old
 Present in all Functional Programming
Languages.
 See, e.g., APL, Lisp and ML
 Alternate names for Map: Apply-All
 Higher Order Functions
 take function definitions as arguments, or
 return a function as output
 Map and Reduce are higher-order
functions.
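The two kinds of higher-order function listed above can be sketched in Haskell, the language this deck uses for its later examples. The names applyTwice, addN and shifted are illustrative only and do not come from the slides.

```haskell
-- Takes a function definition as an argument.
applyTwice :: (a -> a) -> a -> a
applyTwice f x = f (f x)

-- Returns a function as its output.
addN :: Int -> (Int -> Int)
addN n = \x -> x + n

-- Both together: map receives the function returned by addN.
shifted :: [Int]
shifted = map (addN 10) [1, 2, 3]   -- [11, 12, 13]
```

In GHCi, applyTwice (*2) 5 evaluates to 20.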
MAP: A HIGHER ORDER FUNCTION
 F(x: int) returns r: int
 Let V be an array of integers.
 W = map(F, V)
 W[i] = F(V[i]) for all i
 i.e., apply F to every element of V
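A minimal sketch of the definition above, written as a recursive Haskell function. It is named myMap here to avoid clashing with the Prelude's own map, and square is an arbitrary stand-in for F.

```haskell
-- Hand-written equivalent of map: apply f to every element of the list.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []     = []
myMap f (x:xs) = f x : myMap f xs

-- An arbitrary choice for the slide's F: int -> int.
square :: Int -> Int
square x = x * x

-- W = map(F, V) with V = [1, 2, 3, 4].
w :: [Int]
w = myMap square [1, 2, 3, 4]   -- [1, 4, 9, 16]
```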
MAP EXAMPLES IN HASKELL
 map (+1) [1,2,3,4,5]
== [2, 3, 4, 5, 6]
 map (toLower) "abcDEFG12!@#"
== "abcdefg12!@#"
 map (`mod` 3) [1..10]
== [1, 2, 0, 1, 2, 0, 1, 2, 0, 1]
REDUCE: A HIGHER ORDER FUNCTION
 Reduce is also known as fold, accumulate, compress or inject
 Reduce/fold takes in a function and folds it in between the elements of a list.
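To make "folds it in between the elements" concrete, the sketch below folds (+) over [1, 2, 3] and also builds the resulting expression as a string so the grouping is visible. The names sumRight, sumLeft, showRight and showLeft are illustrative.

```haskell
-- Folding (+) between the elements of [1, 2, 3], starting from 0.
sumRight :: Int
sumRight = foldr (+) 0 [1, 2, 3]   -- 1 + (2 + (3 + 0))

sumLeft :: Int
sumLeft = foldl (+) 0 [1, 2, 3]    -- ((0 + 1) + 2) + 3

-- The same folds, building the expression text instead of the number.
showRight :: String
showRight = foldr (\x acc -> "(" ++ show x ++ " + " ++ acc ++ ")") "0" [1, 2, 3 :: Int]

showLeft :: String
showLeft = foldl (\acc x -> "(" ++ acc ++ " + " ++ show x ++ ")") "0" [1, 2, 3 :: Int]
```

Here showRight is "(1 + (2 + (3 + 0)))" and showLeft is "(((0 + 1) + 2) + 3)".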
FOLD-LEFT IN HASKELL
 Definition
 foldl f z [] = z
 foldl f z (x:xs) = foldl f (f z x) xs
 Examples
 foldl (+) 0 [1..5] == 15
 foldl (+) 10 [1..5] == 25
 foldl (div) 7 [34,56,12,4,23] == 0
FOLD-RIGHT IN HASKELL
 Definition
 foldr f z [] = z
 foldr f z (x:xs) = f x (foldr f z xs)
 Example
 foldr (div) 7 [34,56,12,4,23] == 8
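The div examples give different answers for foldl and foldr because div is not associative. One way to see the intermediate values is with the Prelude's scanl and scanr, which return every accumulator along the way. The names leftSteps and rightSteps are illustrative.

```haskell
-- Accumulator after each step of foldl div 7 [34,56,12,4,23].
-- Starts at 7; div 7 34 is 0, and 0 divided by anything stays 0.
leftSteps :: [Int]
leftSteps = scanl div 7 [34, 56, 12, 4, 23]    -- [7, 0, 0, 0, 0, 0]

-- Accumulator after each step of foldr div 7 [34,56,12,4,23], read right to left:
-- div 23 7 = 3, div 4 3 = 1, div 12 1 = 12, div 56 12 = 4, div 34 4 = 8.
rightSteps :: [Int]
rightSteps = scanr div 7 [34, 56, 12, 4, 23]   -- [8, 4, 12, 1, 3, 7]
```

The last element of leftSteps and the first element of rightSteps are the fold results, 0 and 8.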
EXAMPLES OF THE
MAP REDUCE IDEA
WORD COUNT EXAMPLE
 Read text files and count how often words occur.
 The input is text files
 The output is a text file
 each line: word, tab, count
 Map: Produce pairs of (word, count)
 Reduce: For each word, sum up the counts.
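A single-machine sketch of the word count idea, not Hadoop code. It assumes the containers package (Data.Map), lower-cases words, and splits on whitespace only; the function names are illustrative.

```haskell
import qualified Data.Map.Strict as Map
import Data.Char (toLower)

-- Map step: emit a (word, 1) pair for every word in one line of text.
mapWords :: String -> [(String, Int)]
mapWords line = [(map toLower w, 1) | w <- words line]

-- Reduce step: for each word, sum up the counts.
reduceCounts :: [(String, Int)] -> Map.Map String Int
reduceCounts = Map.fromListWith (+)

-- Whole job: map over every input line, then reduce.
wordCount :: [String] -> Map.Map String Int
wordCount = reduceCounts . concatMap mapWords

-- Output format from the slide: word, tab, count.
render :: Map.Map String Int -> [String]
render counts = [w ++ "\t" ++ show n | (w, n) <- Map.toList counts]
```

For the input ["the cat", "The dog"], render (wordCount ...) yields the lines "cat\t1", "dog\t1" and "the\t2".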
GREP EXAMPLE
 Search input files for a given pattern
 Map: emits a line if pattern is matched
 Reduce: Copies results to output
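The grep job can be sketched the same way. This version assumes a plain substring match (isInfixOf) rather than a regular expression, and the function names are illustrative.

```haskell
import Data.List (isInfixOf)

-- Map step: emit the line only if it contains the pattern.
grepMap :: String -> String -> [String]
grepMap pat line = [line | pat `isInfixOf` line]

-- Reduce step: identity, copying the matched lines to the output.
grepReduce :: [String] -> [String]
grepReduce = id

-- Whole job over a list of input lines.
grep :: String -> [String] -> [String]
grep pat = grepReduce . concatMap (grepMap pat)
```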
INVERTED INDEX EXAMPLE
 Generate an inverted index of words from a given set
of files
 Map: parses a document and emits <word, docId>
pairs
 Reduce: takes all pairs for a given word, sorts the
docId values, and emits a <word, list(docId)> pair
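A single-machine sketch of the inverted index job, assuming integer document IDs, whitespace tokenization, and the containers package. Removing duplicate docIds is an addition not stated on the slide; the names are illustrative.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (nub, sort)

type DocId = Int

-- Map step: parse a document and emit <word, docId> pairs.
indexMap :: (DocId, String) -> [(String, DocId)]
indexMap (docId, text) = [(w, docId) | w <- words text]

-- Reduce step: collect all docIds for a word, sort them, drop duplicates.
indexReduce :: [(String, DocId)] -> Map.Map String [DocId]
indexReduce pairs =
  Map.map (nub . sort) (Map.fromListWith (++) [(w, [d]) | (w, d) <- pairs])

-- Whole job over a set of (docId, contents) pairs.
invertedIndex :: [(DocId, String)] -> Map.Map String [DocId]
invertedIndex = indexReduce . concatMap indexMap
```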
MAP/REDUCE IMPLEMENTATION
IDEA
EXECUTION ON CLUSTERS
1. Input files split (M splits)
2. Assign Master & Workers
3. Map tasks
4. Writing intermediate data to disk (R regions)
5. Intermediate data read & sort
6. Reduce tasks
7. Return
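The steps above can be imitated sequentially in one process. The sketch below is a toy model, not how Hadoop is implemented: there is no master or worker, the "hash" is just a sum of character codes, and runJob, splitInput and partitionOf are hypothetical names. It assumes the containers package.

```haskell
import qualified Data.Map.Strict as Map

-- Step 1: cut the input records into splits of a given size.
splitInput :: Int -> [a] -> [[a]]
splitInput _ [] = []
splitInput n xs
  | n <= 0    = [xs]
  | otherwise = let (h, t) = splitAt n xs in h : splitInput n t

-- Step 4: partitioning function assigning a key to one of r regions.
-- Summing character codes is a toy stand-in for a real hash function.
partitionOf :: Int -> String -> Int
partitionOf r key = sum (map fromEnum key) `mod` r

-- Steps 3 to 7: map every split, partition the intermediate pairs into
-- r regions, group and sort each region by key, then reduce each key.
runJob :: Int -> Int -> (a -> [(String, v)]) -> (String -> [v] -> o) -> [a] -> [[o]]
runJob splitSize r mapper reducer input =
  [ [ reducer k vs | (k, vs) <- Map.toAscList grouped ]
  | p <- [0 .. r - 1]
  , let grouped = Map.fromListWith (flip (++))
                    [ (k, [v]) | (k, v) <- intermediate, partitionOf r k == p ]
  ]
  where
    intermediate = concatMap (concatMap mapper) (splitInput splitSize input)
```

The result has one inner list per reduce partition, mirroring one output file per reduce task.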
MAP/REDUCE CLUSTER IMPLEMENTATION
[Figure: input files (split 0 to split 4) feed M map tasks, which write intermediate files; R reduce tasks read them and write the output files (Output 0, Output 1)]
 Several map or reduce tasks can run on a single computer
 Each intermediate file is divided into R partitions by the partitioning function
 Each reduce task corresponds to one partition
EXECUTION
FAULT RECOVERY
 Workers are pinged by the master periodically
 Non-responsive workers are marked as failed
 All tasks in progress on, or completed by, a failed worker become
eligible for rescheduling
 The master could periodically checkpoint its state
 Current implementations abort on master failure
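The worker-failure rule above can be modelled as a pure function over task states. This is a toy sketch of the bookkeeping only, following the slide's wording that both in-progress and completed tasks of a failed worker are reset; TaskState, failedWorkers, reschedule and recover are hypothetical names.

```haskell
type WorkerId = Int

-- Each task is idle, running on a worker, or completed by a worker.
data TaskState = Idle | InProgress WorkerId | Completed WorkerId
  deriving (Eq, Show)

-- Workers that did not answer the master's ping are marked as failed.
failedWorkers :: [(WorkerId, Bool)] -> [WorkerId]
failedWorkers pings = [w | (w, responded) <- pings, not responded]

-- Tasks in progress on, or completed by, the failed worker go back to Idle,
-- which makes them eligible for rescheduling.
reschedule :: WorkerId -> [TaskState] -> [TaskState]
reschedule failed = map reset
  where
    reset (InProgress w) | w == failed = Idle
    reset (Completed w)  | w == failed = Idle
    reset s                            = s

-- One round of recovery: apply reschedule for every failed worker.
recover :: [(WorkerId, Bool)] -> [TaskState] -> [TaskState]
recover pings tasks = foldl (flip reschedule) tasks (failedWorkers pings)
```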
