SlideShare une entreprise Scribd logo
1  sur  20
Télécharger pour lire hors ligne
INTRODUCTION TO HADOOP
Presented By
www.zenithit.co.uk
WHAT IS ?
 Distributed computing frame work
 For clusters of computers
 Thousands of Compute Nodes
 Petabytes of data
 Open source, Java
 Google’s MapReduce inspired Yahoo’s Hadoop.
 Now part of Apache group
www.zenithit.co.uk
WHAT IS ?
 The Apache Hadoop project develops open-source
software for reliable, scalable, distributed
computing. Hadoop includes:
 Hadoop Common utilities
 Avro: A data serialization system with scripting
languages.
 Chukwa: managing large distributed systems.
 HBase: A scalable, distributed database for large tables.
 HDFS: A distributed file system.
 Hive: data summarization and ad hoc querying.
 MapReduce: distributed processing on compute clusters.
 Pig: A high-level data-flow language for parallel
computation.
 ZooKeeper: coordination service for distributed
applications.
www.zenithit.co.uk
THE IDEA OF MAP REDUCE
www.zenithit.co.uk
MAP AND REDUCE
 The idea of Map, and Reduce is 40+ year
old
 Present in all Functional Programming
Languages.
 See, e.g., APL, Lisp and ML
 Alternate names for Map: Apply-All
 Higher Order Functions
 take function definitions as arguments, or
 return a function as output
 Map and Reduce are higher-order
functions.
www.zenithit.co.uk
MAP: A HIGHER ORDER FUNCTION
 F(x: int) returns r: int
 Let V be an array of integers.
 W = map(F, V)
 W[i] = F(V[i]) for all I
 i.e., apply F to every element of V
www.zenithit.co.uk
MAP EXAMPLES IN HASKELL
 map (+1) [1,2,3,4,5]
== [2, 3, 4, 5, 6]
 map (toLower) "abcDEFG12!@#“
== "abcdefg12!@#“
 map (`mod` 3) [1..10]
== [1, 2, 0, 1, 2, 0, 1, 2, 0, 1]
www.zenithit.co.uk
REDUCE: A HIGHER ORDER FUNCTION
 reduce also known as
fold, accumulate,
compress or inject
 Reduce/fold takes in
a function and folds
it in between the
elements of a list.
www.zenithit.co.uk
FOLD-LEFT IN HASKELL
 Definition
 foldl f z [] = z
 foldl f z (x:xs) = foldl f (f z x) xs
 Examples
 foldl (+) 0 [1..5] ==15
 foldl (+) 10 [1..5] == 25
 foldl (div) 7 [34,56,12,4,23] == 0
www.zenithit.co.uk
FOLD-RIGHT IN HASKELL
 Definition
 foldr f z [] = z
 foldr f z (x:xs) = f x (foldr f z xs)
 Example
 foldr (div) 7 [34,56,12,4,23] == 8
www.zenithit.co.uk
EXAMPLES OF THE
MAP REDUCE IDEA
www.zenithit.co.uk
WORD COUNT EXAMPLE
 Read text files and count how often words occur.
 The input is text files
 The output is a text file
 each line: word, tab, count
 Map: Produce pairs of (word, count)
 Reduce: For each word, sum up the counts.
www.zenithit.co.uk
GREP EXAMPLE
 Search input files for a given pattern
 Map: emits a line if pattern is matched
 Reduce: Copies results to output
www.zenithit.co.uk
INVERTED INDEX EXAMPLE
 Generate an inverted index of words from a given set
of files
 Map: parses a document and emits <word, docId>
pairs
 Reduce: takes all pairs for a given word, sorts the
docId values, and emits a <word, list(docId)> pair
www.zenithit.co.uk
MAP/REDUCE IMPLEMENTATION
IDEA
www.zenithit.co.uk
EXECUTION ON CLUSTERS
1. Input files split (M splits)
2. Assign Master & Workers
3. Map tasks
4. Writing intermediate data to disk (R regions)
5. Intermediate data read & sort
6. Reduce tasks
7. Return
www.zenithit.co.uk
MAP/REDUCE CLUSTER IMPLEMENTATION
split 0
split 1
split 2
split 3
split 4
Output 0
Output 1
Input
files
Output
files
M map
tasks
R reduce
tasks
Intermediate
files
Several map or
reduce tasks can
run on a single
computer
Each intermediate
file is divided into R
partitions, by
partitioning function
Each reduce task
corresponds to one
partition
www.zenithit.co.uk
EXECUTION
www.zenithit.co.uk
FAULT RECOVERY
 Workers are pinged by master periodically
 Non-responsive workers are marked as failed
 All tasks in-progress or completed by failed worker become
eligible for rescheduling
 Master could periodically checkpoint
 Current implementations abort on master failure
www.zenithit.co.uk
POPULAR GOOGLE SEARCH KEY
WORDS
Hadoop training in UK
Bigdata training in UK
Best Hadoop training in UK
Best Bigdata trainin g in UK
Hadoop fee
Hadoop material
Hadoop videos

Contenu connexe

Tendances

Data engineering and analytics using python
Data engineering and analytics using pythonData engineering and analytics using python
Data engineering and analytics using pythonPurna Chander
 
Data Analysis with Python Pandas
Data Analysis with Python PandasData Analysis with Python Pandas
Data Analysis with Python PandasNeeru Mittal
 
5 R Tutorial Data Visualization
5 R Tutorial Data Visualization5 R Tutorial Data Visualization
5 R Tutorial Data VisualizationSakthi Dasans
 
Data structures and algorithms lab10
Data structures and algorithms lab10Data structures and algorithms lab10
Data structures and algorithms lab10Bianca Teşilă
 
heap Sort Algorithm
heap  Sort Algorithmheap  Sort Algorithm
heap Sort AlgorithmLemia Algmri
 
ComputeFest 2012: Intro To R for Physical Sciences
ComputeFest 2012: Intro To R for Physical SciencesComputeFest 2012: Intro To R for Physical Sciences
ComputeFest 2012: Intro To R for Physical Sciencesalexstorer
 
Python and CSV Connectivity
Python and CSV ConnectivityPython and CSV Connectivity
Python and CSV ConnectivityNeeru Mittal
 
Heap Data Structure
 Heap Data Structure Heap Data Structure
Heap Data StructureSaumya Som
 
Heap Sort in Design and Analysis of algorithms
Heap Sort in Design and Analysis of algorithmsHeap Sort in Design and Analysis of algorithms
Heap Sort in Design and Analysis of algorithmssamairaakram
 
Communication Patterns with Apache Spark-(Reza Zadeh, Stanford)
Communication Patterns with Apache Spark-(Reza Zadeh, Stanford)Communication Patterns with Apache Spark-(Reza Zadeh, Stanford)
Communication Patterns with Apache Spark-(Reza Zadeh, Stanford)Spark Summit
 
Presentation on Heap Sort
Presentation on Heap Sort Presentation on Heap Sort
Presentation on Heap Sort Amit Kundu
 
Sawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsSawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsRobert Grossman
 
Lec 17 heap data structure
Lec 17 heap data structureLec 17 heap data structure
Lec 17 heap data structureSajid Marwat
 

Tendances (20)

R language introduction
R language introductionR language introduction
R language introduction
 
Heapsort using Heap
Heapsort using HeapHeapsort using Heap
Heapsort using Heap
 
Heap sort
Heap sortHeap sort
Heap sort
 
Working with LiDAR
Working with LiDARWorking with LiDAR
Working with LiDAR
 
R seminar dplyr package
R seminar dplyr packageR seminar dplyr package
R seminar dplyr package
 
Data engineering and analytics using python
Data engineering and analytics using pythonData engineering and analytics using python
Data engineering and analytics using python
 
Data Analysis with Python Pandas
Data Analysis with Python PandasData Analysis with Python Pandas
Data Analysis with Python Pandas
 
5 R Tutorial Data Visualization
5 R Tutorial Data Visualization5 R Tutorial Data Visualization
5 R Tutorial Data Visualization
 
Data structures and algorithms lab10
Data structures and algorithms lab10Data structures and algorithms lab10
Data structures and algorithms lab10
 
heap Sort Algorithm
heap  Sort Algorithmheap  Sort Algorithm
heap Sort Algorithm
 
ComputeFest 2012: Intro To R for Physical Sciences
ComputeFest 2012: Intro To R for Physical SciencesComputeFest 2012: Intro To R for Physical Sciences
ComputeFest 2012: Intro To R for Physical Sciences
 
Python and CSV Connectivity
Python and CSV ConnectivityPython and CSV Connectivity
Python and CSV Connectivity
 
Heap Data Structure
 Heap Data Structure Heap Data Structure
Heap Data Structure
 
Pig statements
Pig statementsPig statements
Pig statements
 
Heap Sort in Design and Analysis of algorithms
Heap Sort in Design and Analysis of algorithmsHeap Sort in Design and Analysis of algorithms
Heap Sort in Design and Analysis of algorithms
 
Communication Patterns with Apache Spark-(Reza Zadeh, Stanford)
Communication Patterns with Apache Spark-(Reza Zadeh, Stanford)Communication Patterns with Apache Spark-(Reza Zadeh, Stanford)
Communication Patterns with Apache Spark-(Reza Zadeh, Stanford)
 
Presentation on Heap Sort
Presentation on Heap Sort Presentation on Heap Sort
Presentation on Heap Sort
 
Sawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsSawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data Clouds
 
Heap tree
Heap treeHeap tree
Heap tree
 
Lec 17 heap data structure
Lec 17 heap data structureLec 17 heap data structure
Lec 17 heap data structure
 

Similaire à Zenith it-hadoop-training

Map-Reduce and Apache Hadoop
Map-Reduce and Apache HadoopMap-Reduce and Apache Hadoop
Map-Reduce and Apache HadoopSvetlin Nakov
 
Advance Map reduce - Apache hadoop Bigdata training by Design Pathshala
Advance Map reduce - Apache hadoop Bigdata training by Design PathshalaAdvance Map reduce - Apache hadoop Bigdata training by Design Pathshala
Advance Map reduce - Apache hadoop Bigdata training by Design PathshalaDesing Pathshala
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerankgothicane
 
Hadoop本 輪読会 1章〜2章
Hadoop本 輪読会 1章〜2章Hadoop本 輪読会 1章〜2章
Hadoop本 輪読会 1章〜2章moai kids
 
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...shravanthium111
 
Meethadoop
MeethadoopMeethadoop
MeethadoopIIIT-H
 
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainApache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainYahoo Developer Network
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsLynn Langit
 
Hadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesHadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesKelly Technologies
 
MAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptxMAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptxHARIKRISHNANU13
 
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...IndicThreads
 
Hadoop institutes-in-bangalore
Hadoop institutes-in-bangaloreHadoop institutes-in-bangalore
Hadoop institutes-in-bangaloreKelly Technologies
 

Similaire à Zenith it-hadoop-training (20)

Map-Reduce and Apache Hadoop
Map-Reduce and Apache HadoopMap-Reduce and Apache Hadoop
Map-Reduce and Apache Hadoop
 
Apache Spark with Scala
Apache Spark with ScalaApache Spark with Scala
Apache Spark with Scala
 
Hadoop
HadoopHadoop
Hadoop
 
Advance Map reduce - Apache hadoop Bigdata training by Design Pathshala
Advance Map reduce - Apache hadoop Bigdata training by Design PathshalaAdvance Map reduce - Apache hadoop Bigdata training by Design Pathshala
Advance Map reduce - Apache hadoop Bigdata training by Design Pathshala
 
SparkNotes
SparkNotesSparkNotes
SparkNotes
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerank
 
Hadoop本 輪読会 1章〜2章
Hadoop本 輪読会 1章〜2章Hadoop本 輪読会 1章〜2章
Hadoop本 輪読会 1章〜2章
 
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
CSI conference PPT on Performance Analysis of Map/Reduce to compute the frequ...
 
Meethadoop
MeethadoopMeethadoop
Meethadoop
 
Map reducefunnyslide
Map reducefunnyslideMap reducefunnyslide
Map reducefunnyslide
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainApache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Hadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesHadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologies
 
MAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptxMAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptx
 
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
 
Hadoop institutes-in-bangalore
Hadoop institutes-in-bangaloreHadoop institutes-in-bangalore
Hadoop institutes-in-bangalore
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 

Dernier

4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmStan Meyer
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...DhatriParmar
 
week 1 cookery 8 fourth - quarter .pptx
week 1 cookery 8  fourth  -  quarter .pptxweek 1 cookery 8  fourth  -  quarter .pptx
week 1 cookery 8 fourth - quarter .pptxJonalynLegaspi2
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdfMr Bounab Samir
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...DhatriParmar
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQuiz Club NITW
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxSayali Powar
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQuiz Club NITW
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...Nguyen Thanh Tu Collection
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvRicaMaeCastro1
 
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxMan or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxDhatriParmar
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 

Dernier (20)

4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
week 1 cookery 8 fourth - quarter .pptx
week 1 cookery 8  fourth  -  quarter .pptxweek 1 cookery 8  fourth  -  quarter .pptx
week 1 cookery 8 fourth - quarter .pptx
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdf
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
 
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxMan or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 

Zenith it-hadoop-training

  • 1. INTRODUCTION TO HADOOP Presented By www.zenithit.co.uk
  • 2. WHAT IS ?  Distributed computing frame work  For clusters of computers  Thousands of Compute Nodes  Petabytes of data  Open source, Java  Google’s MapReduce inspired Yahoo’s Hadoop.  Now part of Apache group www.zenithit.co.uk
  • 3. WHAT IS ?  The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. Hadoop includes:  Hadoop Common utilities  Avro: A data serialization system with scripting languages.  Chukwa: managing large distributed systems.  HBase: A scalable, distributed database for large tables.  HDFS: A distributed file system.  Hive: data summarization and ad hoc querying.  MapReduce: distributed processing on compute clusters.  Pig: A high-level data-flow language for parallel computation.  ZooKeeper: coordination service for distributed applications. www.zenithit.co.uk
  • 4. THE IDEA OF MAP REDUCE www.zenithit.co.uk
  • 5. MAP AND REDUCE  The idea of Map, and Reduce is 40+ year old  Present in all Functional Programming Languages.  See, e.g., APL, Lisp and ML  Alternate names for Map: Apply-All  Higher Order Functions  take function definitions as arguments, or  return a function as output  Map and Reduce are higher-order functions. www.zenithit.co.uk
  • 6. MAP: A HIGHER ORDER FUNCTION  F(x: int) returns r: int  Let V be an array of integers.  W = map(F, V)  W[i] = F(V[i]) for all I  i.e., apply F to every element of V www.zenithit.co.uk
  • 7. MAP EXAMPLES IN HASKELL  map (+1) [1,2,3,4,5] == [2, 3, 4, 5, 6]  map (toLower) "abcDEFG12!@#“ == "abcdefg12!@#“  map (`mod` 3) [1..10] == [1, 2, 0, 1, 2, 0, 1, 2, 0, 1] www.zenithit.co.uk
  • 8. REDUCE: A HIGHER ORDER FUNCTION  reduce also known as fold, accumulate, compress or inject  Reduce/fold takes in a function and folds it in between the elements of a list. www.zenithit.co.uk
  • 9. FOLD-LEFT IN HASKELL  Definition  foldl f z [] = z  foldl f z (x:xs) = foldl f (f z x) xs  Examples  foldl (+) 0 [1..5] ==15  foldl (+) 10 [1..5] == 25  foldl (div) 7 [34,56,12,4,23] == 0 www.zenithit.co.uk
  • 10. FOLD-RIGHT IN HASKELL  Definition  foldr f z [] = z  foldr f z (x:xs) = f x (foldr f z xs)  Example  foldr (div) 7 [34,56,12,4,23] == 8 www.zenithit.co.uk
  • 11. EXAMPLES OF THE MAP REDUCE IDEA www.zenithit.co.uk
  • 12. WORD COUNT EXAMPLE  Read text files and count how often words occur.  The input is text files  The output is a text file  each line: word, tab, count  Map: Produce pairs of (word, count)  Reduce: For each word, sum up the counts. www.zenithit.co.uk
  • 13. GREP EXAMPLE  Search input files for a given pattern  Map: emits a line if pattern is matched  Reduce: Copies results to output www.zenithit.co.uk
  • 14. INVERTED INDEX EXAMPLE  Generate an inverted index of words from a given set of files  Map: parses a document and emits <word, docId> pairs  Reduce: takes all pairs for a given word, sorts the docId values, and emits a <word, list(docId)> pair www.zenithit.co.uk
  • 16. EXECUTION ON CLUSTERS 1. Input files split (M splits) 2. Assign Master & Workers 3. Map tasks 4. Writing intermediate data to disk (R regions) 5. Intermediate data read & sort 6. Reduce tasks 7. Return www.zenithit.co.uk
  • 17. MAP/REDUCE CLUSTER IMPLEMENTATION split 0 split 1 split 2 split 3 split 4 Output 0 Output 1 Input files Output files M map tasks R reduce tasks Intermediate files Several map or reduce tasks can run on a single computer Each intermediate file is divided into R partitions, by partitioning function Each reduce task corresponds to one partition www.zenithit.co.uk
  • 19. FAULT RECOVERY  Workers are pinged by master periodically  Non-responsive workers are marked as failed  All tasks in-progress or completed by failed worker become eligible for rescheduling  Master could periodically checkpoint  Current implementations abort on master failure www.zenithit.co.uk
  • 20. POPULAR GOOGLE SEARCH KEY WORDS Hadoop training in UK Bigdata training in UK Best Hadoop training in UK Best Bigdata trainin g in UK Hadoop fee Hadoop material Hadoop videos