SlideShare une entreprise Scribd logo
1  sur  19
MapReduce : Simplified Data
Processing on Large Cluster
Dae Ho Kim, Dept of Computer Science, Sangmyung Univ.
Introduction
The Age of Big Data
Introduction
The Age of Big Data
SNS
IoT
Smart
Phone
Introduction
The Age of Big Data
SNS
IoT
Smart
Phone Large-scale Computation
Introduction
The Age of Big Data
Automatic
Powerful
Simple
Introduction
The Age of Big Data
Automatic
Powerful
Simple
MapReduce
Concept Description
MapReduce
Concept Description
MapReduce
MapReduce
Map Reduce
Input Data -> key / value Merge Values
Concept Description
MapReduce
Implementation
Overview, Fault Tolerance, …
Implementation
Execution Overview
In-progress completeidle
Map Reduce
MasterWorker
Implementation
Execution Overview
Implementation
Fault Tolerance
1. Worker Failure
• If no response is received from a worker in a certain amount of time, the master marks the
worker as failed.
• Any map task or reduce task in progress on a failed worker is reset to idle and becomes
eligible for rescheduling.
• Completed map task are re-executed on an failure because their output is stored on the
local disk(s) of the failed machine and is therefore inaccessible. Completed reduce tasks do
not need to be re-executed since their output is stored in a global file system.
• When a map task is executed first by worker A and then later executed by worker B
(because A failed), all workers executing reduce tasks are notified of the re-execution. Any
reduce task that has not already read the data from worker A will read the data from
worker B.
Implementation
Fault Tolerance
2. Master Failure
• Our current implementation aborts the MapReduce computation if the master fails.
Clients can check for this condition and retry the MapReduce operation if they desire.
3. Semantics in the Presence of Failures
• When the user-supplied map and reduce operators are deterministic functions of their
input values, our distributed implementation produces the same output as would have
been produced by a non-faulting sequential execution of the entire program.
• If the master receives a completion message for an already completed map task, it
ignores the message.
• If the same reduce task is executed on multiple machines, multiple rename calls will be
executed for the same final output file. We rely on the atomic rename operation provided
by the underlying file system to guarantee that the final file system state contains just the
data produced by one execution of the reduce task.
• The vast majority of our map and reduce operators are deterministic, and the fact that our
semantics are equivalent to a sequential execution in this case makes it very easy for
programmers to reason about their program’s behavior.
Implementation
Backup Tasks
 Backup Tasks
• One of the common causes that lengthens the total
time taken for a MapReduce operation is a “straggler”:
a machine that takes an unusually long time to
complete one of the last few map or reduce tasks in
the computation.
• When a MapReduce operation is close to completion,
the master schedules backup executions of the
remaining in-progress tasks. The task is marked as
completed whenever either the primary or the backup
execution completes.
• The sort program described in Section 5.3 takes 44%
longer to complete when the backup task mechanism is
disabled.
Refinements
Partitioning Function, Combiner Function, …
Refinements
Partitioning Function, Combiner Function
 Partitioning Function
• Data gets partitioned across these tasks using a partitioning function on the intermediate key.
• A default partitioning function is provided that uses hashing (e.g. “hash(key) mod R”).
• For example, using “hash(Hostname(urlkey)) mod R” as the partitioning function causes all
URLs from the same host to end up in the same output file.
 Combiner Function
• In some cases, there is significant repetition in the intermediate keys produced by each map
task, and the user specified Reduce function is commutative and associative.
• We allow the user to specify an optional Combiner function that does partial merging of this
data before it is sent over the network.
• The Combiner function is executed on each machine that performs a map task.
Refinements
Skipping Bad Records, Counters
 Skipping Bad Records
• Sometimes it is acceptable to ignore a few records, for example when doing statistical
analysis on a large data set.
• We provide an optional mode of execution where the MapReduce library detects which
records cause deterministic crashes and skips these records in order to make forward
progress.
• When the master has seen more than one failure on a particular record, the signal handler
indicates that the record should be skipped when it issues the next re-execution of the
corresponding Map or Reduce task.
 Counters
• To use this facility, user code creates a named counter object and then increments the
counter appropriately in the Map and/or Reduce function.
• The current counter values are also displayed on the master status page so that a human can
watch the progress of the live computation.
Map reduce

Contenu connexe

Tendances

Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentationAhmad El Tawil
 
Hadoop deconstructing map reduce job step by step
Hadoop deconstructing map reduce job step by stepHadoop deconstructing map reduce job step by step
Hadoop deconstructing map reduce job step by stepSubhas Kumar Ghosh
 
Adaptive Execution Support for Malleable Computation
Adaptive Execution Support for Malleable ComputationAdaptive Execution Support for Malleable Computation
Adaptive Execution Support for Malleable ComputationQian Lin
 
Hadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorHadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorSubhas Kumar Ghosh
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduceM Baddar
 
Load balancing In cloud - In a semi distributed system
Load balancing In cloud - In a semi distributed systemLoad balancing In cloud - In a semi distributed system
Load balancing In cloud - In a semi distributed systemAchal Gupta
 
Hadoop combiner and partitioner
Hadoop combiner and partitionerHadoop combiner and partitioner
Hadoop combiner and partitionerSubhas Kumar Ghosh
 
Hadoop map reduce in operation
Hadoop map reduce in operationHadoop map reduce in operation
Hadoop map reduce in operationSubhas Kumar Ghosh
 
load balancing in public cloud
load balancing in public cloudload balancing in public cloud
load balancing in public cloudSudhagarp Cse
 
MapReduce:Simplified Data Processing on Large Cluster Presented by Areej Qas...
MapReduce:Simplified Data Processing on Large Cluster  Presented by Areej Qas...MapReduce:Simplified Data Processing on Large Cluster  Presented by Areej Qas...
MapReduce:Simplified Data Processing on Large Cluster Presented by Areej Qas...areej qasrawi
 
Mapreduce total order sorting technique
Mapreduce total order sorting techniqueMapreduce total order sorting technique
Mapreduce total order sorting techniqueUday Vakalapudi
 
An Enhanced MapReduce Model (on BSP)
An Enhanced MapReduce Model (on BSP)An Enhanced MapReduce Model (on BSP)
An Enhanced MapReduce Model (on BSP)Yu Liu
 
Processing Large Datasets for the National Broadband Map with FME
Processing Large Datasets for the National Broadband Map with FMEProcessing Large Datasets for the National Broadband Map with FME
Processing Large Datasets for the National Broadband Map with FMESafe Software
 
Processing Large Datasets for the National Broadband Map with FME
Processing Large Datasets for the National Broadband Map with FMEProcessing Large Datasets for the National Broadband Map with FME
Processing Large Datasets for the National Broadband Map with FMESafe Software
 
Wei's notes on MapReduce Scheduling
Wei's notes on MapReduce SchedulingWei's notes on MapReduce Scheduling
Wei's notes on MapReduce SchedulingLu Wei
 
A load balancing model based on cloud partitioning for the public cloud. ppt
A  load balancing model based on cloud partitioning for the public cloud. ppt A  load balancing model based on cloud partitioning for the public cloud. ppt
A load balancing model based on cloud partitioning for the public cloud. ppt Lavanya Vigrahala
 

Tendances (19)

Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
Hadoop deconstructing map reduce job step by step
Hadoop deconstructing map reduce job step by stepHadoop deconstructing map reduce job step by step
Hadoop deconstructing map reduce job step by step
 
Hadoop map reduce v2
Hadoop map reduce v2Hadoop map reduce v2
Hadoop map reduce v2
 
Adaptive Execution Support for Malleable Computation
Adaptive Execution Support for Malleable ComputationAdaptive Execution Support for Malleable Computation
Adaptive Execution Support for Malleable Computation
 
Hadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorHadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparator
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduce
 
Load balancing In cloud - In a semi distributed system
Load balancing In cloud - In a semi distributed systemLoad balancing In cloud - In a semi distributed system
Load balancing In cloud - In a semi distributed system
 
Hadoop combiner and partitioner
Hadoop combiner and partitionerHadoop combiner and partitioner
Hadoop combiner and partitioner
 
Map reduce
Map reduceMap reduce
Map reduce
 
Hadoop map reduce in operation
Hadoop map reduce in operationHadoop map reduce in operation
Hadoop map reduce in operation
 
load balancing in public cloud
load balancing in public cloudload balancing in public cloud
load balancing in public cloud
 
MapReduce:Simplified Data Processing on Large Cluster Presented by Areej Qas...
MapReduce:Simplified Data Processing on Large Cluster  Presented by Areej Qas...MapReduce:Simplified Data Processing on Large Cluster  Presented by Areej Qas...
MapReduce:Simplified Data Processing on Large Cluster Presented by Areej Qas...
 
Mapreduce total order sorting technique
Mapreduce total order sorting techniqueMapreduce total order sorting technique
Mapreduce total order sorting technique
 
An Enhanced MapReduce Model (on BSP)
An Enhanced MapReduce Model (on BSP)An Enhanced MapReduce Model (on BSP)
An Enhanced MapReduce Model (on BSP)
 
Unit3 MapReduce
Unit3 MapReduceUnit3 MapReduce
Unit3 MapReduce
 
Processing Large Datasets for the National Broadband Map with FME
Processing Large Datasets for the National Broadband Map with FMEProcessing Large Datasets for the National Broadband Map with FME
Processing Large Datasets for the National Broadband Map with FME
 
Processing Large Datasets for the National Broadband Map with FME
Processing Large Datasets for the National Broadband Map with FMEProcessing Large Datasets for the National Broadband Map with FME
Processing Large Datasets for the National Broadband Map with FME
 
Wei's notes on MapReduce Scheduling
Wei's notes on MapReduce SchedulingWei's notes on MapReduce Scheduling
Wei's notes on MapReduce Scheduling
 
A load balancing model based on cloud partitioning for the public cloud. ppt
A  load balancing model based on cloud partitioning for the public cloud. ppt A  load balancing model based on cloud partitioning for the public cloud. ppt
A load balancing model based on cloud partitioning for the public cloud. ppt
 

Similaire à Map reduce

MapReduce presentation
MapReduce presentationMapReduce presentation
MapReduce presentationVu Thi Trang
 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Architecting and productionising data science applications at scalesamthemonad
 
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfmodule3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfTSANKARARAO
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance IssuesAntonios Katsarakis
 
Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerancePallav Jha
 
Hadoop mapreduce and yarn frame work- unit5
Hadoop mapreduce and yarn frame work-  unit5Hadoop mapreduce and yarn frame work-  unit5
Hadoop mapreduce and yarn frame work- unit5RojaT4
 
Introduction of MapReduce
Introduction of MapReduceIntroduction of MapReduce
Introduction of MapReduceHC Lin
 
Architecting for the cloud map reduce creating
Architecting for the cloud   map reduce creatingArchitecting for the cloud   map reduce creating
Architecting for the cloud map reduce creatingLen Bass
 
Big data unit iv and v lecture notes qb model exam
Big data unit iv and v lecture notes   qb model examBig data unit iv and v lecture notes   qb model exam
Big data unit iv and v lecture notes qb model examIndhujeni
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reducerantav
 
Juniper Innovation Contest
Juniper Innovation ContestJuniper Innovation Contest
Juniper Innovation ContestAMIT BORUDE
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukAndrii Vozniuk
 

Similaire à Map reduce (20)

MapReduce presentation
MapReduce presentationMapReduce presentation
MapReduce presentation
 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Architecting and productionising data science applications at scale
 
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfmodule3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
 
Big Data.pptx
Big Data.pptxBig Data.pptx
Big Data.pptx
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance Issues
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerance
 
MapReduce.pptx
MapReduce.pptxMapReduce.pptx
MapReduce.pptx
 
E031201032036
E031201032036E031201032036
E031201032036
 
MapReduce basics
MapReduce basicsMapReduce basics
MapReduce basics
 
Hadoop mapreduce and yarn frame work- unit5
Hadoop mapreduce and yarn frame work-  unit5Hadoop mapreduce and yarn frame work-  unit5
Hadoop mapreduce and yarn frame work- unit5
 
Introduction of MapReduce
Introduction of MapReduceIntroduction of MapReduce
Introduction of MapReduce
 
MapReduce
MapReduceMapReduce
MapReduce
 
Hadoop
HadoopHadoop
Hadoop
 
Architecting for the cloud map reduce creating
Architecting for the cloud   map reduce creatingArchitecting for the cloud   map reduce creating
Architecting for the cloud map reduce creating
 
Big data unit iv and v lecture notes qb model exam
Big data unit iv and v lecture notes   qb model examBig data unit iv and v lecture notes   qb model exam
Big data unit iv and v lecture notes qb model exam
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reduce
 
Juniper Innovation Contest
Juniper Innovation ContestJuniper Innovation Contest
Juniper Innovation Contest
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
 
MapReduce
MapReduceMapReduce
MapReduce
 

Dernier

Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 

Dernier (20)

Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 

Map reduce

  • 1. MapReduce : Simplified Data Processing on Large Cluster Dae Ho Kim, Dept of Computer Science, Sangmyung Univ.
  • 3. Introduction The Age of Big Data SNS IoT Smart Phone
  • 4. Introduction The Age of Big Data SNS IoT Smart Phone Large-scale Computation
  • 5. Introduction The Age of Big Data Automatic Powerful Simple
  • 6. Introduction The Age of Big Data Automatic Powerful Simple MapReduce
  • 13. Implementation Fault Tolerance 1. Worker Failure • If no response is received from a worker in a certain amount of time, the master marks the worker as failed. • Any map task or reduce task in progress on a failed worker is reset to idle and becomes eligible for rescheduling. • Completed map task are re-executed on an failure because their output is stored on the local disk(s) of the failed machine and is therefore inaccessible. Completed reduce tasks do not need to be re-executed since their output is stored in a global file system. • When a map task is executed first by worker A and then later executed by worker B (because A failed), all workers executing reduce tasks are notified of the re-execution. Any reduce task that has not already read the data from worker A will read the data from worker B.
  • 14. Implementation Fault Tolerance 2. Master Failure • Our current implementation aborts the MapReduce computation if the master fails. Clients can check for this condition and retry the MapReduce operation if they desire. 3. Semantics in the Presence of Failures • When the user-supplied map and reduce operators are deterministic functions of their input values, our distributed implementation produces the same output as would have been produced by a non-faulting sequential execution of the entire program. • If the master receives a completion message for an already completed map task, it ignores the message. • If the same reduce task is executed on multiple machines, multiple rename calls will be executed for the same final output file. We rely on the atomic rename operation provided by the underlying file system to guarantee that the final file system state contains just the data produced by one execution of the reduce task. • The vast majority of our map and reduce operators are deterministic, and the fact that our semantics are equivalent to a sequential execution in this case makes it very easy for programmers to reason about their program’s behavior.
  • 15. Implementation Backup Tasks  Backup Tasks • One of the common causes that lengthens the total time taken for a MapReduce operation is a “straggler”: a machine that takes an unusually long time to complete one of the last few map or reduce tasks in the computation. • When a MapReduce operation is close to completion, the master schedules backup executions of the remaining in-progress tasks. The task is marked as completed whenever either the primary or the backup execution completes. • The sort program described in Section 5.3 takes 44% longer to complete when the backup task mechanism is disabled.
  • 17. Refinements Partitioning Function, Combiner Function  Partitioning Function • Data gets partitioned across these tasks using a partitioning function on the intermediate key. • A default partitioning function is provided that uses hashing (e.g. “hash(key) mod R”). • For example, using “hash(Hostname(urlkey)) mod R” as the partitioning function causes all URLs from the same host to end up in the same output file.  Combiner Function • In some cases, there is significant repetition in the intermediate keys produced by each map task, and the user specified Reduce function is commutative and associative. • We allow the user to specify an optional Combiner function that does partial merging of this data before it is sent over the network. • The Combiner function is executed on each machine that performs a map task.
  • 18. Refinements Skipping Bad Records, Counters  Skipping Bad Records • Sometimes it is acceptable to ignore a few records, for example when doing statistical analysis on a large data set. • We provide an optional mode of execution where the MapReduce library detects which records cause deterministic crashes and skips these records in order to make forward progress. • When the master has seen more than one failure on a particular record, the signal handler indicates that the record should be skipped when it issues the next re-execution of the corresponding Map or Reduce task.  Counters • To use this facility, user code creates a named counter object and then increments the counter appropriately in the Map and/or Reduce function. • The current counter values are also displayed on the master status page so that a human can watch the progress of the live computation.

Notes de l'éditeur

  1. 그래서 나온 것이 분산컴퓨팅인데 분산컴퓨팅이란 여러 대의 컴퓨터가 하나의 작업을 나누어 처리하는 방식이다. 그리고 이 분산 컴퓨팅을 보다 쉽고 간편하게 하기 위해 만든 것이 맵 리듀스 .
  2. 그래서 나온 것이 분산컴퓨팅인데 분산컴퓨팅이란 여러 대의 컴퓨터가 하나의 작업을 나누어 처리하는 방식이다. 그리고 이 분산 컴퓨팅을 보다 쉽고 간편하게 하기 위해 만든 것이 맵 리듀스 .
  3. Straggler : CPU, Memory, Local disk, Network bandwidth etc..