Byzantine Fault-Tolerant MapReduce
        in Cloud-of-Clouds
       Joint work with: Miguel Correia, Marcelo Pasin,
     Alysson Bessani, Fernando Ramos, Paulo Verissimo
                   Presenter: Pedro Costa


                         Navtalk
Motivation
• How to count the number of words on the Internet?
• How to do it with the help of a cloud-of-clouds
  (i.e., several clouds)?
• Guarantee integrity and availability of data




Outline
• Introduction
   – MapReduce programming model
   – Fault tolerance in Cloud-of-clouds
   – 3 problems for Basic scheme
• Our approach
   – Byzantine fault-tolerant MapReduce in clouds-of-clouds
• Evaluation




MAPREDUCE AND FAULTS


What is MapReduce?
• Programming model + execution environment
   • Introduced by Google in 2004
   • Used for processing large data sets using clusters of servers
   • A few implementations available, used by many companies
• Hadoop MapReduce, an open-source MapReduce implementation from Apache
   • The most widely used implementation, and the one we have been using
   • Includes HDFS, a distributed file system for large files




MapReduce basic idea
[Figure: a file with all the words on the Internet feeds the map phase, which
emits <word,1> pairs; the reduce phase produces <word,n>. Both phases run on
tasktracker servers.]
• Job tracker detects and recovers crashed map/reduce tasks
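
The word-count idea on this slide can be sketched in a few lines of plain Python. This is only an illustration of the programming model, not Hadoop code: the function names (`map_phase`, `reduce_phase`, `word_count`) are made up for this sketch, and the splits are processed sequentially instead of being handed to tasktracker servers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(split: str) -> Iterator[Tuple[str, int]]:
    """Emit a <word, 1> pair for every word in one input split."""
    for word in split.split():
        yield word.lower(), 1


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum the counts of each word, producing <word, n>."""
    totals: Dict[str, int] = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(splits: List[str]) -> Dict[str, int]:
    # Assumption: in a real cluster each split would be mapped on a
    # different tasktracker; here the splits are mapped one after another.
    pairs = [pair for split in splits for pair in map_phase(split)]
    return reduce_phase(pairs)


if __name__ == "__main__":
    print(word_count(["the cloud of clouds", "the cloud"]))
```

Each split stands in for one chunk of the input file; the map output of all splits is merged before the reduce step counts each word.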
MapReduce components
[Figure: a Wordcount job whose tasks run on task trackers (TT): TT1, TT2, TT3,
TT1, TT3]
But there are more faults…
• Problem: accidental faults may affect the correctness of the results
  of MapReduce
    • Task corruptions: memory errors, chipset errors, …
    • Cloud outages: MapReduce job interruptions
      (as reported in popular clouds)

• Our goal:
    • Guarantee integrity and availability (despite task corruptions and
      cloud outages)
    • Develop a new model to compute MapReduce in a cloud-of-clouds
    • Commercially feasible?
        Yes, but out of scope of this presentation; see
        Tobias Kurze et al., "Cloud federation", in Proceedings of the 2nd International
        Conference on Cloud Computing, GRIDs, and Virtualization (CLOUD COMPUTING
        2011).
Byzantine fault-tolerant MapReduce
• Basic idea: replicate tasks in different clouds and vote on the
  results returned by the replicas
   • The set of clouds forms a cloud, hence cloud-of-clouds
   • Inputs are initially stored in all clouds (i.e., not our problem)

[Figure: replicas of the job running in Cloud 1, Cloud 2, and Cloud 3]
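
A minimal sketch of the "replicate and vote" idea, assuming each replica returns a comparable value (for example, the output of one task) and that up to t of the 2t+1 replicas may be faulty. The function name `vote` and the use of strings as results are assumptions made for this illustration only.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote(results: Sequence[Hashable], t: int) -> Optional[Hashable]:
    """Return the value reported by at least t+1 replicas, or None.

    With 2t+1 replicas and at most t faulty ones, the correct replicas
    always supply at least t+1 matching results.
    """
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= t + 1 else None


if __name__ == "__main__":
    # Three clouds, t = 1: one corrupted result is outvoted.
    print(vote(["cloud:3", "cloud:3", "cloud:7"], t=1))
```

Requiring t+1 matching results is what makes the vote safe: t faulty clouds alone can never reach that threshold.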
System model
• Client is correct (not part of MapReduce)
• Clouds: up to t clouds can arbitrarily corrupt all tasks and
  other modules they execute
• Why use t and not f? t≤f

• Next:
   • Basic BFT MapReduce scheme
   • 3 problems of the Basic scheme
   • Our approach: Full BFT MapReduce scheme




MapReduce: Map perspective

[Figure: map tasks in the official version vs. the cloud-of-clouds version,
where replicas run in different clouds]
MapReduce: Reduce perspective

[Figure: reduce tasks in the official version vs. the cloud-of-clouds version,
where replicas run in different clouds]

But we can do better.
Improvements over basic version
• 3 problems arise
   • Computation problem
   • Communication problem
   • Job execution control problem


• 3 solutions: our BFT MapReduce can be thought of as the
  basic version plus the following mechanisms:
   • Deferred execution (computation problem)
   • Digest communication (communication problem)
   • Distributed job tracker (job execution control problem)


Problem 1: computation


[Figure: map replicas in different clouds each process split 0; reduce replicas
in different clouds each produce part 0]

Tasks are executed 2t+1 times
Solution 1: Deferred execution
• Faults are uncommon, so 2t+1 executions are rarely needed
• Job tracker starts each task in only t+1 clouds (the other t stay in standby)
• If results differ or one cloud stops, request 1 more replica (up to t more)

[Figure: split 0 and part 0 are each handled by only two replicas]
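
Deferred execution can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each cloud is modelled as a hypothetical zero-argument callable that returns the task result, or `None` if the cloud stopped, and the first t+1 replicas are run one after another here although in practice they would run in parallel.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence, Tuple

Replica = Callable[[], Optional[Hashable]]


def deferred_execute(replicas: Sequence[Replica], t: int) -> Tuple[Optional[Hashable], int]:
    """Run t+1 replicas first; launch standby replicas only when needed.

    Returns (accepted result or None, number of replicas launched).
    """
    if len(replicas) < 2 * t + 1:
        raise ValueError("need at least 2t+1 candidate clouds")
    votes: Counter = Counter()
    launched = 0
    for run in replicas[: 2 * t + 1]:
        # After the first t+1 replicas, stop as soon as t+1 results match.
        if launched >= t + 1 and votes and votes.most_common(1)[0][1] >= t + 1:
            break
        result = run()
        launched += 1
        if result is not None:  # None models a cloud that stopped
            votes[result] += 1
    if votes:
        value, count = votes.most_common(1)[0]
        if count >= t + 1:
            return value, launched
    return None, launched


if __name__ == "__main__":
    ok = lambda: "part-0"
    bad = lambda: "corrupted"
    print(deferred_execute([ok, ok, ok], t=1))   # fault-free: 2 replicas suffice
    print(deferred_execute([ok, bad, ok], t=1))  # mismatch: a third replica is launched
```

In the fault-free case only t+1 replicas run, which is where the saving over the basic 2t+1 scheme comes from.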
Problem 2: communication


[Figure: split 0 and part 0 replicated three times, with replicas in different
clouds]

All this communication goes through the Internet (delay, cost)!
Solution 2: Transferring Digests
• Reduce tasks must fetch the map task outputs
• Intra-cloud fetch: output fetched normally
• Inter-cloud fetch: only a hash of the output is fetched (the key idea)

[Figure: part 0 fetches split 0 from the same cloud and from other clouds]
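
The digest idea can be illustrated with a short sketch. The slide only says "hash"; SHA-256 is an assumption made here, as are the function names `digest` and `accept_map_output`. The reduce task reads the full map output from its own cloud and compares its digest with the digests received from other clouds, accepting the output once t+1 copies agree.

```python
import hashlib
from typing import List, Optional


def digest(output: bytes) -> str:
    """Hash of a map output (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(output).hexdigest()


def accept_map_output(local_output: bytes, remote_digests: List[str], t: int) -> Optional[bytes]:
    """Accept the locally fetched output if t+1 copies (local + remote) match.

    Only the digests cross the Internet; the full output stays in the cloud.
    """
    local = digest(local_output)
    matching = 1 + sum(1 for d in remote_digests if d == local)
    return local_output if matching >= t + 1 else None


if __name__ == "__main__":
    out = b"cloud\t2\nthe\t2\n"
    print(accept_map_output(out, [digest(out)], t=1) is not None)
```

A digest has a fixed, small size regardless of how large the map output is, which is why sending it between clouds is cheap compared with sending the data itself.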
Problem 3: Job execution control
• The job tracker controls all task executions in the task trackers in
  all clouds
• If the job tracker is in one cloud, separated from many task
  trackers by the Internet:
   • Communication is slow
   • Large timeouts are needed for detecting task tracker failures
   • …and it is a single point of failure (this is the case in MapReduce and Hadoop MapReduce)




Solution 3: Job execution control
[Figure: the Client and the VJT at the top; three Job Trackers, each with its
own group of Task Trackers]
EVALUATION


Setup and Test
Platform configuration
• 3 clouds
• Each cloud has 3 nodes
• 1 job tracker (JT) and 3 task trackers (TT) per cloud
• All JTs are interconnected

Job submitted (Wordcount)
• Input data: 26 chunks of 64 MB (total 1.5 GB)
• Map tasks: 26
• Reduce tasks: 120, 180, 360, 400

Number of reduce tasks executed (no faults, t=1)

Nr. reduce tasks   Job duration (Official)   Job duration (CoC)   Diff
120                00:15:35                  00:17:13             00:02:35
180                00:19:35                  00:21:36             00:02:01
360                00:31:12                  00:33:30             00:02:18
400                00:33:37                  00:36:24             00:02:47
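
As a hedged aid for reading the table, the sketch below recomputes the difference and the relative overhead from the two duration columns. The helper names are made up for this example. Recomputing the 120-task row gives 00:01:38 and not the 00:02:35 shown on the slide, so one of the values in that row may have been mistyped; the other three rows match.

```python
from datetime import timedelta
from typing import Tuple

# (reduce tasks, official duration, cloud-of-clouds duration), from the table
ROWS = [
    (120, "00:15:35", "00:17:13"),
    (180, "00:19:35", "00:21:36"),
    (360, "00:31:12", "00:33:30"),
    (400, "00:33:37", "00:36:24"),
]


def parse(hms: str) -> timedelta:
    """Parse an HH:MM:SS string into a timedelta."""
    h, m, s = (int(part) for part in hms.split(":"))
    return timedelta(hours=h, minutes=m, seconds=s)


def overhead(official: str, coc: str) -> Tuple[timedelta, float]:
    """Return (absolute difference, overhead as a percentage of official)."""
    base, other = parse(official), parse(coc)
    diff = other - base
    return diff, 100.0 * (diff / base)


if __name__ == "__main__":
    for tasks, official, coc in ROWS:
        diff, pct = overhead(official, coc)
        print(f"{tasks:>3} reduce tasks: diff {diff}, overhead {pct:.1f}%")
```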
Task details
Official
   • Map duration: 00:06:47
   • Reduce duration: 00:13:18
BFT Cloud-of-clouds (1 view)
   • Map duration: 00:07:08
   • Reduce duration: 00:14:46

[Figure: charts of map tasks and reduce tasks for each configuration]
Conclusions
• Our method guarantees integrity and availability despite task
  corruptions and cloud outages
• BFT MapReduce in cloud-of-clouds is feasible!
   • No need to execute in all 2t+1 clouds
   • Only digests sent through the Internet (no “big data”)
   • Control job execution within each cloud


Thank you
