SlideShare une entreprise Scribd logo
1  sur  14
Télécharger pour lire hors ligne
Apache HAMA: An Introduction to
Bulk Synchronization Parallel on Hadoop           (Ver 0.1)



         Hyunsik Choi <hyunsik.choi@gmail.com>,
        Edward J. Yoon <edwardyoon@apache.org>,


            Apache Hama Team
Table Of Contents

•   An Overview of the BSP Model
•   Why the BSP on Hadoop?
•   A Comparison to M/R Programming Model
•   What Applications are Appropriate for BSP?
•   The BSP Implementation on Hadoop
•   Some Examples on BSP
•   What's Next?
An Overview of the BSP Model

The BSP (Bulk Synchronous Parallel) is a parallel
computational model that performs a series supersteps.

.   A superstep consists of three ordered stages:
    • Concurrent Computation: Computation on locally stored data
    • Communication: Send and Receive messages in a point-to-point manner
    • Barrier Synchronization: Wait and Synchronize all processors at end of
      superstep

A BSP system is composed of a number of networked computers with
both local memory and disk.

More detailed information
    • http://en.wikipedia.org/wiki/Bulk_synchronous_parallel
A Vertical Structure of Superstep
Why the BSP on Hadoop?

We want to use Hadoop cluster for more complicated
computations
 • But, the MapReduce is not easy to program complex
   mathematical and graphcial operations
   o   e.g., Matrix Multiply, Page-Rank, Breadth-First Search, ..., etc
• Preserving data locality is an important factor in
  distributed/parallel environments.
   o   Data locality is also important in MapReduce for efficient processing.
• However, MapReduce frameworks do not preserve data
  locality in consecutive operations due to its inherent natures.
A Comparison to M/R Programming model

Map/Reduce does any filtering and/or transformations while
each mapper is reading input data split by InputFormat.
 • Simple API
   o   Map and Reduce
• Automatic parallelization and distribution
• During MapReduce processing, it generally passes input data
  through either many passes of MapReduce or MapReduce iteration
  in order to derive final results.
• Partitioner is too simple
   o MR does not provide an way to preserve data locality either a transition
     from Map to Reduce or between MapReduce iteration.
   o It incurs not only intensive communication cost but also unnecessary
     processing cost.
A Comparison to M/R Programming model

The BSP is tailored towards processing data with locality.
 • The BSP enables each peer to communicate only necessary
   data to other peers while peers are preserving data locality.
 • Simple API and Easy Synchronization
 • Like MapReduce, it makes programs to be automatically
   parallelized and distributed across cluster nodes.
What Applications are Appropriate for BSP?

• BSP will show its innate capability in applications with
  following characteristics:
   o Important to data locality
   o Processing data with complicated relations
   o Using many iterations and recursions
The BSP Implementation on Hadoop

• Written in Java
• The BSP package is now available in the Hama repository.
  o   Implementation available for Hadoop version greater than 0.20.x
  o   Allows to develop new applications
• Hadoop RPC is used for BSP peers to communicate each
  other.
• Barrier Sync mechanism is helped by zookeeper.
Serialize Printing of Hello World (shows synchronization)

// BSP start with 10 threads.
// Each thread will have a shuffled ID number from 0 to 9.

for (int i = 0; i < numberOfPeer; i++) {
    if (myId == i) {
      System.out.println("Hello BSP world from " + i + " of " + numberOfPeer);
    }

    peer.sync();
}

// BSP end
Breath-first Search on Graph

public void expand(Map<Vertex, Integer> input, Map<Vertex, Integer> nextQueue) {
   for (Map.Entry<Vertex, Integer> e : input.entrySet()) {
     Vertex vertex = e.getKey();
     if (vertex.isVisited() == true) { // only visit a vertex that has never been visited.
       continue;
     } else {
       vertex.visit();
     }

        // Put vertices adjacent to current vertex into nextQueue with increment distance.
        for (Integer i : vertex.getAdjacent()) {
          if (needToVisit(i, vertex, input.get(e.getKey()))) {
            nextQueue.put(getVertexByKey(i), e.getValue() + 1);
          }
        }
    }
}
Breath-first Search on Graph

public void process() {
   currentQueue = new HashMap<Vertex, Integer>();
   nextQueue = new HashMap<Integer, Integer>();
   currentQueue.put(startVertex,0); // initially put a start vertex into the current queue.

    while (true) {
     expand(currentQueue, nextQueue);

         // The peer.sync method can determine if certain vertex resides
         // on local disk according to vertex id.
         peer.sync(nextQueue); // Synchronize nextQueue with other peers.
                            // At this step, the peer communicates vertex IDs corresponding to
                            // vertices that resides on remote peers.
        if (peer.state == TERMINATE)
          break;
        // Convert nextQueue that contains vertex IDs into currentQueue queue that contains vertices.
        currentQueue = convertQueue(nextQueue);
        nextQueue = new HashMap<Vertex, Integer>();
    }
}
What's Next?

• Novel matrix computation algorithms using BSP
• Angrapa
  o   A graph computation framework based on BSP
  o   API familiar with graph features
• More improved fault tolerance for BSP
  o   Now, BSP has poor fault tolerance mechanisms.
• Support collective and compressed communication
  mechanisms for high performance computing
Who We Are
• Apache Hama Project Team
  o (http://incubator.apache.org/hama)
• Mailing List
  o   hama-user@incubator.apache.org
  o   hama-dev@incubator.apache.org

Contenu connexe

Tendances

Map Reduce
Map ReduceMap Reduce
Map Reduceschapht
 
Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map ReduceApache Apex
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentationateeq ateeq
 
Large Scale Data Analysis with Map/Reduce, part I
Large Scale Data Analysis with Map/Reduce, part ILarge Scale Data Analysis with Map/Reduce, part I
Large Scale Data Analysis with Map/Reduce, part IMarin Dimitrov
 
Introduction to Map-Reduce
Introduction to Map-ReduceIntroduction to Map-Reduce
Introduction to Map-ReduceBrendan Tierney
 
Introducing Apache Giraph for Large Scale Graph Processing
Introducing Apache Giraph for Large Scale Graph ProcessingIntroducing Apache Giraph for Large Scale Graph Processing
Introducing Apache Giraph for Large Scale Graph Processingsscdotopen
 
Managing Big data Module 3 (1st part)
Managing Big data Module 3 (1st part)Managing Big data Module 3 (1st part)
Managing Big data Module 3 (1st part)Soumee Maschatak
 
Hadoop performance optimization tips
Hadoop performance optimization tipsHadoop performance optimization tips
Hadoop performance optimization tipsSubhas Kumar Ghosh
 
The Pregel Programming Model with Spark GraphX
The Pregel Programming Model with Spark GraphXThe Pregel Programming Model with Spark GraphX
The Pregel Programming Model with Spark GraphXAndrea Iacono
 
Mapreduce advanced
Mapreduce advancedMapreduce advanced
Mapreduce advancedChirag Ahuja
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerankgothicane
 
Dynamic Draph / Iterative Computation on Apache Giraph
Dynamic Draph / Iterative Computation on Apache GiraphDynamic Draph / Iterative Computation on Apache Giraph
Dynamic Draph / Iterative Computation on Apache GiraphDataWorks Summit
 
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)npinto
 

Tendances (20)

Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Map Reduce introduction
Map Reduce introductionMap Reduce introduction
Map Reduce introduction
 
Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map Reduce
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
MapReduce basic
MapReduce basicMapReduce basic
MapReduce basic
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
Large Scale Data Analysis with Map/Reduce, part I
Large Scale Data Analysis with Map/Reduce, part ILarge Scale Data Analysis with Map/Reduce, part I
Large Scale Data Analysis with Map/Reduce, part I
 
Apache Giraph
Apache GiraphApache Giraph
Apache Giraph
 
Introduction to Map-Reduce
Introduction to Map-ReduceIntroduction to Map-Reduce
Introduction to Map-Reduce
 
Introducing Apache Giraph for Large Scale Graph Processing
Introducing Apache Giraph for Large Scale Graph ProcessingIntroducing Apache Giraph for Large Scale Graph Processing
Introducing Apache Giraph for Large Scale Graph Processing
 
Managing Big data Module 3 (1st part)
Managing Big data Module 3 (1st part)Managing Big data Module 3 (1st part)
Managing Big data Module 3 (1st part)
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Hadoop performance optimization tips
Hadoop performance optimization tipsHadoop performance optimization tips
Hadoop performance optimization tips
 
The Pregel Programming Model with Spark GraphX
The Pregel Programming Model with Spark GraphXThe Pregel Programming Model with Spark GraphX
The Pregel Programming Model with Spark GraphX
 
Mapreduce advanced
Mapreduce advancedMapreduce advanced
Mapreduce advanced
 
Tutorial5
Tutorial5Tutorial5
Tutorial5
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerank
 
Hadoop job chaining
Hadoop job chainingHadoop job chaining
Hadoop job chaining
 
Dynamic Draph / Iterative Computation on Apache Giraph
Dynamic Draph / Iterative Computation on Apache GiraphDynamic Draph / Iterative Computation on Apache Giraph
Dynamic Draph / Iterative Computation on Apache Giraph
 
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
 

Similaire à Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop

Splice Machine Overview
Splice Machine OverviewSplice Machine Overview
Splice Machine OverviewKunal Gupta
 
Hadoop live online training
Hadoop live online trainingHadoop live online training
Hadoop live online trainingHarika583
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkJames Chen
 
Cascading on starfish
Cascading on starfishCascading on starfish
Cascading on starfishFei Dong
 
Learn what is Hadoop-and-BigData
Learn  what is Hadoop-and-BigDataLearn  what is Hadoop-and-BigData
Learn what is Hadoop-and-BigDataThanusha154
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map ReduceUrvashi Kataria
 
Finalprojectpresentation
FinalprojectpresentationFinalprojectpresentation
FinalprojectpresentationSANTOSH WAYAL
 
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...Provectus
 
Hadoop Summit 2010 Benchmarking And Optimizing Hadoop
Hadoop Summit 2010 Benchmarking And Optimizing HadoopHadoop Summit 2010 Benchmarking And Optimizing Hadoop
Hadoop Summit 2010 Benchmarking And Optimizing HadoopYahoo Developer Network
 
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdfimpalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdfssusere05ec21
 
Hadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pigHadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pigKhanKhaja1
 
HBase and Hadoop at Urban Airship
HBase and Hadoop at Urban AirshipHBase and Hadoop at Urban Airship
HBase and Hadoop at Urban Airshipdave_revell
 
Unveiling Hive: A Comprehensive Exploration of Hive in Hadoop Ecosystem
Unveiling Hive: A Comprehensive Exploration of Hive in Hadoop EcosystemUnveiling Hive: A Comprehensive Exploration of Hive in Hadoop Ecosystem
Unveiling Hive: A Comprehensive Exploration of Hive in Hadoop Ecosystemmashoodsyed66
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.pptSathish24111
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopGERARDO BARBERENA
 

Similaire à Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop (20)

Splice Machine Overview
Splice Machine OverviewSplice Machine Overview
Splice Machine Overview
 
Hadoop live online training
Hadoop live online trainingHadoop live online training
Hadoop live online training
 
Apache Crunch
Apache CrunchApache Crunch
Apache Crunch
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
 
Lecture 2 part 3
Lecture 2 part 3Lecture 2 part 3
Lecture 2 part 3
 
Cascading on starfish
Cascading on starfishCascading on starfish
Cascading on starfish
 
Learn what is Hadoop-and-BigData
Learn  what is Hadoop-and-BigDataLearn  what is Hadoop-and-BigData
Learn what is Hadoop-and-BigData
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map Reduce
 
Finalprojectpresentation
FinalprojectpresentationFinalprojectpresentation
Finalprojectpresentation
 
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
 
Hadoop Summit 2010 Benchmarking And Optimizing Hadoop
Hadoop Summit 2010 Benchmarking And Optimizing HadoopHadoop Summit 2010 Benchmarking And Optimizing Hadoop
Hadoop Summit 2010 Benchmarking And Optimizing Hadoop
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdfimpalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
 
Hadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pigHadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pig
 
HBase and Hadoop at Urban Airship
HBase and Hadoop at Urban AirshipHBase and Hadoop at Urban Airship
HBase and Hadoop at Urban Airship
 
Unveiling Hive: A Comprehensive Exploration of Hive in Hadoop Ecosystem
Unveiling Hive: A Comprehensive Exploration of Hive in Hadoop EcosystemUnveiling Hive: A Comprehensive Exploration of Hive in Hadoop Ecosystem
Unveiling Hive: A Comprehensive Exploration of Hive in Hadoop Ecosystem
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.ppt
 
Hadoop tutorial
Hadoop tutorialHadoop tutorial
Hadoop tutorial
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to Hadoop
 

Plus de Edward Yoon

(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
(소스콘 2015 발표자료) Apache HORN, a large scale deep learning(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
(소스콘 2015 발표자료) Apache HORN, a large scale deep learningEdward Yoon
 
K means 알고리즘을 이용한 영화배우 클러스터링
K means 알고리즘을 이용한 영화배우 클러스터링K means 알고리즘을 이용한 영화배우 클러스터링
K means 알고리즘을 이용한 영화배우 클러스터링Edward Yoon
 
차세대하둡과 주목해야할 오픈소스
차세대하둡과 주목해야할 오픈소스차세대하둡과 주목해야할 오픈소스
차세대하둡과 주목해야할 오픈소스Edward Yoon
 
Quick Understanding of NoSQL
Quick Understanding of NoSQLQuick Understanding of NoSQL
Quick Understanding of NoSQLEdward Yoon
 
The evolution of web and big data
The evolution of web and big dataThe evolution of web and big data
The evolution of web and big dataEdward Yoon
 
Apache hama @ Samsung SW Academy
Apache hama @ Samsung SW AcademyApache hama @ Samsung SW Academy
Apache hama @ Samsung SW AcademyEdward Yoon
 
Introduction of Apache Hama - 2011
Introduction of Apache Hama - 2011Introduction of Apache Hama - 2011
Introduction of Apache Hama - 2011Edward Yoon
 
MongoDB introduction
MongoDB introductionMongoDB introduction
MongoDB introductionEdward Yoon
 
Monitoring and mining network traffic in clouds
Monitoring and mining network traffic in cloudsMonitoring and mining network traffic in clouds
Monitoring and mining network traffic in cloudsEdward Yoon
 
Usage case of HBase for real-time application
Usage case of HBase for real-time applicationUsage case of HBase for real-time application
Usage case of HBase for real-time applicationEdward Yoon
 
Understand Of Linear Algebra
Understand Of Linear AlgebraUnderstand Of Linear Algebra
Understand Of Linear AlgebraEdward Yoon
 
BigTable And Hbase
BigTable And HbaseBigTable And Hbase
BigTable And HbaseEdward Yoon
 

Plus de Edward Yoon (13)

(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
(소스콘 2015 발표자료) Apache HORN, a large scale deep learning(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
 
K means 알고리즘을 이용한 영화배우 클러스터링
K means 알고리즘을 이용한 영화배우 클러스터링K means 알고리즘을 이용한 영화배우 클러스터링
K means 알고리즘을 이용한 영화배우 클러스터링
 
차세대하둡과 주목해야할 오픈소스
차세대하둡과 주목해야할 오픈소스차세대하둡과 주목해야할 오픈소스
차세대하둡과 주목해야할 오픈소스
 
Quick Understanding of NoSQL
Quick Understanding of NoSQLQuick Understanding of NoSQL
Quick Understanding of NoSQL
 
The evolution of web and big data
The evolution of web and big dataThe evolution of web and big data
The evolution of web and big data
 
Apache hama @ Samsung SW Academy
Apache hama @ Samsung SW AcademyApache hama @ Samsung SW Academy
Apache hama @ Samsung SW Academy
 
Introduction of Apache Hama - 2011
Introduction of Apache Hama - 2011Introduction of Apache Hama - 2011
Introduction of Apache Hama - 2011
 
MongoDB introduction
MongoDB introductionMongoDB introduction
MongoDB introduction
 
Monitoring and mining network traffic in clouds
Monitoring and mining network traffic in cloudsMonitoring and mining network traffic in clouds
Monitoring and mining network traffic in clouds
 
Usage case of HBase for real-time application
Usage case of HBase for real-time applicationUsage case of HBase for real-time application
Usage case of HBase for real-time application
 
Understand Of Linear Algebra
Understand Of Linear AlgebraUnderstand Of Linear Algebra
Understand Of Linear Algebra
 
BigTable And Hbase
BigTable And HbaseBigTable And Hbase
BigTable And Hbase
 
Heart Proposal
Heart ProposalHeart Proposal
Heart Proposal
 

Dernier

Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 

Dernier (20)

Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 

Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop

  • 1. Apache HAMA: An Introduction to Bulk Synchronization Parallel on Hadoop (Ver 0.1) Hyunsik Choi <hyunsik.choi@gmail.com>, Edward J. Yoon <edwardyoon@apache.org>, Apache Hama Team
  • 2. Table Of Contents • An Overview of the BSP Model • Why the BSP on Hadoop? • A Comparison to M/R Programming Model • What Applications are Appropriate for BSP? • The BSP Implementation on Hadoop • Some Examples on BSP • What's Next?
  • 3. An Overview of the BSP Model The BSP (Bulk Synchronous Parallel) is a parallel computational model that performs a series supersteps. . A superstep consists of three ordered stages: • Concurrent Computation: Computation on locally stored data • Communication: Send and Receive messages in a point-to-point manner • Barrier Synchronization: Wait and Synchronize all processors at end of superstep A BSP system is composed of a number of networked computers with both local memory and disk. More detailed information • http://en.wikipedia.org/wiki/Bulk_synchronous_parallel
  • 4. A Vertical Structure of Superstep
  • 5. Why the BSP on Hadoop? We want to use Hadoop cluster for more complicated computations • But, the MapReduce is not easy to program complex mathematical and graphcial operations o e.g., Matrix Multiply, Page-Rank, Breadth-First Search, ..., etc • Preserving data locality is an important factor in distributed/parallel environments. o Data locality is also important in MapReduce for efficient processing. • However, MapReduce frameworks do not preserve data locality in consecutive operations due to its inherent natures.
  • 6. A Comparison to M/R Programming model Map/Reduce does any filtering and/or transformations while each mapper is reading input data split by InputFormat. • Simple API o Map and Reduce • Automatic parallelization and distribution • During MapReduce processing, it generally passes input data through either many passes of MapReduce or MapReduce iteration in order to derive final results. • Partitioner is too simple o MR does not provide an way to preserve data locality either a transition from Map to Reduce or between MapReduce iteration. o It incurs not only intensive communication cost but also unnecessary processing cost.
  • 7. A Comparison to M/R Programming model The BSP is tailored towards processing data with locality. • The BSP enables each peer to communicate only necessary data to other peers while peers are preserving data locality. • Simple API and Easy Synchronization • Like MapReduce, it makes programs to be automatically parallelized and distributed across cluster nodes.
  • 8. What Applications are Appropriate for BSP? • BSP will show its innate capability in applications with following characteristics: o Important to data locality o Processing data with complicated relations o Using many iterations and recursions
  • 9. The BSP Implementation on Hadoop • Written in Java • The BSP package is now available in the Hama repository. o Implementation available for Hadoop version greater than 0.20.x o Allows to develop new applications • Hadoop RPC is used for BSP peers to communicate each other. • Barrier Sync mechanism is helped by zookeeper.
  • 10. Serialize Printing of Hello World (shows synchronization) // BSP start with 10 threads. // Each thread will have a shuffled ID number from 0 to 9. for (int i = 0; i < numberOfPeer; i++) { if (myId == i) { System.out.println("Hello BSP world from " + i + " of " + numberOfPeer); } peer.sync(); } // BSP end
  • 11. Breath-first Search on Graph public void expand(Map<Vertex, Integer> input, Map<Vertex, Integer> nextQueue) { for (Map.Entry<Vertex, Integer> e : input.entrySet()) { Vertex vertex = e.getKey(); if (vertex.isVisited() == true) { // only visit a vertex that has never been visited. continue; } else { vertex.visit(); } // Put vertices adjacent to current vertex into nextQueue with increment distance. for (Integer i : vertex.getAdjacent()) { if (needToVisit(i, vertex, input.get(e.getKey()))) { nextQueue.put(getVertexByKey(i), e.getValue() + 1); } } } }
  • 12. Breath-first Search on Graph public void process() { currentQueue = new HashMap<Vertex, Integer>(); nextQueue = new HashMap<Integer, Integer>(); currentQueue.put(startVertex,0); // initially put a start vertex into the current queue. while (true) { expand(currentQueue, nextQueue); // The peer.sync method can determine if certain vertex resides // on local disk according to vertex id. peer.sync(nextQueue); // Synchronize nextQueue with other peers. // At this step, the peer communicates vertex IDs corresponding to // vertices that resides on remote peers. if (peer.state == TERMINATE) break; // Convert nextQueue that contains vertex IDs into currentQueue queue that contains vertices. currentQueue = convertQueue(nextQueue); nextQueue = new HashMap<Vertex, Integer>(); } }
  • 13. What's Next? • Novel matrix computation algorithms using BSP • Angrapa o A graph computation framework based on BSP o API familiar with graph features • More improved fault tolerance for BSP o Now, BSP has poor fault tolerance mechanisms. • Support collective and compressed communication mechanisms for high performance computing
  • 14. Who We Are • Apache Hama Project Team o (http://incubator.apache.org/hama) • Mailing List o hama-user@incubator.apache.org o hama-dev@incubator.apache.org