A Generate-Test-Aggregate
Parallel Programming Library
Yu Liu¹, Kento Emoto², Zhenjiang Hu³
¹ The Graduate University for Advanced Studies
² The University of Tokyo
³ National Institute of Informatics
PPoPP PMAM 2013
Systematic Parallel Programming for MapReduce
Outline
Introduction to GTA
The GTA library
  - Implementation strategy
  - Programming interface
  - Automatic parallelization and optimization
Applications and evaluations
Conclusions
The GTA Programming Methodology
- Simple programming pattern:
  1. Generate all possible solution candidates;
  2. Test the candidates and filter out invalid ones;
  3. Aggregate the valid candidates.
- Expressive and code-efficient
- Covers a large class of problems
- Automatic optimization and parallelization
(Kento Emoto et al., [ESOP'12])
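The three steps can be sketched as a single higher-order function in Scala. This is a minimal illustration under our own naming (`generateTestAggregate` is hypothetical, not the library's API), and it is the naive reading of the pattern: every candidate is materialised before testing.

```scala
object GtaPattern {
  // Step 1 produces candidates, step 2 keeps those passing the test,
  // step 3 folds the survivors into a single result.
  def generateTestAggregate[C, R](
      generate: () => Iterable[C],
      test: C => Boolean,
      aggregate: Iterable[C] => R): R =
    aggregate(generate().filter(test))

  def main(args: Array[String]): Unit = {
    // Example: count pairs (i, j) with 1 <= i < j <= 5 whose sum is even.
    val count = generateTestAggregate[(Int, Int), Int](
      () => for { i <- 1 to 5; j <- i + 1 to 5 } yield (i, j),
      { case (i, j) => (i + j) % 2 == 0 },
      _.size)
    println(count) // 4
  }
}
```

Because the generator is fully enumerated, this version is only practical for small candidate spaces.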
An Example: The Knapsack Problem
Writing a parallel (MapReduce) program for the knapsack problem is not easy.
input: [ (1$, 2 Kg), (2$, 6 Kg), (3$, 10 Kg) ], weight limit = 15 Kg
generate:
  [ [ ], [ (1$, 2 Kg) ], [ (2$, 6 Kg) ], [ (3$, 10 Kg) ],
    [ (1$, 2 Kg), (2$, 6 Kg) ], [ (1$, 2 Kg), (3$, 10 Kg) ], [ (2$, 6 Kg), (3$, 10 Kg) ],
    [ (1$, 2 Kg), (2$, 6 Kg), (3$, 10 Kg) ] ]
test (total weight <= 15 Kg): [true, true, true, true, true, true, false, false]
filter: [ [ ], [ (1$, 2 Kg) ], [ (2$, 6 Kg) ], [ (3$, 10 Kg) ],
  [ (1$, 2 Kg), (2$, 6 Kg) ], [ (1$, 2 Kg), (3$, 10 Kg) ] ]
aggregate (total value of each valid candidate): 0$, 1$, 2$, 3$, 3$, 4$ (maximum: 4$)
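The naive generate/test/aggregate pipeline for this example can be sketched as follows. `Item`, `generate`, `test`, `aggregate` and `solve` are hypothetical names chosen for illustration, not the GTA library's API. The candidates come out in a different order from the slide, but the same 8 are generated and the same 6 pass the test.

```scala
object NaiveKnapsack {
  final case class Item(value: Int, weight: Int)

  // generate: all 2^n sublists (subsequences) of the input
  def generate(items: List[Item]): List[List[Item]] =
    items.foldRight(List(List.empty[Item])) { (x, subs) =>
      subs ++ subs.map(x :: _)
    }

  // test: total weight within the limit
  def test(limit: Int)(candidate: List[Item]): Boolean =
    candidate.map(_.weight).sum <= limit

  // aggregate: maximum total value over the surviving candidates
  // (for a non-negative limit the empty sublist always survives, so max is safe)
  def aggregate(candidates: List[List[Item]]): Int =
    candidates.map(_.map(_.value).sum).max

  def solve(items: List[Item], limit: Int): Int =
    aggregate(generate(items).filter(test(limit)))

  def main(args: Array[String]): Unit = {
    val items = List(Item(1, 2), Item(2, 6), Item(3, 10))
    println(generate(items).size)            // 8
    println(generate(items).count(test(15))) // 6
    println(solve(items, 15))                // 4
  }
}
```

The list of candidates doubles with every extra item, which is the exponential blow-up measured next.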
Naively implementing Knapsack is inefficient (O(2^n)).
Performance of the naïve Knapsack program:
Input length | Time (ms)
8            | 30
12           | 86
16           | 97
20           | 2829
24           | java.lang.OutOfMemoryError: Java heap space
The GTA fusion theorem was introduced to resolve this efficiency problem.
GTA Fusion
generator + predicates + aggregator → mapReduceable
MapReduce: map(mapReduceable.f) . reduce(mapReduceable.combine)
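A hedged sketch of what a `mapReduceable` value might look like: a per-element function `f` plus an associative `combine`. The case class and its methods are hypothetical, not the library's types; `runInChunks` only simulates splitting the input across workers, to show why associativity is what makes the `map . reduce` form parallelizable.

```scala
object MapReduceableSketch {
  // Hypothetical encoding: f is applied to every element, combine merges results.
  final case class MapReduceable[A, B](f: A => B, combine: (B, B) => B) {
    // map(f) . reduce(combine); the input must be non-empty (no unit is given)
    def run(input: Seq[A]): B = input.map(f).reduce(combine)

    // Same result when combine is associative: reduce each chunk independently
    // (as separate workers would), then combine the partial results.
    def runInChunks(input: Seq[A], chunkSize: Int): B =
      input.grouped(chunkSize).map(run).reduce(combine)
  }

  def main(args: Array[String]): Unit = {
    val sumOfSquares = MapReduceable[Int, Int](x => x * x, _ + _)
    println(sumOfSquares.run(1 to 10))            // 385
    println(sumOfSquares.runInChunks(1 to 10, 3)) // 385
  }
}
```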
Definitions of G, T, A
Class Name | Algebraic Structure
Generator  | polymorphic semiring generator
Predicate  | almost list homomorphism
Aggregator | semiring homomorphism
Ref: K. Emoto [ESOP'12]
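To make the Generator and Aggregator rows concrete, here is a minimal Scala sketch under our own encoding: the `Semiring` trait, `bag`, `MaxPlus` and `sublists` are hypothetical, not the library's definitions, and predicates are left out. One polymorphic generator of all sublists is evaluated in two semirings. In bags of lists it enumerates the 2^n candidates; in the max-plus semiring it directly yields the best sublist sum. Mapping each bag to the best sum of its lists is a semiring homomorphism between the two, which is why both routes agree.

```scala
object SemiringGta {
  trait Semiring[S] {
    def zero: S             // unit of plus
    def one: S              // unit of times
    def plus(a: S, b: S): S // choice between candidate sets
    def times(a: S, b: S): S // joining parts of a candidate
  }

  // Bags of lists: evaluating a generator here enumerates candidates explicitly.
  def bag[A]: Semiring[List[List[A]]] = new Semiring[List[List[A]]] {
    val zero: List[List[A]] = Nil
    val one: List[List[A]] = List(Nil)
    def plus(a: List[List[A]], b: List[List[A]]): List[List[A]] = a ++ b
    def times(a: List[List[A]], b: List[List[A]]): List[List[A]] =
      for (x <- a; y <- b) yield x ++ y
  }

  // Max-plus semiring, with None standing for minus infinity.
  object MaxPlus extends Semiring[Option[Int]] {
    val zero: Option[Int] = None
    val one: Option[Int] = Some(0)
    def plus(a: Option[Int], b: Option[Int]): Option[Int] = (a, b) match {
      case (Some(x), Some(y)) => Some(x max y)
      case _                  => a.orElse(b)
    }
    def times(a: Option[Int], b: Option[Int]): Option[Int] =
      for (x <- a; y <- b) yield x + y
  }

  // Polymorphic generator of all sublists: the product over the input of
  // (one plus single(x)), i.e. "skip x or take x" for every element.
  def sublists[A, S](xs: Seq[A], single: A => S, sr: Semiring[S]): S =
    xs.foldLeft(sr.one)((acc, x) => sr.times(acc, sr.plus(sr.one, single(x))))

  def main(args: Array[String]): Unit = {
    val values = List(1, 2, 3)
    println(sublists(values, (x: Int) => List(List(x)), bag[Int]).size) // 8
    println(sublists(values, (x: Int) => Option(x), MaxPlus))           // Some(6)
  }
}
```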
Main Contributions
The implementation of a GTA library:
  - A simple, statically typed GTA-DSL
  - Implementations of the algebraic structures and of the computations/transformations over them
Evaluation of the GTA methodology
Object-oriented Functional Style
We defined the basic algebraic structures; the relations/transformations between the algebras are well typed.
G‧T‧A Programming DSL
Users write GTA expressions of the form:
  generate(g: GEN) filter(t: Predicate)* aggregate(a: Aggregator)
where filter may appear zero or more times, and GEN, Predicate and Aggregator are Scala traits defined in the GTA library.
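A toy version of this DSL shape, under loud assumptions: `Gen`, `Predicate`, `Aggregator`, `Pipeline`, `AllSublists`, `WeightLimit` and `MaxValue` are hypothetical stand-ins written for this sketch (the slide does not give the real trait signatures), and evaluation is naive, with no fusion. It only illustrates how `generate(...) filter(...)* aggregate(...)` can be typed with Scala traits.

```scala
object MiniGtaDsl {
  // Hypothetical stand-ins for the library's GEN, Predicate and Aggregator traits.
  trait Gen[A] { def candidates(input: List[A]): List[List[A]] }
  trait Predicate[A] { def test(candidate: List[A]): Boolean }
  trait Aggregator[A, R] { def aggregate(candidates: List[List[A]]): R }

  // Captures generate(g) filter(t)* aggregate(a), evaluated naively.
  final case class Pipeline[A](input: List[A], gen: Gen[A], tests: List[Predicate[A]]) {
    def filter(t: Predicate[A]): Pipeline[A] = copy(tests = tests :+ t)
    def aggregate[R](a: Aggregator[A, R]): R =
      a.aggregate(gen.candidates(input).filter(c => tests.forall(_.test(c))))
  }

  def generate[A](input: List[A], g: Gen[A]): Pipeline[A] = Pipeline(input, g, Nil)

  // Building blocks for the knapsack example
  final case class Item(value: Int, weight: Int)

  final class AllSublists[A] extends Gen[A] {
    def candidates(input: List[A]): List[List[A]] =
      input.foldRight(List(List.empty[A]))((x, subs) => subs ++ subs.map(x :: _))
  }

  final case class WeightLimit(limit: Int) extends Predicate[Item] {
    def test(c: List[Item]): Boolean = c.map(_.weight).sum <= limit
  }

  object MaxValue extends Aggregator[Item, Int] {
    def aggregate(cs: List[List[Item]]): Int = cs.map(_.map(_.value).sum).max
  }

  def main(args: Array[String]): Unit = {
    val items = List(Item(1, 2), Item(2, 6), Item(3, 10))
    val best = generate(items, new AllSublists[Item])
      .filter(WeightLimit(15))
      .aggregate(MaxValue)
    println(best) // 4
  }
}
```

Keeping the predicates as a list is what allows any number of `filter` calls before `aggregate`.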
GTA-fusion
G + A + T ⇒ MapReduceable[f, ⊕]
Input: x1, x2, x3, …, xn
MAP: f turns each input element into a table (table1, table2, …, tablen)
REDUCE: the tables are merged pairwise with ⊕ into a single result
[EuroPar'11]
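For the knapsack example, a fused MapReduceable[f, ⊕] can be sketched by hand as below. This is a specialisation written for this one problem under our own assumptions, not the library's generic fusion output; `Fusion`, `table` and `solve` are hypothetical names. Each table maps a total weight (at most the limit) to the best value reachable with exactly that weight, `f` turns one item into a small table, and `⊕` (`combine`) merges two tables. Tables have at most limit + 1 entries, so the work grows with the number of items times the square of the table size rather than with 2^n.

```scala
object FusedKnapsack {
  final case class Item(value: Int, weight: Int)

  // table(w) = best total value over item sets whose total weight is exactly w.
  // Missing keys mean "no such set"; weights above the limit are never stored,
  // because the weight test would reject those candidates anyway.
  type Table = Map[Int, Int]

  final case class Fusion(limit: Int) {
    // Unit of ⊕: only the empty selection (weight 0, value 0).
    val unit: Table = Map(0 -> 0)

    private def insertMax(t: Table, w: Int, v: Int): Table =
      t.updated(w, t.get(w).fold(v)(_ max v))

    // f: one item becomes a small table: skip it, or take it if it fits.
    def f(x: Item): Table =
      if (x.weight > limit) unit else insertMax(unit, x.weight, x.value)

    // ⊕: pair up entries of two tables, drop pairs over the limit,
    // keep the best value per total weight. Assumes non-negative weights.
    def combine(a: Table, b: Table): Table = {
      val pairs = for {
        (w1, v1) <- a.toSeq
        (w2, v2) <- b.toSeq
        if w1 + w2 <= limit
      } yield (w1 + w2, v1 + v2)
      pairs.foldLeft(Map.empty[Int, Int]) { case (t, (w, v)) => insertMax(t, w, v) }
    }

    // MAP every item with f, REDUCE the tables with ⊕.
    def table(items: Seq[Item]): Table = items.map(f).fold(unit)(combine)

    // Aggregate: best value among all feasible total weights.
    def solve(items: Seq[Item]): Int = table(items).values.max
  }

  def main(args: Array[String]): Unit = {
    val items = List(Item(1, 2), Item(2, 6), Item(3, 10))
    println(Fusion(15).solve(items)) // 4
  }
}
```

On the slide's input the final table holds weights 0, 2, 6, 8, 10 and 12 with values 0, 1, 2, 3, 3 and 4, the same values the naive filter step produced.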
Implementation of GTA Fusion/Optimization
The main difficulties:
  - How to define a polymorphic generator
  - How to define a predicate for the test step
  - How to define the intermediate data structures and the other algebraic structures
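One way to read "almost list homomorphism" for the test step, sketched under assumptions: the weight-limit predicate is written as a final check `ok` applied to a list homomorphism `h` whose results lie in a finite set (here, total weights capped at limit + 1). That finite range is what would let a fused program index its intermediate tables by `h`'s result. `WeightLimit`, `h`, `ok`, `single` and `combine` are hypothetical names, and non-negative weights are assumed.

```scala
object AlmostHomomorphism {
  final case class Item(value: Int, weight: Int)

  // Weight-limit test written as ok ∘ h. Assumes non-negative weights, so
  // capping partial sums at limit + 1 never changes the final verdict.
  final case class WeightLimit(limit: Int) {
    private def cap(w: Int): Int = math.min(w, limit + 1)

    val unit: Int = 0                             // h(Nil)
    def single(x: Item): Int = cap(x.weight)      // h(List(x))
    def combine(a: Int, b: Int): Int = cap(a + b) // h(xs ++ ys) == combine(h(xs), h(ys))

    // h is a list homomorphism with results in the finite set {0, ..., limit + 1}
    def h(xs: List[Item]): Int = xs.map(single).foldLeft(unit)(combine)
    def ok(s: Int): Boolean = s <= limit
    def test(xs: List[Item]): Boolean = ok(h(xs))
  }

  def main(args: Array[String]): Unit = {
    val p = WeightLimit(15)
    println(p.test(List(Item(1, 2), Item(3, 10)))) // true  (12 Kg)
    println(p.test(List(Item(2, 6), Item(3, 10)))) // false (16 Kg)
  }
}
```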
More Examples
More examples are in the paper and the source package:
  - Extended Knapsack problems
  - The maximum-segment-sum problem
  - Finding the most probable sequence (Viterbi algorithm)
More information: https://bitbucket.org/inii/gtalib
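For the maximum-segment-sum problem, the kind of homomorphism a MapReduce step needs can be illustrated with the classic tupling technique below (best segment, prefix, suffix and total per chunk). This is a swapped-in, hand-written formulation, not the GTA library's generate/test/aggregate encoding of MSS from the paper; `Summary`, `single`, `combine` and `summarize` are hypothetical names.

```scala
object MaxSegmentSum {
  // Summary of a chunk: best segment sum, best prefix sum, best suffix sum,
  // total sum. The empty segment is allowed, so mss, mps and mts are >= 0.
  final case class Summary(mss: Int, mps: Int, mts: Int, sum: Int)

  val unit: Summary = Summary(0, 0, 0, 0)

  def single(x: Int): Summary = {
    val m = math.max(x, 0)
    Summary(m, m, m, x)
  }

  // Associative: the summary of xs ++ ys depends only on the two summaries,
  // so chunks can be summarised in parallel and merged in any grouping.
  def combine(a: Summary, b: Summary): Summary = Summary(
    mss = math.max(math.max(a.mss, b.mss), a.mts + b.mps),
    mps = math.max(a.mps, a.sum + b.mps),
    mts = math.max(b.mts, a.mts + b.sum),
    sum = a.sum + b.sum)

  def summarize(xs: Seq[Int]): Summary = xs.map(single).fold(unit)(combine)

  def main(args: Array[String]): Unit =
    println(summarize(List(3, -4, 5, -1, 2)).mss) // 6
}
```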
G‧T‧A Building Blocks
Our library provides commonly used G·T·A building blocks, and users can also implement their own generators, predicates and aggregators.
Performance Evaluations
Evaluations on EdubaseCluster (cloud):
  - Up to 32 VM nodes, each with 3 GB RAM and a single-core CPU
  - Executed on Spark, an in-memory MapReduce cluster
Execution Time (Knapsack): time in seconds vs. number of VM nodes
VM nodes | 1.00E+07 items (s) | 1.00E+08 items (s)
4        | 203.63             | 1727.973
8        | 92.83              | 679.305
12       | 64.64              | 637.33
16       | 47.76              | 471.2
20       | 37.06              | 362.36
24       | 29.78              | 287.08
28       | 25.17              | 234.25
32       | 23.25              | 223.44
Linear Speedup
Figure: speedup vs. number of VMs (4 to 32) for Knapsack, ViterbiAlg and MSS
Conclusions
We show GTA can be efficiently implemented
GTA-DSL can simplify parallel programming
  - Simple programming model
  - Good code efficiency
GTA-DSL is architecture independent
Future Work
Enrich the library with more G, T and A building blocks
Extend GTA-DSL to process more complex data structures such as trees and graphs
Q&A
Thank you very much!
