SlideShare une entreprise Scribd logo
1  sur  20
Télécharger pour lire hors ligne
© 2012 IBM Corporation
Towards Approximate Computing:
Programming with Relaxed Synchronization
Lakshminarayanan Renganarayana
Vijayalakshmi Srinivasan
Ravi Nair (presenting)
Dan Prener
IBM T.J. Watson Research Center
October 21, 2012
© 2012 IBM
CorporationRN 10/21/12 2
Emerging Use of Computing
  Emerging important applications
  Search
  Inexact answers to inexact queries
  Proactive delivery of information
  Advertisements, offers, news, alerts, advise
  Media
  Approximate rendition of audio, video, or images
  Stream Processing
  Fast decisions based on possibly incomplete data
  These applications benefit in cost-performance through
the use of prediction techniques
  They allow trading off accuracy of results for speed of
delivering results
© 2012 IBM
CorporationRN 10/21/12 3
Approximate Computing
  Current techniques
  Exact (hence energy-inefficient) computation done on exact
(hence expensive) hardware to produce results that need not be
exact
  With increasing unreliability and variability of devices in new
generations of technology, the cost of such computation will
increase dramatically
  What is needed: Approximate Computing
  Computational models that allow computational results to fall in
some acceptable range, and
  Hardware that produces results that fall in some acceptable
range
© 2012 IBM
CorporationRN 10/21/12 4
Hardware
Exact Less Exact
Accurate
Less Accurate, less
up-to-date, possibly
corrupted
Reliable
Variable
Computation
Data
Computing model
today
Dimensions of Approximate Computing
© 2012 IBM
CorporationRN 10/21/12 5
Hardware
Exact Less Exact
Accurate
Less Accurate, less
up-to-date, possibly
corrupted
Reliable
Variable
Computation
Data
Computing model
today
Future model
Human Brain
Path to
Efficiency
Improving Efficiency through Approximation
© 2012 IBM
CorporationRN 10/21/12 6
Hardware
Exact Less Exact
Accurate
Less Accurate, less
up-to-date, possibly
corrupted
Reliable
Variable
Computation
Data
Computing model
today Relaxed
Synchronization
Human Brain
Path to
Efficiency
Relaxed Synchronization
© 2012 IBM
CorporationRN 10/21/12 7
Relaxing Synchronization: Challenges and Opportunities
  Principal uses of synchronization
  Ensuring that all threads see consistent values of shared
variables (e.g. atomic updates with locks)
  Best candidate for relaxing synchronization
  Ensuring that threads reach various points in their
execution in a predictable manner (e.g. barrier)
  Can relax (depending on context)
  Parallel update of data structures (e.g. linked lists)
  Difficult to relax (can lead to fatal crash of the program)
  Key questions about relaxed synchronization
  Will the program produce results that are acceptable?
  If acceptable, would the results be produced in
considerably less time?
© 2012 IBM
CorporationRN 10/21/12 8
  Partition data into k clusters
Randomly
select
K “initial”
means
Create k clusters by
associating every
data point with the
nearest centroid
Recompute
centroids of k
clusters
Iterate until centroids
do not change
8
  Widely used
  Market segmentation, data mining, computer vision, astronomy
Kmeans Clustering
© 2012 IBM
CorporationRN 10/21/12 9
Kmeans Synchronization Overhead
  Kmeans : 8 clusters with large input set
  OpenMP based parallel code
  Parallel execution on an IBM Power7 machine
0%
20%
40%
60%
80%
100%
2 3 4 5 6 7 8
# of threads
PercentClusteringTime
Synchronization
Compute
Synchronization time as high as 90% for 8 threads
© 2012 IBM
CorporationRN 10/21/12 10
Relaxing Synchronization in Kmeans (NU-MineBench)
  Parallel evaluation of all points moving between clusters
  Synchronization is used to update list of points moving
  Stopping criterion:
} while (delta > threshold && loop++ < 500);
  Number of points moving (delta) < 0.1% of total points (threshold)
  Subject to a maximum of 500 iterations
  This is already an approximation
  What would happen if synchronization were relaxed?
Performance for 8 Clusters
0
10
20
30
40
50
60
2 3 4 5 6 7 8
Number of Threads
ExecutionTime(secs)
0
1
2
3
4
5
6
7
8
9
10
Speedup
Original
Relaxed
Speedup
© 2012 IBM
CorporationRN 10/21/12 11
Performance Improvement
Performance Speedup with Relaxed Synchronization
0
2
4
6
8
10
12
14
4 5 6 7 8 9 10 11 12 13
Number of Clusters
Speedup
2 threads
3 threads
4 threads
5 threads
6 threads
7 threads
8 threads
Significant speedup observed in many combinations.
No combination showed a degradation in performance.
© 2012 IBM
CorporationRN 10/21/12 12
Quality of Solution
  Computed total sum of
distances of all points from
the centroids of clusters
they belong to
  Good indicator of quality of
solution
  Each bar in graph
represents a different
relaxed run
  Non-determinism is a side-
effect of relaxation
Quality of solution is not degraded due to relaxed synchronization
Degradation	
  in	
  Quality	
  of	
  Solution	
  -­‐	
  8	
  clusters,	
  8	
  thds
0.00%
0.05%
0.10%
0.15%
0.20%
0.25%
0.30%
0.35%
1 2 3 4 5 6 7 8 9 10
Run Number
Differencebetween
RelaxedcostandOriginalCost
© 2012 IBM
CorporationRN 10/21/12 13
Graph500
  Breadth-first search (BFS) of a very large scale-free graph
  Representative of certain real-life analytics problems
  E.g. Mail a flyer to all people who are connected directly
and transitively to a certain person or sets of persons
  Experiments performed on version of award-winning IBM
solution
Sample Scale-Free Graphs
© 2012 IBM
CorporationRN 10/21/12 14
Initial Experiment: Relaxing Synchronization
  The parallel examination of nodes on a wavefront
requires an atomic update of a data structure
representing list of visited nodes
  Atomic update is a synchronizing operation
  Question: What would happen if we updated the
data structure with a simple OR, rather than an
atomic-OR operation?
© 2012 IBM
CorporationRN 10/21/12 15
Sample Result
  Each column represents a different relaxed run
  Relaxed results are non-deterministic
  Very few vertices missing
  Results
  Very few vertices with wrong level numbers
  Better than 3x the performance
  In most cases, it was possible to detect all missing vertices by simply
validating the obtained tree
CPUS 8
THREADS 16
SCALE 20
Time taken with accurate sync 0.248 0.248 0.248 0.248 0.248
Time taken with relaxed sync 0.077 0.078 0.078 0.077 0.078
Performance gain with relaxed sync 3.20 3.20 3.20 3.20 3.19
Vertices visited in accurate list 646056 646056 646056 646056 646056
Vertices visited in relaxed list 645985 645991 645990 646002 645993
Vertices common in both 645985 645991 645990 646002 645993
Missing vertices 29 35 34 46 37
Other vertices with wrong level numbers 71 49 66 34 59
© 2012 IBM
CorporationRN 10/21/12 16
Relax-and-Check: A Framework to Exploit Relaxed
Synchronization
  Assume there is a criterion that allows us to estimate the quality of a
solution without adding high cost
  Relax-and-Check
  Modifies an existing program
  Run initially without synchronizations
  Revert to fully synchronized version if quality does not fall within acceptable limits
  Such criteria are often already explicit in programs
  E.g. convergence criteria
  Other “syndromes” can often be identified
© 2012 IBM
CorporationRN 10/21/12 17
Sample Results
  Three benchmarks from
STAMP suite
  Acceptability criteria of results
identical to original
  Significant reduction of run
time due to relaxation of
synchronization requirement
SSCA2
Kmeans (STAMP) Labyrinth
© 2012 IBM
CorporationRN 10/21/12 18
Relax-and-Check Strategy for BFS: A Slight Twist
  Very high correlation
between validation errors
and time to solution
  Possibly because of
redundant visit of vertices
  Also, these occurred
principally when hardware
parameters were not suitable
for problem size
  Use validation errors as
measure of quality
  Validate the solution
produced by approximate
technique
  If number of errors exceeds a
threshold, say 1000, revert to
synchronized solution for
future instances with similar
problem size
Performance vs. validation errors
0
0.5
1
1.5
2
2.5
3
3.5
0 500 1000 1500 2000 2500 3000 3500 4000
Number of validation errors
Performancerelativeto
accuratesolution
scale = 18
scale = 20
scale = 22
scale = 24
Relax-and-Check provides a technique to
exploit the advantages of relaxed
synchronization with a quality safety-net
© 2012 IBM
CorporationRN 10/21/12 19
Applications Suitable for Relax-and-Check
Domain / Applications Examples
Scientific Finite Difference Methods, Successive over-
relaxation, N-body methods
Data Mining, Text Mining, Risk Analysis, Machine
Learning
K-means, Hierarchical clustering, Multinomial
regression, Feature reduction
Numerical optimization, Forecasting, Financial
Modeling, and Mathematical Programming
Non-convex optimization solvers, constrained
optimization, multi-dimensional cubing
EDA VLSI place and route
Graph processing and Social Network Analysis Mesh refinement, Metis style graph partitioning,
Graph traversals
Program analyses in compilers and software
engineering
Iterative data flow analysis
A large majority of applications in these domains choose a good solution from
many correct solutions, and use acceptance criteria such as convergence tests,
fixed-point tests, or boolean acceptability tests
© 2012 IBM
CorporationRN 10/21/12 20
Hardware
Exact Less Exact
Accurate
Less Accurate, less
up-to-date, possibly
corrupted
Reliable
Variable
Computation
Data
Computing model
today
Future model
Human Brain
Path to
Efficiency
Move Towards Attributes of the Human Brain

Contenu connexe

Similaire à Programming with Relaxed Synchronization

Mixed Integer Programming: Analyzing 12 Years of Progress
Mixed Integer Programming: Analyzing 12 Years of ProgressMixed Integer Programming: Analyzing 12 Years of Progress
Mixed Integer Programming: Analyzing 12 Years of ProgressIBM Decision Optimization
 
Using Grid Technologies in the Cloud for High Scalability
Using Grid Technologies in the Cloud for High ScalabilityUsing Grid Technologies in the Cloud for High Scalability
Using Grid Technologies in the Cloud for High Scalabilitymabuhr
 
Improving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsImproving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsChester Chen
 
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)Peter Tröger
 
Architectural Optimizations for High Performance and Energy Efficient Smith-W...
Architectural Optimizations for High Performance and Energy Efficient Smith-W...Architectural Optimizations for High Performance and Energy Efficient Smith-W...
Architectural Optimizations for High Performance and Energy Efficient Smith-W...NECST Lab @ Politecnico di Milano
 
Iwsm2014 performance measurement for cloud computing applications using iso...
Iwsm2014   performance measurement for cloud computing applications using iso...Iwsm2014   performance measurement for cloud computing applications using iso...
Iwsm2014 performance measurement for cloud computing applications using iso...Nesma
 
Bosch ConnectedWorld 2017: Striving for Zero DPPM
Bosch ConnectedWorld 2017: Striving for Zero DPPMBosch ConnectedWorld 2017: Striving for Zero DPPM
Bosch ConnectedWorld 2017: Striving for Zero DPPMDavid Park
 
Reproducible Emulation of Analog Behavioral Models
Reproducible Emulation of Analog Behavioral ModelsReproducible Emulation of Analog Behavioral Models
Reproducible Emulation of Analog Behavioral Modelsfnothaft
 
(BDT202) HPC Now Means 'High Personal Computing' | AWS re:Invent 2014
(BDT202) HPC Now Means 'High Personal Computing' | AWS re:Invent 2014(BDT202) HPC Now Means 'High Personal Computing' | AWS re:Invent 2014
(BDT202) HPC Now Means 'High Personal Computing' | AWS re:Invent 2014Amazon Web Services
 
Advanced Econometrics L13-14.pptx
Advanced Econometrics L13-14.pptxAdvanced Econometrics L13-14.pptx
Advanced Econometrics L13-14.pptxakashayosha
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchGreg Makowski
 
Michael Gschwind, Blue Gene/Q: Design for Sustained Multi-Petaflop Computing
Michael Gschwind, Blue Gene/Q: Design for Sustained Multi-Petaflop ComputingMichael Gschwind, Blue Gene/Q: Design for Sustained Multi-Petaflop Computing
Michael Gschwind, Blue Gene/Q: Design for Sustained Multi-Petaflop ComputingMichael Gschwind
 

Similaire à Programming with Relaxed Synchronization (20)

Mixed Integer Programming: Analyzing 12 Years of Progress
Mixed Integer Programming: Analyzing 12 Years of ProgressMixed Integer Programming: Analyzing 12 Years of Progress
Mixed Integer Programming: Analyzing 12 Years of Progress
 
Using Grid Technologies in the Cloud for High Scalability
Using Grid Technologies in the Cloud for High ScalabilityUsing Grid Technologies in the Cloud for High Scalability
Using Grid Technologies in the Cloud for High Scalability
 
Improving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsImproving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN Applications
 
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
 
Architectural Optimizations for High Performance and Energy Efficient Smith-W...
Architectural Optimizations for High Performance and Energy Efficient Smith-W...Architectural Optimizations for High Performance and Energy Efficient Smith-W...
Architectural Optimizations for High Performance and Energy Efficient Smith-W...
 
13 2017.03.30 freeman 7th pvpmc iec 61853 presentation
13 2017.03.30 freeman 7th pvpmc iec 61853 presentation13 2017.03.30 freeman 7th pvpmc iec 61853 presentation
13 2017.03.30 freeman 7th pvpmc iec 61853 presentation
 
Recent and Planned Improvements to the System Advisor Model
Recent and Planned Improvements to the System Advisor ModelRecent and Planned Improvements to the System Advisor Model
Recent and Planned Improvements to the System Advisor Model
 
Iwsm2014 performance measurement for cloud computing applications using iso...
Iwsm2014   performance measurement for cloud computing applications using iso...Iwsm2014   performance measurement for cloud computing applications using iso...
Iwsm2014 performance measurement for cloud computing applications using iso...
 
Smallsat 2021
Smallsat 2021Smallsat 2021
Smallsat 2021
 
Bosch ConnectedWorld 2017: Striving for Zero DPPM
Bosch ConnectedWorld 2017: Striving for Zero DPPMBosch ConnectedWorld 2017: Striving for Zero DPPM
Bosch ConnectedWorld 2017: Striving for Zero DPPM
 
Reproducible Emulation of Analog Behavioral Models
Reproducible Emulation of Analog Behavioral ModelsReproducible Emulation of Analog Behavioral Models
Reproducible Emulation of Analog Behavioral Models
 
(BDT202) HPC Now Means 'High Personal Computing' | AWS re:Invent 2014
(BDT202) HPC Now Means 'High Personal Computing' | AWS re:Invent 2014(BDT202) HPC Now Means 'High Personal Computing' | AWS re:Invent 2014
(BDT202) HPC Now Means 'High Personal Computing' | AWS re:Invent 2014
 
Ch1
Ch1Ch1
Ch1
 
Ch1
Ch1Ch1
Ch1
 
Advanced Econometrics L13-14.pptx
Advanced Econometrics L13-14.pptxAdvanced Econometrics L13-14.pptx
Advanced Econometrics L13-14.pptx
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient search
 
Umit hw6
Umit hw6Umit hw6
Umit hw6
 
IEEE CLOUD \'11
IEEE CLOUD \'11IEEE CLOUD \'11
IEEE CLOUD \'11
 
Michael Gschwind, Blue Gene/Q: Design for Sustained Multi-Petaflop Computing
Michael Gschwind, Blue Gene/Q: Design for Sustained Multi-Petaflop ComputingMichael Gschwind, Blue Gene/Q: Design for Sustained Multi-Petaflop Computing
Michael Gschwind, Blue Gene/Q: Design for Sustained Multi-Petaflop Computing
 
computer architecture.
computer architecture.computer architecture.
computer architecture.
 

Dernier

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 

Dernier (20)

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 

Programming with Relaxed Synchronization

  • 1. © 2012 IBM Corporation Towards Approximate Computing: Programming with Relaxed Synchronization Lakshminarayanan Renganarayana Vijayalakshmi Srinivasan Ravi Nair (presenting) Dan Prener IBM T.J. Watson Research Center October 21, 2012
  • 2. © 2012 IBM CorporationRN 10/21/12 2 Emerging Use of Computing   Emerging important applications   Search   Inexact answers to inexact queries   Proactive delivery of information   Advertisements, offers, news, alerts, advise   Media   Approximate rendition of audio, video, or images   Stream Processing   Fast decisions based on possibly incomplete data   These applications benefit in cost-performance through the use of prediction techniques   They allow trading off accuracy of results for speed of delivering results
  • 3. © 2012 IBM CorporationRN 10/21/12 3 Approximate Computing   Current techniques   Exact (hence energy-inefficient) computation done on exact (hence expensive) hardware to produce results that need not be exact   With increasing unreliability and variability of devices in new generations of technology, the cost of such computation will increase dramatically   What is needed: Approximate Computing   Computational models that allow computational results to fall in some acceptable range, and   Hardware that produces results that fall in some acceptable range
  • 4. © 2012 IBM CorporationRN 10/21/12 4 Hardware Exact Less Exact Accurate Less Accurate, less up-to-date, possibly corrupted Reliable Variable Computation Data Computing model today Dimensions of Approximate Computing
  • 5. © 2012 IBM CorporationRN 10/21/12 5 Hardware Exact Less Exact Accurate Less Accurate, less up-to-date, possibly corrupted Reliable Variable Computation Data Computing model today Future model Human Brain Path to Efficiency Improving Efficiency through Approximation
  • 6. © 2012 IBM CorporationRN 10/21/12 6 Hardware Exact Less Exact Accurate Less Accurate, less up-to-date, possibly corrupted Reliable Variable Computation Data Computing model today Relaxed Synchronization Human Brain Path to Efficiency Relaxed Synchronization
  • 7. © 2012 IBM CorporationRN 10/21/12 7 Relaxing Synchronization: Challenges and Opportunities   Principal uses of synchronization   Ensuring that all threads see consistent values of shared variables (e.g. atomic updates with locks)   Best candidate for relaxing synchronization   Ensuring that threads reach various points in their execution in a predictable manner (e.g. barrier)   Can relax (depending on context)   Parallel update of data structures (e.g. linked lists)   Difficult to relax (can lead to fatal crash of the program)   Key questions about relaxed synchronization   Will the program produce results that are acceptable?   If acceptable, would the results be produced in considerably less time?
  • 8. © 2012 IBM CorporationRN 10/21/12 8   Partition data into k clusters Randomly select K “initial” means Create k clusters by associating every data point with the nearest centroid Recompute centroids of k clusters Iterate until centroids do not change 8   Widely used   Market segmentation, data mining, computer vision, astronomy Kmeans Clustering
  • 9. © 2012 IBM CorporationRN 10/21/12 9 Kmeans Synchronization Overhead   Kmeans : 8 clusters with large input set   OpenMP based parallel code   Parallel execution on an IBM Power7 machine 0% 20% 40% 60% 80% 100% 2 3 4 5 6 7 8 # of threads PercentClusteringTime Synchronization Compute Synchronization time as high as 90% for 8 threads
  • 10. © 2012 IBM CorporationRN 10/21/12 10 Relaxing Synchronization in Kmeans (NU-MineBench)   Parallel evaluation of all points moving between clusters   Synchronization is used to update list of points moving   Stopping criterion: } while (delta > threshold && loop++ < 500);   Number of points moving (delta) < 0.1% of total points (threshold)   Subject to a maximum of 500 iterations   This is already an approximation   What would happen if synchronization were relaxed? Performance for 8 Clusters 0 10 20 30 40 50 60 2 3 4 5 6 7 8 Number of Threads ExecutionTime(secs) 0 1 2 3 4 5 6 7 8 9 10 Speedup Original Relaxed Speedup
  • 11. © 2012 IBM CorporationRN 10/21/12 11 Performance Improvement Performance Speedup with Relaxed Synchronization 0 2 4 6 8 10 12 14 4 5 6 7 8 9 10 11 12 13 Number of Clusters Speedup 2 threads 3 threads 4 threads 5 threads 6 threads 7 threads 8 threads Significant speedup observed in many combinations. No combination showed a degradation in performance.
  • 12. © 2012 IBM CorporationRN 10/21/12 12 Quality of Solution   Computed total sum of distances of all points from the centroids of clusters they belong to   Good indicator of quality of solution   Each bar in graph represents a different relaxed run   Non-determinism is a side- effect of relaxation Quality of solution is not degraded due to relaxed synchronization Degradation  in  Quality  of  Solution  -­‐  8  clusters,  8  thds 0.00% 0.05% 0.10% 0.15% 0.20% 0.25% 0.30% 0.35% 1 2 3 4 5 6 7 8 9 10 Run Number Differencebetween RelaxedcostandOriginalCost
  • 13. © 2012 IBM CorporationRN 10/21/12 13 Graph500   Breadth-first search (BFS) of a very large scale-free graph   Representative of certain real-life analytics problems   E.g. Mail a flyer to all people who are connected directly and transitively to a certain person or sets of persons   Experiments performed on version of award-winning IBM solution Sample Scale-Free Graphs
  • 14. © 2012 IBM CorporationRN 10/21/12 14 Initial Experiment: Relaxing Synchronization   The parallel examination of nodes on a wavefront requires an atomic update of a data structure representing list of visited nodes   Atomic update is a synchronizing operation   Question: What would happen if we updated the data structure with a simple OR, rather than an atomic-OR operation?
  • 15. © 2012 IBM CorporationRN 10/21/12 15 Sample Result   Each column represents a different relaxed run   Relaxed results are non-deterministic   Very few vertices missing   Results   Very few vertices with wrong level numbers   Better than 3x the performance   In most cases, it was possible to detect all missing vertices by simply validating the obtained tree CPUS 8 THREADS 16 SCALE 20 Time taken with accurate sync 0.248 0.248 0.248 0.248 0.248 Time taken with relaxed sync 0.077 0.078 0.078 0.077 0.078 Performance gain with relaxed sync 3.20 3.20 3.20 3.20 3.19 Vertices visited in accurate list 646056 646056 646056 646056 646056 Vertices visited in relaxed list 645985 645991 645990 646002 645993 Vertices common in both 645985 645991 645990 646002 645993 Missing vertices 29 35 34 46 37 Other vertices with wrong level numbers 71 49 66 34 59
  • 16. © 2012 IBM CorporationRN 10/21/12 16 Relax-and-Check: A Framework to Exploit Relaxed Synchronization   Assume there is a criterion that allows us to estimate the quality of a solution without adding high cost   Relax-and-Check   Modifies an existing program   Run initially without synchronizations   Revert to fully synchronized version if quality does not fall within acceptable limits   Such criteria are often already explicit in programs   E.g. convergence criteria   Other “syndromes” can often be identified
  • 17. © 2012 IBM CorporationRN 10/21/12 17 Sample Results   Three benchmarks from STAMP suite   Acceptability criteria of results identical to original   Significant reduction of run time due to relaxation of synchronization requirement SSCA2 Kmeans (STAMP) Labyrinth
  • 18. © 2012 IBM CorporationRN 10/21/12 18 Relax-and-Check Strategy for BFS: A Slight Twist   Very high correlation between validation errors and time to solution   Possibly because of redundant visit of vertices   Also, these occurred principally when hardware parameters were not suitable for problem size   Use validation errors as measure of quality   Validate the solution produced by approximate technique   If number of errors exceeds a threshold, say 1000, revert to synchronized solution for future instances with similar problem size Performance vs. validation errors 0 0.5 1 1.5 2 2.5 3 3.5 0 500 1000 1500 2000 2500 3000 3500 4000 Number of validation errors Performancerelativeto accuratesolution scale = 18 scale = 20 scale = 22 scale = 24 Relax-and-Check provides a technique to exploit the advantages of relaxed synchronization with a quality safety-net
  • 19. © 2012 IBM CorporationRN 10/21/12 19 Applications Suitable for Relax-and-Check Domain / Applications Examples Scientific Finite Difference Methods, Successive over- relaxation, N-body methods Data Mining, Text Mining, Risk Analysis, Machine Learning K-means, Hierarchical clustering, Multinomial regression, Feature reduction Numerical optimization, Forecasting, Financial Modeling, and Mathematical Programming Non-convex optimization solvers, constrained optimization, multi-dimensional cubing EDA VLSI place and route Graph processing and Social Network Analysis Mesh refinement, Metis style graph partitioning, Graph traversals Program analyses in compilers and software engineering Iterative data flow analysis A large majority of applications in these domains choose a good solution from many correct solutions, and use acceptance criteria such as convergence tests, fixed-point tests, or boolean acceptability tests
  • 20. © 2012 IBM CorporationRN 10/21/12 20 Hardware Exact Less Exact Accurate Less Accurate, less up-to-date, possibly corrupted Reliable Variable Computation Data Computing model today Future model Human Brain Path to Efficiency Move Towards Attributes of the Human Brain