Concurrency Control for Parallel Machine Learning
Dimitris Papailiopoulos
Xinghao Pan, Joseph Gonzalez, Stefanie Jegelka, Tamara Broderick, Dimitris Papailiopoulos, Joseph Bradley, Michael I. Jordan
Serial Inference
[Figure: model state and data]
Parallel Inference
[Figure: model state and data, with Processor 1 and Processor 2]
Concurrency: more machines = less time
Correctness: serial equivalence?
Coordination-Free Parallel Inference
[Figure: model state and data, with Processor 1 and Processor 2]
Ignore collisions?
Concurrency: (almost) free; speedup = #CPU
Correctness? Not always...
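To make "ignore collisions" concrete, here is a minimal sketch of a lost update. It is a deterministic, single-threaded simulation, not code from the talk; the function names and the additive update are assumptions chosen for illustration.

```python
# Two "processors" read the same model state, compute their updates
# independently, and write back without any coordination.

def serial_updates(state, deltas):
    """Serial reference: apply each update one after another."""
    for d in deltas:
        state = state + d
    return state

def coordination_free_updates(state, deltas):
    """All processors read the same stale state, then write their own result.
    The last writer wins, so every other update is lost."""
    stale = state                           # everyone reads before anyone writes
    results = [stale + d for d in deltas]   # independent local computation
    for r in results:                       # uncoordinated write-back
        state = r
    return state

serial = serial_updates(0, [5, 7])
racy = coordination_free_updates(0, [5, 7])
print(serial, racy)
```

The two results differ, which is the sense in which the coordination-free outcome is not always serially equivalent.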
[Figure: correctness (low to high) plotted against concurrency (low to high), with points for Serial, Coordination-free, and Concurrency Control]
Concurrency Control
Database mechanisms
o Guarantee correctness
o Maximize concurrency
- Mutual exclusion
- Optimistic CC
Mutual Exclusion Through Locking
[Figure: model state and data, with Processor 1 and Processor 2]
Introduce locking (scheduling) protocols to prevent conflicts.
Enforce local serialization to avoid conflicts.
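As a rough illustration of mutual exclusion through locking, the sketch below guards a shared counter with Python's standard `threading.Lock`. The `SharedModel` class and `run_workers` helper are hypothetical names, and the counter stands in for the model state.

```python
import threading

class SharedModel:
    """Toy model state; a lock serializes every read-modify-write."""
    def __init__(self):
        self.count = 0
        self._lock = threading.Lock()

    def update(self, delta):
        # Only one thread at a time may execute this critical section.
        with self._lock:
            current = self.count
            self.count = current + delta

def run_workers(model, n_workers, updates_per_worker):
    """Start n_workers threads that each apply updates_per_worker updates."""
    def work():
        for _ in range(updates_per_worker):
            model.update(1)
    threads = [threading.Thread(target=work) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return model.count

total = run_workers(SharedModel(), n_workers=4, updates_per_worker=1000)
print(total)
```

Because conflicting updates are serialized, no update is lost; the cost is that threads may wait for each other.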
Optimistic Concurrency Control
[Figure: model state and data, with Processor 1 and Processor 2]
Allow computation to proceed without blocking.
Validate potential conflicts to detect invalid outcomes.
Take a compensating action: rollback and redo.
Kung & Robinson. On optimistic methods for concurrency control.
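The three steps above (proceed without blocking, validate, then roll back and redo) can be sketched with a version counter. This is a single-threaded simulation under assumed names (`VersionedState`, `optimistic_update`, `other_processor`); a real multi-threaded version would also need the validate-and-write step to be atomic.

```python
class VersionedState:
    """Shared state plus a version counter used for validation."""
    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def try_commit(self, new_value, read_version):
        """Validate: commit only if nobody committed since our read."""
        if self.version != read_version:
            return False          # conflict detected, caller must redo
        self.value = new_value
        self.version += 1
        return True

def optimistic_update(state, fn, interfere=None):
    """Read, compute without blocking, validate, and redo on conflict.
    Returns the number of attempts that were needed."""
    attempts = 0
    while True:
        attempts += 1
        value, version = state.read()
        new_value = fn(value)
        if interfere is not None and attempts == 1:
            interfere(state)      # simulate another processor committing first
        if state.try_commit(new_value, version):
            return attempts

def other_processor(s):
    v, ver = s.read()
    s.try_commit(v + 5, ver)

state = VersionedState(10)
attempts = optimistic_update(state, lambda v: v * 2, interfere=other_processor)
print(attempts, state.value)
```

The first attempt fails validation because the other processor committed in between, so the update is recomputed from the fresh value; the final state matches a serial execution of the two updates.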
Concurrency Control
Coordination Free: Provably fast and correct under key assumptions.
Concurrency Control: Provably correct and fast under key assumptions.
Systems Ideas to Improve Efficiency
Machine Learning + Concurrency Control (Xinghao Pan et al.)
Clustering
Online Facility Location
Submodular Maximization: subset selection, diminishing marginal gains
o Max Graph Cut
o Set Cover
o Sensor Placement
o Social Network Influence Propagation
o Document Summarization
Topic Modelling
o Sports: Football, World Series, Giants, Cardinals
o Politics: Midterm, Obama, Democrat, Tea
o Finance: QE, market, interest, Dow
Correlation Clustering
o Deduplication
o Community Detection
Serial ML algorithm → Sequence of transactions → Identify potential conflicts → Apply Concurrency Control mechanisms → Parallel ML algorithm
Application: Deduplication
Computer Science Division – University of California Berkeley CA
University of California at Berkeley
Department of Physics Stanford University California
Lawrence Berkeley National Labs <ref>California</ref>
Serial Correlation Clustering 
Nir Ailon, Moses Charikar, and Alantha Newman. 
Aggregating inconsistent information: ranking and clustering. 
Journal of the ACM (JACM), 55(5):23, 2008. 
Serially process vertices
Approximation: 3 OPT (in expectation)
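A minimal sketch of the serial procedure, following the pivot scheme of the cited Ailon, Charikar, and Newman paper: visit vertices in a random order, and let each still-unclustered vertex open a cluster that absorbs its unclustered neighbours. The graph, the function name, and the adjacency-dict representation are illustrative assumptions.

```python
import random

def serial_pivot_clustering(adj, order):
    """adj: dict mapping each vertex to the set of its neighbours
    (an edge means "these two items look similar").
    order: the sequence in which vertices are processed.
    Returns a dict mapping each vertex to its cluster center."""
    cluster = {}
    for v in order:
        if v in cluster:
            continue              # already absorbed by an earlier center
        cluster[v] = v            # v becomes a new cluster center
        for u in adj[v]:
            if u not in cluster:
                cluster[u] = v    # unclustered neighbours join v's cluster
    return cluster

# Small hypothetical similarity graph.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

fixed = serial_pivot_clustering(adj, ["a", "b", "c", "d", "e"])

# The expected-cost guarantee relies on a uniformly random order.
order = list(adj)
random.Random(0).shuffle(order)
randomized = serial_pivot_clustering(adj, order)
print(fixed)
```

Each loop iteration reads and writes the shared `cluster` map, which is why the steps behave like a sequence of transactions that can conflict when run in parallel.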
Parallel Correlation Clustering
Concurrency Control Correlation Clustering (C4)
Cannot introduce adjacent cluster centers
Resolve by Mutual Exclusion
Concurrency Control Correlation Clustering (C4)
Common neighbor must be assigned to earliest center
Resolve by Optimistic Concurrency Control
Optimistic Assumption: No other new cluster created
Resolution: Assign common neighbor to earliest cluster
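The two resolution rules can be illustrated with a sequential simulation in which each batch of `n_procs` vertices stands in for vertices processed concurrently. This is only a sketch under that assumption, not the authors' implementation; `c4_batched_simulation`, `serial_pivot`, and the example graph are hypothetical.

```python
def serial_pivot(adj, order):
    """Serial reference: each unclustered vertex opens a cluster."""
    cluster = {}
    for v in order:
        if v not in cluster:
            cluster[v] = v
            for u in adj[v]:
                cluster.setdefault(u, v)
    return cluster

def c4_batched_simulation(adj, order, n_procs):
    """Simulate n_procs processors by handling vertices in batches.
    Returns (vertex -> center, number of blocked vertices)."""
    center_of = {}
    blocked = 0
    for start in range(0, len(order), n_procs):
        batch = order[start:start + n_procs]
        new_centers = []          # centers opened in this batch, earliest first
        for v in batch:
            if v in center_of:
                continue          # absorbed by a center from an earlier batch
            # Mutual exclusion: v must wait for earlier neighbours in the same
            # batch, so two adjacent vertices never both become centers.
            earlier = [u for u in new_centers if u in adj[v]]
            if earlier:
                blocked += 1
                center_of[v] = earlier[0]
            else:
                center_of[v] = v
                new_centers.append(v)
        # Optimistic assignment: a common neighbour of several new centers
        # goes to the earliest one, because centers are visited in order.
        for c in new_centers:
            for u in adj[c]:
                center_of.setdefault(u, c)
    return center_of, blocked

adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
order = ["a", "b", "c", "d", "e"]
parallel, blocked = c4_batched_simulation(adj, order, n_procs=2)
print(parallel == serial_pivot(adj, order), blocked)
```

On this example the simulated parallel run returns the same clustering as the serial run with the same ordering, which is the serial-equivalence property the slides emphasize.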
Properties of C4 (Concurrency Control Correlation Clustering)
Correctness
Theorem: C4 is correct.
C4 preserves the same guarantees as the serial algorithm (3 OPT).
Concurrency
Theorem: C4 has provably small overheads.
Expected #blocked transactions < 2τ |E| / |V|, where τ ≡ difference in the parallel CPUs' progress.
= almost linear speedup
Empirical Validation on Billion Edge Graphs
Amazon EC2 r3.8xlarge instances
Multicore up to 16 threads
Real and synthetic graphs
100 runs (10 random orderings x 10 runs)
Graph          Description         Vertices      Edges
IT-2004        Italian web-graph   41 Million    1.14 Billion
Webbase-2001   WebBase crawl       118 Million   1.02 Billion
Erdos-Renyi    Synthetic random    100 Million   ≈ 1.0 Billion
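For a feel of the overhead bound 2τ |E| / |V| stated earlier, the sketch below plugs in the graph sizes from the table above. The value of τ is not given in the slides, so `TAU = 16` is purely a hypothetical choice, and `blocked_bound` is an illustrative helper.

```python
def blocked_bound(tau, n_edges, n_vertices):
    """Upper bound on the expected number of blocked transactions."""
    return 2 * tau * n_edges / n_vertices

# (vertices, edges) as listed in the table; the Erdos-Renyi edge count is approximate.
graphs = {
    "IT-2004": (41e6, 1.14e9),
    "Webbase-2001": (118e6, 1.02e9),
    "Erdos-Renyi": (100e6, 1.0e9),
}

TAU = 16  # hypothetical value, chosen only for illustration

bounds = {name: blocked_bound(TAU, e, v) for name, (v, e) in graphs.items()}
# Bound expressed as a fraction of all vertices (one transaction per vertex).
fractions = {name: bounds[name] / graphs[name][0] for name in graphs}

for name in graphs:
    print(f"{name}: bound = {bounds[name]:.1f}, fraction = {fractions[name]:.2e}")
```

The bound grows with the average degree |E| / |V| and with τ, but not with the total number of vertices, so on large sparse graphs it is a small fraction of all transactions.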
C4: Cost of Coordination 
< 0.02% blocked
C4: Speed-up
[Figure: speed-up plot compared against the ideal]
10x speedup
Conclusion
Concurrency Control for Parallel ML
o Guarantee correctness
o Maximize concurrency
Code release in the works!
https://amplab.cs.berkeley.edu/projects/ccml/
xinghao@berkeley.edu
Applications
Correlation Clustering
Submodular Maximization
Clustering
Online Facility Location
Feature Modeling
