Sasha Goldshtein
CTO, Sela Group
Task and Data Parallelism
Agenda
•Multicore machines have been a cheap
commodity for >10 years
•Adoption of concurrent programming is
still slow
•Patterns and best practices are scarce
•We discuss the APIs first…
•…and then turn to examples, best
practices, and tips
TPL Evolution
• 2008: Incubated for 3 years as “Parallel Extensions for .NET”
• 2010: Released in full glory with .NET 4.0
• 2012: DataFlow in .NET 4.5 (NuGet); augmented with language support (await, async methods)
• The Future: GPU parallelism? SIMD support? Language-level parallelism?
Tasks
•A task is a unit of work
–May be executed in parallel with other tasks by
a scheduler (e.g. Thread Pool)
–Much more than threads, and yet much
cheaper
Task<string> t = Task.Factory.StartNew(
    () => { return DnaSimulation(…); });
t.ContinueWith(r => Show(r.Exception),
    TaskContinuationOptions.OnlyOnFaulted);
t.ContinueWith(r => Show(r.Result),
    TaskContinuationOptions.OnlyOnRanToCompletion);
DisplayProgress();

try { //The C# 5.0 version
    var task = Task.Run(DnaSimulation);
    DisplayProgress();
    Show(await task);
} catch (Exception ex) {
    Show(ex);
}
Parallel Loops
•Ideal for parallelizing work over a collection
of data
•Easy porting of for and foreach loops
–Beware of inter-iteration dependencies!
Parallel.For(0, 100, i => {
    ...
});
Parallel.ForEach(urls, url => {
    webClient.Post(url, options, data);
});
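To make the warning about inter-iteration dependencies concrete, here is a minimal sketch. The class and method names (LoopDemo, Squares, PrefixSums) are invented for illustration and are not from the talk. The first loop is safe to parallelize because every iteration writes only its own slot; the second carries a running total from one iteration to the next, so it is left sequential.

```csharp
using System;
using System.Threading.Tasks;

public static class LoopDemo
{
    // Independent iterations: each index writes only result[i],
    // so Parallel.For needs no synchronization here.
    public static long[] Squares(int n)
    {
        var result = new long[n];
        Parallel.For(0, n, i => { result[i] = (long)i * i; });
        return result;
    }

    // Inter-iteration dependency: sums[i] needs the total up to i-1.
    // A naive Parallel.For over this body would race on 'running',
    // so this sketch keeps the loop sequential.
    public static long[] PrefixSums(long[] values)
    {
        var sums = new long[values.Length];
        long running = 0;
        for (int i = 0; i < values.Length; ++i)
        {
            running += values[i];
            sums[i] = running;
        }
        return sums;
    }

    public static void Main()
    {
        long[] squares = Squares(5);
        Console.WriteLine(string.Join(", ", squares));
        Console.WriteLine(string.Join(", ", PrefixSums(squares)));
    }
}
```

Prefix sums can be parallelized, but only by restructuring the algorithm (for example a two-pass block scan), not by swapping the loop keyword.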
Parallel LINQ
•Mind-bogglingly easy parallelization of
LINQ queries
•Can introduce ordering into the pipeline, or
preserve order of original elements
var query = from monster in monsters.AsParallel()
            where monster.IsAttacking
            let newMonster = SimulateMovement(monster)
            orderby newMonster.XP
            select newMonster;
query.ForAll(monster => Move(monster));
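The ordering point can be sketched with a small self-contained query. Everything here (PlinqDemo, Transform, the integer data) is a hypothetical stand-in for the monster simulation. AsOrdered() asks PLINQ to preserve the source order in the output; without it, only the set of results is guaranteed, which is fine for aggregates such as Sum.

```csharp
using System;
using System.Linq;

public static class PlinqDemo
{
    // Hypothetical stand-in for an expensive per-item computation.
    static int Transform(int x) => x * x;

    // AsOrdered() keeps results in the order of the source elements.
    public static int[] OrderedSquares(int[] source) =>
        source.AsParallel().AsOrdered().Select(Transform).ToArray();

    // No ordering requested: an order-insensitive aggregate does not need it.
    public static long SumOfEvenSquares(int[] source) =>
        source.AsParallel().Where(x => x % 2 == 0).Sum(x => (long)Transform(x));

    public static void Main()
    {
        int[] data = Enumerable.Range(1, 10).ToArray();
        Console.WriteLine(string.Join(", ", OrderedSquares(data)));
        Console.WriteLine(SumOfEvenSquares(data));
    }
}
```

Preserving order costs some buffering, so it is worth requesting only when the consumer depends on it.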
Measuring Concurrency
•Visual Studio Concurrency Visualizer to the
rescue
Recursive Parallelism Extraction
•Divide-and-conquer algorithms are often
parallelized through the recursive call
–Be careful with parallelization threshold and
watch out for dependencies
void FFT(float[] src, float[] dst, int n, int r, int s) {
    if (n == 1) {
        dst[r] = src[r];
    } else {
        FFT(src, dst, n/2, r, s*2);
        FFT(src, dst, n/2, r+s, s*2);
        //Combine the two halves in O(n) time
    }
}
//Parallel version of the two recursive calls
Parallel.Invoke(
    () => FFT(src, dst, n/2, r, s*2),
    () => FFT(src, dst, n/2, r+s, s*2)
);
DEMO
Recursive parallel QuickSort
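The demo code itself is not part of these slides, so the following is only a sketch of what a recursive parallel QuickSort with a threshold might look like. The cutoff value of 4096 is an assumption to be tuned by measurement, and the names (ParallelSort, IsSorted) are invented for illustration. Below the threshold the sketch falls back to the sequential Array.Sort; above it, the two halves are sorted through Parallel.Invoke.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelSort
{
    // Assumed cutoff: below this size, task overhead outweighs the gain.
    const int Threshold = 4096;

    public static void QuickSort(int[] a) => QuickSort(a, 0, a.Length - 1);

    static void QuickSort(int[] a, int lo, int hi)
    {
        if (lo >= hi) return;
        if (hi - lo < Threshold)
        {
            Array.Sort(a, lo, hi - lo + 1); // sequential fallback
            return;
        }
        int p = Partition(a, lo, hi);
        // The two halves touch disjoint index ranges, so they are independent.
        Parallel.Invoke(
            () => QuickSort(a, lo, p),
            () => QuickSort(a, p + 1, hi));
    }

    // Hoare partition around the middle element; returns j such that
    // every element in [lo..j] is <= every element in [j+1..hi].
    static int Partition(int[] a, int lo, int hi)
    {
        int pivot = a[lo + (hi - lo) / 2];
        int i = lo - 1, j = hi + 1;
        while (true)
        {
            do { i++; } while (a[i] < pivot);
            do { j--; } while (a[j] > pivot);
            if (i >= j) return j;
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
    }

    public static bool IsSorted(int[] a)
    {
        for (int i = 1; i < a.Length; ++i)
            if (a[i - 1] > a[i]) return false;
        return true;
    }

    public static void Main()
    {
        var rng = new Random(42);
        var data = new int[100000];
        for (int i = 0; i < data.Length; ++i) data[i] = rng.Next();
        QuickSort(data);
        Console.WriteLine(IsSorted(data));
    }
}
```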
Symmetric Data Processing
•For a large set of uniform data items that
need to be processed, parallel loops are usually
the best choice and lead to ideal work
distribution
•Inter-iteration dependencies complicate
things (think in-place blur)
Parallel.For(0, image.Rows, i => {
    for (int j = 0; j < image.Cols; ++j) {
        destImage.SetPixel(i, j, PixelBlur(image, i, j));
    }
});
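The same idea can be shown in a runnable form on a one-dimensional signal. This is a sketch with invented names (BlurDemo, Blur), not the image code from the talk. Reading from src and writing to a separate dst array is what removes the inter-iteration dependency; an in-place blur would let one iteration read a neighbour that another iteration has already overwritten.

```csharp
using System;
using System.Threading.Tasks;

public static class BlurDemo
{
    // 3-tap box blur: each output is the average of the input element
    // and its existing neighbours. Input and output are separate arrays.
    public static double[] Blur(double[] src)
    {
        var dst = new double[src.Length];
        Parallel.For(0, src.Length, i =>
        {
            int lo = Math.Max(i - 1, 0);
            int hi = Math.Min(i + 1, src.Length - 1);
            double sum = 0;
            for (int k = lo; k <= hi; ++k) sum += src[k];
            dst[i] = sum / (hi - lo + 1);
        });
        return dst;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Blur(new double[] { 0, 3, 6 })));
    }
}
```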
Uneven Work Distribution
•With non-uniform data items, use custom
partitioning or manual distribution
–Primes: 7 is easier to check than 10,320,647
var work = Enumerable.Range(0, Environment.ProcessorCount)
    .Select(n => Task.Run(() =>
        CountPrimes(start+chunk*n, start+chunk*(n+1))));
Task.WaitAll(work.ToArray());

versus

Parallel.ForEach(Partitioner.Create(Start, End, chunkSize),
    chunk => CountPrimes(chunk.Item1, chunk.Item2)
);
DEMO
Uneven workload distribution
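As a rough, runnable counterpart to the second snippet above, the sketch below counts primes over many small ranges produced by Partitioner.Create. The IsPrime and CountPrimes bodies are assumptions, since the slides do not show them. Because chunks are handed out on demand, a worker that draws cheap (small-number) ranges simply picks up more of them, instead of idling while another worker grinds through the expensive end of the interval.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class PrimeCounter
{
    // Trial division; larger candidates cost more, hence the uneven workload.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long)d * d <= n; ++d)
            if (n % d == 0) return false;
        return true;
    }

    // Counts primes in the half-open range [start, end).
    public static int CountPrimes(int start, int end)
    {
        int count = 0;
        for (int i = start; i < end; ++i)
            if (IsPrime(i)) ++count;
        return count;
    }

    // Splits [start, end) into ranges of chunkSize that workers take on demand.
    public static int CountPrimesParallel(int start, int end, int chunkSize)
    {
        int total = 0;
        Parallel.ForEach(Partitioner.Create(start, end, chunkSize), range =>
        {
            int local = CountPrimes(range.Item1, range.Item2);
            Interlocked.Add(ref total, local); // one atomic add per chunk
        });
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(CountPrimesParallel(0, 100000, 1000));
    }
}
```

The chunk size is a trade-off: smaller chunks balance better but add scheduling overhead, so it is worth measuring.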
Complex Dependency Management
•Must extract all dependencies and
incorporate them into the algorithm
–Typical scenarios: 1D loops, dynamic
algorithms
–Edit distance: each task depends on 2
predecessors, wavefront
C = x[i-1] == y[i-1] ? 0 : 1;
D[i, j] = min(
    D[i-1, j] + 1,
    D[i, j-1] + 1,
    D[i-1, j-1] + C);
[Figure: the D matrix, from cell 0,0 to cell m,n]
DEMO
Dependency management
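One possible way to express the wavefront, sketched under the assumption that the demo follows the recurrence above (the demo source is not included in the slides). All cells on an anti-diagonal i + j = d depend only on the two previous diagonals, so each diagonal can be filled in parallel while the diagonals themselves are visited in order. The names (EditDistance, Distance) are illustrative.

```csharp
using System;
using System.Threading.Tasks;

public static class EditDistance
{
    public static int Distance(string x, string y)
    {
        int m = x.Length, n = y.Length;
        var D = new int[m + 1, n + 1];
        for (int i = 0; i <= m; ++i) D[i, 0] = i; // delete i characters
        for (int j = 0; j <= n; ++j) D[0, j] = j; // insert j characters

        // Sweep anti-diagonals d = i + j in order; cells on one diagonal
        // are independent of each other and can run concurrently.
        for (int d = 2; d <= m + n; ++d)
        {
            int diag = d;
            int iMin = Math.Max(1, diag - n);
            int iMax = Math.Min(m, diag - 1);
            Parallel.For(iMin, iMax + 1, i =>
            {
                int j = diag - i;
                int c = x[i - 1] == y[j - 1] ? 0 : 1;
                D[i, j] = Math.Min(
                    Math.Min(D[i - 1, j] + 1, D[i, j - 1] + 1),
                    D[i - 1, j - 1] + c);
            });
        }
        return D[m, n];
    }

    public static void Main()
    {
        Console.WriteLine(Distance("kitten", "sitting"));
    }
}
```

A single cell is far too little work per task; a practical version would likely apply the same wavefront to rectangular blocks of cells rather than to individual cells.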
Synchronization > Aggregation
•Excessive synchronization brings parallel
code to its knees
–Try to avoid shared state
–Aggregate thread- or task-local state and merge
Parallel.ForEach(
    Partitioner.Create(Start, End, ChunkSize),
    () => new List<int>(), //initial local state
    (range, pls, localPrimes) => { //aggregator
        for (int i = range.Item1; i < range.Item2; ++i)
            if (IsPrime(i)) localPrimes.Add(i);
        return localPrimes;
    },
    localPrimes => { lock (primes) //combiner
        primes.AddRange(localPrimes);
    });
DEMO
Aggregation
Creative Synchronization
• We implement a collection of stock prices,
initialized with 10^5 name/price pairs
– 10^7 reads/s, 10^6 “update” writes/s, 10^3 “add”
writes/day
– Many reader threads, many writer threads
GET(key):
    if safe contains key then return safe[key]
    lock { return unsafe[key] }
PUT(key, value):
    if safe contains key then safe[key] = value
    else lock { unsafe[key] = value }
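The pseudocode above might translate into C# roughly as follows. This is a sketch under several assumptions that are not in the talk: prices are stored as whole cents in a long, each price in the "safe" dictionary lives in a small mutable Box object, and the instance is fully constructed before it is shared with other threads. The "safe" dictionary is never structurally modified after construction, which is what allows concurrent lock-free lookups; updates replace the value inside the box atomically.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class StockPrices
{
    // Mutable holder: a price can change without touching the dictionary.
    private sealed class Box { public long Cents; }

    // "safe": keys are fixed after construction, so concurrent reads need no lock.
    private readonly Dictionary<string, Box> _safe = new Dictionary<string, Box>();
    // "unsafe": receives the rare additions; every access takes the lock.
    private readonly Dictionary<string, long> _unsafe = new Dictionary<string, long>();
    private readonly object _lock = new object();

    public StockPrices(IEnumerable<KeyValuePair<string, long>> initial)
    {
        foreach (var pair in initial)
            _safe[pair.Key] = new Box { Cents = pair.Value };
    }

    public long Get(string key)
    {
        Box box;
        if (_safe.TryGetValue(key, out box))
            return Interlocked.Read(ref box.Cents);    // atomic 64-bit read
        lock (_lock) { return _unsafe[key]; }           // throws if key is unknown
    }

    public void Put(string key, long cents)
    {
        Box box;
        if (_safe.TryGetValue(key, out box))
        {
            Interlocked.Exchange(ref box.Cents, cents); // atomic 64-bit write
            return;
        }
        lock (_lock) { _unsafe[key] = cents; }
    }

    public static void Main()
    {
        var prices = new StockPrices(new Dictionary<string, long> { { "MSFT", 2500 } });
        prices.Put("MSFT", 2600); // frequent path: no lock
        prices.Put("NEW", 100);   // rare path: locked
        Console.WriteLine(prices.Get("MSFT") + " " + prices.Get("NEW"));
    }
}
```

The design leans on the workload described above: reads and updates of existing names dominate and never contend on a lock, while the very rare additions pay for one.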
Lock-Free Patterns (1)
•Try to avoid Windows synchronization and
use hardware synchronization
–Primitive operations such as
Interlocked.Increment,
Interlocked.CompareExchange
–Retry pattern with
Interlocked.CompareExchange enables
arbitrary lock-free algorithms
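int InterlockedMultiply(ref int x, int y) {
    int t, r;
    do {
        t = x;      //snapshot of the current value (the comparand)
        r = t * y;  //the new value we want to store
    }
    //CompareExchange returns the old value; retry if it is not our snapshot
    while (Interlocked.CompareExchange(ref x, r, t) != t);
    return r;
}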
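The same retry pattern carries over to other read-modify-write operations. As an illustration (InterlockedMax is a made-up helper, not a framework API), here is an atomic "store the maximum" built from Interlocked.CompareExchange: read a snapshot, decide on the new value, and retry if another thread changed the location in between.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LockFree
{
    private static int _max = int.MinValue;

    // Atomically raises 'target' to 'candidate' if candidate is larger.
    // Returns the value of target after the operation, as seen by this call.
    public static int InterlockedMax(ref int target, int candidate)
    {
        int seen;
        do
        {
            seen = Volatile.Read(ref target);
            if (candidate <= seen) return seen; // nothing to update
        }
        while (Interlocked.CompareExchange(ref target, candidate, seen) != seen);
        return candidate;
    }

    // Runs many concurrent updates and returns the final maximum.
    public static int MaxOfRange(int count)
    {
        _max = int.MinValue;
        Parallel.For(0, count, i => InterlockedMax(ref _max, i));
        return _max;
    }

    public static void Main()
    {
        Console.WriteLine(MaxOfRange(100000));
    }
}
```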
Lock-Free Patterns (2)
•User-mode spinlocks (SpinLock class) can
replace locks you acquire very often, which
protect tiny computations
class __DontUseMe__SpinLock {
    private volatile int _lck;
    public void Enter() {
        while (Interlocked.CompareExchange(ref _lck, 1, 0) != 0);
    }
    public void Exit() {
        _lck = 0;
    }
}
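In place of the hand-rolled class above, the framework's System.Threading.SpinLock would typically be used along these lines. This is a sketch with invented names (SpinLockDemo, AddMany); the critical section is deliberately tiny, which is the case the bullet describes. SpinLock is a struct, so it is kept in a non-readonly field and never copied.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SpinLockDemo
{
    // Owner tracking is a debugging aid; it is disabled here to keep the lock cheap.
    private static SpinLock _lock = new SpinLock(false);
    private static long _total;

    // Adds 0..iterations-1 into a shared total under the spin lock.
    public static long AddMany(int iterations)
    {
        _total = 0;
        Parallel.For(0, iterations, i =>
        {
            bool taken = false;
            try
            {
                _lock.Enter(ref taken);
                _total += i; // tiny critical section
            }
            finally
            {
                if (taken) _lock.Exit();
            }
        });
        return _total;
    }

    public static void Main()
    {
        Console.WriteLine(AddMany(1000));
    }
}
```

For a plain sum like this one, local aggregation or Interlocked.Add would avoid the lock entirely; the example only shows the Enter/Exit protocol.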
Miscellaneous Tips (1)
•Don’t mix several concurrency frameworks
in the same process
•Some parallel work is best organized in
pipelines – TPL DataFlow
BroadcastBlock<Uri> → TransformBlock<Uri, byte[]> → TransformBlock<byte[], string> → ActionBlock<string>
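A reduced version of such a pipeline could look like the sketch below. It assumes the System.Threading.Tasks.Dataflow library is available to the project (as a NuGet package or as part of the runtime), and it replaces the download stages with in-memory string processing so that it runs without network access. The BroadcastBlock stage is omitted and the stage bodies are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks.Dataflow;

public static class PipelineDemo
{
    public static List<string> Run(IEnumerable<string> inputs)
    {
        var results = new List<string>();
        // Propagate completion so finishing the head eventually finishes the tail.
        var link = new DataflowLinkOptions { PropagateCompletion = true };

        // Stage 1: string -> bytes (stand-in for "download the Uri").
        var toBytes = new TransformBlock<string, byte[]>(s => Encoding.UTF8.GetBytes(s));
        // Stage 2: bytes -> text summary (stand-in for "parse the payload").
        var toText = new TransformBlock<byte[], string>(b => b.Length + " bytes");
        // Stage 3: sink. With default options an ActionBlock handles one item at a time.
        var sink = new ActionBlock<string>(s => results.Add(s));

        toBytes.LinkTo(toText, link);
        toText.LinkTo(sink, link);

        foreach (var s in inputs) toBytes.Post(s);
        toBytes.Complete();       // no more input
        sink.Completion.Wait();   // wait for the pipeline to drain
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run(new[] { "a", "bcd" })));
    }
}
```

Each stage can be given its own degree of parallelism through ExecutionDataflowBlockOptions, which is where a pipeline starts to pay off when stages differ in cost.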
Miscellaneous Tips (2)
•Some parallel work can be offloaded to the
GPU – C++ AMP
void vadd_exp(float* x, float* y, float* z, int n) {
    array_view<const float,1> avX(n, x), avY(n, y);
    array_view<float,1> avZ(n, z);
    avZ.discard_data();
    parallel_for_each(avZ.extent, [=](index<1> i) ... {
        avZ[i] = avX[i] + fast_math::exp(avY[i]);
    });
    avZ.synchronize();
}
Miscellaneous Tips (3)
•Invest in SIMD parallelization of heavy
math or data-parallel algorithms
–Already available on Mono (Mono.Simd)
•Make sure to take cache effects into
account, especially on MP systems
START:
    movups xmm0, [esi+4*ecx]
    addps xmm0, [edi+4*ecx]
    movups [ebx+4*ecx], xmm0
    sub ecx, 4
    jns START
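For comparison, the vector addition performed by the assembly loop above can be written in C# with System.Numerics.Vector<T>. Note that this is a different API from the Mono.Simd library named on the slide, and the sketch assumes all three arrays have the same length. Vector<float>.Count is the number of floats per hardware vector on the current machine; elements left over after the last full vector are handled by a scalar tail loop.

```csharp
using System;
using System.Numerics;

public static class SimdDemo
{
    // z[i] = x[i] + y[i], processing Vector<float>.Count elements per step.
    public static void Add(float[] x, float[] y, float[] z)
    {
        int width = Vector<float>.Count;
        int i = 0;
        for (; i <= x.Length - width; i += width)
        {
            var sum = new Vector<float>(x, i) + new Vector<float>(y, i);
            sum.CopyTo(z, i);
        }
        // Scalar tail for lengths that are not a multiple of the vector width.
        for (; i < x.Length; ++i)
            z[i] = x[i] + y[i];
    }

    public static void Main()
    {
        var x = new float[10];
        var y = new float[10];
        var z = new float[10];
        for (int i = 0; i < x.Length; ++i) { x[i] = i + 1; y[i] = 10 * (i + 1); }
        Add(x, y, z);
        Console.WriteLine(string.Join(", ", z));
    }
}
```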
Summary
• Avoid shared state and synchronization
• Parallelize judiciously and apply
thresholds
• Measure and understand performance
gains or losses
• Concurrency and parallelism are still hard
• A body of best practices, tips, patterns,
examples is being built
THANK YOU!
Sasha Goldshtein
CTO, Sela Group
blog.sashag.net
@goldshtn

presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 

Recently uploaded (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 

Task and Data Parallelism

  • 1. Sasha Goldshtein CTO, Sela Group Task and Data Parallelism
  • 2. Agenda
      • Multicore machines have been a cheap commodity for >10 years
      • Adoption of concurrent programming is still slow
      • Patterns and best practices are scarce
      • We discuss the APIs first…
      • …and then turn to examples, best practices, and tips
  • 3. TPL Evolution
      • 2008: Incubated for 3 years as “Parallel Extensions for .NET”
      • 2010: Released in full glory with .NET 4.0
      • 2012: DataFlow in .NET 4.5 (NuGet); augmented with language support (await, async methods)
      • The Future: GPU parallelism? SIMD support? Language-level parallelism?
  • 4. Tasks
      • A task is a unit of work
          – May be executed in parallel with other tasks by a scheduler (e.g. Thread Pool)
          – Much more than threads, and yet much cheaper
        Task<string> t = Task.Factory.StartNew(
            () => { return DnaSimulation(…); });
        t.ContinueWith(r => Show(r.Exception),
            TaskContinuationOptions.OnlyOnFaulted);
        t.ContinueWith(r => Show(r.Result),
            TaskContinuationOptions.OnlyOnRanToCompletion);
        DisplayProgress();
        try { //The C# 5.0 version
            var task = Task.Run(DnaSimulation);
            DisplayProgress();
            Show(await task);
        } catch (Exception ex) {
            Show(ex);
        }
  • 5. Parallel Loops
      • Ideal for parallelizing work over a collection of data
      • Easy porting of for and foreach loops
          – Beware of inter-iteration dependencies!
        Parallel.For(0, 100, i => {
            ...
        });
        Parallel.ForEach(urls, url => {
            webClient.Post(url, options, data);
        });
  • 6. Parallel LINQ
      • Mind-bogglingly easy parallelization of LINQ queries
      • Can introduce ordering into the pipeline, or preserve order of original elements
        var query = from monster in monsters.AsParallel()
                    where monster.IsAttacking
                    let newMonster = SimulateMovement(monster)
                    orderby newMonster.XP
                    select newMonster;
        query.ForAll(monster => Move(monster));
  • 7. Measuring Concurrency •Visual Studio Concurrency Visualizer to the rescue
  • 8. Recursive Parallelism Extraction
      • Divide-and-conquer algorithms are often parallelized through the recursive call
          – Be careful with parallelization threshold and watch out for dependencies
        void FFT(float[] src, float[] dst, int n, int r, int s) {
            if (n == 1) {
                dst[r] = src[r];
            } else {
                FFT(src, dst, n/2, r, s*2);
                FFT(src, dst, n/2, r+s, s*2);
                //Combine the two halves in O(n) time
            }
        }
        //The two recursive calls, run in parallel:
        Parallel.Invoke(
            () => FFT(src, dst, n/2, r, s*2),
            () => FFT(src, dst, n/2, r+s, s*2)
        );
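The parallelization threshold mentioned on the slide above can be sketched on a simpler divide-and-conquer problem than the FFT. The example below sums an array recursively and falls back to a sequential loop once a range is small enough. The class name `ParallelSum` and the cutoff value `Threshold = 4096` are assumptions made for illustration, not values from the talk; a real cutoff should be chosen by measurement.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelSum
{
    // Hypothetical cutoff: below this size the overhead of spawning
    // parallel work is assumed to outweigh the benefit.
    private const int Threshold = 4096;

    // Sums data[lo..hi) by divide and conquer.
    public static long Sum(int[] data, int lo, int hi)
    {
        int n = hi - lo;
        if (n <= Threshold)
        {
            // Small range: plain sequential loop.
            long s = 0;
            for (int i = lo; i < hi; ++i) s += data[i];
            return s;
        }

        int mid = lo + n / 2;
        long left = 0, right = 0;
        // The two halves touch disjoint ranges, so they are independent.
        // Parallel.Invoke returns only after both delegates have finished.
        Parallel.Invoke(
            () => left = Sum(data, lo, mid),
            () => right = Sum(data, mid, hi));
        return left + right;
    }
}
```

Calling `ParallelSum.Sum(data, 0, data.Length)` sums the whole array. Without the threshold, the recursion would create a parallel work item for every single element, which is usually slower than the sequential loop.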
  • 10. Symmetric Data Processing
      • For a large set of uniform data items that need to be processed, parallel loops are usually the best choice and lead to ideal work distribution
      • Inter-iteration dependencies complicate things (think in-place blur)
        Parallel.For(0, image.Rows, i => {
            for (int j = 0; j < image.Cols; ++j) {
                destImage.SetPixel(i, j, PixelBlur(image, i, j));
            }
        });
  • 11. Uneven Work Distribution
      • With non-uniform data items, use custom partitioning or manual distribution
          – Primes: 7 is easier to check than 10,320,647
        var work = Enumerable.Range(0, Environment.ProcessorCount)
            .Select(n => Task.Run(() =>
                CountPrimes(start+chunk*n, start+chunk*(n+1))));
        Task.WaitAll(work.ToArray());
        versus
        Parallel.ForEach(Partitioner.Create(Start, End, chunkSize),
            chunk => CountPrimes(chunk.Item1, chunk.Item2)
        );
  • 13. Complex Dependency Management
      • Must extract all dependencies and incorporate them into the algorithm
          – Typical scenarios: 1D loops, dynamic algorithms
          – Edit distance: each task depends on 2 predecessors, wavefront
        C = x[i-1] == y[j-1] ? 0 : 1;
        D[i, j] = min( D[i-1, j] + 1,
                       D[i, j-1] + 1,
                       D[i-1, j-1] + C);
        (Diagram: the table D, from cell 0,0 to cell m,n)
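One way to read the wavefront idea above: every cell on the same anti-diagonal (all cells with the same `i + j`) depends only on cells from the two previous anti-diagonals, so the cells of one anti-diagonal can be computed in parallel. The following is a minimal sketch under that assumption; the class name `WavefrontEditDistance` is hypothetical and this is not necessarily how the talk's demo was structured.

```csharp
using System;
using System.Threading.Tasks;

public static class WavefrontEditDistance
{
    public static int Compute(string x, string y)
    {
        int m = x.Length, n = y.Length;
        var D = new int[m + 1, n + 1];

        // Borders: distance from or to the empty prefix.
        for (int i = 0; i <= m; ++i) D[i, 0] = i;
        for (int j = 0; j <= n; ++j) D[0, j] = j;

        // Sweep anti-diagonals d = i + j sequentially; cells within one
        // anti-diagonal only read diagonals d-1 and d-2, so they are
        // independent of each other.
        for (int d = 2; d <= m + n; ++d)
        {
            int iMin = Math.Max(1, d - n);
            int iMax = Math.Min(m, d - 1);
            Parallel.For(iMin, iMax + 1, i =>
            {
                int j = d - i;
                int c = x[i - 1] == y[j - 1] ? 0 : 1;
                D[i, j] = Math.Min(
                    Math.Min(D[i - 1, j] + 1, D[i, j - 1] + 1),
                    D[i - 1, j - 1] + c);
            });
        }
        return D[m, n];
    }
}
```

One cell per parallel iteration is far too fine-grained to pay off in practice. A realistic version would likely apply the same wavefront order to rectangular blocks of cells, each block filled in sequentially.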
  • 15. Synchronization > Aggregation
      • Excessive synchronization brings parallel code to its knees
          – Try to avoid shared state
          – Aggregate thread- or task-local state and merge
        Parallel.ForEach(
            Partitioner.Create(Start, End, ChunkSize),
            () => new List<int>(), //initial local state
            (range, pls, localPrimes) => { //aggregator
                for (int i = range.Item1; i < range.Item2; ++i)
                    if (IsPrime(i)) localPrimes.Add(i);
                return localPrimes;
            },
            localPrimes => {
                lock (primes) //combiner
                    primes.AddRange(localPrimes);
            });
  • 17. Creative Synchronization
      • We implement a collection of stock prices, initialized with 10^5 name/price pairs
          – 10^7 reads/s, 10^6 “update” writes/s, 10^3 “add” writes/day
          – Many reader threads, many writer threads
        GET(key):
            if safe contains key then return safe[key]
            lock { return unsafe[key] }
        PUT(key, value):
            if safe contains key then safe[key] = value
            lock { unsafe[key] = value }
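The GET/PUT pseudocode above can be sketched in C# as follows. Several details here are assumptions, not taken from the slide: the key set of the "safe" dictionary is frozen after construction, so lock-free reads of it are allowed; each price lives in a small mutable cell, so an update never modifies the dictionary itself; and the lock-protected path is taken only when the key is absent from the "safe" dictionary. The names `StockPrices`, `Cell`, and `_gate` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class StockPrices
{
    // Mutable holder so that an update never touches the dictionary structure.
    private sealed class Cell { public double Price; }

    // Populated once in the constructor and never structurally modified
    // afterwards, so concurrent lookups without a lock are permitted.
    private readonly Dictionary<string, Cell> _safe = new Dictionary<string, Cell>();

    // Receives the rare "add" writes; every access to it takes the lock.
    private readonly Dictionary<string, double> _unsafe = new Dictionary<string, double>();
    private readonly object _gate = new object();

    public StockPrices(IEnumerable<KeyValuePair<string, double>> initial)
    {
        foreach (var pair in initial)
            _safe[pair.Key] = new Cell { Price = pair.Value };
    }

    public double Get(string key)
    {
        // Fast path: no lock for keys that existed at initialization.
        if (_safe.TryGetValue(key, out Cell cell))
            return Volatile.Read(ref cell.Price);
        // Slow path: throws KeyNotFoundException if the key is unknown.
        lock (_gate) { return _unsafe[key]; }
    }

    public void Put(string key, double value)
    {
        if (_safe.TryGetValue(key, out Cell cell))
        {
            // "Update" write: overwrite the cell, leave the dictionary alone.
            Volatile.Write(ref cell.Price, value);
            return;
        }
        // "Add" write (or update of a late-added key): take the lock.
        lock (_gate) { _unsafe[key] = value; }
    }
}
```

The design leans on the workload stated on the slide: reads and updates of existing keys dominate by many orders of magnitude, so only the rare "add" path pays for a lock.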
  • 18. Lock-Free Patterns (1)
      • Try to avoid Windows synchronization and use hardware synchronization
          – Primitive operations such as Interlocked.Increment, Interlocked.CompareExchange
          – Retry pattern with Interlocked.CompareExchange enables arbitrary lock-free algorithms
        int InterlockedMultiply(ref int x, int y) {
            int t, r;
            do {
                t = x;
                r = t * y;
            } while (Interlocked.CompareExchange(ref x, r, t) != t);
            return r;
        }
        (Slide callouts on CompareExchange: the return value is the old value, r is the new value, t is the comparand)
  • 19. Lock-Free Patterns (2)
      • User-mode spinlocks (SpinLock class) can replace locks you acquire very often, which protect tiny computations
        class __DontUseMe__SpinLock {
            private volatile int _lck;
            public void Enter() {
                while (Interlocked.CompareExchange(ref _lck, 1, 0) != 0);
            }
            public void Exit() {
                _lck = 0;
            }
        }
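For contrast with the hand-rolled spinlock above, here is a minimal sketch of the framework's own `System.Threading.SpinLock` guarding a tiny critical section. The class name `SpinLockCounter` and the counter workload are illustrative assumptions; for a plain counter, `Interlocked.Increment` would be the simpler choice.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SpinLockCounter
{
    public static int CountTo(int iterations)
    {
        // SpinLock is a struct: keep a single instance and never copy it.
        // Owner tracking is a debugging aid and is disabled here.
        var spin = new SpinLock(enableThreadOwnerTracking: false);
        int counter = 0;

        Parallel.For(0, iterations, i =>
        {
            bool taken = false;
            try
            {
                spin.Enter(ref taken);
                counter++; // tiny critical section
            }
            finally
            {
                // Release only if the lock was actually acquired.
                if (taken) spin.Exit();
            }
        });
        return counter;
    }
}
```

The `ref bool` pattern lets the caller know whether the lock was taken even if an exception interrupts `Enter`, which is why `Exit` is called conditionally in `finally`.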
  • 20. Miscellaneous Tips (1)
      • Don’t mix several concurrency frameworks in the same process
      • Some parallel work is best organized in pipelines
          – TPL DataFlow
        BroadcastBlock<Uri> → TransformBlock<Uri, byte[]> → TransformBlock<byte[], string> → ActionBlock<string>
  • 21. Miscellaneous Tips (2)
      • Some parallel work can be offloaded to the GPU
          – C++ AMP
        void vadd_exp(float* x, float* y, float* z, int n) {
            array_view<const float,1> avX(n, x), avY(n, y);
            array_view<float,1> avZ(n, z);
            avZ.discard_data();
            parallel_for_each(avZ.extent, [=](index<1> i) ... {
                avZ[i] = avX[i] + fast_math::exp(avY[i]);
            });
            avZ.synchronize();
        }
  • 22. Miscellaneous Tips (3)
      • Invest in SIMD parallelization of heavy math or data-parallel algorithms
          – Already available on Mono (Mono.Simd)
      • Make sure to take cache effects into account, especially on MP systems
        START:
            movups xmm0, [esi+4*ecx]
            addps  xmm0, [edi+4*ecx]
            movups [ebx+4*ecx], xmm0
            sub    ecx, 4
            jns    START
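The assembly loop above adds two float arrays four elements at a time. A comparable managed sketch is shown below. Note the swap: it uses `System.Numerics.Vector<T>` rather than the Mono.Simd library named on the slide, and the class name `SimdAdd` is hypothetical. The vector width depends on the hardware, so the code asks for `Vector<float>.Count` instead of assuming 4.

```csharp
using System;
using System.Numerics;

public static class SimdAdd
{
    // Computes z[i] = x[i] + y[i] for all i; the arrays must have equal length.
    public static void Add(float[] x, float[] y, float[] z)
    {
        if (x.Length != y.Length || x.Length != z.Length)
            throw new ArgumentException("Arrays must have the same length.");

        int width = Vector<float>.Count; // elements per hardware vector
        int i = 0;

        // Vectorized main loop: one vector-wide add per iteration.
        for (; i <= x.Length - width; i += width)
        {
            var vx = new Vector<float>(x, i);
            var vy = new Vector<float>(y, i);
            (vx + vy).CopyTo(z, i);
        }

        // Scalar tail for the elements that do not fill a whole vector.
        for (; i < x.Length; ++i)
            z[i] = x[i] + y[i];
    }
}
```

`Vector.IsHardwareAccelerated` reports whether the JIT maps these operations to SIMD instructions on the current machine; when it does not, the code still runs correctly but without the speedup.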
  • 23. Summary
      • Avoid shared state and synchronization
      • Parallelize judiciously and apply thresholds
      • Measure and understand performance gains or losses
      • Concurrency and parallelism are still hard
      • A body of best practices, tips, patterns, examples is being built
  • 25. THANK YOU! Sasha Goldshtein CTO, Sela Group blog.sashag.net @goldshtn