PARALLEL & ASYNC PROCESSING USING TPL DATAFLOW
Petru Rebeja
AGENDA
• What is Dataflow?
• When to use it?
• How to use it?
• Q&A
THE BIG PICTURE
CLR Thread Pool
Tasks
PLINQ
Parallel Loops
Concurrent Collections
Dataflow
DATAFLOW BENEFITS
• Effortless use of multi-threading
• Performance boost via painless optimization
• Development focus is on the ‘what’ rather than ‘how’
DATAFLOW USAGES
High throughput, low-latency scenarios
Robotics
Manufacturing
Imaging
Biology
Oil & Gas
Finance
PROGRAMMING MODEL
• Actor-based programming
• In-process message passing
• Components (blocks) for creating data processing pipelines
ARCHITECTURE
IDataflowBlock
ISourceBlock<TOutput>
ITargetBlock<TInput>
IPropagatorBlock<TInput, TOutput>
COMPOSITION
Source
Target
Propagator
Optional
Transform
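A minimal sketch of how the three roles compose, assuming the System.Threading.Tasks.Dataflow package is referenced. The block choices (a BufferBlock<int> as source, a TransformBlock<int, int> as propagator, an ActionBlock<int> as target) and all variable names are illustrative and not taken from the slides.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class CompositionSketch
{
    static async Task Main()
    {
        // Source: holds the posted numbers until a linked target takes them.
        var source = new BufferBlock<int>();
        // Propagator: acts as a target for 'source' and as a source for 'print'.
        var square = new TransformBlock<int, int>(n => n * n);
        // Target: the end of the pipeline.
        var print = new ActionBlock<int>(n => Console.WriteLine(n));

        // Links are what turn individual blocks into a pipeline.
        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
        source.LinkTo(square, linkOptions);
        square.LinkTo(print, linkOptions);

        for (int i = 1; i <= 3; i++) source.Post(i);

        source.Complete();      // no more input
        await print.Completion; // prints 1, 4, 9
    }
}
```

With the default options each block handles one message at a time and preserves order, which is why the output order matches the input order here.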
BUFFERING BLOCKS
BufferBlock<T>
BroadcastBlock<T>
WriteOnceBlock<T>
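The three buffering blocks differ mainly in who gets a message and how long it is kept. The following is a rough sketch of those differences under the assumption that the targets are unbounded; the variable names and sample strings are made up for illustration.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class BufferingSketch
{
    static async Task Main()
    {
        // BufferBlock<T>: a FIFO queue; each message is handed to one receiver.
        var buffer = new BufferBlock<string>();
        buffer.Post("a");
        buffer.Post("b");
        Console.WriteLine(await buffer.ReceiveAsync()); // a
        Console.WriteLine(await buffer.ReceiveAsync()); // b

        // BroadcastBlock<T>: offers a copy of each message to every linked target.
        // Strings are immutable, so the cloning function can return the same instance.
        var broadcast = new BroadcastBlock<string>(s => s);
        var left = new ActionBlock<string>(s => Console.WriteLine($"left: {s}"));
        var right = new ActionBlock<string>(s => Console.WriteLine($"right: {s}"));
        var propagate = new DataflowLinkOptions { PropagateCompletion = true };
        broadcast.LinkTo(left, propagate);
        broadcast.LinkTo(right, propagate);
        broadcast.Post("hello");
        broadcast.Complete();
        await Task.WhenAll(left.Completion, right.Completion);

        // WriteOnceBlock<T>: keeps the first value it receives and declines later ones.
        var once = new WriteOnceBlock<string>(s => s);
        Console.WriteLine(once.Post("first"));        // True
        Console.WriteLine(once.Post("second"));       // False
        Console.WriteLine(await once.ReceiveAsync()); // first
    }
}
```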
EXECUTION BLOCKS
ActionBlock<T>
TransformBlock<T,V>
TransformManyBlock<T,V>
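As an illustration of the execution blocks, here is a small sketch that chains all three: a TransformManyBlock producing several outputs per input, a TransformBlock producing exactly one, and an ActionBlock producing none. The word-length task and the names are assumptions chosen only to keep the example short.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class ExecutionSketch
{
    static async Task Main()
    {
        // One input line becomes zero or more output words.
        var split = new TransformManyBlock<string, string>(line => line.Split(' '));
        // One input word becomes exactly one output number.
        var length = new TransformBlock<string, int>(word => word.Length);
        // Consumes each number; produces no output.
        var print = new ActionBlock<int>(n => Console.WriteLine(n));

        var propagate = new DataflowLinkOptions { PropagateCompletion = true };
        split.LinkTo(length, propagate);
        length.LinkTo(print, propagate);

        split.Post("to be or not");
        split.Complete();
        await print.Completion; // prints 2, 2, 2, 3
    }
}
```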
GROUPING BLOCKS
BatchBlock<T>
JoinBlock<T1,T2,…>
BatchedJoinBlock<T1,T2>
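A short sketch of two of the grouping blocks, with a batch size of 3 and sample values picked arbitrarily for illustration. BatchedJoinBlock is left out to keep the example small.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class GroupingSketch
{
    static async Task Main()
    {
        // BatchBlock<T>: groups incoming items into arrays of the given size.
        var batch = new BatchBlock<int>(3);
        var printBatch = new ActionBlock<int[]>(
            items => Console.WriteLine(string.Join(",", items)));
        batch.LinkTo(printBatch, new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 1; i <= 7; i++) batch.Post(i);
        batch.Complete();            // also flushes the incomplete final batch
        await printBatch.Completion; // prints 1,2,3 then 4,5,6 then 7

        // JoinBlock<T1, T2>: pairs one item from each input into a tuple.
        var join = new JoinBlock<int, string>();
        join.Target1.Post(1);
        join.Target2.Post("one");
        Tuple<int, string> pair = await join.ReceiveAsync();
        Console.WriteLine($"{pair.Item1} = {pair.Item2}"); // 1 = one
    }
}
```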
BEHAVIOR CONFIGURATION OPTIONS
DataflowBlockOptions
• BufferBlock<T>
• BroadcastBlock<T>
• WriteOnceBlock<T>
ExecutionDataflowBlockOptions
• ActionBlock<T>
• TransformBlock<TIn, TOut>
• TransformManyBlock<TIn, TOut>
GroupingDataflowBlockOptions
• BatchBlock<T>
• JoinBlock<T1, T2[, T3]>
• BatchedJoinBlock<T1, T2>
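To make the pairing of blocks and option types concrete, here is a hedged sketch that passes each options class to a matching block. The specific values (4, 8, 100, 10) and the simulated delay are arbitrary assumptions, not recommendations from the talk.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class OptionsSketch
{
    static async Task Main()
    {
        // Buffering blocks take DataflowBlockOptions.
        var buffer = new BufferBlock<int>(
            new DataflowBlockOptions { BoundedCapacity = 100 });

        // Grouping blocks take GroupingDataflowBlockOptions.
        var batch = new BatchBlock<int>(
            10, new GroupingDataflowBlockOptions { Greedy = true });

        // Execution blocks take ExecutionDataflowBlockOptions.
        var options = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 4, // process up to four messages at once
            BoundedCapacity = 8         // hold at most eight messages at a time
        };
        var worker = new ActionBlock<int>(async n =>
        {
            await Task.Delay(100); // stand-in for real work
            Console.WriteLine($"processed {n}");
        }, options);

        // SendAsync waits while the bounded block is full; Post would return false instead.
        for (int i = 0; i < 20; i++)
            await worker.SendAsync(i);

        worker.Complete();
        await worker.Completion;

        buffer.Complete();
        batch.Complete();
    }
}
```

Because the worker runs several messages in parallel, the order of the "processed" lines is not guaranteed.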
COMPLETION & CANCELLATION
• To know when a block completes, await block.Completion or add a continuation task to it
• To propagate completion from source to target, set DataflowLinkOptions.PropagateCompletion to true when linking
• Set DataflowBlockOptions.CancellationToken to enable cancellation
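The three points on this slide can be sketched together as follows. This is an illustrative example rather than code from the talk: the block names, the 50 ms delay, and the use of C# 8 "using var" are all assumptions.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class CompletionSketch
{
    static async Task Main()
    {
        using var cts = new CancellationTokenSource();
        var options = new ExecutionDataflowBlockOptions { CancellationToken = cts.Token };

        var head = new TransformBlock<int, int>(n => n + 1, options);
        var tail = new ActionBlock<int>(n => Console.WriteLine(n), options);

        // With PropagateCompletion, completing 'head' eventually completes 'tail'.
        head.LinkTo(tail, new DataflowLinkOptions { PropagateCompletion = true });

        // A continuation is an alternative to awaiting Completion directly.
        Task logged = tail.Completion.ContinueWith(
            t => Console.WriteLine($"tail finished: {t.Status}"));

        head.Post(1);
        head.Complete();
        await tail.Completion;
        await logged;

        // Cancellation: this block is never completed normally, only canceled.
        var slow = new ActionBlock<int>(async n => await Task.Delay(50), options);
        slow.Post(1);
        cts.Cancel();
        try
        {
            await slow.Completion;
        }
        catch (OperationCanceledException)
        {
            Console.WriteLine("slow was canceled");
        }
    }
}
```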
ERROR HANDLING
• If the exception does not affect the integrity of the pipeline, use a try/catch inside the block
• Otherwise, handle errors outside of the pipeline by
  • Adding a continuation to block.Completion
  • Propagating errors through the pipeline
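A small sketch contrasting the two approaches. The parsing task, the -1 fallback value, and the exception message are hypothetical choices made for this example.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class ErrorHandlingSketch
{
    static async Task Main()
    {
        // Recoverable error: catch it inside the block so the pipeline keeps running.
        var parse = new TransformBlock<string, int>(text =>
        {
            try { return int.Parse(text); }
            catch (FormatException) { return -1; } // hypothetical fallback value
        });
        parse.Post("42");
        parse.Post("oops");
        Console.WriteLine(await parse.ReceiveAsync()); // 42
        Console.WriteLine(await parse.ReceiveAsync()); // -1

        // Unrecoverable error: let it escape; the block faults and stops processing.
        var risky = new ActionBlock<int>(n =>
        {
            if (n == 2) throw new InvalidOperationException("bad item");
            Console.WriteLine(n);
        });
        risky.Post(1);
        risky.Post(2);
        risky.Post(3); // not processed once the block has faulted
        risky.Complete();

        try
        {
            await risky.Completion; // handle the fault outside of the pipeline
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine($"pipeline faulted: {ex.Message}");
        }
    }
}
```

When blocks are linked with PropagateCompletion set, a fault in an upstream block also faults the downstream blocks, so awaiting the last block's Completion is usually enough to observe it.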
DEALING WITH CONCURRENCY
• Rule of thumb: avoid shared state whenever possible.
• Use ConcurrentExclusiveSchedulerPair to perform
updates on shared state
• Be aware of the caveats with
ConcurrentExclusiveSchedulerPair
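One way to apply ConcurrentExclusiveSchedulerPair, sketched under the assumption that two independent blocks must update the same counter. The counter and block names are invented for illustration.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class SharedStateSketch
{
    static async Task Main()
    {
        var pair = new ConcurrentExclusiveSchedulerPair();

        // Blocks that write shared state run on the exclusive scheduler,
        // so their delegates never execute at the same time.
        var exclusive = new ExecutionDataflowBlockOptions
        {
            TaskScheduler = pair.ExclusiveScheduler
        };

        long total = 0; // shared state, updated without locks
        var addA = new ActionBlock<int>(n => { total += n; }, exclusive);
        var addB = new ActionBlock<int>(n => { total += n; }, exclusive);

        for (int i = 1; i <= 1000; i++)
        {
            addA.Post(i);
            addB.Post(i);
        }
        addA.Complete();
        addB.Complete();
        await Task.WhenAll(addA.Completion, addB.Completion);

        Console.WriteLine(total); // 1001000
    }
}
```

One caveat worth keeping in mind: the exclusivity covers only the synchronous parts of a delegate. An async delegate gives up the scheduler at each await, so other work may run in between.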
CREATING CUSTOM BLOCKS
The easy way:
DataflowBlock.Encapsulate<TInput, TOutput>(
target, source)
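A minimal sketch of the Encapsulate approach, wrapping a two-step pipeline so callers see a single IPropagatorBlock<string, int>. The CreateWordCounter helper and the word-counting task are hypothetical.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class EncapsulateSketch
{
    // Hypothetical factory: hides two linked blocks behind one propagator.
    static IPropagatorBlock<string, int> CreateWordCounter()
    {
        var split = new TransformBlock<string, string[]>(
            line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
        var count = new TransformBlock<string[], int>(words => words.Length);

        // Completion must flow from the inner target to the inner source.
        split.LinkTo(count, new DataflowLinkOptions { PropagateCompletion = true });

        return DataflowBlock.Encapsulate<string, int>(split, count);
    }

    static async Task Main()
    {
        var counter = CreateWordCounter();
        counter.Post("a b c");
        counter.Complete();
        Console.WriteLine(await counter.ReceiveAsync()); // 3
        await counter.Completion;
    }
}
```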
CREATING CUSTOM BLOCKS
The hard(core) way:
class CustomBlock:
IPropagatorBlock<TInput, TOutput>
{
}
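The empty class above could be filled in many ways. One common shape, sketched here as an assumption rather than as the presenter's implementation, is to implement the interface by delegating every member to inner blocks. DoublingBlock and its doubling logic are hypothetical.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical custom block that doubles each input by delegating to a TransformBlock.
class DoublingBlock : IPropagatorBlock<int, int>
{
    private readonly TransformBlock<int, int> _inner =
        new TransformBlock<int, int>(n => n * 2);

    // The built-in block implements most interface members explicitly,
    // so they are reached through the interface types.
    private ITargetBlock<int> Target => _inner;
    private ISourceBlock<int> Source => _inner;

    public Task Completion => _inner.Completion;
    public void Complete() => _inner.Complete();
    public void Fault(Exception exception) => Target.Fault(exception);

    public DataflowMessageStatus OfferMessage(DataflowMessageHeader messageHeader,
        int messageValue, ISourceBlock<int> source, bool consumeToAccept)
        => Target.OfferMessage(messageHeader, messageValue, source, consumeToAccept);

    public IDisposable LinkTo(ITargetBlock<int> target, DataflowLinkOptions linkOptions)
        => Source.LinkTo(target, linkOptions);

    public int ConsumeMessage(DataflowMessageHeader messageHeader,
        ITargetBlock<int> target, out bool messageConsumed)
        => Source.ConsumeMessage(messageHeader, target, out messageConsumed);

    public bool ReserveMessage(DataflowMessageHeader messageHeader, ITargetBlock<int> target)
        => Source.ReserveMessage(messageHeader, target);

    public void ReleaseReservation(DataflowMessageHeader messageHeader, ITargetBlock<int> target)
        => Source.ReleaseReservation(messageHeader, target);
}

class Program
{
    static async Task Main()
    {
        var doubler = new DoublingBlock();
        var print = new ActionBlock<int>(n => Console.WriteLine(n));
        doubler.LinkTo(print, new DataflowLinkOptions { PropagateCompletion = true });

        doubler.Post(21);
        doubler.Complete();
        await print.Completion; // prints 42
    }
}
```

Delegating to a built-in block means completion and cancellation handling are inherited from it. A block written from scratch would have to implement both itself.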
CREATING CUSTOM BLOCKS
Either way you choose, don’t forget to:
• Propagate completion
• Poll for cancellation
REFERENCES & FURTHER READING
Dataflow (Task Parallel Library) http://msdn.microsoft.com/en-us/library/hh228603(v=vs.110).aspx
Stephen Toub
TPL Dataflow Tour
http://channel9.msdn.com/posts/TPL-Dataflow-Tour
Joseph Albahari
The Future of .NET Parallel Programming
http://channel9.msdn.com/events/TechEd/Australia/Tech-Ed-Australia-2011/DEV308
Stephen Toub
Inside TPL Dataflow
http://channel9.msdn.com/Shows/Going+Deep/Stephen-Toub-Inside-TPL-Dataflow
Alexey Kursov
Pipeline TPL Dataflow Usage examples
https://www.youtube.com/watch?v=AI9KxgDF43k
Richard Blewett, Andrew Clymer
Pro Asynchronous Programming with .NET
APRESS 2013
ISBN: 978-1430259206
AKKA.NET http://getakka.net/
QUESTIONS?
THANK YOU!
Petru.Rebeja@gmail.com

Editor's notes

  1. When discussing how to use Dataflow, we'll touch on the following points of interest:
     - the programming model (what entities does Dataflow expose?)
     - configuring the behavior of those entities (parallelism, completion, error handling)
     - concurrency: although Dataflow removes much of the need to deal with concurrent scenarios, there are cases where concurrency is inevitable and developers must handle its pitfalls properly
     - custom blocks: whenever the functionality of the built-in blocks isn't enough, Dataflow makes it possible to create custom ones
  2. .NET Framework 4.0 comes with three APIs for parallel programming: Tasks (lower level), and PLINQ and Parallel (higher level). The Dataflow library is a natural extension of the TPL that lets developers create data-processing pipelines in their applications. It provides a framework for creating blocks that each perform a specific function asynchronously. These blocks can be composed into a pipeline where data flows in at one end and one or more results come out at the other. This works well when data can be processed at different rates or when parallel processing can efficiently spread work across multiple CPU cores.
  3. Dataflow is a paradigm shift, but once developers overcome the initial discomfort, they benefit from highly expressive code.