3. AGENDA
• What is Dataflow?
• When to use it?
• How to use it?
• Q&A
4. THE BIG PICTURE
CLR Thread Pool
Tasks
PLINQ Parallel Loops
Concurrent Collections
Dataflow
5. DATAFLOW BENEFITS
• Effortless use of multi-threading
• Performance boost via painless optimization
• Development focus is on the ‘what’ rather than ‘how’
14. COMPLETION & CANCELLATION
• To know when a block completes await block.Completion
or add a continuation task to it
• To propagate completion from source to target, set
DataflowLinkOptions.PropagateCompletion when
linking
• Set DataflowBlockOptions.CancellationToken to
enable cancellation
15. ERROR HANDLING
• If the exception does not affect the integrity of the
pipeline – use a try/catch inside the block
• Otherwise, handle errors outside of the pipeline by
• Adding a continuation to block.Completion
• Propagating errors through the pipeline
16. DEALING WITH CONCURRENCY
• Rule of thumb: avoid shared state whenever possible.
• Use ConcurrentExclusiveSchedulerPair to perform
updates on shared state
• Be aware of the caveats with
ConcurrentExclusiveSchedulerPair
When discussing about how to use Dataflow we’ll touch the following points of interest:
- programming model (what are the entities exposed by Dataflow?)
- configuring the behavior of the entities (parallelism, completion, error handling)
- although Dataflow removes the need for dealing with concurrent scenarios there are cases when concurrency is inevitable and developers must properly deal with concurrency pitfalls
- whenever the functionality of built-in blocks isn’t enough, Dataflow offers the possibility to create custom blocks
.NET Framework 4.0 comes with three APIs for Parallel Programming: Tasks (lower level), PLINQ and Parallel (upper level).
The Dataflow library is a natural extension of the TPL library that allows developers to create data-processing pipelines in their applications. The Dataflow library provides a framework for creating blocks that perform a specific function asynchronously. These blocks can be composed together to form a pipeline where data flows into one end of the pipeline and some result or results come out from the other end. This is great when data can be processed at different rates or when parallel processing can efficiently spread work out across multiple CPU cores.
Dataflow is a paradigm shift but when the developers overcome the discomfort of the paradigm shift they will benefit from the high expressivity of the code.