Acensi session: With the rise of multi-core computers, concurrent programming has become essential for making full use of the available resources. Its complexity, however, limits its adoption. The Task Parallel Library (TPL) aims to simplify this programming paradigm. In this session we introduce, or reintroduce, one of the additions that came with .NET Framework 4.5: the DataFlow library in the Task Parallel Library (TPL). It provides a set of primitives for implementing the "Actor / Agent" pattern and makes it easy to build complex task pipelines by chaining "Blocks". We use a concrete example application to show how TPL DataFlow can help simplify a concurrent programming problem.
Concurrent programming through dataflow
1. ACENSI, Tour Monge - 22, Place des Vosges - 92 400 Courbevoie - La Défense 5 - www.acensi.fr
Concurrent programming based on dataflow
TPL DATAFLOW
A new approach to Monte Carlo VAR
09/02/2015 Document version
3. TPL Dataflow Presentation
Why TPL Dataflow?
A natural extension of framework 4.0
The library
Use cases
Case study: Monte Carlo Value At Risk (VAR)
What is VAR ?
Monte Carlo VAR: Basic Approach
Monte Carlo VAR: Dataflow Approach
Conclusion
SUMMARY
4. Speakers Presentation
Yves Alexandre SIMON: R&D Director
James KOUTHON: Technical Director
Julien LEBOT: .Net Expert
Adina SANDOU: .Net Expert
Information systems and Microsoft technologies Consulting
WHO ARE WE?
6. TPL DATAFLOW: A NATURAL EXTENSION OF FRAMEWORK 4.0
Promotes actor-agent oriented designs
through primitives.
Allows developers to create blocks to
express computations based on directed
dataflow graphs.
7. TPL DATAFLOW: THE LIBRARY
Overview
TPL Dataflow falls in line with Map/Reduce
Can handle large volumes of data
Ideal for long computations
TPL Dataflow: paradigm shift
Tasks are created and linked together as a graph
Each node can receive data as input and/or output data
8. TPL DATAFLOW: THE LIBRARY
Source blocks (1): act as a source of data
ISourceBlock<TOutput>
Target blocks (2): act as a receiver of data
ITargetBlock<TInput>
Propagator blocks: act as both (1) and (2)
IPropagatorBlock<TInput, TOutput>
9. TPL DATAFLOW: THE LIBRARY
Basic blocks
BufferBlock: a queue, i.e. a FIFO (First In, First Out) buffer.
ActionBlock: like a “foreach”, it executes a delegate for each input item.
ex: var node = new ActionBlock<string>(s => Console.WriteLine(s));
TransformBlock: acts like a LINQ “Select”, mapping each input item to an output item.
ex: var node = new TransformBlock<int, int>(p => p * 100);
Advanced blocks
BroadcastBlock: offers a copy of each data item to every linked target.
JoinBlock: collects one item from each of its inputs and outputs them together as a tuple.
Others
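The two advanced blocks above have no example on the slide, so here is a minimal sketch of how they might be combined. It is written as C# top-level statements (C# 9 or later); the block names `doubled` and `label` and the values are illustrative assumptions, not taken from the talk.

```csharp
using System;
using System.Threading.Tasks.Dataflow;

// BroadcastBlock: every linked target is offered its own copy of each item.
// The cloning function is the identity here, which is fine for a value type such as int.
var broadcast = new BroadcastBlock<int>(x => x);
var doubled = new TransformBlock<int, int>(x => x * 2);
var label = new TransformBlock<int, string>(x => "item " + x);

// JoinBlock: waits for one item on each target, then emits them together as a tuple.
var join = new JoinBlock<int, string>();

broadcast.LinkTo(doubled);
broadcast.LinkTo(label);
doubled.LinkTo(join.Target1);
label.LinkTo(join.Target2);

broadcast.Post(21);

// Receive() blocks until the join has paired one item from each branch.
Tuple<int, string> pair = join.Receive();
Console.WriteLine(pair.Item1 + " / " + pair.Item2);
```

Note that a BroadcastBlock only keeps the most recent item, so a slow target can miss values; that makes it better suited to "latest value" feeds than to guaranteed delivery.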
10. TPL DATAFLOW: THE LIBRARY
Linking
Used to link two blocks together.
Predicates and parallelism options available.
There’s no limit to what you can link.
Completion Status
Each block supports an asynchronous form of
completion to propagate finished state.
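As an illustration of linking with a predicate and of completion, the sketch below routes even and odd numbers to two different sinks and then waits for both to finish. It uses C# top-level statements; the sink names and the data are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

var evens = new List<int>();
var odds = new List<int>();

var source = new BufferBlock<int>();
var evenSink = new ActionBlock<int>(x => evens.Add(x));
var oddSink = new ActionBlock<int>(x => odds.Add(x));

// PropagateCompletion forwards the source's completion (or fault) to the linked target.
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };

// Links are tried in the order they were created; the predicate filters what a link accepts.
source.LinkTo(evenSink, linkOptions, x => x % 2 == 0);
source.LinkTo(oddSink, linkOptions); // receives everything the first link declined

for (int i = 1; i <= 6; i++) source.Post(i);

source.Complete();                                       // no more input
Task.WaitAll(evenSink.Completion, oddSink.Completion);   // wait for both sinks to drain

Console.WriteLine("evens: " + string.Join(",", evens));
Console.WriteLine("odds: " + string.Join(",", odds));
```

If every link has a predicate and an item matches none of them, the item stays in the source and blocks the items behind it, which is why the last link here is unconditional.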
11. WHY TPL DATAFLOW?
TPL Dataflow benefits
Paradigm shift for higher code expressivity
Using multithreading without effort
Boosting performance (optimization) painlessly
Focusing on the 'what' rather than the 'how'
12. TPL DATAFLOW: USE CASES
Build more complex systems easily
Samples:
Data analysis/mining services
Web-crawlers
Image and Sound processors
Database engine designs
Financial computation
…
13. Monte Carlo Value at Risk (VAR)
CASE STUDY
14. WHAT IS VAR?
What is VAR?
Value at risk (VAR)
Monitors risk in a trading portfolio
Global financial risk indicator
Our use case
Market VAR (VAR on market moves)
Intensive computation (especially for Monte Carlo VAR)
Example
VAR 99/1D: the maximum loss over 1 day at a 99% confidence level
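The example can be stated more formally. The notation below is the usual textbook definition, not something shown on the slides: L is the portfolio loss over the horizon (here one day) and α the confidence level.

```latex
\mathrm{VaR}_{\alpha}(L) = \inf \{\, \ell \in \mathbb{R} : P(L > \ell) \le 1 - \alpha \,\}, \qquad \alpha = 0.99
```

In other words, VAR 99/1D is the smallest loss threshold that is exceeded on at most 1% of days.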
VAR Calculation Methods
Historical VAR (historical data)
Parametric VAR (formula data)
Monte Carlo VAR (Monte Carlo simulation data)
15. SIMPLE MONTECARLO VAR WORKFLOW
Workflow diagram: Start → Portfolios Composition, Market Data, Static Data → Global Position → Position Pricing With MonteCarlo Calculus (three parallel instances) → Statistics on Global Distribution (VAR) → End
17. MONTE CARLO VAR: BASIC APPROACH
Pipeline diagram: Start → Portfolios Composition, Market Data → Global Position → Position Pricing With MonteCarlo Calculus → Statistics on Global Distribution (VAR) → End
18. MONTE CARLO VAR: BASIC APPROACH
Portfolio composition
Fetch portfolios by using the provider
Market data
Get product parameters from market data provider
Global position
Iterate over all portfolios, net the transactions per product, and get the positions
Portfolios = PortfolioProvider.Portfolios;
ProductParameters = ProductParametersProvider.ProductsParameters;
IEnumerable<KeyValuePair<Product, long>> allTransactions =
    Portfolios.SelectMany(x => x.Transactions)
        .GroupBy(y => y.Product)
        .Select(z => new KeyValuePair<Product, long>
            (z.Key, z.Sum(x => x.Position)));
Positions = allTransactions.ToDictionary(t => t.Key, t => t.Value);
19. MONTE CARLO VAR: BASIC APPROACH
Position pricing
For each product, run the Monte Carlo simulation
Statistics on global
Multiply the result by the position value and calculate the loss
IEnumerable<double> results =
    StatisticsUtilities.SimulateMonteCarloWithPosition(
        new MonteCarloInput
        {
            Parameters = parameters,
            Position = position,
            Product = product
        },
        TotalSimulations);
IList<double> totals = new List<double>();
Func<IList<double>, string, IList<double>> sumList = (current, key) =>
Helpers.SumList(current, lostsValuesByProduct[key].ToList());
20. MONTE CARLO VAR: BASIC APPROACH
totals = lostsValuesByProduct.Keys.Aggregate(totals, sumList);
StatisticsUtilities.CalculateVar(totals, 0.99);
Aggregate the losses across all products
Take the VAR at 99% for 1 day
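To make the basic approach concrete, here is a self-contained sequential sketch of the same steps: simulate losses per product, sum them scenario by scenario, then read off the 99% quantile. The talk's `StatisticsUtilities` helpers are not shown on the slides, so the local `CalculateLoss` and `CalculateVar` below are hypothetical stand-ins with an assumed loss model (normal one-day returns, products drawn independently) and made-up figures.

```csharp
using System;
using System.Linq;

const int totalSimulations = 10000;
var random = new Random(42); // fixed seed so runs are repeatable

// Standard normal draw via the Box-Muller transform.
double NextNormal()
{
    double u1 = 1.0 - random.NextDouble(); // in (0, 1], avoids Log(0)
    double u2 = random.NextDouble();
    return Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
}

// Hypothetical loss model: the one-day return is Normal(mean, stdDev); the loss is the negated P&L.
double CalculateLoss(double price, double mean, double stdDev, long position, double alea)
    => -position * price * (mean + stdDev * alea);

// Empirical quantile: the loss that is not exceeded in `confidence` of the scenarios.
double CalculateVar(double[] losses, double confidence)
{
    var sorted = losses.OrderBy(l => l).ToArray();
    int index = (int)Math.Ceiling(confidence * sorted.Length) - 1;
    return sorted[Math.Clamp(index, 0, sorted.Length - 1)];
}

// Net position per product (illustrative figures only).
var positions = new[]
{
    (Price: 100.0, Mean: 0.0, StdDev: 0.02, Position: 50L),
    (Price: 20.0, Mean: 0.0, StdDev: 0.05, Position: -100L),
};

// Sum losses scenario by scenario across products, one product after the other.
var totals = new double[totalSimulations];
foreach (var p in positions)
    for (int i = 0; i < totalSimulations; i++)
        totals[i] += CalculateLoss(p.Price, p.Mean, p.StdDev, p.Position, NextNormal());

double var99 = CalculateVar(totals, 0.99);
Console.WriteLine($"VAR 99/1D: {var99:F2}");
```

The outer loop over products is the part that runs one product at a time here, and it is the natural candidate for parallelization.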
22. MONTE CARLO VAR: DATAFLOW APPROACH
DataFlow graph: Portfolios Composition And Market Data → Global Position → Position Pricing With MonteCarlo Calculus (three parallel instances) → Aggregator → Statistics on Global Distribution (VAR)
23. MONTE CARLO VAR: DATAFLOW APPROACH
Chosen approach: parallelize per product
Diagram: N threads, one product per thread (Product, Product, Product, Product, …), each running CalculateLoss() x M iterations
24. MONTE CARLO VAR: DATAFLOW APPROACH
Process overview
TransformBlock (Calculate Loss): IN: MonteCarloInput (Price, Mean, Standard Dev, Position), combined with a normal distribution; OUT: IEnumerable<double> (Losses)
ActionBlock (Aggregator): IN: IEnumerable<double>; OUT: IEnumerable<double> (TotalsLosses)
25. MONTE CARLO VAR: DATAFLOW APPROACH
TransformBlock runs the Monte Carlo simulation
Key points:
▬ Do only one thing
▬ Keep work data local
▬ Fully enumerate returned data
var monteCarlo = new TransformBlock<MonteCarloInput, IEnumerable<double>>(input =>
{
    var normalDistribution = new NormalEnumerable();
    return normalDistribution.Take(TotalSimulations)
        .Select(alea => StatisticsUtilities.CalculateLoss(input, alea))
        .ToList(); // Very important: forces the simulation to run inside the block
}, ExecutionOptions);
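The "fully enumerate returned data" point is easy to miss, so here is a small sketch of what goes wrong without it. LINQ queries are lazy: a block that returns an unevaluated query does no work itself, and the computation runs later on whichever thread enumerates the result. The `Work` function and the counter are assumptions introduced for this demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks.Dataflow;

int evaluations = 0;

// Stand-in for an expensive per-scenario computation; counts how often it runs.
double Work(int x)
{
    Interlocked.Increment(ref evaluations);
    return x * 0.5;
}

// Lazy: the block returns an unevaluated query, so no work happens inside the block.
var lazy = new TransformBlock<int, IEnumerable<double>>(
    n => Enumerable.Range(0, n).Select(x => Work(x)));

// Eager: ToList() forces the computation to run inside the block, on its worker thread(s).
var eager = new TransformBlock<int, IEnumerable<double>>(
    n => Enumerable.Range(0, n).Select(x => Work(x)).ToList());

lazy.Post(1000);
IEnumerable<double> lazyResult = lazy.Receive();
int afterLazy = evaluations;   // still 0: nothing has been computed yet

eager.Post(1000);
IEnumerable<double> eagerResult = eager.Receive();
int afterEager = evaluations;  // 1000: the work ran inside the block

Console.WriteLine($"lazy block evaluated {afterLazy} items, eager block evaluated {afterEager - afterLazy}");
```

With the lazy version, a downstream single-threaded aggregator would end up doing all the simulation work itself, which defeats the parallelism configured on the TransformBlock.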
26. MONTE CARLO VAR: DATAFLOW APPROACH
ActionBlock aggregates the result
No need to synchronize access to shared data (an ActionBlock processes one message at a time by default)
var totals = new List<double>();
var aggregate = new ActionBlock<IEnumerable<double>>(doubles =>
{
    if (!totals.Any())
    {
        // First product: initialize the per-scenario totals.
        totals.AddRange(doubles);
    }
    else
    {
        // Later products: add their losses scenario by scenario.
        var losses = doubles.ToList();
        for (var i = 0; i < losses.Count; i++)
        {
            totals[i] += losses[i];
        }
    }
});
27. MONTE CARLO VAR: DATAFLOW APPROACH
Linking the blocks together
Triggering the data flow chain
Data is posted with Post(); the blocks then process it asynchronously
// Net the transactions per product, then post one message per product.
foreach (var portfolio in Portfolios
    .SelectMany(x => x.Transactions)
    .GroupBy(y => y.Product)
    .Select(z => new KeyValuePair<Product, long>(z.Key, z.Sum(x => x.Position))))
{
    var position = portfolio.Value;
    var parameters = ProductParameters.First(x => x.Product.Equals(portfolio.Key));
    monteCarlo.Post(new MonteCarloInput
    {
        Parameters = parameters,
        Position = position
    });
}
// Results buffered by monteCarlo start flowing to aggregate once the link exists.
monteCarlo.LinkTo(aggregate, DataflowLinkOptions);
28. MONTE CARLO VAR: DATAFLOW APPROACH
Completing the tasks
Tricky to get right
▬ Can cause deadlocks
▬ Solution: Automatically propagate completion
monteCarlo.Complete();       // No more input will be posted
aggregate.Completion.Wait(); // Block until the aggregator has processed everything
DataflowLinkOptions = new DataflowLinkOptions
{
    PropagateCompletion = true
};
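Putting the pieces together, the following sketch shows one possible end-to-end shape for the pipeline: a parallel pricing block, a single-threaded aggregator, a link with automatic completion propagation, then Complete() and Wait(). The loss formula is a deterministic placeholder chosen so the example is reproducible, not a real Monte Carlo model, and the tuple input stands in for the talk's MonteCarloInput.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks.Dataflow;

const int totalSimulations = 1000;

var executionOptions = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount
};
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };

// Pricing stage: one message per product, processed in parallel.
// Placeholder loss: position value times a fixed cycle of "returns" between -4.5% and +4.5%.
var pricing = new TransformBlock<(double Price, long Position), double[]>(input =>
    Enumerable.Range(0, totalSimulations)
        .Select(i => input.Position * input.Price * (i % 10 - 4.5) / 100.0)
        .ToArray(), // materialize inside the block
    executionOptions);

// Aggregation stage: the default MaxDegreeOfParallelism is 1, so totals needs no lock.
var totals = new double[totalSimulations];
var aggregate = new ActionBlock<double[]>(losses =>
{
    for (int i = 0; i < losses.Length; i++)
        totals[i] += losses[i];
});

pricing.LinkTo(aggregate, linkOptions);

pricing.Post((100.0, 50L));
pricing.Post((20.0, -100L));
pricing.Post((5.0, 400L));

pricing.Complete();           // signal that no more products will arrive
aggregate.Completion.Wait();  // completes once pricing has drained and aggregate is done

Console.WriteLine($"Aggregated {totals.Length} scenarios, first total = {totals[0]}");
```

Without PropagateCompletion on the link, the aggregator would never be told that its input is finished, and the Wait() call would hang.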
29. MONTE CARLO VAR: DATAFLOW APPROACH
Manual completion propagation
Maximizing CPU usage
monteCarlo.Completion.ContinueWith(t =>
{
    if (t.IsFaulted)
    {
        ((IDataflowBlock)aggregate).Fault(t.Exception); // Pass exception
    }
    else
    {
        aggregate.Complete(); // Mark next completed
    }
});
ExecutionOptions = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount
};
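Here is a runnable sketch of the manual propagation pattern in a case where the upstream block fails, which is where deadlocks tend to come from. The `parse` and `print` blocks are assumptions made for this example; the continuation mirrors the one on the slide.

```csharp
using System;
using System.Threading.Tasks.Dataflow;

var parse = new TransformBlock<string, int>(s => int.Parse(s));
var print = new ActionBlock<int>(x => Console.WriteLine(x));

// Plain link: completion and faults are NOT forwarded automatically.
parse.LinkTo(print);

// Manual propagation: forward either the fault or the normal completion downstream.
parse.Completion.ContinueWith(t =>
{
    if (t.IsFaulted)
        ((IDataflowBlock)print).Fault(t.Exception); // pass the exception on
    else
        print.Complete();                            // mark the next block completed
});

parse.Post("1");
parse.Post("not a number"); // int.Parse throws FormatException, which faults the block
parse.Complete();

bool faulted = false;
try
{
    print.Completion.Wait(); // returns or throws because the continuation ends print
}
catch (AggregateException ex)
{
    faulted = true;
    Console.WriteLine("Pipeline faulted: " + ex.Flatten().InnerException?.GetType().Name);
}
```

Without the continuation (or PropagateCompletion on the link), print would never complete and the Wait() call would block forever.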
30. MONTE CARLO VAR: DATAFLOW APPROACH
Result
Benchmark chart (lower is better): Basic vs Data flow, execution time in milliseconds (axis from 0 to 4000), measured on five CPUs: i5-4200U 4 @ 2.30GHz; Intel Celeron G1820 2 @ 2.70GHz; Intel i5-2400 4 @ 3.00GHz; i7-3770K w/ 8 @ 5.09GHz; i7-4790K w/ 8 @ 4.00GHz
31. What did we learn?
CONCLUSION
32. CONCLUSION
Performance increase
Faster
Automatically scale to hardware
Paradigm shift
Macro-level optimization
New primitives
Going further
Experiment with the code
Parallelize data loading
Try new blocks
Come see us at the booth
Find out more!
github.com/acensi/techdays-2015
msdn.microsoft.com/en-us/library/hh228603(v=vs.110).aspx
github.com/akkadotnet/akka.net