Using Dask for large systems of financial models
1. Using Dask for large systems of financial models
Petr Wolf, Dhivya Shankaranarayan
PyData NYC 2018
2. 2 | Using Dask for large systems of Financial models | PyData NYC 2018
Unrestricted
The views expressed here are those of the authors and do not necessarily
represent or reflect the views of Barclays
Disclaimer
3. Agenda
   1. Financial models for Planning and Stress-Testing
   2. Modular approach to model development and integration
   3. Benefits of an integrated solution
   4. Notebook example
   5. Role of Open Source Software
4. Links
Contacts
• https://www.linkedin.com/in/dhivyanarayan
• https://www.linkedin.com/in/petrwolf
Slides
• https://www.slideshare.net/PetrWolf1/using-dask-for-large-systems-of-financial-models
Notebook
• https://github.com/PetrWolf/pydata_nyc_2018
5. Financial models for Planning and Stress-Testing
Financial planning – to model the change in a firm’s financial situation over time, based on various inputs and scenarios
Stress testing – to “measure the resilience of banks to hypothetical adverse scenarios”¹
Specific requirements
• Robust development and internal validation processes
• Transparent methodologies for business sign-off of models and projected results
¹ Dent, Kieran & Westwood, Ben & Segoviano, Miguel - Stress testing of banks: an introduction, Bank of England Quarterly Bulletin, vol. 56(3)
6. Traditional organizations do not scale up to modern requirements
• Multiple platforms with siloed model development and execution
• Manual and error-prone execution
• Lack of reproducibility and of a complete audit trail
• Limited ability to perform end-to-end sensitivity and what-if analysis
[Diagram “Ad-hoc Model Development”, with boxes labelled: Macro-Economic Factors, Current Financials, Manual Overrides; Business 1 models, Business 2 models, Business 3 models; Business Planning, Stress Testing, Risk]
7. Typical modeling approach mixes several separate concerns
• Data processing, validation, modeling logic, orchestration, reporting and visualization are commingled
• Potential inconsistency across model inputs and outputs
• Lack of clarity on units, benchmarks and assumptions
9. Modular approach to model development and integration
   1. Establish shared conventions
      – Model using pure functions (aligns with Dask best practices)
      – Define “public” data labels as the interface to/from functions
   2. Extend syntax for easier development
      – Annotate functions (using decorators) to label inputs and outputs
   3. Automate composition
      – Automatically connect functions into Dask graphs based on data dependencies (using the Custom Graph API)
10. Automated dependency graphs with discoverable interface
[Diagram: Industry Size and Market Share feed function f, which produces Volume; Volume and Margin feed function g, which produces Profit; Volume also feeds further functions (…)]
11. Dask exposes its low-level interface for direct graph creation
• Dask is a light-weight open-source library for parallel computing in Python
• Scales from a laptop to large clusters (1000s of cores)
• Popular in the Python scientific community¹, including banks²
• High-level “pandas-like” interfaces, integration with scikit-learn
• Exposes the Custom Graph API
¹ https://dask-stories.readthedocs.io
² https://www.anaconda.com/blog/developer-blog/credit-modeling-with-dask/
12. The tale of 3 APIs
High-level API using dask.dataframe
import dask.dataframe as dd

df = dd.read_csv('*.csv')
# Build the lazy aggregation first, then trigger the computation on it
totals = df.groupby(df.account_id).balance.sum()
totals.compute()
13. The tale of 3 APIs
Explicit graph creation using the Custom Graph API or @dask.delayed:
import dask

def inc(x):
    return x + 1

def mul(x, y):
    return x * y

# Each key maps to a task tuple: (function, arg1, arg2, ...)
graph = {
    'z': (inc, 5),
    'w': (mul, 'z', 7),
}
dask.get(graph, 'w')
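To make the tuple convention above concrete, here is a toy evaluator. It assumes only two rules: a task is a tuple whose first element is a callable, and an argument that names another key stands for that key's result. The `evaluate` helper is hypothetical and purely illustrative. It is not how Dask's schedulers are implemented, and in practice `dask.get` would be used instead.

```python
def inc(x):
    return x + 1

def mul(x, y):
    return x * y

graph = {
    'z': (inc, 5),
    'w': (mul, 'z', 7),
}

def evaluate(graph, key, cache=None):
    """Toy, single-threaded stand-in for dask.get (illustration only)."""
    cache = {} if cache is None else cache
    if key in cache:
        return cache[key]
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        # Arguments that name another key are computed first; others are literals.
        resolved = [
            evaluate(graph, a, cache) if isinstance(a, str) and a in graph else a
            for a in args
        ]
        result = func(*resolved)
    else:
        result = task  # plain data stored directly in the graph
    cache[key] = result
    return result

result = evaluate(graph, 'w')
print(result)  # 42
```

Because every function is pure and every dependency is named in the dictionary, a scheduler can decide the execution order (and any parallelism) on its own.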
14. Extend Python functions with annotations of inputs/outputs
@model(ind_size="Industry Size",
       mkt_share="Market Share",
       output="Volume")
def f(ind_size, mkt_share):
    return ind_size * mkt_share
In the decorator, each keyword is an internal parameter name and each value is an external data label.
↓ generate Dask graph entry
{"Volume": (f, "Industry Size", "Market Share")}
[Diagram: Industry Size and Market Share feed f, which produces Volume]
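The slides do not show how `@model` is implemented. The following is a minimal sketch of one way such a decorator could produce the graph entry described above. The `graph_entry` attribute and the error handling are assumptions made for illustration, not the authors' code.

```python
import inspect

def model(output, **labels):
    """Hypothetical sketch of a @model decorator (not the authors' code)."""
    def decorate(func):
        # Order the external labels to match the function's parameter order.
        params = list(inspect.signature(func).parameters)
        missing = [p for p in params if p not in labels]
        if missing:
            raise ValueError(f"no data label given for parameters: {missing}")
        # Attach a Dask-style graph entry to the function for later composition.
        func.graph_entry = {output: (func, *(labels[p] for p in params))}
        return func
    return decorate

@model(ind_size="Industry Size", mkt_share="Market Share", output="Volume")
def f(ind_size, mkt_share):
    return ind_size * mkt_share

print(f.graph_entry == {"Volume": (f, "Industry Size", "Market Share")})  # True
```

The decorated function stays an ordinary Python function, so it can still be called and unit-tested directly.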
15. The tale of 3 APIs
Automatic graph generation using @model
@model(ind_size="Industry Size", ..., output="Volume")
def f(ind_size, mkt_share):
    return ind_size * mkt_share

@model(vol="Volume", margin="Margin", output="Profit")
def g(vol, margin):
    return vol * margin

graph = join(f, g, ...)
dask.get(graph, 'Profit')
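The composition step could look roughly like the sketch below. It is an assumption, not the authors' `join`: for brevity it takes ready-made graph entries rather than decorated functions, and the `external_inputs` helper is a hypothetical addition that lists the labels the combined graph still needs from outside.

```python
def join(*entries):
    """Hypothetical sketch: merge single-output graph entries into one graph."""
    graph = {}
    for entry in entries:
        for label, task in entry.items():
            if label in graph:
                raise ValueError(f"data label produced twice: {label!r}")
            graph[label] = task
    return graph

def external_inputs(graph):
    """Labels that some task consumes but no function in the graph produces."""
    consumed = {arg for task in graph.values() for arg in task[1:]}
    return consumed - set(graph)

def f(ind_size, mkt_share):
    return ind_size * mkt_share

def g(vol, margin):
    return vol * margin

# Entries in the form shown on the slides, written out by hand here.
graph = join(
    {"Volume": (f, "Industry Size", "Market Share")},
    {"Profit": (g, "Volume", "Margin")},
)
print(sorted(external_inputs(graph)))  # ['Industry Size', 'Margin', 'Market Share']
```

Once values for the external inputs are added to the dictionary as plain data, the graph is in the form that `dask.get(graph, 'Profit')` expects.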
16. The tale of 3 APIs
High Level API
Structure:
df = dd.read_csv('*.csv')
df.groupby(df.account_id).balance.sum()

Custom Graph API
Functions:
def inc(x):
    ...
def mul(x, y):
    ...
Structure:
graph = {
    'z': (inc, 5),
    'w': (mul, 'z', 7),
    ...
}

@model
Functions:
@model(ind_size=...)
def f(ind_size, mkt_share):
    ...
@model(vol=...)
def g(vol, margin):
    ...
Structure:
graph = compose(f, g, ...)

Labels below the table: Algorithm, Operations, Dependencies
17. Global Data Dictionary for consistency and clarity
Keep a central definition of all data elements:
• Industry Size (int, units: “mUSD”): Annual volume on the US market
• Market Share (float, units: “%”): …
• Volume …
• Margin …
• Profit …
[Diagram: Industry Size and Market Share feed f, which produces Volume; Volume and Margin feed g, which produces Profit]
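A central data dictionary of this kind could be kept as plain Python objects and checked against a graph. The sketch below is an assumption about one possible shape. The `DataElement` class and `check_labels` helper are hypothetical names, and only the two entries whose type and units appear on the slide are filled in.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataElement:
    label: str
    dtype: type
    units: str
    description: str = ""

# Only entries whose type and units are given on the slide are listed here.
DATA_DICTIONARY = {
    e.label: e
    for e in [
        DataElement("Industry Size", int, "mUSD", "Annual volume on the US market"),
        DataElement("Market Share", float, "%"),
    ]
}

def check_labels(graph, dictionary):
    """Return labels used in a graph that the dictionary does not define."""
    used = set(graph)
    for task in graph.values():
        used.update(a for a in task[1:] if isinstance(a, str))
    return sorted(used - set(dictionary))

def f(ind_size, mkt_share):
    return ind_size * mkt_share

graph = {"Volume": (f, "Industry Size", "Market Share")}
print(check_labels(graph, DATA_DICTIONARY))  # ['Volume']
```

Running such a check when graphs are composed would flag labels, like "Volume" here, that are used by a function but not yet defined centrally.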
18. Bottom-up composition for incremental build-out
• Function: single unit of modeling logic (formula, calibrated model, business process). Covered by unit tests.
• Model: multiple functions connected via data dependencies, representing products or business segments. Covered by regression tests.
• Ensemble: models integrated together to represent entire business units, legal entities or modeling areas. Covered by integration tests.
[Diagram: a function Z = X + Y, with inputs X and Y and output Z]
20. Combined dependency graph for flexible development
Automated connections make changes simple:
– Compose, break up, re-use, replace
– Champion/Challenger, version upgrade/rollback
Separation of code and data makes testing flexible:
– Use of different inputs and environments (development, production)
Open to adjustments and extensions:
– “Overrides” (absolute or relative) to selected nodes
– Node/edge properties (units, types, …)
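The absolute and relative overrides mentioned above can be expressed as small transformations of the graph dictionary. This is a sketch under assumptions: `apply_override` is a hypothetical helper, and the relative case relies on nested task tuples, which the Dask graph specification is assumed here to accept.

```python
import operator

def apply_override(graph, label, absolute=None, relative=None):
    """Hypothetical helper: return a copy of `graph` with one node overridden."""
    if (absolute is None) == (relative is None):
        raise ValueError("give exactly one of absolute= or relative=")
    patched = dict(graph)
    if absolute is not None:
        # Absolute override: the node becomes plain data, its function is bypassed.
        patched[label] = absolute
    else:
        # Relative override: scale the original result via a nested task tuple.
        patched[label] = (operator.mul, graph[label], relative)
    return patched

def f(ind_size, mkt_share):
    return ind_size * mkt_share

graph = {"Volume": (f, "Industry Size", "Market Share")}
fixed = apply_override(graph, "Volume", absolute=120)
scaled = apply_override(graph, "Volume", relative=1.1)
```

Returning a patched copy leaves the original graph untouched, so overridden and non-overridden runs can be compared side by side.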
21. Integrated framework for end-to-end analysis and robust execution
• Dependency analysis
• Sensitivity testing (external inputs and internal assumptions or intermediate values)
• Back-testing
• “What-if” analysis
• Reverse stress-testing
• Support in decision making
• Reinforcement learning
[Diagram “Integrated Framework”, with boxes labelled: Macro Economic Variables, Overrides, Financials; Business 1 models, Business 2 models, Business 3 models; Scenario analysis, Sensitivity analysis, …; Business Planning, Stress Testing, …, Risk; Review & Analysis]
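As an illustration of the sensitivity testing listed above, the sketch below bumps each input by a small relative amount and records the relative change in the output. Everything in it is an assumption: `run` is a hand-written stand-in for evaluating the full graph, and the input values are made up for the example.

```python
def run(inputs):
    """Stand-in for evaluating the whole graph; returns Profit."""
    volume = inputs["Industry Size"] * inputs["Market Share"]
    return volume * inputs["Margin"]

def sensitivities(run, inputs, bump=0.01):
    """Relative change in the output for a relative bump in each input."""
    base = run(inputs)
    result = {}
    for label, value in inputs.items():
        shocked = {**inputs, label: value * (1 + bump)}
        result[label] = run(shocked) / base - 1
    return result

# Illustrative values only, not taken from the slides.
inputs = {"Industry Size": 1000.0, "Market Share": 0.1, "Margin": 0.2}
report = sensitivities(run, inputs)
```

With a single integrated graph, the same loop can be pointed at intermediate nodes as well as external inputs, which is what makes end-to-end analysis practical.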
23. Role of OSS in large financial corporations has changed and is accelerating
Huge thanks to OSS maintainers of modeling libraries (pandas, NumPy, SciPy, Dask, NetworkX), tools (Jupyter, Sphinx, Pylint, pytest) and many others
[Diagram: OSS activities (Consumption, Contribution, Publication) against policy levels (Prohibited, Isolated, Supported, Encouraged)]
How can you help?
• Highlight use of open source in projects
• Establish a firm-wide policy
• Find suitable pilot projects
24. Summary
• Large financial modeling projects face unique challenges arising from a disparity of models and data
• Strict rules and technology can help in building a flexible development and execution infrastructure
• Automation opens doors to further analysis and re-use that would otherwise not be possible
• Open source is a key enabler for even small teams to build great projects quickly