Using Dask for large systems of financial models
Petr Wolf, Dhivya Shankaranarayan
PyData NYC 2018
2 | Using Dask for large systems of Financial models | PyData NYC 2018
Unrestricted
Disclaimer
The views expressed here are those of the authors and do not necessarily represent or reflect the views of Barclays

Agenda
1. Financial models for Planning and Stress-Testing
2. Modular approach to model development and integration
3. Benefits of an integrated solution
4. Notebook example
5. Role of Open Source Software

Links
Contacts
• https://www.linkedin.com/in/dhivyanarayan
• https://www.linkedin.com/in/petrwolf
Slides
• https://www.slideshare.net/PetrWolf1/using-dask-for-large-systems-of-financial-models
Notebook
• https://github.com/PetrWolf/pydata_nyc_2018

Financial models for Planning and Stress-Testing
Financial planning: to model the change in a firm’s financial situation over time, based on various inputs and scenarios
Stress testing: to “measure the resilience of banks to hypothetical adverse scenarios” [1]
Specific requirements
• Robust development and internal validation processes
• Transparent methodologies for business sign-off of models and projected results
[1] Dent, Kieran; Westwood, Ben; Segoviano, Miguel. Stress testing of banks: an introduction, Bank of England Quarterly Bulletin, vol. 56(3)

Traditional organizations do not scale up to modern requirements
• Multiple platforms with siloed model development and execution
• Manual & error-prone execution
• Lack of reproducibility and complete audit trail
• Limited ability to perform end-to-end sensitivity and what-if analysis
[Diagram “Ad-hoc Model Development”, with boxes labelled: Macro-Economic Factors, Current Financials, Manual Overrides; Business 1 models, Business 2 models, Business 3 models; Business Planning, Stress Testing, Risk]

Typical modeling approach mixes several separate concerns
• Data processing, validation, modeling logic, orchestration, reporting and visualization are commingled
• Potential inconsistency across model inputs and outputs
• Lack of clarity on units, benchmarks and assumptions

Modular approach
for a productive model development and flexible integration process

Modular approach to model development and integration
1. Establish shared conventions
– Model using pure functions (aligns with Dask best practices)
– Define “public” data labels as interface to/from functions
2. Extend syntax for easier development
– Annotate functions (using decorators) to label inputs and outputs
3. Automate composition
– Automatically connect functions into Dask graphs based on data dependencies (using Custom Graph API)

Automated dependency graphs with discoverable interface
[Diagram: Industry Size and Market Share feed function f, which produces Volume; Volume and Margin feed function g, which produces Profit; further nodes are elided (…)]
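One way to read “discoverable interface”: once models are held in a Dask-style graph dictionary, the external inputs and final outputs can be derived from the graph itself. The following is a minimal sketch under that assumption. The helper names `referenced_labels` and `interface` are hypothetical and do not come from the talk, and tasks are assumed to be flat tuples of the form `(function, label, ...)` as in the deck's own example.

```python
def f(ind_size, mkt_share):
    return ind_size * mkt_share


def g(vol, margin):
    return vol * margin


# Graph in the Dask custom-graph convention: label -> (function, *input labels)
graph = {
    "Volume": (f, "Industry Size", "Market Share"),
    "Profit": (g, "Volume", "Margin"),
}


def referenced_labels(task):
    """Return the string labels that a flat task tuple takes as arguments."""
    return [arg for arg in task[1:] if isinstance(arg, str)]


def interface(graph):
    """Split labels into external inputs, intermediate values and final outputs."""
    produced = set(graph)
    consumed = {
        label for task in graph.values() for label in referenced_labels(task)
    }
    return {
        "inputs": sorted(consumed - produced),        # must be supplied from outside
        "intermediate": sorted(produced & consumed),  # produced and used internally
        "outputs": sorted(produced - consumed),       # produced, not used further
    }


print(interface(graph))
```

Because the interface is computed from the graph, it stays in step with the models as functions are added or replaced.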

Dask exposes its low-level interface for direct graph creation
• Dask is a light-weight open-source library for parallel computing in Python
• Scales from a laptop to large clusters (1000s of cores)
• Popular in the Python scientific community [1], including banks [2]
• High-level “pandas-like” interfaces, integration with scikit-learn
• Exposes Custom Graph API
[1] https://dask-stories.readthedocs.io
[2] https://www.anaconda.com/blog/developer-blog/credit-modeling-with-dask/

The tale of 3 APIs
High-level API using dask.dataframe
import dask.dataframe as dd

df = dd.read_csv('*.csv')
# Lazily define the aggregation: total balance per account
totals = df.groupby(df.account_id).balance.sum()
# Trigger the computation of the aggregated result
totals.compute()

The tale of 3 APIs
Explicit graph creation using the Custom Graph API or @dask.delayed:
import dask

def inc(x):
    return x + 1

def mul(x, y):
    return x * y

# Each key maps to a task tuple: (function, *arguments)
graph = {
    'z': (inc, 5),
    'w': (mul, 'z', 7),
}
dask.get(graph, 'w')  # evaluates mul(inc(5), 7)
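The slide names `@dask.delayed` as the alternative to writing the graph dictionary by hand, but shows only the dictionary form. A minimal sketch of the same two-step computation with `dask.delayed` might look as follows; the variable names are illustrative.

```python
import dask


@dask.delayed
def inc(x):
    return x + 1


@dask.delayed
def mul(x, y):
    return x * y


# Calling delayed functions records tasks instead of running them
z = inc(5)
w = mul(z, 7)

# The graph was built implicitly by the calls above; compute() executes it
result = w.compute()
print(result)
```

Here the dependencies follow from ordinary function calls, whereas the dictionary form spells them out as keys.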

Extend Python functions with annotations of inputs/outputs
# keyword = internal parameter name, value = external data label
@model(ind_size="Industry Size",
       mkt_share="Market Share",
       output="Volume")
def f(ind_size, mkt_share):
    return ind_size * mkt_share
↓ generate Dask graph entry
{"Volume": (f, "Industry Size", "Market Share")}
[Diagram: Industry Size and Market Share feed function f, which produces Volume]

The tale of 3 APIs
Automatic graph generation using @model
@model(ind_size="Industry Size", ..., output="Volume")
def f(ind_size, mkt_share):
    return ind_size * mkt_share

@model(vol="Volume", margin="Margin", output="Profit")
def g(vol, margin):
    return vol * margin

graph = join(f, g, ...)
dask.get(graph, 'Profit')
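The slides do not show how `@model` and `join` are implemented. The following is only a minimal sketch of how such helpers could work, not the authors' implementation. The attribute names `output_label` and `input_labels`, the helper `graph_entry`, and the numeric input values are all assumptions made for illustration.

```python
import inspect

import dask


def model(output, **inputs):
    """Hypothetical decorator recording data labels on a function.

    `inputs` maps each internal parameter name to an external data label;
    `output` is the label of the value the function produces.
    """
    def decorate(func):
        func.output_label = output
        func.input_labels = inputs
        return func
    return decorate


def graph_entry(func):
    """Build the graph entry {output label: (func, *input labels)}."""
    # Order the labels by the function's own parameter order
    names = list(inspect.signature(func).parameters)
    labels = tuple(func.input_labels[name] for name in names)
    return {func.output_label: (func,) + labels}


def join(*funcs):
    """Merge the entries of all annotated functions into one graph."""
    graph = {}
    for func in funcs:
        entry = graph_entry(func)
        clash = set(entry) & set(graph)
        if clash:
            raise ValueError(f"label produced more than once: {clash}")
        graph.update(entry)
    return graph


@model(ind_size="Industry Size", mkt_share="Market Share", output="Volume")
def f(ind_size, mkt_share):
    return ind_size * mkt_share


@model(vol="Volume", margin="Margin", output="Profit")
def g(vol, margin):
    return vol * margin


graph = join(f, g)
# External inputs enter the graph as plain values (made-up numbers)
graph.update({"Industry Size": 1000, "Market Share": 0.5, "Margin": 0.25})

profit = dask.get(graph, "Profit")
print(profit)
```

In this sketch the functions never refer to each other: `g` is connected to `f` only because both mention the label "Volume".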

The tale of 3 APIs
High Level API (Algorithm)
Structure:
df = dd.read_csv('*.csv')
df.groupby(df.account_id).balance.sum()
Custom Graph API (Operations)
Functions:
def inc(x):
    ...
def mul(x, y):
    ...
Structure:
graph = {
    'z': (inc, 5),
    'w': (mul, 'z', 7),
    ...
}
@model (Dependencies)
Functions:
@model(ind_size=...)
def f(ind_size, mkt_share):
    ...
@model(vol=...)
def g(vol, margin):
    ...
Structure:
graph = compose(f, g, ...)

Global Data Dictionary for consistency and clarity
Keep a central definition of all data elements:
• Industry Size (int, units: “mUSD”): Annual volume on the US market
• Market Share (float, units: “%”): …
• Volume …
• Margin …
• Profit …
[Diagram: Industry Size and Market Share feed function f, which produces Volume; Volume and Margin feed function g, which produces Profit]
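A central data dictionary of this kind could be kept as plain Python data and checked against the labels that models use. The sketch below is an assumption about one possible shape, not the authors' implementation: `DataElement`, `undefined_labels` and `check_value` are hypothetical names, and only the two elements whose type and units appear on the slide are filled in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """One entry of the central data dictionary."""
    label: str
    dtype: type
    units: str
    description: str = ""


# Only these two entries are fully specified on the slide; the others are elided there
DATA_DICTIONARY = {
    element.label: element
    for element in [
        DataElement("Industry Size", int, "mUSD", "Annual volume on the US market"),
        DataElement("Market Share", float, "%"),
    ]
}


def undefined_labels(labels, dictionary=DATA_DICTIONARY):
    """Return the labels used by models that the dictionary does not define."""
    return sorted(set(labels) - set(dictionary))


def check_value(label, value, dictionary=DATA_DICTIONARY):
    """Return `value` unchanged if it matches the declared type, else raise."""
    element = dictionary[label]
    if not isinstance(value, element.dtype):
        raise TypeError(
            f"{label!r} expects {element.dtype.__name__} ({element.units}), "
            f"got {type(value).__name__}"
        )
    return value


missing = undefined_labels(["Industry Size", "Market Share", "Volume"])
print(missing)
print(check_value("Industry Size", 1000))
```

Running `undefined_labels` over every label referenced in a graph would flag models that use a data element nobody has defined yet.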

Bottom-up composition for incremental build-out
• Function: single unit of modeling logic (formula, calibrated model, business process). Covered by unit tests.
• Model: multiple functions connected via data dependencies, representing products or business segments. Covered by regression tests.
• Ensemble: models integrated together to represent entire business units, legal entities or modeling areas. Covered by integration tests.
[Diagram: function Z = X + Y with inputs X and Y and output Z]

Benefits
Advantages of an integrated solution

Combined dependency graph for flexible development
• Automated connections make changes simple:
– Compose, break up, re-use, replace
– Champion/Challenger, version upgrade/rollback
• Separation of code/data makes testing flexible:
– Use of different inputs and environments (development, production)
• Open to adjustments and extensions:
– “Overrides” (abs. or rel.) to selected nodes
– Node/edge properties (units, types, …)
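Absolute and relative overrides can be expressed as small rewrites of the graph dictionary. The sketch below is one possible reading of the slide, not the authors' implementation. `override_abs`, `override_rel`, `scale`, the " (before override)" key suffix and all numeric values are assumptions made for illustration.

```python
import dask


def f(ind_size, mkt_share):
    return ind_size * mkt_share


def g(vol, margin):
    return vol * margin


def scale(value, factor):
    return value * factor


base_graph = {
    # Made-up input values
    "Industry Size": 1000,
    "Market Share": 0.5,
    "Margin": 0.25,
    "Volume": (f, "Industry Size", "Market Share"),
    "Profit": (g, "Volume", "Margin"),
}


def override_abs(graph, label, value):
    """Return a copy of the graph with the node pinned to a fixed value."""
    return {**graph, label: value}


def override_rel(graph, label, factor):
    """Return a copy in which the node's original result is scaled by `factor`."""
    hidden = label + " (before override)"  # hypothetical naming scheme
    if hidden in graph:
        raise ValueError(f"{label!r} already carries a relative override")
    new_graph = dict(graph)
    new_graph[hidden] = graph[label]            # keep the original task under a new key
    new_graph[label] = (scale, hidden, factor)  # downstream nodes see the scaled value
    return new_graph


base = dask.get(base_graph, "Profit")
pinned = dask.get(override_abs(base_graph, "Volume", 400), "Profit")
scaled = dask.get(override_rel(base_graph, "Volume", 1.5), "Profit")
print(base, pinned, scaled)
```

Both helpers return new dictionaries, so the base graph is left untouched and several override scenarios can be evaluated side by side.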

Integrated framework for end-to-end analysis and robust execution
• Dependency analysis
• Sensitivity testing (external inputs and internal assumptions or intermediate values)
• Back-testing
• “What-if” analysis
• Reverse stress-testing
• Support in decision making
• Reinforcement learning
[Diagram “Integrated Framework”, with boxes labelled: Macro Economic Variables, Overrides, Financials; Business 1 models, Business 2 models, Business 3 models; Scenario analysis, Sensitivity analysis, …; Business Planning, Stress Testing, …, Risk; Review & Analysis]
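With all models in one graph, end-to-end sensitivity testing reduces to re-running the graph with one input changed at a time. The following is a minimal sketch under assumed names and made-up input values (`run`, `sensitivity` and the 1% bump are illustrative choices, not taken from the talk).

```python
import dask


def f(ind_size, mkt_share):
    return ind_size * mkt_share


def g(vol, margin):
    return vol * margin


models = {
    "Volume": (f, "Industry Size", "Market Share"),
    "Profit": (g, "Volume", "Margin"),
}
inputs = {"Industry Size": 1000, "Market Share": 0.5, "Margin": 0.25}  # made-up values


def run(models, inputs, target):
    """Evaluate `target` on the graph formed by the models plus input values."""
    return dask.get({**models, **inputs}, target)


def sensitivity(models, inputs, target, bump=0.01):
    """Change in `target` when each input in turn is increased by `bump` (relative)."""
    base = run(models, inputs, target)
    effects = {}
    for label, value in inputs.items():
        bumped = {**inputs, label: value * (1 + bump)}
        effects[label] = run(models, bumped, target) - base
    return effects


effects = sensitivity(models, inputs, "Profit")
for label, change in effects.items():
    print(f"{label}: {change:+.4f}")
```

The same loop could bump an intermediate label instead of an external input by replacing that node with a fixed value before evaluation.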

Example
Sample code using Dask

Role of OSS in large financial corporations has changed and is accelerating
Huge thanks to OSS maintainers of modeling libraries (Pandas, NumPy, SciPy, Dask, NetworkX), tools (Jupyter, Sphinx, PyLint, PyTest) and many others
[Diagram labels: Consumption, Contribution, Publication; Prohibited, Isolated, Supported, Encouraged]
How can you help?
• Highlight use of open source in projects
• Establish a firm-wide policy
• Find suitable pilot projects

Summary
• Large financial modeling projects face unique challenges arising from a disparity of models and data
• Strict rules and technology can help in building a flexible development and execution infrastructure
• Automation opens doors to further analysis and re-use that would otherwise not be possible
• Open source is a key enabler for even small teams to build great projects quickly

Thank you!
Q&A

Contenu connexe

Tendances

Wenzhe Xu (Evelyn) Resume for Data Science
Wenzhe Xu (Evelyn) Resume for Data ScienceWenzhe Xu (Evelyn) Resume for Data Science
Wenzhe Xu (Evelyn) Resume for Data ScienceWenzhe(Evelyn) Xu
 
Data flow and data analysis
Data flow and data analysis Data flow and data analysis
Data flow and data analysis faisalqau
 
Kushal Data Warehousing PPT
Kushal Data Warehousing PPTKushal Data Warehousing PPT
Kushal Data Warehousing PPTKushal Singh
 
Applied Data Science Part 3: Getting dirty; data preparation and feature crea...
Applied Data Science Part 3: Getting dirty; data preparation and feature crea...Applied Data Science Part 3: Getting dirty; data preparation and feature crea...
Applied Data Science Part 3: Getting dirty; data preparation and feature crea...Dataiku
 
Global IT Outsourcing case study
Global IT Outsourcing case studyGlobal IT Outsourcing case study
Global IT Outsourcing case studyNandita Nityanandam
 
Choosing the Right Business Intelligence Tools for Your Data and Architectura...
Choosing the Right Business Intelligence Tools for Your Data and Architectura...Choosing the Right Business Intelligence Tools for Your Data and Architectura...
Choosing the Right Business Intelligence Tools for Your Data and Architectura...Victor Holman
 
EvalOSS : A Framework to Evaluate Open Source Software
EvalOSS : A Framework to Evaluate Open Source SoftwareEvalOSS : A Framework to Evaluate Open Source Software
EvalOSS : A Framework to Evaluate Open Source Softwarebpupadhyaya
 
Power BI vs Tableau vs Cognos: A Data Analytics Research
Power BI vs Tableau vs Cognos: A Data Analytics ResearchPower BI vs Tableau vs Cognos: A Data Analytics Research
Power BI vs Tableau vs Cognos: A Data Analytics ResearchLuciano Vilas Boas
 
Resume anh chu data analyst
Resume anh chu data analystResume anh chu data analyst
Resume anh chu data analystANH CHU
 
Modern Data Discovery and Integration in Retail Banking
Modern Data Discovery and Integration in Retail BankingModern Data Discovery and Integration in Retail Banking
Modern Data Discovery and Integration in Retail BankingCambridge Semantics
 
Modern Data Discovery and Integration in Insurance
Modern Data Discovery and Integration in InsuranceModern Data Discovery and Integration in Insurance
Modern Data Discovery and Integration in InsuranceCambridge Semantics
 
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricUsing a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricCambridge Semantics
 
Business Intelligence Presentation 1 (15th March'16)
Business Intelligence Presentation 1 (15th March'16)Business Intelligence Presentation 1 (15th March'16)
Business Intelligence Presentation 1 (15th March'16)Muhammad Fahad
 
Fms invited talk_2018 v5
Fms invited talk_2018 v5Fms invited talk_2018 v5
Fms invited talk_2018 v5Nisha Talagala
 
Solution Architecture US healthcare
Solution Architecture US healthcare Solution Architecture US healthcare
Solution Architecture US healthcare sumiteshkr
 
Resume anh chu
Resume anh chuResume anh chu
Resume anh chuANH CHU
 

Tendances (17)

Wenzhe Xu (Evelyn) Resume for Data Science
Wenzhe Xu (Evelyn) Resume for Data ScienceWenzhe Xu (Evelyn) Resume for Data Science
Wenzhe Xu (Evelyn) Resume for Data Science
 
Data flow and data analysis
Data flow and data analysis Data flow and data analysis
Data flow and data analysis
 
Kushal Data Warehousing PPT
Kushal Data Warehousing PPTKushal Data Warehousing PPT
Kushal Data Warehousing PPT
 
Applied Data Science Part 3: Getting dirty; data preparation and feature crea...
Applied Data Science Part 3: Getting dirty; data preparation and feature crea...Applied Data Science Part 3: Getting dirty; data preparation and feature crea...
Applied Data Science Part 3: Getting dirty; data preparation and feature crea...
 
Global IT Outsourcing case study
Global IT Outsourcing case studyGlobal IT Outsourcing case study
Global IT Outsourcing case study
 
Choosing the Right Business Intelligence Tools for Your Data and Architectura...
Choosing the Right Business Intelligence Tools for Your Data and Architectura...Choosing the Right Business Intelligence Tools for Your Data and Architectura...
Choosing the Right Business Intelligence Tools for Your Data and Architectura...
 
EvalOSS : A Framework to Evaluate Open Source Software
EvalOSS : A Framework to Evaluate Open Source SoftwareEvalOSS : A Framework to Evaluate Open Source Software
EvalOSS : A Framework to Evaluate Open Source Software
 
Power BI vs Tableau vs Cognos: A Data Analytics Research
Power BI vs Tableau vs Cognos: A Data Analytics ResearchPower BI vs Tableau vs Cognos: A Data Analytics Research
Power BI vs Tableau vs Cognos: A Data Analytics Research
 
Resume anh chu data analyst
Resume anh chu data analystResume anh chu data analyst
Resume anh chu data analyst
 
Modern Data Discovery and Integration in Retail Banking
Modern Data Discovery and Integration in Retail BankingModern Data Discovery and Integration in Retail Banking
Modern Data Discovery and Integration in Retail Banking
 
Modern Data Discovery and Integration in Insurance
Modern Data Discovery and Integration in InsuranceModern Data Discovery and Integration in Insurance
Modern Data Discovery and Integration in Insurance
 
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricUsing a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
 
Business Intelligence Presentation 1 (15th March'16)
Business Intelligence Presentation 1 (15th March'16)Business Intelligence Presentation 1 (15th March'16)
Business Intelligence Presentation 1 (15th March'16)
 
Fms invited talk_2018 v5
Fms invited talk_2018 v5Fms invited talk_2018 v5
Fms invited talk_2018 v5
 
Solution Architecture US healthcare
Solution Architecture US healthcare Solution Architecture US healthcare
Solution Architecture US healthcare
 
Introduction to XMILE slides
 Introduction to XMILE slides Introduction to XMILE slides
Introduction to XMILE slides
 
Resume anh chu
Resume anh chuResume anh chu
Resume anh chu
 

Similaire à Using dask for large systems of financial models

Intro to big data and applications - day 2
Intro to big data and applications - day 2Intro to big data and applications - day 2
Intro to big data and applications - day 2Parviz Vakili
 
"Lessons learned using Apache Spark for self-service data prep in SaaS world"
"Lessons learned using Apache Spark for self-service data prep in SaaS world""Lessons learned using Apache Spark for self-service data prep in SaaS world"
"Lessons learned using Apache Spark for self-service data prep in SaaS world"Pavel Hardak
 
Lessons Learned Using Apache Spark for Self-Service Data Prep in SaaS World
Lessons Learned Using Apache Spark for Self-Service Data Prep in SaaS WorldLessons Learned Using Apache Spark for Self-Service Data Prep in SaaS World
Lessons Learned Using Apache Spark for Self-Service Data Prep in SaaS WorldDatabricks
 
Rahul_Resume
Rahul_ResumeRahul_Resume
Rahul_ResumeRahul R
 
Predictive Analytics Project in Automotive Industry
Predictive Analytics Project in Automotive IndustryPredictive Analytics Project in Automotive Industry
Predictive Analytics Project in Automotive IndustryMatouš Havlena
 
BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...
BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...
BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...Big Data Value Association
 
DAMA Webinar: Turn Grand Designs into a Reality with Data Virtualization
DAMA Webinar: Turn Grand Designs into a Reality with Data VirtualizationDAMA Webinar: Turn Grand Designs into a Reality with Data Virtualization
DAMA Webinar: Turn Grand Designs into a Reality with Data VirtualizationDenodo
 
Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018
Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018 Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018
Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018 DataBench
 
eccenca CorporateMemory - Semantically integrated Enterprise Data Lakes
eccenca CorporateMemory - Semantically integrated Enterprise Data Lakeseccenca CorporateMemory - Semantically integrated Enterprise Data Lakes
eccenca CorporateMemory - Semantically integrated Enterprise Data LakesLinked Enterprise Date Services
 
Reuse Strategy for MBSE Data - GPDIS 2022
Reuse Strategy for MBSE Data - GPDIS 2022Reuse Strategy for MBSE Data - GPDIS 2022
Reuse Strategy for MBSE Data - GPDIS 2022SodiusWillert
 
FIWARE Wednesday Webinars - Cities as Enablers of the Data Economy: Smart Dat...
FIWARE Wednesday Webinars - Cities as Enablers of the Data Economy: Smart Dat...FIWARE Wednesday Webinars - Cities as Enablers of the Data Economy: Smart Dat...
FIWARE Wednesday Webinars - Cities as Enablers of the Data Economy: Smart Dat...FIWARE
 
Using Linkurious in your Enterprise Architecture projects
Using Linkurious in your Enterprise Architecture projectsUsing Linkurious in your Enterprise Architecture projects
Using Linkurious in your Enterprise Architecture projectsLinkurious
 
Examination into it & competitive strategies within construction
Examination into it & competitive strategies within constructionExamination into it & competitive strategies within construction
Examination into it & competitive strategies within constructionsai0513
 
Data & Analytics at Scale
Data & Analytics at ScaleData & Analytics at Scale
Data & Analytics at ScaleWalid Mehanna
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Caserta
 
Webinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDBWebinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDBMongoDB
 

Similaire à Using dask for large systems of financial models (20)

Intro to big data and applications - day 2
Intro to big data and applications - day 2Intro to big data and applications - day 2
Intro to big data and applications - day 2
 
"Lessons learned using Apache Spark for self-service data prep in SaaS world"
"Lessons learned using Apache Spark for self-service data prep in SaaS world""Lessons learned using Apache Spark for self-service data prep in SaaS world"
"Lessons learned using Apache Spark for self-service data prep in SaaS world"
 
Lessons Learned Using Apache Spark for Self-Service Data Prep in SaaS World
Lessons Learned Using Apache Spark for Self-Service Data Prep in SaaS WorldLessons Learned Using Apache Spark for Self-Service Data Prep in SaaS World
Lessons Learned Using Apache Spark for Self-Service Data Prep in SaaS World
 
Rahul_Resume
Rahul_ResumeRahul_Resume
Rahul_Resume
 
Predictive Analytics Project in Automotive Industry
Predictive Analytics Project in Automotive IndustryPredictive Analytics Project in Automotive Industry
Predictive Analytics Project in Automotive Industry
 
srikanthg
srikanthgsrikanthg
srikanthg
 
BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...
BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...
BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...
 
DAMA Webinar: Turn Grand Designs into a Reality with Data Virtualization
DAMA Webinar: Turn Grand Designs into a Reality with Data VirtualizationDAMA Webinar: Turn Grand Designs into a Reality with Data Virtualization
DAMA Webinar: Turn Grand Designs into a Reality with Data Virtualization
 
Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018
Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018 Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018
Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018
 
eccenca CorporateMemory - Semantically integrated Enterprise Data Lakes
eccenca CorporateMemory - Semantically integrated Enterprise Data Lakeseccenca CorporateMemory - Semantically integrated Enterprise Data Lakes
eccenca CorporateMemory - Semantically integrated Enterprise Data Lakes
 
Reuse Strategy for MBSE Data - GPDIS 2022
Reuse Strategy for MBSE Data - GPDIS 2022Reuse Strategy for MBSE Data - GPDIS 2022
Reuse Strategy for MBSE Data - GPDIS 2022
 
FIWARE Wednesday Webinars - Cities as Enablers of the Data Economy: Smart Dat...
FIWARE Wednesday Webinars - Cities as Enablers of the Data Economy: Smart Dat...FIWARE Wednesday Webinars - Cities as Enablers of the Data Economy: Smart Dat...
FIWARE Wednesday Webinars - Cities as Enablers of the Data Economy: Smart Dat...
 
Using Linkurious in your Enterprise Architecture projects
Using Linkurious in your Enterprise Architecture projectsUsing Linkurious in your Enterprise Architecture projects
Using Linkurious in your Enterprise Architecture projects
 
Vadlamudi saketh30 (ml)
Vadlamudi saketh30 (ml)Vadlamudi saketh30 (ml)
Vadlamudi saketh30 (ml)
 
Resume
ResumeResume
Resume
 
Resume
ResumeResume
Resume
 
Examination into it & competitive strategies within construction
Examination into it & competitive strategies within constructionExamination into it & competitive strategies within construction
Examination into it & competitive strategies within construction
 
Data & Analytics at Scale
Data & Analytics at ScaleData & Analytics at Scale
Data & Analytics at Scale
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
 
Webinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDBWebinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDB
 

Dernier

WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2
 
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxBUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxalwaysnagaraju26
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benonimasabamasaba
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in sowetomasabamasaba
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
tonesoftg
tonesoftgtonesoftg
tonesoftglanshi9
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 

Dernier (20)

WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security Program
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
 
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxBUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
Abortion Pill Prices Boksburg [(+27832195400*)] 🏥 Women's Abortion Clinic in ...
Abortion Pill Prices Boksburg [(+27832195400*)] 🏥 Women's Abortion Clinic in ...Abortion Pill Prices Boksburg [(+27832195400*)] 🏥 Women's Abortion Clinic in ...
Abortion Pill Prices Boksburg [(+27832195400*)] 🏥 Women's Abortion Clinic in ...
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaS
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024

Using dask for large systems of financial models

  • 1. Using Dask for large systems of financial models. Petr Wolf, Dhivya Shankaranarayan. PyData NYC 2018
  • 2. 2 | Using Dask for large systems of Financial models | PyData NYC 2018 | Unrestricted
      Disclaimer: The views expressed here are those of the authors and do not necessarily represent or reflect the views of Barclays.
  • 3. Agenda
      1. Financial models for Planning and Stress-Testing
      2. Modular approach to model development and integration
      3. Benefits of an integrated solution
      4. Notebook example
      5. Role of Open Source Software
  • 4. Links
      Contacts
      • https://www.linkedin.com/in/dhivyanarayan
      • https://www.linkedin.com/in/petrwolf
      Slides
      • https://www.slideshare.net/PetrWolf1/using-dask-for-large-systems-of-financial-models
      Notebook
      • https://github.com/PetrWolf/pydata_nyc_2018
  • 5. Financial models for Planning and Stress-Testing
      Financial planning: to model the change in a firm’s financial situation over time, based on various inputs and scenarios
      Stress testing: to “measure the resilience of banks to hypothetical adverse scenarios” [1]
      Specific requirements
      • Robust development and internal validation processes
      • Transparent methodologies for business sign-off of models and projected results
      [1] Dent, Kieran & Westwood, Ben & Segoviano, Miguel, “Stress testing of banks: an introduction”, Bank of England Quarterly Bulletin, vol. 56(3)
  • 6. Traditional organizations do not scale up to modern requirements
      • Multiple platforms with siloed model development and execution
      • Manual and error-prone execution
      • Lack of reproducibility and of a complete audit trail
      • Limited ability to perform end-to-end sensitivity and what-if analysis
      [Diagram labels: Business Planning, Stress Testing, Risk; Macro-Economic Factors, Current Financials, Manual Overrides; Business 1 models, Business 2 models, Business 3 models; Ad-hoc Model Development]
  • 7. Typical modeling approach mixes several separate concerns
      • Data processing, validation, modeling logic, orchestration, reporting and visualization are co-mingled
      • Potential inconsistency across model inputs and outputs
      • Lack of clarity on units, benchmarks and assumptions
  • 8. Modular approach for productive model development and a flexible integration process
  • 9. Modular approach to model development and integration
      1. Establish shared conventions
         – Model using pure functions (aligns with Dask best practices)
         – Define “public” data labels as the interface to/from functions
      2. Extend syntax for easier development
         – Annotate functions (using decorators) to label inputs and outputs
      3. Automate composition
         – Automatically connect functions into Dask graphs based on data dependencies (using the Custom Graph API)
  • 10. Automated dependency graphs with discoverable interface
      [Diagram: Industry Size and Market Share feed f, which produces Volume; Volume and Margin feed g, which produces Profit; Volume also feeds further nodes (…)]
  • 11. Dask exposes its low-level interface for direct graph creation
      • Dask is a light-weight open-source library for parallel computing in Python
      • Scales from a laptop to large clusters (1000s of cores)
      • Popular in the Python scientific community [1], including banks [2]
      • High-level “pandas-like” interfaces, integration with scikit-learn
      • Exposes the Custom Graph API
      [1] https://dask-stories.readthedocs.io
      [2] https://www.anaconda.com/blog/developer-blog/credit-modeling-with-dask/
  • 12. The tale of 3 APIs: High-level API using dask.dataframe
      import dask.dataframe
      df = dask.dataframe.read_csv('*.csv')
      # Build the lazy aggregation first, then compute that result (not the raw frame)
      result = df.groupby(df.account_id).balance.sum()
      result.compute()
  • 13. The tale of 3 APIs: Explicit graph creation using the Custom Graph API or @dask.delayed
      import dask
      def inc(x):
          return x + 1
      def mul(x, y):
          return x * y
      # A Dask graph is a dict mapping keys to values or (function, *args) task tuples
      graph = {
          'z': (inc, 5),
          'w': (mul, 'z', 7),
      }
      dask.get(graph, 'w')
  • 14. Extend Python functions with annotations of inputs/outputs
      @model(ind_size="Industry Size", mkt_share="Market Share", output="Volume")
      def f(ind_size, mkt_share):
          return ind_size * mkt_share
      In each keyword argument, the name (ind_size) is the internal parameter name and the value ("Industry Size") is the external data label.
      ↓ generate
      Dask graph entry: {"Volume": (f, "Industry Size", "Market Share")}
      [Diagram: Industry Size and Market Share feed f, which produces Volume]
  • 15. The tale of 3 APIs: Automatic graph generation using @model
      @model(ind_size="Industry Size", ..., output="Volume")
      def f(ind_size, mkt_share):
          return ind_size * mkt_share
      @model(vol="Volume", margin="Margin", output="Profit")
      def g(vol, margin):
          return vol * margin
      graph = join(f, g, ...)
      dask.get(graph, 'Profit')
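The slides do not show how `@model` and `join` are implemented. The following is a minimal sketch, assuming the decorator only records the label mapping on the function and `join` emits one graph entry per function. The `evaluate` helper and the input values are hypothetical; `evaluate` stands in for `dask.get` so that the example runs without Dask installed.

```python
import inspect


def model(output, **inputs):
    """Hypothetical decorator: record which external data label feeds each parameter."""
    def decorate(func):
        func.output = output   # label of the value the function produces
        func.inputs = inputs   # e.g. {"ind_size": "Industry Size"}
        return func
    return decorate


def join(*funcs):
    """Build a Dask-style graph dict: {output label: (function, input labels...)}."""
    graph = {}
    for func in funcs:
        if func.output in graph:
            raise ValueError(f"duplicate producer for {func.output!r}")
        # Order the labels by the function's positional parameters.
        params = inspect.signature(func).parameters
        graph[func.output] = (func, *(func.inputs[name] for name in params))
    return graph


def evaluate(graph, key):
    """Stand-in for dask.get(graph, key): compute a key from its dependencies."""
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        return func(*(evaluate(graph, a) if isinstance(a, str) and a in graph else a
                      for a in args))
    return task  # plain value, i.e. an external input


@model(ind_size="Industry Size", mkt_share="Market Share", output="Volume")
def f(ind_size, mkt_share):
    return ind_size * mkt_share


@model(vol="Volume", margin="Margin", output="Profit")
def g(vol, margin):
    return vol * margin


graph = join(f, g)
# Illustrative input values only; they are not taken from the talk.
graph.update({"Industry Size": 1000, "Market Share": 0.25, "Margin": 0.5})
profit = evaluate(graph, "Profit")
```

Because the resulting dict follows Dask's graph convention (keys mapped to values or `(function, *args)` tuples), it could also be handed to `dask.get(graph, "Profit")` when Dask is available.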
  • 16. The tale of 3 APIs: comparison
      – High Level API
          Structure: df = dd.read_csv('*.csv'); df.groupby(df.account_id).balance.sum()
      – Custom Graph API
          Functions: def inc(x): ...; def mul(x, y): ...
          Structure: graph = {'z': (inc, 5), 'w': (mul, 'z', 7), ...}
      – @model
          Functions: @model(ind_size=...) def f(ind_size, mkt_share): ...; @model(vol=...) def g(vol, margin): ...
          Structure: graph = join(f, g, ...)
      [Slide labels: Algorithm, Operations, Dependencies]
  • 17. Global Data Dictionary for consistency and clarity
      Keep a central definition of all data elements:
      • Industry Size (int, units: “mUSD”): Annual volume on the US market
      • Market Share (float, units: “%”): …
      • Volume …
      • Margin …
      • Profit …
      [Diagram: Industry Size and Market Share feed f, which produces Volume; Volume and Margin feed g, which produces Profit]
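One way such a dictionary could be represented is sketched below. This is an assumption, not the authors' implementation: `DataElement`, `undefined_labels` and `check_value` are hypothetical names. Only the two entries whose type and units appear on the slide are filled in; the elided entries (Volume, Margin, Profit) are deliberately left out, so the check reports them as missing.

```python
from dataclasses import dataclass
from operator import mul


@dataclass(frozen=True)
class DataElement:
    """Hypothetical record for one entry of the global data dictionary."""
    label: str
    dtype: type
    units: str
    description: str = ""


DATA_DICTIONARY = {
    element.label: element
    for element in (
        DataElement("Industry Size", int, "mUSD", "Annual volume on the US market"),
        DataElement("Market Share", float, "%"),
    )
}


def undefined_labels(graph, dictionary):
    """Return labels used in a graph dict that have no dictionary entry."""
    used = set(graph)
    for task in graph.values():
        if isinstance(task, tuple) and task and callable(task[0]):
            used.update(arg for arg in task[1:] if isinstance(arg, str))
    return sorted(used - set(dictionary))


def check_value(label, value, dictionary=DATA_DICTIONARY):
    """Raise TypeError if a value does not match the declared type of its label."""
    element = dictionary[label]
    if not isinstance(value, element.dtype):
        raise TypeError(f"{label!r} expects {element.dtype.__name__} ({element.units})")
    return value


graph = {"Volume": (mul, "Industry Size", "Market Share")}
missing = undefined_labels(graph, DATA_DICTIONARY)  # "Volume" has no entry yet
```

Running a check like this whenever graphs are composed would surface labels that models use but nobody has defined centrally.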
  • 18. Bottom-up composition for incremental build-out
      • Function: single unit of modeling logic (formula, calibrated model, business process). Covered by unit tests.
      • Model: multiple functions connected via data dependencies, representing products or business segments. Covered by regression tests.
      • Ensemble: models integrated together to represent entire business units, legal entities or modeling areas. Covered by integration tests.
      [Diagram: example function Z = X + Y, with inputs X and Y and output Z]
  • 19. Benefits: Advantages of an integrated solution
  • 20. Combined dependency graph for flexible development
      • Automated connections make changes simple:
         – Compose, break up, re-use, replace
         – Champion/Challenger, version upgrade/rollback
      • Separation of code and data makes testing flexible:
         – Use of different inputs and environments (development, production)
      • Open to adjustments and extensions:
         – “Overrides” (absolute or relative) to selected nodes
         – Node/edge properties (units, types, …)
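Absolute and relative overrides on selected nodes can be sketched as plain graph rewrites. This is an illustration under stated assumptions: `override_abs`, `override_rel`, `evaluate` and all numbers are hypothetical, and `evaluate` is a small stand-in for `dask.get`.

```python
from operator import mul


def evaluate(graph, key):
    """Stand-in for dask.get(graph, key): compute a key from its dependencies."""
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        return func(*(evaluate(graph, a) if isinstance(a, str) and a in graph else a
                      for a in args))
    return task  # plain value, i.e. an external input


def override_abs(graph, label, value):
    """Absolute override: replace the node with a fixed value."""
    return {**graph, label: value}


def override_rel(graph, label, factor):
    """Relative override: keep the original node under a new key and scale it."""
    original = f"{label} (pre-override)"
    return {**graph, original: graph[label], label: (mul, original, factor)}


# Illustrative graph and values; they are not taken from the talk.
base = {
    "Industry Size": 1000,
    "Market Share": 0.25,
    "Margin": 0.5,
    "Volume": (mul, "Industry Size", "Market Share"),
    "Profit": (mul, "Volume", "Margin"),
}

base_profit = evaluate(base, "Profit")
abs_profit = evaluate(override_abs(base, "Volume", 400), "Profit")
rel_profit = evaluate(override_rel(base, "Volume", 1.5), "Profit")
```

Both helpers return a new dict and leave the base graph untouched, which keeps the unadjusted run reproducible alongside the overridden one.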
  • 21. Integrated framework for end-to-end analysis and robust execution
      • Dependency analysis
      • Sensitivity testing (external inputs and internal assumptions or intermediate values)
      • Back-testing
      • “What-if” analysis
      • Reverse stress-testing
      • Support in decision making
      • Reinforcement learning
      [Diagram labels: Macro Economic Variables, Overrides, Financials; Business 1 models, Business 2 models, Business 3 models; Scenario analysis, Sensitivity analysis, …; Business Planning, Stress Testing, …, Risk; Integrated Framework; Review & Analysis]
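Two of the analyses listed above, dependency analysis and sensitivity testing, follow almost directly from having the whole system in one graph. The sketch below assumes a graph stored as a plain dict; `upstream`, `sensitivity`, `evaluate` and the numbers are hypothetical, and `evaluate` is a small stand-in for `dask.get`.

```python
from operator import mul


def evaluate(graph, key):
    """Stand-in for dask.get(graph, key): compute a key from its dependencies."""
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        return func(*(evaluate(graph, a) if isinstance(a, str) and a in graph else a
                      for a in args))
    return task  # plain value, i.e. an external input


def upstream(graph, key):
    """Dependency analysis: every label that `key` depends on, directly or indirectly."""
    deps = set()
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        for arg in task[1:]:
            if isinstance(arg, str) and arg in graph:
                deps.add(arg)
                deps |= upstream(graph, arg)
    return deps


def sensitivity(graph, target, label, rel_bump):
    """Change in `target` when the external input `label` is bumped by rel_bump.

    Only valid for nodes holding plain numeric values (external inputs).
    """
    base = evaluate(graph, target)
    bumped = {**graph, label: graph[label] * (1 + rel_bump)}
    return evaluate(bumped, target) - base


# Illustrative graph and values; they are not taken from the talk.
graph = {
    "Industry Size": 1000,
    "Market Share": 0.25,
    "Margin": 0.5,
    "Volume": (mul, "Industry Size", "Market Share"),
    "Profit": (mul, "Volume", "Margin"),
}

profit_inputs = upstream(graph, "Profit")
margin_effect = sensitivity(graph, "Profit", "Margin", 0.5)
```

The same pattern extends to what-if analysis: swap any set of input values in a copy of the graph and re-evaluate the targets of interest.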
  • 23. Role of Open Source Software
      Huge thanks to the OSS maintainers of modeling libraries (Pandas, NumPy, SciPy, Dask, NetworkX), tools (Jupyter, Sphinx, PyLint, PyTest) and many others.
      The role of OSS in large financial corporations has changed, and the change is accelerating.
      • Stages: Prohibited, Isolated, Supported, Encouraged
      • Forms of engagement: Consumption, Contribution, Publication
      How can you help?
      • Highlight use of open source in projects
      • Establish a firm-wide policy
      • Find suitable pilot projects
  • 24. Summary
      • Large financial modeling projects face unique challenges arising from a disparity of models and data
      • Strict rules and technology can help in building a flexible development and execution infrastructure
      • Automation opens doors to further analysis and re-use that would otherwise not be possible
      • Open source is a key enabler for even small teams to build great projects quickly