This talk was given at H2O World 2018 NYC and can be viewed here: https://youtu.be/xc3j20Om3UM
Description:
Data science is indeed one of the sexy jobs of the 21st century. But it is also a lot of hard work. And the hard work is seldom about the math or the algorithms. It is about building relevant machine learning products for the real world. We will go over some of the must-haves as you take your machine learning model out of the sandbox and make it work in the big, bad world outside.
Speaker's Bio:
Krish Swamy is an experienced professional with deep skills in applying analytics and big data capabilities to challenging business problems and driving customer insights. Krish's analytic experience includes marketing and pricing, credit risk, digital analytics and, most recently, big data analytics and data transformation. His key experience lies in banking and financial services and the digital customer experience domain, with a background in management consulting. Other key skills include influencing organizational change towards a data and analytics driven culture, and building teams of analysts, statisticians and data scientists.
Helping data scientists escape the seduction of the sandbox - Krish Swamy, Wells Fargo
1. Krish Swamy
Senior Vice President, Enterprise Analytics and Data Science
Wells Fargo
linkedin.com/in/krishswamy
Helping Data Scientists escape the seduction of the SandBox
2. Krish Swamy
• Passionate about transforming companies and adding value to customers using the power of information
• Not a data scientist – but a Data.Scientist
• If I were to retire right at this moment – I’d do something interesting around classical music appreciation for the masses
3. Why do Data Scientists love the Sandbox?
• The data is always clean
• You have access to the best “toys” (tools)
• The models ALWAYS look great
• (what’s not to like about it????)
4. Business value of data scientists in a sandbox
- $$ + $$$$
How do you get data scientists out of the sandbox and into the real world?
5. At Wells Fargo, we started out by building out a world-class machine learning platform
Platform components shown in the diagram:
• Data Provisioning Points
• Model training area (Spark, H2O, other ML packages)
• Model scoring area
• Consuming Applications
• Enterprise Service Bus
• GitHub
6. Next, we obsessed about operationalizing our models
• Business and People Integration
• Systems Integration
• Code and Data Integration
A technical “solution” which is INCOMPLETE
7. Business Integration - solving the right problem in an effective, sustainable manner
• Identifying problems that have strategic priority for the business
  • Engaging with business partners at a strategic level to understand their key immediate priorities and where machine learning can make a difference
• Seeing the problem through the lens of the decision maker
  • Understand the decision making process and its constraints, so that a solution is immediately relevant
• Engaging closely with the business SMEs throughout the model development lifecycle
  • “Walking” through the solution with business users so that their needs are being addressed
  • Engage with business users formally every two weeks throughout the life of the project
8. Systems integration – identifying (and solving) for technology dependencies
• Early conversations with architecture to ensure alignment
  • (Most) Data Science solutions always have data management implications
  • Also operational application implications where the model score is expected to “live”
• Design for three kinds of model deployment
  • Planning for three generic forms of operationalizing solutions
9. It is important to integrate with the data flows in the company
Diagram labels: Applications (shown four times); Operational Data Flows; Analytic Datamarts; ML Model Training; Data Transformation and Cleansing; ML Model Scoring
10. Three general forms of model deployment
• Batch Scored: where what you need is historical data. Example: targeting models for marketing campaigns, house price predictions, risk management.
• Model as a Service: predictions rely on real time + historical data. Example: fraud prediction.
• Embedded Models or “AI at the Edge”: predictions rely on real time data. Example: home assistants.
Diagram labels: Data; Application; {REST} API; ML Model Scoring
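To make the three deployment forms concrete, here is a minimal sketch in Python. It is not the platform described in the talk: the model, its weights and the feature names (`amount`, `num_prior_txns`) are hypothetical, and the service form is reduced to a function that maps a JSON request body to a JSON response so that no web framework is needed. The point it illustrates is that one scoring function can back all three forms.

```python
import json
import math

# Hypothetical model: a tiny logistic regression with made-up weights.
WEIGHTS = {"amount": 0.002, "num_prior_txns": -0.05}
BIAS = -1.0


def score(features: dict) -> float:
    """Score one record; returns a probability strictly between 0 and 1."""
    z = BIAS + sum(w * float(features[name]) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def batch_score(rows: list) -> list:
    """Batch form: score a whole set of historical records in one pass."""
    return [{**row, "score": score(row)} for row in rows]


def handle_request(body: str) -> str:
    """Model-as-a-service form: JSON request body in, JSON response out.

    A real service would expose this behind a REST endpoint; the HTTP
    layer is left out of this sketch.
    """
    features = json.loads(body)
    return json.dumps({"score": score(features)})


# Embedded form: the application calls score() directly, in process,
# on data it has just observed.
if __name__ == "__main__":
    record = {"amount": 500, "num_prior_txns": 10}
    print(batch_score([record]))
    print(handle_request(json.dumps(record)))
    print(score(record))
```

Keeping a single `score` function behind every entry point is one way to avoid the three forms drifting apart; which form fits depends on whether the prediction needs historical data, real time data, or both.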
11. Code and data integration – ensuring efficient and complete translation of model intent
Code Integration
• Ensuring consistent tooling between model development and run-time environments
• Minimize hand-offs between data science and tech, minimize code rewrite
• Extensive use of H2O and PySpark
Data Integration
• In (almost) every case, model scoring and model training data will come from different sources
• In that case,
12. You don’t want just a math whiz … they also need to be a solid process engineer
Model Prediction = F(Historical Data, Near Real Time Data, Real Time Data)
Diagram (axis: Time): Model Training Sample → MODEL; Data Gap; Model Scoring Sample → MODEL
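The slide does not spell out how the "Data Gap" is measured, so the following Python sketch rests on two assumptions: that the gap has a time component (how long after the training sample the scoring sample is drawn) and a schema component (features present in training but missing from the scoring feed, since the two usually come from different sources). The function names and example columns are hypothetical.

```python
from datetime import date


def schema_gap(training_columns, scoring_columns):
    """Compare the features used in training with those available at scoring."""
    train, scoring = set(training_columns), set(scoring_columns)
    return {
        # Features the model expects but the scoring feed does not supply.
        "missing_at_scoring": sorted(train - scoring),
        # Features the scoring feed supplies but the model never saw.
        "unused_at_scoring": sorted(scoring - train),
    }


def time_gap_days(training_end: date, scoring_date: date) -> int:
    """Days elapsed between the end of the training sample and scoring."""
    return (scoring_date - training_end).days


if __name__ == "__main__":
    print(schema_gap(["balance", "tenure", "region"],
                     ["tenure", "region", "channel"]))
    print(time_gap_days(date(2018, 1, 31), date(2018, 6, 30)))
```

A check like this could run before a model is promoted from training to scoring, so that a missing feature or a stale training window is caught as a process issue rather than discovered in production.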
13. And finally, it is critical to get the right data scientists and put them in the right structure
• The “traditional” Data Scientist
• Additionally,
  • Process Discipline
  • Skepticism
  • Tenacity
14. To recap, getting data scientists out of the sandbox depends on 3 things
• Build the top-down Strategic Alignment for the program
• Obsess about Operationalization
• Hire and Organize great talent
15. We’re hiring!
• Visit http://www.wellsfargo.com/about/careers to apply!
Job ID   Job Title
5408830  Data Scientist/Analytic Consultant 3
5398760  Data Scientist/Analytic Consultant 4
5398594  Data Scientist/Analytic Consultant 4
5398649  Senior Data Scientist/Analytic Consultant 5
5397949  Senior Data Scientist/Analytic Consultant 5

Job ID   Job Title
5406935  Apps Systems Engineer 5 – Java Developer
5406943  Analytic Consultant 5 – Business Intelligence Engineer
5408557  Machine Learning Platform – Linux, Python, R Administrator
5408567  ASE6 - AI Platform – Spark & Elastic Search Administrator
5408593  AI Platform Engineer - HPC/Linux Administrator – Apps Systems Engineer 5
5401110  ML & NLP Data Engineer
5405910  ASE6 AI Platform Engineer
5406923  Apps Systems Engineer

Data Management & Insights – Enterprise Analytics & Data Science
Innovation Group – Artificial Intelligence Enterprise Solutions
16. Thank you - and questions!!!
• Krishnan.swamy@wellsfargo.com