How to Make Your Data Scientists Happy
A use-case backed approach for enabling data science in enterprise
April 2018
ANACONDACON 2018
HUSSAIN SULTAN
WASHINGTON DC
Leader in computational
Python development and
Data Science
Amazon and Capital One
Consulting clients: leading
Fintech lenders and
mega-regional banks
TIM HORAN
WASHINGTON DC
10 years of consumer lending
Led US Credit Card Valuations
at Capital One
Consulting clients: leading
marketplace installment loan
lender and global 100 banks
Introduction
3
Modern Analytics
Analytics and data management technology have progressed significantly in the last 10 years
• Explosion of Data: 90% of today's data was created in the last two years¹
• Cloud Computing: $219.6 billion spent globally on public cloud services in 2016 and predicted to be $411 billion by 2020²
• Software Development: The line between software development and sustainable analysis is blurring
• Open Source: The hive-mind of open source clearly has a space in modern analytics as enterprise solutions build on top and around it
• Predictive Analytics: Low-cost compute and storage make Machine Learning and Artificial Intelligence accessible
• Infrastructure Automation: By the end of 2018, spending on IT-as-a-Service for data centers, software, and services will be just under $550 billion worldwide³
¹ IBM 10 Key Marketing Trends for 2017 - https://ibm.co/2y0r7Ee
² Gartner Press Release - http://gtnr.it/2Fw5LmJ
³ Deloitte Technology, Media, and Telecommunications Predictions 2017 - http://bit.ly/2jMYdwm
4
In 2014, Gartner Research predicted 60% of
Big Data projects through 2017 would be failures.
When 2017 rolled around ...
Despite significant investment by enterprises to embrace
Big Data and modern analytics, most efforts are failing.
6
We blame unhappy Data Scientists
7
Let’s start with a game
9
Who are your Data Scientists, and what do they do?
Roles: Biz Analyst, Data Scientist, Developer, Data Engineer, DevOps
Responsibilities:
• Business Insight Generation
• Model Building
• Insight / Model Deployment
• Analytical Tool Creation
• Data Science Enablement
• Data Management
Data Scientists play a critical bridge role between
Biz Analysts and traditional IT roles in enterprise
Deployment in enterprise requires the
most coordination across teams
¹ Leveraging Base Framework from Anaconda – Journey to Open Data Science - http://bit.ly/2FyvpHD
12
How to make your Data Scientists happy
14
Data Scientists want to drive change in their organization using Data
It begins with the first word, Data …
• Raw data handled and stored consistently to eliminate data silos
• Metadata readily available, in particular, lineage working backward to raw data sources
• A well understood and thoughtful data access process
Tools to get the job done …
• Open source first and foremost (Python/Anaconda, R)
• Scaled Data Science platform to enable interactive exploration and visualization
• Thoughtful and well understood open source governance process
Transparent path from insight to impact …
• Automated workflows to deploy new insights to market and monitor results
• At minimum, transparency on how to bring insights to market/production
A path from insight to implementation is consistently the largest gap to successful “Big Data” / modern analytics projects.
16
A Common “Big Data” Project Life Cycle
[Diagram: two parallel tracks lead to Production Infrastructure.
Enterprise BAU (Business as Usual) Solution: Biz Analyst / Data Scientists; Analytics and Monitoring Stack; Implementation Process.
Parallel Modernization Lab or Center of Excellence: New ETL Process; On-Prem Hadoop or Cloud based data solution; Scaled Data Science Environment; Data Scientists; New Insights (Model, Strategy); New Implementation Process.]
Common Challenge #1
• Key performance indicator for new ETL focused on moving as much data into lake as possible
• Data landing with limited metadata or challenging structures
• BAU solution not built on raw schema may not have been re-created in new ETL process
18
Common Challenge #2
• Translation required due to separate development environments
• New technology implemented on legacy infrastructure creates unexpected hurdles or brick walls
• Production implementation requires buy-in that prototypes or proofs of concept don’t require
19
Recommended Approach
Our modern analytics and Big Data engagements center around an effective use case that informs all software, infrastructure, and organizational investments.
• Identify a Use Case
• Modernize analytics infrastructure as needed
• Build use case as an iteratively improving product
• Sustain the new product and infrastructure
20
Identify a strategically important, actionable, material use case to gain support and guide your investment
Strategically Important
• Does the use case align with corporate imperatives?
• Will its success open the door for more use cases in your direct team and across the broader organization?
Actionable
• Will insights or results from the use case lead to in-market changes?
• Can insights or results drive change quickly and be iteratively improved over time?
Material
• Can insights or results from the use case drive material impact to the business?
21
To illustrate the approach, a specific use case: a marketing response propensity model
[Diagram: Analytical Database; Biz Analyst / Data Scientists; Linear Response Model Specifications; Marketing Targeting Production Intent; 3rd Party Implementation & Data Partner (Credit Bureau); Marketed Prospects; Non-Marketed Prospects]
Modern Analytics Use Case Litmus Test
Strategically Important: Response model used regularly to target
marketing spend – driving the growth of a critical business.
23
Modern Analytics Use Case Litmus Test
Actionable: There is opportunity to leverage new machine learning
techniques to build models that typically outperform traditional linear
response models. It is unclear whether our implementation partner can support
new model types.
24
Modern Analytics Use Case Litmus Test
Material: Determined by measuring the net incremental responders
generated when the model is implemented. If the juice is not worth
the squeeze, don’t invest.
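The "Material" test above can be made concrete with a small calculation. The sketch below is illustrative only: the talk does not define how net incremental responders are computed, so the definition used here (response-rate lift of a challenger model over the legacy model, scaled to the legacy marketing volume), the campaign counts, and the names `CampaignResult`, `net_incremental_responders`, and `worth_the_squeeze` are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class CampaignResult:
    """Outcome of one targeting strategy on a test campaign (hypothetical data)."""
    prospects_marketed: int
    responders: int

    @property
    def response_rate(self) -> float:
        return self.responders / self.prospects_marketed


def net_incremental_responders(legacy: CampaignResult, challenger: CampaignResult) -> float:
    """Responders gained by the challenger, scaled to the legacy marketing volume.

    Assumed definition: (challenger rate - legacy rate) * legacy volume.
    """
    rate_lift = challenger.response_rate - legacy.response_rate
    return rate_lift * legacy.prospects_marketed


def worth_the_squeeze(incremental: float, value_per_responder: float, investment: float) -> bool:
    """True when the value of the incremental responders exceeds the investment."""
    return incremental * value_per_responder > investment


# Hypothetical numbers: legacy linear model vs. a challenger tested on a smaller cell.
legacy = CampaignResult(prospects_marketed=100_000, responders=1_000)
challenger = CampaignResult(prospects_marketed=50_000, responders=600)
incremental = net_incremental_responders(legacy, challenger)
```

With these made-up figures the challenger lifts the response rate from 1.0% to 1.2%, which is roughly 200 extra responders at the legacy volume. Whether that clears the bar depends entirely on the assumed value per responder and the cost of the change.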
Legacy process and tools are marred by manual touch-points and lack modern techniques
[Diagram: Source Systems; Raw Data Processed; Data Engineer; Modeling & Analytics Environment; Enterprise Guide; Response Model Passed; Data Scientist; Marketing Targeting Passed; Biz Analyst; Developer & DevOps; Implemented in Production and Compared with Analytics Environment; Production C Environment; Marketed Prospects; Non-Marketed Prospects]
Pain Point #1: Limited modern analytics tool chest for response model building
Pain Point #2: Manual, bespoke testing and go-to-production process
Rather than standing up a whole new process, we focused on the largest pain points and improved them
Changing either the Source Systems or Production Environments had the most interdependencies outside of the use case, so we left them unchanged
[Diagram: Source Systems; Production C Environment; Marketed Prospects; Non-Marketed Prospects]
Replacing the overall Modeling and Analytics environment would have been costly and time consuming, so we stood up a separate Open Source Sandbox
[Diagram adds: Modeling & Analytics Environment; Enterprise Guide; Raw Data Processed; Data Engineer; Open Source Sandbox; Modeling Data Ported; Data Scientist]
To enable Machine Learning Models like GBM (Gradient Boosting Machine), we created an XGBoost model dump to C translation package
[Diagram adds: XGBoost to C Package; Response Model Passed; Data Scientist]
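The talk does not show the translation package itself, so the following is only a minimal sketch of the general idea: walk each tree of a gradient boosted model and emit nested C `if`/`else` statements. It assumes each tree is a dictionary shaped like the entries of XGBoost's JSON model dump (keys `nodeid`, `split`, `split_condition`, `yes`, `no`, `missing`, `children`, `leaf`). The function names `tree_to_c` and `model_to_c`, the `feature_index` mapping, and the example tree are hypothetical, and the base score and link function are deliberately left out.

```python
def tree_to_c(node: dict, feature_index: dict, indent: int = 1) -> str:
    """Render one decision tree as nested C if/else statements."""
    pad = "    " * indent
    if "leaf" in node:
        return f"{pad}return {float(node['leaf'])!r};\n"
    children = {child["nodeid"]: child for child in node["children"]}
    idx = feature_index[node["split"]]
    # The "yes" branch is taken when the feature is below the split value.
    test = f"x[{idx}] < {float(node['split_condition'])!r}"
    # Missing values (NaN) fail the comparison and fall to "no" unless the
    # dump says they should follow the "yes" branch.
    if node.get("missing") == node["yes"]:
        test = f"isnan(x[{idx}]) || {test}"
    return (
        f"{pad}if ({test}) {{\n"
        + tree_to_c(children[node["yes"]], feature_index, indent + 1)
        + f"{pad}}} else {{\n"
        + tree_to_c(children[node["no"]], feature_index, indent + 1)
        + f"{pad}}}\n"
    )


def model_to_c(trees: list, feature_index: dict) -> str:
    """Emit C source with one function per tree plus a summing entry point."""
    parts = ["#include <math.h>\n\n"]
    for i, tree in enumerate(trees):
        parts.append(f"static double tree_{i}(const double *x) {{\n")
        parts.append(tree_to_c(tree, feature_index))
        parts.append("}\n\n")
    calls = " + ".join(f"tree_{i}(x)" for i in range(len(trees))) or "0.0"
    parts.append("/* Raw margin only; base score and link function are applied elsewhere. */\n")
    parts.append(f"double predict_margin(const double *x) {{\n    return {calls};\n}}\n")
    return "".join(parts)


# Hypothetical single-split tree in the assumed dump shape.
example_tree = {
    "nodeid": 0, "split": "f0", "split_condition": 0.5,
    "yes": 1, "no": 2, "missing": 1,
    "children": [
        {"nodeid": 1, "leaf": -0.2},
        {"nodeid": 2, "leaf": 0.3},
    ],
}
c_source = model_to_c([example_tree], {"f0": 0})
```

Generating plain C with no runtime dependency is what lets a model built in an open source sandbox run inside an unchanged C production environment. A real package would also need to check that the generated code scores identically to the original model on a reference data set.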
To lessen the iterative, manual Marketing Targeting intent checks, we deployed testing that verified Excel inputs against production outputs
[Diagram adds: Local Intent Testing; Marketing Targeting Passed; Biz Analyst]
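A local intent test of this kind can be sketched as a table of analyst-specified cases checked against the decision logic. The talk gives no implementation details, so everything below is an assumption: the cases are shown as CSV text (as if exported from the analyst's Excel sheet), the column names are invented, and `production_decision` is a stand-in for the real production targeting code.

```python
import csv
import io

# Hypothetical intent cases, as a business analyst might export them from Excel to CSV.
INTENT_CSV = """case_id,score,segment,expected_decision
1,0.82,prime,market
2,0.15,prime,suppress
3,0.55,near_prime,suppress
"""


def production_decision(score: float, segment: str) -> str:
    """Stand-in for the production targeting logic (hypothetical rule)."""
    threshold = 0.5 if segment == "prime" else 0.6
    return "market" if score >= threshold else "suppress"


def check_intent(csv_text: str, decide) -> list:
    """Return one record per case where the decision disagrees with the stated intent."""
    mismatches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        actual = decide(float(row["score"]), row["segment"])
        if actual != row["expected_decision"]:
            mismatches.append(
                {"case_id": row["case_id"], "expected": row["expected_decision"], "actual": actual}
            )
    return mismatches


mismatches = check_intent(INTENT_CSV, production_decision)
```

An empty `mismatches` list means the logic matches the analyst's intent for every listed case. Running the same check locally before hand-off is what replaces the back-and-forth of manual comparison.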
The initial use case deliverable enabled modern machine learning models and lessened the manual testing previously required
[Diagram: Source Systems; Modeling & Analytics Environment; Enterprise Guide; Open Source Sandbox; XGBoost to C Package; Local Intent Testing; Production C Environment; Marketed Prospects; Non-Marketed Prospects]
36
Remember our Common “Big Data” Project Life Cycle
[Diagram: Enterprise BAU (Business as Usual) Solution and Parallel Modernization Lab or Center of Excellence, each with its own implementation process leading to Production Infrastructure, as on the earlier life cycle slide]
37
Our use case approach often leads to hybrid solutions that get material results as quickly as possible
[Diagram: Initial Use Case Solution. Biz Analyst / Data Scientists; Analytics or Monitoring Stack; Analytics and Monitoring Stack; New Open Source Sandbox; GBM Conversion Routine and Local Intent Testing; Implementation Process; Production Infrastructure]
38
As part of initial launch, we build the use case as a product that can iteratively improve
Illustrative Product Structure
[Diagram: Internal Customer(s); Biz Analysts; Features; Backlog; Product Team; Test, Build, Deploy; Software Engineering Best Practices; Machine Learning Response Model]
• The product team delivers features with a focus on continued improvement, not on getting the product “done”
• Model iteratively improved as more or new data is available
Finalize potential architecture as you iterate
[Diagram: Biz Analyst; Data Scientist; Computational Frameworks; Distributed Compute & Storage; Model Grid Search, Distributed Model Training, Model Conversion; Build New Model; Automated Model Validation; Model Build Package; Anaconda Repository (historical model versions are built and stored for future use); Automated Builds and Job Scheduling; Continuous Integration; Deploy; Databases / 3rd Party Services / Prediction APIs; Marketed Prospects; Non-Marketed Prospects]
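The architecture names an "Automated Model Validation" step without describing it. One plausible reading, sketched here purely as an assumption, is a gate in the build pipeline that scores a newly built model and the current production model on the same holdout data and only allows promotion when the new model is at least as good. The choice of AUC as the metric, the `min_gain` threshold, the function names, and the sample data are all hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one responder and one non-responder")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def validate_candidate(labels, champion_scores, candidate_scores, min_gain=0.0):
    """Return a report saying whether the candidate may replace the champion."""
    champion_auc = auc(labels, champion_scores)
    candidate_auc = auc(labels, candidate_scores)
    return {
        "champion_auc": champion_auc,
        "candidate_auc": candidate_auc,
        "promote": candidate_auc - champion_auc >= min_gain,
    }


# Hypothetical holdout: 1 = responded, 0 = did not respond.
holdout_labels = [1, 0, 1, 0, 0, 1]
champion = [0.6, 0.5, 0.4, 0.3, 0.45, 0.7]
candidate = [0.8, 0.2, 0.6, 0.3, 0.4, 0.9]
report = validate_candidate(holdout_labels, champion, candidate)
```

A continuous integration job could run a check like this on every model build and store the report alongside the versioned model package, so that a rejected build never reaches deployment.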
40
Data Scientists want to drive change in their organization using Data
It begins with the first word, Data …
• Raw data handled and stored consistently to eliminate data silos
• Metadata readily available, in particular, lineage working backward to raw data sources
• A well understood and thoughtful data access process
Tools to get the job done …
• Open source first and foremost (Python/Anaconda, R)
• Scaled Data Science platform to enable interactive exploration and visualization
• Thoughtful and well understood open source governance process
Transparent path from insight to impact …
• Automated workflows to deploy new insights to market and monitor results
• At minimum, transparency on how to bring insights to market/production
41
Takeaways
Work Backwards from a Specific Use Case
• Identify the problem you want to solve, not the technology you want to use.
Identify Path to Implementation ASAP
• The Path to Implementation is historically the largest challenge to successful Big Data and modern analytics projects. Learn from others’ mistakes.
Think MacGyver, not Michelangelo
• The goal is to get material enhancement into production as quickly as possible. You won’t have the perfect architecture on your first pass.
Organize Around Products
• With a product team, clear customers, and a backlog, the initial answer can be enhanced bit by bit while continuing to drive better in-production solutions.
42
Thank you!

Contenu connexe

Tendances

Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analyticsSSaudia
 
Predictive and prescriptive analytics: Transform the finance function with gr...
Predictive and prescriptive analytics: Transform the finance function with gr...Predictive and prescriptive analytics: Transform the finance function with gr...
Predictive and prescriptive analytics: Transform the finance function with gr...Grant Thornton LLP
 
Data Science Salon: Building a Data Science Culture
Data Science Salon: Building a Data Science CultureData Science Salon: Building a Data Science Culture
Data Science Salon: Building a Data Science CultureFormulatedby
 
Why Everything You Know About bigdata Is A Lie
Why Everything You Know About bigdata Is A LieWhy Everything You Know About bigdata Is A Lie
Why Everything You Know About bigdata Is A LieSunil Ranka
 
Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019mark madsen
 
Big Data : From HindSight to Insight to Foresight
Big Data : From HindSight to Insight to ForesightBig Data : From HindSight to Insight to Foresight
Big Data : From HindSight to Insight to ForesightSunil Ranka
 
940 sponsor gazdak_using our laptop
940 sponsor gazdak_using our laptop940 sponsor gazdak_using our laptop
940 sponsor gazdak_using our laptopRising Media, Inc.
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data miningHoang Nguyen
 
Moving from data to insights: How to effectively drive business decisions & g...
Moving from data to insights: How to effectively drive business decisions & g...Moving from data to insights: How to effectively drive business decisions & g...
Moving from data to insights: How to effectively drive business decisions & g...Cloudera, Inc.
 
Strata Data Conference 2019 : Scaling Visualization for Big Data in the Cloud
Strata Data Conference 2019 : Scaling Visualization for Big Data in the CloudStrata Data Conference 2019 : Scaling Visualization for Big Data in the Cloud
Strata Data Conference 2019 : Scaling Visualization for Big Data in the CloudJaipaul Agonus
 
Systems of Insights: BI Trends and the Smart Tools of the Future
Systems of Insights: BI Trends and the Smart Tools of the FutureSystems of Insights: BI Trends and the Smart Tools of the Future
Systems of Insights: BI Trends and the Smart Tools of the FutureYellowfin BI
 
System Dynamics, Analytics & Big Data (16th Conference of the UK Chapter of t...
System Dynamics, Analytics & Big Data (16th Conference of the UK Chapter of t...System Dynamics, Analytics & Big Data (16th Conference of the UK Chapter of t...
System Dynamics, Analytics & Big Data (16th Conference of the UK Chapter of t...Michael Mortenson
 
1645 track 1 bress_using his laptop
1645 track 1 bress_using his laptop1645 track 1 bress_using his laptop
1645 track 1 bress_using his laptopRising Media, Inc.
 
Rady School Master of Science Business Analytics (MSBA) Program Overview
Rady School Master of Science Business Analytics (MSBA) Program OverviewRady School Master of Science Business Analytics (MSBA) Program Overview
Rady School Master of Science Business Analytics (MSBA) Program OverviewUC San Diego Rady School of Management
 
Building an Effective Organizational Analytics Capability
Building an Effective Organizational Analytics CapabilityBuilding an Effective Organizational Analytics Capability
Building an Effective Organizational Analytics CapabilityJeff Crawford
 
Borys Pratsiuk "How to be NVidia partner"
Borys Pratsiuk "How to be NVidia partner"Borys Pratsiuk "How to be NVidia partner"
Borys Pratsiuk "How to be NVidia partner"Lviv Startup Club
 
Getting down to business on Big Data analytics
Getting down to business on Big Data analyticsGetting down to business on Big Data analytics
Getting down to business on Big Data analyticsThe Marketing Distillery
 

Tendances (20)

Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analytics
 
Predictive and prescriptive analytics: Transform the finance function with gr...
Predictive and prescriptive analytics: Transform the finance function with gr...Predictive and prescriptive analytics: Transform the finance function with gr...
Predictive and prescriptive analytics: Transform the finance function with gr...
 
Data Science Salon: Building a Data Science Culture
Data Science Salon: Building a Data Science CultureData Science Salon: Building a Data Science Culture
Data Science Salon: Building a Data Science Culture
 
Why Everything You Know About bigdata Is A Lie
Why Everything You Know About bigdata Is A LieWhy Everything You Know About bigdata Is A Lie
Why Everything You Know About bigdata Is A Lie
 
Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019
 
Big Data : From HindSight to Insight to Foresight
Big Data : From HindSight to Insight to ForesightBig Data : From HindSight to Insight to Foresight
Big Data : From HindSight to Insight to Foresight
 
Predictive analytics 2025_br
Predictive analytics 2025_brPredictive analytics 2025_br
Predictive analytics 2025_br
 
940 sponsor gazdak_using our laptop
940 sponsor gazdak_using our laptop940 sponsor gazdak_using our laptop
940 sponsor gazdak_using our laptop
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
 
Moving from data to insights: How to effectively drive business decisions & g...
Moving from data to insights: How to effectively drive business decisions & g...Moving from data to insights: How to effectively drive business decisions & g...
Moving from data to insights: How to effectively drive business decisions & g...
 
Strata Data Conference 2019 : Scaling Visualization for Big Data in the Cloud
Strata Data Conference 2019 : Scaling Visualization for Big Data in the CloudStrata Data Conference 2019 : Scaling Visualization for Big Data in the Cloud
Strata Data Conference 2019 : Scaling Visualization for Big Data in the Cloud
 
Systems of Insights: BI Trends and the Smart Tools of the Future
Systems of Insights: BI Trends and the Smart Tools of the FutureSystems of Insights: BI Trends and the Smart Tools of the Future
Systems of Insights: BI Trends and the Smart Tools of the Future
 
Unit 2
Unit 2Unit 2
Unit 2
 
System Dynamics, Analytics & Big Data (16th Conference of the UK Chapter of t...
System Dynamics, Analytics & Big Data (16th Conference of the UK Chapter of t...System Dynamics, Analytics & Big Data (16th Conference of the UK Chapter of t...
System Dynamics, Analytics & Big Data (16th Conference of the UK Chapter of t...
 
1645 track 1 bress_using his laptop
1645 track 1 bress_using his laptop1645 track 1 bress_using his laptop
1645 track 1 bress_using his laptop
 
What is business analytics
What is business analyticsWhat is business analytics
What is business analytics
 
Rady School Master of Science Business Analytics (MSBA) Program Overview
Rady School Master of Science Business Analytics (MSBA) Program OverviewRady School Master of Science Business Analytics (MSBA) Program Overview
Rady School Master of Science Business Analytics (MSBA) Program Overview
 
Building an Effective Organizational Analytics Capability
Building an Effective Organizational Analytics CapabilityBuilding an Effective Organizational Analytics Capability
Building an Effective Organizational Analytics Capability
 
Borys Pratsiuk "How to be NVidia partner"
Borys Pratsiuk "How to be NVidia partner"Borys Pratsiuk "How to be NVidia partner"
Borys Pratsiuk "How to be NVidia partner"
 
Getting down to business on Big Data analytics
Getting down to business on Big Data analyticsGetting down to business on Big Data analytics
Getting down to business on Big Data analytics
 

Similaire à How to make your data scientists happy

How to Capitalize on Big Data with Oracle Analytics Cloud
How to Capitalize on Big Data with Oracle Analytics CloudHow to Capitalize on Big Data with Oracle Analytics Cloud
How to Capitalize on Big Data with Oracle Analytics CloudPerficient, Inc.
 
Analytic Excellence - Saying Goodbye to Old Constraints
Analytic Excellence - Saying Goodbye to Old ConstraintsAnalytic Excellence - Saying Goodbye to Old Constraints
Analytic Excellence - Saying Goodbye to Old ConstraintsInside Analysis
 
Big Data & Business Analytics: Understanding the Marketspace
Big Data & Business Analytics: Understanding the MarketspaceBig Data & Business Analytics: Understanding the Marketspace
Big Data & Business Analytics: Understanding the MarketspaceBala Iyer
 
Blocks & Bots - Digital Summit Harvard Business School 2015
Blocks & Bots - Digital Summit Harvard Business School 2015Blocks & Bots - Digital Summit Harvard Business School 2015
Blocks & Bots - Digital Summit Harvard Business School 2015Mona M. Vernon
 
Building the Analytics Capability
Building the Analytics CapabilityBuilding the Analytics Capability
Building the Analytics CapabilityBala Iyer
 
Innovative Data Leveraging for Procurement Analytics
Innovative Data Leveraging for Procurement AnalyticsInnovative Data Leveraging for Procurement Analytics
Innovative Data Leveraging for Procurement AnalyticsTejari
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationDenodo
 
How Data Virtualization Puts Machine Learning into Production (APAC)
How Data Virtualization Puts Machine Learning into Production (APAC)How Data Virtualization Puts Machine Learning into Production (APAC)
How Data Virtualization Puts Machine Learning into Production (APAC)Denodo
 
Are you getting the most out of your data?
Are you getting the most out of your data?Are you getting the most out of your data?
Are you getting the most out of your data?SAS Canada
 
Delivering Value Through Business Analytics
Delivering Value Through Business AnalyticsDelivering Value Through Business Analytics
Delivering Value Through Business AnalyticsSocial Media Today
 
M Chambers and RapidMiner Overview for Babson class
M Chambers and RapidMiner Overview for Babson classM Chambers and RapidMiner Overview for Babson class
M Chambers and RapidMiner Overview for Babson classmcAnalytics99
 
BDW Chicago 2016 - John K. Thompson, GM for Advanced Analytics Dell Statisti...
BDW Chicago 2016 - John K. Thompson, GM for Advanced Analytics  Dell Statisti...BDW Chicago 2016 - John K. Thompson, GM for Advanced Analytics  Dell Statisti...
BDW Chicago 2016 - John K. Thompson, GM for Advanced Analytics Dell Statisti...Big Data Week
 
Big Data Meetup by Chad Richeson
Big Data Meetup by Chad RichesonBig Data Meetup by Chad Richeson
Big Data Meetup by Chad RichesonSocietyConsulting
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Denodo
 
Analytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data PlatformAnalytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data PlatformVMware Tanzu
 

Similaire à How to make your data scientists happy (20)

Agile BI success factors
Agile BI success factorsAgile BI success factors
Agile BI success factors
 
How to Capitalize on Big Data with Oracle Analytics Cloud
How to Capitalize on Big Data with Oracle Analytics CloudHow to Capitalize on Big Data with Oracle Analytics Cloud
How to Capitalize on Big Data with Oracle Analytics Cloud
 
Analytic Excellence - Saying Goodbye to Old Constraints
Analytic Excellence - Saying Goodbye to Old ConstraintsAnalytic Excellence - Saying Goodbye to Old Constraints
Analytic Excellence - Saying Goodbye to Old Constraints
 
Big Data & Business Analytics: Understanding the Marketspace
How to make your data scientists happy

  • 1. How to Make Your Data Scientists Happy A use-case backed approach for enabling data science in enterprise April 2018 ANACONDACON 2018
  • 2. HUSSAIN SULTAN WASHINGTON DC Leader in computational Python development and Data Science Amazon and Capital One Consulting clients: leading Fintech lenders and mega-regional banks TIM HORAN WASHINGTON DC 10 years of consumer lending Led US Credit Card Valuations at Capital One Consulting clients: leading market place installment loan lender and global 100 banks Introduction
  • 3. 3 Explosion of Data Modern Analytics Analytics and data management technology have progressed significantly in the last 10 years Cloud Computing Software Development Predictive Analytics Open Source Infrastructure Automation 90% of today's data was created in the last two years [1] $219.6 billion spent globally on public cloud services in 2016 and predicted to be $411 billion by 2020 [2] The line between software development and sustainable analysis is blurring The hive-mind of open source clearly has a space in modern analytics as enterprise solutions build on top and around it Low-cost compute and storage make Machine Learning and Artificial Intelligence accessible By the end of 2018, spending on IT-as-a-Service for data centers, software, and services will be just under $550 billion worldwide [3] [1] IBM 10 Key Marketing Trends for 2017 - https://ibm.co/2y0r7Ee [2] Gartner Press Release - http://gtnr.it/2Fw5LmJ [3] Deloitte Technology, Media, and Telecommunications Predictions 2017 - http://bit.ly/2jMYdwm
  • 4. 4 In 2014, Gartner Research predicted 60% of Big Data projects through 2017 would be failures. When 2017 rolled around ... Despite significant investment by enterprises to embrace Big Data and modern analytics, most efforts are failing.
  • 6. 6 We blame unhappy Data Scientists
  • 9. 9 Who are your Data Scientists, and what do they do? Biz Analyst Data Scientist Developer Data Engineer DevOps Business Insight Generation Model Building Insight / Model Deployment Analytical Tool Creation Data Science Enablement Data Management [1] Leveraging Base Framework from Anaconda – Journey to Open Data Science - http://bit.ly/2FyvpHD
  • 10. 10 Who are your Data Scientists, and what do they do? Biz Analyst Data Scientist Developer Data Engineer DevOps Business Insight Generation Model Building Insight / Model Deployment Analytical Tool Creation Data Science Enablement Data Management Data Scientists play a critical bridge role between Biz Analysts and traditional IT roles in enterprise [1] Leveraging Base Framework from Anaconda – Journey to Open Data Science - http://bit.ly/2FyvpHD
  • 11. 11 Who are your Data Scientists, and what do they do? Biz Analyst Data Scientist Developer Data Engineer DevOps Business Insight Generation Model Building Insight / Model Deployment Analytical Tool Creation Data Science Enablement Data Management Deployment in enterprise requires the most coordination across teams [1] Leveraging Base Framework from Anaconda – Journey to Open Data Science - http://bit.ly/2FyvpHD
  • 12. 12 How to make your Data Scientists happy
  • 14. 14 Data Scientists want to drive change in their organization using Data It begins with the first word, Data … Tools to get the job done ... Transparent path from insight to impact … • Raw data handled and stored consistently to eliminate data silos • Metadata readily available, in particular, lineage working backward to raw data sources • A well understood and thoughtful data access process • Open source first and foremost (Python/Anaconda, R) • Scaled Data Science platform to enable interactive exploration and visualization • Thoughtful and well understood open source governance process • Automated workflows to deploy new insights to market and monitor results • At minimum, transparency on how to bring insights to market/production
  • 15. 15 It begins with the first word, Data … Tools to get the job done ... Transparent path from insight to impact … • Raw data handled and stored consistently to eliminate data silos • Metadata readily available, in particular, lineage working backward to raw data sources • A well understood and thoughtful data access process • Open source first and foremost (Python/Anaconda, R) • Scaled Data Science platform to enable interactive exploration and visualization • Thoughtful and well understood open source governance process • Automated workflows to deploy new insights to market and monitor results • At minimum, transparency on how to bring insights to market/production A path from insight to implementation is consistently the largest gap to successful “Big Data” / modern analytics projects. Data Scientists want to drive change in their organization using Data
  • 16. 16 A Common “Big Data” Project Life Cycle Production Infrastructure Analytics and Monitoring Stack Implementation Process On-Prem Hadoop or Cloud Database Scaled Data Science Environment Data Scientists New Insights: • Model • Strategy Parallel Modernization Lab or Center of Excellence Enterprise BAU (Business as Usual) Solution Biz Analyst / Data Scientists New ETL Process New Implementation Process
  • 17. 17 A Common “Big Data” Project Life Cycle Production Infrastructure Analytics and Monitoring Stack Implementation Process On-Prem Hadoop or Cloud based data solution Scaled Data Science Environment Data Scientists New Insights: • Model • Strategy Parallel Modernization Lab or Center of Excellence Enterprise BAU (Business as Usual) Solution Biz Analyst / Data Scientists New ETL Process New Implementation Process Common Challenge #1 • Key performance indicator for new ETL focused on moving as much data into the lake as possible • Data landing with limited metadata or challenging structures • BAU solution not built on raw schema may not have been re-created in new ETL process
  • 18. 18 A Common “Big Data” Project Life Cycle Production Infrastructure Analytics and Monitoring Stack Implementation Process On-Prem Hadoop or Cloud based data solution Scaled Data Science Environment Data Scientists New Insights: • Model • Strategy Parallel Modernization Lab or Center of Excellence Enterprise BAU (Business as Usual) Solution Biz Analyst / Data Scientists New ETL Process New Implementation Process Common Challenge #2 • Translation required due to separate development environments • New technology implemented on legacy infrastructure creates unexpected hurdles or brick walls • Production implementation requires buy-in that prototypes or proof of concepts don’t require
  • 19. 19 Recommended Approach Our modern analytics and Big Data engagements center around an effective use case from which all software, infrastructure, and organizational investments are informed. Modernize analytics infrastructure as needed Identify a Use Case Build use case as an iteratively improving product Sustain the new product and infrastructure
  • 20. 20 Strategically Important • Does the use case align with corporate imperatives? • Will its success open the door for more use cases in your direct team and across the broader organization? Actionable • Will insights or results from the use case lead to in-market changes? • Can insights or results drive change quickly and be iteratively improved over time? Material • Can insights or results from the use case drive material impact to the business? Identify a strategically-important, actionable, material use case to gain support and guide your investment
  • 21. 21 To illustrate the approach, a specific use case, a marketing response propensity model Analytical Database Biz Analyst / Data Scientists Linear Response Model Specifications Marketing Targeting Production Intent 3rd Party Implementation & Data Partner (Credit Bureau) Marketed Prospects Non-Marketed Prospects
  • 22. 22 To illustrate the approach, a specific use case, a marketing response propensity model Analytical Database Biz Analyst / Data Scientists Linear Response Model Specifications Marketing Targeting Production Intent 3rd Party Implementation & Data Partner (Credit Bureau) Marketed Prospects Non-Marketed Prospects Modern Analytics Use Case Litmus Test Strategically Important: Response model used regularly to target marketing spend – driving the growth of a critical business.
  • 23. 23 To illustrate the approach, a specific use case, a marketing response propensity model Analytical Database Biz Analyst / Data Scientists Linear Response Model Specifications Marketing Targeting Production Intent 3rd Party Implementation & Data Partner (Credit Bureau) Marketed Prospects Non-Marketed Prospects Modern Analytics Use Case Litmus Test Actionable: There is opportunity to leverage new machine learning techniques to build models that typically outperform traditional linear response models. Unclear if our implementation partner can support new model types.
  • 24. 24 To illustrate the approach, a specific use case, a marketing response propensity model Analytical Database Biz Analyst / Data Scientists Linear Response Model Specifications Marketing Targeting Production Intent 3rd Party Implementation & Data Partner (Credit Bureau) Marketed Prospects Non-Marketed Prospects Modern Analytics Use Case Litmus Test Material: Determined by measuring the net incremental responders generated when the model is implemented. If the juice is not worth the squeeze, don’t invest.
  • 25. Legacy process and tools are marred with manual touch-points and lack modern techniques Marketed Prospects Non-Marketed Prospects Source Systems Modeling & Analytics Environment Production C Environment Raw Data Processed Data Engineer Enterprise Guide
  • 26. Legacy process and tools are marred with manual touch-points and lack modern techniques Marketed Prospects Non-Marketed Prospects Source Systems Modeling & Analytics Environment Production C Environment Raw Data Processed Data Engineer Enterprise Guide Response Model Passed Developer & DevOps Implemented in Production and Compared with Analytics Environment Data Scientist
  • 27. Legacy process and tools are marred with manual touch-points and lack modern techniques Marketed Prospects Non-Marketed Prospects Source Systems Modeling & Analytics Environment Production C Environment Raw Data Processed Data Engineer Enterprise Guide Developer & DevOps Implemented in Production and Compared with Analytics Environment Marketing Targeting Passed Biz Analyst
  • 28. Legacy process and tools are marred with manual touch-points and lack modern techniques Marketed Prospects Non-Marketed Prospects Source Systems Modeling & Analytics Environment Production C Environment Raw Data Processed Data Engineer Enterprise Guide Response Model Passed Developer & DevOps Implemented in Production and Compared with Analytics Environment Data Scientist Marketing Targeting Passed Biz Analyst
  • 29. Legacy process and tools are marred with manual touch-points and lack modern techniques Marketed Prospects Non-Marketed Prospects Source Systems Modeling & Analytics Environment Production C Environment Raw Data Processed Data Engineer Enterprise Guide Pain Point #1: Limited modern analytics tool chest for response model building Response Model Passed Developer & DevOps Implemented in Production and Compared with Analytics Environment Data Scientist Marketing Targeting Passed Biz Analyst
  • 30. Legacy process and tools are marred with manual touch-points and lack modern techniques Marketed Prospects Non-Marketed Prospects Source Systems Modeling & Analytics Environment Production C Environment Raw Data Processed Data Engineer Enterprise Guide Pain Point #2: Manual, bespoke testing and go-to-production process Response Model Passed Developer & DevOps Implemented in Production and Compared with Analytics Environment Data Scientist Marketing Targeting Passed Biz Analyst
  • 31. Rather than standing up a whole new process, we focused on the largest pain points and improved them Marketed Prospects Non-Marketed Prospects Production C Environment Changing either the Source Systems or Production Environments had the most interdependencies outside of the use case, so left unchanged Production C Environment Source Systems
  • 32. Rather than standing up a whole new process, we focused on the largest pain points and improved them Marketed Prospects Non-Marketed Prospects Modeling & Analytics Environment Production C Environment Enterprise Guide Raw Data Processed Data Engineer Open Source Sandbox Modeling Data Ported Data Scientist Replacing the overall Modeling and Analytics environment was costly and time consuming, so we stood up a separate Open Source Sandbox Source Systems
  • 33. Rather than standing up a whole new process, we focused on the largest pain points and improved them Marketed Prospects Non-Marketed Prospects Production C Environment Enterprise Guide XGBoost to C Package Raw Data Processed Data Engineer Open Source Sandbox Modeling Data Ported Data Scientist Response Model Passed Data Scientist To enable Machine Learning Models like GBM (Gradient Boosting Machine), we created an XGBoost model dump to C translation package Modeling & Analytics Environment Source Systems
  • 34. Rather than standing up a whole new process, we focused on the largest pain points and improved them Marketed Prospects Non-Marketed Prospects Production C Environment Enterprise Guide Raw Data Processed Data Engineer Open Source Sandbox Modeling Data Ported Data Scientist Biz Analyst Marketing Targeting Passed Local Intent Testing To lessen the iterative, manual Marketing Targeting intent checks, we deployed testing that verified Excel inputs against production outputs Modeling & Analytics Environment Source Systems
  • 35. Rather than standing up a whole new process, we focused on the largest pain points and improved them Marketed Prospects Non-Marketed Prospects Production C Environment Enterprise Guide XGBoost to C Package Raw Data Processed Data Engineer Open Source Sandbox Modeling Data Ported Data Scientist Response Model Passed Data Scientist Biz Analyst Marketing Targeting Passed Local Intent Testing The initial use case deliverable enabled modern machine learning models and lessened the manual testing previously required Modeling & Analytics Environment Source Systems
  • 36. 36 Remember our Common “Big Data” Project Life Cycle Production Infrastructure Analytics and Monitoring Stack Implementation Process On-Prem Hadoop or Cloud based data solution Scaled Data Science Environment Data Scientists New Insights: • Model • Strategy Parallel Modernization Lab or Center of Excellence Enterprise BAU (Business as Usual) Solution Biz Analyst / Data Scientists New ETL Process New Implementation Process
  • 37. 37 Our use case approach often leads to hybrid solutions that get material results as quickly as possible Analytics or Monitoring Stack Production Infrastructure Initial Use Case Solution Biz Analyst / Data Scientists Implementation Process Analytics and Monitoring Stack New Open Source Sandbox GBM Conversion Routine and Local Intent Testing
  • 38. 38 As part of initial launch, we build the use case as a product that can iteratively improve Product Team Backlog Test Build Deploy Software Engineering Best Practices Internal Customer(s) Biz Analysts Features Product team delivers features with a focus on continued improvement, not on getting the product “done” Machine Learning Response Model Illustrative Product Structure Model iteratively improved as more or new data is available
  • 39. Finalize potential architecture as you iterate Biz Analyst Data Scientist Computational Frameworks Distributed Compute & Storage Model Grid search, Distributed Model Training, Model Conversion Build New Model Automated Model Validation Anaconda Repository Historical model versions are built and stored for future use Automated Builds and Job Scheduling Continuous Integration Databases/ 3rd Party Services/ Prediction APIs Deploy Model Build Package Marketed Prospects Non-Marketed Prospects
  • 40. 40 Data Scientists want to drive change in their organization using Data It begins with the first word, Data … Tools to get the job done ... Transparent path from insight to impact … • Raw data handled and stored consistently to eliminate data silos • Metadata readily available, in particular, lineage working backward to raw data sources • A well understood and thoughtful data access process • Open source first and foremost (Python/Anaconda, R) • Scaled Data Science platform to enable interactive exploration and visualization • Thoughtful and well understood open source governance process • Automated workflows to deploy new insights to market and monitor results • At minimum, transparency on how to bring insights to market/production
  • 41. 41 Work Backwards from a Specific Use Case • Identify the problem you want to solve, not the technology you want to use. Identify Path to Implementation ASAP • The Path to Implementation is historically the largest challenge to successful Big Data and modern analytics projects – learn from others’ mistakes. Think MacGyver not Michelangelo • The goal is to get material enhancement into production as quickly as possible. You won’t have the perfect architecture on your first pass. Organize Around Products • By setting up a product team, clear customers, and a backlog, the initial answer can be enhanced bit by bit while continuing to drive better in-production solutions. Takeaways
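
Slide 33 mentions an XGBoost model dump to C translation package but does not show it. The sketch below is only a rough illustration of the idea, not the presenters' package. It assumes a tree in the nested JSON layout that XGBoost's `Booster.get_dump(dump_format="json")` produces, features named `f<index>`, and no missing values. The sample tree and the names `tree_to_c` and `predict_tree` are hypothetical.

```python
import json

# Hypothetical sample tree, written in the JSON layout assumed above
# (keys: nodeid, split, split_condition, yes, no, children, leaf).
TREE_JSON = """
{"nodeid": 0, "split": "f0", "split_condition": 0.5, "yes": 1, "no": 2,
 "children": [
   {"nodeid": 1, "leaf": -0.2},
   {"nodeid": 2, "split": "f1", "split_condition": 3.0, "yes": 3, "no": 4,
    "children": [{"nodeid": 3, "leaf": 0.1}, {"nodeid": 4, "leaf": 0.4}]}
 ]}
"""


def _child(node, nodeid):
    """Return the child of `node` whose id is `nodeid`."""
    return next(c for c in node["children"] if c["nodeid"] == nodeid)


def _node_to_c(node, depth):
    """Emit nested C if/else statements for one node and its subtree."""
    pad = "    " * depth
    if "leaf" in node:
        return f"{pad}return {float(node['leaf'])!r};\n"
    feature = int(node["split"][1:])  # assumes names like "f0", "f1"
    threshold = float(node["split_condition"])
    # The "yes" branch is taken when the feature is below the threshold.
    return (
        f"{pad}if (x[{feature}] < {threshold!r}) {{\n"
        + _node_to_c(_child(node, node["yes"]), depth + 1)
        + f"{pad}}} else {{\n"
        + _node_to_c(_child(node, node["no"]), depth + 1)
        + f"{pad}}}\n"
    )


def tree_to_c(tree, func_name):
    """Wrap one translated tree in a C function taking a feature array."""
    return f"double {func_name}(const double *x) {{\n" + _node_to_c(tree, 1) + "}\n"


def predict_tree(node, x):
    """Pure-Python reference scorer used to cross-check the generated C."""
    while "leaf" not in node:
        feature = int(node["split"][1:])
        go_yes = x[feature] < float(node["split_condition"])
        node = _child(node, node["yes"] if go_yes else node["no"])
    return float(node["leaf"])


tree = json.loads(TREE_JSON)
c_source = tree_to_c(tree, "score_tree_0")
print(c_source)
```

A full boosted model would sum the per-tree outputs and apply the objective's link function, which this sketch leaves out. Keeping a Python reference scorer next to the generator gives a simple way to compare analytics-environment scores against the compiled production code, in the spirit of the testing described on slide 34.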