Patterns for Success in Data Science Engagements
Dr. David Michel
Overview
ThoughtWorks has been expanding the quantity and depth of our data-related engagements across the
EU and globally under our Intelligent Empowerment offering.
This talk focuses on lessons learned from four short-term engagements with four different clients over the
last 18 months:
● Engagement length varied from 2 weeks to 2 months
● Commonalities/differences in the necessary approach
● Themes for success
● Pitfalls to avoid
Project Summary
● Client 1: Web presence for a South American media conglomerate
○ Goal: predict the age/sex of anonymous users based on the behaviour of registered users
○ Starting point: existing model and infrastructure in place
● Client 2: Home recipe delivery service offered by a major UK retailer
○ Goal: predict demand more accurately for new recipes and for different combinations of existing ones
○ Starting point: set of heuristics in use
● Client 3: Major UK automobile reseller
○ Goal: gain better insight and put themselves on the path towards making more data-driven decisions
○ Starting point: lots of reports and Excel spreadsheets; no modelling to speak of
● Client 4: Major UK retailer
○ Goal: determine the optimum shelf capacity for a preset product range
○ Starting point: existing tool in use, a series of SQL queries run directly off the data warehouse
Client 1
Client 1 (Web Branch for Media Company)
● Problem clearly defined
○ Identify men/women
○ Identify men aged 18-35 and women aged 25-49
● Data clearly accessible (though latency was high)
○ Jupyter sandbox with access to a BigQuery data store
● Existing model in place
○ XGBoost with ~250 features
○ Retrained weekly, with a training time of ~2 hours
● Metrics in place (though likely suboptimal)
○ Accuracy in all three demographic segments ("demos"), as defined by Nielsen
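The metric above is accuracy reported separately per demographic segment. A minimal sketch of that calculation, assuming each segment is treated as its own binary label (1 = user is in the segment); the segment names below are hypothetical, since the slides only refer to "three demos as defined by Nielsen":

```python
from typing import Dict, List

# Hypothetical segment names: the slides only say "all three demos as defined by Nielsen".
SEGMENTS = ("male", "male_18_35", "female_25_49")


def segment_accuracy(
    y_true: Dict[str, List[int]], y_pred: Dict[str, List[int]]
) -> Dict[str, float]:
    """Return accuracy for each binary segment label (1 = user is in the segment)."""
    scores: Dict[str, float] = {}
    for seg in SEGMENTS:
        truth, pred = y_true[seg], y_pred[seg]
        if not truth or len(truth) != len(pred):
            raise ValueError(f"empty or mismatched labels for segment {seg!r}")
        correct = sum(t == p for t, p in zip(truth, pred))
        scores[seg] = correct / len(truth)
    return scores


if __name__ == "__main__":
    truth = {"male": [1, 0, 1, 0], "male_18_35": [1, 0, 0, 0], "female_25_49": [0, 1, 0, 0]}
    preds = {"male": [1, 0, 0, 0], "male_18_35": [1, 0, 0, 0], "female_25_49": [0, 1, 0, 1]}
    print(segment_accuracy(truth, preds))
```

Reporting one number per segment, rather than a single pooled accuracy, keeps a weak segment from being hidden by a strong one.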
Client 1 (Web Branch for Media Company)
● The limited time period (5 weeks) required scaling the ambition and comprehensiveness
of the work accordingly
● Focus limited to three areas:
○ Quality of the training data
○ Time period over which features were aggregated
○ Subselection of training data to better serve the use case
● Emphasis placed on logging and reproducibility of results
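Two of the points above, varying the feature aggregation window and logging runs reproducibly, can be combined in one small experiment loop. The sketch below is illustrative only: the `(user_id, event_date)` event format, the function names and the placeholder metric are assumptions, not the project's actual code. Each run gets an ID derived from a hash of its configuration, so re-running the same configuration yields the same ID.

```python
import hashlib
import json
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Hypothetical event record: (user_id, event_date). Real data would come from BigQuery.
Event = Tuple[str, date]


def aggregate_counts(events: List[Event], as_of: date, window_days: int) -> Dict[str, int]:
    """Count each user's events in the `window_days` days ending on `as_of` (inclusive)."""
    start = as_of - timedelta(days=window_days - 1)
    counts: Dict[str, int] = {}
    for user_id, day in events:
        if start <= day <= as_of:
            counts[user_id] = counts.get(user_id, 0) + 1
    return counts


def experiment_record(config: Dict, metrics: Dict) -> Dict:
    """Build a log record whose run_id is a stable hash of the configuration."""
    config_json = json.dumps(config, sort_keys=True)
    run_id = hashlib.sha256(config_json.encode("utf-8")).hexdigest()[:12]
    return {"run_id": run_id, "config": config, "metrics": metrics}


def append_jsonl(path: str, record: Dict) -> None:
    """Append one record per line so every run, including failed ideas, is kept."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


if __name__ == "__main__":
    events = [("u1", date(2019, 1, 1)), ("u1", date(2019, 1, 5)),
              ("u2", date(2019, 1, 6)), ("u1", date(2019, 1, 7))]
    for window in (3, 7):
        counts = aggregate_counts(events, as_of=date(2019, 1, 7), window_days=window)
        # The metric here is a placeholder; a real run would log model accuracy.
        record = experiment_record({"window_days": window, "seed": 42},
                                   {"users_with_events": len(counts)})
        print(json.dumps(record, sort_keys=True))
```

In practice each record would be passed to `append_jsonl` so that the full history of configurations and results stays in one append-only file.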
Results in Six Graphs
Client 2
Client 2 (Home Recipe Box)
● Problem clearly defined
○ Forecast demand more accurately three weeks ahead of recipe offerings (when ingredients are
ordered) to reduce waste
● Data clearly accessible (and small)
○ ~35-week order history comprising ~6,000 orders from ~5,000 unique customers
● No model in place
○ Set of heuristics whose usefulness was visibly declining over time as the number of recipes
and combinations increased
● Metrics in place (though likely suboptimal)
○ Percentage over/undershoot of predictions compared to actual orders
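The over/undershoot metric can be written as a signed percentage error relative to actual orders. A minimal sketch, assuming the sign convention "positive = overshoot"; the function names are illustrative:

```python
def percentage_error(predicted: float, actual: float) -> float:
    """Signed percentage error: positive = overshoot, negative = undershoot."""
    if actual == 0:
        raise ValueError("actual orders must be non-zero")
    return (predicted - actual) * 100.0 / actual


def describe(predicted: float, actual: float) -> str:
    """Human-readable over/undershoot label for one forecast."""
    err = percentage_error(predicted, actual)
    if err > 0:
        return f"overshoot by {err:.1f}%"
    if err < 0:
        return f"undershoot by {-err:.1f}%"
    return "exact"


if __name__ == "__main__":
    print(describe(110, 100))  # overshoot by 10.0%
    print(describe(95, 100))   # undershoot by 5.0%
```

Keeping the sign matters here: for ingredient ordering, an overshoot means waste while an undershoot means unfulfilled orders, so the two are usually worth tracking separately.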
Client 2 (Home Recipe Box)
● Limited time period (3 weeks to deliver)
● Emphasis placed on creating a functional forecasting tool, with a large amount of time
budgeted for training and handover
● Self-updating forecasting/visualisation tool built in a Colaboratory notebook
○ Recipe metadata and historical sales used to retrain a random forest weekly on a rolling
10-week window of historical sales (initial attempts with regularised linear models were problematic)
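The weekly retraining scheme above can be sketched as: at each week, fit on the most recent 10 complete weeks and forecast three weeks ahead (the ordering lead time). This is a sketch under stated assumptions: weeks are consecutive integers, sales are complete only for weeks before the current one, and `fit_mean` is a hypothetical placeholder standing in for the random forest so the example runs with the standard library alone.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

LEAD_WEEKS = 3     # ingredients are ordered three weeks ahead (from the slides)
WINDOW_WEEKS = 10  # rolling training window length (from the slides)

Model = Callable[[int], float]


def training_weeks(as_of_week: int, window: int = WINDOW_WEEKS) -> List[int]:
    """Weeks used when retraining at `as_of_week`.

    Assumes weeks are consecutive integers and that sales are complete only
    for weeks strictly before `as_of_week`.
    """
    start = max(0, as_of_week - window)
    return list(range(start, as_of_week))


def fit_mean(history: Sequence[float]) -> Model:
    """Placeholder model that predicts the window mean (the slides used a random forest)."""
    level = mean(history)
    return lambda _week: level


def weekly_forecasts(
    sales: Dict[int, float],
    first_week: int,
    last_week: int,
    fit: Callable[[Sequence[float]], Model] = fit_mean,
) -> Dict[int, float]:
    """Retrain each week on the rolling window and forecast LEAD_WEEKS ahead."""
    forecasts: Dict[int, float] = {}
    for as_of in range(first_week, last_week + 1):
        history = [sales[w] for w in training_weeks(as_of) if w in sales]
        model = fit(history)
        target = as_of + LEAD_WEEKS
        forecasts[target] = model(target)
    return forecasts
```

A rolling window lets older weeks drop out automatically, which suits a menu whose recipes and combinations keep changing; the trade-off is less data per fit.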
Results
● Existing internal forecasting, evaluated over 9 weeks of data:
○ Mean error: 2.6%
○ Median error: 2.0%
○ Max overestimate: 10%
○ Max underestimate: 10%
● New model:
○ Mean error: 2.0%
○ Median error: 1.5%
○ Max overestimate: 7.2%
○ Max underestimate: 8.7%
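Summary figures like those above can be produced from a list of signed percentage errors. A minimal sketch, assuming "mean" and "median" error refer to absolute percentage errors and that positive errors are overestimates; the slides do not state either convention explicitly:

```python
from statistics import mean, median
from typing import Dict, Sequence


def summarise_errors(errors: Sequence[float]) -> Dict[str, float]:
    """Summarise signed % errors (positive = overestimate, negative = underestimate).

    Assumes the mean and median are taken over absolute errors.
    """
    if not errors:
        raise ValueError("no errors to summarise")
    abs_errors = [abs(e) for e in errors]
    return {
        "mean_error": mean(abs_errors),
        "median_error": median(abs_errors),
        "max_overestimate": max(max(errors), 0.0),
        "max_underestimate": max(-min(errors), 0.0),
    }


if __name__ == "__main__":
    print(summarise_errors([2.0, -1.0, 4.0, -3.0]))
```

Running the same summary over both the existing forecasts and the new model on the same weeks gives a like-for-like comparison.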
Client 3
Client 3 (B2B Auto reseller)
● No real problem defined
○ Interest in sales channel allocation, but no internal agreement on the desired output
● Data clearly accessible (with varying degrees of quality and latency)
○ SQL Server 2008 enterprise data warehouse containing vehicle information
○ Refurb and auction information available only via spreadsheets downloaded from partner
portals
● No existing model in place
● Metrics for success not defined
Client 3 (B2B Auto reseller)
● 2 weeks to investigate the available data and deliver a POC
● Lots of room to choose the right (or wrong) problem
● Area chosen: a POC for forecasting sale price based on sales channel and vehicle specifications
● Thought to be the lowest-hanging fruit, allowing a higher return on investment for each asset via:
○ Website vs. auction
○ Optimal refurb parameters for specific vehicles
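One way a sale-price forecast can feed the website vs. auction decision is to compare forecast price minus channel cost across channels. The sketch below is hypothetical throughout: the vehicle representation, channel costs and `toy_price` stand-in model are invented for illustration and are not the POC's model.

```python
from typing import Callable, Dict, Mapping

# Hypothetical: a vehicle is a mapping of specifications, and a price model
# returns the forecast sale price of that vehicle in a given channel.
Vehicle = Mapping[str, float]
PriceModel = Callable[[Vehicle, str], float]


def best_channel(vehicle: Vehicle, channel_costs: Mapping[str, float],
                 predict_price: PriceModel) -> Dict[str, object]:
    """Choose the channel with the highest forecast price minus its cost."""
    net = {ch: predict_price(vehicle, ch) - cost for ch, cost in channel_costs.items()}
    best = max(net, key=net.__getitem__)
    return {"channel": best, "net_return": net[best], "by_channel": net}


def toy_price(vehicle: Vehicle, channel: str) -> float:
    """Illustrative stand-in for a trained model; the numbers are invented."""
    base = 10_000 - 500 * vehicle["age_years"]
    return base + (800 if channel == "website" else 0)


if __name__ == "__main__":
    costs = {"website": 600, "auction": 200}  # hypothetical refurb + selling costs
    print(best_channel({"age_years": 4}, costs, toy_price))
```

The same comparison extends to refurb choices by treating each refurb option as a separate cost and price scenario.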
Results
(Charts: Auction, Website, Refurb)
Client 4
Client 4 (Major Grocer)
● Problem (reasonably) well defined
○ Investigate the efficacy of the tool currently used to determine shelf capacity
(for a fixed product range)
● Data accessible, but not discoverable, with numerous
(often conflicting) sources of truth
● No existing model in place
○ Existing product used a fixed calculation run via a series of SQL
queries inside the data warehouse
● No metrics in use to benchmark the existing product
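With several, often conflicting, sources of truth, a useful first step is to list exactly where they disagree. A minimal sketch, assuming each source can be loaded as a mapping from a shared key (here a hypothetical `(store, product)` pair) to a value; the system names and figures are invented:

```python
from typing import Dict, Hashable, Mapping


def find_conflicts(
    sources: Mapping[str, Mapping[Hashable, Hashable]]
) -> Dict[Hashable, Dict[str, Hashable]]:
    """Return every key whose value differs between sources, with each source's value.

    A source that lacks a key is simply ignored for that key.
    """
    all_keys = set()
    for table in sources.values():
        all_keys.update(table)
    conflicts: Dict[Hashable, Dict[str, Hashable]] = {}
    for key in all_keys:
        values = {name: table[key] for name, table in sources.items() if key in table}
        if len(set(values.values())) > 1:
            conflicts[key] = values
    return conflicts


if __name__ == "__main__":
    # Hypothetical shelf capacities keyed by (store, product) from two internal systems.
    sources = {
        "system_a": {("s1", "p1"): 12, ("s1", "p2"): 8},
        "system_b": {("s1", "p1"): 12, ("s1", "p2"): 10, ("s2", "p1"): 5},
    }
    print(find_conflicts(sources))  # {('s1', 'p2'): {'system_a': 8, 'system_b': 10}}
```

The resulting list gives the client something concrete to adjudicate before any modelling depends on one source or another.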
Client 4 (Major Grocer)
● ~8 weeks to investigate
● With no metric to evaluate the existing tool, defining one was the obvious place to start
● What makes an ideal shelf capacity:
○ Maximise availability (minimise lost sales)
○ Minimise stocking labour costs
○ Minimise waste
● Internal measures of each were available, of varying quality/usefulness
Results: Conceptual Metric
● Inputs: tool output, store info, product info, shelf capacity history, lost sales history, labour history, waste history
● Forecasts: lost sales, labour, waste
● ∑ Metric: the three forecasts combined into a single metric
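The ∑ step can be read as a weighted sum of the lost-sales, labour and waste forecasts, evaluated for each candidate shelf capacity; the tool's output can then be scored against the best candidate. This is a sketch under assumptions: the weighting scheme, the forecast curves and all numbers are invented for illustration and are not the client's models.

```python
from typing import Callable, Dict, Iterable

# Hypothetical: each forecast maps a candidate shelf capacity to a forecast cost.
Forecast = Callable[[int], float]


def combined_metric(capacity: int, forecasts: Dict[str, Forecast],
                    weights: Dict[str, float]) -> float:
    """Weighted sum of the forecast costs for one candidate capacity."""
    return sum(weights[name] * forecast(capacity) for name, forecast in forecasts.items())


def best_capacity(candidates: Iterable[int], forecasts: Dict[str, Forecast],
                  weights: Dict[str, float]) -> int:
    """Candidate capacity with the lowest combined metric."""
    return min(candidates, key=lambda c: combined_metric(c, forecasts, weights))


if __name__ == "__main__":
    # Invented toy curves: more capacity means fewer lost sales and restocks, more waste.
    forecasts = {
        "lost_sales": lambda c: max(0, 20 - c) * 3.0,
        "labour": lambda c: 60.0 / c,
        "waste": lambda c: 0.5 * c,
    }
    weights = {"lost_sales": 1.0, "labour": 1.0, "waste": 1.0}
    print(best_capacity(range(5, 26), forecasts, weights))  # 20
```

Keeping the weights explicit makes the business trade-off between availability, labour and waste visible and open to discussion with the client.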
Results: Forecasting
● Two years of sample data (most recent 20% set aside for validation)
● Subset of stores and products
○ ~500 “essential” products
○ 2 stores of varying design and location
● Forecasting POC done for labour costs and lost sales
○ Cross-validated grid search, with PCA used to reduce the feature space
○ Random forest regressor gave better results than regularised linear models
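Holding out the most recent 20% means splitting by time rather than at random, so validation data is never older than training data. A minimal standard-library sketch of that split, assuming rows carry a sortable time field; the slides do not name the library used, but in scikit-learn the equivalent time-ordered folds for a grid search would typically come from `TimeSeriesSplit`, with `PCA` and `RandomForestRegressor` chained in a `Pipeline`.

```python
from typing import Any, Callable, List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def chronological_split(
    rows: Sequence[Row],
    time_key: Callable[[Row], Any],
    validation_fraction: float = 0.2,
) -> Tuple[List[Row], List[Row]]:
    """Sort rows by time and hold out the most recent fraction for validation."""
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be between 0 and 1")
    ordered = sorted(rows, key=time_key)
    cut = round(len(ordered) * (1 - validation_fraction))
    return ordered[:cut], ordered[cut:]


if __name__ == "__main__":
    # Hypothetical (week, units_sold) rows, deliberately out of order.
    rows = [(w, 100 + w) for w in (3, 0, 9, 1, 8, 2, 7, 4, 6, 5)]
    train, valid = chronological_split(rows, time_key=lambda r: r[0])
    print([w for w, _ in train], [w for w, _ in valid])
```

A random split would leak future information into training and make forecast accuracy look better than it would be in use.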
Results: Lost Sales Forecasting
Results: Labour Forecasting
Lessons learned
Important Questions to Ask
Technology/information
● Do they have historical data, and how consistent is it?
● Are there multiple sources?
● Can it be accessed in volume and at speed?
● Is it discoverable?
Enthusiasm
● Have they defined a problem or class of
problems they would like to solve?
● How comfortable are they with more
modern ML/AI-based approaches?
Important Questions to Ask
● Measurement of outcomes
○ Is there a metric or metrics in place to optimise for?
○ Do those metrics relate meaningfully to valuable business outcomes?
● History/Reproducibility
○ What have they tried before?
○ Have those efforts been logged so that they are accessible and understandable?
Takeaways
● Start simple
○ Easier-to-solve problems can quickly sway the unconverted, and there is usually
some obvious low-hanging fruit
○ Less complex models are easier to explain, train and maintain
● Align goals and expectations
○ Agree on metrics and what they actually represent
○ Call out any disconnects between KPIs and the value the client believes they represent
○ Take the time to explain ways of working and potential outcomes to team members
Takeaways
● Demonstrate value
○ Where clients are suspicious of or unconvinced by new methodologies, easy wins and new
knowledge trump elegant solutions
● Invest time in knowledge transfer and training
○ Do your best to log your efforts (especially unsuccessful ones) in a manner easily
accessible to future investigators
○ If you’re going to leave what you’ve created in someone else’s hands, they should be
comfortable maintaining it
Thank you

Contenu connexe

Similaire à Patterns for Success in Data Science Engagements

Digicorp - Supply Chain Analytics Apps
Digicorp - Supply Chain Analytics AppsDigicorp - Supply Chain Analytics Apps
Digicorp - Supply Chain Analytics AppsDigicorp
 
Yaswanthreddy- 4.4Yrs Exp Product Management
Yaswanthreddy- 4.4Yrs Exp Product ManagementYaswanthreddy- 4.4Yrs Exp Product Management
Yaswanthreddy- 4.4Yrs Exp Product ManagementYASWANTH REDDY KETHIREDDY
 
Data Drive Better Sales Conversions - Dawn of the Data Age Lecture Series
Data Drive Better Sales Conversions  - Dawn of the Data Age Lecture SeriesData Drive Better Sales Conversions  - Dawn of the Data Age Lecture Series
Data Drive Better Sales Conversions - Dawn of the Data Age Lecture SeriesLuciano Pesci, PhD
 
Customer Research For Product Managers - Dawn of The Data Age Lecture Series
Customer Research For Product Managers - Dawn of The Data Age Lecture SeriesCustomer Research For Product Managers - Dawn of The Data Age Lecture Series
Customer Research For Product Managers - Dawn of The Data Age Lecture SeriesLuciano Pesci, PhD
 
Supply Chain Strategy Assessment
Supply Chain Strategy AssessmentSupply Chain Strategy Assessment
Supply Chain Strategy AssessmentChief Innovation
 
Legacy Content: Applying your new content strategy to old information
Legacy Content: Applying your new content strategy to old informationLegacy Content: Applying your new content strategy to old information
Legacy Content: Applying your new content strategy to old informationSalesforce Engineering
 
Doing Analytics Right - Designing and Automating Analytics
Doing Analytics Right - Designing and Automating AnalyticsDoing Analytics Right - Designing and Automating Analytics
Doing Analytics Right - Designing and Automating AnalyticsTasktop
 
An Ounce of Validation = a Pound of Pivot by LinkedIn PM
An Ounce of Validation = a Pound of Pivot by LinkedIn PMAn Ounce of Validation = a Pound of Pivot by LinkedIn PM
An Ounce of Validation = a Pound of Pivot by LinkedIn PMProduct School
 
GraphTour London 2020 - Customer Journey
GraphTour London 2020  - Customer Journey GraphTour London 2020  - Customer Journey
GraphTour London 2020 - Customer Journey Neo4j
 
Leverage The Power of Small Data
Leverage The Power of Small DataLeverage The Power of Small Data
Leverage The Power of Small DataKaryn Zuidinga
 
"From Insights to Production with Big Data Analytics", Eliano Marques, Senior...
"From Insights to Production with Big Data Analytics", Eliano Marques, Senior..."From Insights to Production with Big Data Analytics", Eliano Marques, Senior...
"From Insights to Production with Big Data Analytics", Eliano Marques, Senior...Dataconomy Media
 
Process Redesign or Improvement Approach Options
Process Redesign or Improvement Approach OptionsProcess Redesign or Improvement Approach Options
Process Redesign or Improvement Approach OptionsChief Innovation
 
Agile practices for management
Agile practices for managementAgile practices for management
Agile practices for managementIcalia Labs
 
The How, Why and What of Metrics?
The How, Why and What of Metrics?The How, Why and What of Metrics?
The How, Why and What of Metrics?The Wisdom Daily
 
How to Use Competitive Analysis and Strategy by YouTube PM
How to Use Competitive Analysis and Strategy by YouTube PMHow to Use Competitive Analysis and Strategy by YouTube PM
How to Use Competitive Analysis and Strategy by YouTube PMProduct School
 
Growth Hacking Master Class
Growth Hacking Master ClassGrowth Hacking Master Class
Growth Hacking Master ClassDani Hart
 

Similaire à Patterns for Success in Data Science Engagements (20)

Digicorp - Supply Chain Analytics Apps
Digicorp - Supply Chain Analytics AppsDigicorp - Supply Chain Analytics Apps
Digicorp - Supply Chain Analytics Apps
 
Day 1 (Lecture 2): Business Analytics
Day 1 (Lecture 2): Business AnalyticsDay 1 (Lecture 2): Business Analytics
Day 1 (Lecture 2): Business Analytics
 
Yaswanthreddy- 4.4Yrs Exp Product Management
Yaswanthreddy- 4.4Yrs Exp Product ManagementYaswanthreddy- 4.4Yrs Exp Product Management
Yaswanthreddy- 4.4Yrs Exp Product Management
 
Data Drive Better Sales Conversions - Dawn of the Data Age Lecture Series
Data Drive Better Sales Conversions  - Dawn of the Data Age Lecture SeriesData Drive Better Sales Conversions  - Dawn of the Data Age Lecture Series
Data Drive Better Sales Conversions - Dawn of the Data Age Lecture Series
 
Customer Research For Product Managers - Dawn of The Data Age Lecture Series
Customer Research For Product Managers - Dawn of The Data Age Lecture SeriesCustomer Research For Product Managers - Dawn of The Data Age Lecture Series
Customer Research For Product Managers - Dawn of The Data Age Lecture Series
 
Supply Chain Strategy Assessment
Supply Chain Strategy AssessmentSupply Chain Strategy Assessment
Supply Chain Strategy Assessment
 
Legacy Content: Applying your new content strategy to old information
Legacy Content: Applying your new content strategy to old informationLegacy Content: Applying your new content strategy to old information
Legacy Content: Applying your new content strategy to old information
 
Doing Analytics Right - Designing and Automating Analytics
Doing Analytics Right - Designing and Automating AnalyticsDoing Analytics Right - Designing and Automating Analytics
Doing Analytics Right - Designing and Automating Analytics
 
An Ounce of Validation = a Pound of Pivot by LinkedIn PM
An Ounce of Validation = a Pound of Pivot by LinkedIn PMAn Ounce of Validation = a Pound of Pivot by LinkedIn PM
An Ounce of Validation = a Pound of Pivot by LinkedIn PM
 
GraphTour London 2020 - Customer Journey
GraphTour London 2020  - Customer Journey GraphTour London 2020  - Customer Journey
GraphTour London 2020 - Customer Journey
 
Yaswanth reddy 4.7 years.docx
Yaswanth reddy 4.7 years.docxYaswanth reddy 4.7 years.docx
Yaswanth reddy 4.7 years.docx
 
Leverage The Power of Small Data
Leverage The Power of Small DataLeverage The Power of Small Data
Leverage The Power of Small Data
 
Big Data Analytics: From Insights to Production
Big Data Analytics: From Insights to ProductionBig Data Analytics: From Insights to Production
Big Data Analytics: From Insights to Production
 
"From Insights to Production with Big Data Analytics", Eliano Marques, Senior...
"From Insights to Production with Big Data Analytics", Eliano Marques, Senior..."From Insights to Production with Big Data Analytics", Eliano Marques, Senior...
"From Insights to Production with Big Data Analytics", Eliano Marques, Senior...
 
OKR framing (1).pdf
OKR framing (1).pdfOKR framing (1).pdf
OKR framing (1).pdf
 
Process Redesign or Improvement Approach Options
Process Redesign or Improvement Approach OptionsProcess Redesign or Improvement Approach Options
Process Redesign or Improvement Approach Options
 
Agile practices for management
Agile practices for managementAgile practices for management
Agile practices for management
 
The How, Why and What of Metrics?
The How, Why and What of Metrics?The How, Why and What of Metrics?
The How, Why and What of Metrics?
 
How to Use Competitive Analysis and Strategy by YouTube PM
How to Use Competitive Analysis and Strategy by YouTube PMHow to Use Competitive Analysis and Strategy by YouTube PM
How to Use Competitive Analysis and Strategy by YouTube PM
 
Growth Hacking Master Class
Growth Hacking Master ClassGrowth Hacking Master Class
Growth Hacking Master Class
 

Plus de Thoughtworks

Design System as a Product
Design System as a ProductDesign System as a Product
Design System as a ProductThoughtworks
 
Designers, Developers & Dogs
Designers, Developers & DogsDesigners, Developers & Dogs
Designers, Developers & DogsThoughtworks
 
Cloud-first for fast innovation
Cloud-first for fast innovationCloud-first for fast innovation
Cloud-first for fast innovationThoughtworks
 
More impact with flexible teams
More impact with flexible teamsMore impact with flexible teams
More impact with flexible teamsThoughtworks
 
Culture of Innovation
Culture of InnovationCulture of Innovation
Culture of InnovationThoughtworks
 
Developer Experience
Developer ExperienceDeveloper Experience
Developer ExperienceThoughtworks
 
When we design together
When we design togetherWhen we design together
When we design togetherThoughtworks
 
Hardware is hard(er)
Hardware is hard(er)Hardware is hard(er)
Hardware is hard(er)Thoughtworks
 
Customer-centric innovation enabled by cloud
 Customer-centric innovation enabled by cloud Customer-centric innovation enabled by cloud
Customer-centric innovation enabled by cloudThoughtworks
 
Amazon's Culture of Innovation
Amazon's Culture of InnovationAmazon's Culture of Innovation
Amazon's Culture of InnovationThoughtworks
 
When in doubt, go live
When in doubt, go liveWhen in doubt, go live
When in doubt, go liveThoughtworks
 
Don't cross the Rubicon
Don't cross the RubiconDon't cross the Rubicon
Don't cross the RubiconThoughtworks
 
Your test coverage is a lie!
Your test coverage is a lie!Your test coverage is a lie!
Your test coverage is a lie!Thoughtworks
 
Docker container security
Docker container securityDocker container security
Docker container securityThoughtworks
 
Redefining the unit
Redefining the unitRedefining the unit
Redefining the unitThoughtworks
 
Technology Radar Webinar UK - Vol. 22
Technology Radar Webinar UK - Vol. 22Technology Radar Webinar UK - Vol. 22
Technology Radar Webinar UK - Vol. 22Thoughtworks
 
A Tribute to Turing
A Tribute to TuringA Tribute to Turing
A Tribute to TuringThoughtworks
 
Rsa maths worked out
Rsa maths worked outRsa maths worked out
Rsa maths worked outThoughtworks
 

Plus de Thoughtworks (20)

Design System as a Product
Design System as a ProductDesign System as a Product
Design System as a Product
 
Designers, Developers & Dogs
Designers, Developers & DogsDesigners, Developers & Dogs
Designers, Developers & Dogs
 
Cloud-first for fast innovation
Cloud-first for fast innovationCloud-first for fast innovation
Cloud-first for fast innovation
 
More impact with flexible teams
More impact with flexible teamsMore impact with flexible teams
More impact with flexible teams
 
Culture of Innovation
Culture of InnovationCulture of Innovation
Culture of Innovation
 
Dual-Track Agile
Dual-Track AgileDual-Track Agile
Dual-Track Agile
 
Developer Experience
Developer ExperienceDeveloper Experience
Developer Experience
 
When we design together
When we design togetherWhen we design together
When we design together
 
Hardware is hard(er)
Hardware is hard(er)Hardware is hard(er)
Hardware is hard(er)
 
Customer-centric innovation enabled by cloud
 Customer-centric innovation enabled by cloud Customer-centric innovation enabled by cloud
Customer-centric innovation enabled by cloud
 
Amazon's Culture of Innovation
Amazon's Culture of InnovationAmazon's Culture of Innovation
Amazon's Culture of Innovation
 
When in doubt, go live
When in doubt, go liveWhen in doubt, go live
When in doubt, go live
 
Don't cross the Rubicon
Don't cross the RubiconDon't cross the Rubicon
Don't cross the Rubicon
 
Error handling
Error handlingError handling
Error handling
 
Your test coverage is a lie!
Your test coverage is a lie!Your test coverage is a lie!
Your test coverage is a lie!
 
Docker container security
Docker container securityDocker container security
Docker container security
 
Redefining the unit
Redefining the unitRedefining the unit
Redefining the unit
 
Technology Radar Webinar UK - Vol. 22
Technology Radar Webinar UK - Vol. 22Technology Radar Webinar UK - Vol. 22
Technology Radar Webinar UK - Vol. 22
 
A Tribute to Turing
A Tribute to TuringA Tribute to Turing
A Tribute to Turing
 
Rsa maths worked out
Rsa maths worked outRsa maths worked out
Rsa maths worked out
 

Dernier

WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburgmasabamasaba
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...masabamasaba
 
WSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benonimasabamasaba
 
tonesoftg
tonesoftgtonesoftg
tonesoftglanshi9
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyviewmasabamasaba
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2
 

Dernier (20)

WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
WSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - Keynote
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security Program
 

Patterns for Success in Data Science Engagements

  • 1. Patterns for Success in Data Science Engagements Dr. David Michel
  • 3. Overview ThoughtWorks has been expanding the quantity and depth of our data related engagements across the EU and globally under our Intelligent Empowerment offering. This talk will focus on lessons learned from 4 short term engagements with four different clients over the last 18 months ● Length of engagements varied between 2 weeks - 2 months ● Commonalities/differences in necessary approach ● Themes for success ● Pitfalls to avoid
  • 4. Project Summary CLIENT 1 CLIENT 2 CLIENT 3 CLIENT 4 Web presence for South American media conglomerate Home recipe delivery service offered by major UK retailer Major UK automobile reseller Major UK retailer Wanted to predict age/sex of anonymous users based on behaviour of registered users Wanted to more accurately predict demand for new recipes and different combinations of existing ones Wanted better insight and put themselves on the path towards making more data driven decisions Wanted to determine optimum shelf capacity for preset product range Existing model and infrastructure in place Set of heuristics in use Lots of reports and excel spreadsheets. No modelling to speak of Existing tool in use. Series of SQL queries run directly off data warehouse
  • 6. Client 1 (Web Branch for Media Company) ● Problem clearly defined ○ Identify men/women ○ Identify men 18-35, women 25-49 ● Data clearly accessible (though latency was high) ○ Jupyter sandbox with access to BigQuery data store ● Existing model in place ○ XGBoost with ~250 features ○ Updated weekly with ~2 hour training time ● Metrics in place (though likely suboptimal) ○ Accuracy in all three demos as defined by Nielsen
  • 7. Client 1 (Web Branch for Media Company) ● Limited time period (5 weeks) required scaling level of ambition and comprehensiveness of work ● Focus limited to three areas: ○ Quality of training data ○ Time period over which features were aggregated ○ Sub selection of training data to better serve usecase ● Emphasis placed on logging and reproducibility of results
  • 8. Results in Six Graphs
  • 10. Client 2 (Home Recipe Box) ● Problem clearly defined ○ Better forecast demand three weeks in advance of recipe offerings (when ingredients are ordered) to lower waste ● Data clearly accessible (and small) ○ ~35 week order history comprising ~6000 orders from ~5000 unique customers ● No model in place ○ Set of heuristics whose usefulness were visibly depreciating over time as number of recipes and variety combinations increased ● Metrics in place (though likely suboptimal) ○ Percentage over/undershoot of prediction compared to actual orders
  • 11. Client 2 (Home Recipe Box) ● Limited time period (3 weeks to deliver) ● Emphasis placed on creating functional forecasting tool and with large amounts of time budgeted for training and handover ● Self updating forecasting/visualisation tool built in Colaboratory Notebook ○ Recipe metadata and historical sales used to retrain random forest (initial attempts with regularised linear models were problematic) weekly with rolling 10 week windows of historical sales
  • 12. Results
    ● Internal forecasting over 9 weeks' worth of data:
      ○ Mean error: 2.6%
      ○ Median error: 2.0%
      ○ Max overestimate: 10%
      ○ Max underestimate: 10%
    ● New model:
      ○ Mean error: 2.0%
      ○ Median error: 1.5%
      ○ Max overestimate: 7.2%
      ○ Max underestimate: 8.7%
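The deck does not define these error statistics. One plausible reading, sketched below, uses absolute percentage error for the mean and median and signed percentage error for the largest over- and undershoot; the function names are hypothetical.

```python
from statistics import mean, median
from typing import Dict, List, Sequence


def pct_errors(predicted: Sequence[float], actual: Sequence[float]) -> List[float]:
    """Signed percentage error per week: positive means the forecast overshot."""
    if not actual or len(predicted) != len(actual) or any(a == 0 for a in actual):
        raise ValueError("need non-empty, equal-length inputs with non-zero actuals")
    return [100.0 * (p - a) / a for p, a in zip(predicted, actual)]


def error_summary(predicted: Sequence[float], actual: Sequence[float]) -> Dict[str, float]:
    """Mean/median absolute error plus the largest over- and underestimate, in percent."""
    errors = pct_errors(predicted, actual)
    abs_errors = [abs(e) for e in errors]
    return {
        "mean_error": mean(abs_errors),
        "median_error": median(abs_errors),
        "max_overestimate": max((e for e in errors if e > 0), default=0.0),
        "max_underestimate": max((-e for e in errors if e < 0), default=0.0),
    }


print(error_summary([110, 95, 100, 102], [100, 100, 100, 100]))
```

Running the same function over the internal forecasts and the new model's forecasts for the same weeks keeps the comparison like-for-like.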
  • 14. Client 3 (B2B Auto reseller)
    ● No real problem defined
      ○ Interest in sales channel allocation, but no internal agreement on the desired output
    ● Data clearly accessible (with varying degrees of quality and latency)
      ○ SQL Server 2008 enterprise warehouse with vehicle information
      ○ Refurb and auction information available only via spreadsheets downloaded from partner portals
    ● No existing model in place
    ● Metrics for success not defined
  • 15. Client 3 (B2B Auto reseller)
    ● 2 weeks to investigate available data and provide a POC
    ● Lots of room to choose the right (or wrong) problem
    ● Area chosen: POC for sale price forecasting based on sales channel and vehicle specifications
    ● Thought to be the lowest-hanging fruit, allowing a higher return on investment for each asset through:
      ○ Website vs auction
      ○ Optimal refurb parameters for specific vehicles
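Once a sale price model exists, the channel and refurb decision reduces to comparing predicted net returns across the options. A minimal sketch, assuming net return is predicted price minus refurb cost minus a per-channel fee; that cost structure, `toy_price_model` and every number in it are invented for illustration, and a real forecaster trained on vehicle specifications would take its place.

```python
from itertools import product
from typing import Callable, Iterable, Mapping, Tuple

# Hypothetical signature: (vehicle spec, channel, refurb level) -> predicted sale price.
PriceModel = Callable[[Mapping[str, float], str, str], float]


def best_option(
    vehicle: Mapping[str, float],
    channels: Iterable[str],
    refurb_levels: Iterable[str],
    price_model: PriceModel,
    refurb_cost: Mapping[str, float],
    channel_fee: Mapping[str, float],
) -> Tuple[str, str, float]:
    """Return the (channel, refurb level, net return) with the highest predicted net return."""
    options = (
        (ch, rf, price_model(vehicle, ch, rf) - refurb_cost[rf] - channel_fee[ch])
        for ch, rf in product(channels, refurb_levels)
    )
    return max(options, key=lambda opt: opt[2])


def toy_price_model(vehicle: Mapping[str, float], channel: str, refurb: str) -> float:
    # Made-up numbers for illustration only.
    base = 10_000 - 50 * vehicle["mileage"] / 1000
    refurb_uplift = {"none": 0, "light": 500, "full": 1000}[refurb]
    channel_uplift = {"website": 700, "auction": 0}[channel]
    return base + refurb_uplift + channel_uplift


best = best_option(
    {"mileage": 40_000},
    ["website", "auction"],
    ["none", "light", "full"],
    toy_price_model,
    refurb_cost={"none": 0, "light": 300, "full": 1200},
    channel_fee={"website": 400, "auction": 150},
)
print(best)
```

With these toy numbers the most expensive refurb is not the best choice, which is the kind of per-asset trade-off the POC was meant to expose.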
  • 18. Client 4 (Major Grocer)
    ● Problem (reasonably) well defined
      ○ Investigate the efficacy of the tool currently used to determine shelf capacity (given a fixed product range)
    ● Data accessible, but not discoverable, and with numerous (often conflicting) sources of truth
    ● No existing model in place
      ○ Existing product used a fixed calculation run via a series of SQL queries inside the data warehouse
    ● No metrics in use to benchmark the existing product
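When several systems claim to be the source of truth, a cheap first step is to list the keys on which they disagree. The sketch below assumes each source can be loaded as a mapping from a product key to a single value; the source names, SKUs and values are hypothetical.

```python
from typing import Dict, Hashable, Mapping


def find_conflicts(
    sources: Mapping[str, Mapping[Hashable, Hashable]],
) -> Dict[Hashable, Dict[str, Hashable]]:
    """Keys whose values differ between sources, mapped to each source's value."""
    all_keys = set().union(*(src.keys() for src in sources.values()))
    conflicts = {}
    for key in all_keys:
        values = {name: src[key] for name, src in sources.items() if key in src}
        if len(set(values.values())) > 1:
            conflicts[key] = values
    return conflicts


shelf_capacity = {
    "warehouse_table": {"SKU1": 24, "SKU2": 12},
    "store_planogram": {"SKU1": 24, "SKU2": 18, "SKU3": 6},
}
print(find_conflicts(shelf_capacity))
```

Keys present in only one source are not reported as conflicts; they point to a separate discoverability problem.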
  • 19. Client 4 (Major Grocer)
    ● ~8 weeks to investigate
    ● With no metric to evaluate the existing product, this was the obvious place to start
    ● What makes an ideal shelf capacity?
      ○ Availability (minimise lost sales)
      ○ Minimise labour costs for stocking
      ○ Minimise waste
    ● Internal versions of these measures available, of varying quality/usefulness
  • 20. Results: Conceptual Metric
    [Diagram: tool output, store info, product info, and shelf capacity, lost sales, labour and waste histories feed lost sales, labour and waste forecasts, which are combined (∑) into a single metric]
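The conceptual metric can be read as a weighted sum of the three forecast costs, minimised over candidate shelf capacities. This is a sketch under stated assumptions: the deck shows only a ∑, so the weights, the common cost unit and `toy_forecast` are all invented here.

```python
from typing import Callable, Dict, Iterable

COMPONENTS = ("lost_sales", "labour", "waste")


def shelf_metric(forecasts: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the forecast costs (assumed to share one unit, e.g. pounds)."""
    return sum(weights[c] * forecasts[c] for c in COMPONENTS)


def best_capacity(
    candidates: Iterable[int],
    forecast: Callable[[int], Dict[str, float]],
    weights: Dict[str, float],
) -> int:
    """Candidate shelf capacity with the lowest combined metric."""
    return min(candidates, key=lambda cap: shelf_metric(forecast(cap), weights))


def toy_forecast(capacity: int) -> Dict[str, float]:
    # Made-up shapes: small shelves lose sales and need more restocking,
    # large shelves waste more.
    return {
        "lost_sales": max(0, 40 - capacity) * 5,
        "labour": 400 / capacity,
        "waste": max(0, capacity - 30) * 2,
    }


weights = {"lost_sales": 1.0, "labour": 1.0, "waste": 1.0}
print(best_capacity([10, 20, 30, 40, 50], toy_forecast, weights))
```

The weights are where the business trade-off lives; agreeing them with the client is part of aligning on what the metric represents.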
  • 21. Results: Forecasting
    ● Two years of sample data
    ● Subset of stores and products
      ○ ~500 “essential” products
      ○ 2 stores of varying design and location
      ○ Most recent 20% of the data set aside for validation
    ● Forecasting POC done for labour costs and lost sales
      ○ Cross-validated grid search, with PCA used to reduce the feature space
      ○ Random forest regressor gave better results than regularised linear models
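One way to reproduce this set-up with scikit-learn is sketched below: a chronological hold-out of the most recent 20%, then a cross-validated grid search over a PCA plus random forest pipeline. The synthetic data, the grid values and the use of `TimeSeriesSplit` for the inner folds are assumptions; the deck does not say how folds were formed.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline


def chronological_split(X, y, holdout_frac=0.2):
    """Hold out the most recent rows; X and y must already be sorted by time."""
    cut = round(len(X) * (1 - holdout_frac))
    return X[:cut], X[cut:], y[:cut], y[cut:]


def fit_forecaster(X_train, y_train):
    """Grid search over PCA size and forest depth, scored on time-ordered folds."""
    pipeline = Pipeline([
        ("pca", PCA()),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ])
    grid = {"pca__n_components": [2, 4, 6], "rf__max_depth": [None, 5]}
    search = GridSearchCV(
        pipeline,
        grid,
        cv=TimeSeriesSplit(n_splits=3),
        scoring="neg_mean_absolute_error",
    )
    return search.fit(X_train, y_train)


# Synthetic stand-in for the store/product features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 3 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=200)

X_train, X_val, y_train, y_val = chronological_split(X, y)
search = fit_forecaster(X_train, y_train)
val_mae = np.mean(np.abs(search.predict(X_val) - y_val))
print(search.best_params_, round(val_mae, 3))
```

Swapping `RandomForestRegressor` for a regularised linear model such as `Ridge` in the same pipeline gives a like-for-like comparison on identical folds.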
  • 22. Results: Lost Sales Forecasting
  • 25. Important Questions to Ask
    Technology/information
    ● Do they have historical data, and how consistent is it?
    ● Are there multiple sources?
    ● Can the data be accessed in volume and at speed?
    ● Is it discoverable?
    Enthusiasm
    ● Have they defined a problem or class of problems they would like to solve?
    ● How comfortable are they with more modern ML/AI-based approaches?
  • 26. Important Questions to Ask
    ● Measurement of outcomes
      ○ Is there a metric or metrics in place to optimise for?
      ○ Do said metrics relate to valuable business outcomes in a meaningful way?
    ● History/Reproducibility
      ○ What have they tried before?
      ○ Have those efforts been logged in a way that makes them accessible and understandable?
  • 27. Takeaways
    ● Start simple
      ○ Easier-to-solve problems can often quickly sway the unconverted, and there is usually some obvious low-hanging fruit
      ○ Less complex models are easier to explain, train and maintain
    ● Align goals and expectations
      ○ Agree on metrics and what they actually represent
      ○ Call out any disconnects between KPIs and the value the client believes they represent
      ○ Take the time to explain ways of working and potential outcomes to team members
  • 28. Takeaways
    ● Demonstrate value
      ○ Where clients are suspicious of or unconvinced by new methodologies, easy wins and new knowledge trump elegant solutions
    ● Invest time in knowledge transfer and training
      ○ Do your best to log your efforts (especially unsuccessful ones) in a way that is easily accessible to potential future investigators
      ○ If you're going to leave what you've created in someone else's hands, they should be comfortable maintaining it
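Logging failed attempts alongside successful ones need not involve heavy tooling. A minimal sketch, assuming an append-only JSON Lines file is acceptable to the client; the field names, `log_experiment` and `load_experiments` are hypothetical, and `config_id` is simply a hash of the parameters so that repeated configurations are easy to spot.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List


def log_experiment(path: Path, params: Dict[str, Any], metrics: Dict[str, float],
                   succeeded: bool, notes: str = "") -> Dict[str, Any]:
    """Append one experiment record (successful or not) as a JSON line."""
    config_json = json.dumps(params, sort_keys=True)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "config_id": hashlib.sha256(config_json.encode()).hexdigest()[:12],
        "params": params,
        "metrics": metrics,
        "succeeded": succeeded,
        "notes": notes,
    }
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def load_experiments(path: Path) -> List[Dict[str, Any]]:
    """Read every logged record back, oldest first."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Illustrative placeholder values only.
log_path = Path(tempfile.mkdtemp()) / "experiments.jsonl"
log_experiment(log_path, {"model": "model_a", "alpha": 1.0}, {"mean_error": 0.0},
               succeeded=False, notes="placeholder: record why it was abandoned")
log_experiment(log_path, {"model": "model_b", "window_weeks": 10}, {"mean_error": 0.0},
               succeeded=True)
for run in load_experiments(log_path):
    print(run["config_id"], run["params"]["model"], run["succeeded"], run["notes"])
```

A plain text log like this survives handover without any extra infrastructure, which matters when the client team will maintain the work.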