Executive Briefing
Lessons learned managing data
science projects: Adopting a team
data science process
Our strategy is to build best-in-class
platforms and productivity services for
an intelligent cloud and an
intelligent edge infused with artificial
intelligence (“AI”).
Microsoft Form 10-K 2016
Data Science
Toolbox of a Data Scientist
Do it like a Professional!
Understand the Decision Process
Tip #1
What is the business problem that
needs to be solved, independent of
the technology solution?
What is the decision or action that has to
be taken and that can be informed by
data?
Predictive Maintenance
Understanding the Decision Process
Key Decision
Should I service
this piece of
equipment?
Data Science Question
What is the probability
this equipment will fail
within the next X days?
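As a rough illustration of how the data science question feeds the key decision, the sketch below turns a predicted failure probability into a service / don't-service call by comparing expected costs. The function name, the cost figures, and the expected-cost rule itself are illustrative assumptions and do not come from the deck.

```python
def should_service(p_fail: float, cost_of_failure: float, cost_of_service: float) -> bool:
    """Hypothetical decision rule: service the equipment when the expected
    cost of an unplanned failure exceeds the cost of servicing it now."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be a probability in [0, 1]")
    return p_fail * cost_of_failure > cost_of_service


# Assumed numbers, for illustration only.
print(should_service(0.30, 50_000, 5_000))  # expected loss 15,000 > 5,000
print(should_service(0.05, 50_000, 5_000))  # expected loss 2,500 < 5,000
```

The model answers "how likely is a failure within X days?"; the business still owns the costs and the threshold that turn that answer into an action.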
Predictive Maintenance
Business Scenario | Key Decision | Data Science Question
Energy Forecasting | Should I buy or sell energy contracts? | What will be the long-/short-term demand for energy in a region?
Customer Churn | Which customers should I prioritize to reduce churn? | What is the probability of churn within X days for each customer?
Personalized Marketing | What product should I offer first? | What is the probability that a customer will purchase each product?
Product Feedback | Which service/product needs attention? | What is the social media sentiment for each service/product?
Framing the Data Science Question Based on the Scenario
Be obsessed with data
Tip #2
Being Obsessed with Data
Can only complete the process with the right data!
Bring in the people that know the data
Establish Performance Metrics
Tip #3
What is considered a
success for the
business?
How do you measure it?
1. Establish a qualitative objective
2. Translate it into a quantifiable metric
3. Quantify the improvement in the metric that would be useful
(e.g., 10% fewer failures, equivalent to savings of $1MM/year)
4. Establish a baseline (e.g., current failure rate = 10% per year)
5. Establish how to measure the improvement in the metric with
the data science solution (e.g., 80% of the equipment
maintained based on the predictive model)
Using Performance Metrics
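The arithmetic behind a target such as "10% fewer failures, savings of $1MM/year" can be sketched as below. The fleet size and cost per failure are assumed values chosen so the numbers line up with the slide's example; only the 10% baseline failure rate and the 10% reduction come from the deck.

```python
def annual_savings(fleet_size, baseline_failure_rate, relative_reduction, cost_per_failure):
    """Savings per year from avoiding a fraction of the baseline failures."""
    baseline_failures = fleet_size * baseline_failure_rate
    avoided_failures = baseline_failures * relative_reduction
    return avoided_failures * cost_per_failure


savings = annual_savings(
    fleet_size=1_000,            # assumption: units in service
    baseline_failure_rate=0.10,  # from the deck: 10% per year
    relative_reduction=0.10,     # from the deck: 10% fewer failures
    cost_per_failure=100_000,    # assumption: dollars per failure
)
print(f"${savings:,.0f} per year")
```

Writing the calculation down makes the baseline and the assumed costs explicit, so the business can challenge them before any model is built.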
Document
Success Metrics
using a template
Tips:
1. Data science team embedded within
the business
2. Allow exploring multiple problem
formulations to get to end metric goal
3. Past goal, go within set time period
4. Ensure reproducibility
Establish the E2E solution
Tip #4
1. Set up the end to end solution and
the metrics
2. Launch with a baseline/simple
model
3. Act on the recommendations of
the solution
4. Measure and iterate
Establishing an E2E solution helps with
buy-in from the business
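One way to read "launch with a baseline/simple model, then measure and iterate" is sketched below: fix the metric first, record what a trivial baseline scores, and promote a candidate only if it beats that number. The data, the threshold rule, and the choice of accuracy as the metric are all made up for illustration.

```python
from collections import Counter


def accuracy(predict, rows):
    """Fraction of (features, label) rows that the predictor gets right."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


def majority_baseline(train_rows):
    """Simplest possible model: always predict the most common training label."""
    most_common_label = Counter(y for _, y in train_rows).most_common(1)[0][0]
    return lambda x: most_common_label


def candidate(temperature):
    """Hypothetical second iteration: a single hand-picked threshold."""
    return 1 if temperature > 80 else 0


# Hypothetical rows: (sensor temperature, failed within the window?)
train = [(60, 0), (65, 0), (70, 0), (90, 1), (95, 1), (72, 0)]
holdout = [(62, 0), (68, 0), (93, 1), (88, 1), (71, 0)]

baseline_acc = accuracy(majority_baseline(train), holdout)
candidate_acc = accuracy(candidate, holdout)
promote = candidate_acc > baseline_acc
print(baseline_acc, candidate_acc, promote)
```

The baseline is deliberately crude; its job is to exercise the whole pipeline and give later iterations a number to beat.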
Keep a Human in the Loop
Tip #5
• Empower ALL to perform like the BEST
• Automate repetitive human tasks
• Embed expert knowledge into the solution
• How to interpret the model?
• Importance of Features
• Bias in the model
• Interpreting predictions per instance
• What-if analysis
Users don’t trust black-box models
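The interpretation questions above (feature importance, per-instance interpretation, what-if analysis) can be illustrated with a toy linear model, where each feature's contribution is simply its weight times its value. The weights, feature names, and the example unit are invented for this sketch; real models usually need dedicated explanation tooling.

```python
import math

# Assumed weights of a toy logistic model (not from the deck).
WEIGHTS = {"age_years": 0.30, "vibration_mm_s": 0.50, "days_since_service": 0.01}
BIAS = -4.0


def failure_probability(features):
    """Logistic score over the weighted features."""
    score = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def contributions(features):
    """Per-instance interpretation: how much each feature adds to the score."""
    return {name: WEIGHTS[name] * value for name, value in features.items()}


def what_if(features, name, new_value):
    """Change in predicted probability if one feature took a different value."""
    changed = dict(features, **{name: new_value})
    return failure_probability(changed) - failure_probability(features)


unit = {"age_years": 5, "vibration_mm_s": 4.0, "days_since_service": 100}
print(round(failure_probability(unit), 2))      # about 0.62
print(contributions(unit))                      # vibration contributes most
print(what_if(unit, "days_since_service", 0))   # negative: servicing lowers the risk
```

Showing a user which inputs drove a prediction, and how it would change after an action, is one way to address the trust problem noted on the slide.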
Data Science is a Team Sport
Learn and Educate
Tip #6
1. Learn from experiments
• Why?
• Both successes and failures
2. Share the learnings
3. Promote successful experiments to production
4. Move on to the next hypothesis to experiment
• Failure is a valid outcome of an
experiment
• Learn and refine the next experiment
Adopt a Process
Tip #7
A process specifies a detailed sequence of activities
necessary to perform specific business tasks.
It is used to standardize procedures and
establish best practices.
Microsoft’s Team Data Science Process
https://aka.ms/tdsp
Standard Project Lifecycle
Standardized Document
Templates, Project Structure
Shared, Distributed
Resources
Productivity Tools, Shared
Utilities
Cross-Industry Standard Process for Data Mining
(CRISP-DM)
Knowledge Discovery in Databases
(KDD)
• Data science virtual machines (DSVMs) as the fundamental development platform on cloud
• Use Visual Studio Team Services (VSTS)
  • Work item tracking and scrum planning
  • Git repositories
• Shared data science utilities in Git repository
• Use cloud-based Azure resources as needed
• Terminology:
  • Feature: a project
  • Story: a stage in the E2E process of a DS project
  • Tasks: specific coding/documentation/other activities that are needed to complete a story
  • Iteration: usually a 2-week sprint
[Figure: model lifecycle diagram. Labels recovered from the slide: Data Scientist, App Developer, IDE, Training Environment, Train & test model (CNTK/TF/SCIKIT/KERAS/…), Training + test code, Data Lake, Source Control, CI/CD Pipelines, Cloud Services, Model Storage, Embed model, App code, Test model + app, Apps, Edge Devices, App telemetry, A/B Testing, Continuous retraining. Stages: CODE, BUILD & TEST, PUBLISH, CONSUME. Lifecycle Management: Processes, Templates, Permissions. Sample model output: [ { "cat": 0.99218, "feline": 0.81242, "puma": 0.45456 } ]]
Model Source Control
• Processes and procedures to make models
reproducible (from source control to data
retention policies)
• Make it easy to work on multiple models
(consistent process)
Model Validation
• Unit testing, functional testing and
performance testing
• Validation needs to be performed both
in isolation and when embedded in an
application
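A minimal sketch of validating a model both in isolation and when embedded in an application is shown below. The model, the app function, and the thresholds are hypothetical stand-ins, and performance testing is not covered here.

```python
def predict_failure_probability(vibration_mm_s):
    """Stand-in model (assumption): risk grows linearly with vibration, clamped to [0, 1]."""
    return min(1.0, max(0.0, vibration_mm_s / 10.0))


def maintenance_message(vibration_mm_s):
    """Stand-in application code that embeds the model."""
    if predict_failure_probability(vibration_mm_s) >= 0.5:
        return "Schedule service"
    return "No action"


def test_model_in_isolation():
    readings = [-5.0, 0.0, 2.5, 5.0, 20.0]
    outputs = [predict_failure_probability(r) for r in readings]
    # Outputs must be valid probabilities and must not decrease as vibration rises.
    assert all(0.0 <= p <= 1.0 for p in outputs)
    assert outputs == sorted(outputs)


def test_model_embedded_in_app():
    # Functional check: the app reacts correctly to the model's output.
    assert maintenance_message(9.0) == "Schedule service"
    assert maintenance_message(1.0) == "No action"


test_model_in_isolation()
test_model_embedded_in_app()
print("all checks passed")
```

Functions named this way can also be collected by a test runner such as pytest, so the same checks can run in a CI/CD pipeline.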
Model Versioning & Storage
• Provide a consistent way to store & share
models, plus a way to track where models are
embedded / running
• Provide a consistent model format
• Provide traceability on where a model came
from (which data, which experiment, where’s
the code / notebook)
• Provide a way to track where model is running
• Control who has access to what models
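The traceability points above could be captured in a small metadata record stored next to each model. The sketch below is one possible shape: every field name and value is an assumption for illustration, and content hashes stand in for "which data" and "which model file".

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def sha256_of(data: bytes) -> str:
    """Content hash used to identify an exact model file or dataset snapshot."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class ModelRecord:
    name: str
    version: str
    model_sha256: str          # which serialized model
    training_data_sha256: str  # which data
    experiment_id: str         # which experiment
    code_revision: str         # where the code / notebook lives
    deployed_to: tuple = ()    # where the model is running


# Hypothetical values throughout.
record = ModelRecord(
    name="equipment-failure",
    version="1.0.0",
    model_sha256=sha256_of(b"serialized model bytes"),
    training_data_sha256=sha256_of(b"training data snapshot"),
    experiment_id="exp-042",
    code_revision="git:abc1234",
    deployed_to=("cloud-service", "edge-gateway"),
)
print(json.dumps(asdict(record), indent=2))
```

Serializing the record to JSON gives a consistent, tool-independent format that can be versioned alongside the model itself.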
Model Deployment
• Provide an efficient process to get a model built into an
application or service and leveraged to light up an end-user
scenario.
• Simplify the process to interact with the model (through code-
generation, API specifications / interfaces or other methods)
• Support a variety of inferencing targets (cloud / app / edge)
(including FPGAs or dedicated frameworks like CoreML & WinML)
• Provide secrets / service endpoint management to remove
friction from configuring the release process.
Accumulate a toolbox of tricks
Tip #8
• Data Exploration
• RFM – User Behavior Modeling
• Hyperparameter tuning
• Auto Featurization
Note: Domain expertise is still
helpful
Building an Org’s Toolbox
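As an example of a reusable toolbox item, RFM conventionally stands for recency, frequency, and monetary value. The sketch below computes the three raw values per customer from a transaction log; the data and the function signature are illustrative assumptions, and binning the values into scores is left out.

```python
from collections import defaultdict
from datetime import date


def rfm(transactions, as_of):
    """transactions: iterable of (customer_id, purchase_date, amount).
    Returns {customer_id: (recency_days, frequency, monetary_total)}."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        last_purchase[customer] = max(day, last_purchase.get(customer, day))
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((as_of - last_purchase[c]).days, frequency[c], monetary[c])
        for c in last_purchase
    }


# Hypothetical transaction log.
transactions = [
    ("alice", date(2024, 1, 5), 40.0),
    ("alice", date(2024, 3, 1), 60.0),
    ("bob", date(2023, 11, 20), 15.0),
]
scores = rfm(transactions, as_of=date(2024, 3, 31))
print(scores["alice"])  # (30, 2, 100.0)
print(scores["bob"])    # (132, 1, 15.0)
```

Which customers count as "recent" or "high value" still depends on the business, which is where the domain expertise noted on the slide comes in.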
Continuous Learning
Tip #9
Lots of common sense… but not common
practice
Thank you!
Also thanks to Pavandeep Kalra, Jacob Spolstra, Wee Hyong
Tok, Richin Jain, Brandon Rohrer

Managing Data Science Projects

  • 1. Executive Briefing Lessons learned managing data science projects: Adopting a team data science process
  • 2.
  • 3.
  • 4.
  • 5. Our strategy is to build best-in-class platforms and productivity services for an intelligent cloud and an intelligent edge infused with artificial intelligence (“AI”). Microsoft Form 10-K 2016
  • 6.
  • 8. Toolbox of a Data Scientist
  • 9. Do it like a Professional!
  • 10. Understand the Decision Process Tip #1
  • 11. What is the business problem that needs to be solved, independent of the technology solution? What decision or action has to be taken that can be informed by data?
  • 13. Understanding the Decision Process. Key Decision: Should I service this piece of equipment? Data Science Question: What is the probability this equipment will fail within the next X days?
  • 15. Framing Data Science Question based on the Scenario
      Business Scenario | Key Decision | Data Science Question
      Energy Forecasting | Should I buy or sell energy contracts? | What will be the long/short term demand for energy in a region?
      Customer Churn | Which customers should I prioritize to reduce churn? | What is the probability of churn within X days for each customer?
      Personalized Marketing | What product should I offer first? | What is the probability that a customer will purchase each product?
      Product Feedback | Which service/product needs attention? | What is the social media sentiment for each service/product?
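
A data science question of the form "will X happen within the next N days?" usually has to be turned into a training label first. The sketch below is one minimal way to do that for the predictive maintenance question; the function name `fails_within` and the convention that the window excludes the observation day are assumptions for illustration, not something the slides specify.

```python
from datetime import date, timedelta


def fails_within(observed_on, failure_dates, horizon_days):
    """Return 1 if any failure falls in (observed_on, observed_on + horizon_days], else 0.

    observed_on   -- the day on which the prediction would be made
    failure_dates -- known failure dates for one piece of equipment
    horizon_days  -- the "X days" in the data science question
    """
    window_end = observed_on + timedelta(days=horizon_days)
    return int(any(observed_on < d <= window_end for d in failure_dates))


# Hypothetical history for one machine: a single failure on 20 January.
history = [date(2024, 1, 20)]
label_30_days = fails_within(date(2024, 1, 1), history, horizon_days=30)
label_7_days = fails_within(date(2024, 1, 1), history, horizon_days=7)
```

The same labeling pattern would apply to the churn row of the table, with churn dates in place of failure dates.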
  • 16. Be obsessed with data Tip #2
  • 17. Being Obsessed with Data Can only complete the process with the right data!
  • 18. Bring in the people that know the data
  • 19.
  • 20.
  • 21.
  • 23. What is considered a success for the business?
  • 24. How do you measure it?
  • 25. Using Performance Metrics
      • Establish a qualitative objective
      • Translate it into a quantifiable metric
      • Quantify the metric improvement that would be useful (e.g., 10% fewer failures → savings of $1MM/year)
      • Establish a baseline (e.g., current failure rate = 10% per year)
      • Establish how to measure the improvement in the metric with the data science solution (e.g., 80% of the equipment maintained based on predictive model)
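
The chain on slide 25, from a baseline failure rate to a dollar figure, is simple arithmetic and can be sketched as follows. The 10% baseline failure rate and the 10% relative reduction come from the slide; the fleet size and the cost per failure are invented here purely so that the numbers land on the slide's $1MM/year example.

```python
def annual_savings(fleet_size, baseline_failure_rate, relative_reduction, cost_per_failure):
    """Translate a relative drop in failures into yearly savings."""
    baseline_failures = fleet_size * baseline_failure_rate   # failures per year today
    failures_avoided = baseline_failures * relative_reduction
    return failures_avoided * cost_per_failure


# Assumed inputs: 10,000 machines and $10,000 per failure are hypothetical.
savings = annual_savings(
    fleet_size=10_000,
    baseline_failure_rate=0.10,   # from the slide: 10% per year
    relative_reduction=0.10,      # from the slide: 10% fewer failures
    cost_per_failure=10_000,
)
print(f"Estimated savings: ${savings:,.0f} per year")
```

Writing the calculation down this way makes the baseline and the cost assumption explicit, which is what lets the business check the target before any model is built.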
  • 27. Tips: 1. Data science team embedded within the business 2. Allow exploring multiple problem formulations to get to end metric goal 3. Past goal, go within set time period 4. Ensure reproducibility
  • 28. Establish the E2E solution Tip #4
  • 29. 1. Set up the end to end solution and the metrics 2. Launch with a baseline/simple model 3. Act on the recommendations of the solution 4. Measure and iterate
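
Step 2 on slide 29, launching with a baseline or simple model, can be illustrated with a majority-class predictor that any later model has to beat on the agreed metric. This is only a sketch: the sensor field `vibration`, the 0.5 threshold, and the five data rows are all made up for the example.

```python
from collections import Counter


def fit_majority_baseline(labels):
    """Return a model that always predicts the most common training label."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda _row: majority


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    hits = sum(model(row) == label for row, label in zip(rows, labels))
    return hits / len(labels)


# Hypothetical data: one sensor reading per machine, label 1 means "failed".
rows = [{"vibration": v} for v in (0.1, 0.2, 0.3, 0.8, 0.9)]
labels = [0, 0, 0, 1, 1]

baseline = fit_majority_baseline(labels)


def candidate(row):
    """A hand-written rule standing in for a later, better model."""
    return int(row["vibration"] > 0.5)


baseline_acc = accuracy(baseline, rows, labels)
candidate_acc = accuracy(candidate, rows, labels)
```

Because both models run through the same `accuracy` function, the "measure and iterate" step reduces to swapping in a new candidate and comparing against the stored baseline number.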
  • 30. Establishing an E2E solution helps with buy-in from the business
  • 31. Keep a Human in the Loop Tip #5
  • 32. • Empower ALL to perform like the BEST • Automate repetitive human tasks • Embed expert knowledge into the solution
  • 33. Users don’t trust black-box models. • How to interpret the model? • Importance of features • Bias in the model • Interpreting predictions per instance • What-if analysis
  • 34. Data Science is a Team Sport
  • 36.
  • 37. 1. Learn from experiments • Why? • Both successes and failures 2. Share the learnings 3. Promote successful experiments to production 4. Move on to the next hypothesis to test
  • 38. • Failure is a valid outcome of an experiment • Learn and refine the next experiment
  • 40. A process specifies a detailed sequence of activities necessary to perform specific business tasks. It is used to standardize procedures and establish best practices.
  • 41. Microsoft’s Team Data Science Process (https://aka.ms/tdsp): • Standard Project Lifecycle • Standardized Document Templates, Project Structure • Shared, Distributed Resources • Productivity Tools, Shared Utilities
  • 42.
  • 43. Cross-Industry Standard Process for Data Mining (CRISP-DM); Knowledge Discovery in Databases (KDD)
  • 44.
  • 45. • Data science virtual machines (DSVMs) as the fundamental development platform on cloud • Use Visual Studio Team Services (VSTS) • Work item tracking and scrum planning • Git repositories • Shared data science utilities in Git repository • Use cloud-based Azure resources as needed
  • 46.
  • 47. • Terminology: • Feature: a project • Story: a stage in the E2E process of a DS project • Tasks: specific coding/documentation/other activities that are needed to complete a story • Iteration: usually a 2-week sprint
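
The work-item terminology on slide 47 maps onto a small nested data structure. The sketch below is an illustration only and is not the Visual Studio Team Services data model; the class and field names, and the `progress` helper, are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """A specific coding, documentation, or other activity."""
    title: str
    done: bool = False


@dataclass
class Story:
    """A stage in the end-to-end process of a data science project."""
    stage: str
    iteration: int                       # sprint number (usually two weeks each)
    tasks: List[Task] = field(default_factory=list)

    def is_complete(self):
        # A story with no tasks is treated as not yet complete.
        return bool(self.tasks) and all(t.done for t in self.tasks)


@dataclass
class Feature:
    """A project, made up of stories."""
    project: str
    stories: List[Story] = field(default_factory=list)

    def progress(self):
        """Fraction of all tasks across all stories that are done."""
        tasks = [t for s in self.stories for t in s.tasks]
        return sum(t.done for t in tasks) / len(tasks) if tasks else 0.0


# Hypothetical project with two stories in consecutive sprints.
feature = Feature(
    project="Predictive maintenance",
    stories=[
        Story("Data acquisition", iteration=1,
              tasks=[Task("Ingest sensor logs", done=True), Task("Document schema", done=True)]),
        Story("Modeling", iteration=2,
              tasks=[Task("Train baseline"), Task("Write model report")]),
    ],
)
```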
  • 48.
  • 49.
  • 50. [Diagram: model lifecycle across CODE, BUILD & TEST, PUBLISH, and CONSUME stages. Labeled elements: App Developer, Data Scientist, IDE, Source Control, CI/CD Pipelines, Training Environment (CNTK/TF/scikit/Keras/…), Train & test model, Training + test code, Test model + app, Model Storage, Embed model, App code, Apps, Cloud Services, Edge Devices, Data Lake, App telemetry, A/B Testing, Continuous retraining, and Lifecycle Management (processes, templates, permissions). Example model output shown: { "cat": 0.99218, "feline": 0.81242, "puma": 0.45456 }]
  • 51. Model Source Control • Processes and procedures to make models reproducible (from source control to data retention policies) • Make it easy to work on multiple models (consistent process)
  • 52. Model Validation • Unit testing, functional testing and performance testing • Validation needs to be performed both in isolation and when embedded in an application
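
The three kinds of testing named on slide 52 can be sketched against a model in isolation as below. The logistic scoring function is a made-up stand-in for a trained model, and the feature names, coefficients, and one-second budget for 1,000 predictions are all assumptions chosen for the example.

```python
import math
import time


def predict_failure_probability(vibration, temperature):
    """Hypothetical stand-in model: a logistic function of two sensor readings."""
    score = 4.0 * vibration + 0.05 * temperature - 5.0
    return 1.0 / (1.0 + math.exp(-score))


def validate_model(predict):
    """Run unit, functional, and performance checks; raise AssertionError on failure."""
    # Unit test: the output must be a valid probability.
    p = predict(0.5, 40.0)
    assert 0.0 <= p <= 1.0, "prediction is not a probability"

    # Functional test: more vibration should not lower the predicted risk.
    assert predict(0.9, 40.0) > predict(0.1, 40.0), "risk does not rise with vibration"

    # Performance test: 1,000 predictions must fit a generous time budget.
    start = time.perf_counter()
    for _ in range(1000):
        predict(0.5, 40.0)
    assert time.perf_counter() - start < 1.0, "prediction is too slow"
    return True


validated = validate_model(predict_failure_probability)
```

Validation "when embedded in an application" would reuse the same checks but call the model through the application's own entry point rather than the bare function.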
  • 53. Model Versioning & Storage • Provide a consistent way to store & share models, plus a way to track where models are embedded / running • Provide a consistent model format • Provide traceability on where a model came from (which data, which experiment, where’s the code / notebook) • Provide a way to track where model is running • Control who has access to what models
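
The traceability requirements on slide 53 (which data, which experiment, which code, and where the model is running) suggest a record per model version. The in-memory registry below is a minimal sketch under that reading; it is not an existing product's API, and every class, method, and field name is hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ModelRecord:
    name: str
    version: int
    artifact_sha256: str          # fingerprint of the serialized model
    training_data: str            # which data the model came from
    experiment: str               # which experiment produced it
    code_ref: str                 # where the code or notebook lives
    deployed_to: List[str] = field(default_factory=list)


class ModelRegistry:
    """Toy in-memory registry keyed by (model name, version)."""

    def __init__(self):
        self._records: Dict[Tuple[str, int], ModelRecord] = {}

    def register(self, name, artifact, training_data, experiment, code_ref):
        # The next version is one past the highest version stored under this name.
        version = 1 + max((v for (n, v) in self._records if n == name), default=0)
        record = ModelRecord(name, version, hashlib.sha256(artifact).hexdigest(),
                             training_data, experiment, code_ref)
        self._records[(name, version)] = record
        return record

    def mark_deployed(self, name, version, target):
        self._records[(name, version)].deployed_to.append(target)

    def where_running(self, name):
        """Map each version of a model to the targets it is deployed on."""
        return {v: list(r.deployed_to) for (n, v), r in self._records.items() if n == name}


# Hypothetical usage: two versions of one model, the second deployed to an edge device.
registry = ModelRegistry()
first = registry.register("failure-model", b"weights-v1", "sensors-2023", "exp-12", "git:abc123")
second = registry.register("failure-model", b"weights-v2", "sensors-2024", "exp-19", "git:def456")
registry.mark_deployed("failure-model", second.version, "edge-device-7")
```

Access control, the last point on the slide, is left out of this sketch.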
  • 54. Model Deployment • Provide an efficient process to get a model built into an application or service and leveraged to light up an end-user scenario. • Simplify the process to interact with the model (through code generation, API specifications / interfaces or other methods) • Support a variety of inferencing targets (cloud / app / edge) (including FPGAs or dedicated frameworks like CoreML & WinML) • Provide secrets / service endpoint management to remove friction from configuring the release process.
  • 55. Accumulate a toolbox of tricks Tip #8
  • 56. Building an Org’s Toolbox: • Data Exploration • RFM – User Behavior Modeling • Hyperparameter tuning • Auto Featurization. Note: Domain expertise is still helpful
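
RFM, one of the toolbox items on slide 56, summarizes each customer by recency, frequency, and monetary value. The sketch below shows one common way to compute the three raw values from a transaction log; the tuple layout, the function name `rfm`, and the sample data are assumptions, and scoring or binning the values is left out.

```python
from collections import defaultdict
from datetime import date


def rfm(transactions, as_of):
    """Compute raw RFM values per customer.

    transactions -- iterable of (customer_id, purchase_date, amount)
    as_of        -- reference date used to measure recency
    """
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        if customer not in last_seen or day > last_seen[customer]:
            last_seen[customer] = day
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        customer: {
            "recency_days": (as_of - last_seen[customer]).days,  # days since last purchase
            "frequency": frequency[customer],                    # number of purchases
            "monetary": monetary[customer],                      # total spend
        }
        for customer in last_seen
    }


# Hypothetical transaction log for two customers.
log = [
    ("a", date(2024, 1, 1), 10.0),
    ("a", date(2024, 1, 10), 5.0),
    ("b", date(2024, 1, 5), 20.0),
]
features = rfm(log, as_of=date(2024, 1, 31))
```

Keeping a helper like this in a shared repository is the kind of reusable utility the toolbox tip points at.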
  • 58.
  • 59. Lots of common sense… but not common practice
  • 60.
  • 61. Thank you! Also thanks to Pavandeep Kalra, Jacob Spolstra, Wee Hyong Tok, Richin Jain, Brandon Rohrer
