This document discusses common reasons why many data science projects fail. It provides six examples of projects that failed due to issues such as having an unrealistic scope, poor data quality, lack of stakeholder involvement, and resistance to the results of data analysis. For each example, it explains the root cause of the failure and provides recommendations for avoiding similar pitfalls in the future, such as prioritizing data architecture and ensuring business stakeholders are involved throughout the project. The overall message is that data science projects require consideration of the full lifecycle from data to deployment and addressing organizational challenges.
4. “Through 2022, only 20% of analytic insights will deliver
business outcomes.”
“Through 2022, only 15% of use cases leveraging AI techniques
(such as ML and DNNs) and involving edge
and IoT environments will be successful.”
Gartner 2019
8. Disclaimer
All characters and events depicted in this session are entirely fictitious. Any
similarity to actual events or persons, living or dead, is purely coincidental.
All views expressed in this session are entirely my own and do not represent the views of my employer.
10. We scare because we care
» Monsters, Inc. is having a hard time tracking its employees.
» Employees are logging their time incorrectly.
» They allow unauthorized individuals into the company building.
» Managers are struggling with resource utilization.
11. What the client wants:
A computer vision system attached to a camera at the company entrance to detect and log employee check-ins and check-outs during the day!
12. What the client needs:
Time logging application
Project management / planning tool
Company process improvement
Fingerprint machine / ID verification machine at the entrances
Extra:
Resource allocation optimization model
13. Why is this a problem?
» The most effective solutions are the simplest ones.
» The client will realize later that AI didn’t solve the problem, and that will shake their trust in the technology itself.
» The technical debt of a complex solution will eventually overshadow its value.
» Using AI for problems that simple software can solve devalues the whole industry.
15. Wakanda Forever!
Wakanda National Bank would like to relieve some of the pressure on its hotline support system and cut operational costs by releasing a chatbot that answers its customers’ recurring requests and inquiries online.
16. Project team
Wakanda Bank: Project Manager, Product Manager, IT Director, Data Engineer, System Engineer
Your Company: PM, Data Analyst, ML Engineer, Chatbot Specialist, DevOps
17. After several months of work...
One month after the release, customer feedback was so bad that management decided to shut the chatbot down until further notice.
When you analyzed the customers’ comments and dialogs, you discovered that customers were using different language, expressions, and idioms than the ones you had designed for, and that many intents were not covered.
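A gap like this can be surfaced before release by replaying real customer messages against the intents the bot knows. Below is a minimal Python sketch under stated assumptions: the intent names, trigger phrases, and customer messages are all invented for illustration, and plain keyword matching stands in for whatever intent classifier the chatbot actually uses.

```python
def uncovered_utterances(utterances, intent_keywords):
    """Return the utterances that match none of the known intents.

    Keyword matching is a deliberately crude stand-in for a real intent
    classifier; it is only meant to surface gaps for human review.
    """
    uncovered = []
    for text in utterances:
        lowered = text.lower()
        matched = any(
            keyword in lowered
            for keywords in intent_keywords.values()
            for keyword in keywords
        )
        if not matched:
            uncovered.append(text)
    return uncovered


# Hypothetical intents and customer messages, invented for illustration.
intents = {
    "check_balance": ["balance", "how much money"],
    "block_card": ["block my card", "lost my card"],
}
logs = [
    "What is my balance?",
    "I lost my card yesterday",
    "My plastic got swallowed by the machine",
    "Am I broke?",
]

gaps = uncovered_utterances(logs, intents)
coverage = 1 - len(gaps) / len(logs)  # share of messages the bot can handle
```

Reviewing the `gaps` list together with hotline support agents is one way to learn the expressions customers really use before the bot goes live.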
18. What is the root cause of the chatbot failure?
19. Designing the solution with the wrong stakeholders
The most important stakeholder is the person whose manual work you are trying to mimic.
For us, that is the hotline support agent.
21. Upon this rock, I will build my church!
The Ministry of Health in Sokovia wants to make better use of its hospitals by predicting how long a patient is likely to occupy a hospital bed.
The hospitals should provide all the data related to each citizen’s medical history (scans, tests, prescriptions, etc.).
22. During the data exploration...
You find:
Discrepancies in the citizens’ medical records.
Incomplete records.
A small data sample.
Records in different formats.
New data samples that take a very long time to arrive.
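Findings like these can be quantified early, before any modelling starts. Below is a minimal Python sketch, assuming the records arrive as dictionaries; the field names (`patient_id`, `admission_date`, `length_of_stay_days`) and the sample rows are hypothetical and not taken from any real system.

```python
from collections import Counter


def profile_records(records, required_fields, id_field):
    """Summarize basic data-quality problems in a list of dict records."""
    total = len(records)
    if total == 0:
        raise ValueError("no records to profile")
    # Share of records where each required field is missing or empty.
    missing_rate = {
        field: sum(1 for r in records if r.get(field) in (None, "")) / total
        for field in required_fields
    }
    # IDs that appear more than once are candidates for discrepancies.
    id_counts = Counter(r.get(id_field) for r in records if r.get(id_field))
    duplicate_ids = sorted(i for i, n in id_counts.items() if n > 1)
    return {
        "total": total,
        "missing_rate": missing_rate,
        "duplicate_ids": duplicate_ids,
    }


# Hypothetical sample rows, not real data. Note the mixed date formats.
sample = [
    {"patient_id": "p1", "admission_date": "2019-01-03", "length_of_stay_days": 4},
    {"patient_id": "p2", "admission_date": "", "length_of_stay_days": 2},
    {"patient_id": "p2", "admission_date": "03/01/2019", "length_of_stay_days": None},
    {"patient_id": "p3", "admission_date": "2019-02-11", "length_of_stay_days": 7},
]

report = profile_records(
    sample, ["admission_date", "length_of_stay_days"], "patient_id"
)
```

A report like this gives the client a concrete picture of how incomplete and inconsistent the records are, which makes the case for fixing data collection first.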
23. Sokovia data architecture
» No unified system for recording patient data.
» Manuscripts and admission documents are still written on paper.
» Digital information is scattered across multiple systems.
» System interfaces do not validate user input sufficiently.
24. There is no AI without IA (Information Architecture)
Poor information architecture means poor data quality and slow data pipelines.
25. How do we move forward?
Before diving into your project, ask about your client’s IA.
Digital transformation should have the highest priority for everyone.
Data collection, unification, and governance projects come first; data science projects come second.
We need more data engineers and we need them now!
27. What is the cost of lies?
Chernobyl for Petroleum Industries is a global oil and gas company with more than 3.2 million miles of oil pipes.
Its team wants to cut operational costs by reducing the inspection and maintenance time for its pipes.
28. Problem statement (cont.)
They need a predictive model to estimate which pipes will need maintenance and when.
The pipe sensor logs and the earlier inspection and maintenance reports are all stored in a massive data cluster that only the client’s data engineers can access.
29. After long weeks of analysis and modelling...
The data engineer at Chernobyl brings a new sample from the data cluster to validate your model, but your model doesn’t do well!
The results drop dramatically between training and testing, even though you did a train/test split.
You plug in the new data, retrain your model, and request a new test sample, but again the model fails!
30. Detecting unrepresentative sample
» Review and discuss the sampling method with your client’s team.
» Compare the different samples against each other to test the sampling methodology.
» Compare the sample characteristics with what you already know about the population.
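One way to compare samples against each other is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical cumulative distributions of a feature in two data pulls. The Python sketch below is one possible check, not a method prescribed in this session; the sensor readings are invented for illustration.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical cumulative
    distribution functions of the two samples (0 = identical, 1 = disjoint).
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # The gap can only change at observed values, so checking those is enough.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical sensor readings from three data pulls, invented for illustration.
first_pull = [10.1, 10.4, 9.8, 10.0, 10.3, 9.9]
second_pull = [12.2, 12.5, 11.9, 12.1, 12.4, 12.0]
same_source = [10.2, 9.7, 10.5, 10.0, 9.9, 10.1]

drifted = ks_statistic(first_pull, second_pull)  # no overlap between pulls
similar = ks_statistic(first_pull, same_source)  # largely overlapping pulls
```

A value close to 1 for an important feature suggests the two pulls were not drawn from the same population. Where SciPy is available, `scipy.stats.ks_2samp` also reports a p-value for the same comparison.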
32. Netflix Prize
In October 2006, Netflix announced "The Netflix Prize".
The mission was to make the company's recommendation engine 10% more accurate.
In 2009, the prize went to the team BellKor's Pragmatic Chaos, which crossed the required threshold.
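The 10% target was measured as a relative reduction in root mean squared error (RMSE) on held-out ratings, compared with Netflix's existing system. Below is a minimal Python sketch of that arithmetic; the ratings and predictions are invented for illustration and are not Netflix data.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(actual))


def relative_improvement(baseline_rmse, candidate_rmse):
    """Fraction by which the candidate lowers the baseline RMSE."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse


# Hypothetical ratings on a 1-5 scale, invented for illustration.
actual = [4, 3, 5, 2]
baseline_predictions = [3, 3, 4, 3]
candidate_predictions = [4, 3, 4, 3]

gain = relative_improvement(
    rmse(baseline_predictions, actual),
    rmse(candidate_predictions, actual),
)
meets_target = gain >= 0.10
```

The metric says nothing about what it costs to run the winning model in production.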
34. “The additional accuracy gains that we measured did not
seem to justify the engineering effort needed to bring
them into a production environment”
Netflix
35. Why do we struggle to move to production?
» The ML model lifecycle starts and ends with the user experience, not with input data and model outputs.
» Lack of deployment knowledge or lack of deployment roles.
» Poor design of the data flow pipeline.
» The huge gap between prototyping environments and actual live servers.
» The gap between the training sample size and the population size.
36. Important considerations for ML solutions
How will the end user access my model?
What is the expected performance load on my model?
How often will the model need retraining?
Will my model work in real-time or batch mode?
What is the current platform that I’m integrating with?
What is the infrastructure architecture that will host my solution?
38. Resistance to data science
» The business people don’t like what your data is saying.
» Teams resist using AI-powered solutions.
» Management doesn’t support the data initiatives enough.
39. What can we do about it?
» Involve the business people as early as possible in every step of your analysis.
» Be prepared to share a detailed description of your methods and analysis.
» Use your audience’s language when explaining your findings.
» Be compassionate towards your audience.
40. In summary...
» When it comes to tech, you ARE the expert. When it comes to business, your client IS the expert. Don’t ever switch places!
» The market needs more work on digital transformation, data architecture, and governance.
» Data products start with a user click and end with a user gaining value.
» Companies need to make room for data engineers and software engineers in their data teams.
» Data scientists need to skill up in solution design, usability, and deployment.
43. CREDITS
Special thanks to all the people who made and
released these awesome resources for free:
» Presentation template by SlidesCarnival
» Photographs by Unsplash
» Icons by flaticons