6. BEST MODEL
• Which one would you choose here?
• It’s about making a tradeoff
• This tradeoff is the most important job of the PO (product owner)
• A 100% correct answer might not exist!!!
8. ULTIMATELY
• It’s about creating value from data
• Using Machine Learning, Advanced Analytics, and visualization
9. WHEN YOU SAY DATA SCIENCE, COMPANIES UNDERSTAND
• All things big data
• Predictive modeling & Advanced Analytics
• More money
• Do all the cool things the others are doing
13. WHAT COMPANIES GOT
• A lot of POCs (proofs of concept)
• A lot of screenshots/presentations/dashboards on a laptop
• Nice stories to tell their network about those screenshots, and especially those dashboards
• Headaches, with data and infrastructure even more scattered
14. BUT…
• We got a data scientist working on trees and forests
• Neural networks!
• Deep learning!!!
15. WHAT DO COMPANIES ACTUALLY NEED?
• Put things into production
• They don’t teach that in any data science course or MOOC (that I know of)
18. KAGGLE CURSE
• gdd.li/toldYouSo
• Many data scientists approach the problem at hand with a Kaggle-like mentality: delivering the best model in absolute terms, no matter what the practical implications are.
• In reality, it's not the best model that we implement, but the one that combines quality and practicality: a continuous balancing act
• The Netflix Prize competition
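The “best model vs. the model we implement” balancing act can be sketched as a selection rule. This is a minimal illustration, assuming serving latency is the practical constraint; the `Candidate` class, the `pick_model` function, the model names, and all numbers are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A trained model summarised by offline quality and one practical cost."""
    name: str
    accuracy: float    # offline validation score (higher is better)
    latency_ms: float  # time to score one request in production


def pick_model(candidates, max_latency_ms):
    """Return the most accurate candidate that fits the latency budget.

    Kaggle-style selection would simply take max(accuracy); here models
    that are too slow to serve are discarded first.
    """
    feasible = [c for c in candidates if c.latency_ms <= max_latency_ms]
    if not feasible:
        raise ValueError("no candidate meets the latency budget")
    return max(feasible, key=lambda c: c.accuracy)


# Hypothetical numbers, for illustration only.
candidates = [
    Candidate("huge_ensemble", accuracy=0.912, latency_ms=850.0),
    Candidate("gradient_boosting", accuracy=0.905, latency_ms=40.0),
    Candidate("logistic_regression", accuracy=0.880, latency_ms=2.0),
]

best_offline = max(candidates, key=lambda c: c.accuracy)
best_practical = pick_model(candidates, max_latency_ms=100.0)
print(best_offline.name, best_practical.name)  # huge_ensemble gradient_boosting
```

Taking the maximum offline accuracy alone (the Kaggle-like choice) picks the slowest model; adding the budget picks a slightly less accurate one that can actually be served. The constraint could just as well be maintenance effort, infrastructure cost, or explainability.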
22. SKILLS
• Participating in actually building production-quality systems, OR being proficient enough in R or Python to hack together a prototype on a very small dataset?
• Supply of the second group keeps growing while demand is flat or shrinking
• Especially as executives get burned by “data scientists” who don't know how to help them build things of value
23. HIRING
• Companies that are not engineering-driven often have trouble hiring good technical people
• The “IQ” test is not really representative of applied data science
• At GoDataDriven we do an “at home, at your convenience” assessment
• Real dataset, real business question, real product
• Models are software: treat them as such
24. TAKEAWAYS
• POs should know “their stuff”
• Automate all the data movements
• Hire data scientists who are good at programming (or hire machine learning engineers)
WHERE DOES IT GO WRONG WHEN DEVELOPING DATA PRODUCTS?
To do data science, we start from the question: what is ML?
Amazon doesn’t use the best recommendations
The model is not the goal; it is a means!
Cost of mispredictions and missed predictions in data-driven supervision
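One way to make the cost of mispredictions and missed predictions concrete is to price each error type separately. This is a minimal sketch, assuming a binary “flag / don’t flag” setting where a misprediction is a false alarm and a missed prediction is a real problem that went unflagged; the function names, labels, and costs are all hypothetical.

```python
def error_cost(y_true, y_pred, cost_false_alarm, cost_missed):
    """Total cost of a binary classifier's errors.

    Assumption: a misprediction is a false alarm (flagged, but nothing was
    wrong) and a missed prediction is a real problem that was not flagged.
    """
    false_alarms = sum(1 for t, p in zip(y_true, y_pred) if p and not t)
    misses = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return false_alarms * cost_false_alarm + misses * cost_missed


def accuracy(y_true, y_pred):
    """Fraction of cases predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels: 1 = real problem, 0 = everything fine.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
model_a = [1, 1, 0, 1, 1, 0, 0, 1]  # two false alarms, no misses
model_b = [1, 0, 0, 0, 0, 0, 0, 0]  # no false alarms, two misses

# Hypothetical costs: a missed problem is 10x worse than a false alarm.
COST_FALSE_ALARM = 10
COST_MISSED = 100

for name, pred in [("model_a", model_a), ("model_b", model_b)]:
    print(name, accuracy(y_true, pred),
          error_cost(y_true, pred, COST_FALSE_ALARM, COST_MISSED))
# model_a 0.75 20
# model_b 0.75 200
```

Both models get 6 out of 8 cases right, yet with these assumed costs one is ten times more expensive than the other, which is why accuracy alone is not enough to choose a model.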
Images
Logs
Scraping
External API
Plus actual code, which needs to run fast! Energy company example
This landscape, though, has many challenges: energy company example
15 cents per client per month × 1 million clients × 12 months = 1.8 million euros per year
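The 1.8 million euro figure follows from the monthly cost per client times the number of clients times 12 months. A quick check, keeping money in integer cents to avoid floating-point rounding (the variable names are illustrative):

```python
# Figures from the energy company example; money kept in integer cents
# so the multiplication is exact.
cost_per_client_per_month_cents = 15
clients = 1_000_000
months_per_year = 12

yearly_cost_cents = cost_per_client_per_month_cents * clients * months_per_year
yearly_cost_euros = yearly_cost_cents // 100
print(f"{yearly_cost_euros:,} euros per year")  # 1,800,000 euros per year
```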
And don’t forget those reports! Somebody give me a report!
Always remember that (almost) everything can and should be measured!
Everything should be automated!
Example NS: 300 man-hours per year on data movement for a **single** source!
The PO’s fault
BellKor’s Pragmatic Chaos
The Ensemble
Train stops in the middle of nowhere
First self-driving cars
Then account managers
Then look at malicious activity: low cost, but high risk
The second group has gotten extremely crowded [from people] […] who have completed MOOCs or bootcamps
At KPMG, I had to count how many triangles there were in a picture, or something like that
Here’s our managing director
This is what he looks like when he’s serious
So, really, come work for us. We’re awesome