Data Science: A Mindset for Productivity
Keynote at 2015 Ronin Labs West Coast CTO Summit
https://www.eventjoy.com/e/west-coast-cto-summit-2015
Abstract
Data science isn't just about using a collection of technologies and algorithms. Data science requires a mindset that solves problems at a higher level of abstraction. How do we model utility when we think about optimization? How do we decide which hypotheses to test? How do we allocate our scarce resources to make progress?
There are no silver bullets. But I'll share what I've learned from a variety of contexts over the course of my work at Endeca, Google, and LinkedIn, and I hope you'll leave this talk with some practical wisdom you can apply to your next data science project.
4. But nobody knows everything.*
Class HashMap<K,V>
java.lang.Object
java.util.AbstractMap<K,V>
java.util.HashMap<K,V>
Type Parameters:
K - the type of keys maintained by this map
V - the type of mapped values
All Implemented Interfaces:
Serializable, Cloneable, Map<K,V>
*Except Jeff Dean.
8. Data science is a mindset.
Explain
Iterate using explainable models.
Express
Model your utility and inputs.
Experiment
Optimize for speed of learning.
13. The importance of being explainable.
• Algorithms can protect you from overfitting, but they
can’t protect you from the biases you introduce.
• Introspection into your models and features makes it
easier for you and others to debug them.
• This matters especially if you don’t completely trust your objective
function or the representativeness of your training data.
14. Linear models? Decision trees?
• Linear regression and decision trees favor explainability over accuracy,
compared to more sophisticated models.
• But size matters. If you have too many features or too deep a decision
tree, you lose explainability.
• You can always upgrade to a more sophisticated model when you trust
your objective function and training data.
• Building a machine learning model is an iterative process. Optimize for the
speed of your own learning.
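To make the explainability point concrete, here is a minimal Python sketch, not taken from the talk. It fits a one-feature linear model by ordinary least squares, so the learned weight can be read directly, and it counts the worst-case number of leaf rules in a binary decision tree of a given depth. The toy data and the helper names `fit_line` and `max_leaves` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def max_leaves(depth):
    """Upper bound on leaf rules in a binary decision tree of this depth."""
    return 2 ** depth


# Hypothetical toy data that lies exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_line(xs, ys)

# The whole model is two numbers, so it can be explained in one sentence.
print(f"Each unit of x adds {slope:.2f} to the prediction (baseline {intercept:.2f}).")

# A shallow tree has a handful of rules; a deep one can have far too many to read.
for depth in (3, 10, 20):
    print(f"depth {depth}: up to {max_leaves(depth)} rules")
```

The same reasoning applies to linear models with many features: each weight is still readable, but the full set stops being something a person can hold in their head.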
22. How to find your prince.
You have to kiss a lot of frogs to find one prince. So
how can you find your prince faster?
By finding more frogs and
kissing them faster and faster.
-- Mike Moran
23. Think like an economist.
Yesterday
Experiments are expensive,
so choose hypotheses wisely.
Today
Experiments are cheap,
so do as many as you can!
26. Test one variable at a time.
• Autocomplete
• Entity Tagging
• Vertical Intent
• # of Suggestions
• Suggestion Order
• Language
• Query Construction
• Ranking Model
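One way to read this slide is as a one-factor-at-a-time experiment design: every test arm differs from the baseline in exactly one of the listed variables, so any change in the metric can be attributed to that variable. The Python sketch below illustrates this under stated assumptions. The factor names echo the slide, but the values and the `one_at_a_time` helper are hypothetical and not from the talk.

```python
def one_at_a_time(baseline, alternatives):
    """Build variants that each change exactly one factor of the baseline."""
    variants = []
    for factor, values in alternatives.items():
        for value in values:
            if value == baseline[factor]:
                continue  # skip the baseline setting itself
            variant = dict(baseline)  # copy so the baseline stays untouched
            variant[factor] = value
            variants.append((factor, variant))
    return variants


# Hypothetical baseline configuration and candidate settings.
baseline = {
    "autocomplete": "on",
    "num_suggestions": 5,
    "ranking_model": "v1",
}
alternatives = {
    "autocomplete": ["on", "off"],
    "num_suggestions": [3, 5, 10],
    "ranking_model": ["v1", "v2"],
}

variants = one_at_a_time(baseline, alternatives)
for factor, variant in variants:
    print(f"vary {factor}: {variant}")
```

With these settings the design yields four arms plus the baseline, compared with the twelve combinations of a full grid over the same values.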
27. tl;dr
The most important part of data science is picking
the right problem and figuring out how to frame it.