4. The Architecture vs The Practice (aka: Form vs Function)
Platforms for Big Data storage, processing & analytics.
VS
Actual applications of Data-at-Scale
5. Themes for This Morning
How DataSift Manages, Processes & Delivers
Data Visualization via Tableau
Causal Inference & Statistical Modeling
Movies & Coffee
6. Who am I?
Tim Shea
@SheaNineSeven
Data Scientist & Sales Engineer at DataSift
7. Focus on Alliances & Channels:
Tableau, Alteryx, MicroStrategy, Informatica, SAP
Data Science as a Practice:
Disambiguation, Classification, Causality
8. What is DataSift?
Social Data Platform
Full “Firehose” Access
2 Billion Posts per Day
½ Trillion Posts Historical Archive
10. We Make it Simple for You
Focus on Filtering
Big Data < Relevant Data
Enrichments:
- Demographics
- Links
- Emotion & Intent
- Learned Classification
12. DataSift: Beyond “Social Listening”
Ex. “Does Social have anything to do with my Business?”
Line Charts and Graphs
Vs
Operationalized Decision Making
18. Does The Past have anything at all to do with The Future?
19. Defending Your Hypotheses
How can I create & defend my Hypotheses?
How do I communicate my findings to Laypeople
(non-Data Scientists) like my Boss?
21. Movies
Through the Lens of:
DataSift - What we do as a Social Data Platform
Tableau - How to Make Sense of a Mountain of Data
Good Data & Good Tools
25. Risk Management is Hard
Q: What is a “Sure Bet”?
Q: Should I spend $100MM making this movie?
Q: How can I make this process less risky?
39. The Model
Y = a + bX
Y = Box Office (the predicted variable)
X = Social Volume (the predictor)
b = Coefficient (slope: Box Office $ per unit of Social Volume)
a = Intercept (predicted Box Office when X = 0)
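A minimal sketch of how a and b in Y = a + bX can be estimated by ordinary least squares, using only the Python standard library. The deck doesn't say which tool produced its fit, so the function name `fit_line` and the sample numbers below are hypothetical and for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of Y = a + bX; returns (a, b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b = covariance(X, Y) / variance(X); the 1/n factors cancel
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X has no variation; slope is undefined")
    b = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    a = mean_y - b * mean_x
    return a, b


# Hypothetical, made-up numbers (not DataSift data)
social_volume = [1_000, 2_000, 3_000, 4_000]
box_office = [260_000, 470_000, 730_000, 950_000]
a, b = fit_line(social_volume, box_office)
```

With these made-up points, `fit_line` returns a = 20000 and b = 233, i.e. about $233 of Box Office per unit of Social Volume.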
40. Defend the Model v1
P-value: If the Null Hypothesis were true, there would be an X% chance of seeing a relationship at least as strong as the one in our data.
Null Hypothesis: The linear coefficient b is equal to zero.
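The v1 definition can be checked numerically. Stats packages usually report a t-test p-value for the slope; the sketch below swaps in a permutation test instead, because it needs only the standard library and makes the Null Hypothesis concrete: shuffle Box Office against Social Volume and count how often a slope this large appears by chance. The names `slope` and `permutation_p_value` are hypothetical, and the result is an estimate that tightens as `n_perm` grows.

```python
import random


def slope(xs, ys):
    """Least-squares slope b of Y = a + bX."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X has no variation; slope is undefined")
    return sxy / sxx


def permutation_p_value(xs, ys, n_perm=10_000, seed=0):
    """Two-sided permutation test of the Null Hypothesis b = 0.

    Shuffling Y breaks any link between X and Y, so the shuffled slopes
    show how large |b| gets by chance alone when there is no relationship.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(slope(xs, shuffled)) >= observed:
            at_least_as_extreme += 1
    # Counting the observed data as one permutation keeps p above zero
    return (at_least_as_extreme + 1) / (n_perm + 1)
```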
41. Defend the Model v2
P-value (again): If Social Volume and Box Office were unrelated, we'd see a correlation this strong only about X% of the time.
R-Squared: Our model explains about Y% of the variability in Box Office; the rest is the scatter of points around the regression line, which is fit by minimizing the sum of squared errors ("Least Squares").
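R-Squared compares the squared residuals (the scatter around the regression line) with the total squared spread of Y around its mean. A minimal sketch, assuming a and b came from a least-squares fit that includes an intercept (R-Squared is harder to interpret otherwise); `r_squared` is a hypothetical helper name.

```python
def r_squared(xs, ys, a, b):
    """Share of the variance in Y explained by the line Y = a + bX."""
    mean_y = sum(ys) / len(ys)
    # Unexplained: squared distance of each point from the regression line
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total: squared distance of each point from the mean of Y
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("Y has no variation; R-Squared is undefined")
    return 1 - ss_res / ss_tot
```

For xs = [1, 2, 3, 4] and ys = [2, 1, 4, 3], least squares gives a = 1.0 and b = 0.6, and `r_squared(xs, ys, 1.0, 0.6)` comes out at about 0.36: the line explains roughly 36% of the variability.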
42. Defend the Model v3
Every Bitly click predicts about $240 in Box Office Sales.
I'm extremely confident this is not due to chance (less than a 1% chance of seeing it if there were no real relationship).
The model explains ~96% of the variation in Box Office in our data.
43. The Model (cont)
Y “is predicted by” a + bX
Box Office = 0 + $240 * (# Bitly clicks)
Box Office = 0 + $130 * (# tweets)
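The equations above set a = 0. Assuming that means the line was forced through the origin (the slides don't say whether the intercept was constrained or just rounded to 0), least squares gives the slope in closed form as b = Σxy / Σx². The helpers `fit_through_origin` and `predict_box_office` are hypothetical sketches, and the test data is made up.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope b for Y = bX, with the intercept fixed at 0."""
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("all X values are zero; slope is undefined")
    return sum(x * y for x, y in zip(xs, ys)) / sxx


def predict_box_office(dollars_per_unit, volume):
    """Box Office = 0 + b * (social volume)."""
    return dollars_per_unit * volume
```

For example, under the Bitly model, 10,000 clicks would predict `predict_box_office(240, 10_000)`, which is $2,400,000 in Box Office.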
44. Benchmarking
If my Box Office per Bitly click drops below $240
If my Box Office per tweet drops below $130
If my Box Office per Instagram post drops below $2809
If my Box Office per Facebook post drops below $3871
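One possible reading of the benchmarks, assuming each dollar figure is Box Office divided by that channel's social volume (the same ratio as the slope in the zero-intercept model): flag any channel where a film's ratio falls below its benchmark. The slides don't say what happens after a drop, or whether the Box Office figure is actual or projected, so `below_benchmark` and its inputs are hypothetical.

```python
# Benchmark dollars of Box Office per unit of social volume, from the slides
BENCHMARKS = {"Bitly": 240, "Twitter": 130, "Instagram": 2809, "Facebook": 3871}


def below_benchmark(box_office, volumes, benchmarks=BENCHMARKS):
    """Return the channels whose Box Office per unit is under the benchmark.

    volumes maps a channel name to its social volume (clicks, tweets, posts).
    Channels with no benchmark or no volume are skipped.
    """
    flagged = []
    for channel, volume in volumes.items():
        if channel not in benchmarks or volume <= 0:
            continue  # ratio undefined or nothing to compare against
        if box_office / volume < benchmarks[channel]:
            flagged.append(channel)
    return flagged
```

For a $1,000,000 film with 5,000 Bitly clicks and 5,000 tweets, both ratios are $200: Bitly is flagged (below $240) and Twitter is not (above $130).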