4. Revolution Confidential
Statistician | Data Scientist
Image: Baseball (Cricket) | HBR Sexiest Job of 21st Century
Mode: Reactive | Consultative
Works: Solo | In a team
Inputs: Data File, Hypothesis | A Business Problem
Data: Pre-prepared, clean | Distributed, messy, unstructured
Data Size: Kilobytes | Gigabytes
Tools: SAS, Mainframe | R, Python, awk, Hadoop, Linux, …
Nouns: Tables | Data Visualizations
Focus: Inference (why) | Prediction (what)
Output: Report | Data App / Data Product
Latency: Weeks | Seconds
Stars: G.E.P. Box, Trevor Hastie | Hilary Mason, Nate Silver
http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
6. Three Essential Skills of Data Scientists
Drew Conway
http://www.dataists.com/2010/09/the-data-science-venn-diagram/
[Diagram: Effective Data Applications. Labels: Data Integration, Mashups, Applications, Models, Visualization, Predictions, Uncertainty, Problems, Data Sources, Credibility]
8. Revolution Confidential
Business Intelligence | Data Science
Perspective: Looking backwards | Looking forwards
Actions: Slice and Dice | Interact
Expertise: Business User | Data Scientist
Data: Warehoused, Siloed | Distributed, real-time
Scope: Unlimited | Specific business question
Questions: What happened? | What will happen? What if?
Output: Table | Answer
Applicability: Historic, possible confounding factors | Future, correcting for influences
Tools: SAP, Cognos, Microstrategy, SAS | Revolution R Enterprise, QlikView, Tableau, Jaspersoft
Hot or not?: So 1997 | Transformational
9. What is Data Science?
By Carla Gentry
Data Scientist
Analytical-Solution
10. Data Science is…
• The term "data science" has existed for over
fifty years. It was first mentioned by Peter Naur in
1960, but it has gained a lot of attention
more recently.
11. Data Science can be broken down into
4 main areas of expertise.
• Data knowledge
– design & structure
• Programming
– SAS, R, SQL, NoSQL
• Analytics
– Insight
• Communication
– Tell the story
12. Data Knowledge: Part analyst - part IT
• What kind of servers do you own?
– Servers vs. Mainframe
• What kind of load can the server handle?
– Iterations matter
– Why ask this?
13. Programming – Pick a language and
use it wisely
• Efficiency is KING!
– Why?
• Number of iterations & complex algorithms or
scripts. Snowflake vs. star schema?
– Design is important, but why?
• Key things: normalize and index. There is more to
Data Science than just analytics.
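One way to see why design and indexing matter is to ask the database how it plans to run a query before and after an index exists. The sketch below is only an illustration, not part of the original slides: it uses Python's built-in sqlite3 module, and the table names (customer, sale), column names, and index name (idx_sale_customer) are all hypothetical.

```python
import sqlite3

# In-memory database; the schema and names are made up for illustration.
conn = sqlite3.connect(":memory:")

# Normalized design: customer attributes live in one table,
# and each sale refers to a customer by key instead of repeating them.
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, region TEXT)"
)
conn.execute(
    "CREATE TABLE sale ("
    "sale_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customer(customer_id), "
    "amount REAL)"
)
conn.executemany(
    "INSERT INTO customer (customer_id, region) VALUES (?, ?)",
    [(i, "north" if i % 2 else "south") for i in range(1, 101)],
)
conn.executemany(
    "INSERT INTO sale (customer_id, amount) VALUES (?, ?)",
    [(1 + i % 100, 10.0) for i in range(1000)],
)


def query_plan(connection, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]


sql = "SELECT SUM(amount) FROM sale WHERE customer_id = 42"

plan_before = query_plan(conn, sql)  # no index yet: a full scan of sale
total_before = conn.execute(sql).fetchone()[0]

conn.execute("CREATE INDEX idx_sale_customer ON sale(customer_id)")

plan_after = query_plan(conn, sql)  # the planner can now search the index
total_after = conn.execute(sql).fetchone()[0]

print(plan_before)
print(plan_after)
```

The answer to the query is the same either way; only the amount of work changes. On a table with a thousand rows the difference is negligible, but when a script runs the same lookup over many iterations against a large table, a scan per iteration is where the time goes.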
14. How can I learn about Data Science?
• For those who want to invest their time and
talent, there are resources.
• College Courses
• Online
• Webinars
• Blogs
http://www.hilarymason.com/media_and_press/im-in-glamour-magazine/
Ivan Fellegi, Chief Statistician of Canada and SSC President for 1981
http://www.flickr.com/photos/ssc_liaison/431047111/
Churn: the best algorithms for predicting churn have a lift of 5-7, that is, 5-7 times better than random. Behavioral advertising: 2-3% CTR, 10 times better than random.
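
To make the lift figure concrete, here is a minimal sketch of one common way to compute it: rank customers by predicted churn score, take the top slice, and divide the churn rate in that slice by the overall churn rate. The function name lift_at_top, the 10% cutoff, and the toy data are assumptions made for illustration; the slides do not say how the quoted lift was measured.

```python
def lift_at_top(scores, labels, fraction=0.1):
    """Lift in the top `fraction` of customers ranked by predicted score.

    scores: predicted churn scores, higher means more likely to churn.
    labels: 1 if the customer actually churned, 0 otherwise.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when nobody churned")

    # Size of the targeted group, at least one customer.
    k = max(1, int(len(scores) * fraction))

    # Rank customers from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:k]) / k

    # Lift: how much denser churners are in the targeted group
    # than in the population as a whole.
    return top_rate / base_rate


# Toy data: 20 customers, 4 churners, and the model ranks two of them on top.
scores = [i / 20 for i in range(20, 0, -1)]
labels = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1] + [0] * 10
example_lift = lift_at_top(scores, labels, fraction=0.1)
print(example_lift)
```

In the toy data the overall churn rate is 4 in 20 and both customers in the top 10% churned, so targeting that group finds churners five times as often as picking customers at random. A model no better than random would give a lift of about 1.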