This document provides an overview of data science. It discusses the history of data science and how it has evolved with larger amounts of diverse data available. Specifically, it notes that data science now focuses on providing actionable insights from data rather than just exposing raw data. It also defines key concepts in data science like data mining, statistics, and the types of data involved. Finally, it outlines the common techniques, tools, and applications of data science, such as machine learning, visualization, and using data science to improve customer experiences.
2. History…
• Questions first, data later
• Data model first, data processing later
• Size first, project second, react over time
• Focus on accuracy, assume little
• Importance given to completeness and comprehensiveness
• Expose raw data to decision makers
• Provide insights but those that are not actionable
• Bound by constraints (Procurement, Process, Build Insights, Interaction)
3. What’s Changed?
• Medium to participate is vast
• Mode to reach expanded
• Data types are vast and voluminous
• Noise is huge, yet accepted
• Urgency precedes accuracy
• Guidance is better than completeness
• Cost to store and process has fallen (and is still falling)
• More ways and means to process data at scale
4. Speaking of Data
• Volume - Data at rest
• Variety - Data in many forms
• Velocity - Data in motion
• Veracity - Data in doubt
5. Data Science
“Data Science is the art of turning data into actions”
This is accomplished through the creation of data products that provide actionable information without exposing the underlying data or analytics.
“Scientific study of the creation, validation and transformation of data to create meaning”
http://www.datascienceassn.org/code-of-conduct.html
6. While we are on definitions…
Data Mining
“Non-trivial process of identifying valid, novel, potentially useful and understandable structures or patterns or models or relationships in data to enable data driven decision making”
Statistics
“Science of learning from data or of making sense out of data”
7. Science of Data Science
• Analyze and understand data that’s available
• Find and acquire what more is needed
• Discover what’s not known from data
• Predict and build “actionable insights” from data
• Build data products that have “immediate” business impact
• Make it easy for business to “use”
• Help decision making to drive “business value”
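The steps above can be sketched in a few lines of Python, one of the languages in the toolkit. This is only a minimal illustration under stated assumptions, not a method from the slides: the customer records, the `churn_risk` scoring rule, its weights, and the 0.5 threshold are all hypothetical. The point it tries to show is a data product that hands the business an action per customer instead of the raw data.

```python
from statistics import mean

# Hypothetical usage records (made-up illustration data, not from the slides).
customers = [
    {"id": "c1", "logins_last_30d": 25, "support_tickets": 0},
    {"id": "c2", "logins_last_30d": 2, "support_tickets": 4},
    {"id": "c3", "logins_last_30d": 10, "support_tickets": 1},
]


def churn_risk(record, avg_logins):
    """Toy score in [0, 1]: low activity and many tickets raise the risk.

    The 0.6 / 0.4 weights are assumptions chosen for illustration only.
    """
    activity_gap = max(0.0, 1.0 - record["logins_last_30d"] / avg_logins)
    ticket_load = min(1.0, record["support_tickets"] / 5)
    return 0.6 * activity_gap + 0.4 * ticket_load


def recommend_actions(records, threshold=0.5):
    """Analyze the data, score each customer, and return an action per id."""
    avg_logins = mean(r["logins_last_30d"] for r in records)
    actions = {}
    for r in records:
        risk = churn_risk(r, avg_logins)
        actions[r["id"]] = "offer retention call" if risk >= threshold else "no action"
    return actions


actions = recommend_actions(customers)
print(actions)
```

The caller sees only the recommended action per customer; the scores and the underlying records stay inside the product.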
8. Data Science Toolkit
Languages: Python, R, Java, Textwrangler, SQL, C, C++
Libraries: Mahout, NLTK, OpenNLP, GPText, SciPy, Pandas, scikit-learn
Database: Hadoop, Hive, HAWQ, PL/Python, PL/R, PL/Java, Proprietary
Visualization: D3.js, Gephi, Graphviz, R, Tableau, Proprietary
10. Data Science In Action
• Improving User Experience
• Multi-device event stream analysis
• Intrusion detection, avoidance
• Collocation analysis from cell-phone towers
• Text Mining, Bandwidth Throttling
• Network Performance & Optimization
• Mobile User Location Analytics
• Customer Churn Prevention
• Social Media and Sentiment Analysis
• Location Based Initiatives
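As a rough illustration of one application listed above, social media sentiment analysis, here is a toy lexicon-based scorer in Python. It is a sketch under stated assumptions rather than a production approach: the `POSITIVE` and `NEGATIVE` word lists and the sample posts are hypothetical, and real systems would more likely rely on a library such as NLTK or a trained model.

```python
import re
from collections import Counter

# Hypothetical word lists, chosen only for this illustration.
POSITIVE = {"great", "love", "fast", "helpful"}
NEGATIVE = {"slow", "bad", "dropped", "terrible"}


def sentiment(text):
    """Label text by counting positive words minus negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up sample posts, not real data.
posts = [
    "Love the new app, support was helpful",
    "Calls dropped twice today, terrible coverage",
    "Switched plans last week",
]

# Aggregate labels so the output is a summary rather than the raw posts.
summary = Counter(sentiment(p) for p in posts)
print(summary)
```

Word counting of this kind ignores negation and sarcasm, so it is best read as a starting point for the idea, not a measure of accuracy.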