8. Big data analytics
Analytics is the scientific process of transforming data into
insights for making better decisions.
Data -> Insight -> Decision
Data ("resource"): IT logs, cloud, social media, sensors, experiments, etc.
Insight ("product"): statistical & operations research modeling
Decision ("goal"): judgement, constraints, intuition
9. Predictive analytics extracts information from data and
uses it to predict future trends and behavior patterns.
regression models
discrete choice models
time series models
classification models (decision tree, random forest, support vector machine,
neural network, etc.)
clustering models (k-means, density based, graph based, etc.)
association analysis
...
Big data analytics
Descriptive Analytics
Predictive Analytics
Prescriptive Analytics
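As a small illustration of the first model family listed above (regression models), here is a minimal sketch of a one-variable least-squares fit used to predict an unseen value. The variable names and the (ad_spend, sales) numbers are made up for illustration and are not from the slides.

```python
# Ordinary least squares with a single predictor, written out by hand.
# The data below is hypothetical; it lies exactly on y = 2x + 1.

def fit_line(xs, ys):
    """Return (slope, intercept) that minimize the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept

ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical predictor
sales = [3.0, 5.0, 7.0, 9.0, 11.0]     # hypothetical response

model = fit_line(ad_spend, sales)
forecast = predict(model, 6.0)          # predict for an unseen input
print(model, forecast)
```

The same fit-then-predict pattern applies to the other model families in the list; only the model and the loss change.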
10. Always keep in mind...
> business objectives are the origin of every data mining solution
> data preparation is more than half of the data mining process
> all patterns are subject to change
> there will always be new knowledge
Always pause and ask yourself:
Does this work relate to the business question we are trying to answer?
Is the original business question still valid?
12. Industry applications of big data analytics
Customer acquisition
Predict customers' buying habits in order to promote relevant products at
multiple touch points.
http://www.youtube.com/watch?feature=player_embedded&v=3WspJ16Ubhw
Clinical decision support
Experts use predictive analysis in health care primarily to determine which
patients are at risk of developing certain conditions, such as diabetes, asthma,
heart disease, and other lifelong illnesses.
Cross sale
Predictive analytics can help analyze customers' spending, usage and other
behavior, leading to efficient cross sales, that is, selling additional products
to current customers (the "beer & diaper" example).
Ads targeting
http://www.slideshare.net/dennyglee/yahoo-tao-case-study-excerpt
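The "beer & diaper" cross-sale idea above is usually quantified with association-rule measures. The sketch below computes support, confidence, and lift for the rule diapers -> beer; the five baskets are invented for illustration and are not data from the slides.

```python
# Support, confidence, and lift for an association rule over toy baskets.
# The baskets are hypothetical illustration data.

baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence relative to the consequent's baseline frequency."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

rule_support = support({"diapers", "beer"}, baskets)
rule_confidence = confidence({"diapers"}, {"beer"}, baskets)
rule_lift = lift({"diapers"}, {"beer"}, baskets)
print(rule_support, rule_confidence, rule_lift)
```

A lift above 1 suggests the two items are bought together more often than their individual frequencies would predict, which is what makes the pair a candidate for cross selling.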
13. Fraud detection
A predictive model can help weed out the "bads" and reduce a business's
exposure to fraud.
Image and Speech Recognition
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/people/jeff/MIT_BigData_Sep2012.pdf
Operations
Jet Engine + Humans
http://www.youtube.com/watch?v=JHc4ZTTWKrQ
Amazon warehouse operational efficiency: http://www.youtube.com/watch?v=Kafs9tZskuo
16. What are those startups doing?
Bloomreach
http://www.youtube.com/watch?feature=player_embedded&v=K12awAj4tW8
Datastax
http://www.nytimes.com/2013/02/25/business/media/for-house-of-cards-using-big-data-to-guarantee-its-popularity.html?pagewanted=all
Paraccel
http://www.paraccel.com/solutions/paraccel-solutions-big-data.php#.UXG207WG3Ct
Kaggle
http://www.kaggle.com/c/acm-sf-chapter-hackathon-big
17. VC funding for "Big Data"
Data from 71 start-ups. Funding is counted starting from 2004.
19. Interesting view points
"Special (domain) knowledge becomes less relevant;
organizations should focus on collecting people who know
how to extract value and insights from data."
"In God we trust. All others must bring data."
"The usefulness of a variable in a model is inversely
related to the time you spend creating it."
"Noise is convex but information is concave."
"Big data is sexy but small data is beautiful."
[Figure: noise and information plotted against data size]
20. Interesting view points
"All models are wrong, but some are useful."
"Big data is like teenage sex: everyone talks about it,
nobody really knows how to do it; everyone thinks everyone
else is doing it, so they claim they are doing it."
"Statistics: The Art and Science of Learning from Data"
22. Open discussion
Potential opportunities / challenges for entrepreneurs?
- visualization
- internet of things
- analytics as a service (a³s)
Standardization vs. customization
Human and data interaction
- data vs. intuition
24. Data Science vs. OR
[Figure: risk management, strategic planning, predictive analytics, and optimization, with axes labeled "Risk" and "Measurable of Objective"]
[Figure: skill sets of data scientists]
26. Big data types
● Web & social media: clickstream, web content, Amazon reviews, Facebook postings & 'likes'...
● M2M: smart meters, oil rig sensor readings, GPS signals...
● Transaction: retail store, healthcare claims, utility billing...
● Biometrics: fingerprint, face, voice, handwriting...
● Human-generated data: call logs, emails, surveys...
27. Web & social media
● Transaction: orders, revenue, ...
● Conversion: click-through, convert to purchase, ...
● Session: length, bounce rate
● Lifetime value: repeat, frequency, ...
● Social interaction: intensity, influence, ...
Shopping cart analysis
CTR prediction
Personalization
Retention/customer churn
A/B testing
Targeted ads
Lifetime value
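To make the A/B testing and click-through metrics above concrete, here is a minimal sketch of a two-proportion z-test comparing the click-through rates of two page variants. The function names and the click and impression counts are hypothetical, and the pooled-variance normal approximation is one common choice rather than the only valid test.

```python
import math

def ctr(clicks, impressions):
    """Click-through rate: clicks per impression."""
    return clicks / impressions

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Return (z, two-sided p-value) for the difference in CTR, B minus A."""
    p_a = ctr(clicks_a, n_a)
    p_b = ctr(clicks_b, n_b)
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value

# Hypothetical experiment: variant A vs. variant B, 10,000 impressions each.
z, p_value = two_proportion_z(200, 10_000, 260, 10_000)
print(f"CTR A={ctr(200, 10_000):.3f}  CTR B={ctr(260, 10_000):.3f}  z={z:.2f}  p={p_value:.4f}")
```

A small p-value suggests the CTR difference is unlikely to be chance alone; whether the difference matters still depends on the business question.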
30. Processing Pipeline
log, sensor, web, ... -> Hadoop MapReduce -> Structured Data -> SQL, HIVE, Dremel, ... -> Analytics
Big Data, Cloud Computing
Note: Hadoop is an open-source software framework that supports data-intensive distributed
applications, licensed under the Apache v2 license. It supports the running of applications on large
clusters of commodity hardware. It originated from Google's MapReduce and was further developed and
promoted by Yahoo.
http://www.forbes.com/sites/davefeinleib/2012/06/19/the-big-data-landscape/
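The MapReduce stage of the pipeline above can be sketched in plain Python as map, shuffle, and reduce steps that turn raw records into a structured count per source. This is a single-process illustration of the programming model only, not the Hadoop API; the record format and function names are assumptions made for the example.

```python
from collections import defaultdict

def map_phase(record):
    """Emit (key, 1) for each record, keyed by its leading source tag."""
    source, _, _ = record.partition(" ")
    yield (source, 1)

def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single structured row."""
    return key, sum(values)

def run_job(records):
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical raw records from the three source types named on the slide.
records = [
    "web GET /index.html",
    "sensor temp=21.5",
    "web GET /cart",
    "log disk full",
    "sensor temp=21.7",
]

counts = run_job(records)
print(counts)
```

On a real cluster the map and reduce calls run in parallel on different machines and the shuffle moves data over the network; the structured output is what SQL-style tools then query.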
31. How big is big?
When your data set becomes so large that you have to
start innovating around how to collect, store, organize,
analyze and share it ...
External
> web sites (blogs/reviews)
> social media (Facebook, LinkedIn, Google+, Twitter)
> images and videos
> ...
Internal
> transactions
> server logs
> machines and sensors
> emails
> ...