Social Media Data in Predictive Analytics
1. © GfK 2015 | Predictive Analytics World Berlin | November 2015
Unleashing the potential of social media
Predictive Analytics World Berlin - November 2015
Hwan Kim and Nina Meinel, GfK SE
2.
Social Media is rich in information
3.
Panels, Brand Trackers, Sales & Distribution
But maximum value comes from connecting the dots
4.
Social Media Intelligence
Meets Brand Tracker
5.
Or, more provocatively: can't we just crawl the web and replace traditional research?
6.
Social Media Analysis 2.0:
More than buzz & sentiment?
An interdisciplinary team of experts from GfK Data & Marketing Sciences, GfK Brand & Customer Experience, and Koen Pauwels from the University of Istanbul investigated the following research questions.
Can Social Media KPIs …
• … possibly replace or enrich some or all brand tracking KPIs?
• … serve as an important augmentation to brand tracking KPIs in the prediction and management of brand performance?
• … serve as an early warning signal for brand tracking trends and subsequent shifts in market performance?
• … provide new and relevant diagnostic depth or breadth to brand tracking data?
7.
GfK Owned Data Sources Help to Answer Research Objectives
Content
Brand Tracker
• Upper Funnel Key Measures: Awareness, Familiarity
• Activity: Read 3rd party vehicle reviews etc.
• Behavior: Plan to shop at dealership
• Intenders: Over one year intenders
• Make Share
• Reasons for First Choice: Quality, Safety
• Media Share (Three Month): Outdoor/Billboard
• Ad Aware
• Recall
• Ad Empathy: Original, Likeable
• Make Principal Messages: Styling, Durability
Social Media
• Brand Measures: Mentions, Net Sentiment, Passion Intensity
• Themes Around Activity: Read 3rd party vehicle reviews etc.; Liked/followed vehicle on Facebook, Twitter etc.
• Themes Around Behavior: Plan to shop at dealership
• Activity: Google Trends, Facebook likes
• Themes Around Environmental Attributes
Environmental Attributes
• Economy: Consumer confidence, Dow Jones Index, Unemployment rate, Fuel price
• Seasonality
Data
• Monthly data (US)
• Brand Tracker and Environmental Attributes: fully available from Jul 2009 onwards
• Social Media: fully available from Aug 2012 onwards
• Google & Facebook (SMI definition): fully available from Jan 2012 onwards
• Percentage of "top-box level" or "yes" answers
8.
The Twofold Process Contains a Filter Process as well as the Prediction Challenge
Univariate Description
• Variances too low/high: filtering 10% with lowest variance
• Adjust season: deseasonalization for Sales
• Sample size Social Media: mentions below 50
Causality
• Correlation/Granger: conduct Granger analysis and correlations
Prediction Challenge
• Evaluation
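The filter steps on this slide can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function names, the dictionary-of-lists layout, the per-position mean adjustment used for deseasonalization, and the reading of "mentions below 50" as an exclusion threshold are all assumptions. Variances are compared on raw values, which is only meaningful if the series are on a comparable scale.

```python
from statistics import mean, pvariance


def drop_lowest_variance(series_by_name, share=0.10):
    """Return a copy without the `share` of series that vary least."""
    ranked = sorted(series_by_name, key=lambda name: pvariance(series_by_name[name]))
    n_drop = int(len(ranked) * share)
    dropped = set(ranked[:n_drop])
    return {name: values for name, values in series_by_name.items()
            if name not in dropped}


def deseasonalize(values, period=12):
    """Remove the average seasonal offset of each position in the cycle.

    Assumes `values` covers at least one full cycle of length `period`.
    """
    overall = mean(values)
    offsets = [mean(values[pos::period]) - overall for pos in range(period)]
    return [value - offsets[i % period] for i, value in enumerate(values)]


def has_enough_mentions(monthly_mentions, minimum=50):
    """True if no month falls below the (assumed) minimum mention count."""
    return all(count >= minimum for count in monthly_mentions)


# Hypothetical toy data: ten KPI series, the first one constant.
kpis = {f"kpi_{i}": [0.0, float(i), 0.0, float(i)] for i in range(10)}
filtered = drop_lowest_variance(kpis)
print(sorted(set(kpis) - set(filtered)))                    # ['kpi_0']
print(deseasonalize([10.0, 20.0, 10.0, 20.0], period=2))    # [15.0, 15.0, 15.0, 15.0]
```

With monthly data, `period=12` removes the average month-of-year effect; a trend-aware decomposition would be a natural refinement.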
9.
Predicting Sales: A Model Challenge With the Following …
Models
• DLM
• VAR
• SVM
• Hierarchical Bayes
• Lasso
Evaluation
• Prediction performance
• Model complexity
• Interpretation
Dynamic Linear Models: Fit and forecast univariate or multivariate time series that meet the assumption of a Gaussian distribution.
Hierarchical Bayes: Fits the time series using an iterative approach maximizing the posterior likelihood.
Support vector machines: Analyze data and recognize patterns for classification and regression analysis.
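Lasso is listed among the candidate models without a description. The sketch below shows one plausible way to apply it to time series: build lagged copies of a predictor and fit an L1-penalized regression by coordinate descent, so that uninformative lags are shrunk to exactly zero. This is an assumption-laden illustration, not the authors' implementation; the function names, the penalty value `alpha`, and the toy design matrix are all hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; small values become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/2n)*||y - b0 - Xb||^2 + alpha*||b||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center the data so the intercept can be recovered afterwards.
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [value - y_mean for value in y]
    coef = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            scale = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if scale == 0.0:
                continue  # constant column carries no information
            rho = 0.0
            for i in range(n):
                others = sum(Xc[i][k] * coef[k] for k in range(p) if k != j)
                rho += Xc[i][j] * (yc[i] - others)
            coef[j] = soft_threshold(rho / n, alpha) / scale
    intercept = y_mean - sum(c * m for c, m in zip(coef, col_means))
    return intercept, coef


def make_lagged_rows(target, predictor, lags):
    """Pair target[t] with predictor[t - lag] for every lag in `lags`."""
    max_lag = max(lags)
    rows = [[predictor[t - lag] for lag in lags]
            for t in range(max_lag, len(target))]
    return rows, target[max_lag:]


# Hypothetical toy design: column 0 matters, column 1 barely does.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [3.0 * a + 0.2 * b for a, b in X]
intercept, coef = lasso_coordinate_descent(X, y, alpha=0.5)
print(coef)  # first coefficient is shrunk to about 2.5, second is exactly 0.0
```

In practice `make_lagged_rows` would produce the design matrix from monthly KPI and sales series, and `alpha` would be chosen by out-of-sample error rather than fixed by hand.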
10.
Model Challenge Shows VAR and LagLasso as Most Suitable
Index                                    VAR      SVM**    Bayes**  Lasso**  DLM
RMSE* (Root mean square error)           7.785    3.259    4.671    2.708    19.344
MAPE* (Mean absolute percentage error)   23.95%   12.97%   21.40%   10.42%   82.54%
Complexity, Stability, Interpretability, Importance, Total: [ratings not included in the extracted text]
*: E.g., an RMSE (MAPE) of 2.708 (10.42%) is better than one of 4.671 (21.40%) because the smaller the error, the better.
**: Including monthly effects
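The two error measures in the table follow their standard definitions. The sketch below computes them in plain Python; the function names and the sample numbers are illustrative assumptions, and MAPE as written requires all actual values to be non-zero.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error: penalizes large misses more than small ones."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actuals must be non-zero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical values, not taken from the study.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(rmse(actual, predicted))
print(mape(actual, predicted))
```

RMSE is expressed in the units of the target, while MAPE is scale-free, which is why the table reports both.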
11.
Summary of Prediction Shows Brand Tracker in Combination with Social Media is Suitable to Get a Better Quality of Predictions
Index                                    Brand Tracker   Social Media   Brand Tracker & Social Media
RMSE* (Root mean square error)           2.206           2.808          1.980
MAPE* (Mean absolute percentage error)   8.9%            13.0%          8.6%
# Brand Tracker                          8-11            -              5-9
# Social Media                           -               5-12           0-4
Total: [rating not included in the extracted text]
*: E.g., an RMSE (MAPE) of 1.980 (8.6%) is better than one of 2.808 (13.0%) because the smaller the error, the better.
#: Number of variables used
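To put the RMSE figures from this slide in proportion, the relative error reduction of the combined model can be worked out directly. The helper name is an assumption; the three RMSE values are the ones reported in the table.

```python
def relative_reduction(baseline, candidate):
    """Percentage by which `candidate` error is lower than `baseline` error."""
    return 100.0 * (baseline - candidate) / baseline


rmse_by_model = {
    "Brand Tracker": 2.206,
    "Social Media": 2.808,
    "Brand Tracker & Social Media": 1.980,
}
combined = rmse_by_model["Brand Tracker & Social Media"]

vs_tracker = relative_reduction(rmse_by_model["Brand Tracker"], combined)
vs_social = relative_reduction(rmse_by_model["Social Media"], combined)
print(round(vs_tracker, 1))  # 10.2
print(round(vs_social, 1))   # 29.5
```

On these numbers, adding social media variables to the brand tracker lowers RMSE by roughly a tenth, while social media alone performs clearly worse than either alternative.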
12.
Social Media Intelligence
meets Retail
13.
Research question: Can we enhance the DVD forecast by adding Social Media?
14.
GfK Owned Data Sources Help to Answer Research Objectives
Content
Retail – Box Office
• Cinema admissions, movie sales
• Multimedia: Blu-ray, TV, tablet, gaming console sales
Social Media
• Buzz
• Sentiment
• User-generated
• Themes around Movies: Visual effects, actor, sound, design
External Attributes
• Seasonality
• Events: Awards, Christmas
• Genre: Action, drama, fantasy, etc.
• Country of origin: US, FR, etc.
• IMDB ratings
Limited number of titles between 2010 and 2014 with focus on the French market
Data
• Fully available from 2010 to 2014
• Three-month period per movie, from 2010 to 2014
• Fully available from 2010 to 2014 on monthly and weekly level
15.
The Twofold Process Contains a Filter Process as well as the Prediction Challenge
Univariate Description
• Variances too low/high: filtering 10% with lowest variance
• Missing detection
• Avoid multicollinearity
Causality
• Correlation/Granger: conduct Granger analysis for single KPIs and sales
Prediction Challenge
• Comparison
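Two of the steps named on this slide, the correlation screen and the multicollinearity check, can be sketched with a plain Pearson correlation. Note that this is a simpler stand-in, not a Granger test: a Granger analysis compares an autoregressive model of sales with and without lagged KPI values (statsmodels offers `grangercausalitytests` for that). Function names, the 0.9 threshold, and the toy series are assumptions, and constant series must be filtered out beforehand to avoid division by zero.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlations(kpi, sales, max_lag):
    """Correlate kpi[t - lag] with sales[t] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        earlier_kpi = kpi[:len(kpi) - lag]
        later_sales = sales[lag:]
        result[lag] = pearson(earlier_kpi, later_sales)
    return result


def drop_collinear(series_by_name, threshold=0.9):
    """Keep a series only if it is not too correlated with one already kept."""
    kept = []
    for name, values in series_by_name.items():
        if all(abs(pearson(values, series_by_name[other])) < threshold
               for other in kept):
            kept.append(name)
    return kept


# Hypothetical toy series: sales simply repeat the KPI one period later.
buzz = [1.0, 5.0, 2.0, 8.0, 3.0, 9.0, 4.0]
sales = [0.0] + buzz[:-1]
correlations = lagged_correlations(buzz, sales, max_lag=2)
best_lag = max(correlations, key=correlations.get)
print(best_lag)  # 1

candidates = {"a": [1.0, 2.0, 3.0, 4.0],
              "b": [2.0, 4.0, 6.0, 8.1],
              "c": [4.0, 1.0, 3.0, 2.0]}
print(drop_collinear(candidates))  # ['a', 'c']
```

A KPI whose strongest correlation sits at a positive lag is a candidate leading indicator, which is the property a subsequent Granger test would examine formally.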
16.
Including SMI data as predictors improves the prediction…
Predictive power measured as average prediction error*
[Figure: two bar charts, "Model without Social Media" and "Model with Social Media". Y-axis: average prediction error* (0 to 16,000). Bars: Stepwise Regression, Lasso & Stepwise Regression, Subset Regression, Manual Selection. Predictor groups listed beside each chart: Sales, Box office, External, Social Media Intelligence (SMI), Season, Characteristics.]
*RMSE: Root Mean Square Error
17.
What happens if we adjust for the outlier in the model?
Superiority decreases once we adjust for the major outlier. Hence, SMI data seem to predict outliers well.
[Figure: two bar charts, "Model without Social Media" and "Model with Social Media". Y-axis: average prediction error* (0 to 16,000). Bars: Stepwise Regression, Lasso & Stepwise Regression, Subset Regression, Manual Selection. Predictor groups listed beside each chart: Sales, Box office, External, Social Media Intelligence (SMI), Season, Characteristics. Annotation: weaker superiority when adjusting for the outlier.]
*RMSE: Root Mean Square Error
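Why a single outlier can drive most of an RMSE advantage is easy to see on toy numbers. The values below are invented for illustration and are NOT the study data; treating the title with the largest actual sales as "the outlier" is likewise an assumption.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error of paired actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def without_index(values, index):
    """Copy of `values` with the element at `index` left out."""
    return values[:index] + values[index + 1:]


# Invented toy numbers: the last title is a blockbuster outlier.
actual = [100.0, 120.0, 110.0, 1000.0]
pred_without_smi = [105.0, 115.0, 112.0, 400.0]
pred_with_smi = [103.0, 118.0, 111.0, 900.0]

outlier = max(range(len(actual)), key=lambda i: actual[i])

# How many times larger is the error of the model without SMI?
full_ratio = rmse(actual, pred_without_smi) / rmse(actual, pred_with_smi)
trimmed_ratio = (
    rmse(without_index(actual, outlier), without_index(pred_without_smi, outlier))
    / rmse(without_index(actual, outlier), without_index(pred_with_smi, outlier))
)
print(round(full_ratio, 2), round(trimmed_ratio, 2))
```

Because RMSE squares each miss, the one large error dominates the full comparison; once that title is excluded, the model with SMI still has the lower error in this toy case, but by a much smaller margin.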
18.
… but is using much more information for predicting sales
[Figure: two bar charts, "Model without Social Media" and "Model with Social Media". Y-axis: average prediction error* (0 to 16,000). Bars: Stepwise Regression, Lasso & Stepwise Regression, Subset Regression, Manual Selection. Predictor groups listed beside each chart: Sales, Box office, External, Social Media Intelligence (SMI), Season, Characteristics. Annotations: 6-11 predictors (model without Social Media), 31-34 predictors (model with Social Media).]
*RMSE: Root Mean Square Error
19.
However, even a manual variable selection with sparse information including SMI data as predictors improves the prediction.
[Figure: two bar charts, "Model without Social Media" and "Model with Social Media". Y-axis: average prediction error* (0 to 16,000). Bars: Stepwise Regression, Lasso & Stepwise Regression, Subset Regression, Manual Selection. Predictor groups listed beside each chart: Sales, Box office, External, Social Media Intelligence (SMI), Season, Characteristics. Annotations: 6-11 predictors, 31-34 predictors, 9 predictors.]
*RMSE: Root Mean Square Error
20.
It is not "either – or"
• Big Data alone can (and will very often) be misleading!
• Smart integration of Big Data and traditional approaches guarantees valid and deep insights
• Our expertise in integrative modeling is our asset
21.
Contact
Hwan Kim, hwan.kim@gfk.com, +33 1 7418 6271, France - Paris
Dr. Nina Meinel, nina.meinel@gfk.com, +49 911 395 3961, Germany - Nuremberg
22.
THANK YOU