Defense nikhil khullar

Master’s Thesis Defense
22nd January, 2020
Nikhil Khullar
MBA IBC (184856)

Research Details
• Title: Consumer perception and reviews on mobile phones: An analysis using sentiment models in machine learning
• Primary Supervisor: Prof. Dr. rer. nat. Thomas Wenger
• Secondary Supervisor: Prof. Dr. rer. nat. Tobias Hagen
• MBA Program Director: Prof. Dr. rer. pol. Rainer Fischer
• Date of submission: 15th January, 2020
• Date of presentation: 22nd January, 2020

About Me
• Prior to the MBA (IBC), I worked for more than 7 years in various positions across the web and mobile application development spectrum in Japan, South Korea, US, UK, India, Iceland and Germany.
• A software engineer by profession and a hobbyist musician.
• Recently concluded an internship at Accenture in Burghausen, Germany.
• I find it interesting to understand how people perceive products and services, and I keenly follow the development of smartphones.
• Fascinated by the craftsmanship behind deriving actionable insights from huge data sets, which can help businesses significantly.

Agenda
1. Objectives: Introduction, domain, target data and goals
2. Methodology: Research phases and techniques followed
3. Results: Findings from the analyses
4. Conclusions: Summary, business interpretations and Q&A

Target Domain
• Mobile phone industry
  - one of the fastest-growing sectors
  - defines the success of consumer electronics firms today
  - global smartphone sales revenue: 522 billion USD
  - 1.56 billion units sold each year
• Constant existential threat to players in the market due to innovations as well as new entrants.
• Amazon – one of the biggest online marketplaces, only matched by Alibaba.
• Fast-changing landscape in terms of users’ needs and habits.
Source: Statista

Why Online Reviews?
• Significance of online reviews:
  - 90% of consumers read online reviews before visiting a business.
  - Online reviews have been shown to impact 67.7% of purchasing decisions.
  - 84% of people trust online reviews as much as a personal recommendation.
  - Businesses risk losing 22% of business when potential customers find one negative article on the first page of their search results; this risk grows to 44% with two negative articles and to almost 60% with three.
• Why sentiment analysis?
  - Scalar ratings (typically 1-5) are not very helpful as:
    - The “why” behind a rating, or behind a metric such as the average rating, cannot be determined.
    - Numeric ratings are not comparable across segments and devices.
Source: Forbes

Introduction
• Text analytics
  - Machine learning
  - Unstructured raw data
  - Extracting human sentiments from written text
• Sentiment analysis as a classification problem
• Qualitative sentiment analysis
• Overall aim: gaining actionable insights from customers’ voices
• Polarity: discrete or continuous
• Subjective and Objective sentiment analysis
• ML-based models and lexicon-based VADER

Target Dataset
• Gathered by PromptCloud Web Scraping Service
• Long-term data up to 2018 is available under a Creative Commons license with all copyrights waived.
• Recent review data, covering mid-2018 to July 2019, was purchased from PromptCloud for this research.
  - Purchased from the Data Stock shop via https://datastock.shop
• After selection and pre-processing phases of the pipeline:
  - 99,708 long-term reviews and 49,484 recent reviews were retained after de-duplication, brand name harmonisation etc.
• Datasets intentionally not unified, so that separate analyses could be performed.
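The de-duplication and brand name harmonisation steps mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: the alias table, the field names `brand` and `text`, and the rule "same brand plus same text counts as a duplicate" are hypothetical and are not taken from the thesis pipeline.

```python
# Hypothetical alias table; the actual mappings used in the thesis are not given.
BRAND_ALIASES = {
    "samsung electronics": "Samsung",
    "samsung": "Samsung",
    "apple inc.": "Apple",
    "apple": "Apple",
}


def harmonise_brand(raw):
    """Map a raw brand string to a canonical name (fallback: title case)."""
    key = raw.strip().lower()
    return BRAND_ALIASES.get(key, raw.strip().title())


def clean_reviews(reviews):
    """Harmonise brand names and drop duplicate reviews, keeping first occurrences."""
    seen = set()
    cleaned = []
    for review in reviews:
        brand = harmonise_brand(review["brand"])
        text = review["text"].strip()
        key = (brand, text.lower())  # assumed duplicate rule: same brand, same text
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"brand": brand, "text": text})
    return cleaned


# Made-up sample rows for illustration only.
sample = [
    {"brand": "SAMSUNG ", "text": "Great battery life"},
    {"brand": "Samsung Electronics", "text": "great battery life"},
    {"brand": "apple inc.", "text": "Screen cracked quickly"},
]
cleaned = clean_reviews(sample)
```

Harmonising the brand before building the duplicate key matters here: otherwise the same review filed under two spellings of a brand would survive de-duplication.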

Goals
• Exploratory statistical analysis
• Comparing performance of models for sentiment classification:
  - Logistic Regression
  - Support Vector Machine (with linear kernel)
  - k-Nearest Neighbours
  - Naïve Bayes (Gaussian)
  - Random Forest and Ensemble Methods
  - VADER
• Compound sentiment analysis using VADER and qualitative analysis on specific target subsets.
• Business use-cases and interpretation of findings.

Research Methodology
• Data selection and sanitisation
• Exploratory Statistical Analysis
  - Counts, mean values, distribution of ratings among the reviews, correlation between review length and perceived helpfulness
  - Word clouds
• Sentiment Analysis
  - Comparative Analysis
  - Compound Sentiment Analysis
  - Qualitative Sentiment Analysis
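As an illustration of the exploratory step "correlation between review length and perceived helpfulness", a Pearson correlation can be computed as sketched below. The choice of Pearson's coefficient, the word-count definition of review length and the toy data are assumptions made for this example; the slides do not state which correlation measure was used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up (review text, helpful votes) pairs for illustration only.
reviews = [
    ("Good", 0),
    ("Battery lasts two days and the camera is sharp", 4),
    ("Stopped working after a week, support never replied to my emails", 7),
    ("Nice phone", 1),
]
lengths = [len(text.split()) for text, _ in reviews]  # length in words
votes = [helpful for _, helpful in reviews]
r = pearson(lengths, votes)
```

A value of `r` close to +1 would suggest that longer reviews tend to be rated as more helpful; a value near 0 would suggest no linear relationship.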

Data Encoding & Splitting
• One-hot encoding
  - no ordinal relationship exists in unstructured textual data
  - binary values are preferred over integer encoding
• Label encoding
  - typically used for normalising a set of labels, or for transforming non-numerical labels to numerical ones; our use case: KNN
• Data splitting
  - hyper-parameter optimisation
  - training, testing, hold-out validation
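The three ideas on this slide can be sketched in plain Python as follows. This is a minimal illustration, not the thesis implementation: the binary presence vector per review is one assumed way of applying one-hot encoding to text, and the 60/20/20 split ratio and all function names are hypothetical.

```python
import random


def label_encode(labels):
    """Map non-numerical labels to integers (classes sorted alphabetically)."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[label] for label in labels], classes


def one_hot_encode(token_lists):
    """Binary presence vector per document: 1 if the term occurs, else 0."""
    vocabulary = sorted({token for tokens in token_lists for token in tokens})
    vectors = []
    for tokens in token_lists:
        present = set(tokens)
        vectors.append([1 if term in present else 0 for term in vocabulary])
    return vectors, vocabulary


def three_way_split(items, train=0.6, test=0.2, seed=42):
    """Shuffle and split into training, testing and hold-out validation sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_test],
        shuffled[n_train + n_test:],  # remainder is held out
    )


encoded, classes = label_encode(["positive", "negative", "neutral", "positive"])
vectors, vocabulary = one_hot_encode([["good", "phone"], ["bad", "phone"]])
train_set, test_set, holdout_set = three_way_split(range(10))
```

The binary vectors carry no implied order between words, which is the point of preferring them over integer codes for text; integer label codes are kept for the target classes only.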

Confusion Matrix
• Business Implications of metrics
  - True negatives carry far lower business costs than false negatives and false positives.
  - Example: missing a very receptive market for a social video game in Taiwan because the tests used English localisation, which leads to a false negative.
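To make the four cells of a binary confusion matrix concrete, the sketch below counts them from true and predicted labels and attaches a simple cost to the two error types. The label names and the cost figures are hypothetical and only illustrate how unequal business costs could be weighed.

```python
def confusion_counts(y_true, y_pred, positive="positive"):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts


def error_cost(counts, cost_fp, cost_fn):
    """Total cost of the errors; correct predictions are assumed to cost nothing."""
    return counts["fp"] * cost_fp + counts["fn"] * cost_fn


# Made-up labels for illustration only.
y_true = ["positive", "positive", "negative", "negative", "positive"]
y_pred = ["positive", "negative", "negative", "positive", "positive"]
counts = confusion_counts(y_true, y_pred)

# Hypothetical costs: a missed opportunity (false negative) is weighted 5x.
cost = error_cost(counts, cost_fp=1.0, cost_fn=5.0)
```

In the Taiwan example above, the missed market is a false negative, so a cost model of this kind would penalise it through `cost_fn`.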

Model Evaluation Metrics
• Accuracy
  - percentage of correct predictions among total predictions.
• Precision
  - when the model predicts positive, how often it is correct.
• Recall
  - when the outcome is actually positive, how often the model says so.
• F1 Score
  - harmonic mean of precision and recall.
  - a better measure when seeking a balance between precision and recall, based on business costs.
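The four metrics follow directly from the confusion matrix counts. A minimal sketch with made-up counts (the function name and the zero-division convention of returning 0.0 are choices made for this example):

```python
def evaluation_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts for illustration only.
metrics = evaluation_metrics(tp=40, fp=10, fn=20, tn=30)
```

With these counts, accuracy is 0.7, precision is 0.8 and recall is about 0.667; the F1 score of about 0.727 sits between precision and recall but closer to the lower of the two, which is why it is useful when neither error type may be ignored.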

VADER Compound Analysis
[Figures: VADER compound sentiment results for Apple and Samsung]
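For readers unfamiliar with the compound score, the sketch below shows how the open-source VADER implementation is commonly described as normalising a summed valence into the range [-1, 1], and how a compound score is then mapped to a polarity label. It deliberately omits VADER's lexicon and heuristics; the constant `alpha = 15` and the ±0.05 thresholds are the conventional defaults and are assumptions here, since the slides do not state which settings were used.

```python
import math


def normalise_compound(valence_sum, alpha=15.0):
    """Squash an unbounded valence sum into [-1, 1] (VADER-style normalisation)."""
    score = valence_sum / math.sqrt(valence_sum * valence_sum + alpha)
    return max(-1.0, min(1.0, score))


def polarity_label(compound, threshold=0.05):
    """Map a compound score to a discrete polarity using symmetric thresholds."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical valence sums for three reviews (not taken from the thesis data).
valence_sums = [4.0, -1.5, 0.1]
compounds = [normalise_compound(v) for v in valence_sums]
labels = [polarity_label(c) for c in compounds]
```

This also illustrates the "discrete or continuous" polarity distinction: `compounds` is the continuous view, `labels` the discrete one.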

Qualitative results
• Terms associated with the most positive recent reviews of Android phones:
  - wireless charging, image stabilisation, curved edge
  - quad-core, heart rate, super AMOLED, battery life, snapdragon
• Terms associated with the most negative recent reviews of Android phones:
  - unlocking, Bixby, phone heat, Android crash
  - pants pocket, bloatware apps, useless features
• Phone perception is linked to the general brand image:
  - many highly positive iPhone reviews refer to MacBook Pro, Air, iPad Pro
  - post-sale customer service also seems to impact product perception
  - trends in rolling and expanding means of the time series coincide with events
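Rolling and expanding means, as mentioned in the last point, can be sketched as follows. The window size and the sample scores are made up for illustration; the slides do not give the settings or the underlying series used in the thesis.

```python
def rolling_mean(values, window):
    """Mean over the last `window` points; None until the window is full."""
    if window < 1:
        raise ValueError("window must be at least 1")
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            chunk = values[i + 1 - window:i + 1]
            result.append(sum(chunk) / window)
    return result


def expanding_mean(values):
    """Mean over all points seen so far (the window grows with the series)."""
    result = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        result.append(total / count)
    return result


# Hypothetical per-period average scores, ordered in time.
scores = [1.0, 2.0, 3.0, 4.0]
rolling = rolling_mean(scores, window=2)
expanding = expanding_mean(scores)
```

The rolling mean reacts quickly to a sudden shift (for example after a product launch), while the expanding mean shows how the long-run average drifts.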

Business use-cases
• Descriptive, predictive and prescriptive analytics
  - data-driven decision making
  - forecasting
  - uses abound in industries from video games and stock markets to medicine
• Examples from currently thriving start-ups:
  - Gavagai – instant operational insights
  - Talkwalker – empowering brands socially
  - Aspectiva – acquired by Walmart for its recommendation engine
  - Smartmunk – improving customer loyalty
  - Revuze – text mining on call center feedback, online CX, social media etc.

“A breakthrough in machine learning would be worth ten Microsofts”
~ Bill Gates
