This document provides an overview of a project for Telefónica I+D to develop analytical user modeling tools. It will analyze data from multiple sources, including call centers, web portals, forums, surveys, and Twitter, to understand customer opinions. The data sources vary in language formality, ability to segment users, and difficulty of acquisition. Current opinion mining tools for Twitter data show limited capability to accurately identify opinions on Telefónica brands from tweets and classify sentiment. The project aims to improve on these tools to better discover and analyze insights from large, multilingual data streams.
2. Index
01 Telefónica Case Study
Overview
Data Sources
Results
Data Key Points
Data Considerations
02 Annex A: Twitter Analysis Examples
Área: LoremI+D
Telefónica ipsum 1
Razón Social: Telefónica Models
User Modeling Analytical
3. 01
Case Study
Telefónica I+D
User Modeling Analytical Models
4. Overview
RENDER will provide means to enable Telefónica to assess the
incoming requests, complaints and concerns, identify opinions,
viewpoints, trends and tendencies, and take feasible actions based
thereupon.
5. Data Sources
• Web Customer Portal Messages
• Call Center Contacts
• Surveys (Shops & Market Research)
• Twitter Entries
• Corporate Forums Comments
• Public Forums Comments
6. Data Sources
Amounts of Data
• Data in corporate channels
› Movistar España
› O2 UK and O2 Ireland
• Data in public channels
› Open forums
• Twitter data collection
› 600,000 tweets per day (1% of the total)
› By geolocation
- 23,000 tweets/day in the UK
- 5,000 tweets/day in Spain
- 900 tweets/day in Ireland
› By topic
- 3,300 tweets/day mentioning O2
- 3,200 tweets/day mentioning Movistar
- 800 tweets/day mentioning Telefónica
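To put the volumes listed above in proportion, the following minimal sketch computes each subset as a share of the daily sample. It assumes that "1% total" means the collected sample is 1% of all tweets, and that the geolocation and topic counts are subsets of that 600,000-tweet sample; neither assumption is confirmed by the slide. The variable and function names are illustrative only.

```python
# Figures quoted on the slide (tweets per day in the collected sample).
SAMPLE_PER_DAY = 600_000
SAMPLE_PERCENT = 1  # assumption: the sample is 1% of the full Twitter stream

by_geolocation = {"UK": 23_000, "Spain": 5_000, "Ireland": 900}
by_topic = {"O2": 3_300, "Movistar": 3_200, "Telefónica": 800}


def share_of_sample(count: int) -> float:
    """Percentage of the daily sample represented by `count` tweets."""
    return 100.0 * count / SAMPLE_PER_DAY


# Size of the full stream implied by a 1% sample (integer arithmetic).
estimated_total_per_day = SAMPLE_PER_DAY * 100 // SAMPLE_PERCENT

print(f"Implied full stream: {estimated_total_per_day} tweets/day")
for label, count in {**by_geolocation, **by_topic}.items():
    print(f"{label}: {count} tweets/day = {share_of_sample(count):.2f}% of the sample")
```

Under these assumptions, brand-related tweets are a very small fraction of the sample, which is why filtering by topic precedes any opinion analysis.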
7. Results
What do we want to achieve in this project?
• To apply NLP, data mining, web mining, and machine learning
techniques to discover and analyze in depth large streams of
data from various sources, across multiple (natural) languages, and
to build a comprehensive opinion model covering intensity, biases
and fact coverage.
Key aspects
• Management of data sources
› Internal data vs. external data
• Processing of data bias
› Customers vs. potential customers
› Inexperienced vs. advanced users
• Segmented view of opinion
› Individual opinion vs. global opinion
• Identification of subjectivity in opinions
› Positive, negative and neutral opinions
• Knowledge of opinion geolocation (Twitter entries)
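The key aspects above can be read as the fields of an opinion record plus two ways of aggregating it. The sketch below is only an illustration of that reading, not the project's actual model: the `Opinion` class, its field names, the source labels and the sample records are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Opinion:
    user_id: str
    source: str                    # hypothetical label, e.g. "twitter"
    internal: bool                 # internal vs. external data
    polarity: str                  # "positive", "negative" or "neutral"
    country: Optional[str] = None  # set only when geolocation is available


def global_opinion(opinions):
    """Count polarities over all opinions (the global view)."""
    return Counter(o.polarity for o in opinions)


def individual_opinion(opinions, user_id):
    """Count polarities for a single user (the individual view)."""
    return Counter(o.polarity for o in opinions if o.user_id == user_id)


# Made-up records, for illustration only.
sample = [
    Opinion("u1", "twitter", False, "negative", "UK"),
    Opinion("u1", "twitter", False, "neutral", "UK"),
    Opinion("u2", "corporate_forum", True, "negative"),
    Opinion("u3", "survey", True, "positive", "Spain"),
]

print(global_opinion(sample))
print(individual_opinion(sample, "u1"))
```

Keeping the source and the internal/external flag on every record is what would allow bias between channels to be examined later.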
8. Data Key Points
Call Center | Web Customer Portal | Corporate Forums
Internal data | Internal data | Internal data
Customers | Customers | Customers
Offline users | Online users | Online users
Objective / Subjective | Objective / Subjective | Objective / Subjective
Segmentation not possible | Segmentation possible | Segmentation possible
Localization possible | Localization possible (with user account) | Localization possible (with user account)
Language identified | Language not identified | Language not identified
9. Data Key Points
Public Forum | Surveys (shops & market research) | Twitter Entries
External data | Internal data | External data
Customers or potential customers | Customers or potential customers | Customers or potential customers
Online users | Offline users | Advanced online users
Objective / Subjective | Objective / Subjective | Objective / Subjective
Segmentation not possible | Segmentation possible | Segmentation not possible
Localization not possible | Localization possible | Localization not always possible
Language not identified | Language identified | Language not identified
10. Data Considerations
Call Center
• Formal language.
• Transcriptions contain no mistakes such as unknown words or
symbols (only recognition errors).
• Interaction only between the customer and the CRM.
• Technical limitations due to working with recordings:
- Speech recognition
- User and operator on the same channel (user diarization)
• High difficulty of data acquisition.
• Customers don't speak freely; it is a formal dialogue.
• The list of topics is limited; the issues are defined.
• Most calls don't express an opinion; they are only questions
and complaints.
12. Data Considerations
Web Customer Portal
• Formal language.
• Sentences can contain errors (grammar, vocabulary…).
• Customers don't write freely; it is a formal message.
• Interaction only between the customer and the CRM.
• Medium difficulty of data acquisition.
• The list of topics is limited; the issues are defined.
• Most comments don't express an opinion; they are only questions
and complaints.
• The only technical limitation is the challenge of opinion
mining itself.
13. Data Considerations
Forums Comments
Corporate forum
14. Data Considerations
Forums Comments
Public forum
15. Data Considerations
Forums Comments
• Informal language.
• Customers write with complete freedom.
• The text can contain errors (grammar, vocabulary…).
• The comments can express opinion.
• The list of topics is unlimited; customers can open any new
issue.
• Public forums: interaction only between customers.
• Corporate forums: interaction between customer and enterprise,
and between customers.
• Medium difficulty of data acquisition.
• The only technical limitation is the challenge of opinion
mining itself.
17. Data Considerations
Surveys (shops & market research)
• Formal language.
• Customers write with complete freedom.
• The comments can express opinion.
• Transcriptions are free of errors and in natural language.
• The list of topics is limited.
• Interaction only between customer and enterprise.
• Medium difficulty of data acquisition.
• The only technical limitation is the challenge of opinion
mining itself.
19. Data Considerations
Twitter Entries
• Low difficulty of data acquisition.
• Informal language.
• The text can contain errors (grammar, vocabulary…).
• The comments can express opinion.
• Customers write with complete freedom.
• The list of topics is unlimited; customers can open any new
issue.
• Interaction between customer and enterprise, and between
customers.
• The only technical limitation is the challenge of opinion
mining itself.
21. Twitter Analysis Examples
Current opinion mining projects on Twitter with no interesting results
• Twitrratr
› O2 can't be searched because it has only two characters.
› There are only 4 results for 'O2 Ireland'.
› All 4 results are classified as neutral.
› This comment is really negative!
22. Twitter Analysis Examples
Current opinion mining projects on Twitter with no interesting results
• Tweetfeel
› It's possible to search for O2, but the results are bad!
› Sometimes a tweet is well classified.
› Sometimes the word doesn't exist.
› And the rest are badly classified or identified!
23. Twitter Analysis Examples
Current projects with no interesting results
• Tweetfeel
› In this case it's possible to search for O2 Ireland, but not
as consecutive words.
› There are only 4 results, and 3 are retweets (RT).
There is still much work to do…
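Two of the failures noted in these examples (a two-character brand name that cannot be searched, and results dominated by retweets) can be addressed with simple preprocessing. The following is a minimal sketch under stated assumptions, not a description of how Twitrratr, Tweetfeel or the project work: the brand list is taken from the topics named in this deck, the function names are hypothetical, and the retweet check is a heuristic for the textual "RT @user" convention.

```python
import re

# Brand names taken from the topics named in this deck.
BRANDS = ["O2", "Movistar", "Telefónica"]

# Match brands as whole tokens, so the two-character name "O2" is found
# on its own but not inside longer tokens such as "CO2".
BRAND_PATTERN = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(b) for b in BRANDS) + r")(?!\w)",
    re.IGNORECASE,
)

# Heuristic: textual retweets start with "RT @user".
RETWEET_PATTERN = re.compile(r"\s*RT\s+@\w+")


def is_retweet(text):
    """Return True if the tweet text looks like a textual retweet."""
    return RETWEET_PATTERN.match(text) is not None


def brands_mentioned(text):
    """Return the set of canonical brand names mentioned in the text."""
    canonical = {b.lower(): b for b in BRANDS}
    return {canonical[m.group(1).lower()] for m in BRAND_PATTERN.finditer(text)}


def keep_for_analysis(text):
    """Keep original (non-retweet) tweets that mention at least one brand."""
    return not is_retweet(text) and bool(brands_mentioned(text))
```

Token-level matching only solves the search problem; deciding whether "O2" refers to the operator or to something else, and classifying the sentiment correctly, remain open.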