Collective sensing

Opinion Mining
Group members:
Mahdi Kianirad, Maryam Daneshfar, Éva Balázs, Fabian Berndt

• Introduction
• History
• Application
• Methods and Approaches
• Case Study

Introduction
• Sentiment analysis (also known as opinion mining) refers to the use of
natural language processing, text analysis and computational linguistics to
identify and extract subjective information in source materials.
• It covers methods to extract, identify, or otherwise characterize the sentiment
content of a text unit. It is sometimes referred to as opinion mining, although
the emphasis in that case is on extraction.
• It aims to determine the attitude of a speaker or writer with respect to
some topic, or the overall contextual polarity of a document.

History
• Early work in this area includes different methods for detecting the
polarity of product reviews and movie reviews (document
level)
• For example: the Rotten Tomatoes movie review dataset
Reviews are labeled:
0 – negative
1 – somewhat negative
2 – neutral
3 – somewhat positive
4 – positive

Application
• Business
• Politics/political science
• Law/policy making
• Sociology
• Psychology

Methods and Approaches
• keyword spotting
• lexical affinity
• statistical methods (Machine learning)
– latent semantic analysis
assumes that words that are close in meaning will occur in similar pieces of text
– support vector machines
builds a model that assigns new examples to one category or the other (positive or negative)
– bag of words
the (frequency of) occurrence of each word is used as a feature for training a classifier. Example usage:
spam filtering
• concept-level techniques
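
As a rough illustration of the bag-of-words idea above, the sketch below counts how often each word occurs in a text, so that the counts could serve as features for a classifier. The function name `bag_of_words` and the lowercase, whitespace-based splitting are assumptions made for this example; they are not taken from the slides.

```python
from collections import Counter


def bag_of_words(text):
    """Return a word -> count mapping for one text (hypothetical helper)."""
    # Lowercase and split on whitespace; word order is deliberately discarded.
    return Counter(text.lower().split())


counts = bag_of_words("Free money now claim your free prize")
print(counts["free"])   # number of times "free" occurs
print(counts["prize"])  # number of times "prize" occurs
```

A classifier such as a spam filter would then be trained on these per-word counts rather than on the raw text.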

• Introduction
• Sentiment Analysis
• Method
• Using Bag of words
– Disadvantages
• Using keyword spotting
– Advantages and Disadvantages
• Validation
• Conclusion

Introduction
• Twitter is a social networking and microblogging service that allows users
to post real-time messages, called tweets. Tweets are restricted to 140
characters in length.
• We use two approaches for processing Twitter data to determine
the polarity of sentiment
– Bag of Words
– Keyword Spotting (using sad and happy emoticons)
• We restrict our data to a bounding box around London
– Most Twitter users in Europe
– The language is English
For each approach we show the results, and then we compare the two methods.
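
Restricting tweets to a bounding box can be sketched as a simple coordinate check. The slides do not give the box that was used, so the corner coordinates below are rough, hypothetical values for the London area, and the function name `in_london_bbox` is likewise an assumption.

```python
# Hypothetical, approximate bounding box for London (degrees).
# These values are NOT from the slides; adjust them to the real study area.
MIN_LAT, MAX_LAT = 51.28, 51.69
MIN_LON, MAX_LON = -0.51, 0.33


def in_london_bbox(lat, lon):
    """Return True if the point (lat, lon) lies inside the assumed box."""
    return MIN_LAT <= lat <= MAX_LAT and MIN_LON <= lon <= MAX_LON


print(in_london_bbox(51.5074, -0.1278))  # central London
print(in_london_bbox(48.8566, 2.3522))   # Paris
```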

Sentiment Analysis
• Many text mining solutions are available on different platforms
– “tm” package for R
– NLTK package for Python
– LingPipe library for Java
– …
• NLTK (Natural Language Toolkit)
– a leading platform for building Python programs to work with human language data
– easy-to-use
– over 50 corpora and lexical resources
– suite of text processing libraries for classification, tokenization, stemming, tagging,
parsing, and semantic reasoning

Method
• NLTK
Very good at splitting sentences into tokens:
detects contractions, punctuation and emoticons
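
To make the tokenization step concrete, here is a deliberately naive sketch using only Python's standard `re` module. It is a stand-in written for illustration, not NLTK's own tokenizer (NLTK ships a tweet-aware tokenizer, `nltk.tokenize.TweetTokenizer`, that handles many more cases). The pattern and the function name `tokenize` are assumptions for this example.

```python
import re

# Alternatives are tried in order: emoticons, then words, then punctuation.
TOKEN_RE = re.compile(
    r"""
    [:;=][-o]?[)(\]\[/<3DPpc]   # simple emoticons such as :-) :( :/ :3
    | \w+(?:'\w+)?              # words, keeping contractions like don't whole
    | [^\w\s]                   # any other single punctuation character
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split a tweet into word, emoticon and punctuation tokens."""
    return TOKEN_RE.findall(text)


print(tokenize("I don't like rain :-( but London is great :)"))
```

Being naive, this pattern would also pick up emoticon-like fragments inside URLs or times of day, which a production tokenizer has to guard against.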

Using Bag of words
• Defining two word lists
– Positive, consisting of 2029 words
– Negative, consisting of 4783 words
• Approach
– For each token of a tweet, check whether it is positive or negative
– Rate the whole tweet based on the ratio of positive and negative word frequencies
– Each tweet gets a rating between 0 and 1 (floating-point number)
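
The approach above could be sketched as follows. The slides do not state the exact formula, so this example assumes the rating is the share of positive words among all matched words, which yields a float between 0 and 1. The two tiny word sets are hypothetical stand-ins for the real 2029-word and 4783-word lists, and returning `None` for tweets without any matched word is also an assumption.

```python
# Hypothetical miniature word lists; the real lists are far larger.
POSITIVE_WORDS = {"good", "great", "happy", "love"}
NEGATIVE_WORDS = {"bad", "sad", "hate", "awful"}


def bag_of_words_score(tokens):
    """Rate a tokenized tweet between 0 (negative) and 1 (positive)."""
    pos = sum(1 for t in tokens if t.lower() in POSITIVE_WORDS)
    neg = sum(1 for t in tokens if t.lower() in NEGATIVE_WORDS)
    if pos + neg == 0:
        return None  # assumption: no sentiment words means no rating
    # Assumed formula: positive matches divided by all matches.
    return pos / (pos + neg)


print(bag_of_words_score(["I", "love", "this", "great", "city"]))
print(bag_of_words_score(["good", "food", "bad", "service"]))
```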

Disadvantages
– Tends to generate false positives
Nearly 70% of records (out of 10 million) received a positive score
(between 0.75 and 1)
– Very dependent on the definition of the word bag
Results will be different with another word bag
– Cannot detect implicit attitudes
such as sarcasm or wit

Completely positive tweets
Low-density areas were removed to make the map more readable

Completely negative tweets

Using keyword spotting
• Defining the keyword
Olympic

Using keyword spotting in opinion mining
• Defining the keywords
Happy : :-) :) :o) :] :3 :c) …
Sad : :-( :( :-< :-/ :/ …
• Approach
– For each tokenized tweet, check whether it contains a happy or sad emoticon
– Rate the whole tweet based on the appearance of sad or happy emoticons
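
A minimal sketch of this emoticon check is shown below. It uses only the emoticons listed explicitly above (the elided part of each list is not filled in). The function name `emoticon_label` and the choice to leave tweets with both kinds of emoticon, or with none, unlabeled are assumptions for this example.

```python
# Only the emoticons spelled out on the slide; the lists there continue.
HAPPY = {":-)", ":)", ":o)", ":]", ":3", ":c)"}
SAD = {":-(", ":(", ":-<", ":-/", ":/"}


def emoticon_label(tokens):
    """Return 'happy', 'sad', or None for a tokenized tweet."""
    has_happy = any(t in HAPPY for t in tokens)
    has_sad = any(t in SAD for t in tokens)
    if has_happy and not has_sad:
        return "happy"
    if has_sad and not has_happy:
        return "sad"
    return None  # assumption: mixed or emoticon-free tweets are skipped


print(emoticon_label(["Olympic", "tickets", "arrived", ":)"]))
print(emoticon_label(["train", "delayed", "again", ":-("]))
```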

Advantages
– Less ambiguity in the results compared with the “bag of words”
method
Works only with tweets that contain an emoticon (explicit emotion)
Disadvantage
– Smaller data set to analyze
750,000 records out of 10,000,000 records (7.5%)

Happy emoticon tweets

Sad emoticon tweets

Validation
• Validation was performed manually by a user
We examined 4000 tweets to determine whether the algorithm works correctly.
For the bag of words method, the algorithm worked properly in 60% of cases.
No validation was performed for emoticon spotting.
Conclusion
• In opinion mining, when specific keywords are the matter of concern, the distribution
of tweets differs from keyword to keyword. In mood analysis of an area, however, the
distribution and density of the different moods expressed in tweets depend
on the distribution of the whole population (in this case, the concentrations of positive
and negative tweets do not differ from each other).