Collective sensing
1. Collective Sensing
Opinion Mining
Group members:
Mahdi Kianirad, Maryam Daneshfar, Éva Balázs, Fabian Berndt
2. • Introduction
• History
• Application
• Methods and Approaches
• Case Study
3. Introduction
• Sentiment analysis (also known as opinion mining) refers to the use of
natural language processing, text analysis and computational linguistics to
identify and extract subjective information in source materials.
• It covers methods to extract, identify, or otherwise characterize the sentiment
content of a text unit. The term "opinion mining" is sometimes used for the same
task, although its emphasis is on extraction.
• Aims to determine the attitude of a speaker or a writer with respect to
some topic or the overall contextual polarity of a document.
4. History
• Early work in this area includes methods for detecting the polarity of
product reviews and movie reviews at the document level
• Example: the Rotten Tomatoes movie review dataset
Review labels:
0 – negative
1 – somewhat negative
2 – neutral
3 – somewhat positive
4 – positive
5. Application
• Business
• Politics/political science
• Law/policy making
• Sociology
• Psychology
6. Methods and Approaches
• Keyword spotting
• Lexical affinity
• Statistical methods (machine learning)
– Latent semantic analysis
assumes that words that are close in meaning occur in similar pieces of text
– Support vector machines
build a model that assigns new examples to one of two categories (positive or negative)
– Bag of words
the occurrence (or frequency) of each word is used as a feature for training a classifier.
Example usage: spam filtering
• concept-level techniques
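To make the bag-of-words idea above concrete, here is a minimal sketch using only the Python standard library. The function name `bag_of_words` and the simple regex tokenization are assumptions made for illustration; they are not taken from the slides.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return a word -> count mapping; word order is discarded."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


features = bag_of_words("Free offer! Claim your free prize now")
print(features["free"])  # 2
```

A classifier such as a spam filter would then be trained on these counts rather than on the raw text.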
8. • Introduction
• Sentiment Analysis
• Method
• Using Bag of words
– Disadvantages
• Using keyword spotting
– Advantages and Disadvantages
• Validation
• Conclusion
9. Introduction
• Twitter is a social networking and microblogging service that allows users
to post real-time messages, called tweets. Tweets are restricted to 140
characters in length.
• We introduce two methods for processing Twitter data to determine
the polarity of sentiment
– Bag of Words
– Keyword Spotting (using sad and happy emoticons)
• We restrict our data to the London bounding box
– Most Twitter users in Europe
– The language is English
For each method we show the results, and then we compare the two.
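Restricting tweets to a bounding box can be sketched as follows. The coordinates in `LONDON_BBOX` are rough values for Greater London chosen for illustration; the slides do not state the exact box that was used, and the record layout (`lon`, `lat`, `text`) is also an assumption.

```python
# Approximate Greater London bounds (assumed values, not those of the study).
LONDON_BBOX = {"min_lon": -0.51, "min_lat": 51.28, "max_lon": 0.34, "max_lat": 51.70}


def in_bbox(lon, lat, bbox=LONDON_BBOX):
    """Return True if the point lies inside the bounding box (edges included)."""
    return (bbox["min_lon"] <= lon <= bbox["max_lon"]
            and bbox["min_lat"] <= lat <= bbox["max_lat"])


# Hypothetical geotagged tweets.
tweets = [
    {"text": "Lovely day by the Thames :)", "lon": -0.12, "lat": 51.50},
    {"text": "Bonjour de Paris", "lon": 2.35, "lat": 48.86},
]

london_tweets = [t for t in tweets if in_bbox(t["lon"], t["lat"])]
print(len(london_tweets))  # 1
```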
10. Sentiment Analysis
• Many platforms offer solutions for text mining
– "tm" package for R
– NLTK package for Python
– LingPipe library for Java
– …
• NLTK (Natural Language Toolkit)
– a leading platform for building Python programs to work with human language data
– easy-to-use
– over 50 corpora and lexical resources
– suite of text processing libraries for classification, tokenization, stemming, tagging,
parsing, and semantic reasoning
11. Method
• NLTK
Very good at splitting sentences into tokens:
detects contractions, punctuation and emoticons
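The kind of tokenization described above can be illustrated with a small regex-based stand-in. This is not NLTK's tokenizer; the pattern and the `tokenize` function are simplified assumptions that only show what keeping contractions, punctuation and emoticons as separate tokens looks like.

```python
import re

# Alternatives are tried left to right, so emoticons win over plain punctuation.
TOKEN_PATTERN = re.compile(
    r"""
    [:;=][\-o]?[\)\(\]\[DPp/\\3<>]   # simple emoticons with an optional nose
    | \w+(?:'\w+)?                   # words, keeping contractions in one piece
    | [^\w\s]                        # any other single punctuation mark
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split text into word, punctuation and emoticon tokens."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("I don't like Mondays :-( but it's ok :)"))
# ['I', "don't", 'like', 'Mondays', ':-(', 'but', "it's", 'ok', ':)']
```

A pattern this simple also matches ":/" inside URLs, so real data would need a more careful tokenizer.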
12. Using Bag of words
• Defining two word lists
– Positive, consisting of 2029 words
– Negative, consisting of 4783 words
• Approach
– For each token of a tweet, check whether it is positive or negative
– Rate the whole tweet based on the ratio of positive to negative word frequencies
– Each tweet gets a rating between 0 and 1 (floating-point number)
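The rating step above can be sketched as below. The slides do not give the exact formula, so this sketch assumes the score is the share of positive words among all matched sentiment words. The tiny word sets stand in for the real lists of 2029 and 4783 words, and returning `None` for tweets without any matched word is a further assumption.

```python
# Hypothetical miniature word lists; the study used much larger ones.
POSITIVE_WORDS = {"good", "great", "happy", "love"}
NEGATIVE_WORDS = {"bad", "sad", "hate", "awful"}


def score_tweet(tokens, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """Return a score in [0, 1]: 0 is fully negative, 1 is fully positive.

    Returns None when the tweet contains no word from either list.
    """
    pos = sum(1 for t in tokens if t.lower() in positive)
    neg = sum(1 for t in tokens if t.lower() in negative)
    if pos + neg == 0:
        return None
    return pos / (pos + neg)


tokens = ["I", "love", "this", "great", "city", "but", "hate", "the", "rain"]
print(round(score_tweet(tokens), 2))  # 0.67
```

With this formula, a tweet containing a single positive word and no negative word already scores 1.0.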
14. Disadvantages
– Tends to generate false positives
Nearly 70% of records (out of 10 million) received a positive score
(between 0.75 and 1)
– Very dependent on the definition of the word bag
Results will be different with another word bag
– Cannot detect implicit attitudes
such as sarcasm or wit
15. Completely positive tweets
Low-density areas were removed to make the map more readable
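One way to drop low-density areas before mapping is to bin points into a grid and keep only cells with enough tweets. The slides do not describe how the filtering was done, so the grid approach, the cell size and the threshold below are all assumptions.

```python
import math
from collections import Counter


def dense_cells(points, cell_size=0.01, min_count=2):
    """Count (lon, lat) points per grid cell and keep cells with >= min_count points."""
    counts = Counter(
        (math.floor(lon / cell_size), math.floor(lat / cell_size))
        for lon, lat in points
    )
    return {cell: n for cell, n in counts.items() if n >= min_count}


# Hypothetical tweet locations: three close together, one isolated.
points = [(-0.125, 51.505), (-0.123, 51.507), (-0.121, 51.502), (0.205, 51.405)]
print(dense_cells(points))  # {(-13, 5150): 3}
```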
16. Completely negative tweets
Low-density areas were removed to make the map more readable
17. Using keyword spotting
• Defining the keyword
Olympic
Low-density areas were removed to make the map more readable
18. Using keyword spotting in opinion mining
• Defining the keywords
Happy : :-) :) :o) :] :3 :c) …
Sad : :-( :( :-< :-/ :/ …
• Approach
– For each tokenized tweet, check whether it contains a happy or sad emoticon
– Rate the whole tweet based on the appearance of sad or happy emoticons
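A minimal sketch of the emoticon check is shown below. The sets contain only the emoticons listed on the slide; the original lists are longer and are not reproduced here. Treating tweets that contain both kinds as unlabeled is an assumption, since the slides do not say how such cases were handled.

```python
# Only the emoticons listed on the slide; the full lists are not given.
HAPPY = {":-)", ":)", ":o)", ":]", ":3", ":c)"}
SAD = {":-(", ":(", ":-<", ":-/", ":/"}


def emoticon_label(tokens):
    """Return 'happy', 'sad', or None for a list of tweet tokens."""
    has_happy = any(t in HAPPY for t in tokens)
    has_sad = any(t in SAD for t in tokens)
    if has_happy and not has_sad:
        return "happy"
    if has_sad and not has_happy:
        return "sad"
    return None  # no emoticon, or mixed signals (assumed handling)


print(emoticon_label(["Off", "to", "the", "stadium", ":)"]))  # happy
print(emoticon_label(["Missed", "my", "train", ":("]))        # sad
print(emoticon_label(["Just", "a", "normal", "day"]))         # None
```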
19. Advantages
– Less ambiguous results compared with the "bag of words"
method
Works only with tweets that contain an emoticon (explicit emotion)
Disadvantage
– Less data to analyze
750,000 records out of 10,000,000 records
20. Happy emoticon tweets
Low-density areas were removed to make the map more readable
21. Sad emoticon tweets
Low-density areas were removed to make the map more readable
22. Validation
• Validation was performed manually by a user
We examined 4,000 tweets to determine whether the algorithm classified them correctly.
For the bag-of-words method, the algorithm was correct in 60% of cases.
No validation was performed for emoticon spotting.
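The validation figure is an accuracy: the share of tweets where the algorithm's label agrees with the human label. A minimal sketch follows; the label names and the five example labels are made up for illustration and are not data from the study.

```python
def accuracy(predicted, manual):
    """Return the fraction of items where the predicted label matches the manual one."""
    if len(predicted) != len(manual):
        raise ValueError("label lists must have the same length")
    if not predicted:
        raise ValueError("label lists must not be empty")
    agree = sum(p == m for p, m in zip(predicted, manual))
    return agree / len(predicted)


# Hypothetical labels for five tweets.
predicted = ["pos", "pos", "neg", "pos", "neg"]
manual = ["pos", "neg", "neg", "neg", "neg"]
print(accuracy(predicted, manual))  # 0.6
```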
Conclusion
• In opinion mining, when specific keywords are the matter of concern, the distribution
of tweets differs from keyword to keyword. In mood analysis of an area, however, the
distribution and density of the different moods expressed in tweets depend on the
distribution of the whole population (in this case, the concentrations of positive
and negative tweets do not differ from each other).