2. Problem & Dataset
Twitter is a micro-blogging website that allows people to post messages and share their
views on a topic. There has been a lot of work on sentiment analysis of Twitter data.
This project classifies tourism tweets into two main sentiments: positive and negative.
Data Collection Source:
We used the tweepy module to extract tweets from Twitter using various hashtag
combinations. We also used the BeautifulSoup, urllib and requests modules to extract
blogs posted by tourists across various blogging sites.
3. Related Papers
Some of the early results on sentiment analysis of Twitter data are by Go et al., who used
distant supervision to acquire sentiment data. They built models using Naive Bayes,
MaxEnt and SVM classifiers, and report that SVM performs better than the other classifiers.
As features, they used unigrams and bigrams.
● A survey by Pang and Lee on opinion mining and sentiment analysis gives a
comprehensive study of the area, covering sentiment analysis of blogs, reviews,
etc. (http://www.cs.cornell.edu/home/llee/omsa/omsa.pdf).
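The unigram and bigram features mentioned above can be illustrated with a minimal sketch. The helper name `ngrams` and the sample sentence are assumptions made for illustration; this is not the feature extraction code used by Go et al. or by this project.

```python
def ngrams(tokens, n):
    """Return all n-grams (as tuples) found in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Illustrative sentence, split on whitespace only.
tokens = "the beach was really beautiful".split()

unigrams = ngrams(tokens, 1)  # single words
bigrams = ngrams(tokens, 2)   # pairs of adjacent words
```

Bigrams keep some local word order (for example "really beautiful"), which single words alone cannot capture.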
4. Proposed Solutions & Expected Result
After extracting the tweets, we preprocess them. In the preprocessing step we remove links
so that they do not affect the analysis. We then generate a word cloud of the tweets, run
sentiment analysis on each tweet, and calculate its polarity and subjectivity scores.
These tweets give us vital information about the condition of tourism.
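The link-removal step could look roughly like the sketch below. The regular expression, the function name `remove_links` and the sample tweet are assumptions for illustration, not the project's actual code.

```python
import re

# Assumed pattern: http(s) URLs and bare "www." links, up to the next whitespace.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")

def remove_links(tweet):
    """Strip URLs from a tweet and collapse the whitespace left behind."""
    without_links = URL_PATTERN.sub("", tweet)
    return " ".join(without_links.split())

# Illustrative tweet, not taken from the collected dataset.
sample = "Loved the sunset in Goa! https://t.co/abc123 #travel"
cleaned = remove_links(sample)
```

Removing links before scoring keeps URL fragments out of both the word cloud and the sentiment scores.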
Result
From the average polarity score (computed without repeated tweets) we can say whether
the majority of tweets are positive or negative. From the average subjectivity score we
can say whether the tweets are more objective or more subjective in nature. This is useful
because subjective tweets carry more emotion than fact, and we are more interested in
knowing what people feel. From the word cloud we can observe which places are preferred
by tourists.
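The averaging described above can be sketched as follows. The sketch assumes each tweet has already been scored, with polarity in [-1, 1] and subjectivity in [0, 1]; those ranges, the function name `average_scores` and the sample values are assumptions for illustration, not the project's data.

```python
from statistics import mean

def average_scores(scored_tweets):
    """Average polarity and subjectivity over unique tweet texts.

    scored_tweets: iterable of (text, polarity, subjectivity) tuples.
    """
    unique = {}
    for text, polarity, subjectivity in scored_tweets:
        # Keep only the first occurrence so repeated tweets are not counted twice.
        unique.setdefault(text, (polarity, subjectivity))
    polarities = [p for p, _ in unique.values()]
    subjectivities = [s for _, s in unique.values()]
    return mean(polarities), mean(subjectivities)

# Illustrative scores only.
scored = [
    ("great trip", 0.8, 0.75),
    ("great trip", 0.8, 0.75),      # repeated tweet, ignored in the average
    ("flight delayed", -0.4, 0.25),
]
avg_polarity, avg_subjectivity = average_scores(scored)
```

An average polarity above zero would suggest mostly positive tweets, and an average subjectivity above the midpoint would suggest mostly opinion rather than fact.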
5. Progress so far
We used the tweepy module to extract tweets using various
hashtag combinations used by tourists. After extraction, we
preprocessed them with the NLTK library: first tokenizing the
data, then removing stop words and Twitter slang.
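The preprocessing order (tokenize, then remove stop words and slang) can be sketched with the standard library alone. This swaps in two tiny hand-written word lists in place of NLTK's tokenizer and stop-word corpus; the lists, the function name `preprocess` and the sample tweet are all assumptions for illustration.

```python
import re

# Tiny illustrative lists; NLTK's stop-word corpus is much larger.
STOP_WORDS = {"the", "a", "an", "is", "was", "to", "in", "and", "of"}
TWITTER_SLANG = {"rt", "lol", "omg", "u"}

def preprocess(tweet):
    """Lowercase and tokenize a tweet, then drop stop words and slang tokens."""
    tokens = re.findall(r"[a-z0-9']+", tweet.lower())
    return [tok for tok in tokens
            if tok not in STOP_WORDS and tok not in TWITTER_SLANG]

# Illustrative tweet, not taken from the collected dataset.
tokens = preprocess("RT The food in Jaipur was amazing lol")
```

What remains are the content words, which are the ones worth counting for a word cloud or passing to a sentiment scorer.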
6. Issues Faced
● People mostly want to share their positive experiences,
so the number of negative tweets is comparatively small.
● One of the major challenges in sentiment analysis of
Twitter data is collecting a labelled dataset.
11. Bar plot of Positive and Negative Polarity Score
15. Subjective And Objective Analysis
What does it mean?
An objective perspective is one that is not influenced by emotions, opinions, or personal
feelings - it is a perspective based in fact, in things quantifiable and measurable.
A subjective perspective is one open to greater interpretation based on personal feeling,
emotion, aesthetics, etc.
18. Subjective And Objective Analysis
The ratio of objective to subjective tweets is around 1.444, which means
people express their views more directly (objectively) than subjectively,
i.e. the tweets show relatively little emotional attachment to the tourism industry.
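One way such a ratio could be computed is sketched below. The 0.5 cut-off between objective and subjective, the function name `objective_subjective_ratio` and the sample scores are assumptions for illustration; the source does not state how the 1.444 figure was derived.

```python
def objective_subjective_ratio(subjectivity_scores, threshold=0.5):
    """Return (number of objective tweets) / (number of subjective tweets).

    A tweet counts as objective when its subjectivity score is below
    the threshold (an assumed cut-off).
    """
    objective = sum(1 for s in subjectivity_scores if s < threshold)
    subjective = len(subjectivity_scores) - objective
    if subjective == 0:
        raise ValueError("no subjective tweets; the ratio is undefined")
    return objective / subjective

# Illustrative scores only: three objective tweets and two subjective ones.
scores = [0.1, 0.2, 0.3, 0.7, 0.9]
ratio = objective_subjective_ratio(scores)
```

A ratio above 1 means objective tweets outnumber subjective ones; for instance, 13 objective tweets for every 9 subjective ones gives roughly 1.444.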