1. Intro to Sentiment Analysis
“FAST, NEAT, AVERAGE, FRIENDLY, GOOD, GOOD” was the author’s first sentiment.
2. aka Opinion Mining
Sentiment analysis is opinion mining.
Uses Natural Language Processing.
Dives deep into text analysis.
Leverages computational linguistics.
Develops meta data with business intelligence.
3. Basic Opinion Mining
Construct a range of polarity for opinion markers.
Classify statements by their polarity.
Analyse several levels deep.
Websites are one level.
Authors are another level.
A web page is a third level.
A sentence is a fourth level.
4. Ranges of Polarity
Classify emotional states.
“Angry” can be codified as “upset” or “cross”.
“Sad” may be “disappointed” or “confused”.
“Happy” may be “amazing” or “gorgeous”.
5. Scaling Systems
Some words are negative and deserve to be minus 10.
Some words are neutral and should be equal to zero.
Some words are positive and could range from one to 10.
7. Subjectivity and Objectivity
Starts with classifying a given text (no more than a paragraph).
Mark the media text as objective or subjective.
The challenge lies in the subtlety of expression or the compound effect of multiple authors.
Proper analysis normally means removing objective statements from the given text.
8. Aspect-Based Sentiment Analysis
Determine opinions based on features.
Identify the relevant entities, such as a phone, a camera or a bank.
Extract the features or aspects of each entity.
Decide whether the opinion on each feature is positive, negative or neutral.
10. When Something is Ambiguous
Detect entity within text, such as person, place or company.
Get detailed view at entity level, not document-level.
“I love Ireland but I hate traveling on Irish roads.”
11. Disambiguation
Detect entity within text, such as person, place or company.
Get detailed view at entity level, not document-level.
“I love Ireland but I hate traveling on Irish roads.”
12. Entity-Level
Detect entity within text, such as person, place or company.
Get detailed view at entity level, not document-level.
“I love Ireland but I hate traveling on Irish roads.”
13. Keyword-Level Sentiment
Gleans sentiment for every detected keyword.
Much more detailed than view at document-level.
BMW can determine that positive comments about its cars mention quality of handling.
14. User-Specified Sentiment
You, the analyst, target specific words or phrases.
So you specify a restaurant’s name and return sentiment scores based on that name.
You cull various media texts for sentiment about a specific hotel.
15. Directional Sentiment
Identifies the commentator and emotional range.
First, discover the incident where emotion is expressed.
Second, determine the degree of positive or negative response.
Third, conclude who is mentioning the product and how negatively.
16. Disambiguation by Location
Identifies the exact point on the earth.
Use contextual cues.
Perhaps where something is posted or where commentator is based.
17. Disambiguation: Meta Data
Meta data provides data about data.
Links can remove ambiguity.
Past geographical movements clarify reach of commentators.
Simple internet searches can provide accurate profile data.
18. Entity Subtypes
Author is a real person.
Author is a man.
Man’s name is Paul O’Connell.
This Paul O’Connell plays rugby for Munster.
19. Exact Quotations
What was said.
Who said what.
When it was said.
Where it was said.
This exactness provides context.
20. Author Profile
Analyse the text.
Validate the context.
Extract the concept.
Extract the keywords.
Apply to author profile.
Determine what authors write about.
21. References
Turney and Pang applied methods for detecting polarity at the document level.
Pang and Snyder classified documents on a multi-way scale, such as “five stars”.
Katie Paine wrote “Measure What Matters”.
22. Useful Links
For Immediate Release G+ Community
Marketing Over Coffee Podcast
KD Paine’s Blog
The Alchemy Blog
This is the first look at sentiment analysis during a discussion with business students in the Limerick Institute of Technology in October 2013. It is based on professional experience shared by Bernard @topgold Goldbach, Katie @kdpaine Paine, Neville @jangles Hobson, Christopher @cspenn Penn and The Alchemy Group. The author of this deck lives at http://www.insideview.ie.
Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials. Generally speaking, sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document. The attitude may be his or her judgment or evaluation (see appraisal theory), affective state (that is to say, the emotional state of the author when writing), or the intended emotional communication (that is to say, the emotional effect the author wishes to have on the reader).
A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level — whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as "angry," "sad," and "happy."
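The difference between document-level and sentence-level polarity can be sketched in a few lines of Python. This is only an illustration: the word lists are invented for the example, and real systems use much larger lexicons or trained classifiers.

```python
import re

# Hypothetical word lists, invented for this illustration only.
POSITIVE = {"love", "good", "friendly", "fast", "neat"}
NEGATIVE = {"hate", "bad", "slow", "rude"}


def classify(text):
    """Return 'positive', 'negative' or 'neutral' by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    balance = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if balance > 0:
        return "positive"
    if balance < 0:
        return "negative"
    return "neutral"


def classify_sentences(document):
    """Classify each sentence on its own (sentence-level polarity)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return [(s, classify(s)) for s in sentences]
```

Calling `classify` on a whole review gives the document-level view, while `classify_sentences` shows that a review which looks neutral overall can contain one positive and one negative sentence.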
A different method for determining sentiment is a scaling system: words commonly associated with negative, neutral or positive sentiment are each given a number on a -10 to +10 scale (most negative up to most positive). When a piece of unstructured text is analyzed using natural language processing, the extracted concepts are examined for these words and how they relate to each concept. Each concept is then given a score based on the way sentiment words relate to it and on their associated scores. This allows movement to a more sophisticated understanding of sentiment based on a 21-point scale. Alternatively, a text can be given separate positive and negative sentiment strength scores if the goal is to determine the sentiment in the text rather than its overall polarity and strength.
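A minimal sketch of such a scaling system follows. The word scores are made up for the example, and "how words relate to the concept" is approximated by simple proximity (a sentiment word within three tokens of the concept), which is an assumption of this sketch rather than how any particular product works.

```python
import re

# Hypothetical scores on a -10..+10 scale, invented for this example.
WORD_SCORES = {"terrible": -9, "poor": -5, "average": 0, "good": 6, "excellent": 9}


def score_concept(text, concept, window=3):
    """Average the scores of sentiment words within `window` tokens of `concept`.

    Returns None when the concept is absent or has no scored word nearby.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == concept:
            nearby = tokens[max(0, i - window): i + window + 1]
            hits.extend(WORD_SCORES[w] for w in nearby if w in WORD_SCORES)
    return sum(hits) / len(hits) if hits else None
```

For the sentence "The handling is excellent but the fuel economy is poor.", the concept `handling` picks up the score for "excellent" and the concept `economy` picks up the score for "poor", so each concept gets its own number instead of one blended score.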
Another research direction is subjectivity/objectivity identification. According to Wikipedia, this task is commonly defined as classifying a given text (usually a sentence) into one of two classes: objective or subjective. This problem can sometimes be more difficult than polarity classification: the subjectivity of words and phrases may depend on their context and an objective document may contain subjective sentences (e.g., a news article quoting people's opinions). Results are largely dependent on the definition of subjectivity used when annotating texts. (Su) As Pang’s research shows, removing objective sentences from a document before classifying its polarity helped improve performance.
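The idea of removing objective sentences before classifying polarity can be sketched as a simple filter. The cue words below are invented for the example; the research cited here used trained classifiers, not a fixed word list.

```python
import re

# Hypothetical cue words that suggest an opinion is being expressed.
SUBJECTIVE_CUES = {"love", "hate", "great", "awful", "think", "feel", "best", "worst"}


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_subjective(sentence):
    """Treat a sentence as subjective if it contains any cue word."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return bool(words & SUBJECTIVE_CUES)


def keep_subjective(text):
    """Drop objective sentences so only opinions go on to polarity scoring."""
    return [s for s in split_sentences(text) if is_subjective(s)]
```

Only the sentences returned by `keep_subjective` would then be passed to a polarity classifier.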
The more fine-grained analysis model is called feature/aspect-based sentiment analysis. It refers to determining the opinions or sentiments expressed on different features or aspects of entities, e.g., of a cell phone, a digital camera, or a bank. A feature or aspect is an attribute or component of an entity, e.g., the screen of a cell phone, or the picture quality of a camera. This problem involves several sub-problems, e.g., identifying relevant entities, extracting their features/aspects, and determining whether an opinion expressed on each feature/aspect is positive, negative or neutral. More detailed discussions about this level of sentiment analysis can be found in Liu's NLP Handbook chapter, "Sentiment Analysis and Subjectivity".
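The sub-problems listed above can be sketched for a phone review. The aspect list and opinion words are hypothetical, and pairing an opinion with whichever aspect shares its clause is a simplifying assumption.

```python
import re

# Hypothetical features of a cell phone and hypothetical opinion words.
ASPECTS = {"screen", "battery", "camera"}
OPINION = {"sharp": 1, "great": 1, "bright": 1, "weak": -1, "blurry": -1, "poor": -1}


def aspect_sentiment(review):
    """Map each aspect mentioned in the review to a polarity label."""
    results = {}
    # Split into clauses on commas, semicolons, full stops and the word 'but'.
    clauses = re.split(r"[,;.]|\bbut\b", review.lower())
    for clause in clauses:
        words = re.findall(r"[a-z]+", clause)
        total = sum(OPINION.get(w, 0) for w in words)
        for w in words:
            if w in ASPECTS:
                if total > 0:
                    results[w] = "positive"
                elif total < 0:
                    results[w] = "negative"
                else:
                    results[w] = "neutral"
    return results
```

A review such as "The screen is sharp, but the battery is weak." yields one label per feature rather than a single verdict on the phone.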
Ambiguous: open to more than one interpretation. Disambiguation: clarification that follows from the removal of ambiguity.
AMBIGUOUS. You need to provide sentiment data for every detected entity within text, such as person, place, organization. You need to give clients a more detailed view than document-level sentiment analysis.
REMOVE AMBIGUITY WITH DISAMBIGUATION TACTICS.
Entity-Level Sentiment Analysis provides sentiment data for every detected entity within text, such as person, place, organization. Alchemy algorithms do this kind of work.
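The Ireland sentence from the slides shows why the entity level matters. The sketch below assumes the entities have already been detected and uses a two-word score table invented for the example; it does not represent how Alchemy's algorithms work.

```python
import re

# Assumed to be already detected by an entity extraction step.
ENTITIES = ["Ireland", "Irish roads"]
# Hypothetical sentiment scores.
SCORES = {"love": 1, "hate": -1}


def entity_sentiment(sentence, entities=ENTITIES):
    """Score each entity using only the clause in which it appears."""
    out = {}
    for clause in re.split(r"\bbut\b|[,;]", sentence):
        lowered = clause.lower()
        total = sum(SCORES.get(w, 0) for w in re.findall(r"[a-z]+", lowered))
        for entity in entities:
            if entity.lower() in lowered:
                out[entity] = total
    return out
```

At document level the love and the hate cancel out to zero, but at entity level "Ireland" scores positive and "Irish roads" scores negative.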
Keyword-Level Sentiment Analysis provides sentiment data for every detected keyword so that instead of generating sentiment by document, it’s possible to generate sentiment for keywords within the document. For example, when analyzing posts about cars, you might determine that 70% of the posts were positive and that, of those positive posts, 80% mentioned road handling and 30% complained about the road tax.
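The arithmetic behind that kind of statement is a share within a subset. In this sketch the posts and their polarity labels are invented, and the labels are assumed to come from an earlier classification step.

```python
# Invented sample data; 'polarity' is assumed to come from an earlier step.
SAMPLE_POSTS = [
    {"text": "The handling on this car is superb", "polarity": "positive"},
    {"text": "Love the handling, shame about the road tax", "polarity": "positive"},
    {"text": "Comfortable seats and a quiet cabin", "polarity": "positive"},
    {"text": "The road tax is far too high", "polarity": "negative"},
]


def keyword_share(posts, keyword, polarity="positive"):
    """Fraction of posts with the given polarity that mention the keyword."""
    selected = [p for p in posts if p["polarity"] == polarity]
    if not selected:
        return 0.0
    mentions = sum(keyword.lower() in p["text"].lower() for p in selected)
    return mentions / len(selected)
```

With this sample, two of the three positive posts mention handling, so `keyword_share(SAMPLE_POSTS, "handling")` returns two thirds.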
User-Specified Sentiment Analysis allows the user to target specific words or phrases. For instance, specifying a movie title returns sentiment scores based on that phrase. This can be done by hand or with the Alchemy API.
Directional Sentiment Analysis reveals who is emitting the sentiment. For example, if a person spoke negatively about a product, determine not only that the product was mentioned negatively, but who mentioned the product negatively.
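A rough sketch of the directional idea: keep the author attached to each post so a negative mention can be traced back to a person. The negative word list, the product name and the sample authors used with it are hypothetical.

```python
import re

# Hypothetical negative words.
NEGATIVE = {"hate", "awful", "broken", "terrible"}


def negative_mentioners(posts, product):
    """Return the sorted authors who mention `product` alongside a negative word."""
    authors = set()
    for post in posts:
        words = set(re.findall(r"[a-z]+", post["text"].lower()))
        if product.lower() in words and words & NEGATIVE:
            authors.add(post["author"])
    return sorted(authors)
```

Each post is expected to be a dictionary with `author` and `text` keys, which is a data layout assumed for this example.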
Disambiguation: Dominos in Limerick or Dominos all across Ireland? Since one business can have multiple locations, you need to be able to distinguish by location. This effectively means you are using a disambiguation technique to ferret out the various locations. You can often locate contextual cues within the text or use the geolocation attached to a Foursquare tip.
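One way to picture location disambiguation is to look for a town name in the text first and fall back on where the comment was posted. The list of branch towns and the `posted_from` field are assumptions made for this sketch.

```python
# Hypothetical list of towns where the business has a branch.
BRANCH_TOWNS = ["Limerick", "Cork", "Dublin", "Galway"]


def resolve_location(text, posted_from=None, towns=BRANCH_TOWNS):
    """Pick a branch town from cues in the text, else from posting metadata."""
    lowered = text.lower()
    for town in towns:
        if town.lower() in lowered:
            return town
    if posted_from in towns:
        return posted_from
    return None  # Still ambiguous: no usable cue found.
```

A result of `None` means the mention cannot be tied to one branch and should be counted against the business as a whole.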
Disambiguation: Additional Information. Disambiguation provides additional information for the people, places and things mentioned in a document, such as links to their official websites, Wikipedia pages, geographical coordinates and more.
Entity Subtypes: Paul O’Connell, a Person and an Athlete. In addition to the most common entity types, such as person or organization, you should seek to identify subtypes. For example, your basic text analysis services will identify Paul O’Connell as a man but you need to know he is a prominent rugby player for Munster. That way, you know he is an influencer.
Quotations Extraction: What Was Said and Who Said It. Entity extraction determines what was said, but quotations extraction tells you who said what by extracting a quote and attributing it back to the person or organization responsible. Knowing that a company was mentioned in a piece of text is important; however, finding out who mentioned the company gives a fuller story. For example, entity extraction can provide you with a list of news articles where a topic and Willie O’Dea were both mentioned, but quotations extraction can provide you with a list of news articles where Willie O’Dea was quoted mentioning that topic.
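A very small sketch of quotation extraction uses a regular expression for one common pattern: a quote in straight double quotes followed by "said" or "says" and a capitalised name. This is an assumed simplification; it misses curly quotes, names with apostrophes and other sentence shapes.

```python
import re

# Matches: "quoted text," said Firstname Lastname
QUOTE_THEN_SPEAKER = re.compile(
    r'"([^"]+)"\s+(?:said|says)\s+([A-Z]\w+(?:\s+[A-Z]\w+)*)'
)


def extract_quotes(text):
    """Return (speaker, quote) pairs found in the text."""
    return [
        (m.group(2), m.group(1).rstrip(","))
        for m in QUOTE_THEN_SPEAKER.finditer(text)
    ]
```

The output pairs each quote with its speaker, which is the step that plain entity extraction leaves out.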
Author Extraction. For data to be meaningful, your text analysis service must be able to contribute to building an author profile. Comments on web pages, tweets, image collections, and site critiques provide excellent data sets. Author extraction combined with concept extraction, keyword extraction, and entity extraction provides information on what topics specific authors write about.
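Building an author profile can be sketched as counting the keywords each author uses. The stopword list is a short, invented one, and plain word counts stand in for proper keyword and concept extraction.

```python
import re
from collections import Counter, defaultdict

# A short, hypothetical stopword list.
STOPWORDS = {"the", "a", "an", "is", "in", "of", "and", "to", "was", "at", "on", "for"}


def build_profiles(posts, top_n=3):
    """Map each author to the words they use most often."""
    counts = defaultdict(Counter)
    for post in posts:
        words = re.findall(r"[a-z]+", post["text"].lower())
        counts[post["author"]].update(w for w in words if w not in STOPWORDS)
    return {
        author: [word for word, _ in counter.most_common(top_n)]
        for author, counter in counts.items()
    }
```

Each post is assumed to be a dictionary with `author` and `text` keys; the resulting profile is a rough answer to "what does this author write about?"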
Early work in this area includes Turney and Pang, who applied different methods for detecting the polarity of product reviews and movie reviews respectively. This work is at the document level. One can also classify a document's polarity on a multi-way scale, which was attempted by Pang and Snyder. Pang expanded the basic task of classifying a movie review as either positive or negative to predicting star ratings on either a 3 or a 4 star scale, while Snyder performed an in-depth analysis of restaurant reviews, predicting ratings for various aspects of the given restaurant, such as the food and atmosphere (on a five-star scale).
Peter Turney (2002). "Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews". Proceedings of the Association for Computational Linguistics. pp. 417–424.
Bo Pang; Lillian Lee and Shivakumar Vaithyanathan (2002). "Thumbs up? Sentiment Classification using Machine Learning Techniques". Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 79–86.
Bo Pang; Lillian Lee (2005). "Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales". Proceedings of the Association for Computational Linguistics (ACL). pp. 115–124.
Benjamin Snyder; Regina Barzilay (2007). "Multiple Aspect Ranking using the Good Grief Algorithm". Proceedings of the Joint Human Language Technology/North American Chapter of the ACL Conference (HLT-NAACL). pp. 300–307.
The FIR Community is at https://plus.google.com/communities/112349929544876511942
MOC is at http://marketingovercoffee.com
KD Paine blogs at http://kdpaine.blogs.com/
Alchemy’s blog is at http://www.alchemyapi.com/blog/
The Moodle Document concerning sentiment analysis is at http://bit.ly/crm-document04 but that might change as the years go on. You can contact the author by using the nickname “topgold” on all good social networks. This document was written to support the business curriculum in LIT.ie on 11 October 2013.