5. What is Sentiment Analysis?
A linguistic analysis technique that identifies
opinion early in a piece of text.
The movie is great.
The movie stars Mr. X
The movie is horrible.
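The three movie sentences on this slide (one positive, one factual, one negative) can be told apart with a very small lexicon lookup. The sketch below is only an illustration under assumptions: the class name LexiconPolarity and both word lists are hypothetical, and counting lexicon hits is a stand-in for the trained classifiers discussed later in the deck.

```java
import java.util.Locale;
import java.util.Set;

/** Toy lexicon-based polarity classifier; the word lists are illustrative assumptions. */
final class LexiconPolarity {
    // Hypothetical mini-lexicons; a real system would load a curated resource.
    private static final Set<String> POSITIVE = Set.of("great", "good", "excellent", "wonderful");
    private static final Set<String> NEGATIVE = Set.of("horrible", "bad", "awful", "terrible");

    /** Returns "positive", "negative", or "neutral" for one sentence. */
    static String classify(String sentence) {
        int score = 0;
        // Lowercase the sentence and split on anything that is not a letter.
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (POSITIVE.contains(token)) score++;
            if (NEGATIVE.contains(token)) score--;
        }
        if (score > 0) return "positive";
        if (score < 0) return "negative";
        return "neutral";
    }

    public static void main(String[] args) {
        String[] examples = {
            "The movie is great.", "The movie stars Mr. X", "The movie is horrible."
        };
        for (String s : examples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

With these word lists the first sentence is labelled positive, the second neutral, and the third negative. Negation ("not great") and sarcasm are not handled.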
8. Humans can easily understand
emotions.
Can a machine be trained to do it?
9. Sentiment analysis (SA) offers organizations the ability to monitor sentiment in
real time and act accordingly.
Marketing managers, PR firms, campaign
managers, politicians, equity investors, and
online shoppers are direct beneficiaries.
http://www.tweetfeel.com
http://www.nytimes.com/interactive/us/politics/2010-twitter-candidates.html
11. Document-Level
supervised/unsupervised learning
Sentence-Level
supervised learning
Feature-Based Sentiment Analysis
all noun phrases (NPs) in the corpus and their polarity
Sentiment Lexicon Acquisition
WordNet
12. Lucene: open-source, Java-based search
engine
Provides document indexing with
arbitrary fields and fast search
Several relevance and ranking
algorithms
13. 1. Create an index
2. Add ‘document’ representations of
items
3. Construct queries
4. Ask for results (each result is scored)
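14. // Index every item as a Lucene Document (TextField needs Lucene 4.0 or later)
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
IndexWriterConfig config = /* configure */ ;
Directory dir = FSDirectory.open(indexFile);   // on-disk index location
IndexWriter w = new IndexWriter(dir, config);
for (ItemInfo item : getItems()) {
    Document doc = new Document();
    // TextField is tokenized and searchable; Store.YES keeps the raw value
    doc.add(new TextField("title", item.title, Field.Store.YES));
    doc.add(new TextField("tags", item.tags, Field.Store.YES));
    w.addDocument(doc);   // IndexWriter exposes addDocument(), not add()
}
w.close();   // commits pending changes and releases the write lock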
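The four steps from slide 13 (create an index, add documents, construct a query, ask for scored results) can also be tried without Lucene on the classpath. The sketch below is not Lucene's API: TinyIndex, add, and search are hypothetical names, and the score is a raw count of matching terms, which is far simpler than Lucene's ranking algorithms.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical in-memory inverted index; illustrates the workflow, not Lucene's API. */
final class TinyIndex {
    // term -> (document id -> number of occurrences of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private final List<String> titles = new ArrayList<>();

    /** Step 2: add a document representation of an item; returns its id. */
    int add(String title, String text) {
        int id = titles.size();
        titles.add(title);
        for (String term : tokenize(title + " " + text)) {
            postings.computeIfAbsent(term, t -> new HashMap<>()).merge(id, 1, Integer::sum);
        }
        return id;
    }

    /** Steps 3 and 4: run a free-text query and return titles, best score first. */
    List<String> search(String query) {
        Map<Integer, Integer> scores = new HashMap<>();
        for (String term : tokenize(query)) {
            postings.getOrDefault(term, Map.of())
                    .forEach((id, count) -> scores.merge(id, count, Integer::sum));
        }
        List<Integer> ids = new ArrayList<>(scores.keySet());
        // Higher score first; ties are broken by insertion order (lower id first).
        ids.sort(Comparator.comparingInt((Integer id) -> -scores.get(id))
                           .thenComparingInt(id -> id));
        List<String> result = new ArrayList<>();
        for (int id : ids) result.add(titles.get(id));
        return result;
    }

    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();   // Step 1: create an index
        index.add("Kindle Fire review", "a tablet with a lot of media for the price");
        index.add("Digital SLR review", "an easy camera with great picture quality for the price");
        System.out.println(index.search("camera price"));
    }
}
```

For the query "camera price" the SLR review matches both terms and is ranked ahead of the Kindle review, which matches only "price".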
16. PyLucene: Python bindings that wrap Java Lucene
Lucy: C port with bindings for other languages
Lucene.NET: C# port for .NET
Solr provides a search server (with a REST
API) on top of Lucene
18. Why Solr?
Linguistics module
Stems, lemmas and synonyms
Multi-language capability
CJKAnalyzer, UIMA analyzers
UIMA integration
UpdateRequestProcessorChain
19. Why Solr?
Extract domain specific entities
and concepts
Time and cost
Solr setup: 5 minutes
UIMA annotators: 5 days
Enrich text, write to a dedicated field
20. Applications: tagging entities in review text
I wasn't really in the market for another tablet, but my girlfriend ended
up getting one for me so she got me on this one. I would like to say that
this tablet reminds me of the first Motorola Droid smartphone that came
out several years back. The phone jam packed a ton of bells & whistles
into its hardware and software to give a lot of bang for your buck. This
is what it feels like amazon has done with the Kindle Fire 8.9. They have
put a lot of advanced hardware and innovative software, so for the
average user, specially someone who absorbs a lot of media, you get a
lot for the price. But just because you get a lot for the price, doesn't
mean it is without its flaws.
22. Digital SLR with Full 1080p HD Video
There are many preprogrammed scene modes
that make this a very easy camera to use.
The picture quality is beyond belief, and
even better for the price.
Price:
Use case
23. Why UIMA ?
UIMA Framework manages components
and data flow – No coding
Deploy pipeline of analysis engines
AEs wrap NLP algorithms
[Diagram: aggregate analysis engine chaining Language Detection, Sentence Annotator, POS Annotator and NER; NER outputs Person, Place, Organization]
25. NLP+UIMA
Use POS in query understanding
boosting terms
Synonym expansion
Extract concepts/entities
Faceting using entities
Identify places in query
and use spatial queries
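Two of the ideas on this slide, synonym expansion and term boosting, can be sketched as a plain string rewrite that emits Lucene-style query syntax (term^boost, OR, AND, parentheses). Everything here is an assumption for illustration: QueryExpander and its synonym table are hypothetical, and the sketch boosts every original term over its synonyms instead of using part-of-speech tags to pick which terms to boost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical query rewriter: boosts original terms and ORs in their synonyms. */
final class QueryExpander {
    // Illustrative synonym table; a real system might derive this from WordNet.
    private static final Map<String, List<String>> SYNONYMS = Map.of(
            "tablet", List.of("slate"),
            "camera", List.of("dslr", "slr"));

    /** Rewrites a whitespace-separated query into a boosted, synonym-expanded query string. */
    static String expand(String query, int boost) {
        List<String> clauses = new ArrayList<>();
        for (String term : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (term.isEmpty()) continue;
            List<String> alternatives = new ArrayList<>();
            alternatives.add(term + "^" + boost);                        // original term, boosted
            alternatives.addAll(SYNONYMS.getOrDefault(term, List.of())); // synonyms, unboosted
            clauses.add(alternatives.size() == 1
                    ? alternatives.get(0)
                    : "(" + String.join(" OR ", alternatives) + ")");
        }
        return String.join(" AND ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(expand("cheap camera", 2));
    }
}
```

For the input "cheap camera" with boost 2 this produces cheap^2 AND (camera^2 OR dslr OR slr), which could then be handed to a query parser.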
26. Ideas: Sentiment Analysis App
Identify Subjective Sentences from text
Remove noisy sentences
– Regex, conditional probability
Graph min cut – LingPipe
Subjectivity Lexicons
Discard Facts and Objective Sentences
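The first step of this idea, keeping subjective sentences and discarding factual ones, can be sketched with a subjectivity lexicon alone. This is a deliberately simple stand-in: it does not implement the graph min-cut approach or use LingPipe, and the class name SubjectivityFilter, the word list, and the naive sentence splitter are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical lexicon-based filter that keeps only subjective sentences. */
final class SubjectivityFilter {
    // Illustrative subjectivity lexicon; a real one would be much larger.
    private static final Set<String> SUBJECTIVE =
            Set.of("great", "horrible", "easy", "better", "flaws", "love", "hate");

    /** Splits text into sentences and returns those containing a lexicon word. */
    static List<String> subjectiveSentences(String text) {
        List<String> kept = new ArrayList<>();
        // Naive splitter: break after '.', '!' or '?' when followed by whitespace.
        for (String sentence : text.split("(?<=[.!?])\\s+")) {
            if (isSubjective(sentence)) kept.add(sentence.trim());
        }
        return kept;
    }

    /** A sentence counts as subjective if any token is in the lexicon. */
    static boolean isSubjective(String sentence) {
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (SUBJECTIVE.contains(token)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String review = "The camera has a 3 inch screen. The picture quality is great. "
                + "It ships with one battery.";
        System.out.println(subjectiveSentences(review));
    }
}
```

Only "The picture quality is great." survives the filter in this example. The splitter breaks on abbreviations such as "Mr. X", so a real pipeline would use a proper sentence annotator.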
31. Data transformation or post-processing
UpdateRequestProcessorFactory
LogUpdateProcessorFactory
UIMAUpdateRequestProcessorFactory
UpdateRequestProcessorChain
◦ Pipeline of UpdateRequestProcessors
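The idea of a processor chain, where each document passes through a fixed sequence of processors before it is indexed, can be sketched in plain Java. This is not Solr's API: DocProcessor and ProcessorChain are hypothetical stand-ins for Solr's processor classes, and a document is reduced to a map of field names to values.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for one update processor; not a Solr class. */
interface DocProcessor {
    void process(Map<String, String> doc);
}

/** Hypothetical chain that applies processors in order before indexing. */
final class ProcessorChain {
    private final List<DocProcessor> processors;

    ProcessorChain(List<DocProcessor> processors) {
        this.processors = List.copyOf(processors);
    }

    /** Runs every processor in order on a mutable copy of the document and returns it. */
    Map<String, String> run(Map<String, String> doc) {
        Map<String, String> working = new HashMap<>(doc);
        for (DocProcessor p : processors) {
            p.process(working);
        }
        return working;
    }

    public static void main(String[] args) {
        // Enrichment step: write a derived value to a dedicated field.
        DocProcessor tagSentiment = d -> d.put("sentiment",
                d.getOrDefault("text", "").contains("great") ? "positive" : "unknown");
        // Logging step: runs last, so it sees the enriched document.
        DocProcessor log = d -> System.out.println("indexing " + d);

        ProcessorChain chain = new ProcessorChain(List.of(tagSentiment, log));
        chain.run(Map.of("text", "The picture quality is great"));
    }
}
```

Order matters in such a chain: an enrichment step placed after the logging step would not show up in the log.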
There is a huge explosion of expressed sentiment today. PR firms and similar organizations are direct beneficiaries of SA technology.
Classify sentences into two principal classes: subjective and objective.
Classify polarity as positive, negative, or neutral, for example with naïve Bayes.
Overall, this review is very positive about the smartphone, as its sentiment score shows.
Using hierarchical classification, neutrality is determined first, and sentiment polarity is determined second, but only if the text is not neutral.
No matter how you choose to import data, there is a final configuration point within Solr that allows manipulation of the imported data before it gets indexed. The update request handler puts documents on an update request processor chain.