Sentiment Analysis in Machine Learning
Jennifer D. Davis, Ph.D.
Association for Computing Machinery (ACM), Austin Chapter
Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD)
June 2, 2015
Who uses sentiment analysis anyway?
What is sentiment analysis?
• A machine learning technique that classifies comments and phrases using what is called a ‘corpus’: a group of annotated texts in which words are given numerical weights
• Defined as:
  • “Sentiment analysis (opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials.” (Wikipedia)
Sentiment Analysis: Not your Mother’s Twitter Feed!
• Sentiment Analysis can be used to:
  • Understand the intent behind language in an unbiased manner
• Business areas that frequently use Sentiment Analysis:
  • Retail
  • Entertainment
  • Healthcare
  • Any customer-centered organization
• Respond to customer complaints with better solutions, a sort of virtual call center (e.g. Amelia)
Retail
• Introduce new products more successfully by understanding culture and social media
• Understand and respond to customer needs using internal data sources such as customer reviews or feedback
• Develop new products based on customer wants and needs as expressed in reviews, online and on social media
Entertainment
• Create interest or excitement about movies by understanding the market segment
• Target movie advertising or recommender systems based on social commentary and collaborative filtering
• Target advertising by gender, population segment, or cultural affinity
Healthcare and Medical Treatment
• Healthcare:
  • Learn about patient wellness:
    • Potentially detect depression from journal entries
    • Assist with patient adherence to treatment
  • Learn about patient satisfaction and what is working
  • Gather outcome measures associated with patient satisfaction
• This is a hot area of research, and several academic institutions are investing in research related to patient outcomes and sentiment analysis.
What are the overall steps for sentiment analysis?
• Gather unstructured data from your own sources, web sources, databases (healthcare.gov surprisingly has some) and competitions like Kaggle.
• Strip out unnecessary punctuation and “stop” words or phrases, and perform other pre-processing as needed.
• Transform the words or phrases into a numerical representation such as a vector.
• Choose an appropriate classification algorithm. For example, Random Forest often achieves high accuracy but isn’t always computationally efficient. We discussed several other methods previously.
• Fit your algorithm to a training set and, if enough data is available, cross-validate. Tune the algorithm’s parameters to suit your features, but avoid over-fitting.
• Apply the trained algorithm to test data (the fun part).
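The parsing and vectorizing steps above can be sketched with nothing but the Python standard library. This is a minimal illustration, not the code from the notebook: the tiny stop-word list and the helper names (`tokenize`, `build_vocabulary`, `vectorize`) are assumptions made for the example, and real projects would normally lean on a library for both steps.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "was", "this", "that", "and", "of", "to", "it"}


def tokenize(text):
    """Lowercase the text, strip punctuation, and drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_vocabulary(documents):
    """Map every distinct token to a column index, in alphabetical order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Turn one document into a vector of word counts (one entry per vocabulary token)."""
    counts = Counter(tokenize(text))
    # Dicts preserve insertion order, so iterating follows the column indices.
    return [counts.get(tok, 0) for tok in vocabulary]


reviews = ["This movie was great, great fun!", "The plot was dull."]
vocabulary = build_vocabulary(reviews)
vectors = [vectorize(review, vocabulary) for review in reviews]
print(vocabulary)
print(vectors)
```

The resulting count vectors are what a classifier would be trained on in the next step.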
What techniques can we use?
• Many are under development by machine-learning-focused companies and in academic linguistics laboratories
• Often an ensemble of algorithms works best and is most accurate
• Text data is often unstructured. You will spend a portion of your time cleaning and organizing data. Not fun, but necessary.
• Today we will very briefly give a high-level overview of 3 methods: (i) Bayesian Probability classification, (ii) Word2Vec and (iii) Recursive Neural Networks
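To make the ensemble bullet above concrete, here is a minimal sketch of the simplest way to combine classifiers: a majority vote over their predicted labels. The function name and the example labels are hypothetical; production ensembles often weight the votes or average predicted probabilities instead.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    On a tie, the label that appears first in `predictions` wins.
    """
    return Counter(predictions).most_common(1)[0][0]


# Labels that three hypothetical classifiers assigned to the same review.
votes = ["positive", "negative", "positive"]
print(majority_vote(votes))
```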
Bayesian Probability and classification method
• Naïve Bayes classification applies Bayes’ theorem under the assumption that all features are independent of one another
• In many cases this is surprisingly accurate and can typically yield accuracies of 70-80%
• You can read more about this in the textbook for this course, “Building Machine Learning Systems with Python”
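A minimal sketch of a Naïve Bayes text classifier, written from scratch to show where the independence assumption enters: the score of a class is its log prior plus one independent log-probability term per word. The class name, the add-one (Laplace) smoothing, and the toy training sentences are choices made for this illustration, not the textbook's or the notebook's code.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace-separated words."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word frequencies per class
        self.vocab = set()
        for doc, label in zip(documents, labels):
            words = doc.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, document):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for word in document.lower().split():
                if word not in self.vocab:
                    continue                       # ignore unseen words
                count = self.word_counts[label][word]
                # Add-one smoothing; each word contributes independently.
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_docs = ["great fun movie", "loved the great acting",
              "dull boring plot", "boring and dull acting"]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("great acting"))
print(model.predict("boring plot"))
```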
Word2vec “deep” learning method
• This method relies upon creating a “Bag of Words” from semi-structured data
• Many tools are available in the scikit-learn and NLTK Python libraries (we will show some in our Jupyter (IPython) notebook)
• Invented by Google engineers, who describe it as a “tool [that provides] an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words”
• In other words (pun intended), each word is assigned a vector of numbers representing its importance and meaning
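The skip-gram architecture named in the quote learns word vectors by predicting the words around each word. A minimal sketch of just the data-preparation half, generating the (center word, context word) training pairs, is shown here; the function name and the `window` parameter are illustrative, and the neural network that is trained on these pairs is omitted.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


sentence = ["this", "movie", "has", "humor"]
for center, context in skipgram_pairs(sentence, window=1):
    print(center, "->", context)
```

Words that occur in similar contexts produce similar training pairs, which is why their learned vectors end up close together.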
Recursive neural network method
• Arguably the best (and most convenient to use) library is Stanford University’s Natural Language Processing library.
• The method uses a recursive algorithm that distinguishes between phrases based on the order of words and phrases
• For example, “this movie has humor that could not be denied” would be graded as positive, whereas “this movie did not have any humor whatsoever” would be graded as negative, based on the order and choice of words and phrases.
• The Stanford NLP Group can be found at: nlp.stanford.edu; their live demonstration is available at: nlp.stanford.edu/sentiment
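A minimal sketch of the recursive idea, assuming the common formulation in which two child vectors are merged into a parent vector by tanh(W [left; right] + b) and the same step is applied up a binary parse tree. This is not the Stanford library's implementation: the embeddings, weights, and bias here are hand-picked, untrained, hypothetical numbers chosen only to show that word order changes the result.

```python
import math

# Hypothetical 2-dimensional word vectors and untrained composition weights.
EMBEDDINGS = {"not": [-1.0, 0.5], "good": [0.9, 0.2], "movie": [0.1, 0.3]}
WEIGHTS = [[0.5, -0.3, 0.8, 0.1],
           [-0.2, 0.4, 0.3, -0.6]]
BIAS = [0.0, 0.0]


def compose(left, right):
    """Merge two child vectors into one parent vector: tanh(W [left; right] + b)."""
    children = left + right  # list concatenation
    return [
        math.tanh(sum(w * x for w, x in zip(row, children)) + b)
        for row, b in zip(WEIGHTS, BIAS)
    ]


def encode(tree):
    """Encode a binary parse tree given as a word or a nested 2-tuple of trees."""
    if isinstance(tree, str):
        return EMBEDDINGS[tree]
    left, right = tree
    return compose(encode(left), encode(right))


# Same words, different order and grouping, different phrase vectors.
print(encode((("not", "good"), "movie")))
print(encode((("good", "not"), "movie")))
```

In a trained model, a classifier applied to each node's vector would assign the sentiment label for that phrase.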
So which do I choose?
• It depends upon the complexity of the data you are analyzing
• It depends upon the accuracy you desire versus scalability (always a balancing act)
• It depends on your time frame and how you will integrate the knowledge derived from using sentiment analysis
• Out-of-the-box solutions can work, but sometimes you will need to build your own
So now we can give it a try!
• A Jupyter Notebook has been created and can be accessed via my GitHub account at: https://github.com/jddavis-100/Statistics-and-Machine-Learning/
• Data is available at:
  • Kaggle.com by joining the Kaggle Competition
  • The test set was designed by me, and I can provide it to you or Omar.
• Gather your own data from a number of APIs or web crawlers, such as:
  • Rotten Tomatoes API
  • Twitter API
  • Web-scraping tools such as Scrapy (Python tool available at scrapy.org)
GitHub Repository
• Tutorial:
  • https://github.com/jddavis-100/Statistics-and-Machine-Learning/wiki/Sentiment-Analysis--Class-for-ACM,-SIGKDD,-Austin-Chapter
• Repo: https://github.com/jddavis-100/Statistics-and-Machine-Learning


Editor's notes

  1. Amelia is an AI platform that can sense human emotions and innuendo using sentiment analysis (http://www.entrepreneur.com/article/245827)