Sentiment Analysis

 1. Discover a niche network of Twitter users
 2. Model their emotions on topics
 3. Use feelings to more accurately predict a
    time series
       e.g. The stock market
       e.g. Box office success
 4. Are some [users/networks] more influential
    than others?
This Talk

   The Design Decision
   The Core Goals
   The 3 parts of the project:
      1. Classifying the SENTIMENT of tweets
      2. Building a NETWORK of twitter users
      3. Finding a TIME SERIES of sentiment for each
      user
Sentiment Analysis Used Already

   Derwent Capital Markets - "The Twitter
    hedge fund"
   £25m fund
   10% of tweets
   Predicts Dow Jones movement direction with
    87.6% accuracy
   Returned 1.85% in its first month of trading
   Johan Bollen, Indiana University, used a
    bag-of-words approach
Sentiment Analysis Used Already (continued)

   Product reviews / ratings
   Social Media Analytics
Design Decision
   Many paragraphs of text (Product Reviews)
      + : Better accuracy of prediction
      - : Less data overall
   Huge number of short pieces of text (Twitter)
      + : Opinions of a greater number of people
               & at high enough frequency to model as a signal
      - : Classification of opinion is very poor



    => TWITTER
2 Current Aims (will change later)

 1. Project aims to be context
 independent (e.g. movies & products)

 2. When context is given, use it to
 better classify tweets
1: Sentiment Analysis of Tweets
   Three-tier classification process:
             tweet


    spam                not spam


            objective              subjective


                         positive         negative
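The three tiers can be read as a cascade in which each tweet stops at the first leaf it reaches. The sketch below is only an illustration of that control flow: the three stage functions are hypothetical keyword rules standing in for whatever trained classifiers the project uses, and their names and word lists are assumptions, not part of the original slides.

```python
# Hypothetical stage classifiers. Each takes the tweet text and returns a bool.
# Real stages would be trained models; these keyword rules are placeholders.
def is_spam(text: str) -> bool:
    return "http://" in text and "free" in text.lower()


def _words(text: str) -> list:
    # Lower-case tokens with surrounding punctuation stripped.
    return [w.strip(".,!?").lower() for w in text.split()]


def is_subjective(text: str) -> bool:
    opinion_words = {"good", "great", "fantastic", "bad", "awful", "terrible"}
    return any(w in opinion_words for w in _words(text))


def is_positive(text: str) -> bool:
    positive_words = {"good", "great", "fantastic"}
    return any(w in positive_words for w in _words(text))


def classify(text: str, spam=is_spam, subjective=is_subjective,
             positive=is_positive) -> str:
    """Walk the three tiers in order and return the leaf that is reached."""
    if spam(text):
        return "spam"
    if not subjective(text):
        return "objective"
    return "positive" if positive(text) else "negative"
```

Passing the stages in as arguments keeps the cascade independent of how each stage is trained, so a stage can be swapped without touching the other two.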
1: Sentiment Analysis of Tweets
   Double-Back Propagation Algorithm
       ACL Journal, March 2011, MIT Press
       Opinion Word Extraction & Target Extraction
       4 rules
                     "The phone has a good screen"
                      => add "good" to list of adjectives
                      => add "screen" to list of nouns
                     Etc.
       Great for rating features of a product
       Not great for tweets
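The back-and-forth idea behind the example above can be sketched in a few lines: a known opinion adjective that modifies a noun yields a new target, and a known target modified by an unknown adjective yields a new opinion word, repeated until nothing changes. This is a simplified illustration, not the published rule set. In particular it assumes that "modifies" can be approximated by an adjective immediately followed by a noun, and it uses single-letter tags ('A' adjective, 'N' noun) purely for convenience.

```python
def double_propagation(tagged_sentences, seed_opinions):
    """Grow opinion-word and target lists from a seed list of opinion words.

    tagged_sentences: list of sentences, each a list of (token, tag) pairs.
    Assumption: an adjective directly followed by a noun modifies that noun.
    """
    opinions = {w.lower() for w in seed_opinions}
    targets = set()
    changed = True
    while changed:  # keep sweeping until a full pass adds nothing new
        changed = False
        for sentence in tagged_sentences:
            for (word, tag), (nxt, nxt_tag) in zip(sentence, sentence[1:]):
                if tag != "A" or nxt_tag != "N":
                    continue
                adj, noun = word.lower(), nxt.lower()
                if adj in opinions and noun not in targets:
                    targets.add(noun)      # opinion word -> new target
                    changed = True
                if noun in targets and adj not in opinions:
                    opinions.add(adj)      # target -> new opinion word
                    changed = True
    return opinions, targets


sentences = [
    [("The", "D"), ("sharp", "A"), ("screen", "N"), ("helps", "V")],
    [("The", "D"), ("phone", "N"), ("has", "V"), ("a", "D"),
     ("good", "A"), ("screen", "N")],
]
opinions, targets = double_propagation(sentences, {"good"})
```

Here "screen" is found through the seed word "good", and "sharp" is then picked up on the next sweep because it modifies the newly found target.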
1: Sentiment Analysis of Tweets
   Twitter Part Of Speech (POS) tagger:
    www.ark.cs.cmu.edu/TweetNLP/
   Written in Java
   Max Ent
   Example output (token, tag):
       "             ^
       Drive         ^
       "             ^
       ,             ,
       go            V
       and           &
       watch         V
       it            O
       !             ,
       Fantastic     A
       movie         N
       .             ,
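Once tokens carry tags, pulling out the adjectives and nouns is a filter on the tag column. The sketch below assumes one whitespace-separated token and tag per line, as laid out on the slide; the tagger's actual output format may differ, and the helper names are made up for illustration.

```python
# Token/tag pairs copied from the slide's example.
TAGGED = '''
" ^
Drive ^
" ^
, ,
go V
and &
watch V
it O
! ,
Fantastic A
movie N
. ,
'''


def parse_tagged(text):
    """Return (token, tag) pairs from lines holding exactly two fields."""
    pairs = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            pairs.append((parts[0], parts[1]))
    return pairs


def words_with_tag(pairs, tag):
    """Return the tokens whose tag equals `tag`, in their original order."""
    return [token for token, t in pairs if t == tag]


pairs = parse_tagged(TAGGED)
adjectives = words_with_tag(pairs, "A")
nouns = words_with_tag(pairs, "N")
```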
Bootstrapped Tweet SA improver
   [Diagram: IMDB Movie Review Corpora and Double-Back Prop. Algo on the
    left, Sentiment Analysis in the centre, a column of Tweets on the right]
   Gives useful adjectives, nouns
2: Building a Network
   Collected my twitter friends, friends of friends,
    friends of friends of friends.
       => 115,896 users
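Collecting friends out to three hops is a breadth-first traversal with a depth limit. The sketch below shows the traversal only. `get_friends` is a hypothetical callable that returns the friend ids of a user; in practice it would wrap a Twitter API client and handle rate limits, which is not shown here.

```python
from collections import deque


def crawl_network(seed, get_friends, max_depth=3):
    """Breadth-first collection of users up to `max_depth` hops from `seed`.

    get_friends: hypothetical callable, user id -> iterable of friend ids.
    Returns a dict mapping each discovered user to its hop distance.
    """
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        user = queue.popleft()
        if depth[user] == max_depth:
            continue  # do not expand users at the depth limit
        for friend in get_friends(user):
            if friend not in depth:  # first visit gives the shortest distance
                depth[friend] = depth[user] + 1
                queue.append(friend)
    return depth


# Toy follow graph standing in for real API calls.
toy = {"me": ["a", "b"], "a": ["c"], "b": ["a"], "c": ["d"], "d": ["e"]}
found = crawl_network("me", lambda user: toy.get(user, []))
```

In the toy graph, user "e" is four hops away and is therefore left out.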
2: Building a Network
   Community detection:
       Paper 1: Near linear time algorithm to
        detect community structures in large-scale
        networks (Raghavan, Albert & Kumara)

       Paper 2: An LDA-based Community Structure
        Discovery Approach for Large-Scale Social
        Networks (Haizheng Zhang et al.)
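The label propagation idea from Paper 1 can be sketched without any graph library: every node starts in its own community and repeatedly adopts the label most common among its neighbours until no node wants to change. This is a minimal sketch under stated assumptions (an undirected graph given as an adjacency dict, a seeded random generator for ordering and tie-breaking, a sweep limit), not the implementation used in the project.

```python
import random
from collections import Counter


def label_propagation_communities(adjacency, seed=0, max_sweeps=100):
    """Group nodes into communities by asynchronous label propagation.

    adjacency: dict mapping node -> iterable of neighbouring nodes.
    Returns a list of sets, one per community.
    """
    rng = random.Random(seed)
    labels = {node: node for node in adjacency}  # each node starts alone
    nodes = list(adjacency)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)  # visit nodes in a fresh random order each sweep
        changed = False
        for node in nodes:
            counts = Counter(labels[n] for n in adjacency[node])
            if not counts:
                continue  # isolated node keeps its own label
            top = max(counts.values())
            best = sorted(lab for lab, c in counts.items() if c == top)
            if labels[node] in best:
                continue  # already holds a most-frequent label
            labels[node] = rng.choice(best)
            changed = True
        if not changed:
            break
    communities = {}
    for node, label in labels.items():
        communities.setdefault(label, set()).add(node)
    return list(communities.values())


# Two disconnected triangles should come out as two communities.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
groups = label_propagation_communities(graph)
```

On graphs with weak links between dense groups the result can vary with the seed, which is a known property of this family of algorithms.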
2: Building a Network

     GraphLab: like MapReduce
     Uses "update" and "sync" instead of "map" and "reduce"
     Map = 'Update':
      modify overlapping sets of data
     Reduce = 'Sync': perform reductions in the
      background while updates are running
     Label Propagation & LDA
3: Time series prediction
   Will pass time series from Python to R
    using the rpy2 module

   R has a great package, "quantmod", for
    importing financial market data.

   Other time series are also easy to import,
    and R has many great libraries.
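Before anything is handed to R, the per-tweet scores have to be turned into a regular series. One plausible way to do that is to average each user's scores per calendar day, sketched below. The input layout (user, timestamp, score), the choice of a daily mean and the +1/-1 scoring are assumptions made for illustration; the hand-off through rpy2 is not shown.

```python
from collections import defaultdict
from datetime import datetime


def daily_sentiment(scored_tweets):
    """Average sentiment per user per calendar day.

    scored_tweets: iterable of (user, timestamp, score) with datetime
    timestamps and numeric scores (e.g. +1 positive, -1 negative).
    Returns {user: [(date, mean_score), ...]} sorted by date.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for user, timestamp, score in scored_tweets:
        buckets[user][timestamp.date()].append(score)
    return {
        user: [(day, sum(s) / len(s)) for day, s in sorted(days.items())]
        for user, days in buckets.items()
    }


tweets = [
    ("alice", datetime(2012, 1, 1, 9, 0), 1),
    ("alice", datetime(2012, 1, 1, 18, 30), -1),
    ("alice", datetime(2012, 1, 2, 12, 0), 1),
    ("bob", datetime(2012, 1, 1, 8, 15), -1),
]
series = daily_sentiment(tweets)
```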
Built With

   Python        - For majority of code
                Packages: numpy, scipy, matplotlib
                         networkx, graphviz, rpy2
                         django, twython, nltk
   R             - For time series analysis
   PostgreSQL    - SQL database
   Java          - Twitter POS tagger
   C/C++         - GraphLab
End Product

   [Diagram: IMDB Movie Review Corpora and Double-Back Prop. Algo on the
    left, Sentiment Analysis in the centre, a column of Tweets on the right]
                         Tweet
Thank You
        Mike Davies
    Documented at www.m1ked.com
Notes: Vowpal Wabbit LDA

    Vowpal Wabbit is an open source library
    for fast online learning (mostly SGD),
    mainly developed by John Langford at Yahoo! Research.
   Optimised for speed
   LDA uses clever tricks such as vectorisation and
    exploiting the floating point representation to
    avoid calling pow() and exp().
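The slide does not say which approximation Vowpal Wabbit uses, so the following is only an illustration of the general idea. It shows a well-known trick of this kind (Schraudolph's 1999 approximation): an IEEE 754 double stores its exponent in the high bits, so writing a scaled copy of x into those bits gives roughly e^x without calling exp(). The constants are Schraudolph's, and the function name is made up for this sketch.

```python
import math
import struct

_EXP_A = (1 << 20) / math.log(2)     # scales x into the exponent field
_EXP_B = 1023 * (1 << 20) - 60801    # exponent bias minus a correction term


def fast_exp(x: float) -> float:
    """Approximate exp(x) to within roughly 4% for |x| below about 700.

    The integer is written into the upper 32 bits of a little-endian
    double; the lower 32 bits are left at zero.
    """
    high = int(_EXP_A * x + _EXP_B)
    return struct.unpack("<d", struct.pack("<ii", 0, high))[0]
```

In Python this gains no speed, since the packing costs more than `math.exp`; it only shows the bit layout that a C or C++ implementation would exploit.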
Notes: Label Propagation

   Label Propagation has been proven to be an
    effective semi-supervised learning approach in
    many applications. The key idea behind label
    propagation is to first construct a graph in which
    each node represents a data point and each
    edge is assigned a weight often computed as
    the similarity between data points, then
    propagate the class labels of labeled data to
    neighbors in the constructed graph in order to
    make predictions.
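The description above can be sketched directly: labelled nodes keep their class, and every unlabelled node repeatedly takes the similarity-weighted average of its neighbours' class scores. This is a minimal sketch assuming a symmetric weight dict and a fixed number of iterations; the function name and the toy weights are illustrative only.

```python
def propagate_labels(weights, seeds, iterations=50):
    """Semi-supervised label propagation on a weighted graph.

    weights: dict node -> {neighbour: similarity weight}; every neighbour
             must also appear as a key.
    seeds:   dict node -> class label for the labelled nodes.
    Returns dict node -> predicted class label.
    """
    classes = sorted(set(seeds.values()))
    scores = {}
    for node in weights:
        if node in seeds:   # labelled: all mass on the known class
            scores[node] = {c: float(c == seeds[node]) for c in classes}
        else:               # unlabelled: start undecided
            scores[node] = {c: 1.0 / len(classes) for c in classes}
    for _ in range(iterations):
        updated = {}
        for node, neighbours in weights.items():
            total = sum(neighbours.values())
            if node in seeds or total == 0:
                updated[node] = scores[node]  # clamp seeds, skip isolated nodes
                continue
            updated[node] = {
                c: sum(w * scores[n][c] for n, w in neighbours.items()) / total
                for c in classes
            }
        scores = updated
    return {n: max(classes, key=lambda c: scores[n][c]) for n in weights}


# Chain a - b - c - d with a weak link in the middle.
chain = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 0.1},
    "c": {"b": 0.1, "d": 1.0},
    "d": {"c": 1.0},
}
predicted = propagate_labels(chain, {"a": "pos", "d": "neg"})
```

Because the b-c edge is weak, "b" follows its strongly connected neighbour "a" and "c" follows "d".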
