Sub-Topic Detection Of Tweets
Related To An Entity
International Institute of Information Technology-Hyderabad
Mentor - Sandeep Pannem
By
P Yashaswi (201102111), Aayush Asawa (201305617),
Kumari Ankita (201101161), Diksha J. Yadav (201125130)
Introduction
➢ Tweets are classified according to the “Topic” and then the “Subtopic” they
refer to.
○ “Topic” refers to any major event in the real world.
○ “Subtopics” are fine-grained aspects of such events.
➢ Mining the subtopics of entities/topics from tweets helps in trend
analysis, social monitoring, topic tracking and reputation
mining.
➢ Tweets related to a particular entity generally share similar keywords, so
detecting subtopics requires more features than keywords alone.
Work Flow
[Figure: workflow diagram with the boxes Training Data, Extract Tweet features, Store features in Lucene, Input Tweet, Classifier (Phase 1, 2, 3) and Detected Subtopic]
Approach
Input: A training set of tweets that have subtopic names as class labels, and
test tweets that are to be classified into subtopics.
Output: A subtopic assigned to each of the test tweets.
The entire workflow can be broken into three phases:
1. Pre-processing
2. Feature Extraction and Representation
3. Classification
Feature Extraction
The following features are extracted from each tweet:
➢ Tweet concepts (using the TagMe API)
➢ Named entities and event phrases (using Twical)
➢ URL concepts (using the TagMe API on the content of the external links)
➢ Key phrases (noun phrases extracted after POS tagging)
➢ Hashtags
➢ Categories (the categories of the titles obtained through TagMe)
Similarity measures used:
➢ Wikipedia Miner (for comparing Wikipedia titles)
➢ WordNet similarity measure (for comparing key phrases)
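One way to picture the per-tweet feature record is sketched below. This is a minimal illustration, not the project's code: the class name `TweetFeatures`, its field names, and the helper functions are assumptions made here. Only hashtags are actually computed, since the other features depend on external tools (TagMe, Twical, a POS tagger) whose calls are not shown in the slides.

```python
import re
from dataclasses import dataclass, field

# Hypothetical pattern: a '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


@dataclass
class TweetFeatures:
    """One field per feature type listed above (field names are illustrative)."""
    tweet_concepts: set = field(default_factory=set)
    named_entities: set = field(default_factory=set)
    event_phrases: set = field(default_factory=set)
    url_concepts: set = field(default_factory=set)
    key_phrases: set = field(default_factory=set)
    hashtags: set = field(default_factory=set)
    categories: set = field(default_factory=set)


def extract_hashtags(text):
    """Return the lower-cased hashtags of a tweet, without the '#'."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def build_features(text):
    # Only hashtags are filled in; the remaining fields stay empty as
    # placeholders for the output of the external annotators.
    return TweetFeatures(hashtags=extract_hashtags(text))
```

Keeping each feature type in its own field mirrors the idea of storing the features under different fields in the index.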
Classification
➢ Subtopic detection is treated as a classification problem in which
subtopics are the class labels and tweets are the data points.
➢ The classifier derives its logic from the features that the majority of the tweets
(data points) of a particular subtopic (class label) have.
➢ Based on these features, initial seed clusters are created for each topic, and
each cluster is represented as crisp information and an index.
➢ The features of each test tweet are extracted and compared with the clusters, and
the cluster that matches best is assigned to the test tweet.
➢ This is done using a machine learning technique.
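The best-match step can be sketched as follows. The slides do not say how a tweet is compared with a cluster at this level, so Jaccard overlap between feature sets is used here purely as a stand-in, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap between two feature sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def assign_cluster(tweet_features, clusters):
    """Return the name of the seed cluster whose features best match the tweet.

    clusters maps a subtopic name to the set of features of its seed cluster.
    """
    return max(clusters, key=lambda name: jaccard(tweet_features, clusters[name]))
```

For example, with clusters `{"recall": {"recall", "brake", "safety"}, "launch": {"launch", "model", "xc90"}}`, a tweet with features `{"brake", "recall"}` is assigned to `"recall"`.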
Pre-Processing
Pre-processing involves the following steps:
➢ Removing stopwords from the tweets and stemming the training
data points.
➢ Extracting URLs from the tweets.
This is done for both training and test tweets.
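A minimal sketch of these steps using only the standard library is shown below. The stopword list is a tiny illustrative sample and `naive_stem` is a crude suffix stripper standing in for a real stemmer (for example a Porter stemmer); neither is taken from the slides.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "on", "for", "and"}


def naive_stem(word):
    """Strip a few common suffixes; a stand-in for a proper stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Return (stemmed tokens without stopwords, URLs found in the tweet)."""
    urls = URL_RE.findall(text)
    text_without_urls = URL_RE.sub(" ", text)
    tokens = re.findall(r"[a-z0-9#@']+", text_without_urls.lower())
    kept = [naive_stem(t) for t in tokens if t not in STOPWORDS]
    return kept, urls
```

The extracted URLs are returned separately so that the linked content can later be passed to the URL-concept extraction step.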
Algorithm
Offline Process
1. All the tweets in the training data are grouped according to their
subtopic.
2. For every tweet in a subtopic, the features are extracted and grouped to
form the subtopic features.
3. The subtopic features of all the subtopics are stored in the Lucene index
under different fields.
4. All features that are common to two or more subtopics are removed,
as are features that are directly related to the entity name.
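The grouping and pruning steps of the offline process can be sketched as below. A plain dictionary stands in for the Lucene index, and "directly related to the entity name" is approximated by membership in a caller-supplied `entity_terms` set; both are simplifications assumed here, not details from the slides.

```python
from collections import Counter, defaultdict


def build_subtopic_features(training, entity_terms):
    """Group tweet features by subtopic, then prune non-discriminative ones.

    training: iterable of (subtopic, feature_set) pairs, one per tweet.
    entity_terms: features treated as directly related to the entity name.
    """
    grouped = defaultdict(set)
    for subtopic, features in training:
        grouped[subtopic] |= set(features)

    # Count in how many subtopics each feature occurs.
    counts = Counter(f for feats in grouped.values() for f in feats)

    # Keep features unique to one subtopic and unrelated to the entity name.
    return {
        subtopic: {f for f in feats if counts[f] == 1 and f not in entity_terms}
        for subtopic, feats in grouped.items()
    }
```

Removing shared features leaves each subtopic with only the features that distinguish it from the others, which matters because tweets about one entity share many keywords.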
Algorithm
Online Procedure
1. Phase 1: The category features of the test tweet are searched in the Lucene
index and the top 10 subtopics are listed.
2. Phase 2: The tweet concepts and URL concepts of the test tweet are compared
with those of the top 10 subtopics from Phase 1, and the top 5 subtopics are
listed based on the Wikipedia Miner similarity measure.
3. Phase 3: Named entities, key phrases and event phrases are compared with those of the top 5
subtopics from Phase 2 using WordNet similarity measures. For hashtags, a
direct intersection is computed. After this, the best of the 5 subtopics is chosen.
Alternatively, all of these measures can be combined to pick the best subtopic.
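The three-phase narrowing (top 10, then top 5, then the best one) can be sketched as a cascade of ranking steps. In this sketch a simple set-overlap count replaces the Lucene search, the Wikipedia Miner measure and the WordNet measure, and the field names are hypothetical; only the cascade structure follows the slides.

```python
def overlap(a, b):
    """Number of shared items; a stand-in for the real similarity measures."""
    return len(set(a) & set(b))


def rank(tweet, index, candidates, fields, k):
    """Return the k candidates scoring highest on the given feature fields."""
    def score(subtopic):
        return sum(overlap(tweet.get(f, ()), index[subtopic].get(f, ())) for f in fields)
    return sorted(candidates, key=score, reverse=True)[:k]


def detect_subtopic(tweet, index):
    """tweet: {field: features}; index: {subtopic: {field: features}}."""
    # Phase 1: category match over all subtopics, keep the top 10.
    top10 = rank(tweet, index, list(index), ["categories"], 10)
    # Phase 2: tweet and URL concepts, keep the top 5.
    top5 = rank(tweet, index, top10, ["tweet_concepts", "url_concepts"], 5)
    # Phase 3: remaining features (hashtags by direct intersection), keep the best.
    best = rank(tweet, index, top5,
                ["named_entities", "key_phrases", "event_phrases", "hashtags"], 1)
    return best[0] if best else None
```

Each phase only scores the survivors of the previous one, so the more expensive similarity measures are applied to at most 10 and then 5 candidates.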
Experiments
➢ The RepLab 2013 data set was used. The data set contains tweets for 61 entities.
Each entity has about 700 tweets for training and 1500 tweets for testing.
➢ For evaluation we use Reliability, Sensitivity and F measure.
The results that we got for the entity “Volvo” are:
Sensitivity: 0.37, Reliability: 0.39, F measure: 0.38
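The slides do not state how the F measure is computed. Assuming it is the harmonic mean of Reliability and Sensitivity, the reported figures are consistent with each other, as this short check illustrates (the function name is chosen here for illustration):

```python
def f_measure(reliability, sensitivity):
    """Harmonic mean of Reliability and Sensitivity (assumed definition)."""
    if reliability + sensitivity == 0:
        return 0.0
    return 2 * reliability * sensitivity / (reliability + sensitivity)


# 2 * 0.39 * 0.37 / (0.39 + 0.37) = 0.2886 / 0.76, which rounds to 0.38
print(round(f_measure(0.39, 0.37), 2))
```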
Future Work
➢ We can build an SVM classifier that can determine which
features should be given preference when classifying the tweets.
➢ The input vectors would have the various features of the various
subtopics as dimensions, with the corresponding similarity measures as the coefficients
and the labelled subtopic as the class label.
➢ In the testing phase, we can create similar vectors for test tweets to get their
corresponding subtopics.
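The proposed input vectors could be laid out as in the sketch below: one dimension per (subtopic, feature type) pair, with the similarity score as the coefficient. The function name, argument shapes and the default of 0.0 for missing pairs are assumptions made here; training the SVM itself is not shown.

```python
def build_vector(similarities, subtopics, feature_types):
    """Flatten similarity scores into a fixed-order vector.

    similarities: {(subtopic, feature_type): score}; missing pairs count as 0.0.
    The dimension order is subtopic-major, feature type within each subtopic.
    """
    return [similarities.get((s, f), 0.0) for s in subtopics for f in feature_types]
```

Using the same `subtopics` and `feature_types` lists for training and test tweets keeps the dimensions aligned, so the vectors could be passed to any standard SVM implementation together with the labelled subtopics.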
Reference
1. REINA at RepLab2013 Topic Detection Task: Community Detection
2. Entity Tracking in Real-Time using Sub-Topic Detection on Twitter
