Text Analytics In Social
Media
Content
• Introduction to text mining in relation to social media
• Unique features of texts in social media
• Applying text analytics in social media
• Example of text analytics in social media
Text Mining and Social Media
The picture here shows the top 10 sites by traffic. The majority fall under
the social media umbrella.
Social media can therefore be described as a medium through which
information and communication can be accessed, shared and discussed.
Text Mining and Social Media
Category             Representative sites
Wiki                 Wikipedia, Scholarpedia
Blogging             Blogger, LiveJournal, WordPress
Social news          Digg, Briefing, Mixx, Slashdot
Microblogging        Twitter, Google Buzz
Opinion & Reviews    ePinions, Yelp
Question Answering   Stack Overflow, Yahoo! Answers, Quora
Media Sharing        Flickr, YouTube
Social Bookmarking   Delicious, CiteULike
Social Networking    Facebook, LinkedIn, MySpace
The table shows the categories into which social media can be classified.
Social media covers various types of services, which results in various
kinds of data formats.
The information on most social media sites is in text format.
Text Mining and Social Media
• Given the current trend of applying data mining techniques and deriving
business intelligence from data, the following question arises for social media:
"How can I get valuable information from the texts on
social media platforms?"
Unique features of texts in social media
• Because there are different kinds of social media, their texts have
distinct characteristics, both in form and in how they occur.
• Text analytics describes a set of linguistic, statistical, and machine learning
techniques that model and structure the information content of textual
sources for business intelligence, exploratory data analysis, research, or
investigation.
• This section gives a hint on how to answer the previous question.
Unique features of texts in social media
• Text preprocessing: making the input more consistent to facilitate text
representation. Text preprocessing methods include stop word removal and
stemming.
• Feature generation / text representation: the most common approach is to
transform texts into numeric vectors. This representation is called the
bag-of-words (BOW) model or vector space model (VSM).
• Knowledge discovery: applying machine learning or data mining
methods to discover patterns or insights.
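The first two steps above can be sketched in a few lines of Python. This is a minimal illustration only: the stop word list, the suffix-stripping `naive_stem` function and the example sentences are assumptions made for the sketch, not part of the original presentation, and a real system would use a proper stemmer and a much longer stop word list.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

def naive_stem(token):
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem each remaining token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

def bag_of_words(docs):
    """Return a sorted vocabulary and one term-count vector per document."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[t] for t in vocab])
    return vocab, vectors

docs = ["Users are sharing photos", "The user shared a photo"]
vocab, vectors = bag_of_words(docs)
print(vocab)
print(vectors)
```

After preprocessing, the two differently worded sentences map to the same vector, which is the consistency that preprocessing is meant to provide.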
Unique features of texts in social media
• Time sensitivity
An important and common feature of many social media services is their
real-time nature. Bloggers may update their posts every few days, but most
networking sites receive updates far more often, typically within minutes.
Because of this sensitivity and timeliness, text in social media can no longer
be treated as independent and identically distributed data.
Unique features of texts in social media
• Short length
Short messages encourage users to participate on social media sites, but they
pose a great challenge for clustering and classification: short texts do not
provide sufficient context information for effective similarity measures,
which are the basis of many text processing methods.
Example: Twitter limits messages to 140 characters and Windows Live Messenger
to 512 characters, whereas Facebook allows up to 63,026 characters.
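The effect of short length on similarity measures can be illustrated with a small sketch. It assumes whitespace tokenization and cosine similarity over raw word counts; the two example messages are made up for illustration.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two texts using raw word counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0

# Two hypothetical short messages about the same situation, with no shared words.
short_a = "flight delayed again"
short_b = "stuck at airport"
print(cosine(short_a, short_b))
```

Although a human reader sees the two messages as related, they share no words, so a purely word-based similarity measure scores them as unrelated. This is the gap that the semantic features in the illustrative example later in the deck aim to close.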
Unique features of texts in social media
• Unstructured phrases
The main challenge posed by content on social media sites is that the
distribution of quality has high variance, ranging from very high-quality
items to low-quality ones. This can be attributed to people's attitudes when
posting a microblogging message or answering a question in a forum.
The difficulty here is how to accurately identify the semantic meaning when
more than one word has been abbreviated.
Applying text analytics in social media
• Event detection
  • Event detection aims to monitor a data source and detect the occurrence
    of an event that is captured within that source.
• Collaborative question answering
  • Analyzing the differences between conversational questions and
    informational questions.
Illustrative Example
• This example illustrates how text analytics can be used to solve the
problems identified when applying it to social media.
• We want to improve the quality of short text representation by integrating
semantic knowledge resources, which have been found useful in dealing with
the semantic gap.
• This has 3 steps:
  • Seed Phrase Extraction
  • Semantic Features Generation
  • Feature Space Construction
Seed Phrase Extraction
• Problem statement
• Given a sentence-level feature T = {t1, t2, …, tn}, where each ti is a
phrase-level feature contained in T, the similarity between ti and the other
phrases in {t1, t2, …, tn} is given by:
\[ \mathrm{InfoScore}(t_i) = \sum_{j=1,\, j \neq i}^{n} \mathrm{sem}(t_i, t_j) \]
\[ t^{*} = \arg\max_{t_i \in \{t_1, t_2, \ldots, t_n\}} \mathrm{InfoScore}(t_i) \]
where sem(ti, tj) is the semantic similarity between two phrases and t* is the
selected phrase-level feature, i.e. the seed phrase.
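The InfoScore selection can be sketched as follows. The presentation does not say how sem is computed, so this sketch assumes a simple Jaccard word overlap as a stand-in; the function names and example phrases are likewise hypothetical.

```python
def sem(a, b):
    """Assumed similarity: Jaccard overlap of the words in two phrases."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0

def info_score(i, phrases):
    """Sum of similarities between phrase i and every other phrase."""
    return sum(sem(phrases[i], phrases[j]) for j in range(len(phrases)) if j != i)

def seed_phrase(phrases):
    """Return the phrase with the highest InfoScore (the arg max)."""
    best = max(range(len(phrases)), key=lambda i: info_score(i, phrases))
    return phrases[best]

phrases = ["social media", "social media text", "text mining"]
print(seed_phrase(phrases))
```

With this stand-in similarity, "social media text" is selected because it overlaps with both of the other phrases. Swapping in a knowledge-based similarity only requires replacing `sem`.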
Semantic Features Generation
• The seed phrases have now been extracted in the first step.
• This step aims to generate semantic features from the seed phrases. The
seed phrases give us an informative and effective basic representation of
the input text.
• We use Wikipedia as our target social media.
Algorithm
Problem statement:
Given a set of seed phrases extracted from an already preprocessed text
corpus, generate the semantic features of the text.
Feature Space Construction
• To preserve data quality, effectiveness and the valuable original
information, we conduct 2 more basic steps in this process:
  • Feature filtering, to remove meaningless features
  • Feature selection, to avoid aggravating the "curse of dimensionality"
Feature Space Construction
• Feature filtering
For the Wikipedia example, we formulate rules to refine the unstructured
features. Some possible rules:
  • Remove features generated from seed phrases that are too general.
  • Transform features, e.g. "List of hotels" >>> "hotels".
  • Remove features related to chronology.
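The three rules above might be implemented along these lines. This is a sketch under stated assumptions: the set of "too general" seed phrases, the regular expression used to recognize chronology-related titles, and the example data are all hypothetical and would need tuning for real Wikipedia features.

```python
import re

# Hypothetical set of seed phrases considered too general to be useful.
GENERAL_SEEDS = {"thing", "people", "time"}

# Assumed chronology pattern: a four-digit year or an ordinal century.
CHRONOLOGY = re.compile(r"\b(\d{4}|\d{1,2}(st|nd|rd|th) century)\b", re.IGNORECASE)

def filter_features(features_by_seed):
    """Apply the three filtering rules to a mapping of seed phrase -> features."""
    kept = {}
    for seed, features in features_by_seed.items():
        # Rule 1: drop everything generated from a too-general seed phrase.
        if seed.lower() in GENERAL_SEEDS:
            continue
        cleaned = []
        for feature in features:
            # Rule 3: drop chronology-related features.
            if CHRONOLOGY.search(feature):
                continue
            # Rule 2: transform "List of X" into "X".
            cleaned.append(re.sub(r"^list of\s+", "", feature, flags=re.IGNORECASE))
        kept[seed] = cleaned
    return kept

raw = {
    "hotel": ["List of hotels", "2009 in tourism", "Hospitality industry"],
    "time": ["Clock"],
}
print(filter_features(raw))
```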
Feature Space Construction
• Feature selection
• We need to select semantic features to construct the feature space for
various tasks.
• The number of features needed depends on the specific task.
Feature Space Construction
• First, we calculate the tf-idf weights of all generated features. Term
frequency–inverse document frequency (tf-idf) is a numerical statistic that is
intended to reflect how important a word is to a document in a collection
or corpus.
• One seed phrase may generate k semantic features, denoted by {fi1, fi2, …, fik}.
• The selection here is one feature per seed phrase:
\[ f_i^{*} = \arg\max_{f_{ij} \in \{f_{i1}, f_{i2}, \ldots, f_{ik}\}} \mathrm{tf\_idf}(f_{ij}) \]
Feature Space Construction
• Second, the top n features are extracted from the remaining semantic
features based on their frequency.
• These frequently appearing features, together with the m features selected
in the first step, are used to construct the feature space of m+n semantic
features.
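The two selection steps can be sketched together as follows. Several details are assumptions made for the illustration: each input text is treated as a "document" holding its generated features, the tf-idf variant is total feature count times log(N / document frequency), and all function names and example data are hypothetical.

```python
import math
from collections import Counter

def tf_idf_weights(docs):
    """docs: one list of generated features per input text."""
    n_docs = len(docs)
    tf = Counter(f for doc in docs for f in doc)       # total occurrences
    df = Counter(f for doc in docs for f in set(doc))  # texts containing f
    return {f: tf[f] * math.log(n_docs / df[f]) for f in tf}

def select_per_seed(features_by_seed, weights):
    """Step 1: keep the single highest-weighted feature for each seed phrase."""
    return {
        seed: max(feats, key=lambda f: weights.get(f, 0.0))
        for seed, feats in features_by_seed.items()
        if feats
    }

def build_feature_space(selected, docs, n):
    """Step 2: add the n most frequent remaining features to the m selected ones."""
    chosen = list(dict.fromkeys(selected.values()))  # de-duplicate, keep order
    chosen_set = set(chosen)
    remaining = Counter(f for doc in docs for f in doc if f not in chosen_set)
    top_n = [f for f, _ in remaining.most_common(n)]
    return chosen + top_n

docs = [["hotels", "tourism"], ["hotels", "travel", "travel"], ["tourism"], ["hotels"]]
features_by_seed = {"hotel": ["hotels", "tourism"], "trip": ["travel", "tourism"]}

weights = tf_idf_weights(docs)
selected = select_per_seed(features_by_seed, weights)
space = build_feature_space(selected, docs, n=1)
print(selected)
print(space)
```

Here "hotels" loses the first step because it appears in most texts and so has a low tf-idf weight, but it re-enters the feature space in the second step as the most frequent remaining feature.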
Finally
• With all these steps completed and the feature space generated, we can then
apply text clustering or any other text analytics method.
• In conclusion, although research on this subject is still very active, this
short presentation has shown one way to apply text analytics to social media
resources.
Reference: Aggarwal, C., Zhai, C. (eds.), Mining Text Data, Ch. 12

Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
Β 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
Β 

Text analytics in social media

  • 1. Text Analytics In Social Media
  • 2. Content • Introduction to text mining in relation to social media • Unique features of texts in social media • Applying text analytics in social media • Example of text analytics in social media
  • 3. Text Mining and Social Media: The picture here shows the top 10 sites that generate a lot of traffic, and the majority fall under the social media umbrella. Social media can therefore be described as a medium through which information and communication can be accessed, shared, and discussed.
  • 4. Text Mining and Social Media
      Category            | Representative sites
      Wiki                | Wikipedia, Scholarpedia
      Blogging            | Blogger, LiveJournal, WordPress
      Social news         | Digg, Briefing, Mixx, Slashdot
      Microblogging       | Twitter, Google Buzz
      Opinion & reviews   | ePinions, Yelp
      Question answering  | Stack Overflow, Yahoo! Answers, Quora
      Media sharing       | Flickr, YouTube
      Social bookmarking  | Delicious, CiteULike
      Social networking   | Facebook, LinkedIn, MySpace
      The table shows the categories into which social media can be classified. It covers various types of services, which results in various data formats. The information on most social media sites is in text format.
  • 5. Text Mining and Social Media • With the current trend of data mining techniques and business intelligence from data, a question arises for social media: “How can I get valuable information from the texts on social media platforms?”
  • 6. Unique features of texts in social media • With different kinds of social media, the texts have distinct characteristics, both in themselves and in how they occur. • Text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for business intelligence, exploratory data analysis, research, or investigation. • This section gives a hint on how to answer the previous question.
  • 7.
  • 8. Unique features of texts in social media • Text preprocessing: makes the input more consistent to facilitate text representation. Preprocessing methods include stop-word removal and stemming. • Feature generation / text representation: the most common approach is to transform texts into numeric vectors. This representation is called the bag-of-words (BOW) model or vector space model (VSM). • Knowledge discovery: machine learning or data mining methods are applied to discover patterns or insights.
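The preprocessing and representation steps on this slide can be sketched in a few lines of Python. This is only a minimal illustration: the stop-word list and the `crude_stem` suffix stripper are assumptions made for the example (a real pipeline would more likely use an established stemmer such as Porter's), and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not a standard one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on"}


def crude_stem(token: str) -> str:
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return a sorted vocabulary and one term-count vector per document."""
    token_lists = [preprocess(d) for d in docs]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


docs = ["The users are posting tweets", "A user posted a tweet on the site"]
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['post', 'site', 'tweet', 'user']
print(vectors)  # [[1, 0, 1, 1], [1, 1, 1, 1]]
```

After preprocessing, the two differently worded posts map to nearly identical vectors, which is the consistency the preprocessing step is meant to provide.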
  • 9. Unique features of texts in social media • Time sensitivity: an important and common feature of many social media services is their real-time nature. Bloggers may update their posts every few days, but most networking sites are updated far more often, on the order of minutes. Because of this sensitivity and timeliness, text in social media can no longer be treated as independent and identically distributed (i.i.d.) data.
  • 10. Unique features of texts in social media • Short length: although short messages encourage user participation on social media sites, they pose a great challenge for clustering and classification, because short texts do not provide sufficient context information for an effective similarity measure, which is the basis of many text processing methods. Example: Twitter is limited to 140 characters and Windows Live Messenger to 512 characters, while Facebook allows 63,026 characters.
  • 11. Unique features of texts in social media • Unstructured phrases: the main challenge posed by content on social media sites is that the distribution of quality has high variance, from very high-quality items to low-quality ones. This can be attributed to people’s attitudes when posting a microblogging message or answering a question in a forum. The difficulty is accurately identifying the semantic meaning of abbreviations that stand for more than one word.
  • 12. Applying text analytics in social media • Event detection: aims to monitor a data source and detect the occurrence of an event that is captured within that source. • Collaborative question answering: analyzing the differences between conversational questions and informational questions.
  • 13. Illustrative example • This example illustrates how to use text analytics to solve problems identified in its application to social media. • We want to improve the quality of short-text representation by integrating semantic knowledge resources, which have been found useful in dealing with the semantic gap. • This has 3 steps: (1) seed phrase extraction, (2) semantic feature generation, (3) feature space construction.
  • 14. Seed Phrase Extraction • Problem statement: given a sentence-level feature $T = \{t_1, t_2, \ldots, t_n\}$ and the phrase-level features $t_i$ contained in $T$, the similarity between $t_i$ and $\{t_1, t_2, \ldots, t_n\}$ is given by
      $$\mathrm{InfoScore}(t_i) = \sum_{j=1,\, j \neq i}^{n} \mathrm{sem}(t_i, t_j)$$
      $$t^* = \arg\max_{t_i \in \{t_1, t_2, \ldots, t_n\}} \mathrm{InfoScore}(t_i)$$
      where $t^*$ denotes the selected phrase-level feature, i.e. the seed phrase.
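A small sketch of the InfoScore selection may help. The slide does not say how sem(ti, tj) is computed, so the code below substitutes a word-overlap Jaccard similarity purely as an assumption; `jaccard_sem`, `info_score`, and `seed_phrase` are hypothetical names, and any other semantic similarity function could be passed in instead.

```python
from typing import Callable, Sequence


def jaccard_sem(a: str, b: str) -> float:
    """Assumed stand-in for sem(t_i, t_j): Jaccard overlap of the word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def info_score(i: int, phrases: Sequence[str],
               sem: Callable[[str, str], float]) -> float:
    """InfoScore(t_i): sum of sem(t_i, t_j) over all j != i."""
    return sum(sem(phrases[i], phrases[j])
               for j in range(len(phrases)) if j != i)


def seed_phrase(phrases: Sequence[str],
                sem: Callable[[str, str], float] = jaccard_sem) -> str:
    """Return t*, the phrase with the highest InfoScore."""
    best = max(range(len(phrases)), key=lambda i: info_score(i, phrases, sem))
    return phrases[best]


phrases = ["text mining", "social media text", "media sharing"]
print(seed_phrase(phrases))  # social media text
```

Here "social media text" wins because it overlaps with both of the other phrases, while each of those overlaps with only one.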
  • 15. Semantic Feature Generation • The seed phrases have now been extracted in the first step. • This step aims to generate semantic features from the seed phrases. The seed phrases give us an informative and effective basic representation of the input text. • We use Wikipedia as our target social media.
  • 16. Algorithm • Problem statement: given a set of seed phrases from an already preprocessed text corpus, generate the semantic features for the text.
  • 17. Feature Space Construction • To preserve data quality, effectiveness, and the valuable original information, we carry out 2 more basic steps in this process: • Feature filtering, to filter out meaningless features • Feature selection, to avoid aggravating the “curse of dimensionality”
  • 18. Feature Space Construction • Feature filtering: for the Wikipedia example, we formulate rules to refine the unstructured features. Some possible rules: • Remove features generated from seed phrases that are too general. • Transform features, e.g. “List of hotels” becomes “hotels”. • Remove features related to chronology.
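The three filtering rules could be expressed roughly as follows. The slide gives only the rules, not their implementation, so the set of "too general" seed phrases and the pattern used to spot chronology-related features are both assumptions chosen for illustration, as is the function name `filter_features`.

```python
import re

# Assumptions for illustration only: the slide does not define either of these.
GENERAL_SEEDS = {"thing", "people", "list"}
CHRONOLOGY = re.compile(
    r"\b\d{4}\b|\b\d{1,2}(?:st|nd|rd|th) century\b|\b(?:births|deaths)\b",
    re.IGNORECASE,
)


def filter_features(generated: dict[str, list[str]]) -> dict[str, list[str]]:
    """Apply the three example rules to features grouped by seed phrase."""
    cleaned = {}
    for seed, features in generated.items():
        # Rule 1: drop everything generated from a too-general seed phrase.
        if seed.lower() in GENERAL_SEEDS:
            continue
        kept = []
        for feature in features:
            # Rule 3: drop features related to chronology.
            if CHRONOLOGY.search(feature):
                continue
            # Rule 2: transform "List of X" into "X".
            kept.append(re.sub(r"^list of\s+", "", feature, flags=re.IGNORECASE))
        cleaned[seed] = kept
    return cleaned


generated = {
    "hotel": ["List of hotels", "2008 in tourism", "Hospitality industry"],
    "thing": ["Object (philosophy)"],
}
print(filter_features(generated))
# {'hotel': ['hotels', 'Hospitality industry']}
```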
  • 19. Feature Space Construction • Feature selection: • We need to select semantic features to construct the feature space for various tasks. • The number of features needed is determined by the specific task.
  • 20. Feature Space Construction • First, we calculate the tf-idf weights of all generated features. Term frequency–inverse document frequency (tf-idf) is a numerical statistic intended to reflect how important a word is to a document in a collection or corpus. • One seed phrase may generate k semantic features, denoted by $\{f_{i1}, f_{i2}, \ldots, f_{ik}\}$. • The selection here is one seed phrase, one feature:
      $$f_i^* = \arg\max_{f_{ij} \in \{f_{i1}, f_{i2}, \ldots, f_{ik}\}} \mathrm{tf\_idf}(f_{ij})$$
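As a rough sketch of the "one seed phrase, one feature" rule: the slide does not state which tf-idf variant is used, so the code assumes one common form (term frequency normalized by document length, idf = log(N/df)). Each document is represented here by the list of semantic features generated for it; `tf_idf` and `select_per_seed` are hypothetical names.

```python
import math
from collections import Counter


def tf_idf(feature: str, doc: list[str], corpus: list[list[str]]) -> float:
    """Assumed variant: (count / doc length) * log(N / document frequency)."""
    tf = Counter(doc)[feature] / len(doc) if doc else 0.0
    df = sum(1 for d in corpus if feature in d)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf


def select_per_seed(seed_to_features: dict[str, list[str]],
                    doc: list[str],
                    corpus: list[list[str]]) -> dict[str, str]:
    """For each seed phrase, keep the generated feature with the top tf-idf."""
    return {
        seed: max(features, key=lambda f: tf_idf(f, doc, corpus))
        for seed, features in seed_to_features.items()
        if features
    }


corpus = [
    ["hotels", "tourism", "tourism"],
    ["tourism", "travel"],
    ["finance", "travel"],
]
selected = select_per_seed({"hotel": ["hotels", "tourism"]}, corpus[0], corpus)
print(selected)  # {'hotel': 'hotels'}
```

In this toy corpus "tourism" occurs more often in the document, but "hotels" is rarer across documents, so its higher idf gives it the larger weight.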
  • 21. Feature Space Construction • Second, the top n features are extracted from the remaining semantic features based on their frequency. • These frequently appearing features, together with the m features selected in the first step (one per seed phrase), are used to construct the m+n semantic features.
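The second selection step can be sketched as a frequency count over the features not already chosen. The function name `build_feature_space` and the flat-list input format are assumptions made for the example; ties in frequency are broken by first occurrence, which is how `Counter.most_common` orders equal counts.

```python
from collections import Counter


def build_feature_space(selected: list[str],
                        all_features: list[str],
                        n: int) -> list[str]:
    """Combine the m already-selected features with the n most frequent others."""
    chosen = set(selected)
    remaining = Counter(f for f in all_features if f not in chosen)
    top_n = [feature for feature, _ in remaining.most_common(n)]
    return list(selected) + top_n


all_features = ["hotels", "tourism", "tourism", "travel", "travel", "travel", "finance"]
print(build_feature_space(["hotels"], all_features, n=2))
# ['hotels', 'travel', 'tourism']
```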
  • 22. Finally • With all these steps completed and the feature space generated, we can apply text clustering or any other text analytics method. • In conclusion, although research on this subject is still very active, this short presentation has shown how text analytics can be applied to social media resources.