1. Understanding
User-generated Content on
Social Media
Meena Nagarajan
Ph.D. Dissertation Defense
Kno.e.sis Center, College of Engineering and Computer Science
Wright State University
4. Social Information Needs
• Can we use this information to assess a
population’s preference?
• Can we study how these preferences propagate
in a network of friends?
• Are such crowd-sourced preferences a good
substitute for traditional polling methods?
5. Social Information Processing
• "Who says what, to whom, why, to what extent and
with what effect?" [Lasswell]
• Network: Social structure emerges
from the aggregate of relationships (ties)
• People: poster identities, the active effort
of accomplishing interaction
• Content : studying the content of communication.
ABOUTNESS of textual user-generated content
via the lens of TEXT MINING
6. Aboutness Of Text
• One among several terms used to express certain
attributes of a discourse, text or document
• characterizing what a document is about, what its
content, subject or topic matter are
• A central component of knowledge organization
and information retrieval
• For machine and human consumption
7. Aboutness & Subgoals in IE
• Named entity recognition
• Co-reference, anaphora resolution
• "International Business Machines" and "IBM"; ‘he’ in a passage
refers to the mention of ‘John Smith’
• Terminology, key-phrase, lexical chain extraction
• Relationship and fact extraction
• ‘person works for organization’
8. Text Mining and Aboutness
• Thesis focus: ‘Aboutness’ understanding via
Text Mining
• Gleaning meaningful information from natural
language text useful for particular purposes
• Indicators of thematic elements for aboutness
• via NER, Key phrase extraction
9. Aboutness & The Role Of
Context
• Extracting thematic elements : interpretation of
the individual elements in context.
• (a) I can hear bass sounds. (b) They like grilled bass.
• Typical context cues that are employed
• Word Associations, Linguistic Cues, Syntactic,
Structural Cues, Knowledge Sources..
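A minimal sketch of how word-association cues could separate the two "bass" sentences above. The cue-word lists and the function name `disambiguate` are assumptions made for illustration; a real system would learn such associations from corpora or knowledge sources.

```python
# Hypothetical cue words per sense (assumed for illustration only).
SENSE_CUES = {
    "music": {"hear", "sound", "sounds", "guitar", "play", "loud"},
    "fish": {"grilled", "eat", "like", "caught", "river", "fried"},
}

def disambiguate(sentence, target="bass"):
    """Pick the sense whose cue words overlap most with the sentence context."""
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    context = set(tokens) - {target}
    scores = {sense: len(context & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best, scores

print(disambiguate("I can hear bass sounds."))
print(disambiguate("They like grilled bass."))
```

The same overlap idea extends to other cue types (syntactic, structural) by adding more feature sets per sense.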
10. User-generated content on Twitter during the 2009 Iran Election
show support for democracy in Iran
add green overlay to your Twitter avatar with 1-click - http://helpiranelection.com/
Twitition: Google Earth to update satellite images of Tehran
#Iranelection http://twitition.com/csfeo @patrickaltoft
Set your location to Tehran and your time zone to GMT +3.30.
Security forces are hunting for bloggers using location/timezone searches
User comments on music artist pages on MySpace
Your music is really bangin!
You’re a genius! Keep droppin bombs!
u doin it up 4 real. i really love the album.
hey just hittin you up showin love to one of
chi-town’s own. MADD LOVE.
Comments on Weblogs about movies and video games
I decided to check out Wanted demo today even though I really did not like the movie
It was THE HANGOVER of the year..lasted forever..
so I went to the movies..bad choice picking GI Jane worse now
Excerpt from a blog around the 2009 Health Care Reform debate
Hawaii’s Lessons - NY times
In Hawaii’s Health System, Lessons for Lawmakers
Since 1974, Hawaii has required all employers to provide relatively generous
health care benefits to any employee who works 20 hours a week or more. If
health care legislation passes in Congress, the rest of the country may barely
catch up. Lawmakers working on a national health care fix have much to learn
from the past 35 years in Hawaii, President Obama’s native state.
Among the most important lessons is that even small steps to change the system
can have lasting effects on health. Another is that, once benefits are en-
14. User-generated content on Twitter during the 2009 Iran Election
Callouts on the examples shown on slide 10:
• Unmediated interpersonal communication
• Informal English domain
• Context is implicit
• Interactions between like-minded people
• Variations and creativity in expression
• Properties of the medium
• One solution rarely fits all social media content
15. Thesis Contributions
• Compensating for informal highly variable
language, lack of context
• Examining usefulness of multiple context cues
for text mining algorithms
• Context cues: Document corpus, syntactic,
structural cues, social medium and external domain
knowledge
• End goal: NER, Key Phrase Extraction
16. Thesis Statements
• We show that for 2 Aboutness Understanding
tasks -- NER, Key Phrase Extraction
• Multiple kinds of contextual information can supplement
and improve the reliability and performance of
existing NLP/ML algorithms
• Improvements tend to be robust across domains
and data sources
17. Thesis Contributions
Task: Aboutness of text
• NER - Movie Names (Weblogs): “It was THE HANGOVER of the year..lasted forever.. so I went to the movies..bad choice picking “GI Jane” worse now”
• NER - Music Album/Track names (MySpace Music Forum): I loved your music Yesterday!
• Context cues: External Knowledge Sources; Medium Metadata, Structural cues; In Content
• Text formality axis: Weblogs, MySpace Music Forum
19. Thesis Contributions
Task: Aboutness of text
• NER - Movie Names (Weblogs)
- External knowledge sources: Wikipedia Infoboxes
- Medium metadata, structural cues: Blog URL, Title, Post URL
- In content: Word associations from large corpora
• NER - Music Album/Track names (MySpace Music Forum)
- External knowledge sources: MusicBrainz, UrbanDictionary
- Medium metadata, structural cues: Page URL
- In content: Word associations from large corpora, POS Tags, Syntactic Dependencies
• Text formality axis: Weblogs, MySpace Music Forum
20. Thesis Contributions
Task: Aboutness of text
• Key Phrase Extraction (Twitter)
• Key Phrase Elimination (Facebook, MySpace Forums)
• Context cues: External Knowledge Sources; Medium Metadata, Structural cues; In Content
• Text formality axis: Twitter; Facebook, MySpace Forums
Thesis excerpt (page 105):
observations (or documents) made by users about an entity, event or topic of interest.
The primary motivation is to obtain an abstraction of a social phenomenon that makes volumes
of unstructured user-generated content easily consumable by humans and agents alike. As an
example of the goals of our work, Table 4.1 shows key phrases extracted from online discussions
around the 2009 Health Care Reform debate and the 2008 Mumbai terror attack, summarizing
hundreds of user comments to give a sense of what the population cared about on a particular day.
2009 Health Care Reform        2008 Mumbai Terror Attack
Health care debate             Foreign relations perspective
Healthcare staffing problem    Indian prime minister speech
Obamacare facts                UK indicating support
Healthcare protestors          Country of India
Party ratings plummet          Rejected evidence provided
Public option                  Photographers capture images of Mumbai
Table 4.1: Showing summary key phrases extracted from more than 500 online posts on Twitter
around two news-worthy events on a single day.
Solutions to key phrase extraction have ranged from both unsupervised techniques that are
based on heuristics to identify phrases and supervised learning approaches that learn from human
22. Thesis Contributions
Task: Aboutness of text
• Key Phrase Extraction (Twitter)
- Medium metadata, structural cues: spatial, temporal metadata
- In content: n-grams for thematic cues
• Key Phrase Elimination (Facebook, MySpace Forums)
- External knowledge sources: Seeds from a Domain Knowledge base
- Medium metadata, structural cues: Page Title
- In content: Word associations from large corpora
• Text formality axis: Twitter; Facebook, MySpace Forums
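A minimal sketch of n-grams used as thematic cues for key phrase extraction. The sample posts, the stopword list and the scoring rule (frequency times phrase length) are assumptions made for illustration, not the scoring used in the thesis.

```python
import re
from collections import Counter

# Small assumed stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "of", "in", "on", "to", "is", "and", "for"}

def ngrams(tokens, n):
    """All contiguous n-token windows of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def key_phrases(posts, max_n=3, top_k=3):
    """Rank repeated n-grams; longer repeated phrases score higher (assumed rule)."""
    counts = Counter()
    for post in posts:
        tokens = re.findall(r"[a-z']+", post.lower())
        for n in range(1, max_n + 1):
            for gram in ngrams(tokens, n):
                # Skip phrases that start or end with a stopword.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[gram] += 1
    scored = {g: c * len(g) for g, c in counts.items() if c > 1}
    ranked = sorted(scored, key=lambda g: (-scored[g], g))
    return [" ".join(g) for g in ranked[:top_k]]

# Hypothetical posts.
posts = [
    "health care debate heats up",
    "the health care debate continues",
    "public option in health care debate",
]
print(key_phrases(posts))
```

Spatial and temporal metadata could be layered on top by grouping posts per location and day before counting.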
24. Thesis Contributions
Building Social Intelligence Applications
WHO, WHAT, WHEN, WHERE, WHY, HOW
Social Intelligence Applications
1. Application of NER results : BBC Sound Index with IBM Almaden
2. Application of Key Phrase Extraction : Twitris @ Knoesis
Building on results of NER, Key Phrase Extraction
25. Thesis Significance, Impact
• Focuses on relatively less explored content aspects
of expression on social media platforms
• Why text on social media is different from what
most text mining applications have focused on
• Combination of top-down, bottom-up analysis
for informal text
• Statistical NLP, ML algorithms over large corpora
• Models and rich knowledge bases in a domain
26. TALK OUTLINE
ABOUTNESS UNDERSTANDING
• In detail: Named Entity Identification in Informal Text
• Overview: Topical Key Phrase Extraction from Informal Text
• Overview: Applications and Consequences of Understanding
content : Social Intelligence Application
• BBC SoundIndex, Twitris
27. Named Entity Recognition
I loved your music Yesterday!
“It was THE HANGOVER of the year..lasted
forever..
so I went to the movies..bad choice picking “GI
Jane” worse now”
28. Thesis Contributions
Predominant Focus of Prior Work vs. Thesis Focus
• Entity Type Focus
- Prior work: PER, LOC, ORGN, DATE, TIME.. [TREC]
- Thesis: Cultural Entities
• Method
- Prior work: Sequential Labeling
- Thesis: Spot and Disambiguate (pre-supposed knowledge)
• Document Types
- Prior work: Scientific Literature, News, Blogs (formal)
- Thesis: Social Media Content, Blogs, MySpace Forums
• Features
- Prior work: Word-Level Features, List-lookup Features, Documents and corpus features
- Thesis: Word-Level Features, List-lookup Features, Documents and corpus features
29. Cultural Named Entities
• NER focus in my work: Cultural Named Entities
• Names of books, music albums, films, video
games, etc.
• The Lord of the Rings, Lips, Crash, Up, Wanted,
Today, Twilight, Dark Knight...
• Common words in a language
30. Characteristics of Cultural Entities
• Varied senses, several poorly documented
• Merry Christmas covered by 60+ artists
Star Trek: movies, tv series, media franchise.. and cuisines !!
• Changing contexts with recent events
• The Dark Knight reference to Obama, health care reform
• Unrealistic expectations: Comprehensive sense definitions,
enumeration of contexts, labeled corpora for all senses ..
NER Relaxing the closed-world sense assumptions
32. Thesis Contributions
Predominant Focus of Prior Work vs. Thesis Focus
• Entity Types
- Prior work: PER, LOC, ORGN, DATE, TIME
- Thesis: Cultural Entities
• Method
- Prior work: Sequential Labeling
- Thesis: Spot and Disambiguate (pre-supposed knowledge)
• Document Types
- Prior work: Scientific Literature, News, Blogs (formal)
- Thesis: Social Media Content, Blogs, MySpace Forums
• Features
- Prior work: Word-Level Features, List-lookup Features, Documents and corpus features
- Thesis: Word-Level Features, List-lookup Features, Documents and corpus features
33. A Spot and Disambiguate Paradigm
• NER is generally treated as a sequential prediction problem
• NER system that achieves 90.8 F1 score on the CoNLL-2003 NER
shared task (PER, LOC, ORGN entities) [Lev Ratinov, Dan Roth]
• My approach: Spot and Disambiguate Paradigm
• Dictionary or list of entities we want to spot
• Disambiguate in context (natural language, domain
knowledge cues)
• Binary Classification
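A minimal sketch of the two stages named above: spot dictionary entries in text, then make a binary in-sense / not-in-sense decision from context. The cue-word set, window size and threshold are assumptions made for illustration; the thesis uses trained classifiers rather than this hand-written rule.

```python
import re

def spot(text, entities):
    """Return (entity, start, end) for every case-insensitive dictionary match."""
    hits = []
    for entity in entities:
        pattern = r"\b" + re.escape(entity) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((entity, m.start(), m.end()))
    return sorted(hits, key=lambda h: h[1])

def disambiguate(text, hit, cue_words, window=40, threshold=1):
    """Binary decision: does the surrounding context support the target sense?"""
    _, start, end = hit
    left = text[max(0, start - window):start]
    right = text[end:end + window]
    tokens = set(re.findall(r"[a-z']+", (left + " " + right).lower()))
    return len(tokens & cue_words) >= threshold

MOVIES = ["Wanted", "Twilight", "Up"]
# Hypothetical cue words for the movie sense.
MOVIE_CUES = {"movie", "film", "scenes", "demo", "theater", "actor"}

text = "I am watching Pattinson scenes in Twilight for the nth time."
for hit in spot(text, MOVIES):
    print(hit[0], disambiguate(text, hit, MOVIE_CUES))
```

Because spotting is pure lookup, all of the difficulty moves into the disambiguation step, which is where the context cues come in.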
34. Thesis Contributions
Predominant Focus of Prior Work vs. Thesis Focus
• Entity Types
- Prior work: PER, LOC, ORGN, DATE, TIME
- Thesis: Cultural Entities
• Method
- Prior work: Sequential Labeling
- Thesis: Spot and Disambiguate (pre-supposed knowledge)
• Document Types
- Prior work: Scientific Literature, News, Blogs (formal)
- Thesis: Informal Social Media Content, Blogs, MySpace Forums, Twitter, Facebook
• Features
- Prior work: Word-Level Features, List-lookup Features, Documents and corpus features
- Thesis: SENSE BIASED Word-Level Features, List-lookup Features, Documents and corpus features
35. NER Algorithmic Contributions
Supervised, Two Flavors
3.2. THESIS FOCUS - CULTURAL NER IN INFORMAL TEXT August 10, 20
Table 3.3: Challenging Aspects of Cultural Named Entities
(a) Multiple Senses in the same Music Domain
• Bands with a song “Merry Christmas”: 60
• Songs with “Yesterday” in the title: 3,600
• Releases of “American Pie”: 195
• Artists covering “American Pie”: 31
(b) Multiple senses in different domains for the same movie entities
• Twilight: Novel, Film, Short story, Albums, Places, Comics, Poem, Time of day
• Transformers: Electronic device, Film, Comic book series, Album, Song, Toy Line
• The Dark Knight: Nickname for comic superhero Batman, Film, Soundtrack, Video game, Themed roller coaster ride
“I am watching Pattinson scenes in <movie id=2341>Twilight</movie> for the nth time.”
“I spent a romantic evening watching the Twilight by the bay..”
“I love <artist id=357688>Lilyʼs</artist> song <track id=8513722>smile</track>”.
3.2.3 Two Approaches to Cultural NER
37. Approach 1: Multiple
Senses, Multiple Domains
• When a Cultural entity appears in multiple
senses across domains in the same corpus
3.3. CULTURAL NER – MULTIPLE SENSES ACROSS MULTIPLE DOMAINS
August 10, 2010
Title: Peter Cullen Talks Transformers: War for Cybertron
Recently, we heard legendary Transformers voice actor Peter
Cullen talk not only about becoming an hero to millions for his
portrayal of the heroic Autobot leader, Optimus Prime, but also
about being the first person to play the role of video game icon
Mario. But today, he focuses more on the recent Transformers
video game release, War for Cybertron.
Following are some excerpts from an interview Cullen recently
38. Algorithm Preliminaries
• Problem Space
• Corpus: Weblogs, Distribution: unknown
• All senses of a cultural entity: unknown
• Problem Definition
• Input: A target Sense (e.g., movie); List of Entities to
be extracted
• Goal: Disambiguating every entity’s mention as
related to target sense or not
39. Contribution: Improving NER -
feature-based approach
• Improving classifiers using a novel feature
• the “complexity of extraction” in a target sense
• Hypothesis: knowing how hard or easy it is to extract this
entity in a particular sense will improve extraction accuracy
of learners
• Making classifiers ‘complexity aware’
• ‘The Curious Case of Benjamin Button’ vs. ‘Wanted’
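A minimal sketch of what making a classifier "complexity aware" could look like: the entity-level complexity score is added to each mention's feature set. The scores 0.2 and 0.5 are the ones shown on the Overview slides; the score for "Wanted", the default value, the other features and the name `featurize` are assumptions made for illustration.

```python
# Entity-level complexity of extraction in the movie sense.
COMPLEXITY = {
    "The Curious Case of Benjamin Button": 0.2,
    "Date Night": 0.5,
    "Wanted": 0.9,  # hypothetical value, not from the slides
}

def featurize(entity, context_tokens, cue_words):
    """Build a feature dict for one mention, including the entity's complexity."""
    overlap = len(set(context_tokens) & cue_words)
    return {
        "cue_overlap": overlap,
        "capitalized": int(entity[0].isupper()),
        # 0.5 is an assumed default for entities without a score.
        "complexity_of_extraction": COMPLEXITY.get(entity, 0.5),
    }

print(featurize("Wanted", ["check", "out", "demo"], {"movie", "demo"}))
```

A learner can then weigh contextual evidence differently for a hard entity such as "Wanted" than for an easy one.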
41. Overview
List of movies to extract
The Curious Case of
Benjamin Button
Twilight
Date Night
Death at a Funeral
The Last Song
Up
Angels and Demons
Sample Population
Uncharacterized population (blog corpus), target sense (movies)
47. Overview
List of movies to extract: The Curious Case of Benjamin Button, Twilight, Date Night, Death at a Funeral, The Last Song, Up, Angels and Demons
Sample Population
Uncharacterized population (blog corpus), target sense (movies)
Entity: Complexity of Extraction
• The Curious Case of Benjamin Button: 0.2
• Date Night: 0.5
Use Complexity of Extraction as a feature in named entity classifiers
NOTE: An entity occurring in fewer varied senses (The Curious Case of Benjamin Button)
could still have a high complexity of extraction if the distribution is skewed away from the
sense of interest!
48. Extraction in a Target Sense
• Complexity of extraction in a sense of interest =
how much support in corpus toward that sense
• How do we find this?
• Documents that mention the entity in word contexts that
are biased to our sense of interest (language models)
• More documents imply more support, which implies the entity is easy to
extract: low complexity of extraction
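A minimal sketch of the idea above, assuming support is the fraction of entity-mentioning documents whose words overlap a sense-biased language model, and complexity is one minus that support. Both definitions, the sample documents and the overlap threshold are assumptions made for illustration; the thesis derives support through clustering rather than this direct count.

```python
def support(entity, documents, sense_lm, min_overlap=2):
    """Fraction of entity-mentioning documents whose words overlap the sense LM."""
    mentioning = [d for d in documents if entity.lower() in d.lower()]
    if not mentioning:
        return 0.0
    supported = 0
    for doc in mentioning:
        words = set(doc.lower().replace(".", " ").replace(",", " ").split())
        if len(words & sense_lm) >= min_overlap:
            supported += 1
    return supported / len(mentioning)

def complexity(entity, documents, sense_lm):
    """Assumed definition: high support means low complexity of extraction."""
    return 1.0 - support(entity, documents, sense_lm)

# Hypothetical mini-corpus and sense-biased LM terms for the movie sense.
docs = [
    "Watched Twilight last night, the film has a great cast and director.",
    "A romantic evening watching the twilight by the bay.",
    "The Twilight movie trailer shows the cast in new scenes.",
    "Nothing about it here.",
]
sense_lm = {"film", "movie", "cast", "director", "trailer", "scenes"}
print(support("Twilight", docs, sense_lm), complexity("Twilight", docs, sense_lm))
```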
52. Support via Word
Associations
• Co-occurring words alone won't cut it!
• Prolific discussion and comparison of different
senses
• Co-occurrence based language models will give
us everything unless we bias it to our sense
(movies)
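A minimal sketch of why plain co-occurrence mixes senses and how a bias toward the target sense narrows it. The documents, the single sense hint and the filtering rule (keep only words that also appear in a document containing a sense hint) are assumptions made for illustration; they stand in for the spreading activation approach described later.

```python
from collections import Counter

def tokens_of(doc):
    """Lowercased tokens with surrounding punctuation stripped."""
    return [t.strip(".,!?").lower() for t in doc.split()]

def cooccurring_words(entity, documents):
    """Count every word that appears in a document mentioning the entity."""
    counts = Counter()
    for doc in documents:
        toks = tokens_of(doc)
        if entity.lower() in toks:
            counts.update(t for t in toks if t and t != entity.lower())
    return counts

def sense_biased(counts, documents, sense_hints):
    """Keep only words that also occur in some document containing a sense hint."""
    hinted_vocab = set()
    for doc in documents:
        toks = set(tokens_of(doc))
        if toks & sense_hints:
            hinted_vocab |= toks
    return Counter({w: c for w, c in counts.items() if w in hinted_vocab})

# Hypothetical documents discussing two senses of the same entity.
docs = [
    "Twilight film cast interview with Pattinson",
    "Twilight novel chapter review by Meyer fans",
    "Pattinson film premiere",
]
unbiased = cooccurring_words("Twilight", docs)
biased = sense_biased(unbiased, docs, {"film"})
print(sorted(unbiased))
print(sorted(biased))
```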
53. Complexity of Extraction
• Goal: Complexity of Extraction in a target sense
• Subgoal: Support in terms of sense-biased
contexts in documents that mention entity
• Step1: Extract a sense-biased LM
• Step 2: Identify documents that mention entity
in the context of the sense-biased LM
54. Knowledge Features to seed Sense-
biased Word Association Gathering
• Sense Definition (hints) from Wikipedia
Infoboxes
• Working definition: Sense is domain of interest
• Use sense hints to derive contextual
support
A lot of support means easy to extract, which implies a low ‘complexity of
extraction’ score!
56. Measuring ‘complexity of extraction’
Two step framework (unsupervised)
• Step 1: Propagate sense evidence in contexts of e,
extract a sense-biased language model (LM)
• random walks, distributional similarity approaches
• SPREADING ACTIVATION NETWORKS
Sense hint nodes
Sense-biased Language Model
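A minimal sketch of spreading activation from sense hint nodes over a word graph. The graph, edge weights, decay factor and firing threshold are assumptions made for illustration, and the update rule (keep the strongest incoming pulse) is one simple variant, not necessarily the one used in the thesis.

```python
from collections import defaultdict

def spread_activation(edges, seeds, decay=0.5, threshold=0.1, max_steps=3):
    """Propagate activation from seed nodes.

    edges: dict node -> {neighbor: weight in [0, 1]}
    seeds: sense hint nodes, which start with activation 1.0
    """
    activation = defaultdict(float)
    for s in seeds:
        activation[s] = 1.0
    frontier = set(seeds)
    for _ in range(max_steps):
        next_frontier = set()
        for node in frontier:
            for nbr, w in edges.get(node, {}).items():
                pulse = activation[node] * w * decay
                # Fire only if the pulse is strong enough and improves the neighbor.
                if pulse > threshold and pulse > activation[nbr]:
                    activation[nbr] = pulse
                    next_frontier.add(nbr)
        frontier = next_frontier
        if not frontier:
            break
    return dict(activation)

# Hypothetical co-occurrence graph; weights are made up.
EDGES = {
    "spock": {"kirk": 1.0, "imax": 0.5},
    "kirk": {"canon": 0.8},
    "imax": {"seats": 0.4},
}
sense_biased_lm = spread_activation(EDGES, {"spock"})
print(sense_biased_lm)
```

Words that end up with non-zero activation form the sense-biased language model; weakly connected words such as "seats" never fire.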
58. Overview
• Step 2: Clustering documents represented by sense-
relatedness vectors
• CHINESE WHISPERS CLUSTERING
SenseRel doc 1 doc 2 doc n
sense LM term 1 SenseRel (t1) SenseRel (t1) SenseRel (t1)
sense LM term 2 SenseRel (t2)
sense LM term m SenseRel (tm) SenseRel (tm)
• Result: Clustered Documents in similar senses
• Not just similar words!
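A minimal sketch of Chinese Whispers clustering over documents represented by sense-relatedness vectors. The vectors, the cosine-similarity threshold used to build the graph and the deterministic tie-break (smallest label wins) are assumptions made for illustration; the original algorithm breaks ties randomly.

```python
import math
import random
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def chinese_whispers(nodes, edges, iterations=20, seed=0):
    """edges: dict (u, v) -> weight for undirected edges. Returns node -> label."""
    rng = random.Random(seed)
    neighbors = defaultdict(dict)
    for (u, v), w in edges.items():
        neighbors[u][v] = w
        neighbors[v][u] = w
    labels = {n: i for i, n in enumerate(nodes)}  # every node starts in its own class
    order = list(nodes)
    for _ in range(iterations):
        rng.shuffle(order)
        for n in order:
            scores = defaultdict(float)
            for nbr, w in neighbors[n].items():
                scores[labels[nbr]] += w
            if scores:
                # Adopt the strongest neighboring class; ties go to the smallest label.
                labels[n] = max(scores, key=lambda lab: (scores[lab], -lab))
    return labels

# Hypothetical SenseRel vectors: one value per sense LM term.
vectors = {
    "doc1": [0.9, 0.8, 0.0],
    "doc2": [0.8, 0.9, 0.1],
    "doc3": [0.0, 0.1, 0.9],
    "doc4": [0.1, 0.0, 0.8],
}
names = sorted(vectors)
edges = {}
for i, u in enumerate(names):
    for v in names[i + 1:]:
        sim = cosine(vectors[u], vectors[v])
        if sim >= 0.5:  # assumed similarity threshold
            edges[(u, v)] = sim
clusters = chinese_whispers(names, edges)
print(clusters)
```

Because similarity is computed on sense-relatedness values rather than raw words, documents group by sense even when their vocabularies differ.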
60. Constructing the SAN
J. J. Abrams
Damon Lindelof
Star Trek
Roberto Orci
Alex Kurtzman
Startrek
Paramount Pictures
Chris Pine
Zachary Quinto
Eric Bana
Zoe Saldana
Karl Urban
John Cho
Anton Yelchin
Simon Pegg
Bruce Greenwood
Leonard Nimoy
Kirk
Spock
Nero
Pavel Chekov
Nyota Uhura
..
Greenwood
Leonard
Nimoy
Pavel
Chekov
Nyota
Uhura
66. Constructing the SAN
Figure labels: Star Trek; indicative of being a Named Entity
Sense hints: J. J. Abrams, Damon Lindelof, Roberto Orci, Alex Kurtzman, Startrek, Paramount Pictures, Chris Pine, Zachary Quinto, Eric Bana, Zoe Saldana, Karl Urban, John Cho, Anton Yelchin, Simon Pegg, Bruce Greenwood, Leonard Nimoy, Kirk, Spock, Nero, Pavel Chekov, Nyota Uhura, .., Greenwood, Leonard, Nimoy, Pavel, Chekov, Nyota, Uhura
Blog post mentioning the entity:
10 minutes. That is all it took for JJ Abrams to make a believer out of me. 10 minutes.
Let us set the stage for my viewing of Star Trek. IMAX? Check. Perfect seats?
Check..not sit well with me was the libidinous Spock. It changed one of the
fundamental aspects of the character for no good reason. Other than that, however,
none of the changes to Trek canon particularly bothered me in a "get a life" kind
of way.………….the special effects were stunning, and the performances were...wow.
Chris Pine IS James T. Kirk. Karl Urban IS Leonard McCoy…Spock
Top X keywords (IDF)
- Among the context surrounding (but excluding) entity of interest
- Force include sense related words
Keywords: Spock, IMAX, .., Kirk, Karl Urban, James, .., canon, Chris Pine, libidinous
Activation Network
Excerpt: Let us set the stage for my viewing of Star Trek. IMAX? Check. Perfect seats? Check..not sit well with me was the libidinous Spock. It changed one of
Nodes: imax, Spock, libidinous; edge weights: 1, 1, 1
67. Constructing the SAN
• Next context window: "effects were stunning, and the performances were...wow.
Chris Pine IS James T. Kirk. Karl Urban IS Leonard McCoy…Spock"
• Activation Network so far: imax, Spock, libidinous, with edges of co-occurrence count 1
68. Constructing the SAN
• Context window: "effects were stunning, and the performances were...wow.
Chris Pine IS James T. Kirk. Karl Urban IS Leonard McCoy…Spock"
• Nodes added to the Activation Network: Kirk, Chris Pine
• Activation Network so far: imax, Spock, libidinous, Kirk, Chris Pine, with edges of
co-occurrence count 1
69. Constructing the SAN
• Context window: "effects were stunning, and the performances were...wow.
Chris Pine IS James T. Kirk. Karl Urban IS Leonard McCoy…Spock"
• Activation Network so far: imax, Spock, libidinous, Kirk, Chris Pine, with edges of
co-occurrence count 1
• Repeat this procedure for all blogs
• End up with a connected SAN
• With some sense nodes and other words in context of entity
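The window-by-window construction above can be sketched as a co-occurrence graph. This is a minimal sketch under assumptions: each context window is already reduced to its keyword list, every pair of keywords in a window gets one co-occurrence, and the names `add_window` and `build_san` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def add_window(cooc, keywords):
    """Add one context window: every keyword pair co-occurs once."""
    for a, b in combinations(sorted(set(keywords)), 2):
        cooc[a][b] += 1
        cooc[b][a] += 1

def build_san(windows):
    """Build an undirected graph as nested dicts: cooc[i][j] = co-occurrence count."""
    cooc = defaultdict(lambda: defaultdict(int))
    for keywords in windows:
        add_window(cooc, keywords)
    return cooc

# The two windows from the slides; repeating over all blogs grows the same graph.
san = build_san([
    ["imax", "spock", "libidinous"],
    ["spock", "kirk", "chris pine"],
])
```

Here `spock` links the two windows, which is how separate windows end up in one connected network.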
70. Node and Edge Semantics
• Pre-adjustment phase
• Node weights: Sense nodes: 1;
Other nodes: 0.1
• ambiguous sense nodes
• alternate seeding methods:
distributional similarity with
unambiguous domain terms (movie,
theatre, imax, cinemas)
• Edge weights: co-occurrence
counts
38
71. Node and Edge Semantics
• Pre-adjustment phase
• Node weights: Sense nodes: 1;
Other nodes: 0.1
• ambiguous sense nodes
• alternate seeding methods:
distributional similarity with
unambiguous domain terms (movie,
theatre, imax, cinemas)
• Edge weights: co-occurrence
counts
[Figure: Constructing the Spreading Activation Network G from words co-occurring with e in D. Nodes: movie, Chris pine, Eric Bana, Sulu, franchise, Romulan, seats, starship, J. J. Abrams; five nodes are labeled with weight 1. Legend: sense hints Y, other vertices X]
38
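A minimal sketch of the pre-adjustment weighting, using the 1 and 0.1 values stated on the slide. The function name and the representation of sense hints as a set are assumptions; the alternate seeding by distributional similarity is not shown.

```python
SENSE_WEIGHT = 1.0   # sense nodes, as stated on the slide
OTHER_WEIGHT = 0.1   # all other nodes, as stated on the slide

def initial_weights(nodes, sense_hints):
    """Assign pre-adjustment node weights before any propagation."""
    return {n: SENSE_WEIGHT if n in sense_hints else OTHER_WEIGHT for n in nodes}

weights = initial_weights(["movie", "seats", "romulan"], {"movie"})
```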
72. Propagating Sense Evidences
• Pulse sense nodes and spread effect
• As many pulses (iterations) as number of sense nodes
[Figure: Spreading Activation Network G constructed from words co-occurring with e in D. Nodes: movie, Chris pine, Eric Bana, Sulu, franchise, Romulan, seats, starship, J. J. Abrams; five nodes are labeled with weight 1. Legend: sense hints Y, other vertices X]
39
73. Propagating Sense Evidences
• Pulse sense nodes and spread effect
• As many pulses (iterations) as number of sense nodes
• At every iteration
- A BFS walk starting at a sense node (weight 1)
- Revisiting nodes not edges
- Amplifying weights of visited nodes:
W[j] = W[j] + (W[i] * co-occ[i, j] * α)
39
74. Propagating Sense Evidences
• Pulse sense nodes and spread effect
• As many pulses (iterations) as number of sense nodes
• At every iteration
- A BFS walk starting at a sense node (weight 1)
- Revisiting nodes not edges
- Amplifying weights of visited nodes:
W[j] = W[j] + (W[i] * co-occ[i, j] * α)
• Collective spreading controlled by dampening factor α, co-occurrence thresholds
39
75. Propagating Sense Evidences
• Pulse sense nodes and spread effect
• As many pulses (iterations) as number of sense nodes
• At every iteration
- A BFS walk starting at a sense node (weight 1)
- Revisiting nodes not edges
- Amplifying weights of visited nodes:
W[j] = W[j] + (W[i] * co-occ[i, j] * α)
• Collective spreading controlled by dampening factor α, co-occurrence thresholds
• Post Propagation of Sense Evidences: Spreading Activation Theory
• Final activated portions of the network indicate word's relatedness to sense =
sense-biased LM
[Figure: network after propagation, showing activated portions and non-activated vertices. Nodes: movie, Chris pine, Eric Bana, Sulu, franchise, Romulan, seats, starship, J. J. Abrams]
39
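The propagation described on these slides can be sketched as below. This is one reading of the slides, not the thesis code: "revisiting nodes not edges" is taken to mean that a node may be amplified through several edges while each undirected edge fires once per pulse. The defaults for `alpha` and `min_cooc`, the `threshold` used to read off the LM, and all function names are assumptions.

```python
from collections import deque

def pulse(weights, cooc, start, alpha, min_cooc):
    """One BFS pulse from a sense node; each undirected edge fires at most once."""
    used_edges = set()
    seen = {start}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        for j, count in cooc.get(i, {}).items():
            edge = frozenset((i, j))
            if count < min_cooc or edge in used_edges:
                continue  # below the co-occurrence threshold, or already traversed
            used_edges.add(edge)
            # W[j] = W[j] + (W[i] * co-occ[i, j] * alpha)
            weights[j] += weights[i] * count * alpha
            if j not in seen:
                seen.add(j)
                queue.append(j)

def spread(weights, cooc, sense_nodes, alpha=0.5, min_cooc=1):
    """Run one pulse per sense node; returns a new weight map.
    Assumes `weights` has an entry for every node in `cooc`."""
    result = dict(weights)
    for s in sense_nodes:
        pulse(result, cooc, s, alpha, min_cooc)
    return result

def sense_biased_lm(weights, threshold):
    """Keep words whose final weight reaches the (assumed) activation threshold."""
    return {w: v for w, v in weights.items() if v >= threshold}
```

With `alpha` below 1, activation fades with distance from the sense node, so words far from any sense hint stay near their initial weight.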
76. Sense-biased LM
Entity: Star Trek (movie)
20 iterations (pulsed sense nodes)
900+ blogs, 35K+ words in co-occ graph
167 words in the LM
40
77. Sense-biased LM
Entity: Star Trek (movie)
20 iterations (pulsed sense nodes)
900+ blogs, 35K+ words in co-occ graph
167 words in the LM
Sense-biased Spreading Activation already
lends one type of clustering (separation of
words strongly related to our sense)
40
79. Step 2: Clustering using Extracted LM
Algorithmic Implementations
Vector Space Model
Typically: word, tfidf score
Here: word, sense relatedness score
Documents D represented in terms of LMe
[Figure: stack of blog documents D, each showing the Star Trek review excerpt ("10 minutes. That is all it took for JJ Abrams to make a believer out of me. …")]
di(LMe) = {w1, LMe(w1); ..; wx, LMe(wx)}
41
80. Step 2: Clustering using Extracted LM
Algorithmic Implementations
Vector Space Model
Typically: word, tfidf score
Here: word, sense relatedness score
http://realart.blogspot.com/2009/05/star-trek-balance-of-terror-from.html
http://susanisaacs.blogspot.com/2009/04/quantum-leap-convention.html
http://realart.blogspot.com/2009/05/star-trek-balance-of-terror-from.html
No Representation
http://semioblog.blogspot.com/2009/01/retrofuturo-web.html
http://wilwheaton.net/2006/05/learn_to_swim.php
41
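The document representation di(LMe) can be sketched as a sparse vector over LM words, weighted by sense relatedness instead of tf-idf. This is an illustration under assumptions: LM entries are treated as single lowercase tokens, a document containing no LM word is given no representation (one reading of the "No Representation" label), and cosine similarity is a stand-in for whichever clustering measure the thesis uses.

```python
import math
import re

def doc_vector(text, lm):
    """Represent a document as {word: LM score} over the LM words it contains.
    Returns None when no LM word occurs (no representation)."""
    tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
    vec = {w: score for w, score in lm.items() if w in tokens}
    return vec or None

def cosine(u, v):
    """Cosine similarity between two sparse vectors (assumed similarity measure)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical LM scores, for illustration only.
lm = {"spock": 2.0, "imax": 1.0}
a = doc_vector("Spock in IMAX", lm)
b = doc_vector("Spock was great", lm)
c = doc_vector("learn to swim", lm)   # contains no LM word
```

Documents with a representation can then be passed to any vector-space clustering routine.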