Language Independent Methods of Clustering Similar Contexts (with applications)
Ted Pedersen
University of Minnesota, Duluth
http://www.d.umn.edu/~tpederse
Language Independent Methods
Clustering Similar Contexts
Applications
Tutorial Outline
General Info
SenseClusters
Many thanks…
Background and Motivations
Headed and Headless Contexts
Headed Contexts (input)
Headed Contexts (output)
Headless Contexts (input)
Headless Contexts (output)
Web Search as Application
Name Discrimination
George Millers!
Email Foldering as Application
Clustering News as Application
What is it to be “similar”?
General Methodology
Identifying Lexical Features: Measures of Association and Tests of Significance
What are features?
Where do features come from?
Feature Selection
Lexical Features
Bigrams
Co-occurrences
Bigrams and Co-occurrences
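The difference between the two feature types can be made concrete with a small counting sketch. It assumes lowercased alphanumeric tokens, treats a bigram as an ordered pair of adjacent words, and treats a co-occurrence as an unordered pair of words whose span (both words included) is at most `window` tokens. These working definitions, the window size, and the helper names are illustrative assumptions, not the exact conventions of the tutorial or of NSP.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric runs (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def count_bigrams(tokens):
    """Ordered pairs of adjacent words: (w1, w2) is distinct from (w2, w1)."""
    return Counter(zip(tokens, tokens[1:]))


def count_cooccurrences(tokens, window=3):
    """Unordered pairs whose span, both words included, is at most `window` tokens."""
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            pairs[tuple(sorted((left, right)))] += 1
    return pairs


tokens = tokenize("There was an island curse of black magic cast by that voodoo child.")
print(count_bigrams(tokens)[("black", "magic")])         # 1
print(count_cooccurrences(tokens)[("curse", "island")])  # 1
```

Sorting each co-occurrence pair is what makes it order-insensitive; the bigram counter keeps order, so ("magic", "black") is never counted here.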
“occur together more often than expected by chance…”
2x2 Contingency Table (observed counts, expected counts in parentheses)
             Intelligence   !Intelligence       Total
Artificial   100 (1.2)      300 (398.8)         400
!Artificial  200 (298.8)    99,400 (99,301.2)   99,600
Total        300            99,700              100,000
Measures of Association
Measures of Association
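Which measures these two slides presented is not stated in the surviving text, but the computation behind such measures is standard. As a minimal sketch (function names are illustrative, not NSP's interface), the code below derives the expected counts for the Artificial Intelligence contingency table under an independence assumption and scores the pair with three common measures: the log-likelihood ratio G², Pearson's X², and pointwise mutual information.

```python
import math


def contingency(n11, n1p, np1, npp):
    """Build observed and expected 2x2 tables from n11 and the marginal totals."""
    observed = [[n11, n1p - n11], [np1 - n11, npp - n1p - np1 + n11]]
    rows, cols = [n1p, npp - n1p], [np1, npp - np1]
    # expected count under independence: row total * column total / sample size
    expected = [[rows[i] * cols[j] / npp for j in range(2)] for i in range(2)]
    return observed, expected


def log_likelihood(observed, expected):
    """G^2 = 2 * sum n_ij * ln(n_ij / m_ij); empty cells contribute 0."""
    return 2 * sum(
        n * math.log(n / m)
        for orow, erow in zip(observed, expected)
        for n, m in zip(orow, erow)
        if n > 0
    )


def pearson_chi2(observed, expected):
    """X^2 = sum (n_ij - m_ij)^2 / m_ij."""
    return sum(
        (n - m) ** 2 / m
        for orow, erow in zip(observed, expected)
        for n, m in zip(orow, erow)
    )


def pmi(observed, expected):
    """Pointwise mutual information of the top-left cell, in bits."""
    return math.log2(observed[0][0] / expected[0][0])


# counts for "Artificial Intelligence" from the contingency table above
obs, exp = contingency(n11=100, n1p=400, np1=300, npp=100_000)
print([[round(m, 1) for m in r] for r in exp])  # [[1.2, 398.8], [298.8, 99301.2]]
print(round(pmi(obs, exp), 2))                  # 6.38
print(round(log_likelihood(obs, exp), 1), round(pearson_chi2(obs, exp), 1))
```

The pair is observed 100 times where about 1.2 would be expected by chance, which is what drives all three scores up.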
Interpreting the Scores…
Interpreting the Scores…
Measures of Association
Measures Supported in NSP
NSP
Summary
Related Work
Context Representations: First and Second Order Methods
Once features are selected…
First Order Representation
Contexts
Unigram Feature Set
First Order Vectors of Unigrams
      island  black  curse  magic  child
Cxt1  1       1      1      1      1
Cxt2  0       1      0      1      1
Cxt3  0       0      0      0      0
Cxt4  1       0      1      0      1
Bigram Feature Set
First Order Vectors of Bigrams
      black magic  island curse  military might  serious error  voodoo child
Cxt1  1            1             0               0              1
Cxt2  1            0             0               0              1
Cxt3  0            0             1               1              0
Cxt4  0            1             1               0              1
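A minimal sketch of building these binary first-order vectors, assuming a feature counts as present if the word (or adjacent word pair) appears anywhere in the context. Only Cxt1's text survives in the slides ("There was an island curse of black magic cast by that voodoo child."), so the example uses that sentence; the feature lists are the column headers of the two tables, and the function names are hypothetical.

```python
import re

UNIGRAMS = ["island", "black", "curse", "magic", "child"]
BIGRAMS = ["black magic", "island curse", "military might", "serious error", "voodoo child"]


def tokenize(text):
    """Lowercase and keep alphabetic tokens only (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def first_order_vector(context, features):
    """1 if the unigram or adjacent-word bigram occurs in the context, else 0."""
    tokens = tokenize(context)
    present = set(tokens) | {" ".join(pair) for pair in zip(tokens, tokens[1:])}
    return [int(f in present) for f in features]


cxt1 = "There was an island curse of black magic cast by that voodoo child."
print(first_order_vector(cxt1, UNIGRAMS))  # [1, 1, 1, 1, 1]
print(first_order_vector(cxt1, BIGRAMS))   # [1, 1, 0, 0, 1]
```

Both outputs match the Cxt1 rows of the unigram and bigram tables.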
First Order Vectors
Second Order Features
Second Order Representation
Word by Word Matrix
          magic  curse  might  error  child
black     123.5  0      0      0      43.2
island    0      189.2  0      0      73.2
military  0      0      100.3  54.9   0
serious   0      21.2   0      89.2   0
voodoo    0      0      69.4   0      120.0
Word by Word Matrix
There was an island curse of black magic cast by that voodoo child.
          magic  curse  might  error  child
black     123.5  0      0      0      43.2
island    0      189.2  0      0      73.2
voodoo    0      0      69.4   0      120.0
Second Order Co-Occurrences
Second Order Representation
There was an island curse of black magic cast by that voodoo child.
      magic  curse  might  error  child
Cxt1  41.2   63.1   24.4   0      78.8
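The numbers on these slides are consistent with building the second-order context vector by averaging the word by word rows of the context words that have rows (island, black, voodoo). A minimal sketch under that assumption, using the matrix values from the slides; the function name is hypothetical.

```python
# rows of the word by word matrix; columns: magic, curse, might, error, child
WORD_BY_WORD = {
    "black":    [123.5, 0.0, 0.0, 0.0, 43.2],
    "island":   [0.0, 189.2, 0.0, 0.0, 73.2],
    "military": [0.0, 0.0, 100.3, 54.9, 0.0],
    "serious":  [0.0, 21.2, 0.0, 89.2, 0.0],
    "voodoo":   [0.0, 0.0, 69.4, 0.0, 120.0],
}


def second_order_vector(context, matrix):
    """Average the matrix rows of every context word that has a row; None if none do."""
    words = [w.strip(".,;:!?").lower() for w in context.split()]
    rows = [matrix[w] for w in words if w in matrix]
    if not rows:
        return None
    return [sum(col) / len(rows) for col in zip(*rows)]


cxt1 = "There was an island curse of black magic cast by that voodoo child."
print([round(x, 1) for x in second_order_vector(cxt1, WORD_BY_WORD)])
# [41.2, 63.1, 23.1, 0.0, 78.8]
```

With the matrix values as printed, the averages reproduce the slide's Cxt1 vector for magic, curse, error and child; the might column comes out as 23.1 rather than the 24.4 shown.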
Second Order Representation
Summary
Related Work
Dimensionality Reduction: Singular Value Decomposition
Motivation
Many Methods
Effect of SVD
Effect of SVD
How can SVD be used?
Word by Word Matrix
        apple  blood  cells  ibm  data  tissue  graphics  plasma
pc      2      0      0      1    3     0       0         0
body    0      3      0      0    0     2       0         1
disk    1      0      0      2    0     0       1         0
petri   0      2      1      0    0     2       0         1
lab     0      0      3      0    2     2       0         3
sales   0      0      0      2    3     0       1         0
linux   2      0      0      1    3     0       1         0
debt    0      0      0      2    3     0       2         0
organ   0      2      0      0    1     0       0         0
memory  0      0      2      1    2     2       1         0
box     1      0      3      0    0     0       2         4
Singular Value Decomposition A=UDV’
U -.52 .39 -.48 .02 .09 .41 -.09 .40 -.30 .08 .31 .43 -.26 -.39 -.6 .20 .00 -.00 -.00 -.02 -.01 .00 -.02 -.00 -.07 -.3 .14 -.49 -.07 .30 .25 .56 -.01 .08 .05 -.01 .24 -.08 .11 .46 .08 .03 -.04 .72 .09 -.31 -.01 .37 -.07 .01 -.21 -.31 -.34 -.45 -.68 .29 .00 .05 .83 .17 -.02 .25 -.45 .08 .03 .20 -.22 .31 -.60 .39 .13 .35 -.01 -.04 -.44 .08 .44 .59 -.49 .05 -.02 .63 .02 -.09 .52 -.2 .09 .35
D 0.00 0.00 0.00 0.66 1.26 2.30 2.52 3.25 3.99 6.36 9.19
V -.20 .22 -.07 -.10 -.87 -.07 -.06 .17 .19 -.26 .04 .03 .17 -.32 .02 .13 -.26 -.17 .06 -.04 .86 .50 -.58 .12 .09 -.18 -.27 -.18 -.12 -.47 .11 -.03 .12 .31 -.32 -.04 .64 -.45 -.14 -.23 .28 .07 -.23 -.62 -.59 .05 .02 -.12 .15 .11 .25 -.71 -.31 -.04 .08 .29 -.05 .05 .20 -.51 .09 -.03 .12 .31 -.01 .02 -.45 -.32 .50 .27 .49 -.02 .08 .21 -.06 .08 -.09 .52 -.45 -.01 .63 .03 -.12 -.31 .71 -.13 .39 -.12 .12 .15 .37 .07 .58 -.41 .15 .17 -.30 -.32 -.27 -.39 .11 .44 .25 .03 -.02 .26 .23 .39 .57 -.37 .04 .03 -.12 -.31 -.05 -.05 .04 .28 -.04 .08 .21
Word by Word Matrix After SVD
        apple  blood  cells  ibm  data  tissue  graphics  plasma
pc      .73    .00    .11    1.3  2.0   .01     .86       .09
body    .00    1.2    1.3    .00  .33   1.6     .00       1.5
disk    .76    .00    .01    1.3  2.1   .00     .91       .00
germ    .00    1.1    1.2    .00  .49   1.5     .00       1.4
lab     .21    1.7    2.0    .35  1.7   2.5     .18       2.3
sales   .73    .15    .39    1.3  2.2   .35     .85       .41
linux   .96    .00    .16    1.7  2.7   .03     1.1       .13
debt    1.2    .00    .00    2.1  3.2   .00     1.5       .00
organ   .00    .84    .00    .77  1.2   .17     .00       .00
memory  .77    .85    .72    .86  1.7   .98     1.0       1.1
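To see where a reduced matrix like this comes from, here is a hedged sketch using NumPy's `numpy.linalg.svd` on the original word by word matrix (rows pc through box, columns apple through plasma, as read from the slide). It keeps the k largest singular values and multiplies U_k D_k V_kᵀ back together. The number of retained dimensions k is an arbitrary assumption; the slides do not say which k produced the reduced matrix, so the output is not claimed to match it.

```python
import numpy as np

COLUMNS = ["apple", "blood", "cells", "ibm", "data", "tissue", "graphics", "plasma"]
ROWS = ["pc", "body", "disk", "petri", "lab", "sales", "linux", "debt", "organ", "memory", "box"]
A = np.array([
    [2, 0, 0, 1, 3, 0, 0, 0],  # pc
    [0, 3, 0, 0, 0, 2, 0, 1],  # body
    [1, 0, 0, 2, 0, 0, 1, 0],  # disk
    [0, 2, 1, 0, 0, 2, 0, 1],  # petri
    [0, 0, 3, 0, 2, 2, 0, 3],  # lab
    [0, 0, 0, 2, 3, 0, 1, 0],  # sales
    [2, 0, 0, 1, 3, 0, 1, 0],  # linux
    [0, 0, 0, 2, 3, 0, 2, 0],  # debt
    [0, 2, 0, 0, 1, 0, 0, 0],  # organ
    [0, 0, 2, 1, 2, 2, 1, 0],  # memory
    [1, 0, 3, 0, 0, 0, 2, 4],  # box
], dtype=float)


def truncated_svd(matrix, k):
    """Return the rank-k approximation U_k D_k V_k^T and all singular values."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)  # s is sorted, largest first
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :], s


approx, singular_values = truncated_svd(A, k=3)  # k = 3 is an arbitrary choice
print(np.round(singular_values, 2))
print(dict(zip(COLUMNS, np.round(approx[ROWS.index("pc")], 2).tolist())))
```

Dropping the small singular values smooths the counts, so words that never co-occurred with a column (zeros in the original) can pick up nonzero values, as in the reduced matrix above.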
Second Order Representation
       apple  blood  cells  ibm  data  tissue  graphics  plasma
disk   .76    .00    .01    1.3  2.1   .00     .91       .00
linux  .96    .00    .16    1.7  2.7   .03     1.1       .13
Relationship to LSA
Feature by Context Representation
                Cxt1  Cxt2  Cxt3  Cxt4
black magic     1     1     0     1
island curse    1     0     0     1
military might  0     0     1     0
voodoo child    1     1     0     1
serious error   0     0     1     0
Clustering: Partitional Methods, Cluster Stopping, Cluster Labeling
Many, many methods…
General Methodology
Agglomerative Clustering
Measuring Similarity
Agglomerative Clustering
Average Link Clustering
      S1  S2  S3  S4
S1        3   4   2
S2    3       2   0
S3    4   2       1
S4    2   0   1
Merges: S1 and S3 first (S1S3), then S1S3 and S2 (S1S3S2), leaving S4 as a separate cluster.
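A minimal sketch of average-link agglomerative clustering, assuming the pairwise similarities read from this slide: sim(S1,S2) = 3, sim(S1,S3) = 4, sim(S1,S4) = 2, sim(S2,S3) = 2, sim(S2,S4) = 0, sim(S3,S4) = 1. At each step it merges the two clusters with the highest mean cross-cluster similarity. Stopping at two clusters is an assumption chosen to match the final panel; the function names are hypothetical.

```python
from itertools import combinations

# pairwise similarities (symmetric, so stored once per pair)
SIM = {
    frozenset(("S1", "S2")): 3, frozenset(("S1", "S3")): 4, frozenset(("S1", "S4")): 2,
    frozenset(("S2", "S3")): 2, frozenset(("S2", "S4")): 0, frozenset(("S3", "S4")): 1,
}


def average_similarity(a, b, sim):
    """Mean similarity over all cross-cluster pairs of members."""
    return sum(sim[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))


def average_link(items, sim, n_clusters):
    """Repeatedly merge the two most similar clusters until n_clusters remain."""
    clusters = [(item,) for item in items]
    merges = []
    while len(clusters) > n_clusters:
        a, b = max(combinations(clusters, 2), key=lambda p: average_similarity(*p, sim))
        clusters.remove(a)
        clusters.remove(b)
        merged = tuple(sorted(a + b))
        clusters.append(merged)
        merges.append(merged)
    return clusters, merges


clusters, merges = average_link(["S1", "S2", "S3", "S4"], SIM, n_clusters=2)
print(merges)    # [('S1', 'S3'), ('S1', 'S2', 'S3')]
print(clusters)  # [('S4',), ('S1', 'S2', 'S3')]
```

After the first merge, S2 averages (3 + 2) / 2 = 2.5 with S1S3 while S4 averages only (2 + 1) / 2 = 1.5, which is why S2 joins next.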
Partitional Methods
Partitional Methods
Vectors to be clustered
Random Initial Centroids (k=2)
Assignment of Clusters
Recalculation of Centroids
Reassignment of Clusters
Recalculation of Centroids
Reassignment of Clusters
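The slide sequence above (random initial centroids, assignment, recalculation of centroids, reassignment) is the standard k-means loop. A minimal sketch, assuming dense vectors and Euclidean distance; SenseClusters may use a different similarity measure and initialization, and the function name and sample points are hypothetical.

```python
import math
import random


def kmeans(vectors, initial_centroids, max_iter=100):
    """Standard k-means with Euclidean distance; returns (assignment, centroids)."""
    centroids = [list(c) for c in initial_centroids]  # copy so the caller's list is untouched
    assignment = None
    for _ in range(max_iter):
        # assignment step: each vector joins its nearest centroid
        new_assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # nothing moved, so the clustering has converged
        assignment = new_assignment
        # update step: recalculate each centroid as the mean of its members
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids


points = [[1, 1], [1.5, 2], [3, 4], [5, 7], [3.5, 5], [4.5, 5], [3.5, 4.5]]
rng = random.Random(0)  # "random initial centroids": k = 2 points drawn from the data
assignment, centroids = kmeans(points, rng.sample(points, k=2))
print(assignment, centroids)
```

Because the result depends on the starting centroids, different seeds can give different partitions; this is one reason the criterion functions that follow are used to compare solutions.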
Partitional Criterion Functions
Intra Cluster Similarity
Contexts to be Clustered
Ball of String (I1 Internal Criterion Function)
Flower (I2 Internal Criterion Function)
Inter Cluster Similarity
The Fan (E1 External Criterion Function)
Hybrid Criterion Functions
Cluster Stopping
Cluster Stopping
Criterion Functions Can Help
SenseClusters’ Approach to Cluster Stopping
H2 versus k T. Blair – V. Putin – S. Hussein
PK2
PK2 predicts 3 senses T. Blair – V. Putin – S. Hussein
PK3
PK3 predicts 3 senses T. Blair – V. Putin – S. Hussein
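A minimal sketch of the PK2 and PK3 ratios, assuming the definitions given in the SenseClusters documentation as we understand them: PK2(k) = crfun(k) / crfun(k−1) and PK3(k) = 2·crfun(k) / (crfun(k−1) + crfun(k+1)), where crfun(k) is the criterion function value (such as H2) for a k-way clustering. The crfun values below are invented for illustration, and the rule SenseClusters uses to turn these ratios into a predicted number of senses is not reproduced here.

```python
def pk2(crfun):
    """PK2(k) = crfun(k) / crfun(k-1) for every k whose predecessor is available."""
    return {k: crfun[k] / crfun[k - 1] for k in sorted(crfun) if k - 1 in crfun}


def pk3(crfun):
    """PK3(k) = 2 * crfun(k) / (crfun(k-1) + crfun(k+1)) for interior values of k."""
    return {
        k: 2 * crfun[k] / (crfun[k - 1] + crfun[k + 1])
        for k in sorted(crfun)
        if k - 1 in crfun and k + 1 in crfun
    }


# hypothetical criterion function scores for k = 1..5; not taken from the slides
crfun = {1: 10.0, 2: 14.0, 3: 20.0, 4: 21.0, 5: 21.5}
print({k: round(v, 3) for k, v in pk2(crfun).items()})  # {2: 1.4, 3: 1.429, 4: 1.05, 5: 1.024}
print({k: round(v, 3) for k, v in pk3(crfun).items()})  # {2: 0.933, 3: 1.143, 4: 1.012}
```

Both ratios flatten toward 1 once adding clusters stops improving the criterion function much; in this invented series that happens after k = 3.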
Cluster Labeling
Cluster Labeling
Results of Clustering
Label Types
Evaluation Techniques: Comparison to Gold Standard Data
Evaluation
Evaluation
Evaluation
Baseline Algorithm
Baseline Performance
        S1   S2   S3   Totals
C1      0    0    0    0
C2      0    0    0    0
C3      80   35   55   170
Totals  80   35   55   170

        S3   S2   S1   Totals
C1      0    0    0    0
C2      0    0    0    0
C3      55   35   80   170
Totals  55   35   80   170
Evaluation
        S1   S2   S3   Totals
C1      10   30   5    45
C2      20   0    40   60
C3      50   5    10   65
Totals  80   35   55   170
Evaluation
        S2   S3   S1   Totals
C1      30   5    10   45
C2      0    40   20   60
C3      5    10   50   65
Totals  35   55   80   170
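The reordered evaluation table corresponds to choosing the one-to-one mapping of clusters to senses that puts the most contexts on the diagonal. A brute-force sketch using the counts from the evaluation slides: it tries every permutation, which is only feasible for a handful of clusters and assumes there are no more clusters than senses; SenseClusters' own mapping procedure may differ. The score reported here is simply the fraction of contexts on the diagonal, and the baseline places every context in a single cluster.

```python
from itertools import permutations

SENSES = ["S1", "S2", "S3"]
CLUSTERS = ["C1", "C2", "C3"]
# rows are clusters, columns are senses (S1, S2, S3)
COUNTS = [
    [10, 30, 5],   # C1
    [20, 0, 40],   # C2
    [50, 5, 10],   # C3
]


def best_mapping(counts):
    """Try every one-to-one cluster->sense assignment; keep the largest diagonal."""
    n_senses = len(counts[0])
    best = max(
        permutations(range(n_senses), len(counts)),
        key=lambda perm: sum(counts[c][s] for c, s in enumerate(perm)),
    )
    return best, sum(counts[c][s] for c, s in enumerate(best))


perm, agreed = best_mapping(COUNTS)
total = sum(map(sum, COUNTS))
print({CLUSTERS[c]: SENSES[s] for c, s in enumerate(perm)})  # {'C1': 'S2', 'C2': 'S3', 'C3': 'S1'}
print(agreed, total, round(agreed / total, 3))               # 120 170 0.706

# majority-sense baseline: every context placed in one cluster
sense_totals = [sum(col) for col in zip(*COUNTS)]
print(round(max(sense_totals) / total, 3))                   # 0.471
```

The mapping found is the column order of the reordered table (S2, S3, S1), and 120 of 170 contexts land on its diagonal, compared with 80 of 170 for the one-cluster baseline.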
Analysis
Practical Session: Experiments with SenseClusters
Things to Try
Experimental Data
Creating Experimental Data
Headed Clustering
Headless Contexts
Thank you!

EACL 2006 Pedersen

  • 32. Identifying Lexical Features: Measures of Association and Tests of Significance
  • 41. 2x2 Contingency Table
                    Intelligence  !Intelligence    Total
      Artificial             100                     400
      !Artificial
      Total                  300                 100,000
  • 42. 2x2 Contingency Table
                    Intelligence  !Intelligence    Total
      Artificial             100            300      400
      !Artificial            200         99,400   99,600
      Total                  300         99,700  100,000
  • 43. 2x2 Contingency Table
      Each cell gives the observed count, followed by the expected count under independence in parentheses.
                          Intelligence         !Intelligence    Total
      Artificial           100.0 (1.2)         300.0 (398.8)      400
      !Artificial        200.0 (298.8)   99,400.0 (99,301.2)   99,600
      Total                        300                99,700  100,000
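Under independence, each expected count equals the row total times the column total, divided by the grand total. For Artificial Intelligence that is 400 × 300 / 100,000 = 1.2.

The minimal Python sketch below is not the tutorial's own software. It rebuilds the full table from the joint count and the two marginals. It then scores the table with two common tests of association: the log-likelihood ratio G² and Pearson's chi-squared. The function names are illustrative.

```python
import math


def contingency_table(n11, n1p, np1, npp):
    """Build observed and expected 2x2 tables.

    n11: joint count (Artificial Intelligence)
    n1p: row marginal (Artificial ...), np1: column marginal (... Intelligence)
    npp: total number of bigrams in the sample
    """
    observed = [
        [n11, n1p - n11],
        [np1 - n11, npp - n1p - np1 + n11],
    ]
    row_totals = [n1p, npp - n1p]
    col_totals = [np1, npp - np1]
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / npp for c in col_totals] for r in row_totals]
    return observed, expected


def log_likelihood_ratio(observed, expected):
    """G^2 = 2 * sum(O * ln(O / E)); cells with O == 0 contribute nothing."""
    return 2 * sum(
        o * math.log(o / e)
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
        if o > 0
    )


def pearson_chi_squared(observed, expected):
    """X^2 = sum((O - E)^2 / E) over all four cells."""
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )


observed, expected = contingency_table(n11=100, n1p=400, np1=300, npp=100_000)
print(observed)  # [[100, 300], [200, 99400]]
print([[round(e, 1) for e in row] for row in expected])  # [[1.2, 398.8], [298.8, 99301.2]]
print(round(log_likelihood_ratio(observed, expected), 1))
print(round(pearson_chi_squared(observed, expected), 1))
```

Both scores are very large here because the pair occurs 100 times where independence predicts about 1.2.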
  • 54. Context Representations: First and Second Order Methods
  • 59. First Order Vectors of Unigrams
              island   black   curse   magic   child
      Cxt1         1       1       1       1       1
      Cxt2         0       1       0       1       1
      Cxt3         0       0       0       0       0
      Cxt4         1       0       1       0       1
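A first order representation records which features occur directly in each context. The sketch below builds binary unigram vectors like the ones above. It assumes a simple lowercase tokenizer. It uses the sentence that appears later in these slides as Cxt1. The tokenizer and function names are illustrative, not SenseClusters' own.

```python
import re


def tokenize(text):
    """Lowercase and keep runs of letters (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def unigram_vector(context, features):
    """Binary first order vector: 1 if the feature word occurs in the context."""
    tokens = set(tokenize(context))
    return [1 if feature in tokens else 0 for feature in features]


FEATURES = ["island", "black", "curse", "magic", "child"]
cxt1 = "There was an island curse of black magic cast by that voodoo child."
print(unigram_vector(cxt1, FEATURES))  # [1, 1, 1, 1, 1]
```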
  • 61. First Order Vectors of Bigrams
              black magic  island curse  military might  serious error  voodoo child
      Cxt1              1             1               0              0             1
      Cxt2              1             0               0              0             1
      Cxt3              0             0               1              1             0
      Cxt4              0             1               1              0             1
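Bigram vectors work the same way. Here a feature counts as present only when its two words occur next to each other, in order. The sketch below relies on that adjacency assumption; a wider window would be another reasonable choice. It again uses a simple, illustrative tokenizer.

```python
import re


def tokenize(text):
    """Lowercase and keep runs of letters (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def bigram_vector(context, features):
    """Binary vector: 1 if a bigram feature's two words occur adjacently, in order."""
    tokens = tokenize(context)
    bigrams = {f"{first} {second}" for first, second in zip(tokens, tokens[1:])}
    return [1 if feature in bigrams else 0 for feature in features]


FEATURES = ["black magic", "island curse", "military might", "serious error", "voodoo child"]
cxt1 = "There was an island curse of black magic cast by that voodoo child."
print(bigram_vector(cxt1, FEATURES))  # [1, 1, 0, 0, 1]
```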
  • 65. Word by Word Matrix
                        magic   curse   might   error   child
      black             123.5       0       0       0    43.2
      island                0   189.2       0       0    73.2
      military              0       0   100.3    54.9       0
      serious               0    21.2       0    89.2       0
      voodoo                0       0    69.4       0   120.0
  • 67. There was an island curse of black magic cast by that voodoo child.
                        magic   curse   might   error   child
      black             123.5       0       0       0    43.2
      island                0   189.2       0       0    73.2
      voodoo                0       0    69.4       0   120.0
  • 70. There was an island curse of black magic cast by that voodoo child.
                        magic   curse   might   error   child
      Cxt1               41.2    63.1    24.4       0    78.8
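A second order context vector is built from the word vectors of the words in the context, not from the words themselves. For the sentence on slide 70, averaging the black, island, and voodoo rows of the slide 65 matrix reproduces the slide's magic, curse, and child entries (41.2, 63.1, 78.8).

The sketch below assumes averaging as the combining step and uses a simple, illustrative tokenizer. A word that occurs twice in the context is counted twice.

```python
import re

# Rows of the word by word matrix from slide 65; columns are the features below.
FEATURES = ["magic", "curse", "might", "error", "child"]
WORD_VECTORS = {
    "black": [123.5, 0.0, 0.0, 0.0, 43.2],
    "island": [0.0, 189.2, 0.0, 0.0, 73.2],
    "military": [0.0, 0.0, 100.3, 54.9, 0.0],
    "serious": [0.0, 21.2, 0.0, 89.2, 0.0],
    "voodoo": [0.0, 0.0, 69.4, 0.0, 120.0],
}


def tokenize(text):
    """Lowercase and keep runs of letters (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def second_order_vector(context):
    """Average the word vectors of every context word that has a row in the matrix."""
    rows = [WORD_VECTORS[w] for w in tokenize(context) if w in WORD_VECTORS]
    if not rows:
        return [0.0] * len(FEATURES)
    return [sum(column) / len(rows) for column in zip(*rows)]


cxt1 = "There was an island curse of black magic cast by that voodoo child."
print([round(x, 1) for x in second_order_vector(cxt1)])  # [41.2, 63.1, 23.1, 0.0, 78.8]
```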
  • 74. Dimensionality Reduction: Singular Value Decomposition
  • 80. Word by Word Matrix
                apple  blood  cells  ibm  data  tissue  graphics  plasma
      pc            2      0      0    1     3       0         0       0
      body          0      3      0    0     0       2         0       1
      disk          1      0      0    2     0       0         1       0
      petri         0      2      1    0     0       2         0       1
      lab           0      0      3    0     2       2         0       3
      sales         0      0      0    2     3       0         1       0
      linux         2      0      0    1     3       0         1       0
      debt          0      0      0    2     3       0         2       0
      organ         0      2      0    0     1       0         0       0
      memory        0      0      2    1     2       2         1       0
      box           1      0      3    0     0       0         2       4
  • 82. U (matrix of left singular vectors)
  • 83. D (singular values): 0.00 0.00 0.00 0.66 1.26 2.30 2.52 3.25 3.99 6.36 9.19
  • 84. V (matrix of right singular vectors)
  • 85. Word by Word Matrix After SVD
                   apple     blood     cells       ibm      data    tissue  graphics    plasma
      pc             .73       .00       .11       1.3       2.0       .01       .86       .09
      body           .00       1.2       1.3       .00       .33       1.6       .00       1.5
      disk           .76       .00       .01       1.3       2.1       .00       .91       .00
      germ           .00       1.1       1.2       .00       .49       1.5       .00       1.4
      lab            .21       1.7       2.0       .35       1.7       2.5       .18       2.3
      sales          .73       .15       .39       1.3       2.2       .35       .85       .41
      linux          .96       .00       .16       1.7       2.7       .03       1.1       .13
      debt           1.2       .00       .00       2.1       3.2       .00       1.5       .00
      organ          .00       .84       .00       .77       1.2       .17       .00       .00
      memory         .77       .85       .72       .86       1.7       .98       1.0       1.1
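The reduced matrix on slide 85 is a low-rank approximation of the matrix on slide 80. Below is a minimal NumPy sketch of that step. The number of dimensions kept for slide 85 is not stated, so k = 2 here is a hypothetical choice, and its output need not match slide 85.

As a consistency check, the squared singular values on slide 83 sum to about 165. That is exactly the sum of the squared entries of the slide 80 matrix, as it must be for any SVD.

```python
import numpy as np

# Word by word matrix from slide 80: rows are words, columns are features.
FEATURES = ["apple", "blood", "cells", "ibm", "data", "tissue", "graphics", "plasma"]
WORDS = ["pc", "body", "disk", "petri", "lab", "sales", "linux",
         "debt", "organ", "memory", "box"]
A = np.array([
    [2, 0, 0, 1, 3, 0, 0, 0],  # pc
    [0, 3, 0, 0, 0, 2, 0, 1],  # body
    [1, 0, 0, 2, 0, 0, 1, 0],  # disk
    [0, 2, 1, 0, 0, 2, 0, 1],  # petri
    [0, 0, 3, 0, 2, 2, 0, 3],  # lab
    [0, 0, 0, 2, 3, 0, 1, 0],  # sales
    [2, 0, 0, 1, 3, 0, 1, 0],  # linux
    [0, 0, 0, 2, 3, 0, 2, 0],  # debt
    [0, 2, 0, 0, 1, 0, 0, 0],  # organ
    [0, 0, 2, 1, 2, 2, 1, 0],  # memory
    [1, 0, 3, 0, 0, 0, 2, 4],  # box
], dtype=float)

# Thin SVD: A = U @ diag(s) @ Vt, with singular values s in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)


def rank_k_approximation(U, s, Vt, k):
    """Keep only the k largest singular values and their singular vectors."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


A_k = rank_k_approximation(U, s, Vt, k=2)  # k = 2 is an arbitrary choice
print(np.round(s, 2))
print(np.round(A_k[WORDS.index("pc")], 2))  # smoothed feature vector for "pc"
```

Keeping fewer dimensions smooths the counts: words that share features (pc, linux, sales) acquire nonzero values for features they never co-occurred with directly.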
  • 88. Feature by Context Representation
                              Cxt1  Cxt2  Cxt3  Cxt4
      black magic                1     1     0     1
      island curse               1     0     0     1
      military might             0     0     1     0
      voodoo child               1     1     0     1
      serious error              0     0     1     0
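In this representation each feature is a vector over the contexts it occurs in. Features that occur in the same contexts can therefore be compared and clustered. The sketch below uses cosine similarity over the slide 88 matrix; cosine is a common choice, but these slides do not specify it.

```python
import math

# Feature by context matrix from slide 88; columns are Cxt1..Cxt4.
FEATURE_BY_CONTEXT = {
    "black magic": [1, 1, 0, 1],
    "island curse": [1, 0, 0, 1],
    "military might": [0, 0, 1, 0],
    "voodoo child": [1, 1, 0, 1],
    "serious error": [0, 0, 1, 0],
}


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 0.0 if norm_u == 0 or norm_v == 0 else dot / (norm_u * norm_v)


rows = FEATURE_BY_CONTEXT
print(round(cosine(rows["black magic"], rows["voodoo child"]), 3))  # 1.0
print(round(cosine(rows["black magic"], rows["island curse"]), 3))  # 0.816
print(round(cosine(rows["island curse"], rows["military might"]), 3))  # 0.0
```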
  • 90. Clustering: Partitional Methods, Cluster Stopping, Cluster Labeling
  • 96. Average Link Clustering
                S1  S2  S3  S4
      S1         -   3   4   2
      S2         3   -   2   0
      S3         4   2   -   1
      S4         2   0   1   -
      S1 and S3 (similarity 4) merge first, leaving the clusters S1S3, S2, S4.
      S1S3 and S2 merge next, leaving the clusters S1S3S2, S4.
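Average link defines the similarity of two clusters as the mean similarity over all pairs with one member in each cluster. After S1 and S3 merge, S1S3 versus S2 is (3 + 2) / 2 = 2.5 and S1S3 versus S4 is (2 + 1) / 2 = 1.5, so S2 joins next.

The sketch below implements that agglomerative loop using the pairwise similarities from slide 96. It is a plain illustration, not SenseClusters' own clustering code.

```python
from itertools import combinations

# Pairwise similarities between S1..S4 on slide 96.
SIMILARITY = {
    frozenset({"S1", "S2"}): 3,
    frozenset({"S1", "S3"}): 4,
    frozenset({"S1", "S4"}): 2,
    frozenset({"S2", "S3"}): 2,
    frozenset({"S2", "S4"}): 0,
    frozenset({"S3", "S4"}): 1,
}


def average_similarity(c1, c2):
    """Average link: mean similarity over all cross-cluster pairs."""
    total = sum(SIMILARITY[frozenset({a, b})] for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def average_link(items, num_clusters=1):
    """Repeatedly merge the two most similar clusters until num_clusters remain."""
    clusters = [(item,) for item in items]
    merges = []
    while len(clusters) > num_clusters:
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: average_similarity(*pair))
        score = average_similarity(c1, c2)
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((sorted(c1 + c2), score))
    return clusters, merges


clusters, merges = average_link(["S1", "S2", "S3", "S4"])
for members, score in merges:
    print(members, score)
# ['S1', 'S3'] 4.0
# ['S1', 'S2', 'S3'] 2.5
# ['S1', 'S2', 'S3', 'S4'] 1.0
```

Passing num_clusters=2 stops the loop with {S1, S2, S3} and {S4} as the two clusters.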
  • 99. Vectors to be clustered
  • 108. Contexts to be Clustered
  • 109. Ball of String (I1 Internal Criterion Function)
  • 110. Flower (I2 Internal Criterion Function)
  • 112. The Fan (E1 External Criterion Function)
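Slides 109, 110, and 112 name criterion functions from CLUTO, the clustering package SenseClusters uses. As we understand CLUTO's definitions:

- every context vector is scaled to unit length;
- D_r is the sum of the n_r vectors in cluster r, and D is the sum of all vectors;
- I1 = Σ_r ‖D_r‖² / n_r and I2 = Σ_r ‖D_r‖ are maximized;
- E1 = Σ_r n_r (D_r · D) / ‖D_r‖ is minimized.

The sketch below evaluates those formulas for a given clustering. It is not CLUTO's code, and the example data are hypothetical.

```python
import numpy as np


def criterion_values(vectors, labels):
    """Return (I1, I2, E1) for a clustering, per the definitions stated above.

    vectors: one row per context; labels: cluster id for each row.
    """
    X = np.asarray(vectors, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-length rows
    labels = np.asarray(labels)
    D = X.sum(axis=0)  # composite vector of the whole collection
    i1 = i2 = e1 = 0.0
    for r in np.unique(labels):
        members = X[labels == r]
        n_r = len(members)
        D_r = members.sum(axis=0)  # composite vector of cluster r
        norm_r = np.linalg.norm(D_r)
        i1 += norm_r ** 2 / n_r  # maximize: average pairwise similarity within clusters
        i2 += norm_r  # maximize: similarity of members to their own centroid
        e1 += n_r * (D_r @ D) / norm_r  # minimize: pushes cluster centroids away from the overall one
    return float(i1), float(i2), float(e1)


# Hypothetical example: two vectors with the same direction in one cluster,
# and one orthogonal vector on its own.
print(criterion_values([[1, 0], [2, 0], [0, 3]], [0, 0, 1]))  # (3.0, 3.0, 5.0)
```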
  • 118. H2 versus k (T. Blair – V. Putin – S. Hussein)
  • 120. PK2 predicts 3 senses (T. Blair – V. Putin – S. Hussein)
  • 122. PK3 predicts 3 senses (T. Blair – V. Putin – S. Hussein)
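PK2 and PK3 are cluster stopping measures computed from how a criterion function changes as the number of clusters k grows. As we understand their definitions in the SenseClusters work:

- PK2(k) = crit(k) / crit(k − 1);
- PK3(k) = 2 · crit(k) / (crit(k − 1) + crit(k + 1)).

The sketch below only computes these ratios from hypothetical criterion values. The rule for turning them into a predicted number of clusters is not given on these slides, so it is left out.

```python
def pk2(crit):
    """PK2(k) = crit(k) / crit(k - 1), for each k that has a predecessor."""
    return {k: crit[k] / crit[k - 1] for k in sorted(crit) if k - 1 in crit}


def pk3(crit):
    """PK3(k) = 2 * crit(k) / (crit(k - 1) + crit(k + 1)), for interior k."""
    return {
        k: 2 * crit[k] / (crit[k - 1] + crit[k + 1])
        for k in sorted(crit)
        if k - 1 in crit and k + 1 in crit
    }


# Hypothetical criterion function values (for example I2) for k = 1..5.
crit = {1: 1.0, 2: 2.0, 3: 3.0, 4: 3.5, 5: 4.0}
print({k: round(v, 3) for k, v in pk2(crit).items()})  # {2: 2.0, 3: 1.5, 4: 1.167, 5: 1.143}
print({k: round(v, 3) for k, v in pk3(crit).items()})  # {2: 1.0, 3: 1.091, 4: 1.0}
```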
  • 128. Evaluation Techniques: Comparison to Gold Standard Data
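When gold standard senses are available, one common way to score a clustering works in two steps. First, map each cluster to a distinct sense so that the number of correctly assigned contexts is as large as possible. Then report that number as a fraction of all contexts.

The brute-force sketch below assumes as many clusters as senses, and few enough of them that trying every assignment is cheap. The confusion matrix is made up for illustration. For larger problems, the Hungarian algorithm finds the same optimal assignment without enumerating permutations.

```python
from itertools import permutations


def best_mapping_accuracy(confusion):
    """Best one-to-one cluster-to-sense mapping and the accuracy it yields.

    confusion[i][j] = number of contexts in cluster i whose gold sense is j.
    Assumes a square matrix small enough to try every permutation.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    best_perm, best_correct = None, -1
    for perm in permutations(range(n)):  # perm[i] = sense assigned to cluster i
        correct = sum(confusion[i][perm[i]] for i in range(n))
        if correct > best_correct:
            best_perm, best_correct = perm, correct
    return best_perm, best_correct / total


# Hypothetical counts: 3 clusters (rows) by 3 gold senses (columns).
CONFUSION = [
    [2, 10, 1],
    [8, 1, 0],
    [0, 2, 6],
]
print(best_mapping_accuracy(CONFUSION))  # ((1, 0, 2), 0.8)
```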
  • 137. Practical Session: Experiments with SenseClusters