Networks and Natural Language Processing

Presented by: Ahmed Magdy Ezzeldin
Graphs in NLP

● Graphs are used in many NLP applications, such as:
   - Text summarization
   - Syntactic parsing
   - Word sense disambiguation
   - Ontology construction
   - Sentiment and subjectivity analysis
   - Text clustering
● Associative or semantic networks are used to represent language units and their relations, where language units are the vertices (nodes) and the relations are the edges (links).
Networks are Graphs
● Nodes are vertices
● Links are edges

- Nodes can represent text units of any kind: words, collocations, word senses, sentences, documents
- Graph nodes do not have to be of the same category
- Edges can represent relations: co-occurrence, collocation, syntactic dependency, lexical similarity
Outline
●   Syntax
     1- Dependency Parsing
     2- Prepositional Phrase Attachment
     3- Co-reference Resolution

●   Lexical Semantics
     1- Lexical Networks
     2- Semantic Similarity and Relatedness
     3- Word Sense Disambiguation
     4- Sentiment and Subjectivity Analysis

●   Other Applications
     1- Summarization
     2- Semi-supervised Passage Retrieval
     3- Keyword Extraction
Syntax
1- Dependency Parsing
● An approach to sentence parsing.
● The dependency tree of a sentence is a directed subgraph of the full graph connecting all words in the sentence.
● This subgraph is a tree.
● The root of the tree is the main predicate, which takes as arguments its child nodes.
● (McDonald et al., 2005) built a parser that finds the highest-scoring tree using the Chu-Liu/Edmonds (CLE) maximum spanning tree (MST) algorithm on a directed graph.

● Each node picks its highest-scoring incoming edge, which yields either a spanning tree or a set of cycles.
● CLE collapses each cycle into a single node.
● CLE runs in O(n^2).
● If the result is still not a tree covering all nodes, collapsing is repeated until it is; the MST is then constructed by reversing the procedure and expanding the collapsed nodes.

● McDonald et al. achieved excellent results on a standard English data set and even better results on Czech (a free word order language).
2- Prepositional Phrase Attachment

● (Toutanova et al., 2004): a preposition like "with" is either attached to the main predicate (high verbal attachment) or to the noun phrase before it (low nominal attachment).
- "I ate pizza with olives."
- "I ate pizza with a knife."

● They proposed a semi-supervised learning process in which a graph of nouns and verbs is constructed, and two words that appear in the same context are connected with an edge.
● A random walk is run until convergence.

● The method reached 87.54% classification accuracy, near the human performance of 88.20%.
3- Co-reference Resolution
● Identifying relations between entity references in a text.
● References can be nouns or pronouns.
● Approximate the correct assignment of references to entities in a text using a graph-cut algorithm.

Method:
● A graph is constructed for each entity.
● Every entity is linked to all its possible co-references with weighted edges, where the weights are the confidence of each co-reference.
● Min-cut partitioning separates each entity and its co-references.
Lexical Semantics
Semantic analysis, machine translation, information retrieval, question answering, knowledge acquisition, word sense disambiguation, semantic role labeling, textual entailment, lexical acquisition, semantic relations
1- Lexical Networks

a- Unsupervised lexical acquisition (Widdows and Dorow, 2002)
Goal: build semantic classes automatically from raw corpora.
Method:
● Build a co-occurrence graph from the British National Corpus where nodes are words linked by conjunction (and/or).
● Over 100,000 nodes and over half a million edges.
● Representative nouns are manually selected and placed in a seed set.
● The word with the largest number of links to the seed set is added to the seed set, and the step is repeated.
Result:
82% accuracy, far better than previous methods. The drawback of this method is low coverage, as it is limited to words in a conjunction relation.
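The seed-growing step above can be sketched as a greedy loop over a small adjacency map. The graph and words below are invented for illustration; the real method runs on the British National Corpus:

```python
def grow_class(graph, seed, steps):
    """Greedily add the node with the most links into the current class.
    graph: word -> set of words it co-occurs with in and/or conjunctions."""
    cls = set(seed)
    for _ in range(steps):
        candidates = [n for n in graph if n not in cls]
        if not candidates:
            break
        best = max(candidates, key=lambda n: len(graph[n] & cls))
        if not graph[best] & cls:       # no candidate touches the class
            break
        cls.add(best)
    return cls

# toy conjunction graph built from phrases like "apples and oranges"
graph = {
    "apple":  {"orange", "banana"},
    "orange": {"apple", "banana"},
    "banana": {"apple", "orange", "cat"},
    "cat":    {"dog", "banana"},
    "dog":    {"cat"},
}
print(grow_class(graph, seed={"apple", "orange"}, steps=1))  # fruit class grows
```

"banana" has two links into the seed set while "cat" and "dog" have none, so one step pulls in the remaining fruit.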
1- Lexical Networks [continued]

b- Lexical Network Properties (Ferrer-i-Cancho and Sole, 2001)

Goal:
● Observe the properties of lexical networks.
Method:
● Build a co-occurrence network where words are nodes, linked by edges if they appear in the same sentence at a distance of at most 2 words.
● Half a million nodes with over 10 million edges.
Result:
● Small-world effect: 2-3 jumps can connect any 2 words.
● The distribution of node degree is scale-free.
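The construction above (edges between words at most 2 positions apart) and the small-world check can be sketched on a few toy sentences; the sentences are invented, and BFS stands in for measuring average path length on the real half-million-node network:

```python
from collections import deque

def cooccurrence_graph(sentences, window=2):
    """Link words that appear within `window` positions of each other."""
    graph = {}
    for sent in sentences:
        words = sent.lower().split()
        for i, w in enumerate(words):
            for v in words[max(0, i - window):i]:
                if v != w:
                    graph.setdefault(w, set()).add(v)
                    graph.setdefault(v, set()).add(w)
    return graph

def hops(graph, a, b):
    """Shortest path length between two words (BFS), or None if unreachable."""
    dist, q = {a: 0}, deque([a])
    while q:
        u = q.popleft()
        if u == b:
            return dist[u]
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return None

sents = ["the cat chased the mouse",
         "the dog chased the cat",
         "a mouse ate the cheese"]
g = cooccurrence_graph(sents)
print(hops(g, "dog", "cheese"))   # 2 hops: dog -> the -> cheese
```

Even in this tiny corpus, high-degree hub words ("the") already keep every pair of words a couple of jumps apart, which is the mechanism behind the small-world effect.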
2- Semantic Similarity and Relatedness

● Methods include metrics calculated on existing semantic networks like WordNet, applying shortest path algorithms to identify the closest semantic relation between 2 concepts (Leacock et al., 1998).

● Random walk algorithm (Hughes and Ramage, 2007):
● PageRank computes the stationary distribution over the nodes of WordNet, biased toward each word of an input word pair.
● The divergence between these two distributions is calculated to measure the words' relatedness.
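The biased-PageRank idea can be sketched as follows. The semantic network, words, and edge set are invented stand-ins for WordNet, and a simple L1 distance stands in for the paper's divergence measure:

```python
def pagerank(graph, bias, damping=0.85, iters=50):
    """Power iteration with a personalized teleport distribution `bias`."""
    nodes = list(graph)
    rank = {n: bias.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) * bias.get(n, 0.0) for n in nodes}
        for u in nodes:
            out = graph[u]
            if out:
                share = damping * rank[u] / len(out)
                for v in out:
                    new[v] += share
            else:                            # dangling node: teleport its mass
                for n in nodes:
                    new[n] += damping * rank[u] * bias.get(n, 0.0)
        rank = new
    return rank

def relatedness(graph, w1, w2):
    r1 = pagerank(graph, {w1: 1.0})
    r2 = pagerank(graph, {w2: 1.0})
    # smaller distance between stationary distributions = more related
    return sum(abs(r1[n] - r2[n]) for n in graph)

# toy semantic network (undirected, stored as symmetric adjacency)
graph = {
    "car":     {"vehicle", "wheel"},
    "truck":   {"vehicle", "wheel"},
    "vehicle": {"car", "truck"},
    "wheel":   {"car", "truck"},
    "banana":  {"fruit"},
    "fruit":   {"banana"},
}
print(relatedness(graph, "car", "truck") < relatedness(graph, "car", "banana"))
# True: car is closer to truck than to banana
```

Walks biased toward "car" and "truck" settle on nearly the same neighborhood, so their stationary distributions almost coincide; the walk biased toward "banana" never reaches that component at all.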
3- Word Sense Disambiguation

a- Label Propagation Algorithm (Niu et al., 2005)
Method:
● Construct a graph of labeled and unlabeled examples for a given ambiguous word.
● Word sense examples are the nodes; weighted edges are drawn by a pairwise similarity metric.
● Known labeled examples form the seed set and are manually assigned their correct labels.
● Labels are propagated through the graph along the weighted edges.
● Labels are assigned with a certain probability.
● The propagation is repeated until the labels converge.
Result: performs better than SVM when only a small number of examples is provided.
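A minimal sketch of the propagation loop above, on an invented similarity graph for the ambiguous word "bank" (node names, weights, and the clamping of seeds are illustrative choices, not Niu et al.'s exact formulation):

```python
def label_propagation(weights, seeds, iters=30):
    """weights: (i, j) -> similarity (symmetric); seeds: node -> sense label.
    Returns the winning sense label for every node."""
    nodes = {n for e in weights for n in e}
    labels = sorted(set(seeds.values()))
    # every node holds a probability distribution over sense labels
    dist = {n: {l: 0.0 for l in labels} for n in nodes}
    for n, l in seeds.items():
        dist[n][l] = 1.0
    nbrs = {n: {} for n in nodes}
    for (i, j), w in weights.items():
        nbrs[i][j] = w
        nbrs[j][i] = w
    for _ in range(iters):
        new = {}
        for n in nodes:
            if n in seeds:
                new[n] = dist[n]            # labeled seeds stay clamped
                continue
            total = sum(nbrs[n].values())
            new[n] = {l: sum(w * dist[m][l] for m, w in nbrs[n].items()) / total
                      for l in labels}
        dist = new
    return {n: max(dist[n], key=dist[n].get) for n in nodes}

# toy graph of usage examples of "bank": two similarity clusters
weights = {("s1", "s2"): 0.9, ("s2", "s3"): 0.8,   # money cluster
           ("s4", "s5"): 0.9,                       # river cluster
           ("s3", "s4"): 0.1}                       # weak cross link
print(label_propagation(weights, {"s1": "money", "s5": "river"}))
```

With only two labeled examples, the strong within-cluster edges carry each seed's label across its cluster while the weak cross link barely leaks.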
b- Knowledge-based word sense disambiguation (Mihalcea et al. 2004; Sinha and Mihalcea 2007)
Method:
● Build a graph for a given text with all the senses of its words as nodes.
● Senses are connected on the basis of their semantic relations (synonymy, antonymy, ...).
● A random walk produces a set of scores reflecting the importance of each word sense.
Result:
● Superior to other knowledge-based word sense disambiguation methods that did not use graph-based representations.
Follow-up work:
● Instead of semantic relations, edges were weighted using a measure of lexical similarity.
● This brought generality, as the method can then use any electronic dictionary, not just a semantic network like WordNet.
c- Comparative Evaluation of Graph Connectivity Algorithms (Navigli and Lapata, 2007)

● Applied to word sense graphs derived from WordNet.
● Found that the best measure to use is a closeness measure.
4- Sentiment and Subjectivity Analysis

a- Using the min-cut graph algorithm (Pang and Lee, 2004)
Method:
● Draw a graph where sentences are the nodes and edges are drawn according to sentence proximity.
● Each node is assigned a score: the probability that its sentence is subjective, given by a supervised subjectivity classifier.
● Use the min-cut algorithm to separate subjective from objective sentences.
Results:
● Better than the supervised subjectivity classifier alone.
b- By assigning subjectivity and polarity labels (Esuli and Sebastiani, 2007)
Method:
● Random walk on a graph seeded with nodes labeled for subjectivity and polarity.
Other Applications
1- Summarization
a- (Salton et al. 1994, 1997)
● Draw a graph of the corpus where every node is a paragraph.
● Lexically similar paragraphs are linked with edges.
● A summary is retrieved by following paths, defined by different algorithms, that cover as much of the graph's content as possible.

b- Lexical Centrality (Erkan and Radev, 2004; Mihalcea and Tarau, 2004)
Method:
● Sentences are the nodes of the graph.
● A random walk identifies the most visited nodes as central to the documents.
● Remove duplicates or near-duplicates.
● Select sentences with maximal marginal relevance.
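The random-walk step above can be sketched as power iteration over a sentence similarity graph. Bag-of-words cosine similarity, the toy sentences, and the uniform teleport are simplifying assumptions, not the exact LexRank/TextRank formulations (which add details like similarity thresholds):

```python
from math import sqrt

def similarity(a, b):
    """Cosine similarity over bag-of-words counts."""
    wa, wb = a.lower().split(), b.lower().split()
    va = {w: wa.count(w) for w in set(wa)}
    vb = {w: wb.count(w) for w in set(wb)}
    dot = sum(va[w] * vb.get(w, 0) for w in va)
    return dot / (sqrt(sum(c * c for c in va.values())) *
                  sqrt(sum(c * c for c in vb.values())))

def lexical_centrality(sentences, damping=0.85, iters=50):
    """Stationary visit probability of a random walk on the sentence graph."""
    n = len(sentences)
    sim = [[similarity(si, sj) if i != j else 0.0
            for j, sj in enumerate(sentences)]
           for i, si in enumerate(sentences)]
    row = [sum(r) or 1.0 for r in sim]      # row-normalize; avoid div by zero
    rank = [1.0 / n] * n
    for _ in range(iters):
        rank = [(1 - damping) / n +
                damping * sum(rank[j] * sim[j][i] / row[j] for j in range(n))
                for i in range(n)]
    return rank

sents = ["the economy grew fast this year",
         "the economy grew slowly last year",
         "growth in the economy continued this year",
         "my cat sleeps all day"]
scores = lexical_centrality(sents)
print(max(range(len(sents)), key=scores.__getitem__))  # most central sentence
```

The three economy sentences reinforce each other through the walk, so one of them scores highest, while the off-topic cat sentence receives only teleport probability — exactly the centrality signal a summarizer extracts.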
2- Semi-supervised Passage Retrieval

● Question-Biased Passage Retrieval (Otterbacher et al., 2005): answer a question from a group of documents.

Method:
● Use a biased random walk on a graph seeded with positive and negative examples.
● Each node is labeled according to the percentage of random walks that end at that node.
● The nodes with the highest scores are central to the document set and similar to the seed nodes.
3- Keyword Extraction

● A set of terms that best describes the document.
● Used in terminology extraction and the construction of domain-specific dictionaries.

● (Mihalcea and Tarau, 2004)
Method:
● Build a co-occurrence graph for the input text where the nodes are the text's words.
● Words are linked by a co-occurrence relation limited by the distance between them.
● A random walk is run on the graph.
● Words ranked as important and found next to each other are collapsed into one key phrase.

Result:
● Much better than tf.idf.
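A simplified sketch of the pipeline above: a windowed co-occurrence graph, word ranking, and collapsing adjacent top words into key phrases. Degree ranking is used here as a cheap stand-in for the random-walk (PageRank) scores of the real method, and the input text is invented:

```python
def keyword_phrases(text, window=2, top_k=2):
    """Rank words by co-occurrence degree and merge adjacent top-ranked
    words into multi-word key phrases."""
    words = text.lower().split()
    graph = {}
    for i, w in enumerate(words):
        for v in words[max(0, i - window):i]:
            if v != w:
                graph.setdefault(w, set()).add(v)
                graph.setdefault(v, set()).add(w)
    top = set(sorted(graph, key=lambda w: len(graph[w]), reverse=True)[:top_k])
    # collapse runs of adjacent top-ranked words into phrases
    phrases, run = [], []
    for w in words:
        if w in top:
            run.append(w)
        elif run:
            phrases.append(" ".join(run))
            run = []
    if run:
        phrases.append(" ".join(run))
    return list(dict.fromkeys(phrases))     # dedupe, keep first occurrence

text = ("graph algorithms rank graph nodes and graph algorithms "
        "extract keywords from text")
print(keyword_phrases(text))
```

Because "graph" and "algorithms" are both highly connected and adjacent in the text, they collapse into the single key phrase "graph algorithms" rather than surfacing as two separate keywords.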
References

Networks and Natural Language Processing (Mihalcea and Radev, 2008)

Dragomir Radev, University of Michigan, radev@umich.edu
Rada Mihalcea, University of North Texas, rada@cs.unt.edu

How to write a Business Continuity PlanDatabarracks
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 

Networks and Natural Language Processing

  • 1. Networks and NLP: Networks and Natural Language Processing. Presented by: Ahmed Magdy Ezzeldin
  • 2. Graphs in NLP ● Graphs are used in many NLP applications such as: text summarization, syntactic parsing, word sense disambiguation, ontology construction, sentiment and subjectivity analysis, and text clustering ● Associative or semantic networks are used to represent language units and their relations, where language units are the vertices (nodes) and the relations are the edges (links).
  • 3. Networks are Graphs Nodes are Vertices Links are Edges - Nodes can represent text units of any kind: words, collocations, word senses, sentences, documents - Graph nodes do not have to be of the same category - Edges can represent relations: co-occurrence, collocation, syntactic dependency, lexical similarity
  • 4. Outline ● Syntax 1- Dependency Parsing 2- Prepositional Phrase Attachment 3- Co-reference Resolution ● Lexical Semantics 1- Lexical Networks 2- Semantic Similarity and Relatedness 3- Word Sense Disambiguation 4- Sentiment and Subjectivity Analysis ● Other Applications 1- Summarization 2- Semi-supervised Passage Retrieval 3- Keyword Extraction
  • 6. 1- Dependency Parsing  An approach to sentence parsing  Dependency tree of a sentence is a directed subgraph of the full graph connecting all words in the sentence.  So this subgraph is a tree  The root of the tree is the main predicate that takes arguments which are the child nodes
  • 7. ● (McDonald et al., 2005) built a parser that finds the tree with the highest score using the CLE (Chu-Liu/Edmonds) algorithm for the maximum spanning tree (MST) of a directed graph. ● Each node picks the neighbor with the highest score, which leads to either a spanning tree or a cycle ● CLE collapses each cycle into a single node ● CLE runs in O(n^2)
  • 8. ● If no tree covers all nodes, the closest 2 nodes are collapsed
  • 9. ● This step is repeated until all nodes are collapsed; an MST is then constructed by reversing the procedure and expanding all nodes. ● McDonald achieved excellent results on a standard English data set and even better results on Czech (a free word order language)
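The greedy-pick / contract-cycle / expand procedure described above can be sketched in plain Python. This is a minimal illustration of the Chu-Liu/Edmonds idea, not McDonald et al.'s trained parser; the integer node ids and the edge-score dictionary are hypothetical.

```python
def find_cycle(parent):
    """Return the set of nodes on a cycle in the parent map, or None."""
    for start in parent:
        path, v = [start], start
        while v in parent:
            v = parent[v]
            if v in path:
                return set(path[path.index(v):])
            path.append(v)
    return None

def cle(scores, root):
    """Chu-Liu/Edmonds maximum spanning arborescence (toy sketch).
    scores: dict[(u, v)] -> weight of directed edge u -> v (int node ids).
    Returns a dict mapping each non-root node to its chosen parent."""
    nodes = {u for u, _ in scores} | {v for _, v in scores}
    # 1. every non-root node greedily picks its best incoming edge
    parent = {}
    for v in nodes - {root}:
        incoming = [(w, u) for (u, vv), w in scores.items() if vv == v and u != v]
        if incoming:
            parent[v] = max(incoming)[1]
    cycle = find_cycle(parent)
    if cycle is None:
        return parent
    # 2. contract the cycle into a fresh supernode and rescore its edges
    c = max(nodes) + 1
    new_scores, enter, leave = {}, {}, {}
    for (u, v), w in scores.items():
        if u in cycle and v in cycle:
            continue
        if v in cycle:                      # edge entering the cycle
            adj = w - scores[(parent[v], v)]
            if (u, c) not in new_scores or adj > new_scores[(u, c)]:
                new_scores[(u, c)] = adj
                enter[u] = v
        elif u in cycle:                    # edge leaving the cycle
            if (c, v) not in new_scores or w > new_scores[(c, v)]:
                new_scores[(c, v)] = w
                leave[v] = u
        else:
            new_scores[(u, v)] = w
    # 3. solve the smaller problem, then expand the supernode back out
    sub = cle(new_scores, root)
    result = {}
    for v, u in sub.items():
        if v == c:                          # chosen entry edge breaks the cycle
            result[enter[u]] = u
            for x in cycle - {enter[u]}:
                result[x] = parent[x]
        elif u == c:
            result[v] = leave[v]
        else:
            result[v] = u
    return result
```

On a tiny score dictionary where nodes 1 and 2 prefer each other (a 2-cycle), the cycle is contracted and then broken in favor of the higher-scoring overall tree.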
  • 10. 2- Prepositional Phrase Attachment ● (Toutanova et al., 2004) A preposition like "with" is either attached to the main predicate (high verbal attachment) or to the noun phrase before it (low nominal attachment). - “I ate pizza with olives.” - “I ate pizza with a knife.” ● They proposed a semi-supervised learning process where a graph of nouns and verbs is constructed, and if 2 words appear in the same context they are connected with an edge. ● Random walk until convergence ● Reached a classification accuracy of 87.54%, close to the human performance of 88.20%
  • 11. 3- Co-reference Resolution ● Identifying relations between entity references in a text ● References can be nouns or pronouns ● Approximate the correct assignment of references to entities in a text by using a graph-cut algorithm. Method: A graph is constructed for each entity ● Every entity is linked to all its possible co-references with weighted edges, where weights are the confidence of each co-reference. ● Min-cut partitioning separates each entity and its co-references.
  • 12. Lexical Semantics Semantic Analysis, Machine Translation, Information retrieval, question answering, knowledge acquisition, word sense disambiguation, semantic role labeling, textual entailment, lexical acquisition, semantic relations
  • 13. 1- Lexical Networks a- Unsupervised lexical acquisition (Widdows and Dorow, 2002) Goal: build semantic classes automatically from raw corpora Method: ● Build a co-occurrence graph from the British National Corpus where nodes are words linked by conjunction (and/or) ● Over 100,000 nodes and over half a million edges. ● Representative nouns are manually selected and put in a seed set. ● The words with the largest number of links to the seed set are added to the seed set.
  • 14. Result: Accuracy of 82%, far better than previous methods. The drawback of this method is low coverage, as it is limited to words in a conjunction relation only.
  • 15. 1- Lexical Networks [continued] b- Lexical Network Properties (Ferrer-i-Cancho and Sole, 2001) Goal: ● Observe lexical network properties Method: ● Build a co-occurrence network where words are nodes, linked with edges if they appear in the same sentence at a distance of at most 2 words. ● Half a million nodes with over 10 million edges Result: ● Small-world effect: 2-3 jumps can connect any 2 words ● The distribution of node degree is scale-free
  • 16. 2- Semantic Similarity and Relatedness ● Methods include metrics calculated on existing semantic networks like WordNet, applying shortest path algorithms to identify the closest semantic relation between 2 concepts (Leacock et al. 1998) ● Random walk algorithm (Hughes and Ramage, 2007) ● PageRank gets the stationary distribution of nodes in WordNet, biased toward each word of an input word pair. ● The divergence between these distributions is calculated to show the words' relatedness.
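A biased random walk of this kind can be sketched with plain power iteration. The toy graph below is an invented mini-lexicon, not WordNet, and cosine similarity stands in here for the divergence measure used in the paper.

```python
def personalized_pagerank(graph, seed, damping=0.85, iters=50):
    """Power-iteration PageRank whose teleport mass is pinned to `seed`.
    graph: dict node -> list of neighbours (undirected, so symmetric)."""
    rank = {n: (1.0 if n == seed else 0.0) for n in graph}
    for _ in range(iters):
        rank = {n: ((1 - damping) if n == seed else 0.0)
                   + damping * sum(rank[m] / len(graph[m]) for m in graph[n])
                for n in graph}
    return rank

def cosine(p, q):
    """Cosine similarity between two rank distributions over the same nodes."""
    dot = sum(p[n] * q[n] for n in p)
    norm = lambda d: sum(v * v for v in d.values()) ** 0.5
    return dot / (norm(p) * norm(q))

# Hypothetical toy semantic network (a real system would walk WordNet)
graph = {
    'car':     ['vehicle', 'wheel'],
    'vehicle': ['car', 'truck', 'thing'],
    'truck':   ['vehicle'],
    'wheel':   ['car'],
    'banana':  ['fruit'],
    'fruit':   ['banana', 'thing'],
    'thing':   ['vehicle', 'fruit'],
}
```

Comparing the stationary distribution seeded at 'car' with those seeded at 'truck' and at 'banana' ranks the car-truck pair as more related, since their walks concentrate mass on the same neighbourhood.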
  • 17. 3- Word Sense Disambiguation a- Label Propagation Algorithm (Niu et al. 2005) Method: ● Construct a graph of labeled and unlabeled examples for a given ambiguous word ● Word sense examples are the nodes, and weighted edges are drawn using a pairwise similarity metric. ● The known labeled examples form the seed set and are manually assigned their correct labels ● Labels are propagated through the graph along the weighted edges ● Labels are assigned with a certain probability ● The propagation is repeated until the label assignments converge. Result: Performs better than SVM when only a small number of labeled examples is provided.
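The propagation loop can be sketched as follows. The similarity weights and the two seeded examples of an ambiguous word "bank" are invented for illustration; Niu et al.'s actual features and similarity metric are not reproduced here.

```python
def propagate_labels(weights, seeds, iters=50):
    """Label propagation over a weighted similarity graph.
    weights: dict node -> dict(neighbour -> similarity weight), symmetric.
    seeds:   dict node -> gold label (clamped on every iteration).
    Returns dict node -> dict(label -> probability)."""
    labels = sorted(set(seeds.values()))
    dist = {n: ({l: float(l == seeds[n]) for l in labels} if n in seeds
                else {l: 1.0 / len(labels) for l in labels})
            for n in weights}
    for _ in range(iters):
        new = {}
        for n in weights:
            if n in seeds:                 # seeds keep their gold label
                new[n] = dist[n]
                continue
            total = sum(weights[n].values())
            new[n] = {l: sum(w * dist[m][l] for m, w in weights[n].items()) / total
                      for l in labels}
        dist = new
    return dist

# Hypothetical labeled/unlabeled examples of the ambiguous word "bank"
weights = {
    'e1': {'e2': 0.9},
    'e2': {'e1': 0.9, 'e4': 0.1},
    'e3': {'e4': 0.9},
    'e4': {'e3': 0.9, 'e2': 0.1},
}
seeds = {'e1': 'finance', 'e3': 'river'}
```

After convergence, the unlabeled examples e2 and e4 take on the sense of their most strongly connected seeds.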
  • 18. b- Knowledge- based word sense disambiguation (Mihalcea et al. 2004, Sinha and Mihalcea 2007)
  • 19. Method: ● Build a graph for a given text with all the senses of its words as nodes ● Senses are connected on the basis of their semantic relations (synonymy, antonymy ...) ● A random walk results in a set of scores that reflects the importance of each word sense. Result: ● Superior to other knowledge-based word sense disambiguation systems that did not use graph-based representations. Follow-up work: ● Mihalcea replaced the semantic relations with weighted edges based on a measure of lexical similarity ● This brought generality, as the method can use any electronic dictionary, not just a semantic network like WordNet
  • 20. c- Comparative Evaluation of Graph Connectivity Algorithms (Navigli and Lapata, 2007) ● Applied on word sense graphs derived from WordNet ● Found that the best measure to use is a closeness measure
  • 21. 4- Sentiment and Subjectivity Analysis a- Using the min-cut graph algorithm (Pang and Lee 2004) Method: ● Draw a graph where sentences are the nodes and edges are drawn according to the sentences' proximity ● Each node is assigned a score showing the probability that its sentence is subjective, using a supervised subjectivity classifier ● Use the min-cut algorithm to separate subjective from objective sentences. Results: ● Better than the supervised subjectivity classifier alone b- By assigning subjectivity and polarity labels (Esuli and Sebastiani 2007) Method: ● Random walk on a graph seeded with nodes labeled for subjectivity and polarity.
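The min-cut formulation can be sketched with a plain Edmonds-Karp max-flow. The per-sentence subjectivity scores below (used as source/sink capacities) and the small proximity weight between the two sentences are made-up numbers, not Pang and Lee's classifier outputs.

```python
from collections import deque

def min_cut(capacity, source, sink):
    """Edmonds-Karp max-flow; returns the set of nodes left on the source
    side of a minimum cut. capacity: dict[(u, v)] -> edge capacity."""
    nodes = {u for u, v in capacity} | {v for u, v in capacity}
    flow = {}
    def residual(u, v):
        return capacity.get((u, v), 0.0) - flow.get((u, v), 0.0)
    while True:
        # BFS for a shortest augmenting path in the residual graph
        prev = {source: None}
        queue = deque([source])
        while queue and sink not in prev:
            u = queue.popleft()
            for v in nodes:
                if v not in prev and residual(u, v) > 1e-12:
                    prev[v] = u
                    queue.append(v)
        if sink not in prev:            # no path left: prev = source side
            return set(prev)
        # walk back from the sink, find the bottleneck, and push flow
        path, v = [], sink
        while prev[v] is not None:
            path.append((prev[v], v))
            v = prev[v]
        bottleneck = min(residual(u, v) for u, v in path)
        for u, v in path:
            flow[u, v] = flow.get((u, v), 0.0) + bottleneck
            flow[v, u] = flow.get((v, u), 0.0) - bottleneck

# Hypothetical capacities: classifier scores to SUBJ/OBJ plus a proximity edge
caps = {('SUBJ', 's1'): 0.9, ('s1', 'OBJ'): 0.1,
        ('SUBJ', 's2'): 0.2, ('s2', 'OBJ'): 0.8,
        ('s1', 's2'): 0.1, ('s2', 's1'): 0.1}
```

Cutting this graph keeps s1 on the subjective side and sends s2 to the objective side, because severing s1's weak ties (0.1 to OBJ, 0.1 to s2) plus s2's weak tie to SUBJ (0.2) is the cheapest separation.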
  • 24. 1- Summarization a- (Salton et al. 1994, 1997) ● Draw a graph of the corpus where every node is a paragraph ● Lexically similar paragraphs are linked with edges ● A summary is retrieved by following paths defined by different algorithms to cover as much of the content of the graph as possible. b- Lexical Centrality (Erkan and Radev 2004) (Mihalcea and Tarau 2004) Method: ● Sentences are nodes of the graph ● Random walk to define the most visited nodes as central to the documents ● Remove duplicates or near duplicates ● Select sentences with maximal marginal relevance
  • 25. 2- Semi-supervised Passage Retrieval ● Question-Biased Passage Retrieval (Otterbacher et al., 2005) Answers a question from a group of documents Method: ● Use a biased random walk on a graph seeded with positive and negative examples ● Each node is scored according to the percentage of random walks that end at this node ● The nodes with the highest scores are central to the document set and similar to the seed nodes.
  • 26. 3- Keyword Extraction ● A set of terms that best describes the document ● Used in terminology extraction and the construction of domain-specific dictionaries
  • 27. (Mihalcea and Tarau, 2004) Method: ● Build a co-occurrence graph for the input text where the nodes are the text's words ● Words are linked by a co-occurrence relation limited by the distance between words. ● Random walk on the graph ● Words ranked as important and found next to each other are collapsed into one key phrase Result: ● Much better than tf-idf
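The core of this pipeline (minus the key-phrase collapsing and the part-of-speech filter that Mihalcea and Tarau apply) can be sketched as a co-occurrence graph plus PageRank. The windowing and parameter values here are illustrative choices, not the paper's settings.

```python
import re
from collections import defaultdict

def keyword_rank(text, window=2, damping=0.85, iters=30, topk=3):
    """TextRank-style sketch: rank words by PageRank over a co-occurrence
    graph. No POS filtering or stopword removal (a real system needs both)."""
    words = re.findall(r"[a-z]+", text.lower())
    graph = defaultdict(set)
    for i, w in enumerate(words):
        # link each word to the words within `window` positions after it
        for j in range(i + 1, min(i + 1 + window, len(words))):
            if words[j] != w:
                graph[w].add(words[j])
                graph[words[j]].add(w)
    rank = {w: 1.0 for w in graph}
    for _ in range(iters):
        rank = {w: (1 - damping)
                   + damping * sum(rank[v] / len(graph[v]) for v in graph[w])
                for w in graph}
    return sorted(rank, key=rank.get, reverse=True)[:topk]
```

On a short text, the word with the most (and best-connected) co-occurrence links accumulates the highest rank, which matches the intuition that central words make good keywords.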
  • 28. References: Networks and Natural Language Processing (Mihalcea and Radev, 2008). Dragomir Radev, University of Michigan, radev@umich.edu; Rada Mihalcea, University of North Texas, rada@cs.unt.edu