1. Tutorial on using and learning phrases from text
by Cassandra Jacobs
Prepared as an assignment for CS410: Text Information Systems in Spring 2016
3. What are phrases?
• Word combinations
• Literal and idiomatic meanings
– “kick the bucket” – to die
– “strong coffee” – highly caffeinated, concentrated
– “data mining” – a particular concept in computer science
4. Why phrases?
• Phrases can express ideas not obvious from the individual words
– White House (an important building)
– red herring (an anomaly)
– syntactic parsing (a paper topic)
• Can disambiguate words “for free”
– (river) bank versus (financial) bank
5. Phrases versus words
• Difficult to extract from text
• n words, but n² possible bigrams, n³ trigrams, etc.
– Always rarer than individual words
– Simple measures like frequency can lead to bad phrases (e.g. “in the”, “is a”, “not our”); see the sketch below
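To see why raw frequency fails, here is a minimal sketch that ranks bigrams by count; the sample text is made up, and on any real corpus function-word pairs like “in the” dominate the top of the list.

# Minimal sketch: rank bigrams by raw frequency. The sample text is made up;
# on real corpora the top of the list is dominated by pairs like "in the".
from collections import Counter

text = ("data mining is a field in computer science and data mining "
        "is used in the industry and in the academy and in the press")
tokens = text.split()
bigrams = Counter(zip(tokens, tokens[1:]))
print(bigrams.most_common(3))
# [(('in', 'the'), 3), (('data', 'mining'), 2), (('mining', 'is'), 2)]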
6. Phrases versus words
• Some probabilistic measurements are good proxies for “phraseness”
• Mutual information identifies phrases that occur together more often than chance:
p(a, b) / (p(a) p(b))
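A minimal sketch of that ratio computed from corpus counts; all the counts below are made up for illustration.

# Minimal sketch: the ratio p(a,b) / (p(a) p(b)) from corpus counts.
# All counts below are made up for illustration.
N = 1_000_000                      # total tokens in the corpus
unigrams = {"data": 500, "mining": 200, "in": 40_000, "the": 60_000}
bigrams = {("data", "mining"): 150, ("in", "the"): 2_500}

def phrase_score(a, b):
    p_ab = bigrams[(a, b)] / N
    return p_ab / ((unigrams[a] / N) * (unigrams[b] / N))

print(phrase_score("data", "mining"))  # 1500.0 -- far above chance
print(phrase_score("in", "the"))       # ~1.04 -- about what chance predicts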
7. Phrases versus words
• Unsupervised methods like topic models over bigrams often produce strange results
– “I mean”
– “Well I”
• Distributional similarity/vector methods require supervision or feedback about phrase quality
8. Phrases versus words
• Low numbers of observations
– Huge domain differences in whether phrases are used
• E.g. ACL submissions are encouraged not to use idiomatic expressions
– Formal versus informal contexts
– Differences between writers’ language backgrounds
9. Tasks where phrases are useful
• Good phrases should improve or reflect
– Document classification tasks
– External knowledge (Wikipedia titles, dictionaries)
– Analogy solving
– Paraphrase identification
– Similarity ratings on Amazon Mechanical Turk
– Machine translation
10. Task 1: Named entity recognition
• Some studies use wiki phrases (article titles) by taking all the titles and using them in other tasks
• Can parse a sentence for entities by automatically labeling the entities that appear in Wikipedia
11. Identifying wiki phrases for named entity recognition
• Polls show Democrat[ORG] Hillary_Clinton[PER] and Republican[ORG] Donald_Trump[PER] ahead by double-digit margins
• Wiki phrases like Hillary_Clinton and Donald_Trump contain lots of clues that they are people
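A minimal sketch of that labeling step, assuming a tiny hand-made dictionary of Wikipedia titles with entity types; a real system would load millions of titles from a dump.

# Minimal sketch: label spans of a sentence that match Wikipedia titles.
# The title dictionary here is hand-made; a real system loads it from a dump.
wiki = {"hillary clinton": "PER", "donald trump": "PER",
        "democrat": "ORG", "republican": "ORG"}

def label_entities(tokens, max_len=3):
    labels, i = [], 0
    while i < len(tokens):
        # Greedily try the longest span first so "Hillary Clinton" wins over "Hillary".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n]).lower()
            if span in wiki:
                labels.append(("_".join(tokens[i:i + n]), wiki[span]))
                i += n
                break
        else:
            i += 1
    return labels

sent = "Polls show Democrat Hillary Clinton and Republican Donald Trump ahead".split()
print(label_entities(sent))
# [('Democrat', 'ORG'), ('Hillary_Clinton', 'PER'), ('Republican', 'ORG'), ('Donald_Trump', 'PER')]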
12. Identifying wiki phrases for named entity recognition
• Passos, Kumar, & McCallum (2014)
– Keep bigrams where p(a,b)/(p(a)p(b)) > 1000 (sketched below)
– Then take the top 1M phrases
– Create embeddings from these phrases
– Use the embeddings as features in named entity recognition (NER)
– Using phrase embeddings led to state-of-the-art NER
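A minimal sketch of the first two filtering steps under made-up counts; the real pipeline runs over a far larger corpus and vocabulary.

# Minimal sketch of the Passos et al. (2014) candidate filter: keep bigrams with
# p(a,b)/(p(a)p(b)) > 1000, then cap at the top 1M by score. Counts are made up.
N = 1_000_000
unigrams = {"named": 300, "entity": 250, "of": 50_000, "the": 60_000}
bigrams = {("named", "entity"): 120, ("of", "the"): 2_000}

def score(a, b):
    return (bigrams[(a, b)] / N) / ((unigrams[a] / N) * (unigrams[b] / N))

kept = {p: score(*p) for p in bigrams if score(*p) > 1000}
top = sorted(kept, key=kept.get, reverse=True)[:1_000_000]  # top 1M in the real setting
print(top)  # [('named', 'entity')] -- ('of', 'the') scores ~0.67 and is dropped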
13. Task 2: Using idioms in sentiment analysis
• Bag-of-words models over individual words would probably misclassify these two
– “not that bad” → ok
– “not that good” → probably bad
• Sometimes adding in phrase information increases noise and runtime
14. Using idioms in sentiment analysis
• Williams et al. (2015) annotated idioms in context as either positive or negative
– 580 idioms from a language learner textbook
– Regular expressions to identify variants (see the sketch below)
– “Not that bad” → neutral
– “A drop in the bucket” → good
• Sentiment classification accuracy increased from 45% to 60% with the addition of idioms
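A minimal sketch of the variant-matching idea with two hand-written patterns and labels; Williams et al.'s actual resource covers 580 idioms.

# Minimal sketch: regular expressions that tolerate idiom variants, in the spirit
# of Williams et al. (2015). Patterns and sentiment labels here are hand-made.
import re

idioms = [
    (re.compile(r"\bnot (all )?that bad\b", re.I), "neutral"),
    (re.compile(r"\ba drop in the (bucket|ocean)\b", re.I), "positive"),
]

def idiom_sentiments(text):
    return [label for pattern, label in idioms if pattern.search(text)]

print(idiom_sentiments("Honestly the hotel was not all that bad."))      # ['neutral']
print(idiom_sentiments("The refund was a drop in the ocean for them."))  # ['positive']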
15. Task 3: Using idioms in phrase analogies
Toronto : Toronto Maple Leafs :: Montreal : Montreal Canadiens
– Want to produce complex, multi-word output in an analogy task
16. Using idioms in phrase analogies
• Mikolov et al. (2013)
• In an analogy task, need to first identify phrases
– High mutual information score cutoff for phrase learning
– Train a neural network model to learn distributed phrase vector representations
17. Using idioms in phrase analogies
• In the neural network, phrases are pairs of words concatenated into single tokens
– “Toronto Maple Leafs” is treated like a single word by the model
– The model predicts the contexts given words and phrases as input
– “Toronto Maple Leafs” and “Montreal Canadiens” both predict a “hockey” context when the individual words do not
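A minimal sketch of this pipeline with gensim, where phrases are pre-merged into single tokens (gensim's Phrases class can perform that merging with a score cutoff); the toy corpus is made up and far too small for the analogy to come out reliably.

# Minimal sketch: phrases are pre-merged into single tokens (gensim's Phrases
# class can do this merging with a score cutoff), then Word2Vec learns one
# vector per word *and* per phrase. The toy corpus is made up and far too
# small for the analogy to come out reliably; real runs use billions of tokens.
from gensim.models import Word2Vec

sentences = [
    ["toronto_maple_leafs", "play", "hockey", "in", "toronto"],
    ["montreal_canadiens", "play", "hockey", "in", "montreal"],
] * 100

model = Word2Vec(sentences, vector_size=50, min_count=1, epochs=25, seed=0)

# Analogy query: toronto : toronto_maple_leafs :: montreal : ?
print(model.wv.most_similar(positive=["montreal", "toronto_maple_leafs"],
                            negative=["toronto"], topn=1))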
19. Unsupervised learning of phrases
• Some papers focus on how to get good phrases beyond mutual information measures
– Shallow parsing with structural constraints (no “of the United”)
– If a phrase includes another phrase, the whole phrase must be included (“President of the United States”); see the sketch below
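A minimal sketch of that constraint as greedy longest-match segmentation over a hand-made phrase list.

# Minimal sketch of the structural constraint: segment greedily with the longest
# matching phrase, so "President of the United States" is never split into
# fragments like "of the United". The phrase list is hand-made.
phrases = {"president of the united states", "united states"}
MAX_LEN = 6

def segment(tokens):
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]).lower() in phrases:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            out.append(tokens[i])
            i += 1
    return out

print(segment("the President of the United States spoke".split()))
# ['the', 'President_of_the_United_States', 'spoke']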
20. Unsupervised learning of phrases
• Cho et al. (2014) propose a model for machine translation that predicts words and phrases in a target language (a recurrent neural network encoder-decoder)
– Input: word and next word in the source language
– Output: word and next word in the target language
21. Unsupervised learning of phrases
• Predicting the next word of a word in a foreign language helps the model associate the past with potential future output
– Phrases learned in the Cho et al. (2014) model cluster “one to three months” near “for two months”; a minimal sketch of the architecture follows below
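A minimal sketch of an RNN encoder-decoder in PyTorch, with made-up sizes and toy data; this shows the general shape of the idea, not Cho et al.'s exact architecture.

# Minimal sketch of an RNN encoder-decoder in PyTorch. Sizes, data, and the
# single-layer GRU choice are made up; this shows the shape of the idea, not
# Cho et al.'s exact architecture.
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        # The encoder compresses the source phrase into one fixed-length vector;
        # that vector is the learned phrase representation the slide describes.
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        _, h = self.encoder(self.src_emb(src_ids))           # h: (1, batch, hidden)
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), h)  # unroll target words
        return self.out(dec_out)                             # next-word logits

model = EncoderDecoder(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 5))   # two toy source phrases, five tokens each
tgt = torch.randint(0, 1000, (2, 6))
logits = model(src, tgt[:, :-1])       # teacher forcing: feed the shifted target
loss = nn.functional.cross_entropy(logits.reshape(-1, 1000), tgt[:, 1:].reshape(-1))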
22. Supervised learning of phrases
• Liu et al. (2015) define quality as a threshold on two properties
– Informativeness within a document (effectively term frequency/inverse document frequency; sketched below)
– Concordance (conventionality, judged by the difference between competing combinations, e.g. “powerful coffee” versus “strong coffee”)
– Together these act like TF-IDF for phrases
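A minimal sketch of the informativeness signal as phrase-level TF-IDF over made-up documents; Liu et al.'s full quality score also incorporates concordance and other signals.

# Minimal sketch of the informativeness signal: TF-IDF computed over candidate
# phrases instead of words. Documents and candidates here are made up.
import math

docs = [
    ["data_mining", "is", "fun", "data_mining", "works"],
    ["strong_coffee", "helps", "with", "data_mining"],
    ["strong_coffee", "is", "just", "coffee"],
]

def tf_idf(phrase, doc):
    tf = doc.count(phrase) / len(doc)
    df = sum(1 for d in docs if phrase in d)
    return tf * math.log(len(docs) / df)

print(tf_idf("data_mining", docs[0]))    # higher: frequent here, not everywhere
print(tf_idf("strong_coffee", docs[2]))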
23. Evaluation of learned phrases
• Perplexity of the data given the model
– Higher perplexity means the model explains the data less well
– When a model captures more dependencies in the data, the phrases it includes are good (El-Kishky et al., 2015)
– This metric works better for some domains than others (e.g. Yelp)
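A minimal sketch of the metric: perplexity is the exponential of the average negative log probability the model assigns to held-out text.

# Minimal sketch: perplexity = exp(-mean log-probability per token).
import math

def perplexity(log_probs):
    # log_probs: natural-log probabilities a language model assigns to each token.
    return math.exp(-sum(log_probs) / len(log_probs))

# A model that explains held-out text better assigns higher probabilities,
# which yields lower perplexity.
print(perplexity([math.log(0.2)] * 10))   # 5.0
print(perplexity([math.log(0.05)] * 10))  # 20.0 -- worse fit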
24. Evaluations of phrases
• El-Kishky et al. (2015) also compared retrieved phrases against Wikipedia titles
– If a phrase is a Wikipedia title, it is a very good phrase
– If not, it is harder to evaluate
– Works for some domains but maybe not others (e.g. abstracts and papers)
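A minimal sketch of the comparison, with a hand-made title set standing in for the full list of Wikipedia titles.

# Minimal sketch: precision of mined phrases against Wikipedia titles.
# Both lists are hand-made; the real comparison loads the full title dump.
wiki_titles = {"data mining", "named entity recognition", "machine translation"}

mined = ["data mining", "of the united", "machine translation", "not our"]
hits = [p for p in mined if p in wiki_titles]
print(hits, len(hits) / len(mined))
# ['data mining', 'machine translation'] 0.5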
25. Current state of research
• No gold standard for evaluating whether a phrase is good or not
– Many available datasets and applications
– Less clear how to learn phrases in an unsupervised framework
– Many models implicitly or explicitly use mutual information and background language models as filters
26. References
El-Kishky, A., Song, Y., Wang, C., Voss, C. R., & Han, J. (2015). Scalable topical phrase mining from text corpora. Proceedings of the VLDB Endowment, 8(3).
Liu, J., Shang, J., Wang, C., Ren, X., & Han, J. (2015, May). Mining quality phrases from
massive text corpora. In Proceedings of the 2015 ACM SIGMOD International Conference on
Management of Data (pp. 1729-1744). ACM.
Passos, A., Kumar, V., & McCallum, A. (2014). Lexicon infused phrase embeddings for named
entity resolution. arXiv preprint arXiv:1404.5367.
Williams, L., Bannister, C., Arribas-Ayllon, M., Preece, A., & Spasić, I. (2015). The role of idioms
in sentiment analysis. Expert Systems with Applications, 42, 7375-7385.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed
representations of words and phrases and their compositionality. In Advances in neural
information processing systems (pp. 3111-3119).
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., &
Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical
machine translation. arXiv preprint arXiv:1406.1078.