Machine Learning for Language Learning and Processing

Sharon Goldwater, Frank Keller, Mirella Lapata,
Victor Lavrenko, Mark Steedman
School of Informatics
University of Edinburgh
keller@inf.ed.ac.uk

October 15, 2008
Outline

1 Machine Learning and NLP
  Latent Variables
  Multi-class and Structured Variables
  Discrete, Sparse Data
  Other Problems

2 Research Interests
  Parsing
  Language Acquisition
  Language Generation
  Information Retrieval
Latent Variables
Natural language processing (NLP) problems typically involve
inferring latent (unobserved) variables:
given a bilingual text, infer a word alignment;
given a string of words, infer a parse tree.
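
As an illustration, a word alignment can be inferred with EM, treating the
alignment itself as the latent variable. Below is a minimal
IBM-Model-1-style sketch; the toy sentence pairs and the fixed iteration
count are assumptions for illustration, not material from the slides.

    from collections import defaultdict

    # Toy parallel corpus (an assumption for illustration).
    corpus = [
        (["das", "haus"], ["the", "house"]),
        (["das", "buch"], ["the", "book"]),
        (["ein", "buch"], ["a", "book"]),
    ]

    # Initialise translation probabilities t(e|f) uniformly.
    f_vocab = {f for fs, _ in corpus for f in fs}
    e_vocab = {e for _, es in corpus for e in es}
    t = {(e, f): 1.0 / len(e_vocab) for f in f_vocab for e in e_vocab}

    for _ in range(20):  # EM iterations (fixed count, assumed)
        count = defaultdict(float)  # expected alignment counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        # E-step: distribute each target word's mass over source words.
        for fs, es in corpus:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    p = t[(e, f)] / norm  # posterior that e aligns to f
                    count[(e, f)] += p
                    total[f] += p
        # M-step: re-estimate t(e|f) from the expected counts.
        t = {(e, f): c / total[f] for (e, f), c in count.items()}

    # The highest-probability pairs now link e.g. "haus" with "house".
    print(sorted(t.items(), key=lambda kv: -kv[1])[:4])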
Multi-class and Structured Variables
The learning targets in NLP are often multi-class, e.g., in
part-of-speech tagging:
standard POS tag sets for English have around 60 classes;
more elaborate ones around 150 (CLAWS6);
morphological annotation often increases the size of the tag
set (e.g., Bulgarian, around 680 tags).
Everything/PNI in/PRP the/AT0 sale/NN1 will/VM0 have/VHI
been/VBN used/VVN in/PRP films/NN2 ./PUN
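
Viewed as plain multi-class classification, even a per-word majority-tag
baseline is well defined. A minimal sketch, assuming a toy training set of
(word, tag) pairs:

    from collections import Counter, defaultdict

    # Assumed toy training data of (word, CLAWS-style tag) pairs.
    train = [("the", "AT0"), ("sale", "NN1"), ("in", "PRP"),
             ("films", "NN2"), ("in", "PRP"), ("the", "AT0")]

    # Majority-class baseline: pick each word's most frequent tag.
    tags_for = defaultdict(Counter)
    for word, tag in train:
        tags_for[word][tag] += 1

    def predict(word, default="NN1"):
        # Unseen words back off to a default tag (a common baseline choice).
        counts = tags_for.get(word)
        return counts.most_common(1)[0][0] if counts else default

    print([predict(w) for w in ["the", "sale", "in", "unknown"]])
    # -> ['AT0', 'NN1', 'PRP', 'NN1']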
NLP tasks are often sequence labeling tasks rather than simple
classification:
PNI → PRP → AT0 → NN1 → VM0 → VHI → VBN → VVN → ...
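
Sequence models such as HMMs capture this: the best tag sequence is chosen
jointly rather than one position at a time. A minimal Viterbi sketch; the
two-tag transition and emission tables are illustrative assumptions (real
tag sets have 60+ states):

    import math

    tags = ["AT0", "NN1"]
    start = {"AT0": 0.7, "NN1": 0.3}
    trans = {"AT0": {"AT0": 0.1, "NN1": 0.9},
             "NN1": {"AT0": 0.4, "NN1": 0.6}}
    emit = {"AT0": {"the": 0.9, "sale": 0.1},
            "NN1": {"the": 0.1, "sale": 0.9}}

    def viterbi(words):
        # delta[i][t] = (log prob of best path ending in tag t, backpointer)
        delta = [{t: (math.log(start[t] * emit[t][words[0]]), None)
                  for t in tags}]
        for w in words[1:]:
            prev = delta[-1]
            delta.append({
                t: max((prev[p][0] + math.log(trans[p][t] * emit[t][w]), p)
                       for p in tags)
                for t in tags
            })
        # Follow backpointers from the best final state.
        best = max(tags, key=lambda t: delta[-1][t][0])
        path = [best]
        for layer in reversed(delta[1:]):
            path.append(layer[path[-1]][1])
        return list(reversed(path))

    print(viterbi(["the", "sale"]))  # -> ['AT0', 'NN1']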
Many NLP tasks go beyond classification altogether and involve
hierarchical structure, e.g., in parsing:
most tree structures in the Penn Treebank occur only once, so the
output space cannot be enumerated as a fixed set of classes.
Discrete, Sparse Data
Linguistic data differs from standard ML data (speech, vision):
it is typically discrete (characters, words, texts);
word frequencies follow a Zipfian distribution.
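
The Zipfian shape is easy to verify empirically. A quick sketch: count word
frequencies and check that rank times frequency stays roughly constant (the
corpus path is a placeholder, not a file from the slides):

    from collections import Counter

    # "corpus.txt" is a placeholder; substitute any large plain-text file.
    with open("corpus.txt", encoding="utf-8") as fh:
        words = fh.read().lower().split()

    freqs = Counter(words)
    # Under Zipf's law frequency falls off roughly as 1/rank, so the
    # product rank * frequency stays roughly constant down the list.
    for rank, (word, freq) in enumerate(freqs.most_common(10), start=1):
        print(f"{rank:2d}  {word:<15} freq={freq:<8} rank*freq={rank * freq}")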
The Zipfian distribution leads to ubiquitous data sparseness:
standard maximum likelihood estimation doesn’t work well for
linguistic data;
a large number of smoothing techniques have been developed
to deal with this problem;
most of them are ad hoc; Bayesian methods are a principled
alternative.
e.g., a Pitman-Yor process prior: G ∼ PY(d, θ, G₀)
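
A hedged sketch of the Pitman-Yor predictive rule in its Chinese-restaurant
form: observed counts are discounted by d, and the leftover mass is spread
over the base distribution G₀. The one-table-per-type approximation and the
uniform base distribution are simplifying assumptions:

    def py_predictive(word, counts, d=0.5, theta=1.0, vocab_size=1000):
        # Simplified Pitman-Yor predictive probability, assuming one
        # "table" per observed word type (a common approximation):
        #   P(w) = max(c_w - d, 0)/(theta + n)
        #          + (theta + d*T)/(theta + n) * G0(w)
        # with n = total tokens, T = distinct types, G0 uniform.
        n = sum(counts.values())
        T = len(counts)
        g0 = 1.0 / vocab_size
        discounted = max(counts.get(word, 0) - d, 0.0)
        return (discounted + (theta + d * T) * g0) / (theta + n)

    counts = {"the": 5, "sale": 2, "films": 1}
    print(py_predictive("the", counts))     # frequent word, discounted by d
    print(py_predictive("unseen", counts))  # small but nonzero: smoothing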
Other Problems
NLP typically uses pipeline models, so errors propagate;
models are often highly domain-dependent (models for broadcast
news will not work well for biomedical text, etc.);
there is no single error function to optimize; evaluation
metrics differ from task to task (BLEU for MT, ROUGE for
summarization, PARSEVAL for parsing).
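
As an example of one such metric, here is a sketch of BLEU's modified
unigram precision; full BLEU additionally combines higher-order n-grams
and a brevity penalty:

    from collections import Counter

    def modified_unigram_precision(candidate, reference):
        # BLEU clips each candidate word's credit at its reference count,
        # so degenerate outputs like "the the the ..." are not rewarded.
        cand = Counter(candidate.split())
        ref = Counter(reference.split())
        clipped = sum(min(c, ref[w]) for w, c in cand.items())
        return clipped / sum(cand.values())

    print(modified_unigram_precision(
        "the the the cat", "the cat sat on the mat"))  # 3/4 = 0.75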
Parsing (Keller, Steedman)
Current focus of research in probabilistic parsing:
models for more expressive syntactic representations (CCG,
TAG, dependency grammar);
semi-supervised induction of grammars and parsing models;
cognitive modeling:
  incrementality;
  limited parallelism, limited memory;
  evaluation against behavioral data.
The pilot embarrassed John and put himself in a very awkward situation.
[Figure: eye-tracking record for this sentence with numbered fixations,
illustrating standard reading measures: first fixation time = 5;
gaze duration = 5+6; total time = 5+6+8+10; second pass time = 8+10;
skipping rate (e.g., "put" is skipped).]
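
A sketch of how such reading measures are computed from a fixation record:
given chronologically ordered (region, duration) fixations, each measure is
a simple pass over the sequence. The toy fixation data are assumptions for
illustration.

    def reading_measures(fixations, region):
        # fixations: chronological list of (region_id, duration_ms) pairs.
        durs = [d for r, d in fixations if r == region]
        if not durs:
            return {"skipped": True}
        # First pass = consecutive fixations on the region before leaving.
        first = next(i for i, (r, _) in enumerate(fixations) if r == region)
        first_pass = []
        for r, d in fixations[first:]:
            if r != region:
                break
            first_pass.append(d)
        return {
            "skipped": False,
            "first_fixation": durs[0],         # duration of first fixation
            "gaze_duration": sum(first_pass),  # sum of first-pass fixations
            "total_time": sum(durs),           # all fixations on the region
            "second_pass": sum(durs) - sum(first_pass),  # re-reading time
        }

    # Assumed toy record: regions are word indices, durations in ms.
    fix = [(1, 200), (2, 180), (2, 150), (3, 220), (2, 190)]
    print(reading_measures(fix, region=2))
    # -> first_fixation 180, gaze_duration 330, total 520, second_pass 190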
Language Acquisition (Goldwater, Steedman)
Research focuses on Bayesian models, both for improving unsupervised
NLP and for understanding human language acquisition:
What constraints/biases are needed for effective
generalization?
How can different sources of information be successfully
combined?
ML methods and problems:
infinite models, esp. those for sequences/hierarchies;
incremental, memory-limited inference methods;
joint inference of different kinds of linguistic information
(e.g., morphology and syntax).
Language Generation (Lapata)
Research focuses on data-driven models for language generation:
fluent and coherent text that resembles human writing;
general modeling framework for different input types (time
series data, pictures, logical forms).
ML methods and problems:
mathematical programming for sentence compression and
summarization;
latent variable models for image caption generation;
models have to integrate conflicting constraints and varying
linguistic representations.
Example: image captioning beyond keywords.
Keywords: troop, Israel, force, ceasefire, soldiers
Caption: Thousands of Israeli troops are in Lebanon as the ceasefire begins.
Information Retrieval (Lavrenko)
Universal search:
learn to relate relevant text/images/products/DB records;
data: high-dimensional, extremely sparse, but dimensionality
reduction is a bad idea;
targets: focused information needs, not broad categories;
semi-supervised: lots of unlabeled data, few relevance judgments.
Learning to rank:
learn a ranking function from partial preferences (see the
sketch at the end of this section);
objective: non-smooth, can be very expensive to evaluate.
Novelty detection:
example: identify first reports of events in the news;
supervised task, but hard to learn anything from labels;
best approaches unsupervised, performance very low.
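
The sketch referenced above: a pairwise perceptron that learns a linear
ranking function from partial preferences, updating whenever a preferred
item fails to outscore a dispreferred one. The toy feature vectors and
preference pairs are assumptions for illustration.

    # Pairwise perceptron for learning to rank: a minimal sketch.
    # Each preference says the first document should rank above the second.

    def dot(w, x):
        return sum(wi * xi for wi, xi in zip(w, x))

    def train_ranker(preferences, dim, epochs=50, lr=0.1):
        w = [0.0] * dim
        for _ in range(epochs):
            for better, worse in preferences:
                # If the preferred document does not score higher, nudge w
                # toward the preferred one and away from the other.
                if dot(w, better) <= dot(w, worse):
                    w = [wi + lr * (b - c)
                         for wi, b, c in zip(w, better, worse)]
        return w

    # Assumed toy features, e.g. (tf-idf match, click rate).
    prefs = [([0.9, 0.2], [0.3, 0.8]),   # first doc preferred over second
             ([0.7, 0.1], [0.2, 0.4])]
    w = train_ranker(prefs, dim=2)
    print(w, dot(w, [0.9, 0.2]) > dot(w, [0.3, 0.8]))  # True once learned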