Topics in Computational Linguistics
Week 5: N-grams and Language Models
Shu-Kai Hsieh
Lab of Ontologies, Language Processing and e-Humanities
GIL, National Taiwan University
March 28, 2014
1 N-grams model
  • Evaluation
  • Smoothing Techniques
2 Web-scaled N-grams
3 Related Topics
4 The Entropy of Natural Languages
5 Lab
Language models
• Statistical/probabilistic language models aim to compute
  • either the probability of a sentence or sequence of words,
    P(S) = P(w_1, w_2, w_3, ..., w_n), or
  • the probability of an upcoming word,
    P(w_n | w_1, w_2, w_3, ..., w_{n-1})
  (which will turn out to be closely related to computing the probability of a sequence of words).
• The N-gram model is one of the most important tools in speech and language processing.
• It has varied applications: spell checking, MT, speech recognition, QA, etc.
Simple n-gram model
• Let's start with calculating P(S), say,
  P(S) = P(學, 語言, 很, 有趣) ("learning languages is fun")
Review of Joint and Conditional Probability
• Recall that the conditional probability of X given Y, P(X|Y), is defined in terms of the probability of Y, P(Y), and the joint probability of X and Y, P(X, Y):

  P(X|Y) = P(X, Y) / P(Y)
Review of the Chain Rule of Probability
Conversely, the joint probability P(X, Y) can be expressed in terms of the conditional probability P(X|Y):

  P(X, Y) = P(X|Y) P(Y)

which leads to the chain rule:

  P(X_1, X_2, X_3, ..., X_n)
    = P(X_1) P(X_2|X_1) P(X_3|X_1, X_2) ... P(X_n|X_1, ..., X_{n-1})
    = P(X_1) ∏_{i=2}^{n} P(X_i|X_1, ..., X_{i-1})
The Chain Rule applied to the joint probability of words in a sentence
By the chain rule of probability,

  P(S) = P(w_1^n) = P(w_1) P(w_2|w_1) P(w_3|w_1^2) ... P(w_n|w_1^{n-1})
       = ∏_{k=1}^{n} P(w_k|w_1^{k-1})
       = P(學) · P(語言|學) · P(很|學 語言) · P(有趣|學 語言 很)
How to Estimate these Probabilities?
• Maximum Likelihood Estimation (MLE): simply count occurrences in a corpus and normalize the counts so that they lie between 0 and 1. (There are of course more sophisticated algorithms.)¹

Count and divide:

  P(嗎 | 學 語言 很 有趣) = Count(學 語言 很 有趣 嗎) / Count(學 語言 很 有趣)

¹ MLE is sometimes called relative frequency estimation.
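A minimal sketch of this count-and-divide idea in Python, shown with bigrams over a tiny made-up corpus (the corpus and its segmentation are illustrative assumptions, not data from the slides):

    from collections import Counter

    corpus = [["學", "語言", "很", "有趣", "嗎"],
              ["學", "語言", "很", "有趣"],
              ["學", "語言", "很", "難"]]

    # Count unigrams and bigrams over the toy corpus
    unigrams = Counter(w for sent in corpus for w in sent)
    bigrams = Counter(pair for sent in corpus for pair in zip(sent, sent[1:]))

    def p_mle(w, prev):
        """MLE estimate: P(w | prev) = Count(prev, w) / Count(prev)."""
        return bigrams[(prev, w)] / unigrams[prev]

    print(p_mle("有趣", "很"))  # 2/3 in this toy corpus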
Markov Assumption: Don't look too far into the past
Simplified idea: instead of computing the probability of a word given its entire history, we can approximate the history by just the last few words:

  P(嗎 | 學 語言 很 有趣) ≈ P(嗎 | 有趣), or
  P(嗎 | 學 語言 很 有趣) ≈ P(嗎 | 很 有趣)
In other words
• Bi-gram model: approximate the probability of a word given all the previous words, P(w_n|w_1^{n-1}), by the conditional probability given only the preceding word, P(w_n|w_{n-1}). This generalizes to the N-gram case as

  P(w_n|w_1^{n-1}) ≈ P(w_n|w_{n-N+1}^{n-1})

• Tri-gram: (your turn)
• We can extend to trigrams, 4-grams, 5-grams, knowing that in general this is an insufficient model of language, because language has long-distance dependencies:
  我 在 一 個 非常 奇特 的 機緣巧合 之下 學 梵文
  ("Under a very peculiar coincidence of circumstances, I learned Sanskrit.")
In other words
• So given the bigram assumption for the probability of an individual word, we can compute the probability of the entire sentence as

  P(S) = P(w_1^n) ≈ ∏_{k=1}^{n} P(w_k|w_{k-1})

• Recall the MLE equations (4.13)-(4.14) in the JM book.
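A sketch of the same idea applied to a whole sentence under the bigram assumption, padding with hypothetical <s>/</s> boundary symbols (all names and data are illustrative):

    from collections import Counter
    from math import prod

    corpus = [["<s>", "學", "語言", "很", "有趣", "</s>"],
              ["<s>", "學", "語言", "很", "難", "</s>"]]

    unigrams = Counter(w for s in corpus for w in s)
    bigrams = Counter(pair for s in corpus for pair in zip(s, s[1:]))

    def sentence_prob(words):
        """P(S) ≈ product of P(w_k | w_{k-1}), with boundary padding."""
        padded = ["<s>"] + words + ["</s>"]
        return prod(bigrams[(a, b)] / unigrams[a]
                    for a, b in zip(padded, padded[1:]))

    print(sentence_prob(["學", "語言", "很", "有趣"]))  # 0.5 on this toy corpus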
Example: Language Modeling of Alice.txt
Exercise
• Walk through the example of the Berkeley Restaurant Project sentences (JM pp. 90-91).

BTW, we usually do everything in log space to avoid underflow (also, adding is faster than multiplying):

  log(p1 · p2 · p3) = log p1 + log p2 + log p3
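A quick illustration (the values are made up) of why log space matters:

    import math

    probs = [0.001] * 300                     # a long sentence of low-prob words
    print(math.prod(probs))                   # underflows to 0.0
    print(sum(math.log(p) for p in probs))    # fine: about -2072.3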
Google n-gram and Google Suggestion
Generating the Wall Street Journal vs. Generating Shakespeare
• The quadrigram text looks like Shakespeare because it is Shakespeare.
• The N-gram model is very sensitive to the training corpus: an overfitting issue.
• N-grams only work well for word prediction if the test corpus looks like the training corpus; in real life, it often doesn't.
• We need to train more robust models that generalize, e.g., handle the zeros issue: things that never occur in the training set but do occur in the test set.
Evaluation
Evaluating n-gram models
How good is our model? How can we make it better (more robust)?
• N-gram language models are evaluated by separating the corpus into a training set and a test set, training the model on the training set, and evaluating it on the test set. An evaluation metric tells us how well our model does on the test set.
• Extrinsic (in vivo) evaluation.
• Intrinsic evaluation: the perplexity (2^H) of the language model on a test set is used to compare language models.
Evaluating the N-gram Model
But the model relies heavily on the corpus it was trained on, and thus often overfits!

Example
• Given a vocabulary of 20,000 types, the potential number of bigrams is 20,000² = 400,000,000, and with trigrams it amounts to the astronomical figure of 20,000³. No corpus yet has the size to cover all the corresponding word combinations.
• MLE gives no hint on how to estimate the probabilities of unseen n-grams.
• Here we use smoothing (or discounting) techniques to estimate the probabilities of unseen n-grams, presumably because a distribution without zeros is smoother than one with zeros.
Perplexity
• The best language model is the one that best predicts an unseen test set (i.e., gives it the highest probability).
• Perplexity is defined as the inverse probability of the test set, normalized by the number of words.
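Concretely, following the standard J&M definition, for a test set W = w_1 w_2 ... w_N:

  PP(W) = P(w_1 w_2 ... w_N)^(-1/N)

so minimizing perplexity is the same as maximizing the test-set probability. A small sketch with made-up per-word probabilities:

    import math

    word_probs = [0.2, 0.1, 0.25, 0.05]        # illustrative values only
    log2_p = sum(math.log2(p) for p in word_probs)
    pp = 2 ** (-log2_p / len(word_probs))
    print(pp)                                  # ≈ 7.95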
Smoothing Techniques
The intuition of smoothing (from Dan Klein)
Smoothing Techniques
Smoothing n-gram probabilities
• Sparse data: the corpus is not big enough to cover all the bigrams with realistic estimates.
• Smoothing algorithms provide a better way of estimating the probabilities of n-grams than Maximum Likelihood Estimation.
Smoothing Techniques
• Laplace Smoothing (a.k.a. add-one method)
• Interpolation
• Backoff
• Good-Turing Estimation (Discounting)
• Kneser-Ney Smoothing
Laplace Smoothing
• Pretend we saw each word one more time than we actually did.
• Re-estimate the counts by just adding one to all the counts!
• Read the BeRP examples (JM pp. 99-100).
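A minimal sketch, reusing the toy bigram/unigram Counter objects from the earlier sketch; V is the vocabulary size:

    def p_laplace(w, prev, bigrams, unigrams):
        """Add-one estimate: P(w | prev) = (C(prev, w) + 1) / (C(prev) + V)."""
        V = len(unigrams)  # vocabulary size
        return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)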
Laplace Smoothing: Comparing with Raw Bigram Counts
Laplace Smoothing: It's a blunt estimation
• Too much probability mass is moved to all the zeros.
• The guest upstages the host (喧賓奪主): to accommodate the huge number of zeros, the count for "Chinese food" can shrink by a factor of 10!
(Katz) Backoff and Interpolation
Intuition
Sometimes it helps to use less context: condition on less context for contexts you haven't learned much about.
• Backoff and interpolation are two further strategies that utilize n-grams of variable length.
• Backoff: use the trigram if you have good evidence; otherwise the bigram; otherwise the unigram.
• Interpolation: mix unigram, bigram, and trigram.
Katz Back-off
• The idea is to use the frequency of the longest available n-gram, and if that n-gram is unavailable, to back off to the (n-1)-gram, then to the (n-2)-gram, and so on.
• If n = 3, we first try trigrams, then bigrams, and finally unigrams.
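For the bigram case, the standard formulation (following JM) is

  P_katz(w_n | w_{n-1}) = P*(w_n | w_{n-1})     if C(w_{n-1} w_n) > 0
                        = α(w_{n-1}) · P(w_n)   otherwise

where P* and α are explained on the next slide.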
P* and α?
• P*: the discounted probability, rather than the MLE probability, obtained with a method such as Good-Turing.
• α: the normalizing factor.
Smoothing Techniques
Linear Interpolation 線性插值
將高階模型和低階模型作線性組合
• Simple interpolation
• Lambdas conditional on context
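For trigrams, simple interpolation (the first variant above) is

  P̂(w_n | w_{n-2} w_{n-1}) = λ_1 P(w_n | w_{n-2} w_{n-1}) + λ_2 P(w_n | w_{n-1}) + λ_3 P(w_n), with λ_1 + λ_2 + λ_3 = 1

The λs are typically set to maximize the likelihood of held-out data; in the second variant, they are conditioned on the context.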
Advanced Discounting Techniques
Intuition
Use the count of things you've seen once to help estimate the count of things you've never seen.
• Good-Turing
• Witten-Bell
• Kneser-Ney
Good-Turing Smoothing: Notations
• A word or N-gram (or any event) that occurs once is called a singleton, or a hapax legomenon.
• N_c: the number of things we've seen c times, i.e., the frequency of frequency c.

Example (in terms of bigrams)
N_0 is the number of bigrams with count 0, N_1 the number of bigrams with count 1 (singletons), etc.
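A minimal sketch of the frequency-of-frequency table and the Good-Turing adjusted count c* = (c+1) N_{c+1} / N_c (toy counts; real implementations also smooth the N_c values themselves):

    from collections import Counter

    def good_turing(ngram_counts):
        N = Counter(ngram_counts.values())   # N_c: how many n-grams occur c times
        def c_star(c):
            return (c + 1) * N[c + 1] / N[c] if N[c] else 0.0
        return N, c_star

    counts = Counter({("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1, ("d", "e"): 2})
    N, c_star = good_turing(counts)
    print(N[1], c_star(1))   # 3 singletons; c*(1) = 2 * N_2 / N_1 ≈ 0.67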
Good-Turing Smoothing: Intuition
See [2], pp. 101-102.
Good-Turing Smoothing: Answer
Other advanced Smoothing Techniques
How to deal with huge web-scaled n-grams
How might one build a language model (n-gram model) that scales to very large amounts of training data?
• Naive pruning: only store N-grams with count ≥ some threshold, and remove singletons of higher-order n-grams.
• Entropy-based pruning
Smoothing for Web-scaled N-grams
"Standard backoff" uses variations of context-dependent backoff, where the p's are pre-computed and stored probabilities, and the λ's are back-off weights.
Smoothing for Web-scaled N-grams
"Stupid backoff" [1] applies no discounting and instead directly uses relative frequencies (S is used instead of P to emphasize that these are scores rather than probabilities).
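A sketch of the recursion from Brants et al. [1], with their recommended back-off factor λ = 0.4; it assumes a single Counter mapping word tuples of every order to counts, and N = the total token count:

    def stupid_backoff(word, context, counts, N, lam=0.4):
        """S(word | context), where context is a tuple of preceding words."""
        if not context:
            return counts[(word,)] / N          # unigram relative frequency
        full = context + (word,)
        if counts[full] > 0:
            return counts[full] / counts[context]
        return lam * stupid_backoff(word, context[1:], counts, N, lam)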
LM Tools and n-gram Resources
• CMU Statistical Language Modeling Toolkit: http://www.speech.cs.cmu.edu/SLM/toolkit.html
• SRILM: http://www.speech.sri.com/projects/srilm/
• Google Web1T 5-gram: http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
• Google Books N-grams
• Chinese Web 5-gram: http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2010T06
Quick demo of CMU-LM
Google Books Ngrams
From Corpus-based to Google-based Linguistics
Enhancing Linguistic Search with the Google Books Ngram Viewer
From Corpus-based to Google-based Linguistics
Syntactic N-grams are coming out too!
http://commondatastorage.googleapis.com/books/syntactic-ngrams/index.html
Exercise
The Google Web 1T 5-Gram Database — SQLite Index & Web Interface
Applications
What can next-word prediction (based on probabilistic language models) do today?
(source: fandywang, 2012)
You’d definitely like to try this
An Automatic CS Paper Generator
http://pdos.csail.mit.edu/scigen/
Collocations
• Collocations are recurrent combinations of words.

Example
• Simple collocations are fixed n-grams, such as "the Wall Street".
• Collocations with predicative relations involve morpho-syntactic variation, such as the one linking make and decision: to make a decision, decisions to be made, made an important decision, etc.
Collocations
• Statistically, collocates are events that co-occur more often than by chance.
• Measures used to quantify the strength of word preference include Mutual Information, the t-score, and the likelihood ratio.
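For reference, (pointwise) mutual information compares a pair's joint probability with what independence would predict:

  MI(x, y) = log_2 [ P(x, y) / (P(x) P(y)) ] ≈ log_2 [ N · C(x, y) / (C(x) C(y)) ]

for a corpus of N tokens; high values indicate that x and y co-occur far more often than chance would predict.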
Lab
• ngramr (R package) for the Google Books Ngram data
• Python NLTK [see the extra IPython notebook]

Example
For newbies in Python: https://www.coursera.org/course/interactivepython
For a quick start (develop and host Python from your browser): https://www.pythonanywhere.com/
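A possible warm-up for the lab; a sketch assuming NLTK and its Gutenberg corpus are available (carroll-alice.txt is the Alice text modeled earlier):

    import nltk
    from collections import Counter

    nltk.download("gutenberg", quiet=True)
    words = [w.lower() for w in nltk.corpus.gutenberg.words("carroll-alice.txt")]

    unigrams = Counter(words)
    bigrams = Counter(nltk.bigrams(words))

    # MLE bigram distribution over continuations of "the"
    the_next = {w2: c / unigrams["the"]
                for (w1, w2), c in bigrams.items() if w1 == "the"}
    print(sorted(the_next.items(), key=lambda kv: -kv[1])[:5])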
Homework.week5
80%: Exercise 4.3 in the JM book, p. 122.
20%: Preview chapter 5 [2].
Homework.week6
20%: Read the Academia Sinica Balanced Corpus manual (http://app.sinica.edu.tw/kiwi/mkiwi/98-04.pdf) and preview chapter 6.
80%: Implement a language model of the 服貿 (Cross-Strait Service Trade Agreement) debate (data will be provided), and use it to build an automatic PRO/CON text generator.
References
[1] Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, and Jeffrey Dean. Large language models in machine translation. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 2007.
[2] Dan Jurafsky and James H. Martin. Speech & Language Processing. Pearson Education India, 2000.