[Zhang+ ACL2014] Kneser-Ney Smoothing on Expected Count
[Pickhardt+ ACL2014] A Generalized Language Model as the
Combination of Skipped n-grams and Modified Kneser-Ney Smoothing
2014/7/12 ACL Reading @ PFI
Nakatani Shuyo, Cybozu Labs Inc.
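Kneser-Ney Smoothing
[Kneser+ 1995]
• Discounting & Interpolation
$$P(w_i \mid w_{i-n+1}^{i-1}) = \frac{\max\{c(w_{i-n+1}^{i}) - D,\ 0\}}{c(w_{i-n+1}^{i-1})} + \frac{D}{c(w_{i-n+1}^{i-1})}\, N_{1+}(w_{i-n+1}^{i-1}\,\cdot)\, P(w_i \mid w_{i-n+2}^{i-1})$$
• where $w_m^n = w_m \cdots w_n$ and $N_{1+}(w_m^n\,\cdot) = \left|\{w_i : c(w_m^n w_i) > 0\}\right|$
– $D$: discount; $N_{1+}(w_{i-n+1}^{i-1}\,\cdot)$: number of distinct words that follow the context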
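The interpolated formula above can be sketched for bigrams (n = 2). As an assumption, the lower-order distribution is the continuation-count unigram N1+(· w) / N1+(· ·), the usual choice in Kneser-Ney smoothing; the function name `kn_bigram` and the default D = 0.75 are illustrative, not from the slides. With 0 ≤ D ≤ 1 every observed bigram keeps a non-negative count, so the probabilities for a seen context sum to one.

```python
from collections import Counter

def kn_bigram(tokens, D=0.75):
    """Interpolated Kneser-Ney for bigrams (n = 2); returns prob(w, v) = P(w | v).

    Assumption: the lower-order P(w) is the continuation unigram
    N1+(. w) / N1+(. .), not a plain unigram MLE.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))  # c(v w)
    context_count = Counter()  # c(v) = sum over w of c(v w)
    followers = Counter()      # N1+(v .): distinct w with c(v w) > 0
    preceders = Counter()      # N1+(. w): distinct v with c(v w) > 0
    for (v, w), c in bigrams.items():
        context_count[v] += c
        followers[v] += 1
        preceders[w] += 1
    n_types = len(bigrams)     # N1+(. .): number of distinct bigrams

    def p_lower(w):
        return preceders[w] / n_types

    def prob(w, v):
        c_v = context_count[v]
        if c_v == 0:           # unseen context: use the lower order only
            return p_lower(w)
        discounted = max(bigrams[(v, w)] - D, 0) / c_v
        backoff_mass = D * followers[v] / c_v   # D / c(v) * N1+(v .)
        return discounted + backoff_mass * p_lower(w)

    return prob

toks = "a b a c a b".split()
prob = kn_bigram(toks)
print(round(prob("b", "a"), 4))  # 0.5417
```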
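Modified KN Smoothing
[Chen+ 1999]
$$P(w_i \mid w_{i-n+1}^{i-1}) = \frac{c(w_{i-n+1}^{i}) - D\big(c(w_{i-n+1}^{i})\big)}{c(w_{i-n+1}^{i-1})} + \gamma(w_{i-n+1}^{i-1})\, P(w_i \mid w_{i-n+2}^{i-1})$$
• where
$$D(c) = \begin{cases} 0 & \text{if } c = 0 \\ D_1 & \text{if } c = 1 \\ D_2 & \text{if } c = 2 \\ D_{3+} & \text{if } c \ge 3 \end{cases} \qquad \gamma(w_{i-n+1}^{i-1}) = \frac{\text{amount of discounting}}{c(w_{i-n+1}^{i-1})}$$
• Weighted discounting: the discount depends on the count ($D_1$, $D_2$, $D_{3+}$ are estimated by leave-one-out cross-validation)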
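A sketch of count-dependent discounting. The slide only says the discounts come from leave-one-out estimation; the closed-form estimates used here, Y = n1 / (n1 + 2 n2) and D_r = r − (r + 1) Y n_{r+1} / n_r, are the ones given by Chen & Goodman, where n_r is the number of n-grams seen exactly r times. The "amount of discounting" in γ is taken to be the sum of D(c) over the words that follow the context, which is what makes the distribution sum to one. The helper names are hypothetical, and n1 through n4 must be non-zero.

```python
from collections import Counter

def mkn_discounts(counts):
    """Closed-form D1, D2, D3+ from count-of-counts n_r (Chen & Goodman's estimates).

    counts: {ngram_tuple: count}; n_1 .. n_4 must all be non-zero.
    """
    n = Counter(counts.values())  # n[r] = number of n-grams seen exactly r times
    y = n[1] / (n[1] + 2 * n[2])
    d1 = 1 - 2 * y * n[2] / n[1]
    d2 = 2 - 3 * y * n[3] / n[2]
    d3_plus = 3 - 4 * y * n[4] / n[3]
    return d1, d2, d3_plus

def discount(c, d):
    """D(c): 0 if c = 0, D1 if c = 1, D2 if c = 2, D3+ if c >= 3."""
    if c == 0:
        return 0.0
    return d[min(c, 3) - 1]

def gamma(context, counts, d):
    """gamma(u) = (total discount taken from u's continuations) / c(u)."""
    cont = [c for ngram, c in counts.items() if ngram[:-1] == context]
    return sum(discount(c, d) for c in cont) / sum(cont)

counts = {("a", "b"): 1, ("a", "c"): 1, ("a", "d"): 2, ("b", "a"): 1,
          ("b", "c"): 3, ("c", "a"): 1, ("c", "b"): 2, ("d", "a"): 4}
d = mkn_discounts(counts)
print(d, gamma(("a",), counts, d))  # (0.5, 1.25, 1.0) 0.5625
```

For context "a", the discounted counts give 1.75 / 4 = 0.4375 and γ = 0.5625, so the weight left for the lower order exactly fills the remaining probability mass.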
[Zhang+ ACL2014] Kneser-Ney Smoothing on Expected Count
• Setting: each sentence carries a fractional weight, as in
– Domain adaptation
– The EM algorithm for word alignment
• Proposes KN smoothing on expected fractional counts
I’m interested in it!
Model
• $\mathbf{u}$ denotes $w_{i-n+1}^{i-1}$, and $\mathbf{u}'$ denotes $w_{i-n+2}^{i-1}$
• If a sequence $\mathbf{u}w$ occurs $k$ times and each occurrence has a probability $p_i$ ($i = 1, \cdots, k$) as its weight,
• then the count $c(\mathbf{u}w)$ follows a Poisson binomial distribution:
$$p\big(c(\mathbf{u}w) = r\big) = s(k, r), \quad s(k, r) = \begin{cases} s(k-1, r)\,(1 - p_k) + s(k-1, r-1)\, p_k & \text{if } 0 \le r \le k,\ k \ge 1 \\ 1 & \text{if } k = r = 0 \\ 0 & \text{otherwise} \end{cases}$$
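The recursion for s(k, r) is a dynamic program over occurrences: each added occurrence either is not counted (probability 1 − p_k) or raises the count by one (probability p_k). A minimal sketch, assuming the weights p_1, ..., p_k are given as a list; the function name is hypothetical. It runs in O(k²) time.

```python
def poisson_binomial(ps):
    """Return [s(k, 0), ..., s(k, k)] = [P(c = r)] for occurrence weights ps = [p_1, ..., p_k]."""
    s = [1.0]  # s(0, 0) = 1: no occurrences, so the count is 0
    for p in ps:  # add occurrence k with weight p_k
        nxt = [0.0] * (len(s) + 1)
        for r, prev in enumerate(s):
            nxt[r] += prev * (1 - p)  # occurrence k not counted: s(k-1, r)(1 - p_k)
            nxt[r + 1] += prev * p    # occurrence k counted: s(k-1, r) p_k feeds s(k, r+1)
        s = nxt
    return s

print(poisson_binomial([0.5, 0.5]))  # [0.25, 0.5, 0.25]
```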
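MLE on this model
• Expectations
– $\mathbb{E}[c(\mathbf{u}w)] = \sum_r r \cdot p\big(c(\mathbf{u}w) = r\big)$
– $\mathbb{E}[N_r(\mathbf{u}\,\cdot)] = \sum_w p\big(c(\mathbf{u}w) = r\big)$
– $\mathbb{E}[N_{r+}(\mathbf{u}\,\cdot)] = \sum_w p\big(c(\mathbf{u}w) \ge r\big)$
• Maximize the (expected) likelihood
– $\mathbb{E}[L] = \mathbb{E}\Big[\sum_{\mathbf{u}w} c(\mathbf{u}w) \log p(w \mid \mathbf{u})\Big] = \sum_{\mathbf{u}w} \mathbb{E}[c(\mathbf{u}w)] \log p(w \mid \mathbf{u})$
– which gives $p_{\mathrm{MLE}}(w \mid \mathbf{u}) = \dfrac{\mathbb{E}[c(\mathbf{u}w)]}{\mathbb{E}[c(\mathbf{u}\,\cdot)]}$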
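These expectations can be computed directly from the per-occurrence weights. A sketch assuming the weights are stored as {(u, w): [p_1, ..., p_k]} with u as a plain context key; all names are hypothetical, and the s(k, r) recursion is repeated compactly so the block stands alone. As a sanity check, E[c(uw)] from the distribution equals Σ p_i by linearity of expectation.

```python
def count_dist(ps):
    """[P(c = r) for r = 0..k]: the s(k, r) recursion over the occurrence weights."""
    s = [1.0]
    for p in ps:
        s = [(s[r] if r < len(s) else 0.0) * (1 - p) + (s[r - 1] * p if r > 0 else 0.0)
             for r in range(len(s) + 1)]
    return s

def expected_stats(weights):
    """weights: {(u, w): [p_1, ..., p_k]}; returns E[c(uw)] and E[N_r], E[N_{r+}] helpers."""
    dists = {uw: count_dist(ps) for uw, ps in weights.items()}
    e_c = {uw: sum(r * pr for r, pr in enumerate(d)) for uw, d in dists.items()}

    def e_n(u, r):       # E[N_r(u .)] = sum over w of p(c(uw) = r)
        return sum(d[r] for (ctx, _), d in dists.items() if ctx == u and r < len(d))

    def e_n_plus(u, r):  # E[N_{r+}(u .)] = sum over w of p(c(uw) >= r)
        return sum(sum(d[r:]) for (ctx, _), d in dists.items() if ctx == u)

    return e_c, e_n, e_n_plus

def p_mle(e_c, u, w):
    """p_MLE(w | u) = E[c(uw)] / E[c(u .)]"""
    total = sum(v for (ctx, _), v in e_c.items() if ctx == u)
    return e_c.get((u, w), 0.0) / total

weights = {("the", "cat"): [0.5, 0.5], ("the", "dog"): [1.0]}
e_c, e_n, e_n_plus = expected_stats(weights)
print(p_mle(e_c, "the", "cat"))  # 0.5
```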
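Expected Kneser-Ney
• $\tilde{c}(\mathbf{u}w) = \max\{0,\ c(\mathbf{u}w) - D\} + N_{1+}(\mathbf{u}\,\cdot)\, D\, p'(w \mid \mathbf{u}')$
• So, $\mathbb{E}[\tilde{c}(\mathbf{u}w)] = \mathbb{E}[c(\mathbf{u}w)] - p\big(c(\mathbf{u}w) > 0\big)\, D + \mathbb{E}[N_{1+}(\mathbf{u}\,\cdot)]\, D\, p'(w \mid \mathbf{u}')$
– the first two terms use $\mathbb{E}[\max\{0,\ c - D\}] = \mathbb{E}[c] - p(c > 0)\, D$, which holds because $c$ is a non-negative integer and $0 \le D \le 1$
– where $p'(w \mid \mathbf{u}') = \dfrac{\mathbb{E}[N_{1+}(\cdot\,\mathbf{u}'w)]}{\mathbb{E}[N_{1+}(\cdot\,\mathbf{u}'\,\cdot)]}$
• then $p(w \mid \mathbf{u}) = \dfrac{\mathbb{E}[\tilde{c}(\mathbf{u}w)]}{\mathbb{E}[\tilde{c}(\mathbf{u}\,\cdot)]}$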
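For the bigram case (u is a single word v and u' is empty), the needed expectations have closed forms: E[c(uw)] = Σ p_i by linearity, and p(c(uw) > 0) = 1 − Π(1 − p_i). A sketch under those assumptions; the function name, the bigram restriction and D = 0.75 are illustrative choices, not from the slides. When every weight is 1, it reduces to ordinary interpolated KN with continuation counts.

```python
import math
from collections import defaultdict

def expected_kn_bigram(weights, D=0.75):
    """Expected Kneser-Ney for bigrams; returns prob(w, v) = p(w | v).

    weights: {(v, w): [p_1, ..., p_k]}, the weight of each occurrence of v w.
    Here u = v (one word) and u' is the empty context.
    """
    # E[max(0, c - D)] = E[c] - p(c > 0) D only holds for 0 <= D <= 1
    assert 0.0 <= D <= 1.0
    e_c = {vw: sum(ps) for vw, ps in weights.items()}  # E[c(vw)]
    p_pos = {vw: 1.0 - math.prod(1.0 - p for p in ps)  # p(c(vw) > 0)
             for vw, ps in weights.items()}

    e_n1_ctx = defaultdict(float)   # E[N1+(v .)]
    e_n1_cont = defaultdict(float)  # E[N1+(. w)]
    for (v, w), q in p_pos.items():
        e_n1_ctx[v] += q
        e_n1_cont[w] += q
    total_cont = sum(e_n1_cont.values())  # E[N1+(. .)]
    vocab = sorted(e_n1_cont)

    def p_lower(w):  # p'(w | u') = E[N1+(. w)] / E[N1+(. .)]
        return e_n1_cont.get(w, 0.0) / total_cont

    def e_ctilde(v, w):  # E[c~(vw)]
        return (e_c.get((v, w), 0.0) - p_pos.get((v, w), 0.0) * D
                + e_n1_ctx.get(v, 0.0) * D * p_lower(w))

    def prob(w, v):  # E[c~(vw)] / E[c~(v .)]; v must be a seen context
        return e_ctilde(v, w) / sum(e_ctilde(v, x) for x in vocab)

    return prob

weights = {("the", "cat"): [0.5, 0.5], ("the", "dog"): [1.0], ("a", "cat"): [0.8]}
prob = expected_kn_bigram(weights)
print(prob("cat", "the"), prob("dog", "the"))
```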
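Language model adaptation
• The corpus consists of
– large general-domain data and
– small domain-specific data
• Weight of a sentence $\mathbf{w}$:
– $p(\mathbf{w} \text{ is in-domain}) = \dfrac{1}{1 + \exp(-H(\mathbf{w}))}$
– where $H(\mathbf{w}) = \dfrac{\log p_{\mathrm{in}}(\mathbf{w}) - \log p_{\mathrm{out}}(\mathbf{w})}{|\mathbf{w}|}$,
– $p_{\mathrm{in}}$: language model of the in-domain data, $p_{\mathrm{out}}$: language model of the general-domain (out-of-domain) data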
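The sentence weight is a logistic function of a length-normalized log-probability difference. A sketch assuming the two sentence log-probabilities (natural logs) and the token count |w| have already been computed by some in-domain and general-domain LMs; the function name is hypothetical.

```python
import math

def in_domain_weight(logp_in, logp_out, length):
    """p(w is in-domain) = 1 / (1 + exp(-H(w))),
    with H(w) = (log p_in(w) - log p_out(w)) / |w|.

    logp_in / logp_out: natural-log probabilities of the sentence under the
    in-domain and general-domain LMs; length: number of tokens |w|.
    """
    h = (logp_in - logp_out) / length  # length-normalized log-likelihood ratio
    return 1.0 / (1.0 + math.exp(-h))

print(in_domain_weight(-10.0, -10.0, 5))  # 0.5: both LMs agree
print(in_domain_weight(-8.0, -12.0, 4))   # about 0.73: favored by the in-domain LM
```

These weights then serve as the occurrence probabilities p_i fed into expected KN.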
• Figure 1: On the language model adaptation task, expected KN outperforms all other methods across all sizes of selected subsets. Integral KN is applied to unweighted instances, while fractional WB, fractional KN and expected KN are applied to weighted instances. (via [Zhang+ ACL2014])
– Subsets are selected from the general-domain data; in-domain data: 54k for training, 3k for testing
Why isn't Modified KN included as a baseline?
[Pickhardt+ ACL2014] A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser-Ney Smoothing
• Higher-order n-grams are very sparse
– especially on small data (e.g. domain-specific data!)
• Improves performance on small data by combining skipped n-grams with Modified KN smoothing
– Perplexity is reduced by 25.7% on very small training data of only 736 KB of text
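“Generalized Language Models”
• $\partial_3(w_1 w_2 w_3 w_4) = w_1 w_2\,\_\,w_4$
– “_” means a word placeholder
$$P_{\mathrm{GLM}}(w_i \mid w_{i-n+1}^{i-1}) = \frac{c(w_{i-n+1}^{i}) - D\big(c(w_{i-n+1}^{i})\big)}{c(w_{i-n+1}^{i-1})} + \gamma_{\mathrm{high}}(w_{i-n+1}^{i-1})\, \frac{1}{n-1} \sum_{j=1}^{n-1} P_{\mathrm{GLM}}(w_i \mid \partial_j w_{i-n+1}^{i-1})$$
$$P_{\mathrm{GLM}}(w_i \mid \partial_j w_{i-n+1}^{i-1}) = \frac{N_{1+}(\partial_j w_{i-n}^{i}) - D\big(c(\partial_j w_{i-n+1}^{i})\big)}{N_{1+}(\partial_j w_{i-n+1}^{i-1}\, *)} + \gamma_{\mathrm{mid}}(\partial_j w_{i-n+1}^{i-1})\, \frac{1}{n-2} \sum_{\substack{k=1 \\ k \ne j}}^{n-1} P_{\mathrm{GLM}}(w_i \mid \partial_j \partial_k w_{i-n+1}^{i-1})$$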
• The bold arrows correspond to interpolation of models in traditional
modified Kneser-Ney smoothing. The lighter arrows illustrate the
additional interpolations introduced by our generalized language
models. (via [Pickhardt+ ACL2014])
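The skip operator ∂_j and the skipped contexts that the GLM formulas average over can be enumerated in a few lines. A sketch that represents a context as a tuple of words with "_" as the placeholder and 1-based positions as on the slide; the function names are hypothetical, and `lower_prob` stands in for whatever lower-order estimate is plugged in. A context of n − 1 words yields n − 1 first-level contexts and, for a fixed j, n − 2 second-level ones, matching the 1/(n − 1) and 1/(n − 2) weights.

```python
from typing import Callable, List, Sequence, Tuple

Context = Tuple[str, ...]

def skip(seq: Sequence[str], j: int, placeholder: str = "_") -> Context:
    """The skip operator ∂_j: replace the j-th word (1-based) with a placeholder."""
    return tuple(placeholder if pos == j else word
                 for pos, word in enumerate(seq, start=1))

def first_level(context: Sequence[str]) -> List[Context]:
    """∂_j w_{i-n+1}^{i-1} for j = 1..n-1 (every position of the context)."""
    return [skip(context, j) for j in range(1, len(context) + 1)]

def second_level(context: Sequence[str], j: int) -> List[Context]:
    """∂_j ∂_k w_{i-n+1}^{i-1} for k = 1..n-1, k != j."""
    return [skip(skip(context, j), k) for k in range(1, len(context) + 1) if k != j]

def uniform_mix(lower_prob: Callable[[str, Context], float], w: str,
                contexts: List[Context]) -> float:
    """(1/m) * sum of lower_prob(w, context) over the m skipped contexts."""
    return sum(lower_prob(w, c) for c in contexts) / len(contexts)

print(skip(("w1", "w2", "w3", "w4"), 3))  # ('w1', 'w2', '_', 'w4')
print(second_level(("a", "b", "c"), 1))   # [('_', '_', 'c'), ('_', 'b', '_')]
```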
• [Figures (via [Pickhardt+ ACL2014]): shrunk training data sets for the English Wikipedia; small domain-specific data]
Space Complexity
• model size = 9.5 GB, # of entries = 427M
• model size = 15 GB, # of entries = 742M
References
• [Zhang+ ACL2014] Kneser-Ney Smoothing on Expected Count
• [Pickhardt+ ACL2014] A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser-Ney Smoothing
• [Kneser+ 1995] Improved backing-off for m-gram language modeling
• [Chen+ 1999] An Empirical Study of Smoothing Techniques for Language Modeling

Contenu connexe

En vedette

ソーシャルメディアの多言語判定 #SoC2014
ソーシャルメディアの多言語判定 #SoC2014ソーシャルメディアの多言語判定 #SoC2014
ソーシャルメディアの多言語判定 #SoC2014Shuyo Nakatani
 
猫に教えてもらうルベーグ可測
猫に教えてもらうルベーグ可測猫に教えてもらうルベーグ可測
猫に教えてもらうルベーグ可測Shuyo Nakatani
 
ドラえもんでわかる統計的因果推論 #TokyoR
ドラえもんでわかる統計的因果推論 #TokyoRドラえもんでわかる統計的因果推論 #TokyoR
ドラえもんでわかる統計的因果推論 #TokyoRShuyo Nakatani
 
人工知能と機械学習の違いって?
人工知能と機械学習の違いって?人工知能と機械学習の違いって?
人工知能と機械学習の違いって?Shuyo Nakatani
 
アラビア語とペルシャ語の見分け方 #DSIRNLP 5
アラビア語とペルシャ語の見分け方 #DSIRNLP 5アラビア語とペルシャ語の見分け方 #DSIRNLP 5
アラビア語とペルシャ語の見分け方 #DSIRNLP 5Shuyo Nakatani
 
KB + Text => Great KB な論文を多読してみた
KB + Text => Great KB な論文を多読してみたKB + Text => Great KB な論文を多読してみた
KB + Text => Great KB な論文を多読してみたKoji Matsuda
 
どの言語でつぶやかれたのか、機械が知る方法 #WebDBf2013
どの言語でつぶやかれたのか、機械が知る方法 #WebDBf2013どの言語でつぶやかれたのか、機械が知る方法 #WebDBf2013
どの言語でつぶやかれたのか、機械が知る方法 #WebDBf2013Shuyo Nakatani
 
Acl読み会2014
Acl読み会2014Acl読み会2014
Acl読み会2014tempra28
 
ACL2014読み会:Fast and Robust Neural Network Joint Models for Statistical Machin...
ACL2014読み会:Fast and Robust Neural Network Joint Models for Statistical Machin...ACL2014読み会:Fast and Robust Neural Network Joint Models for Statistical Machin...
ACL2014読み会:Fast and Robust Neural Network Joint Models for Statistical Machin...Hiroyuki TOKUNAGA
 
階層ディリクレ過程事前分布モデルによる画像領域分割
階層ディリクレ過程事前分布モデルによる画像領域分割階層ディリクレ過程事前分布モデルによる画像領域分割
階層ディリクレ過程事前分布モデルによる画像領域分割tn1031
 
ACL読み会2014@PFI "Less Grammar, More Features"
ACL読み会2014@PFI "Less Grammar, More Features"ACL読み会2014@PFI "Less Grammar, More Features"
ACL読み会2014@PFI "Less Grammar, More Features"nozyh
 
Learning to automatically solve algebra word problems
Learning to automatically solve algebra word problemsLearning to automatically solve algebra word problems
Learning to automatically solve algebra word problemsNaoaki Okazaki
 
ACL読み会2014@PFI "Two Knives Cut Better Than One: Chinese Word Segmentation w...
ACL読み会2014@PFI  "Two Knives Cut Better Than One: Chinese Word Segmentation w...ACL読み会2014@PFI  "Two Knives Cut Better Than One: Chinese Word Segmentation w...
ACL読み会2014@PFI "Two Knives Cut Better Than One: Chinese Word Segmentation w...Preferred Networks
 
数式を綺麗にプログラミングするコツ #spro2013
数式を綺麗にプログラミングするコツ #spro2013数式を綺麗にプログラミングするコツ #spro2013
数式を綺麗にプログラミングするコツ #spro2013Shuyo Nakatani
 
ノンパラベイズ入門の入門
ノンパラベイズ入門の入門ノンパラベイズ入門の入門
ノンパラベイズ入門の入門Shuyo Nakatani
 
Zipf? (ジップ則のひみつ?) #DSIRNLP
Zipf? (ジップ則のひみつ?) #DSIRNLPZipf? (ジップ則のひみつ?) #DSIRNLP
Zipf? (ジップ則のひみつ?) #DSIRNLPShuyo Nakatani
 
ACL読み会@PFI “How to make words with vectors: Phrase generation in distributio...
ACL読み会@PFI “How to make words with vectors: Phrase generation in distributio...ACL読み会@PFI “How to make words with vectors: Phrase generation in distributio...
ACL読み会@PFI “How to make words with vectors: Phrase generation in distributio...Yuya Unno
 
Active Learning 入門
Active Learning 入門Active Learning 入門
Active Learning 入門Shuyo Nakatani
 
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationDeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationNamHyuk Ahn
 

En vedette (20)

ソーシャルメディアの多言語判定 #SoC2014
ソーシャルメディアの多言語判定 #SoC2014ソーシャルメディアの多言語判定 #SoC2014
ソーシャルメディアの多言語判定 #SoC2014
 
猫に教えてもらうルベーグ可測
猫に教えてもらうルベーグ可測猫に教えてもらうルベーグ可測
猫に教えてもらうルベーグ可測
 
ドラえもんでわかる統計的因果推論 #TokyoR
ドラえもんでわかる統計的因果推論 #TokyoRドラえもんでわかる統計的因果推論 #TokyoR
ドラえもんでわかる統計的因果推論 #TokyoR
 
人工知能と機械学習の違いって?
人工知能と機械学習の違いって?人工知能と機械学習の違いって?
人工知能と機械学習の違いって?
 
アラビア語とペルシャ語の見分け方 #DSIRNLP 5
アラビア語とペルシャ語の見分け方 #DSIRNLP 5アラビア語とペルシャ語の見分け方 #DSIRNLP 5
アラビア語とペルシャ語の見分け方 #DSIRNLP 5
 
KB + Text => Great KB な論文を多読してみた
KB + Text => Great KB な論文を多読してみたKB + Text => Great KB な論文を多読してみた
KB + Text => Great KB な論文を多読してみた
 
どの言語でつぶやかれたのか、機械が知る方法 #WebDBf2013
どの言語でつぶやかれたのか、機械が知る方法 #WebDBf2013どの言語でつぶやかれたのか、機械が知る方法 #WebDBf2013
どの言語でつぶやかれたのか、機械が知る方法 #WebDBf2013
 
Acl読み会2014
Acl読み会2014Acl読み会2014
Acl読み会2014
 
ACL2014読み会:Fast and Robust Neural Network Joint Models for Statistical Machin...
ACL2014読み会:Fast and Robust Neural Network Joint Models for Statistical Machin...ACL2014読み会:Fast and Robust Neural Network Joint Models for Statistical Machin...
ACL2014読み会:Fast and Robust Neural Network Joint Models for Statistical Machin...
 
階層ディリクレ過程事前分布モデルによる画像領域分割
階層ディリクレ過程事前分布モデルによる画像領域分割階層ディリクレ過程事前分布モデルによる画像領域分割
階層ディリクレ過程事前分布モデルによる画像領域分割
 
ACL読み会2014@PFI "Less Grammar, More Features"
ACL読み会2014@PFI "Less Grammar, More Features"ACL読み会2014@PFI "Less Grammar, More Features"
ACL読み会2014@PFI "Less Grammar, More Features"
 
Learning to automatically solve algebra word problems
Learning to automatically solve algebra word problemsLearning to automatically solve algebra word problems
Learning to automatically solve algebra word problems
 
ACL読み会2014@PFI "Two Knives Cut Better Than One: Chinese Word Segmentation w...
ACL読み会2014@PFI  "Two Knives Cut Better Than One: Chinese Word Segmentation w...ACL読み会2014@PFI  "Two Knives Cut Better Than One: Chinese Word Segmentation w...
ACL読み会2014@PFI "Two Knives Cut Better Than One: Chinese Word Segmentation w...
 
数式を綺麗にプログラミングするコツ #spro2013
数式を綺麗にプログラミングするコツ #spro2013数式を綺麗にプログラミングするコツ #spro2013
数式を綺麗にプログラミングするコツ #spro2013
 
ノンパラベイズ入門の入門
ノンパラベイズ入門の入門ノンパラベイズ入門の入門
ノンパラベイズ入門の入門
 
Zipf? (ジップ則のひみつ?) #DSIRNLP
Zipf? (ジップ則のひみつ?) #DSIRNLPZipf? (ジップ則のひみつ?) #DSIRNLP
Zipf? (ジップ則のひみつ?) #DSIRNLP
 
ACL読み会@PFI “How to make words with vectors: Phrase generation in distributio...
ACL読み会@PFI “How to make words with vectors: Phrase generation in distributio...ACL読み会@PFI “How to make words with vectors: Phrase generation in distributio...
ACL読み会@PFI “How to make words with vectors: Phrase generation in distributio...
 
Active Learning 入門
Active Learning 入門Active Learning 入門
Active Learning 入門
 
LDA入門
LDA入門LDA入門
LDA入門
 
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationDeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
 

Similaire à ACL2014 Reading: [Zhang+] "Kneser-Ney Smoothing on Expected Count" and [Pickhardt+] "A Generalized Language Model as the Comination of Skipped n-grams and Modified Kneser-Ney Smoothing"

Paper Study: OptNet: Differentiable Optimization as a Layer in Neural Networks
Paper Study: OptNet: Differentiable Optimization as a Layer in Neural NetworksPaper Study: OptNet: Differentiable Optimization as a Layer in Neural Networks
Paper Study: OptNet: Differentiable Optimization as a Layer in Neural NetworksChenYiHuang5
 
Metrics for generativemodels
Metrics for generativemodelsMetrics for generativemodels
Metrics for generativemodelsDai-Hai Nguyen
 
Deep Learning in Finance
Deep Learning in FinanceDeep Learning in Finance
Deep Learning in FinanceAltoros
 
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdfvariBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdftaeseon ryu
 
Dictionary Learning in Games - GDC 2014
Dictionary Learning in Games - GDC 2014Dictionary Learning in Games - GDC 2014
Dictionary Learning in Games - GDC 2014Manchor Ko
 
2Multi_armed_bandits.pptx
2Multi_armed_bandits.pptx2Multi_armed_bandits.pptx
2Multi_armed_bandits.pptxZhiwuGuo1
 
Coursera 2week
Coursera  2weekCoursera  2week
Coursera 2weekcsl9496
 
Support vector machines
Support vector machinesSupport vector machines
Support vector machinesJinho Lee
 
DL_lecture3_regularization_I.pdf
DL_lecture3_regularization_I.pdfDL_lecture3_regularization_I.pdf
DL_lecture3_regularization_I.pdfsagayalavanya2
 
Encoding Generalized Quantifiers in Dependency-based Compositional Semantics
Encoding Generalized Quantifiers in Dependency-based Compositional SemanticsEncoding Generalized Quantifiers in Dependency-based Compositional Semantics
Encoding Generalized Quantifiers in Dependency-based Compositional SemanticsYubing Dong
 
13Kernel_Machines.pptx
13Kernel_Machines.pptx13Kernel_Machines.pptx
13Kernel_Machines.pptxKarasuLee
 
مدخل إلى تعلم الآلة
مدخل إلى تعلم الآلةمدخل إلى تعلم الآلة
مدخل إلى تعلم الآلةFares Al-Qunaieer
 
ICCV2013 reading: Learning to rank using privileged information
ICCV2013 reading: Learning to rank using privileged informationICCV2013 reading: Learning to rank using privileged information
ICCV2013 reading: Learning to rank using privileged informationAkisato Kimura
 
Optimum Engineering Design - Day 4 - Clasical methods of optimization
Optimum Engineering Design - Day 4 - Clasical methods of optimizationOptimum Engineering Design - Day 4 - Clasical methods of optimization
Optimum Engineering Design - Day 4 - Clasical methods of optimizationSantiagoGarridoBulln
 
Mathematics of nyquist plot [autosaved] [autosaved]
Mathematics of nyquist plot [autosaved] [autosaved]Mathematics of nyquist plot [autosaved] [autosaved]
Mathematics of nyquist plot [autosaved] [autosaved]Asafak Husain
 

Similaire à ACL2014 Reading: [Zhang+] "Kneser-Ney Smoothing on Expected Count" and [Pickhardt+] "A Generalized Language Model as the Comination of Skipped n-grams and Modified Kneser-Ney Smoothing" (20)

QMC: Operator Splitting Workshop, Deeper Look at Deep Learning: A Geometric R...
QMC: Operator Splitting Workshop, Deeper Look at Deep Learning: A Geometric R...QMC: Operator Splitting Workshop, Deeper Look at Deep Learning: A Geometric R...
QMC: Operator Splitting Workshop, Deeper Look at Deep Learning: A Geometric R...
 
Paper Study: OptNet: Differentiable Optimization as a Layer in Neural Networks
Paper Study: OptNet: Differentiable Optimization as a Layer in Neural NetworksPaper Study: OptNet: Differentiable Optimization as a Layer in Neural Networks
Paper Study: OptNet: Differentiable Optimization as a Layer in Neural Networks
 
NLP_KASHK:Smoothing N-gram Models
NLP_KASHK:Smoothing N-gram ModelsNLP_KASHK:Smoothing N-gram Models
NLP_KASHK:Smoothing N-gram Models
 
[Book Reading] 機械翻訳 - Section 5 No.2
[Book Reading] 機械翻訳 - Section 5 No.2[Book Reading] 機械翻訳 - Section 5 No.2
[Book Reading] 機械翻訳 - Section 5 No.2
 
Metrics for generativemodels
Metrics for generativemodelsMetrics for generativemodels
Metrics for generativemodels
 
Deep Learning in Finance
Deep Learning in FinanceDeep Learning in Finance
Deep Learning in Finance
 
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdfvariBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
 
Dictionary Learning in Games - GDC 2014
Dictionary Learning in Games - GDC 2014Dictionary Learning in Games - GDC 2014
Dictionary Learning in Games - GDC 2014
 
2Multi_armed_bandits.pptx
2Multi_armed_bandits.pptx2Multi_armed_bandits.pptx
2Multi_armed_bandits.pptx
 
Coursera 2week
Coursera  2weekCoursera  2week
Coursera 2week
 
Support vector machines
Support vector machinesSupport vector machines
Support vector machines
 
暗認本読書会6
暗認本読書会6暗認本読書会6
暗認本読書会6
 
DL_lecture3_regularization_I.pdf
DL_lecture3_regularization_I.pdfDL_lecture3_regularization_I.pdf
DL_lecture3_regularization_I.pdf
 
Encoding Generalized Quantifiers in Dependency-based Compositional Semantics
Encoding Generalized Quantifiers in Dependency-based Compositional SemanticsEncoding Generalized Quantifiers in Dependency-based Compositional Semantics
Encoding Generalized Quantifiers in Dependency-based Compositional Semantics
 
13Kernel_Machines.pptx
13Kernel_Machines.pptx13Kernel_Machines.pptx
13Kernel_Machines.pptx
 
Lec05.pptx
Lec05.pptxLec05.pptx
Lec05.pptx
 
مدخل إلى تعلم الآلة
مدخل إلى تعلم الآلةمدخل إلى تعلم الآلة
مدخل إلى تعلم الآلة
 
ICCV2013 reading: Learning to rank using privileged information
ICCV2013 reading: Learning to rank using privileged informationICCV2013 reading: Learning to rank using privileged information
ICCV2013 reading: Learning to rank using privileged information
 
Optimum Engineering Design - Day 4 - Clasical methods of optimization
Optimum Engineering Design - Day 4 - Clasical methods of optimizationOptimum Engineering Design - Day 4 - Clasical methods of optimization
Optimum Engineering Design - Day 4 - Clasical methods of optimization
 
Mathematics of nyquist plot [autosaved] [autosaved]
Mathematics of nyquist plot [autosaved] [autosaved]Mathematics of nyquist plot [autosaved] [autosaved]
Mathematics of nyquist plot [autosaved] [autosaved]
 

Plus de Shuyo Nakatani

画像をテキストで検索したい!(OpenAI CLIP) - VRC-LT #15
画像をテキストで検索したい!(OpenAI CLIP) - VRC-LT #15画像をテキストで検索したい!(OpenAI CLIP) - VRC-LT #15
画像をテキストで検索したい!(OpenAI CLIP) - VRC-LT #15Shuyo Nakatani
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networksShuyo Nakatani
 
無限関係モデル (続・わかりやすいパターン認識 13章)
無限関係モデル (続・わかりやすいパターン認識 13章)無限関係モデル (続・わかりやすいパターン認識 13章)
無限関係モデル (続・わかりやすいパターン認識 13章)Shuyo Nakatani
 
Memory Networks (End-to-End Memory Networks の Chainer 実装)
Memory Networks (End-to-End Memory Networks の Chainer 実装)Memory Networks (End-to-End Memory Networks の Chainer 実装)
Memory Networks (End-to-End Memory Networks の Chainer 実装)Shuyo Nakatani
 
RとStanでクラウドセットアップ時間を分析してみたら #TokyoR
RとStanでクラウドセットアップ時間を分析してみたら #TokyoRRとStanでクラウドセットアップ時間を分析してみたら #TokyoR
RとStanでクラウドセットアップ時間を分析してみたら #TokyoRShuyo Nakatani
 
星野「調査観察データの統計科学」第3章
星野「調査観察データの統計科学」第3章星野「調査観察データの統計科学」第3章
星野「調査観察データの統計科学」第3章Shuyo Nakatani
 
星野「調査観察データの統計科学」第1&2章
星野「調査観察データの統計科学」第1&2章星野「調査観察データの統計科学」第1&2章
星野「調査観察データの統計科学」第1&2章Shuyo Nakatani
 
言語処理するのに Python でいいの? #PyDataTokyo
言語処理するのに Python でいいの? #PyDataTokyo言語処理するのに Python でいいの? #PyDataTokyo
言語処理するのに Python でいいの? #PyDataTokyoShuyo Nakatani
 
[Kim+ ICML2012] Dirichlet Process with Mixed Random Measures : A Nonparametri...
[Kim+ ICML2012] Dirichlet Process with Mixed Random Measures : A Nonparametri...[Kim+ ICML2012] Dirichlet Process with Mixed Random Measures : A Nonparametri...
[Kim+ ICML2012] Dirichlet Process with Mixed Random Measures : A Nonparametri...Shuyo Nakatani
 
Short Text Language Detection with Infinity-Gram
Short Text Language Detection with Infinity-GramShort Text Language Detection with Infinity-Gram
Short Text Language Detection with Infinity-GramShuyo Nakatani
 
[Karger+ NIPS11] Iterative Learning for Reliable Crowdsourcing Systems
[Karger+ NIPS11] Iterative Learning for Reliable Crowdsourcing Systems[Karger+ NIPS11] Iterative Learning for Reliable Crowdsourcing Systems
[Karger+ NIPS11] Iterative Learning for Reliable Crowdsourcing SystemsShuyo Nakatani
 
極大部分文字列を使った twitter 言語判定
極大部分文字列を使った twitter 言語判定極大部分文字列を使った twitter 言語判定
極大部分文字列を使った twitter 言語判定Shuyo Nakatani
 
人間言語判別 カタルーニャ語編
人間言語判別 カタルーニャ語編人間言語判別 カタルーニャ語編
人間言語判別 カタルーニャ語編Shuyo Nakatani
 
Extreme Extraction - Machine Reading in a Week
Extreme Extraction - Machine Reading in a WeekExtreme Extraction - Machine Reading in a Week
Extreme Extraction - Machine Reading in a WeekShuyo Nakatani
 
言語判定へのいざない
言語判定へのいざない言語判定へのいざない
言語判定へのいざないShuyo Nakatani
 
∞-gram を使った短文言語判定
∞-gram を使った短文言語判定∞-gram を使った短文言語判定
∞-gram を使った短文言語判定Shuyo Nakatani
 
CRF を使った Web 本文抽出 for WebDB Forum 2011
CRF を使った Web 本文抽出 for WebDB Forum 2011CRF を使った Web 本文抽出 for WebDB Forum 2011
CRF を使った Web 本文抽出 for WebDB Forum 2011Shuyo Nakatani
 
数式をnumpyに落としこむコツ
数式をnumpyに落としこむコツ数式をnumpyに落としこむコツ
数式をnumpyに落としこむコツShuyo Nakatani
 
CRF を使った Web 本文抽出
CRF を使った Web 本文抽出CRF を使った Web 本文抽出
CRF を使った Web 本文抽出Shuyo Nakatani
 

Plus de Shuyo Nakatani (19)

画像をテキストで検索したい!(OpenAI CLIP) - VRC-LT #15
画像をテキストで検索したい!(OpenAI CLIP) - VRC-LT #15画像をテキストで検索したい!(OpenAI CLIP) - VRC-LT #15
画像をテキストで検索したい!(OpenAI CLIP) - VRC-LT #15
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
無限関係モデル (続・わかりやすいパターン認識 13章)
無限関係モデル (続・わかりやすいパターン認識 13章)無限関係モデル (続・わかりやすいパターン認識 13章)
無限関係モデル (続・わかりやすいパターン認識 13章)
 
Memory Networks (End-to-End Memory Networks の Chainer 実装)
Memory Networks (End-to-End Memory Networks の Chainer 実装)Memory Networks (End-to-End Memory Networks の Chainer 実装)
Memory Networks (End-to-End Memory Networks の Chainer 実装)
 
RとStanでクラウドセットアップ時間を分析してみたら #TokyoR
RとStanでクラウドセットアップ時間を分析してみたら #TokyoRRとStanでクラウドセットアップ時間を分析してみたら #TokyoR
RとStanでクラウドセットアップ時間を分析してみたら #TokyoR
 
星野「調査観察データの統計科学」第3章
星野「調査観察データの統計科学」第3章星野「調査観察データの統計科学」第3章
星野「調査観察データの統計科学」第3章
 
星野「調査観察データの統計科学」第1&2章
星野「調査観察データの統計科学」第1&2章星野「調査観察データの統計科学」第1&2章
星野「調査観察データの統計科学」第1&2章
 
言語処理するのに Python でいいの? #PyDataTokyo
言語処理するのに Python でいいの? #PyDataTokyo言語処理するのに Python でいいの? #PyDataTokyo
言語処理するのに Python でいいの? #PyDataTokyo
 
[Kim+ ICML2012] Dirichlet Process with Mixed Random Measures : A Nonparametri...
[Kim+ ICML2012] Dirichlet Process with Mixed Random Measures : A Nonparametri...[Kim+ ICML2012] Dirichlet Process with Mixed Random Measures : A Nonparametri...
[Kim+ ICML2012] Dirichlet Process with Mixed Random Measures : A Nonparametri...
 
Short Text Language Detection with Infinity-Gram
Short Text Language Detection with Infinity-GramShort Text Language Detection with Infinity-Gram
Short Text Language Detection with Infinity-Gram
 
[Karger+ NIPS11] Iterative Learning for Reliable Crowdsourcing Systems
[Karger+ NIPS11] Iterative Learning for Reliable Crowdsourcing Systems[Karger+ NIPS11] Iterative Learning for Reliable Crowdsourcing Systems
[Karger+ NIPS11] Iterative Learning for Reliable Crowdsourcing Systems
 
極大部分文字列を使った twitter 言語判定
極大部分文字列を使った twitter 言語判定極大部分文字列を使った twitter 言語判定
極大部分文字列を使った twitter 言語判定
 
人間言語判別 カタルーニャ語編
人間言語判別 カタルーニャ語編人間言語判別 カタルーニャ語編
人間言語判別 カタルーニャ語編
 
Extreme Extraction - Machine Reading in a Week
Extreme Extraction - Machine Reading in a WeekExtreme Extraction - Machine Reading in a Week
Extreme Extraction - Machine Reading in a Week
 
言語判定へのいざない
言語判定へのいざない言語判定へのいざない
言語判定へのいざない
 
∞-gram を使った短文言語判定
∞-gram を使った短文言語判定∞-gram を使った短文言語判定
∞-gram を使った短文言語判定
 
CRF を使った Web 本文抽出 for WebDB Forum 2011
CRF を使った Web 本文抽出 for WebDB Forum 2011CRF を使った Web 本文抽出 for WebDB Forum 2011
CRF を使った Web 本文抽出 for WebDB Forum 2011
 
数式をnumpyに落としこむコツ
数式をnumpyに落としこむコツ数式をnumpyに落としこむコツ
数式をnumpyに落としこむコツ
 
CRF を使った Web 本文抽出
CRF を使った Web 本文抽出CRF を使った Web 本文抽出
CRF を使った Web 本文抽出
 

Dernier

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 

Dernier (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

ACL2014 Reading: [Zhang+] "Kneser-Ney Smoothing on Expected Count" and [Pickhardt+] "A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser-Ney Smoothing"

  • 1. [Zhang+ ACL2014] Kneser-Ney Smoothing on Expected Count
    [Pickhardt+ ACL2014] A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser-Ney Smoothing
    2014/7/12 ACL Reading @ PFI
    Nakatani Shuyo, Cybozu Labs Inc.
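  • 2. Kneser-Ney Smoothing [Kneser+ 1995]
    • Discounting & interpolation:
      $P(w_i \mid w_{i-n+1}^{i-1}) = \frac{\max(c(w_{i-n+1}^{i}) - D,\, 0)}{c(w_{i-n+1}^{i-1})} + \frac{D}{c(w_{i-n+1}^{i-1})}\, N_{1+}(w_{i-n+1}^{i-1}\,\cdot)\, P(w_i \mid w_{i-n+2}^{i-1})$
    • where $w_m^n = w_m \cdots w_n$ and $N_{1+}(w_m^n\,\cdot) = |\{w_i : c(w_m^n w_i) > 0\}|$ is the number of distinct words following $w_m^n$, i.e. the number of times the discount $D$ is applied in that context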
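To make the recursion concrete, here is a minimal Python sketch for the bigram case ($n = 2$). It assumes a single discount `D` with $0 < D \le 1$ and, as an assumption not stated on this slide, uses the usual KN continuation probability $N_{1+}(\cdot\,w)/N_{1+}(\cdot\,\cdot)$ as the lower-order $P(w_i)$. `kn_bigram` is a hypothetical helper name, not taken from either paper.

```python
from collections import Counter, defaultdict


def kn_bigram(tokens, D=0.75):
    """Interpolated Kneser-Ney for bigrams (n = 2), a sketch of the recursion.

    Highest order: absolute discounting of raw counts c(u w).
    Lower order (assumed): continuation probability N_{1+}(. w) / N_{1+}(. .).
    Assumes 0 < D <= 1 and at least two tokens.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    ctx_count = Counter()           # c(u) = sum over w of c(u w)
    followers = defaultdict(set)    # {w : c(u w) > 0}; its size is N_{1+}(u .)
    preceders = defaultdict(set)    # {v : c(v w) > 0}; its size is N_{1+}(. w)
    for (u, w), c in bigrams.items():
        ctx_count[u] += c
        followers[u].add(w)
        preceders[w].add(u)
    n_types = len(bigrams)          # N_{1+}(. .): number of distinct bigrams

    def p_cont(w):
        return len(preceders.get(w, ())) / n_types

    def prob(w, u):
        c_u = ctx_count[u]
        if c_u == 0:                # unseen context: lower order alone (assumption)
            return p_cont(w)
        discounted = max(bigrams[(u, w)] - D, 0) / c_u
        backoff = D * len(followers[u]) / c_u   # D / c(u) * N_{1+}(u .)
        return discounted + backoff * p_cont(w)

    return prob


prob = kn_bigram("a b a b a c".split())
print(prob("b", "a"), prob("a", "a"))
```

For a context seen in training, the probabilities summed over the vocabulary come to 1: the mass removed from the seen bigrams, $D \cdot N_{1+}(u\,\cdot)/c(u)$, is exactly the weight given to the lower-order distribution.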
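  • 3. Modified KN Smoothing [Chen+ 1999]
    $P(w_i \mid w_{i-n+1}^{i-1}) = \frac{c(w_{i-n+1}^{i}) - D(c(w_{i-n+1}^{i}))}{c(w_{i-n+1}^{i-1})} + \gamma(w_{i-n+1}^{i-1})\, P(w_i \mid w_{i-n+2}^{i-1})$
    • where $D(c) = 0$ if $c = 0$, $D_1$ if $c = 1$, $D_2$ if $c = 2$, $D_{3+}$ if $c \ge 3$
    • $\gamma(w_{i-n+1}^{i-1}) = \frac{[\text{amount of discounting}]}{c(w_{i-n+1}^{i-1})}$ (weighted discounting), where the amount of discounting for a context $h$ is $D_1 N_1(h\,\cdot) + D_2 N_2(h\,\cdot) + D_{3+} N_{3+}(h\,\cdot)$
    • ($D_1$, $D_2$, $D_{3+}$ are estimated by leave-one-out cross-validation)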
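As a sketch, the closed-form discount estimates given in [Chen+ 1999], computed from the count-of-counts $n_r$ (the number of n-grams seen exactly $r$ times), can be coded as below, together with $D(c)$ and $\gamma$. The function names are hypothetical, and the code assumes $n_1, \dots, n_4 > 0$.

```python
from collections import Counter


def mkn_discounts(ngram_counts):
    """Closed-form D1, D2, D3+ from count-of-counts (as given in Chen & Goodman).

    ngram_counts: mapping n-gram -> integer count, for a single order.
    Assumes n_1, n_2, n_3, n_4 are all > 0.
    """
    n = Counter(ngram_counts.values())   # n[r] = number of n-grams seen exactly r times
    Y = n[1] / (n[1] + 2 * n[2])
    D1 = 1 - 2 * Y * n[2] / n[1]
    D2 = 2 - 3 * Y * n[3] / n[2]
    D3 = 3 - 4 * Y * n[4] / n[3]
    return D1, D2, D3


def discount(c, D):
    """D(c): 0, D1, D2 or D3+ depending on the count c."""
    D1, D2, D3 = D
    return 0.0 if c == 0 else D1 if c == 1 else D2 if c == 2 else D3


def gamma(follower_counts, D):
    """gamma(h) = (total amount discounted in context h) / c(h).

    follower_counts: c(h w) for every w with c(h w) > 0, so the numerator
    equals D1 N_1(h .) + D2 N_2(h .) + D3+ N_3+(h .).
    """
    return sum(discount(c, D) for c in follower_counts) / sum(follower_counts)
```

Because $\gamma$ collects exactly what was subtracted from each seen n-gram, the highest-order term and the interpolation weight together still sum to 1 over the vocabulary.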
  • 4. [Zhang+ ACL2014] Kneser-Ney Smoothing on Expected Count
    • Setting: each sentence has a fractional weight, e.g. in
      – domain adaptation
      – the EM algorithm for word alignment
    • Proposal: KN smoothing using expected fractional counts
    I'm interested in it!
  • 5. Model
    • $\boldsymbol{u}$ denotes $w_{i-n+1}^{i-1}$ and $\boldsymbol{u}'$ denotes $w_{i-n+2}^{i-1}$
    • A sequence $\boldsymbol{u}w$ occurs $k$ times, and each occurrence $i$ has a probability $p_i$ ($i = 1, \dots, k$) as its weight;
    • then the count $c(\boldsymbol{u}w)$ follows a Poisson binomial distribution:
      $p(c(\boldsymbol{u}w) = r) = s(k, r)$, where
      $s(k, r) = \begin{cases} 1 & \text{if } k = r = 0 \\ s(k-1, r)(1 - p_k) + s(k-1, r-1)\, p_k & \text{if } 0 \le r \le k,\ k > 0 \\ 0 & \text{otherwise} \end{cases}$
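The recursion for $s(k, r)$ is a simple dynamic program that adds one occurrence at a time. A minimal Python sketch (the name `poisson_binomial` is hypothetical) that returns the whole distribution $p(c(\boldsymbol{u}w) = r)$ for $r = 0, \dots, k$ in $O(k^2)$ time:

```python
def poisson_binomial(ps):
    """Distribution of c(uw) when occurrence i is present with probability ps[i].

    Returns dist with dist[r] = p(c(uw) = r) = s(k, r) for r = 0..k,
    built with the recursion s(k, r) = s(k-1, r)(1 - p_k) + s(k-1, r-1) p_k.
    """
    s = [1.0]                          # k = 0: s(0, 0) = 1
    for p in ps:                       # extend from k-1 to k occurrences
        new = [0.0] * (len(s) + 1)
        for r, prob in enumerate(s):
            new[r] += prob * (1 - p)   # occurrence k absent: s(k-1, r)(1 - p_k)
            new[r + 1] += prob * p     # occurrence k present: s(k-1, r-1) p_k
        s = new
    return s


print(poisson_binomial([0.5, 0.5]))
```

For example, two occurrences with weight 0.5 each give the distribution `[0.25, 0.5, 0.25]`.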
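  • 6. MLE on this model
    • Expectations
      – $\mathbb{E}[c(\boldsymbol{u}w)] = \sum_r r \cdot p(c(\boldsymbol{u}w) = r)$
      – $\mathbb{E}[N_r(\boldsymbol{u}\,\cdot)] = \sum_w p(c(\boldsymbol{u}w) = r)$
      – $\mathbb{E}[N_{r+}(\boldsymbol{u}\,\cdot)] = \sum_w p(c(\boldsymbol{u}w) \ge r)$
    • Maximize the (expected) likelihood
      – $\mathbb{E}[L] = \mathbb{E}\big[\sum_{\boldsymbol{u}w} c(\boldsymbol{u}w) \log p(w \mid \boldsymbol{u})\big] = \sum_{\boldsymbol{u}w} \mathbb{E}[c(\boldsymbol{u}w)] \log p(w \mid \boldsymbol{u})$
      – which gives $p_{\mathrm{MLE}}(w \mid \boldsymbol{u}) = \frac{\mathbb{E}[c(\boldsymbol{u}w)]}{\mathbb{E}[c(\boldsymbol{u}\,\cdot)]}$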
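These expectations can be computed from the Poisson binomial distribution of each $c(\boldsymbol{u}w)$. A minimal sketch for one context $\boldsymbol{u}$ at a time (hypothetical names `pb` and `expected_stats`; $N_r$ and $N_{r+}$ only for $r \ge 1$). Note that $\mathbb{E}[c(\boldsymbol{u}w)]$ equals $\sum_i p_i$ by linearity of expectation, so only the $N_r$ statistics really need the full distribution.

```python
from collections import defaultdict


def pb(ps):
    """Poisson-binomial pmf: pb(ps)[r] = p(c = r) for r = 0..len(ps)."""
    dist = [1.0]                                  # k = 0: s(0, 0) = 1
    for p in ps:                                  # s(k, r) from s(k-1, .)
        dist = [(dist[r] * (1 - p) if r < len(dist) else 0.0)
                + (dist[r - 1] * p if r > 0 else 0.0)
                for r in range(len(dist) + 1)]
    return dist


def expected_stats(weights):
    """Expected statistics for one context u.

    weights: dict mapping w -> list of occurrence probabilities p_i of u w.
    Returns (E[c(uw)] per w, E[N_r(u .)] for r >= 1, E[N_{r+}(u .)] for r >= 1,
    p_MLE(w | u) = E[c(uw)] / E[c(u .)]).
    """
    exp_c = {}
    exp_n = defaultdict(float)
    exp_n_plus = defaultdict(float)
    for w, ps in weights.items():
        dist = pb(ps)
        exp_c[w] = sum(r * q for r, q in enumerate(dist))
        tail = 0.0
        for r in range(len(dist) - 1, 0, -1):    # walk down so tail = p(c >= r)
            tail += dist[r]
            exp_n[r] += dist[r]
            exp_n_plus[r] += tail
    total = sum(exp_c.values())                   # E[c(u .)]
    p_mle = {w: c / total for w, c in exp_c.items()}
    return exp_c, dict(exp_n), dict(exp_n_plus), p_mle
```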
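  • 7. Expected Kneser-Ney
    • $\tilde{c}(\boldsymbol{u}w) = \max(0, c(\boldsymbol{u}w) - D) + N_{1+}(\boldsymbol{u}\,\cdot)\, D\, p'(w \mid \boldsymbol{u}')$
    • So $\mathbb{E}[\tilde{c}(\boldsymbol{u}w)] = \mathbb{E}[c(\boldsymbol{u}w)] - p(c(\boldsymbol{u}w) > 0)\, D + \mathbb{E}[N_{1+}(\boldsymbol{u}\,\cdot)]\, D\, p'(w \mid \boldsymbol{u}')$
      – (counts are integers, so for $0 \le D \le 1$ we have $\max(0, c - D) = c - D$ whenever $c > 0$, hence $\mathbb{E}[\max(0, c - D)] = \mathbb{E}[c] - p(c > 0)\, D$)
      – where $p'(w \mid \boldsymbol{u}') = \frac{\mathbb{E}[N_{1+}(\cdot\,\boldsymbol{u}'w)]}{\mathbb{E}[N_{1+}(\cdot\,\boldsymbol{u}'\,\cdot)]}$
    • then $p(w \mid \boldsymbol{u}) = \frac{\mathbb{E}[\tilde{c}(\boldsymbol{u}w)]}{\mathbb{E}[\tilde{c}(\boldsymbol{u}\,\cdot)]}$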
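Putting the expected-KN formulas together for one context gives the minimal sketch below (hypothetical name `expected_kn`). It assumes a single discount $0 \le D \le 1$ and takes the lower-order $p'(w \mid \boldsymbol{u}')$ as a given table that sums to 1 over the vocabulary, rather than estimating it from expected continuation counts. $p(c > 0)$ is computed as $1 - \prod_i (1 - p_i)$, which is $1 - s(k, 0)$.

```python
import math


def expected_kn(weights, p_lower, D=0.5):
    """Expected KN estimate p(w | u) for a single context u (sketch).

    weights: dict w -> occurrence probabilities p_i of u w (fractional evidence)
    p_lower: dict w -> p'(w | u'); assumed to sum to 1 over the vocabulary
    D:       one discount with 0 <= D <= 1
    """
    vocab = set(weights) | set(p_lower)
    # p(c(uw) > 0) = 1 - prod_i (1 - p_i)
    p_pos = {w: 1.0 - math.prod(1.0 - p for p in weights.get(w, [])) for w in vocab}
    # E[c(uw)] = sum_i p_i by linearity of expectation
    exp_c = {w: sum(weights.get(w, [])) for w in vocab}
    exp_n1plus = sum(p_pos.values())              # E[N_{1+}(u .)]
    exp_tilde = {
        w: exp_c[w] - p_pos[w] * D + exp_n1plus * D * p_lower.get(w, 0.0)
        for w in vocab
    }
    total = sum(exp_tilde.values())               # E[c~(u .)]
    return {w: v / total for w, v in exp_tilde.items()}


p = expected_kn({"x": [0.5, 0.5], "y": [1.0]}, {"x": 0.5, "y": 0.25, "z": 0.25})
print(p)
```

Because the discounted mass $D \cdot \mathbb{E}[N_{1+}(\boldsymbol{u}\,\cdot)]$ is redistributed through $p'$, the normalizer $\mathbb{E}[\tilde{c}(\boldsymbol{u}\,\cdot)]$ equals $\mathbb{E}[c(\boldsymbol{u}\,\cdot)]$ when $p'$ sums to 1, and the unseen word `z` still gets probability through the lower order.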
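  • 8. Language model adaptation
    • Our corpus consists of
      – a large general-domain dataset and
      – a small domain-specific (in-domain) dataset
    • Weight of sentence $\boldsymbol{w}$:
      – $p(\boldsymbol{w} \text{ is in-domain}) = \frac{1}{1 + \exp(-H(\boldsymbol{w}))}$
      – where $H(\boldsymbol{w}) = \frac{\log p_{\mathrm{in}}(\boldsymbol{w}) - \log p_{\mathrm{out}}(\boldsymbol{w})}{|\boldsymbol{w}|}$
      – $p_{\mathrm{in}}$: language model of the in-domain data, $p_{\mathrm{out}}$: language model of the general-domain (out-of-domain) data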
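The sentence weight is a logistic function of the per-word log-likelihood ratio between the two LMs. A small sketch (hypothetical name `in_domain_weight`; the two sentence log-probabilities are assumed to come from whatever in-domain and general-domain LMs are available):

```python
import math


def in_domain_weight(logp_in, logp_out, length):
    """p(sentence w is in-domain) = 1 / (1 + exp(-H(w))).

    logp_in, logp_out: log-probabilities of the whole sentence under the
    in-domain and general-domain LMs (hypothetical inputs, from any LM).
    length: |w|, the number of words in the sentence.
    """
    H = (logp_in - logp_out) / length        # per-word log-likelihood ratio
    # Numerically stable logistic function: never exponentiate a large positive number.
    if H >= 0:
        return 1.0 / (1.0 + math.exp(-H))
    e = math.exp(H)
    return e / (1.0 + e)


print(in_domain_weight(-20.0, -20.0, 4))
```

A sentence that both models score equally gets weight 0.5; the branch on the sign of `H` only avoids overflow in `math.exp` for extreme ratios.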
  • 9. [Figure 1 (via [Zhang+ ACL2014]): "On the language model adaptation task, expected KN outperforms all other methods across all sizes of selected subsets. Integral KN is applied to unweighted instances, while fractional WB, fractional KN and expected KN are applied to weighted instances."]
    • Subsets are selected from the general-domain data; in-domain data: 54k for training, 3k for testing
    • Why isn't there Modified KN as a baseline?
  • 10. [Pickhardt+ ACL2014] A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser-Ney Smoothing
    • Higher-order n-grams are very sparse
      – especially on small data (e.g. domain-specific data!)
    • Improve performance on small data with skipped n-grams and Modified KN smoothing
      – Perplexity is reduced by 25.7% for very small training data of only 736 KB of text
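  • 11. “Generalized Language Models”
    • $\partial_3(w_1 w_2 w_3 w_4) = w_1 w_2 \_\, w_4$
      – "_" is a word placeholder (wildcard)
    $P_{\mathrm{GLM}}(w_i \mid w_{i-n+1}^{i-1}) = \frac{c(w_{i-n+1}^{i}) - D(c(w_{i-n+1}^{i}))}{c(w_{i-n+1}^{i-1})} + \gamma_{\mathrm{high}}(w_{i-n+1}^{i-1})\, \frac{1}{n-1} \sum_{j=1}^{n-1} P_{\mathrm{GLM}}(w_i \mid \partial_j w_{i-n+1}^{i-1})$
    $P_{\mathrm{GLM}}(w_i \mid \partial_j w_{i-n+1}^{i-1}) = \frac{N_{1+}(\partial_j w_{i-n}^{i}) - D(c(\partial_j w_{i-n+1}^{i}))}{N_{1+}(\partial_j w_{i-n+1}^{i-1}\,*)} + \gamma_{\mathrm{mid}}(\partial_j w_{i-n+1}^{i-1})\, \frac{1}{n-2} \sum_{k=1,\,k \ne j}^{n-1} P_{\mathrm{GLM}}(w_i \mid \partial_j \partial_k w_{i-n+1}^{i-1})$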
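A sketch of the skip operator $\partial_j$ and of which contexts each GLM node interpolates with, under the reading that each level averages uniformly over the history positions not yet skipped (the $\frac{1}{n-1}$ and $\frac{1}{n-2}$ factors in the two equations; deeper levels are assumed to follow the same pattern). `skip` and `glm_children` are hypothetical names.

```python
def skip(seq, j):
    """∂_j: replace the j-th word (1-indexed) of seq with the placeholder "_"."""
    return seq[:j - 1] + ("_",) + seq[j:]


def glm_children(context):
    """Lower-order contexts a GLM node interpolates with, each with uniform weight.

    At the top level every one of the n-1 history positions can be skipped
    (weight 1/(n-1)); for the children of ∂_j, already-skipped positions are
    excluded (weight 1/(n-2)). Assumed to continue the same way further down.
    """
    open_positions = [j for j, word in enumerate(context, start=1) if word != "_"]
    if not open_positions:
        return []
    weight = 1 / len(open_positions)
    return [(skip(context, j), weight) for j in open_positions]


print(skip(("w1", "w2", "w3", "w4"), 3))
print(glm_children(("_", "b", "c")))
```

The first call reproduces the slide's example, `('w1', 'w2', '_', 'w4')`; the second lists the two children of a context whose first word is already skipped, each with weight 0.5.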
  • 12. [Figure (via [Pickhardt+ ACL2014]): "The bold arrows correspond to interpolation of models in traditional modified Kneser-Ney smoothing. The lighter arrows illustrate the additional interpolations introduced by our generalized language models."]
  • 13. [Figure: results on shrunk training data sets for the English Wikipedia]
    • small, domain-specific data
  • 14. Space Complexity
    • model size = 9.5 GB, # of entries = 427M
    • model size = 15 GB, # of entries = 742M
  • 15. References
    • [Zhang+ ACL2014] Kneser-Ney Smoothing on Expected Count
    • [Pickhardt+ ACL2014] A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser-Ney Smoothing
    • [Kneser+ 1995] Improved backing-off for m-gram language modeling
    • [Chen+ 1999] An Empirical Study of Smoothing Techniques for Language Modeling