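15. Notation:
$$[[x = y]] = \begin{cases} 1 & \text{if } x = y \\ 0 & \text{if } x \neq y \end{cases}$$
MEMM
[Figure: graphical models of the HMM and the MEMM over the observations $X_{t-1}, X_t, X_{t+1}$ and the states $Y_{t-1}, Y_t, Y_{t+1}$]
※ $Z(X_t, Y_{t-1})$ is the normalization term.
One ME (maximum entropy) model per previous state $s$:
$$P_s(Y_t \mid X_t) = \frac{1}{Z(X_t, s)} \exp\left( \sum_a \lambda_a f_a(X_t, Y_t) \right)$$
$$P(Y \mid X) = \prod_t P_s(Y_t \mid X_t) \quad \text{where } [[Y_{t-1} = s]] = 1 \text{, i.e. } s = Y_{t-1}$$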
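A minimal sketch of the per-state model on this slide: for one previous state $s$, the label scores are exponentiated and normalized by $Z(X_t, s)$. The function name `memm_state_distribution`, the two toy features and their weights are hypothetical and chosen only for illustration.

```python
import math

def memm_state_distribution(weights, features, x_t, labels):
    """Return P_s(y | x_t) for every label y, for the ME model of one previous state s.

    weights:  list of lambda_a (assumed already trained)
    features: list of functions f_a(x_t, y) -> number
    """
    scores = {y: sum(w * f(x_t, y) for w, f in zip(weights, features)) for y in labels}
    z = sum(math.exp(v) for v in scores.values())  # normalization term Z(x_t, s)
    return {y: math.exp(v) / z for y, v in scores.items()}

# Hypothetical toy features over a text line x_t.
features = [
    lambda x, y: 1.0 if x.endswith("?") and y == "question" else 0.0,
    lambda x, y: 1.0 if x.startswith(" ") and y == "answer" else 0.0,
]
weights = [2.0, 1.0]  # made-up values
dist = memm_state_distribution(weights, features, "What is a CRF?", ["question", "answer"])
print(dist)
```

A full MEMM would hold one such weight vector per previous state and pick the one matching $Y_{t-1}$ at each position.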
16. $f_{\langle \text{begins-with-number},\, \text{question} \rangle} = 1$
Binary features (Usenet FAQ):
begins-with-number
begins-with-ordinal
begins-with-punctuation
begins-with-question-word
begins-with-subject
blank
contains-alphanum
contains-bracketed-number
contains-http
contains-non-space
contains-number
contains-pipe
contains-question-mark
contains-question-word
ends-with-question-mark
first-alpha-is-capitalized
indented
indented-1-to-4
indented-5-to-10
more-than-one-third-space
only-punctuation
prev-is-blank
prev-begins-with-ordinal
shorter-than-30
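(The prev-* features look at the previous line, $X_{t-1}$.)
States: head, question, answer, tail
$$f_{\langle b, s \rangle}(X_t, Y_t) = \begin{cases} 1 & \text{if } b(X_t) \text{ is true and } Y_t = s \\ 0 & \text{otherwise} \end{cases}$$
※ For example, the feature above takes the value 1 when line $t$ begins with a number and its state is question.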
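The feature definition above pairs a line predicate $b$ with a state $s$. A small sketch of that construction follows; `make_feature` is a hypothetical helper name, and the two predicates are illustrative guesses at what the feature names mean, since the source does not give their exact definitions.

```python
def make_feature(b, s):
    """Build f_<b,s>(x_t, y_t): 1 if predicate b holds on x_t and y_t == s, else 0."""
    def f(x_t, y_t):
        return 1 if b(x_t) and y_t == s else 0
    return f

# Illustrative predicates (assumed definitions, not taken from the source).
def begins_with_number(line):
    return line[:1].isdigit()

def ends_with_question_mark(line):
    return line.rstrip().endswith("?")

f1 = make_feature(begins_with_number, "question")
f2 = make_feature(ends_with_question_mark, "question")

line = "2.1 What is an MEMM?"
print(f1(line, "question"), f1(line, "answer"), f2(line, "question"))
```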
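17. Viterbi algorithm (MEMM)
$$\hat{y} = \arg\max_y P(y \mid x)$$
For $x = x_1 \cdots x_T$, $\delta_t(s_i \mid x)$ is the probability of the best state sequence that has observed $x_1 \cdots x_t$ and is in state $s_i$ at time $t$:
$$\delta_t(s_i \mid x) = \max_{y_1 \cdots y_{t-1}} P(y_1, \cdots, y_{t-1}, Y_t = s_i \mid x_1, \cdots, x_t)$$
1. Initialization, from the start state $s_S$:
$$\delta_1(s_i \mid x) = P(s_i \mid s_S, x_1)$$
2. Recursion, for $t = 1, ..., T - 1$:
$$\delta_{t+1}(s_i \mid x) = \max_{s_j} \left[ \delta_t(s_j \mid x)\, P(s_i \mid s_j, x_{t+1}) \right]$$
$$\psi_t(s_i \mid x) = \arg\max_{s_j} \left[ \delta_t(s_j \mid x)\, P(s_i \mid s_j, x_{t+1}) \right]$$
3. Termination:
$$\max_y P(y \mid x) = \max_{s_j} \delta_T(s_j \mid x)$$
$$\hat{y}_T = \arg\max_{s_j} \delta_T(s_j \mid x)$$
4. Backtracking, for $t = T - 1, ..., 1$:
$$\hat{y}_t = \psi_t(\hat{y}_{t+1} \mid x)$$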
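The four steps of this slide (initialization from $s_S$, recursion, termination, backtracking) can be sketched as follows. `memm_viterbi` and the toy transition model `toy_trans_prob` are hypothetical; the probabilities are made up, and only the control flow is meant to mirror the slide.

```python
def memm_viterbi(xs, states, start, trans_prob):
    """Most probable state sequence; trans_prob(prev, x, s) stands for P(s | prev, x)."""
    delta = [{s: trans_prob(start, xs[0], s) for s in states}]  # step 1
    backptrs = []
    for t in range(1, len(xs)):                                 # step 2
        scores, ptr = {}, {}
        for si in states:
            best = max(states, key=lambda sj: delta[-1][sj] * trans_prob(sj, xs[t], si))
            ptr[si] = best
            scores[si] = delta[-1][best] * trans_prob(best, xs[t], si)
        delta.append(scores)
        backptrs.append(ptr)
    last = max(states, key=lambda s: delta[-1][s])              # step 3
    path = [last]
    for ptr in reversed(backptrs):                              # step 4
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[-1][last]

def toy_trans_prob(prev, x, s):
    # Hypothetical model: lines ending in "?" favour state "q"; made-up numbers.
    p_q = 0.9 if x.endswith("?") else 0.2
    if prev == "q":
        p_q -= 0.1
    return p_q if s == "q" else 1.0 - p_q

path, prob = memm_viterbi(["Who?", "Me.", "Why?"], ["q", "a"], "S", toy_trans_prob)
print(path, prob)
```

For long sequences the products would normally be replaced by sums of logarithms to avoid underflow; that variation is omitted here to stay close to the slide.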
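18. Training the MEMM
Training data: $(x^{(1)}, y^{(1)}), \cdots, (x^{(n)}, y^{(n)})$
Generalized Iterative Scaling, applied to the ME model of each state $s$:
1. Choose a constant $C$ such that, for all $o, s$,
$$C = \sum_a f_a(o, s)$$
※ This can be arranged by adding a correction feature $f_c$ with $f_c(o, s) \ge 0 \ \ \forall o, s$:
$$f_c(o, s) = C - \sum_a f_a(o, s)$$
2. Compute the empirical expectation of each feature, where $m_s^{(i)}$ is the number of positions $t$ in sequence $i$ with $y_{t-1} = s$:
$$\tilde{E}[f_a] = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{m_s^{(i)}} \sum_{t : y_{t-1} = s} f_a(x_t^{(i)}, y_t^{(i)})$$
3. Compute the expectation of each feature under the current parameters $\lambda$, using the observed $x$:
$$E[f_a] = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{m_s^{(i)}} \sum_{t : y_{t-1} = s} \sum_{y \in S} P_s(y \mid x_t^{(i)}, \lambda)\, f_a(x_t^{(i)}, y)$$
4. Update the parameters:
$$\lambda_a^{\text{new}} = \lambda_a + \frac{1}{C} \log\left( \frac{\tilde{E}[f_a]}{E[f_a]} \right)$$
5. Repeat steps 3 and 4 until convergence, for the ME model of each $s$.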
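A rough sketch of the Generalized Iterative Scaling update for a single maximum entropy model, as it would be run on the training positions that share one previous state $s$. It simplifies the slide by averaging over a flat list of `(x, y)` pairs instead of weighting each sequence by $1/m_s^{(i)}$. The names `me_probs` and `gis` and the toy data are assumptions.

```python
import math

def me_probs(lam, features, labels, x):
    """P(y | x) proportional to exp(sum_a lam_a * f_a(x, y))."""
    scores = [math.exp(sum(l * f(x, y) for l, f in zip(lam, features))) for y in labels]
    z = sum(scores)
    return {y: s / z for y, s in zip(labels, scores)}

def gis(samples, labels, features, C, iterations=50):
    """features must be non-negative and sum to C for every (x, y)."""
    n = len(samples)
    lam = [0.0] * len(features)
    empirical = [sum(f(x, y) for x, y in samples) / n for f in features]
    for _ in range(iterations):
        expected = [0.0] * len(features)
        for x, _ in samples:
            probs = me_probs(lam, features, labels, x)
            for a, f in enumerate(features):
                expected[a] += sum(probs[y] * f(x, y) for y in labels) / n
        # lambda_new = lambda + (1 / C) * log(empirical / expected)
        lam = [l + math.log(e / m) / C for l, e, m in zip(lam, empirical, expected)]
    return lam

C = 1.0
f0 = lambda x, y: 1.0 if x == 1 and y == "A" else 0.0
f1 = lambda x, y: 1.0 if x == 0 and y == "B" else 0.0
f_c = lambda x, y: C - f0(x, y) - f1(x, y)  # correction feature
samples = [(1, "A")] * 3 + [(1, "B")] + [(0, "B")] * 3 + [(0, "A")]
lam = gis(samples, ["A", "B"], [f0, f1, f_c], C)
p = me_probs(lam, [f0, f1, f_c], ["A", "B"], 1)
print(p)
```

On this toy data the fitted model reproduces the empirical frequency of label "A" given `x == 1`, which is 3 out of 4.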
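22. HMM and CRF
$$P(X, Y) = P(Y) P(X \mid Y) = \prod_t P(Y_t \mid Y_{t-1}) P(X_t \mid Y_t)$$
$$\begin{aligned}
\log\left( \prod_t P(Y_t \mid Y_{t-1}) P(X_t \mid Y_t) \right)
&= \sum_t \left\{ \log P(Y_t \mid Y_{t-1}) + \log P(X_t \mid Y_t) \right\} \\
&= \sum_t \left\{ \sum_{\langle s, s' \rangle} [[Y_{t-1} = s']][[Y_t = s]] \log P(s \mid s') + \sum_{\langle o, s \rangle} [[X_t = o]][[Y_t = s]] \log P(o \mid s) \right\} \\
&= \sum_{t, \langle s, s' \rangle} \log P(s \mid s')\, [[Y_{t-1} = s']][[Y_t = s]] + \sum_{t, \langle o, s \rangle} \log P(o \mid s)\, [[X_t = o]][[Y_t = s]]
\end{aligned}$$
Define
$$\lambda_{\langle s, s' \rangle} = \log P(s \mid s'), \qquad f_{\langle s, s' \rangle}(Y_{t-1}, Y_t, X, t) = [[Y_{t-1} = s']][[Y_t = s]]$$
$$\mu_{\langle o, s \rangle} = \log P(o \mid s), \qquad g_{\langle o, s \rangle}(Y_t, X, t) = [[X_t = o]][[Y_t = s]]$$
Then
$$P(Y \mid X) = \frac{1}{P(X)} \exp\left( \sum_{t, \langle s, s' \rangle} \lambda_{\langle s, s' \rangle} f_{\langle s, s' \rangle}(Y_{t-1}, Y_t, X, t) + \sum_{t, \langle o, s \rangle} \mu_{\langle o, s \rangle} g_{\langle o, s \rangle}(Y_t, X, t) \right)$$
subject to
$$\sum_s \exp(\lambda_{\langle s, s' \rangle}) = 1, \qquad \sum_o \exp(\mu_{\langle o, s \rangle}) = 1$$
Under these constraints the HMM is written in CRF form.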
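As a numerical check of the identity on this slide, the sketch below scores a toy sequence in two ways: directly as an HMM log joint probability, and as a sum of indicator features weighted by $\log P(s \mid s')$ and $\log P(o \mid s)$. The toy probability tables and all names are invented for illustration.

```python
import math

# Hypothetical toy HMM: trans[(s_prev, s)] = P(s | s_prev), emit[(s, o)] = P(o | s).
trans = {("S", "a"): 0.6, ("S", "b"): 0.4,
         ("a", "a"): 0.7, ("a", "b"): 0.3,
         ("b", "a"): 0.2, ("b", "b"): 0.8}
emit = {("a", "x"): 0.9, ("a", "y"): 0.1,
        ("b", "x"): 0.5, ("b", "y"): 0.5}

def hmm_log_joint(xs, ys, start="S"):
    total, prev = 0.0, start
    for x, y in zip(xs, ys):
        total += math.log(trans[(prev, y)]) + math.log(emit[(y, x)])
        prev = y
    return total

# CRF-style weights: one per transition pair and one per (state, observation) pair.
trans_weight = {k: math.log(p) for k, p in trans.items()}
emit_weight = {k: math.log(p) for k, p in emit.items()}

def crf_score(xs, ys, start="S"):
    total, prev = 0.0, start
    for x, y in zip(xs, ys):
        # Each indicator feature fires (value 1) for exactly one key per position.
        total += sum(w * ((prev, y) == key) for key, w in trans_weight.items())
        total += sum(w * ((y, x) == key) for key, w in emit_weight.items())
        prev = y
    return total

xs, ys = ["x", "y", "y"], ["a", "b", "b"]
print(hmm_log_joint(xs, ys), crf_score(xs, ys))
```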
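23. Viterbi algorithm (CRF)
$$\hat{y} = \arg\max_y P(Y \mid X) = \arg\max_y \left[ \sum_{t, i} \lambda_i f_i(Y_{t-1}, Y_t, X, t) + \sum_{t, j} \mu_j g_j(Y_t, X, t) \right]$$
$$h_t(Y_{t-1}, Y_t, X) = \sum_i \lambda_i f_i(Y_{t-1}, Y_t, X, t) + \sum_j \mu_j g_j(Y_t, X, t)$$
Given $x$, $\delta_k(x, s_l)$ is the best score of a state sequence that is in state $s_l$ at time $k$:
$$\delta_k(x, s_l) = \max_{y_1 \cdots y_{k-1}} \left[ \sum_{t=1}^{k-1} h_t(y_{t-1}, y_t, x) + h_k(y_{k-1}, s_l, x) \right]$$
1. Initialization, from the start state $s_S$:
$$\delta_1(x, s_l) = h_1(s_S, s_l, x)$$
2. Recursion, for $k = 1, ..., T - 1$:
$$\delta_{k+1}(x, s_l) = \max_{s_m} \left[ \delta_k(x, s_m) + h_{k+1}(s_m, s_l, x) \right]$$
$$\psi_k(x, s_l) = \arg\max_{s_m} \left[ \delta_k(x, s_m) + h_{k+1}(s_m, s_l, x) \right]$$
3. Termination:
$$\hat{y}_T = \arg\max_{s_m} \delta_T(x, s_m)$$
4. Backtracking, for $t = T - 1, ..., 1$:
$$\hat{y}_t = \psi_t(x, \hat{y}_{t+1})$$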
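Because the normalizer does not depend on $y$, decoding a CRF only needs the additive scores $h_t$. A minimal sketch under that reading follows: `crf_viterbi` and the toy score function `h` (its weights 1.0 and 2.0, and the way it reads the observation string `xs`) are hypothetical.

```python
def crf_viterbi(T, states, start, h):
    """Best label sequence under additive scores h(t, prev, cur), for t = 1..T."""
    delta = {s: h(1, start, s) for s in states}          # initialization
    backptrs = []
    for k in range(1, T):                                # recursion
        new_delta, ptr = {}, {}
        for sl in states:
            sm = max(states, key=lambda s: delta[s] + h(k + 1, s, sl))
            ptr[sl] = sm
            new_delta[sl] = delta[sm] + h(k + 1, sm, sl)
        backptrs.append(ptr)
        delta = new_delta
    last = max(states, key=lambda s: delta[s])           # termination
    path = [last]
    for ptr in reversed(backptrs):                       # backtracking
        path.append(ptr[path[-1]])
    return path[::-1], delta[last]

xs = "abba"

def h(t, prev, cur):
    # Hypothetical scores: 1.0 for staying in the same state,
    # 2.0 when the label matches the upper-cased observation at position t.
    return 1.0 * (prev == cur) + 2.0 * (cur == xs[t - 1].upper())

path, score = crf_viterbi(len(xs), ["A", "B"], "S", h)
print(path, score)
```

Working with sums of scores rather than products of probabilities is what lets the recursion use `+` where the MEMM version multiplies.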
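24. $M_t(X)$ is a $(|S| + 2) \times (|S| + 2)$ matrix over the states together with the start state $s_S$ and the end state $s_E$:
$$M_t(s_l, s_m \mid X) = \exp h_t(s_l, s_m, X)$$
[Figure: the matrix $M_t(X)$, with rows $s_l$ (from $s_S$) and columns $s_m$ (to $s_E$)]
Forward vector $\alpha_t(X)$, of dimension $|S| + 2$:
$$\alpha_0(Y \mid X) = \begin{cases} 1 & \text{if } Y = s_S \\ 0 & \text{otherwise} \end{cases}$$
$$\alpha_t(X)^T = \alpha_{t-1}(X)^T M_t(X)$$
Backward vector $\beta_t(X)$, of dimension $|S| + 2$:
$$\beta_{T+1}(Y \mid X) = \begin{cases} 1 & \text{if } Y = s_E \\ 0 & \text{otherwise} \end{cases}$$
$$\beta_t(X) = M_{t+1}(X)\, \beta_{t+1}(X)$$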
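The forward and backward recursions on this slide can be sketched with plain nested lists. Index 0 stands for $s_S$ and the last index for $s_E$. Setting the entries of disallowed transitions (into $s_S$, out of $s_E$, into $s_E$ before the final step) to zero is an assumption of this sketch, as are the function names and the toy score `h`.

```python
import math

def build_matrices(T, n_states, h):
    """Return [M_1, ..., M_{T+1}]; entries are exp(h(t, l, m)) or 0 if disallowed."""
    size, end = n_states + 2, n_states + 1
    mats = []
    for t in range(1, T + 2):
        M = [[0.0] * size for _ in range(size)]
        for l in range(size):
            for m in range(size):
                ok_from = (l == 0) if t == 1 else (1 <= l <= n_states)
                ok_to = (m == end) if t == T + 1 else (1 <= m <= n_states)
                if ok_from and ok_to:
                    M[l][m] = math.exp(h(t, l, m))
        mats.append(M)
    return mats

def forward_backward(mats):
    size = len(mats[0])
    alpha = [[1.0 if i == 0 else 0.0 for i in range(size)]]         # alpha_0
    for M in mats:                                                   # alpha_t^T = alpha_{t-1}^T M_t
        prev = alpha[-1]
        alpha.append([sum(prev[l] * M[l][m] for l in range(size)) for m in range(size)])
    beta = [[1.0 if i == size - 1 else 0.0 for i in range(size)]]    # beta_{T+1}
    for M in reversed(mats):                                         # beta_t = M_{t+1} beta_{t+1}
        nxt = beta[0]
        beta.insert(0, [sum(M[l][m] * nxt[m] for m in range(size)) for l in range(size)])
    return alpha, beta

def h(t, l, m):
    # Hypothetical score function, for illustration only.
    return 0.3 * (l == m) - 0.1 * m + 0.05 * t

T, n_states = 3, 2
alpha, beta = forward_backward(build_matrices(T, n_states, h))
Z = alpha[-1][-1]  # entry (s_S, s_E) of the product M_1 ... M_{T+1}
print(Z)
```

The inner product of `alpha[t]` and `beta[t]` gives the same value `Z` at every `t`, which is a convenient sanity check for an implementation.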
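25. Training the CRF (the constant $C$ plays the same role as in the MEMM)
Training data: $(x^{(1)}, y^{(1)}), \cdots, (x^{(n)}, y^{(n)})$
Generalized Iterative Scaling
1. Choose a constant $C$ such that
$$C = \sum_{t, i} f_i(y_{t-1}, y_t, x, t) + \sum_{t, j} g_j(y_t, x, t)$$
※ This can be arranged by adding a correction feature $c(x, y)$ with $c(x^{(k)}, y^{(k)}) \ge 0$ for $k = 1, \cdots, n$:
$$c(x, y) = C - \sum_{t, i} f_i(y_{t-1}, y_t, x, t) - \sum_{t, j} g_j(y_t, x, t)$$
2. Compute the empirical expectation of each feature:
$$\tilde{E}[f_i] = \frac{1}{n} \sum_{k=1}^{n} \sum_t f_i(y_{t-1}^{(k)}, y_t^{(k)}, x^{(k)}, t)$$
3. Compute the expectation of each feature under the current parameters, using the observed $x$:
$$E[f_i] = \frac{1}{n} \sum_{k=1}^{n} \sum_t \sum_{s_l, s_m} \frac{\alpha_{t-1}(s_l \mid x^{(k)}, \lambda, \mu)\, M_t(s_l, s_m \mid x^{(k)}, \lambda, \mu)\, \beta_t(s_m \mid x^{(k)}, \lambda, \mu)}{Z(x^{(k)} \mid \lambda, \mu)}\, f_i(s_l, s_m, x^{(k)}, t)$$
Here the fraction is the pairwise marginal $P(Y_{t-1} = s_l, Y_t = s_m \mid x, \lambda, \mu)$, and
$$Z(x) = \left( \prod_t M_t(x) \right)_{s_S, s_E}$$
4. Update the parameters:
$$\lambda_i^{\text{new}} = \lambda_i + \frac{1}{C} \log\left( \frac{\tilde{E}[f_i]}{E[f_i]} \right)$$
5. Repeat steps 3 and 4.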
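The model expectation $E[f_i]$ on this slide is a sum of feature values weighted by pairwise marginals. The sketch below computes those marginals for one sequence from forward and backward quantities and then takes the expectation of one feature. The layered dictionaries, the toy score `h`, the feature `f_same` and every name here are assumptions made for illustration.

```python
import math

def pairwise_marginals(T, states, start, end, h):
    """Return ({(t, a, b): P(Y_{t-1}=a, Y_t=b | x)}, Z) for t = 1..T+1."""
    def M(t, a, b):
        return math.exp(h(t, a, b))
    # Allowed states per position: start at 0, ordinary states at 1..T, end at T+1.
    layers = [[start]] + [list(states)] * T + [[end]]
    alpha = [{start: 1.0}]
    for t in range(1, T + 2):
        alpha.append({b: sum(alpha[t - 1][a] * M(t, a, b) for a in layers[t - 1])
                      for b in layers[t]})
    beta = {T + 1: {end: 1.0}}
    for t in range(T, -1, -1):
        beta[t] = {a: sum(M(t + 1, a, b) * beta[t + 1][b] for b in layers[t + 1])
                   for a in layers[t]}
    Z = alpha[T + 1][end]
    marg = {(t, a, b): alpha[t - 1][a] * M(t, a, b) * beta[t][b] / Z
            for t in range(1, T + 2) for a in layers[t - 1] for b in layers[t]}
    return marg, Z

def expected_feature(marg, f):
    """E[f] for one sequence: sum over t and state pairs of marginal times feature."""
    return sum(p * f(a, b, t) for (t, a, b), p in marg.items())

def h(t, a, b):
    # Hypothetical scores, for illustration only.
    return 0.5 * (a == b) + 0.2 * t * (b == "a")

f_same = lambda a, b, t: 1.0 if a == b else 0.0
T, states = 2, ("a", "b")
marg, Z = pairwise_marginals(T, states, "S", "E", h)
print(expected_feature(marg, f_same))
```

Averaging `expected_feature` over the training sequences would give the quantity compared against the empirical expectation in the update step.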
26. • , ( ). . , 1999.
• A. McCallum, D. Freitag, and F. Pereira. Maximum entropy Markov models for
information extraction and segmentation. Proc. ICML, pp. 591-598, 2000.
• J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: probabilistic
models for segmenting and labeling sequence data. Proc. ICML, pp. 282-289, 2001.
• Charles Elkan. Log-Linear Models and Conditional Random Fields. Notes for a
tutorial at CIKM, 2008.
• Hanna M. Wallach. Conditional Random Fields: An Introduction. Technical Report
MS-CIS-04-21. Department of Computer and Information Science, University of
Pennsylvania, 2004.
• , , . Conditional Random Fields
. , pp. 89-96, 2004.
• http://www.dbl.k.hosei.ac.jp/~miurat/readings/Nov0706b.pdf
• http://www.dbl.k.hosei.ac.jp/~miurat/readings/Nov0706a.pdf
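31. Energy of a clique, with transition features $t_j$ and state features $s_k$:
$$E(y_C \mid x) = -\sum_j \lambda_j t_j(y_{i-1}, y_i, x, i) - \sum_k \mu_k s_k(y_i, x, i)$$
Writing the state features as $s_k(y_{i-1}, y_i, x, i)$, both kinds of feature can be written uniformly as $f_j$:
$$E(y_C \mid x) = -\sum_j \lambda_j f_j(y_{i-1}, y_i, x, i)$$
[Figure: the two kinds of clique with energy $E$: the edge ($y_{i-1}$, $y_i$) and the vertex ($y_i$)]
$$p(y \mid x) = \frac{1}{Z} \prod_C \Psi_C(y_C \mid x) = \frac{1}{Z} \exp\left( -\sum_C E(y_C \mid x) \right)$$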
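32. CRF: the clique $C$ at position $i$ contains $y_{i-1}$ and $y_i$.
$$E(y_C \mid x) = -\sum_j \lambda_j f_j(y_{i-1}, y_i, x, i)$$
$$-\sum_C E(y_C \mid x) = \sum_C \sum_j \lambda_j f_j(y_{i-1}, y_i, x, i) = \sum_j \lambda_j F_j(y, x)$$
$$F_j(y, x) = \sum_C f_j(y_{i-1}, y_i, x, i)$$
$$p(y \mid x) = \frac{1}{Z} \exp\left( \sum_j \lambda_j F_j(y, x) \right)$$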
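To make the global-feature form concrete, the sketch below sums each local feature over positions to get a global feature value, then normalizes by brute-force enumeration over all label sequences. That enumeration is only feasible for tiny examples and is used here purely for illustration. The two features, their weights and all names are hypothetical.

```python
import math
from itertools import product

def global_features(y, x, feats, start="S"):
    """F_j(y, x): each local feature f_j(y_prev, y_cur, x, i) summed over positions i."""
    prev = (start,) + tuple(y[:-1])
    return [sum(f(prev[i], y[i], x, i) for i in range(len(y))) for f in feats]

def crf_prob(y, x, lam, feats, labels):
    """p(y | x) = exp(sum_j lam_j F_j(y, x)) / Z, with Z found by enumeration."""
    def score(seq):
        return sum(l * F for l, F in zip(lam, global_features(seq, x, feats)))
    Z = sum(math.exp(score(seq)) for seq in product(labels, repeat=len(x)))
    return math.exp(score(y)) / Z

# Hypothetical local features.
f_same = lambda y_prev, y_cur, x, i: 1 if y_prev == y_cur else 0
f_obs = lambda y_prev, y_cur, x, i: 1 if y_cur == x[i].upper() else 0
feats, lam, labels = [f_same, f_obs], [0.5, 1.5], ["A", "B"]

x = "ab"
print(global_features(("A", "B"), x, feats))
print(crf_prob(("A", "B"), x, lam, feats, labels))
```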