2. Summary
• Part I: Lexicalized Context-Free Grammars
– motivations and definition
– relation with other formalisms
• Part II: standard parsing
– TD techniques
– BU techniques
• Part III: novel algorithms
– BU enhanced
– TD enhanced
3. Lexicalized grammars
• each rule specialized for one or more
lexical items
• advantages over non-lexicalized
formalisms:
– express syntactic preferences that are
sensitive to lexical words
– control word selection
4. Syntactic preferences
• adjuncts
Workers [ dumped sacks ] into a bin
*Workers dumped [ sacks into a bin ]
• N-N compound
[ hydrogen ion ] exchange
*hydrogen [ ion exchange ]
5. Word selection
• lexical
Nora convened the meeting
?Nora convened the party
• semantics
Peggy solved two puzzles
?Peggy solved two goats
• world knowledge
Mary shelved some books
?Mary shelved some cooks
6. Lexicalized CFG
Motivations :
• study computational properties common
to generative formalisms used in
state-of-the-art real-world parsers
• develop parsing algorithms that can be
directly applied to these formalisms
9. Lexicalized CFG
Delexicalized nonterminals encode :
• word sense
N, V, ...
• grammatical features
number, tense, ...
• structural information
bar level, subcategorization state, ...
• other constraints
distribution, contextual features, ...
10. Lexicalized CFG
• productions have two forms :
– V[dump] → dumped
– VP[dump][sack]
→ VP[dump][sack] PP[into][bin]
• lexical elements in lhs inherited from rhs
11. Lexicalized CFG
• production is k-lexical :
k occurrences of lexical elements in rhs
– NP[bin] → Det[a] N[bin]
is 2-lexical
– VP[dump][sack]
→ VP[dump][sack] PP[into][bin]
is 4-lexical
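As a concrete illustration (not from the slides), the degree of lexicalization of a production can be computed from its right-hand-side symbols; the bracket notation follows the slides:

```python
def lexical_degree(rhs):
    """Count lexical-element occurrences in a production's right-hand
    side: each bracketed element like [dump] or [bin] counts once.
    A production with k such occurrences is k-lexical."""
    return sum(symbol.count("[") for symbol in rhs)

# NP[bin] -> Det[a] N[bin] is 2-lexical:
assert lexical_degree(["Det[a]", "N[bin]"]) == 2
# VP[dump][sack] -> VP[dump][sack] PP[into][bin] is 4-lexical:
assert lexical_degree(["VP[dump][sack]", "PP[into][bin]"]) == 4
```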
12. LCFG at work
• 2-lexical CFG
– Alshawi 1996
: Head Automata
– Eisner 1996
: Dependency Grammars
– Charniak 1997
: CFG
– Collins 1997
: generative model
13. LCFG at work
Probabilistic LCFG G is strongly equivalent
to probabilistic grammar G′ iff
• one-to-one mapping between derivations
• each direction is a homomorphism
• derivation probabilities are preserved
14. LCFG at work
From Charniak 1997 to 2-lex CFG :
NP^S[profits] → ADJ^NP[corporate] N^NP[profits]

Pr1(corporate | ADJ, NP, profits) ×
Pr1(profits | N, NP, profits) ×
Pr2(NP → ADJ N | NP, S, profits)
15. LCFG at work
From Collins 1997 (Model #2) to 2-lex CFG :
〈VP^S, {NP-C}, ∆left, ∆S〉[bought]
→ 〈N, ∆〉[IBM] 〈VP^S, {}, ∆left ⊕ ∆, ∆S ⊕ ∆〉[bought]

Prleft(NP, IBM | VP, S, bought, ∆left, {NP-C})
16. LCFG at work
Major Limitation :
Cannot capture relations involving
lexical items outside the actual constituent
(cf. history-based models)

[diagram: V[d0] → d0, NP[d1] → d1, and a
PP[d2][d3]; the model cannot look at d0
when computing PP attachment]
17. LCFG at work
• lexicalized context-free parsers that
are not LCFG :
– Magerman 1995
: Shift-Reduce+
– Ratnaparkhi 1997
: Shift-Reduce+
– Chelba & Jelinek 1998
: Shift-Reduce+
– Hermjakob & Mooney 1997
: LR
18. Related work
Other frameworks for the study of
lexicalized grammars :
• Carroll & Weir 1997 :
Stochastic Lexicalized Grammars;
emphasis on expressiveness
• Goodman 1997 :
Probabilistic Feature Grammars;
emphasis on parameter estimation
19. Summary
• Part I: Lexicalized Context-Free Grammars
– motivations and definition
– relation with other formalisms
• Part II: standard parsing
– TD techniques
– BU techniques
• Part III: novel algorithms
– BU enhanced
– TD enhanced
20. Standard Parsing
• standard parsing algorithms
(CKY, Earley, LC, ...) run on LCFG in time
O(|G| × |w|³)
• for 2-lex CFG (simplest case) |G| grows
with |V_D|³ × |V_T|² !!
Goal :
Get rid of the |V_T| factors
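For reference, the O(|G| × |w|³) behaviour comes from standard chart parsing; a minimal CKY recognizer sketch (toy CNF grammar encoding assumed, not from the talk) makes the three |w| factors and the grammar factor explicit:

```python
from collections import defaultdict

def cky_recognize(words, lexical, binary, start="S"):
    """Minimal CKY recognizer for a CFG in Chomsky normal form.

    lexical: dict word -> set of nonterminals (A -> word)
    binary:  dict (B, C) -> set of nonterminals (A -> B C)
    Three nested loops over string positions give the |w|^3 factor;
    scanning the applicable productions gives the |G| factor.
    """
    n = len(words)
    chart = defaultdict(set)              # (i, j) -> nonterminals over i..j
    for i, w in enumerate(words):
        chart[(i, i + 1)] |= lexical.get(w, set())
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):     # split point
                for B in chart[(i, k)]:
                    for C in chart[(k, j)]:
                        chart[(i, j)] |= binary.get((B, C), set())
    return start in chart[(0, n)]
```

For example, with the toy grammar `{"IBM": {"NP"}, "bought": {"V"}, "Lotus": {"NP"}}` and `{("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}`, the string "IBM bought Lotus" is accepted.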
21. Standard Parsing: TD
Result (to be refined) :
Algorithms satisfying the correct-prefix
property are “unlikely” to run on LCFG in
time independent of |V_T|
26. Standard Parsing: TD
Fact :
We can simulate a nondeterministic
FA M on w in time O(|M| × |w|)
Conjecture :
Fix a polynomial p.
We cannot simulate M on w in time
p(|w|) unless we spend exponential
time in precompiling M
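The O(|M| × |w|) simulation mentioned in the Fact is the standard subset-tracking run of the NFA over w; a minimal sketch (epsilon-free NFA, toy transition encoding assumed):

```python
def nfa_simulate(delta, start, finals, w):
    """Simulate a nondeterministic FA on w in O(|M| * |w|) time by
    maintaining the set of states reachable after each input symbol.

    delta: dict (state, symbol) -> set of next states
    """
    current = {start}
    for a in w:
        current = {q for s in current for q in delta.get((s, a), set())}
        if not current:               # no surviving computation
            return False
    return bool(current & finals)
```

Each of the |w| steps touches at most all |M| transitions, which gives the stated bound without any precompilation of M.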
27. Standard Parsing: TD
Assume our conjecture holds true
Result :
Off-line parsers with the correct-prefix property
cannot run in time O(p(|V_D|, |w|)),
for any polynomial p, unless we spend
exponential time in precompiling G
28. Standard Parsing: BU
Common practice in lexicalized grammar
parsing :
• select productions that are lexically
grounded in w
• parse BU with selected subset of G
Problem :
Algorithm removes |V_T| factors but
introduces new |w| factors !!
29. Standard Parsing: BU
Time charged :
• i, k, j ⇒ |w|³
• A, B, C ⇒ |V_D|³
• d1, d2 ⇒ |w|²

[diagram: A[d2] over i..j is built from B[d1]
(head at d1) over i..k and C[d2] (head at d2)
over k..j]

Running time is O(|V_D|³ × |w|⁵) !!
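The five |w| factors can be made visible in code; the sketch below is a toy closure loop over bilexical items, with a hypothetical `combine` callback standing in for the grammar (illustrative only, not the implementation used in the experiments):

```python
def naive_bilexical_close(items, combine):
    """One bottom-up closure over bilexical items (A, head, i, j).

    combine(B, d1, C, d2) returns (A, head) if some production
    A[..] -> B[d1] C[d2] applies, else None.  The basic step touches
    three string positions (i, k, j) and two head positions (d1, d2):
    five |w| factors, the source of the O(|V_D|^3 * |w|^5) bound.
    """
    changed = True
    while changed:
        changed = False
        for (B, d1, i, k) in list(items):
            for (C, d2, k2, j) in list(items):
                if k != k2:           # spans must be adjacent
                    continue
                result = combine(B, d1, C, d2)
                if result is None:
                    continue
                A, head = result
                if (A, head, i, j) not in items:
                    items.add((A, head, i, j))
                    changed = True
    return items
```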
30. Standard BU : Exhaustive
[log-log plot of parsing time vs. sentence length,
exhaustive parsing, BU naive: fitted curve
y = c·x^5.2019]
31. Standard BU : Pruning
[log-log plot of parsing time vs. sentence length,
with pruning, BU naive: fitted curve
y = c·x^3.8282]
32. Summary
• Part I: Lexicalized Context-Free Grammars
– motivations and definition
– relation with other formalisms
• Part II: standard parsing
– TD techniques
– BU techniques
• Part III: novel algorithms
– BU enhanced
– TD enhanced
33. BU enhanced
Result :
Parsing with 2-lex CFG in time
O(|V_D|³ × |w|⁴)
Remark :
Result transfers to models in Alshawi 1996,
Eisner 1996, Charniak 1997, Collins 1997
Remark :
Technique extends to improve parsing of
Lexicalized-Tree Adjoining Grammars
34. Algorithm #1
Basic step in naive BU :

[diagram: combine B[d1] over i..k with C[d2]
over k..j into A[d2] over i..j]

Idea :
Indices d1 and j can be processed
independently
35. Algorithm #1
• Step 1 :
pair B[d1] over i..k with the head d2 of a
constituent starting at k; d1 is discarded
⇒ partial A[d2] over i..k
• Step 2 :
extend the partial A[d2] over i..k with
C[d2] over k..j
⇒ A[d2] over i..j
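The two-step idea can be sketched as follows (toy item encodings and a hypothetical `combine` callback; the sketch ignores the label of the right constituent, which the full algorithm also checks):

```python
def step1(left_items, heads_at, combine):
    """Algorithm #1, step 1 (sketch): pair B[d1] over i..k with the
    head d2 of some constituent starting at k.  Index j is never
    consulted, and d1 is discarded once the pairing is done."""
    partials = set()
    for (B, d1, i, k) in left_items:
        for d2 in heads_at.get(k, ()):    # heads of constituents at k
            A = combine(B, d1, d2)
            if A is not None:
                partials.add((A, d2, i, k))
    return partials

def step2(partials, right_items):
    """Algorithm #1, step 2 (sketch): extend the partial A[d2] over
    i..k with a completed constituent headed at d2 spanning k..j.
    Only i, k, j and d2 are involved, so neither step touches five
    string positions at once: the bound drops to O(|w|^4)."""
    out = set()
    for (A, d2, i, k) in partials:
        for (_, d2r, kr, j) in right_items:
            if d2r == d2 and kr == k:
                out.add((A, d2, i, j))
    return out
```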
36. BU enhanced
Upper bound provided by Algorithm #1 :
O(|w|⁴)
Goal :
Can we go down to O(|w|³) ?
37. Spine
The spine of a parse tree is the path from
the root to the root’s head
S[buy]
├── S[buy]
│   ├── NP[IBM] → IBM
│   └── VP[buy]
│       ├── V[buy] → bought
│       └── NP[Lotus] → Lotus
└── AdvP[week] → last week

spine: S[buy] – S[buy] – VP[buy] – V[buy] – bought
38. Spine projection
The spine projection is the yield of the subtree
composed of the spine and all its sibling nodes
S[buy]
├── S[buy]
│   ├── NP[IBM] → IBM
│   └── VP[buy]
│       ├── V[buy] → bought
│       └── NP[Lotus] → Lotus
└── AdvP[week] → last week

spine projection: NP[IBM] bought NP[Lotus] AdvP[week]
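Spine projections are easy to compute on a toy tree representation; the code below is an illustrative sketch (not from the talk), assuming each node label carries its lexical head in brackets and the head child is the unique child sharing that head:

```python
import re

def head_of(label):
    """Extract the lexical head from a delexicalized label, 'VP[buy]' -> 'buy'."""
    m = re.search(r"\[([^\]]+)\]", label)
    return m.group(1) if m else None

def spine_projection(tree):
    """Left-to-right yield of the spine's sibling labels plus the head word.

    tree: (label, children); children are subtrees or terminal strings.
    Assumes (toy convention) that the head child is the unique child
    whose label carries the parent's lexical head.
    """
    label, children = tree
    h = head_of(label)
    if all(isinstance(c, str) for c in children):       # preterminal
        return [" ".join(children)]
    head_child = next(c for c in children if head_of(c[0]) == h)
    left, right, seen_head = [], [], False
    for c in children:
        if c is head_child:
            seen_head = True
        elif not seen_head:
            left.append(c[0])          # sibling label, left of spine
        else:
            right.append(c[0])         # sibling label, right of spine
    return left + spine_projection(head_child) + right
```

On the tree above this yields `NP[IBM] bought NP[Lotus] AdvP[week]`, the spine projection shown on the slide.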
39. Split Grammars
Split spine projections at head :
Problem :
how much information do we need to store
in order to construct new grammatical
spine projections from splits ?
40. Split Grammars
Fact :
Set of spine projections is a linear context-free language
Definition :
2-lex CFG is split if set of spine projections
is a regular language
Remark :
For split grammars, we can recombine
splits using finite information
41. Split Grammars
Non-split grammar :
S[d] → AdvP[a] S1[d]
S1[d] → S[d] AdvP[b]
…
• unbounded # of
dependencies
between left and
right dependents
of head
• linguistically
unattested and
unlikely
43. Split Grammars
Precompile grammar such that splits are
derived separately
[diagram: the parse of S[buy] is transformed so
that the left and right splits are derived
separately; on the right spine, r1[buy], r2[buy],
r3[buy] are states of a FA recognizing the spine
projections for "buy"]

r3[buy] is a split symbol
44. Split Grammars
• t : max # of states per spine automaton
• g : max # of split symbols per spine
automaton (g < t)
• m : # of delexicalized nonterminals that
are maximal projections
45. BU enhanced
Result :
Parsing with split 2-lexical CFG in time
O(t² g² m² |w|³)
Remark:
Models in Alshawi 1996, Charniak 1997
and Collins 1997 are not split
46. Algorithm #2
Idea :
• recognize left and
right splits separately
• collect head
dependents one
split at a time

[diagram: a constituent B[d] with head d is
assembled from its left and right splits, which
meet at a split symbol s]
49. Algorithm #2 : Exhaustive
[log-log plot of parsing time vs. sentence length,
exhaustive parsing: BU naive fits y = c·x^5.2019,
BU split fits y = c·x^3.328]
50. Algorithm #2 : Pruning
[log-log plot of parsing time vs. sentence length,
with pruning: BU naive fits y = c·x^3.8282,
BU split fits y = c·x^2.8179]
51. Related work
Cubic time algorithms for lexicalized
grammars :
• Sleator & Temperley 1991 :
Link Grammars
• Eisner 1997 :
Bilexical Grammars (improved by
transfer of Algorithm #2)
52. TD enhanced
Goal :
Introduce TD prediction for 2-lexical
CFG parsing, without |V_T| factors
Remark :
Must relax left-to-right parsing
(because of previous results)
53. TD enhanced
Result :
TD parsing with 2-lex CFG in time
O(|V_D|³ × |w|⁴)
Open :
O(|w|³) extension to split grammars
57. Algorithm #3
Item representation :
• i, j indicate extension
of A[d] partial analysis
• k indicates rightmost
possible position for
completion of A[d]
analysis

[diagram: A[d] spans i..j under root S;
completion must occur by position k]
58. Algorithm #3 : Prediction
• Step 1 :
find rightmost
subsequence
before k for some
A[d2] production
• Step 2 :
make Earley
prediction

[diagram: predicting A[d2] → … B[d1] … C[d2] …,
with the heads d1, d2 located as a rightmost
subsequence of the input before position k]
59. Conclusions
• standard parsing techniques are not
suitable for processing lexicalized
grammars
• novel algorithms have been introduced
using enhanced dynamic programming
• work to be done :
extension to history-based models
60. The End
Many thanks for helpful discussion to :
Jason Eisner, Mark-Jan Nederhof
Editor's notes
This is a longer version of my invited talk at IWPT00, including some slides not shown at the presentation (because of time restrictions). The following very short notes are a summary of additional comments (not appearing in the slides) that I have made during the presentation.
Work by Jason, Mark-Jan and myself presented at ACL99, NAACL00, TAG+5 plus work in progress. Many thanks to Jason and Mark-Jan for plenty of discussion during talk preparation.
Standard parsing techniques are those used by our community in the 80’s with unlexicalized formalisms, and still exploited at present with several language models.
Part II goes through a computational analysis of these techniques/strategies and shows that they do not perform efficiently with lexicalized formalisms.
Novel solutions are presented in Part III.
Development of lexicalized grammars has been strongly enhanced by availability of large tree banks, starting with the beginning of the 90’s.
Certain problems (see above) cannot be solved if the system does not use lexical information. Therefore extensive lexical relations have been introduced into the syntactic model. I am not going to dispute here whether the choice was a good one; things might evolve in the future toward more modular and explanatory systems.
This slide not shown at IWPT00.
These are standard examples taken from the existing literature.
Change the linguistic examples: use minimal pairs showing how syntactic structures switch with change of lexical element(s).
This slide not shown at IWPT00.
“Convene” shows a case of genuinely lexical factors.
Solve needs object with an actual state and a desired state, etc.
Abstract away from differences in language models that are not relevant for computational complexity analysis.
Standard CFG + lexical element percolation.
We call co-head the lexical head of a complement of the actual constituent.
The formalism captures relations among lexical elements. When weights are introduced, this translates into preferences.
I am trying to keep to a minimum # of definitions, these are the only symbols you need to cache to follow this talk.
The really new notion here is the set of delexicalized nonterminals.
This slide not shown at IWPT00.
No direct reference to lexical elements is allowed in a delexicalized nonterminal.
Among “other constraints”: Charniak uses constraint on father node label (to differentiate distributions); Collins uses gap feature (see GPSG) and other finite info about distribution of certain lexical elements.
Here is an important notion expressing the amount of lexicalization the grammar supports. We measure the complexity of the grammar on the basis of the arity of the represented lexical relations.
Remark: Bilexical grammar implies binary productions.
Remark: The degree of lexicalization k induces a proper hierarchy on the generated languages.
Here are some state-of-the-art language models that are strongly related to lexicalized CFG. Strongly related means there exists a linear-time, one-to-one mapping (actually, a homomorphism) between derivations that preserves derivation probabilities.
Main reason for the large development of bilexical models rests on the difficulty, given available annotated data, in estimating lexical relations of arity greater than 2.
Open: is Lafferty et al. 1992 probabilistic version of Link Grammars a 3-lexical CFG?
This slide not shown at IWPT00.
This slide not shown at IWPT00.
This slide not shown at IWPT00.
Multi-sets denote a particular state in the process of generating complements subcategorized for by the head. Delta's are finite functions denoting distributions of certain lexical elements; these can be merged.
NP is the direct object of the verb. In this case, taking the verbal head into consideration would help in computing the likelihood of PP-attachment.
Contrast with more general history based models.
I am not considering here models that are not based on CFG, as Brill’s transformation based model.
Computational limitations derived in Part II transfer to these models, since they are more powerful than 2-lexical CFG. Our results in Part III should however be extended.
Open: relation between 2-lex CFG and lexicalized version of DOP model.
Let us consider the least complex case, namely that of bilexical CFG. Grammar size grows with the square of alphabet size (|V_T|^2 is tight in real world cases, |V_D|^3 is not).
For the rest of this talk, our goal will be to develop parsing strategies running in time independent of |V_T| factors (in our case this will always result in sublinear time wrt the grammar).
We present a computational analysis of a family of parsing algorithms that use the so-called top-down prediction. These algorithms share the correct-prefix property (to be introduced in next slide).
The result, to be refined in the next slides, supports the empirically based common belief that BU strategies perform better than TD strategies in processing lexicalized formalisms based on context-free grammars.
The green productions are called here bridging productions. Note that this property is defined in combination with a strict left-to-right processing requirement. We will show that this combination of requirements is expensive to maintain.
G is some data structure representing the input grammar but no additional information that could be inferred from it.
Result says we cannot get rid of |V_T| factors when top-down prediction is involved. Mathematically: it does not matter how fast function f grows, it cannot be independent of V_T.
The proof shows that the correct-prefix property is not a local property, you must consider the whole grammar in order to test this condition.
We are allowed to extract additional information from the input grammar previous to parsing.
The on-line parsing result was easy to establish. Transferring this result to the off-line case is quite difficult, as it is difficult in general to prove non-trivial lower bounds for many problems in computer science.
The best we have been able to do is to provide evidence in support of the result, by relating it with a conjecture regarding FA.
When parsing with a nondeterministic FA, in order to remove any dependence on the size of M from the running time, we need to spend exponential time in determinizing M. No algorithm is known that parses in time linear in |w| and independent of M while requiring only polynomial-time precompilation.
What if we are prepared to spend more than linear time in processing w? Can we compensate for the lack of information on M, due to the polynomial bound on precompilation, by allowing more time in processing w? We conjecture the answer is no, due to the independence between M and w.
Similar to the on-line result, but relativized to our FA conjecture.
Standard CYK parsing of bilexical CFG shows an increase in time complexity of a factor of |w|^2, ascribed to the choice of productions that are grounded in the input string span under recognition.
How much does this affect us in real life? Here are some experiments that were run by Jason Eisner, on a bilexical CFG that was estimated from the Penn tree bank. I will say more later about the grammar, when we compare these results to the performance of our novel algorithms.
Exhaustive parsing has applications in EM algorithms.
We consider here a quite simple pruning strategy, using standard beam-search and some thresholding. The strategy has a speed-up of a factor between 10 to 20.
Note that the polynomial represented by the line gives a running time that is worse than the worst case we have with unlexicalized CFG.
This is an English text experiment. If we use input with a higher degree of noise, for instance a language with word-boundary uncertainty or a word-lattice input from some low-level speech recognizer, then I expect the ambiguity in the choice of head position to be even worse, resulting in an increased gap between the unlexicalized and the lexicalized cases.
Result directly transfers to the above language models; published implementations of parsers for these models are all O(|w|^5).
The naive method for Lexicalized Tree-Adjoining Grammar parsing is O(|w|^8), see Resnik 1992 and Schabes 1992. The result improves to O(|w|^7). The grammar constant involved is small and does not depend on the alphabet size.
Some predicate headed in d_2 is seeking a left dependent with head d_1.
In pairing the constituent headed in d_1 and extending from i to k with head d_2, we do not need to consider index j. After the pairing is done, d_1 can be discarded, and then we can come to the processing of index j, which we need in order to compute the extension of the final constituent (the combination of the two). Thus we can split the basic step into two sub-steps.
No more than 4 indices are involved at each step, thus complexity is O(|w|^4).
There are some optimizations that we can apply here (though they do not improve the asymptotic time complexity). We can apply Step 1 only in case d_2 is the head of a constituent starting at k and extending to the right. This amounts to saying that we pack together all constituents starting at k and with head at a given position, before applying Step 1. The unpacking is then done when applying Step 2.
The spine is the path along which the root head is percolated.
Let us consider only spines that correspond to maximal projections of some head.
The idea with spine projections is to look at possible dependents of a given head in their left-to-right order. (Again, assume maximal projections.)
Convention : dependents are all phrases that can combine with a given head; they divide up into complements and adjuncts.
Let us split spine projections at the head position (either before or after). If we combine left and right splits coming from different spine projections, how do we find out whether the combination is still a grammatical spine projection?
Linear context-free grammar: only one nonterminal in the right-hand side of each production. Derivation trees grow along a single spine.
When combining splits, we can use finite information related to the states of an FA that recognizes spine projections. We need one FA for each lexical head.
In a non-split grammar, we can establish an unbounded number of dependencies between left and right dependents of some head.
In the example, we alternate sentential modifiers to the left and the right, using a special extra symbol (S_1) to record the state in this process. In this way, each left modifier depends on some right modifier, and we obtain non-regular linear context-free languages of the form a^n # b^n.
I don’t know of any natural language that behaves in this way, and it seems extremely unlikely that such a language exists.
In contrast, split grammars can only establish a finite number of dependencies between left and right dependents of the head.
Here is a more realistic natural language example, where left or right attachment of the modifier depends only on the head (i.e., not on the number of modifiers attached at the opposite side). In this case we have zero dependencies between left and right dependents of the head.
For split grammars, we can precompile the grammar in such a way that the above transformation is established for each spine.
The new symbols on the right spine are related to states of a FA that recognizes spine projections for the head “buy”. Symbol r_3 is a split symbol: it marks the switch from derivation of right split to derivation of left split.
These are “small” grammar constants that do not depend on the alphabet size (only used in the next slide, do not cache them). Roughly speaking, the product t times m is of the same order as the size of the delexicalized nonterminals in the grammar. Constant g is quite small in practice, and equal to one for most of the lexical entries of the grammar.
Grammar constant here is of the same order as that in Algorithm #1 (we are not trading computational resources).
The above language models have enough power to express non-split grammars. If only split grammars are interesting for natural language modeling, then we should not have any loss of accuracy if we restrict the above models to this class, and this will result in an asymptotic improvement in time efficiency as stated by the result.
Each constituent is represented by the algorithm by means of its left and right splits. Splits are recognized separately, and each head collects its dependents one split at a time.
Here is a high-level view of how Algorithm #2 works.
We see how a left dependent is collected by some head d_2 that is recognizing its left split. Recognition of right splits proceeds symmetrically.
Note that each of the two steps only involves three input string indices.
We report a comparison between the naive CKY implementation and the O(n^3) Algorithm #2. We are running a split CFG that has been estimated from the Penn tree bank according to the model presented in Eisner 1998.
At 30-length sentences, we have a speed up of 19 for exhaustive parsing.
At 30-length sentences, we have a speed up of 5 when our pruning strategy is applied to both algorithms.
The first algorithm is defined for a non-lexicalized grammar model, but it has the nice property that, when used with a lexicalized version of the grammar, it still works in n^3 time.
These two algorithms work for dependency-like models; the techniques employed are quite different from the ones for phrase structures we have presented.
In the perspective of unlexicalized grammar formalisms, top-down prediction is effective in reducing the parsing search space, as compared to crude bottom-up parsing. Correct-prefix property is a desired filtering technique that we would like to have in a parsing algorithm for lexicalized CFG as well.
As we have seen, direct implementation of the correct-prefix property in a parser for bilexical CFG that works strictly left-to-right is very likely to result in an inefficient computation. So we have to give up something in the end.
We will be able to implement an even stronger version of the correct prefix property, by relaxing only the left-to-right condition.
We present a version of the algorithm that works for general bilexical grammars and extends the BU result.
Open: split grammar case.
We exploit a generalization of look-ahead. Recall that we require the existence of bridging productions (the green productions) associated with each partially or completely recognized constituent (yellow constituent). We restrict possible bridging productions to those that have unexpanded nonterminals with heads grounded in the portion of the string not yet analyzed, and in the same order.
In order to have running time independent of the alphabet size, we need to develop specific data structures for grammar and input string representation.
Productions with the same left-hand side have the sequences of lexical elements appearing in their right-hand sides reversed and stored into a trie.
We also employ standard deterministic FA constructions that represent the set of all subsequences appearing in a given string (when read right-to-left).
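The right-to-left subsequence lookup used in prediction can be sketched directly (an illustrative stand-in for the trie/FA machinery described above, not the actual data structures):

```python
def rightmost_embedding(seq, w, k):
    """Find the rightmost occurrence of seq as a subsequence of w[:k],
    scanning right-to-left and matching seq from its last element.
    This mirrors reading the reversed lexical sequences of a trie
    against the input.  Returns the matched positions in
    left-to-right order, or None if no embedding exists."""
    positions = []
    j = len(seq) - 1
    for i in range(k - 1, -1, -1):
        if j < 0:
            break
        if w[i] == seq[j]:        # greedy right-to-left match
            positions.append(i)
            j -= 1
    return None if j >= 0 else positions[::-1]
```

Greedy matching from the right yields the rightmost embedding, which is exactly what the prediction step needs in order to bound where an analysis can still be completed.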
As in the Earley algorithm or in chart-based algorithms, each item representing a partial analysis is associated with two indices corresponding to its extension in the input string. In addition, we keep a third index, indicating the rightmost position in the input beyond which the partial analysis cannot be extended without violating our version of the correct prefix property.
The algorithm can be viewed as an extension of the Earley algorithm, in which standard left-to-right phrase analysis is synchronized with right-to-left analysis for lexical item subsequences.
The prediction step of the algorithm works as follows. In order to expand a predicted left-hand side symbol A[d_2] that should not extend beyond position k, we read the trie associated with A[d_2] and our input string representation. This step locates all rightmost occurrences of subsequences of the input string that correspond to sequences of lexical elements appearing in the right-hand side of some A[d_2] production. Then we apply the standard prediction step as in the Earley algorithm, and predict the left-corner of the selected productions. Again, each item is associated with a third index beyond which the analysis cannot be expanded. This index is directly computed from the recognized subsequences.
Further work needs to be done in order to extend the presented results to more general history-based models.