2. Today
• Finite-state methods
• English Morphology
• Finite-State Transducers
03/27/12 Speech and Language Processing - Jurafsky and Martin 2
3. Regular Expressions and Text
Searching
• Everybody does it
Emacs, vi, grep, Microsoft Word, etc.
• Regular expressions are a compact textual
representation of a set of strings
representing a language.
• The regular expression is used for specifying
text strings in all sorts of text processing and
information extraction applications.
4. Regular Expressions and Text
Searching
• A string is a sequence of symbols;
• For most text-based search techniques, a
string is any sequence of alphanumeric
characters (letters, numbers, spaces, tabs,
and punctuation).
• A regular expression is an algebraic notation
for characterizing a set of strings.
6. Example
• Find all the instances of the word “the” in
a text.
/the/
/[tT]he/
/\b[tT]he\b/
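Python's re module uses this same notation, so the three refinements can be sketched directly; the sentence below is an invented illustrative example:

```python
import re

text = "The cat sat there on the other mat."

# /the/ misses the capitalized "The" (a false negative) and also
# matches inside "there" and "other" (false positives)
print(re.findall(r"the", text))          # 3 matches

# /[tT]he/ fixes the false negative but keeps the false positives
print(re.findall(r"[tT]he", text))       # 4 matches

# /\b[tT]he\b/ adds word boundaries, keeping only the real hits
print(re.findall(r"\b[tT]he\b", text))   # ['The', 'the']
```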
7. Errors
• The process we just went through was
based on fixing two kinds of errors
Matching strings that we should not have
matched (there, then, other)
False positives (Type I)
Not matching things that we should have
matched (The)
False negatives (Type II)
8. Errors
• Reducing the error rate for an application
often involves two antagonistic efforts:
Increasing accuracy, or precision (minimizing
false positives)
Increasing coverage, or recall (minimizing
false negatives)
9. Finite State Automata
• Regular expressions can be viewed as a
textual way of specifying the structure of
finite-state automata (FSA).
• FSAs and their probabilistic relatives
capture significant aspects of what
linguists say we need for morphology and
parts of syntax.
10. FSAs as Graphs
• Let’s start with the sheep language from
Chapter 2
/baa+!/
• A directed graph, a finite set of vertices
(nodes), together with a set of directed
links between pairs of vertices called arcs.
11. Sheep FSA
• We can say the following things about this
machine
It has 5 states
b, a, and ! are in its alphabet
q0 is the start state
q4 is an accept state
It has 5 transitions
12. More Formally
• You can specify an FSA by enumerating
the following things.
The set of states: Q
A finite alphabet: Σ
A start state
A set of accept/final states
A transition function that maps QxΣ to Q
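As a sketch, the sheep machine /baa+!/ (in a deterministic form) can be written down as exactly these five components; the state numbering is illustrative, not from the slides' figure:

```python
# The sheep FSA /baa+!/ as an explicit 5-tuple
Q = {0, 1, 2, 3, 4}          # the set of states
SIGMA = {"b", "a", "!"}      # a finite alphabet
START = 0                    # the start state
ACCEPT = {4}                 # the set of accept/final states
# transition function: (state, symbol) -> state (a partial map of Q x Sigma to Q)
DELTA = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,   # the self-loop that implements a+
    (3, "!"): 4,
}
```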
13. About Alphabets
• Don’t take the term alphabet too
narrowly; it just means we need a finite
set of symbols in the input.
• These symbols can and will stand for
bigger objects that can have internal
structure.
15. Yet Another View
• The guts of FSAs can ultimately be
represented as tables

  State | b | a   | ! | ε
  0     | 1 |     |   |
  1     |   | 2   |   |
  2     |   | 2,3 |   |
  3     |   |     | 4 |
  4     |   |     |   |

If you’re in state 1 and you’re looking at
an a, go to state 2
16. Recognition
• Recognition is the process of determining if
a string should be accepted by a machine
• Or… it’s the process of determining if a
string is in the language we’re defining with
the machine
• Or… it’s the process of determining if a
regular expression matches a string
• Those all amount to the same thing in the end
17. Recognition
• Traditionally, (Turing’s notion) this process is
depicted with a tape.
18. Recognition
• Simply a process of starting in the start
state
• Examining the current input
• Consulting the table
• Going to a new state and updating the
tape pointer.
• Until you run out of tape.
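The steps above can be sketched as a minimal table-driven interpreter; the table encodes a deterministic sheep FSA for /baa+!/, with illustrative state names:

```python
# Transition table: state -> {symbol -> next state}
TABLE = {
    0: {"b": 1},
    1: {"a": 2},
    2: {"a": 3},
    3: {"a": 3, "!": 4},
}
ACCEPT = {4}

def d_recognize(tape, table=TABLE, start=0, accept=ACCEPT):
    """Start in the start state, examine the input, consult the table,
    move to the new state, and advance the tape pointer until out of tape."""
    state = start
    for symbol in tape:
        moves = table.get(state, {})
        if symbol not in moves:
            return False          # no table entry: reject
        state = moves[symbol]     # go to the new state
    return state in accept        # out of tape: accept iff in a final state

print(d_recognize("baaa!"))  # True
print(d_recognize("ba!"))    # False
```

Changing the machine means changing only the table, exactly as the next slide's key points note.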
19. D-Recognize
[Figure: the D-RECOGNIZE algorithm]
20. Key Points
• Deterministic means that at each point in
processing there is always one unique thing
to do (no choices).
• D-recognize is a simple table-driven
interpreter.
• The algorithm is universal for all
unambiguous regular languages.
To change the machine, you simply change the
table.
21. Recognition as Search
• You can view this algorithm as a trivial kind
of state-space search.
• States are pairings of tape positions and
state numbers.
• Operators are compiled into the table
• Goal state is a pairing with the end of tape
position and a final accept state
22. Generative Formalisms
• Formal Languages are sets of strings
composed of symbols from a finite set of
symbols.
• Finite-state automata define formal
languages (without having to enumerate all
the strings in the language)
• The term Generative is based on the view
that you can run the machine as a generator
to get strings from the language.
23. Generative Formalisms
• FSAs can be viewed from two
perspectives:
Acceptors that can tell you if a string is in the
language
Generators to produce all and only the strings
in the language
24. Non-Determinism
DFSA (deterministic FSA) [figure]
NDFSA (non-deterministic FSA) [figure]
25. Non-Determinism cont.
• Yet another technique
Epsilon transitions
Key point: these transitions do not examine or
advance the tape during recognition
26. Equivalence
• Non-deterministic machines can be
converted to deterministic ones with a
fairly simple construction
• That means that they have the same
power; non-deterministic machines are
not more powerful than deterministic
ones in terms of the languages they can
accept
27. ND Recognition
• Two basic approaches (used in all major
implementations of regular expressions,
see Friedl 2006)
1. Either take a ND machine and convert it to a
D machine and then do recognition with
that.
2. Or explicitly manage the process of
recognition as a state-space search (leaving
the machine as is).
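Approach 2 can be sketched with an explicit agenda of (machine state, tape position) pairs; the table below is the non-deterministic sheep machine, with state names assumed:

```python
# ND transition table: state -> {symbol -> set of next states}
ND_TABLE = {
    0: {"b": {1}},
    1: {"a": {2}},
    2: {"a": {2, 3}},   # the non-deterministic choice: stay or move on
    3: {"!": {4}},
}
ACCEPT_STATES = {4}

def nd_recognize(tape):
    agenda = [(0, 0)]                      # search states: (machine state, tape position)
    while agenda:
        state, pos = agenda.pop()          # pop() = depth-first; pop(0) would be breadth-first
        if pos == len(tape):
            if state in ACCEPT_STATES:
                return True                # goal: end of tape in an accept state
            continue                       # out of tape without accepting: dead end
        for nxt in ND_TABLE.get(state, {}).get(tape[pos], set()):
            agenda.append((nxt, pos + 1))  # remember as-yet-unexplored states
    return False                           # every path led to failure
```

Success is finding any path that ends in an accept state; failure means the agenda emptied with all paths exhausted, matching the slides that follow.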
28. Non-Deterministic
Recognition: Search
• In a ND FSA there exists at least one path
through the machine for a string that is in the
language defined by the machine.
• But not all paths through the machine
for an accepted string lead to an accept state.
• No paths through the machine lead to an accept
state for a string not in the language.
29. Non-Deterministic
Recognition
• So success in non-deterministic
recognition occurs when a path is found
through the machine that ends in an
accept.
• Failure occurs when all of the possible
paths for a given string lead to failure.
30. Example
Tape:   b  a  a  a  !
States: q0 q1 q2 q2 q3 q4
31.–38. Example
[Figures: successive steps of the non-deterministic recognition search]
39. Key Points
• States in the search space are pairings of
tape positions and states in the machine.
• By keeping track of as yet unexplored
states, a recognizer can systematically
explore all the paths through the machine
given an input.
40. Why Bother?
• Non-determinism doesn’t get us more
formal power and it causes headaches so
why bother?
More natural (understandable) solutions
41. Today
• Finite-state methods
• English Morphology
• Finite-State Transducers
42. Words
• Finite-state methods are particularly useful in dealing
with a lexicon
• Many devices, most with limited memory, need access to
large lists of words
• And they need to perform fairly sophisticated tasks with
those lists
• So we’ll first talk about some facts about words and then
come back to computational methods
43. English Morphology
• Morphology is the study of the ways that
words are built up from smaller
meaningful units called morphemes
• We can usefully divide morphemes into
two classes
Stems: The core meaning-bearing units
Affixes: Bits and pieces that adhere to stems
to change their meanings and grammatical
functions
44. English Morphology
• We can further divide morphology up into
two broad classes
Inflectional
Derivational
45. Inflectional Morphology
• Inflectional morphology concerns the
combination of stems and affixes where the
resulting word:
Has the same word class as the original
Serves a grammatical/semantic purpose that is
Different from the original
But is nevertheless transparently related to the
original
46. Nouns and Verbs in English
• Nouns are simple
Markers for plural and possessive
• Verbs are only slightly more complex
Markers appropriate to the tense of the verb
47. Regulars and Irregulars
• It is a little complicated by the fact that
some words misbehave (refuse to follow
the rules)
Mouse/mice, goose/geese, ox/oxen
Go/went, fly/flew
• The terms regular and irregular are used
to refer to words that follow the rules and
those that don’t
48. Regular and Irregular Verbs
• Regulars…
Walk, walks, walking, walked, walked
• Irregulars
Eat, eats, eating, ate, eaten
Catch, catches, catching, caught, caught
Cut, cuts, cutting, cut, cut
49. Inflectional Morphology
• So inflectional morphology in English is
fairly straightforward
• But it is complicated by the fact that
there are irregularities
50. Derivational Morphology
• Derivational morphology is the messy stuff
that no one ever taught you.
Quasi-systematicity
Irregular meaning change
Changes of word class
51. Derivational Examples
• Verbs and Adjectives to Nouns
-ation computerize computerization
-ee appoint appointee
-er kill killer
-ness fuzzy fuzziness
52. Derivational Examples
• Nouns and Verbs to Adjectives
-al computation computational
-able embrace embraceable
-less clue clueless
53. Example: Compute
• Many paths are possible…
• Start with compute
Computer -> computerize -> computerization
Computer -> computerize -> computerizable
• But not all paths/operations are equally good
(allowable?)
Clue
Clue -> *clueable
54. Morphology and FSAs
• We’d like to use the machinery provided
by FSAs to capture these facts about
morphology
Accept strings that are in the language
Reject strings that are not
And do so in a way that doesn’t require us to
in effect list all the words in the language
55. Start Simple
• Regular singular nouns are ok
• Regular plural nouns have an -s on the
end
• Irregulars are ok as is
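A minimal sketch of this first machine, with tiny invented word lists standing in for the lexicon classes. Note that it deliberately ignores spelling changes, so it wrongly rejects "foxes"; the spelling-rule transducers later in the lecture handle that:

```python
# Toy stand-ins for the lexicon classes (illustrative, not a real lexicon)
REG_NOUNS = {"cat", "dog", "fox"}
IRREG_SG = {"goose", "mouse", "ox"}
IRREG_PL = {"geese", "mice", "oxen"}

def accept_noun(word):
    # regular singulars and both irregular forms are accepted as-is
    if word in REG_NOUNS or word in IRREG_SG or word in IRREG_PL:
        return True
    # regular plural: a regular stem followed by -s
    return word.endswith("s") and word[:-1] in REG_NOUNS
```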
57. Now Plug in the Words
58. Derivational Rules
If everything is an accept state
how do things ever get rejected?
59. Parsing/Generation
vs. Recognition
• We can now run strings through these machines
to recognize strings in the language
• But recognition is usually not quite what we need
Often if we find some string in the language we might
like to assign a structure to it (parsing)
Or we might have some structure and we want to
produce a surface form for it (production/generation)
• Example
From “cats” to “cat +N +PL”
60. Today
• Finite-state methods
• English Morphology
• Finite-State Transducers
61. Finite State Transducers
• The simple story
Add another tape
Add extra symbols to the transitions
On one tape we read “cats”, on the other we
write “cat +N +PL”
62. FSTs
63. Applications
• The kind of parsing we’re talking about is
normally called morphological analysis
• It can either be
• An important stand-alone component of many
applications (spelling correction, information
retrieval)
• Or simply a link in a chain of further linguistic
analysis
64. Transitions
c:c  a:a  t:t  +N:ε  +PL:s
• c:c means read a c on one tape and write a c on the other
• +N:ε means read a +N symbol on one tape and write nothing on
the other
• +PL:s means read +PL and write an s
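This single path through the transducer can be sketched as a list of read:write pairs; representing epsilon as the empty string is an implementation choice, not something from the slides:

```python
EPSILON = ""
# The cat path: each pair reads one lexical symbol, writes one surface symbol
PATH = [("c", "c"), ("a", "a"), ("t", "t"), ("+N", EPSILON), ("+PL", "s")]

def transduce(lexical):
    """Follow this one path, reading lexical symbols and writing surface ones."""
    if len(lexical) != len(PATH):
        return None
    surface = []
    for symbol, (read, write) in zip(lexical, PATH):
        if symbol != read:
            return None            # the path does not match this input
        surface.append(write)      # writing EPSILON adds nothing
    return "".join(surface)

print(transduce(["c", "a", "t", "+N", "+PL"]))  # cats
```

Running the same pairs in the other direction (reading surface, writing lexical) gives the parse "cat +N +PL" from "cats".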
65. Typical Uses
• Typically, we’ll read from one tape using
the first symbol on the machine transitions
(just as in a simple FSA).
• And we’ll write to the second tape using
the other symbols on the transitions.
66. Ambiguity
• Recall that in non-deterministic recognition
multiple paths through a machine may
lead to an accept state.
• Didn’t matter which path was actually
traversed
• In FSTs the path to an accept state does
matter since different paths represent
different parses and different outputs will
result
67. Ambiguity
• What’s the right parse (segmentation) for
• Unionizable
• Union-ize-able
• Un-ion-ize-able
• Each represents a valid path through the
derivational morphology machine.
68. Ambiguity
• There are a number of ways to deal with
this problem
• Simply take the first output found
• Find all the possible outputs (all paths) and
return them all (without choosing)
• Bias the search so that only one or a few
likely paths are explored
69. The Gory Details
• Of course, it’s not as easy as
• “cat +N +PL” <-> “cats”
• As we saw earlier there are geese, mice and
oxen
• But there are also a whole host of
spelling/pronunciation changes that go along
with inflectional changes
• Cats vs Dogs
• Fox and Foxes
70. Multi-Tape Machines
• To deal with these complications, we will
add more tapes and use the output of one
tape machine as the input to the next
• So to handle irregular spelling changes
we’ll add intermediate tapes with
intermediate symbols
71. Multi-Level Tape Machines
• We use one machine to transduce between the
lexical and the intermediate level, and another
to handle the spelling changes to the surface
tape
72. Lexical to Intermediate
Level
73. Intermediate to Surface
• The “add an e” rule, as in fox^s# <-> foxes#
74. Foxes
75. Note
• A key feature of this machine is that it
doesn’t do anything to inputs to which it
doesn’t apply.
• Meaning that they are written out
unchanged to the output tape.
76. Overall Scheme
• We now have one FST that has explicit
information about the lexicon (actual
words, their spelling, facts about word
classes and regularity).
• Lexical level to intermediate forms
• We have a larger set of machines that
capture orthographic/spelling rules.
• Intermediate forms to surface forms
78. Cascades
• This is an architecture that we’ll see again
and again
• Overall processing is divided up into distinct
rewrite steps
• The output of one layer serves as the input to
the next
• The intermediate tapes may or may not wind
up being useful in their own right
81. Composition
• Create a set of new states that
correspond to each pair of states from
the original machines (New states are
called (x,y), where x is a state from M1,
and y is a state from M2)
• Create a new FST transition table for the
new machine according to the following
intuition…
82. Composition
• There should be a transition between two
states in the new machine if it’s the case
that the output for a transition from a
state from M1, is the same as the input to
a transition from M2 or…
83. Composition
• δ3((xa,ya), i:o) = (xb,yb) iff
There exists c such that
δ1(xa, i:c) = xb AND
δ2(ya, c:o) = yb
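A direct, brute-force sketch of this rule over transition tables stored as dictionaries; the table representation and the toy one-transition machines are assumptions for illustration, not from the slides:

```python
def compose(delta1, delta2):
    """Each delta maps (state, (in_sym, out_sym)) -> next_state.
    Build delta3 per the slide: ((xa,ya), i:o) -> (xb,yb) iff some
    intermediate symbol c has delta1(xa, i:c)=xb and delta2(ya, c:o)=yb."""
    delta3 = {}
    for (xa, (i, c)), xb in delta1.items():
        for (ya, (c2, o)), yb in delta2.items():
            if c == c2:  # M1 writes c exactly where M2 reads c
                delta3[((xa, ya), (i, o))] = (xb, yb)
    return delta3
```

For example, composing a machine with the single transition a:b with one holding b:c yields a machine that transduces a:c directly, with paired states (0,0) and (1,1).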