Speech and Language
     Processing


        Lecture 03
   Chapter 2 and 3 of SLP
Today
    • Finite-state methods
    • English Morphology
    • Finite-State Transducers




03/27/12        Speech and Language Processing - Jurafsky and Martin   2
Regular Expressions and Text
                Searching
 • Everybody does it
    Emacs, vi, grep, Microsoft Word, etc.

 • Regular expressions are a compact textual
   representation of a set of strings
   representing a language.

 • The regular expression is used for specifying
   text strings in all sorts of text processing and
   information extraction applications.
Regular Expressions and Text
                Searching
 • A string is a sequence of symbols.

 • For most text-based search techniques, a
   string is any sequence of characters:
   letters, numbers, spaces, tabs, and
   punctuation.

 • A regular expression is an algebraic notation
   for characterizing a set of strings.
Regular Expressions




Example
    • Find all the instances of the word “the” in
      a text.
            /the/
            /[tT]he/
            /\b[tT]he\b/
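A quick check of these patterns with Python's re module (the sample sentence is made up) shows why the word-boundary anchors matter:

```python
import re

text = "The other day the cat saw them there."

# Naive pattern: also matches the "the" inside "other", "them", "there"
print(re.findall(r"[tT]he", text))        # 5 matches

# Word-boundary anchors (\b) restrict matches to the standalone word
print(re.findall(r"\b[tT]he\b", text))    # ['The', 'the']
```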




Errors

     • The process we just went through was
       based on fixing two kinds of errors
            Matching strings that we should not have
             matched (there, then, other)
              False positives (Type I)
            Not matching things that we should have
             matched (The)
              False negatives (Type II)




Errors

 • Reducing the error rate for an application
   often involves two antagonistic efforts:
         Increasing accuracy, or precision (minimizing
          false positives)
         Increasing coverage, or recall (minimizing
          false negatives)




Finite State Automata
    • Regular expressions can be viewed as a
      textual way of specifying the structure of
      finite-state automata (FSA).

    • FSAs and their probabilistic relatives
      capture significant aspects of what
      linguists say we need for morphology and
      parts of syntax.


FSAs as Graphs
    • Let’s start with the sheep language from
      Chapter 2
            /baa+!/




    • A directed graph: a finite set of vertices
      (nodes) together with a set of directed
      links between pairs of vertices, called arcs.
Sheep FSA

  • We can say the following things about this
    machine
        It has 5 states
        b, a, and ! are in its alphabet
        q0 is the start state
        q4 is an accept state
        It has 5 transitions




More Formally
    • You can specify an FSA by enumerating
      the following things.
            The set of states: Q
            A finite alphabet: Σ
            A start state
            A set of accept/final states
             A transition function that maps Q × Σ to Q
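As a concrete illustration, the five components for the sheep machine /baa+!/ can be written down as plain data (a hypothetical encoding, not the book's notation):

```python
# Hypothetical encoding of the sheep FSA /baa+!/: the five components
# enumerated above, written down as plain data.
SHEEP_FSA = {
    "states": {0, 1, 2, 3, 4},            # Q
    "alphabet": {"b", "a", "!"},          # a finite alphabet Σ
    "start": 0,
    "accept": {4},
    # transition function δ: Q × Σ → Q
    "delta": {(0, "b"): 1, (1, "a"): 2, (2, "a"): 3,
              (3, "a"): 3, (3, "!"): 4},
}
```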




About Alphabets
    • Don’t take term alphabet word too
      narrowly; it just means we need a finite
      set of symbols in the input.

    • These symbols can and will stand for
      bigger objects that can have internal
      structure.



Dollars and Cents




Yet Another View

  • The guts of FSAs can ultimately be represented as tables

            state    b    a     !    ε
              0      1
              1           2
              2           2,3
              3                 4
              4

            If you’re in state 1
            and you’re looking at
            an a, go to state 2




Recognition

  • Recognition is the process of determining if
    a string should be accepted by a machine
  • Or… it’s the process of determining if a
    string is in the language we’re defining with
    the machine
  • Or… it’s the process of determining if a
    regular expression matches a string
  • Those all amount to the same thing in the end

Recognition
    • Traditionally (following Turing’s notion), this
      process is depicted with a tape.




Recognition
    • Simply a process of starting in the start
      state
    • Examining the current input
    • Consulting the table
    • Going to a new state and updating the
      tape pointer.
    • Until you run out of tape.
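The loop above can be sketched as a table-driven recognizer over the deterministic sheep FSA (names are illustrative, not the textbook's pseudocode):

```python
# A minimal table-driven recognizer, using the deterministic sheep FSA
# for /baa+!/ as its table.
DELTA = {(0, "b"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 3, (3, "!"): 4}
ACCEPT = {4}

def d_recognize(tape, delta=DELTA, start=0, accept=ACCEPT):
    state = start
    for symbol in tape:                   # examine input, advance pointer
        if (state, symbol) not in delta:  # no table entry: reject
            return False
        state = delta[(state, symbol)]    # consult the table, move on
    return state in accept                # out of tape: accept iff final

print(d_recognize("baaa!"))   # True
print(d_recognize("baa"))     # False (out of tape in a non-final state)
```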



D-Recognize




                  [Figure: the D-RECOGNIZE algorithm (image not rendered)]




Key Points
    • Deterministic means that at each point in
      processing there is always one unique thing
      to do (no choices).
    • D-recognize is a simple table-driven
      interpreter.
    • The algorithm is universal for all
      unambiguous regular languages.
            To change the machine, you simply change the
             table.

Recognition as Search

    • You can view this algorithm as a trivial kind
      of state-space search.
    • States are pairings of tape positions and
      state numbers.
    • Operators are compiled into the table
    • Goal state is a pairing with the end of tape
      position and a final accept state



Generative Formalisms
  • Formal Languages are sets of strings
    composed of symbols from a finite set of
    symbols.
  • Finite-state automata define formal
    languages (without having to enumerate all
    the strings in the language)
  • The term Generative is based on the view
    that you can run the machine as a generator
    to get strings from the language.


Generative Formalisms
    • FSAs can be viewed from two
      perspectives:
            Acceptors that can tell you if a string is in the
             language
            Generators to produce all and only the strings
             in the language




Non-Determinism



   DFSA




   NDFSA




Non-Determinism cont.
    • Yet another technique
            Epsilon transitions
            Key point: these transitions do not examine or
             advance the tape during recognition




Equivalence
     • Non-deterministic machines can be
       converted to deterministic ones with a
       fairly simple construction
     • That means that they have the same
       power; non-deterministic machines are
       not more powerful than deterministic
       ones in terms of the languages they can
       accept



ND Recognition
    •      Two basic approaches (used in all major
           implementations of regular expressions,
           see Friedl 2006)
           1. Either take a ND machine and convert it to a
              D machine and then do recognition with
              that.
           2. Or explicitly manage the process of
              recognition as a state-space search (leaving
              the machine as is).
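Approach 2 can be sketched as a state-space search that leaves the machine non-deterministic. Transitions now map to sets of next states; the machine below is an ND version of /baa+!/ (an illustrative encoding):

```python
# Recognition as state-space search over a non-deterministic FSA.
ND_DELTA = {(0, "b"): {1}, (1, "a"): {2}, (2, "a"): {2, 3}, (3, "!"): {4}}
ACCEPT = {4}

def nd_recognize(tape, delta=ND_DELTA, start=0, accept=ACCEPT):
    agenda = [(start, 0)]                 # unexplored (state, tape position)
    while agenda:
        state, pos = agenda.pop()
        if pos == len(tape):
            if state in accept:
                return True               # found a path ending in an accept
            continue
        for nxt in delta.get((state, tape[pos]), ()):
            agenda.append((nxt, pos + 1))
    return False                          # every path led to failure

print(nd_recognize("baa!"))   # True
```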


Non-Deterministic
           Recognition: Search

    • In an ND FSA there exists at least one path
      through the machine for any string that is in the
      language defined by the machine.
    • But not all paths through the machine for a
      string in the language lead to an accept state.
    • No path through the machine leads to an accept
      state for a string not in the language.



Non-Deterministic
              Recognition
    • So success in non-deterministic
      recognition occurs when a path is found
      through the machine that ends in an
      accept.
    • Failure occurs when all of the possible
      paths for a given string lead to failure.




Example




            Input:    b    a    a    a    !
            States:  q0 → q1 → q2 → q2 → q3 → q4

Key Points
    • States in the search space are pairings of
      tape positions and states in the machine.
    • By keeping track of as yet unexplored
      states, a recognizer can systematically
      explore all the paths through the machine
      given an input.




Why Bother?
    • Non-determinism doesn’t get us more
      formal power, and it causes headaches,
      so why bother?
            More natural (understandable) solutions




Today
    • Finite-state methods
    • English Morphology
    • Finite-State Transducers




Words
    • Finite-state methods are particularly useful in dealing
      with a lexicon
    • Many devices, most with limited memory, need access to
      large lists of words
    • And they need to perform fairly sophisticated tasks with
      those lists
    • So we’ll first talk about some facts about words and then
      come back to computational methods




English Morphology
    • Morphology is the study of the ways that
      words are built up from smaller
      meaningful units called morphemes
    • We can usefully divide morphemes into
      two classes
            Stems: The core meaning-bearing units
            Affixes: Bits and pieces that adhere to stems
             to change their meanings and grammatical
             functions


English Morphology
    • We can further divide morphology up into
      two broad classes
            Inflectional
            Derivational




Inflectional Morphology
    • Inflectional morphology concerns the
      combination of stems and affixes where the
      resulting word:
            Has the same word class as the original
            Serves a grammatical/semantic purpose that is
              Different from the original
              But is nevertheless transparently related to the
               original




Nouns and Verbs in English
    • Nouns are simple
            Markers for plural and possessive
    • Verbs are only slightly more complex
            Markers appropriate to the tense of the verb




Regulars and Irregulars
    • It is a little complicated by the fact that
      some words misbehave (refuse to follow
      the rules)
            Mouse/mice, goose/geese, ox/oxen
            Go/went, fly/flew
    • The terms regular and irregular are used
      to refer to words that follow the rules and
      those that don’t


Regular and Irregular Verbs
    • Regulars…
            Walk, walks, walking, walked, walked
    • Irregulars
            Eat, eats, eating, ate, eaten
            Catch, catches, catching, caught, caught
            Cut, cuts, cutting, cut, cut




Inflectional Morphology
    • So inflectional morphology in English is
      fairly straightforward
    • But it is complicated by the fact that there
      are irregularities




Derivational Morphology
    • Derivational morphology is the messy stuff
      that no one ever taught you.
            Quasi-systematicity
            Irregular meaning change
            Changes of word class




Derivational Examples
    • Verbs and Adjectives to Nouns


    -ation              computerize                               computerization

    -ee                 appoint                                   appointee
    -er                 kill                                      killer
    -ness               fuzzy                                     fuzziness




Derivational Examples
    • Nouns and Verbs to Adjectives


           -al                  computation                                 computational

           -able                embrace                                     embraceable

           -less                clue                                        clueless




Example: Compute
    • Many paths are possible…
    • Start with compute
            Computer -> computerize -> computerization
            Computer -> computerize -> computerizable
    • But not all paths/operations are equally good
      (allowable?)
            Clue
               Clue -> *clueable




Morphology and FSAs
    • We’d like to use the machinery provided
      by FSAs to capture these facts about
      morphology
            Accept strings that are in the language
            Reject strings that are not
            And do so in a way that doesn’t require us to
             in effect list all the words in the language




Start Simple
    • Regular singular nouns are ok
    • Regular plural nouns have an -s on the
      end
    • Irregulars are ok as is
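The three rules above can be sketched as word lists plus one suffix check, with made-up mini-lexicons standing in for the real word lists:

```python
# Toy version of the "start simple" noun rules; the lexicons are
# illustrative, not the book's actual word lists.
REG_NOUNS = {"cat", "dog", "fox"}
IRREG_SG = {"goose", "mouse"}
IRREG_PL = {"geese", "mice"}

def accept_noun(word):
    if word in REG_NOUNS | IRREG_SG | IRREG_PL:
        return True                              # singulars, irregulars as-is
    # regular plural: a regular stem plus -s
    return word.endswith("s") and word[:-1] in REG_NOUNS

print(accept_noun("cats"))    # True
print(accept_noun("gooses"))  # False
```

Note that the bare -s rule already misses "foxes" (fox+es), which is exactly the kind of spelling change the lecture comes back to later.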




Simple Rules




Now Plug in the Words




Derivational Rules




           If everything is an accept state
           how do things ever get rejected?




Parsing/Generation
                    vs. Recognition

  • We can now run strings through these machines
    to recognize strings in the language
  • But recognition is usually not quite what we need
            Often if we find some string in the language we might
             like to assign a structure to it (parsing)
            Or we might have some structure and we want to
             produce a surface form for it (production/generation)
  • Example
            From “cats” to “cat +N +PL”



Today
    • Finite-state methods
    • English Morphology
    • Finite-State Transducers




Finite State Transducers
    • The simple story
            Add another tape
            Add extra symbols to the transitions

            On one tape we read “cats”, on the other we
             write “cat +N +PL”




FSTs




Applications
    • The kind of parsing we’re talking about is
      normally called morphological analysis
    • It can either be
           • An important stand-alone component of many
             applications (spelling correction, information
             retrieval)
           • Or simply a link in a chain of further linguistic
             analysis



Transitions

                   c:c    a:a    t:t    +N:ε    +PL:s




    •      c:c means read a c on one tape and write a c on the other
    •      +N:ε means read a +N symbol on one tape and write nothing on
           the other
    •      +PL:s means read +PL and write an s
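A minimal sketch of these pairs in action: the machine's single path is encoded as (lexical, surface) symbol pairs, with "" standing in for ε (a hypothetical encoding, not the book's):

```python
# The toy transducer's single path as (lexical, surface) symbol pairs;
# "" stands in for ε.
PATH = [("c", "c"), ("a", "a"), ("t", "t"), ("+N", ""), ("+PL", "s")]

def generate(path=PATH):
    # read the lexical tape, write the surface tape
    return "".join(surface for _lex, surface in path)

def parse(path=PATH):
    # read the surface tape, write the lexical tape
    return "".join(lex for lex, _surf in path)

print(generate())  # cats
print(parse())     # cat+N+PL
```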




Typical Uses
    • Typically, we’ll read from one tape using
      the first symbol on the machine transitions
      (just as in a simple FSA).
    • And we’ll write to the second tape using
      the other symbols on the transitions.




Ambiguity


  • Recall that in non-deterministic recognition
    multiple paths through a machine may
    lead to an accept state.
       • Didn’t matter which path was actually
         traversed
  • In FSTs the path to an accept state does
    matter since different paths represent
    different parses and different outputs will
    result

Ambiguity
    • What’s the right parse (segmentation) for
      “unionizable”?
           • Union-ize-able
           • Un-ion-ize-able
    • Each represents a valid path through the
      derivational morphology machine.




Ambiguity
    • There are a number of ways to deal with
      this problem
           • Simply take the first output found
           • Find all the possible outputs (all paths) and
             return them all (without choosing)
           • Bias the search so that only one or a few
             likely paths are explored




The Gory Details
    • Of course, it’s not as easy as
           • “cat +N +PL” <-> “cats”
    • As we saw earlier there are geese, mice and
      oxen
    • But there are also a whole host of
      spelling/pronunciation changes that go along
      with inflectional changes
           • Cats vs Dogs
           • Fox and Foxes




Multi-Tape Machines
    • To deal with these complications, we will
      add more tapes and use the output of one
      tape machine as the input to the next
    • So to handle irregular spelling changes
      we’ll add intermediate tapes with
      intermediate symbols




Multi-Level Tape Machines




    • We use one machine to transduce between the
      lexical and the intermediate level, and another
      to handle the spelling changes to the surface
      tape


Lexical to Intermediate
                     Level




Intermediate to Surface
    • The “add an e” rule, as in fox^s# <-> foxes#




Foxes




Note
    • A key feature of this machine is that it
      doesn’t do anything to inputs to which it
      doesn’t apply.
    • That is, such inputs are written out
      unchanged to the output tape.




Overall Scheme
    • We now have one FST that has explicit
      information about the lexicon (actual
      words, their spelling, facts about word
      classes and regularity).
           • Lexical level to intermediate forms
    • We have a larger set of machines that
      capture orthographic/spelling rules.
           • Intermediate forms to surface forms



Overall Scheme




Cascades
    • This is an architecture that we’ll see again
      and again
           • Overall processing is divided up into distinct
             rewrite steps
           • The output of one layer serves as the input to
             the next
           • The intermediate tapes may or may not wind
             up being useful in their own right




Overall Plan




Final Scheme




Composition
    • Create a set of new states that
      correspond to each pair of states from
      the original machines (New states are
      called (x,y), where x is a state from M1,
      and y is a state from M2)
    • Create a new FST transition table for the
      new machine according to the following
      intuition…


Composition
    • There should be a transition between two
      states in the new machine if it’s the case
      that the output for a transition from a
      state from M1, is the same as the input to
      a transition from M2 or…




Composition

    • δ3((xa,ya), i:o) = (xb,yb) iff
           There exists c such that
           δ1(xa, i:c) = xb AND
           δ2(ya, c:o) = yb
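The rule above can be sketched directly on transition tables, where each delta maps (state, (input symbol, output symbol)) to a next state. D1 and D2 are made-up one-transition machines, purely for illustration:

```python
# Sketch of FST composition on transition tables.
D1 = {(0, ("a", "b")): 1}   # M1 reads a, writes b
D2 = {(0, ("b", "c")): 1}   # M2 reads b, writes c

def compose(d1, d2):
    d3 = {}
    for (xa, (i, c1)), xb in d1.items():
        for (ya, (c2, o)), yb in d2.items():
            if c1 == c2:    # M1's output symbol matches M2's input symbol
                d3[((xa, ya), (i, o))] = (xb, yb)
    return d3

print(compose(D1, D2))  # {((0, 0), ('a', 'c')): (1, 1)}
```

The composed machine reads a and writes c in one step, pairing the states of the two originals exactly as the definition prescribes.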





4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
Millenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxMillenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptx
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
TEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docxTEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docx
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 

Lec03

  • 1. Speech and Language Processing Lecture 03 Chapter 2 and 3 of SLP
  • 2. Today • Finite-state methods • English Morphology • Finite-State Transducers 03/27/12 Speech and Language Processing - Jurafsky and Martin 2
  • 3. Regular Expressions and Text Searching • Everybody does it  Emacs, vi, grep, Microsoft Word, etc. • Regular expressions are a compact textual representation of a set of strings representing a language. • The regular expression is used for specifying text strings in all sorts of text processing and information extraction applications. Speech and Language Processing - Jurafsky and Martin 03/27/12 3
  • 4. Regular Expressions and Text Searching • A string is a sequence of symbols; • For most text-based search techniques, a string is any sequence of alphanumeric characters (letters, numbers, spaces, tabs, and punctuation). • A regular expression is an algebraic notation for characterizing a set of strings. 03/27/12 Speech and Language Processing - Jurafsky and Martin 4
  • 5. Regular Expressions 03/27/12 Speech and Language Processing - Jurafsky and Martin 5
  • 6. Example • Find all the instances of the word “the” in a text.  /the/  /[tT]he/  /\b[tT]he\b/ 03/27/12 Speech and Language Processing - Jurafsky and Martin 6
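To see how the three patterns above behave, they can be tried with Python's re module (the corpus sentence here is made up for illustration):

```python
import re

text = "The other day the thief hid there, then left."

# Naive pattern: also matches "the" embedded in other words (other, there, then),
# and misses the capitalized "The".
naive = re.findall(r"the", text)

# Word boundaries plus a character class catch "The" and "the",
# but nothing inside a longer word.
precise = re.findall(r"\b[tT]he\b", text)

print(naive)    # four hits, three of them spurious
print(precise)  # ['The', 'the']
```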
  • 7. Errors • The process we just went through was based on fixing two kinds of errors  Matching strings that we should not have matched (there, then, other)  False positives (Type I)  Not matching things that we should have matched (The)  False negatives (Type II) 03/27/12 Speech and Language Processing - Jurafsky and Martin 7
  • 8. Errors • Reducing the error rate for an application often involves two antagonistic efforts:  Increasing accuracy, or precision, (minimizing false positives)  Increasing coverage, or recall, (minimizing false negatives). 03/27/12 Speech and Language Processing - Jurafsky and Martin 8
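The precision/recall trade-off can be made concrete with a small worked example. The counts below are hypothetical, chosen only to illustrate the two formulas:

```python
# Hypothetical counts for one search pattern over a small corpus:
# 8 strings matched, 6 of them correctly; 4 good strings were missed.
true_positives = 6
false_positives = 2   # Type I errors (e.g. there, then)
false_negatives = 4   # Type II errors (e.g. The)

# Precision: of what we matched, how much was right?
precision = true_positives / (true_positives + false_positives)
# Recall: of what we should have matched, how much did we get?
recall = true_positives / (true_positives + false_negatives)

print(precision)  # 0.75
print(recall)     # 0.6
```

Tightening the pattern tends to raise precision at the cost of recall, and vice versa, which is exactly the antagonism the slide describes.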
  • 9. Finite State Automata • Regular expressions can be viewed as a textual way of specifying the structure of finite-state automata (FSA). • FSAs and their probabilistic relatives capture significant aspects of what linguists say we need for morphology and parts of syntax. 03/27/12 Speech and Language Processing - Jurafsky and Martin 9
  • 10. FSAs as Graphs • Let’s start with the sheep language from Chapter 2  /baa+!/ • A directed graph, a finite set of vertices (nodes), together with a set of directed links between pairs of vertices called arcs. 03/27/12 Speech and Language Processing - Jurafsky and Martin 10
  • 11. Sheep FSA • We can say the following things about this machine  It has 5 states  b, a, and ! are in its alphabet  q0 is the start state  q4 is an accept state  It has 5 transitions 03/27/12 Speech and Language Processing - Jurafsky and Martin 11
  • 12. More Formally • You can specify an FSA by enumerating the following things.  The set of states: Q  A finite alphabet: Σ  A start state  A set of accept/final states  A transition function that maps QxΣ to Q 03/27/12 Speech and Language Processing - Jurafsky and Martin 12
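As a concrete (if informal) rendering, the five components for the sheep machine /baa+!/ can be written out as plain Python data; the state numbering follows the earlier sheep-FSA slide:

```python
# The sheep language /baa+!/ as an FSA 5-tuple (a sketch).
Q = {0, 1, 2, 3, 4}            # the set of states
Sigma = {"b", "a", "!"}        # a finite alphabet
start = 0                      # the start state
accept = {4}                   # the set of accept/final states
delta = {                      # transition function mapping Q x Sigma to Q
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,               # the a+ loop
    (3, "!"): 4,
}
```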
  • 13. About Alphabets • Don’t take the term alphabet too narrowly; it just means we need a finite set of symbols in the input. • These symbols can and will stand for bigger objects that can have internal structure. 03/27/12 Speech and Language Processing - Jurafsky and Martin 13
  • 14. Dollars and Cents 03/27/12 Speech and Language Processing - Jurafsky and Martin 14
  • 15. Yet Another View • The guts of FSAs can ultimately be represented as tables. If you’re in state 1 and you’re looking at an a, go to state 2.
         b    a    !    e
    0    1
    1         2
    2         2,3
    3              4
    4
  03/27/12 Speech and Language Processing - Jurafsky and Martin 15
  • 16. Recognition • Recognition is the process of determining if a string should be accepted by a machine • Or… it’s the process of determining if a string is in the language we’re defining with the machine • Or… it’s the process of determining if a regular expression matches a string • Those all amount to the same thing in the end 03/27/12 Speech and Language Processing - Jurafsky and Martin 16
  • 17. Recognition • Traditionally, (Turing’s notion) this process is depicted with a tape. 03/27/12 Speech and Language Processing - Jurafsky and Martin 17
  • 18. Recognition • Simply a process of starting in the start state • Examining the current input • Consulting the table • Going to a new state and updating the tape pointer. • Until you run out of tape. 03/27/12 Speech and Language Processing - Jurafsky and Martin 18
  • 19. D-Recognize 03/27/12 Speech and Language Processing - Jurafsky and Martin 19
  • 20. Key Points • Deterministic means that at each point in processing there is always one unique thing to do (no choices). • D-recognize is a simple table-driven interpreter. • The algorithm is universal for all unambiguous regular languages.  To change the machine, you simply change the table. 03/27/12 Speech and Language Processing - Jurafsky and Martin 20
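The table-driven interpreter described above can be sketched in a few lines of Python; the sheep-language table is transcribed from the earlier slides, and changing the machine really does just mean changing the table:

```python
# A minimal table-driven D-recognizer: start in the start state, examine the
# current input, consult the table, move to the new state, advance the tape.
def d_recognize(tape, delta, start, accept):
    state = start
    for symbol in tape:
        if (state, symbol) not in delta:
            return False               # no entry in the table: reject
        state = delta[(state, symbol)]
    return state in accept             # out of tape: accept iff in a final state

# Transition table for the sheep language /baa+!/.
SHEEP = {(0, "b"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 3, (3, "!"): 4}

print(d_recognize("baa!", SHEEP, 0, {4}))    # True
print(d_recognize("baaaa!", SHEEP, 0, {4}))  # True
print(d_recognize("ba!", SHEEP, 0, {4}))     # False
```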
  • 21. Recognition as Search • You can view this algorithm as a trivial kind of state-space search. • States are pairings of tape positions and state numbers. • Operators are compiled into the table • Goal state is a pairing with the end of tape position and a final accept state 03/27/12 Speech and Language Processing - Jurafsky and Martin 21
  • 22. Generative Formalisms • Formal Languages are sets of strings composed of symbols from a finite set of symbols. • Finite-state automata define formal languages (without having to enumerate all the strings in the language) • The term Generative is based on the view that you can run the machine as a generator to get strings from the language. 03/27/12 Speech and Language Processing - Jurafsky and Martin 22
  • 23. Generative Formalisms • FSAs can be viewed from two perspectives:  Acceptors that can tell you if a string is in the language  Generators to produce all and only the strings in the language 03/27/12 Speech and Language Processing - Jurafsky and Martin 23
  • 24. Non-Determinism DFSA NDFSA 03/27/12 Speech and Language Processing - Jurafsky and Martin 24
  • 25. Non-Determinism cont. • Yet another technique  Epsilon transitions  Key point: these transitions do not examine or advance the tape during recognition 03/27/12 Speech and Language Processing - Jurafsky and Martin 25
  • 26. Equivalence • Non-deterministic machines can be converted to deterministic ones with a fairly simple construction • That means that they have the same power; non-deterministic machines are not more powerful than deterministic ones in terms of the languages they can accept 03/27/12 Speech and Language Processing - Jurafsky and Martin 26
  • 27. ND Recognition • Two basic approaches (used in all major implementations of regular expressions, see Friedl 2006) 1. Either take a ND machine and convert it to a D machine and then do recognition with that. 2. Or explicitly manage the process of recognition as a state-space search (leaving the machine as is). 03/27/12 Speech and Language Processing - Jurafsky and Martin 27
  • 28. Non-Deterministic Recognition: Search • In a ND FSA there exists at least one path through the machine for a string that is in the language defined by the machine. • But not all paths through the machine for a string in the language lead to an accept state. • No paths through the machine lead to an accept state for a string not in the language. 03/27/12 Speech and Language Processing - Jurafsky and Martin 28
  • 29. Non-Deterministic Recognition • So success in non-deterministic recognition occurs when a path is found through the machine that ends in an accept. • Failure occurs when all of the possible paths for a given string lead to failure. 03/27/12 Speech and Language Processing - Jurafsky and Martin 29
  • 30. Example • Input tape: b a a a !  States visited: q0 q1 q2 q2 q3 q4 03/27/12 Speech and Language Processing - Jurafsky and Martin 30
  • 31. Example 03/27/12 Speech and Language Processing - Jurafsky and Martin 31
  • 32. Example 03/27/12 Speech and Language Processing - Jurafsky and Martin 32
  • 33. Example 03/27/12 Speech and Language Processing - Jurafsky and Martin 33
  • 34. Example 03/27/12 Speech and Language Processing - Jurafsky and Martin 34
  • 35. Example 03/27/12 Speech and Language Processing - Jurafsky and Martin 35
  • 36. Example 03/27/12 Speech and Language Processing - Jurafsky and Martin 36
  • 37. Example 03/27/12 Speech and Language Processing - Jurafsky and Martin 37
  • 38. Example 03/27/12 Speech and Language Processing - Jurafsky and Martin 38
  • 39. Key Points • States in the search space are pairings of tape positions and states in the machine. • By keeping track of as yet unexplored states, a recognizer can systematically explore all the paths through the machine given an input. 03/27/12 Speech and Language Processing - Jurafsky and Martin 39
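The search just described, with an agenda of as-yet-unexplored (machine state, tape position) pairs, can be sketched as follows. The ND transition table is an assumption for illustration: a variant of the sheep machine where the a-loop stays on state 2, as in the example slides.

```python
# ND recognition as state-space search over (state, tape position) pairs.
def nd_recognize(tape, delta, start, accept):
    agenda = [(start, 0)]
    while agenda:
        state, pos = agenda.pop()      # stack gives depth-first; a queue gives BFS
        if pos == len(tape) and state in accept:
            return True                # found a path ending in an accept state
        if pos < len(tape):
            for nxt in delta.get((state, tape[pos]), set()):
                agenda.append((nxt, pos + 1))
        for nxt in delta.get((state, ""), set()):   # epsilon transitions:
            agenda.append((nxt, pos))               # no tape advance
    return False                       # all paths led to failure

# ND sheep machine: state 2 loops on a, and can also move on to state 3.
ND_SHEEP = {(0, "b"): {1}, (1, "a"): {2}, (2, "a"): {2, 3}, (3, "!"): {4}}

print(nd_recognize("baaa!", ND_SHEEP, 0, {4}))  # True
print(nd_recognize("baa", ND_SHEEP, 0, {4}))    # False
```

This sketch assumes no epsilon cycles; a production recognizer would also track visited search states.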
  • 40. Why Bother? • Non-determinism doesn’t get us more formal power, and it causes headaches, so why bother?  More natural (understandable) solutions 03/27/12 Speech and Language Processing - Jurafsky and Martin 40
  • 41. Today • Finite-state methods • English Morphology • Finite-State Transducers 03/27/12 Speech and Language Processing - Jurafsky and Martin 41
  • 42. Words • Finite-state methods are particularly useful in dealing with a lexicon • Many devices, most with limited memory, need access to large lists of words • And they need to perform fairly sophisticated tasks with those lists • So we’ll first talk about some facts about words and then come back to computational methods 03/27/12 Speech and Language Processing - Jurafsky and Martin 42
  • 43. English Morphology • Morphology is the study of the ways that words are built up from smaller meaningful units called morphemes • We can usefully divide morphemes into two classes  Stems: The core meaning-bearing units  Affixes: Bits and pieces that adhere to stems to change their meanings and grammatical functions 03/27/12 Speech and Language Processing - Jurafsky and Martin 43
  • 44. English Morphology • We can further divide morphology up into two broad classes  Inflectional  Derivational 03/27/12 Speech and Language Processing - Jurafsky and Martin 44
  • 45. Inflectional Morphology • Inflectional morphology concerns the combination of stems and affixes where the resulting word:  Has the same word class as the original  Serves a grammatical/semantic purpose that is  Different from the original  But is nevertheless transparently related to the original 03/27/12 Speech and Language Processing - Jurafsky and Martin 45
  • 46. Nouns and Verbs in English • Nouns are simple  Markers for plural and possessive • Verbs are only slightly more complex  Markers appropriate to the tense of the verb 03/27/12 Speech and Language Processing - Jurafsky and Martin 46
  • 47. Regulars and Irregulars • It is a little complicated by the fact that some words misbehave (refuse to follow the rules)  Mouse/mice, goose/geese, ox/oxen  Go/went, fly/flew • The terms regular and irregular are used to refer to words that follow the rules and those that don’t 03/27/12 Speech and Language Processing - Jurafsky and Martin 47
  • 48. Regular and Irregular Verbs • Regulars…  Walk, walks, walking, walked, walked • Irregulars  Eat, eats, eating, ate, eaten  Catch, catches, catching, caught, caught  Cut, cuts, cutting, cut, cut 03/27/12 Speech and Language Processing - Jurafsky and Martin 48
  • 49. Inflectional Morphology • So inflectional morphology in English is fairly straightforward • But it is complicated by the fact that there are irregularities 03/27/12 Speech and Language Processing - Jurafsky and Martin 49
  • 50. Derivational Morphology • Derivational morphology is the messy stuff that no one ever taught you.  Quasi-systematicity  Irregular meaning change  Changes of word class 03/27/12 Speech and Language Processing - Jurafsky and Martin 50
  • 51. Derivational Examples • Verbs and Adjectives to Nouns -ation computerize computerization -ee appoint appointee -er kill killer -ness fuzzy fuzziness 03/27/12 Speech and Language Processing - Jurafsky and Martin 51
  • 52. Derivational Examples • Nouns and Verbs to Adjectives -al computation computational -able embrace embraceable -less clue clueless 03/27/12 Speech and Language Processing - Jurafsky and Martin 52
  • 53. Example: Compute • Many paths are possible… • Start with compute  Computer -> computerize -> computerization  Computer -> computerize -> computerizable • But not all paths/operations are equally good (allowable?)  Clue  Clue -> *clueable 03/27/12 Speech and Language Processing - Jurafsky and Martin 53
  • 54. Morphology and FSAs • We’d like to use the machinery provided by FSAs to capture these facts about morphology  Accept strings that are in the language  Reject strings that are not  And do so in a way that doesn’t require us to in effect list all the words in the language 03/27/12 Speech and Language Processing - Jurafsky and Martin 54
  • 55. Start Simple • Regular singular nouns are ok • Regular plural nouns have an -s on the end • Irregulars are ok as is 03/27/12 Speech and Language Processing - Jurafsky and Martin 55
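These three rules can be sketched directly; the word lists below are tiny made-up samples standing in for a full lexicon:

```python
# Simple noun rules: regular singulars are ok, regular plurals add -s,
# irregulars (both forms listed) are ok as is.
reg_nouns = {"cat", "fox", "dog"}
irreg_nouns = {"goose", "geese", "mouse", "mice", "ox", "oxen"}

def accept_noun(word):
    if word in reg_nouns or word in irreg_nouns:
        return True
    # bare -s plural rule for regular nouns
    return word.endswith("s") and word[:-1] in reg_nouns

print(accept_noun("cats"))    # True
print(accept_noun("geese"))   # True
print(accept_noun("gooses"))  # False
```

Note that this naive -s rule would also accept "foxs"; handling fox/foxes is exactly the spelling complication the later slides address with cascaded transducers.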
  • 56. Simple Rules 03/27/12 Speech and Language Processing - Jurafsky and Martin 56
  • 57. Now Plug in the Words 03/27/12 Speech and Language Processing - Jurafsky and Martin 57
  • 58. Derivational Rules If everything is an accept state how do things ever get rejected? 03/27/12 Speech and Language Processing - Jurafsky and Martin 58
  • 59. Parsing/Generation vs. Recognition • We can now run strings through these machines to recognize strings in the language • But recognition is usually not quite what we need  Often if we find some string in the language we might like to assign a structure to it (parsing)  Or we might have some structure and we want to produce a surface form for it (production/generation) • Example  From “cats” to “cat +N +PL” 03/27/12 Speech and Language Processing - Jurafsky and Martin 59
  • 60. Today • Finite-state methods • English Morphology • Finite-State Transducers 03/27/12 Speech and Language Processing - Jurafsky and Martin 60
  • 61. Finite State Transducers • The simple story  Add another tape  Add extra symbols to the transitions  On one tape we read “cats”, on the other we write “cat +N +PL” 03/27/12 Speech and Language Processing - Jurafsky and Martin 61
  • 62. FSTs 03/27/12 Speech and Language Processing - Jurafsky and Martin 62
  • 63. Applications • The kind of parsing we’re talking about is normally called morphological analysis • It can either be • An important stand-alone component of many applications (spelling correction, information retrieval) • Or simply a link in a chain of further linguistic analysis 03/27/12 Speech and Language Processing - Jurafsky and Martin 63
  • 64. Transitions c:c a:a t:t +N: ε +PL:s • c:c means read a c on one tape and write a c on the other • +N:ε means read a +N symbol on one tape and write nothing on the other • +PL:s means read +PL and write an s 03/27/12 Speech and Language Processing - Jurafsky and Martin 64
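These transitions can be sketched as input:output symbol pairs. This is not a general FST, just the single linear path for this one example; the symbols and the epsilon convention (written here as the empty string) follow the slide.

```python
# The cat +N +PL <-> cats path as a list of input:output pairs.
CAT_FST = [("c", "c"), ("a", "a"), ("t", "t"), ("+N", ""), ("+PL", "s")]

def transduce(lexical, fst):
    """Read lexical symbols on one tape, write surface symbols on the other."""
    surface = []
    for symbol, (inp, out) in zip(lexical, fst):
        if symbol != inp:
            return None        # no path through the machine: reject
        surface.append(out)    # epsilon ("") writes nothing
    return "".join(surface)

print(transduce(["c", "a", "t", "+N", "+PL"], CAT_FST))  # cats
```

Run in the other direction (reading the surface symbols, writing the lexical ones), the same pairs would parse "cats" into "cat +N +PL".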
  • 65. Typical Uses • Typically, we’ll read from one tape using the first symbol on the machine transitions (just as in a simple FSA). • And we’ll write to the second tape using the other symbols on the transitions. 03/27/12 Speech and Language Processing - Jurafsky and Martin 65
  • 66. Ambiguity • Recall that in non-deterministic recognition multiple paths through a machine may lead to an accept state. • Didn’t matter which path was actually traversed • In FSTs the path to an accept state does matter since different paths represent different parses and different outputs will result 03/27/12 Speech and Language Processing - Jurafsky and Martin 66
  • 67. Ambiguity • What’s the right parse (segmentation) for • Unionizable • Union-ize-able • Un-ion-ize-able • Each represents a valid path through the derivational morphology machine. 03/27/12 Speech and Language Processing - Jurafsky and Martin 67
  • 68. Ambiguity • There are a number of ways to deal with this problem • Simply take the first output found • Find all the possible outputs (all paths) and return them all (without choosing) • Bias the search so that only one or a few likely paths are explored 03/27/12 Speech and Language Processing - Jurafsky and Martin 68
  • 69. The Gory Details • Of course, its not as easy as • “cat +N +PL” <-> “cats” • As we saw earlier there are geese, mice and oxen • But there are also a whole host of spelling/pronunciation changes that go along with inflectional changes • Cats vs Dogs • Fox and Foxes 03/27/12 Speech and Language Processing - Jurafsky and Martin 69
  • 70. Multi-Tape Machines • To deal with these complications, we will add more tapes and use the output of one tape machine as the input to the next • So to handle irregular spelling changes we’ll add intermediate tapes with intermediate symbols 03/27/12 Speech and Language Processing - Jurafsky and Martin 70
  • 71. Multi-Level Tape Machines • We use one machine to transduce between the lexical and the intermediate level, and another to handle the spelling changes to the surface tape 03/27/12 Speech and Language Processing - Jurafsky and Martin 71
  • 72. Lexical to Intermediate Level 03/27/12 Speech and Language Processing - Jurafsky and Martin 72
  • 73. Intermediate to Surface • The “add an e” rule, as in fox^s# <-> foxes# 03/27/12 Speech and Language Processing - Jurafsky and Martin 73
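One way to sketch the e-insertion rule is as a regular-expression substitution rather than a hand-built transducer; this assumes, as on the slide, that ^ marks a morpheme boundary and # marks end-of-word at the intermediate level:

```python
import re

def e_insertion(intermediate):
    """Intermediate-to-surface sketch: insert e between a stem ending in
    x, s, or z and the ^s plural morpheme, then strip boundary symbols."""
    surface = re.sub(r"([xsz])\^s#", r"\1es#", intermediate)
    return surface.replace("^", "").replace("#", "")

print(e_insertion("fox^s#"))  # foxes
print(e_insertion("cat^s#"))  # cats  (rule does not apply; boundaries stripped)
```

As the later slides note, inputs the rule doesn't apply to pass through unchanged apart from the boundary cleanup.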
  • 74. Foxes 03/27/12 Speech and Language Processing - Jurafsky and Martin 74
  • 75. Note • A key feature of this machine is that it doesn’t do anything to inputs to which it doesn’t apply. • Meaning that they are written out unchanged to the output tape. 03/27/12 Speech and Language Processing - Jurafsky and Martin 75
  • 76. Overall Scheme • We now have one FST that has explicit information about the lexicon (actual words, their spelling, facts about word classes and regularity). • Lexical level to intermediate forms • We have a larger set of machines that capture orthographic/spelling rules. • Intermediate forms to surface forms 03/27/12 Speech and Language Processing - Jurafsky and Martin 76
  • 77. Overall Scheme 03/27/12 Speech and Language Processing - Jurafsky and Martin 77
  • 78. Cascades • This is an architecture that we’ll see again and again • Overall processing is divided up into distinct rewrite steps • The output of one layer serves as the input to the next • The intermediate tapes may or may not wind up being useful in their own right 03/27/12 Speech and Language Processing - Jurafsky and Martin 78
  • 79. Overall Plan 03/27/12 Speech and Language Processing - Jurafsky and Martin 79
  • 80. Final Scheme 03/27/12 Speech and Language Processing - Jurafsky and Martin 80
  • 81. Composition • Create a set of new states that correspond to each pair of states from the original machines (New states are called (x,y), where x is a state from M1, and y is a state from M2) • Create a new FST transition table for the new machine according to the following intuition… 03/27/12 Speech and Language Processing - Jurafsky and Martin 81
  • 82. Composition • There should be a transition between two states in the new machine if it’s the case that the output of a transition from a state in M1 is the same as the input to a transition from a state in M2, or… 03/27/12 Speech and Language Processing - Jurafsky and Martin 82
  • 83. Composition • δ3((xa,ya), i:o) = (xb,yb) iff There exists c such that δ1(xa, i:c) = xb AND δ2(ya, c:o) = yb 03/27/12 Speech and Language Processing - Jurafsky and Martin 83
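The definition can be exercised directly on dictionary-encoded transition tables. The two one-transition machines below are made up for illustration: M1 rewrites +PL to an intermediate ^s, and M2 rewrites ^s to a surface s.

```python
from itertools import product

# delta3((xa, ya), i:o) = (xb, yb) iff there exists c such that
# delta1(xa, i:c) = xb and delta2(ya, c:o) = yb.
def compose(delta1, delta2):
    delta3 = {}
    for ((xa, i, c1), xb), ((ya, c2, o), yb) in product(
            delta1.items(), delta2.items()):
        if c1 == c2:   # output of M1 feeds the input of M2
            delta3[((xa, ya), i, o)] = (xb, yb)
    return delta3

M1 = {(0, "+PL", "^s"): 1}   # lexical -> intermediate
M2 = {(0, "^s", "s"): 1}     # intermediate -> surface

print(compose(M1, M2))  # {((0, 0), '+PL', 's'): (1, 1)}
```

The composed machine maps +PL straight to s, with the intermediate symbol c eliminated, which is how the cascade of slides 76-80 collapses into a single transducer.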