Parsing Needs New Abstractions
Problem
• Parsing of context-free languages
     – an active research topic from the 1960s to the 1980s
     – a rich variety of parsing techniques is known
             • general CFL parsing: Earley’s algorithm, Cocke-Younger-Kasami (CYK)
             • deterministic parsing: SLL(k), LL(k), SLR(k), LR(k), LALR(k), LA(l)LR(k), …
• Problem: most of these techniques were invented by automata-theory people
     – the terminology is fairly obscure: leftmost derivations, rightmost derivations, handles, viable prefixes, …
     – string rewriting is very clean but not intuitive for most PL people
     – descriptions in compiler textbooks are obscure or erroneous
     – connections between the different parsing techniques are lost
• Question: is there an easier way of thinking about parsing than in terms of strings and string rewriting?
New abstraction
• For any context-free grammar, construct a Grammar Flow
  Graph (GFG)
    – syntax: representation of the grammar as a control-flow graph
    – semantics: executable representation
             • a special kind of non-deterministic pushdown automaton
• Parsing problems become path problems in the GFG
• The alphabet soup of grammar classes like LL(k), SLL(k), LR(k),
  LALR(k), SLR(k), etc. can be viewed as choices along three
  dimensions
    – non-determinism: how many paths can we explore at a time?
             • all (Earley), only one (LL), some (LR)
    – look-ahead: how much do we know about the future?
             • solve fixpoint equations over sets
    – context: how much do we remember about the past?
             • procedure cloning
GFG example
S → Aa | bAc | Bc | bBa
A → d
B → d

[Figure: GFG for this grammar, with nodes START-S, S → .Aa, S → .bAc, S → .Bc, S → .bBa, …, END-S;
START-A, A → .d, A → d., END-A; START-B, B → .d, B → d., END-B; ε-edges at entries, exits, calls and
returns, and edges labeled a, b, c, d at terminal transitions.]
GFG construction
 For each non-terminal A, create nodes labeled START-A and END-A.

 For each production in grammar, create a “procedure” and connect to
 START and END nodes of LHS non-terminal as shown below.

A → ε:    START-A —ε→ [A → .] —ε→ END-A

A → bXY:  START-A —ε→ [A → .bXY] —b→ [A → b.XY] → [A → bX.Y] → [A → bXY.] —ε→ END-A
          (the edge into [A → bX.Y] is realized as an ε call edge to START-X and an ε return edge
           from END-X, and similarly the edge into [A → bXY.] goes through START-Y … END-Y)

  Edges labeled ε: only at entry/exit of START-A and END-A nodes.
  Fan-out: only at exit of START-A nodes and END-A nodes.
  Terminal transition node: node whose outgoing edge is labeled with a terminal.
  (A small code sketch of this construction follows below.)
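To make the construction concrete, here is a minimal Python sketch of building a GFG from a grammar. It is my own rendering, not code from the talk; the node-naming scheme and the build_gfg helper are illustrative assumptions.

# A minimal sketch (not from the slides) of GFG construction.
# Grammar: dict mapping each non-terminal to a list of right-hand sides,
# where each right-hand side is a list of symbols (strings).
def build_gfg(grammar):
    """Return (nodes, edges) where edges is a list of (src, label, dst).
    Labels are terminals, or 'eps' for entry/exit/call/return edges."""
    nodes, edges = set(), []
    nonterminals = set(grammar)

    def add(src, label, dst):
        nodes.update([src, dst])
        edges.append((src, label, dst))

    for A, rhss in grammar.items():
        start, end = f"START-{A}", f"END-{A}"
        nodes.update([start, end])
        for rhs in rhss:
            # One "procedure" per production: items A -> X1 ... . ... Xn
            def item(i):
                return f"[{A} -> {' '.join(rhs[:i])} . {' '.join(rhs[i:])}]"
            add(start, "eps", item(0))                    # entry edge
            for i, X in enumerate(rhs):
                if X in nonterminals:
                    add(item(i), "eps", f"START-{X}")     # call edge
                    add(f"END-{X}", "eps", item(i + 1))   # return edge
                else:
                    add(item(i), X, item(i + 1))          # terminal transition
            add(item(len(rhs)), "eps", end)               # exit edge
    return nodes, edges

# Example: the running grammar S -> Aa | bAc | Bc | bBa, A -> d, B -> d
gfg = build_gfg({"S": [["A", "a"], ["b", "A", "c"], ["B", "c"], ["b", "B", "a"]],
                 "A": [["d"]],
                 "B": [["d"]]})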
Terminology

A → ε:    START-A —ε→ [A → .] —ε→ END-A

A → bXY:  START-A —ε→ [A → .bXY] —b→ [A → b.XY] → [A → bX.Y] → [A → bXY.] —ε→ END-A
          (with ε-edges to/from START-X … END-X and START-Y … END-Y)

Node terminology along such a path, left to right: start node, entry node, call node, return node, exit node, end node.
Non-deterministic GFG automaton
• Interpretation of GFG: NGA
   – similar to an NFA
• Rules:
   – begin at START-S
   – at START nodes, make a non-deterministic choice
   – at END nodes, must follow a CFL path
        • “return to the same procedure from which you made the call”
• CFL path from START-S to END-S ⇔ leftmost derivation
• Label(path):
   – sequence of terminal symbols labeling edges in the path
   – the label of a CFL path from START-S to END-S is a word in the
     language generated by the CFG

[Figure: the GFG of the running grammar S → Aa | bAc | Bc | bBa, A → d, B → d, with one CFL path highlighted.]
Parsing problem
• Paths(l):
   – set of paths with label l
   – the inverse relation of Label
• Parsing problem: given a grammar G and a string w,
   – find all paths in GFG(G) that generate w, or
   – demonstrate that there is no such path
• Parallel paths:
   –   P1 = a path from START-S through the A production
   –   P2 = a path from START-S through the B production
   –   Label(P1) = Label(P2)
   –   parallelism is an equivalence relation on paths originating at START-S
• Ambiguous grammar
   – two or more parallel paths START-S →+ END-S

[Figure: the GFG of S → Aa | bAc | Bc | bBa, A → d, B → d, showing two parallel paths for the same input prefix.]
Compressed paths




Addition to GFG
• We need to be able to talk about sentential forms, not just sentences
• Small modification to the GFG:
   – add transitions labeled with non-terminals at procedure calls
• Some paths will have edges labeled with non-terminals
   – non-terminals that have not been “expanded out”

[Figure: the GFG of S → Aa | bAc | Bc | bBa, A → d, B → d, extended with call-to-return edges labeled A and B.]
Compressed GFG paths
• A more compact representation of a GFG path
• Idea:
   – collapse the portion of the path between the start and end of a given
     procedure and replace it with the non-terminal
• Point: completed calls cannot affect the further evolution of the path,
  so we need not store the full path
• Edges going out of END nodes of procedures will never appear
  in the compressed representation

[Figure: a path START →+ START-P →+ END-P →+ …, with the START-P … END-P segment collapsed into a single edge labeled P.]
NFA for compressed paths
• Start from the extended GFG
• Remove edges out of END nodes, since these will never be in a compressed path
• Each path in this NFA corresponds to a compressed GFG path

[Figure: the extended GFG of S → Aa | bAc | Bc | bBa, A → d, B → d, with the edges out of END nodes removed.]
Following all paths:
             Earley’s algorithm




Recall: NFA simulation
• Input string is processed left to right, one symbol at a time
• A deterministic simulator keeps track of all states the NFA could be
  in as the input is processed
• Simulation
    – simulated state = subset of NFA states
    – if the current simulated state is C and the next input symbol is t,
      compute the next simulated state N as follows:
         • scanning: if state si ∈ C and the NFA has a transition si —t→ sj, add sj to N
         • prediction: if state sj ∈ N and the NFA has an ε-transition sj —ε→ sk, add sk to N
    – initial simulated state = set of initial states of the NFA, closed with the
      prediction rule
    – (a code sketch follows below)

              {s0,s1,s4} —a→ {s2} —a→ {s2,s3,s7} ….
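For reference, a small Python sketch of the subset simulation described above. The data layout (moves, eps) and the toy NFA at the end are my own assumptions, not taken from the talk.

# On-the-fly NFA simulation: track the set of states the NFA could be in.
from collections import defaultdict

def eps_closure(states, eps):
    """Close a set of states under epsilon edges (the 'prediction' rule)."""
    work, closed = list(states), set(states)
    while work:
        s = work.pop()
        for t in eps.get(s, ()):
            if t not in closed:
                closed.add(t)
                work.append(t)
    return closed

def simulate(start_states, moves, eps, word):
    current = eps_closure(set(start_states), eps)
    for a in word:
        scanned = set()
        for s in current:                       # scanning: follow edges labeled a
            scanned |= moves.get((s, a), set())
        if not scanned:
            return False
        current = eps_closure(scanned, eps)     # prediction
    return current

# Hypothetical tiny NFA: s0 -a-> s1 -eps-> s2
moves = defaultdict(set, {("s0", "a"): {"s1"}})
eps = {"s1": {"s2"}}
print(simulate({"s0"}, moves, eps, "a"))        # -> {'s1', 's2'}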
Analog in GFG
• First cut: use exactly the same idea
   – current state C, next state N, next input symbol t
   – scanning: if state si ∈ C and the automaton has a transition si —t→ sj, add sj to N
   – prediction: if state sj ∈ N and the automaton has an ε-transition sj —ε→ sk, add sk to N
• Problem: it is not clear how to make ε-transitions at return
  states like s18 and s12
• Solution: keep “return addresses”, as in Earley’s algorithm
  (a compact rendering of Earley’s algorithm is sketched below)

[Figure: the GFG of S → Aa | bAc | Bc | bBa, A → d, B → d, with its nodes numbered S0–S19.]

   {S0,S1,S4,S8,S13,S17,S11} —d→ {S12,S18, ?????}
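The slides develop this as a GFG path problem; as a point of comparison, here is a compact sketch of the classic Earley recognizer in the usual dotted-item notation. The code and the illustrative grammar are my own, not from the talk.

# Classic Earley recognizer: items are (lhs, rhs, dot, origin).
# (Grammars with epsilon-productions need the usual extra care; omitted here.)
def earley_recognize(grammar, start, tokens):
    n = len(tokens)
    sets = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        sets[0].add((start, rhs, 0, 0))              # initial predictions
    for i in range(n + 1):
        work = list(sets[i])
        while work:
            lhs, rhs, dot, origin = work.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:                   # prediction
                    for prod in grammar[sym]:
                        new = (sym, prod, 0, i)
                        if new not in sets[i]:
                            sets[i].add(new); work.append(new)
                elif i < n and tokens[i] == sym:     # scanning
                    sets[i + 1].add((lhs, rhs, dot + 1, origin))
            else:                                    # completion: return to the caller
                for (l2, r2, d2, o2) in list(sets[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in sets[i]:
                            sets[i].add(new); work.append(new)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for (lhs, rhs, dot, origin) in sets[n])

# Illustrative grammar E -> int | ( E + E ) | ( E - E ), tokens already lexed
grammar = {"E": [("int",), ("(", "E", "+", "E", ")"), ("(", "E", "-", "E", ")")]}
print(earley_recognize(grammar, "E", ["(", "int", "+", "int", ")"]))   # True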
[Figure: trace of Earley’s algorithm, viewed as GFG-path exploration, for the grammar
E → int | (E+E) | (E-E) on the input string (9+6). Each step lists configurations such as
<E → .(E+E), 0>, <E → (.E+E), 0>, <E → int., 1>, <END-E, 1>, …, ending with <E → (E+E)., 0> and <END-E, 0>.]
Earley parser and GFG states
• A given Σ set can contain multiple instances of the
  same GFG state.
• Example: S → aS | a
• Earley set Σi
     –   <S → a.S, i-1>
     –   <S → a., i-1>
     –   <S → .aS, i>
     –   <S → .a, i>
     –   <S → aS., i-2>
     –   <S → aS., i-3>
     –   ……
     –   <S → aS., 0>
Earley’s parser and
             ambiguous grammars
• If an Earley configuration can be added to a given Σ
  set by two or more configurations, the grammar is
  ambiguous

             Σt
             <X → α. , p1>
             <Y → β. , p2>
             <Z → γA.δ , p>

• Example: the substring between positions p and t can be
  derived from A in two different ways
Look-ahead computation




Look-ahead computation
• Look-ahead at point p in the GFG:
    – first k symbols you might encounter on a path starting at p
    – k is a small integer that is fixed for the entire grammar
• Subtle point:
    – the look-ahead may depend on the path from START that you took to get to p
    – (e.g.) the 2-look-ahead at the entry of N is different for the red and blue calls
• Two approaches:
    – context-independent look-ahead: the first k symbols on any path starting at p
    – context-dependent look-ahead: given a path C from START to p, the
      first k symbols on any path starting at p that extends C

      S → xNab | yNbc
      N → a | ε

[Figure: GFG for this grammar; the two calls to N have 2-look-aheads {aa,ab} and {ab,bc},
while the calls themselves are reached with look-aheads {xa} and {ya,yb}.]
FIRSTk sets
• FIRSTk(A): set of strings of length k or less
   – If A ⇒* s where s is a terminal string of length k or less, s ∈ FIRSTk(A)
   – If A ⇒* s where s is a terminal string longer than k symbols, then the
     k-prefix of s ∈ FIRSTk(A)
• Intuition:
   – non-terminal A represents a set, namely the set of terminal strings we can
     derive from it
   – FIRSTk(A) is the set of k-prefixes of these strings
• Easy to extend FIRSTk to sequences of grammar symbols

       S → xNab | yNbc
       N → a | ε
      FIRST2(N) = {a, ε}
      FIRST2(Nab) = {aa, ab}

[Figure: GFG for this grammar.]
Useful string functions
• Concatenation: s + t
     – (e.g.) xy + abc = xyabc
• k-prefix of string s: s_k
     – (e.g.) (xyz)_2 = xy, (x)_2 = x, (ε)_2 = ε
• Composition of concatenation and k-prefix: s +k t
     – defined as (s+t)_k
     – (e.g.) x +2 yz = xy
     – the operation is associative
• Easy result: (s+t)_k = (s_k + t_k)_k = s_k +k t_k
• Operations can be extended to sets in the obvious way
     – (e.g.) {a,bcd} +2 {ε,x,yz} = {a,ax,ay,bc}
     – (a code sketch follows below)
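A tiny Python sketch of these string operations; kprefix, concat_k and set_concat_k are illustrative names of my own, with ε represented as the empty string.

# k-prefix and the +k operation on strings and on sets of strings.
def kprefix(s, k):
    return s[:k]                      # (xyz)_2 = "xy", (eps)_2 = ""

def concat_k(s, t, k):
    return kprefix(s + t, k)          # s +k t  is defined as (s+t)_k

def set_concat_k(S, T, k):
    """Extend +k to sets: { s +k t | s in S, t in T }."""
    return {concat_k(s, t, k) for s in S for t in T}

print(set_concat_k({"a", "bcd"}, {"", "x", "yz"}, 2))   # {'a', 'ax', 'ay', 'bc'}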
FIRSTk
     FIRSTk(ε) = {ε}
     FIRSTk(t) = {t}
     FIRSTk(A) = FIRSTk(X1X2…Xn) ∪
                  FIRSTk(Y1Y2…Ym) ∪ …
                 // one term per right-hand side of A’s productions
     FIRSTk(X1X2…Xn) = FIRSTk(X1) +k FIRSTk(X2)
                             +k … +k FIRSTk(Xn)

     (a fixpoint-computation sketch follows below)
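A sketch of the FIRSTk fixpoint computation in Python, reusing set_concat_k from the sketch above. The code and grammar encoding are my own, not from the talk.

# Fixpoint computation of FIRST_k for every non-terminal.
def first_k(grammar, k):
    """grammar: dict non-terminal -> list of right-hand sides (tuples of symbols)."""
    first = {A: set() for A in grammar}

    def first_of_seq(seq):
        result = {""}                                   # FIRST_k(eps) = {eps}
        for X in seq:
            fx = first[X] if X in grammar else {X}      # terminal: FIRST_k(t) = {t}
            result = set_concat_k(result, fx, k)
        return result

    changed = True
    while changed:                                      # iterate to a fixpoint
        changed = False
        for A, rhss in grammar.items():
            new = set().union(*(first_of_seq(rhs) for rhs in rhss))
            if new != first[A]:
                first[A], changed = new, True
    return first

# Example from the slides: S -> aAab | bAb, A -> cAB | eps | a, B -> eps
g = {"S": [("a","A","a","b"), ("b","A","b")],
     "A": [("c","A","B"), (), ("a",)],
     "B": [()]}
print(first_k(g, 2))
# expected: FIRST2(A) = {'', 'a', 'c', 'ca', 'cc'}, FIRST2(B) = {''},
#           FIRST2(S) = {'aa', 'ac', 'bb', 'ba', 'bc'}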
FIRSTk example

                         S → aAab | bAb
                         A → cAB | ε | a
                         B → ε

FIRST2(S) = FIRST2(aAab) ∪ FIRST2(bAb)
          = ({a} +2 FIRST2(A) +2 {ab}) ∪ ({b} +2 FIRST2(A) +2 {b})
FIRST2(A) = FIRST2(cAB) ∪ {ε} ∪ {a} = ({c} +2 FIRST2(A) +2 FIRST2(B)) ∪ {ε} ∪ {a}
FIRST2(B) = {ε}

Solution of the fixpoint equations:
FIRST2(A) = {ε, a, c, ca, cc}
FIRST2(B) = {ε}
FIRST2(S) = {aa, ac, bb, ba, bc}
Context-independent look-aheads
[Figure: GFG fragments for S, A, and B used to set up the FOLLOW equations; the look-aheads
encountered after returning from A are {ab} and {b$}.]

Compute FOLLOWk(A) sets: the strings of length k that can be encountered
after you return from non-terminal A.

Se = {$$}
Ae = (FIRST2({ab}) +2 Se) ∪ (FIRST2({b}) +2 Se) ∪ (FIRST2(B) +2 Ae)
Be = Ae
Solution:  Se = {$$}   Ae = {ab, b$}   Be = {ab, b$}

From these FOLLOW sets, we can now compute the look-ahead at any GFG point.
Computing context-independent
          look-ahead sets
• Algorithm:
     – For each non-terminal A, compute FIRSTk(A)
             • the first k terminals you encounter on a path START-A →+ END-A
     – For each non-terminal A, compute FOLLOWk(A)
             • the first k terminals you encounter on a path that extends a GFG
               path START →+ END-A
     – Use the FIRSTk and FOLLOWk sets to compute the
       look-ahead at any point of interest in the GFG (a sketch follows below)
• You can even compute the FIRSTk and FOLLOWk
  sets in one big iteration if you want.
• This computation is independent of the particular
  parsing method used
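A hedged Python sketch of the FOLLOWk fixpoint, building on first_k and set_concat_k above; the function name and grammar encoding are my own assumptions.

# Fixpoint computation of FOLLOW_k for every non-terminal.
def follow_k(grammar, start, k):
    first = first_k(grammar, k)
    follow = {A: set() for A in grammar}
    follow[start] = {"$" * k}                     # end-marker look-ahead at the GFG exit

    def first_of_seq(seq):
        result = {""}
        for X in seq:
            result = set_concat_k(result, first[X] if X in grammar else {X}, k)
        return result

    changed = True
    while changed:
        changed = False
        for A, rhss in grammar.items():
            for rhs in rhss:
                for i, X in enumerate(rhs):
                    if X in grammar:              # X is a non-terminal: what can follow it here?
                        contribution = set_concat_k(first_of_seq(rhs[i+1:]), follow[A], k)
                        if not contribution <= follow[X]:
                            follow[X] |= contribution
                            changed = True
    return follow

# Same example grammar: expect FOLLOW2(A) = {'ab', 'b$'} and FOLLOW2(B) = FOLLOW2(A)
print(follow_k(g, "S", 2))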
Production cloning:
             a way of implementing
              context-dependence



Context-dependent look-ahead
• In the running example,
   – in the context of the red call to N, the look-aheads of N’s two
     alternatives are disjoint
   – in the context of the blue call to N, the look-aheads of N’s two
     alternatives are disjoint
   – the context-independent look-ahead computation combines the
     look-aheads from all the call sites of N at the bottom of N and
     propagates them to the top
• Idea:
   – compute look-aheads separately for each context
   – keep track of the context while parsing
   ⇒ we get a more capable parser

[Figure: GFG of S → xNab | yNbc, N → a | ε, showing for each call context the 2-look-aheads of
N’s two alternatives: {aa} and {ab} for the red call, {ab} and {bc} for the blue call.]

      Input string: xab$$
Tracking context by cloning
• Grammar:
      S → xNab | yNbc              after cloning:   S → xN1ab | yN2bc
      N → a | ε                                     N1 → a | ε    N2 → a | ε

[Figure: GFG of the cloned grammar; clone N1 carries the context [N,{ab}] with alternative
look-aheads {aa} and {ab}, and clone N2 carries the context [N,{bc}] with alternative
look-aheads {ab} and {bc}.]
General idea of cloning
• Cloning creates copies of productions
• Intuitively, we would like to create a clone of a production for each of
  its contexts and write down its look-ahead
    – but set of contexts for a production is usually infinite
• Solution:
    – create finite number of equivalence classes of contexts for a given
      production
    – create a clone for each equivalence class
    – compute context-independent look-ahead
• Two cloning rules are important in practice
    – k-look-ahead cloning: two contexts are in same equivalence class if
      their k-look-aheads are identical (used in LL(k))
    – reachability cloning: two contexts C1 and C2 are in same equivalence
      class if the set of GFG nodes reachable by paths with label(C1) is equal
      to set of GFG nodes reachable by paths with label(C2) (used in LR(0))
    – LR(k) uses a combination of them

k-look-ahead cloning (intuitive idea)
[Figure: on the left, the GFG of the original grammar with context-independent look-aheads
{ab} and {b$} at A’s exit; on the right, the cloned grammar in which A is split into clones
[A,{ab}] and [A,{b$}] (and B into [B,{ab}] and [B,{b$}]), each carrying a single look-ahead set.
Other clones are not shown.]

If there are |T| terminal symbols, you may end up with 2^(|T|^k) clones of a given production.
k-look-ahead cloning
• G = (V,T,P,S): grammar, k: positive integer.
• Tk(G) is the following grammar (a code sketch follows below)
     –   non-terminals: {[A,R] | A ∈ V−T, and R ⊆ T≤k}
     –   terminals: T
     –   start symbol: [S, {$k}]
     –   rules: all rules of the form [A,R] → X1'X2'X3'...Xm' where for
         some rule A → X1X2X3...Xm in P
             • Xi' = Xi if Xi is a terminal
             • Xi' = [Xi, FIRSTk(Xi+1…Xm) +k R] when Xi is a non-terminal.
• Intuition:
     – after this kind of cloning, the k-look-aheads at the end of a procedure
       are identical for all return edges
     – so doing a context-independent look-ahead computation on the
       transformed grammar does not tell you anything you did not
       already know about k-look-aheads
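A possible Python rendering of the Tk(G) transformation above. It is my own sketch: it reuses first_k and set_concat_k from the earlier sketches, represents a clone [A,R] as a pair of A and a frozen look-ahead set, and generates only the clones reachable from the new start symbol rather than all subsets of T≤k.

# k-look-ahead cloning: build T_k(G) from G, lazily over reachable clones.
def clone_k(grammar, start, k):
    first = first_k(grammar, k)

    def first_of_seq(seq):
        result = {""}
        for X in seq:
            result = set_concat_k(result, first[X] if X in grammar else {X}, k)
        return result

    new_grammar = {}
    work = [(start, frozenset({"$" * k}))]          # new start symbol [S, {$^k}]
    while work:
        A, R = work.pop()
        if (A, R) in new_grammar:
            continue
        new_grammar[(A, R)] = []
        for rhs in grammar[A]:
            new_rhs = []
            for i, X in enumerate(rhs):
                if X in grammar:
                    # look-ahead of this occurrence: FIRST_k(X_{i+1}..X_m) +k R
                    Ri = frozenset(set_concat_k(first_of_seq(rhs[i+1:]), set(R), k))
                    new_rhs.append((X, Ri))
                    work.append((X, Ri))
                else:
                    new_rhs.append(X)
            new_grammar[(A, R)].append(tuple(new_rhs))
    return new_grammar

# Running example: S -> xNab | yNbc, N -> a | eps  becomes  S -> x N1 ab | y N2 bc
g2 = {"S": [("x","N","a","b"), ("y","N","b","c")], "N": [("a",), ()]}
for lhs, rhss in clone_k(g2, "S", 2).items():
    print(lhs, "->", rhss)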
LL(k) and SLL(k)




Intuition
• This class of grammars has the following property:
     – if s is a string in the language, then for any prefix
       p of s, there is a unique path P from START such
       that label(P) = p (modulo look-ahead)
• So we need to follow only one path through the
  GFG for a given input string, using look-ahead
  to eliminate alternatives
• Roughly: these grammars play the role in the CFL world
  that DFAs play for regular languages
LL(k) parsing
• Only one path can be followed by the parser
    – so at a procedure call for non-terminal N, we must
      know exactly which procedure (rule) to call
• Simple LL(k) (SLL(k)) parsing:
    – make the decision based on the context-independent look-ahead
      of k symbols at the entry point of N
• LL(k) parsing:
    – use a context-dependent look-ahead of k symbols
    – the procedure-cloning technique converts an LL(k)
      grammar into an SLL(k) grammar

                      S → xNab | yNbc
                      N → a | ε

          This grammar is LL(2) but not SLL(2).

[Figure: GFG of this grammar with the per-call 2-look-aheads {aa,ab} and {ab,bc}.]
Parser
• Modify the Earley parser to
     – track compressed paths instead of full paths
             • transitions labeled by non-terminals and terminals
     – eliminate return addresses
             • at the end of a production
                – A → X1X2..Xn: pop n states off and make an A transition
                  from the exposed state
                – A → ε : make an A transition from the current state
     – use look-ahead to eliminate alternatives
[Figure: trace of the LL parser on the grammar E → int | (+ E E) | (- E E) for the input
string ( - ( + 8 9 ) 7 ). At each step the look-ahead selects a unique production, and the
parser moves through items such as E → (.- E E), E → (- .E E), E → int., …, E → (- E E.).
A recursive-descent rendering of this grammar is sketched below.]
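The same grammar is also easy to handle with an ordinary recursive-descent parser; the sketch below is my own code (not the GFG machinery from the slides) and uses up to two tokens of look-ahead to pick a production.

# Recursive-descent recognizer for E -> int | ( + E E ) | ( - E E ),
# choosing productions with at most two tokens of look-ahead.
class ParseError(Exception):
    pass

def parse_E(tokens, pos):
    """Recognize one E starting at tokens[pos]; return the position after it."""
    if pos < len(tokens) and tokens[pos] == "int":            # E -> int
        return pos + 1
    if pos + 1 < len(tokens) and tokens[pos] == "(" and tokens[pos + 1] in ("+", "-"):
        pos = parse_E(tokens, pos + 2)                        # E -> ( +/- E E )
        pos = parse_E(tokens, pos)
        if pos < len(tokens) and tokens[pos] == ")":
            return pos + 1
        raise ParseError("expected ')' at position %d" % pos)
    raise ParseError("no production applies at position %d" % pos)

def recognize(tokens):
    try:
        return parse_E(tokens, 0) == len(tokens)
    except ParseError:
        return False

# ( - ( + 8 9 ) 7 ) with the numbers lexed as "int" tokens
print(recognize(["(", "-", "(", "+", "int", "int", ")", "int", ")"]))   # True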
Many grammars are not LL(k)
• Grammar
    – E → int | (E+E) | (E-E)
• It is not clear which rule to
  apply until you see the “+” or “-”
    – this needs unbounded look-ahead,
      so the grammar is not LL(k) for any k
• One solution:
    – follow multiple paths until
      only one survives

[Figure: GFG of this expression grammar; both parenthesized productions begin with “(”.]
LR(k),SLR(k),LALR(k)




LR grammars (informal)
• LR parsers permit limited non-determinism
   – they can follow more than one path, but not
     all paths as Earley does
• LR(0) condition: for any prefix of the
  input, the corresponding fully
  extended compressed paths must
  have the same label
• The condition is not true for general
  grammars: see the example
   – Consider the string “da”
   – For the prefix “d”, there are two paths:
        • the red path
        • the blue path
   – Labels of the compressed paths:
        • red path: “A”
        • blue path: “B”
• We can use a modified Earley parser
  for these grammars

[Figure: GFG of S → Aa | bAc | Bc | bBa, A → d, B → d, with the red and blue paths for the prefix “d”.]
[Figure: side-by-side traces on the grammar E → int | (E+E) | (E-E) for the input string (3+4):
on the left, the Earley parser with return addresses (items such as <E → .(E+E), 0>); on the
right, the corresponding compressed-path states without return addresses.]
Parser for LR languages
• Use the modified Earley parser we used for LL grammars
     – each Σ-state will have multiple items as in the original Earley parser,
       since LR parsers follow multiple paths too
• Σ-states must follow a stack discipline for the modified Earley parser to
  work
• Since we are following multiple paths, this might break down
     – shift-reduce conflict: parallel compressed paths
             • P1 to a scan node and P2 to an EXIT node (push/pop conflict)
     – reduce-reduce conflict: parallel compressed paths
             • P1 and P2 to different EXIT nodes (pop/pop conflict)
• If the grammar does not have shift-reduce or reduce-reduce conflicts,
  we can use the modified Earley parser and follow compressed paths
  while maintaining a stack discipline for Σ-states
• How do we determine whether a grammar has shift-reduce or reduce-
  reduce conflicts?
Finding LR(0) conflicts
  • Compute the DFA corresponding to the
    compressed path NFA
  • If conflicting states are in same DFA state,
    grammar has an LR(0) conflict
[Figure: DFA obtained by subset construction from the compressed-path NFA of
S → Aa | bAc | Bc | bBa, A → d, B → d. The state reached on “d” contains both A → d. and
B → d., a reduce-reduce conflict. See the sketch below for one conventional way to compute
such an automaton.]
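One concrete way to carry out this check is the conventional LR(0) item-set (canonical collection) construction, which corresponds to determinizing the compressed-path NFA. The Python sketch below is my own code, not from the talk.

# LR(0) item-set construction with shift-reduce / reduce-reduce conflict
# detection.  Items are (lhs, rhs, dot).
def closure(items, grammar):
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in grammar:            # dot before a non-terminal
            for prod in grammar[rhs[dot]]:
                it = (rhs[dot], prod, 0)
                if it not in items:
                    items.add(it); work.append(it)
    return frozenset(items)

def goto(state, symbol, grammar):
    moved = {(l, r, d + 1) for (l, r, d) in state if d < len(r) and r[d] == symbol}
    return closure(moved, grammar) if moved else None

def lr0_states(grammar, start):
    init = closure({(start, rhs, 0) for rhs in grammar[start]}, grammar)
    states, work = {init}, [init]
    while work:
        st = work.pop()
        for X in {r[d] for (l, r, d) in st if d < len(r)}:
            nxt = goto(st, X, grammar)
            if nxt and nxt not in states:
                states.add(nxt); work.append(nxt)
    return states

def lr0_conflicts(grammar, start):
    conflicts = []
    for st in lr0_states(grammar, start):
        reduces = [it for it in st if it[2] == len(it[1])]
        shifts = [it for it in st if it[2] < len(it[1]) and it[1][it[2]] not in grammar]
        if len(reduces) > 1:
            conflicts.append(("reduce-reduce", st))
        if reduces and shifts:
            conflicts.append(("shift-reduce", st))
    return conflicts

# Running example: not LR(0), because A -> d. and B -> d. land in the same state
g3 = {"S": [("A","a"), ("b","A","c"), ("B","c"), ("b","B","a")],
      "A": [("d",)], "B": [("d",)]}
print(lr0_conflicts(g3, "S"))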
LR(0) automaton for expression grammar


[Figure: LR(0) item-set DFA for the grammar E → int | (E+E) | (E-E), with states such as
{E → .(E+E), E → .(E-E), E → .int}, {E → (.E+E), E → (.E-E), …}, {E → int.}, {E → (E.+E), E → (E.-E)},
{E → (E+E.)}, and {E → (E+E).}.]
Parser for LR(0) languages
• Use the modified Earley parser we used for
  LL grammars
     – each Σ-state will have multiple items as in the
       original Earley parser, since LR parsers follow
       multiple paths too
• No need to keep track of GFG nodes within
  each Σ-state
     – states in the compressed-path DFA correspond to the
       possible Σ-states
     – so the modified Earley parser just pushes and pops
       DFA states (see the sketch below)
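A conventional table-driven shift-reduce driver gives one concrete reading of "just push and pop DFA states". The sketch below is my own and assumes ACTION/GOTO tables built elsewhere (for example, from the item-set construction sketched earlier); it is not the modified Earley formulation itself.

# Table-driven shift-reduce driver.  ACTION maps (state, terminal) to
# ("shift", next_state), ("reduce", lhs, rhs_len) or ("accept",);
# GOTO maps (state, non-terminal) to a state.  Tables are assumed given.
def shift_reduce(action, goto_table, tokens, start_state):
    stack = [start_state]                 # stack of DFA states
    pos = 0
    while True:
        tok = tokens[pos] if pos < len(tokens) else "$"
        act = action.get((stack[-1], tok))
        if act is None:
            return False                  # syntax error
        if act[0] == "accept":
            return True
        if act[0] == "shift":
            stack.append(act[1])          # push the successor DFA state
            pos += 1
        else:                             # ("reduce", lhs, rhs_len)
            _, lhs, rhs_len = act
            if rhs_len:
                del stack[-rhs_len:]      # pop one state per RHS symbol
            stack.append(goto_table[(stack[-1], lhs)])   # goto on the LHS non-terminal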
GFG path interpretation

• Let P1 and P2 be two GFG paths with identical labels
• Sufficient condition for the labels of their compressed
  paths to be equal:
   – the sequences of completed calls in P1 and P2 are identical
• Most of the action in LR parsers happens at the EXIT
  nodes of productions

[Figure: two paths P1 and P2 from START, each passing through a completed START-P … END-P segment.]
LR(0) conflicts: GFG
[Figure: schematic GFG paths for the two kinds of LR(0) conflicts: reduce-reduce (parallel paths
to the exit nodes of A and B) and shift-reduce (parallel paths to the exit node of A and to a scan node).]

• LR(0) conflicts (GFG definition):
    – Shift-reduce conflict: there are parallel paths P1: START →+ A-exit and
      P2: START →+ scan-node
    – Reduce-reduce conflict: there are parallel paths P1: START →+ A-exit
      and P2: START →+ B-exit
• Claim: Let G be an LR(0) grammar according to the GFG definition.
     – Let P1 and P2 be two GFG paths that end at SCAN or END nodes, and let
       C(P1) and C(P2) be their compressed equivalents
     – Then P1 and P2 have the same label iff C(P1) and C(P2) have the same label
LR(0) conflicts: GFG
[Figure: the same schematic reduce-reduce and shift-reduce conflict paths as on the previous slide.]

•   Claim: Let G be an LR(0) grammar according to the GFG definition.
     – Let P1 and P2 be two GFG paths that end at SCAN or END nodes, and let C(P1) and C(P2)
       be their compressed equivalents
     – Then P1 and P2 have the same label iff C(P1) and C(P2) have the same label
•   This claim is not true if the paths do not end at SCAN or END nodes
     – counterexample: in this LR(0) grammar, consider the paths from START to the nodes
       S → A.a and S → .Uc

                S → Aa | Uc
                U → Ab
                A → ε
Example
[Figure: GFG of S → Aa | bAc | Bc | bBa, A → d, B → d.]

• States with LR(0) conflicts
   – (A → d. , B → d.)
• Conflicting context pairs
  (i) path label: d
    – C1: START, S → .Aa, A → .d, A → d.
    – C2: START, S → .Bc, B → .d, B → d.
  (ii) path label: bd
    – C3: START, S → .bAc, S → b.Ac, A → .d, A → d.
    – C4: START, S → .bBa, S → b.Ba, B → .d, B → d.
• So the grammar is not LR(0)
LR(0) H&U
• A grammar G is LR(0) if
     – its start symbol does not appear on the right side of any
       production, and
     – for every viable prefix γ, whenever A → α. is a complete valid
       item for γ, then no other complete item nor any item with a
       terminal to the right of the dot is valid for γ.
• Comment:
     – by this definition, the only other valid items that can occur
       together with A → α. are incomplete items with a non-terminal to
       the right of the dot, of the form B → β.Cδ
     – if FIRST(C) contains a terminal t, it can be shown that an item of
       the form Y → .tλ is valid for γ, violating the LR(0) condition.
       Therefore, FIRST(C) = {ε}. It can be shown that this means α = ε
     – Example: this grammar is LR(0) (A → . and B → .Cd are valid
       items for the viable prefix ε)
             •   S → B
             •   B → Cd
             •   C → A
             •   A → ε
Look-ahead in LR grammars
[Figure: the schematic reduce-reduce and shift-reduce conflict paths, repeated.]

• LR(k)
    – for each pair of parallel paths to LR(0)-conflicting states, the k-look-ahead
      sets are disjoint
• SLR(k):
    – if there is an LR(0) conflict at nodes A and B, the context-insensitive look-
      ahead sets of A and B are disjoint
• LALR(k): the grammar is SLR(k) after reachability cloning
Example
[Figure: GFG of S → Aa | bAc | Bc | bBa, A → d, B → d.]

• States with LR(0) conflicts
   – (A → d. , B → d.)
• Conflicting context pairs
  (i) path label: d
    – C1: START, S → .Aa, A → .d, A → d.
    – C2: START, S → .Bc, B → .d, B → d.
  (ii) path label: bd
    – C3: START, S → .bAc, S → b.Ac, A → .d, A → d.
    – C4: START, S → .bBa, S → b.Ba, B → .d, B → d.
• The grammar is LR(1)
    – Look-ahead for C1: {a}, look-ahead for C2: {c}
    – Look-ahead for C3: {c}, look-ahead for C4: {a}
LR(1) automaton
   S → Aa | bAc | Bc | bBa
   A → d
   B → d

[Figure: LR(1) automaton for this grammar. The start state contains items such as
S → .Aa,$ together with A → .d,a and B → .d,c; the state reached on “d” from the start state
contains A → d.,a and B → d.,c, while the state reached on “bd” contains A → d.,c and B → d.,a,
so the reduce-reduce conflicts are resolved by the 1-symbol look-ahead.]
LALR look-ahead computation
• Key observation:
     – each path START →* s in the deterministic LR(0) automaton
       represents a set of contexts in the non-deterministic LR(0)
       automaton
             • each context in this set ends at one of the items in s
     – in general, there will be multiple paths to state s in the
       deterministic LR(0) automaton
     – so each state in the LR(0) automaton represents a set of sets
       of contexts
     – in LALR, we merge the look-aheads for those contexts
• LALR = reachability cloning + SLR (Bermudez and
  Logothetis) + unions at some nodes (see the R → L. state
  in the diagram on the next page)
LALR(1) but not SLR(1)
             S’ → S$
             S  → L=R | R
             L  → *R | id
             R  → L

[Figure: LR(0) automaton for this grammar. The state reached on L contains the items
S → L.=R and R → L., a shift-reduce conflict. Since FOLLOW(R) = {=, $} contains “=”,
SLR(1) look-ahead cannot resolve the conflict, but LALR(1) can.]

 FOLLOW(S) = { $ }
 FOLLOW(R) = { =, $ }
 FOLLOW(L) = { =, $ }
LALR → SLR grammar

S’ → S$                              after cloning:   S’ → S$
S  → L=R | R                                          S  → L1 = R2 | R1
L  → *R | id                                          L1, L2, L3 → *R3 | id
R  → L                                                R1 → L1
                                                      R2 → L2
                                                      R3 → L3

[Figure: LR(0) automaton of the original grammar, annotated with the clones L1, L2, L3, R1, R2, R3
introduced on the corresponding edges.]
LR(0): Reachability cloning
• Motivation: NFA → DFA conversion for LR grammars
• Driven by compressed paths
   •   Need to verify that this cloning satisfies the sanity condition even
       if the grammar is not LR(0)
• Compressed contexts C1 and C2 of a node A are in the same
  equivalence class if
   the set of GFG nodes reachable by paths with label(C1)
     =
   the set of GFG nodes reachable by paths with label(C2)

[Figure: two contexts C1 and C2 of node A that reach the same set of GFG nodes and so fall in
the same equivalence class; a third context C3 is in a different class.]
Algorithm (need to write)
• G = (V,T,P,S): grammar
• R(G) is the following grammar
     – non-terminals: {[Ai] | A in V−T, 1 ≤ i ≤ n, where
       there are n edges labeled A in the compressed-path
       DFA}
     – terminals: T
     – start symbol: [S]
     – rules: all rules of the form [Ai] → X1'X2'X3'...Xm'
       where for some rule A → X1X2X3...Xm in P
             • Xi' = Xi if Xi is a terminal
             • Xi' = [Xi] when Xi is a non-terminal.
Cloning for LALR(1)

• Same condition as LR(0): reachability
  cloning
• Extension to LA(k)LR(l):
     – cloning is governed by LR(l), i.e., clone as in LR(l)
     – compute SLR(k) look-aheads
     – LALR(k) is LA(k)LR(0)
     – LR(k) is LA(k)LR(k)
Summary

• New abstraction for CFL parsing
     – Grammar Flow Graph (GFG)
• Parsing problems become path problems in GFG
• Earley parser emerges as simple extension of NFA simulation
• Mechanisms
     – non-determinism: control the number of paths followed during parsing
     – look-ahead
             • algorithm: solving set constraints
     – context-dependent look-ahead
             • algorithm: cloning
• SLL(k), LL(k), SLR(k), LR(k), LALR(k) grammars arise from
  different choices of these mechanisms
• LL and LR parsers emerge as specializations of the Earley parser
LR(0) ε-DFA

[Figure: automaton with states M0–M9 for the grammar E → int | (E+E) | (E-E), with ε-edges
into the item sets, followed by a trace of the parser on the input ((2+3)-4). The trace shows,
at each step, the ⟨state, index⟩ pairs maintained by the parser, e.g. ⟨M0,0⟩, then ⟨M2,0⟩ ⟨M0,1⟩, ….]
LALR(1) example from G&J
       S’ → S #
       S  → A B c
       A  → a
       B  → b
       B  → ε

[Figure: automaton fragment for this grammar, with the path
S’ → .S#, S → .ABc, S → A.Bc, S → AB.c, S → ABc. and the items A → a., B → b., B → . .]
                                             S → L=R | R
                                             L → *R | id
                                             R → L

[Figure: GFG fragment for this grammar with nodes Send, Lend, Rend and a return node C after
the L in the production S → L=R.]

A shift-reduce conflict occurs at states C and Rend
      (the conflicting paths are S → L → Lend → C and S → R → L → Lend → Rend).
The 1-look-ahead at C is =.
The context-independent 1-look-ahead at Rend is {=,$}, so the grammar is not SLR(1).
LALR(1) figures out that for the conflicting state, the calling context must be S → R.
The look-ahead at Rend is = for the context S → L → T → R → L → Lend → Rend, but there is
no context S →* C parallel to this one.
LR(1)
[Figure: the grammar S → L=R | R, L → *R | id, R → L before and after procedure cloning.
The figure notes FIRST(L) = FIRST(R) = {*, id} and the shift-reduce conflict. After procedure
cloning, the non-terminals are split into [L,{=}], [L,{$}], [R,{=}], [R,{$}].]
LALR(1) look-aheads

[Figure: LR(0) automaton with states T0–T5 for the grammar S → (S) | ε, with 1-symbol
look-aheads attached to the items, e.g. S’ → .S$ and S → .(S) [$] in T0, and S → (.S) [$,)]
and S → .(S) [)] in T1.]

• After the reduction S → (S), parsing can resume either in state T0 or T1.
• The LR parser stack tells you which one to resume from.
• The LALR(1) look-aheads in state T1 are interesting: the item S → (.S) gets its look-ahead
  from the item S → .(S) in state T0 as well as from the item S → (.S) in state T1.
Parsing techniques
• Our focus: techniques that perform breadth-first
  traversal of GFG
     – similar to online simulation of NFA
     – input is read left to right one symbol at a time
     – extend current GFG paths if possible, using symbol
• Three dimensions:
     – non-determinism: how many paths can I follow at a given
       time?
     – look-ahead: how many symbols of look-ahead are known
       at each point?
     – context: how much context do we keep?
             • this is implemented by procedure cloning, independent of look-
               ahead

What we would like to show
• Obvious algorithm:
     –   follow all CFL-paths in GFG
     –   essentially a fancy transitive closure in GFG
     –   leads to Earley’s algorithm
     –   O(n³) complexity
• O(n) algorithms: LL/LR/LALR,…
     – preprocessing to compute look-ahead sets
     – maintain compressed paths
     – ensure that Earley sets can be manipulated as a
       stack
11/23/2011                                               67
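As a concrete illustration of the "follow all CFL-paths" idea, here is a minimal textbook-style Earley recognizer. It is a sketch in my own representation (items are (lhs, rhs, dot, origin) tuples standing in for pairs of a GFG node and a return address), not the parser developed in these slides, and it omits the extra bookkeeping that nullable productions require.

```python
# A sketch Earley recognizer (assumed names and encoding; not the deck's code).
def earley_recognize(grammar, start, tokens):
    """grammar: dict non-terminal -> list of right-hand sides (tuples of symbols)."""
    n = len(tokens)
    sets = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        sets[0].add((start, rhs, 0, 0))          # item = (lhs, rhs, dot, origin)
    for i in range(n + 1):
        worklist = list(sets[i])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            if dot < len(rhs) and rhs[dot] in grammar:
                # prediction: "call" the non-terminal after the dot
                for prod in grammar[rhs[dot]]:
                    new = (rhs[dot], prod, 0, i)
                    if new not in sets[i]:
                        sets[i].add(new)
                        worklist.append(new)
            elif dot < len(rhs):
                # scanning: follow a terminal-labelled edge
                if i < n and tokens[i] == rhs[dot]:
                    sets[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # completion: "return" to every item waiting for lhs in the origin set
                # (nullable productions would need extra care here, omitted for brevity)
                for plhs, prhs, pdot, porigin in list(sets[origin]):
                    if pdot < len(prhs) and prhs[pdot] == lhs:
                        new = (plhs, prhs, pdot + 1, porigin)
                        if new not in sets[i]:
                            sets[i].add(new)
                            worklist.append(new)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in sets[n])

# example: the expression grammar E -> int | (E+E) | (E-E) used earlier in the deck
G = {'E': [('int',), ('(', 'E', '+', 'E', ')'), ('(', 'E', '-', 'E', ')')]}
print(earley_recognize(G, 'E', ['(', 'int', '+', 'int', ')']))   # True
```

Each Earley set here plays the role of a frontier of GFG paths: prediction corresponds to entering a procedure at its START node, scanning to following a terminal-labelled edge, and completion to returning through an END node to the recorded origin set.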
What we would like to show
              (contd.)
• SLL(k) = no cloning + decision at procedure start
• LL(k) = k-look-ahead cloning + decision at procedure start
• LA(l)LL(k) = l-look-ahead cloning + context-independent
  k-look-ahead + decision at procedure start
• SLR(k) = no cloning + decision at procedure end
• LR(k) = k-look-ahead cloning + decision at procedure end
• LALR(k) = reachability cloning + decision at procedure end
11/23/2011                                       68
Computing context-independent look-ahead
• Intuition:
     – simple inter-procedural backward dataflow analysis in the GFG
     – assume the look-ahead at the exit of the GFG is {$^k}
     – propagate look-ahead back through the GFG to determine the
       look-aheads at other points (see the sketch after this slide)
• How do we propagate look-aheads through non-terminal calls?
     – would like to avoid repeatedly analyzing a procedure for each
       look-ahead set we want to propagate through it
     – need to handle recursive calls
     – ideally, we would have a function that tells us how a look-ahead set
       at the exit of a procedure gets propagated to its input

  Grammar: S → xNab | yNbc     N → a | ε
  [GFG for S and N annotated with 2-symbol look-aheads, e.g. {xa} and {ya,yb}
   at the entries of the two S productions]

  11/23/2011                                                                       69
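A minimal sketch of this backward propagation, assuming the example grammar above, k = 2, and a representation of look-ahead strings as tuples of at most k terminals; the helper names (concat_k, first_k) and the two fixpoint loops are mine, not the slides'. It computes the context-independent look-ahead at the END node of each non-terminal (a FOLLOW_k set).

```python
# Hedged sketch: FIRST_k / FOLLOW_k fixpoints for S -> xNab | yNbc, N -> a | eps, k = 2.
K = 2
grammar = {'S': [['x', 'N', 'a', 'b'], ['y', 'N', 'b', 'c']],
           'N': [['a'], []]}
terminals = {'x', 'y', 'a', 'b', 'c', '$'}

def concat_k(S1, S2):
    """S1 +_k S2: pairwise concatenation, truncated to k symbols."""
    return {(s + t)[:K] for s in S1 for t in S2}

def first_k(seq, first):
    out = {()}
    for X in seq:
        out = concat_k(out, {(X,)} if X in terminals else first[X])
    return out

# FIRST_k fixpoint
first = {A: set() for A in grammar}
while True:
    old = {A: set(v) for A, v in first.items()}
    for A, rhss in grammar.items():
        for rhs in rhss:
            first[A] |= first_k(rhs, first)
    if first == old:
        break

# backward propagation: look-ahead at the END of the start symbol is {$^k};
# every occurrence of B in A -> alpha B beta contributes
# FIRST_k(beta) +_k FOLLOW_k(A) to FOLLOW_k(B)
follow = {A: set() for A in grammar}
follow['S'] = {('$',) * K}
while True:
    old = {A: set(v) for A, v in follow.items()}
    for A, rhss in grammar.items():
        for rhs in rhss:
            for i, B in enumerate(rhs):
                if B in grammar:
                    follow[B] |= concat_k(first_k(rhs[i + 1:], first), follow[A])
    if follow == old:
        break

print(follow['N'])   # {('a','b'), ('b','c')}: the context-independent 2-look-ahead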
Every LL(1) grammar is an SLL(1) grammar

[GFG sketch: node N is reached from START through contexts C1 (prefix x) and
 C2 (prefix y); N has two choices P and Q; C1' and C2' are the complementary
 contexts from N back to END]

Let the strings generated by paths P and Q be SP and SQ. Cases (on their first
symbols):
 - SP = a… and SQ = a… : grammar is neither LL(1) nor SLL(1)
 - SP = a… and SQ = b… (a ≠ b) : grammar is LL(1) and SLL(1)
 - SP = ε and SQ = ε : grammar is neither LL(1) nor SLL(1)
 - SP = a… and SQ = ε :
    - We show that there cannot be a context Ci for which the string generated
      by the complementary context Ci' begins with a.
    - Otherwise, for context Ci, the 1-lookahead for choice P is a and the
      1-lookahead for choice Q is also a, so the grammar is not LL(1).
    - Therefore, there is no context Ci for which the 1-lookahead for choice Q is a.
    - But this means that the context-independent 1-lookahead for choice Q
      cannot contain a.
    - Therefore the grammar is SLL(1).

      11/23/2011                                                                       70
LL(2) grammar that is not SLL(2)

[GFG sketch: N is reached through context C1 (prefix x, continuation ab) and
 context C2 (prefix y, continuation bc); choice P is N → a, choice Q is N → ε]

- Consider the context-sensitive look-aheads at N.
- For context C1, the 2-lookahead for choice P is {aa} and for choice Q is {ab}.
- For context C2, the 2-lookahead for choice P is {ab} and for choice Q is {bc}.
- Therefore, the grammar is LL(2).
- Context-independent lookaheads: the 2-lookahead for choice P is {aa,ab} and
  for choice Q is {ab,bc}.
- Since these two sets are not disjoint, the grammar is not SLL(2)
  (checked mechanically in the sketch below).
- Grammar:
    S → xNab
    S → yNbc
    N → a
    N → ε

  11/23/2011                                                             71
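A tiny sketch that checks this claim mechanically, reusing the tuple-of-terminals representation from the earlier look-ahead sketch; the names and the hard-coded continuation strings are illustrative assumptions.

```python
# Hedged check of the LL(2)-but-not-SLL(2) example above.
def cat2(S1, S2):
    return {(s + t)[:2] for s in S1 for t in S2}

first2_N = {('a',), ()}                           # N -> a | eps
ctx = {'C1': {('a', 'b')}, 'C2': {('b', 'c')}}    # what follows N in each calling context

for name, rest in ctx.items():
    la_P = cat2({('a',)}, rest)                   # choice P: N -> a
    la_Q = cat2({()}, rest)                       # choice Q: N -> eps
    print(name, la_P, la_Q, 'disjoint' if not (la_P & la_Q) else 'overlap')

# context-independent look-aheads merge both contexts
follow_N = ctx['C1'] | ctx['C2']
la_P = cat2({('a',)}, follow_N)                   # {aa, ab}
la_Q = cat2({()}, follow_N)                       # {ab, bc}
print('SLL(2)?', not (la_P & la_Q))               # False: the sets share 'ab'
```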
Cloning for LR(k)
• From Sippu & Soisalon-Soininen
     – replace each non-terminal A in the original grammar G with the set of
       all pairs of the form ([γ]k, A), where γ is a viable prefix of the
       $-augmented grammar of G
• [page 16] String γ1 is LR(k)-equivalent to string γ2 if
  VALIDk(γ1) = VALIDk(γ2); i.e. exactly those items that are valid for γ2 are
  valid for γ1, and vice versa.
• An item [A → β1.β2, y] is LR(k)-valid for γ if
  S ⇒rm* αAz ⇒rm αβ1β2z, where γ = αβ1 and k:z = y

• Question:
     – is this a finer equivalence relation than the one used for LL(k)?
11/23/2011                                                    72
Sanity condition on equivalence classes

[Diagram: two contexts from START, with prefixes B1 and B2, share a common
 suffix P that leads to node N]

• If C1 and C2 are two contexts for some node N and
     – C1 = B1 + P
     – C2 = B2 + P
     – B1 and B2 are in the same equivalence class
  then C1 and C2 must be in the same equivalence class.
• Can we come up with a general construction procedure for cloning, given a
  specification of the equivalence classes?
11/23/2011                                                              73