Syntax error handling:
Programs can contain errors at many different levels. For example:
1. Lexical, such as misspelling a keyword.
2. Syntactic, such as an arithmetic expression with unbalanced parentheses.
3. Semantic, such as an operator applied to an incompatible operand.
4. Logical, such as an infinitely recursive call.
Functions of error handler :
1. It should report the presence of errors clearly and accurately.
2. It should recover from each error quickly enough to be able to detect subsequent errors.
3. It should not significantly slow down the processing of correct programs.
Some Issues related to parser:
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
The different strategies that a parser uses to recover from a syntactic error are:
Panic mode recovery: On discovering an error anywhere in the statement, the parser discards the rest of the statement, skipping the erroneous input until it reaches a delimiter such as a semicolon or end. This method has the advantage of simplicity and never goes into an infinite loop. When multiple errors in the same statement are rare, it is quite useful.
Phrase level recovery: On discovering an error, the parser performs a local correction on the remaining input that allows it to continue. Examples: inserting a missing semicolon, replacing a comma with a semicolon, or deleting an extraneous semicolon.
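As a concrete illustration of panic mode, the skipping loop can be sketched in a few lines of Python (the token list, error position, and synchronizing set here are all hypothetical, not from any particular compiler):

```python
def panic_mode_recover(tokens, pos, sync=(";", "end")):
    """Skip input tokens until a synchronizing delimiter is found.

    `tokens` is a list of token strings and `pos` is the index where the
    error was detected; both are illustrative, not part of a real parser.
    """
    while pos < len(tokens) and tokens[pos] not in sync:
        pos += 1          # discard erroneous input
    if pos < len(tokens):
        pos += 1          # consume the delimiter itself and resume there
    return pos

# After an error at index 1, skip everything up to and including ';'
print(panic_mode_recover(["x", "@", "+", "y", ";", "z"], 1))  # -> 5
```

Because the loop only moves forward through the input, it cannot loop forever, which is exactly the property the slide attributes to panic mode.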
Error recovery strategies:
Error productions: The parser is constructed using an augmented grammar with error productions for some common errors that may occur. The augmented grammar contains productions that generate the erroneous constructs, so these errors can be recognized when encountered.
Global correction: Given an incorrect input string x and a grammar G, certain algorithms can be used to find a parse tree for some closest error-free string y. This may allow the parser to make minimal changes to the source code. However, these methods are in general too costly in terms of time and space.
 Universal parsers
Universal parsing methods such as the Cocke-Younger-Kasami
algorithm and Earley's algorithm can parse any grammar.
These general methods are too inefficient to use in production, and are therefore not commonly used in compilers.
 Top-down parsers Top-down methods build parse trees from the
top (root) to the bottom (leaves).
 Bottom-up parsers Bottom-up methods start from the leaves and
work their way up to the root.
Types of parsers for grammars:
 CFG is used to specify the structure of legal
programs.
• The design of the grammar is an initial phase of
the design of a programming language.
Formally, a CFG is G = (V, Ʃ, S, P), where:
• V = non-terminals, are variables that denote sets of (sub)strings
occurring in the language. These impose a structure on the
grammar.
• Ʃ is the set of terminal symbols in the grammar
(i.e., the set of tokens returned by the scanner)
• S is the start/goal symbol, a distinguished non-terminal in V
denoting the entire set of strings in L(G).
• P is a finite set of productions specifying how terminals and non-
terminals can be combined to form strings in the language.
Each production must have a single non-terminal on its left hand
side.
The set V ∪ Ʃ is called the vocabulary of G.
Context Free Grammars(CFG)
Example (G1):
E → E + E | E – E | E * E | E / E | ( E ) | - E | id
• Where
• Vt = {+, -, *, /, (, ), id}, Vn = {E}
• S = E
• The productions are shown above
• Sometimes → can be replaced by ::=
CFG is more expressive than RE: every language that can be described by regular expressions can also be described by a CFG, but not vice versa.
• L = {aⁿbⁿ | n ≥ 1} is an example of a language that can be expressed by a CFG but not by an RE.
 Context-free grammar is sufficient to describe most programming
languages.
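To make the comparison concrete, a tiny sketch (not from the slides) can recognize L = {aⁿbⁿ | n ≥ 1} by following the productions of the grammar S → aSb | ab directly:

```python
def matches_anbn(s):
    """Recognize L = {a^n b^n | n >= 1} via the grammar S -> a S b | a b.

    No regular expression can express this language, but two lines of
    recursion that mirror the grammar's productions can.
    """
    if s == "ab":                       # base production S -> a b
        return True
    return (len(s) >= 4 and s[0] == "a" and s[-1] == "b"
            and matches_anbn(s[1:-1]))  # recursive production S -> a S b

print(matches_anbn("aaabbb"), matches_anbn("aabbb"))  # True False
```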
Context Free Grammars(CFG)…
Conventions
1. These symbols are terminals:
A. Lowercase letters early in the alphabet, such as a, b, c.
B. Operator symbols such as +, *, and so on .
C. Punctuation symbols such as parentheses , comma, and so on.
D. The digits 0, 1, ... ,9 .
E. Boldface strings such as id or if, each of which represents a single terminal
symbol.
2. These symbols are non-terminals:
i. Uppercase letters early in the alphabet, such as A, B, C.
ii. The letter S, which, when it appears, is usually the start symbol.
iii. Lowercase, italic names such as expr or stmt.
iv. Uppercase letters may be used to represent non-terminals for the constructs.
For example:- non terminals for expressions, terms, and factors are often
represented by E, T, and F, respectively.
3. Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols, that is, either terminals or non-terminals.
Contd.
4. Lowercase letters late in the alphabet, chiefly u, v, ..., z, represent (possibly empty) strings of terminals.
5. Lowercase Greek letters, α, β, γ for example, represent (possibly empty) strings of grammar symbols.
 Thus, a generic production can be written as A → α, where A is the head and α the body.
6. A set of productions A → α1, A → α2, ..., A → αk with a common head A (call them A-productions) may be written A → α1 | α2 | ... | αk.
 Call α1, α2, ..., αk the alternatives for A.
7. Unless stated otherwise, the head of the first production is the start symbol.
Example:
E → E+T | E-T | T
T → T*F | T/F | F
F → (E) | id
• The notational conventions tell us that E, T, and F are non-terminals, with E the start symbol.
• The remaining symbols are terminals.
A sequence of replacements of non-terminal symbols to obtain strings/sentences is called a derivation.
• If we have a grammar with the production E → E+E, then we can replace E by E+E.
• In general, a derivation step is αAβ ⇒ αγβ if there is a production rule A → γ in the grammar, where α and β are arbitrary strings of terminal and non-terminal symbols.
The derivation of a string should start from a production with the start symbol on the left.
Derivation
S ⇒ ... ⇒ α
• α is a sentential form (terminals and non-terminals mixed)
• α is a sentence if it contains only terminal symbols
Derivations are classified into two types based on the order in which non-terminals are replaced:
Leftmost derivation (LMD): if the leftmost non-terminal is replaced by one of its productions at each step, the derivation is called a leftmost derivation.
Rightmost derivation (RMD): if the rightmost non-terminal is replaced by one of its productions at each step, the derivation is called a rightmost derivation.
 E.g., given grammar G: E → E+E | E*E | ( E ) | - E | id
 LMD for - ( id + id ): E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
 RMD for - ( id + id ): E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)
Derivation…
 Parsing is the process of analyzing a continuous stream of input in order
to determine its grammatical structure with respect to a given formal
grammar.
 A parse tree is a graphical representation of a derivation. It is convenient
to see how strings are derived from the start symbol. The start symbol of
the derivation becomes the root of the parse tree.
• Inner nodes of a parse tree are non-terminal symbols.
• The leaves of a parse tree are terminal symbols.
Parse Tree
Example: E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
(Figure: five snapshots of the parse tree, one per derivation step, growing from the single root node E to the complete tree whose leaves, read left to right, spell - ( id + id ).)
An ambiguous grammar is one that produces more than one LMD or more than one RMD for the same sentence.
Ambiguity
Two leftmost derivations of id+id*id:
E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
(Figure: the two corresponding parse trees; in the first, the * subtree hangs under +, and in the second, the + subtree hangs under *.)
For most parsers, the grammar must be unambiguous.
If a grammar is unambiguous, then the parse tree for each sentence is uniquely determined.
• We should eliminate the ambiguity in the grammar during the design phase of the compiler.
An unambiguous grammar should be written to eliminate the ambiguity.
• We have to prefer one of the parse trees of a sentence (generated by an ambiguous grammar) and rewrite the grammar to restrict it to this choice, which disambiguates the grammar.
A grammar is left recursive if it has a non-terminal A such that there is a derivation A ⇒+ Aα for some string α.
Top-down parsing techniques cannot handle left-recursive grammars.
So, we have to convert a left-recursive grammar into an equivalent grammar which is not left-recursive.
Two types of left recursion:
• Immediate left recursion: the recursion appears in a single step of the derivation (A ⇒ Aα).
• Indirect left recursion: the recursion appears in more than one step of the derivation.
Left Recursion
Eliminating left recursion
A → Aα | β, where β does not start with A
Eliminating the left recursion gives:
A → βA'
A' → αA' | ε
• In general,
A → Aα1 | … | Aαm | β1 | … | βn, where β1 … βn do not start with A, becomes:
A → β1A' | … | βnA'
A' → α1A' | … | αmA' | ε
Eliminating left recursion…
Example
E→E+T | T
T→T*F | F
F→(E) | id
After eliminating the left-recursion the grammar becomes,
E → TE’
E’→+TE’|ε
T → FT’
T’→*FT’|ε
F→ (E) |id
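The rewriting above is mechanical enough to code directly. Below is an illustrative sketch (productions encoded as tuples of symbols, the empty tuple () standing for ε, and the new non-terminal named by appending a quote) that eliminates immediate left recursion for one non-terminal:

```python
def eliminate_immediate_left_recursion(nt, productions):
    """Rewrite  A -> A a1 | ... | A am | b1 | ... | bn  as
         A  -> b1 A' | ... | bn A'
         A' -> a1 A' | ... | am A' | eps
    Productions are tuples of symbols; () stands for eps."""
    recursive = [p[1:] for p in productions if p and p[0] == nt]
    others = [p for p in productions if not p or p[0] != nt]
    if not recursive:                       # nothing to eliminate
        return {nt: productions}
    new_nt = nt + "'"
    return {
        nt: [beta + (new_nt,) for beta in others],
        new_nt: [alpha + (new_nt,) for alpha in recursive] + [()],
    }

# E -> E + T | T  becomes  E -> T E',  E' -> + T E' | eps
print(eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)]))
```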
Left factoring
When a non-terminal has two or more productions
whose right-hand sides start with the same grammar
symbols, the grammar is not LL(1) and cannot be used for
predictive parsing
A predictive parser (a top-down parser without
backtracking) insists that the grammar must be left-
factored.
In general: A → αβ1 | αβ2, where α is a non-empty string of grammar symbols shared by both alternatives as a common prefix.
Left factoring…
When processing α we do not know whether to expand A to αβ1 or to αβ2, but if we re-write the grammar as follows:
A → αA'
A' → β1 | β2
then we can immediately expand A to αA'.
Example: given the following grammar:
S → iEtS | iEtSeS | a
E → b
Left factored, this grammar becomes:
S → iEtSS' | a
S' → eS | ε
E → b
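A single left-factoring pass can likewise be sketched in code. The encoding below (tuples of symbols, () for ε, primes appended for new non-terminal names) is illustrative; a full algorithm would repeat the pass until no two alternatives share a prefix:

```python
from collections import defaultdict

def left_factor(nt, productions):
    """One left-factoring pass over the alternatives of `nt`.

    Alternatives sharing a first symbol are grouped, their longest common
    prefix is pulled out, and the suffixes move to a fresh non-terminal.
    """
    groups = defaultdict(list)
    for p in productions:
        groups[p[0] if p else ()].append(p)
    result, new_rules = [], {}
    for group in groups.values():
        if len(group) == 1:
            result.append(group[0])
            continue
        prefix = group[0]                    # longest common prefix of group
        for p in group[1:]:
            i = 0
            while i < min(len(prefix), len(p)) and prefix[i] == p[i]:
                i += 1
            prefix = prefix[:i]
        new_nt = nt + "'" * (len(new_rules) + 1)
        result.append(prefix + (new_nt,))
        new_rules[new_nt] = [p[len(prefix):] for p in group]  # () is eps
    return {nt: result, **new_rules}

# S -> iEtS | iEtSeS | a  becomes  S -> iEtS S' | a,  S' -> eps | eS
print(left_factor("S", [("i", "E", "t", "S"),
                        ("i", "E", "t", "S", "e", "S"), ("a",)]))
```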
Left factoring…
The following grammar:
stmt → if expr then stmt else stmt
     | if expr then stmt
cannot be parsed by a predictive parser that looks one element ahead.
But the grammar can be re-written:
stmt → if expr then stmt stmt'
stmt' → else stmt | ε
where ε is the empty string.
Rewriting a grammar to eliminate multiple productions
starting with the same token is called left factoring.
Left factoring…
Example 1
A → abB | aB | cdg | cdeB | cdfB
First factor out a:
A → aA' | cdg | cdeB | cdfB
A' → bB | B
Then factor out cd:
A → aA' | cdA''
A' → bB | B
A'' → g | eB | fB
Example 2
A → ad | a | ab | abc | b
A → aA' | b
A' → d | ε | b | bc
A → aA' | b
A' → d | ε | bA''
A'' → ε | c
Context-Free Grammars versus Regular Expressions
 Every regular language is a context-free language, but not vice-versa.
 Example: the regular expression (a|b)*abb and the corresponding grammar describe the same language, the set of strings of a's and b's ending in abb; such languages can be described either by finite automata or by PDAs.
 On the other hand, the language L = {aⁿbⁿ | n ≥ 1}, with an equal number of a's and b's, is a prototypical example of a language that can be described by a grammar but not by a regular expression.
 We can say that "finite automata cannot count", meaning that a finite automaton cannot accept a language like {aⁿbⁿ | n ≥ 1}, because that would require it to keep count of the number of a's before it sees the b's.
 So these kinds of languages (context-free languages) are accepted by PDAs, as a PDA uses a stack as its memory.
 The general comparison of Regular Expressions vs. Context-
Free Grammars:
Context-Free Grammars versus Regular Expressions
Parsing
 Top Down Parsing
 Recursive-Descent Parsing
 Predictive Parser
• Recursive Predictive Parsing
• Non-Recursive Predictive Parsing
 LL(1) Parser – Parser Actions
 Constructing LL(1) - Parsing Tables
 Computing FIRST and FOLLOW functions
 LL(1) Grammars
 Properties of LL(1) Grammars
Parsing methods
• Top-down parsing
  - Recursive descent (involves backtracking)
  - Predictive parsing (parsing without backtracking)
    • Recursive predictive
    • Non-recursive predictive, or LL(1)
• Bottom-up parsing (shift-reduce)
  - Operator precedence
  - LR parsing: SLR, CLR, LALR
 Top-down parsing involves constructing a parse tree for the input
string, starting from the root
• Basically, top-down parsing can be viewed as finding a leftmost
derivation (LMD) for an input string.
 How it works? Start with the tree of one node labeled with the start symbol
and repeat the following steps until the fringe of the parse tree matches the
input string
• 1. At a node labeled A, select a production with A on its LHS and for
each symbol on its RHS, construct the appropriate child
• 2. When a terminal is added to the fringe that doesn't match the
input string, backtrack
• 3. Find the next node to be expanded
 ! Minimize the number of backtracks as much as possible
Top Down Parsing
 Two types of top-down parsing
• Recursive-Descent Parsing
• Backtracking is needed (If a choice of a production rule does
not work, we backtrack to try other alternatives.)
• It is a general parsing technique, but not widely used
because it is not efficient
• Predictive Parsing
• no backtracking and hence efficient
• needs a special form of grammars (LL(1) grammars).
• Two types
– Recursive Predictive Parsing is a special form of
Recursive Descent Parsing without backtracking.
– Non-Recursive (Table Driven) Predictive Parser is also
known as LL(1) parser.
Top Down Parsing…
 As the name indicates, recursive descent uses recursive functions
to implement predictive parsing.
 It tries to find the left-most derivation.
 Backtracking is needed
 Example
S → aBc
B → bc | b
 input: abc
(Figure: the parser expands S → aBc, tries B → bc first, fails to match c, backtracks, and succeeds with B → b.)
 A left-recursive grammar can cause a recursive-descent
parser to go into an infinite loop.
 Hence, elimination of left-recursion must be done before
parsing.
 Consider the grammar for arithmetic expressions
Example
After eliminating the left-recursion the grammar becomes,
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
Recursive-Descent Parsing
Recursive-Descent Parsing
 Then write the recursive procedures for the grammar as follows:

procedure E( )
begin
    T( ); EPRIME( );
end

procedure EPRIME( )
begin
    if input_symbol = '+' then begin
        ADVANCE( ); T( ); EPRIME( );
    end
end

procedure T( )
begin
    F( ); TPRIME( );
end

procedure TPRIME( )
begin
    if input_symbol = '*' then begin
        ADVANCE( ); F( ); TPRIME( );
    end
end

procedure F( )
begin
    if input_symbol = 'id' then
        ADVANCE( )
    else if input_symbol = '(' then begin
        ADVANCE( ); E( );
        if input_symbol = ')' then ADVANCE( ) else ERROR( );
    end
    else ERROR( );
end
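The procedures above translate almost line for line into a runnable sketch. The class and token encoding below (e.g. representing every identifier by the token "id") are illustrative choices, not part of the slides:

```python
class RDParser:
    """Recursive-descent parser for E -> TE', E' -> +TE' | eps,
    T -> FT', T' -> *FT' | eps, F -> (E) | id."""

    def __init__(self, tokens):
        self.tokens = tokens + ["$"]   # "$" marks end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        self.pos += 1

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.advance()

    def E(self):          # E  -> T E'
        self.T(); self.Eprime()

    def Eprime(self):     # E' -> + T E' | eps
        if self.peek() == "+":
            self.advance(); self.T(); self.Eprime()

    def T(self):          # T  -> F T'
        self.F(); self.Tprime()

    def Tprime(self):     # T' -> * F T' | eps
        if self.peek() == "*":
            self.advance(); self.F(); self.Tprime()

    def F(self):          # F  -> ( E ) | id
        if self.peek() == "id":
            self.advance()
        elif self.peek() == "(":
            self.advance(); self.E(); self.expect(")")
        else:
            raise SyntaxError(f"unexpected token {self.peek()!r}")

    def parse(self):
        self.E(); self.expect("$")
        return True

print(RDParser(["id", "+", "id", "*", "id"]).parse())  # True
```

Note that each non-terminal becomes one procedure, and the ε-alternatives of E' and T' simply fall through when the lookahead does not match.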
• Predictive parsers always build the syntax tree from the root down to the leaves and are hence also called (deterministic) top-down parsers.
• Predictive parsing can be recursive or non-recursive.
• In recursive predictive parsing, each non-terminal corresponds to a procedure/function.
Predictive Parser
When to apply ε-productions?
A → aA | bB | ε
• If all other productions fail, we should apply an ε-production. For example, if the current token is not a or b, we may apply the ε-production.
• Most correct choice: we should apply an ε-production for a non-terminal A when the current token is in the FOLLOW set of A (the terminals that can follow A in sentential forms).
Recursive Predictive Parsing…
Non-Recursive Predictive Parsing
 A non-recursive predictive parser can be built by
maintaining a stack explicitly, rather than implicitly via
recursive calls
 Non-Recursive predictive parsing is a table-driven top-down
parser.
Model of a table-driven predictive parser
Non-Recursive Predictive Parsing…
 Input buffer
• our string to be parsed. We will assume that its end is marked with a special
symbol $.
 Output
• a production rule representing a step of the derivation sequence (left-most
derivation) of the string in the input buffer.
 Stack
• contains the grammar symbols
• at the bottom of the stack, there is a special end marker symbol $.
• initially the stack contains only the symbol $ and the starting symbol S.
• when the stack is emptied (i.e. only $ left in the stack), the parsing is
completed.
 Parsing table
• a two-dimensional array M[A,a]
• each row is a non-terminal symbol
• each column is a terminal symbol or the special symbol $
• each entry holds a production rule.
 The symbol at the top of the stack (say X) and the current symbol in the input string (say a) determine the parser action.
 There are four possible parser actions:
1. If X and a are both $ → the parser halts (successful completion).
2. If X and a are the same terminal symbol (different from $) → the parser pops X from the stack and moves to the next symbol in the input buffer.
3. If X is a non-terminal → the parser looks at the parsing table entry M[X,a]. If M[X,a] holds a production rule X → Y1Y2...Yk, it pops X from the stack and pushes Yk, Yk-1, ..., Y1 onto the stack. The parser also outputs the production rule X → Y1Y2...Yk to represent a step of the derivation.
4. If none of the above → error:
• all empty entries in the parsing table are errors;
• if X is a terminal symbol different from a, this is also an error case.
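The four actions fit into a short table-driven loop. The sketch below uses an illustrative encoding: the table M maps a non-terminal and a lookahead token to a production body, with the empty tuple () standing for ε. The example table is for the grammar S → aBa, B → bB | ε:

```python
def ll1_parse(tokens, M, start):
    """Return the sequence of productions used, or raise SyntaxError."""
    stack = ["$", start]
    input_ = tokens + ["$"]
    i, output = 0, []
    while True:
        X, a = stack[-1], input_[i]
        if X == "$" and a == "$":            # action 1: successful halt
            return output
        if X == a:                           # action 2: match a terminal
            stack.pop(); i += 1
        elif X in M and a in M[X]:           # action 3: expand non-terminal
            body = M[X][a]
            output.append((X, body))
            stack.pop()
            stack.extend(reversed(body))     # push Yk ... Y1; () pushes nothing
        else:                                # action 4: error entry
            raise SyntaxError(f"no rule for ({X}, {a})")

M = {"S": {"a": ("a", "B", "a")},
     "B": {"b": ("b", "B"), "a": ()}}
for lhs, rhs in ll1_parse(["a", "b", "b", "a"], M, "S"):
    print(lhs, "->", " ".join(rhs) or "eps")
# prints: S -> a B a,  B -> b B,  B -> b B,  B -> eps
```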
Non-Recursive Predictive /LL(1) Parser – Parser Actions
General algorithm of LL (1) parser:
METHOD: Initially, the parser is in a configuration with w$ in the
input buffer and the start symbol S of G on top of the stack, above $.
The following procedure uses the predictive parsing table M to
produce a predictive parse for the input.
Non-Recursive Predictive /LL(1) Parser – Parser Actions
LL(1) Parser – Example 1
S → aBa
B → bB | ε
Assume the current string is abba. (We will see how to construct the parsing table next.)

stack    input    output
$S       abba$    S → aBa
$aBa     abba$
$aB      bba$     B → bB
$aBb     bba$
$aB      ba$      B → bB
$aBb     ba$
$aB      a$       B → ε
$a       a$
$        $        accept, successful completion

LL(1) parsing table:
     a          b         $
S    S → aBa
B    B → ε      B → bB
LL(1) Parser – Example 2
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
E is the start symbol.

      id         +            *            (          )         $
E     E → TE'                              E → TE'
E'               E' → +TE'                            E' → ε    E' → ε
T     T → FT'                              T → FT'
T'               T' → ε       T' → *FT'               T' → ε    T' → ε
F     F → id                               F → (E)
LL(1) Parser – Example 2…
Parse the input id+id. Notice that the parsing is done by looking up the parse table.

stack      input    output
$E         id+id$   E → TE'
$E'T       id+id$   T → FT'
$E'T'F     id+id$   F → id
$E'T'id    id+id$
$E'T'      +id$     T' → ε
$E'        +id$     E' → +TE'
$E'T+      +id$
$E'T       id$      T → FT'
$E'T'F     id$      F → id
$E'T'id    id$
$E'T'      $        T' → ε
$E'        $        E' → ε
$          $        accept
Non-recursive predictive / LL(1) parsing steps
 Steps involved in the non-recursive predictive / LL(1) parsing method:
1. The stack is pushed with $.
2. Construction of the parsing table T:
   a. computation of the FIRST sets;
   b. computation of the FOLLOW sets;
   c. making entries into the parsing table.
3. Parsing with the help of the parsing routine.
Step 1: push $ onto the stack, ready to start the parsing process.
 Two functions are used in the construction of LL(1) parsing tables: FIRST and FOLLOW.
 These sets describe the possible positions of any terminal in a derivation.
 FIRST(α) is the set of terminal symbols that occur as first symbols of strings derived from α, where α is any string of grammar symbols.
• If α derives ε, then ε is also in FIRST(α).
 FOLLOW(A) is the set of terminals that occur immediately after the non-terminal A in strings derived from the start symbol.
• A terminal a is in FOLLOW(A) if S ⇒* αAaβ.
Step 2: Constructing LL(1) Parsing Tables
Compute FIRST for a string X:
1. If X is a terminal symbol, then FIRST(X) = {X}.
2. If X is ε, then FIRST(X) = {ε}.
3. If X is a non-terminal symbol and X → ε is a production rule, then add ε to FIRST(X).
4. If X is a non-terminal symbol and X → Y1Y2...Yn is a production rule, then the terminals in FIRST(Y1) except ε are in FIRST(X). Furthermore:
• if a terminal a is in FIRST(Yi) and ε is in FIRST(Yj) for all j = 1, ..., i-1, then a is in FIRST(X);
• if ε is in FIRST(Yj) for all j = 1, ..., n, then ε is in FIRST(X).
Example
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

From Rule 1: FIRST(id) = {id}
From Rule 2: FIRST(ε) = {ε}
From Rules 3 and 4:
FIRST(F) = {(, id}
FIRST(T') = {*, ε}
FIRST(T) = {(, id}
FIRST(E') = {+, ε}
FIRST(E) = {(, id}
Others:
FIRST(TE') = {(, id}
FIRST(+TE') = {+}
FIRST(FT') = {(, id}
FIRST(*FT') = {*}
FIRST((E)) = {(}
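These values can also be computed mechanically by running the four rules to a fixed point. The sketch below uses an illustrative encoding (bodies as tuples of symbols, the empty string "" standing for ε):

```python
EPS = ""   # stands for the empty string epsilon

def first_sets(grammar):
    """Compute FIRST for every non-terminal by fixed-point iteration."""
    first = {A: set() for A in grammar}

    def first_of(symbols):
        """FIRST of a string of grammar symbols (rules 1, 2, 4)."""
        out = set()
        for Y in symbols:
            if Y not in grammar:          # Y is a terminal
                out.add(Y)
                return out
            out |= first[Y] - {EPS}
            if EPS not in first[Y]:       # Y cannot derive eps: stop here
                return out
        out.add(EPS)                      # every symbol can derive eps
        return out

    changed = True
    while changed:                        # iterate until nothing is added
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                new = first_of(body)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first, first_of

grammar = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
first, _ = first_sets(grammar)
print(sorted(first["E"]), sorted(first["T'"]))  # ['(', 'id'] ['', '*']
```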
1. $ is in FOLLOW(S), if S is the start symbol.
2. Look at each occurrence of a non-terminal on the RHS of a production that is followed by something:
• if A → αBβ is a production rule, then everything in FIRST(β) except ε is in FOLLOW(B).
3. Look at each B on the RHS that is not followed by anything:
• if A → αB is a production rule, or A → αBβ is a production rule and ε is in FIRST(β), then everything in FOLLOW(A) is in FOLLOW(B).
We apply these rules until nothing more can be added to any FOLLOW set.
Compute FOLLOW (for non-terminals)
Example
i. E → TE'
ii. E' → +TE' | ε
iii. T → FT'
iv. T' → *FT' | ε
v. F → (E) | id

FOLLOW(E) = {$, )}, because:
• from Rule 1, FOLLOW(E) contains $;
• from Rule 2, ) is in FOLLOW(E), from the production F → (E).
FOLLOW(E') = {$, )} … Rule 3
FOLLOW(T) = {+, ), $}
• From Rule 2, + is in FOLLOW(T).
• From Rule 3, everything in FOLLOW(E) is in FOLLOW(T), since FIRST(E') contains ε.
FOLLOW(F) = {+, *, ), $} … same reasoning as above
FOLLOW(T') = {+, ), $} … Rule 3
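The three FOLLOW rules can likewise be iterated to a fixed point. In the sketch below the FIRST sets of this grammar are written out by hand (they match the values computed earlier); the encoding (tuples for bodies, "" for ε) is illustrative:

```python
EPS = ""   # stands for epsilon
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}

def first_of(symbols, grammar):
    """FIRST of a string of grammar symbols, using the sets above."""
    out = set()
    for Y in symbols:
        f = FIRST[Y] if Y in grammar else {Y}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out

def follow_sets(grammar, start):
    follow = {A: set() for A in grammar}
    follow[start].add("$")                       # rule 1
    changed = True
    while changed:                               # iterate to a fixed point
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                for i, B in enumerate(body):
                    if B not in grammar:         # skip terminals
                        continue
                    beta = first_of(body[i + 1:], grammar)
                    new = beta - {EPS}           # rule 2
                    if EPS in beta:              # rule 3 (incl. B at the end)
                        new |= follow[A]
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow

grammar = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
follow = follow_sets(grammar, "E")
print(sorted(follow["T"]))  # ['$', ')', '+']
```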
 For each production rule A → α of a grammar G:
1. for each terminal a in FIRST(α), add A → α to M[A, a];
2. if ε is in FIRST(α), then for each terminal a in FOLLOW(A), add A → α to M[A, a];
3. if ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
 All other undefined entries of the parsing table are error entries.
Constructing LL(1) Parsing Table -- Algorithm
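The algorithm can be sketched directly; here conflicting entries are collected rather than overwritten, so a non-LL(1) grammar can be detected. The encoding (tuples for bodies, "" for ε) and the tiny example grammar S → aBa, B → bB | ε are illustrative:

```python
EPS = ""   # stands for epsilon

def build_ll1_table(grammar, first_of, follow):
    """Fill M[A][a] using the three rules; return (table, conflicts)."""
    table = {A: {} for A in grammar}
    conflicts = []
    for A, bodies in grammar.items():
        for body in bodies:
            f = first_of(body)
            targets = set(f) - {EPS}          # rule 1
            if EPS in f:                      # rules 2 and 3 ($ is in FOLLOW)
                targets |= follow[A]
            for a in targets:
                if a in table[A]:             # multiply-defined entry
                    conflicts.append((A, a))
                else:
                    table[A][a] = body
    return table, conflicts

# S -> a B a, B -> b B | eps; FIRST and FOLLOW written out by hand.
grammar = {"S": [("a", "B", "a")], "B": [("b", "B"), ()]}
follow = {"S": {"$"}, "B": {"a"}}
first_of = lambda body: {body[0]} if body else {EPS}  # bodies start with terminals here
table, conflicts = build_ll1_table(grammar, first_of, follow)
print(table, conflicts)
```

The resulting table places B → bB under lookahead b and B → ε under lookahead a, matching the table built by hand in Example 1, with no conflicts.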
Constructing LL(1) Parsing Table -- Example
The grammar:
E → E+T | T
T → T*F | F
F → (E) | id

Step 1: Remove left recursion:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

Step 2: Compute FIRST and FOLLOW:
FIRST(E) = {(, id}
FIRST(E') = {+, ε}
FIRST(T) = {(, id}
FIRST(T') = {*, ε}
FIRST(F) = {(, id}
FOLLOW(E) = {$, )}
FOLLOW(E') = {$, )}
FOLLOW(T) = {+, $, )}
FOLLOW(T') = {+, $, )}
FOLLOW(F) = {+, *, $, )}

Step 3: Construct the predictive parsing table M[A, a], then use the stack implementation to parse the string id+id*id$ with it.
Example 2: Consider the following grammar:
S → iEtS | iEtSeS | a
E → b
After left factoring, we have:
S → iEtSS' | a
S' → eS | ε
E → b
To construct a parsing table, we need FIRST() and FOLLOW() for all the non-terminals:
FIRST(S) = {i, a}
FIRST(S') = {e, ε}
FIRST(E) = {b}
FOLLOW(S) = {$, e}
FOLLOW(S') = {$, e}
FOLLOW(E) = {t}
And the parsing table is:
Constructing LL(1) Parsing Table -- Example
LL(1) Grammars
 A grammar whose parsing table has no multiply-defined entries is said to be an LL(1) grammar.
• The first L means the input is scanned from left to right, the second L means a leftmost derivation is produced, and the 1 means one input symbol of lookahead is used to determine each parser action.
 A grammar G is LL(1) if and only if the following conditions hold for every two distinct production rules A → α and A → β:
1. α and β cannot both derive strings starting with the same terminal.
2. At most one of α and β can derive ε.
3. If β can derive ε, then α cannot derive any string starting with a terminal in FOLLOW(A).
 From 1 and 2, FIRST(α) ∩ FIRST(β) = ∅.
 From 3, if ε is in FIRST(β), then FIRST(α) ∩ FOLLOW(A) = ∅, and likewise with α and β exchanged.
A Grammar which is not LL(1)
 An entry of the parsing table of a grammar may contain more than one production rule. In this case, we say that it is not an LL(1) grammar.
S → iCtSE | a
E → eS | ε
C → b
FIRST(iCtSE) = {i}
FIRST(a) = {a}
FIRST(eS) = {e}
FIRST(ε) = {ε}
FIRST(b) = {b}
FOLLOW(S) = {$, e}
FOLLOW(E) = {$, e}
FOLLOW(C) = {t}
There are two production rules for M[E, e]; the problem is ambiguity.

     a       b       e                 i            t       $
S    S → a                             S → iCtSE
E                    E → eS
                     E → ε                                  E → ε
C            C → b
 What do we have to do if the resulting parsing table contains multiply-defined entries?
• Eliminate left recursion in the grammar, if it has not been eliminated. For A → Aα | β, any terminal that appears in FIRST(β) also appears in FIRST(A), because A ⇒ β. If β is ε, any terminal that appears in FIRST(α) also appears in FIRST(A) and FOLLOW(A).
• Left factor the grammar, if it is not left-factored. A grammar that is not left-factored cannot be an LL(1) grammar: for A → αβ1 | αβ2, any terminal that appears in FIRST(αβ1) also appears in FIRST(αβ2).
• If the new grammar's parsing table still contains multiply-defined entries, the grammar is ambiguous or inherently not LL(1). An ambiguous grammar cannot be an LL(1) grammar.
Error Recovery in Predictive Parsing
 An error may occur in predictive (LL(1)) parsing:
• if the terminal symbol on top of the stack does not match the current input symbol;
• if the top of the stack is a non-terminal A, the current input symbol is a, and the parsing table entry M[A,a] is empty.
 What should the parser do in an error case?
• The parser should be able to give an error message (as meaningful an error message as possible).
• It should recover from the error case and be able to continue parsing the rest of the input.
Contents (Session-2)
 Bottom Up Parsing
 Handle Pruning
 Implementation of A Shift-Reduce Parser
 LR Parsers
 LR Parsing Algorithm
 Actions of an LR-Parser
 Constructing SLR Parsing Tables
 SLR(1) Grammar
 Error Recovery in LR Parsing
Bottom-Up Parsing
 A bottom-up parser creates the parse tree of the given
input starting from leaves towards the root.
• A bottom-up parser tries to find the RMD of the given input in the
reverse order.
 Bottom-up parsing is also known as shift-reduce parsing
because its two main actions are shift and reduce.
• At each shift action, the current symbol in the input string is pushed onto a stack.
• At each reduction step, the symbols at the top of the stack that match the right side of a production are replaced by the non-terminal on the left side of that production.
• Accept: Successful completion of parsing.
• Error: Parser discovers a syntax error, and calls an error recovery
routine.
Bottom-Up Parsing…
 A shift-reduce parser tries to reduce the given input string
into the starting symbol.
a string  the starting symbol
reduced to
 At each reduction step, a substring of the input matching to
the right side of a production rule is replaced by the non-
terminal at the left side of that production rule.
 If the substring is chosen correctly, the right most derivation
of that string is created in the reverse order.
Rightmost derivation: S ⇒rm α
Shift-reduce parser finds: α ⇒ ... ⇒ S (the same steps in reverse)

Shift-Reduce Parsing -- Example
S → aABb          input string: aaabb
A → aA | a
B → bB | b

Reductions:
aaabb
aaAbb  (A → a)
aAbb   (A → aA)
aABb   (B → b)
S      (S → aABb)

Right sentential forms: S ⇒ aABb ⇒ aAbb ⇒ aaAbb ⇒ aaabb
 How do we know which substring is to be replaced at each reduction step?
Handle & Handle pruning
 Handle: A "handle" of a string is a substring of the string that matches the right side of a production, and whose reduction to the non-terminal of the production is one step along the reverse of a rightmost derivation.
 Handle pruning: The process of discovering a handle and reducing it to the appropriate left-hand-side non-terminal is known as handle pruning.

Grammar:                 String: id1+id2*id3
E → E+E
E → E*E
E → id

Rightmost derivation:
E ⇒ E+E ⇒ E+E*E ⇒ E+E*id3 ⇒ E+id2*id3 ⇒ id1+id2*id3

Right sentential form    Handle    Production
id1+id2*id3              id1       E → id
E+id2*id3                id2       E → id
E+E*id3                  id3       E → id
E+E*E                    E*E       E → E*E
E+E                      E+E       E → E+E
E
Handle Pruning
 A rightmost derivation in reverse can be obtained by handle pruning.
S = γ0 ⇒rm γ1 ⇒rm γ2 ⇒rm ... ⇒rm γn-1 ⇒rm γn = ω (the input string)
 Start from γn: find a handle An → βn in γn, and replace βn by An to get γn-1.
 Then find a handle An-1 → βn-1 in γn-1, and replace βn-1 by An-1 to get γn-2.
 Repeat this until we reach S.
Shift reduce parser
 Shift-reduce parsing is a type of bottom-up parsing that attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).
 The shift-reduce parser performs the following basic operations/actions:
1. Shift: moving symbols from the input buffer onto the stack; this action is called a shift.
2. Reduce: if a handle appears on top of the stack, it is reduced by the appropriate rule; this action is called a reduce.
3. Accept: if the stack contains only the start symbol and the input buffer is empty at the same time, that action is called accept.
4. Error: a situation in which the parser can neither shift nor reduce the symbols, and cannot even perform the accept action, is called an error.
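The four actions can be demonstrated mechanically. The sketch below replays a given list of shift/reduce actions, checking that each reduce really has the production body on top of the stack; choosing those actions (i.e. finding the handles) is exactly the job of the LR parsers discussed later. Names and encoding are illustrative:

```python
def run_shift_reduce(tokens, actions, start):
    """Replay shift/reduce actions; return True iff the run accepts."""
    stack, inp = [], list(tokens) + ["$"]
    for act in actions:
        if act == "shift":
            stack.append(inp.pop(0))       # shift: input symbol onto stack
        else:                              # act = (lhs, body): reduce
            lhs, body = act
            n = len(body)
            assert tuple(stack[-n:]) == body, "handle not on top of stack"
            del stack[-n:]                 # reduce: replace handle by lhs
            stack.append(lhs)
    # accept: only the start symbol remains and the input is consumed
    return stack == [start] and inp == ["$"]

# Grammar S -> aABb, A -> aA | a, B -> bB | b; input aaabb (the slide example)
actions = ["shift", "shift", "shift",
           ("A", ("a",)), ("A", ("a", "A")),
           "shift", ("B", ("b",)),
           "shift", ("S", ("a", "A", "B", "b"))]
print(run_shift_reduce(list("aaabb"), actions, "S"))  # True
```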
A Shift-Reduce Parser - example
E → E+T | T
T → T*F | F
F → (E) | id

Right-most derivation of id+id*id:
E ⇒ E+T ⇒ E+T*F ⇒ E+T*id ⇒ E+F*id ⇒ E+id*id ⇒ T+id*id ⇒ F+id*id ⇒ id+id*id

Right-most sentential form    Reducing production
id+id*id                      F → id
F+id*id                       T → F
T+id*id                       E → T
E+id*id                       F → id
E+F*id                        T → F
E+T*id                        F → id
E+T*F                         T → T*F
E+T                           E → E+T
E

(In the slide, the handles are shown red and underlined in the right-sentential forms.)
Implementation of A Shift-Reduce Parser
The initial stack contains only the end-marker $, and the end of the input string is marked by $.

Stack       Input       Action
$           id+id*id$   shift
$id         +id*id$     reduce by F → id
$F          +id*id$     reduce by T → F
$T          +id*id$     reduce by E → T
$E          +id*id$     shift
$E+         id*id$      shift
$E+id       *id$        reduce by F → id
$E+F        *id$        reduce by T → F
$E+T        *id$        shift
$E+T*       id$         shift
$E+T*id     $           reduce by F → id
$E+T*F      $           reduce by T → T*F
$E+T        $           reduce by E → E+T
$E          $           accept
62
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
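The trace above can be reproduced with a small Python sketch. The reduce decisions (e.g. not reducing past a pending `*`) are hard-wired for this grammar's precedence, standing in for the SLR table introduced later; the function name `shift_reduce_parse` is illustrative, not from the slides.

```python
def shift_reduce_parse(tokens):
    """Trace a shift-reduce parse for the grammar
    E -> E+T | T,  T -> T*F | F,  F -> (E) | id,
    mirroring the stack/input/action table above."""
    stack, buf = ["$"], tokens + ["$"]
    actions = []
    while True:
        top, la = stack[-1], buf[0]
        if stack == ["$", "E"] and la == "$":
            actions.append("accept")
            return actions
        if top == "id":                      # handle: id
            stack[-1:] = ["F"]; actions.append("reduce by F -> id")
        elif stack[-3:] == ["(", "E", ")"]:  # handle: (E)
            stack[-3:] = ["F"]; actions.append("reduce by F -> (E)")
        elif stack[-3:] == ["T", "*", "F"]:  # handle: T*F
            stack[-3:] = ["T"]; actions.append("reduce by T -> T*F")
        elif top == "F":                     # handle: F
            stack[-1:] = ["T"]; actions.append("reduce by T -> F")
        elif top == "T" and la != "*":       # don't reduce past a pending '*'
            if stack[-3:] == ["E", "+", "T"]:
                stack[-3:] = ["E"]; actions.append("reduce by E -> E+T")
            else:
                stack[-1:] = ["E"]; actions.append("reduce by E -> T")
        elif la != "$":
            stack.append(buf.pop(0)); actions.append("shift")
        else:
            raise SyntaxError("cannot shift or reduce: error action")

for step in shift_reduce_parse(["id", "+", "id", "*", "id"]):
    print(step)
```

Running it on id+id*id prints exactly the fourteen actions of the table above.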
Conflicts in shift-reduce parsing
 There are two conflicts that occur in shift-reduce parsing:
 Shift-reduce conflict: The parser cannot decide whether to shift or to
reduce.
 Example: Consider the input id+id*id generated from the grammar G:
E->E+E | E*E | id
 NB: If a shift-reduce parser cannot be used for a grammar, that grammar is called
a non-LR(k) grammar. An ambiguous grammar can never be an LR grammar.
63
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
Conflicts in shift-reduce parsing …
64
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
 Reduce-reduce conflict: The parser cannot decide which of
several reductions to make.
Example: Consider the input c+c generated from the grammar:
M → R+R | R+c | R, R→c
Operator precedence parser
65
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
 An efficient way of constructing a shift-reduce parser is called operator-
precedence parsing.
 An operator precedence parser can be constructed from a grammar called an
operator grammar.
 Operator Grammar: a grammar in which no production has Є on the right side
or two adjacent non-terminals is called an operator grammar.
 Example: a grammar with the production E → EAE is not an operator grammar,
because the right side EAE has consecutive non-terminals.
 In operator precedence parsing we define the following disjoint
relations: <· , =· , ·>
Relation  Meaning
a <· b    a “yields precedence to” b
a =· b    a “has the same precedence as” b
a ·> b    a “takes precedence over” b
Rules for binary operators
66
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
 1. If operator θ1 has higher precedence than operator θ2, then
make θ1 ·> θ2 and θ2 <· θ1.
 2. If operators θ1 and θ2 are of equal precedence, then make
θ1 ·> θ2 and θ2 ·> θ1 if the operators are left-associative, or
θ1 <· θ2 and θ2 <· θ1 if they are right-associative.
 3. Make the following for all operators θ:
θ <· id ,  id ·> θ
θ <· ( ,   ( <· θ
) ·> θ ,   θ ·> )
θ ·> $ ,   $ <· θ
 Also make ( =· ) , ( <· ( , ) ·> ) , ( <· id , id ·> ) ,
$ <· id , id ·> $ , $ <· ( , ) ·> $
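Rules 1–3 can be turned into a small table-builder. This Python sketch (the name `make_relations` and the `^` spelling of ↑ are my own, not the slides') constructs the <· / =· / ·> relation table for the operator set {↑, *, /, +, -} plus id, parentheses and $:

```python
# Precedence levels (higher = binds tighter) and associativity,
# matching the table on the next slide: ^ > {*, /} > {+, -}.
PREC = {"^": 3, "*": 2, "/": 2, "+": 1, "-": 1}
RIGHT_ASSOC = {"^"}

def make_relations():
    """Build rel[(a, b)] in {'<', '=', '>'} from rules 1-3 above."""
    rel = {}
    ops = list(PREC)
    for t1 in ops:                       # rules 1 and 2: operator vs operator
        for t2 in ops:
            if PREC[t1] > PREC[t2]:
                rel[(t1, t2)] = ">"
                rel[(t2, t1)] = "<"
            elif PREC[t1] == PREC[t2]:
                rel[(t1, t2)] = "<" if t1 in RIGHT_ASSOC else ">"
    for t in ops:                        # rule 3: operators vs id, (, ), $
        rel[(t, "id")] = "<"; rel[("id", t)] = ">"
        rel[(t, "(")] = "<";  rel[("(", t)] = "<"
        rel[(")", t)] = ">";  rel[(t, ")")] = ">"
        rel[(t, "$")] = ">";  rel[("$", t)] = "<"
    rel.update({("(", ")"): "=", ("(", "("): "<", (")", ")"): ">",
                ("(", "id"): "<", ("id", ")"): ">",
                ("$", "id"): "<", ("id", "$"): ">",
                ("$", "("): "<", (")", "$"): ">"})
    return rel
```

Blank (missing) entries of the resulting table are the error entries, as on the slide.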
Precedence & associativity of operators
67
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
Operator   Precedence    Associativity
↑          1 (highest)   right
*, /       2             left
+, -       3             left
OperatorPrecedence example
Operator-precedence relations for the grammar
E → E+E | E-E | E*E | E/E | E↑E | (E) | -E | id is given in the
following table assuming
1. ↑ is of highest precedence and right-associative
2. * and / are of next higher precedence and left-associative, and
3. + and - are of lowest precedence and left-associative
Note that the blanks in the table denote error entries.
Operator-precedence relations table
68
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
Operator precedence parsing algorithm
69
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
Method: Initially the stack contains $ and the input buffer the string w$.
To parse, we execute the following program:
1. Set ip to point to the first symbol of w$;
2. repeat forever
3.   if $ is on top of the stack and ip points to $ then
4.     return
     else begin
5.     let a be the topmost terminal symbol on the stack and let b be the
       symbol pointed to by ip;
6.     if a <· b or a =· b then begin
7.       push b onto the stack;
8.       advance ip to the next input symbol; end;
9.     else if a ·> b then /* reduce */
10.      repeat
11.        pop the stack
12.      until the top stack terminal is related by <· to the
           terminal most recently popped
13.    else error( ) end
Stack implementation of operator precedence parsing
70
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
 Operator precedence parsing uses a stack and the precedence relation table
to implement the above algorithm. It is a shift-reduce parser with all
four actions: shift, reduce, accept and error.
 Example: Consider the grammar E → E+E | E-E | E*E | E/E | E↑E |
(E) | id. The input string is id+id*id.
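A minimal Python sketch of that stack implementation for the input id+id*id, using a hand-written relation table restricted to {+, *, id, $}. The function names are mine, and nonterminals are represented by a single placeholder "E", since operator-precedence parsing ignores which nonterminal is which:

```python
# Precedence relations among {+, *, id, $}: * binds tighter than +.
REL = {("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
       ("+", "id"): "<", ("+", "*"): "<", ("+", "+"): ">", ("+", "$"): ">",
       ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
       ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">"}

def top_terminal(stack):
    """Topmost terminal on the stack (skip nonterminal placeholders)."""
    return next(s for s in reversed(stack) if s != "E")

def op_precedence_parse(tokens, rel):
    stack, buf, actions = ["$"], tokens + ["$"], []
    while True:
        a, b = top_terminal(stack), buf[0]
        if a == "$" and b == "$":
            actions.append("accept")
            return actions
        r = rel.get((a, b))
        if r in ("<", "="):                  # shift
            stack.append(buf.pop(0))
            actions.append("shift " + stack[-1])
        elif r == ">":                       # reduce: pop the handle
            handle, last_term = [], None
            while True:
                if stack[-1] == "E":         # nonterminals belong to the handle
                    handle.append(stack.pop())
                    continue
                if last_term and rel.get((stack[-1], last_term)) == "<":
                    break                    # left end of the handle found
                last_term = stack.pop()
                handle.append(last_term)
            stack.append("E")
            actions.append("reduce " + "".join(reversed(handle)))
        else:
            raise SyntaxError(f"no precedence relation between {a!r} and {b!r}")
```

For id+id*id this yields five shifts and five reductions (the three ids, then E*E, then E+E), followed by accept.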
Pros and cons of operator precedence parsing
71
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
 Advantages of operator precedence parsing:
1. It is easy to implement.
2. Once the operator precedence relations are made between all pairs of
terminals of a grammar, the grammar can be ignored. The grammar is not
referred to anymore during implementation.
Disadvantages of operator precedence parsing:
1. It is hard to handle tokens like the minus sign (-), which has two
different precedences (unary and binary).
2. Only a small class of grammars can be parsed using an operator-
precedence parser.
 NB: The worst case of computing this table (on the previous slide) is
O(n²), because the table has n × n entries.
 To decrease the cost of computation, an operator precedence function
table can be constructed; the cost/size of the operator function table
is O(2n).
Algorithm for constructing operator precedence functions
72
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
1. Create functions fa and ga for each a that is a terminal or $.
2. Partition the symbols into as many groups as possible, in such a way
that fa and gb are in the same group if a =· b.
3. Create a directed graph whose nodes are the groups; then for each
pair of symbols a and b do:
a) if a <· b, place an edge from the group of gb to the group of fa
b) if a ·> b, place an edge from the group of fa to the group of gb
4. If the constructed graph has a cycle, then no precedence
functions exist.
When there are no cycles, let fa and gb be the lengths of the longest
paths from the groups of fa and gb respectively.
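The algorithm can be sketched directly in Python. The relation table below is the standard one for {+, *, id, $}; since it contains no =· pairs, every fa and ga is its own group, and the longest-path lengths give the f and g values (the function name is mine):

```python
from functools import lru_cache

REL = {("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
       ("+", "id"): "<", ("+", "*"): "<", ("+", "+"): ">", ("+", "$"): ">",
       ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
       ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">"}

def precedence_functions(rel, terminals):
    """Steps 1-4 above: build the f/g graph and take longest path lengths."""
    nodes = [(k, a) for a in terminals for k in "fg"]
    edges = {n: [] for n in nodes}
    for (a, b), r in rel.items():
        if r == "<":                      # a <. b : edge from g_b to f_a
            edges[("g", b)].append(("f", a))
        elif r == ">":                    # a .> b : edge from f_a to g_b
            edges[("f", a)].append(("g", b))
        # an '=' relation would merge the two groups; none occur here

    @lru_cache(maxsize=None)
    def longest(node):                    # graph must be acyclic (step 4)
        return max((1 + longest(m) for m in edges[node]), default=0)

    f = {a: longest(("f", a)) for a in terminals}
    g = {a: longest(("g", a)) for a in terminals}
    return f, g
```

The result is the classic function table f(+)=2, f(*)=4, f(id)=4, f($)=0 and g(+)=1, g(*)=3, g(id)=5, g($)=0, which preserves every relation of the full table.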
Example: constructing operator precedence functions
 Example: a grammar with terminals {+, ∗, id}, e.g. E → E+E | E*E | id.
 Step 1: Create functions fa and ga for each a that is a terminal or $,
i.e. a ∈ {+, ∗, id} or $
73
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
Example: constructing operator precedence functions
 Step 2: Partition the symbols into as many groups as possible, in such
a way that fa and gb are in the same group if a =· b.
 Step 3: If a <· b, place an edge from the group of gb to the group of
fa; if a ·> b, place an edge from the group of fa to the group of gb.
74
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
Example: constructing operator precedence functions
 From this graph we can extract the precedence function table: find the
longest path out of each group.
 Example: count the edges on the longest path for each fa and gb.
 Hence, this table contains the same information as the precedence
relation table, and you can implement parsing with this table and a
stack as before.
75
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
Shift-Reduce Parsers
 The most prevalent type of bottom-up parser today is based on
a concept called LR(k) parsing:
L = left-to-right scan of the input,
R = rightmost derivation in reverse,
k = number of lookahead symbols (when k is omitted, it is 1).
 LR parsers cover a wide range of grammars:
• Simple LR parser (SLR)
• Look-Ahead LR parser (LALR)
• the most general LR parser (LR)
 SLR, LR and LALR work the same way; only their parsing tables are
different.
(Diagram on the slide: the grammar classes nest as SLR ⊂ LALR ⊂ LR ⊂ CFG.)
76
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
LR Parsers
77
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
 LR parsing is attractive because:
• LR parsers can be constructed to recognize virtually all
programming-language constructs for which context-free
grammars can be written.
• LR parsing is most general non-backtracking shift-reduce
parsing, yet it is still efficient.
• The class of grammars that can be parsed using LR methods
is a proper superset of the class of grammars that can be
parsed with predictive parsers.
• LL(1)-Grammars ⊂ LR(1)-Grammars
• An LR parser can detect a syntactic error as soon as it is
possible to do so on a left-to-right scan of the input.
 A drawback of the LR method is that it is too much work
to construct an LR parser by hand.
• Use tools instead, e.g. yacc.
An overview of the LR Parsing model
(Diagram: the LR parser reads the input a1 ... ai ... an $ and maintains a
stack S0 X1 S1 ... Xm-1 Sm-1 Xm Sm of states and grammar symbols, where each
stack item is a state number. The parser is driven by a parsing table with
two parts: an Action table indexed by states and terminals (including $),
in which one of four different actions is applied, and a Goto table indexed
by states and non-terminals. Its interfaces are the stack, the input, and
the output.)
LR Parsing Algorithm
LR parser configuration
A configuration of an LR parser describes the complete state
(behavior) of the parser. It is a pair:
(S0 X1 S1 X2 S2 … Xm Sm ,  ai ai+1 … an $)
where the first component is the stack and the second the remaining input.
79
This configuration represents the right-sentential form
X1 X2 … Xm ai ai+1 … an $
Each Xi is the grammar symbol represented by state Si.
Note: S0 is on the top of the stack at the beginning of parsing.
Behavior of LR parser
The parser program decides the next step by using:
the state on top of the stack (Sm),
the current input symbol (ai), and
the parsing table, which has two parts: ACTION and GOTO,
consulting the entry ACTION[Sm , ai] in the parsing
action table. There are four cases:
1. If Action[Sm, ai] = shift S, the parser program shifts both the
current input symbol ai and state S on the top of the stack,
entering the configuration
(S0 X1 S1 X2 S2 … Xm Sm ai S, ai+1 … an $)
80
Behavior of LR parser…
2. Action[Sm, ai] = reduce A → β: the parser pops the top 2r
symbols off the stack, where r = |β| (at this point, Sm-r will
be the state on top of the stack), entering the configuration
(S0 X1 S1 X2 S2 … Xm-r Sm-r A S, ai ai+1 … an $)
Then A and S are pushed on top of the stack where
S = goto[Sm-r, A]. The input buffer is not modified.
3. Action[Sm, ai] = accept, parsing is completed.
4. Action[Sm, ai] = error, parsing has discovered an error and
calls an error recovery routine.
81
LR-parsing algorithm
METHOD: Initially, the parser has s0 on its stack, where s0 is the initial state,
and w$ in the input buffer.
Let a be the first symbol of w$;
while(1)
{ /* repeat forever */
let S be the state on top of the stack;
if ( ACTION[S, a] = shift t )
{ push t onto the stack;
let a be the next input symbol;
}
else if ( ACTION[S, a] = reduce Aβ) //reduce previous input symbol to head
{ pop |β| symbols off the stack;
let state t now be on top of the stack;
push GOTO[t, A] onto the stack;
output the production Aβ;
} else if ( ACTION[S, a] = accept ) break; /* parsing is done */
else call error-recovery routine;
}
82
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
(SLR) Parsing Tables for Expression Grammar
state |  id    +    *    (    )    $   |  E   T   F
  0   |  s5              s4            |  1   2   3
  1   |        s6                 acc  |
  2   |        r2   s7        r2   r2  |
  3   |        r4   r4        r4   r4  |
  4   |  s5              s4            |  8   2   3
  5   |        r6   r6        r6   r6  |
  6   |  s5              s4            |      9   3
  7   |  s5              s4            |          10
  8   |        s6            s11       |
  9   |        r1   s7        r1   r1  |
 10   |        r3   r3        r3   r3  |
 11   |        r5   r5        r5   r5  |
Columns id + * ( ) $ form the Action Table; columns E T F form the Goto Table.
Expression Grammar:
1) E → E+T
2) E → T
3) T → T*F
4) T → F
5) F → (E)
6) F → id
Steps for Constructing SLR Parsing Tables
84
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
1. Augment G and produce G' (The purpose of this new starting
production is to indicate to the parser when it should stop parsing and
announce acceptance of the input)
2. Construct the canonical collection of set of items C for G'
(finding closure/LR(0) items)
3. Compute goto(I, X), where, I is set of items and X is grammar
symbol.
4. Construction of DFA using GOTO result and find FOLLOW set
5. Construct LR(0) parsing table using the DFA result and
FOLLOW set
 An LR(0) item of a grammar G is a production of G with a dot at
some position of the right side.
• Ex: A → aBb has four possible LR(0) items:
A → .aBb
A → a.Bb
A → aB.b
A → aBb.
 Sets of LR(0) items will be the states of action and goto table of
the SLR parser.
• i.e. States represent sets of "items.“
 A collection of sets of LR(0) items (the canonical LR(0) collection)
is the basis for constructing a deterministic finite automaton that
is used to make parsing decisions.
Such an automaton is called an LR(0) automaton.
Constructing SLR Parsing Tables – LR(0) Item
 An LR parser makes shift-reduce decisions by maintaining states
to keep track of where we are in a parse.
85
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
Constructing SLR Parsing Table: closure & GOTO operations
 Closure operation:
 If I is a set of items for a grammar G, then closure(I) is the set of items
constructed from I by two rules:
1. Initially, every item in I is added to closure(I).
2. If A → α.Bβ is in closure(I) and B → γ is a production, then add the item
B → .γ to closure(I), if it is not already there. We apply this rule until
no more new items can be added to closure(I).
 Goto operation: Goto(I, X), where I is a set of items and X is a grammar
symbol.
Goto(I, X) is defined to be the closure of the set of all items [A → αX.β]
such that [A → α.Xβ] is in I.
GOTO(I, X) is computed by the following rules:
1. If A → α.Xβ is in I and the symbol after the new dot position is a
terminal, include only the item A → αX.β.
2. If the symbol after the new dot position is a non-terminal B, include the
item together with the closure of B's productions.
86
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
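These two operations are easy to express in Python. The sketch below uses the grammar S' → S, S → AA, A → aA | b from the upcoming LR(0) example, with an item encoded as a (head, body, dot-position) tuple; the function names follow the slide:

```python
# Augmented grammar S' -> S, S -> AA, A -> aA | b; bodies are tuples.
GRAMMAR = {"S'": [("S",)], "S": [("A", "A")], "A": [("a", "A"), ("b",)]}
NONTERMS = set(GRAMMAR)

def closure(items):
    """Rule 2 above: keep adding B -> .gamma for every item with a dot
    before the non-terminal B, until nothing new appears."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(result):
            if dot < len(body) and body[dot] in NONTERMS:
                for prod in GRAMMAR[body[dot]]:
                    if (body[dot], prod, 0) not in result:
                        result.add((body[dot], prod, 0))
                        changed = True
    return frozenset(result)

def goto(items, X):
    """Move the dot past X in every item where the dot precedes X,
    then take the closure of the result."""
    return closure({(h, b, d + 1) for h, b, d in items
                    if d < len(b) and b[d] == X})
```

For example, closure of {S' → .S} gives the four items of I0, and goto(I0, A) gives the three items of the state for S → A.A.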
Algorithm for construction of SLR parsing Table
 Input:An augmented grammar G’
Output: The SLR parsing table functions action and goto for G’
Method:
1. Construct C = {I0, I1, …. In}, collection of sets of LR(0) items for G’.
2. State i is constructed from Ii.. The parsing functions for state i are determined as
follows:
If [A→α∙aβ] is in Ii and goto(Ii,a) = Ij, then set action[i,a] to
“shift j”. Here a must be terminal.
If [A→α∙] is in Ii, then set action[i,a] to “reduce A→α” for all a in
FOLLOW(A).
If [S’→S.] is in Ii, then set action[i,$] to “accept”.
 If any conflicting actions are generated by the above rules, we say grammar is not
SLR(1).
3. The goto transitions for state i are constructed for all non-terminals A
using the rule: If goto(Ii,A) = Ij, then goto[i,A] = j.
4. All entries not defined by rules (2) and (3) are made “error”
5. The initial state of the parser is the one constructed from the set of items
containing [S’→.S].
87
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
Example for LR(0) parser
 Example of LR(0): Let the grammar G1 be: S → AA, A → aA | b
88
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
Example of LR(0) parsing Table
 Step 4: Construct a DFA
 NB: I1, I4, I5, I6 contain final items. They lead to filling the
‘reduce’/ri actions in the corresponding rows of the action part of the table.
89
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
Example of LR(0) parsing Table
 Step 5: construct the LR(0) parsing table.
First label the grammar productions with integer values, i.e.
S → AA ---1
A → aA ---2
A → b ---3
LR(0) parsing table
NB: In LR(0) table construction, whenever a state contains a final item,
fill that entire row of the action part with Ri.
e.g. in row 4, put R3, where 3 is the label of the production A → b in G.
90
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
Example of LR(0) parsing Table
 Step 6: check the parser with a stack implementation on the string abb$.
91
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
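Steps 2–6 can be checked mechanically for the string abb. The sketch below repeats the closure/goto helpers so it stands alone, builds the canonical LR(0) collection, and runs the LR(0) driver, which reduces whenever the current state contains a final item. State numbering here is construction order, so it may differ from the slide's I0–I6:

```python
GRAMMAR = {"S'": [("S",)], "S": [("A", "A")], "A": [("a", "A"), ("b",)]}
NONTERMS = set(GRAMMAR)

def closure(items):
    result = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(result):
            if dot < len(body) and body[dot] in NONTERMS:
                for prod in GRAMMAR[body[dot]]:
                    if (body[dot], prod, 0) not in result:
                        result.add((body[dot], prod, 0)); changed = True
    return frozenset(result)

def goto(items, X):
    return closure({(h, b, d + 1) for h, b, d in items
                    if d < len(b) and b[d] == X})

def canonical_collection():
    """All reachable sets of LR(0) items plus the transition map."""
    states = [closure({("S'", ("S",), 0)})]
    trans, i = {}, 0
    while i < len(states):
        for X in sorted({b[d] for h, b, d in states[i] if d < len(b)}):
            J = goto(states[i], X)
            if J not in states:
                states.append(J)
            trans[(i, X)] = states.index(J)
        i += 1
    return states, trans

def lr0_parse(tokens):
    states, trans = canonical_collection()
    stack, buf, output = [0], tokens + ["$"], []
    while True:
        final = [(h, b) for h, b, d in states[stack[-1]] if d == len(b)]
        if final:                        # LR(0): reduce regardless of lookahead
            head, body = final[0]        # >1 final item => grammar not LR(0)
            if head == "S'":
                return output            # accept
            del stack[-len(body):]
            stack.append(trans[(stack[-1], head)])
            output.append((head, body))
        else:                            # no final item => shift
            stack.append(trans[(stack[-1], buf.pop(0))])  # KeyError = error
```

Parsing abb produces the reductions A → b, A → aA, A → b, S → AA and then accepts, matching the rightmost derivation S ⇒ AA ⇒ Ab ⇒ aAb ⇒ abb in reverse.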
Example of SLR(1) parsing Table
92
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
 Example of SLR(1) parsing table: use the grammar previously used in the
LR(0) example.
 Constructing an SLR(1) parser is the same as constructing an LR(0) parser;
the only difference is in building the parsing table: to fill the reduce
entries (Ri), we must use the FOLLOW sets.
 If the input terminal belongs to the FOLLOW set of the head, fill the
corresponding cell; otherwise leave it empty. Hence apply only this
difference.
Find the FOLLOW set of each non-terminal:
S → AA ---1
A → aA ---2
A → b ---3
 FOLLOW(S) = {$}, FOLLOW(A) = {$, a, b}
 A → b. is a final item and FOLLOW(A) = {$, a, b}, so fill row 4 of the
action part with R3.
 S → AA. is a final item and FOLLOW(S) = {$}, so fill the $ cell of row 5
with R1.
 A → aA. is a final item and FOLLOW(A) = {$, a, b}, so fill row 6 of the
action part with R2.
Example of SLR(1) parsing Table
SLR(1) parsing table
 Stack implementation is the same as for LR(0)
93
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
Constructing SLR parsing tables…
Exercise: Construct the SLR parsing table for the
following grammar and show how the input string
id * id + id is parsed:
E → E + T
E → T
T → T * F
T → F
F → (E)
F → id
Answer:
I = {[E’ → .E]}
Closure(I) = {[E’ → .E], [E → .E + T], [E → .T], [T → .T * F],
[T → .F], [F → .(E)], [F → .id]}
95
Constructing SLR parsing tables…
I0 = {[E’ → .E], [E → .E + T], [E → .T], [T → .T * F],
[T → .F], [F → .(E)], [F → .id]}
I1 = Goto (I0, E) = {[E’ → E.], [E → E. + T]}
I2 = Goto (I0, T) = {[E → T.], [T → T. * F]}
I3 = Goto (I0, F) = {[T → F.]}
I4 = Goto (I0, () = {[F → (.E)], [E → .E + T], [E → .T],
[T → .T * F], [T → .F], [F → .(E)], [F → .id]}
I5 = Goto (I0, id) = {[F → id.]}
I6 = Goto (I1, +) = {[E → E + .T], [T → .T * F], [T → .F],
[F → .(E)], [F → .id]}
96
I7 = Goto (I2, *) = {[T → T * .F], [F → .(E)],
[F → .id]}
I8 = Goto (I4, E) = {[F → (E.)], [E → E. + T]}
Goto (I4, T) = {[E → T.], [T → T. * F]} = I2;
Goto (I4, F) = {[T → F.]} = I3;
Goto (I4, () = I4;
Goto (I4, id) = I5;
I9 = Goto (I6, T) = {[E → E + T.], [T → T. * F]}
Goto (I6, F) = I3;
Goto (I6, () = I4;
Goto (I6, id) = I5;
I10 = Goto (I7, F) = {[T → T * F.]}
Goto (I7, () = I4;
Goto (I7, id) = I5;
I11 = Goto (I8, )) = {[F → (E).]}
Goto (I8, +) = I6;
Goto (I9, *) = I7;
LR(0) automaton
98
SLR table construction method…
Construct the SLR parsing table for the augmented grammar G’
Follow (E) = {+, ), $}  Follow (T) = {+, ), $, *}
Follow (F) = {+, ), $, *}
E’ → E
1 E → E + T
2 E → T
3 T → T * F
4 T → F
5 F → (E)
6 F → id
99
100
State |           action               |    goto
      |  id    +    *    (    )    $  |  E   T   F
  0   |  S5              S4           |  1   2   3
  1   |        S6             accept  |
  2   |        R2   S7        R2  R2  |
  3   |        R4   R4        R4  R4  |
  4   |  S5              S4           |  8   2   3
  5   |        R6   R6        R6  R6  |
  6   |  S5              S4           |      9   3
  7   |  S5              S4           |          10
  8   |        S6            S11      |
  9   |        R1   S7        R1  R1  |
 10   |        R3   R3        R3  R3  |
 11   |        R5   R5        R5  R5  |
SLR parser…
How a shift/reduce parser parses an input string w = id * id + id using
the parsing table shown above.
3-101
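The parse of w = id * id + id can be checked mechanically by transcribing the table into Python. The dict encoding and function name are mine; sN means shift to state N, rN means reduce by production N, and the stack holds only state numbers (the grammar symbols are implicit):

```python
# head nonterminal and body length for productions 1-6 of the grammar above
PRODS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
         4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {
    (0, "id"): "s5", (0, "("): "s4",
    (1, "+"): "s6", (1, "$"): "acc",
    (2, "+"): "r2", (2, "*"): "s7", (2, ")"): "r2", (2, "$"): "r2",
    (3, "+"): "r4", (3, "*"): "r4", (3, ")"): "r4", (3, "$"): "r4",
    (4, "id"): "s5", (4, "("): "s4",
    (5, "+"): "r6", (5, "*"): "r6", (5, ")"): "r6", (5, "$"): "r6",
    (6, "id"): "s5", (6, "("): "s4",
    (7, "id"): "s5", (7, "("): "s4",
    (8, "+"): "s6", (8, ")"): "s11",
    (9, "+"): "r1", (9, "*"): "s7", (9, ")"): "r1", (9, "$"): "r1",
    (10, "+"): "r3", (10, "*"): "r3", (10, ")"): "r3", (10, "$"): "r3",
    (11, "+"): "r5", (11, "*"): "r5", (11, ")"): "r5", (11, "$"): "r5",
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
        (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
        (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}

def lr_parse(tokens):
    """Run the LR driver of the earlier slides over the SLR table;
    returns the production numbers used, in reduction order."""
    stack, buf, output = [0], tokens + ["$"], []
    while True:
        act = ACTION.get((stack[-1], buf[0]))
        if act is None:
            raise SyntaxError(f"syntax error at {buf[0]!r}")
        if act == "acc":
            return output
        if act.startswith("s"):               # shift: consume token, push state
            buf.pop(0)
            stack.append(int(act[1:]))
        else:                                 # reduce by production act[1:]
            n = int(act[1:])
            head, size = PRODS[n]
            del stack[-size:]                 # pop |body| states
            stack.append(GOTO[(stack[-1], head)])
            output.append(n)
```

For id * id + id the reductions come out as 6, 4, 6, 3, 2, 6, 4, 1, i.e. F→id, T→F, F→id, T→T*F, E→T, F→id, T→F, E→E+T: the rightmost derivation in reverse.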
LALR and CLR parser
102
Chapter – 3 : Syntax Analysis Bahir Dar Institute of Technology
 NB: LR(0) and SLR(1) use LR(0) items to create the parsing table,
but LALR and CLR parsers use LR(1) items in order to construct the
parsing table.
 Reading assignment
• LALR parser and
• CLR parser
Ch3_Syntax Analysis.pptx

Some issues related to the parser

Syntax error handling:
Programs can contain errors at many different levels. For example:
1. Lexical, such as misspelling a keyword.
2. Syntactic, such as an arithmetic expression with unbalanced parentheses.
3. Semantic, such as an operator applied to an incompatible operand.
4. Logical, such as an infinitely recursive call.

Functions of the error handler:
1. It should report the presence of errors clearly and accurately.
2. It should recover from each error quickly enough to be able to detect subsequent errors.
3. It should not significantly slow down the processing of correct programs.

Chapter – 3 : Syntax Analysis, Bahir Dar Institute of Technology

Error recovery strategies

 The different strategies that a parser uses to recover from a syntactic error are:
 Panic mode recovery: On discovering an error anywhere in the statement, the parser discards the rest of the statement, skipping the input from the point of the error up to a delimiter such as a semicolon or end. It has the advantage of simplicity and never goes into an infinite loop. When multiple errors in the same statement are rare, this method is quite useful.
 Phrase level recovery: On discovering an error, the parser performs a local correction on the remaining input that allows it to continue. Examples: inserting a missing semicolon, replacing a comma with a semicolon, or deleting an extraneous semicolon.

Error recovery strategies…

 Error productions: The parser is constructed using a grammar augmented with error productions for some common errors that may occur. The augmented grammar contains productions that generate the erroneous constructs, so the parser can detect and report these errors when they are encountered.
 Global correction: Given an incorrect input string x and a grammar G, certain algorithms can be used to find a parse tree for some closest error-free string y. This may allow the parser to make minimal changes in the source code. However, these methods are in general too costly in terms of time and space.

Types of parsers for grammars

 Universal parsers: Universal parsing methods such as the Cocke-Younger-Kasami algorithm and Earley's algorithm can parse any grammar. These general methods are too inefficient for production use and are not commonly used in compilers.
 Top-down parsers: Top-down methods build parse trees from the top (root) to the bottom (leaves).
 Bottom-up parsers: Bottom-up methods start from the leaves and work their way up to the root.

Context Free Grammars (CFG)

 A CFG is used to specify the structure of legal programs.
• The design of the grammar is an initial phase of the design of a programming language.
 Formally, a CFG G = (V, Ʃ, S, P), where:
• V is the set of non-terminals, variables that denote sets of (sub)strings occurring in the language. These impose a structure on the grammar.
• Ʃ is the set of terminal symbols in the grammar (i.e., the set of tokens returned by the scanner).
• S is the start/goal symbol, a distinguished non-terminal in V denoting the entire set of strings in L(G).
• P is a finite set of productions specifying how terminals and non-terminals can be combined to form strings in the language. Each production must have a single non-terminal on its left-hand side.
 The set V ∪ Ʃ is called the vocabulary of G.

Context Free Grammars (CFG)…

 Example (G1):
E → E + E | E - E | E * E | E / E | - E
E → ( E )
E → id
• where Vt = {+, -, *, /, (, ), id} and Vn = {E}
• the start symbol is E
• the productions are shown above
• sometimes → is written as ::=
 A CFG is more expressive than a RE: every language that can be described by regular expressions can also be described by a CFG.
• L = {a^n b^n | n ≥ 1} is an example of a language that can be expressed by a CFG but not by a RE.
 Context-free grammars are sufficient to describe most programming languages.

Notational conventions

1. These symbols are terminals:
A. Lowercase letters early in the alphabet, such as a, b, c.
B. Operator symbols such as +, *, and so on.
C. Punctuation symbols such as parentheses, comma, and so on.
D. The digits 0, 1, ..., 9.
E. Boldface strings such as id or if, each of which represents a single terminal symbol.
2. These symbols are non-terminals:
i. Uppercase letters early in the alphabet, such as A, B, C.
ii. The letter S, which, when it appears, is usually the start symbol.
iii. Lowercase, italic names such as expr or stmt.
iv. Uppercase letters may be used to represent non-terminals for particular constructs. For example, non-terminals for expressions, terms, and factors are often represented by E, T, and F, respectively.
3. Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols, that is, either terminals or non-terminals.

Notational conventions (contd.)

4. Lowercase letters late in the alphabet, chiefly u, v, ..., z, represent (possibly empty) strings of terminals.
5. Lowercase Greek letters, α, β, γ for example, represent (possibly empty) strings of grammar symbols.
 Thus, a generic production can be written as A → α, where A is the head and α the body.
6. A set of productions A → α1, A → α2, ..., A → αk with a common head A (call them A-productions) may be written A → α1 | α2 | ... | αk.
 Call α1, α2, ..., αk the alternatives for A.
7. Unless stated otherwise, the head of the first production is the start symbol.

Example:
E → E+T | E-T | T
T → T*F | T/F | F
F → (E) | id
• The notational conventions tell us that E, T, and F are non-terminals, with E the start symbol.
• The remaining symbols are terminals.

Derivation

 A sequence of replacements of non-terminal symbols to obtain strings/sentences is called a derivation.
• If we have a grammar with E → E+E, then we can replace E by E+E.
• In general, a derivation step is αAβ ⇒ αγβ if there is a production rule A → γ in the grammar, where α and β are arbitrary strings of terminal and non-terminal symbols.
 A derivation of a string should start from a production with the start symbol on the left.
 If S ⇒* α, then α is a sentential form (terminals and non-terminals mixed); α is a sentence if it contains only terminal symbols.

Derivation…

 Derivations are classified into two types based on the order in which productions are applied:
 Leftmost derivation (LMD): if the leftmost non-terminal is replaced by its production at each step, the derivation is called leftmost.
 Rightmost derivation (RMD): if the rightmost non-terminal is replaced by its production at each step, the derivation is called rightmost.
 E.g., given the grammar G: E → E+E | E*E | ( E ) | - E | id
 LMD for - ( id + id ): E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
 RMD for - ( id + id ): E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)

Parse Tree

 Parsing is the process of analyzing a continuous stream of input in order to determine its grammatical structure with respect to a given formal grammar.
 A parse tree is a graphical representation of a derivation. It is a convenient way to see how strings are derived from the start symbol. The start symbol of the derivation becomes the root of the parse tree.
• Inner nodes of a parse tree are non-terminal symbols.
• The leaves of a parse tree are terminal symbols.
 Example (the slide shows the parse tree growing with each derivation step):
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)

Ambiguity

 An ambiguous grammar is one that produces more than one LMD or more than one RMD for the same sentence. For the sentence id+id*id and the grammar E → E+E | E*E | id, there are two leftmost derivations (and hence two parse trees, shown on the slide):
E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id

Ambiguity…

 For most parsers, the grammar must be unambiguous.
 If a grammar is unambiguous, then there is a unique parse tree for each sentence.
• We should eliminate the ambiguity in the grammar during the design phase of the compiler.
 An equivalent unambiguous grammar should be written to eliminate the ambiguity.
• To disambiguate the grammar, we have to prefer one of the parse trees of a sentence (generated by the ambiguous grammar) and restrict the grammar to that choice.

Left Recursion

 A grammar is left recursive if it has a non-terminal A such that there is a derivation A ⇒+ Aα for some string α.
 Top-down parsing techniques cannot handle left-recursive grammars.
 So, we have to convert a left-recursive grammar into an equivalent grammar which is not left-recursive.
 Two types of left recursion:
• immediate left recursion - appears in a single step of the derivation (A ⇒ Aα),
• indirect left recursion - appears in more than one step of the derivation.

Eliminating left recursion

A → Aα | β    (where β does not start with A)

eliminate the left recursion:

A → βA'
A' → αA' | ε

 In general, for A → Aα1 | ... | Aαm | β1 | ... | βn, where β1 ... βn do not start with A:
A → β1A' | ... | βnA'
A' → α1A' | ... | αmA' | ε

Eliminating left recursion…

Example:
E → E+T | T
T → T*F | F
F → (E) | id

After eliminating the left recursion the grammar becomes:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

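The transformation above is mechanical, so it can be sketched in code. This is a minimal sketch, not from the slides: the function name and the grammar encoding (a non-terminal plus a list of alternatives, each a list of symbols) are my own, and only immediate left recursion is handled.

```python
# A sketch (assumptions: my own names and grammar encoding) of removing
# immediate left recursion: A -> A a1 | ... | A am | b1 | ... | bn becomes
# A -> b1 A' | ... | bn A'  and  A' -> a1 A' | ... | am A' | eps.

def eliminate_immediate_left_recursion(head, alternatives):
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    others = [alt for alt in alternatives if not alt or alt[0] != head]
    if not recursive:
        return {head: alternatives}          # nothing to do
    new_head = head + "'"
    return {
        head: [alt + [new_head] for alt in others],
        new_head: [alt + [new_head] for alt in recursive] + [[]],  # [] = eps
    }

# E -> E + T | T  becomes  E -> T E',  E' -> + T E' | eps
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```

Indirect left recursion needs the additional substitution pass described on the slides before this function applies.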
Left factoring

 When a non-terminal has two or more productions whose right-hand sides start with the same grammar symbols, the grammar is not LL(1) and cannot be used for predictive parsing.
 A predictive parser (a top-down parser without backtracking) insists that the grammar must be left-factored.
 In general: A → αβ1 | αβ2, where α is a non-empty common prefix of the two alternatives.

Left factoring…

When processing α we do not know whether to expand A to αβ1 or to αβ2, but if we re-write the grammar as follows:
A → αA'
A' → β1 | β2
then we can immediately expand A to αA'.

Example: given the following grammar:
S → iEtS | iEtSeS | a
E → b
Left factored, this grammar becomes:
S → iEtSS' | a
S' → eS | ε
E → b

Left factoring…

The following grammar:
stmt → if expr then stmt else stmt
     | if expr then stmt
cannot be parsed by a predictive parser that looks one element ahead. But the grammar can be re-written:
stmt → if expr then stmt stmt'
stmt' → else stmt | ε
where ε is the empty string.
Rewriting a grammar to eliminate multiple productions starting with the same token is called left factoring.

Left factoring…

Example 1: A → abB | aB | cdg | cdeB | cdfB
Step 1: A → aA' | cdg | cdeB | cdfB
        A' → bB | B
Step 2: A → aA' | cdA''
        A' → bB | B
        A'' → g | eB | fB

Example 2: A → ad | a | ab | abc | b
Step 1: A → aA' | b
        A' → d | ε | b | bc
Step 2: A → aA' | b
        A' → d | ε | bA''
        A'' → ε | c

Context-Free Grammars versus Regular Expressions

 Every regular language is a context-free language, but not vice-versa.
 Example: the regular expression (a|b)*abb and a corresponding grammar describe the same language, the set of strings of a's and b's ending in abb. So we can easily describe this language either by a finite automaton or by a PDA.
 On the other hand, the language L = {a^n b^n | n ≥ 1}, with an equal number of a's and b's, is a prototypical example of a language that can be described by a grammar but not by a regular expression.
 We can say that "finite automata cannot count", meaning that a finite automaton cannot accept a language like {a^n b^n | n ≥ 1} that would require it to keep count of the number of a's before it sees the b's.
 So these kinds of languages (context-free languages) are accepted by PDAs, as a PDA uses a stack as its memory.

Context-Free Grammars versus Regular Expressions…

 The general comparison of Regular Expressions vs. Context-Free Grammars is given in the table on the slide.

Parsing

 Top Down Parsing
 Recursive-Descent Parsing
 Predictive Parser
• Recursive Predictive Parsing
• Non-Recursive Predictive Parsing
 LL(1) Parser – Parser Actions
 Constructing LL(1) Parsing Tables
 Computing FIRST and FOLLOW functions
 LL(1) Grammars
 Properties of LL(1) Grammars

Parsing methods

 Top-down parsing
• Recursive descent (involves backtracking)
• Predictive parsing (parsing without backtracking)
- Recursive predictive
- Non-recursive predictive, or LL(1)
 Bottom-up parsing (shift-reduce)
• Operator precedence
• LR parsing: SLR, CLR, LALR

Top Down Parsing

 Top-down parsing involves constructing a parse tree for the input string, starting from the root.
• Basically, top-down parsing can be viewed as finding a leftmost derivation (LMD) for an input string.
 How does it work? Start with a tree of one node labeled with the start symbol and repeat the following steps until the fringe of the parse tree matches the input string:
1. At a node labeled A, select a production with A on its LHS and, for each symbol on its RHS, construct the appropriate child.
2. When a terminal is added to the fringe that doesn't match the input string, backtrack.
3. Find the next node to be expanded.
 ! Minimize the number of backtracks as much as possible.

Top Down Parsing…

 Two types of top-down parsing:
• Recursive-Descent Parsing
- Backtracking is needed (if a choice of a production rule does not work, we backtrack to try other alternatives).
- It is a general parsing technique, but not widely used because it is not efficient.
• Predictive Parsing
- No backtracking, and hence efficient.
- Needs a special form of grammars (LL(1) grammars).
- Two types:
· Recursive Predictive Parsing is a special form of Recursive-Descent Parsing without backtracking.
· Non-Recursive (Table-Driven) Predictive Parsing is also known as LL(1) parsing.

Recursive-Descent Parsing

 As the name indicates, recursive descent uses recursive functions to implement predictive parsing.
 It tries to find the left-most derivation.
 Backtracking is needed.
 Example:
S → aBc
B → bc | b
input: abc
(The slide shows the partial tree with children a, B, c: trying B → bc first consumes "bc" and leaves no input for the final c, so the parser backtracks and tries B → b, which succeeds.)

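The backtracking in this tiny example can be sketched directly. This is a minimal sketch, not from the slides: the function names and the use of a generator to enumerate the positions where B can succeed are my own.

```python
# A sketch of recursive descent with backtracking for
#   S -> a B c,  B -> b c | b
# (assumption: input is a plain string of terminal characters).

def parse_B(s, i):
    """Try each alternative of B in order; yield each position that succeeds."""
    for alt in ("bc", "b"):                       # B -> bc | b
        if s.startswith(alt, i):
            yield i + len(alt)

def parse_S(s):
    """S -> a B c; backtrack over the choices for B."""
    if not s.startswith("a"):
        return False
    for j in parse_B(s, 1):                       # try B -> bc first, then B -> b
        if s[j:] == "c":                          # the rest must be exactly 'c'
            return True                           # found a full parse
    return False

print(parse_S("abc"))    # B -> bc fails on the missing final c; B -> b succeeds
print(parse_S("abcc"))   # here B -> bc is the right choice
```

On "abc" the first alternative B → bc consumes everything up to the end, the final c of S is missing, and the loop falls through to B → b: exactly the backtracking step described above.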
Recursive-Descent Parsing…

 A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop.
 Hence, elimination of left recursion must be done before parsing.
 Consider the grammar for arithmetic expressions. After eliminating the left recursion, the grammar becomes:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

Recursive-Descent Parsing…

 Then write a recursive procedure for each non-terminal of the grammar, along these lines:

Procedure E()
begin
  T(); EPRIME();
end

Procedure EPRIME()
begin
  if input_symbol = '+' then
    ADVANCE(); T(); EPRIME();
end

Procedure T()
begin
  F(); TPRIME();
end

Procedure TPRIME()
begin
  if input_symbol = '*' then
    ADVANCE(); F(); TPRIME();
end

Procedure F()
begin
  if input_symbol = 'id' then
    ADVANCE();
  else if input_symbol = '(' then
    ADVANCE(); E();
    if input_symbol = ')' then ADVANCE(); else ERROR();
  else
    ERROR();
end

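The procedures above translate almost line-for-line into a runnable sketch. This is my own rendering, not the slides' code: the class layout, token handling, and error reporting are simplifications, and tokens are assumed to be pre-lexed strings such as "id", "+", "(".

```python
# A sketch of predictive recursive descent for
#   E -> T E', E' -> + T E' | eps, T -> F T', T' -> * F T' | eps, F -> ( E ) | id

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens + ["$"]     # $ marks end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {self.peek()!r}")
        self.pos += 1

    def E(self):
        self.T(); self.Eprime()

    def Eprime(self):
        if self.peek() == "+":           # E' -> + T E'
            self.advance("+"); self.T(); self.Eprime()
        # otherwise E' -> eps: consume nothing

    def T(self):
        self.F(); self.Tprime()

    def Tprime(self):
        if self.peek() == "*":           # T' -> * F T'
            self.advance("*"); self.F(); self.Tprime()

    def F(self):
        if self.peek() == "id":          # F -> id
            self.advance("id")
        elif self.peek() == "(":         # F -> ( E )
            self.advance("("); self.E(); self.advance(")")
        else:
            raise SyntaxError(f"unexpected token {self.peek()!r}")

def accepts(tokens):
    p = Parser(tokens)
    try:
        p.E()
        return p.peek() == "$"           # all input must be consumed
    except SyntaxError:
        return False

print(accepts(["id", "+", "id", "*", "id"]))   # True
print(accepts(["id", "+", "*"]))               # False
```

No backtracking is needed: because the grammar is left-factored and non-left-recursive, a single lookahead token always selects the production.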
Predictive Parser

• Predictive parsers always build the syntax tree from the root down to the leaves and are hence also called (deterministic) top-down parsers.
• Predictive parsing can be recursive or non-recursive.
• In recursive predictive parsing, each non-terminal corresponds to a procedure/function.

Recursive Predictive Parsing…

 When to apply ε-productions?
A → aA | bB | ε
• If all other productions fail, we should apply an ε-production.
• For example, if the current token is not a or b, we may apply the ε-production.
• Most correct choice: we should apply an ε-production for a non-terminal A when the current token is in the FOLLOW set of A (the terminals that can follow A in sentential forms).

Recursive Predictive Parsing…

Non-Recursive Predictive Parsing

 A non-recursive predictive parser can be built by maintaining a stack explicitly, rather than implicitly via recursive calls.
 Non-recursive predictive parsing is a table-driven top-down parsing method.
(The slide shows the model of a table-driven predictive parser: input buffer, stack, parsing table, and output.)

Non-Recursive Predictive Parsing…

 Input buffer
• holds the string to be parsed. We will assume that its end is marked with a special symbol $.
 Output
• a production rule representing a step of the derivation sequence (left-most derivation) of the string in the input buffer.
 Stack
• contains the grammar symbols
• at the bottom of the stack, there is a special end-marker symbol $
• initially the stack contains only the symbol $ and the start symbol S
• when the stack is emptied (i.e. only $ is left in the stack), parsing is complete.
 Parsing table
• a two-dimensional array M[A,a]
• each row is a non-terminal symbol
• each column is a terminal symbol or the special symbol $
• each entry holds a production rule.

Non-Recursive Predictive /LL(1) Parser – Parser Actions

 The symbol at the top of the stack (say X) and the current symbol in the input string (say a) determine the parser action.
 There are four possible parser actions:
1. If X and a are both $ → the parser halts (successful completion).
2. If X and a are the same terminal symbol (different from $) → the parser pops X from the stack and advances to the next symbol in the input buffer.
3. If X is a non-terminal → the parser looks at the parsing table entry M[X,a]. If M[X,a] holds a production rule X → Y1Y2...Yk, it pops X from the stack and pushes Yk, Yk-1, ..., Y1 onto the stack, so that Y1 is on top. The parser also outputs the production rule X → Y1Y2...Yk to represent a step of the derivation.
4. None of the above → error:
• all empty entries in the parsing table are errors;
• if X is a terminal symbol different from a, this is also an error case.

Non-Recursive Predictive /LL(1) Parser – Parser Actions

General algorithm of the LL(1) parser:
METHOD: Initially, the parser is in a configuration with w$ in the input buffer and the start symbol S of G on top of the stack, above $. The following procedure uses the predictive parsing table M to produce a predictive parse for the input:

set ip to point to the first symbol of w;
set X to the top stack symbol;
while (X ≠ $) {  /* stack is not empty */
  let a be the symbol pointed to by ip;
  if (X = a) pop the stack and advance ip;
  else if (X is a terminal) error();
  else if (M[X,a] is an error entry) error();
  else if (M[X,a] = X → Y1Y2...Yk) {
    output the production X → Y1Y2...Yk;
    pop the stack;
    push Yk, Yk-1, ..., Y1 onto the stack, with Y1 on top;
  }
  set X to the top stack symbol;
}

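The loop above can be sketched in a few lines of code. This is my own minimal rendering: the parsing table is written by hand for the grammar S → aBa, B → bB | ε (the grammar of the example that follows), `[]` stands for an ε body, and all names are assumptions of this sketch.

```python
# A sketch of the table-driven LL(1) loop for S -> aBa, B -> bB | eps.

TABLE = {
    ("S", "a"): ["a", "B", "a"],
    ("B", "a"): [],                 # B -> eps (a is in FOLLOW(B))
    ("B", "b"): ["b", "B"],
}
NONTERMINALS = {"S", "B"}

def ll1_parse(tokens, start="S"):
    stack = ["$", start]            # top of stack is the end of the list
    tokens = list(tokens) + ["$"]
    i = 0
    output = []                     # the productions used, i.e. the LMD
    while True:
        X, a = stack[-1], tokens[i]
        if X == "$" and a == "$":
            return output                       # successful completion
        if X not in NONTERMINALS:               # terminal on top: must match
            if X != a:
                raise SyntaxError(f"expected {X!r}, got {a!r}")
            stack.pop(); i += 1
        else:
            body = TABLE.get((X, a))
            if body is None:                    # empty entry: error
                raise SyntaxError(f"no entry M[{X},{a}]")
            output.append((X, body))
            stack.pop()
            stack.extend(reversed(body))        # push Yk..Y1, Y1 on top

print(ll1_parse("abba"))
```

Running it on "abba" reproduces the output column of the trace on the next slide: S → aBa, B → bB, B → bB, B → ε.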
LL(1) Parser – Example 1

Grammar: S → aBa, B → bB | ε

LL(1) parsing table (we will see how to construct it next):
        a          b          $
S       S → aBa
B       B → ε      B → bB

Parse of the string abba:
stack     input     output
$S        abba$     S → aBa
$aBa      abba$
$aB       bba$      B → bB
$aBb      bba$
$aB       ba$       B → bB
$aBb      ba$
$aB       a$        B → ε
$a        a$
$         $         accept, successful completion

LL(1) Parser – Example 2

E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id        (E is the start symbol)

        id         +           *           (          )         $
E       E → TE'                            E → TE'
E'                 E' → +TE'                          E' → ε    E' → ε
T       T → FT'                            T → FT'
T'                 T' → ε      T' → *FT'              T' → ε    T' → ε
F       F → id                             F → (E)

LL(1) Parser – Example 2…

Parse of the input id+id:
stack      input      output
$E         id+id$     E → TE'
$E'T       id+id$     T → FT'
$E'T'F     id+id$     F → id
$E'T'id    id+id$
$E'T'      +id$       T' → ε
$E'        +id$       E' → +TE'
$E'T+      +id$
$E'T       id$        T → FT'
$E'T'F     id$        F → id
$E'T'id    id$
$E'T'      $          T' → ε
$E'        $          E' → ε
$          $          accept

Notice: the parsing is done by looking up the parse table.

Non-recursive predictive /LL(1) parsing steps

 Steps involved in non-recursive predictive /LL(1) parsing:
1. The stack is pushed with $.
2. Construction of the parsing table T:
  1. computation of the FIRST sets;
  2. computation of the FOLLOW sets;
  3. making entries into the parsing table.
3. Parsing with the help of the parsing routine.

Step 1: push $ onto the stack, ready to start the parsing process.

Step 2: Constructing LL(1) Parsing Tables

 Two functions are used in the construction of LL(1) parsing tables: FIRST and FOLLOW.
 These sets tell us where any terminal can actually occur in a derivation.
 FIRST(α) is the set of the terminal symbols which occur as first symbols in strings derived from α, where α is any string of grammar symbols.
• If α derives ε, then ε is also in FIRST(α).
 FOLLOW(A) is the set of the terminals which occur immediately after the non-terminal A in strings derived from the starting symbol.
• A terminal a is in FOLLOW(A) if S ⇒* αAaβ for some α and β.

Compute FIRST for a string X

1. If X is a terminal symbol, then FIRST(X) = {X}.
2. If X is ε, then FIRST(X) = {ε}.
3. If X is a non-terminal symbol and X → ε is a production rule, then add ε to FIRST(X).
4. If X is a non-terminal symbol and X → Y1Y2...Yn is a production rule, then the terminals in FIRST(Y1) except ε will be in FIRST(X). Furthermore:
• if a terminal a is in FIRST(Yi) and ε is in all of FIRST(Yj) for j = 1, ..., i-1, then a is in FIRST(X);
• if ε is in all of FIRST(Yj) for j = 1, ..., n, then ε is in FIRST(X).

Compute FIRST for a string X…

Example:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

From rule 1: FIRST(id) = {id}
From rule 2: FIRST(ε) = {ε}
From rules 3 and 4:
FIRST(F) = {(, id}
FIRST(T') = {*, ε}
FIRST(T) = {(, id}
FIRST(E') = {+, ε}
FIRST(E) = {(, id}
Others:
FIRST(TE') = {(, id}
FIRST(+TE') = {+}
FIRST(FT') = {(, id}
FIRST(*FT') = {*}
FIRST((E)) = {(}

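The four rules above amount to a fixed-point computation, which can be sketched as follows. This is my own sketch, not the slides' code: the grammar encoding is an assumption, and the empty string "" stands for ε.

```python
# A fixed-point sketch of the FIRST computation for the grammar above.
# Encoding assumptions (mine): a dict of non-terminal -> list of alternatives,
# each alternative a list of symbols; [] is an eps body; "" marks eps in sets.

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_sets(grammar):
    first = {A: set() for A in grammar}

    def first_of_string(symbols):
        """FIRST of a string of grammar symbols (rule 4)."""
        result = set()
        for X in symbols:
            fx = first[X] if X in grammar else {X}   # terminal: FIRST(X)={X}
            result |= fx - {""}
            if "" not in fx:                         # X cannot vanish: stop
                return result
        result.add("")                               # every symbol derives eps
        return result

    changed = True
    while changed:                                   # iterate to a fixed point
        changed = False
        for A, alts in grammar.items():
            for alt in alts:
                new = first_of_string(alt)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first

for A, f in first_sets(GRAMMAR).items():
    print(A, sorted(f))
```

The result matches the sets listed above: FIRST(E) = FIRST(T) = FIRST(F) = {(, id}, FIRST(E') = {+, ε}, FIRST(T') = {*, ε}.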
Compute FOLLOW (for non-terminals)

1. $ is in FOLLOW(S), if S is the start symbol.
2. Look at each occurrence of a non-terminal on the RHS of a production which is followed by something:
• if A → αBβ is a production rule, then everything in FIRST(β) except ε is in FOLLOW(B).
3. Look at each B on the RHS that is not followed by anything:
• if A → αB is a production rule, or A → αBβ is a production rule and ε is in FIRST(β), then everything in FOLLOW(A) is in FOLLOW(B).
We apply these rules until nothing more can be added to any FOLLOW set.

Compute FOLLOW (for non-terminals)…

Example:
i. E → TE'
ii. E' → +TE' | ε
iii. T → FT'
iv. T' → *FT' | ε
v. F → (E) | id

FOLLOW(E) = {$, )}, because:
• from rule 1, FOLLOW(E) contains $;
• from rule 2, ) is in FOLLOW(E), from the production F → (E).
FOLLOW(E') = {$, )}  …rule 3
FOLLOW(T) = {+, ), $}, because:
• from rule 2, + is in FOLLOW(T);
• from rule 3, everything in FOLLOW(E) is in FOLLOW(T), since FIRST(E') contains ε.
FOLLOW(T') = {+, ), $}  …rule 3
FOLLOW(F) = {+, *, ), $}  …same reasoning as above

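FOLLOW is the same kind of fixed-point computation on top of FIRST. In this sketch of mine the FIRST sets from the earlier slide are hard-coded so the fragment stands alone; the encoding ("" for ε, lists for bodies) follows the FIRST sketch and is an assumption.

```python
# A fixed-point sketch of FOLLOW (rules 1-3 above) for the same grammar.

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", ""}, "T": {"(", "id"},
         "T'": {"*", ""}, "F": {"(", "id"}}

def first_of_string(symbols):
    out = set()
    for X in symbols:
        fx = FIRST[X] if X in GRAMMAR else {X}
        out |= fx - {""}
        if "" not in fx:
            return out
    out.add("")                      # the whole string can derive eps
    return out

def follow_sets(start="E"):
    follow = {A: set() for A in GRAMMAR}
    follow[start].add("$")                       # rule 1
    changed = True
    while changed:
        changed = False
        for A, alts in GRAMMAR.items():
            for alt in alts:
                for i, B in enumerate(alt):
                    if B not in GRAMMAR:         # only non-terminals get FOLLOW
                        continue
                    fb = first_of_string(alt[i + 1:])
                    add = fb - {""}              # rule 2
                    if "" in fb:                 # rule 3: beta can vanish
                        add |= follow[A]
                    if not add <= follow[B]:
                        follow[B] |= add
                        changed = True
    return follow

for A, f in follow_sets().items():
    print(A, sorted(f))
```

The output matches the sets derived by hand above, e.g. FOLLOW(F) = {+, *, ), $}.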
Constructing the LL(1) Parsing Table – Algorithm

 For each production rule A → α of a grammar G:
1. for each terminal a in FIRST(α), add A → α to M[A,a];
2. if ε is in FIRST(α), then for each terminal a in FOLLOW(A), add A → α to M[A,a];
3. if ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A,$].
 All other undefined entries of the parsing table are error entries.

Constructing the LL(1) Parsing Table – Example

The grammar:
E → E+T | T
T → T*F | F
F → (E) | id

Step 1: remove the left recursion:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

Step 2: compute FIRST and FOLLOW:
FIRST(E) = {(, id}      FOLLOW(E) = {$, )}
FIRST(E') = {+, ε}      FOLLOW(E') = {$, )}
FIRST(T) = {(, id}      FOLLOW(T) = {+, $, )}
FIRST(T') = {*, ε}      FOLLOW(T') = {+, $, )}
FIRST(F) = {(, id}      FOLLOW(F) = {+, *, $, )}

Step 3: construct the predictive parsing table M[A,a] (shown on the slide).

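The three table-building rules can also be sketched in code. This sketch is mine, not the slides': FIRST and FOLLOW are hard-coded from the worked example so it runs on its own, productions are (head, body) pairs, and `[]` is an ε body. As a bonus, a multiply defined entry is reported, which is exactly the "not LL(1)" test of the later slides.

```python
# A sketch of the LL(1) table construction for the grammar above.

GRAMMAR = [
    ("E",  ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T",  ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F",  ["(", "E", ")"]),  ("F",  ["id"]),
]
NT = {"E", "E'", "T", "T'", "F"}
FIRST = {"E": {"(", "id"}, "E'": {"+", ""}, "T": {"(", "id"},
         "T'": {"*", ""}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

def first_of_string(symbols):
    out = set()
    for X in symbols:
        fx = FIRST[X] if X in NT else {X}
        out |= fx - {""}
        if "" not in fx:
            return out
    out.add("")
    return out

def build_table():
    table = {}
    for A, alpha in GRAMMAR:
        fa = first_of_string(alpha)
        # rules 1-3: FIRST(alpha), plus FOLLOW(A) when alpha can derive eps
        targets = (fa - {""}) | (FOLLOW[A] if "" in fa else set())
        for a in targets:
            if (A, a) in table:              # multiply defined: not LL(1)
                raise ValueError(f"conflict at M[{A},{a}]")
            table[(A, a)] = alpha
    return table

M = build_table()
print(M[("E'", "+")])    # ['+', 'T', "E'"]
print(M[("T'", ")")])    # []  i.e. T' -> eps
```

The resulting 13 entries reproduce the table of Example 2 above; every other (non-terminal, terminal) pair is an error entry.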
Constructing the LL(1) Parsing Table – Example…

Stack implementation to parse the string id+id*id$ using the above parsing table (the trace is shown on the slide).

Constructing the LL(1) Parsing Table – Example

Example 2: consider the following grammar:
S → iEtS | iEtSeS | a
E → b

After left factoring, we have:
S → iEtSS' | a
S' → eS | ε
E → b

To construct a parsing table, we need FIRST() and FOLLOW() for all the non-terminals:
FIRST(S) = {i, a}       FOLLOW(S) = {$, e}
FIRST(S') = {e, ε}      FOLLOW(S') = {$, e}
FIRST(E) = {b}          FOLLOW(E) = {t}

And the parsing table is shown on the slide.

LL(1) Grammars

 A grammar whose parsing table has no multiply defined entries is said to be an LL(1) grammar.
• The first L refers to the input being scanned from left to right, the second L to producing a left-most derivation, and 1 to the single input symbol of lookahead used to determine the parser action.
 A grammar G is LL(1) if and only if the following conditions hold for every pair of distinct production rules A → α and A → β:
1. α and β cannot both derive strings starting with the same terminal.
2. At most one of α and β can derive ε.
3. If β can derive ε, then α cannot derive any string starting with a terminal in FOLLOW(A).
 From 1 and 2, we can say that FIRST(α) ∩ FIRST(β) = ∅.
 From 3: if ε is in FIRST(β), then FIRST(α) ∩ FOLLOW(A) = ∅, and likewise with α and β swapped.

A Grammar which is not LL(1)

 The parsing table of a grammar may contain more than one production rule in an entry. In this case, we say that it is not an LL(1) grammar.

S → iCtSE | a
E → eS | ε
C → b

FIRST(iCtSE) = {i}      FOLLOW(S) = {$, e}
FIRST(a) = {a}          FOLLOW(E) = {$, e}
FIRST(eS) = {e}         FOLLOW(C) = {t}
FIRST(ε) = {ε}
FIRST(b) = {b}

        a        b        e                 i           t        $
S       S → a                               S → iCtSE
E                         E → eS, E → ε                          E → ε
C                C → b

There are two production rules for M[E,e]. The problem is ambiguity.

A Grammar which is not LL(1)…

 What do we have to do if the resulting parsing table contains multiply defined entries?
• Eliminate the left recursion in the grammar, if it has not been eliminated. For A → Aα | β, any terminal that appears in FIRST(β) also appears in FIRST(Aα), because Aα ⇒ βα. If β is ε, any terminal that appears in FIRST(α) also appears in FIRST(A) and FOLLOW(A).
• Left factor the grammar, if it is not left-factored. A grammar that is not left-factored cannot be an LL(1) grammar: for A → αβ1 | αβ2, any terminal that appears in FIRST(αβ1) also appears in FIRST(αβ2).
• If the new grammar's parsing table still contains multiply defined entries, the grammar is ambiguous or is inherently not LL(1).
• An ambiguous grammar cannot be an LL(1) grammar.

Error Recovery in Predictive Parsing

 An error may occur in predictive (LL(1)) parsing:
• if the terminal symbol on the top of the stack does not match the current input symbol;
• if the top of the stack is a non-terminal A, the current input symbol is a, and the parsing table entry M[A,a] is empty.
 What should the parser do in an error case?
• The parser should be able to give an error message (as meaningful an error message as possible).
• It should recover from that error case, and it should be able to continue parsing the rest of the input.

Contents (Session-2)

 Bottom Up Parsing
 Handle Pruning
 Implementation of a Shift-Reduce Parser
 LR Parsers
 LR Parsing Algorithm
 Actions of an LR Parser
 Constructing SLR Parsing Tables
 SLR(1) Grammar
 Error Recovery in LR Parsing

Bottom-Up Parsing

 A bottom-up parser creates the parse tree of the given input starting from the leaves and working towards the root.
• A bottom-up parser tries to find the RMD of the given input in reverse order.
 Bottom-up parsing is also known as shift-reduce parsing because its two main actions are shift and reduce.
• At each shift action, the current symbol in the input string is pushed onto a stack.
• At each reduce action, the symbols at the top of the stack that match the right side of a production are replaced by the non-terminal on the left side of that production.
• Accept: successful completion of parsing.
• Error: the parser discovers a syntax error and calls an error recovery routine.

Bottom-Up Parsing…

 A shift-reduce parser tries to reduce the given input string to the starting symbol:

a string  →  reduced to  →  the starting symbol

 At each reduction step, a substring of the input matching the right side of a production rule is replaced by the non-terminal on the left side of that production rule.
 If the substrings are chosen correctly, the rightmost derivation of the string is created, in reverse order.

Rightmost derivation: S ⇒rm γ1 ⇒rm γ2 ⇒rm ... ⇒rm input string
The shift-reduce parser finds this derivation in reverse, from the input string back to S.

Shift-Reduce Parsing – Example

S → aABb
A → aA | a
B → bB | b

input string: aaabb

Reductions:
aaabb
aaAbb      (A → a)
aAbb       (A → aA)
aABb       (B → b)
S          (S → aABb)

Right sentential forms: S ⇒ aABb ⇒ aAbb ⇒ aaAbb ⇒ aaabb

 How do we know which substring is to be replaced at each reduction step?

Handle & Handle Pruning

 Handle: a "handle" of a string is a substring of the string that matches the right side of a production, and whose reduction to the non-terminal of that production is one step along the reverse of a rightmost derivation.
 Handle pruning: the process of discovering a handle and reducing it to the appropriate left-hand-side non-terminal is known as handle pruning.

String: id1+id2*id3, grammar E → E+E | E*E | id

Rightmost derivation:
E ⇒ E+E ⇒ E+E*E ⇒ E+E*id3 ⇒ E+id2*id3 ⇒ id1+id2*id3

Right sentential form    Handle    Production
id1+id2*id3              id1       E → id
E+id2*id3                id2       E → id
E+E*id3                  id3       E → id
E+E*E                    E*E       E → E*E
E+E                      E+E       E → E+E
E

Handle Pruning

 A rightmost derivation in reverse can be obtained by handle pruning:
S = γ0 ⇒rm γ1 ⇒rm γ2 ⇒rm ... ⇒rm γn-1 ⇒rm γn = input string
 Start from γn: find a handle An → βn in γn, and replace βn by An to get γn-1.
 Then find a handle An-1 → βn-1 in γn-1, and replace βn-1 by An-1 to get γn-2.
 Repeat this until we reach S.

Shift-reduce parser

 Shift-reduce parsing is a type of bottom-up parsing that attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).
 The shift-reduce parser performs the following basic operations/actions:
1. Shift: moving symbols from the input buffer onto the stack; this action is called shift.
2. Reduce: if a handle appears on the top of the stack, it is reduced by the appropriate rule; this action is called reduce.
3. Accept: if the stack contains only the start symbol and the input buffer is empty at the same time, that action is called accept.
4. Error: a situation in which the parser can neither shift nor reduce, and cannot perform the accept action either, is called an error.

A Shift-Reduce Parser – example

E → E+T | T
T → T*F | F
F → (E) | id

Rightmost derivation of id+id*id:
E ⇒ E+T ⇒ E+T*F ⇒ E+T*id ⇒ E+F*id ⇒ E+id*id ⇒ T+id*id ⇒ F+id*id ⇒ id+id*id

Right-most sentential form    Reducing production
id+id*id                      F → id
F+id*id                       T → F
T+id*id                       E → T
E+id*id                       F → id
E+F*id                        T → F
E+T*id                        F → id
E+T*F                         T → T*F
E+T                           E → E+T
E

Handles are underlined in the right-sentential forms on the slide.

  • 62. Implementation of A Shift-Reduce Parser
 The initial stack contains only the end-marker $, and the end of the input string is also marked by $.
 Stack     | Input     | Action
 $         | id+id*id$ | shift
 $id       | +id*id$   | reduce by F → id
 $F        | +id*id$   | reduce by T → F
 $T        | +id*id$   | reduce by E → T
 $E        | +id*id$   | shift
 $E+       | id*id$    | shift
 $E+id     | *id$      | reduce by F → id
 $E+F      | *id$      | reduce by T → F
 $E+T      | *id$      | shift
 $E+T*     | id$       | shift
 $E+T*id   | $         | reduce by F → id
 $E+T*F    | $         | reduce by T → T*F
 $E+T      | $         | reduce by E → E+T
 $E        | $         | accept
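The trace above can be reproduced by a small hand-coded shift-reduce parser. This is only a sketch for this one grammar: the shift/reduce decisions are hard-wired, and a one-token lookahead blocks the reductions E → T and E → E+T when '*' is the next symbol, which mirrors the choice a table-driven parser would make.

```python
# Hand-coded shift-reduce parser for E -> E+T | T, T -> T*F | F,
# F -> (E) | id (the grammar of the trace above).
def shift_reduce_parse(tokens):
    toks = list(tokens) + ['$']
    stack, i = [], 0
    while True:
        la = toks[i]                                 # one-token lookahead
        if stack and stack[-1] == 'id':              # reduce F -> id
            stack[-1] = 'F'
        elif stack[-3:] == ['T', '*', 'F']:          # reduce T -> T*F
            stack[-3:] = ['T']
        elif stack[-3:] == ['(', 'E', ')']:          # reduce F -> (E)
            stack[-3:] = ['F']
        elif stack and stack[-1] == 'F':             # reduce T -> F
            stack[-1] = 'T'
        elif stack[-3:] == ['E', '+', 'T'] and la != '*':
            stack[-3:] = ['E']                       # reduce E -> E+T
        elif (stack and stack[-1] == 'T'
              and (len(stack) == 1 or stack[-2] != '+')
              and la != '*'):
            stack[-1] = 'E'                          # reduce E -> T
        elif stack == ['E'] and la == '$':
            return True                              # accept
        elif la != '$':
            stack.append(la)                         # shift
            i += 1
        else:
            return False                             # error
```

For example, `shift_reduce_parse(['id', '+', 'id', '*', 'id'])` goes through exactly the stack contents shown in the table above and accepts.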
  • 63. Conflicts in shift-reduce parsing
 There are two kinds of conflicts that occur in shift-reduce parsing:
 Shift-reduce conflict: the parser cannot decide whether to shift or to reduce.
 Example: consider the input id+id*id generated from the ambiguous grammar G: E → E+E | E*E | id. With E+E on top of the stack and * as the next input symbol, the parser can either reduce by E → E+E or shift the *.
 NB: If a shift-reduce parser cannot be used for a grammar, that grammar is called a non-LR(k) grammar. An ambiguous grammar can never be an LR grammar.
  • 64. Conflicts in shift-reduce parsing …
 Reduce-reduce conflict: the parser cannot decide which of several reductions to make.
 Example: consider the input c+c generated from the grammar: M → R+R | R+c | R, R → c. With R+c on the stack and the input exhausted, the parser cannot decide whether to reduce the c by R → c (leading towards M → R+R) or to reduce R+c by M → R+c.
  • 65. Operator precedence parser
 An efficient way of constructing a shift-reduce parser is called operator-precedence parsing.
 An operator precedence parser can be constructed from a grammar called an operator grammar.
 Operator Grammar: a grammar in which no production has Є on the right side and no right side has adjacent non-terminals.
 Example: a grammar with a production E → EAE is not an operator grammar, because the right side EAE has consecutive non-terminals.
 In operator precedence parsing we define the following disjoint relations: <· , =· , ·>
 Relation | Meaning
 a <· b   | a “yields precedence to” b
 a =· b   | a “has the same precedence as” b
 a ·> b   | a “takes precedence over” b
  • 66. Rules for binary operations
 1. If operator θ1 has higher precedence than operator θ2, then make θ1 ·> θ2 and θ2 <· θ1.
 2. If operators θ1 and θ2 are of equal precedence, then make
    θ1 ·> θ2 and θ2 ·> θ1 if the operators are left-associative
    θ1 <· θ2 and θ2 <· θ1 if they are right-associative
 3. Make the following for all operators θ:
    θ <· id,  id ·> θ
    θ <· (,   ( <· θ
    ) ·> θ,   θ ·> )
    θ ·> $,   $ <· θ
 Also make:
    ( =· ),  ( <· (,  ) ·> ),  ( <· id,  id ·> ),  $ <· id,  id ·> $,  $ <· (,  ) ·> $
  • 67. Precedence & associativity of operators
 Operator | Precedence  | Associativity
 ↑        | 1 (highest) | right
 *, /     | 2           | left
 +, -     | 3 (lowest)  | left
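The three rules from slide 66 can be applied mechanically to a precedence/associativity table like the one above to generate the full relation table. A sketch (here `^` stands in for the ↑ operator, and level 1 is the highest precedence, as in the table):

```python
# operator -> (precedence level, associativity); level 1 is highest
OPS = {'^': (1, 'right'), '*': (2, 'left'), '/': (2, 'left'),
       '+': (3, 'left'), '-': (3, 'left')}

rel = {}
for a, (pa, assoc_a) in OPS.items():
    for b, (pb, _) in OPS.items():
        if pa < pb:                      # rule 1: a binds tighter than b
            rel[a, b] = '.>'
            rel[b, a] = '<.'
        elif pa == pb:                   # rule 2: equal precedence
            rel[a, b] = '.>' if assoc_a == 'left' else '<.'

for t in OPS:                            # rule 3, for every operator t
    rel[t, 'id'] = '<.'; rel['id', t] = '.>'
    rel[t, '('] = '<.';  rel['(', t] = '<.'
    rel[')', t] = '.>';  rel[t, ')'] = '.>'
    rel[t, '$'] = '.>';  rel['$', t] = '<.'

# the remaining fixed entries from slide 66
rel.update({('(', ')'): '=.', ('(', '('): '<.', (')', ')'): '.>',
            ('(', 'id'): '<.', ('id', ')'): '.>',
            ('$', 'id'): '<.', ('id', '$'): '.>',
            ('$', '('): '<.', (')', '$'): '.>'})
```

For instance, `rel['+', '*']` comes out as `'<.'` (+ yields precedence to *) while `rel['^', '^']` is `'<.'` because ↑ is right-associative.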
  • 68. Operator Precedence example
 Operator-precedence relations for the grammar E → E+E | E-E | E*E | E/E | E↑E | (E) | -E | id are given in the relations table (shown on the slide), assuming:
 1. ↑ has the highest precedence and is right-associative,
 2. * and / have the next higher precedence and are left-associative, and
 3. + and - have the lowest precedence and are left-associative.
 Note that the blanks in the table denote error entries.
 [Operator-precedence relations table]
  • 69. Operator precedence parsing algorithm
 Method: Initially the stack contains $ and the input buffer the string w$. To parse, we execute the following program:
 1.  Set ip to point to the first symbol of w$;
 2.  repeat forever
 3.    if $ is on top of the stack and ip points to $ then
 4.      return
       else begin
 5.      let a be the topmost terminal symbol on the stack and let b be the symbol pointed to by ip;
 6.      if a <· b or a =· b then begin
 7.        push b onto the stack;
 8.        advance ip to the next input symbol;
         end;
 9.      else if a ·> b then /* reduce */
 10.       repeat
 11.         pop the stack
 12.       until the top stack terminal is related by <· to the terminal most recently popped
 13.     else error()
       end
  • 70. Stack implementation of operator precedence parsing
 Operator precedence parsing uses a stack and a precedence relation table to implement the above algorithm. It is a shift-reduce parsing technique containing all four actions: shift, reduce, accept and error.
 Example: Consider the grammar E → E+E | E-E | E*E | E/E | E↑E | (E) | id. The input string is id+id*id.
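The algorithm of slide 69 can be sketched in Python. As a simplification, the stack here keeps only terminals (non-terminals never take part in the precedence decisions), and the relation table is written out inline for the sub-language with +, *, id and parentheses:

```python
# precedence relations for { +, *, id, (, ), $ }, from the rules
# for binary operators ('*' binds tighter, both left-associative)
REL = {('+', '+'): '.>', ('+', '*'): '<.',
       ('*', '+'): '.>', ('*', '*'): '.>'}
for t in ('+', '*'):
    REL[t, 'id'] = '<.'; REL['id', t] = '.>'
    REL[t, '('] = '<.';  REL['(', t] = '<.'
    REL[')', t] = '.>';  REL[t, ')'] = '.>'
    REL[t, '$'] = '.>';  REL['$', t] = '<.'
REL.update({('(', ')'): '=.', ('(', '('): '<.', (')', ')'): '.>',
            ('(', 'id'): '<.', ('id', ')'): '.>',
            ('$', 'id'): '<.', ('id', '$'): '.>',
            ('$', '('): '<.', (')', '$'): '.>'})

def op_precedence_parse(tokens):
    stack = ['$']                        # terminals only
    toks = list(tokens) + ['$']
    i = 0
    while True:
        a, b = stack[-1], toks[i]
        if a == '$' and b == '$':
            return True                  # accept
        r = REL.get((a, b))
        if r in ('<.', '=.'):
            stack.append(b)              # shift
            i += 1
        elif r == '.>':                  # reduce: pop the handle
            last = stack.pop()
            while REL.get((stack[-1], last)) != '<.':
                last = stack.pop()
        else:
            return False                 # error (blank table entry)
```

`op_precedence_parse(['id', '+', 'id', '*', 'id'])` accepts, shifting * before reducing + exactly as the relation + <· * dictates.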
  • 71. Pros and cons of operator precedence parsing
 Advantages of operator precedence parsing:
 1. It is easy to implement.
 2. Once the operator precedence relations are made between all pairs of terminals of a grammar, the grammar can be ignored. The grammar is not referred to anymore during implementation.
 Disadvantages of operator precedence parsing:
 1. It is hard to handle tokens like the minus sign (-), which has two different precedences (unary and binary).
 2. Only a small class of grammars can be parsed using an operator-precedence parser.
 NB: The worst-case size of the relation table (on the previous slide) is O(n²), because the table has n × n entries.
 To decrease this cost, an operator precedence function table can be constructed instead, which needs only O(2n) entries.
  • 72. Algorithm for constructing operator precedence functions
 1. Create functions fa and ga for each a that is a terminal or $.
 2. Partition the symbols into as many groups as possible, in such a way that fa and gb are in the same group if a =· b.
 3. Create a directed graph whose nodes are the groups; then for each pair of symbols a and b do:
    a) if a <· b, place an edge from the group of gb to the group of fa
    b) if a ·> b, place an edge from the group of fa to the group of gb
 4. If the constructed graph has a cycle, then no precedence functions exist. When there are no cycles, let fa and gb be the lengths of the longest paths starting from the groups of fa and gb respectively.
  • 73. Example: constructing operator precedence functions
 Example: a grammar with terminals {+, ∗, id}.
 Step 1: Create functions fa and ga for each a that is a terminal or $, i.e. for a ∈ {+, ∗, id, $}.
  • 74. Example: constructing operator precedence functions
 Step 2: Partition the symbols into as many groups as possible, in such a way that fa and gb are in the same group if a =· b.
 Step 3: if a <· b, place an edge from the group of gb to the group of fa; if a ·> b, place an edge from the group of fa to the group of gb.
  • 75. Example: constructing operator precedence functions
 From this graph we can extract the precedence function table: for each node, find the longest outgoing path and count its edges.
 Hence, this table contains the same information as the precedence relation table, and parsing can be implemented using this table and a stack as before.
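The graph construction and longest-path computation can be sketched as follows, for the terminals {+, *, id, $}. In this table there are no a =· b pairs, so every symbol forms its own group; with =· pairs, fa and gb would first have to be merged into one node.

```python
# relations among +, *, id, $ derived from the binary-operator rules
REL = {('+', '+'): '.>', ('+', '*'): '<.', ('+', 'id'): '<.', ('+', '$'): '.>',
       ('*', '+'): '.>', ('*', '*'): '.>', ('*', 'id'): '<.', ('*', '$'): '.>',
       ('id', '+'): '.>', ('id', '*'): '.>', ('id', '$'): '.>',
       ('$', '+'): '<.', ('$', '*'): '<.', ('$', 'id'): '<.'}

edges = {}
for (a, b), r in REL.items():
    if r == '<.':                        # a <. b: edge g_b -> f_a
        edges.setdefault(('g', b), []).append(('f', a))
    elif r == '.>':                      # a .> b: edge f_a -> g_b
        edges.setdefault(('f', a), []).append(('g', b))

memo, on_path = {}, set()
def longest(u):
    """Length of the longest path out of u; a cycle means no functions exist."""
    if u in on_path:
        raise ValueError('cycle: no precedence functions exist')
    if u not in memo:
        on_path.add(u)
        memo[u] = max((1 + longest(v) for v in edges.get(u, [])), default=0)
        on_path.discard(u)
    return memo[u]

terms = ['+', '*', 'id', '$']
f = {a: longest(('f', a)) for a in terms}
g = {a: longest(('g', a)) for a in terms}
```

This yields f(+)=2, g(+)=1, f(*)=4, g(*)=3, f(id)=4, g(id)=5, f($)=g($)=0, and indeed f(a) < g(b) exactly when a <· b.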
  • 76. Shift-Reduce Parsers
 The most prevalent type of bottom-up parser today is based on a concept called LR(k) parsing:
 L — left-to-right scan of the input
 R — rightmost derivation in reverse
 k — number of lookahead symbols (when k is omitted, it is 1)
 LR parsers cover a wide range of grammars. Variants:
  Simple LR parser (SLR)
  Look-Ahead LR parser (LALR)
  most general (canonical) LR parser (LR)
 SLR, LR and LALR work the same way; only their parsing tables are different.
 The grammar classes nest: SLR ⊂ LALR ⊂ LR ⊂ CFG.
  • 77. LR Parsers
 LR parsing is attractive because:
  LR parsers can be constructed to recognize virtually all programming-language constructs for which context-free grammars can be written.
  LR parsing is the most general non-backtracking shift-reduce parsing method, yet it is still efficient.
  The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers: LL(1) grammars ⊂ LR(1) grammars.
  An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
 A drawback of the LR method is that it is too much work to construct an LR parser by hand.
  Use tools, e.g. yacc.
  • 78. An overview of the LR Parsing model
 An LR parser consists of:
  a stack holding alternating states and grammar symbols, S0 X1 S1 … Xm Sm, with Sm on top (each stack item is a state number paired with a symbol);
  an input buffer a1 … ai … an $;
  an output stream; and
  a parsing table with two parts: an Action table, indexed by state and terminal (or $), whose entries are one of four different actions, and a Goto table, indexed by state and non-terminal.
  • 79. LR parser configuration
 A configuration of an LR parser describes the complete state (behavior) of the parser. It is a pair:
 (S0 X1 S1 X2 S2 … Xm Sm , ai ai+1 … an $)
 where the first component is the stack and the second the remaining input.
 This configuration represents the right-sentential form X1 X2 … Xm ai ai+1 … an.
 Xi is the grammar symbol represented by state Si.
 Note: S0 is on top of the stack at the beginning of parsing.
  • 80. Behavior of LR parser
 The parser program decides the next step by using: the top of the stack (Sm), the current input symbol (ai), and the parsing table, which has two parts: ACTION and GOTO. It consults the entry ACTION[Sm, ai] in the parsing action table:
 1. If ACTION[Sm, ai] = shift S, the parser shifts both the current input symbol ai and state S onto the top of the stack, entering the configuration (S0 X1 S1 X2 S2 … Xm Sm ai S, ai+1 … an $)
  • 81. Behavior of LR parser…
 2. If ACTION[Sm, ai] = reduce A → β: the parser pops the first 2r symbols off the stack, where r = |β| (at this point, Sm-r will be the state on top of the stack), entering the configuration (S0 X1 S1 X2 S2 … Xm-r Sm-r A S, ai ai+1 … an $). Here A and S are pushed on top of the stack, where S = GOTO[Sm-r, A]. The input buffer is not modified.
 3. If ACTION[Sm, ai] = accept, parsing is completed.
 4. If ACTION[Sm, ai] = error, parsing has discovered an error and calls an error recovery routine.
  • 82. LR-parsing algorithm
 METHOD: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input buffer.
 let a be the first symbol of w$;
 while(1) { /* repeat forever */
   let S be the state on top of the stack;
   if ( ACTION[S, a] = shift t ) {
     push t onto the stack;
     let a be the next input symbol;
   } else if ( ACTION[S, a] = reduce A → β ) {  /* reduce by production A → β */
     pop |β| symbols off the stack;
     let state t now be on top of the stack;
     push GOTO[t, A] onto the stack;
     output the production A → β;
   } else if ( ACTION[S, a] = accept )
     break;  /* parsing is done */
   else
     call error-recovery routine;
 }
  • 83. SLR Parsing Table for the Expression Grammar
 Expression Grammar: 1) E → E+T  2) E → T  3) T → T*F  4) T → F  5) F → (E)  6) F → id
 state |           Action              | Goto
       |  id   +    *    (    )    $   |  E   T   F
   0   |  s5             s4            |  1   2   3
   1   |       s6                 acc  |
   2   |       r2   s7        r2   r2  |
   3   |       r4   r4        r4   r4  |
   4   |  s5             s4            |  8   2   3
   5   |       r6   r6        r6   r6  |
   6   |  s5             s4            |      9   3
   7   |  s5             s4            |          10
   8   |       s6             s11      |
   9   |       r1   s7        r1   r1  |
  10   |       r3   r3        r3   r3  |
  11   |       r5   r5        r5   r5  |
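The table above can be encoded directly and driven by the algorithm of slide 82. A sketch: the driver returns the sequence of production numbers used in the reductions, i.e. a rightmost derivation in reverse, or None on an error entry.

```python
# SLR(1) table from the slide above, for the grammar
# 1) E->E+T  2) E->T  3) T->T*F  4) T->F  5) F->(E)  6) F->id
TABLE = {
    0: {'id': 's5', '(': 's4'},
    1: {'+': 's6', '$': 'acc'},
    2: {'+': 'r2', '*': 's7', ')': 'r2', '$': 'r2'},
    3: {'+': 'r4', '*': 'r4', ')': 'r4', '$': 'r4'},
    4: {'id': 's5', '(': 's4'},
    5: {'+': 'r6', '*': 'r6', ')': 'r6', '$': 'r6'},
    6: {'id': 's5', '(': 's4'},
    7: {'id': 's5', '(': 's4'},
    8: {'+': 's6', ')': 's11'},
    9: {'+': 'r1', '*': 's7', ')': 'r1', '$': 'r1'},
    10: {'+': 'r3', '*': 'r3', ')': 'r3', '$': 'r3'},
    11: {'+': 'r5', '*': 'r5', ')': 'r5', '$': 'r5'},
}
GOTO = {(0, 'E'): 1, (0, 'T'): 2, (0, 'F'): 3, (4, 'E'): 8, (4, 'T'): 2,
        (4, 'F'): 3, (6, 'T'): 9, (6, 'F'): 3, (7, 'F'): 10}
PROD = {1: ('E', 3), 2: ('E', 1), 3: ('T', 3),   # (LHS, length of RHS)
        4: ('T', 1), 5: ('F', 3), 6: ('F', 1)}

def slr_parse(tokens):
    toks = list(tokens) + ['$']
    stack, i, output = [0], 0, []        # stack holds state numbers only
    while True:
        act = TABLE.get(stack[-1], {}).get(toks[i])
        if act is None:
            return None                  # error entry
        if act == 'acc':
            return output                # reductions, in order
        if act[0] == 's':                # shift: push state, advance input
            stack.append(int(act[1:]))
            i += 1
        else:                            # reduce by production n
            n = int(act[1:])
            lhs, size = PROD[n]
            del stack[-size:]
            stack.append(GOTO[stack[-1], lhs])
            output.append(n)
```

On id+id*id the driver emits the reductions 6, 4, 2, 6, 4, 6, 3, 1, matching the shift-reduce trace on slide 62 step for step.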
  • 84. Steps for Constructing SLR Parsing Tables
 1. Augment G to produce G'. (The purpose of the new starting production is to indicate to the parser when it should stop parsing and announce acceptance of the input.)
 2. Construct the canonical collection of sets of LR(0) items C for G' (using the closure operation).
 3. Compute goto(I, X), where I is a set of items and X is a grammar symbol.
 4. Construct the DFA from the GOTO results and find the FOLLOW sets.
 5. Construct the parsing table using the DFA and the FOLLOW sets.
  • 85. Constructing SLR Parsing Tables – LR(0) Item
 An LR parser makes shift-reduce decisions by maintaining states to keep track of where we are in a parse.
 An LR(0) item of a grammar G is a production of G with a dot at some position of the right side.
 Ex: the production A → aBb gives four possible LR(0) items:
 A → ·aBb    A → a·Bb    A → aB·b    A → aBb·
 Sets of LR(0) items will be the states of the action and goto tables of the SLR parser, i.e. states represent sets of "items."
 A collection of sets of LR(0) items (the canonical LR(0) collection) is the basis for constructing a deterministic finite automaton that is used to make parsing decisions. Such an automaton is called an LR(0) automaton.
  • 86. Constructing SLR Parsing Tables: closure & GOTO operations
 Closure operation: If I is a set of items for a grammar G, then closure(I) is the set of items constructed from I by two rules:
 1. Initially, every item in I is added to closure(I).
 2. If A → α·Bβ is in closure(I) and B → γ is a production, then add the item B → ·γ to closure(I), if it is not already there. We apply this rule until no more new items can be added to closure(I).
 Goto operation: Goto(I, X), where I is a set of items and X is a grammar symbol, is defined to be the closure of the set of all items [A → αX·β] such that [A → α·Xβ] is in I. In other words:
 1. If the dot precedes a terminal, as in A → α·bβ, only the advanced item A → αb·β is included in Goto(I, b).
 2. If the dot precedes a non-terminal, as in A → α·Bβ, the advanced item A → αB·β is included together with the closure over B's productions.
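The two operations translate almost literally into code. A sketch for the expression grammar used in the later exercise, representing each item as a (LHS, RHS, dot-position) triple:

```python
# grammar E -> E+T | T, T -> T*F | F, F -> (E) | id, augmented with E' -> E
GRAMMAR = {
    "E'": [('E',)],
    'E': [('E', '+', 'T'), ('T',)],
    'T': [('T', '*', 'F'), ('F',)],
    'F': [('(', 'E', ')'), ('id',)],
}

def closure(items):
    """Add B -> .gamma for every item whose dot precedes the non-terminal B."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(items):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                for prod in GRAMMAR[rhs[dot]]:
                    if (rhs[dot], prod, 0) not in items:
                        items.add((rhs[dot], prod, 0))
                        changed = True
    return frozenset(items)

def goto(items, X):
    """Advance the dot over X in every item where that is possible."""
    return closure({(l, r, d + 1) for l, r, d in items
                    if d < len(r) and r[d] == X})

I0 = closure({("E'", ('E',), 0)})    # 7 items, as computed on slide 96
I1 = goto(I0, 'E')                   # {[E' -> E.], [E -> E.+T]}
```

`goto(I0, '(')` likewise reproduces the 7-item set I4 from the exercise.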
  • 87. Algorithm for construction of the SLR parsing table
 Input: An augmented grammar G'. Output: The SLR parsing table functions action and goto for G'.
 Method:
 1. Construct C = {I0, I1, …, In}, the collection of sets of LR(0) items for G'.
 2. State i is constructed from Ii. The parsing actions for state i are determined as follows:
    If [A → α·aβ] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to "shift j". Here a must be a terminal.
    If [A → α·] is in Ii, then set action[i, a] to "reduce A → α" for all a in FOLLOW(A).
    If [S' → S·] is in Ii, then set action[i, $] to "accept".
    If any conflicting actions are generated by the above rules, we say the grammar is not SLR(1).
 3. The goto transitions for state i are constructed for all non-terminals A using the rule: if goto(Ii, A) = Ij, then goto[i, A] = j.
 4. All entries not defined by rules (2) and (3) are made "error".
 5. The initial state of the parser is the one constructed from the set of items containing [S' → ·S].
  • 88. Example for LR(0) parser
 Example of LR(0): Let the grammar G1 be:
 S → AA
 A → aA | b
  • 89. Example of LR(0) parsing Table
 Step 4: Construct the DFA.
 NB: I1, I4, I5, I6 are called final items (they contain a completed item). They lead to filling the 'reduce'/Ri actions in the corresponding rows of the action part of the parsing table.
  • 90. Example of LR(0) parsing Table
 Step 5: Construct the LR(0) parsing table. First label the productions of the grammar with integer values, i.e.
 S → AA  --- 1
 A → aA  --- 2
 A → b   --- 3
 LR(0) parsing table
 NB: In the LR(0) table construction, whenever a state contains a final item, fill that entire row of the action part with Ri. E.g. in row 4, put R3 in every action column; 3 is the label of the production A → b.
  • 91. Example of LR(0) parsing Table
 Step 6: Check the parser by running the stack implementation on the string abb$.
  • 92. Example of SLR(1) parsing Table
 Example of an SLR(1) parsing table: use the same grammar as in the LR(0) example above.
 Constructing an SLR(1) parser is the same as constructing an LR(0) parser; the only difference is in filling the reduce (Ri) entries of the parsing table, where we must use the FOLLOW set: fill a reduce entry only if the input terminal belongs to FOLLOW of the production's left-hand side; otherwise leave the cell empty. Hence apply only this difference. Find the FOLLOW set of each non-terminal for
 S → AA --- 1
 A → aA --- 2
 A → b  --- 3
 FOLLOW(S) = {$}, FOLLOW(A) = {a, b, $}
 A → b· is a final item and FOLLOW(A) = {a, b, $}, so fill row 4 of the action part with R3 under a, b and $.
 S → AA· is a final item and FOLLOW(S) = {$}, so fill cell [5, $] with R1.
 A → aA· is a final item and FOLLOW(A) = {a, b, $}, so fill row 6 of the action part with R2 under a, b and $.
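The FOLLOW sets used above can be computed by a simple fixed-point iteration. A sketch for this grammar; it assumes there are no ε-productions (true here), which keeps the FIRST computation trivial:

```python
# FOLLOW sets for S -> AA, A -> aA | b by fixed-point iteration
GRAMMAR = {'S': [('A', 'A')], 'A': [('a', 'A'), ('b',)]}
START = 'S'

def first(sym):
    """FIRST of a single symbol (no epsilon-productions assumed)."""
    if sym not in GRAMMAR:               # terminal
        return {sym}
    return {t for rhs in GRAMMAR[sym] for t in first(rhs[0])}

follow = {nt: set() for nt in GRAMMAR}
follow[START].add('$')                   # $ follows the start symbol
changed = True
while changed:
    changed = False
    for lhs, prods in GRAMMAR.items():
        for rhs in prods:
            for i, sym in enumerate(rhs):
                if sym not in GRAMMAR:   # only non-terminals get FOLLOW sets
                    continue
                # FIRST of what comes next, or FOLLOW(lhs) at the end
                new = first(rhs[i + 1]) if i + 1 < len(rhs) else follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
```

The iteration converges to FOLLOW(S) = {$} and FOLLOW(A) = {a, b, $}, matching the sets used to fill the R entries above.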
  • 93. Example of SLR(1) parsing Table
 SLR(1) parsing table
 The stack implementation is the same as for LR(0).
  • 95. Constructing SLR parsing tables…
 Exercise: Construct the SLR parsing table for the following grammar and show how the input string id * id + id is parsed:
 E → E + T
 E → T
 T → T * F
 T → F
 F → (E)
 F → id
 Answer: I = {[E' → ·E]}
 Closure(I) = {[E' → ·E], [E → ·E + T], [E → ·T], [T → ·T * F], [T → ·F], [F → ·(E)], [F → ·id]}
  • 96. Constructing SLR parsing tables…
 I0 = {[E' → ·E], [E → ·E + T], [E → ·T], [T → ·T * F], [T → ·F], [F → ·(E)], [F → ·id]}
 I1 = Goto(I0, E) = {[E' → E·], [E → E· + T]}
 I2 = Goto(I0, T) = {[E → T·], [T → T· * F]}
 I3 = Goto(I0, F) = {[T → F·]}
 I4 = Goto(I0, () = {[F → (·E)], [E → ·E + T], [E → ·T], [T → ·T * F], [T → ·F], [F → ·(E)], [F → ·id]}
 I5 = Goto(I0, id) = {[F → id·]}
 I6 = Goto(I1, +) = {[E → E + ·T], [T → ·T * F], [T → ·F], [F → ·(E)], [F → ·id]}
  • 97. Constructing SLR parsing tables…
 I7 = Goto(I2, *) = {[T → T * ·F], [F → ·(E)], [F → ·id]}
 I8 = Goto(I4, E) = {[F → (E·)], [E → E· + T]}
 Goto(I4, T) = {[E → T·], [T → T· * F]} = I2;  Goto(I4, F) = {[T → F·]} = I3;  Goto(I4, () = I4;  Goto(I4, id) = I5;
 I9 = Goto(I6, T) = {[E → E + T·], [T → T· * F]}
 Goto(I6, F) = I3;  Goto(I6, () = I4;  Goto(I6, id) = I5;
 I10 = Goto(I7, F) = {[T → T * F·]}
 Goto(I7, () = I4;  Goto(I7, id) = I5;
 I11 = Goto(I8, )) = {[F → (E)·]}
 Goto(I8, +) = I6;  Goto(I9, *) = I7;
  • 99. SLR table construction method…
 Construct the SLR parsing table for the grammar G1':
 E' → E   1) E → E + T   2) E → T   3) T → T * F   4) T → F   5) F → (E)   6) F → id
 Follow(E) = {+, ), $}
 Follow(T) = {+, ), $, *}
 Follow(F) = {+, ), $, *}
  • 100.
 State |           action              | goto
       |  id   +    *    (    )    $   |  E   T   F
   0   |  S5             S4            |  1   2   3
   1   |       S6                 acc  |
   2   |       R2   S7        R2   R2  |
   3   |       R4   R4        R4   R4  |
   4   |  S5             S4            |  8   2   3
   5   |       R6   R6        R6   R6  |
   6   |  S5             S4            |      9   3
   7   |  S5             S4            |          10
   8   |       S6             S11      |
   9   |       R1   S7        R1   R1  |
  10   |       R3   R3        R3   R3  |
  11   |       R5   R5        R5   R5  |
  • 101. SLR parser…
 How a shift-reduce parser parses the input string w = id * id + id using the parsing table shown above.
  • 102. LALR and CLR parsers
 NB: LR(0) and SLR(1) use LR(0) items to create a parsing table, but LALR and CLR parsers use LR(1) items to construct a parsing table.
 Reading assignment:
  LALR parser
  CLR parser