Lecture 3 RE NFA DFA

Compilers (CPL5316)
Software Engineering
Koya University
2017-2018
Lecture 3:
Lexical analysers
Compilers (CPL5316), lectured by: Rebaz Najeeb

Outline
Lexical analysis
Implementation of Regular Expression
RE → NFA → DFA → Tables
Non-deterministic Finite Automata (NFA)
Converting a RE to NFA
Deterministic Finite Automata (DFA)
Converting NFA to DFA
Converting RE to DFA directly

Compiler phases
1. Lexical analysis
2. Parsing
3. Semantic analysis
4. Optimization
5. Code Generation
Input: source code → (phases 1 to 5) → Output: target code

Lexical analysis
• Lexical analysis: reads the input characters of the source program, as delivered by the
preprocessor, groups them into lexemes, and produces as output a sequence of tokens,
one for each lexeme in the source program.
• Roles of the lexical analyzer
  • Breaks the source program into small lexical units and produces tokens
  • Removes white space and comments
  • Reports an error if it finds an invalid token

Dividing source code
• Divide the program into lexical units
Human format:
if (i==3)
    X=0;
else
    X=1;
Lexical analyzer format:
\tif (i==3)\n\t\tX=0;\n\telse\n\t\tX=1;

Grouping (classifying)lexemes
• In English
  • Verb, noun, adjective, adverb
• In programming languages
  • Keywords, identifiers, operators, assignment, semicolon
• Token = <token name, attribute value>
• Example of creating tokens for a statement
int a = 3;
<keyword, int> <identifier, a> <assignment, => <constant, 3> <symbol, ;>
(the first component of each pair is the token class)

Token classes
• Token classes correspond to sets of strings, such as the following:
• Identifiers: strings of letters or digits that start with a letter
  • Identifier = (letter)(letter | digit)*
• Integers: non-empty strings of digits, optionally signed
  • Integer = (sign)?(digit)+
• Keywords: a fixed set of reserved words
  • else, if, for, while, do
• Whitespace: blanks, newlines, tabs
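The token classes above can be tried out with a small scanner. This is only a sketch built on Python's `re` module, not the lecture's implementation: the names `TOKEN_SPEC`, `KEYWORDS` and `tokenize` are made up for illustration, `int` is added to the keyword set so that the earlier `int a = 3;` example works, and operators such as `==` are deliberately left out.

```python
import re

# Assumed keyword set: the slide's list plus "int" from the earlier example.
KEYWORDS = {"else", "if", "for", "while", "do", "int"}

# One (class name, pattern) pair per token class; order decides ties.
TOKEN_SPEC = [
    ("whitespace", r"[ \t\n]+"),
    ("identifier", r"[A-Za-z][A-Za-z0-9]*"),   # (letter)(letter | digit)*
    ("constant",   r"[+-]?[0-9]+"),            # (sign)?(digit)+
    ("assignment", r"="),
    ("symbol",     r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token class, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            # No token class matches here: report an invalid token.
            raise ValueError(f"invalid token at position {pos}: {source[pos]!r}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "whitespace":
            continue
        if kind == "identifier" and lexeme in KEYWORDS:
            kind = "keyword"   # reserved words are matched as identifiers first
        tokens.append((kind, lexeme))
    return tokens


print(tokenize("int a = 3;"))
```

Keywords are recognised by first matching the identifier pattern and then looking the lexeme up in the reserved-word set, which keeps the pattern list short.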

Lexical analyzer
Input: a = 3;
Output, as <class, string> pairs:
<id, a>
<op, =>
<int, 3>
<symb, ;>

Regular expression
letter = [a-z] | [A-Z]
digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9, or [0-9]
sign = + | -, or [+-]
Decimal = (sign)?(digit)+
Identifier = (letter)(letter | digit)*
Float = (sign)?(digit)+(.(digit)+)?
Exercises: write a RE for odd numbers over the alphabet {0,1}. For an email address? For a website URL?
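These definitions translate almost directly into Python `re` patterns. The sketch below is one possible transcription: the `matches` helper is a made-up name, `FLOAT` assumes a single optional fractional part, and `ODD_BINARY` is just one possible answer to the odd-number exercise (a binary numeral is odd when its last digit is 1).

```python
import re

DIGIT = r"[0-9]"
LETTER = r"[A-Za-z]"
SIGN = r"[+-]"

DECIMAL = re.compile(rf"{SIGN}?{DIGIT}+")
IDENTIFIER = re.compile(rf"{LETTER}({LETTER}|{DIGIT})*")
# Assumption: at most one fractional part, e.g. 3.14 but not 1.2.3
FLOAT = re.compile(rf"{SIGN}?{DIGIT}+(\.{DIGIT}+)?")
# One possible answer to the exercise: any string over {0,1} ending in 1
ODD_BINARY = re.compile(r"(0|1)*1")


def matches(pattern, text):
    """True if the whole string (not just a prefix) is in the language."""
    return pattern.fullmatch(text) is not None


for name, pattern, sample in [
    ("decimal", DECIMAL, "-42"),
    ("identifier", IDENTIFIER, "x1"),
    ("float", FLOAT, "3.14"),
    ("odd binary", ODD_BINARY, "101"),
]:
    print(name, sample, matches(pattern, sample))
```

`fullmatch` is used rather than `match` because a regular expression defines a set of whole strings; `match` alone would accept any string with a valid prefix.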

Observation
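• Many regular expressions can have exactly the same meaning
  • 0* == 0 + 0* == ε + 0*   (here + denotes union)
• The meaning function is many-to-one: different notations (syntax) can denote the same value (semantics)
  • Decimal: 0 1 2 3 4 5
  • Binary: 0 1 10 11 100 101
  • Roman: I II III IV V
• Many syntaxes with one meaning is what makes optimization possible; one syntax with many meanings would be ambiguity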

Finite state automata (FSA)
⌐ There are two main kinds of finite state automata:
i. NFAs (Non-Deterministic Finite Automata): at a given state, the next state may
not be uniquely determined for each input symbol.
ii. DFAs (Deterministic Finite Automata): at a given state, the next state is uniquely
determined for each input symbol.
⌐ For every nondeterministic automaton, there is an equivalent deterministic automaton.
[Figure: NFA with states q0, q1, q2, q3, q4 and transitions labeled a, b, c, a, ε, ε, c]
– The above NFA is equivalent to the regular expression /ab*ca?/.

NFAs (Non-Deterministic Finite Automata)
⌐ In a nondeterministic finite automaton (NFA), each state can have zero, one,
two, or more transitions for a particular symbol.
⌐ Only an NFA can have ε (empty-string) transitions; a DFA cannot.
⌐ The procedure is: RE → NFA → DFA → Tables
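One way to see the difference from a DFA is in the data structure: an NFA's transition relation maps a (state, symbol) pair to a set of states, and running it means tracking every state it could be in. The sketch below uses a hypothetical four-state NFA for ab*ca? (the layout and the names `DELTA`, `epsilon_closure`, `accepts` are assumptions for illustration, not the diagram from the slides).

```python
EPSILON = ""   # label chosen here for the empty-string transition

# Hypothetical NFA for ab*ca?: (state, symbol) -> set of next states
DELTA = {
    ("q0", "a"): {"q1"},
    ("q1", "b"): {"q1"},
    ("q1", "c"): {"q2"},
    ("q2", "a"): {"q3"},
    ("q2", EPSILON): {"q3"},   # skipping the optional final 'a'
}
START, ACCEPTING = "q0", {"q3"}


def epsilon_closure(states):
    """All states reachable from `states` using only epsilon transitions."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in DELTA.get((s, EPSILON), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def accepts(text):
    """Simulate the NFA by tracking the set of possible current states."""
    current = epsilon_closure({START})
    for ch in text:
        moved = set()
        for s in current:
            moved |= DELTA.get((s, ch), set())   # missing entry = no transition
        current = epsilon_closure(moved)
    return bool(current & ACCEPTING)


print(accepts("abbca"), accepts("ac"), accepts("a"))
```

A missing dictionary entry plays the role of "zero transitions"; a set with several members would be "two or more".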

Implementation of RE

RE to NFA using Thompson’s Construction
[Figures: Thompson NFA fragments for a, b, (a | b), (a|b)*, (a|b)* a and a(a|b)*a, each built from a- and b-labeled transitions joined by ε-transitions]
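The construction can be sketched as a few fragment-building functions, one per operator. This is an illustrative version only: the class name `Thompson` and its methods are made up, state numbers will not match the figures, and concatenation joins the two fragments with an ε-edge instead of merging states (a common simplification).

```python
from itertools import count

EPS = None  # label used here for epsilon transitions


class Thompson:
    """Builds NFA fragments; each fragment is a (start, final) state pair."""

    def __init__(self):
        self.edges = []        # list of (source, label, target)
        self._ids = count()

    def _new(self):
        return next(self._ids)

    def symbol(self, ch):
        s, f = self._new(), self._new()
        self.edges.append((s, ch, f))
        return s, f

    def union(self, a, b):
        s, f = self._new(), self._new()
        for frag in (a, b):
            self.edges.append((s, EPS, frag[0]))   # new start -> each branch
            self.edges.append((frag[1], EPS, f))   # each branch -> new final
        return s, f

    def concat(self, a, b):
        self.edges.append((a[1], EPS, b[0]))       # simplification: link, not merge
        return a[0], b[1]

    def star(self, a):
        s, f = self._new(), self._new()
        self.edges += [(s, EPS, a[0]), (a[1], EPS, f),
                       (a[1], EPS, a[0]),          # loop back: repeat
                       (s, EPS, f)]                # bypass: zero repetitions
        return s, f

    def closure(self, states):
        stack, seen = list(states), set(states)
        while stack:
            u = stack.pop()
            for src, label, dst in self.edges:
                if src == u and label is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    def accepts(self, frag, text):
        current = self.closure({frag[0]})
        for ch in text:
            moved = {dst for src, label, dst in self.edges
                     if src in current and label == ch}
            current = self.closure(moved)
        return frag[1] in current


t = Thompson()
nfa = t.concat(t.star(t.union(t.symbol("a"), t.symbol("b"))), t.symbol("a"))  # (a|b)*a
print(t.accepts(nfa, "abba"), t.accepts(nfa, "ab"))
```

Every fragment has exactly one start and one final state, which is what lets the operators compose without inspecting the insides of their operands.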

NFA example 1
∑={a,b}
[Figure: NFA with states S0, S1, S2, S3; transition labels not recoverable]

NFA example 2
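⌐ ∑={0,1}, construct a DFA that accepts 00(0+1)*
NFA: s --0--> p --0--> q, with q looping on 0 and on 1
DFA: s --0--> p --0--> q, with q looping on 0,1; on input 1, s and p go to the dead state Ø, which loops on 0,1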

[Figure: NFA for (a|b)*a with states 0 to 8, built from a, b and ε transitions]
1- Create the start state as ε-closure({0})
2- Move from the start state on each input symbol, then take the ε-closure of the resulting set
3- Repeat the procedure until no more moves on input symbols produce new states

S0 = ε-closure({0}) = {0,1,2,4,7}    S0 into DS as an unmarked state
⇒ mark S0
  ε-closure(move(S0,a)) = ε-closure({3,8}) = {1,2,3,4,6,7,8} = S1    S1 into DS
  ε-closure(move(S0,b)) = ε-closure({5}) = {1,2,4,5,6,7} = S2    S2 into DS
  transfunc[S0,a] → S1    transfunc[S0,b] → S2
⇒ mark S1
  ε-closure(move(S1,a)) = ε-closure({3,8}) = {1,2,3,4,6,7,8} = S1
  ε-closure(move(S1,b)) = ε-closure({5}) = {1,2,4,5,6,7} = S2
⇒ mark S2
  ε-closure(move(S2,a)) = ε-closure({3,8}) = {1,2,3,4,6,7,8} = S1
  ε-closure(move(S2,b)) = ε-closure({5}) = {1,2,4,5,6,7} = S2
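The marking procedure above can be sketched as a worklist loop. The NFA transitions below are not copied from the figure; they are reconstructed so that they reproduce the closures computed on the slide (an assumption), and the function names are chosen for illustration.

```python
from collections import deque

EPS = None  # label for epsilon transitions

# Assumed transitions of the 9-state NFA for (a|b)*a, consistent with the
# e-closures worked out above: (state, label) -> set of next states
NFA = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4},
    (2, "a"): {3},    (4, "b"): {5},
    (3, EPS): {6},    (5, EPS): {6},
    (6, EPS): {1, 7}, (7, "a"): {8},
}
NFA_START, NFA_ACCEPT, ALPHABET = 0, 8, "ab"


def e_closure(states):
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, symbol):
    return set().union(*(NFA.get((s, symbol), set()) for s in states))


def subset_construction():
    start = e_closure({NFA_START})
    dstates = [start]            # discovery order gives S0, S1, S2, ...
    transfunc = {}               # (DFA state index, symbol) -> DFA state index
    unmarked = deque([start])
    while unmarked:
        current = unmarked.popleft()          # "mark" the state
        for symbol in ALPHABET:
            target = e_closure(move(current, symbol))
            if not target:
                continue                      # no move on this symbol
            if target not in dstates:
                dstates.append(target)
                unmarked.append(target)
            transfunc[(dstates.index(current), symbol)] = dstates.index(target)
    accepting = {i for i, s in enumerate(dstates) if NFA_ACCEPT in s}
    return dstates, transfunc, accepting


dstates, transfunc, accepting = subset_construction()
for i, s in enumerate(dstates):
    print(f"S{i} = {sorted(s)}")
print(transfunc, accepting)
```

Each DFA state is a frozen set of NFA states, so it can be compared and looked up directly; a DFA state is accepting when it contains the NFA's accepting state.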

[Figure: resulting DFA with states 1, 2, 3 and transitions labeled a and b]

Converting NFA to DFA using table

NFA to DFA with table
⌐ ∑={a,b}, L = any string that starts with a
1- Generate the NFA.
2- Convert the NFA to a DFA using a table.
What if L = any string that ends with a?

RE to DFA directly steps
1- Create the augmented RegEx and number the alphabet symbols
2- Create the annotated syntax tree and label the tree
3- Find firstpos and lastpos, then followpos
4- Derive the DFA from the followpos table

RE to DFA directly
We can convert a regular expression into a DFA directly (without creating an NFA first).
1. First, augment the given regular expression by concatenating it with a special end symbol #.
r → (r)# (the augmented regular expression)
2. Then construct a syntax tree for the augmented regular expression (r)#.
3. Leaves of the syntax tree are labeled by alphabet symbols (plus #) or by the empty string; inner
nodes are the operators of the augmented regular expression.
4. Number each leaf labeled by an alphabet symbol (or #); these are the position numbers.
5. Finally, compute four functions: nullable, firstpos, lastpos and followpos.

Building syntax tree
• Example
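Step 1: augment the regular expression and number its symbols
(a|b)*abb  becomes  (a|b)*abb#  (the augmented regular expression)
Position numbers (for leaves): a=1, b=2, a=3, b=4, b=5, #=6
Step 2: build the syntax tree
cat( cat( cat( cat( star( or(a1, b2) ), a3 ), b4 ), b5 ), #6 )
Node kinds: cat-nodes (concatenation), star-node (closure), or-node (alternation, or union)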

Functions
• Four functions have to be computed from the syntax tree:
1. nullable(n): true for a syntax tree node n if the subexpression represented
by n has ε in its language.
2. firstpos(n): the set of positions in the subtree that correspond to the
first symbols of strings generated by the subexpression rooted at n.
3. lastpos(n): the set of positions in the subtree that correspond to the
last symbols of strings generated by the subexpression rooted at n.
4. followpos(i): the set of positions that can follow position i
in the strings generated by the augmented regular expression.

Computing (Nullable, Firstpos, Lastpos)

Example of the functions
҂ (a|b)* a, with positions a=1, b=2, a=3, and n the root node
҂ nullable(n)=false
҂ firstpos(n)={1,2,3}
҂ lastpos(n)={3}
҂ followpos(1)={1,2,3}
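The first three functions can be computed bottom-up over the tree. The sketch below uses the standard textbook rules for leaf, or-, cat- and star-nodes; the tuple encoding of the tree and the name `analyse` are assumptions made for this example.

```python
def analyse(node):
    """Return (nullable, firstpos, lastpos) for a syntax-tree node.

    Assumed encoding: ("leaf", position), ("eps",),
    ("or", left, right), ("cat", left, right), ("star", child).
    """
    kind = node[0]
    if kind == "eps":
        return True, set(), set()
    if kind == "leaf":
        return False, {node[1]}, {node[1]}
    if kind == "star":
        _, first, last = analyse(node[1])
        return True, first, last              # a star can always match empty
    n1, f1, l1 = analyse(node[1])
    n2, f2, l2 = analyse(node[2])
    if kind == "or":
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        first = f1 | f2 if n1 else f1         # c2 can start only if c1 is nullable
        last = l1 | l2 if n2 else l2          # c1 can end only if c2 is nullable
        return n1 and n2, first, last
    raise ValueError(f"unknown node kind: {kind}")


# (a|b)*a with positions a=1, b=2, a=3
TREE = ("cat", ("star", ("or", ("leaf", 1), ("leaf", 2))), ("leaf", 3))
print(analyse(TREE))
```

For the root of (a|b)*a this gives nullable = false, firstpos = {1,2,3} and lastpos = {3}, matching the example above.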

Annotated syntax tree
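(a|b)*abb#, positions 1 2 3 4 5 6 (Step 3 - A)
Node                           firstpos     lastpos
cat-node (root, adds # 6)      {1, 2, 3}    {6}
cat-node (adds b 5)            {1, 2, 3}    {5}
cat-node (adds b 4)            {1, 2, 3}    {4}
cat-node (adds a 3)            {1, 2, 3}    {3}
star-node *                    {1, 2}       {1, 2}
or-node |                      {1, 2}       {1, 2}
leaf a (position 1)            {1}          {1}
leaf b (position 2)            {2}          {2}
leaf a (position 3)            {3}          {3}
leaf b (position 4)            {4}          {4}
leaf b (position 5)            {5}          {5}
leaf # (position 6)            {6}          {6}
nullable is true only for the star-node.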

Finding FollowPos
followpos can be computed as follows:
• (rule 1) if n is a cat-node with left child c1 and right child c2,
then for every position i in lastpos(c1),
all positions in firstpos(c2) are in followpos(i)
• (rule 2) if n is a star-node
and i is a position in lastpos(n), then
all positions in firstpos(n) are in followpos(i)
[Figures: for a cat-node, followpos links lastpos(c1) to firstpos(c2); for a star-node, it links lastpos(n) back to firstpos(n)]
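Both rules amount to adding one set into another for each cat-node and star-node. As a sketch, the firstpos and lastpos values of the inner nodes of (a|b)*abb# are written out by hand below (taken from its annotated syntax tree) instead of walking a tree; the names `CAT_NODES`, `STAR_NODES` and `compute_followpos` are made up for illustration.

```python
# For each cat-node of (a|b)*abb#: (lastpos(c1), firstpos(c2))
CAT_NODES = [({1, 2}, {3}), ({3}, {4}), ({4}, {5}), ({5}, {6})]
# For each star-node: (firstpos(n), lastpos(n))
STAR_NODES = [({1, 2}, {1, 2})]


def compute_followpos(positions, cat_nodes, star_nodes):
    followpos = {p: set() for p in positions}
    # Rule 1: whatever can start c2 may follow whatever can end c1.
    for last_c1, first_c2 in cat_nodes:
        for i in last_c1:
            followpos[i] |= first_c2
    # Rule 2: inside a star, the start of the body may follow its end.
    for first_n, last_n in star_nodes:
        for i in last_n:
            followpos[i] |= first_n
    return followpos


followpos = compute_followpos(range(1, 7), CAT_NODES, STAR_NODES)
for position, follows in followpos.items():
    print(position, sorted(follows))
```

Or-nodes and leaves contribute nothing, and position 6 (the # end marker) ends up with an empty followpos set.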

Followpos example
• Applying rule 1 (cat-node joining (a|b)* and a at position 3)
• followpos(1) includes {3}
• followpos(2) includes {3}
• Applying rule 2 (star-node)
• followpos(1) includes {1,2}
• followpos(2) includes {1,2}
[Figure: annotated syntax tree of (a|b)*abb#, positions 1 2 3 4 5 6 (Step 3 - B)]

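RE to DFA (Step 4)
(a|b)*abb#, positions 1 2 3 4 5 6

Node  followpos
1     {1, 2, 3}
2     {1, 2, 3}
3     {4}
4     {5}
5     {6}
6     -

A = firstpos(n0) = {1,2,3}   (start state; n0 is the root)
o Move[A,a] = followpos(1) ∪ followpos(3) = {1,2,3,4} = B
o Move[A,b] = followpos(2) = {1,2,3} = A
o Move[B,a] = followpos(1) ∪ followpos(3) = B
o Move[B,b] = followpos(2) ∪ followpos(4) = {1,2,3,5} = C
o Move[C,a] = followpos(1) ∪ followpos(3) = B
o Move[C,b] = followpos(2) ∪ followpos(5) = {1,2,3,6} = D
o Move[D,a] = followpos(1) ∪ followpos(3) = B
o Move[D,b] = followpos(2) = A
D is the accepting state because it contains position 6 (the # marker).
[Figure: resulting DFA with start state {1,2,3} and states {1,2,3,4}, {1,2,3,5}, {1,2,3,6}; transitions labeled a and b]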
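Deriving the DFA from the followpos table can be sketched as another worklist loop: each DFA state is a set of positions, and a move on a symbol is the union of followpos over the positions carrying that symbol. The table is hard-coded from the slide; `build_dfa` and the other names are chosen for illustration.

```python
SYMBOL_AT = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
START = frozenset({1, 2, 3})    # firstpos of the root


def build_dfa(start, followpos, symbol_at, alphabet="ab", end_marker="#"):
    states, transitions, todo = [start], {}, [start]
    while todo:
        state = todo.pop()
        for symbol in alphabet:
            # Union of followpos(p) over positions p in the state labeled `symbol`
            target = frozenset().union(
                *(followpos[p] for p in state if symbol_at[p] == symbol))
            if not target:
                continue
            if target not in states:
                states.append(target)
                todo.append(target)
            transitions[(state, symbol)] = target
    # A state accepts if it contains the position of the end marker.
    accepting = {s for s in states
                 if any(symbol_at[p] == end_marker for p in s)}
    return states, transitions, accepting


states, transitions, accepting = build_dfa(START, FOLLOWPOS, SYMBOL_AT)
for (state, symbol), target in transitions.items():
    print(sorted(state), symbol, "->", sorted(target))
print("accepting:", [sorted(s) for s in accepting])
```

No ε-closure is needed here, because the positions already stand for "which symbol may come next".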

Minimizing Number of States of a DFA
• Partition the set of states into two groups:
– G1: the set of accepting states
– G2: the set of non-accepting states
• For each group G
– partition G into subgroups such that states s1 and s2 are in the same subgroup iff,
for all input symbols a, s1 and s2 have transitions on a to states in the same group.
• Repeat until no group can be split any further.
• The start state of the minimized DFA is the group containing
the start state of the original DFA.
• The accepting states of the minimized DFA are the groups containing
the accepting states of the original DFA.

Minimizing DFA - example
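[Figure: DFA with states 1, 2, 3 over {a,b}; state 2 is the accepting state]
G1 = {2}
G2 = {1,3}
G2 cannot be partitioned because
move(1,a)=2
move(3,a)=2
move(1,b)=3
move(3,b)=3
So, the minimized DFA (with the minimum number of states) is:
[Figure: DFA with states {1,3} and {2}: {1,3} --a--> {2}, {1,3} --b--> {1,3}, {2} --a--> {2}, {2} --b--> {1,3}]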
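The partitioning procedure can be sketched on the three-state example above. This is a plain refinement loop, assuming a total transition function; the name `minimize` and the dictionary encoding of the DFA are choices made for this example.

```python
def minimize(states, alphabet, delta, accepting):
    """Return the final partition of `states` as a list of sets.

    Assumes delta[(state, symbol)] is defined for every state and symbol.
    """
    accepting = set(accepting)
    partition = [g for g in (accepting, set(states) - accepting) if g]
    while True:
        def group_of(s):
            return next(i for i, g in enumerate(partition) if s in g)

        refined = []
        for group in partition:
            buckets = {}
            for s in sorted(group):
                # States stay together only if every symbol leads to the same group.
                key = tuple(group_of(delta[(s, a)]) for a in alphabet)
                buckets.setdefault(key, set()).add(s)
            refined.extend(buckets.values())
        if len(refined) == len(partition):
            return refined          # nothing was split: partition is stable
        partition = refined


# Example DFA from the slide: states 1, 2, 3; state 2 accepting
DELTA = {
    (1, "a"): 2, (1, "b"): 3,
    (2, "a"): 2, (2, "b"): 3,
    (3, "a"): 2, (3, "b"): 3,
}
groups = minimize({1, 2, 3}, "ab", DELTA, {2})
print(sorted(sorted(g) for g in groups))
```

States 1 and 3 end up in one group because both go to state 2 on a and to state 3 on b, so the minimized DFA has two states.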
