SlideShare une entreprise Scribd logo
1  sur  42
ISBN 0-321-49362-1
Chapter 4
Lexical and Syntax
Analysis
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-2
Chapter 4 Topics
• Introduction
• Lexical Analysis
• The Parsing Problem
• Recursive-Descent Parsing
• Bottom-Up Parsing
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-3
Introduction
• Language implementation systems must
analyze source code, regardless of the specific
implementation approach
• Nearly all syntax analysis is based on a formal
description of the syntax of the source language
(BNF)
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-4
Syntax Analysis
• The syntax analysis portion of a language
processor nearly always consists of two parts:
– A low-level part called a lexical analyzer
(mathematically, a finite automaton based on a
regular grammar)
– A high-level part called a syntax analyzer, or parser
(mathematically, a push-down automaton based on a
context-free grammar, or BNF)
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-5
Advantages of Using BNF to Describe
Syntax
• Provides a clear and concise syntax description
• The parser can be based directly on the BNF
• Parsers based on BNF are easy to maintain
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-6
Reasons to Separate Lexical and Syntax
Analysis
• Simplicity - less complex approaches can be
used for lexical analysis; separating them
simplifies the parser
• Efficiency - separation allows optimization of the
lexical analyzer
• Portability - parts of the lexical analyzer may not
be portable, but the parser always is portable
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-7
Lexical Analysis
• A lexical analyzer is a pattern matcher for
character strings
• A lexical analyzer is a “front-end” for the parser
• Identifies substrings of the source program that
belong together - lexemes
– Lexemes match a character pattern, which is
associated with a lexical category called a token
– sum is a lexeme; its token may be IDENT
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-8
Lexical Analysis (continued)
• The lexical analyzer is usually a function that is called by
the parser when it needs the next token
• Three approaches to building a lexical analyzer:
– Write a formal description of the tokens and use a software tool
that constructs table-driven lexical analyzers given such a
description
– Design a state diagram that describes the tokens and write a
program that implements the state diagram
– Design a state diagram that describes the tokens and hand-
construct a table-driven implementation of the state diagram
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-9
State Diagram Design
– A naïve state diagram would have a transition from
every state on every character in the source
language - such a diagram would be very large!
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-10
Lexical Analysis (cont.)
• In many cases, transitions can be combined to
simplify the state diagram
– When recognizing an identifier, all uppercase and
lowercase letters are equivalent
• Use a character class that includes all letters
– When recognizing an integer literal, all digits are
equivalent - use a digit class
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-11
Lexical Analysis (cont.)
• Reserved words and identifiers can be
recognized together (rather than having a part
of the diagram for each reserved word)
– Use a table lookup to determine whether a possible
identifier is in fact a reserved word
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-12
Lexical Analysis (cont.)
• Convenient utility subprograms:
– getChar - gets the next character of input, puts it in
nextChar, determines its class and puts the class in
charClass
– addChar - puts the character from nextChar into
the place the lexeme is being accumulated, lexeme
– lookup - determines whether the string in lexeme is
a reserved word (returns a code)
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-13
State Diagram
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-14
Lexical Analysis (cont.)
Implementation (assume initialization):
/* Global variables */
int charClass;
char lexeme [100];
char nextChar;
int lexLen;
int Letter = 0;
int DIGIT = 1;
int UNKNOWN = -1;
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-15
Lexical Analysis (cont.)
int lex() {
lexLen = 0;
static int first = 1;
/* If it is the first call to lex, initialize by calling getChar */
if (first) {
getChar();
first = 0;
}
getNonBlank();
switch (charClass) {
/* Parse identifiers and reserved words */
case LETTER:
addChar();
getChar();
while (charClass == LETTER || charClass == DIGIT){
addChar();
getChar();
}
return lookup(lexeme);
break;
…
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-16
Lexical Analysis (cont.)
…
/* Parse integer literals */
case DIGIT:
addChar();
getChar();
while (charClass == DIGIT) {
addChar();
getChar();
}
return INT_LIT;
break;
} /* End of switch */
} /* End of function lex */
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-17
The Parsing Problem
• Goals of the parser, given an input program:
– Find all syntax errors; for each, produce an
appropriate diagnostic message and recover quickly
– Produce the parse tree, or at least a trace of the
parse tree, for the program
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-18
The Parsing Problem (cont.)
• Two categories of parsers
– Top down - produce the parse tree, beginning at the
root
• Order is that of a leftmost derivation
• Traces or builds the parse tree in preorder
– Bottom up - produce the parse tree, beginning at the
leaves
• Order is that of the reverse of a rightmost derivation
• Useful parsers look only one token ahead in the
input
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-19
The Parsing Problem (cont.)
• Top-down Parsers
– Given a sentential form, xAα , the parser must
choose the correct A-rule to get the next sentential
form in the leftmost derivation, using only the first
token produced by A
• The most common top-down parsing algorithms:
– Recursive descent - a coded implementation
– LL parsers - table driven implementation
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-20
The Parsing Problem (cont.)
• Bottom-up parsers
– Given a right sentential form, α, determine what
substring of α is the right-hand side of the rule in the
grammar that must be reduced to produce the
previous sentential form in the right derivation
– The most common bottom-up parsing algorithms are
in the LR family
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-21
The Parsing Problem (cont.)
• The Complexity of Parsing
– Parsers that work for any unambiguous grammar are
complex and inefficient ( O(n3
), where n is the length
of the input )
– Compilers use parsers that only work for a subset of
all unambiguous grammars, but do it in linear time
( O(n), where n is the length of the input )
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-22
Recursive-Descent Parsing
• There is a subprogram for each nonterminal in
the grammar, which can parse sentences that
can be generated by that nonterminal
• EBNF is ideally suited for being the basis for a
recursive-descent parser, because EBNF
minimizes the number of nonterminals
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-23
Recursive-Descent Parsing (cont.)
• A grammar for simple expressions:
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
<factor> → id | ( <expr> )
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-24
Recursive-Descent Parsing (cont.)
• Assume we have a lexical analyzer named lex,
which puts the next token code in nextToken
• The coding process when there is only one
RHS:
– For each terminal symbol in the RHS, compare it with
the next input token; if they match, continue, else
there is an error
– For each nonterminal symbol in the RHS, call its
associated parsing subprogram
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-25
Recursive-Descent Parsing (cont.)
/* Function expr
Parses strings in the language
generated by the rule:
<expr> → <term> {(+ | -) <term>}
*/
void expr() {
/* Parse the first term */
term();
…
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-26
Recursive-Descent Parsing (cont.)
/* As long as the next token is + or -, call
lex to get the next token, and parse the
next term */
while (nextToken == PLUS_CODE ||
nextToken == MINUS_CODE){
lex();
term();
}
}
• This particular routine does not detect errors
• Convention: Every parsing routine leaves the next token
in nextToken
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-27
Recursive-Descent Parsing (cont.)
A nonterminal that has more than one RHS
requires an initial process to determine which
RHS it is to parse
– The correct RHS is chosen on the basis of the next
token of input (the lookahead)
– The next token is compared with the first token that
can be generated by each RHS until a match is found
– If no match is found, it is a syntax error
Note: the algorithm accordingly fails if the right hand
sides, of two productions with the same left hand
side, have terminal heads that are not disjoint.
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-28
Recursive-Descent Parsing (cont.)
/* Function factor
Parses strings in the language
generated by the rule:
<factor> -> id | (<expr>) */
void factor() {
/* Determine which RHS */
if (nextToken) == ID_CODE)
/* For the RHS id, just call lex */
lex();
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-29
Recursive-Descent Parsing (cont.)
/* If the RHS is (<expr>) – call lex to pass
over the left parenthesis, call expr, and
check for the right parenthesis */
else if (nextToken == LEFT_PAREN_CODE) {
lex();
expr();
if (nextToken == RIGHT_PAREN_CODE)
lex();
else
error();
} /* End of else if (nextToken == ... */
else error(); /* Neither RHS matches */
}
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-30
Recursive-Descent Parsing (cont.)
• The LL Grammar Class
– The Left Recursion Problem
• If a grammar has left recursion, either direct or indirect,
it cannot be the basis for a top-down parser
– A grammar can be modified to remove left recursion
For each nonterminal, A,
1. Group the A-rules as A → Aα1 | … | Aαm | β1 | β2 | … | βn
where none of the β‘s begins with A
2. Replace the original A-rules with
A → β1A’ | β2A’ | … | βnA’
A’ → α1A’ | α2A’ | … | αmA’ | ε
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-31
Recursive-Descent Parsing (cont.)
• The other characteristic of grammars that
disallows top-down parsing is the lack of
pairwise disjointness
– The inability to determine the correct RHS on the
basis of one token of lookahead
– Def: FIRST(α) = {a | α =>* aβ }
(If α =>* ε, ε is in FIRST(α))
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-32
Recursive-Descent Parsing (cont.)
• Pairwise Disjointness Test:
– For each nonterminal, A, in the grammar that has
more than one RHS, for each pair of rules, A → αi
and A → αj, it must be true that
FIRST(αi) ⋂ FIRST(αj) = φ
• Examples:
A → a | bB | cAb
A → a | aB
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-33
Bottom-up Parsing
• The parsing problem is finding the correct RHS
in a right-sentential form to reduce to get the
previous right-sentential form in the derivation
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-34
Bottom-up Parsing (cont.)
• Shift-Reduce Algorithms
– Reduce is the action of replacing the right hand side
on the top of the parse stack with its corresponding
LHS
– Shift is the action of moving the next token to the top
of the parse stack
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-35
Bottom-up Parsing (cont.)
• Advantages of LR parsers:
– They will work for nearly all grammars that describe
programming languages.
– They work on a larger class of grammars than other
bottom-up algorithms, but are as efficient as any
other bottom-up parser.
– They can detect syntax errors as soon as it is
possible.
– The LR class of grammars is a superset of the class
parsable by LL parsers.
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-36
Bottom-up Parsing (cont.)
• LR parsers must be constructed with a tool
• Knuth’s insight: A bottom-up parser could use
the entire history of the parse, up to the current
point, to make parsing decisions
– There were only a finite and relatively small number
of different parse situations that could have occurred,
so the history could be stored in a parser state, on
the parse stack
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-37
Bottom-up Parsing (cont.)
• LR parsers are table driven, where the table
has two components, an ACTION table and a
GOTO table
– The ACTION table specifies the action of the
parser, given the parser state and the next token
• Rows are state names; columns are terminals
– The GOTO table specifies which state to put on
top of the parse stack after a reduction action is
done
• Rows are state names; columns are nonterminals
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-38
Bottom-up Parsing (cont.)
• Parser actions (continued):
– If ACTION[Sm, ai] = Accept, the parse is complete and
no errors were found.
– If ACTION[Sm, ai] = Error, the parser calls an error-
handling routine.
The next slide shows the parsing table for the
grammar
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → ( E )
6. F → id
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-40
LR Parsing Table
S4
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-41
Bottom-up Parsing (cont.)
• A parser table can be generated from a given
grammar with a tool, e.g., yacc
Copyright © 2007 Addison-
Wesley. All rights reserved. 1-42
Summary
• Syntax analysis is a common part of language
implementation
• A lexical analyzer is a pattern matcher that isolates
small-scale parts of a program
– Detects syntax errors
– Produces a parse tree
• A recursive-descent parser is an LL parser
– EBNF
• Parsing problem for bottom-up parsers: find the
substring of current sentential form
• The LR family of shift-reduce parsers is the most
common bottom-up parsing approach

Contenu connexe

Tendances

Research in translation studies
Research in translation studiesResearch in translation studies
Research in translation studiesSugeng Hariyanto
 
M1 lesson 1.2 slides
M1 lesson 1.2 slidesM1 lesson 1.2 slides
M1 lesson 1.2 slidesAnh Le
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationMarina Santini
 
Symbol table in compiler Design
Symbol table in compiler DesignSymbol table in compiler Design
Symbol table in compiler DesignKuppusamy P
 
Ambiguities in nlp
Ambiguities in nlpAmbiguities in nlp
Ambiguities in nlpRaza Azeem
 
Morphological Analysis
Morphological AnalysisMorphological Analysis
Morphological AnalysisAkshat Pandey
 
Fundamentals of Language Processing
Fundamentals of Language ProcessingFundamentals of Language Processing
Fundamentals of Language ProcessingHemant Sharma
 
Morpheme, morphological analysis and morphemic analysis
Morpheme, morphological analysis and morphemic analysisMorpheme, morphological analysis and morphemic analysis
Morpheme, morphological analysis and morphemic analysissyerencs
 
compiler ppt on symbol table
 compiler ppt on symbol table compiler ppt on symbol table
compiler ppt on symbol tablenadarmispapaulraj
 
Discourse historical approach
Discourse historical approachDiscourse historical approach
Discourse historical approachJannatQamar
 
Code switching &; code mixing
Code switching &; code mixingCode switching &; code mixing
Code switching &; code mixingYoushaib Alam
 
Lecture Notes-Are Natural Languages Regular.pdf
Lecture Notes-Are Natural Languages Regular.pdfLecture Notes-Are Natural Languages Regular.pdf
Lecture Notes-Are Natural Languages Regular.pdfDeptii Chaudhari
 
Expression and Operartor In C Programming
Expression and Operartor In C Programming Expression and Operartor In C Programming
Expression and Operartor In C Programming Kamal Acharya
 

Tendances (20)

Research in translation studies
Research in translation studiesResearch in translation studies
Research in translation studies
 
M1 lesson 1.2 slides
M1 lesson 1.2 slidesM1 lesson 1.2 slides
M1 lesson 1.2 slides
 
Wordnet
WordnetWordnet
Wordnet
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense Disambiguation
 
Symbol table in compiler Design
Symbol table in compiler DesignSymbol table in compiler Design
Symbol table in compiler Design
 
Parsing
ParsingParsing
Parsing
 
First and follow set
First and follow setFirst and follow set
First and follow set
 
Parsing
ParsingParsing
Parsing
 
Ambiguities in nlp
Ambiguities in nlpAmbiguities in nlp
Ambiguities in nlp
 
Media discourse
Media discourseMedia discourse
Media discourse
 
Morphological Analysis
Morphological AnalysisMorphological Analysis
Morphological Analysis
 
Fundamentals of Language Processing
Fundamentals of Language ProcessingFundamentals of Language Processing
Fundamentals of Language Processing
 
Morpheme, morphological analysis and morphemic analysis
Morpheme, morphological analysis and morphemic analysisMorpheme, morphological analysis and morphemic analysis
Morpheme, morphological analysis and morphemic analysis
 
compiler ppt on symbol table
 compiler ppt on symbol table compiler ppt on symbol table
compiler ppt on symbol table
 
Discourse historical approach
Discourse historical approachDiscourse historical approach
Discourse historical approach
 
Code switching &; code mixing
Code switching &; code mixingCode switching &; code mixing
Code switching &; code mixing
 
Lecture Notes-Are Natural Languages Regular.pdf
Lecture Notes-Are Natural Languages Regular.pdfLecture Notes-Are Natural Languages Regular.pdf
Lecture Notes-Are Natural Languages Regular.pdf
 
Expression and Operartor In C Programming
Expression and Operartor In C Programming Expression and Operartor In C Programming
Expression and Operartor In C Programming
 
Language and human rights
Language and human rightsLanguage and human rights
Language and human rights
 
NLP_KASHK:N-Grams
NLP_KASHK:N-GramsNLP_KASHK:N-Grams
NLP_KASHK:N-Grams
 

En vedette

5 Names, bindings,Typechecking and Scopes
5 Names, bindings,Typechecking and Scopes5 Names, bindings,Typechecking and Scopes
5 Names, bindings,Typechecking and ScopesMunawar Ahmed
 
CH # 1 preliminaries
CH # 1 preliminariesCH # 1 preliminaries
CH # 1 preliminariesMunawar Ahmed
 
2 evolution of the major
2 evolution of the major2 evolution of the major
2 evolution of the majorMunawar Ahmed
 
7 expressions and assignment statements
7 expressions and assignment statements7 expressions and assignment statements
7 expressions and assignment statementsMunawar Ahmed
 
Introduction to course
Introduction to courseIntroduction to course
Introduction to coursenikit meshram
 
4 lexical and syntax analysis
4 lexical and syntax analysis4 lexical and syntax analysis
4 lexical and syntax analysisjigeno
 
Compiler Design Basics
Compiler Design BasicsCompiler Design Basics
Compiler Design BasicsAkhil Kaushik
 
2015 Upload Campaigns Calendar - SlideShare
2015 Upload Campaigns Calendar - SlideShare2015 Upload Campaigns Calendar - SlideShare
2015 Upload Campaigns Calendar - SlideShareSlideShare
 

En vedette (20)

Introduction
IntroductionIntroduction
Introduction
 
3 describing syntax
3 describing syntax3 describing syntax
3 describing syntax
 
5 Names, bindings,Typechecking and Scopes
5 Names, bindings,Typechecking and Scopes5 Names, bindings,Typechecking and Scopes
5 Names, bindings,Typechecking and Scopes
 
9 subprograms
9 subprograms9 subprograms
9 subprograms
 
CH # 1 preliminaries
CH # 1 preliminariesCH # 1 preliminaries
CH # 1 preliminaries
 
8 statement level
8 statement level8 statement level
8 statement level
 
Java Variable Storage
Java Variable StorageJava Variable Storage
Java Variable Storage
 
2 evolution of the major
2 evolution of the major2 evolution of the major
2 evolution of the major
 
7 expressions and assignment statements
7 expressions and assignment statements7 expressions and assignment statements
7 expressions and assignment statements
 
Introduction to course
Introduction to courseIntroduction to course
Introduction to course
 
Complier designer
Complier designerComplier designer
Complier designer
 
6 data types
6 data types6 data types
6 data types
 
4 lexical and syntax analysis
4 lexical and syntax analysis4 lexical and syntax analysis
4 lexical and syntax analysis
 
LR Parsing
LR ParsingLR Parsing
LR Parsing
 
Lexing and parsing
Lexing and parsingLexing and parsing
Lexing and parsing
 
Compiler Design Basics
Compiler Design BasicsCompiler Design Basics
Compiler Design Basics
 
Module 11
Module 11Module 11
Module 11
 
Lexical analyzer
Lexical analyzerLexical analyzer
Lexical analyzer
 
Lexical Approach
Lexical ApproachLexical Approach
Lexical Approach
 
2015 Upload Campaigns Calendar - SlideShare
2015 Upload Campaigns Calendar - SlideShare2015 Upload Campaigns Calendar - SlideShare
2015 Upload Campaigns Calendar - SlideShare
 

Similaire à 4 lexical and syntax

Compier Design_Unit I.ppt
Compier Design_Unit I.pptCompier Design_Unit I.ppt
Compier Design_Unit I.pptsivaganesh293
 
Compier Design_Unit I.ppt
Compier Design_Unit I.pptCompier Design_Unit I.ppt
Compier Design_Unit I.pptsivaganesh293
 
Compier Design_Unit I_SRM.ppt
Compier Design_Unit I_SRM.pptCompier Design_Unit I_SRM.ppt
Compier Design_Unit I_SRM.pptApoorv Diwan
 
Unitiv 111206005201-phpapp01
Unitiv 111206005201-phpapp01Unitiv 111206005201-phpapp01
Unitiv 111206005201-phpapp01riddhi viradiya
 
A Role of Lexical Analyzer
A Role of Lexical AnalyzerA Role of Lexical Analyzer
A Role of Lexical AnalyzerArchana Gopinath
 
Language for specifying lexical Analyzer
Language for specifying lexical AnalyzerLanguage for specifying lexical Analyzer
Language for specifying lexical AnalyzerArchana Gopinath
 
Lexical Analyzer Implementation
Lexical Analyzer ImplementationLexical Analyzer Implementation
Lexical Analyzer ImplementationAkhil Kaushik
 
Designing A Syntax Based Retrieval System03
Designing A Syntax Based Retrieval System03Designing A Syntax Based Retrieval System03
Designing A Syntax Based Retrieval System03Avelin Huo
 
Using Language Oriented Programming to Execute Computations on the GPU
Using Language Oriented Programming to Execute Computations on the GPUUsing Language Oriented Programming to Execute Computations on the GPU
Using Language Oriented Programming to Execute Computations on the GPUSkills Matter
 

Similaire à 4 lexical and syntax (20)

sabesta.ppt
sabesta.pptsabesta.ppt
sabesta.ppt
 
Lecture 6.pdf
Lecture 6.pdfLecture 6.pdf
Lecture 6.pdf
 
sabesta3.ppt
sabesta3.pptsabesta3.ppt
sabesta3.ppt
 
UNIT 1 part II.ppt
UNIT 1 part II.pptUNIT 1 part II.ppt
UNIT 1 part II.ppt
 
Parsing
ParsingParsing
Parsing
 
1 compiler outline
1 compiler outline1 compiler outline
1 compiler outline
 
Control structure
Control structureControl structure
Control structure
 
Compier Design_Unit I.ppt
Compier Design_Unit I.pptCompier Design_Unit I.ppt
Compier Design_Unit I.ppt
 
Compier Design_Unit I.ppt
Compier Design_Unit I.pptCompier Design_Unit I.ppt
Compier Design_Unit I.ppt
 
Unit1.ppt
Unit1.pptUnit1.ppt
Unit1.ppt
 
Compier Design_Unit I_SRM.ppt
Compier Design_Unit I_SRM.pptCompier Design_Unit I_SRM.ppt
Compier Design_Unit I_SRM.ppt
 
Unitiv 111206005201-phpapp01
Unitiv 111206005201-phpapp01Unitiv 111206005201-phpapp01
Unitiv 111206005201-phpapp01
 
Compiler Design
Compiler DesignCompiler Design
Compiler Design
 
A Role of Lexical Analyzer
A Role of Lexical AnalyzerA Role of Lexical Analyzer
A Role of Lexical Analyzer
 
Language for specifying lexical Analyzer
Language for specifying lexical AnalyzerLanguage for specifying lexical Analyzer
Language for specifying lexical Analyzer
 
Lexical Analyzers and Parsers
Lexical Analyzers and ParsersLexical Analyzers and Parsers
Lexical Analyzers and Parsers
 
Plc part 2
Plc  part 2Plc  part 2
Plc part 2
 
Lexical Analyzer Implementation
Lexical Analyzer ImplementationLexical Analyzer Implementation
Lexical Analyzer Implementation
 
Designing A Syntax Based Retrieval System03
Designing A Syntax Based Retrieval System03Designing A Syntax Based Retrieval System03
Designing A Syntax Based Retrieval System03
 
Using Language Oriented Programming to Execute Computations on the GPU
Using Language Oriented Programming to Execute Computations on the GPUUsing Language Oriented Programming to Execute Computations on the GPU
Using Language Oriented Programming to Execute Computations on the GPU
 

Dernier

Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 

Dernier (20)

Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 

4 lexical and syntax

  • 2. Copyright © 2007 Addison- Wesley. All rights reserved. 1-2 Chapter 4 Topics • Introduction • Lexical Analysis • The Parsing Problem • Recursive-Descent Parsing • Bottom-Up Parsing
  • 3. Copyright © 2007 Addison- Wesley. All rights reserved. 1-3 Introduction • Language implementation systems must analyze source code, regardless of the specific implementation approach • Nearly all syntax analysis is based on a formal description of the syntax of the source language (BNF)
  • 4. Copyright © 2007 Addison- Wesley. All rights reserved. 1-4 Syntax Analysis • The syntax analysis portion of a language processor nearly always consists of two parts: – A low-level part called a lexical analyzer (mathematically, a finite automaton based on a regular grammar) – A high-level part called a syntax analyzer, or parser (mathematically, a push-down automaton based on a context-free grammar, or BNF)
  • 5. Copyright © 2007 Addison- Wesley. All rights reserved. 1-5 Advantages of Using BNF to Describe Syntax • Provides a clear and concise syntax description • The parser can be based directly on the BNF • Parsers based on BNF are easy to maintain
  • 6. Copyright © 2007 Addison- Wesley. All rights reserved. 1-6 Reasons to Separate Lexical and Syntax Analysis • Simplicity - less complex approaches can be used for lexical analysis; separating them simplifies the parser • Efficiency - separation allows optimization of the lexical analyzer • Portability - parts of the lexical analyzer may not be portable, but the parser always is portable
  • 7. Copyright © 2007 Addison- Wesley. All rights reserved. 1-7 Lexical Analysis • A lexical analyzer is a pattern matcher for character strings • A lexical analyzer is a “front-end” for the parser • Identifies substrings of the source program that belong together - lexemes – Lexemes match a character pattern, which is associated with a lexical category called a token – sum is a lexeme; its token may be IDENT
  • 8. Copyright © 2007 Addison- Wesley. All rights reserved. 1-8 Lexical Analysis (continued) • The lexical analyzer is usually a function that is called by the parser when it needs the next token • Three approaches to building a lexical analyzer: – Write a formal description of the tokens and use a software tool that constructs table-driven lexical analyzers given such a description – Design a state diagram that describes the tokens and write a program that implements the state diagram – Design a state diagram that describes the tokens and hand- construct a table-driven implementation of the state diagram
  • 9. Copyright © 2007 Addison- Wesley. All rights reserved. 1-9 State Diagram Design – A naïve state diagram would have a transition from every state on every character in the source language - such a diagram would be very large!
  • 10. Copyright © 2007 Addison- Wesley. All rights reserved. 1-10 Lexical Analysis (cont.) • In many cases, transitions can be combined to simplify the state diagram – When recognizing an identifier, all uppercase and lowercase letters are equivalent • Use a character class that includes all letters – When recognizing an integer literal, all digits are equivalent - use a digit class
  • 11. Copyright © 2007 Addison- Wesley. All rights reserved. 1-11 Lexical Analysis (cont.) • Reserved words and identifiers can be recognized together (rather than having a part of the diagram for each reserved word) – Use a table lookup to determine whether a possible identifier is in fact a reserved word
  • 12. Copyright © 2007 Addison- Wesley. All rights reserved. 1-12 Lexical Analysis (cont.) • Convenient utility subprograms: – getChar - gets the next character of input, puts it in nextChar, determines its class and puts the class in charClass – addChar - puts the character from nextChar into the place the lexeme is being accumulated, lexeme – lookup - determines whether the string in lexeme is a reserved word (returns a code)
  • 13. Copyright © 2007 Addison- Wesley. All rights reserved. 1-13 State Diagram
  • 14. Copyright © 2007 Addison- Wesley. All rights reserved. 1-14 Lexical Analysis (cont.) Implementation (assume initialization): /* Global variables */ int charClass; char lexeme [100]; char nextChar; int lexLen; int Letter = 0; int DIGIT = 1; int UNKNOWN = -1;
  • 15. Copyright © 2007 Addison- Wesley. All rights reserved. 1-15 Lexical Analysis (cont.) int lex() { lexLen = 0; static int first = 1; /* If it is the first call to lex, initialize by calling getChar */ if (first) { getChar(); first = 0; } getNonBlank(); switch (charClass) { /* Parse identifiers and reserved words */ case LETTER: addChar(); getChar(); while (charClass == LETTER || charClass == DIGIT){ addChar(); getChar(); } return lookup(lexeme); break; …
  • 16. Copyright © 2007 Addison- Wesley. All rights reserved. 1-16 Lexical Analysis (cont.) … /* Parse integer literals */ case DIGIT: addChar(); getChar(); while (charClass == DIGIT) { addChar(); getChar(); } return INT_LIT; break; } /* End of switch */ } /* End of function lex */
  • 17. Copyright © 2007 Addison- Wesley. All rights reserved. 1-17 The Parsing Problem • Goals of the parser, given an input program: – Find all syntax errors; for each, produce an appropriate diagnostic message and recover quickly – Produce the parse tree, or at least a trace of the parse tree, for the program
  • 18. Copyright © 2007 Addison- Wesley. All rights reserved. 1-18 The Parsing Problem (cont.) • Two categories of parsers – Top down - produce the parse tree, beginning at the root • Order is that of a leftmost derivation • Traces or builds the parse tree in preorder – Bottom up - produce the parse tree, beginning at the leaves • Order is that of the reverse of a rightmost derivation • Useful parsers look only one token ahead in the input
  • 19. Copyright © 2007 Addison- Wesley. All rights reserved. 1-19 The Parsing Problem (cont.) • Top-down Parsers – Given a sentential form, xAα , the parser must choose the correct A-rule to get the next sentential form in the leftmost derivation, using only the first token produced by A • The most common top-down parsing algorithms: – Recursive descent - a coded implementation – LL parsers - table driven implementation
  • 20. Copyright © 2007 Addison- Wesley. All rights reserved. 1-20 The Parsing Problem (cont.) • Bottom-up parsers – Given a right sentential form, α, determine what substring of α is the right-hand side of the rule in the grammar that must be reduced to produce the previous sentential form in the right derivation – The most common bottom-up parsing algorithms are in the LR family
  • 21. Copyright © 2007 Addison- Wesley. All rights reserved. 1-21 The Parsing Problem (cont.) • The Complexity of Parsing – Parsers that work for any unambiguous grammar are complex and inefficient ( O(n3 ), where n is the length of the input ) – Compilers use parsers that only work for a subset of all unambiguous grammars, but do it in linear time ( O(n), where n is the length of the input )
  • 22. Copyright © 2007 Addison- Wesley. All rights reserved. 1-22 Recursive-Descent Parsing • There is a subprogram for each nonterminal in the grammar, which can parse sentences that can be generated by that nonterminal • EBNF is ideally suited for being the basis for a recursive-descent parser, because EBNF minimizes the number of nonterminals
  • 23. Copyright © 2007 Addison- Wesley. All rights reserved. 1-23 Recursive-Descent Parsing (cont.) • A grammar for simple expressions: <expr> → <term> {(+ | -) <term>} <term> → <factor> {(* | /) <factor>} <factor> → id | ( <expr> )
  • 24. Copyright © 2007 Addison- Wesley. All rights reserved. 1-24 Recursive-Descent Parsing (cont.) • Assume we have a lexical analyzer named lex, which puts the next token code in nextToken • The coding process when there is only one RHS: – For each terminal symbol in the RHS, compare it with the next input token; if they match, continue, else there is an error – For each nonterminal symbol in the RHS, call its associated parsing subprogram
  • 25. Copyright © 2007 Addison- Wesley. All rights reserved. 1-25 Recursive-Descent Parsing (cont.) /* Function expr Parses strings in the language generated by the rule: <expr> → <term> {(+ | -) <term>} */ void expr() { /* Parse the first term */ term(); …
  • 26. Copyright © 2007 Addison- Wesley. All rights reserved. 1-26 Recursive-Descent Parsing (cont.) /* As long as the next token is + or -, call lex to get the next token, and parse the next term */ while (nextToken == PLUS_CODE || nextToken == MINUS_CODE){ lex(); term(); } } • This particular routine does not detect errors • Convention: Every parsing routine leaves the next token in nextToken
  • 27. Copyright © 2007 Addison- Wesley. All rights reserved. 1-27 Recursive-Descent Parsing (cont.) A nonterminal that has more than one RHS requires an initial process to determine which RHS it is to parse – The correct RHS is chosen on the basis of the next token of input (the lookahead) – The next token is compared with the first token that can be generated by each RHS until a match is found – If no match is found, it is a syntax error Note: the algorithm accordingly fails if the right hand sides, of two productions with the same left hand side, have terminal heads that are not disjoint.
  • 28. Copyright © 2007 Addison- Wesley. All rights reserved. 1-28 Recursive-Descent Parsing (cont.) /* Function factor Parses strings in the language generated by the rule: <factor> -> id | (<expr>) */ void factor() { /* Determine which RHS */ if (nextToken) == ID_CODE) /* For the RHS id, just call lex */ lex();
  • 29. Copyright © 2007 Addison- Wesley. All rights reserved. 1-29 Recursive-Descent Parsing (cont.) /* If the RHS is (<expr>) – call lex to pass over the left parenthesis, call expr, and check for the right parenthesis */ else if (nextToken == LEFT_PAREN_CODE) { lex(); expr(); if (nextToken == RIGHT_PAREN_CODE) lex(); else error(); } /* End of else if (nextToken == ... */ else error(); /* Neither RHS matches */ }
  • 30. Copyright © 2007 Addison- Wesley. All rights reserved. 1-30 Recursive-Descent Parsing (cont.) • The LL Grammar Class – The Left Recursion Problem • If a grammar has left recursion, either direct or indirect, it cannot be the basis for a top-down parser – A grammar can be modified to remove left recursion For each nonterminal, A, 1. Group the A-rules as A → Aα1 | … | Aαm | β1 | β2 | … | βn where none of the β‘s begins with A 2. Replace the original A-rules with A → β1A’ | β2A’ | … | βnA’ A’ → α1A’ | α2A’ | … | αmA’ | ε
  • 31. Copyright © 2007 Addison- Wesley. All rights reserved. 1-31 Recursive-Descent Parsing (cont.) • The other characteristic of grammars that disallows top-down parsing is the lack of pairwise disjointness – The inability to determine the correct RHS on the basis of one token of lookahead – Def: FIRST(α) = {a | α =>* aβ } (If α =>* ε, ε is in FIRST(α))
  • 32. Copyright © 2007 Addison- Wesley. All rights reserved. 1-32 Recursive-Descent Parsing (cont.) • Pairwise Disjointness Test: – For each nonterminal, A, in the grammar that has more than one RHS, for each pair of rules, A → αi and A → αj, it must be true that FIRST(αi) ⋂ FIRST(αj) = φ • Examples: A → a | bB | cAb A → a | aB
  • 33. Copyright © 2007 Addison- Wesley. All rights reserved. 1-33 Bottom-up Parsing • The parsing problem is finding the correct RHS in a right-sentential form to reduce to get the previous right-sentential form in the derivation
  • 34. Copyright © 2007 Addison- Wesley. All rights reserved. 1-34 Bottom-up Parsing (cont.) • Shift-Reduce Algorithms – Reduce is the action of replacing the right hand side on the top of the parse stack with its corresponding LHS – Shift is the action of moving the next token to the top of the parse stack
  • 35. Copyright © 2007 Addison- Wesley. All rights reserved. 1-35 Bottom-up Parsing (cont.) • Advantages of LR parsers: – They will work for nearly all grammars that describe programming languages. – They work on a larger class of grammars than other bottom-up algorithms, but are as efficient as any other bottom-up parser. – They can detect syntax errors as soon as it is possible. – The LR class of grammars is a superset of the class parsable by LL parsers.
  • 36. Copyright © 2007 Addison- Wesley. All rights reserved. 1-36 Bottom-up Parsing (cont.) • LR parsers must be constructed with a tool • Knuth’s insight: A bottom-up parser could use the entire history of the parse, up to the current point, to make parsing decisions – There were only a finite and relatively small number of different parse situations that could have occurred, so the history could be stored in a parser state, on the parse stack
  • 37. Copyright © 2007 Addison- Wesley. All rights reserved. 1-37 Bottom-up Parsing (cont.) • LR parsers are table driven, where the table has two components, an ACTION table and a GOTO table – The ACTION table specifies the action of the parser, given the parser state and the next token • Rows are state names; columns are terminals – The GOTO table specifies which state to put on top of the parse stack after a reduction action is done • Rows are state names; columns are nonterminals
  • 38. Copyright © 2007 Addison- Wesley. All rights reserved. 1-38 Bottom-up Parsing (cont.) • Parser actions (continued): – If ACTION[Sm, ai] = Accept, the parse is complete and no errors were found. – If ACTION[Sm, ai] = Error, the parser calls an error- handling routine.
  • 39. The next slide shows the parsing table for the grammar 1. E → E + T 2. E → T 3. T → T * F 4. T → F 5. F → ( E ) 6. F → id
  • 40. Copyright © 2007 Addison- Wesley. All rights reserved. 1-40 LR Parsing Table S4
  • 41. Copyright © 2007 Addison- Wesley. All rights reserved. 1-41 Bottom-up Parsing (cont.) • A parser table can be generated from a given grammar with a tool, e.g., yacc
  • 42. Copyright © 2007 Addison- Wesley. All rights reserved. 1-42 Summary • Syntax analysis is a common part of language implementation • A lexical analyzer is a pattern matcher that isolates small-scale parts of a program – Detects syntax errors – Produces a parse tree • A recursive-descent parser is an LL parser – EBNF • Parsing problem for bottom-up parsers: find the substring of current sentential form • The LR family of shift-reduce parsers is the most common bottom-up parsing approach