Compiler design
1
Structure of a Compiler 2
[Diagram: source code enters the Compiler, which emits object code and reports errors.]
Conceptual Structure: two major phases
 Front-end performs the analysis of the source language:
 Recognises legal and illegal programs and reports errors.
 “understands” the input program and collects its semantics in an IR.
 Produces IR and shapes the code for the back-end.
 Back-end does the target language synthesis:
 Translates IR into target code.
3
[Diagram: Source code → Front-End → Intermediate Representation → Back-End → Target code]
General Structure of a compiler
4
[Diagram: Source → Lexical Analysis → tokens → Syntax Analysis → Syntax Tree → Semantic Analysis → Syntax Tree → Intermediate Code Generation → Intermediate Representation → I.C. Optimisation → Intermediate Representation → Code Generation → Target Machine Code → Target Code Optimisation; the Symbol Table Manager and the Error Handler interact with every phase.]
Lexical Analysis (Scanning)
 Reads characters in the source program and groups them into words - lexemes (the basic units of syntax).
 Produces words and recognises what sort they are.
 The output is called a token and is a pair of the form
<token_name, attribute value>
 E.g.: position = initial + rate * 60
 Keeps a symbol table.
 Lexical analysis eliminates white space.
 Speed is important - use a specialised tool: e.g., flex - a tool for generating
scanners: programs which recognise lexical patterns in text.
5
Syntax (or syntactic) Analysis (Parsing)
 Imposes a hierarchical structure on the token stream.
 Uses the output of the lexical analyzer to create a tree-like structure of the token stream -
the syntax tree.
 The tree shows the order in which operations are performed.
 Ordering is based on the precedence of mathematical operations.
6
[Syntax tree for position = initial + rate * 60: '=' at the root, with <id,1> on the left and '+' on the right; '+' has children <id,2> and '*'; '*' has children <id,3> and 60.]
Semantic Analysis (context handling)
 Collects context (semantic) information, checks for semantic
errors, and annotates nodes of the tree with the results.
 Ensures that the components of a program fit together
meaningfully.
 Also gathers type information and saves it in either the syntax tree
or the symbol table.
7
[Annotated syntax tree: same shape as before, but 60 is now wrapped in an inttofloat node, so '*' receives two floating-point operands: <id,3> and inttofloat(60).]
Semantic Analysis (context handling)
 An important part is type checking, where the compiler checks that each
operator has matching operands.
 Examples:
 type checking: report an error if an operator is applied to an incompatible operand.
 E.g.: an array index should be an integer; report an error on a floating-point index.
 flow-of-control checks.
 uniqueness or name-related checks.
8
Intermediate code generation
 Translates language-specific constructs in the AST into more general constructs.
 Generates an explicit low-level intermediate representation.
 The intermediate representation should have two important properties:
 It should be easy to produce.
 It should be easy to translate into target machine code.
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
9
Intermediate code generation
 Generate an explicit machine-like intermediate representation.
 Two important properties:
 It should be easy to produce.
 It should be easy to translate into target code.
 The output of the intermediate code generator consists of a three-address code sequence.
 Each three-address assignment instruction has at most one operator on the right-hand side.
 The compiler must generate temporary names to hold the values computed by three-address
instructions.
 Some instructions may have fewer than three operands.
10
Code Optimisation
 The goal is to improve the intermediate code so that the effectiveness
of code generation and the performance of the target code are improved.
 Faster, shorter code - possibly code that consumes less power.
 The inttofloat conversion can be done once, at compile time:
t1 = id3 * 60.0
id1 = id2 + t1
 A significant amount of compilation time can be spent in this phase.
11
[Diagram: Source code → Front-End → IR → Middle-End (optimiser) → IR → Back-End → Target code]
Code Generation Phase
 Map the AST onto a linear list of target machine instructions in a
symbolic form:
 Instruction selection: a pattern matching problem.
 Register allocation: each value should be in a register when it is used.
 Instruction scheduling: take advantage of multiple functional units.
 Target-machine-specific properties may be used to optimise the
code.
 Finally, machine code and associated information required by the
Operating System are generated.
12
 t1 = id3 * 60.0
 id1 = id2 + t1
 can be converted into target code as
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
13
Compiler Construction Tools
 Parser generators: automatically produce syntax analysers from a grammatical description of
a programming language.
 Scanner generators: produce lexical analyzers from a regular-expression description of the
tokens of the language.
 Syntax-directed translation engines: produce collections of routines for walking a parse
tree and generating intermediate code.
 Code-generator generators: produce a code generator from a collection of rules for
translating each operation of the intermediate language into the machine language for a
target machine.
 Data-flow analysis engines: facilitate the gathering of information about how values are
transmitted from one part of a program to each other part.
 Compiler-construction toolkits: provide an integrated set of routines for constructing
various phases of a compiler.
14
C compilation Phases 15
Lexical Analyzer 16
Lexical Analysis Versus Parsing
 Simplicity of design.
 Efficiency: with the tasks separated, it is easier
to apply specialized techniques.
 Portability: only the lexer needs to communicate
with the outside.
17
Token 18
A pattern is a description of the form that the lexemes of a token may take.
Token Example 19
Lexical Errors
 E.g. f i ( a == f (x) ) - is f i a misspelling of the keyword if, or an undeclared function identifier?
 The simplest recovery strategy is "panic mode" recovery:
 delete successive characters from the remaining input until the lexical
analyzer can find a well-formed token at the beginning of what input is
left.
 Other possible error-recovery actions:
 Delete one character from the remaining input.
 Insert a missing character into the remaining input.
 Replace a character by another character.
 Transpose two adjacent characters.
20
Input Buffering 21
• Buffer Pairs
• Pointer lexemeBegin marks the beginning of the current lexeme.
• Pointer forward scans ahead until a pattern match is found.
• Sentinel: a special character (typically eof) that is not part of the source program, placed at the end of each buffer half.
Specification of tokens
 Regular expressions
 A vocabulary (alphabet) is a finite set of symbols.
 A string is any finite sequence of symbols from a vocabulary.
 A language is any set of strings over a fixed vocabulary.
 A grammar is a finite way of describing a language.
22
Strings
 A PREFIX of string s is any string obtained by removing zero or more symbols
from the end of s. For example, ban, banana, and ε are prefixes of banana.
 A SUFFIX of string s is any string obtained by removing zero or more symbols
from the beginning of s. For example, nana, banana, and ε are suffixes of
banana.
 A SUBSTRING of s is obtained by deleting any prefix and any suffix from s.
For instance, banana, nan, and ε are substrings of banana.
 The PROPER prefixes, suffixes, and substrings of a string s are those
prefixes, suffixes, and substrings, respectively, of s that are not ε and not equal to
s itself.
 A SUBSEQUENCE of s is any string formed by deleting zero or more not
necessarily consecutive symbols of s. For example, baan is a subsequence of
banana.
23
Operation on Languages
 Let L be the set of letters {A, B, . . . , Z, a, b, . . . , z} and let D be the set
of digits {0, 1, . . . , 9}.
 L ∪ D is the set of letters and digits - strictly speaking, the language with 62
strings of length one, each of which is either one letter or one digit.
 LD is the set of 520 strings of length two, each consisting of one letter followed
by one digit.
 L4 is the set of all 4-letter strings.
 L* is the set of all strings of letters, including ε, the empty string.
 L(L ∪ D)* is the set of all strings of letters and digits beginning with a letter.
 D+ is the set of all strings of one or more digits.
24
Regular Expressions
Patterns form a regular language. A regular expression is a way of
specifying a regular language. It is a formula that describes a
possibly infinite set of strings.
(Have you ever tried ls [x-z]* ?)
Regular Expression (RE) (over a vocabulary V):
 ∅ is a RE denoting the empty set {}; ε is a RE denoting {ε}.
 If a ∈ V then a is a RE denoting {a}.
 If r1, r2 are REs then:
 r1* denotes zero or more occurrences of r1;
 r1r2 denotes concatenation;
 r1 | r2 denotes either r1 or r2;
 Shorthands: [a-d] for a | b | c | d; r+ for rr*; r? for r | ε
Describe the languages denoted by the following REs
a; a | b; a*; (a | b)*; (a | b)(a | b); (a*b*)*; (a | b)*baa;
(What about ls [x-z]* above? Hmm… not a good example?)
25
Examples
 integer → (+ | – | ε) (0 | 1 | 2 | … | 9)+
 integer → (+ | – | ε) (0 | (1 | 2 | … | 9) (0 | 1 | 2 | … | 9)*)
 decimal → integer . (0 | 1 | 2 | … | 9)*
 identifier → [a-zA-Z] [a-zA-Z0-9]*
 Real-life application (perl regular expressions):
 [+-]?(\d+\.\d+|\d+\.|\.\d+)
 [+-]?(\d+\.\d+|\d+\.|\.\d+|\d+)([eE][+-]?\d+)?
(for more information read: % man perlre)
(Not all languages can be described by regular expressions. But, we
don’t care for now).
26
Building a Lexical Analyser by hand
Based on the specifications of tokens through regular expressions we can
write a lexical analyser. One approach is to check case by case and split
into smaller problems that can be solved ad hoc. Example:
void get_next_token() {
  c = input_char();
  if (is_eof(c)) { token ← (EOF, "eof"); return; }
  if (is_letter(c)) { recognise_id(); }
  else if (is_digit(c)) { recognise_number(); }
  else if (is_operator(c) || is_separator(c))
    { token ← (c, c); } // single char assumed
  else { token ← (ERROR, c); }
  return;
}
...
do {
  get_next_token();
  print(token.class, token.attribute);
} while (token.class != EOF);
Can be efficient; but requires a lot of work and may be difficult to modify!
27
Building Lexical Analysers “automatically”
Idea: try the regular expressions one by one and find the longest match:
set (token.class, token.length) ← (NULL, 0)
// first
find max_length such that input matches RE1 (the RE for token class T1)
if max_length > token.length
set (token.class, token.length) ← (T1, max_length)
// second
find max_length such that input matches RE2 (the RE for token class T2)
if max_length > token.length
set (token.class, token.length) ← (T2, max_length)
…
// n-th
find max_length such that input matches REn (the RE for token class Tn)
if max_length > token.length
set (token.class, token.length) ← (Tn, max_length)
// error
if (token.class == NULL) { handle no_match }
Disadvantage: linearly dependent on number of token classes and requires
restarting the search for each regular expression.
28
We study REs to automate scanner construction!
Consider the problem of recognising register names starting with r and
requiring at least one digit:
Register  r (0|1|2|…|9) (0|1|2|…|9)* (or, Register  r Digit Digit*)
The RE corresponds to a transition diagram:
29
Depicts the actions that take place in the scanner.
• A circle represents a state; S0: start state; S2: final state (double circle)
• An arrow represents a transition; the label specifies the cause of the transition.
A string is accepted if, after going through the transitions, we end in a final state (for
example, r345, r0, r29, as opposed to a, r, rab).
[Transition diagram: start → S0 -r→ S1 -digit→ S2, with S2 looping on digit; S2 is the final state.]
Towards Automation
An easy (computerised) implementation of a transition diagram is a
transition table: a column for each input symbol and a row for
each state. An entry is a set of states that can be reached from a
state on some input symbol. E.g.:
state     | ‘r’ | digit
0         |  1  |  -
1         |  -  |  2
2 (final) |  -  |  2
If we know the transition table and the final state(s) we can build
directly a recogniser that detects acceptance:
char = input_char();
state = 0; // starting state
while (char != EOF) {
  state ← table(state, char);
  if (state == ‘-’) return failure;
  word = word + char;
  char = input_char();
}
if (state == FINAL) return acceptance; else return failure;
30
DFA Minimisation: the problem
 Problem: can we minimise the number of states?
 Answer: yes, if we can find groups of states where, for each
input symbol, every state of such a group will have
transitions to the same group.
31
[Figure: a five-state DFA with states A-E over {a, b}, used as the running minimisation example on the next slides.]
DFA minimisation: the algorithm
(Hopcroft’s algorithm: simple version)
Divide the states of the DFA into two groups: those containing final
states and those containing non-final states.
while there are group changes
for each group
for each input symbol
if for any two states of the group and a given input symbol, their
transitions do not lead to the same group, these states must belong to
different groups.
For the curious, there is an alternative approach: create a graph in which there is an edge
between each pair of states which cannot coexist in a group because of the conflict above.
Then use a graph colouring algorithm to find the minimum number of colours needed so that
any two nodes connected by an edge do not have the same colour (we’ll examine graph
colouring algorithms later on, in register allocation)
32
How does it work? Recall (a | b)* abb
In each iteration, we consider any non-single-member groups and we
consider the partitioning criterion for all pairs of states in the group.
33
Iteration Current groups Split on a Split on b
0 {E}, {A,B,C,D} None {A,B,C}, {D}
1 {E}, {D}, {A,B,C} None {A,C}, {B}
2 {E}, {D}, {B}, {A, C} None None
[Figure: the example DFA before minimisation (states A-E) and after merging A and C into one state ({A,C}, B, D, E).]
From NFA to minimized DFA: recall a (b | c)*
34
[Figure: Thompson NFA for a (b | c)* with states S0-S9: S0 -a→ S1; ε-moves S1→S2, S2→{S3, S9}, S3→{S4, S6}, S5→S8, S7→S8, S8→{S9, S3}; S4 -b→ S5; S6 -c→ S7; S9 is accepting.]
ε-closure(move(s, *)):
DFA state A = {S0}: a → {S1,S2,S3,S4,S6,S9} (= B); b → none; c → none
DFA state B = {S1,S2,S3,S4,S6,S9}: a → none; b → {S5,S8,S9,S3,S4,S6} (= C); c → {S7,S8,S9,S3,S4,S6} (= D)
DFA state C = {S5,S8,S9,S3,S4,S6}: a → none; b → C; c → D
DFA state D = {S7,S8,S9,S3,S4,S6}: a → none; b → C; c → D
[Figure: the resulting DFA with states A, B, C, D; A -a→ B; B, C, and D each go to C on b and to D on c.]
DFA minimisation: recall a (b | c)*
Apply the minimisation algorithm to produce the minimal DFA:
35
Iteration | Current groups | Split on a | Split on b | Split on c
0 | {B, C, D}, {A} | None | None | None
[Figure: the minimal DFA - A -a→ {B,C,D}, and {B,C,D} loops on b | c.]
Remember, last time, we said
that a human could construct
a simpler automaton than
Thompson’s construction?
Well, algorithms can produce
the same DFA!
Lex/Flex: Generating Lexical Analysers
Flex is a tool for generating scanners: programs which recognise lexical
patterns in text (from the online manual pages: % man flex).
 Lex input consists of 3 sections:
 definitions (names for regular expressions);
 rules: pairs of regular expressions and C code;
 auxiliary C code.
 When the lex input is compiled, it generates as output a C source file
lex.yy.c that contains a routine yylex(). After compiling the C file, the
executable will start isolating tokens from the input according to the
regular expressions, and, for each token, will execute the code associated
with it. The array char yytext[] contains the representation of a token.
36
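For reference, a minimal flex input along these lines might look as follows (the token names and actions are illustrative). Build with flex scan.l && cc lex.yy.c -o scan, then feed it e.g. position = initial + rate * 60:

```lex
%{
/* auxiliary C declarations */
#include <stdio.h>
%}
digit   [0-9]
id      [a-zA-Z][a-zA-Z0-9]*
%%
{digit}+        { printf("NUMBER(%s)\n", yytext); }
{id}            { printf("ID(%s)\n", yytext); }
"="             { printf("ASSIGN\n"); }
"+"|"*"         { printf("OP(%s)\n", yytext); }
[ \t\n]+        ;   /* skip white space */
.               { printf("ERROR(%s)\n", yytext); }
%%
int yywrap(void) { return 1; }
int main(void) { yylex(); return 0; }
```

Each rule pairs a regular expression with the C code to run on a match; yytext holds the matched lexeme, as described above.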

Contenu connexe

Tendances

Introduction to compiler
Introduction to compilerIntroduction to compiler
Introduction to compiler
Abha Damani
 

Tendances (20)

Code optimization
Code optimizationCode optimization
Code optimization
 
Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design
 
Syntax analysis
Syntax analysisSyntax analysis
Syntax analysis
 
Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)
 
Fundamentals of Language Processing
Fundamentals of Language ProcessingFundamentals of Language Processing
Fundamentals of Language Processing
 
Phases of Compiler
Phases of CompilerPhases of Compiler
Phases of Compiler
 
Lexical Analysis - Compiler design
Lexical Analysis - Compiler design Lexical Analysis - Compiler design
Lexical Analysis - Compiler design
 
Introduction to compiler
Introduction to compilerIntroduction to compiler
Introduction to compiler
 
COMPILER DESIGN Run-Time Environments
COMPILER DESIGN Run-Time EnvironmentsCOMPILER DESIGN Run-Time Environments
COMPILER DESIGN Run-Time Environments
 
1.Role lexical Analyzer
1.Role lexical Analyzer1.Role lexical Analyzer
1.Role lexical Analyzer
 
Chapter 2 program-security
Chapter 2 program-securityChapter 2 program-security
Chapter 2 program-security
 
Design of a two pass assembler
Design of a two pass assemblerDesign of a two pass assembler
Design of a two pass assembler
 
Top down parsing
Top down parsingTop down parsing
Top down parsing
 
Single Pass Assembler
Single Pass AssemblerSingle Pass Assembler
Single Pass Assembler
 
Input-Buffering
Input-BufferingInput-Buffering
Input-Buffering
 
Lecture 01 introduction to compiler
Lecture 01 introduction to compilerLecture 01 introduction to compiler
Lecture 01 introduction to compiler
 
Lexical Analysis
Lexical AnalysisLexical Analysis
Lexical Analysis
 
Bootstrapping in Compiler
Bootstrapping in CompilerBootstrapping in Compiler
Bootstrapping in Compiler
 
what is compiler and five phases of compiler
what is compiler and five phases of compilerwhat is compiler and five phases of compiler
what is compiler and five phases of compiler
 
Symbolic Execution (introduction and hands-on)
Symbolic Execution (introduction and hands-on)Symbolic Execution (introduction and hands-on)
Symbolic Execution (introduction and hands-on)
 

En vedette

En vedette (20)

Lexical analyzer
Lexical analyzerLexical analyzer
Lexical analyzer
 
Lexical analyzer
Lexical analyzerLexical analyzer
Lexical analyzer
 
Role-of-lexical-analysis
Role-of-lexical-analysisRole-of-lexical-analysis
Role-of-lexical-analysis
 
Akshayappt
AkshayapptAkshayappt
Akshayappt
 
Intermediate code generation
Intermediate code generationIntermediate code generation
Intermediate code generation
 
Lexical
LexicalLexical
Lexical
 
Bottomupparser
BottomupparserBottomupparser
Bottomupparser
 
Operator precedence
Operator precedenceOperator precedence
Operator precedence
 
Compiler presention
Compiler presentionCompiler presention
Compiler presention
 
Techniques & applications of Compiler
Techniques & applications of CompilerTechniques & applications of Compiler
Techniques & applications of Compiler
 
Compiler design
Compiler designCompiler design
Compiler design
 
Data Locality
Data LocalityData Locality
Data Locality
 
Lexicalanalyzer
LexicalanalyzerLexicalanalyzer
Lexicalanalyzer
 
Bottom up parser
Bottom up parserBottom up parser
Bottom up parser
 
Advanced Computer Architecture Chapter 123 Problems Solution
Advanced Computer Architecture Chapter 123 Problems SolutionAdvanced Computer Architecture Chapter 123 Problems Solution
Advanced Computer Architecture Chapter 123 Problems Solution
 
Unit 1 cd
Unit 1 cdUnit 1 cd
Unit 1 cd
 
Lecture4 lexical analysis2
Lecture4 lexical analysis2Lecture4 lexical analysis2
Lecture4 lexical analysis2
 
Lecture 02 lexical analysis
Lecture 02 lexical analysisLecture 02 lexical analysis
Lecture 02 lexical analysis
 
Compiler unit 1
Compiler unit 1Compiler unit 1
Compiler unit 1
 
Lexical Analyzers and Parsers
Lexical Analyzers and ParsersLexical Analyzers and Parsers
Lexical Analyzers and Parsers
 

Similaire à Compilers Design

Compiler Design lab manual for Computer Engineering .pdf
Compiler Design lab manual for Computer Engineering .pdfCompiler Design lab manual for Computer Engineering .pdf
Compiler Design lab manual for Computer Engineering .pdf
kalpana Manudhane
 
Chapter Two(1)
Chapter Two(1)Chapter Two(1)
Chapter Two(1)
bolovv
 
Structure-Compiler-phases information about basics of compiler. Pdfpdf
Structure-Compiler-phases information  about basics of compiler. PdfpdfStructure-Compiler-phases information  about basics of compiler. Pdfpdf
Structure-Compiler-phases information about basics of compiler. Pdfpdf
ovidlivi91
 

Similaire à Compilers Design (20)

Pcd question bank
Pcd question bank Pcd question bank
Pcd question bank
 
Ch 2.pptx
Ch 2.pptxCh 2.pptx
Ch 2.pptx
 
Compiler Design
Compiler DesignCompiler Design
Compiler Design
 
Compiler design important questions
Compiler design   important questionsCompiler design   important questions
Compiler design important questions
 
Cs6660 compiler design may june 2016 Answer Key
Cs6660 compiler design may june 2016 Answer KeyCs6660 compiler design may june 2016 Answer Key
Cs6660 compiler design may june 2016 Answer Key
 
Compiler Design lab manual for Computer Engineering .pdf
Compiler Design lab manual for Computer Engineering .pdfCompiler Design lab manual for Computer Engineering .pdf
Compiler Design lab manual for Computer Engineering .pdf
 
Lexicalanalyzer
LexicalanalyzerLexicalanalyzer
Lexicalanalyzer
 
Chapter Two(1)
Chapter Two(1)Chapter Two(1)
Chapter Two(1)
 
Compiler Designs
Compiler DesignsCompiler Designs
Compiler Designs
 
compiler Design course material chapter 2
compiler Design course material chapter 2compiler Design course material chapter 2
compiler Design course material chapter 2
 
Lecture 1 - Lexical Analysis.ppt
Lecture 1 - Lexical Analysis.pptLecture 1 - Lexical Analysis.ppt
Lecture 1 - Lexical Analysis.ppt
 
Chapter-3 compiler.pptx course materials
Chapter-3 compiler.pptx course materialsChapter-3 compiler.pptx course materials
Chapter-3 compiler.pptx course materials
 
Unit1.ppt
Unit1.pptUnit1.ppt
Unit1.ppt
 
CC Week 11.ppt
CC Week 11.pptCC Week 11.ppt
CC Week 11.ppt
 
Lex and Yacc Tool M1.ppt
Lex and Yacc Tool M1.pptLex and Yacc Tool M1.ppt
Lex and Yacc Tool M1.ppt
 
Structure-Compiler-phases information about basics of compiler. Pdfpdf
Structure-Compiler-phases information  about basics of compiler. PdfpdfStructure-Compiler-phases information  about basics of compiler. Pdfpdf
Structure-Compiler-phases information about basics of compiler. Pdfpdf
 
Module 2
Module 2 Module 2
Module 2
 
SS & CD Module 3
SS & CD Module 3 SS & CD Module 3
SS & CD Module 3
 
Language processors
Language processorsLanguage processors
Language processors
 
CD U1-5.pptx
CD U1-5.pptxCD U1-5.pptx
CD U1-5.pptx
 

Dernier

Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 

Dernier (20)

Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Magic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptxMagic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptx
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 

Compilers Design

  • 2. Structure of a Compiler 2 Compiler Source code Errors Object code
  • 3. Conceptual Structure: two major phases  Front-end performs the analysis of the source language:  Recognises legal and illegal programs and reports errors.  “understands” the input program and collects its semantics in an IR.  Produces IR and shapes the code for the back-end.  Back-end does the target language synthesis:  Translates IR into target code. 3 Front-End Back-EndIntermediate Representation Source code Target code
  • 4. General Structure of a compiler 4 Source Intermediate code generat. tokens Intermediate Representation Syntax Tree Target Machine code Lexical Analysis Semantic Analysis Syntax Analysis I.C.Optimisation Code Generation Target code Optimisation Syntax Tree Intermediate Representation Error Handler Symbol Table Manager
  • 5. Lexical Analysis (Scanning)  Reads characters in the source program and groups them into words - lexeme (basic unit of syntax)  Produces words and recognises what sort they are.  The output is called token and is a pair of the form <token_name, attribute value>  Eg: Position = initial + rate * 60  keep a symbol table.  Lexical analysis eliminates white spaces  Speed is important - use a specialised tool: e.g., flex - a tool for generating scanners: programs which recognise lexical patterns in text; 5
  • 6. Syntax (or syntactic) Analysis (Parsing)  Imposes a hierarchical structure on the token stream.  Use output of lexical analyzer to create a tree like structure of the token stream – syntax tree.  The tree shows the order in which operations are performed  Ordering based on the precedence of mathematical operations. 6 <id,3 > 60 = = = <id,1 > <id,2 >
  • 7. Semantic Analysis (context handling)  Collects context (semantic) information, checks for semantic errors, and annotates nodes of the tree with the results.  Ensures that component of a program fit together meaningfully.  Also gathers type information and saves either in syntax tree or symbol table. 7 <id,3> inttofloat = = = <id,1> <id,2> 60
  • 8. Semantic Analysis (context handling)  Important part is type checking, where compiler checks that each operator has matching operands.  Examples:  type checking: report error if an operator is applied to an incompatible operand.  Eg : Array index is an integer; report error if floating point index.  check flow-of-controls.  uniqueness or name-related checks. 8
  • 9. Intermediate code generation  Translate language-specific constructs in the AST into more general constructs.  Generate an explicit low level intermediate representation .  intermediate representation should have two important properties.  It should be easy to produce.  It should be easy to translate into target machine code. t1 = inttofloat(60) t2 = id3 * t1 t3 = id2 * t2 Id1 = t3 9
  • 10. Intermediate code generation  Generate an explicit machine like intermediate representation.  Two important properties  It should be easy to produce  It should be easy to translate into target code  The output of intermediate code generator consist of three address code sequence.  Each three address assignment instruction has atmost one operator on the right hand side.  The compiler must generate temporary name to hold the value computed by three address instruction.  Some instruction may have less than three operands. 10
  • 11. Code Optimisation  The goal is to improve the intermediate code and, thus, the effectiveness of code generation and the performance of the target code ge improved.  Faster, shorter code and even that consumes less power.  Intofloat conversion can be done once. t1 = id3 * 60.0 id1 = id2 + t1  Significant amout of time spent on this phase. 11 Front-End Middle-End (optimiser) Back-End Target code Source code IR IR
  • 12. Code Generation Phase  Map the AST onto a linear list of target machine instructions in a symbolic form:  Instruction selection: a pattern matching problem.  Register allocation: each value should be in a register when it is used  Instruction scheduling: take advantage of multiple functional units.  Target, machine-specific properties may be used to optimise the code.  Finally, machine code and associated information required by the Operating System are generated. 12
  • 13.  t1=id3 * 60.0  Id1=id2 + t1  can be converted into target code as LDF R2, R2, id3 MULF R2,R2 #60.0 LDF R1,id2 ADDF R1,R2,R2 STF id1,R1 13
  • 14. Compiler Construction Tools  Parser Generator : Automatically produce syntax analysis from grammatical description of a programming language.  Scanner Generator : Produce lexical analyzer from regular expression description of the language.  Syntax Directed translation engines : produce collection of routines for walking a parse tree and generating intermediate code.  Code Generator generators : produce code generator from a collection of rules for translating each operation of the intermediate language into the machine language for a target machine.  Data Flow Engines : that facilitate the gathering of information about how values are transmitted from one part of a program to each other part.  Compiler Construction tool kit : provide an integrated set of routines for constructing various phases of a compiler. 14
  • 17. Lexical Analysis Versus Parsing  Simplicity of design.  Efficiency. With the task separated it is easier to apply specialized techniques.  Portability. Only the lexer need communicate with the outside. 17
  • 18. Token 18 A pattern is a description of the form that the lexemes of a token may take.
  • 20. Lexical Errors  fi ( a == f(x) ) : here the lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier.  The simplest recovery strategy is "panic mode" recovery:  delete successive characters from the remaining input until the lexical analyzer can find a well-formed token at the beginning of what input is left.  Other possible error-recovery actions:  Delete one character from the remaining input.  Insert a missing character into the remaining input.  Replace a character by another character.  Transpose two adjacent characters. 20
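Panic-mode recovery amounts to a short loop. This is a sketch with a hypothetical `can_start_token` predicate (the real set of legal starting characters depends on the language being scanned):

```python
# Sketch of panic-mode recovery: delete successive characters from the
# remaining input until one that can begin a well-formed token is found.
def panic_mode_skip(text, pos, can_start_token):
    """Advance pos past characters that cannot begin any token."""
    while pos < len(text) and not can_start_token(text[pos]):
        pos += 1
    return pos

def can_start(c):
    # hypothetical: letters, digits, and a few operators/separators
    return c.isalnum() or c in "+-*/=(); "
```

For example, stray `@#` garbage before an identifier would be skipped, and scanning would resume at the identifier.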
  • 21. Input Buffering 21 • Buffer pairs : the input is read into two alternating buffer halves. • Pointer lexemeBegin marks the beginning of the current lexeme. • Pointer forward scans ahead until a pattern match is found. • Sentinel : a special character that is not part of the source program (e.g., eof), placed at the end of each buffer half so the scanner needs only one test per input character.
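The sentinel trick can be illustrated in a few lines (a Python stand-in for the usual C buffer code; the helper names are illustrative):

```python
# Sketch of the sentinel idea: the buffer ends with a character that can
# never be part of a lexeme, so the inner loop needs only ONE test per
# character instead of two (character match AND end-of-buffer).
SENTINEL = "\0"

def scan_lexeme(buffer, lexeme_begin, is_lexeme_char):
    """Advance 'forward' from lexemeBegin while characters belong to the lexeme."""
    forward = lexeme_begin
    # the sentinel never satisfies is_lexeme_char, so this single test
    # also stops the scan at the end of the buffer
    while is_lexeme_char(buffer[forward]):
        forward += 1
    return buffer[lexeme_begin:forward], forward

buffer = "rate*60" + SENTINEL
lexeme, forward = scan_lexeme(buffer, 0, str.isalpha)
```

In the real scheme, hitting the sentinel triggers a check: either it is a true end of input, or the other buffer half is reloaded and scanning continues.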
  • 22. Specification of tokens  Regular expressions  A vocabulary (alphabet) is a finite set of symbols.  A string is any finite sequence of symbols from a vocabulary.  A language is any set of strings over a fixed vocabulary.  A grammar is a finite way of describing a language. 22
  • 23. Strings  A PREFIX of string s is any string obtained by removing zero or more symbols from the end of s. For example, ban, banana, and ε are prefixes of banana.  A SUFFIX of string s is any string obtained by removing zero or more symbols from the beginning of s. For example, nana, banana, and ε are suffixes of banana.  A SUBSTRING of s is obtained by deleting any prefix and any suffix from s. For instance, banana, nan, and ε are substrings of banana.  The PROPER PREFIXES, SUFFIXES, and SUBSTRINGS of a string s are those prefixes, suffixes, and substrings, respectively, of s that are neither ε nor s itself.  A SUBSEQUENCE of s is any string formed by deleting zero or more symbols from not necessarily consecutive positions of s. For example, baan is a subsequence of banana. 23
  • 24. Operations on Languages  Let L be the set of letters {A, B, . . . , Z, a, b, . . . , z} and let D be the set of digits {0, 1, . . . , 9}  L ∪ D is the set of letters and digits - strictly speaking, the language with 62 strings of length one, each of which is either one letter or one digit.  LD is the set of 520 strings of length two, each consisting of one letter followed by one digit.  L4 is the set of all 4-letter strings.  L* is the set of all strings of letters, including ε, the empty string.  L(L ∪ D)* is the set of all strings of letters and digits beginning with a letter.  D+ is the set of all strings of one or more digits. 24
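The finite operations above are easy to check directly, representing languages as Python sets of strings (a sketch; `L*` and `D+` are infinite, so only their finite cousins are built here):

```python
import string

# Languages as sets of strings.
L = set(string.ascii_letters)            # 52 one-letter strings
D = set(string.digits)                   # 10 one-digit strings

union  = L | D                           # L ∪ D : letters and digits
concat = {l + d for l in L for d in D}   # LD : a letter followed by a digit
# L4 has |L|**4 = 52**4 four-letter strings; too many to enumerate,
# but the count follows directly from the definition of concatenation.
```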
  • 25. Regular Expressions Patterns form a regular language. A regular expression is a way of specifying a regular language. It is a formula that describes a possibly infinite set of strings. (Have you ever tried ls [x-z]* ?) Regular Expression (RE) (over a vocabulary V):  ε is a RE denoting the set {ε}.  If a ∈ V then a is a RE denoting {a}.  If r1, r2 are REs then:  r1* denotes zero or more occurrences of r1;  r1r2 denotes concatenation;  r1 | r2 denotes either r1 or r2.  Shorthands: [a-d] for a | b | c | d; r+ for rr*; r? for r | ε.  Describe the languages denoted by the following REs: a; a | b; a*; (a | b)*; (a | b)(a | b); (a*b*)*; (a | b)*baa. (What about ls [x-z]* above? Hmm… not a good example?) 25
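The REs and shorthands above can be tried out directly, since Python's `re` notation is a superset of the notation defined on this slide (the `in_language` helper is just a convenience wrapper):

```python
import re

def in_language(pattern, s):
    """True iff the WHOLE string s belongs to the language of pattern."""
    return re.fullmatch(pattern, s) is not None

# e.g. (a | b)*baa accepts any string of a's and b's ending in baa,
# and r? accepts either r or the empty string.
```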
  • 26. Examples  integer → (+ | – | ε) (0 | 1 | 2 | … | 9)+  integer → (+ | – | ε) (0 | (1 | 2 | … | 9) (0 | 1 | 2 | … | 9)*)  decimal → integer . (0 | 1 | 2 | … | 9)*  identifier → [a-zA-Z] [a-zA-Z0-9]*  Real-life application (perl regular expressions):  [+-]?(\d+\.\d+|\d+\.|\.\d+)  [+-]?(\d+\.\d+|\d+\.|\.\d+|\d+)([eE][+-]?\d+)? (for more information read: % man perlre) (Not all languages can be described by regular expressions. But, we don’t care for now). 26
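These patterns can be checked in Python's `re` syntax (the backslashes in `\d` and `\.` are easy to lose in transcription, so they are spelled out here; pattern names are illustrative):

```python
import re

INTEGER    = r"[+-]?\d+"                     # optional sign, one or more digits
IDENTIFIER = r"[a-zA-Z][a-zA-Z0-9]*"         # letter, then letters/digits
REAL       = r"[+-]?(\d+\.\d+|\d+\.|\.\d+|\d+)([eE][+-]?\d+)?"

def is_token(pattern, s):
    """True iff s, in its entirety, matches the token pattern."""
    return re.fullmatch(pattern, s) is not None
```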
  • 27. Building a Lexical Analyser by hand Based on the specification of tokens through regular expressions we can write a lexical analyser. One approach is to check case by case and split into smaller problems that can be solved ad hoc. Example: void get_next_token() { c = input_char(); if (is_eof(c)) { token ← (EOF, "eof"); return; } if (is_letter(c)) { recognise_id(); } else if (is_digit(c)) { recognise_number(); } else if (is_operator(c) || is_separator(c)) { token ← (c, c); } // single char assumed else { token ← (ERROR, c); } return; } ... do { get_next_token(); print(token.class, token.attribute); } while (token.class != EOF); Can be efficient; but requires a lot of work and may be difficult to modify! 27
  • 28. Building Lexical Analysers “automatically” Idea: try the regular expressions one by one and find the longest match: set (token.class, token.length) ← (NULL, 0) // first: find max_length such that input matches RE1 (token class T1) if max_length > token.length set (token.class, token.length) ← (T1, max_length) // second: find max_length such that input matches RE2 (token class T2) if max_length > token.length set (token.class, token.length) ← (T2, max_length) … // n-th: find max_length such that input matches REn (token class Tn) if max_length > token.length set (token.class, token.length) ← (Tn, max_length) // error if (token.class == NULL) { handle no_match } Disadvantage: linearly dependent on the number of token classes and requires restarting the search for each regular expression. 28
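The scheme above is runnable as a short sketch: try every token class's RE at the current position and keep the longest match (the token classes here are illustrative, not from the slides):

```python
import re

TOKEN_RES = [("NUMBER", r"[0-9]+"),
             ("ID", r"[a-zA-Z][a-zA-Z0-9]*"),
             ("OP", r"[-+*/=]")]

def longest_match(text, pos):
    token_class, length = None, 0                  # (NULL, 0)
    for name, pattern in TOKEN_RES:
        m = re.match(pattern, text[pos:])          # restart for each RE
        if m and m.end() > length:                 # keep the longest so far
            token_class, length = name, m.end()
    if token_class is None:
        raise ValueError("no token matches")       # handle no_match
    return token_class, text[pos:pos + length]
```

The loop over `TOKEN_RES` is exactly the linear dependence on the number of token classes that the slide flags as the disadvantage.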
  • 29. We study REs to automate scanner construction! Consider the problem of recognising register names starting with r and requiring at least one digit: Register → r (0|1|2|…|9) (0|1|2|…|9)* (or, Register → r Digit Digit*) The RE corresponds to a transition diagram: 29 [diagram: start -> S0 -r-> S1 -digit-> S2 (final), with a digit self-loop on S2] Depicts the actions that take place in the scanner. • A circle represents a state; S0: start state; S2: final state (double circle). • An arrow represents a transition; the label specifies the cause of the transition. A string is accepted if, going through the transitions, it ends in a final state (for example, r345, r0, r29, as opposed to a, r, rab).
  • 30. Towards Automation An easy (computerised) implementation of a transition diagram is a transition table: a column for each input symbol and a row for each state; an entry is the state reached from that state on that input symbol ('-' marks an error). E.g.: state 0: 'r' → 1, digit → -; state 1: 'r' → -, digit → 2; state 2 (final): 'r' → -, digit → 2. If we know the transition table and the final state(s) we can build directly a recogniser that detects acceptance: char = input_char(); state = 0; // starting state while (char != EOF) { state ← table(state, char); if (state == '-') return failure; word = word + char; char = input_char(); } if (state == FINAL) return acceptance; else return failure; 30
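A runnable version of this table-driven recogniser, for the register RE r digit digit* (a sketch; missing dictionary entries play the role of the '-' error entries):

```python
TABLE = {
    0: {"r": 1},                         # start state: expect 'r'
    1: {d: 2 for d in "0123456789"},     # seen 'r': need at least one digit
    2: {d: 2 for d in "0123456789"},     # final state: more digits allowed
}
FINAL = {2}

def accepts(word):
    """Run the word through the transition table; accept iff we end final."""
    state = 0
    for ch in word:
        state = TABLE[state].get(ch)
        if state is None:                # '-' entry: reject immediately
            return False
    return state in FINAL
```

This matches the slide's examples: r345, r0 and r29 are accepted; a, r and rab are rejected.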
  • 31. DFA Minimisation: the problem  Problem: can we minimise the number of states?  Answer: yes, if we can find groups of states where, for each input symbol, every state of such a group has transitions to the same group. 31 [diagram: a five-state DFA with states A, B, C, D, E and transitions on a and b]
  • 32. DFA minimisation: the algorithm (Hopcroft’s algorithm: simple version) Divide the states of the DFA into two groups: those containing final states and those containing non-final states. while there are group changes for each group for each input symbol if for any two states of the group and a given input symbol, their transitions do not lead to the same group, these states must belong to different groups. For the curious, there is an alternative approach: create a graph in which there is an edge between each pair of states which cannot coexist in a group because of the conflict above. Then use a graph colouring algorithm to find the minimum number of colours needed so that any two nodes connected by an edge do not have the same colour (we’ll examine graph colouring algorithms later on, in register allocation) 32
  • 33. How does it work? Recall (a | b)* abb In each iteration, we take any non-single-member group and apply the partitioning criterion to all pairs of states in the group. 33 Iteration 0: current groups {E}, {A,B,C,D}; split on a: none; split on b: {A,B,C}, {D}. Iteration 1: current groups {E}, {D}, {A,B,C}; split on a: none; split on b: {A,C}, {B}. Iteration 2: current groups {E}, {D}, {B}, {A,C}; no further splits. [diagrams: the original five-state DFA and the minimised DFA in which A and C are merged]
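The refinement loop above can be sketched and run on this very DFA. The transition function below is the standard subset-construction DFA for (a | b)* abb with final state E (assumed to match the slide's diagram):

```python
DELTA = {"A": {"a": "B", "b": "C"}, "B": {"a": "B", "b": "D"},
         "C": {"a": "B", "b": "C"}, "D": {"a": "B", "b": "E"},
         "E": {"a": "B", "b": "C"}}

def minimise(delta, finals):
    """Partition refinement: split groups until no group changes."""
    groups = [set(finals), set(delta) - set(finals)]   # final vs non-final
    changed = True
    while changed:
        changed = False
        for g in list(groups):
            def group_of(state):
                return next(i for i, h in enumerate(groups) if state in h)
            # states stay together only if, on every symbol, their
            # transitions lead to the same group
            parts = {}
            for s in g:
                key = tuple(group_of(delta[s][c]) for c in "ab")
                parts.setdefault(key, set()).add(s)
            if len(parts) > 1:                         # the group splits
                groups.remove(g)
                groups.extend(parts.values())
                changed = True
    return {frozenset(g) for g in groups}

GROUPS = minimise(DELTA, {"E"})
```

Running it reproduces the table: the final partition is {E}, {D}, {B}, {A,C}, so A and C are merged.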
  • 34. From NFA to minimised DFA: recall a (b | c)* 34 [diagram: Thompson NFA with states S0–S9; S0 -a-> S1, S4 -b-> S5, S6 -c-> S7, all other edges ε-transitions] Computing ε-closure(move(s, *)) gives the DFA states: A = {S0}: on a → {S1,S2,S3,S4,S6,S9} = B; on b, c → none. B = {S1,S2,S3,S4,S6,S9}: on a → none; on b → {S5,S8,S9,S3,S4,S6} = C; on c → {S7,S8,S9,S3,S4,S6} = D. C = {S5,S8,S9,S3,S4,S6}: on b → C; on c → D. D = {S7,S8,S9,S3,S4,S6}: on b → C; on c → D. [diagram: resulting DFA with A -a-> B, and B, C, D carrying the b and c transitions]
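The subset construction itself fits in a short sketch. The ε-edges below follow one standard Thompson numbering for a (b | c)* that reproduces the slide's table (assumed, since the diagram is not fully recoverable); `""` marks ε-transitions:

```python
EPS = ""
NFA = {("S0", "a"): {"S1"}, ("S1", EPS): {"S2"}, ("S2", EPS): {"S3", "S9"},
       ("S3", EPS): {"S4", "S6"}, ("S4", "b"): {"S5"}, ("S6", "c"): {"S7"},
       ("S5", EPS): {"S8"}, ("S7", EPS): {"S8"}, ("S8", EPS): {"S9", "S3"}}

def eps_closure(states):
    """All NFA states reachable from 'states' via ε-transitions alone."""
    stack, closure = list(states), set(states)
    while stack:
        for t in NFA.get((stack.pop(), EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(states, symbol):
    """All NFA states reachable on 'symbol' from any state in 'states'."""
    return {t for s in states for t in NFA.get((s, symbol), ())}

A = eps_closure({"S0"})
B = eps_closure(move(A, "a"))
C = eps_closure(move(B, "b"))
D = eps_closure(move(B, "c"))
```

The computed sets B, C and D coincide with the rows of the table, and C and D map back to themselves on b and c, so no further DFA states arise.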
  • 35. DFA minimisation: recall a (b | c)* Apply the minimisation algorithm to produce the minimal DFA: 35 Iteration 0: current groups {B, C, D}, {A}; split on a: none; split on b: none; split on c: none. [diagram: minimal DFA with A -a-> {B,C,D} and a b | c self-loop on {B,C,D}] Remember, last time, we said that a human could construct a simpler automaton than Thompson’s construction? Well, algorithms can produce the same DFA!
  • 36. Lex/Flex: Generating Lexical Analysers Flex is a tool for generating scanners: programs which recognise lexical patterns in text (from the online manual pages: % man flex)  Lex input consists of 3 sections:  definitions (including named regular expressions);  rules: pairs of regular expressions and C code;  auxiliary C code.  When the lex input is compiled, it generates as output a C source file lex.yy.c that contains a routine yylex(). After compiling the C file, the executable will start isolating tokens from the input according to the regular expressions and, for each token, will execute the code associated with it. The array char yytext[] contains the representation of a token. 36
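A minimal flex input showing the three sections might look like this (a sketch; the token actions are illustrative, not from the slides):

```lex
%{ /* section 1: definitions and C declarations */
#include <stdio.h>
%}
DIGIT  [0-9]
ID     [a-zA-Z][a-zA-Z0-9]*
%%
  /* section 2: rules, i.e. pairs of regular expressions and C code */
{DIGIT}+  { printf("NUMBER: %s\n", yytext); }
{ID}      { printf("ID: %s\n", yytext); }
[ \t\n]+  { /* skip white space */ }
%%
/* section 3: auxiliary C code */
int main(void) { yylex(); return 0; }
int yywrap(void) { return 1; }
```

Running `flex` on this file produces lex.yy.c containing yylex(); compiling that with a C compiler yields a scanner that prints one line per token.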