1. SHREE SWAMI ATMANAND SARASWATI INSTITUTE
OF TECHNOLOGY
Compiler Design(2170701)
PREPARED BY: (Group:1)
Bhumi Aghera(130760107001)
Monika Dudhat(130760107007)
Radhika Talaviya(130760107029)
Rajvi Vaghasiya(130760107031)
The Phases of a Compiler
GUIDED BY:
Prof. Akhilesh Ladha
2. Language Processing System
• We have learnt that any computer system is made up of hardware and software.
• The hardware understands a machine language that humans cannot easily read, so we write
programs in a high-level language, which is easier for us to understand and remember.
• These programs are then fed into a series of tools and OS components to obtain the desired
code that can be executed by the machine.
• This is known as the Language Processing System.
4. Phases of Compiler
• The compilation process has two main parts.
1. Analysis phase
2. Synthesis phase
1. Analysis phase: The main objective of the analysis phase is to break the source code into
pieces and then arrange those pieces into a meaningful structure.
• The analysis phase also collects information about the source program and stores it in a
data structure called the symbol table.
• The analysis phase is often called the front end of the compiler.
• Analysis phase contains:
I. Lexical analysis
II. Syntax analysis
III. Semantic analysis
5. 2. Synthesis phase: The synthesis phase is concerned with generating target language
statements that have the same meaning as the source statements.
• The synthesis phase is often called the back end of the compiler.
• Synthesis phase contains:
I. Intermediate code generation
II. Code optimization
III. Code generation
6. Phases of Compiler
[Diagram: the phases of a compiler]
7. Lexical Analysis
• The first phase of a compiler is called lexical analysis or scanning.
• The lexical analyzer takes as input the modified source code produced by language
preprocessors.
• The lexical analyzer reads the stream of characters making up the source program and groups
the characters into meaningful sequences called lexemes.
• Lexical analyzer represents these lexemes in the form of tokens as:
<token-name, attribute-value>
where token-name is an abstract symbol used during syntax analysis, and
attribute-value points to an entry in the symbol table for this token.
Input string → Lexical analysis → Tokens or lexemes
8. Tokens, Patterns And Lexemes
Token: Token is a sequence of characters that can be treated as a single logical entity.
Typical tokens are,
Identifiers
keywords
operators
special symbols
constants
Pattern: A set of strings in the input for which the same token is produced as output. This
set of strings is described by a rule called a pattern associated with the token.
Lexeme: A lexeme is a sequence of characters in the source program that is matched by
the pattern for a token.
9. Tokenization
• The process of forming tokens from the input stream is called tokenization.
div = 6/2;
Lexeme Token
div Identifier
= Assignment symbol
6 Number
/ Division operator
2 Number
; End of statement
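The tokenization above can be sketched in C. This is a minimal sketch, assuming single-character operators and purely alphanumeric identifiers; the helper names `next_lexeme` and `classify` are illustrative, not from any real lexer-generator API.

```c
#include <ctype.h>
#include <string.h>

/* Classify a lexeme into one of the token classes from the table above. */
const char *classify(const char *lexeme) {
    if (strcmp(lexeme, "=") == 0) return "Assignment symbol";
    if (strcmp(lexeme, "/") == 0) return "Division operator";
    if (strcmp(lexeme, ";") == 0) return "End of statement";
    if (isdigit((unsigned char)lexeme[0])) return "Number";
    return "Identifier";
}

/* Copy the next lexeme from *src into buf, advancing *src.
   Returns 0 when the input is exhausted. */
int next_lexeme(const char **src, char *buf) {
    const char *p = *src;
    while (isspace((unsigned char)*p)) p++;    /* skip whitespace */
    if (*p == '\0') return 0;
    int i = 0;
    if (isalpha((unsigned char)*p)) {          /* identifier */
        while (isalnum((unsigned char)*p)) buf[i++] = *p++;
    } else if (isdigit((unsigned char)*p)) {   /* number */
        while (isdigit((unsigned char)*p)) buf[i++] = *p++;
    } else {                                   /* single-character symbol */
        buf[i++] = *p++;
    }
    buf[i] = '\0';
    *src = p;
    return 1;
}
```

Scanning "div = 6/2;" with repeated calls to `next_lexeme` yields exactly the lexeme/token pairs in the table.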
10. Examples of tokens and non-tokens
Examples of non-tokens:
Type                    Example
Comment                 /* ignored */
Preprocessor directive  #include<stdio.h>
                        #define NUMS 5
Macro                   NUMS
Whitespace              \t \n \b

Examples of tokens:
Type      Example
ID        foo n_14 last
NUM       73 00 517 082
REAL      66.1 .5 10. 1e67 5.5e-10
COMMA     ,
NOTEQ     !=
LPAREN    (
RPAREN    )
11. Tasks – lexical analyzer
• Separating the input source code into tokens.
• Removing unnecessary white space from the source code.
• Removing comments from the source text.
• Keeping track of line numbers while scanning new-line characters. These line numbers
are used by the error handler to print error messages.
• Preprocessing of macros.
12. Syntax Analysis
• The second phase of the compiler is called the syntax analysis or parsing.
• It takes the tokens produced by lexical analysis as input and generates a parse tree (or
syntax tree).
• The parser uses the first components of the tokens produced by the lexical analyzer to create
a tree-like intermediate representation that depicts the grammatical structure of the token
stream.
• A typical representation is a syntax tree in which each interior node represents an operation
and the children of the node represent the arguments of the operation.
• In this phase, token arrangements are checked against the source code grammar, i.e. the
parser checks if the expression made by the tokens is syntactically correct.
13. Parse tree
• It shows how the start symbol of a grammar derives a string in the language
• root is labeled by the start symbol
• leaf nodes are labeled by tokens
• Each internal node is labeled by a non terminal
• if A is a non-terminal labeling an internal node and x1, x2, … xn are the labels of the children
of that node, then A → x1 x2 … xn is a production
• For example,
Parse tree for 9-5+2:

            list
           /  |  \
       list   +   digit
      /  |  \       |
  list   -  digit   2
    |         |
  digit       5
    |
    9
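The left-associative grouping shown by this parse tree can be sketched with a tiny parser in C for the same grammar. This sketch evaluates while parsing instead of building tree nodes, which keeps it short; `parse_list` is an illustrative name and the input is assumed to be well-formed.

```c
/* Parse a string matching  list -> list + digit | list - digit | digit.
   The loop mirrors the left-recursive grammar: the running value plays
   the role of the already-parsed "list" subtree. */
int parse_list(const char *s) {
    int value = *s++ - '0';          /* first digit */
    while (*s == '+' || *s == '-') {
        char op = *s++;
        int d = *s++ - '0';          /* next digit */
        value = (op == '+') ? value + d : value - d;
    }
    return value;
}
```

`parse_list("9-5+2")` groups the input as (9-5)+2, the same grouping the tree shows.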
14. Semantic Analysis
• The semantic analyzer uses the syntax tree and the information in the symbol table to check
the source program for semantic consistency with the language definition.
• It also gathers type information and saves it in either the syntax tree or the symbol table, for
subsequent use during intermediate-code generation.
• An important part of semantic analysis is type checking, where the compiler checks that each
operator has matching operands.
• For example, many programming language definitions require an array index to be an integer;
the compiler must report an error if a floating-point number is used to index an array.
• The language specification may permit some type conversions called coercions.
15. Semantic Analysis
• For example, a binary arithmetic operator may be applied to either a pair of integers or to a
pair of floating point numbers. If the operator is applied to a floating-point number and an
integer, the compiler may convert or coerce the integer to a floating-point number, as shown
in the figure below:
position = initial + rate * 60

Syntax Tree                          Semantic Tree (after coercion)

      =                                    =
     / \                                  / \
<id,1>  +                            <id,1>  +
       / \                                  / \
  <id,2>  *                            <id,2>  *
         / \                                  / \
    <id,3>  60                           <id,3>  inttofloat
                                                     |
                                                     60
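A minimal sketch of this coercion rule in C, assuming just two type codes; the enum and function names are illustrative.

```c
enum type { T_INT, T_FLOAT };

/* Result type of a binary arithmetic operator: if either operand is a
   float, the other is coerced and the result is a float. */
enum type binop_type(enum type left, enum type right) {
    return (left == T_FLOAT || right == T_FLOAT) ? T_FLOAT : T_INT;
}

/* True when an inttofloat coercion must be inserted for mixed operands. */
int needs_coercion(enum type left, enum type right) {
    return left != right;
}
```

For rate * 60 with rate a float, `binop_type(T_FLOAT, T_INT)` is `T_FLOAT` and `needs_coercion` reports that the literal 60 must be converted, which is where the inttofloat node comes from.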
16. Intermediate Code Generation
• In the process of translating a source program into target code, a compiler may construct one
or more intermediate representations; these are commonly used during syntax and semantic
analysis.
• After syntax and semantic analysis of the source program, many compilers generate an
explicit low-level or machine-like intermediate representation.
• This intermediate representation should have two important properties:
1. It should be easy to produce.
2. It should be easy to translate into the target machine language.
17. Intermediate Code Generation
• The intermediate form considered here is called three-address code, which consists of a
sequence of assembly-like instructions with three operands per instruction.
• Each operand can act like a register.
• The three-address code sequence produced by the intermediate code generator is as follows:
position = initial + rate * 60
Three-address code:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
• Each three-address assignment instruction has at most one operator on the right side. These
instructions fix the order in which operations are to be done.
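The way the temporaries t1, t2, t3 are produced can be sketched in C; `newtemp` and `emit_example` are hypothetical helper names, and the emitted text simply reproduces the sequence above.

```c
#include <stdio.h>

static int temp_count = 0;

/* Return a fresh temporary number: 1 for t1, 2 for t2, ... */
int newtemp(void) { return ++temp_count; }

/* Write the three-address sequence for  id1 = id2 + id3 * 60  into buf.
   Returns the number of characters written. */
int emit_example(char *buf, int size) {
    int t1 = newtemp(), t2 = newtemp(), t3 = newtemp();
    return snprintf(buf, size,
                    "t%d = inttofloat(60)\n"
                    "t%d = id3 * t%d\n"
                    "t%d = id2 + t%d\n"
                    "id1 = t%d\n",
                    t1, t2, t1, t3, t2, t3);
}
```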
18. Code Optimization
• The next phase does code optimization of the intermediate code.
• Optimization can be thought of as something that removes unnecessary lines of code and
arranges the sequence of statements to speed up program execution without wasting
resources (CPU, memory).
• In optimization, high-level general programming constructs are replaced by very efficient
low-level programming code.
• A code optimizing process must follow the three rules given below:
• The output code must not, in any way, change the meaning of the program.
• Optimization should increase the speed of the program and, if possible, the program
should demand fewer resources.
• Optimization should itself be fast and should not delay the overall compiling process.
19. Code Optimization
• Efforts toward optimized code can be made at various levels of the compilation process.
• At the beginning, users can change/rearrange the code or use better algorithms to write
the code.
• After generating intermediate code, the compiler can modify the intermediate code by
improving address calculations and loops.
• While producing the target machine code, the compiler can make use of memory
hierarchy and CPU registers.
• Optimization can be categorized broadly into two types :
1) Machine independent
2) Machine dependent.
20. Machine-independent Optimization
• In this optimization, the compiler takes in the intermediate code and transforms a part of the
code that does not involve any CPU registers and/or absolute memory locations.
• For example:
do {
    i = 10;
    v = v + i;
} while (v < 10);
• This code repeatedly assigns to the identifier i inside the loop; the invariant assignment can
be moved out of the loop:
i = 10;
do {
    v = v + i;
} while (v < 10);
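Another machine-independent transformation, constant folding, can be sketched on a single three-address instruction; the struct layout below is illustrative, not from any real compiler.

```c
/* One three-address instruction  t = lhs op rhs  with constant operands. */
struct tac {
    char op;       /* '+', '*', ... */
    int lhs, rhs;  /* constant operand values */
    int folded;    /* set to 1 when the instruction was folded */
    int value;     /* folded result, valid when folded == 1 */
};

/* Replace the instruction by its constant result where the operator is
   known; anything unhandled is left alone. */
void fold(struct tac *ins) {
    switch (ins->op) {
    case '+': ins->value = ins->lhs + ins->rhs; ins->folded = 1; break;
    case '*': ins->value = ins->lhs * ins->rhs; ins->folded = 1; break;
    default:  ins->folded = 0; break;
    }
}
```

For example, an instruction computing rate * 60 with both operands known at compile time folds to a single constant, saving a runtime multiply.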
21. Machine-dependent Optimization
• Machine-dependent optimization is done after the target code has been generated and when
the code is transformed according to the target machine architecture.
• It involves CPU registers and may have absolute memory references rather than relative
references. Machine-dependent optimizers try to take maximum advantage of the memory
hierarchy.
22. Code Generation
• Code generation can be considered the final phase of compilation.
• In this phase, the code generator takes the optimized representation of the intermediate code
and maps it to the target machine language.
• The code generator translates the intermediate code into a sequence of (generally)
relocatable machine code. This sequence of machine instructions performs the same task as
the intermediate code would.
• After code generation, a further optimization pass can be applied to the generated code, but
that can be seen as a part of the code generation phase itself.
• The code generated by the compiler is an object code of some lower-level programming
language, for example, assembly language.
• The source code written in a higher-level language is transformed into a lower-level language
that results in a lower-level object code, which should have the following minimum
properties:
• It should carry the exact meaning of the source code.
• It should be efficient in terms of CPU usage and memory management.
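As a rough sketch of this mapping, a single three-address instruction such as t2 = id3 * t1 might be lowered to register-based target code as below; the LD/MUL/ST mnemonics and two-register scheme are illustrative assumptions, not a real instruction set.

```c
#include <stdio.h>

/* Lower  dst = a * b  to a load/multiply/store sequence in buf.
   Returns the number of characters written. */
int gen_mul(char *buf, int size,
            const char *dst, const char *a, const char *b) {
    return snprintf(buf, size,
                    "LD  R1, %s\n"     /* load the first operand  */
                    "LD  R2, %s\n"     /* load the second operand */
                    "MUL R1, R1, R2\n" /* multiply in registers   */
                    "ST  %s, R1\n",    /* store the result        */
                    a, b, dst);
}
```

A real code generator would also choose registers globally and reuse values already loaded, rather than loading and storing around every instruction.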