SlideShare une entreprise Scribd logo
1  sur  6
Télécharger pour lire hors ligne
Chapter 3.2 The Functions and Purposes of Translators

3.2 (a)        Interpreters and Compilers
When electronic computers were first used, the programs had to be written in machine
code. This code was comprised of simple instructions each of which was represented
by a binary pattern in the computer. To produce these programs, programmers had to
write the instructions in binary. This not only took a long time, it was also prone to
errors. To improve program writing assembly languages were developed. Assembly
languages allowed the use of mnemonics and names for locations in memory. Each
assembly instruction mapped to a single machine instruction which meant that it was
fairly easy to translate a program written in assembly language to machine code. To
speed up this translation process, assemblers were written which could be loaded into
the computer and then the computer could translate the assembly language to machine
code. Writing programs in assembly language, although easier than using machine
code, was still tedious and took a long time.

After assembly languages, came high-level languages which used the type of
language used by the person writing the program. Thus FORTRAN (FORmula
TRANslation) was developed for science and engineering programs and it used
formulae in the same way as would scientists and engineers. COBOL (Common
Business Oriented Language) was developed for business applications. Programs
written in these languages needed to be translated into machine code. This led to the
birth of compilers.

A compiler takes a program written in a high-level language and translates into an
equivalent program in machine code. Once this is done, the machine code version can
be loaded into the machine and run without any further help as it is complete in itself.
The high-level language version of the program is usually called the source code and
the resulting machine code program is called the object code. The relationship
between them is shown in Fig. 3.2.a.1.


          Source Code                                                   Object Code
          (High-Level                    COMPILER                        (Machine
           Language)                                                     Language)


                                      Fig. 3.2.a.1

The problem with using a compiler is that it uses a lot of computer resources. It has
to be loaded in the computer's memory at the same time as the source code and there
has to be sufficient memory to hold the object code. Further, there has to be sufficient
memory for working storage while the translation is taking place. Another
disadvantage is that when an error in a program occurs it is difficult to pin-point its
source in the original program.

An alternative system is to use interpretation. In this system each instruction is taken
in turn and translated to machine code. The instruction is then executed before the
next instruction is translated. This system was developed because early personal


                                          4.2 - 1
computers lacked the power and memory needed for compilation. This method also
has the advantage of producing error messages as soon as an error is encountered.
This means that the instruction causing the problem can be easily identified. Against
interpretation is the fact that execution of a program is slow compared to that of a
compiled program. This is because the original program has to be translated every
time it is executed. Also, instructions inside a loop will have to be translated each
time the loop is entered.

However, interpretation is very useful during program development as errors can be
found and corrected as soon as they are encountered. In fact many languages, such as
Visual Basic, use both an interpreter and a compiler. This enables the programmer to
use the interpreter during program development and, when the program is fully
working, it can be translated by the compiler into machine code. This machine code
version can then be distributed to users who do not have access to the original code.

Whether a compiler or interpreter is used, the translation from a high-level language
to machine code has to go through various stages and these are shown in Fig. 3.2.a.2.


                                   Source Program



                                  Lexical Analysis


                                   Syntax Analysis


                                 Semantic Analysis



                                     Intermediate
                                      Language




                                  Code Generation


                                 Code Optimisation



                                    Object Program




                                         4.2 - 2
3.2 (b)         Lexical Analysis
The lexical analyser uses the source program as input and creates a stream of tokens
for the syntax analyser. Tokens are normally 16-bit unsigned integers. Each group of
characters is replaced by a token. Single characters, which have a meaning in their
own right, are replaced by their ASCII values. Multiple character tokens are
represented by integers greater than 255 because the ones up to 255 are reserved for
the ASCII codes. Variable names will need to have extra information stored about
them. This is done by means of a symbol table. This table is used throughout
compilation to build up information about names used in the program. During lexical
analysis only the variable's name will be noted. It is during syntax and semantic
analysis that details such as the variable's type and scope are added. The lexical
analyser may output some error messages and diagnostics. For example, it will report
errors such as an identifier or variable name which breaks the rules of the language.

At various stages during compilation it will be necessary to look up details about the
names in the symbol table. This must be done efficiently so a linear search is not
sufficiently fast. In fact, it is usual to use a hash table and to calculate the position of
a name by hashing the name itself. When two names are hashed to the same address,
a linked list can be used to avoid the symbol table filling up.

The lexical analyser also removes redundant characters such as white space (these are
spaces, tabs, etc., which we may find useful to make the code more readable, but the
computer does not want) and comments. Often the lexical analysis takes longer than
the other stages of compilation. This is because it has to handle the original source
code, which can have many formats. For example, the following two pieces of code
are equivalent although their format is considerably different.

IF X = Y THEN 'square X                          IF X = Y THEN Z := X * X
  Z := X * X                                     ELSE Z := Y * Y
ELSE          'square Y                          PRINT Z
  Z := Y *Y
ENDIF
PRINT Z

When the lexical analyser has completed its task, the code will be in a standard format
which means that the syntax analyser (which is the next stage, and we will be looking
at in 3.2.c) can always expect the format of its input to be the same.




                                            4.2 - 3
3.2 (c)           Syntax Analysis
This Section should be read in conjunction with Section 3.5.j which discusses Backus-
Naur Form (BNF) and syntax diagrams.

During this stage of compilation the code generated by the lexical analyser is parsed
(broken into small units) to check that it is grammatically correct. All languages have
rules of grammar and computer languages are no exception. The grammar of
programming languages is defined by means of BNF notation or syntax diagrams. It
is against these rules that the code has to be checked.

For example, taking a very elementary language, an assignment statement may be
defined to be of the form

                      <variable> <assignment_operator> <expression>

and expression is

                        <variable> <arithmetic_operator> <variable>

and the parser must take the output from the lexical analyser and check that it is of
this form.

If the statement is

                                   sum := sum + number

the parser will receive

  <variable> <assignment_operator> <variable> <arithmetic_operator> <variable>

which becomes

                      <variable> <assignment_operator> <expression>

and then

                                  <assignment statement>

which is valid.

If the original statement is

                                  sum := sum + + number

this will be input as

          <variable> <assignment_operator> <variable> <arithmetic_operator>
                           <arithmetic_operator> <variable>



                                           4.2 - 4
and this does not represent a valid statement hence an error message will be returned.

It is at this stage that invalid names can be found such as PRIT instead of PRINT as
PRIT will be read as a variable name instead of a reserved word. This will mean that
the statement containing PRIT will not parse to a valid statement. Note that in
languages that require variables to be declared before being used, the lexical analyser
may pick up this error because PRIT has not been declared as a variable and so is not
in the symbol table.

Most compilers will report errors found during syntax analysis as soon as they are
found and will attempt to show where the error has occurred. However, they may not
be very accurate in their conclusions nor may the error message be very clear.

During syntax analysis certain semantic checks are carried out. These include label
checks, flow of control checks and declaration checks.

Some languages allow GOTO statements (not recommended by the authors) which
allow control to be passed, unconditionally, to another statement which has a label.
The GOTO statement specifies the label to which the control must pass. The
compiler must check that such a label exists.

Certain control constructs can only be placed in certain parts of a program. For
example in C (and C++) the CONTINUE statement can only be placed inside a loop
and the BREAK statement can only be placed inside a loop or SWITCH statement.
The compiler must ensure that statements like these are used in the correct place.

Many languages insist on the programmer declaring variables and their types. It is at
this stage that the compiler verifies that all variables have been properly declared and
that they are used correctly.




                                          4.2 - 5
3.2 (d)        Code Generation
It is at this stage, when all the errors due to incorrect use of the language have been
removed, that the program is translated into code suitable for use by the computer's
processor.

During lexical and syntax analysis a table of variables has been built up which
includes details of the variable name, its type and the block in which it is valid. The
address of the variable is now calculated and stored in the symbol table. This is done
as soon as the variable is encountered during code generation.

Before the final code can be generated, an intermediate code is produced. This
intermediate code can then be interpreted or translated into machine code. In the
latter case, the code can be saved and distributed to computer systems as an
executable program
3.2 (e)        Linkers and Loaders
Programs are usually built up in modules. These modules are then compiled into
machine code that has to be loaded into the computer's memory. This process is done
by the loader. The loader decides where in memory to place the code and then adjusts
memory addresses as described in Chapter 3.1. As the whole program may consist of
many modules, all of which have been separately compiled, the modules will have to
be correctly linked once they have been loaded. This is the job of the linker. The
linker calculates the addresses of the separate pieces that make up the whole program
and then links these pieces so that all the modules can interact with one another.




                                          4.2 - 6

Contenu connexe

Tendances

PSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATOR
PSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATORPSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATOR
PSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATORijistjournal
 
Lecture 01 introduction to compiler
Lecture 01 introduction to compilerLecture 01 introduction to compiler
Lecture 01 introduction to compilerIffat Anjum
 
Compiler Design Basics
Compiler Design BasicsCompiler Design Basics
Compiler Design BasicsAkhil Kaushik
 
Principles of compiler design
Principles of compiler designPrinciples of compiler design
Principles of compiler designDHARANI BABU
 
Cs6660 compiler design
Cs6660 compiler designCs6660 compiler design
Cs6660 compiler designhari2010
 
Compiler Construction Course - Introduction
Compiler Construction Course - IntroductionCompiler Construction Course - Introduction
Compiler Construction Course - IntroductionMuhammad Sanaullah
 
Cd ch2 - lexical analysis
Cd   ch2 - lexical analysisCd   ch2 - lexical analysis
Cd ch2 - lexical analysismengistu23
 
Compiler Construction introduction
Compiler Construction introductionCompiler Construction introduction
Compiler Construction introductionRana Ehtisham Ul Haq
 
Chap 1-language processor
Chap 1-language processorChap 1-language processor
Chap 1-language processorshindept123
 
basics of compiler design
basics of compiler designbasics of compiler design
basics of compiler designPreeti Katiyar
 
Compiler design Introduction
Compiler design IntroductionCompiler design Introduction
Compiler design IntroductionAman Sharma
 
Lecture 1 introduction to language processors
Lecture 1  introduction to language processorsLecture 1  introduction to language processors
Lecture 1 introduction to language processorsRebaz Najeeb
 
Compiler Construction
Compiler ConstructionCompiler Construction
Compiler ConstructionSarmad Ali
 
Introduction to Compiler Construction
Introduction to Compiler Construction Introduction to Compiler Construction
Introduction to Compiler Construction Sarmad Ali
 
Compiler Design Lecture Notes
Compiler Design Lecture NotesCompiler Design Lecture Notes
Compiler Design Lecture NotesFellowBuddy.com
 
Language processor
Language processorLanguage processor
Language processorAbha Damani
 

Tendances (20)

Phases of compiler
Phases of compilerPhases of compiler
Phases of compiler
 
PSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATOR
PSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATORPSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATOR
PSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATOR
 
Compiler
CompilerCompiler
Compiler
 
Lecture 01 introduction to compiler
Lecture 01 introduction to compilerLecture 01 introduction to compiler
Lecture 01 introduction to compiler
 
Compiler Design Basics
Compiler Design BasicsCompiler Design Basics
Compiler Design Basics
 
Principles of compiler design
Principles of compiler designPrinciples of compiler design
Principles of compiler design
 
Cs6660 compiler design
Cs6660 compiler designCs6660 compiler design
Cs6660 compiler design
 
Compiler Construction Course - Introduction
Compiler Construction Course - IntroductionCompiler Construction Course - Introduction
Compiler Construction Course - Introduction
 
Cd ch2 - lexical analysis
Cd   ch2 - lexical analysisCd   ch2 - lexical analysis
Cd ch2 - lexical analysis
 
Compiler Construction introduction
Compiler Construction introductionCompiler Construction introduction
Compiler Construction introduction
 
Chap 1-language processor
Chap 1-language processorChap 1-language processor
Chap 1-language processor
 
basics of compiler design
basics of compiler designbasics of compiler design
basics of compiler design
 
Compiler design Introduction
Compiler design IntroductionCompiler design Introduction
Compiler design Introduction
 
Java chapter 5
Java chapter 5Java chapter 5
Java chapter 5
 
Compiler design
Compiler designCompiler design
Compiler design
 
Lecture 1 introduction to language processors
Lecture 1  introduction to language processorsLecture 1  introduction to language processors
Lecture 1 introduction to language processors
 
Compiler Construction
Compiler ConstructionCompiler Construction
Compiler Construction
 
Introduction to Compiler Construction
Introduction to Compiler Construction Introduction to Compiler Construction
Introduction to Compiler Construction
 
Compiler Design Lecture Notes
Compiler Design Lecture NotesCompiler Design Lecture Notes
Compiler Design Lecture Notes
 
Language processor
Language processorLanguage processor
Language processor
 

En vedette (9)

3.4
3.43.4
3.4
 
Eq v2
Eq v2Eq v2
Eq v2
 
3.10
3.103.10
3.10
 
Nov 05 P3
Nov 05 P3Nov 05 P3
Nov 05 P3
 
3.9
3.93.9
3.9
 
3.7
3.73.7
3.7
 
3.8
3.83.8
3.8
 
3.1
3.13.1
3.1
 
Nov 09 MS32
Nov 09 MS32Nov 09 MS32
Nov 09 MS32
 

Similaire à 3.2

SOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGE
SOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGESOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGE
SOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGEIJCI JOURNAL
 
SSD Mod 2 -18CS61_Notes.pdf
SSD Mod 2 -18CS61_Notes.pdfSSD Mod 2 -18CS61_Notes.pdf
SSD Mod 2 -18CS61_Notes.pdfJacobDragonette
 
Compiler_Lecture1.pdf
Compiler_Lecture1.pdfCompiler_Lecture1.pdf
Compiler_Lecture1.pdfAkarTaher
 
Language translators
Language translatorsLanguage translators
Language translatorsAditya Sharat
 
design intoduction of_COMPILER_DESIGN.pdf
design intoduction of_COMPILER_DESIGN.pdfdesign intoduction of_COMPILER_DESIGN.pdf
design intoduction of_COMPILER_DESIGN.pdfadvRajatSharma
 
Chapter 2 Program language translation.pptx
Chapter 2 Program language translation.pptxChapter 2 Program language translation.pptx
Chapter 2 Program language translation.pptxdawod yimer
 
role of lexical anaysis
role of lexical anaysisrole of lexical anaysis
role of lexical anaysisSudhaa Ravi
 
Compiler an overview
Compiler  an overviewCompiler  an overview
Compiler an overviewamudha arul
 
2-Design Issues, Patterns, Lexemes, Tokens-28-04-2023.docx
2-Design Issues, Patterns, Lexemes, Tokens-28-04-2023.docx2-Design Issues, Patterns, Lexemes, Tokens-28-04-2023.docx
2-Design Issues, Patterns, Lexemes, Tokens-28-04-2023.docxvenkatapranaykumarGa
 
Concept of compiler in details
Concept of compiler in detailsConcept of compiler in details
Concept of compiler in detailskazi_aihtesham
 
Phases of Compiler.pptx
Phases of Compiler.pptxPhases of Compiler.pptx
Phases of Compiler.pptxssuser3b4934
 
Code generation errors and recovery
Code generation errors and recoveryCode generation errors and recovery
Code generation errors and recoveryMomina Idrees
 

Similaire à 3.2 (20)

Compiler Design Material
Compiler Design MaterialCompiler Design Material
Compiler Design Material
 
How a Compiler Works ?
How a Compiler Works ?How a Compiler Works ?
How a Compiler Works ?
 
SOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGE
SOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGESOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGE
SOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGE
 
Assignment1
Assignment1Assignment1
Assignment1
 
SSD Mod 2 -18CS61_Notes.pdf
SSD Mod 2 -18CS61_Notes.pdfSSD Mod 2 -18CS61_Notes.pdf
SSD Mod 2 -18CS61_Notes.pdf
 
Compiler_Lecture1.pdf
Compiler_Lecture1.pdfCompiler_Lecture1.pdf
Compiler_Lecture1.pdf
 
Language translators
Language translatorsLanguage translators
Language translators
 
Chapter#01 cc
Chapter#01 ccChapter#01 cc
Chapter#01 cc
 
design intoduction of_COMPILER_DESIGN.pdf
design intoduction of_COMPILER_DESIGN.pdfdesign intoduction of_COMPILER_DESIGN.pdf
design intoduction of_COMPILER_DESIGN.pdf
 
Chapter 2 Program language translation.pptx
Chapter 2 Program language translation.pptxChapter 2 Program language translation.pptx
Chapter 2 Program language translation.pptx
 
Compiler
Compiler Compiler
Compiler
 
role of lexical anaysis
role of lexical anaysisrole of lexical anaysis
role of lexical anaysis
 
Compiler an overview
Compiler  an overviewCompiler  an overview
Compiler an overview
 
2-Design Issues, Patterns, Lexemes, Tokens-28-04-2023.docx
2-Design Issues, Patterns, Lexemes, Tokens-28-04-2023.docx2-Design Issues, Patterns, Lexemes, Tokens-28-04-2023.docx
2-Design Issues, Patterns, Lexemes, Tokens-28-04-2023.docx
 
Concept of compiler in details
Concept of compiler in detailsConcept of compiler in details
Concept of compiler in details
 
Phases of Compiler.pptx
Phases of Compiler.pptxPhases of Compiler.pptx
Phases of Compiler.pptx
 
Chapter 1.pptx
Chapter 1.pptxChapter 1.pptx
Chapter 1.pptx
 
Phases of Compiler
Phases of CompilerPhases of Compiler
Phases of Compiler
 
1._Introduction_.pptx
1._Introduction_.pptx1._Introduction_.pptx
1._Introduction_.pptx
 
Code generation errors and recovery
Code generation errors and recoveryCode generation errors and recovery
Code generation errors and recovery
 

Plus de Samimvez

Sql installation tutorial
Sql installation tutorialSql installation tutorial
Sql installation tutorialSamimvez
 
Coms1010 exam paper - nov10
Coms1010   exam paper - nov10Coms1010   exam paper - nov10
Coms1010 exam paper - nov10Samimvez
 
Coms1010 exam paper - may 08
Coms1010   exam paper - may 08Coms1010   exam paper - may 08
Coms1010 exam paper - may 08Samimvez
 
Labsheet 3
Labsheet 3Labsheet 3
Labsheet 3Samimvez
 
Labsheet 3,5
Labsheet 3,5Labsheet 3,5
Labsheet 3,5Samimvez
 
June 02 MS2
June 02 MS2June 02 MS2
June 02 MS2Samimvez
 
June 05 MS2
June 05 MS2June 05 MS2
June 05 MS2Samimvez
 
June 09 P1
June 09 P1June 09 P1
June 09 P1Samimvez
 
Nov 08 MS1
Nov 08 MS1Nov 08 MS1
Nov 08 MS1Samimvez
 
June 07 MS3
June 07 MS3June 07 MS3
June 07 MS3Samimvez
 
June 03 P2
June 03 P2June 03 P2
June 03 P2Samimvez
 

Plus de Samimvez (20)

Sql installation tutorial
Sql installation tutorialSql installation tutorial
Sql installation tutorial
 
Example3
Example3Example3
Example3
 
Coms1010 exam paper - nov10
Coms1010   exam paper - nov10Coms1010   exam paper - nov10
Coms1010 exam paper - nov10
 
Coms1010 exam paper - may 08
Coms1010   exam paper - may 08Coms1010   exam paper - may 08
Coms1010 exam paper - may 08
 
Example2
Example2Example2
Example2
 
Labsheet 3
Labsheet 3Labsheet 3
Labsheet 3
 
Labsheet 3,5
Labsheet 3,5Labsheet 3,5
Labsheet 3,5
 
EQ V3x
EQ V3xEQ V3x
EQ V3x
 
3.6
3.63.6
3.6
 
3.3
3.33.3
3.3
 
3.5
3.53.5
3.5
 
June 02 MS2
June 02 MS2June 02 MS2
June 02 MS2
 
June 05 MS2
June 05 MS2June 05 MS2
June 05 MS2
 
Nov 03 MS
Nov 03 MSNov 03 MS
Nov 03 MS
 
June 09 P1
June 09 P1June 09 P1
June 09 P1
 
Nov 08 MS1
Nov 08 MS1Nov 08 MS1
Nov 08 MS1
 
June 07 MS3
June 07 MS3June 07 MS3
June 07 MS3
 
June 03 P2
June 03 P2June 03 P2
June 03 P2
 
Nov 05 P2
Nov 05 P2Nov 05 P2
Nov 05 P2
 
Nov 05 P1
Nov 05 P1Nov 05 P1
Nov 05 P1
 

3.2

  • 1. Chapter 3.2 The Functions and Purposes of Translators 3.2 (a) Interpreters and Compilers When electronic computers were first used, the programs had to be written in machine code. This code was comprised of simple instructions each of which was represented by a binary pattern in the computer. To produce these programs, programmers had to write the instructions in binary. This not only took a long time, it was also prone to errors. To improve program writing assembly languages were developed. Assembly languages allowed the use of mnemonics and names for locations in memory. Each assembly instruction mapped to a single machine instruction which meant that it was fairly easy to translate a program written in assembly language to machine code. To speed up this translation process, assemblers were written which could be loaded into the computer and then the computer could translate the assembly language to machine code. Writing programs in assembly language, although easier than using machine code, was still tedious and took a long time. After assembly languages, came high-level languages which used the type of language used by the person writing the program. Thus FORTRAN (FORmula TRANslation) was developed for science and engineering programs and it used formulae in the same way as would scientists and engineers. COBOL (Common Business Oriented Language) was developed for business applications. Programs written in these languages needed to be translated into machine code. This led to the birth of compilers. A compiler takes a program written in a high-level language and translates into an equivalent program in machine code. Once this is done, the machine code version can be loaded into the machine and run without any further help as it is complete in itself. The high-level language version of the program is usually called the source code and the resulting machine code program is called the object code. The relationship between them is shown in Fig. 3.2.a.1. Source Code Object Code (High-Level COMPILER (Machine Language) Language) Fig. 3.2.a.1 The problem with using a compiler is that it uses a lot of computer resources. It has to be loaded in the computer's memory at the same time as the source code and there has to be sufficient memory to hold the object code. Further, there has to be sufficient memory for working storage while the translation is taking place. Another disadvantage is that when an error in a program occurs it is difficult to pin-point its source in the original program. An alternative system is to use interpretation. In this system each instruction is taken in turn and translated to machine code. The instruction is then executed before the next instruction is translated. This system was developed because early personal 4.2 - 1
  • 2. computers lacked the power and memory needed for compilation. This method also has the advantage of producing error messages as soon as an error is encountered. This means that the instruction causing the problem can be easily identified. Against interpretation is the fact that execution of a program is slow compared to that of a compiled program. This is because the original program has to be translated every time it is executed. Also, instructions inside a loop will have to be translated each time the loop is entered. However, interpretation is very useful during program development as errors can be found and corrected as soon as they are encountered. In fact many languages, such as Visual Basic, use both an interpreter and a compiler. This enables the programmer to use the interpreter during program development and, when the program is fully working, it can be translated by the compiler into machine code. This machine code version can then be distributed to users who do not have access to the original code. Whether a compiler or interpreter is used, the translation from a high-level language to machine code has to go through various stages and these are shown in Fig. 3.2.a.2. Source Program Lexical Analysis Syntax Analysis Semantic Analysis Intermediate Language Code Generation Code Optimisation Object Program 4.2 - 2
  • 3. 3.2 (b) Lexical Analysis The lexical analyser uses the source program as input and creates a stream of tokens for the syntax analyser. Tokens are normally 16-bit unsigned integers. Each group of characters is replaced by a token. Single characters, which have a meaning in their own right, are replaced by their ASCII values. Multiple character tokens are represented by integers greater than 255 because the ones up to 255 are reserved for the ASCII codes. Variable names will need to have extra information stored about them. This is done by means of a symbol table. This table is used throughout compilation to build up information about names used in the program. During lexical analysis only the variable's name will be noted. It is during syntax and semantic analysis that details such as the variable's type and scope are added. The lexical analyser may output some error messages and diagnostics. For example, it will report errors such as an identifier or variable name which breaks the rules of the language. At various stages during compilation it will be necessary to look up details about the names in the symbol table. This must be done efficiently so a linear search is not sufficiently fast. In fact, it is usual to use a hash table and to calculate the position of a name by hashing the name itself. When two names are hashed to the same address, a linked list can be used to avoid the symbol table filling up. The lexical analyser also removes redundant characters such as white space (these are spaces, tabs, etc., which we may find useful to make the code more readable, but the computer does not want) and comments. Often the lexical analysis takes longer than the other stages of compilation. This is because it has to handle the original source code, which can have many formats. For example, the following two pieces of code are equivalent although their format is considerably different. IF X = Y THEN 'square X IF X = Y THEN Z := X * X Z := X * X ELSE Z := Y * Y ELSE 'square Y PRINT Z Z := Y *Y ENDIF PRINT Z When the lexical analyser has completed its task, the code will be in a standard format which means that the syntax analyser (which is the next stage, and we will be looking at in 3.2.c) can always expect the format of its input to be the same. 4.2 - 3
  • 4. 3.2 (c) Syntax Analysis This Section should be read in conjunction with Section 3.5.j which discusses Backus- Naur Form (BNF) and syntax diagrams. During this stage of compilation the code generated by the lexical analyser is parsed (broken into small units) to check that it is grammatically correct. All languages have rules of grammar and computer languages are no exception. The grammar of programming languages is defined by means of BNF notation or syntax diagrams. It is against these rules that the code has to be checked. For example, taking a very elementary language, an assignment statement may be defined to be of the form <variable> <assignment_operator> <expression> and expression is <variable> <arithmetic_operator> <variable> and the parser must take the output from the lexical analyser and check that it is of this form. If the statement is sum := sum + number the parser will receive <variable> <assignment_operator> <variable> <arithmetic_operator> <variable> which becomes <variable> <assignment_operator> <expression> and then <assignment statement> which is valid. If the original statement is sum := sum + + number this will be input as <variable> <assignment_operator> <variable> <arithmetic_operator> <arithmetic_operator> <variable> 4.2 - 4
  • 5. and this does not represent a valid statement hence an error message will be returned. It is at this stage that invalid names can be found such as PRIT instead of PRINT as PRIT will be read as a variable name instead of a reserved word. This will mean that the statement containing PRIT will not parse to a valid statement. Note that in languages that require variables to be declared before being used, the lexical analyser may pick up this error because PRIT has not been declared as a variable and so is not in the symbol table. Most compilers will report errors found during syntax analysis as soon as they are found and will attempt to show where the error has occurred. However, they may not be very accurate in their conclusions nor may the error message be very clear. During syntax analysis certain semantic checks are carried out. These include label checks, flow of control checks and declaration checks. Some languages allow GOTO statements (not recommended by the authors) which allow control to be passed, unconditionally, to another statement which has a label. The GOTO statement specifies the label to which the control must pass. The compiler must check that such a label exists. Certain control constructs can only be placed in certain parts of a program. For example in C (and C++) the CONTINUE statement can only be placed inside a loop and the BREAK statement can only be placed inside a loop or SWITCH statement. The compiler must ensure that statements like these are used in the correct place. Many languages insist on the programmer declaring variables and their types. It is at this stage that the compiler verifies that all variables have been properly declared and that they are used correctly. 4.2 - 5
  • 6. 3.2 (d) Code Generation It is at this stage, when all the errors due to incorrect use of the language have been removed, that the program is translated into code suitable for use by the computer's processor. During lexical and syntax analysis a table of variables has been built up which includes details of the variable name, its type and the block in which it is valid. The address of the variable is now calculated and stored in the symbol table. This is done as soon as the variable is encountered during code generation. Before the final code can be generated, an intermediate code is produced. This intermediate code can then be interpreted or translated into machine code. In the latter case, the code can be saved and distributed to computer systems as an executable program 3.2 (e) Linkers and Loaders Programs are usually built up in modules. These modules are then compiled into machine code that has to be loaded into the computer's memory. This process is done by the loader. The loader decides where in memory to place the code and then adjusts memory addresses as described in Chapter 3.1. As the whole program may consist of many modules, all of which have been separately compiled, the modules will have to be correctly linked once they have been loaded. This is the job of the linker. The linker calculates the addresses of the separate pieces that make up the whole program and then links these pieces so that all the modules can interact with one another. 4.2 - 6