CC week 1.pptx

  1. Compiler Construction: Week 1, Lecture 1. Semester-# Fall 2022
  2. Instructor Contact Details (Lahore Garrison University) ▶ Name: Tayyaba Sultana ▶ Course: CSC- Compiler Construction ▶ Credit Hours: 3 ▶ Designation: Lecturer ▶ Office Location: CS Staff Room, 3rd Floor ▶ Email: Tayyabasultana@lgu.edu.com ▶ Visiting Hours: Monday to Friday (9:00 am to 11:00 am)
  3. Objectives ▶ In this course we will study compilers that translate a program written in a high-level source language into lower-level assembly code. We will study theories and algorithms that can be applied to any language. ▶ The course is divided into 3 sections: ▶ Front End (deals with syntactic and semantic analysis of the source language) ▶ Middle End (deals with the intermediate representation into which the source code is translated, and with its analysis and optimization) ▶ Back End, or Code Generator (generates the machine code)
  4. Today's Lecture ▶ In this lesson we will try to understand the overall working of a compiler in terms of its various phases and their interaction. ▶ In particular, we will understand: ▶ What a compiler is ▶ The two main parts of a compiler ▶ How it works ▶ Programs that help the compiler ▶ The phases of a compiler, through a simple language and its processing based on the grammar and its lexical specification.
  5. What is a program?
  6. What is a Compiler?
  7. Compiler is…
  8. ▶ A compiler is a program that reads a program written in a high-level language and translates it into an equivalent program in another language. ▶ Input: source program ▶ Output: target program ▶ The compiler also reports errors present in the source program as part of the translation.
  9. Types of compiler ▶ Single-pass compiler ▶ Multi-pass compiler
  10. Single-pass compiler ▶ It processes the whole program in only one pass. Multi-pass compiler ▶ It processes the source code of a program multiple times.
  11. 2 main parts of a compiler: Analysis ▶ It takes the input source program, breaks it into parts, and then creates an intermediate code representation of the source program. Synthesis ▶ It takes the intermediate code as input and creates the target program.
  12. Programs that help the compiler ▶ Preprocessor: accepts the source code as input and is responsible for: ▶ Macro expansion (limit) (single line of code) ▶ File inclusion (#include <stdio.h>) ▶ Assembler: ▶ It takes the assembly code generated by the compiler and converts it into machine code ▶ Linker and Loader: ▶ The linker includes libraries and linked files and creates the executable file ▶ The loader loads the executable program into memory
  13. Source program → Preprocessor → Modified source program → Compiler → Target assembly program → Linker/Loader → Machine code (machine-dependent code)
  14. Phases of Compiler ▶ Source Program → Lexical Analysis (scanner) → Syntax Analysis (parser) → Semantic Analysis (logical type of error) → Intermediate Code Generator → Code Optimization (number of lines, loops) → Target Code Generator → Target Program ▶ Data passed between phases: stream of characters → tokens → parse tree → SDT (syntax-directed translation: grammar rules apply actions) → three-address code ▶ The Error Handler and the Symbol Table Manager interact with the phases
  15. Lexical Analysis ▶ The lexical analyzer generates tokens ▶ It scans the program character by character ▶ It recognizes operators, identifiers, and keywords ▶ It groups the characters together and creates tokens ▶ It also removes spaces (spaces are used only for readability) ▶ Example: x = a + b * 20
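
The steps on this slide can be sketched as a small hand-written scanner. This is only an illustration: the token class names, the sample keyword set, and the function name `scan` are assumptions, not something the slides define.

```python
# Assumed sample keywords and operators; a real language defines its own sets.
KEYWORDS = {"int", "if", "else", "while", "return"}
OPERATORS = set("=+-*/")

def scan(source):
    """Read the input character by character and group it into tokens."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1  # spaces only aid readability, so they are dropped
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            word = source[start:i]
            kind = "keyword" if word in KEYWORDS else "identifier"
            tokens.append((kind, word))
        elif ch.isdigit():
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("number", source[start:i]))
        elif ch in OPERATORS:
            tokens.append(("operator", ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at index {i}")
    return tokens

for token in scan("x=a +b *20"):
    print(token)
```

For the slide's input, the scanner yields seven tokens: the identifiers `x`, `a`, `b`, the operators `=`, `+`, `*`, and the number `20`.
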
  16. Syntax Analysis ▶ Generates the parse tree ▶ Checks whether the structure of the program follows the rules ▶ Checks syntax by using a grammar ▶ Example: x = a + b * 20
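
As a rough illustration of checking structure against a grammar, the sketch below parses the slide's example with a recursive-descent parser. The grammar (assignment → id = expr, expr → term {+ term}, term → factor {* factor}) and the name `parse_assignment` are assumptions made for this example, and the result is a simplified tree of nested tuples rather than a full parse tree.

```python
def parse_assignment(tokens):
    """Parse a list of lexemes such as ['x', '=', 'a', '+', 'b', '*', '20']."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        # factor -> identifier | number
        if peek() is None:
            raise SyntaxError("unexpected end of input")
        tok = advance()
        if tok in ("=", "+", "*"):
            raise SyntaxError(f"expected an operand, found {tok!r}")
        return tok

    def term():
        # term -> factor { '*' factor }
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        # expr -> term { '+' term }
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = factor()
    if peek() != "=":
        raise SyntaxError("expected '='")
    advance()
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse_assignment(["x", "=", "a", "+", "b", "*", "20"]))
# ('=', 'x', ('+', 'a', ('*', 'b', '20')))
```

Because `term` is called from inside `expr`, the multiplication binds tighter than the addition, which is why `b * 20` ends up as a subtree of the `+` node.
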
  17. Semantic Analysis ▶ Verifies the parse tree semantically
  18. Intermediate Code Generator ▶ Generates three-address code ▶ An address can be a memory location or a register
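
To make three-address code concrete, here is a minimal sketch that flattens an expression tree into instructions with at most one operator each. The nested-tuple tree format, the temporary names `t1`, `t2`, and the function name `three_address_code` are assumptions for illustration only.

```python
import itertools

def three_address_code(tree):
    """Turn ('=', target, expression_tree) into a list of three-address instructions."""
    code = []
    temp_ids = itertools.count(1)

    def emit(node):
        # A leaf (identifier or number) is already an address.
        if isinstance(node, str):
            return node
        op, left, right = node
        left_addr = emit(left)
        right_addr = emit(right)
        temp = f"t{next(temp_ids)}"  # a fresh temporary holds the intermediate result
        code.append(f"{temp} = {left_addr} {op} {right_addr}")
        return temp

    op, target, value = tree
    if op != "=":
        raise ValueError("expected an assignment at the root")
    code.append(f"{target} = {emit(value)}")
    return code

# Tree for x = a + b * 20
for line in three_address_code(("=", "x", ("+", "a", ("*", "b", "20")))):
    print(line)
# t1 = b * 20
# t2 = a + t1
# x = t2
```
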
  19. Symbol table ▶ A symbol table is a data structure that stores identifiers/tokens and their attributes. ▶ e.g. int a; ▶ a is a variable of integer data type.
  21. Lexical Analysis ▶ Lexical analysis converts the source program into a stream of valid words of the language, known as tokens ▶ Also known as scanning ▶ There are two parts of lexical analysis: ▶ Scanning: reads the program character by character ▶ Lexical analysis: generates tokens
  22. Basic functions of Lexical Analysis 1. Reads the input program character by character and produces a stream of tokens, which is used by the parser 2. Removes comments from the source program 3. Removes whitespace characters (blank spaces, tabs, newlines) 4. Handles preprocessor directives (#include, #define) 5. Displays errors (if present) in the source program along with line numbers
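
Functions 2, 3, and 5 interact: comments and whitespace are discarded, yet errors must still be reported with the original line numbers. The sketch below shows one possible way to do this for C-style comments. The regular expression and the name `strip_comments` are assumptions, and the pattern deliberately ignores the case of comment markers inside string literals.

```python
import re

# Matches /* ... */ block comments and // line comments (string literals are not handled).
COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)

def strip_comments(source):
    """Replace each comment with a space, keeping its newlines so line numbers stay valid."""
    return COMMENT.sub(lambda m: " " + "\n" * m.group().count("\n"), source)

source = "int a; // counter\nint b; /* two\nlines */ int c;"
cleaned = strip_comments(source)
print(cleaned.split())  # whitespace removed by splitting into words
```

After stripping, `int c;` is still on line 3 of the text, so an error found there would be reported against the right line.
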
  23. [Figure: Lexical Analyzer, Symbol Table, Parser]
  24. ▶ A token is a valid word of the language ▶ It may be a keyword, operator, identifier, or any punctuation character
  25. Token, Pattern, Lexeme ▶ Tokens are the terminal symbols of the source language (identifiers, operators, punctuation symbols) ▶ A pattern is the rule that describes a particular token in the source language ▶ Id: starts with a letter or underscore, followed by any alphanumeric characters ▶ A lexeme is a sequence of characters that matches the pattern: a specific instance of a token
  26. Example ▶ count = count + temp; ▶ Here count and temp are identifiers, = and + are operators, and ; is punctuation ▶ 31 + 28 - 59 ▶ Here 31, 28 and 59 are numbers (pattern [0-9]+), and + and - are operators
  27. Lexical Errors ▶ A lexical analyzer cannot proceed if no rule/pattern matches a prefix of the remaining input.
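
The relationship between tokens, patterns, and lexemes, and the lexical error case, can be sketched with regular expressions. The pattern table and the function name `lexemes` are illustrative assumptions; only the identifier rule and `[0-9]+` come from the slides.

```python
import re

# One pattern (rule) per token class.
PATTERNS = [
    ("identifier", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("number", re.compile(r"[0-9]+")),
    ("operator", re.compile(r"[=+\-]")),
    ("punctuation", re.compile(r";")),
]

def lexemes(source):
    """Return (token, lexeme) pairs; raise if no pattern matches the remaining input."""
    result = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        for token, pattern in PATTERNS:
            match = pattern.match(source, pos)
            if match:
                result.append((token, match.group()))  # the matched text is the lexeme
                pos = match.end()
                break
        else:
            # No rule matches a prefix of what is left: a lexical error.
            raise ValueError(f"lexical error: no pattern matches {source[pos:]!r}")
    return result

print(lexemes("count = count + temp;"))
print(lexemes("31 + 28 - 59"))
```

An input such as `count = 3 @ 4` raises the error, because `@` is not the start of any lexeme under these patterns.
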
  28. Attribute for a token ▶ When a lexeme is encountered in a program, it is necessary to keep track of other occurrences of the same lexeme (i.e. whether the lexeme has been seen before or not) ▶ Keeping track of operators is not necessary ▶ Keeping track of identifiers is necessary ▶ We use the symbol table to store the lexeme ▶ The pointer to this symbol table entry becomes an attribute of that particular token ▶ E = m * c ^ 2 gives <e, pointer>, where pointer indicates the entry of the lexeme in the symbol table
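
A small sketch of this idea follows, assuming the symbol table is a plain list whose index plays the role of the pointer. The token class names `id` and `number` and the function name `tokens_with_attributes` are hypothetical.

```python
def tokens_with_attributes(lexemes):
    """Attach a symbol-table index to every identifier token."""
    symbol_table = []  # position in this list acts as the pointer
    index_of = {}      # lexeme -> index, used to detect repeated occurrences
    tokens = []
    for lexeme in lexemes:
        if lexeme.isidentifier():
            if lexeme not in index_of:       # first occurrence: create the entry
                index_of[lexeme] = len(symbol_table)
                symbol_table.append(lexeme)
            tokens.append(("id", index_of[lexeme]))
        elif lexeme.isdigit():
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((lexeme, None))    # operators carry no attribute
    return tokens, symbol_table

tokens, table = tokens_with_attributes(["E", "=", "m", "*", "c", "^", "2"])
print(tokens)
print(table)  # ['E', 'm', 'c']
```

A second occurrence of the same identifier reuses the existing entry, so both tokens point at one symbol-table slot.
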
  29. The process followed by the lexical analyzer ▶ Take the source program as input ▶ Scan it character by character ▶ Group these characters into lexemes ▶ Pass the tokens and attributes to the parser
  30. Look ahead ▶ a >= b ▶ Reading the character after a lexeme is necessary to verify that the lexeme is correct ▶ The extra character that has been read is then pushed back to the input
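
Look-ahead with push-back can be sketched as follows. `CharStream` and `scan_operator` are hypothetical names; the sketch only covers operators that may be followed by `=`.

```python
class CharStream:
    """A tiny input stream that allows the last character to be pushed back."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def read(self):
        ch = self.text[self.pos] if self.pos < len(self.text) else ""
        self.pos += 1
        return ch

    def push_back(self):
        self.pos -= 1

def scan_operator(stream):
    """Return '>' or '>=' (likewise for <, =, !) using one character of look-ahead."""
    first = stream.read()
    if first in ("<", ">", "=", "!"):
        second = stream.read()       # look ahead one character
        if second == "=":
            return first + second    # two-character lexeme such as '>='
        stream.push_back()           # not part of this lexeme: return it to the input
    return first

stream = CharStream(">=b")
print(scan_operator(stream))  # >=
print(stream.read())          # b
```

With the input `>b`, the scanner reads `b` while looking ahead, pushes it back, and returns `>`, so `b` is still available for the next token.
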
  31. Input Buffer ▶ The lexical analyzer scans the input from left to right, one character at a time. It uses two pointers, the begin pointer (bp) and the forward pointer (fp), to keep track of the portion of the input scanned.
  34. ▶ The forward pointer (fp) moves ahead to search for the end of the lexeme. As soon as a blank space is encountered, it indicates the end of the lexeme. In the example, as soon as fp encounters a blank space, the lexeme “int” is identified. ▶ When fp encounters white space, it ignores it and moves ahead. Then both the begin pointer (bp) and the forward pointer (fp) are set at the start of the next token.
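
The movement of the two pointers can be sketched as below. Following the slide, the sketch treats white space as the only thing that ends a lexeme, which is a simplification; the function name `split_lexemes` and the sample input are assumptions.

```python
def split_lexemes(buffer):
    """Use a begin pointer (bp) and a forward pointer (fp) to cut out lexemes."""
    lexemes = []
    bp = 0
    n = len(buffer)
    while bp < n:
        if buffer[bp].isspace():
            bp += 1                  # ignore white space and move ahead
            continue
        fp = bp
        while fp < n and not buffer[fp].isspace():
            fp += 1                  # fp searches for the end of the lexeme
        lexemes.append(buffer[bp:fp])
        bp = fp                      # both pointers restart at the next token
    return lexemes

print(split_lexemes("int a = 10 ;"))  # ['int', 'a', '=', '10', ';']
```
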
  35. One Buffer Scheme
  36. ▶ One Buffer Scheme: in this scheme, only one buffer is used to store the input string. The problem with this scheme is that if a lexeme is very long, it crosses the buffer boundary; to scan the rest of the lexeme the buffer has to be refilled, which overwrites the first part of the lexeme.
  37. Two Buffer Scheme
  38. ▶ Initially both bp and fp point to the first character of the first buffer. Then fp moves to the right in search of the end of the lexeme. As soon as a blank character is recognized, the string between bp and fp is identified as the corresponding token. To identify the boundary of the first buffer, an end-of-buffer character should be placed at the end of the first buffer. ▶ Similarly, the end of the second buffer is recognized by the end-of-buffer mark present at its end. When fp encounters the first eof, the end of the first buffer is recognized and filling of the second buffer starts. In the same way, when the second eof is reached, it indicates the end of the second buffer. The two buffers are filled alternately in this way until the end of the input program, and the stream of tokens is identified. This eof character introduced at the end is called the sentinel, and it is used to identify the end of the buffer.
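
A minimal sketch of the two-buffer scheme with a sentinel is given below. It assumes the sentinel is the character `\0` and that this character never occurs in the source text. The class name `TwoBufferReader`, the tiny buffer size, and the helper `read_all` are illustrative choices, and the sketch only delivers characters one at a time rather than tracking bp.

```python
import io

EOF = "\0"  # sentinel; assumed not to appear in the source program

class TwoBufferReader:
    def __init__(self, stream, size=8):
        self.stream = stream
        self.size = size
        self.buffers = ["", ""]
        self.current = 0   # which buffer fp is in
        self.forward = 0   # position of fp inside that buffer
        self._fill(0)

    def _fill(self, which):
        # Load up to `size` characters and terminate them with the sentinel.
        self.buffers[which] = self.stream.read(self.size) + EOF

    def next_char(self):
        ch = self.buffers[self.current][self.forward]
        if ch == EOF:
            if self.forward < self.size:
                return None  # sentinel before the boundary: real end of input
            # Sentinel at the boundary: start filling and reading the other buffer.
            self.current = 1 - self.current
            self._fill(self.current)
            self.forward = 0
            return self.next_char()
        self.forward += 1
        return ch

def read_all(reader):
    chars = []
    ch = reader.next_char()
    while ch is not None:
        chars.append(ch)
        ch = reader.next_char()
    return "".join(chars)

reader = TwoBufferReader(io.StringIO("int total = 42;"), size=4)
print(read_all(reader))  # int total = 42;
```

The sentinel means the common case needs a single comparison per character: only when the sentinel is seen does the reader check whether it marks a buffer boundary or the end of the input.
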
  39. Symbol Table ▶ It is a structure held in memory that the compiler constructs for the program being compiled. The symbol table is implemented as a hash table.
  40. Symbol table ▶ An essential function of a compiler is to record the variable names used in the source program and collect information about the various attributes of each name. ▶ A symbol table is a data structure containing all the identifiers (names of variables, procedures, etc.) of a source program together with all the attributes of each identifier. ▶ The symbol table stores information about the entire source program and is used by all phases of the compiler.
  41. ▶ The analysis part also collects information about the source program and stores it in a data structure called a symbol table, which is passed along with the intermediate representation to the synthesis part. ▶ The symbol table is an important data structure created and maintained by compilers in order to store information about the occurrence of various entities such as variable names, function names, objects, classes, interfaces, etc.
  42. Purpose of Symbol Table ▶ To store the names of all entities in a structured form in one place. ▶ To provide quick and uniform access to identifier attributes throughout the compilation process. ▶ To verify whether a variable has been declared. ▶ To implement type checking, by verifying that assignments and expressions in the source code are semantically correct. ▶ To determine the scope of a name (scope resolution).
  43. Interaction between the symbol table and the phases of a compiler ▶ Virtually every phase of the compiler uses the symbol table. ▶ The initialization phase places keywords, operators, and standard identifiers in it. ▶ The scanner places user-defined identifiers and literals in it and returns the corresponding token. ▶ The parser uses these tokens to create the parse tree, the product of the syntactic analysis of the program. ▶ The semantic action routines place data types in its entries and use this information in performing basic type checking. ▶ The intermediate code generation phase uses pointers to entries in the symbol table in creating the intermediate representation of the program. ▶ The object code generation phase uses pointers to entries in the symbol table in allocating storage for its variables and constants, as well as to store the addresses of its procedures and functions.
  45. A symbol table may serve the following purposes ▶ The symbol table is used in both analysis and synthesis ▶ Information is stored in the symbol table by the analysis phases ▶ The scanner enters an identifier into the symbol table ▶ The parser and semantic analyzer enter the corresponding attributes ▶ It helps to determine the scope of an identifier
  46. Continued… ▶ It helps to check whether a variable is already defined or not ▶ To add a new name to the table ▶ To delete a name from the table ▶ To access the information associated with a given name ▶ Type checking for semantic correctness ▶ For procedures/functions, it also stores the number of arguments and the types of the arguments
  47. Why we need a symbol table ▶ Type checking ▶ Verifying the declaration of a variable (a variable must be declared before its use) ▶ Example: ▶ int b; ▶ b=2+3; ▶ Sum=b+2;
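
The declared-before-use check on the slide's example can be sketched like this. The sketch is deliberately naive: it assumes one statement per list element and treats only `int` as a declaration keyword; the function name `undeclared_uses` is hypothetical.

```python
import re

def undeclared_uses(statements):
    """Return (line_number, name) for each identifier used before being declared."""
    declared = set()
    errors = []
    for line_no, stmt in enumerate(statements, start=1):
        names = re.findall(r"[A-Za-z_]\w*", stmt)
        if names and names[0] == "int":
            declared.update(names[1:])   # a declaration adds names to the table
            continue
        for name in names:
            if name not in declared:     # lookup failed: the name was never declared
                errors.append((line_no, name))
    return errors

print(undeclared_uses(["int b;", "b=2+3;", "Sum=b+2;"]))  # [(3, 'Sum')]
```

In the slide's example, `b` passes the check because `int b;` comes first, while `Sum` is flagged on the third statement.
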
  48. Operations of the Symbol Table ▶ Insert(name, type) ▶ int b; ▶ Insert(b, int) ▶ Lookup(name) ▶ It checks for the name in the symbol table ▶ If found, it returns the attributes of the identifier ▶ If not found, it returns 0 ▶ Delete operation ▶ Modify operation
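
These four operations map naturally onto a hash table. The sketch below uses a Python dict for that purpose; the class and method names are assumptions, and `lookup` returns `None` where the slide says 0.

```python
class SymbolTable:
    def __init__(self):
        self._entries = {}  # a dict is a hash table keyed by name

    def insert(self, name, type_):
        if name in self._entries:
            raise KeyError(f"{name!r} is already declared")
        self._entries[name] = {"type": type_}

    def lookup(self, name):
        # Returns the attributes if found, otherwise None.
        return self._entries.get(name)

    def modify(self, name, **attributes):
        self._entries[name].update(attributes)

    def delete(self, name):
        del self._entries[name]

table = SymbolTable()
table.insert("b", "int")   # corresponds to: int b;
print(table.lookup("b"))   # {'type': 'int'}
print(table.lookup("x"))   # None
```
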
  49. Data structures used for the symbol table ▶ Linear structures (sorted / unsorted) ▶ Binary trees ▶ Hash table ▶ The linear approach is the simplest, as it stores variables in order of arrival
  50. Symbol table representation ▶ Declarations: int calculate; int sum; int a,b;
      Fixed length (each name occupies a fixed-size field):
        Name       Type
        calculate  int
        sum        int
        a          int
        b          int
      Variable length (names stored in one character array, each terminated by $):
        Characters:  c a l c u l a t e $  s  u  m  $  a  $  b  $
        Index:       0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
        Starting Index  Length  Type
        0               10      int
        10              4       int
        14              2       int
        16              2       int
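
The variable-length layout can be reproduced with a few lines of code. The function names `pack_names` and `name_at` are assumptions; the `$` terminator and the (starting index, length, type) columns follow the slide.

```python
def pack_names(declarations):
    """Store all names in one string and keep (start, length, type) per entry."""
    pool = ""
    table = []
    for name, type_ in declarations:
        entry = name + "$"                       # each name ends with the $ terminator
        table.append((len(pool), len(entry), type_))
        pool += entry
    return pool, table

def name_at(pool, start, length):
    """Recover a name from the pool, dropping the trailing $."""
    return pool[start:start + length - 1]

pool, table = pack_names([("calculate", "int"), ("sum", "int"), ("a", "int"), ("b", "int")])
print(pool)   # calculate$sum$a$b$
print(table)  # [(0, 10, 'int'), (10, 4, 'int'), (14, 2, 'int'), (16, 2, 'int')]
```

Compared with fixed-length fields, this layout wastes no space on short names such as `a`, at the cost of an extra indirection through the starting index.
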
  51. Example
      int var1;
      int procA()
      {
          int var2, var3;
          {
              int var4, var5;
          }
          int var6;
          {
              int var7, var8;
          }
      }
      int procB()
      {
          int var9, var10;
          {
              int var11, var12;
          }
          int var13;
      }
  52. Symbol table (Global)
        Name   Kind       Type
        var1   var        int
        procA  procedure  int
        procB  procedure  int
      Symbol table procA
        Name   Kind  Type
        var2   var   int
        var3   var   int
        var6   var   int
      Symbol table procB
        Name   Kind  Type
        var9   var   int
        var10  var   int
        var13  var   int
  53. Symbol table inner scope 1
        Name   Kind  Type
        var4   var   int
        var5   var   int
      Symbol table inner scope 2
        Name   Kind  Type
        var7   var   int
        var8   var   int
      Symbol table inner scope 3
        Name   Kind  Type
        var11  var   int
        var12  var   int
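
One common way to connect such per-scope tables is to give each table a link to its enclosing scope and to search outward on lookup. The sketch below assumes that design; the class name `Scope` and its methods are hypothetical, and only part of the example is populated.

```python
class Scope:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent   # enclosing scope, or None for the global scope
        self.symbols = {}

    def declare(self, name, kind, type_):
        self.symbols[name] = (kind, type_)

    def resolve(self, name):
        """Search this scope, then each enclosing scope in turn."""
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.name, scope.symbols[name]
            scope = scope.parent
        return None

global_scope = Scope("global")
global_scope.declare("var1", "var", "int")
global_scope.declare("procA", "procedure", "int")
global_scope.declare("procB", "procedure", "int")

proc_a = Scope("procA", parent=global_scope)
for name in ("var2", "var3", "var6"):
    proc_a.declare(name, "var", "int")

inner1 = Scope("inner scope 1", parent=proc_a)
for name in ("var4", "var5"):
    inner1.declare(name, "var", "int")

print(inner1.resolve("var4"))  # found in inner scope 1
print(inner1.resolve("var2"))  # found in procA
print(inner1.resolve("var9"))  # None: var9 belongs to procB
```
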
  54. Q & A