Unit iv-syntax-directed-translation

APOLLO ENGINEERING COLLEGE CHAPTER – 4 SDT &RTE
CS6660- COMPILER DESIGN 4.1 M.MANORANJITHAM / AP/IT
UNIT IV
SYNTAX DIRECTED TRANSLATION & RUN TIME ENVIRONMENT
Syllabus:
Syntax-Directed Definitions - Construction of Syntax Trees - Bottom-up Evaluation of S-Attributed
Definitions - Design of a Predictive Translator - Type Systems - Specification of a
Simple Type Checker - Equivalence of Type Expressions - Type Conversions. RUN-TIME
ENVIRONMENT: Source Language Issues - Storage Organization - Storage Allocation -
Parameter Passing - Symbol Tables - Dynamic Storage Allocation - Storage Allocation in
FORTRAN.
4.1 - Introduction
Intermediate codes are machine independent codes, but they are close to machine
instructions. The given program in a source language is converted to an equivalent program
in an intermediate language by the intermediate code generator.
Advantages of using a machine-independent intermediate form are as follows:
1. Retargeting is facilitated; a compiler for a different machine can be created by
attaching a back end for the new machine to an existing front end.
2. A machine-independent code optimizer can be applied to the intermediate
representation.
For intermediate code generation, we use a notational framework that is an extension
of context-free grammars (CFGs), called a syntax-directed translation scheme. It allows
subroutines, or semantic actions, to be attached to the productions of a grammar; these actions
generate intermediate code when they are called at appropriate times by a parser for that grammar.
There are two notations for associating semantic rules with productions:
(i) Syntax-directed definition: Syntax-directed definitions are high-level
specifications for translations. They hide many implementation details and the
user need not specify the order in which translation takes place.
(ii) Translation schemes: Translation schemes indicate the order in which semantic
rules are to be evaluated, so they allow some implementation details to be shown.

In both syntax-directed definitions and translation schemes, we parse the input token
stream, build the parse tree, and then traverse the tree as needed to evaluate the semantic rules
at the parse-tree nodes.
Evaluation of the semantic rules may generate code, save information in a symbol
table, issue error messages, or perform any other activities. The translation of the token
stream is the result obtained by evaluating the semantic rules.
4.2 – Syntax directed Definitions
A syntax-directed definition (SDD) is a context-free grammar together with
attributes and rules. Attributes are associated with grammar symbols and rules are associated
with productions.
If we think of a node for a grammar symbol in a parse tree as a record with fields for
holding information, then an attribute corresponds to the name of a field.
An attribute can represent anything we choose: a string, a type, a memory
location or whatever. The value of an attribute at a parse-tree node is defined by a semantic
rule associated with the production used at that node.
There are two kinds of attributes for nonterminals: synthesized attribute and
inherited attribute.
Semantic rules set up dependencies between attributes that will be represented by a
graph. From the dependency graph, we derive an evaluation order for the semantic rules.
Evaluation of the semantic rules defines the values of the attributes at the nodes in the
parse tree for the input string.
A semantic rule may also have side effects, e.g., printing a value or updating a global
variable.

A parse tree showing the values of attributes at each node is called an annotated parse
tree. The process of computing the attribute values at the nodes is called annotating or
decorating the parse tree.
Example: The SDD in Fig. 5.1 is based on our familiar grammar for arithmetic
expressions with operators + and *. It evaluates expressions terminated by an endmarker n. In
the SDD, each of the nonterminals has a single synthesized attribute, called val. We also
suppose that the terminal digit has a synthesized attribute lexval, which is an integer value
returned by the lexical analyzer.
The rule for production 1, L → E n, is a procedure that prints as output the value of the
arithmetic expression generated by E. Production 2, E → E1 + T, also has one rule, which
computes the val attribute for the head E as the sum of the values at E1 and T. At any
parse-tree node N labeled E, the value of val for E is the sum of the values of val at the
children of node N labeled E and T. Production 3, E → T, has a single rule that defines the
value of val for E to be the same as the value of val at the child for T. Production 4,
T → T1 * F, is similar to the second production; its rule multiplies the values at the children
instead of adding them. The rules for productions 5 and 6 copy values at a child, like that for
the third production. Production 7 gives F.val the value of a digit, that is, the numerical value
of the token digit that the lexical analyzer returned.
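The rules above can be sketched in Python as a bottom-up (postorder) walk over a hand-built parse tree. This is a minimal sketch, not the notes' implementation: the Node class, the annotate function, and the reading of productions 5 and 6 as T → F and F → ( E ) are assumptions based on the usual expression grammar. Terminal tokens such as +, * and n are left out of the children lists, and each digit leaf is folded into its F node.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Parse-tree node: the production applied here plus its nonterminal children."""
    prod: int                                   # production number 1..7
    children: List["Node"] = field(default_factory=list)
    lexval: Optional[int] = None                # digit.lexval, only for production 7
    val: Optional[int] = None                   # synthesized attribute val


def annotate(n: Node) -> int:
    """Compute val at every node, children before parents (S-attributed)."""
    for c in n.children:
        annotate(c)
    k = [c.val for c in n.children]             # vals of the nonterminal children
    if n.prod == 1:                             # L -> E n      { print(E.val) }
        n.val = k[0]                            # also kept on the node for inspection
        print(n.val)
    elif n.prod == 2:                           # E -> E1 + T   { E.val = E1.val + T.val }
        n.val = k[0] + k[1]
    elif n.prod == 4:                           # T -> T1 * F   { T.val = T1.val * F.val }
        n.val = k[0] * k[1]
    elif n.prod in (3, 5, 6):                   # E -> T, T -> F, F -> ( E ): copy
        n.val = k[0]
    elif n.prod == 7:                           # F -> digit    { F.val = digit.lexval }
        n.val = n.lexval
    else:
        raise ValueError(f"unknown production {n.prod}")
    return n.val


def digit(v: int) -> Node:
    """F node for production 7 carrying the digit's lexval."""
    return Node(7, lexval=v)


# Parse tree for 3*5+4 n
tree = Node(1, [Node(2, [Node(3, [Node(4, [Node(5, [digit(3)]), digit(5)])]),
                         Node(5, [digit(4)])])])
annotate(tree)                                  # prints 19
```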
1. A synthesized attribute for a nonterminal A at a parse-tree node N is defined by a
semantic rule associated with the production at N.
Note that the production must have A as its head. A synthesized attribute at node N is
defined only in terms of attribute values at the children of N and at N itself.
A syntax-directed definition that uses synthesized attributes exclusively is said to be
an S-attributed definition. A parse tree for an S-attributed definition can always be
annotated by evaluating the semantic rules for the attributes at each node bottom-up, from
the leaves to the root.
For example, given the expression 3*5+4 followed by a newline, the program prints
the value 19.
2. An inherited attribute for a nonterminal B at a parse-tree node N is defined by a
semantic rule associated with the production at the parent of N.
Note that the production must have B as a symbol in its body. An inherited attribute at
node N is defined only in terms of attribute values at N's parent, N itself, and N's
siblings.
For example, consider a declaration generated by the nonterminal D in the syntax-directed
definition given below.
The grammar generates the keyword int or real, followed by a list of identifiers. The
nonterminal T has a synthesized attribute type, whose value is determined by the keyword in
the declaration. The semantic rule L.in := T.type, associated with production D → T L, sets the
inherited attribute L.in to the type in the declaration. The rules then pass this type down the
parse tree using the inherited attribute L.in. Rules associated with the productions for L call
the procedure addtype to add the type of each identifier to its entry in the symbol table.
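A minimal Python sketch of how the inherited attribute L.in carries the declared type down the identifier list. It assumes a dictionary as the symbol table and a nested-tuple encoding of L; the names symtab, T, L and D are hypothetical, and addtype here simply stores into the dictionary.

```python
symtab = {}   # hypothetical symbol table: identifier name -> type


def addtype(entry, typ):
    """Stand-in for addtype: record the type of an identifier."""
    symtab[entry] = typ


def T(keyword):
    """T -> int | real : synthesized attribute T.type."""
    return {"int": "integer", "real": "real"}[keyword]


def L(node, inh):
    """Visit an L node; inh is the inherited attribute L.in."""
    if node[0] == "list":          # L -> L1 , id
        _, l1, name = node
        L(l1, inh)                 # L1.in := L.in
        addtype(name, inh)         # addtype(id.entry, L.in)
    else:                          # L -> id
        addtype(node[1], inh)      # addtype(id.entry, L.in)


def D(keyword, lnode):
    """D -> T L : L.in := T.type."""
    L(lnode, T(keyword))


# real id1, id2, id3  (L is left-recursive, so id3 hangs off the top L node)
D("real", ("list", ("list", ("id", "id1"), "id2"), "id3"))
print(symtab)
```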
Dependency Graphs:
If an attribute b at a node in a parse tree depends on an attribute c, then the
semantic rule for b at that node must be evaluated after the semantic rule that defines c. The
interdependencies among the inherited and synthesized attributes at the nodes in a parse tree
can be depicted by a directed graph called a dependency graph.
Before constructing a dependency graph for a parse tree, we put each semantic rule
into the form b:=f(c1, c2,...,ck), by introducing a dummy synthesized attribute b for each
semantic rule that consists of a procedure call. The graph has a node for each attribute and an
edge to the node for b from the node for c if attribute b depends on attribute c. In more detail,
the dependency graph for a given parse tree is constructed as follows.
for each node n in the parse tree do
    for each attribute a of the grammar symbol at n do
        construct a node in the dependency graph for a;
for each node n in the parse tree do
    for each semantic rule b := f(c1, c2, ..., ck)
            associated with the production used at n do
        for i := 1 to k do
            construct an edge from the node for ci to the node for b;
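The two loops above translate almost directly into Python. In this sketch the function name and the encoding of rules as (b, [c1, ..., ck]) pairs are assumptions; every attribute instance is a string, and a rule that is only a procedure call, such as addtype, would be given a dummy attribute as its b, as the text describes.

```python
def build_dependency_graph(attributes, rules):
    """attributes: names of all attribute instances in the parse tree.
    rules: (b, [c1, ..., ck]) pairs, one per semantic rule b := f(c1, ..., ck).
    Returns a dict mapping each node to the list of nodes its edges point to."""
    graph = {a: [] for a in attributes}         # first loop: one node per attribute
    for b, args in rules:                       # second loop: an edge ci -> b
        for c in args:
            graph[c].append(b)
    return graph


# E -> E1 + E2 with E.val := E1.val + E2.val
g = build_dependency_graph(
    ["E1.val", "E2.val", "E.val"],
    [("E.val", ["E1.val", "E2.val"])],
)
print(g)
```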

Example: Consider the following production and rule:
PRODUCTION SEMANTIC RULE
E → E1 + E2 E.val := E1.val + E2.val
We add the edges as shown below
The three nodes of the dependency graph marked by ● represent the synthesized
attributes E.val, E1.val and E2.val. The edge to E.val from E1.val shows that E.val depends on
E1.val, and the edge to E.val from E2.val shows that E.val also depends on E2.val. The dotted
lines represent the parse tree and are not part of the dependency graph.
Evaluation Order:
A topological sort of a directed acyclic graph is any ordering m1,m2,...,mk of the
nodes of the graph such that edges go from nodes earlier in the ordering to later nodes; that is,
if mi → mj is an edge from mi to mj, then mi appears before mj in the ordering.
Any topological sort of a dependency graph gives a valid order in which the semantic
rules associated with the nodes in a parse tree can be evaluated. That is, in the topological sort,
the attributes c1, c2, ..., ck on which a semantic rule b := f(c1, c2, ..., ck) depends are
available at a node before f is evaluated.
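One common way to compute a topological sort is Kahn's algorithm, which repeatedly emits a node that has no remaining incoming edges. The notes do not prescribe a particular algorithm; this Python sketch (the function name is an assumption) takes the graph as a dict from each node to its successors and reports a cycle, the one case in which no evaluation order exists.

```python
from collections import deque


def topological_order(graph):
    """Kahn's algorithm. graph maps every node to the list of nodes it has edges to.
    Returns an order in which each edge goes from an earlier to a later node."""
    indegree = {n: 0 for n in graph}
    for succs in graph.values():
        for m in succs:
            indegree[m] += 1
    ready = deque(n for n in graph if indegree[n] == 0)   # nothing to wait for
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in graph[n]:            # n is evaluated, so m has one fewer input
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    if len(order) != len(graph):
        raise ValueError("dependency graph has a cycle")
    return order


print(topological_order({"E1.val": ["E.val"], "E2.val": ["E.val"], "E.val": []}))
```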

Example: Each of the edges in the dependency graph in Fig. 5.7 goes from a
lower-numbered node to a higher-numbered node. Hence, a topological sort of the dependency
graph is obtained by writing down the nodes in the order of their numbers. From this
topological sort, we obtain the following program. We write an (for example, a4) for the
attribute associated with the node numbered n in the dependency graph.
a4 := real;
a5 := a4;
addtype(id3.entry, a5);
a7 := a5;
addtype(id2.entry, a7);
a9 := a7;
addtype(id1.entry, a9);
Evaluating these semantic rules stores the type real in the symbol-table entry for each
identifier.
Several methods have been proposed for evaluating semantic rules:
1. Parse-Tree Methods: At compile time, these methods obtain an evaluation order
from a topological sort of the dependency graph constructed from the parse tree for
each input. These methods will fail to find an evaluation order only if the dependency
graph for the particular parse tree under consideration has a cycle.
2. Rule-Based Methods: At compiler-construction time, the semantic rules associated
with productions are analyzed, either by hand or by a specialized tool. For each
production, the order in which the attributes associated with that production are
evaluated is predetermined at compiler-construction time.
3. Oblivious Methods: An evaluation order is chosen without considering the semantic
rules. For example, if translation takes place during parsing, then the order of
evaluation is forced by the parsing method, independent of the semantic rules. An
oblivious evaluation order restricts the class of syntax-directed definitions that can be
implemented.
Rule-based and oblivious methods need not explicitly construct the dependency graph
at compile time, so they can be more efficient in their use of compile time and space.

4.3 – Construction of Syntax Trees
The use of syntax trees as an intermediate representation allows translation to be
decoupled from parsing. Translation routines that are invoked during parsing must live with
two kinds of restrictions:
1. A grammar that is suitable for parsing may not reflect the natural hierarchical
structure of the constructs in the language.
2. The parsing method constrains the order in which nodes in a parse tree are
considered. This order may not match the order in which information about a
construct becomes available.
Syntax Tree:
An (abstract) syntax tree is a condensed form of parse tree useful for representing
language constructs. In a syntax tree, operators and keywords do not appear as leaves, but
rather are associated with the interior node that would be the parent of those leaves in the
parse tree.
Constructing Syntax Trees for Expressions:
The construction of a syntax tree for an expression is similar to the translation of the
expression into postfix form. We construct subtrees for the subexpressions by creating a node
for each operator and operand. The children of an operator node are the roots of the subtrees
representing the subexpressions that constitute the operands of that operator.
Each node in a syntax tree can be implemented as a record with several fields. In the
node for an operator, one field identifies the operator and the remaining fields contain
pointers to the nodes for the operands. The operator is often called the label of the node.
When used for translation, the nodes in a syntax tree may have additional fields to hold the
values of attributes attached to the node.
The following functions are used to create the nodes of syntax trees for expressions with
binary operators. Each function returns a pointer to a newly created node.
1. mknode(op, left, right): Creates an operator node with label op and two fields
containing pointers to left and right.
2. mkleaf(id, entry): Creates an identifier node with label id and a field containing
entry, a pointer to the symbol-table entry for the identifier.
3. mkleaf(num, val): Creates a number node with label num and a field containing
val, the value of the number.
Example: Consider the expression a - 4 + c. The following sequence of function calls
creates a syntax tree for it. In this sequence, p1, p2, ..., p5 are pointers to nodes, and entry-a
and entry-c are pointers to the symbol-table entries for identifiers a and c, respectively.
(1) p1 = mkleaf(id, entry-a);
(2) p2 = mkleaf(num, 4);
(3) p3 = mknode('-', p1, p2);
(4) p4 = mkleaf(id, entry-c);
(5) p5 = mknode('+', p3, p4);
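The same five calls can be run as a small Python sketch. The SyntaxNode class, the use of plain dictionaries as symbol-table entries, and the show helper are assumptions made for illustration; Python object references play the role of the pointers p1 to p5.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SyntaxNode:
    label: str                            # operator such as '+', or 'id' / 'num'
    left: Optional["SyntaxNode"] = None   # operand pointers (operator nodes only)
    right: Optional["SyntaxNode"] = None
    value: Any = None                     # symbol-table entry (id) or number (num)


def mknode(op, left, right):
    """Operator node labeled op with pointers to its two operand nodes."""
    return SyntaxNode(op, left, right)


def mkleaf(kind, value):
    """Leaf labeled kind: 'id' with a symbol-table entry, or 'num' with a value."""
    return SyntaxNode(kind, value=value)


# Symbol-table entries, modeled here as plain dictionaries (an assumption)
entry_a = {"name": "a"}
entry_c = {"name": "c"}

p1 = mkleaf("id", entry_a)
p2 = mkleaf("num", 4)
p3 = mknode("-", p1, p2)
p4 = mkleaf("id", entry_c)
p5 = mknode("+", p3, p4)


def show(n):
    """Fully parenthesized infix rendering, used only to check the tree's shape."""
    if n.label == "id":
        return n.value["name"]
    if n.label == "num":
        return str(n.value)
    return f"({show(n.left)} {n.label} {show(n.right)})"


print(show(p5))   # ((a - 4) + c)
```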
A Syntax-Directed Definition for Constructing Syntax Trees:
Fig. 5.9 contains an S-attributed definition for constructing a syntax tree for an
expression containing the operators + and -. It uses the underlying productions of the
grammar to schedule the calls of the functions mknode and mkleaf to construct the tree. The
synthesized attribute nptr for E and T keeps track of the pointers returned by the function
calls.
An annotated parse tree depicting the construction of a syntax tree for the expression
a - 4 + c is shown below.

Directed Acyclic Graphs for Expressions:
A directed acyclic graph (dag) for an expression identifies the common subexpressions in
the expression. Like a syntax tree, a dag has a node for every subexpression of the
expression; an interior node represents an operator and its children represent its operands. The
difference is that a node in a dag representing a common subexpression has more than one
“parent”; in a syntax tree, the common subexpression would be represented as a duplicated
subtree.
The leaf for a has two parents because a is common to the two subexpressions a and
a*(b-c).
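One way to build a dag instead of a tree is to have the node constructors look up an existing node with the same label and children before creating a new one (often called hash-consing). The Python sketch below illustrates that approach under stated assumptions: the DagBuilder class and its leaf and node methods are hypothetical stand-ins for mkleaf and mknode, and each node is identified by its index in a list.

```python
class DagBuilder:
    """Creates dag nodes, reusing an existing node whenever the same
    (label, left, right) combination has been built before."""

    def __init__(self):
        self.nodes = []      # node number -> (label, left, right)
        self.index = {}      # (label, left, right) -> node number

    def _get(self, key):
        if key not in self.index:            # first occurrence: make a new node
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]               # later occurrences: share it

    def leaf(self, name):
        return self._get((name, None, None))

    def node(self, op, left, right):
        return self._get((op, left, right))


# a + a * (b - c): the leaf for a is built once and ends up with two parents
d = DagBuilder()
a = d.leaf("a")
b_minus_c = d.node("-", d.leaf("b"), d.leaf("c"))
root = d.node("+", a, d.node("*", d.leaf("a"), b_minus_c))
print(len(d.nodes))   # 6
```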

4.4 – Bottom-Up Evaluation of S-Attributed Definitions
A translator for an arbitrary syntax-directed definition can be difficult to build.
However, there are large classes of useful syntax-directed definitions for which it is easy to
construct translators. One such class is the S-attributed definitions, that is, the syntax-directed
definitions with only synthesized attributes.
Synthesized attributes can be evaluated by a bottom-up parser as the input is being
parsed. The parser can keep the values of the synthesized attributes associated with the
grammar symbols on its stack. Whenever a reduction is made, the values of the new
synthesized attributes are computed from the attributes appearing on the stack for the
grammar symbols on the right side of the reducing production.
Synthesized Attributes On The Parser Stack:
A translator for an S-attributed definition can often be implemented with the help of
an LR-parser generator.
A bottom-up parser uses a stack to hold information about subtrees that have been parsed.
We can use extra fields in the parser stack to hold the values of synthesized attributes.
Let us suppose, as in the figure, that the stack is implemented by a pair of arrays state
and val. Each state entry is a pointer to an LR(1) parsing table. It is convenient, however, to
refer to the state by the unique grammar symbol that it covers when placed on the parsing
stack. If the ith state symbol is A, then val[i] will hold the value of the attribute associated
with the parse-tree node corresponding to this A.
The current top of the stack is indicated by the pointer top. We assume that
synthesized attributes are evaluated just before each reduction. Suppose the semantic rule
A.a := f(X.x, Y.y, Z.z) is associated with the production A → XYZ. Before XYZ is reduced to
A, the value of the attribute Z.z is in val[top], that of Y.y in val[top-1], and that of X.x in
val[top-2]. If a symbol has no attribute, then the corresponding entry in the val array is
undefined. After the reduction, top is decremented by 2, the state covering A is put in
state[top], and the value of the synthesized attribute A.a is put in val[top].
Example:
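As a hedged illustration (not the notes' own example), the Python sketch below mimics the val array for the input 3*5+4 with the productions E → E + T | T, T → T * F | F and F → digit. The shift/reduce sequence is written out by hand rather than driven by an LR table, and the helpers shift and reduce_by are hypothetical; a Python list stands in for val, so top is its last index. Reducing by a three-symbol body removes three entries and pushes one, which is the "decrement top by 2" step described above.

```python
val = []                     # stands in for the val array; top == len(val) - 1


def shift(value=None):
    """Push a token's attribute value (None for tokens without one)."""
    val.append(value)


def reduce_by(body_len, rule):
    """Reduce by A -> X1 ... Xk (k = body_len): pass val[top-k+1 .. top] to rule,
    pop those k entries, and push A's synthesized attribute in their place."""
    top = len(val) - 1
    args = val[top - body_len + 1:]
    del val[top - body_len + 1:]
    val.append(rule(*args))


# Moves for 3*5+4 under E -> E + T | T, T -> T * F | F, F -> digit
shift(3)
reduce_by(1, lambda d: d)                # F -> digit
reduce_by(1, lambda f: f)                # T -> F
shift()                                  # shift *
shift(5)
reduce_by(1, lambda d: d)                # F -> digit
reduce_by(3, lambda t, _, f: t * f)      # T -> T * F
reduce_by(1, lambda t: t)                # E -> T
shift()                                  # shift +
shift(4)
reduce_by(1, lambda d: d)                # F -> digit
reduce_by(1, lambda f: f)                # T -> F
reduce_by(3, lambda e, _, t: e + t)      # E -> E + T
print(val)                               # [19]
```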

4.5 – L-Attributed Definitions
When translation takes place during parsing, the order of evaluation of attributes is
linked to the order in which nodes of a parse tree are “created” by the parsing method. A
natural order that characterizes many top-down and bottom-up translation methods is the one
obtained by applying the procedure dfvisit to the root of a parse tree. We call this evaluation
order the depth-first order. Even if the parse tree is not actually constructed, it is useful to
study translation during parsing by considering depth-first evaluation of attributes at the
nodes of a parse tree.
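The notes name dfvisit without showing its body. A common formulation visits the children of a node from left to right, computing each child's inherited attributes just before visiting it and the node's synthesized attributes after all its children are done. The Python sketch below follows that formulation with a made-up pair of attributes: an inherited start (the offset at which a subtree's text begins) and a synthesized size (its length). The Node class and both attributes are hypothetical.

```python
class Node:
    """Parse-tree node for the illustration: leaves carry a piece of text."""

    def __init__(self, text="", children=()):
        self.text = text
        self.children = list(children)
        self.start = None    # inherited: offset where this subtree's text begins
        self.size = None     # synthesized: length of this subtree's text


def dfvisit(n):
    """Depth-first, left-to-right evaluation of an L-attributed definition."""
    offset = n.start
    for m in n.children:
        m.start = offset     # inherited: uses n.start and left siblings only
        dfvisit(m)
        offset += m.size     # m.size is known once m has been visited
    # synthesized attribute of n, computed after all its children
    n.size = offset - n.start if n.children else len(n.text)


ab, c, d = Node("ab"), Node("c"), Node("d")
root = Node(children=[Node(children=[ab, c]), d])
root.start = 0               # the root's inherited attribute is given
dfvisit(root)
print(ab.start, c.start, d.start, root.size)   # 0 2 3 4
```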
We now introduce a class of syntax-directed definitions, called L-attributed
definitions, whose attributes can always be evaluated in depth-first order. (The L is for
“left,” because attribute information appears to flow from left to right.)
L-Attributed Definition:
A Syntax-directed definition is L-attributed if each inherited attribute of Xj, 1 ≤ j ≤ n,
on the right side of A→X1X2....Xn, depends only on
1. The attributes of the symbols X1, X2,.....Xj-1 to the left of Xj in the production and
2. The inherited attributes of A.
Note that every S-attributed definition is L-attributed, because the restrictions (1) and
(2) apply only to inherited attributes.
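The two conditions can be checked mechanically for each production. In this Python sketch the encoding is an assumption: every attribute is a (position, name, kind) triple, with position 0 for the head A and 1..n for X1..Xn, and kind either "inh" or "syn". The rule A → L M with L.i := M.s is a hypothetical counterexample.

```python
def is_l_attributed(rules):
    """Check one production A -> X1 X2 ... Xn for the L-attributed conditions.

    rules: list of (target, deps) pairs; each attribute is a triple
    (position, name, kind) with position 0 = A, 1..n = X1..Xn.
    """
    for (j, _, kind), deps in rules:
        if kind != "inh" or j == 0:
            continue                     # conditions constrain inherited attrs of Xj only
        for i, _, dep_kind in deps:
            left_of_xj = 1 <= i < j                          # condition 1
            inherited_of_a = i == 0 and dep_kind == "inh"    # condition 2
            if not (left_of_xj or inherited_of_a):
                return False
    return True


# D -> T L with L.in := T.type: L.in uses T, which is to its left
decl = [((2, "in", "inh"), [(1, "type", "syn")])]
# Hypothetical A -> L M with L.i := M.s: L.i uses M, which is to its right
bad = [((1, "i", "inh"), [(2, "s", "syn")])]
print(is_l_attributed(decl), is_l_attributed(bad))   # True False
```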

Translation Schemes:
A translation scheme is a context-free grammar in which attributes are associated with
the grammar symbols and semantic actions enclosed between braces { } are inserted within
the right side of productions.

4.6 – Top-Down Translation
In this section, L-attributed definitions will be implemented during predictive parsing.
We work with translation schemes rather than syntax-directed definitions so that we can be
explicit about the order in which actions and attribute evaluations take place. We also extend
the algorithm for eliminating left recursion to translation schemes with synthesized attributes.
Eliminating Left Recursion from a Translation Scheme:
Since most arithmetic operators associate to the left, it is natural to use left-recursive
grammars for expressions.

Design of a Predictive Translator:
The next algorithm generalizes the construction of predictive parsers to implement a
translation scheme based on a grammar suitable for top-down parsing.
Algorithm: Construction of a predictive syntax-directed translator
Input: A syntax-directed translation scheme with an underlying grammar suitable for
predictive parsing.
Output: Code for a syntax-directed translator.
Method: The technique is a modification of the predictive-parser construction.
1. For each nonterminal A, construct a function that has a formal parameter for each
inherited attribute of A and that returns the values of the synthesized attributes of A.
For simplicity, we assume that each nonterminal has just one synthesized attribute.
The function for A has a local variable for each attribute of each grammar symbol that
appears in a production for A.
2. The code for nonterminal A decides what production to use based on the current input
symbol.
3. The code associated with each production does the following. We consider the tokens,
nonterminals, and actions on the right side of the production from left to right.
(i) For token X with synthesized attribute x, save the value of x in the variable
declared for X.x. Then generate a call to match token X and advance the input.
(ii) For nonterminal B, generate an assignment c := B(b1, b2, ..., bk) with a function
call on the right side, where b1, b2, ..., bk are the variables for the inherited
attributes of B and c is the variable for the synthesized attribute of B.
(iii) For an action, copy the code into the parser, replacing each reference to an
attribute by the variable for that attribute.
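As a hedged example that ties this algorithm to the left-recursion discussion, suppose the left-recursive rules E → E1 + T | E1 - T | T have been rewritten in the standard way into E → T R and R → + T R | - T R | ε, with an inherited attribute R.i carrying the value computed so far. The transformed scheme in the docstring is an assumption, not taken from the notes. Following steps 1 to 3, each nonterminal becomes a Python method whose parameters are its inherited attributes and whose return value is its synthesized attribute; the Translator class and translate function are hypothetical.

```python
class Translator:
    """Predictive translator for digits joined by + and - (assumed scheme):

        E -> T { R.i := T.val } R { E.val := R.s }
        R -> + T { R1.i := R.i + T.val } R1 { R.s := R1.s }
        R -> - T { R1.i := R.i - T.val } R1 { R.s := R1.s }
        R -> epsilon { R.s := R.i }
        T -> digit { T.val := digit.lexval }
    """

    def __init__(self, text):
        self.tokens = [ch for ch in text if not ch.isspace()]
        self.pos = 0

    def lookahead(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected=None):
        """Consume the current token (checking it if expected is given)."""
        tok = self.lookahead()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"unexpected {tok!r} at position {self.pos}")
        self.pos += 1
        return tok

    def E(self):
        t_val = self.T()
        return self.R(t_val)             # R.i := T.val, then E.val := R.s

    def R(self, i):                      # parameter: R.i; return value: R.s
        op = self.lookahead()
        if op in ("+", "-"):             # choose the production by lookahead
            self.match(op)
            t_val = self.T()
            r1_i = i + t_val if op == "+" else i - t_val
            return self.R(r1_i)          # R.s := R1.s
        return i                         # R -> epsilon: R.s := R.i

    def T(self):
        tok = self.match()
        if not tok.isdigit():
            raise SyntaxError(f"expected a digit, got {tok!r}")
        return int(tok)                  # T.val := digit.lexval


def translate(text):
    tr = Translator(text)
    value = tr.E()
    if tr.lookahead() is not None:
        raise SyntaxError("trailing input")
    return value


print(translate("9-5+2"))   # 6, computed as (9-5)+2
```

Because R receives the running value as an inherited attribute, 9-5-2 evaluates to 2, so the left associativity of the original left-recursive grammar is preserved.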

4.7 – Type Checking
A compiler must check that the source program follows both the syntactic and
semantic conventions of the source language. This checking, called static checking, ensures
that certain kinds of programming errors will be detected and reported.
Examples of static checks include:
(i) Type checks. A compiler should report an error if an operator is applied to an
incompatible operand; for example, if an array variable and a function variable are
added together.
(ii) Flow-of-control checks. Statements that cause flow of control to leave a construct
must have some place to which to transfer the flow of control. For example, a
break statement in C causes control to leave the smallest enclosing while, for, or
switch statement; an error occurs if such an enclosing statement does not exist.
(iii) Uniqueness checks. There are situations in which an object must be defined
exactly once. For example, the labels in a case statement must be distinct.
(iv) Name-related checks. Sometimes, the same name must appear two or more
times. For example, in Ada, a loop or block may have a name that appears at the
beginning and end of the construct. The compiler must check that the same name
is used at both places.
4.7.1 TYPE SYSTEMS
The design of a type checker for a language is based on information about the
syntactic constructs in the language, the notion of types, and the rules for assigning types to
language constructs. The following excerpts from the Pascal report and the C reference
manual, respectively, are examples of information that a compiler writer might have to start
with:
• “If both operands of the arithmetic operators of addition, subtraction and
multiplication are of type integer, then the result is of type integer.”
• “The result of the unary & operator is a pointer to the object referred to by the
operand. If the type of the operand is '...', the type of the result is 'pointer to ...'.”
In both Pascal and C, types are either basic or constructed. Basic types are the atomic
types with no internal structure as far as the programmer is concerned. In Pascal, the basic
types are boolean, character, integer and real. Subrange types, like 1..10, and enumerated
types, like (violet, indigo, blue, green, yellow, orange, red), can be treated as basic types.
Pascal allows a programmer to construct types from basic types and other constructed types,
with arrays, records, and sets being examples. In addition, pointers and functions can also be
treated as constructed types.
Type Expression
The type of a language construct will be denoted by a "type expression." Informally, a
type expression is either a basic type or is formed by applying an operator called a type
constructor to other type expressions. The sets of basic types and constructors depend on the
language to be checked. We use the following definition of type expressions:
1. A basic type is a type expression. Among the basic types are boolean, char,
integer, and real. A special basic type, type_error, will signal an error during type checking.
Finally, a basic type void denoting "the absence of a value" allows statements to be checked.
2. Since type expressions may be named, a type name is a type expression. An
example of the use of type names appears in 3(c) below.
3. A type constructor applied to type expressions is a type expression. Constructors
include:
a) Arrays. If T is a type expression, then array(I, T) is a type expression denoting the
type of an array with elements of type T and index set I. I is often a range of integers. For
example, the Pascal declaration
var A: array [1..10] of integer;
associates the type expression array(1..10, integer) with A.
b) Products. If T1 and T2 are type expressions, then their Cartesian product T1 × T2 is
a type expression. We assume that × associates to the left.
c) Records. The difference between a record and a product is that the fields of a
record have names. The record type constructor will be applied to a tuple formed from field
names and field types. For example, the Pascal program fragment
type row = record
    address: integer;
    lexeme: array [1..15] of char
end;
var table: array [1..101] of row;
declares the type name row representing the type expression
record((address × integer) × (lexeme × array(1..15, char)))
and the variable table to be an array of records of this type.
d) Pointers. If T is a type expression, then pointer(T) is a type expression denoting
the type "pointer to an object of type T." For example, in Pascal, the declaration
var p: ↑row
declares variable p to have type pointer(row).
e) Functions. Mathematically, a function maps elements of one set, the domain, to
another set, the range. We may treat functions in programming languages as mapping a
domain type D to a range type R. The type of such a function will be denoted by the type
expression D → R. For example, the built-in function mod of Pascal has domain type
int × int, i.e., a pair of integers, and range type int. Thus, we say mod has the type
int × int → int.
4. Type expressions may contain variables whose values are type expressions.
A convenient way to represent a type expression is to use a graph. Using the syntax-
directed approach, we can construct a tree or a dag for a type expression, with interior nodes
for type constructors and leaves for basic types, type names, and type variables (Fig. 6.2).
Examples:
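To make the definition concrete, here is a small Python sketch that represents type expressions as trees, with one frozen dataclass per constructor and leaves for basic types. The class names are assumptions, records and type variables are omitted, and an array's index set is simplified to an integer range (lo, hi). Because frozen dataclasses compare field by field, == on these objects behaves like a structural comparison of the trees.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Basic:
    name: str                     # e.g. boolean, char, integer, real, type_error, void


@dataclass(frozen=True)
class Array:
    index: Tuple[int, int]        # index set I, simplified to a range (lo, hi)
    elem: object                  # element type T


@dataclass(frozen=True)
class Product:
    left: object
    right: object


@dataclass(frozen=True)
class Pointer:
    to: object


@dataclass(frozen=True)
class Function:
    domain: object
    result: object


def show(t):
    """Render a type expression in the notation used in the text (ASCII x and ->)."""
    if isinstance(t, Basic):
        return t.name
    if isinstance(t, Array):
        return f"array({t.index[0]}..{t.index[1]}, {show(t.elem)})"
    if isinstance(t, Product):
        return f"({show(t.left)} x {show(t.right)})"
    if isinstance(t, Pointer):
        return f"pointer({show(t.to)})"
    if isinstance(t, Function):
        return f"{show(t.domain)} -> {show(t.result)}"
    raise TypeError(f"not a type expression: {t!r}")


integer = Basic("integer")
print(show(Array((1, 10), integer)))                       # array(1..10, integer)
print(show(Function(Product(integer, integer), integer)))  # (integer x integer) -> integer
```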
Type Systems:
A type system is a collection of rules for assigning type expressions to the various
parts of a program. A type checker implements a type system.
(i) Static checking of types: Checking done by a compiler is said to be static.
(ii) Dynamic checking of types: Checking done when the target program runs is
termed dynamic.
A language is said to be strongly typed if its compiler can guarantee that the programs
it accepts will execute without type errors.

4.7.2 Specification of a simple Type Checker
The type checker is a translation scheme that synthesizes the type of each expression
from the types of its subexpressions. The type checker can handle arrays, pointers,
statements, and functions.
A Simple Language
The grammar in Fig. 6.3 generates programs, represented by the nonterminal P,
consisting of a sequence of declarations D followed by a single expression E.
The language has two basic types, char and integer; a third basic type, type_error, is used
to signal errors.
Type Checking of Expressions
In the following rules, the synthesized attribute type for E gives the type expression assigned
by the type system to the expression generated by E. The following semantic rules say that
constants represented by the tokens literal and num have type char and integer, respectively:
E → literal {E.type := char}
E → num {E.type := integer}

We use a function lookup(e) to fetch the type saved in the symbol-table entry pointed to by e.
When an identifier appears in an expression, its declared type is fetched and assigned to the
attribute type:
E → id { E.type := lookup (id.entry) }
The expression formed by applying the mod operator to two subexpressions of type integer
has type integer; otherwise, its type is type_error. The rule is
E → E1 mod E2 { E.type := if E1.type = integer and E2.type = integer
then integer else type_error }
In an array reference E1 [ E2 ], the index expression E2 must have type integer, in which case
the result is the element type t obtained from the type array(s, t) of E1; we make no use of the
index set s of the array.
E → E1 [ E2 ] { E.type := if E2.type = integer and E1.type = array(s,t)
then t else type_error }
Within expressions, the postfix operator ↑ yields the object pointed to by its operand. The
type of E ↑ is the type t of the object pointed to by the pointer E:
E → E1 ↑ { E.type := if E1.type = pointer(t)
then t else type_error }
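The four expression rules can be read as one recursive function that synthesizes E.type from the types of the subexpressions. In this minimal Python sketch, which is an illustration rather than the notes' implementation, expressions are nested tuples, basic types are strings, array(s, t) is encoded as ('array', s, t) and pointer(t) as ('pointer', t); the check function and the sample symbol table are hypothetical.

```python
TYPE_ERROR = "type_error"


def check(e, symtab):
    """Synthesize E.type for expression tree e; symtab maps id -> type."""
    kind = e[0]
    if kind == "literal":                        # E -> literal
        return "char"
    if kind == "num":                            # E -> num
        return "integer"
    if kind == "id":                             # E -> id: lookup(id.entry)
        return symtab[e[1]]
    if kind == "mod":                            # E -> E1 mod E2
        t1, t2 = check(e[1], symtab), check(e[2], symtab)
        return "integer" if t1 == t2 == "integer" else TYPE_ERROR
    if kind == "index":                          # E -> E1 [ E2 ]
        t1, t2 = check(e[1], symtab), check(e[2], symtab)
        if t2 == "integer" and isinstance(t1, tuple) and t1[0] == "array":
            return t1[2]                         # element type t of array(s, t)
        return TYPE_ERROR
    if kind == "deref":                          # E -> E1 ↑
        t1 = check(e[1], symtab)
        if isinstance(t1, tuple) and t1[0] == "pointer":
            return t1[1]
        return TYPE_ERROR
    raise ValueError(f"unknown expression {kind!r}")


symtab = {
    "a": ("array", (1, 10), "char"),
    "i": "integer",
    "p": ("pointer", "integer"),
}
print(check(("index", ("id", "a"), ("id", "i")), symtab))       # char
print(check(("mod", ("deref", ("id", "p")), ("num",)), symtab))  # integer
print(check(("index", ("id", "a"), ("literal",)), symtab))       # type_error
```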
Type Checking of Statements

The first rule checks that the left and right sides of an assignment statement have
the same type.
The second and third rules specify that the expressions in conditional and while statements
must have type boolean. A sequence of statements has type void only if each substatement
has type void.
Type Checking of Function
The application of a function to an argument can be captured by the production
E → E(E)
in which an expression is the application of one expression to another. The rules for
associating type expressions with nonterminal T can be augmented by the following
production and action to permit function types in declarations.
T → T1 '→' T2 { T.type := T1.type → T2.type }
Quotes around the arrow used as a function constructor distinguish it from the arrow used as
the metasymbol in a production. The rule for checking the type of a function application is
E → E1(E2) { E.type := if E2.type = s and E1.type = s → t
then t else type_error }
This rule says that in an expression formed by applying E1 to E2, the type of E1 must
be a function s → t from the type s of E2 to some range type t; the type of E1(E2) is t.
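A minimal Python sketch of the two function-type rules, assuming a tuple encoding in which s → t is written ('->', s, t) and a product s1 × s2 is written ('x', s1, s2); fun_type and apply_type are hypothetical names. Comparing the argument type with s using == treats type equality as exact structural equality, which is a simplifying assumption.

```python
TYPE_ERROR = "type_error"


def fun_type(t1, t2):
    """T -> T1 '->' T2 { T.type := T1.type -> T2.type }"""
    return ("->", t1, t2)


def apply_type(t_fun, t_arg):
    """E -> E1 ( E2 ): if E1.type = s -> t and E2.type = s then t else type_error."""
    if isinstance(t_fun, tuple) and t_fun[0] == "->" and t_fun[1] == t_arg:
        return t_fun[2]
    return TYPE_ERROR


pair = ("x", "integer", "integer")
mod_type = fun_type(pair, "integer")             # int x int -> int
print(apply_type(mod_type, pair))                # integer
print(apply_type(mod_type, "integer"))           # type_error
```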

4.7.3 EQUIVALENCE OF TYPE EXPRESSIONS
The checking rules are of the form, "if two type expressions are equal then return a
certain type else return type_error." It is therefore important to have a precise definition of
when two type expressions are equivalent. Potential ambiguities arise when names are given
to type expressions and the names are then used in subsequent type expressions. The key
issue is whether a name in a type expression stands for itself or whether it is an abbreviation
for another type expression.
Since there is interaction between the notion of equivalence of types and the
representation of types, we shall talk about both together. For efficiency, compilers use
representations that allow type equivalence to be determined quickly. The notion of type
equivalence implemented by a specific compiler can often be explained using the concepts of
structural and name equivalence.
Structural Equivalence of Type Expressions
As long as type expressions are built from basic types and constructors, a natural
notion of equivalence between two type expressions is structural equivalence; i.e., two
expressions are either the same basic type, or are formed by applying the same constructor to
structurally equivalent types. That is, two type expressions are structurally equivalent if and
only if they are identical.
For example, the type expression integer is equivalent only to integer because they are
the same basic type. Similarly, pointer (integer) is equivalent only to pointer(integer) because
the two are formed by applying the same constructor pointer to equivalent types.

The algorithm for testing structural equivalence in Fig. 7.1 can be adapted to test
modified notions of equivalence. It assumes that the only type constructors are for arrays,
products, pointers, and functions.
The algorithm recursively compares the structure of type expressions without
checking for cycles, so it can be applied to a tree or a dag representation. Identical type
expressions do not need to be represented by the same node in the dag.
The array bounds s1 and t1 in
s = array(s1, s2)
t = array(t1, t2)
are ignored if the test for array equivalence in lines 4 and 5 of Fig. 7.1 is reformulated as
else if s = array(s1, s2) and t = array(t1, t2) then
    return sequiv(s2, t2)
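Since Fig. 7.1 is not reproduced here, the following Python function is a sketch of a recursive structural-equivalence test in the same spirit, not a copy of the figure. It assumes a tuple encoding (basic types as strings; ('array', bounds, elem), ('x', s1, s2), ('pointer', s1) and ('->', s1, s2) for the four constructors), and its ignore_bounds flag, a hypothetical parameter, implements the reformulated array test that compares only the element types.

```python
def sequiv(s, t, ignore_bounds=False):
    """Structural equivalence of two type expressions (tuple encoding)."""
    if isinstance(s, str) or isinstance(t, str):
        return s == t                            # both must be the same basic type
    if s[0] != t[0]:
        return False                             # different constructors
    if s[0] == "array":                          # ('array', bounds, element type)
        bounds_ok = ignore_bounds or s[1] == t[1]
        return bounds_ok and sequiv(s[2], t[2], ignore_bounds)
    if s[0] == "pointer":                        # ('pointer', s1)
        return sequiv(s[1], t[1], ignore_bounds)
    if s[0] in ("x", "->"):                      # products and functions
        return (sequiv(s[1], t[1], ignore_bounds)
                and sequiv(s[2], t[2], ignore_bounds))
    return False                                 # unknown constructor


a10 = ("array", (1, 10), "integer")
a0_9 = ("array", (0, 9), "integer")
print(sequiv(a10, a0_9), sequiv(a10, a0_9, ignore_bounds=True))   # False True
```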
In certain situations, we can find a representation for type expressions that is
significantly more compact than the type graph notation.
Names for Type Expressions
In some languages, types can be given names. For example, in the Pascal program
fragment
type link = ↑cell;
var next : link;
    last : link;
    p : ↑cell;
    q, r : ↑cell;
the identifier link is declared to be a name for the type ↑cell. The question arises: do the
variables next, last, p, q, r all have identical types? Surprisingly, the answer depends on the
implementation. The problem arose because the Pascal Report did not define the term
"identical type."
To model this situation, we allow type expressions to be named and allow these
names to appear in type expressions where we previously had only basic types.
For example, if cell is the name of a type expression, then pointer(cell) is a type
expression. For the time being, suppose there are no circular type expression definitions, such
as defining cell to be the name of a type expression containing cell.

When names are allowed in type expressions, two notions of equivalence of type
expressions arise, depending on the treatment of names. Name equivalence views each type
name as a distinct type, so two type expressions are name equivalent if and only if they are
identical. Under structural equivalence, names are replaced by the type expressions they
define, so two type expressions are structurally equivalent if they represent two structurally
equivalent type expressions when all names have been substituted out.
Cycles in Representations of Types
Basic data structures like linked lists and trees are often defined recursively; e.g., a
linked list is either empty or consists of a cell with a pointer to a linked list. Such data
structures are usually implemented using records that contain pointers to similar records, and
type names play an essential role in defining the types of such records.
Consider a linked list of cells, each containing some integer information and a pointer
to the next cell in the list. Pascal declarations of type names corresponding to links and cells
are:
type link = ↑ cell;
cell = record
info : integer;
next : link
end;
Note that the type name link is defined in terms of cell and that cell is defined in terms of
link, so their definitions are recursive.
Recursively defined type names can be substituted out if we are willing to introduce
cycles into the type graph. If pointer(cell) is substituted for link, the type expression shown in
Fig. 6.8(a) is obtained for cell. Using cycles as in Fig. 6.8(b), we can eliminate mention of cell
from the part of the type graph below the node labeled record.
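Once cycles appear, a recursive comparison without a cycle check would recurse forever on such a graph. One standard remedy, swapped in here and not taken from the text, is to remember which pairs of nodes are already being compared and to assume them equivalent; if any later comparison fails, False still propagates to the top. The Node class and build_cell helper are hypothetical, and the graph mirrors the cyclic representation of cell described above.

```python
class Node:
    """A node in a type graph; a child may point back to an ancestor (a cycle)."""

    def __init__(self, label, children=None, field_names=None):
        self.label = label
        self.children = children or []
        self.field_names = field_names or []


def build_cell(info_type="integer"):
    """cell = record info: <info_type>; next: pointer(cell) end, built with a cycle."""
    cell = Node("record", field_names=["info", "next"])
    cell.children = [Node(info_type), Node("pointer", [cell])]  # back edge to record
    return cell


def equiv(s, t, assumed=None):
    """Structural equivalence that terminates on cyclic type graphs."""
    if assumed is None:
        assumed = set()
    if (id(s), id(t)) in assumed:
        return True  # this pair is already being compared further up the recursion
    if (s.label != t.label or s.field_names != t.field_names
            or len(s.children) != len(t.children)):
        return False
    assumed.add((id(s), id(t)))
    return all(equiv(a, b, assumed) for a, b in zip(s.children, t.children))


print(equiv(build_cell(), build_cell()))        # True: same shape, separate graphs
print(equiv(build_cell(), build_cell("real")))  # False: info fields differ
```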

4.7.4 TYPE CONVERSIONS
Consider expressions like x + i, where x is of type real and i is of type integer. Since
the representation of integers and reals is different within a computer, and different machine
instructions are used for operations on integers and reals, the compiler may have to first
convert one of the operands of + to ensure that both operands are of the same type when the
addition takes place.
The language definition specifies what conversions are necessary. When an integer is
assigned to a real, or vice versa, the conversion is to the type of the left side of the
assignment. In expressions, the usual transformation is to convert the integer into a real
number and then perform a real operation on the resulting pair of real operands. The type
checker in a compiler can be used to insert these conversion operations into the intermediate
representation of the source program. For example, postfix notation for x + i might be
x i inttoreal real+
Here, the inttoreal operator converts i from integer to real and then real+ performs
real addition on its operands.
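A type checker's insertion of inttoreal can be sketched as a recursive walk that returns both the postfix code and the type of each subexpression. The tuple encoding of expressions, the symbol-table contents and the int+ operator for integer addition are assumptions for illustration.

```python
# Hypothetical expression encoding: ("id", name) leaves and ("+", left, right) nodes.
symtab = {"x": "real", "i": "integer", "j": "integer"}


def to_postfix(e):
    """Return (postfix tokens, type), inserting inttoreal where operands differ."""
    if e[0] == "id":
        return [e[1]], symtab[e[1]]
    _, left, right = e
    lcode, ltype = to_postfix(left)
    rcode, rtype = to_postfix(right)
    if ltype == rtype == "integer":
        return lcode + rcode + ["int+"], "integer"
    # Mixed or real operands: convert each integer operand, then add as reals.
    if ltype == "integer":
        lcode = lcode + ["inttoreal"]
    if rtype == "integer":
        rcode = rcode + ["inttoreal"]
    return lcode + rcode + ["real+"], "real"


code, ty = to_postfix(("+", ("id", "x"), ("id", "i")))
print(" ".join(code), ty)  # x i inttoreal real+ real
```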
Type conversion often arises in another context. A symbol having different meanings
depending on its context is said to be overloaded.
Coercions
Conversion from one type to another is said to be implicit if it is to be done
automatically by the compiler. Implicit type conversions, also called coercions, are limited in
many languages to situations where no information is lost in principle; e.g., an integer may be
converted to a real but not vice versa. In practice, however, loss is possible when a real
number must fit into the same number of bits as an integer.
Conversion is said to be explicit if the programmer must write something to cause the
conversion. For all practical purposes, all conversions in Ada are explicit. Explicit
conversions look just like function applications to a type checker, so they present no new
problems. For example, in Pascal, a built-in function ord maps a character to an integer, and
chr does the inverse mapping from an integer to a character, so these conversions are explicit.
C, on the other hand, coerces (i.e., implicitly converts) ASCII characters to integers between 0
and 127 in arithmetic expressions.
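The distinction between implicit and explicit conversion can be sketched as a check over a widening order. The order char < integer < real and the function names are assumptions; the char-to-integer step imitates C's treatment of characters in arithmetic, and this sketch models a stricter language than one that converts assignments to the type of the left side. A real language's rules would come from its definition.

```python
# Assumed widening order: a value may be coerced only "upward" in this order.
WIDENING = {"char": 0, "integer": 1, "real": 2}


def can_coerce(src, dst):
    """Implicit conversion is allowed only if no information is lost in principle."""
    return WIDENING[src] <= WIDENING[dst]


def check_assign(lhs_type, rhs_type, explicit=False):
    """Classify an assignment lhs := rhs in this hypothetical language."""
    if lhs_type == rhs_type:
        return "ok"
    if can_coerce(rhs_type, lhs_type):
        return f"coerce {rhs_type} to {lhs_type}"    # compiler inserts the conversion
    if explicit:
        return f"explicit {rhs_type} to {lhs_type}"  # programmer wrote the conversion
    return "type error"


print(check_assign("real", "integer"))  # coerce integer to real
print(check_assign("integer", "real"))  # type error
```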

4.8 – RUN-TIME ENVIRONMENTS
The run-time environment deals with the relationship between names and data objects. The
allocation and deallocation of data objects is managed by a run-time support package,
consisting of routines loaded with the generated target code.
Each execution of a procedure is referred to as an activation of the procedure. If a
procedure is recursive, several activations may be alive at the same time.
If a and b are activations of two procedures, then their lifetimes are either non-
overlapping or nested. A procedure is recursive if a new activation can begin before an earlier
activation of the same procedure has ended.
4.8.1 – Source Language Issues
This section distinguishes between the source text of a procedure and its activations at
run time.
Procedures:
A procedure definition is a declaration that associates an identifier with a statement.
The identifier is the procedure name, and the statement is the procedure body.

For example, the Pascal code in Fig. 7.1 contains the definition of the procedure
named readarray on lines 3-7; the body of the procedure is on lines 5-7. Procedures that
return values are called functions in many languages; however, it is convenient to refer to
them as procedures. A complete program will also be treated as a procedure.
When a procedure name appears within an executable statement, we say that the
procedure is called at that point. The basic idea is that a procedure call executes the procedure
body. The main program in lines 21 -25 of Fig. 7.1 calls the procedure readarray at line 23
and then calls quicksort at line 24. Note that procedure calls can also occur within
expressions, as on line 16. Some of the identifiers appearing in a procedure definition are
special, and are called formal parameters of the procedure. (C calls them "formal arguments"
and Fortran calls them "dummy arguments.")
The identifiers m and n on line 12 are formal parameters of quicksort. Arguments,
known as actual parameters (or actuals), may be passed to a called procedure; they are
substituted for the formals in the body.
Activation Tree
Let us make the following assumptions about the flow of control among procedures
during the execution of a program:
1. Control flows sequentially; that is, the execution of a program consists of a sequence of
steps, with control being at some specific point in the program at each step.
2. Each execution of a procedure starts at the beginning of the procedure body and
eventually returns control to the point immediately following the place where the
procedure was called. This means the flow of control between procedures can be
depicted using trees.
Each execution of a procedure body is referred to as an activation of the procedure.
The lifetime of an activation of a procedure p is the sequence of steps between the first and
last steps in the execution of the procedure body, including time spent executing procedures
called by p, the procedures called by them, and so on. In general, the term "lifetime" refers to
a consecutive sequence of steps during the execution of a program.
In languages like Pascal, each time control enters a procedure q from procedure p, it
eventually returns to p. More precisely, each time control flows from an activation of a
procedure p to an activation of a procedure q, it returns to the same activation of p.

If a and b are procedure activations, then their lifetimes are either non-overlapping or
are nested. That is, if b is entered before a is left, then control must leave b before it leaves a.
This nested property of activation lifetimes can be illustrated by inserting two print
statements in each procedure, one before the first statement of the procedure body and the
other after the last. The first statement prints enter followed by the name of the procedure and
the values of the actual parameters; the last statement prints leave followed by the same
information.
One execution of the program in Fig. 7.1 with these print statements produced the
output shown in Fig. 7.2. The lifetime of the activation quicksort(1,9) is the sequence of
steps executed between printing enter quicksort(1,9) and printing leave quicksort(1,9). In
Fig. 7.2, it is assumed that the value returned by partition(1,9) is 4.
A procedure is recursive if a new activation can begin before an earlier activation of
the same procedure has ended. Figure 7.2 shows that control enters the activation of
quicksort(1,9), from line 24, early in the execution of the program but leaves this activation
almost at the end. In the meantime, there are several other activations of quicksort, so
quicksort is recursive.
A recursive procedure p need not call itself directly; it may call another procedure q,
which may then call p through some sequence of procedure calls. We can use a tree, called an
activation tree, to depict the way control enters and leaves activations.
In an activation tree,
1. each node represents an activation of a procedure,
2. the root represents the activation of the main program,
3. the node for a is the parent of the node for b if and only if control flows from
activation a to b, and
4. the node for a is to the left of the node for b if and only if the lifetime of a occurs
before the lifetime of b.
Since each node represents a unique activation, and vice versa, it is convenient to talk
of control being at a node when it is in the activation represented by the node.
The activation tree in Fig. 7.3 is constructed from the output in Fig. 7.2.
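Because lifetimes nest, an activation tree can be rebuilt from enter/leave output of the kind described above by keeping the currently open activations on a stack: each enter creates a child of the activation on top. The Activation class is hypothetical, and the trace in the usage example is a shortened, made-up one, not the actual output of Fig. 7.2.

```python
class Activation:
    """A node of the activation tree."""

    def __init__(self, name):
        self.name = name
        self.children = []  # callee activations, left to right in order of lifetime


def build_activation_tree(trace):
    """trace: list of ("enter" | "leave", activation name) events in program order."""
    root = None
    open_activations = []
    for event, name in trace:
        if event == "enter":
            node = Activation(name)
            if open_activations:
                open_activations[-1].children.append(node)  # parent is the caller
            else:
                root = node
            open_activations.append(node)
        else:
            if not open_activations or open_activations[-1].name != name:
                raise ValueError(f"lifetimes do not nest at leave {name}")
            open_activations.pop()
    return root


trace = [("enter", "main"), ("enter", "readarray"), ("leave", "readarray"),
         ("enter", "quicksort(1,9)"), ("enter", "partition(1,9)"),
         ("leave", "partition(1,9)"), ("enter", "quicksort(1,3)"),
         ("leave", "quicksort(1,3)"), ("leave", "quicksort(1,9)"), ("leave", "main")]
root = build_activation_tree(trace)
print([c.name for c in root.children])  # ['readarray', 'quicksort(1,9)']
```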
Control Stacks:
The flow of control in a program corresponds to a depth-first traversal of the
activation tree that starts at the root, visits a node before its children, and recursively visits
children at each node in a left-to-right order.
We can use a stack, called a control stack, to keep track of live procedure activations.
The idea is to push the node for an activation onto the control stack as the activation begins
and to pop the node when the activation ends. Then the contents of the control stack are
related to paths to the root of the activation tree. When node n is at the top of the control
stack, the stack contains the nodes along the path from n to the root.
In the previous example, consider the activation tree when control reaches
quicksort(2,3), abbreviated q(2,3). At this point the control stack contains the nodes on the
path from q(2,3) to the root: the main program, q(1,9), q(1,3), and q(2,3).
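The correspondence between the control stack and root paths can be checked with a sketch that walks an activation tree depth first, pushing on entry and popping on exit. The tree below is an abbreviated one assumed for illustration (it takes partition(1,9) to return 4 and partition(1,3) to return 1), with each activation represented as a hypothetical (name, children) pair.

```python
def stack_at(tree, target):
    """Walk the activation tree depth first, left to right, and return the
    control stack at the moment `target` is activated (None if absent)."""
    stack = []

    def visit(node):
        name, children = node
        stack.append(name)          # activation begins: push
        if name == target:
            return list(stack)      # snapshot = path from the root to target
        for child in children:
            found = visit(child)
            if found is not None:
                return found
        stack.pop()                 # activation ends: pop
        return None

    return visit(tree)


tree = ("main", [
    ("readarray", []),
    ("quicksort(1,9)", [
        ("partition(1,9)", []),
        ("quicksort(1,3)", [("partition(1,3)", []), ("quicksort(1,0)", []),
                            ("quicksort(2,3)", [])]),
        ("quicksort(5,9)", []),
    ]),
])
print(stack_at(tree, "quicksort(2,3)"))
# ['main', 'quicksort(1,9)', 'quicksort(1,3)', 'quicksort(2,3)']
```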

The Scope of a Declaration
A declaration in a language is a syntactic construct that associates information with a
name. Declarations may be explicit, as in the Pascal fragment
var i : integer;
or they may be implicit. For example, any variable name starting with I is assumed to denote
an integer in a Fortran program, unless otherwise declared.
There may be independent declarations of the same name in different parts of a
program. The scope rules of a language determine which declaration of a name applies when
the name appears in the text of a program.
In the Pascal program in Fig. 7.1, i is declared three times, on lines 4, 9, and 13, and the
uses of the name i in procedures readarray, partition, and quicksort are independent of each
other. The declaration on line 4 applies to the uses of i on line 6. That is, the two occurrences
of i on line 6 are in the scope of the declaration on line 4. The three occurrences of i on lines
16-18 are in the scope of the declaration of i on line 13.
The portion of the program to which a declaration applies is called the scope of that
declaration. An occurrence of a name in a procedure is said to be local to the procedure if it is
in the scope of a declaration within the procedure; otherwise, the occurrence is said to be
non-local. The distinction between local and nonlocal names carries over to any syntactic
construct that can have declarations within it.
While scope is a property of the declaration of a name, it is sometimes convenient to
use the abbreviation "the scope of name x" for "the scope of the declaration of name x that
applies to this occurrence of x." In this sense, the scope of i on line 17 in Fig. 7.1 is the body
of quicksort.
At compile time, the symbol table can be used to find the declaration that applies to
an occurrence of a name. When a declaration is seen, a symbol-table entry is created for it. As
long as we are in the scope of the declaration, its entry is returned when the name in it is
looked up.
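A common way to implement this lookup, assumed here rather than prescribed by the text, is a stack of dictionaries, one per scope, searched from the innermost scope outward. The class and method names are hypothetical, and the stored strings merely record which declaration applies.

```python
class ScopedSymbolTable:
    """Symbol table with nested scopes; the innermost scope is searched first."""

    def __init__(self):
        self.scopes = [{}]           # scopes[0] is the outermost scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()            # entries of the closed scope are no longer returned

    def declare(self, name, info):
        self.scopes[-1][name] = info

    def lookup(self, name):
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None                  # no declaration applies


st = ScopedSymbolTable()
st.enter_scope()                     # body of readarray
st.declare("i", "declared on line 4")
print(st.lookup("i"))                # declared on line 4
st.exit_scope()
st.enter_scope()                     # body of quicksort
st.declare("i", "declared on line 13")
print(st.lookup("i"))                # declared on line 13
```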

Bindings of Names
Even if each name is declared once in a program, the same name may denote different
data objects at run time. The informal term "data object" corresponds to a storage location
that can hold values.
In programming language semantics, the term environment refers to a function that
maps a name to a storage location, and the term state refers to a function that maps a storage
location to the value held there, as in Fig. 7.5. Using the terms l-value and r-value, an
environment maps a name to an l-value, and a state maps the l-value to an r-value.
Environments and states are different; an assignment changes the state, but not the
environment. For example, suppose that storage address 100, associated with variable pi,
holds 0. After the assignment pi := 3.14, the same storage address is associated with pi, but
the value held there is 3.14.
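The distinction between environment and state maps directly onto two dictionaries. This sketch reuses the pi example above; the dictionary representation is an assumption made only for illustration.

```python
environment = {"pi": 100}  # name -> storage location (l-value)
state = {100: 0}           # storage location -> value held there (r-value)


def assign(name, value):
    """An assignment changes the state, not the environment."""
    state[environment[name]] = value


def rvalue(name):
    """Follow the environment to a location, then the state to a value."""
    return state[environment[name]]


assign("pi", 3.14)
print(environment["pi"], rvalue("pi"))  # 100 3.14
```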
When an environment associates storage location s with a name x, we say that x is
bound to s; the association itself is referred to as a binding of x. The term storage "location"
is to be taken figuratively. If x is not of a basic type, the storage s for x may be a collection of
memory words.
A binding is the dynamic counterpart of a declaration, as shown in Fig. 7.6. As we
have seen, more than one activation of a recursive procedure can be alive at the same time. In
Pascal, a local variable name in a procedure is bound to a different storage location in each
activation of a procedure.

4.8.2 – STORAGE ORGANIZATION
The organization of run-time storage in this section can be used for languages such
as Fortran, Pascal, and C.
Subdivision of Run-Time Memory
Suppose that the compiler obtains a block of storage from the operating system for the
compiled program to run in. From the discussion in the last section, this run-time storage
might be subdivided to hold:
1. the generated target code,
2. data objects, and
3. a counterpart of the control stack to keep track of procedure activations.
The size of the generated target code is fixed at compile time, so the compiler can
place it in a statically determined area, perhaps in the low end of memory. Similarly, the size
of some of the data objects may also be known at compile time, and these too can be placed in
a statically determined area, as in Fig. 7.7. One reason for statically allocating as many data
objects as possible is that the addresses of these objects can be compiled into the target code.
All data objects in Fortran can be allocated statically.
Implementations of languages like Pascal and C use extensions of the control stack to
manage activations of procedures. When a call occurs, execution of an activation is
interrupted and information about the status of the machine, such as the value of the program
counter and machine registers, is saved on the stack. When control returns from the call, this
activation can be restarted after restoring the values of relevant registers and setting the
program counter to the point immediately after the call. Data objects whose lifetimes are
contained in that of an activation can be allocated on the stack, along with other information
associated with the activation.
A separate area of run-time memory, called a heap, holds all other information. Pascal
allows data to be allocated under program control. The storage for such data is taken from the
heap. Implementations of languages in which the lifetimes of activations cannot be
represented by an activation tree might use the heap to keep information about activations.
The controlled way in which data is allocated and deallocated on a stack makes it cheaper to
place data on the stack than on the heap.

The sizes of the stack and the heap can change as the program executes, so we show
these at opposite ends of memory in Fig. 7.7, where they can grow toward each other as
needed. Pascal and C need both a run-time stack and a heap, but not all languages do.
By convention, stacks grow down. That is, the "top" of the stack is drawn towards the
bottom of the page. Since memory addresses increase as we go down a page, "downward-
growing" means toward higher addresses. If top marks the top of the stack, offsets from the
top of the stack can be computed by subtracting the offset from top. On many machines this
computation can be done efficiently by keeping the value of top in a register. Stack addresses
can then be represented as offsets from top.
Activation Records
Information needed by a single execution of a procedure is managed using a
contiguous block of storage called an activation record or frame, consisting of the collection
of fields shown in Fig. 7.8. Not all languages, nor all compilers, use all of these fields; often
registers can take the place of one or more of them. For languages like Pascal and C, it is
customary to push the activation record of a procedure on the run-time stack when the
procedure is called and to pop the activation record off the stack when control returns to the
caller.
The purpose of the fields of an activation record is as follows, starting from the field
for temporaries.
1. Temporary values, such as those arising in the evaluation of expressions, are stored in
the field for temporaries.
2. The field for local data holds data that is local to an execution of a procedure.
3. The field for saved machine status holds information about the state of the machine just
before the procedure is called. This information includes the values of the program
counter and machine registers that have to be restored when control returns from the
procedure.
4. The optional access link refers to nonlocal data held in other activation records. For a
language like Fortran, access links are not needed because nonlocal data is kept in a fixed
place. Access links, or the related "display" mechanism, are needed for Pascal.
5. The optional control link points to the activation record of the caller.
6. The field for actual parameters is used by the calling procedure to supply parameters to
the called procedure. We show space for parameters in the activation record, but in
practice parameters are often passed in machine registers for greater efficiency.
7. The field for the returned value is used by the called procedure to return a value to the
calling procedure. Again, in practice this value is often returned in a register for greater
efficiency.
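The push-on-call, pop-on-return discipline can be modelled with a list of records. This is a sketch under stated assumptions: the field names mirror the list above, the access link is left unused, and return_address is a hypothetical stand-in for the saved program counter. A compiler would lay these fields out in contiguous memory rather than in Python objects.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ActivationRecord:
    procedure: str
    actual_params: List[Any]
    returned_value: Any = None
    control_link: Optional["ActivationRecord"] = None  # caller's record
    access_link: Optional["ActivationRecord"] = None   # unused in this sketch
    saved_status: Dict[str, int] = field(default_factory=dict)
    local_data: Dict[str, Any] = field(default_factory=dict)
    temporaries: List[Any] = field(default_factory=list)


class RunTimeStack:
    def __init__(self):
        self.frames: List[ActivationRecord] = []

    def call(self, procedure, params, return_address):
        """Push a new record whose control link points to the caller's record."""
        caller = self.frames[-1] if self.frames else None
        record = ActivationRecord(procedure, list(params), control_link=caller,
                                  saved_status={"return_address": return_address})
        self.frames.append(record)
        return record

    def ret(self, value=None):
        """Pop the callee's record; hand back its value and saved return address."""
        record = self.frames.pop()
        record.returned_value = value
        return record.returned_value, record.saved_status["return_address"]


rs = RunTimeStack()
rs.call("main", [], return_address=0)
q = rs.call("quicksort", [1, 9], return_address=24)
print(q.control_link.procedure)  # main
print(rs.ret())                  # (None, 24)
```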
The sizes of each of these fields can be determined at the time a procedure is called. In
fact, the sizes of almost all fields can be determined at compile time. An exception occurs if a
procedure may have a local array whose size is determined by the value of an actual
parameter, available only when the procedure is called at run time.
Compile-Time Layout of Local Data
Suppose run-time storage comes in blocks of contiguous bytes, where a byte is the
smallest unit of addressable memory. On many machines, a byte is eight bits and some
number of bytes forms a machine word. Multibyte objects are stored in consecutive bytes and
given the address of the first byte.
The amount of storage needed for a name is determined from its type. An elementary
data type, such as a character, integer, or real, can usually be stored in an integer number of
bytes. Storage for an aggregate, such as an array or record, must be large enough to hold all
its components. For easy access to the components, storage for aggregates is typically
allocated in one contiguous block of bytes.
The field for local data is laid out as the declarations in a procedure are examined at
compile time. Variable-length data is kept outside this field. We keep a count of the memory
locations that have been allocated for previous declarations. From the count we determine a
relative address of the storage for a local with respect to some position such as the beginning
of the activation record. The relative address, or offset, is the difference between the
addresses of the position and the data object.
The storage layout for data objects is strongly influenced by the addressing constraints
of the target machine. For example, instructions to add integers may expect integers to be
aligned, that is, placed at certain positions in memory such as an address divisible by 4.
Although an array of ten characters needs only enough bytes to hold ten characters, a
compiler may therefore allocate 12 bytes, leaving 2 bytes unused. Space left unused due to
alignment considerations is referred to as padding. When space is at a premium, a compiler
may pack data so that no padding is left; additional instructions may then need to be executed
at run time to position packed data so that it can be operated on as if it were properly aligned.
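The offset computation and the resulting padding can be sketched as follows. The sizes and alignments are assumptions for a hypothetical machine (1-byte char, 4-byte integer, 8-byte real), each object is aligned only to its element type, and packed=True simply drops all alignment.

```python
# Assumed sizes and alignments in bytes for a hypothetical target machine.
SIZE = {"char": 1, "integer": 4, "real": 8}
ALIGN = {"char": 1, "integer": 4, "real": 8}


def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout(decls, packed=False):
    """Assign relative addresses to (name, type, count) declarations in order.

    Returns a dict of offsets and the total size of the local-data field.
    """
    offsets = {}
    offset = 0                                # bytes allocated for previous declarations
    for name, ty, count in decls:
        alignment = 1 if packed else ALIGN[ty]
        offset = align_up(offset, alignment)  # skip padding bytes, if any
        offsets[name] = offset
        offset += SIZE[ty] * count
    return offsets, offset


decls = [("buf", "char", 10), ("n", "integer", 1)]
print(layout(decls))               # ({'buf': 0, 'n': 12}, 16)
print(layout(decls, packed=True))  # ({'buf': 0, 'n': 10}, 14)
```

With this layout, a ten-character array followed by an integer leaves 2 bytes of padding, so the integer starts at offset 12; packing moves it to offset 10, where the target machine may need extra instructions to access it.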
