LOVELY PROFESSIONAL UNIVERSITY
Term Paper
of
System Software
(CAP-318)
TOPIC NAME: Source-to-Source Compiler
Source to Source Compiler
SUBMITTED TO: MR. SANDEEP SHARMA
SUBMITTED BY: VISHAL KEDIA
CLASS: BCA-MCA (INT)
R. NO.: RE37D1-A09
REGD NO.: 3010070236
ACKNOWLEDGEMENT
It is very difficult for any term paper to be completed satisfactorily
without the cooperation and advice of a large number of people, whether they
are engineers or experts in their fields of specialization.
I sincerely extend my gratitude to all those who helped me with this term paper,
however inadequate that is in return for the precious time they devoted to me. I am
deeply thankful to MR. SANDEEP SHARMA, my term paper guide, who gave me
his immaculate support and guided me throughout the project work. The term
paper would not have been possible without his moral support.
Vishal Kedia
Certificate
This is to certify that this Minor Term Paper titled "Source To Source
Compiler" has been submitted in partial fulfillment of the Minor Term
Paper in the course System Software. It is further certified that this
Term Paper is an original work carried out by VISHAL KEDIA under the
continuous guidance of MR. SANDEEP SHARMA,
Lecturer,
Lovely Professional University.
Contents
1. Introduction (Project Name and Description)
   Compiler - Native versus cross compiler
   Compiler - One-pass versus multi-pass compilers
   Compiler - Compiled versus interpreted languages
1. Introduction (Project Name and Description)
Source to Source Compiler
This project is based on the working process of the source-to-source compiler.
A source-to-source compiler is a type of compiler that takes a high-level programming language as its
input and outputs a high-level language. For example, an automatic parallelizing compiler will frequently
take in a high-level language program as an input and then transform the code and annotate it with
parallel code annotations (e.g., OpenMP) or language constructs (e.g., Fortran's DOALL statements).
Another purpose of source-to-source compiling is translating legacy code to use the next version
of the underlying programming language, or an API that breaks backward compatibility. It
performs automatic code refactoring, which is useful when the programs to refactor are outside the
control of the original implementer (for example, converting programs from Python 2 to Python
3, or converting programs from an old API to a new API) or when the size of the program
makes it impractical or time-consuming to refactor by hand.
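The kind of automatic refactoring described above can be sketched in miniature with Python's standard ast module. This is only an illustration, not a real migration tool; old_sum and new_sum are hypothetical names standing in for a renamed API, and ast.unparse requires Python 3.9 or later:

```python
import ast

# Rewrite every call to the (hypothetical) old_sum() into new_sum().
class RenameCall(ast.NodeTransformer):
    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == "old_sum":
            node.func.id = "new_sum"
        return node

def translate(source: str) -> str:
    """Source in, source out: the essence of a source-to-source compiler."""
    tree = ast.parse(source)
    tree = RenameCall().visit(tree)
    return ast.unparse(tree)

print(translate("total = old_sum(xs)"))  # total = new_sum(xs)
```

A real translator (such as Python's 2to3 tool) works on the same principle, only with many more rewrite rules.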
What is a Compiler?
A compiler is a special type of computer program that translates a human-readable text file into a form
that the computer can more easily understand. At its most basic level, a computer can only understand
two things: a 1 and a 0. At this level, a human operates very slowly and finds the information contained
in the long string of 1s and 0s incomprehensible. A compiler is a computer program that bridges this gap.
In the beginning, compilers were very simple programs that could only translate symbols into the bits,
the 1s and 0s, the computer understood. Programs were also very simple, composed of a series of steps
that were originally translated by hand into data the computer could understand. This was a very time
consuming task, so portions of this task were automated or programmed, and the first compiler was
written. This program assembled, or compiled, the steps required to execute the step by step program.
These simple compilers were used to write a more sophisticated compiler. With the newer version, more
rules could be added to the compiler program to allow a more natural language structure for the human
programmer to operate with. This made writing programs easier and allowed more people to begin
writing programs. As more people started writing programs, more ideas about writing programs were
offered and used to make more sophisticated compilers. In this way, compiler programs continue to
evolve, improve and become easier to use.
Compiler programs can also be specialized. Certain language structures are better suited for a particular
task than others, so specific compilers were developed for specific tasks or languages. Some compilers
are multistage or multiple pass. A first pass could take a very natural language and make it closer to a
computer understandable language. A second or even a third pass could take it to the final stage, the
executable file.
The intermediate output in a multistage compiler is usually called pseudo-code, since it is not usable by
the computer. Pseudo-code is very structured, like a computer program, not free-flowing and verbose like a
more natural language. The final output is called the executable file, since it is what is actually executed
or run by the computer. Splitting the task up like this made it easier to write more sophisticated
compilers, as each subtask is different. It also made it easier for the computer to point out where it had
trouble understanding what it was being asked to do.
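A minimal sketch of this multistage idea, written in Python for illustration: the first pass turns an infix arithmetic expression into postfix "pseudo-code", and the second pass executes that pseudo-code on a stack machine. The toy grammar (single-digit numbers, + and * only) is an assumption made to keep the sketch short:

```python
# Pass 1: translate infix source into postfix pseudo-code (a token list).
def pass1_to_postfix(expr):
    prec = {"+": 1, "*": 2}
    out, ops = [], []
    for ch in expr.replace(" ", ""):
        if ch.isdigit():
            out.append(ch)
        else:
            # Pop operators of equal or higher precedence first.
            while ops and prec[ops[-1]] >= prec[ch]:
                out.append(ops.pop())
            ops.append(ch)
    out.extend(reversed(ops))
    return out

# Pass 2: run the pseudo-code on a simple stack machine.
def pass2_execute(postfix):
    stack = []
    for tok in postfix:
        if tok.isdigit():
            stack.append(int(tok))
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if tok == "+" else a * b)
    return stack[0]

code = pass1_to_postfix("1 + 2 * 3")
print(code)                 # the intermediate pseudo-code
print(pass2_execute(code))  # 7
```

Each pass is a separate, simpler program, which is exactly why the multistage split made sophisticated compilers easier to write.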
Errors that limit the compiler in understanding a program are called syntax errors. Errors in the way the
program functions are called logic errors. Logic errors are much harder to spot and correct. Syntax errors
are like spelling mistakes, whereas logic errors are a bit more like grammatical errors.
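The distinction can be shown with a short Python sketch: the syntax error is caught when the source is compiled, while the logic error (a hypothetical average function with an off-by-one divisor) compiles and runs but quietly produces the wrong answer:

```python
# A syntax error is rejected before the program ever runs:
try:
    compile("x = = 1", "<demo>", "exec")
except SyntaxError as e:
    print("syntax error caught:", e.msg)

# A logic error compiles and runs, but the result is wrong:
def average(values):
    return sum(values) / (len(values) + 1)  # bug: divisor is off by one

print(average([2, 4, 6]))  # prints 3.0 instead of the intended 4.0
```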
Example of Execution of Java Source Code
What is a Cross Compiler?
Cross compiler programs have also been developed. A cross compiler allows a text file set of
instructions written for one computer designed by a specific manufacturer to be compiled
and run on a different computer by a different manufacturer. For example, a program that was
written to run on an Intel computer can sometimes be cross compiled to run on a computer
developed by Motorola. This frequently does not work very well. At the level at which computer
programs operate, the computer hardware can look very different, even if the machines look similar
to you.
Cross compilation is different from having one computer emulate another. If a computer is
emulating a different computer, it is pretending to be that other computer. Emulation is frequently
slower than cross compilation, since two programs are running at once: the program pretending
to be the other computer and the program actually being run. However, for cross compilation to work,
you need the original natural-language text that describes the program, and the target computer must
be sufficiently similar to the original computer for the program to function. This is not always
possible, so both techniques are in use.
History of Compiler
Several experimental compilers were developed in the 1950s (see, for example, the seminal work by
Grace Hopper on the A-0 language), but the FORTRAN team led by John Backus at IBM is generally
credited as having introduced the first complete compiler, in 1957. COBOL was an early language to be
compiled on multiple architectures, in 1960.
The idea of compilation quickly caught on, and most of the principles of compiler design were developed
during the 1960s.
A compiler is itself a computer program written in some implementation language. Early compilers were
written in assembly language. The first self-hosting compiler — capable of compiling its own source
code in a high-level language — was created for Lisp by Hart and Levin at MIT in 1962. The use of high-
level languages for writing compilers gained added impetus in the early 1970s when Pascal and C
compilers were written in their own languages. Building a self-hosting compiler is a bootstrapping
problem -- the first such compiler for a language must be compiled either by a compiler written in a
different language, or (as in Hart and Levin's Lisp compiler) compiled by running the compiler in an
interpreter.
Compiler construction and compiler optimization are taught at universities as part of the computer
science curriculum. Such courses are usually supplemented with the implementation of a compiler for an
educational programming language. A well-documented example is the PL/0 compiler, which was
originally used by Niklaus Wirth for teaching compiler construction in the 1970s. In spite of its simplicity,
the PL/0 compiler introduced several concepts to the field which have since become established
educational standards:
1. The use of Program Development by Stepwise Refinement
2. The use of a Recursive descent parser
3. The use of EBNF to specify the syntax of a language
4. The use of P-Code during generation of portable output code
5. The use of T-diagrams for the formal description of the bootstrapping problem
Types of Compiler
A compiler may produce code intended to run on the same type of computer and operating
system ("platform") as the compiler itself runs on. This is sometimes called a native-code
compiler. Alternatively, it might produce code designed to run on a different platform. This is
known as a cross compiler. Cross compilers are very useful when bringing up a new hardware
platform for the first time (see bootstrapping). A "source to source compiler" is a type of
compiler that takes a high level language as its input and outputs a high level language. For
example, an automatic parallelizing compiler will frequently take in a high level language
program as an input and then transform the code and annotate it with parallel code annotations
(e.g. OpenMP) or language constructs (e.g. Fortran's DOALL statements).
1. One-pass compiler, like early compilers for Pascal
o The compilation is done in one pass, hence it is very fast.
2. Threaded code compiler (or interpreter), like most implementations of FORTH
o This kind of compiler can be thought of as a database lookup program. It just replaces
given strings in the source with given binary code. The level of this binary code can vary;
in fact, some FORTH compilers can compile programs that don't even need an operating
system.
3. Incremental compiler, like many Lisp systems
o Individual functions can be compiled in a run-time environment that also includes
interpreted functions. Incremental compilation dates back to 1962 and the first Lisp
compiler, and is still used in Common Lisp systems.
4. Stage compiler that compiles to assembly language of a theoretical machine, like some Prolog
implementations
o This Prolog machine is also known as the Warren abstract machine (or WAM). Byte-code
compilers for Java, Python (and many more) are also a subtype of this.
5. Just-in-time compiler, used by Smalltalk and Java systems
o Applications are delivered in byte code, which is compiled to native machine code just
prior to execution.
6. A re-targetable compiler is a compiler that can relatively easily be modified to generate code for
different CPU architectures. The object code produced by these is frequently of lesser quality
than that produced by a compiler developed specifically for a processor. Re-targetable compilers
are often also cross compilers. GCC is an example of a re-targetable compiler.
7. A parallelizing compiler converts a serial input program into a form suitable for efficient
execution on parallel computer architecture.
Compiler - Native versus cross compiler
Most compilers are classified as either native or cross-compilers.
A compiler may produce binary output intended to run on the same type of computer and
operating system ("platform") as the compiler itself runs on. This is sometimes called a native-
code compiler. Alternatively, it might produce binary output designed to run on a different
platform. This is known as a cross compiler. Cross compilers are very useful when bringing up a
new hardware platform for the first time (see bootstrapping). Cross compilers are necessary when
developing software for microcontroller systems that have barely enough storage for the final
machine code, much less a compiler. Compilers which are capable of producing both native and
foreign binary output may be called either a cross compiler or a native compiler depending on the
specific use, although it would be more correct to classify them as cross compilers.
Interpreters are never classified as native or cross-compilers, because they don't output a binary
representation of their input code.
Virtual machine compilers are typically not classified as either native or cross-compilers.
However, if need be, they can be classified as one or the other, especially in the less usual cases
where a compiler is running inside the same VM (making it a native compiler), or where a
compiler is capable of producing an output for several different platforms, including a VM
(making it a cross-compiler).
Compiler - One-pass versus multi-pass compilers
All compilers are either one-pass or multi-pass.
1. One-pass compilers like early compilers for the Pascal programming language.
o The compilation is done in one pass over the program source, hence the compilation is
completed very quickly.
2. Multi-pass compilers, like 2-pass compilers or 3-pass compilers
o The compilation is done step by step. Each step uses the result of the previous step and
creates another intermediate result. This can improve final performance at the cost of
compilation speed.
While the typical multi-pass compiler outputs machine code from its final pass, there are several
other types:
• A "source-to-source compiler" is a type of compiler that takes a high level language as its input
and outputs a high level language. For example, an automatic parallelizing compiler will
frequently take in a high level language program as an input and then transform the code and
annotate it with parallel code annotations (e.g. OpenMP) or language constructs (e.g. Fortran's
DOALL statements).
• Stage compiler that compiles to assembly language of a theoretical machine, like some Prolog
implementations
o This Prolog machine is also known as the Warren Abstract Machine (or WAM). Byte-code
compilers for Java, Python (and many more) are also a subtype of this.
• Just-in-time compiler, used by Smalltalk and Java systems, and also by Microsoft .Net's Common
Intermediate Language (CIL)
o Applications are delivered in byte code, which is compiled to native machine code just
prior to execution.
Compiler - Compiled versus interpreted languages
Many people divide higher-level programming languages into compiled languages and
interpreted languages. However, there is rarely anything about a language that requires it to be
compiled or interpreted. Compilers and interpreters are implementations of languages, not
languages themselves. The categorization usually reflects the most popular or widespread
implementations of a language -- for instance, BASIC is thought of as an interpreted language,
and C a compiled one, despite the existence of BASIC compilers and C interpreters.
There are exceptions; some language specifications assume the use of a compiler (as with C), or
spell out that implementations must include a compilation facility (as with Common Lisp). Some
languages have features that are very easy to implement in an interpreter, but make writing a
compiler much harder; for example, SNOBOL4, and many scripting languages are capable of
constructing arbitrary source code at runtime with regular string operations, and then executing
that code by passing it to a special evaluation function.
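Python itself illustrates this last point: source code can be built as an ordinary string at runtime and handed to the built-in exec function, which compiles and runs it on the spot. The apply function below is a hypothetical example:

```python
# Build source code as a string at runtime, then evaluate it -- the
# pattern that makes a language easy to interpret but hard to fully
# compile ahead of time, since the code does not exist until run time.
op = "*"
source = f"def apply(a, b):\n    return a {op} b\n"

namespace = {}
exec(source, namespace)          # compile and run the generated code
print(namespace["apply"](6, 7))  # 42
```

An ahead-of-time compiler cannot know what `source` will contain, which is why such languages ship an evaluator alongside any compiled code.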
Compiler - Compiler Design
In the past, compilers were divided into many passes to save space. A pass in this context is a run
of the compiler through the source code of the program to be compiled, resulting in the building
up of the internal data of the compiler (such as the evolving symbol table and other assisting
data). When each pass is finished, the compiler can free the internal data space needed during that
pass. This 'multi-pass' method of compiling was the common compiler technology at the time, partly
because of the small main memories of host computers relative to the source code and data.
Many modern compilers share a common 'two stage' design. The front end translates the source
language into an intermediate representation. The second stage is the back end, which works with
the internal representation to produce code in the output language. The front end and back end
may operate as separate passes, or the front end may call the back end as a subroutine, passing it
the intermediate representation.
This approach mitigates complexity by separating the concerns of the front end, which typically
revolve around language semantics, error checking, and the like, from the concerns of the back
end, which concentrates on producing output that is both efficient and correct. It also has the
advantage of allowing a single back end to be used for multiple source languages, and similarly
allows different back ends to be used for different targets.
Often, optimizers and error checkers can be shared by both front ends and back ends if they are
designed to operate on the intermediate language that a front-end passes to a back end. This can
let many compilers (combinations of front and back ends) reuse the large amounts of work that
often go into code analyzers and optimizers.
Certain languages are capable of being compiled in a single pass because of their design: rules
placed on the declaration of variables and other objects, and the requirement that executable
procedures be declared prior to reference or use. The Pascal programming
language is well known for this capability; in fact, many Pascal compilers are themselves
written in Pascal because of the rigid specification of the language and the
ability to compile Pascal language programs in a single pass.
Compiler Front End
The compiler front end consists of multiple phases itself, each informed by formal language
theory:
1. Lexical analysis - breaking the source code text into small pieces ('tokens' or 'terminals'), each
representing a single atomic unit of the language, for instance a keyword, identifier or symbol
names. The token language is typically a regular language, so a finite state automaton
constructed from a regular expression can be used to recognize it. This phase is also called lexing
or scanning.
2. Syntax analysis - identifying the syntactic structure of the source code. It focuses only on structure;
in other words, it identifies the order of tokens and recognizes the hierarchical structures in the code.
This phase is also called parsing.
3. Semantic analysis - recognizing the meaning of the program code and beginning to prepare for output.
In this phase type checking is done, and most compiler errors show up.
4. Intermediate language generation - an equivalent to the original program is created in an
intermediate language.
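Phase 1, lexical analysis, can be sketched in Python: each token class is given as a regular expression, and the combined pattern plays the role of the finite state automaton that recognizes the token language. The token names and the toy grammar below are assumptions for illustration only:

```python
import re

# One regular expression per token class; the alternation of named
# groups acts as the recognizing finite state automaton.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),           # integer literals
    ("IDENT",  r"[A-Za-z_]\w*"),  # identifiers and keywords
    ("OP",     r"[+\-*/=]"),      # single-character operators
    ("SKIP",   r"\s+"),           # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Break source text into (token_class, lexeme) pairs."""
    tokens = []
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("count = count + 1"))
```

The output, a flat stream of tagged tokens, is exactly what the syntax analysis phase consumes next.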
Compiler Back End
While there are applications where only the compiler front end is necessary, such as static
language verification tools, a real compiler hands the intermediate representation generated by
the front end to the back end, which produces a functional equivalent program in the output
language. This is done in multiple steps:
1. Compiler analysis - the process of gathering program information from the intermediate
representation of the input source files. Typical analyses are define-use and use-define
chains, data dependence analysis, alias analysis, etc. Accurate analysis is the basis for any compiler
optimization. The call graph and control flow graph are usually also built during the analysis
phase.
2. Optimization - the intermediate language representation is transformed into functionally
equivalent but faster (or smaller) forms. Popular optimizations are in-line expansion, dead code
elimination, constant propagation, loop transformation, register allocation or even auto
parallelization.
3. Code generation - the transformed intermediate language is translated into the output
language, usually the native machine language of the system. This involves resource and storage
decisions, such as deciding which variables to fit into registers and memory, and the selection
and scheduling of appropriate machine instructions along with their associated addressing
modes.
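Steps 1 and 2 can be sketched on a toy three-address intermediate representation in Python. The IR format, the fold_constants pass (constant propagation and folding), and the eliminate_dead pass (dead code elimination) are illustrative inventions for this sketch, not any real compiler's API:

```python
# Toy IR: a list of (dest, op, arg1, arg2) tuples.

def fold_constants(ir):
    """Replace operations whose inputs are known constants with constants."""
    consts, out = {}, []
    for dest, op, a, b in ir:
        a = consts.get(a, a)  # propagate known constant values
        b = consts.get(b, b)
        if op == "const":
            consts[dest] = a
            out.append((dest, "const", a, None))
        elif isinstance(a, int) and isinstance(b, int):
            val = a + b if op == "add" else a * b
            consts[dest] = val
            out.append((dest, "const", val, None))
        else:
            out.append((dest, op, a, b))
    return out

def eliminate_dead(ir, live):
    """Walk backwards, keeping only assignments whose results are used."""
    used, kept = set(live), []
    for dest, op, a, b in reversed(ir):
        if dest in used:
            kept.append((dest, op, a, b))
            used.update(x for x in (a, b) if isinstance(x, str))
    return list(reversed(kept))

ir = [
    ("x", "const", 2, None),
    ("y", "const", 3, None),
    ("z", "add", "x", "y"),   # folds to the constant 5
    ("w", "mul", "x", "x"),   # dead: w is never used
]
opt = eliminate_dead(fold_constants(ir), live={"z"})
print(opt)  # [('z', 'const', 5, None)]
```

Four instructions shrink to one, showing why accurate analysis (here, knowing which values are constant and which results are live) is the basis for optimization.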
Bibliography
• System Programming by John J. Donovan
• www.google.com