SlideShare une entreprise Scribd logo
1  sur  91
Language processing:
introduction to compiler
construction
Andy D. Pimentel
Computer Systems Architecture group
andy@science.uva.nl
http://www.science.uva.nl/~andy/taalverwerking.html
About this course
• This part will address compilers for programming
languages
• Depth-first approach
– Instead of covering all compiler aspects very briefly,
we focus on particular compiler stages
– Focus: optimization and compiler back issues
• This course is complementary to the compiler
course at the VU
• Grading: (heavy) practical assignment and one
or two take-home assignments
About this course (cont’d)
• Book
– Recommended, not compulsory: Seti, Aho and
Ullman,”Compilers Principles, Techniques and Tools”
(the Dragon book)
– Old book, but still more than sufficient
– Copies of relevant chapters can be found in the library
• Sheets are available at the website
• Idem for practical/take-home assignments,
deadlines, etc.
Topics
• Compiler introduction
– General organization
• Scanning & parsing
– From a practical viewpoint: LEX and YACC
• Intermediate formats
• Optimization: techniques and algorithms
– Local/peephole optimizations
– Global and loop optimizations
– Recognizing loops
– Dataflow analysis
– Alias analysis
Topics (cont’d)
• Code generation
– Instruction selection
– Register allocation
– Instruction scheduling: improving ILP
• Source-level optimizations
– Optimizations for cache behavior
Compilers:
general organization
Compilers: organization
• Frontend
– Dependent on source language
– Lexical analysis
– Parsing
– Semantic analysis (e.g., type checking)
Frontend Optimizer BackendSource Machine
code
IR IR
Compilers: organization (cont’d)
• Optimizer
– Independent part of compiler
– Different optimizations possible
– IR to IR translation
– Can be very computational intensive part
Frontend Optimizer BackendSource Machine
code
IR IR
Compilers: organization (cont’d)
• Backend
– Dependent on target processor
– Code selection
– Code scheduling
– Register allocation
– Peephole optimization
Frontend Optimizer BackendSource Machine
code
IR IR
Frontend
Introduction to parsing
using LEX and YACC
Overview
• Writing a compiler is difficult requiring lots of time
and effort
• Construction of the scanner and parser is routine
enough that the process may be automated
Lexical RulesLexical Rules
GrammarGrammar
SemanticsSemantics
CompilerCompiler
CompilerCompiler
ScannerScanner
------------------
ParserParser
------------------
CodeCode
generatorgenerator
YACC
• What is YACC ?
– Tool which will produce a parser for a given
grammar.
– YACC (Yet Another Compiler Compiler) is a program
designed to compile a LALR(1) grammar and to
produce the source code of the syntactic analyzer of
the language produced by this grammar
– Input is a grammar (rules) and actions to take upon
recognizing a rule
– Output is a C program and optionally a header file of
tokens
LEX
• Lex is a scanner generator
– Input is description of patterns and actions
– Output is a C program which contains a function yylex()
which, when called, matches patterns and performs
actions per input
– Typically, the generated scanner performs lexical
analysis and produces tokens for the (YACC-generated)
parser
LEX and YACC: a team
How to
work ?
LEX and YACC: a team
call yylex()
[0-9]+
next token is NUM
NUM ‘+’ NUM
Availability
• lex, yacc on most UNIX systems
• bison: a yacc replacement from GNU
• flex: fast lexical analyzer
• BSD yacc
• Windows/MS-DOS versions exist
YACC
Basic Operational Sequence
a.out
File containing desired
grammar in YACC format
YACC programYACC program
C source program created by YACC
C compilerC compiler
Executable program that will parse
grammar given in gram.y
gram.y
yacc
y.tab.c
cc
or gcc
YACC File Format
Definitions
%%
Rules
%%
Supplementary Code The identical LEX format wasThe identical LEX format was
actually taken from this...actually taken from this...
Rules Section
• Is a grammar
• Example
expr : expr '+' term | term;
term : term '*' factor | factor;
factor : '(' expr ')' | ID | NUM;
Rules Section
• Normally written like this
• Example:
expr : expr '+' term
| term
;
term : term '*' factor
| factor
;
factor : '(' expr ')'
| ID
| NUM
;
Definitions Section
Example
%{
#include <stdio.h>
#include <stdlib.h>
%}
%token ID NUM
%start expr
This is called a
terminal
The start
symbol
(non-terminal)
Sidebar
• LEX produces a function called yylex()
• YACC produces a function called yyparse()
• yyparse() expects to be able to call yylex()
• How to get yylex()?
• Write your own!
• If you don't want to write your own: Use LEX!!!
Sidebar
int yylex()
{
if(it's a num)
return NUM;
else if(it's an id)
return ID;
else if(parsing is done)
return 0;
else if(it's an error)
return -1;
}
Semantic actions
expr : expr '+' term { $$ = $1 + $3; }
| term { $$ = $1; }
;
term : term '*' factor { $$ = $1 * $3; }
| factor { $$ = $1; }
;
factor : '(' expr ')' { $$ = $2; }
| ID
| NUM
;
Semantic actions (cont’d)
expr : exprexpr '+' term { $$ = $1 + $3; }
| termterm { $$ = $1; }
;
term : termterm '*' factor { $$ = $1 * $3; }
| factorfactor { $$ = $1; }
;
factor : '((' expr ')' { $$ = $2; }
| ID
| NUM
;
$1$1
Semantic actions (cont’d)
expr : expr '++' term { $$ = $1 + $3; }
| term { $$ = $1; }
;
term : term '**' factor { $$ = $1 * $3; }
| factor { $$ = $1; }
;
factor : '(' exprexpr ')' { $$ = $2; }
| ID
| NUM
;
$2$2
Semantic actions (cont’d)
expr : expr '+' termterm { $$ = $1 + $3; }
| term { $$ = $1; }
;
term : term '*' factorfactor { $$ = $1 * $3; }
| factor { $$ = $1; }
;
factor : '(' expr '))' { $$ = $2; }
| ID
| NUM
;
$3$3
Default: $$ = $1;Default: $$ = $1;
yacc -v gram.y
• Will produce:
y.output
Bored, lonely? Try this!
yacc -d gram.y
• Will produce:
y.tab.h
Look at this and you'll
never be unhappy again!
Look at this and you'll
never be unhappy again!
Shows "State
Machine"®
Shows "State
Machine"®
Example: LEX
%{
#include <stdio.h>
#include "y.tab.h"
%}
id [_a-zA-Z][_a-zA-Z0-9]*
wspc [ tn]+
semi [;]
comma [,]
%%
int { return INT; }
char { return CHAR; }
float { return FLOAT; }
{comma} { return COMMA; } /* Necessary? */
{semi} { return SEMI; }
{id} { return ID;}
{wspc} {;}
scanner.lscanner.l
Example: Definitions
%{
#include <stdio.h>
#include <stdlib.h>
%}
%start line
%token CHAR, COMMA, FLOAT, ID, INT, SEMI
%%
decl.ydecl.y
/* This production is not part of the "official"
* grammar. It's primary purpose is to recover from
* parser errors, so it's probably best if you leave
* it here. */
line : /* lambda */
| line decl
| line error {
printf("Failure :-(n");
yyerrok;
yyclearin;
}
;
Example: Rules
decl.ydecl.y
Example: Rules
decl : type ID list { printf("Success!n"); } ;
list : COMMA ID list
| SEMI
;
type : INT | CHAR | FLOAT
;
%%
decl.ydecl.y
Example: Supplementary Code
extern FILE *yyin;
main()
{
do {
yyparse();
} while(!feof(yyin));
}
yyerror(char *s)
{
/* Don't have to do anything! */
}
decl.ydecl.y
Bored, lonely? Try this!
yacc -d decl.y
• Produced
y.tab.h
# define CHAR 257
# define COMMA 258
# define FLOAT 259
# define ID 260
# define INT 261
# define SEMI 262
Symbol attributes
• Back to attribute grammars...
• Every symbol can have a value
– Might be a numeric quantity in case of a number (42)
– Might be a pointer to a string ("Hello, World!")
– Might be a pointer to a symbol table entry in case of a
variable
• When using LEX we put the value into yylval
– In complex situations yylval is a union
• Typical LEX code:
[0-9]+ {yylval = atoi(yytext); return NUM}
Symbol attributes (cont’d)
• YACC allows symbols to have multiple types of
value symbols
%union {
double dval;
int vblno;
char* strval;
}
Symbol attributes (cont’d)
%union {
double dval;
int vblno;
char* strval;
}
yacc -d
y.tab.h
…
extern YYSTYPE yylval;
[0-9]+ { yylval.vblno = atoi(yytext);
return NUM;}
[A-z]+ { yylval.strval =
strdup(yytext);
return STRING;}
LEX file
include “y.tab.h”
Precedence / Association
1. 1-2-3 = (1-2)-3? or 1-(2-3)?
Define ‘-’ operator is left-association.
2. 1-2*3 = 1-(2*3)
Define “*” operator is precedent to “-” operator
(1) 1 – 2 - 3
(2) 1 – 2 * 3
Precedence /
Association
expr : expr ‘+’ expr { $$ = $1 + $3; }
| expr ‘-’ expr { $$ = $1 - $3; }
| expr ‘*’ expr { $$ = $1 * $3; }
| expr ‘/’ expr { if($3==0)
yyerror(“divide 0”);
else
$$ = $1 / $3;
}
| ‘-’ expr %prec UMINUS {$$ = -$2; }
%left '+' '-'
%left '*' '/'
%noassoc UMINUS
Precedence / Association
%right ‘=‘
%left '<' '>' NE LE GE
%left '+' '-‘
%left '*' '/'
highest precedence
Big trick
Getting YACC & LEX to work together!
LEX & YACC
cc/
gcc
lex.yy.clex.yy.c
y.tab.cy.tab.c
a.outa.out
Building Example
• Suppose you have a lex file called scanner.l
and a yacc file called decl.y and want parser
• Steps to build...
lex scanner.l
yacc -d decl.y
gcc -c lex.yy.c y.tab.c
gcc -o parser lex.yy.o y.tab.o -ll
Note: scanner should include in the definitions
section: #include "y.tab.h"
YACC
• Rules may be recursive
• Rules may be ambiguous
• Uses bottom-up Shift/Reduce parsing
– Get a token
– Push onto stack
– Can it be reduced (How do we know?)
• If yes: Reduce using a rule
• If no: Get another token
• YACC cannot look ahead more than one token
Shift and reducing
stmt: stmt ‘;’ stmt
| NAME ‘=‘ exp
exp: exp ‘+’ exp
| exp ‘-’ exp
| NAME
| NUMBER
input:
a = 7; b = 3 + a + 2
stack:
<empty>
Shift and reducing
stmt: stmt ‘;’ stmt
| NAME ‘=‘ exp
exp: exp ‘+’ exp
| exp ‘-’ exp
| NAME
| NUMBER
input:
= 7; b = 3 + a + 2
stack:
NAME
SHIFT!
Shift and reducing
stmt: stmt ‘;’ stmt
| NAME ‘=‘ exp
exp: exp ‘+’ exp
| exp ‘-’ exp
| NAME
| NUMBER
input:
7; b = 3 + a + 2
stack:
NAME ‘=‘
SHIFT!
Shift and reducing
stmt: stmt ‘;’ stmt
| NAME ‘=‘ exp
exp: exp ‘+’ exp
| exp ‘-’ exp
| NAME
| NUMBER
input:
; b = 3 + a + 2
stack:
NAME ‘=‘ 7
SHIFT!
Shift and reducing
stmt: stmt ‘;’ stmt
| NAME ‘=‘ exp
exp: exp ‘+’ exp
| exp ‘-’ exp
| NAME
| NUMBER
input:
; b = 3 + a + 2
stack:
NAME ‘=‘ exp
REDUCE!
Shift and reducing
stmt: stmt ‘;’ stmt
| NAME ‘=‘ exp
exp: exp ‘+’ exp
| exp ‘-’ exp
| NAME
| NUMBER
input:
; b = 3 + a + 2
stack:
stmt
REDUCE!
Shift and reducing
stmt: stmt ‘;’ stmt
| NAME ‘=‘ exp
exp: exp ‘+’ exp
| exp ‘-’ exp
| NAME
| NUMBER
input:
b = 3 + a + 2
stack:
stmt ‘;’
SHIFT!
Shift and reducing
stmt: stmt ‘;’ stmt
| NAME ‘=‘ exp
exp: exp ‘+’ exp
| exp ‘-’ exp
| NAME
| NUMBER
input:
= 3 + a + 2
stack:
stmt ‘;’ NAME
SHIFT!
Shift and reducing
stmt: stmt ‘;’ stmt
| NAME ‘=‘ exp
exp: exp ‘+’ exp
| exp ‘-’ exp
| NAME
| NUMBER
input:
3 + a + 2
stack:
stmt ‘;’ NAME ‘=‘
SHIFT!
Shift and reducing
stmt: stmt ‘;’ stmt
| NAME ‘=‘ exp
exp: exp ‘+’ exp
| exp ‘-’ exp
| NAME
| NUMBER
input:
+ a + 2
stack:
stmt ‘;’ NAME ‘=‘
NUMBER
SHIFT!
Shift and reducing
stmt: stmt ‘;’ stmt
| NAME ‘=‘ exp
exp: exp ‘+’ exp
| exp ‘-’ exp
| NAME
| NUMBER
input:
+ a + 2
stack:
stmt ‘;’ NAME ‘=‘
exp
REDUCE!
Shift and reducing
stmt: stmt ‘;’ stmt
| NAME ‘=‘ exp
exp: exp ‘+’ exp
| exp ‘-’ exp
| NAME
| NUMBER
input:
a + 2
stack:
stmt ‘;’ NAME ‘=‘
exp ‘+’
SHIFT!
Shift and reducing
stmt: stmt ‘;’ stmt
| NAME ‘=‘ exp
exp: exp ‘+’ exp
| exp ‘-’ exp
| NAME
| NUMBER
input:
+ 2
stack:
stmt ‘;’ NAME ‘=‘
exp ‘+’ NAME
SHIFT!
Shift and reducing
stmt: stmt ‘;’ stmt
| NAME ‘=‘ exp
exp: exp ‘+’ exp
| exp ‘-’ exp
| NAME
| NUMBER
input:
+ 2
stack:
stmt ‘;’ NAME ‘=‘
exp ‘+’ exp
REDUCE!
Shift and reducing
stmt: stmt ‘;’ stmt
| NAME ‘=‘ exp
exp: exp ‘+’ exp
| exp ‘-’ exp
| NAME
| NUMBER
input:
+ 2
stack:
stmt ‘;’ NAME ‘=‘
exp
REDUCE!
Shift and reducing
stmt: stmt ‘;’ stmt
| NAME ‘=‘ exp
exp: exp ‘+’ exp
| exp ‘-’ exp
| NAME
| NUMBER
input:
2
stack:
stmt ‘;’ NAME ‘=‘
exp ‘+’
SHIFT!
Shift and reducing
stmt: stmt ‘;’ stmt
| NAME ‘=‘ exp
exp: exp ‘+’ exp
| exp ‘-’ exp
| NAME
| NUMBER
input:
<empty>
stack:
stmt ‘;’ NAME ‘=‘
exp ‘+’ NUMBER
SHIFT!
Shift and reducing
stmt: stmt ‘;’ stmt
| NAME ‘=‘ exp
exp: exp ‘+’ exp
| exp ‘-’ exp
| NAME
| NUMBER
input:
<empty>
stack:
stmt ‘;’ NAME ‘=‘
exp ‘+’ exp
REDUCE!
Shift and reducing
stmt: stmt ‘;’ stmt
| NAME ‘=‘ exp
exp: exp ‘+’ exp
| exp ‘-’ exp
| NAME
| NUMBER
input:
<empty>
stack:
stmt ‘;’ NAME ‘=‘
exp
REDUCE!
Shift and reducing
stmt: stmt ‘;’ stmt
| NAME ‘=‘ exp
exp: exp ‘+’ exp
| exp ‘-’ exp
| NAME
| NUMBER
input:
<empty>
stack:
stmt ‘;’ stmt
REDUCE!
Shift and reducing
stmt: stmt ‘;’ stmt
| NAME ‘=‘ exp
exp: exp ‘+’ exp
| exp ‘-’ exp
| NAME
| NUMBER
input:
<empty>
stack:
stmt
REDUCE!
Shift and reducing
stmt: stmt ‘;’ stmt
| NAME ‘=‘ exp
exp: exp ‘+’ exp
| exp ‘-’ exp
| NAME
| NUMBER
input:
<empty>
stack:
stmt
DONE!
IF-ELSE Ambiguity
• Consider following rule:
Following state : IF expr IF expr stmt . ELSE stmt
• Two possible derivations:
IF expr IF expr stmt . ELSE stmt
IF expr IF expr stmt ELSE . stmt
IF expr IF expr stmt ELSE stmt .
IF expr stmt
IF expr IF expr stmt . ELSE stmt
IF expr stmt . ELSE stmt
IF expr stmt ELSE . stmt
IF expr stmt ELSE stmt .
IF-ELSE Ambiguity
• It is a shift/reduce conflict
• YACC will always do shift first
• Solution 1 : re-write grammar
stmt : matched
| unmatched
;
matched: other_stmt
| IF expr THEN matched ELSE matched
;
unmatched: IF expr THEN stmt
| IF expr THEN matched ELSE unmatched
;
• Solution 2:
IF-ELSE Ambiguity
the rule has the
same precedence as
token IFX
Shift/Reduce Conflicts
• shift/reduce conflict
– occurs when a grammar is written in such a way that
a decision between shifting and reducing can not be
made.
– e.g.: IF-ELSE ambiguity
• To resolve this conflict, YACC will choose to shift
Reduce/Reduce Conflicts
• Reduce/Reduce Conflicts:
start : expr | stmt
;
expr : CONSTANT;
stmt : CONSTANT;
• YACC (Bison) resolves the conflict by
reducing using the rule that occurs earlier in
the grammar. NOT GOOD!!
• So, modify grammar to eliminate them
Error Messages
• Bad error message:
– Syntax error
– Compiler needs to give programmer a good advice
• It is better to track the line number in LEX:
void yyerror(char *s)
{
fprintf(stderr, "line %d: %sn:", yylineno, s);
}
Recursive Grammar
• Left recursion
• Right recursion
• LR parser prefers left recursion
• LL parser prefers right recursion
list:
item
| list ',' item
;
list:
item
| item ',' list
;
YACC Example
• Taken from LEX & YACC
• Simple calculator
a = 4 + 6
a
a=10
b = 7
c = a + b
c
c = 17
pressure = (78 + 34) * 16.4
$
Grammar
expression ::= expression '+' term |
expression '-' term |
term
term ::= term '*' factor |
term '/' factor |
factor
factor ::= '(' expression ')' |
'-' factor |
NUMBER |
NAME
parser.h
/*
*Header for calculator program
*/
#define NSYMS 20 /* maximum number
of symbols */
struct symtab {
char *name;
double value;
} symtab[NSYMS];
struct symtab *symlook();
parser.h
name value0
name value1
name value2
name value3
name value4
name value5
name value6
name value7
name value8
name value9
name value10
name value11
name value12
name value13
name value14
parser.y
%{
#include "parser.h"
#include <string.h>
%}
%union {
double dval;
struct symtab *symp;
}
%token <symp> NAME
%token <dval> NUMBER
%type <dval> expression
%type <dval> term
%type <dval> factor
%%
parser.y
statement_list: statement 'n'
| statement_list statement 'n‘
;
statement: NAME '=' expression { $1->value = $3; }
| expression { printf("= %gn", $1); }
;
expression: expression '+' term { $$ = $1 + $3; }
| expression '-' term { $$ = $1 - $3; }
term
;
parser.y
term: term '*' factor { $$ = $1 * $3; }
| term '/' factor { if($3 == 0.0)
yyerror("divide by zero");
else
$$ = $1 / $3;
}
| factor
;
factor: '(' expression ')' { $$ = $2; }
| '-' factor { $$ = -$2; }
| NUMBER
| NAME { $$ = $1->value; }
;
%%
parser.y
/* look up a symbol table entry, add if not present */
struct symtab *symlook(char *s) {
char *p;
struct symtab *sp;
for(sp = symtab; sp < &symtab[NSYMS]; sp++) {
/* is it already here? */
if(sp->name && !strcmp(sp->name, s))
return sp;
if(!sp->name) { /* is it free */
sp->name = strdup(s);
return sp;
}
/* otherwise continue to next */
}
yyerror("Too many symbols");
exit(1); /* cannot continue */
} /* symlook */ parser.y
yyerror(char *s)
{
printf( "yyerror: %sn", s);
}
parser.y
typedef union
{
double dval;
struct symtab *symp;
} YYSTYPE;
extern YYSTYPE yylval;
# define NAME 257
# define NUMBER 258
y.tab.h
calclexer.l
%{
#include "y.tab.h"
#include "parser.h"
#include <math.h>
%}
%%
calclexer.l
%%
([0-9]+|([0-9]*.[0-9]+)([eE][-+]?[0-9]+)?) {
yylval.dval = atof(yytext);
return NUMBER;
}
[ t] ; /* ignore white space */
[A-Za-z][A-Za-z0-9]* { /* return symbol pointer */
yylval.symp = symlook(yytext);
return NAME;
}
"$" { return 0; /* end of input */ }
n|. return yytext[0];
%% calclexer.l
Makefile
Makefile
LEX = lex
YACC = yacc
CC = gcc
calcu: y.tab.o lex.yy.o
$(CC) -o calcu y.tab.o lex.yy.o -ly -ll
y.tab.c y.tab.h: parser.y
$(YACC) -d parser.y
y.tab.o: y.tab.c parser.h
$(CC) -c y.tab.c
lex.yy.o: y.tab.h lex.yy.c
$(CC) -c lex.yy.c
lex.yy.c: calclexer.l parser.h
$(LEX) calclexer.l
clean:
rm *.o
rm *.c
rm calcu
YACC Declaration
Summary
`%start' Specify the grammar's start symbol
`%union‘ Declare the collection of data types that
semantic values may have
`%token‘ Declare a terminal symbol (token type
name) with no precedence or associativity specified
`%type‘ Declare the type of semantic values for a
nonterminal symbol
YACC Declaration Summary
`%right‘ Declare a terminal symbol (token type name)
that is right-associative
`%left‘ Declare a terminal symbol (token type name)
that is left-associative
`%nonassoc‘ Declare a terminal symbol (token type
name) that is nonassociative (using it in a way that
would be associative is a syntax error, e.g.:
x op. y op. z is syntax error)

Contenu connexe

Tendances

Lex and Yacc ppt
Lex and Yacc pptLex and Yacc ppt
Lex and Yacc pptpssraikar
 
Introduction of bison
Introduction of bisonIntroduction of bison
Introduction of bisonvip_du
 
Yacc topic beyond syllabus
Yacc   topic beyond syllabusYacc   topic beyond syllabus
Yacc topic beyond syllabusJK Knowledge
 
More on Lex
More on LexMore on Lex
More on LexTech_MX
 
Lexical analyzer generator lex
Lexical analyzer generator lexLexical analyzer generator lex
Lexical analyzer generator lexAnusuya123
 
Compiler Construction | Lecture 13 | Code Generation
Compiler Construction | Lecture 13 | Code GenerationCompiler Construction | Lecture 13 | Code Generation
Compiler Construction | Lecture 13 | Code GenerationEelco Visser
 
Compiler Construction | Lecture 12 | Virtual Machines
Compiler Construction | Lecture 12 | Virtual MachinesCompiler Construction | Lecture 12 | Virtual Machines
Compiler Construction | Lecture 12 | Virtual MachinesEelco Visser
 
Compiler Engineering Lab#5 : Symbol Table, Flex Tool
Compiler Engineering Lab#5 : Symbol Table, Flex ToolCompiler Engineering Lab#5 : Symbol Table, Flex Tool
Compiler Engineering Lab#5 : Symbol Table, Flex ToolMashaelQ
 
CS4200 2019 | Lecture 4 | Syntactic Services
CS4200 2019 | Lecture 4 | Syntactic ServicesCS4200 2019 | Lecture 4 | Syntactic Services
CS4200 2019 | Lecture 4 | Syntactic ServicesEelco Visser
 
system software
system software system software
system software randhirlpu
 
Compiler Construction | Lecture 3 | Syntactic Editor Services
Compiler Construction | Lecture 3 | Syntactic Editor ServicesCompiler Construction | Lecture 3 | Syntactic Editor Services
Compiler Construction | Lecture 3 | Syntactic Editor ServicesEelco Visser
 
6 compiler lab - Flex
6 compiler lab - Flex6 compiler lab - Flex
6 compiler lab - FlexMashaelQ
 
Generating parsers using Ragel and Lemon
Generating parsers using Ragel and LemonGenerating parsers using Ragel and Lemon
Generating parsers using Ragel and LemonTristan Penman
 
02. chapter 3 lexical analysis
02. chapter 3   lexical analysis02. chapter 3   lexical analysis
02. chapter 3 lexical analysisraosir123
 

Tendances (20)

Lex and Yacc ppt
Lex and Yacc pptLex and Yacc ppt
Lex and Yacc ppt
 
Introduction of bison
Introduction of bisonIntroduction of bison
Introduction of bison
 
Yacc topic beyond syllabus
Yacc   topic beyond syllabusYacc   topic beyond syllabus
Yacc topic beyond syllabus
 
Lex Tool
Lex ToolLex Tool
Lex Tool
 
More on Lex
More on LexMore on Lex
More on Lex
 
Lex & yacc
Lex & yaccLex & yacc
Lex & yacc
 
Lexical analyzer generator lex
Lexical analyzer generator lexLexical analyzer generator lex
Lexical analyzer generator lex
 
Yacc
YaccYacc
Yacc
 
LEX & YACC
LEX & YACCLEX & YACC
LEX & YACC
 
Lex
LexLex
Lex
 
Compiler Construction | Lecture 13 | Code Generation
Compiler Construction | Lecture 13 | Code GenerationCompiler Construction | Lecture 13 | Code Generation
Compiler Construction | Lecture 13 | Code Generation
 
Compiler Construction | Lecture 12 | Virtual Machines
Compiler Construction | Lecture 12 | Virtual MachinesCompiler Construction | Lecture 12 | Virtual Machines
Compiler Construction | Lecture 12 | Virtual Machines
 
Compiler Engineering Lab#5 : Symbol Table, Flex Tool
Compiler Engineering Lab#5 : Symbol Table, Flex ToolCompiler Engineering Lab#5 : Symbol Table, Flex Tool
Compiler Engineering Lab#5 : Symbol Table, Flex Tool
 
CS4200 2019 | Lecture 4 | Syntactic Services
CS4200 2019 | Lecture 4 | Syntactic ServicesCS4200 2019 | Lecture 4 | Syntactic Services
CS4200 2019 | Lecture 4 | Syntactic Services
 
system software
system software system software
system software
 
Compiler Construction | Lecture 3 | Syntactic Editor Services
Compiler Construction | Lecture 3 | Syntactic Editor ServicesCompiler Construction | Lecture 3 | Syntactic Editor Services
Compiler Construction | Lecture 3 | Syntactic Editor Services
 
6 compiler lab - Flex
6 compiler lab - Flex6 compiler lab - Flex
6 compiler lab - Flex
 
Generating parsers using Ragel and Lemon
Generating parsers using Ragel and LemonGenerating parsers using Ragel and Lemon
Generating parsers using Ragel and Lemon
 
C++ How to program
C++ How to programC++ How to program
C++ How to program
 
02. chapter 3 lexical analysis
02. chapter 3   lexical analysis02. chapter 3   lexical analysis
02. chapter 3 lexical analysis
 

Similaire à Lexyacc

Scala Refactoring for Fun and Profit
Scala Refactoring for Fun and ProfitScala Refactoring for Fun and Profit
Scala Refactoring for Fun and ProfitTomer Gabel
 
The Fuss about || Haskell | Scala | F# ||
The Fuss about || Haskell | Scala | F# ||The Fuss about || Haskell | Scala | F# ||
The Fuss about || Haskell | Scala | F# ||Ashwin Rao
 
yacc installation & sample program exe.ppt
yacc installation & sample program exe.pptyacc installation & sample program exe.ppt
yacc installation & sample program exe.pptKoppala Guravaiah
 
ScalaCheck
ScalaCheckScalaCheck
ScalaCheckBeScala
 
Speaking Scala: Refactoring for Fun and Profit (Workshop)
Speaking Scala: Refactoring for Fun and Profit (Workshop)Speaking Scala: Refactoring for Fun and Profit (Workshop)
Speaking Scala: Refactoring for Fun and Profit (Workshop)Tomer Gabel
 
R for Pirates. ESCCONF October 27, 2011
R for Pirates. ESCCONF October 27, 2011R for Pirates. ESCCONF October 27, 2011
R for Pirates. ESCCONF October 27, 2011Mandi Walls
 
Introduction to Python for Plone developers
Introduction to Python for Plone developersIntroduction to Python for Plone developers
Introduction to Python for Plone developersJim Roepcke
 
Basic terminologies & asymptotic notations
Basic terminologies & asymptotic notationsBasic terminologies & asymptotic notations
Basic terminologies & asymptotic notationsRajendran
 
The Art Of Readable Code
The Art Of Readable CodeThe Art Of Readable Code
The Art Of Readable CodeBaidu, Inc.
 
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...GeeksLab Odessa
 
Programming in java basics
Programming in java  basicsProgramming in java  basics
Programming in java basicsLovelitJose
 
app4.pptx
app4.pptxapp4.pptx
app4.pptxsg4795
 
Python language data types
Python language data typesPython language data types
Python language data typesHarry Potter
 
Python language data types
Python language data typesPython language data types
Python language data typesHoang Nguyen
 
Python language data types
Python language data typesPython language data types
Python language data typesYoung Alista
 
Python language data types
Python language data typesPython language data types
Python language data typesLuis Goldster
 

Similaire à Lexyacc (20)

Compiler Design Tutorial
Compiler Design Tutorial Compiler Design Tutorial
Compiler Design Tutorial
 
Writing Parsers and Compilers with PLY
Writing Parsers and Compilers with PLYWriting Parsers and Compilers with PLY
Writing Parsers and Compilers with PLY
 
Should i Go there
Should i Go thereShould i Go there
Should i Go there
 
x86
x86x86
x86
 
Scala Refactoring for Fun and Profit
Scala Refactoring for Fun and ProfitScala Refactoring for Fun and Profit
Scala Refactoring for Fun and Profit
 
The Fuss about || Haskell | Scala | F# ||
The Fuss about || Haskell | Scala | F# ||The Fuss about || Haskell | Scala | F# ||
The Fuss about || Haskell | Scala | F# ||
 
yacc installation & sample program exe.ppt
yacc installation & sample program exe.pptyacc installation & sample program exe.ppt
yacc installation & sample program exe.ppt
 
ScalaCheck
ScalaCheckScalaCheck
ScalaCheck
 
Speaking Scala: Refactoring for Fun and Profit (Workshop)
Speaking Scala: Refactoring for Fun and Profit (Workshop)Speaking Scala: Refactoring for Fun and Profit (Workshop)
Speaking Scala: Refactoring for Fun and Profit (Workshop)
 
R for Pirates. ESCCONF October 27, 2011
R for Pirates. ESCCONF October 27, 2011R for Pirates. ESCCONF October 27, 2011
R for Pirates. ESCCONF October 27, 2011
 
Introduction to Python for Plone developers
Introduction to Python for Plone developersIntroduction to Python for Plone developers
Introduction to Python for Plone developers
 
Basic terminologies & asymptotic notations
Basic terminologies & asymptotic notationsBasic terminologies & asymptotic notations
Basic terminologies & asymptotic notations
 
The Art Of Readable Code
The Art Of Readable CodeThe Art Of Readable Code
The Art Of Readable Code
 
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
 
Programming in java basics
Programming in java  basicsProgramming in java  basics
Programming in java basics
 
app4.pptx
app4.pptxapp4.pptx
app4.pptx
 
Python language data types
Python language data typesPython language data types
Python language data types
 
Python language data types
Python language data typesPython language data types
Python language data types
 
Python language data types
Python language data typesPython language data types
Python language data types
 
Python language data types
Python language data typesPython language data types
Python language data types
 

Dernier

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Dernier (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Lexyacc

  • 1. Language processing: introduction to compiler construction Andy D. Pimentel Computer Systems Architecture group andy@science.uva.nl http://www.science.uva.nl/~andy/taalverwerking.html
  • 2. About this course • This part will address compilers for programming languages • Depth-first approach – Instead of covering all compiler aspects very briefly, we focus on particular compiler stages – Focus: optimization and compiler back issues • This course is complementary to the compiler course at the VU • Grading: (heavy) practical assignment and one or two take-home assignments
  • 3. About this course (cont’d) • Book – Recommended, not compulsory: Seti, Aho and Ullman,”Compilers Principles, Techniques and Tools” (the Dragon book) – Old book, but still more than sufficient – Copies of relevant chapters can be found in the library • Sheets are available at the website • Idem for practical/take-home assignments, deadlines, etc.
  • 4. Topics • Compiler introduction – General organization • Scanning & parsing – From a practical viewpoint: LEX and YACC • Intermediate formats • Optimization: techniques and algorithms – Local/peephole optimizations – Global and loop optimizations – Recognizing loops – Dataflow analysis – Alias analysis
  • 5. Topics (cont’d) • Code generation – Instruction selection – Register allocation – Instruction scheduling: improving ILP • Source-level optimizations – Optimizations for cache behavior
  • 7. Compilers: organization • Frontend – Dependent on source language – Lexical analysis – Parsing – Semantic analysis (e.g., type checking) Frontend Optimizer BackendSource Machine code IR IR
  • 8. Compilers: organization (cont’d) • Optimizer – Independent part of compiler – Different optimizations possible – IR to IR translation – Can be very computational intensive part Frontend Optimizer BackendSource Machine code IR IR
  • 9. Compilers: organization (cont’d) • Backend – Dependent on target processor – Code selection – Code scheduling – Register allocation – Peephole optimization Frontend Optimizer BackendSource Machine code IR IR
  • 11. Overview • Writing a compiler is difficult requiring lots of time and effort • Construction of the scanner and parser is routine enough that the process may be automated Lexical RulesLexical Rules GrammarGrammar SemanticsSemantics CompilerCompiler CompilerCompiler ScannerScanner ------------------ ParserParser ------------------ CodeCode generatorgenerator
  • 12. YACC • What is YACC ? – Tool which will produce a parser for a given grammar. – YACC (Yet Another Compiler Compiler) is a program designed to compile a LALR(1) grammar and to produce the source code of the syntactic analyzer of the language produced by this grammar – Input is a grammar (rules) and actions to take upon recognizing a rule – Output is a C program and optionally a header file of tokens
  • 13. LEX • Lex is a scanner generator – Input is description of patterns and actions – Output is a C program which contains a function yylex() which, when called, matches patterns and performs actions per input – Typically, the generated scanner performs lexical analysis and produces tokens for the (YACC-generated) parser
  • 14. LEX and YACC: a team How to work ?
  • 15. LEX and YACC: a team call yylex() [0-9]+ next token is NUM NUM ‘+’ NUM
  • 16. Availability • lex, yacc on most UNIX systems • bison: a yacc replacement from GNU • flex: fast lexical analyzer • BSD yacc • Windows/MS-DOS versions exist
  • 17. YACC Basic Operational Sequence a.out File containing desired grammar in YACC format YACC programYACC program C source program created by YACC C compilerC compiler Executable program that will parse grammar given in gram.y gram.y yacc y.tab.c cc or gcc
  • 18. YACC File Format Definitions %% Rules %% Supplementary Code The identical LEX format wasThe identical LEX format was actually taken from this...actually taken from this...
  • 19. Rules Section • Is a grammar • Example expr : expr '+' term | term; term : term '*' factor | factor; factor : '(' expr ')' | ID | NUM;
  • 20. Rules Section • Normally written like this • Example: expr : expr '+' term | term ; term : term '*' factor | factor ; factor : '(' expr ')' | ID | NUM ;
  • 21. Definitions Section Example %{ #include <stdio.h> #include <stdlib.h> %} %token ID NUM %start expr This is called a terminal The start symbol (non-terminal)
  • 22. Sidebar • LEX produces a function called yylex() • YACC produces a function called yyparse() • yyparse() expects to be able to call yylex() • How to get yylex()? • Write your own! • If you don't want to write your own: Use LEX!!!
  • 23. Sidebar int yylex() { if(it's a num) return NUM; else if(it's an id) return ID; else if(parsing is done) return 0; else if(it's an error) return -1; }
  • 24. Semantic actions expr : expr '+' term { $$ = $1 + $3; } | term { $$ = $1; } ; term : term '*' factor { $$ = $1 * $3; } | factor { $$ = $1; } ; factor : '(' expr ')' { $$ = $2; } | ID | NUM ;
  • 25. Semantic actions (cont’d) expr : exprexpr '+' term { $$ = $1 + $3; } | termterm { $$ = $1; } ; term : termterm '*' factor { $$ = $1 * $3; } | factorfactor { $$ = $1; } ; factor : '((' expr ')' { $$ = $2; } | ID | NUM ; $1$1
  • 26. Semantic actions (cont’d) expr : expr '++' term { $$ = $1 + $3; } | term { $$ = $1; } ; term : term '**' factor { $$ = $1 * $3; } | factor { $$ = $1; } ; factor : '(' exprexpr ')' { $$ = $2; } | ID | NUM ; $2$2
  • 27. Semantic actions (cont’d) expr : expr '+' termterm { $$ = $1 + $3; } | term { $$ = $1; } ; term : term '*' factorfactor { $$ = $1 * $3; } | factor { $$ = $1; } ; factor : '(' expr '))' { $$ = $2; } | ID | NUM ; $3$3 Default: $$ = $1;Default: $$ = $1;
  • 28. yacc -v gram.y • Will produce: y.output Bored, lonely? Try this! yacc -d gram.y • Will produce: y.tab.h Look at this and you'll never be unhappy again! Look at this and you'll never be unhappy again! Shows "State Machine"® Shows "State Machine"®
  • 29. Example: LEX %{ #include <stdio.h> #include "y.tab.h" %} id [_a-zA-Z][_a-zA-Z0-9]* wspc [ tn]+ semi [;] comma [,] %% int { return INT; } char { return CHAR; } float { return FLOAT; } {comma} { return COMMA; } /* Necessary? */ {semi} { return SEMI; } {id} { return ID;} {wspc} {;} scanner.lscanner.l
  • 30. Example: Definitions %{ #include <stdio.h> #include <stdlib.h> %} %start line %token CHAR, COMMA, FLOAT, ID, INT, SEMI %% decl.ydecl.y
  • 31. /* This production is not part of the "official" * grammar. It's primary purpose is to recover from * parser errors, so it's probably best if you leave * it here. */ line : /* lambda */ | line decl | line error { printf("Failure :-(n"); yyerrok; yyclearin; } ; Example: Rules decl.ydecl.y
  • 32. Example: Rules decl : type ID list { printf("Success!n"); } ; list : COMMA ID list | SEMI ; type : INT | CHAR | FLOAT ; %% decl.ydecl.y
  • 33. Example: Supplementary Code extern FILE *yyin; main() { do { yyparse(); } while(!feof(yyin)); } yyerror(char *s) { /* Don't have to do anything! */ } decl.ydecl.y
  • 34. Bored, lonely? Try this! yacc -d decl.y • Produced y.tab.h # define CHAR 257 # define COMMA 258 # define FLOAT 259 # define ID 260 # define INT 261 # define SEMI 262
  • 35. Symbol attributes • Back to attribute grammars... • Every symbol can have a value – Might be a numeric quantity in case of a number (42) – Might be a pointer to a string ("Hello, World!") – Might be a pointer to a symbol table entry in case of a variable • When using LEX we put the value into yylval – In complex situations yylval is a union • Typical LEX code: [0-9]+ {yylval = atoi(yytext); return NUM}
  • 36. Symbol attributes (cont’d) • YACC allows symbols to have multiple types of value symbols %union { double dval; int vblno; char* strval; }
  • 37. Symbol attributes (cont’d) %union { double dval; int vblno; char* strval; } yacc -d y.tab.h … extern YYSTYPE yylval; [0-9]+ { yylval.vblno = atoi(yytext); return NUM;} [A-z]+ { yylval.strval = strdup(yytext); return STRING;} LEX file include “y.tab.h”
  • 38. Precedence / Association 1. 1-2-3 = (1-2)-3? or 1-(2-3)? Define ‘-’ operator is left-association. 2. 1-2*3 = 1-(2*3) Define “*” operator is precedent to “-” operator (1) 1 – 2 - 3 (2) 1 – 2 * 3
  • 39. Precedence / Association expr : expr ‘+’ expr { $$ = $1 + $3; } | expr ‘-’ expr { $$ = $1 - $3; } | expr ‘*’ expr { $$ = $1 * $3; } | expr ‘/’ expr { if($3==0) yyerror(“divide 0”); else $$ = $1 / $3; } | ‘-’ expr %prec UMINUS {$$ = -$2; } %left '+' '-' %left '*' '/' %noassoc UMINUS
  • 40. Precedence / Association %right ‘=‘ %left '<' '>' NE LE GE %left '+' '-‘ %left '*' '/' highest precedence
  • 41. Big trick Getting YACC & LEX to work together!
  • 43. Building Example • Suppose you have a lex file called scanner.l and a yacc file called decl.y and want parser • Steps to build... lex scanner.l yacc -d decl.y gcc -c lex.yy.c y.tab.c gcc -o parser lex.yy.o y.tab.o -ll Note: scanner should include in the definitions section: #include "y.tab.h"
  • 44. YACC • Rules may be recursive • Rules may be ambiguous • Uses bottom-up Shift/Reduce parsing – Get a token – Push onto stack – Can it be reduced (How do we know?) • If yes: Reduce using a rule • If no: Get another token • YACC cannot look ahead more than one token
  • 45. Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: a = 7; b = 3 + a + 2 stack: <empty>
  • 46. Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: = 7; b = 3 + a + 2 stack: NAME SHIFT!
  • 47. Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: 7; b = 3 + a + 2 stack: NAME ‘=‘ SHIFT!
  • 48. Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: ; b = 3 + a + 2 stack: NAME ‘=‘ 7 SHIFT!
  • 49. Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: ; b = 3 + a + 2 stack: NAME ‘=‘ exp REDUCE!
  • 50. Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: ; b = 3 + a + 2 stack: stmt REDUCE!
  • 51. Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: b = 3 + a + 2 stack: stmt ‘;’ SHIFT!
  • 52. Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: = 3 + a + 2 stack: stmt ‘;’ NAME SHIFT!
  • 53. Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: 3 + a + 2 stack: stmt ‘;’ NAME ‘=‘ SHIFT!
  • 54. Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: + a + 2 stack: stmt ‘;’ NAME ‘=‘ NUMBER SHIFT!
  • 55. Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: + a + 2 stack: stmt ‘;’ NAME ‘=‘ exp REDUCE!
  • 56. Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: a + 2 stack: stmt ‘;’ NAME ‘=‘ exp ‘+’ SHIFT!
  • 57. Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: + 2 stack: stmt ‘;’ NAME ‘=‘ exp ‘+’ NAME SHIFT!
  • 58. Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: + 2 stack: stmt ‘;’ NAME ‘=‘ exp ‘+’ exp REDUCE!
  • 59. Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: + 2 stack: stmt ‘;’ NAME ‘=‘ exp REDUCE!
  • 60. Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: 2 stack: stmt ‘;’ NAME ‘=‘ exp ‘+’ SHIFT!
  • 61. Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: <empty> stack: stmt ‘;’ NAME ‘=‘ exp ‘+’ NUMBER SHIFT!
  • 62. Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: <empty> stack: stmt ‘;’ NAME ‘=‘ exp ‘+’ exp REDUCE!
  • 63. Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: <empty> stack: stmt ‘;’ NAME ‘=‘ exp REDUCE!
  • 64. Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: <empty> stack: stmt ‘;’ stmt REDUCE!
  • 65. Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: <empty> stack: stmt REDUCE!
  • 66. Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: <empty> stack: stmt DONE!
  • 67. IF-ELSE Ambiguity • Consider following rule: Following state : IF expr IF expr stmt . ELSE stmt • Two possible derivations: IF expr IF expr stmt . ELSE stmt IF expr IF expr stmt ELSE . stmt IF expr IF expr stmt ELSE stmt . IF expr stmt IF expr IF expr stmt . ELSE stmt IF expr stmt . ELSE stmt IF expr stmt ELSE . stmt IF expr stmt ELSE stmt .
  • 68. IF-ELSE Ambiguity • It is a shift/reduce conflict • YACC will always do shift first • Solution 1 : re-write grammar stmt : matched | unmatched ; matched: other_stmt | IF expr THEN matched ELSE matched ; unmatched: IF expr THEN stmt | IF expr THEN matched ELSE unmatched ;
  • 69. • Solution 2: IF-ELSE Ambiguity the rule has the same precedence as token IFX
  • 70. Shift/Reduce Conflicts • shift/reduce conflict – occurs when a grammar is written in such a way that a decision between shifting and reducing can not be made. – e.g.: IF-ELSE ambiguity • To resolve this conflict, YACC will choose to shift
  • 71. Reduce/Reduce Conflicts • Reduce/Reduce Conflicts: start : expr | stmt ; expr : CONSTANT; stmt : CONSTANT; • YACC (Bison) resolves the conflict by reducing using the rule that occurs earlier in the grammar. NOT GOOD!! • So, modify grammar to eliminate them
  • 72. Error Messages • Bad error message: – Syntax error – Compiler needs to give programmer a good advice • It is better to track the line number in LEX: void yyerror(char *s) { fprintf(stderr, "line %d: %sn:", yylineno, s); }
  • 73. Recursive Grammar • Left recursion • Right recursion • LR parser prefers left recursion • LL parser prefers right recursion list: item | list ',' item ; list: item | item ',' list ;
  • 74. YACC Example • Taken from LEX & YACC • Simple calculator a = 4 + 6 a a=10 b = 7 c = a + b c c = 17 pressure = (78 + 34) * 16.4 $
  • 75. Grammar expression ::= expression '+' term | expression '-' term | term term ::= term '*' factor | term '/' factor | factor factor ::= '(' expression ')' | '-' factor | NUMBER | NAME
  • 77. /* *Header for calculator program */ #define NSYMS 20 /* maximum number of symbols */ struct symtab { char *name; double value; } symtab[NSYMS]; struct symtab *symlook(); parser.h name value0 name value1 name value2 name value3 name value4 name value5 name value6 name value7 name value8 name value9 name value10 name value11 name value12 name value13 name value14
  • 79. %{ #include "parser.h" #include <string.h> %} %union { double dval; struct symtab *symp; } %token <symp> NAME %token <dval> NUMBER %type <dval> expression %type <dval> term %type <dval> factor %% parser.y
  • 80. statement_list: statement 'n' | statement_list statement 'n‘ ; statement: NAME '=' expression { $1->value = $3; } | expression { printf("= %gn", $1); } ; expression: expression '+' term { $$ = $1 + $3; } | expression '-' term { $$ = $1 - $3; } term ; parser.y
  • 81. term: term '*' factor { $$ = $1 * $3; } | term '/' factor { if($3 == 0.0) yyerror("divide by zero"); else $$ = $1 / $3; } | factor ; factor: '(' expression ')' { $$ = $2; } | '-' factor { $$ = -$2; } | NUMBER | NAME { $$ = $1->value; } ; %% parser.y
  • 82. /* look up a symbol table entry, add if not present */ struct symtab *symlook(char *s) { char *p; struct symtab *sp; for(sp = symtab; sp < &symtab[NSYMS]; sp++) { /* is it already here? */ if(sp->name && !strcmp(sp->name, s)) return sp; if(!sp->name) { /* is it free */ sp->name = strdup(s); return sp; } /* otherwise continue to next */ } yyerror("Too many symbols"); exit(1); /* cannot continue */ } /* symlook */ parser.y
  • 83. yyerror(char *s) { printf( "yyerror: %sn", s); } parser.y
  • 84. typedef union { double dval; struct symtab *symp; } YYSTYPE; extern YYSTYPE yylval; # define NAME 257 # define NUMBER 258 y.tab.h
  • 87. %% ([0-9]+|([0-9]*.[0-9]+)([eE][-+]?[0-9]+)?) { yylval.dval = atof(yytext); return NUMBER; } [ t] ; /* ignore white space */ [A-Za-z][A-Za-z0-9]* { /* return symbol pointer */ yylval.symp = symlook(yytext); return NAME; } "$" { return 0; /* end of input */ } n|. return yytext[0]; %% calclexer.l
  • 89. Makefile LEX = lex YACC = yacc CC = gcc calcu: y.tab.o lex.yy.o $(CC) -o calcu y.tab.o lex.yy.o -ly -ll y.tab.c y.tab.h: parser.y $(YACC) -d parser.y y.tab.o: y.tab.c parser.h $(CC) -c y.tab.c lex.yy.o: y.tab.h lex.yy.c $(CC) -c lex.yy.c lex.yy.c: calclexer.l parser.h $(LEX) calclexer.l clean: rm *.o rm *.c rm calcu
  • 90. YACC Declaration Summary `%start' Specify the grammar's start symbol `%union‘ Declare the collection of data types that semantic values may have `%token‘ Declare a terminal symbol (token type name) with no precedence or associativity specified `%type‘ Declare the type of semantic values for a nonterminal symbol
  • 91. YACC Declaration Summary `%right‘ Declare a terminal symbol (token type name) that is right-associative `%left‘ Declare a terminal symbol (token type name) that is left-associative `%nonassoc‘ Declare a terminal symbol (token type name) that is nonassociative (using it in a way that would be associative is a syntax error, e.g.: x op. y op. z is syntax error)