SlideShare une entreprise Scribd logo
1  sur  64
Lex Yacc tutorial 
Kun-Yuan Hsieh 
kyshieh@pllab.cs.nthu.edu.tw 
Programming Language Lab., NTHU 
PLLab, NTHU,Cs2403 Programming 
Languages 
1
PLLab, NTHU,Cs2403 Programming 
Languages 
2 
Overview 
take a glance at Lex!
Compilation Sequence 
PLLab, NTHU,Cs2403 Programming 
Languages 
3
PLLab, NTHU,Cs2403 Programming 
Languages 
4 
What is Lex? 
• The main job of a lexical analyzer 
(scanner) is to break up an input stream 
into more usable elements (tokens) 
a = b + c * d; 
ID ASSIGN ID PLUS ID MULT ID SEMI 
• Lex is an utility to help you rapidly 
generate your scanners
Lex – Lexical Analyzer 
• Lexical analyzers tokenize input streams 
• Tokens are the terminals of a language 
– English 
PLLab, NTHU,Cs2403 Programming 
Languages 
5 
• words, punctuation marks, … 
– Programming language 
• Identifiers, operators, keywords, … 
• Regular expressions define 
terminals/tokens
Lex Source Program 
PLLab, NTHU,Cs2403 Programming 
Languages 
6 
• Lex source is a table of 
– regular expressions and 
– corresponding program fragments 
digit [0-9] 
letter [a-zA-Z] 
%% 
{letter}({letter}|{digit})* printf(“id: %sn”, yytext); 
n printf(“new linen”); 
%% 
main() { 
yylex(); 
}
Lex Source to C Program 
• The table is translated to a C program 
(lex.yy.c) which 
– reads an input stream 
– partitioning the input into strings which 
match the given expressions and 
– copying it to an output stream if necessary 
PLLab, NTHU,Cs2403 Programming 
Languages 
7
An Overview of Lex 
PLLab, NTHU,Cs2403 Programming 
Languages 
8 
Lex 
C compiler 
a.out 
Lex source 
program 
lex.yy.c 
input 
lex.yy.c 
a.out 
tokens
(required) 
(optional) 
{definitions} 
%% 
{transition rules} 
%% 
{user subroutines} 
PLLab, NTHU,Cs2403 Programming 
Languages 
9 
Lex Source 
• Lex source is separated into three sections by % 
% delimiters 
• The general format of Lex source is 
• The %% 
absolute minimum Lex program is thus
PLLab, NTHU,Cs2403 Programming 
Languages 
10 
Lex v.s. Yacc 
• Lex 
– Lex generates C code for a lexical analyzer, or 
scanner 
– Lex uses patterns that match strings in the input 
and converts the strings to tokens 
• Yacc 
– Yacc generates C code for syntax analyzer, or 
parser. 
– Yacc uses grammar rules that allow it to analyze 
tokens from Lex and create a syntax tree.
Lex source 
(Lexical Rules) 
Yacc source 
(Grammar Rules) 
call 
Input Parsed 
PLLab, NTHU,Cs2403 Programming 
Languages 
11 
Lex with Yacc 
Lex Yacc 
yylex() yyparse() 
Input 
lex.yy.c y.tab.c 
return token
Regular Expressions 
PLLab, NTHU,Cs2403 Programming 
Languages 
12
Lex Regular Expressions 
(Extended Regular Expressions) 
• A regular expression matches a set of strings 
• Regular expression 
PLLab, NTHU,Cs2403 Programming 
Languages 
13 
– Operators 
– Character classes 
– Arbitrary character 
– Optional expressions 
– Alternation and grouping 
– Context sensitivity 
– Repetitions and definitions
PLLab, NTHU,Cs2403 Programming 
Languages 
14 
Operators 
“  [ ] ^ - ? . * + | ( ) $ / { } % < > 
• If they are to be used as text characters, an 
escape should be used 
$ = “$” 
 = “” 
• Every character but blank, tab (t), newline (n) 
and the list above is always a text character
Character Classes [] 
• [abc] matches a single character, which may 
be a, b, or c 
• Every operator meaning is ignored except  - 
and ^ 
• e.g. 
[ab] => a or b 
[a-z] => a or b or c or … or z 
[-+0-9] => all the digits and the two signs 
[^a-zA-Z] => any character which is not a 
PLLab, NTHU,Cs2403 Programming 
Languages 
15 
letter
Arbitrary Character . 
• To match almost character, the 
operator character . is the class of all 
characters except newline 
• [40-176] matches all printable 
characters in the ASCII character set, 
from octal 40 (blank) to octal 176 
(tilde~) 
PLLab, NTHU,Cs2403 Programming 
Languages 
16
Optional & Repeated 
PLLab, NTHU,Cs2403 Programming 
Languages 
17 
Expressions 
• a? => zero or one instance of a 
• a* => zero or more instances of a 
• a+ => one or more instances of a 
• E.g. 
ab?c => ac or abc 
[a-z]+ => all strings of lower case letters 
[a-zA-Z][a-zA-Z0-9]* => all 
alphanumeric strings with a leading 
alphabetic character
Precedence of Operators 
• Level of precedence 
– Kleene closure (*), ?, + 
– concatenation 
– alternation (|) 
• All operators are left associative. 
• Ex: a*b|cd* = ((a*)b)|(c(d*)) 
PLLab, NTHU,Cs2403 Programming 
Languages 
18
Pattern Matching Primitives 
Metacharacter Matches 
. any character except newline 
n newline 
* zero or more copies of the preceding expression 
+ one or more copies of the preceding expression 
? zero or one copy of the preceding expression 
^ beginning of line / complement 
$ end of line 
a|b a or b 
(ab)+ one or more copies of ab (grouping) 
[ab] a or b 
a{3} 3 instances of a 
“a+b” literal “a+b” (C escapes still work) 
PLLab, NTHU,Cs2403 Programming 
Languages 
19
Recall: Lex Source 
a = b + c; 
a operator: ASSIGNMENT b + c; 
PLLab, NTHU,Cs2403 Programming 
Languages 
20 
• Lex source is a table of 
– regular expressions and 
– corresponding program fragments (actions) 
… 
%% 
<regexp> <action> 
<regexp> <action> 
… 
%% 
%% 
“=“ printf(“operator: ASSIGNMENT”);
Transition Rules 
• regexp <one or more blanks> action (C code); 
• regexp <one or more blanks> { actions (C code) } 
• A null statement ; will ignore the input (no 
actions) 
[ tn] ; 
– Causes the three spacing characters to be ignored a = b + c; 
PLLab, NTHU,Cs2403 Programming 
Languages 
21 
d = b * c; 
↓ ↓ 
a=b+c;d=b*c;
Transition Rules (cont’d) 
• Four special options for actions: 
|, ECHO;, BEGIN, and REJECT; 
• | indicates that the action for this rule is from 
the action for the next rule 
PLLab, NTHU,Cs2403 Programming 
Languages 
22 
– [ tn] ; 
– “ “ | 
“t” | 
“n” ; 
• The unmatched token is using a default 
action that ECHO from the input to the output
Transition Rules (cont’d) 
PLLab, NTHU,Cs2403 Programming 
Languages 
23 
• REJECT 
– Go do the next alternative 
… 
%% 
pink {npink++; REJECT;} 
ink {nink++; REJECT;} 
pin {npin++; REJECT;} 
. | 
n ; 
%% 
…
Lex Predefined Variables 
PLLab, NTHU,Cs2403 Programming 
Languages 
24 
• yytext -- a string containing the lexeme 
• yyleng -- the length of the lexeme 
• yyin -- the input stream pointer 
– the default input of default main() is stdin 
• yyout -- the output stream pointer 
– the default output of default main() is stdout. 
• cs20: %./a.out < inputfile > outfile 
• E.g. 
[a-z]+ printf(“%s”, yytext); 
[a-z]+ ECHO; 
[a-zA-Z]+ {words++; chars += yyleng;}
Lex Library Routines 
PLLab, NTHU,Cs2403 Programming 
Languages 
25 
• yylex() 
– The default main() contains a call of yylex() 
• yymore() 
– return the next token 
• yyless(n) 
– retain the first n characters in yytext 
• yywarp() 
– is called whenever Lex reaches an end-of-file 
– The default yywarp() always returns 1
Review of Lex Predefined 
PLLab, NTHU,Cs2403 Programming 
Languages 
26 
Variables 
Name Function 
char *yytext pointer to matched string 
int yyleng length of matched string 
FILE *yyin input stream pointer 
FILE *yyout output stream pointer 
int yylex(void) call to invoke lexer, returns token 
char* yymore(void) return the next token 
int yyless(int n) retain the first n characters in yytext 
int yywrap(void) wrapup, return 1 if done, 0 if not done 
ECHO write matched string 
REJECT go to the next alternative rule 
INITAL initial start condition 
BEGIN condition switch start condition
User Subroutines Section 
• You can use your Lex routines in the same 
ways you use routines in other programming 
languages. 
PLLab, NTHU,Cs2403 Programming 
Languages 
27 
%{ 
void foo(); 
%} 
letter [a-zA-Z] 
%% 
{letter}+ foo(); 
%% 
… 
void foo() { 
… 
}
User Subroutines Section 
PLLab, NTHU,Cs2403 Programming 
Languages 
28 
(cont’d) 
• The section where main() is placed 
%{ 
int counter = 0; 
%} 
letter [a-zA-Z] 
%% 
{letter}+ {printf(“a wordn”); counter++;} 
%% 
main() { 
yylex(); 
printf(“There are total %d wordsn”, counter); 
}
PLLab, NTHU,Cs2403 Programming 
Languages 
29 
Usage 
• To run Lex on a source file, type 
lex scanner.l 
• It produces a file named lex.yy.c which is 
a C program for the lexical analyzer. 
• To compile lex.yy.c, type 
cc lex.yy.c –ll 
• To run the lexical analyzer program, type 
./a.out < inputfile
PLLab, NTHU,Cs2403 Programming 
Languages 
30 
Versions of Lex 
• AT&T -- lex 
http://www.combo.org/lex_yacc_page/lex.html 
• GNU -- flex 
http://www.gnu.org/manual/flex-2.5.4/flex.html 
• a Win32 version of flex : 
http://www.monmouth.com/~wstreett/lex-yacc/lex-yacc.html 
or Cygwin : 
http://sources.redhat.com/cygwin/ 
• Lex on different machines is not created equal.
Yacc - Yet Another Compiler- 
PLLab, NTHU,Cs2403 Programming 
Languages 
31 
Compiler
PLLab, NTHU,Cs2403 Programming 
Languages 
32 
Introduction 
• What is YACC ? 
– Tool which will produce a parser for a 
given grammar. 
– YACC (Yet Another Compiler Compiler) 
is a program designed to compile a 
LALR(1) grammar and to produce the 
source code of the syntactic analyzer of 
the language produced by this grammar.
How YACC Works 
PLLab, NTHU,Cs2403 Programming 
Languages 
33 
a.out 
File containing desired 
grammar in yacc format 
yyaacccc pprrooggrraamm 
C source program created by yacc 
CC ccoommppiilleerr 
Executable program that will parse 
grammar given in gram.y 
gram.y 
yacc 
y.tab.c 
cc 
or gcc
How YACC Works 
y.output 
y.tab.c a.out 
PLLab, NTHU,Cs2403 Programming 
Languages 
34 
yacc 
(1) Parser generation time 
YACC source (*.y) 
y.tab.h 
y.tab.c 
C compiler/linker 
(2) Compile time 
a.out 
(3) Run time 
Token stream 
Abstract 
Syntax 
Tree
An YACC File Example 
PLLab, NTHU,Cs2403 Programming 
Languages 
35 
%{ 
#include <stdio.h> 
%} 
%token NAME NUMBER 
%% 
statement: NAME '=' expression 
| expression { printf("= %dn", $1); } 
; 
expression: expression '+' NUMBER { $$ = $1 + $3; } 
| expression '-' NUMBER { $$ = $1 - $3; } 
| NUMBER { $$ = $1; } 
; 
%% 
int yyerror(char *s) 
{ 
fprintf(stderr, "%sn", s); 
return 0; 
} 
int main(void) 
{ 
yyparse(); 
return 0; 
}
PLLab, NTHU,Cs2403 Programming 
Languages 
36 
Works with Lex 
How to 
work ?
PLLab, NTHU,Cs2403 Programming 
Languages 
37 
Works with Lex 
call yylex() [0-9]+ 
next token is NUM 
NUM ‘+’ NUM
YACC File Format 
PLLab, NTHU,Cs2403 Programming 
Languages 
38 
%{ 
C declarations 
%} 
yacc declarations 
%% 
Grammar rules 
%% 
Additional C code 
– Comments enclosed in /* ... */ may appear in 
any of the sections.
Definitions Section 
PLLab, NTHU,Cs2403 Programming 
Languages 
39 
%{ 
#include <stdio.h> 
#include <stdlib.h> 
%} 
%token ID NUM 
%start expr 
It is a terminal 
由 expr 開始 
parse
PLLab, NTHU,Cs2403 Programming 
Languages 
40 
Start Symbol 
• The first non-terminal specified in the 
grammar specification section. 
• To overwrite it with %start declaraction. 
%start non-terminal
PLLab, NTHU,Cs2403 Programming 
Languages 
41 
Rules Section 
• This section defines grammar 
• Example 
expr : expr '+' term | term; 
term : term '*' factor | factor; 
factor : '(' expr ')' | ID | NUM;
PLLab, NTHU,Cs2403 Programming 
Languages 
42 
Rules Section 
• Normally written like this 
• Example: 
expr : expr '+' term 
| term 
; 
term : term '*' factor 
| factor 
; 
factor : '(' expr ')' 
| ID 
| NUM 
;
The Position of Rules 
expr : expr '+' term { $$ = $1 + $3; } 
| term { $$ = $1; } 
; 
term : term '*' factor { $$ = $1 * $3; } 
| factor { $$ = $1; } 
; 
factor : '(' expr ')' { $$ = $2; } 
PLLab, NTHU,Cs2403 Programming 
Languages 
43 
| ID 
| NUM 
;
The Position of Rules 
expr : eexxpprr '+' term { $$ = $1 + $3; } 
| tteerrmm { $$ = $1; } 
; 
term : tteerrmm '*' factor { $$ = $1 * $3; } 
| ffaaccttoorr { $$ = $1; } 
; 
factor : '((' expr ')' { $$ = $2; } 
PLLab, NTHU,Cs2403 Programming 
Languages 
44 
| ID 
| NUM 
; 
$$11
The Position of Rules 
expr : expr '++' term { $$ = $1 + $3; } 
| term { $$ = $1; } 
; 
term : term '**' factor { $$ = $1 * $3; } 
| factor { $$ = $1; } 
; 
factor : '(' eexxpprr ')' { $$ = $2; } 
PLLab, NTHU,Cs2403 Programming 
Languages 
45 
| ID 
| NUM 
; $$22
The Position of Rules 
expr : expr '+' tteerrmm { $$ = $1 + $3; } 
| term { $$ = $1; } 
; 
term : term '*' ffaaccttoorr { $$ = $1 * $3; } 
| factor { $$ = $1; } 
; 
factor : '(' expr '))' { $$ = $2; } 
| ID 
| NUM 
; $$33 DDeeffaauulltt:: $$$$ == $$11;; 
PLLab, NTHU,Cs2403 Programming 
Languages 
46
Communication between LEX and 
PLLab, NTHU,Cs2403 Programming 
Languages 
47 
YACC 
call yylex() [0-9]+ 
next token is NUM 
NUM ‘+’ NUM 
LEX and YACC需要一套方法確認token的身份
Communication between LEX 
PLLab, NTHU,Cs2403 Programming 
Languages 
48 
and YACC 
yacc -d gram.y 
Will produce: 
y.tab.h 
• Use enumeration ( 列舉 ) / 
define 
• 由一方產生,另一方 
include 
• YACC 產生 y.tab.h 
• LEX include y.tab.h
Communication between LEX 
scanner.l 
PLLab, NTHU,Cs2403 Programming 
Languages 
49 
and YACC 
%{ 
#include <stdio.h> 
#include "y.tab.h" 
%} 
id [_a-zA-Z][_a-zA-Z0-9]* 
%% 
int { return INT; } 
char { return CHAR; } 
float { return FLOAT; } 
{id} { return ID;} 
%{ 
#include <stdio.h> 
#include <stdlib.h> 
%} 
%token CHAR, FLOAT, ID, INT 
%% 
yacc -d xxx.y 
Produced 
y.tab.h: 
# define CHAR 258 
# define FLOAT 259 
# define ID 260 
# define INT 261 
parser.y
Phrase -> cart_animal AND CART 
| work_animal AND PLOW… 
PLLab, NTHU,Cs2403 Programming 
Languages 
50 
YACC 
• Rules may be recursive 
• Rules may be ambiguous* 
• Uses bottom up Shift/Reduce parsing 
– Get a token 
– Push onto stack 
– Can it reduced (How do we know?) 
• If yes: Reduce using a rule 
• If no: Get another token 
• Yacc cannot look ahead more than one token
PLLab, NTHU,Cs2403 Programming 
Languages 
51 
Yacc Example 
• Taken from Lex & Yacc 
• Simple calculator 
a = 4 + 6 
a 
a=10 
b = 7 
c = a + b 
c 
c = 17 
$
PLLab, NTHU,Cs2403 Programming 
Languages 
52 
Grammar 
expression ::= expression '+' term | 
expression '-' term | 
term 
term ::= term '*' factor | 
term '/' factor | 
factor 
factor ::= '(' expression ')' | 
'-' factor | 
NUMBER | 
NAME
PLLab, NTHU,Cs2403 Programming 
Languages 
53 
statement_list: statement 'n' 
| statement_list statement 'n' 
; 
statement: NAME '=' expression { $1->value = $3; } 
| expression { printf("= %gn", $1); } 
; 
expression: expression '+' term { $$ = $1 + $3; } 
| expression '-' term { $$ = $1 - $3; } 
| term 
; 
parser.y 
Parser (cont’d)
Parser (cont’d) 
PLLab, NTHU,Cs2403 Programming 
Languages 
54 
term: term '*' factor { $$ = $1 * $3; } 
| term '/' factor { if ($3 == 0.0) 
yyerror("divide by zero"); 
else 
$$ = $1 / $3; 
} 
| factor 
; 
factor: '(' expression ')' { $$ = $2; } 
| '-' factor { $$ = -$2; } 
| NUMBER { $$ = $1; } 
| NAME { $$ = $1->value; } 
; 
%% parser.y
Scanner 
%{ 
#include "y.tab.h" 
#include "parser.h" 
#include <math.h> 
%} 
%% 
([0-9]+|([0-9]*.[0-9]+)([eE][-+]?[0-9]+)?) { 
PLLab, NTHU,Cs2403 Programming 
Languages 
55 
yylval.dval = atof(yytext); 
return NUMBER; 
} 
[ t] ; /* ignore white space */ 
scanner.l
Scanner (cont’d) 
[A-Za-z][A-Za-z0-9]* { /* return symbol pointer */ 
yylval.symp = symlook(yytext); 
return NAME; 
PLLab, NTHU,Cs2403 Programming 
Languages 
56 
} 
"$" { return 0; /* end of input */ } 
n|”=“|”+”|”-”|”*”|”/” return yytext[0]; 
%% 
scanner.l
YACC Command 
PLLab, NTHU,Cs2403 Programming 
Languages 
57 
• Yacc (AT&T) 
– yacc –d xxx.y 
• Bison (GNU) 
– bison –d –y xxx.y 
產生y.tab.c, 與yacc相同 
不然會產生xxx.tab.c
(1) 1 – 2 - 3 
(2) 1 – 2 * 3 
PLLab, NTHU,Cs2403 Programming 
Languages 
58 
Precedence / 
Association 
1. 1-2-3 = (1-2)-3? or 1-(2-3)? 
Define ‘-’ operator is left-association. 
2. 1-2*3 = 1-(2*3) 
Define “*” operator is precedent to “-” 
operator
PLLab, NTHU,Cs2403 Programming 
Languages 
59 
Precedence / 
Association 
%right ‘=‘ 
%left '<' '>' NE LE GE 
%left '+' '-‘ 
%left '*' '/' 
highest precedence
%left '+' '-' 
%left '*' '/' 
%noassoc UMINUS 
PLLab, NTHU,Cs2403 Programming 
Languages 
60 
Precedence / 
Association 
expr : expr ‘+’ expr { $$ = $1 + $3; } 
| expr ‘-’ expr { $$ = $1 - $3; } 
| expr ‘*’ expr { $$ = $1 * $3; } 
| expr ‘/’ expr 
{ 
if($3==0) 
yyerror(“divide 0”); 
else 
$$ = $1 / $3; 
} 
| ‘-’ expr %prec UMINUS {$$ = -$2; }
Shift/Reduce Conflicts 
• shift/reduce conflict 
– occurs when a grammar is written in 
such a way that a decision between 
shifting and reducing can not be made. 
– ex: IF-ELSE ambigious. 
• To resolve this conflict, yacc will 
choose to shift. 
PLLab, NTHU,Cs2403 Programming 
Languages 
61
YACC Declaration 
PLLab, NTHU,Cs2403 Programming 
Languages 
62 
Summary `%start' 
Specify the grammar's start symbol 
`%union' 
Declare the collection of data types that semantic values may 
have 
`%token' 
Declare a terminal symbol (token type name) with no 
precedence or 
associativity specified 
`%type' 
Declare the type of semantic values for a nonterminal symbol
YACC Declaration 
PLLab, NTHU,Cs2403 Programming 
Languages 
63 
`%right' Summary 
Declare a terminal symbol (token type name) that is 
right-associative 
`%left' 
Declare a terminal symbol (token type name) that is left-associative 
`%nonassoc' 
Declare a terminal symbol (token type name) that is 
nonassociative 
(using it in a way that would be associative is a syntax error, 
ex: x op. y op. z is syntax error)
Reference Books 
PLLab, NTHU,Cs2403 Programming 
Languages 
64 
• lex & yacc, 2nd Edition 
– by John R.Levine, Tony Mason & Doug 
Brown 
– O’Reilly 
– ISBN: 1-56592-000-7 
• Mastering Regular Expressions 
– by Jeffrey E.F. Friedl 
– O’Reilly 
– ISBN: 1-56592-257-3

Contenu connexe

Tendances

Lecture 01 introduction to compiler
Lecture 01 introduction to compilerLecture 01 introduction to compiler
Lecture 01 introduction to compilerIffat Anjum
 
Convert ER diagram to Relational model and Normalization
Convert ER diagram to Relational model and NormalizationConvert ER diagram to Relational model and Normalization
Convert ER diagram to Relational model and NormalizationSaadi Rahman
 
Issues in the design of Code Generator
Issues in the design of Code GeneratorIssues in the design of Code Generator
Issues in the design of Code GeneratorDarshan sai Reddy
 
Overview of Concurrency Control & Recovery in Distributed Databases
Overview of Concurrency Control & Recovery in Distributed DatabasesOverview of Concurrency Control & Recovery in Distributed Databases
Overview of Concurrency Control & Recovery in Distributed DatabasesMeghaj Mallick
 
Error detection recovery
Error detection recoveryError detection recovery
Error detection recoveryTech_MX
 
Compiler Design
Compiler DesignCompiler Design
Compiler DesignMir Majid
 
The role of the parser and Error recovery strategies ppt in compiler design
The role of the parser and Error recovery strategies ppt in compiler designThe role of the parser and Error recovery strategies ppt in compiler design
The role of the parser and Error recovery strategies ppt in compiler designSadia Akter
 
Process synchronization in Operating Systems
Process synchronization in Operating SystemsProcess synchronization in Operating Systems
Process synchronization in Operating SystemsRitu Ranjan Shrivastwa
 
Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design MAHASREEM
 
Bottom - Up Parsing
Bottom - Up ParsingBottom - Up Parsing
Bottom - Up Parsingkunj desai
 

Tendances (20)

Lecture 01 introduction to compiler
Lecture 01 introduction to compilerLecture 01 introduction to compiler
Lecture 01 introduction to compiler
 
Convert ER diagram to Relational model and Normalization
Convert ER diagram to Relational model and NormalizationConvert ER diagram to Relational model and Normalization
Convert ER diagram to Relational model and Normalization
 
Role-of-lexical-analysis
Role-of-lexical-analysisRole-of-lexical-analysis
Role-of-lexical-analysis
 
Unit 3 sp assembler
Unit 3 sp assemblerUnit 3 sp assembler
Unit 3 sp assembler
 
Yacc
YaccYacc
Yacc
 
Issues in the design of Code Generator
Issues in the design of Code GeneratorIssues in the design of Code Generator
Issues in the design of Code Generator
 
Overview of Concurrency Control & Recovery in Distributed Databases
Overview of Concurrency Control & Recovery in Distributed DatabasesOverview of Concurrency Control & Recovery in Distributed Databases
Overview of Concurrency Control & Recovery in Distributed Databases
 
Compiler Chapter 1
Compiler Chapter 1Compiler Chapter 1
Compiler Chapter 1
 
Distributed database
Distributed databaseDistributed database
Distributed database
 
Process synchronization
Process synchronizationProcess synchronization
Process synchronization
 
Error detection recovery
Error detection recoveryError detection recovery
Error detection recovery
 
Specification-of-tokens
Specification-of-tokensSpecification-of-tokens
Specification-of-tokens
 
Compiler Design
Compiler DesignCompiler Design
Compiler Design
 
The role of the parser and Error recovery strategies ppt in compiler design
The role of the parser and Error recovery strategies ppt in compiler designThe role of the parser and Error recovery strategies ppt in compiler design
The role of the parser and Error recovery strategies ppt in compiler design
 
Compilers
CompilersCompilers
Compilers
 
Process synchronization in Operating Systems
Process synchronization in Operating SystemsProcess synchronization in Operating Systems
Process synchronization in Operating Systems
 
Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design
 
Design of a two pass assembler
Design of a two pass assemblerDesign of a two pass assembler
Design of a two pass assembler
 
Bottom - Up Parsing
Bottom - Up ParsingBottom - Up Parsing
Bottom - Up Parsing
 
Introduction to Compiler
Introduction to CompilerIntroduction to Compiler
Introduction to Compiler
 

Similaire à Lex and Yacc ppt

Similaire à Lex and Yacc ppt (20)

lex and yacc.pdf
lex and yacc.pdflex and yacc.pdf
lex and yacc.pdf
 
system software
system software system software
system software
 
Lexical
LexicalLexical
Lexical
 
Module4 lex and yacc.ppt
Module4 lex and yacc.pptModule4 lex and yacc.ppt
Module4 lex and yacc.ppt
 
Lex (lexical analyzer)
Lex (lexical analyzer)Lex (lexical analyzer)
Lex (lexical analyzer)
 
Compiler Engineering Lab#5 : Symbol Table, Flex Tool
Compiler Engineering Lab#5 : Symbol Table, Flex ToolCompiler Engineering Lab#5 : Symbol Table, Flex Tool
Compiler Engineering Lab#5 : Symbol Table, Flex Tool
 
Compiler design and lexical analyser
Compiler design and lexical analyserCompiler design and lexical analyser
Compiler design and lexical analyser
 
CD U1-5.pptx
CD U1-5.pptxCD U1-5.pptx
CD U1-5.pptx
 
Unit1.ppt
Unit1.pptUnit1.ppt
Unit1.ppt
 
Lex and Yacc Tool M1.ppt
Lex and Yacc Tool M1.pptLex and Yacc Tool M1.ppt
Lex and Yacc Tool M1.ppt
 
Lexical analysis - Compiler Design
Lexical analysis - Compiler DesignLexical analysis - Compiler Design
Lexical analysis - Compiler Design
 
Lex tool manual
Lex tool manualLex tool manual
Lex tool manual
 
module 4.pptx
module 4.pptxmodule 4.pptx
module 4.pptx
 
Python Programming Basics for begginners
Python Programming Basics for begginnersPython Programming Basics for begginners
Python Programming Basics for begginners
 
lecture_lex.pdf
lecture_lex.pdflecture_lex.pdf
lecture_lex.pdf
 
Assembly language programming_fundamentals 8086
Assembly language programming_fundamentals 8086Assembly language programming_fundamentals 8086
Assembly language programming_fundamentals 8086
 
Compilers Design
Compilers DesignCompilers Design
Compilers Design
 
Lexical
LexicalLexical
Lexical
 
Lex programming
Lex programmingLex programming
Lex programming
 
Ch 2.pptx
Ch 2.pptxCh 2.pptx
Ch 2.pptx
 

Dernier

Intro To Electric Vehicles PDF Notes.pdf
Intro To Electric Vehicles PDF Notes.pdfIntro To Electric Vehicles PDF Notes.pdf
Intro To Electric Vehicles PDF Notes.pdfrs7054576148
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Arindam Chakraborty, Ph.D., P.E. (CA, TX)
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxJuliansyahHarahap1
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 

Dernier (20)

Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Intro To Electric Vehicles PDF Notes.pdf
Intro To Electric Vehicles PDF Notes.pdfIntro To Electric Vehicles PDF Notes.pdf
Intro To Electric Vehicles PDF Notes.pdf
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 

Lex and Yacc ppt

  • 1. Lex Yacc tutorial Kun-Yuan Hsieh kyshieh@pllab.cs.nthu.edu.tw Programming Language Lab., NTHU PLLab, NTHU,Cs2403 Programming Languages 1
  • 2. PLLab, NTHU,Cs2403 Programming Languages 2 Overview take a glance at Lex!
  • 3. Compilation Sequence PLLab, NTHU,Cs2403 Programming Languages 3
  • 4. PLLab, NTHU,Cs2403 Programming Languages 4 What is Lex? • The main job of a lexical analyzer (scanner) is to break up an input stream into more usable elements (tokens) a = b + c * d; ID ASSIGN ID PLUS ID MULT ID SEMI • Lex is an utility to help you rapidly generate your scanners
  • 5. Lex – Lexical Analyzer • Lexical analyzers tokenize input streams • Tokens are the terminals of a language – English PLLab, NTHU,Cs2403 Programming Languages 5 • words, punctuation marks, … – Programming language • Identifiers, operators, keywords, … • Regular expressions define terminals/tokens
  • 6. Lex Source Program PLLab, NTHU,Cs2403 Programming Languages 6 • Lex source is a table of – regular expressions and – corresponding program fragments digit [0-9] letter [a-zA-Z] %% {letter}({letter}|{digit})* printf(“id: %sn”, yytext); n printf(“new linen”); %% main() { yylex(); }
  • 7. Lex Source to C Program • The table is translated to a C program (lex.yy.c) which – reads an input stream – partitioning the input into strings which match the given expressions and – copying it to an output stream if necessary PLLab, NTHU,Cs2403 Programming Languages 7
  • 8. An Overview of Lex PLLab, NTHU,Cs2403 Programming Languages 8 Lex C compiler a.out Lex source program lex.yy.c input lex.yy.c a.out tokens
  • 9. (required) (optional) {definitions} %% {transition rules} %% {user subroutines} PLLab, NTHU,Cs2403 Programming Languages 9 Lex Source • Lex source is separated into three sections by % % delimiters • The general format of Lex source is • The %% absolute minimum Lex program is thus
  • 10. PLLab, NTHU,Cs2403 Programming Languages 10 Lex v.s. Yacc • Lex – Lex generates C code for a lexical analyzer, or scanner – Lex uses patterns that match strings in the input and converts the strings to tokens • Yacc – Yacc generates C code for syntax analyzer, or parser. – Yacc uses grammar rules that allow it to analyze tokens from Lex and create a syntax tree.
  • 11. Lex source (Lexical Rules) Yacc source (Grammar Rules) call Input Parsed PLLab, NTHU,Cs2403 Programming Languages 11 Lex with Yacc Lex Yacc yylex() yyparse() Input lex.yy.c y.tab.c return token
  • 12. Regular Expressions PLLab, NTHU,Cs2403 Programming Languages 12
  • 13. Lex Regular Expressions (Extended Regular Expressions) • A regular expression matches a set of strings • Regular expression PLLab, NTHU,Cs2403 Programming Languages 13 – Operators – Character classes – Arbitrary character – Optional expressions – Alternation and grouping – Context sensitivity – Repetitions and definitions
  • 14. PLLab, NTHU,Cs2403 Programming Languages 14 Operators “ [ ] ^ - ? . * + | ( ) $ / { } % < > • If they are to be used as text characters, an escape should be used $ = “$” = “” • Every character but blank, tab (t), newline (n) and the list above is always a text character
  • 15. Character Classes [] • [abc] matches a single character, which may be a, b, or c • Every operator meaning is ignored except - and ^ • e.g. [ab] => a or b [a-z] => a or b or c or … or z [-+0-9] => all the digits and the two signs [^a-zA-Z] => any character which is not a PLLab, NTHU,Cs2403 Programming Languages 15 letter
  • 16. Arbitrary Character . • To match almost character, the operator character . is the class of all characters except newline • [40-176] matches all printable characters in the ASCII character set, from octal 40 (blank) to octal 176 (tilde~) PLLab, NTHU,Cs2403 Programming Languages 16
  • 17. Optional & Repeated PLLab, NTHU,Cs2403 Programming Languages 17 Expressions • a? => zero or one instance of a • a* => zero or more instances of a • a+ => one or more instances of a • E.g. ab?c => ac or abc [a-z]+ => all strings of lower case letters [a-zA-Z][a-zA-Z0-9]* => all alphanumeric strings with a leading alphabetic character
  • 18. Precedence of Operators • Level of precedence – Kleene closure (*), ?, + – concatenation – alternation (|) • All operators are left associative. • Ex: a*b|cd* = ((a*)b)|(c(d*)) PLLab, NTHU,Cs2403 Programming Languages 18
  • 19. Pattern Matching Primitives Metacharacter Matches . any character except newline n newline * zero or more copies of the preceding expression + one or more copies of the preceding expression ? zero or one copy of the preceding expression ^ beginning of line / complement $ end of line a|b a or b (ab)+ one or more copies of ab (grouping) [ab] a or b a{3} 3 instances of a “a+b” literal “a+b” (C escapes still work) PLLab, NTHU,Cs2403 Programming Languages 19
  • 20. Recall: Lex Source a = b + c; a operator: ASSIGNMENT b + c; PLLab, NTHU,Cs2403 Programming Languages 20 • Lex source is a table of – regular expressions and – corresponding program fragments (actions) … %% <regexp> <action> <regexp> <action> … %% %% “=“ printf(“operator: ASSIGNMENT”);
  • 21. Transition Rules • regexp <one or more blanks> action (C code); • regexp <one or more blanks> { actions (C code) } • A null statement ; will ignore the input (no actions) [ tn] ; – Causes the three spacing characters to be ignored a = b + c; PLLab, NTHU,Cs2403 Programming Languages 21 d = b * c; ↓ ↓ a=b+c;d=b*c;
  • 22. Transition Rules (cont’d) • Four special options for actions: |, ECHO;, BEGIN, and REJECT; • | indicates that the action for this rule is from the action for the next rule PLLab, NTHU,Cs2403 Programming Languages 22 – [ tn] ; – “ “ | “t” | “n” ; • The unmatched token is using a default action that ECHO from the input to the output
  • 23. Transition Rules (cont’d) PLLab, NTHU,Cs2403 Programming Languages 23 • REJECT – Go do the next alternative … %% pink {npink++; REJECT;} ink {nink++; REJECT;} pin {npin++; REJECT;} . | n ; %% …
  • 24. Lex Predefined Variables PLLab, NTHU,Cs2403 Programming Languages 24 • yytext -- a string containing the lexeme • yyleng -- the length of the lexeme • yyin -- the input stream pointer – the default input of default main() is stdin • yyout -- the output stream pointer – the default output of default main() is stdout. • cs20: %./a.out < inputfile > outfile • E.g. [a-z]+ printf(“%s”, yytext); [a-z]+ ECHO; [a-zA-Z]+ {words++; chars += yyleng;}
  • 25. Lex Library Routines PLLab, NTHU,Cs2403 Programming Languages 25 • yylex() – The default main() contains a call of yylex() • yymore() – return the next token • yyless(n) – retain the first n characters in yytext • yywarp() – is called whenever Lex reaches an end-of-file – The default yywarp() always returns 1
  • 26. Review of Lex Predefined PLLab, NTHU,Cs2403 Programming Languages 26 Variables Name Function char *yytext pointer to matched string int yyleng length of matched string FILE *yyin input stream pointer FILE *yyout output stream pointer int yylex(void) call to invoke lexer, returns token char* yymore(void) return the next token int yyless(int n) retain the first n characters in yytext int yywrap(void) wrapup, return 1 if done, 0 if not done ECHO write matched string REJECT go to the next alternative rule INITAL initial start condition BEGIN condition switch start condition
  • 27. User Subroutines Section • You can use your Lex routines in the same ways you use routines in other programming languages. PLLab, NTHU,Cs2403 Programming Languages 27 %{ void foo(); %} letter [a-zA-Z] %% {letter}+ foo(); %% … void foo() { … }
  • 28. User Subroutines Section PLLab, NTHU,Cs2403 Programming Languages 28 (cont’d) • The section where main() is placed %{ int counter = 0; %} letter [a-zA-Z] %% {letter}+ {printf(“a wordn”); counter++;} %% main() { yylex(); printf(“There are total %d wordsn”, counter); }
  • 29. PLLab, NTHU,Cs2403 Programming Languages 29 Usage • To run Lex on a source file, type lex scanner.l • It produces a file named lex.yy.c which is a C program for the lexical analyzer. • To compile lex.yy.c, type cc lex.yy.c –ll • To run the lexical analyzer program, type ./a.out < inputfile
  • 30. PLLab, NTHU,Cs2403 Programming Languages 30 Versions of Lex • AT&T -- lex http://www.combo.org/lex_yacc_page/lex.html • GNU -- flex http://www.gnu.org/manual/flex-2.5.4/flex.html • a Win32 version of flex : http://www.monmouth.com/~wstreett/lex-yacc/lex-yacc.html or Cygwin : http://sources.redhat.com/cygwin/ • Lex on different machines is not created equal.
  • 31. Yacc - Yet Another Compiler- PLLab, NTHU,Cs2403 Programming Languages 31 Compiler
  • 32. PLLab, NTHU,Cs2403 Programming Languages 32 Introduction • What is YACC ? – Tool which will produce a parser for a given grammar. – YACC (Yet Another Compiler Compiler) is a program designed to compile a LALR(1) grammar and to produce the source code of the syntactic analyzer of the language produced by this grammar.
  • 33. How YACC Works PLLab, NTHU,Cs2403 Programming Languages 33 a.out File containing desired grammar in yacc format yyaacccc pprrooggrraamm C source program created by yacc CC ccoommppiilleerr Executable program that will parse grammar given in gram.y gram.y yacc y.tab.c cc or gcc
  • 34. How YACC Works y.output y.tab.c a.out PLLab, NTHU,Cs2403 Programming Languages 34 yacc (1) Parser generation time YACC source (*.y) y.tab.h y.tab.c C compiler/linker (2) Compile time a.out (3) Run time Token stream Abstract Syntax Tree
  • 35. An YACC File Example PLLab, NTHU,Cs2403 Programming Languages 35 %{ #include <stdio.h> %} %token NAME NUMBER %% statement: NAME '=' expression | expression { printf("= %dn", $1); } ; expression: expression '+' NUMBER { $$ = $1 + $3; } | expression '-' NUMBER { $$ = $1 - $3; } | NUMBER { $$ = $1; } ; %% int yyerror(char *s) { fprintf(stderr, "%sn", s); return 0; } int main(void) { yyparse(); return 0; }
  • 36. PLLab, NTHU,Cs2403 Programming Languages 36 Works with Lex How to work ?
  • 37. PLLab, NTHU,Cs2403 Programming Languages 37 Works with Lex call yylex() [0-9]+ next token is NUM NUM ‘+’ NUM
  • 38. YACC File Format PLLab, NTHU,Cs2403 Programming Languages 38 %{ C declarations %} yacc declarations %% Grammar rules %% Additional C code – Comments enclosed in /* ... */ may appear in any of the sections.
  • 39. Definitions Section PLLab, NTHU,Cs2403 Programming Languages 39 %{ #include <stdio.h> #include <stdlib.h> %} %token ID NUM %start expr It is a terminal 由 expr 開始 parse
  • 40. PLLab, NTHU,Cs2403 Programming Languages 40 Start Symbol • The first non-terminal specified in the grammar specification section. • To overwrite it with %start declaraction. %start non-terminal
  • 41. PLLab, NTHU,Cs2403 Programming Languages 41 Rules Section • This section defines grammar • Example expr : expr '+' term | term; term : term '*' factor | factor; factor : '(' expr ')' | ID | NUM;
  • 42. PLLab, NTHU,Cs2403 Programming Languages 42 Rules Section • Normally written like this • Example: expr : expr '+' term | term ; term : term '*' factor | factor ; factor : '(' expr ')' | ID | NUM ;
  • 43. The Position of Rules expr : expr '+' term { $$ = $1 + $3; } | term { $$ = $1; } ; term : term '*' factor { $$ = $1 * $3; } | factor { $$ = $1; } ; factor : '(' expr ')' { $$ = $2; } PLLab, NTHU,Cs2403 Programming Languages 43 | ID | NUM ;
  • 44. The Position of Rules expr : eexxpprr '+' term { $$ = $1 + $3; } | tteerrmm { $$ = $1; } ; term : tteerrmm '*' factor { $$ = $1 * $3; } | ffaaccttoorr { $$ = $1; } ; factor : '((' expr ')' { $$ = $2; } PLLab, NTHU,Cs2403 Programming Languages 44 | ID | NUM ; $$11
  • 45. The Position of Rules expr : expr '++' term { $$ = $1 + $3; } | term { $$ = $1; } ; term : term '**' factor { $$ = $1 * $3; } | factor { $$ = $1; } ; factor : '(' eexxpprr ')' { $$ = $2; } PLLab, NTHU,Cs2403 Programming Languages 45 | ID | NUM ; $$22
  • 46. The Position of Rules expr : expr '+' tteerrmm { $$ = $1 + $3; } | term { $$ = $1; } ; term : term '*' ffaaccttoorr { $$ = $1 * $3; } | factor { $$ = $1; } ; factor : '(' expr '))' { $$ = $2; } | ID | NUM ; $$33 DDeeffaauulltt:: $$$$ == $$11;; PLLab, NTHU,Cs2403 Programming Languages 46
  • 47. Communication between LEX and PLLab, NTHU,Cs2403 Programming Languages 47 YACC call yylex() [0-9]+ next token is NUM NUM ‘+’ NUM LEX and YACC需要一套方法確認token的身份
  • 48. Communication between LEX PLLab, NTHU,Cs2403 Programming Languages 48 and YACC yacc -d gram.y Will produce: y.tab.h • Use enumeration ( 列舉 ) / define • 由一方產生,另一方 include • YACC 產生 y.tab.h • LEX include y.tab.h
  • 49. Communication between LEX scanner.l PLLab, NTHU,Cs2403 Programming Languages 49 and YACC %{ #include <stdio.h> #include "y.tab.h" %} id [_a-zA-Z][_a-zA-Z0-9]* %% int { return INT; } char { return CHAR; } float { return FLOAT; } {id} { return ID;} %{ #include <stdio.h> #include <stdlib.h> %} %token CHAR, FLOAT, ID, INT %% yacc -d xxx.y Produced y.tab.h: # define CHAR 258 # define FLOAT 259 # define ID 260 # define INT 261 parser.y
  • 50. Phrase -> cart_animal AND CART | work_animal AND PLOW… PLLab, NTHU,Cs2403 Programming Languages 50 YACC • Rules may be recursive • Rules may be ambiguous* • Uses bottom up Shift/Reduce parsing – Get a token – Push onto stack – Can it reduced (How do we know?) • If yes: Reduce using a rule • If no: Get another token • Yacc cannot look ahead more than one token
  • 51. PLLab, NTHU,Cs2403 Programming Languages 51 Yacc Example • Taken from Lex & Yacc • Simple calculator a = 4 + 6 a a=10 b = 7 c = a + b c c = 17 $
  • 52. PLLab, NTHU,Cs2403 Programming Languages 52 Grammar expression ::= expression '+' term | expression '-' term | term term ::= term '*' factor | term '/' factor | factor factor ::= '(' expression ')' | '-' factor | NUMBER | NAME
  • 53. PLLab, NTHU,Cs2403 Programming Languages 53 statement_list: statement 'n' | statement_list statement 'n' ; statement: NAME '=' expression { $1->value = $3; } | expression { printf("= %gn", $1); } ; expression: expression '+' term { $$ = $1 + $3; } | expression '-' term { $$ = $1 - $3; } | term ; parser.y Parser (cont’d)
  • 54. Parser (cont’d) PLLab, NTHU,Cs2403 Programming Languages 54 term: term '*' factor { $$ = $1 * $3; } | term '/' factor { if ($3 == 0.0) yyerror("divide by zero"); else $$ = $1 / $3; } | factor ; factor: '(' expression ')' { $$ = $2; } | '-' factor { $$ = -$2; } | NUMBER { $$ = $1; } | NAME { $$ = $1->value; } ; %% parser.y
  • 55. Scanner %{ #include "y.tab.h" #include "parser.h" #include <math.h> %} %% ([0-9]+|([0-9]*.[0-9]+)([eE][-+]?[0-9]+)?) { PLLab, NTHU,Cs2403 Programming Languages 55 yylval.dval = atof(yytext); return NUMBER; } [ t] ; /* ignore white space */ scanner.l
  • 56. Scanner (cont’d) [A-Za-z][A-Za-z0-9]* { /* return symbol pointer */ yylval.symp = symlook(yytext); return NAME; PLLab, NTHU,Cs2403 Programming Languages 56 } "$" { return 0; /* end of input */ } n|”=“|”+”|”-”|”*”|”/” return yytext[0]; %% scanner.l
  • 57. YACC Command PLLab, NTHU,Cs2403 Programming Languages 57 • Yacc (AT&T) – yacc –d xxx.y • Bison (GNU) – bison –d –y xxx.y 產生y.tab.c, 與yacc相同 不然會產生xxx.tab.c
  • 58. (1) 1 – 2 - 3 (2) 1 – 2 * 3 PLLab, NTHU,Cs2403 Programming Languages 58 Precedence / Association 1. 1-2-3 = (1-2)-3? or 1-(2-3)? Define ‘-’ operator is left-association. 2. 1-2*3 = 1-(2*3) Define “*” operator is precedent to “-” operator
  • 59. PLLab, NTHU,Cs2403 Programming Languages 59 Precedence / Association %right ‘=‘ %left '<' '>' NE LE GE %left '+' '-‘ %left '*' '/' highest precedence
  • 60. %left '+' '-' %left '*' '/' %noassoc UMINUS PLLab, NTHU,Cs2403 Programming Languages 60 Precedence / Association expr : expr ‘+’ expr { $$ = $1 + $3; } | expr ‘-’ expr { $$ = $1 - $3; } | expr ‘*’ expr { $$ = $1 * $3; } | expr ‘/’ expr { if($3==0) yyerror(“divide 0”); else $$ = $1 / $3; } | ‘-’ expr %prec UMINUS {$$ = -$2; }
  • 61. Shift/Reduce Conflicts • shift/reduce conflict – occurs when a grammar is written in such a way that a decision between shifting and reducing can not be made. – ex: IF-ELSE ambigious. • To resolve this conflict, yacc will choose to shift. PLLab, NTHU,Cs2403 Programming Languages 61
  • 62. YACC Declaration PLLab, NTHU,Cs2403 Programming Languages 62 Summary `%start' Specify the grammar's start symbol `%union' Declare the collection of data types that semantic values may have `%token' Declare a terminal symbol (token type name) with no precedence or associativity specified `%type' Declare the type of semantic values for a nonterminal symbol
  • 63. YACC Declaration PLLab, NTHU,Cs2403 Programming Languages 63 `%right' Summary Declare a terminal symbol (token type name) that is right-associative `%left' Declare a terminal symbol (token type name) that is left-associative `%nonassoc' Declare a terminal symbol (token type name) that is nonassociative (using it in a way that would be associative is a syntax error, ex: x op. y op. z is syntax error)
  • 64. Reference Books PLLab, NTHU,Cs2403 Programming Languages 64 • lex & yacc, 2nd Edition – by John R.Levine, Tony Mason & Doug Brown – O’Reilly – ISBN: 1-56592-000-7 • Mastering Regular Expressions – by Jeffrey E.F. Friedl – O’Reilly – ISBN: 1-56592-257-3