Let’s build a Parser!
A short introduction to parsing with PHP

                                   Boy Baukema
                      June 9th 2012, Amsterdam
2
Source: http://www.sxc.hu/photo/1384894
Boy Baukema
Software Engineer @ Ibuildings




                                 3
Reasons for the common
fear of writing parsers:
1. Never took a compiler
class, think it is scary.
2. Did take a compiler class.
- Martin Fowler

                           4
Language cacophony




                                                   5
Source: http://www.wordle.net/show/wrdl/5292561/Languages_used_in_PHP_Web_Development
Lookahead (?=

   Languages

   Parsing

   QueryLang

   Parsing PHP code

   Resources



                      6
RegExes
And now you have two problems...




                                   7
Mail::RFC822::Address
[Excerpt of the regular expression that the Perl module Mail::RFC822::Address uses to validate e-mail addresses against the RFC 822 grammar: some forty lines of densely nested groups, character classes and lookaheads, cut off mid-expression.]
                                                                                 8
Source: http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html
Chomsky hierarchy




                                                                  9
Source: http://en.wikipedia.org/wiki/File:Chomsky-hierarchy.svg
HTTP 1.1 Accept Header BNF
Accept        = "Accept" ":"
         #( media-range [ accept-params ] )

media-range = ( "*/*"
         | ( type "/" "*" )
         | ( type "/" subtype )
         ) *( ";" parameter )

accept-params = ";" "q" "=" qvalue
       *( accept-extension )

accept-extension = ";" token
       [ "=" ( token | quoted-string ) ]
                                                                 10
Source: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
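The Accept grammar is small enough to parse by hand. A minimal PHP sketch (the function name parseAccept is hypothetical), assuming the "Accept:" prefix has already been stripped. It does not handle commas or semicolons inside quoted-string values, and it ignores any accept-extension that follows the q parameter:

```php
<?php
// Hypothetical helper: parse an HTTP/1.1 Accept header value into a list of
// ['range' => 'type/subtype', 'params' => [...], 'q' => float] entries.
// Simplification: quoted-strings containing "," or ";" are not supported.
function parseAccept(string $value): array
{
    $result = [];
    // The "#" rule in the BNF means a comma-separated list.
    foreach (explode(',', $value) as $element) {
        $parts = array_map('trim', explode(';', $element));
        $range = array_shift($parts);
        if ($range === '') {
            continue; // "#" permits empty list elements
        }
        $params = [];
        $q = 1.0; // default quality when no q parameter is present
        foreach ($parts as $part) {
            $pair = array_map('trim', explode('=', $part, 2));
            $name = strtolower($pair[0]);
            $val = $pair[1] ?? null;
            if ($name === 'q') {
                $q = (float) $val;
                break; // anything after q is an accept-extension: ignored here
            }
            $params[$name] = $val; // media-range parameter
        }
        $result[] = ['range' => strtolower($range), 'params' => $params, 'q' => $q];
    }
    return $result;
}

print_r(parseAccept('text/html;level=1, text/*;q=0.3, */*;q=0.1'));
```

Each BNF construct maps onto a plain string operation here, which only works because this grammar is not recursive; the arithmetic grammar that follows needs real recursion.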
Arithmetic expression BNF

<expression> ::= <term>
         | <expression> "+" <term>
<term>        ::= <factor>
         | <term> "*" <factor>
<factor>     ::= <constant>
         | <variable>
         | "(" <expression> ")"
<variable> ::= "x" | "y" | "z"
<constant> ::= <digit>
         | <digit> <constant>
<digit>     ::= "0" | "1" | "2" | "3"
         | "4" | "5" | "6" | "7"
         | "8" | "9"
                                                          11
    Source: http://en.wikipedia.org/wiki/Syntax_diagram
Recursion in BNF

Production (recursive: <constant> refers to itself)
<constant> ::= <digit>
         | <digit> <constant>

Terminals (the quoted characters)
<digit>       ::= "0" | "1" | "2" | "3"
           | "4" | "5" | "6" | "7"
           | "8" | "9"



                                           12
Matching 123

<constant>
|-- <digit>          - "1"
|-- <constant>
    |-- <digit>      - "2"
    |-- <constant>
        |-- <digit>  - "3"

The innermost <constant> stops the recursion via the base case:
<constant> ::= <digit>

                                                               13
Source: https://secure.flickr.com/photos/threedots/110586879/
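A recursive production translates directly into a recursive function. A minimal PHP sketch, assuming hypothetical helpers matchDigit() and matchConstant() that report how many characters starting at $pos form a <digit> or <constant> (0 means no match):

```php
<?php
// <digit> ::= "0" | "1" | ... | "9"
function matchDigit(string $input, int $pos): bool
{
    return $pos < strlen($input) && strpos('0123456789', $input[$pos]) !== false;
}

// <constant> ::= <digit> | <digit> <constant>
// Returns the number of characters matched from $pos (0 = no match).
function matchConstant(string $input, int $pos = 0): int
{
    if (!matchDigit($input, $pos)) {
        return 0;
    }
    // Recursive alternative: <digit> followed by another <constant>.
    // If nothing follows, $rest is 0 and the base case <digit> applies.
    $rest = matchConstant($input, $pos + 1);
    return 1 + $rest;
}

echo matchConstant('123'), "\n"; // 3
echo matchConstant('12a'), "\n"; // 2
```

The call chain for "123" mirrors the tree above: three nested calls, the innermost one ending in the base case.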
Arithmetic expression EBNF

   expression = term , {"+" , term};
   term      = factor , {"*" , factor};
   factor    = constant
          | variable
          | "(" , expression , ")";
   variable = "x"
          | "y"
          | "z";
   constant = digit , {digit};
   digit    = "0" | "1" | "2" | "3"
          | "4" | "5" | "6" | "7"
          | "8" | "9";
                                                          14
    Source: http://en.wikipedia.org/wiki/Syntax_diagram
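Because the EBNF version replaces left recursion with repetition ({...}), each rule becomes one function containing a loop. A minimal recursive-descent evaluator sketch in PHP, assuming input without whitespace and a hypothetical $vars array that supplies values for x, y and z:

```php
<?php
// Recursive-descent evaluator for the EBNF grammar above.
// Assumptions: no whitespace in the input; x, y, z are bound in $vars.
final class ExprEvaluator
{
    private string $src;
    private int $pos = 0;
    private array $vars;

    public function __construct(string $src, array $vars = [])
    {
        $this->src = $src;
        $this->vars = $vars;
    }

    public function evaluate(): int
    {
        $value = $this->expression();
        if ($this->pos !== strlen($this->src)) {
            throw new RuntimeException("Unexpected input at offset {$this->pos}");
        }
        return $value;
    }

    // Next character, or '' at the end of the input.
    private function peek(): string
    {
        return $this->src[$this->pos] ?? '';
    }

    // expression = term , {"+" , term};
    private function expression(): int
    {
        $value = $this->term();
        while ($this->peek() === '+') {
            $this->pos++;
            $value += $this->term();
        }
        return $value;
    }

    // term = factor , {"*" , factor};
    private function term(): int
    {
        $value = $this->factor();
        while ($this->peek() === '*') {
            $this->pos++;
            $value *= $this->factor();
        }
        return $value;
    }

    // factor = constant | variable | "(" , expression , ")";
    private function factor(): int
    {
        $c = $this->peek();
        if ($c === '(') {
            $this->pos++;
            $value = $this->expression();
            if ($this->peek() !== ')') {
                throw new RuntimeException("Expected ')' at offset {$this->pos}");
            }
            $this->pos++;
            return $value;
        }
        if ($c === 'x' || $c === 'y' || $c === 'z') {
            $this->pos++;
            if (!array_key_exists($c, $this->vars)) {
                throw new RuntimeException("Unbound variable {$c}");
            }
            return (int) $this->vars[$c];
        }
        // constant = digit , {digit};  (\G anchors the match at $this->pos)
        if (preg_match('/\G[0-9]+/', $this->src, $m, 0, $this->pos)) {
            $this->pos += strlen($m[0]);
            return (int) $m[0];
        }
        throw new RuntimeException("Unexpected '{$c}' at offset {$this->pos}");
    }
}

echo (new ExprEvaluator('2*(x+3)+y', ['x' => 4, 'y' => 1]))->evaluate(), "\n"; // 15
```

Operator precedence falls out of the grammar: term() sits below expression(), so "*" binds tighter than "+".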
Parsing Expression Grammar

   expression = term ("+" term)*
   term     = factor ("*" factor)*
   factor   = constant
          / variable
          / "(" expression ")"
   variable = "x"
          / "y"
          / "z"
   constant = [0-9]+


                                                               15
Source: https://secure.flickr.com/photos/sasastro/5590210866/
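The key difference from (E)BNF is the ordered choice "/": alternatives are tried from left to right, and the first one that succeeds wins, with the input position reset before each attempt. A minimal sketch of PEG-style combinators in PHP; the function names lit, regex, seq, choice, star and matchesAll are hypothetical and are not php-peg's API:

```php
<?php
// Each parser is a closure (string $s, int $p): ?int returning the position
// after a successful match, or null on failure.

// Literal string.
function lit(string $text): Closure
{
    return function (string $s, int $p) use ($text): ?int {
        return substr($s, $p, strlen($text)) === $text ? $p + strlen($text) : null;
    };
}

// Regular expression; the pattern must start with \G to match only at $p.
function regex(string $pattern): Closure
{
    return function (string $s, int $p) use ($pattern): ?int {
        return preg_match($pattern, $s, $m, 0, $p) === 1 ? $p + strlen($m[0]) : null;
    };
}

// Sequence: every parser must match, one after the other.
function seq(Closure ...$parsers): Closure
{
    return function (string $s, int $p) use ($parsers): ?int {
        $pos = $p;
        foreach ($parsers as $parser) {
            $pos = $parser($s, $pos);
            if ($pos === null) {
                return null;
            }
        }
        return $pos;
    };
}

// Ordered choice "/": each alternative restarts at $p; the first success wins.
function choice(Closure ...$alternatives): Closure
{
    return function (string $s, int $p) use ($alternatives): ?int {
        foreach ($alternatives as $alt) {
            $next = $alt($s, $p);
            if ($next !== null) {
                return $next;
            }
        }
        return null;
    };
}

// e* : greedy repetition that never fails (stops on an empty match).
function star(Closure $parser): Closure
{
    return function (string $s, int $p) use ($parser): int {
        while (($next = $parser($s, $p)) !== null && $next > $p) {
            $p = $next;
        }
        return $p;
    };
}

// True only when the parser consumes the entire input.
function matchesAll(Closure $parser, string $input): bool
{
    return $parser($input, 0) === strlen($input);
}

// The grammar from the slide. factor refers back to expression, so it
// calls it through $ref, which sees the later assignment by reference.
$expression = null;
$ref = function (string $s, int $p) use (&$expression): ?int {
    return $expression($s, $p);
};
$constant   = regex('/\G[0-9]+/');
$variable   = choice(lit('x'), lit('y'), lit('z'));
$factor     = choice($constant, $variable, seq(lit('('), $ref, lit(')')));
$term       = seq($factor, star(seq(lit('*'), $factor)));
$expression = seq($term, star(seq(lit('+'), $term)));

var_dump(matchesAll($expression, '2*(x+3)+y'));          // bool(true)
var_dump(matchesAll($expression, '2*(x+'));              // bool(false)
// Ordered choice: "a" wins, so "ab" is never tried and "b" is left over.
var_dump(matchesAll(choice(lit('a'), lit('ab')), 'ab')); // bool(false)
```

The last line shows the practical consequence of "/": the order of alternatives matters, so longer alternatives normally go first.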
So how will this help
me parse a language?



                        16
Parser Generators for PHP
   Lime-php
   LALR(1), 2008, abandoned

   PHP_ParserGenerator
   LALR(1), 2010, abandoned

   Loco
   combinatory parsing, 2011,
   alpha

   php-peg
   PEG, 2012, active?, alpha
                                17
QueryLang
https://github.com/relaxnow/QueryLang




                                        18
QueryLang: Example query

 parsers OR 123 AND (dpc OR phpbnl)

Query (OR)
|-- Term        -   "parsers"
|-- Query (AND)
   |-- Term     -   "123"
   |-- Query (OR)
      |-- Term -    "dpc"
      |-- Term -    "phpbnl"




                                      19
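The tree above can be held in two small node classes. A minimal sketch, assuming NodeQuery and NodeTerm shapes that match how the v3 grammar actions use them further on (a constructor taking the operator or the term text, plus an add() method); the dump() method is a hypothetical addition for printing, and the actual classes in the QueryLang repository may differ:

```php
<?php
// Hypothetical AST node classes, shaped after how the slides use them.
final class NodeTerm
{
    public string $text;

    public function __construct(string $text)
    {
        $this->text = $text;
    }

    // Render this term at the given nesting depth.
    public function dump(int $depth = 0): string
    {
        return str_repeat('  ', $depth) . 'Term "' . $this->text . "\"\n";
    }
}

final class NodeQuery
{
    public string $operator;
    /** @var array<int, NodeQuery|NodeTerm> */
    public array $children = [];

    public function __construct(string $operator)
    {
        $this->operator = $operator;
    }

    /** @param NodeQuery|NodeTerm $child */
    public function add($child): void
    {
        $this->children[] = $child;
    }

    // Render this query and, one level deeper, its children.
    public function dump(int $depth = 0): string
    {
        $out = str_repeat('  ', $depth) . "Query ({$this->operator})\n";
        foreach ($this->children as $child) {
            $out .= $child->dump($depth + 1);
        }
        return $out;
    }
}

// Build the tree for: parsers OR 123 AND (dpc OR phpbnl)
$inner = new NodeQuery('OR');
$inner->add(new NodeTerm('dpc'));
$inner->add(new NodeTerm('phpbnl'));

$and = new NodeQuery('AND');
$and->add(new NodeTerm('123'));
$and->add($inner);

$root = new NodeQuery('OR');
$root->add(new NodeTerm('parsers'));
$root->add($and);

echo $root->dump();
```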
v1/Peg/grammar.peg.inc

  /*!* QueryLangV1
  Term: /[\w\d]+/
  */

  public function parse()
  {
    $match = $this->match_Term();
    if (!$match) {
        return '';
    }
    return $match['text'];
  }
                                    20
v1/Peg/Parser.php - generated match_Term

     /* Term: /[\w\d]+/ */
  protected $match_Term_typestack = array('Term');
  function match_Term ($stack = array()) {
    $matchrule = "Term";
    $result = $this->construct($matchrule, $matchrule, null);
    if (( $subres = $this->rx( '/[\w\d]+/' ) ) !== FALSE) {
      $result["text"] .= $subres;
      return $this->finalise($result);
    }
    else { return FALSE; }
  }
                                                     21
v1/Peg/grammar.peg.inc test



    $parser = new Parser('test');
    print_r($parser->parse());
    // test

    $parser = new Parser('test 123');
    print_r($parser->parse());
    // test




                                        22
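Why does 'test 123' yield only "test"? The v1 grammar has a single Term rule, so parsing matches one run of word characters at the start of the input and stops; nothing requires the whole input to be consumed. The same behaviour can be reproduced with plain preg_match; parseV1() is a hypothetical stand-in, not the code php-peg generates:

```php
<?php
// Stand-in for the v1 grammar "Term: /[\w\d]+/": match one term anchored
// at the start of the input and ignore whatever follows it.
function parseV1(string $input): string
{
    if (preg_match('/^[\w\d]+/', $input, $m) !== 1) {
        return ''; // like parse() returning '' when match_Term fails
    }
    return $m[0];
}

var_dump(parseV1('test'));     // string(4) "test"
var_dump(parseV1('test 123')); // string(4) "test"  (" 123" is silently dropped)
var_dump(parseV1(' test'));    // string(0) ""  (leading whitespace is no Term)
```

A stricter variant would also check that the match covers the whole input (for example by anchoring the pattern with `$`) and report leftover text as an error.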
v2/Peg/grammar.peg.inc

  /*!* QueryLangV2
  Query: Term (> Term)*
  Term: /[\w\d]+/
  */

  public function parse()
  {
    $result = $this->match_Query();
    return $result['query'];
  }



                                      23
v2/Peg/grammar.peg.inc (cont.)



public function Query__construct(&$result)
{
  $result['query'] = new NodeQuery();
}

public function Query_Term(&$result, $sub)
{
  $term = new NodeTerm($sub['text']);
  $result['query']->addTerm($term);
}

                                             24
v2/Peg/grammar.peg.inc test



    $parser = new Parser('test 123');
    print_r($parser->parse());

    Query
    |-- Term        - "test"
    |-- Term        - "123"




                                        25
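The repetition in Query: Term (> Term)* becomes a loop. Judging from the test output, the `>` marker lets whitespace separate the terms. A minimal stand-in sketch (parseV2() is hypothetical and returns plain strings instead of NodeTerm objects):

```php
<?php
// Stand-in for "Query: Term (> Term)*": one mandatory term, then zero or
// more terms, each preceded by optional whitespace.
function parseV2(string $input): array
{
    $terms = [];
    $pos = 0;
    // The first Term is mandatory.
    if (!preg_match('/\G[\w\d]+/', $input, $m, 0, $pos)) {
        throw new InvalidArgumentException('Expected a term at offset 0');
    }
    $terms[] = $m[0];
    $pos += strlen($m[0]);
    // (> Term)* : repeat while optional whitespace plus a term matches here.
    while (preg_match('/\G\s*([\w\d]+)/', $input, $m, 0, $pos)) {
        $terms[] = $m[1];
        $pos += strlen($m[0]);
    }
    return $terms;
}

print_r(parseV2('test 123')); // lists "test" and "123"
```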
v3/Peg/grammar.peg.inc

  /*!* QueryLangV3
  Query: AndQuery ([ "OR" ] AndQuery)*
  AndQuery: Term ([ "AND" ] Term)*
  Term: "(" Query ")" | Value:/[\w\d]+/
  */

  public function parse()
  {
    $node = $this->match_Query();
    return $node['query'];
  }


                                          26
v3/Peg/grammar.peg.inc (cont.)

  Query: AndQuery ([ "OR" ] AndQuery)*
  AndQuery: Term ([ "AND" ] Term)*
public function Query__construct(&$r) {
  $r['query'] = new NodeQuery('OR');
}
public function Query_AndQuery(&$r, $s) {
  $r['query']->add($s['query']);
}
public function AndQuery__construct(&$r) {
  $r['query'] = new NodeQuery('AND');
}
public function AndQuery_Term(&$r, $s) {
  $r['query']->add($s['query']);
}
                                             27
v3/Peg/grammar.peg.inc (cont.)

  /*!* QueryLangV3
  Term: "(" Query ")" | Value:/[\w\d]+/
  */

public function Term_Query(&$r, $s){
  $r['query'] = $s['query'];
}

public function Term_Value(&$r, $s){
  $r['query'] = new NodeTerm($s['text']);
}


                                            28
v3/Peg/grammar.peg.inc test



   $parser = new Parser('a AND b OR c');

   Query (OR)
   |-- Query (AND)
   | |-- Term      - "a"
   | |-- Term      - "b"
   |-- Query (AND)
      |-- Term     - "c"



                                           29
Optional: Optimizer / Semantic checking




                                          30
Optimized query

$parser = new Parser('a AND b OR c');
$query = $parser->parse();

$queryOptimizer = new Optimizer($query);
$query = $queryOptimizer->optimize();

Query (OR)
|-- Term        - "c"
|-- Query (AND)
| |-- Term      - "a"
| |-- Term      - "b"


                                           31
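The slides do not show the Optimizer itself. One plausible pair of rewrites that reproduces the tree above: replace a query that has a single child by that child, and list plain terms before nested queries. A sketch over nested arrays rather than node objects; the array shape and the function name optimize() are hypothetical, not the QueryLang Optimizer class:

```php
<?php
// Hypothetical tree shape: a term is a string, a query is
// ['op' => 'AND'|'OR', 'children' => [...]].
function optimize($node)
{
    if (is_string($node)) {
        return $node; // a term cannot be simplified further
    }
    $children = array_map('optimize', $node['children']);
    // A query with exactly one child means the same as that child.
    if (count($children) === 1) {
        return $children[0];
    }
    // Terms first, then nested queries; order within each group is kept.
    $terms = array_filter($children, 'is_string');
    $queries = array_filter($children, 'is_array');
    return ['op' => $node['op'], 'children' => array_merge($terms, $queries)];
}

// Parse tree of 'a AND b OR c' as shown on the previous slide.
$tree = ['op' => 'OR', 'children' => [
    ['op' => 'AND', 'children' => ['a', 'b']],
    ['op' => 'AND', 'children' => ['c']],
]];

var_export(optimize($tree));
```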
Manual Parser Building: Predictive parsing




                                                                   32
     Source: http://en.wikipedia.org/wiki/File:PsychicBoston.jpg
Manual Parser Building: Lexing

Characters are turned into tokens by a lexical
analyzer, also called a lexer, scanner or
tokenizer.

"a OR (b)"

'term'      => "a"
'OR'
'LeftParen'
'term'      => "b"
'RightParen'
                                                 33
Manual Parser Building: Lexing

if ($this->_match('LeftParen',  '/^(\()/'))        {continue;}
if ($this->_match('RightParen', '/^(\))/'))        {continue;}
// \b: a term such as "ORDER" must not lex as OR + "DER"
if ($this->_match('OR',         '/^(OR)\b/i'))     {continue;}
if ($this->_match('AND',        '/^(AND)\b/i'))    {continue;}
if ($this->_match('TermValue',  '/^([\w\d]+)/i'))  {continue;}
if ($this->_match('WS',         '/^\s+/', true))   {continue;}




                                                          34
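A self-contained version of the lexing loop above. The slide only shows the _match() calls, so the rule table, the [type, text] token shape and the tokenize() name are assumptions; like the rules above, it tries each pattern in order at the start of the remaining input and skips whitespace:

```php
<?php
// Minimal regex-driven lexer. Rules are tried in order; the first match wins.
final class Lexer
{
    private const RULES = [
        // [token type, pattern, skip the token?]
        ['LeftParen',  '/^\(/',      false],
        ['RightParen', '/^\)/',      false],
        ['OR',         '/^OR\b/i',   false],
        ['AND',        '/^AND\b/i',  false],
        ['TermValue',  '/^[\w\d]+/', false],
        ['WS',         '/^\s+/',     true],
    ];

    /** @return array<int, array{0: string, 1: string}> [type, text] pairs */
    public static function tokenize(string $input): array
    {
        $tokens = [];
        while ($input !== '') {
            foreach (self::RULES as [$type, $pattern, $skip]) {
                if (preg_match($pattern, $input, $m)) {
                    if (!$skip) {
                        $tokens[] = [$type, $m[0]];
                    }
                    $input = (string) substr($input, strlen($m[0]));
                    continue 2; // restart the while loop with the first rule
                }
            }
            throw new RuntimeException('Unexpected character: ' . $input[0]);
        }
        return $tokens;
    }
}

print_r(Lexer::tokenize('a OR (b)'));
```

Keyword rules come before TermValue, so "OR" becomes a keyword token; the word boundary keeps longer words such as "ORANGE" intact as terms.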
Manual Parser Building: Lexing - UML




                                                              35
     Source: http://commons.wikimedia.org/wiki/File:Willem-Alexander,_Prince_of_Orange.jpg
Manual Parser Building: Parsing

Non-terminals become methods

protected function _query();
protected function _andQuery();
protected function _term();
Parse to a tree structure.




                                  36
Manual Parser Building: Parsing - UML




                                        37
Manual Parser Building: example non-terminal
protected function _query() {
    // <query> ::= <andQuery> ("OR" <andQuery>)*
    $query = new NodeQuery('OR');

    $leftTerm = $this->_andQuery();
    $query->add($leftTerm);

    // Predict from one token of lookahead: loop while the next token is OR.
    while ($this->_tokenStream->look()->getType() === 'OR') {
        $this->_tokenStream->expect('OR');
        $rightTerm = $this->_andQuery();
        $query->add($rightTerm);
    }
    return $query;
}
                                                   38
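Putting the pieces together: a minimal, self-contained predictive parser for the v3 QueryLang grammar, with one method per non-terminal. To stay self-contained, the tokens are a plain array of [type, text] pairs and the AST is nested arrays (['OR'|'AND', children] for queries, strings for terms); look() and expect() are hypothetical private helpers modelled on the token stream calls above, so this is a sketch rather than the QueryLang code:

```php
<?php
// Predictive (recursive-descent) parser for:
//   Query    = AndQuery ("OR" AndQuery)*
//   AndQuery = Term ("AND" Term)*
//   Term     = "(" Query ")" | TermValue
final class QueryParser
{
    private array $tokens;
    private int $i = 0;

    public function __construct(string $input)
    {
        // Tiny tokenizer: parentheses, keywords (case-insensitive), terms.
        preg_match_all('/\(|\)|\w+/', $input, $m);
        $this->tokens = array_map(function (string $t): array {
            if ($t === '(') {
                return ['LeftParen', $t];
            }
            if ($t === ')') {
                return ['RightParen', $t];
            }
            $upper = strtoupper($t);
            return ($upper === 'OR' || $upper === 'AND') ? [$upper, $t] : ['TermValue', $t];
        }, $m[0]);
    }

    public function parse(): array
    {
        $query = $this->query();
        if ($this->look() !== 'EOF') {
            throw new RuntimeException('Unexpected token ' . $this->look());
        }
        return $query;
    }

    // Type of the next token, without consuming it.
    private function look(): string
    {
        return $this->tokens[$this->i][0] ?? 'EOF';
    }

    // Consume the next token, failing if it has the wrong type.
    private function expect(string $type): string
    {
        if ($this->look() !== $type) {
            throw new RuntimeException("Expected $type, got " . $this->look());
        }
        return $this->tokens[$this->i++][1];
    }

    private function query(): array
    {
        $children = [$this->andQuery()];
        while ($this->look() === 'OR') {
            $this->expect('OR');
            $children[] = $this->andQuery();
        }
        return ['OR', $children];
    }

    private function andQuery(): array
    {
        $children = [$this->term()];
        while ($this->look() === 'AND') {
            $this->expect('AND');
            $children[] = $this->term();
        }
        return ['AND', $children];
    }

    // One token of lookahead decides which alternative of Term applies.
    private function term()
    {
        if ($this->look() === 'LeftParen') {
            $this->expect('LeftParen');
            $query = $this->query();
            $this->expect('RightParen');
            return $query;
        }
        return $this->expect('TermValue');
    }
}

var_export((new QueryParser('a AND b OR c'))->parse());
```

The result has the same shape as the php-peg v3 output: an OR query holding one AND query with a and b, and one with only c.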
Predictive Parsing: Warning!

Alternatives must be distinguishable with a fixed lookahead

<term> ::= <TermValue> "-" <TermValue>
     | <TermValue>
     | "(" <Query> ")"

No left recursion

<orQuery> ::= <orQuery> ("OR" <orQuery>)?
      | <term>


                                                 39
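Both problems have standard fixes. The first two <term> alternatives share the prefix <TermValue>, so left-factoring the rule to <TermValue> ( "-" <TermValue> )? lets one token of lookahead decide; left recursion such as the <orQuery> rule is rewritten as a loop, as in the _query() method earlier. A sketch of the left-factored term in PHP, assuming a hypothetical "Minus" token type for "-" and a hypothetical ['-', left, right] result (the slides do not say what "-" means); the parenthesised alternative is omitted:

```php
<?php
// Left-factored:  <term> ::= <TermValue> ( "-" <TermValue> )?
// After the shared prefix <TermValue>, one token of lookahead ("Minus" or
// not) decides whether the optional suffix follows. $i is the token index.
function parseTerm(array $tokens, int &$i)
{
    if (($tokens[$i][0] ?? 'EOF') !== 'TermValue') {
        throw new RuntimeException('Expected TermValue');
    }
    $left = $tokens[$i++][1];
    if (($tokens[$i][0] ?? 'EOF') === 'Minus') {
        $i++; // consume "-"
        if (($tokens[$i][0] ?? 'EOF') !== 'TermValue') {
            throw new RuntimeException('Expected TermValue after "-"');
        }
        return ['-', $left, $tokens[$i++][1]];
    }
    return $left; // no "-": the plain <TermValue> alternative
}

$i = 0;
var_export(parseTerm([['TermValue', '10'], ['Minus', '-'], ['TermValue', '20']], $i));
```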
But I wanna parse




                    40
PHP Parsers

   PHP_Depend
   1.0.0
   PHP 5.4

   PHP-Parser
   alpha
   PHP 5.4

   phc
   0.3.0.1 (unmaintained)
   PHP 5.2 (?)

                            41
PHP_Depend Abstract Syntax Tree example

$string = "Manuel $Pichler <{$email}>";

PHP_Depend_Code_ASTString
|-- ASTLiteral    - "Manuel "
|-- ASTVariable    - $Pichler
|-- ASTLiteral    - " <"
|-- ASTCompoundExpression - {...}
| |-- ASTVariable  - $email
|-- ASTLiteral    - ">"




                                          42
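PHP's own tokenizer extension exposes the lexing stage of this. As an illustration (not PHP_Depend's code), token_get_all() splits the interpolated string from the example into literal parts and variables, much like the child nodes above; the exact token list can differ slightly between PHP versions:

```php
<?php
// Tokenize the example with PHP's built-in lexer (ext/tokenizer).
$code = '<?php $string = "Manuel $Pichler <{$email}>";';

foreach (token_get_all($code) as $token) {
    if (is_array($token)) {
        // Array tokens are [token id, text, line number].
        printf("%-28s %s\n", token_name($token[0]), var_export($token[1], true));
    } else {
        // Single-character tokens such as '"', '=' or ';' are plain strings.
        printf("%-28s %s\n", 'char', var_export($token, true));
    }
}
```

A parser then builds the tree from this flat token list, which is the step the AST above shows.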
Resources




            43
More resources

Examples of modern parsers in PHP:
   Twig (Predictive Parser)
   Behat's Gherkin (Predictive Parser)
   Smarty 3 (LALR parser)

More information:
Rich Programmer Food by Steve Yegge
Let’s Build a Compiler, by Jack Crenshaw
nathansuniversity.com
Coursera: Compilers by Stanford University
SE-Radio: Episode 182: DSLs
                                             44
QUESTIONS?

Joind.in: https://joind.in/6257
Twitter: @relaxnow
E-mail: boy@ibuildings.nl
Slideshare: http://slidesha.re/INY43R
GitHub: https://github.com/relaxnow/QueryLang
                                                 45

More Related Content

What's hot

Good Evils In Perl (Yapc Asia)
Good Evils In Perl (Yapc Asia)Good Evils In Perl (Yapc Asia)
Good Evils In Perl (Yapc Asia)
Kang-min Liu
 
Perl training-in-navi mumbai
Perl training-in-navi mumbaiPerl training-in-navi mumbai
Perl training-in-navi mumbai
vibrantuser
 
Introduction to Perl
Introduction to PerlIntroduction to Perl
Introduction to Perl
Sway Wang
 

What's hot (20)

Mirror, mirror on the wall: Building a new PHP reflection library (DPC 2016)
Mirror, mirror on the wall: Building a new PHP reflection library (DPC 2016)Mirror, mirror on the wall: Building a new PHP reflection library (DPC 2016)
Mirror, mirror on the wall: Building a new PHP reflection library (DPC 2016)
 
Design Patterns - Compiler Case Study - Hands-on Examples
Design Patterns - Compiler Case Study - Hands-on ExamplesDesign Patterns - Compiler Case Study - Hands-on Examples
Design Patterns - Compiler Case Study - Hands-on Examples
 
What's new in PHP 8.0?
What's new in PHP 8.0?What's new in PHP 8.0?
What's new in PHP 8.0?
 
Diving into HHVM Extensions (php[tek] 2016)
Diving into HHVM Extensions (php[tek] 2016)Diving into HHVM Extensions (php[tek] 2016)
Diving into HHVM Extensions (php[tek] 2016)
 
Improving Dev Assistant
Improving Dev AssistantImproving Dev Assistant
Improving Dev Assistant
 
Nikita Popov "What’s new in PHP 8.0?"
Nikita Popov "What’s new in PHP 8.0?"Nikita Popov "What’s new in PHP 8.0?"
Nikita Popov "What’s new in PHP 8.0?"
 
C# 7.0 Hacks and Features
C# 7.0 Hacks and FeaturesC# 7.0 Hacks and Features
C# 7.0 Hacks and Features
 
SPL, not a bridge too far
SPL, not a bridge too farSPL, not a bridge too far
SPL, not a bridge too far
 
Understanding static analysis php amsterdam 2018
Understanding static analysis   php amsterdam 2018Understanding static analysis   php amsterdam 2018
Understanding static analysis php amsterdam 2018
 
Being functional in PHP (DPC 2016)
Being functional in PHP (DPC 2016)Being functional in PHP (DPC 2016)
Being functional in PHP (DPC 2016)
 
A Functional Guide to Cat Herding with PHP Generators
A Functional Guide to Cat Herding with PHP GeneratorsA Functional Guide to Cat Herding with PHP Generators
A Functional Guide to Cat Herding with PHP Generators
 
Functions in PHP
Functions in PHPFunctions in PHP
Functions in PHP
 
php 2 Function creating, calling, PHP built-in function
php 2 Function creating, calling,PHP built-in functionphp 2 Function creating, calling,PHP built-in function
php 2 Function creating, calling, PHP built-in function
 
Good Evils In Perl (Yapc Asia)
Good Evils In Perl (Yapc Asia)Good Evils In Perl (Yapc Asia)
Good Evils In Perl (Yapc Asia)
 
Perl 6 in Context
Perl 6 in ContextPerl 6 in Context
Perl 6 in Context
 
PHP Enums - PHPCon Japan 2021
PHP Enums - PHPCon Japan 2021PHP Enums - PHPCon Japan 2021
PHP Enums - PHPCon Japan 2021
 
Perl training-in-navi mumbai
Perl training-in-navi mumbaiPerl training-in-navi mumbai
Perl training-in-navi mumbai
 
Intro to Perl and Bioperl
Intro to Perl and BioperlIntro to Perl and Bioperl
Intro to Perl and Bioperl
 
Introduction to Perl
Introduction to PerlIntroduction to Perl
Introduction to Perl
 
Neatly folding-a-tree
Neatly folding-a-treeNeatly folding-a-tree
Neatly folding-a-tree
 

Viewers also liked

Top down and botttom up Parsing
Top down     and botttom up ParsingTop down     and botttom up Parsing
Top down and botttom up Parsing
Gerwin Ocsena
 
Keyboard warriors #1 copenhagen performance
Keyboard warriors #1 copenhagen   performanceKeyboard warriors #1 copenhagen   performance
Keyboard warriors #1 copenhagen performance
Phillip Trelford
 

Viewers also liked (20)

Write Your Own Compiler in 24 Hours
Write Your Own Compiler in 24 HoursWrite Your Own Compiler in 24 Hours
Write Your Own Compiler in 24 Hours
 
Top down and botttom up Parsing
Top down     and botttom up ParsingTop down     and botttom up Parsing
Top down and botttom up Parsing
 
Auto C A D
Auto C A DAuto C A D
Auto C A D
 
Beyond lists - Copenhagen 2015
Beyond lists - Copenhagen 2015Beyond lists - Copenhagen 2015
Beyond lists - Copenhagen 2015
 
24 Hours Later - NCrafts Paris 2015
24 Hours Later - NCrafts Paris 201524 Hours Later - NCrafts Paris 2015
24 Hours Later - NCrafts Paris 2015
 
Keyboard warriors #1 copenhagen performance
Keyboard warriors #1 copenhagen   performanceKeyboard warriors #1 copenhagen   performance
Keyboard warriors #1 copenhagen performance
 
F# for Trading - QuantLabs 2014
F# for Trading -  QuantLabs 2014F# for Trading -  QuantLabs 2014
F# for Trading - QuantLabs 2014
 
Beyond Lists - Functional Kats Conf Dublin 2015
Beyond Lists - Functional Kats Conf Dublin 2015Beyond Lists - Functional Kats Conf Dublin 2015
Beyond Lists - Functional Kats Conf Dublin 2015
 
F# in Finance Tour
F# in Finance TourF# in Finance Tour
F# in Finance Tour
 
F# for Trading - Øredev 2013
F# for Trading - Øredev 2013F# for Trading - Øredev 2013
F# for Trading - Øredev 2013
 
Machine learning from disaster - GL.Net 2015
Machine learning from disaster  - GL.Net 2015Machine learning from disaster  - GL.Net 2015
Machine learning from disaster - GL.Net 2015
 
24 hours later - FSharp Gotham 2015
24 hours later - FSharp Gotham  201524 hours later - FSharp Gotham  2015
24 hours later - FSharp Gotham 2015
 
F# in your pipe
F# in your pipeF# in your pipe
F# in your pipe
 
F# for C# devs - Copenhagen .Net 2015
F# for C# devs - Copenhagen .Net 2015F# for C# devs - Copenhagen .Net 2015
F# for C# devs - Copenhagen .Net 2015
 
Build a compiler in 2hrs - NCrafts Paris 2015
Build a compiler in 2hrs -  NCrafts Paris 2015Build a compiler in 2hrs -  NCrafts Paris 2015
Build a compiler in 2hrs - NCrafts Paris 2015
 
Building cross platform games with Xamarin - Birmingham 2015
Building cross platform games with Xamarin - Birmingham 2015Building cross platform games with Xamarin - Birmingham 2015
Building cross platform games with Xamarin - Birmingham 2015
 
F# eXchange Keynote 2016
F# eXchange Keynote 2016F# eXchange Keynote 2016
F# eXchange Keynote 2016
 
FSharp eye for the Haskell guy - London 2015
FSharp eye for the Haskell guy - London 2015FSharp eye for the Haskell guy - London 2015
FSharp eye for the Haskell guy - London 2015
 
Building a web application with ontinuation monads
Building a web application with ontinuation monadsBuilding a web application with ontinuation monads
Building a web application with ontinuation monads
 
Generative Art - Functional Vilnius 2015
Generative Art - Functional Vilnius 2015Generative Art - Functional Vilnius 2015
Generative Art - Functional Vilnius 2015
 

Similar to Let's build a parser!

Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with Clojure
Dmitry Buzdin
 
PHP and Rich Internet Applications
PHP and Rich Internet ApplicationsPHP and Rich Internet Applications
PHP and Rich Internet Applications
elliando dias
 
Marrow: A Meta-Framework for Python 2.6+ and 3.1+
Marrow: A Meta-Framework for Python 2.6+ and 3.1+Marrow: A Meta-Framework for Python 2.6+ and 3.1+
Marrow: A Meta-Framework for Python 2.6+ and 3.1+
ConFoo
 
PHP and Rich Internet Applications
PHP and Rich Internet ApplicationsPHP and Rich Internet Applications
PHP and Rich Internet Applications
elliando dias
 

Similar to Let's build a parser! (20)

ES6 is Nigh
ES6 is NighES6 is Nigh
ES6 is Nigh
 
Security Challenges in Node.js
Security Challenges in Node.jsSecurity Challenges in Node.js
Security Challenges in Node.js
 
The Rust Borrow Checker
The Rust Borrow CheckerThe Rust Borrow Checker
The Rust Borrow Checker
 
Tools for Making Machine Learning more Reactive
Tools for Making Machine Learning more ReactiveTools for Making Machine Learning more Reactive
Tools for Making Machine Learning more Reactive
 
C# 6 and 7 and Futures 20180607
C# 6 and 7 and Futures 20180607C# 6 and 7 and Futures 20180607
C# 6 and 7 and Futures 20180607
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with Clojure
 
Fatc
FatcFatc
Fatc
 
Having Fun with Play
Having Fun with PlayHaving Fun with Play
Having Fun with Play
 
Groovy On Trading Desk (2010)
Groovy On Trading Desk (2010)Groovy On Trading Desk (2010)
Groovy On Trading Desk (2010)
 
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
 
Being functional in PHP (PHPDay Italy 2016)
Being functional in PHP (PHPDay Italy 2016)Being functional in PHP (PHPDay Italy 2016)
Being functional in PHP (PHPDay Italy 2016)
 
PHP and Rich Internet Applications
PHP and Rich Internet ApplicationsPHP and Rich Internet Applications
PHP and Rich Internet Applications
 
Marrow: A Meta-Framework for Python 2.6+ and 3.1+
Marrow: A Meta-Framework for Python 2.6+ and 3.1+Marrow: A Meta-Framework for Python 2.6+ and 3.1+
Marrow: A Meta-Framework for Python 2.6+ and 3.1+
 
Graph Database Query Languages
Graph Database Query LanguagesGraph Database Query Languages
Graph Database Query Languages
 
PHP and Rich Internet Applications
PHP and Rich Internet ApplicationsPHP and Rich Internet Applications
PHP and Rich Internet Applications
 
Groovy Introduction - JAX Germany - 2008
Groovy Introduction - JAX Germany - 2008Groovy Introduction - JAX Germany - 2008
Groovy Introduction - JAX Germany - 2008
 
Postman On Steroids
Postman On SteroidsPostman On Steroids
Postman On Steroids
 
Os Pruett
Os PruettOs Pruett
Os Pruett
 
Zend Framework Study@Tokyo #2
Zend Framework Study@Tokyo #2Zend Framework Study@Tokyo #2
Zend Framework Study@Tokyo #2
 
PHP and MySQL
PHP and MySQLPHP and MySQL
PHP and MySQL
 

More from Boy Baukema

Security as a part of quality assurance
Security as a part of quality assuranceSecurity as a part of quality assurance
Security as a part of quality assurance
Boy Baukema
 
Recursive descent parsing
Recursive descent parsingRecursive descent parsing
Recursive descent parsing
Boy Baukema
 
Javascript: 8 Reasons Every PHP Developer Should Love It
Javascript: 8 Reasons Every PHP Developer Should Love ItJavascript: 8 Reasons Every PHP Developer Should Love It
Javascript: 8 Reasons Every PHP Developer Should Love It
Boy Baukema
 

More from Boy Baukema (13)

Security horrors
Security horrorsSecurity horrors
Security horrors
 
Tampering with JavaScript
Tampering with JavaScriptTampering with JavaScript
Tampering with JavaScript
 
Code by the sea: Web Application Security
Code by the sea: Web Application SecurityCode by the sea: Web Application Security
Code by the sea: Web Application Security
 
Ibuildings ISO 27001 lunchbox
Ibuildings ISO 27001 lunchboxIbuildings ISO 27001 lunchbox
Ibuildings ISO 27001 lunchbox
 
OWASP ASVS 3 - What's new for level 1?
OWASP ASVS 3 - What's new for level 1?OWASP ASVS 3 - What's new for level 1?
OWASP ASVS 3 - What's new for level 1?
 
Verifying Drupal modules with OWASP ASVS 2014
Verifying Drupal modules with OWASP ASVS 2014Verifying Drupal modules with OWASP ASVS 2014
Verifying Drupal modules with OWASP ASVS 2014
 
Secure Drupal, from start to finish
Secure Drupal, from start to finishSecure Drupal, from start to finish
Secure Drupal, from start to finish
 
Security as a part of quality assurance
Security as a part of quality assuranceSecurity as a part of quality assurance
Security as a part of quality assurance
 
Recursive descent parsing
Recursive descent parsingRecursive descent parsing
Recursive descent parsing
 
Dpc14 security as part of Quality Assurance
Dpc14   security as part of Quality AssuranceDpc14   security as part of Quality Assurance
Dpc14 security as part of Quality Assurance
 
SURFconext and Mobile
SURFconext and MobileSURFconext and Mobile
SURFconext and Mobile
 
WebAppSec @ Ibuildings in 2014
WebAppSec @ Ibuildings in 2014WebAppSec @ Ibuildings in 2014
WebAppSec @ Ibuildings in 2014
 
Javascript: 8 Reasons Every PHP Developer Should Love It
Javascript: 8 Reasons Every PHP Developer Should Love ItJavascript: 8 Reasons Every PHP Developer Should Love It
Javascript: 8 Reasons Every PHP Developer Should Love It
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 

Let's build a parser!

  • 1. Let’s build a Parser! A short introduction to parsing with PHP Boy Baukema June 9th 2012, Amsterdam
  • 4. Reasons for common fear of writing parsers:
      1. Never took compiler class, think it is scary.
      2. Did take compiler
    - Martin Fowler
  • 5. Language cacophony. Source: http://www.wordle.net/show/wrdl/5292561/Languages_used_in_PHP_Web_Development
  • 6. Lookahead (?=
      Languages
      Parsing
      QueryLang
      Parsing PHP code
      Resources
  • 7. RegExes. And now you have two problems...
  • 8. Mail::RFC822::Address
    The slide shows the regular expression this Perl module uses to validate RFC 822 e-mail addresses. It fills the slide and runs to several thousand characters; it begins:
      (?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)...
    Source: http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html
  • 9. Chomsky hierarchy. Source: http://en.wikipedia.org/wiki/File:Chomsky-hierarchy.svg
  • 10. HTTP 1.1 Accept header BNF:
      Accept           = "Accept" ":" #( media-range [ accept-params ] )
      media-range      = ( "*/*" | ( type "/" "*" ) | ( type "/" subtype ) ) *( ";" parameter )
      accept-params    = ";" "q" "=" qvalue *( accept-extension )
      accept-extension = ";" token [ "=" ( token | quoted-string ) ]
    Source: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
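A minimal hand-written sketch of what this grammar describes, in PHP. It assumes a simplified header value (no quoted-string parameters, no commas inside parameters) and reads only the `q` parameter, defaulting it to 1 as RFC 2616 does. `parseAccept()` is a hypothetical helper, not part of any library.

```php
<?php
/**
 * Hypothetical helper: splits a simplified Accept header value into
 * [media-range, q] pairs. Quoted-string parameters are not supported.
 */
function parseAccept(string $header): array
{
    $ranges = [];
    // #( media-range [ accept-params ] ) is a comma-separated list.
    foreach (explode(',', $header) as $element) {
        $params = array_map('trim', explode(';', $element));
        $mediaRange = array_shift($params);
        if ($mediaRange === '') {
            continue; // the # rule permits empty list elements
        }
        $q = 1.0; // qvalue defaults to 1 when no q parameter is given
        foreach ($params as $param) {
            $pair = array_map('trim', explode('=', $param, 2));
            if (strtolower($pair[0]) === 'q' && isset($pair[1])) {
                $q = (float) $pair[1];
            }
        }
        $ranges[] = [$mediaRange, $q];
    }
    return $ranges;
}

print_r(parseAccept('text/html, application/xml;q=0.9, */*;q=0.8'));
```

Splitting on `,` and `;` is enough for this simplified form; a full implementation would need a real tokenizer for `token` and `quoted-string`, which is exactly where a grammar-driven parser starts to pay off.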
  • 11. Arithmetic expression BNF:
      <expression> ::= <term> | <expression> "+" <term>
      <term>       ::= <factor> | <term> "*" <factor>
      <factor>     ::= <constant> | <variable> | "(" <expression> ")"
      <variable>   ::= "x" | "y" | "z"
      <constant>   ::= <digit> | <digit> <constant>
      <digit>      ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
    Source: http://en.wikipedia.org/wiki/Syntax_diagram
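  • 12. Recursion in BNF. Each rule is a production; quoted symbols such as "0" are terminals. <constant> refers to itself, which is the recursion:
      <constant> ::= <digit> | <digit> <constant>
      <digit>    ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"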
  • 13. Matching 123:
      <constant>
      |-- <digit> "1"
      |-- <constant>
          |-- <digit> "2"
          |-- <constant>          (via <constant> ::= <digit>)
              |-- <digit> "3"
    Source: https://secure.flickr.com/photos/threedots/110586879/
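The recursion in `<constant>` maps directly onto a recursive function. This sketch (function names are hypothetical) mirrors the two productions and returns a nested array shaped like the parse tree for `123`. It matches the run of digits at the start of the input and ignores anything after it.

```php
<?php
/**
 * Hypothetical recognizer mirroring
 *   <constant> ::= <digit> | <digit> <constant>
 *   <digit>    ::= "0" | "1" | ... | "9"
 * Each function returns a parse-tree fragment (nested arrays) or null.
 */
function matchDigit(string $input, int $pos): ?array
{
    if ($pos < strlen($input) && ctype_digit($input[$pos])) {
        return ['digit' => $input[$pos]];
    }
    return null;
}

function matchConstant(string $input, int $pos = 0): ?array
{
    $digit = matchDigit($input, $pos);
    if ($digit === null) {
        return null; // neither alternative can start without a <digit>
    }
    // Alternative <digit> <constant>: recurse on the rest of the input.
    $rest = matchConstant($input, $pos + 1);
    if ($rest !== null) {
        return ['constant' => [$digit, $rest]];
    }
    // Alternative <constant> ::= <digit>: the base case ends the recursion.
    return ['constant' => [$digit]];
}

echo json_encode(matchConstant('123')), PHP_EOL;
```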
  • 14. Arithmetic expression EBNF:
      expression = term , {"+" , term};
      term       = factor , {"*" , factor};
      factor     = constant | variable | "(" , expression , ")";
      variable   = "x" | "y" | "z";
      constant   = digit , {digit};
      digit      = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9";
    Source: http://en.wikipedia.org/wiki/Syntax_diagram
  • 15. Parsing Expression Grammar:
      expression = term ("+" term)*
      term       = factor ("*" factor)*
      factor     = constant / variable / "(" expression ")"
      variable   = "x" / "y" / "z"
      constant   = [0-9]+
    Source: https://secure.flickr.com/photos/sasastro/5590210866/
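To make the EBNF/PEG form concrete, here is a minimal recursive-descent evaluator in PHP, assuming integer arithmetic and caller-supplied values for `x`, `y` and `z`. Each rule becomes a method and each `(...)*` repetition becomes a `while` loop. Skipping whitespace between symbols is an addition of this sketch (the grammar has none), and the class name `ArithmeticEvaluator` is hypothetical.

```php
<?php
/**
 * Minimal recursive-descent evaluator for the arithmetic grammar above
 * (hypothetical class). One method per rule; repetition becomes a loop.
 */
final class ArithmeticEvaluator
{
    private string $input;
    private int $pos = 0;
    /** @var array<string, int> values for the variables x, y and z */
    private array $vars;

    public function __construct(string $input, array $vars = [])
    {
        $this->input = $input;
        $this->vars = $vars;
    }

    public function evaluate(): int
    {
        $value = $this->expression();
        $this->skipWhitespace();
        if ($this->pos !== strlen($this->input)) {
            throw new RuntimeException("Unexpected input at offset {$this->pos}");
        }
        return $value;
    }

    // expression = term ("+" term)*
    private function expression(): int
    {
        $value = $this->term();
        while ($this->accept('+')) {
            $value += $this->term();
        }
        return $value;
    }

    // term = factor ("*" factor)*
    private function term(): int
    {
        $value = $this->factor();
        while ($this->accept('*')) {
            $value *= $this->factor();
        }
        return $value;
    }

    // factor = constant / variable / "(" expression ")"
    private function factor(): int
    {
        if ($this->accept('(')) {
            $value = $this->expression();
            if (!$this->accept(')')) {
                throw new RuntimeException("Expected ')' at offset {$this->pos}");
            }
            return $value;
        }
        // constant = [0-9]+ ; the A modifier anchors the match at $this->pos
        if (preg_match('/[0-9]+/A', $this->input, $m, 0, $this->pos)) {
            $this->pos += strlen($m[0]);
            return (int) $m[0];
        }
        // variable = "x" / "y" / "z"
        $char = $this->input[$this->pos] ?? '';
        if ($char !== '' && strpos('xyz', $char) !== false) {
            if (!array_key_exists($char, $this->vars)) {
                throw new RuntimeException("No value given for variable $char");
            }
            $this->pos++;
            return $this->vars[$char];
        }
        throw new RuntimeException("Unexpected input at offset {$this->pos}");
    }

    // Skips whitespace, then consumes $symbol if it is the next character.
    private function accept(string $symbol): bool
    {
        $this->skipWhitespace();
        if (($this->input[$this->pos] ?? '') === $symbol) {
            $this->pos++;
            return true;
        }
        return false;
    }

    private function skipWhitespace(): void
    {
        $this->pos += strspn($this->input, " \t\r\n", $this->pos);
    }
}

$calc = new ArithmeticEvaluator('x * (y + 1)', ['x' => 2, 'y' => 3]);
echo $calc->evaluate(), PHP_EOL; // 8
```

Operator precedence falls out of the rule nesting: `*` binds tighter than `+` because `term` sits below `expression`, so `2 + 3 * 4` evaluates to 14.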
  • 16. So how will this help me parse a language?
  • 17. Parser generators for PHP:
      Lime-php: LALR(1), 2008, abandoned
      PHP_ParserGenerator: LALR(1), 2010, abandoned
      Loco: combinatory parsing, 2011, alpha
      php-peg: PEG, 2012, active?, alpha
  • 19. QueryLang: example query
      parsers OR 123 AND (dpc OR phpbnl)
      Query (OR)
      |-- Term - "parsers"
      |-- Query (AND)
          |-- Term - "123"
          |-- Query (OR)
              |-- Term - "dpc"
              |-- Term - "phpbnl"
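The slides build QueryLang trees out of `NodeQuery` and `NodeTerm` objects with an `add()` method, but do not show the classes themselves. The following is a hypothetical minimal version, plus a `dumpNode()` helper (also hypothetical) that prints a tree in the `|-- ` style used throughout, here for the example query.

```php
<?php
/**
 * Hypothetical AST classes for QueryLang. The names NodeQuery / NodeTerm and
 * the add() method appear on the slides; these bodies are a sketch.
 */
final class NodeTerm
{
    public string $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }
}

final class NodeQuery
{
    public string $operator;
    /** @var array<NodeQuery|NodeTerm> */
    public array $children = [];

    public function __construct(string $operator = 'OR')
    {
        $this->operator = $operator;
    }

    /** @param NodeQuery|NodeTerm $child */
    public function add($child): void
    {
        $this->children[] = $child;
    }
}

/** Renders a node and its children in the "|-- " tree style. */
function dumpNode($node, string $indent = ''): string
{
    if ($node instanceof NodeTerm) {
        return 'Term - "' . $node->value . "\"\n";
    }
    $out = 'Query (' . $node->operator . ")\n";
    $last = count($node->children) - 1;
    foreach ($node->children as $i => $child) {
        // Keep a "|" guide line only while more siblings follow.
        $childIndent = $indent . ($i === $last ? '    ' : '|   ');
        $out .= $indent . '|-- ' . dumpNode($child, $childIndent);
    }
    return $out;
}

// parsers OR 123 AND (dpc OR phpbnl)
$inner = new NodeQuery('OR');
$inner->add(new NodeTerm('dpc'));
$inner->add(new NodeTerm('phpbnl'));
$and = new NodeQuery('AND');
$and->add(new NodeTerm('123'));
$and->add($inner);
$root = new NodeQuery('OR');
$root->add(new NodeTerm('parsers'));
$root->add($and);

echo dumpNode($root);
```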
  • 20. v1/Peg/grammar.peg.inc
      /*!* QueryLangV1
      Term: /[\w\d]+/
      */
      public function parse() {
          $match = $this->match_Term();
          if (!$match) {
              return '';
          }
          return $match['text'];
      }
  • 21. v1/Peg/Parser.php - generated match_Term
      /* Term: /[\w\d]+/ */
      protected $match_Term_typestack = array('Term');
      function match_Term ($stack = array()) {
          $matchrule = "Term";
          $result = $this->construct($matchrule, $matchrule, null);
          if (( $subres = $this->rx( '/[\w\d]+/' ) ) !== FALSE) {
              $result["text"] .= $subres;
              return $this->finalise($result);
          }
          else {
              return FALSE;
          }
      }
  • 22. v1/Peg/grammar.peg.inc test
      $parser = new Parser('test');
      print_r($parser->parse()); // test
      $parser = new Parser('test 123');
      print_r($parser->parse()); // test (v1 matches only a single Term)
  • 23. v2/Peg/grammar.peg.inc
      /*!* QueryLangV2
      Query: Term (> Term)*
      Term: /[\w\d]+/
      */
      public function parse() {
          $result = $this->match_Query();
          return $result['query'];
      }
  • 24. v2/Peg/grammar.peg.inc (cont.)
      public function Query__construct(&$result) {
          $result['query'] = new NodeQuery();
      }
      public function Query_Term(&$result, $sub) {
          $term = new NodeTerm($sub['text']);
          $result['query']->addTerm($term);
      }
  • 25. v2/Peg/grammar.peg.inc test
      $parser = new Parser('test 123');
      print_r($parser->parse());
      Query
      |-- Term - "test"
      |-- Term - "123"
  • 26. v3/Peg/grammar.peg.inc
      /*!* QueryLangV3
      Query: AndQuery ([ "OR" ] AndQuery)*
      AndQuery: Term ([ "AND" ] Term)*
      Term: "(" Query ")" | Value:/[\w\d]+/
      */
      public function parse() {
          $node = $this->match_Query();
          return $node['query'];
      }
  • 27. v3/Peg/grammar.peg.inc (cont.)
      Query: AndQuery ([ "OR" ] AndQuery)*
      AndQuery: Term ([ "AND" ] Term)*
      public function Query__construct(&$r) { $r['query'] = new NodeQuery('OR'); }
      public function Query_AndQuery(&$r, $s) { $r['query']->add($s['query']); }
      public function AndQuery__construct(&$r) { $r['query'] = new NodeQuery('AND'); }
      public function AndQuery_Term(&$r, $s) { $r['query']->add($s['query']); }
  • 28. v3/Peg/grammar.peg.inc (cont.)
      /*!* QueryLangV3
      Term: "(" Query ")" | Value:/[\w\d]+/
      */
      public function Term_Query(&$r, $s) { $r['query'] = $s['query']; }
      public function Term_Value(&$r, $s) { $r['query'] = new NodeTerm($s['text']); }
  • 29. v3/Peg/grammar.peg.inc test
      $parser = new Parser('a AND b OR c');
      Query (OR)
      |-- Query (AND)
      |   |-- Term - "a"
      |   |-- Term - "b"
      |-- Query (AND)
          |-- Term - "c"
  • 30. Optional: optimizer / semantic checking
  • 31. Optimized query
      $parser = new Parser('a AND b OR c');
      $query = $parser->parse();
      $queryOptimizer = new Optimizer($query);
      $query = $queryOptimizer->optimize();
      Query (OR)
      |-- Term - "c"
      |-- Query (AND)
      |   |-- Term - "a"
      |   |-- Term - "b"
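The slide does not show what `Optimizer::optimize()` does. One reading consistent with the before and after trees is sketched below, under two assumptions: a query with a single child is replaced by that child (`AND(c)` becomes `c`), and plain terms are moved before sub-queries. The node classes are minimal hypothetical stand-ins, not the author's code.

```php
<?php
// Minimal hypothetical stand-ins for the slides' node classes.
final class NodeTerm
{
    public string $value;
    public function __construct(string $value) { $this->value = $value; }
}

final class NodeQuery
{
    public string $operator;
    public array $children = [];
    public function __construct(string $operator) { $this->operator = $operator; }
    public function add($child): void { $this->children[] = $child; }
}

/**
 * Hypothetical optimizer pass (assumed behaviour, see lead-in):
 *  1. a query with exactly one child is replaced by that child;
 *  2. plain terms are placed before sub-queries.
 */
final class Optimizer
{
    private $query;

    public function __construct($query) { $this->query = $query; }

    public function optimize()
    {
        return $this->simplify($this->query);
    }

    private function simplify($node)
    {
        if ($node instanceof NodeTerm) {
            return $node;
        }
        $terms = [];
        $subQueries = [];
        foreach ($node->children as $child) {
            $child = $this->simplify($child); // optimize bottom-up
            if ($child instanceof NodeTerm) {
                $terms[] = $child;
            } else {
                $subQueries[] = $child;
            }
        }
        $children = array_merge($terms, $subQueries);
        if (count($children) === 1) {
            return $children[0];
        }
        $optimized = new NodeQuery($node->operator);
        foreach ($children as $child) {
            $optimized->add($child);
        }
        return $optimized;
    }
}

// 'a AND b OR c' as the v3 grammar parses it: OR(AND(a, b), AND(c))
$ab = new NodeQuery('AND');
$ab->add(new NodeTerm('a'));
$ab->add(new NodeTerm('b'));
$c = new NodeQuery('AND');
$c->add(new NodeTerm('c'));
$query = new NodeQuery('OR');
$query->add($ab);
$query->add($c);

$query = (new Optimizer($query))->optimize(); // OR(c, AND(a, b))
```

Reordering children is only safe here because AND and OR are commutative in this query language; a semantic checker would be the natural place to reject queries where that does not hold.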
  • 32. Manual Parser Building: Predictive parsing. Source: http://en.wikipedia.org/wiki/File:PsychicBoston.jpg
  • 33. Manual Parser Building: Lexing
    Characters are turned into tokens by a lexical analyzer, also called a lexer, scanner or tokenizer. "a OR (b)" becomes:
      'term' => "a"
      'OR'
      'LeftParen'
      'term' => "b"
      'RightParen'
  • 34. Manual Parser Building: Lexing
      if ($this->_match('LeftParen', '/^(\()/')) {continue;}
      if ($this->_match('RightParen', '/^(\))/')) {continue;}
      if ($this->_match('OR', '/^(OR)\b/i')) {continue;}
      if ($this->_match('AND', '/^(AND)\b/i')) {continue;}
      if ($this->_match('TermValue', '/^([\w\d]+)/i')) {continue;}
      if ($this->_match('WS', '/^\s+/', true)) {continue;}
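A self-contained sketch of this lexer loop, assuming `_match()` tries one `^`-anchored regex against the remaining input, records a token unless the rule is marked as skipped, and consumes the matched text. `QueryLexer` and `tokenize()` are hypothetical names. The keyword rules end in `\b` so that a term such as `oregon` is not split into `OR` and `egon`.

```php
<?php
/**
 * Hypothetical lexer: rules are tried in order and the first match wins.
 * Each rule is [token type, anchored regex, skip token?].
 */
final class QueryLexer
{
    private const RULES = [
        ['LeftParen',  '/^(\()/',       false],
        ['RightParen', '/^(\))/',       false],
        ['OR',         '/^(OR)\b/i',    false],
        ['AND',        '/^(AND)\b/i',   false],
        ['TermValue',  '/^([\w\d]+)/i', false],
        ['WS',         '/^\s+/',        true],
    ];

    /** @return array<array{0: string, 1: string}> list of [type, text] */
    public function tokenize(string $input): array
    {
        $tokens = [];
        while ($input !== '') {
            foreach (self::RULES as [$type, $regex, $skip]) {
                if (preg_match($regex, $input, $m)) {
                    if (!$skip) {
                        $tokens[] = [$type, $m[0]];
                    }
                    // Consume the match and restart with the first rule.
                    $input = substr($input, strlen($m[0]));
                    continue 2;
                }
            }
            throw new RuntimeException('Unexpected character: ' . $input[0]);
        }
        return $tokens;
    }
}

print_r((new QueryLexer())->tokenize('a OR (b)'));
```

Rule order matters: the keyword rules come before `TermValue`, otherwise `OR` would match `[\w\d]+` and be lexed as a plain term.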
  • 35. Manual Parser Building: Lexing - UML. Source: http://commons.wikimedia.org/wiki/File:Willem-Alexander,_Prince_of_Orange.jpg
  • 36. Manual Parser Building: Parsing
    Non-terminals become methods:
      protected function _query();
      protected function _andQuery();
      protected function _term();
    Parse to a tree structure.
  • 37. Manual Parser Building: Parsing - UML
  • 38. Manual Parser Building: example non-terminal
      protected function _query() {
          $query = new NodeQuery('OR');
          $leftTerm = $this->_andQuery();
          $query->add($leftTerm);
          while ($this->_tokenStream->look()->getType() === 'OR') {
              $this->_tokenStream->expect('OR');
              $rightTerm = $this->_andQuery();
              $query->add($rightTerm);
          }
          return $query;
      }
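The same technique applied to the whole QueryLang grammar, as a minimal sketch: a token stream with `look()` and `expect()` (the method names used on the slide; their bodies here are assumptions) and one method per non-terminal. Tokens are `[type, text]` pairs as a lexer would produce them, and nodes are plain arrays rather than `NodeQuery`/`NodeTerm` objects to keep the example short.

```php
<?php
/** Hypothetical token stream over [type, text] pairs. */
final class TokenStream
{
    private array $tokens;
    private int $pos = 0;

    public function __construct(array $tokens)
    {
        // An explicit end-of-input token means look() always has an answer.
        $this->tokens = array_merge($tokens, [['EOF', '']]);
    }

    /** Returns the next token without consuming it (one-token lookahead). */
    public function look(): array
    {
        return $this->tokens[$this->pos];
    }

    /** Consumes the next token, failing if it is not of the expected type. */
    public function expect(string $type): array
    {
        $token = $this->look();
        if ($token[0] !== $type) {
            throw new RuntimeException("Expected $type, got {$token[0]}");
        }
        $this->pos++;
        return $token;
    }
}

/** Hypothetical predictive parser: each non-terminal is a method. */
final class QueryParser
{
    private TokenStream $stream;

    public function __construct(array $tokens)
    {
        $this->stream = new TokenStream($tokens);
    }

    public function parse(): array
    {
        $query = $this->_query();
        $this->stream->expect('EOF'); // reject trailing tokens
        return $query;
    }

    // query ::= andQuery ("OR" andQuery)*
    private function _query(): array
    {
        $children = [$this->_andQuery()];
        while ($this->stream->look()[0] === 'OR') {
            $this->stream->expect('OR');
            $children[] = $this->_andQuery();
        }
        return ['OR', $children];
    }

    // andQuery ::= term ("AND" term)*
    private function _andQuery(): array
    {
        $children = [$this->_term()];
        while ($this->stream->look()[0] === 'AND') {
            $this->stream->expect('AND');
            $children[] = $this->_term();
        }
        return ['AND', $children];
    }

    // term ::= "(" query ")" | TermValue
    private function _term(): array
    {
        if ($this->stream->look()[0] === 'LeftParen') {
            $this->stream->expect('LeftParen');
            $query = $this->_query();
            $this->stream->expect('RightParen');
            return $query;
        }
        return ['Term', $this->stream->expect('TermValue')[1]];
    }
}

// Tokens for 'a AND b OR c'
$tokens = [['TermValue', 'a'], ['AND', 'AND'], ['TermValue', 'b'], ['OR', 'OR'], ['TermValue', 'c']];
print_r((new QueryParser($tokens))->parse());
```

The result has the same shape as the v3 PEG output for `a AND b OR c`: an OR node holding AND(a, b) and AND(c).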
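  • 39. Predictive parsing: warning!
    The choice between alternatives must be decidable with a fixed lookahead. Here the first two alternatives both start with <TermValue>, so one token of lookahead cannot choose between them:
      <term> ::= <TermValue> "-" <TermValue> | <TermValue> | "(" <Query> ")"
    No left recursion: a method for <orQuery> would call itself before consuming any token and never terminate:
      <orQuery> ::= <orQuery> ("OR" <orQuery>)? | <term>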
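A sketch of the two usual fixes, using hypothetical token and node names: left-factoring `<term>` so that the token after `<TermValue>` decides between the alternatives, and replacing the left-recursive `<orQuery>` with iteration, `<orQuery> ::= <term> ("OR" <term>)*`, which accepts the same strings. The slide does not say what `"-"` means, so the sketch simply records it as a `Dash` node.

```php
<?php
/**
 * Hypothetical parser for the rewritten rules:
 *   <orQuery> ::= <term> ( "OR" <term> )*
 *   <term>    ::= <TermValue> ( "-" <TermValue> )? | "(" <orQuery> ")"
 */
final class FactoredParser
{
    private array $tokens;
    private int $pos = 0;

    public function __construct(array $tokens)
    {
        $this->tokens = $tokens; // [type, text] pairs
    }

    private function peek(): string
    {
        return $this->tokens[$this->pos][0] ?? 'EOF';
    }

    private function expect(string $type): string
    {
        if ($this->peek() !== $type) {
            throw new RuntimeException("Expected $type, got {$this->peek()}");
        }
        return $this->tokens[$this->pos++][1];
    }

    // Iteration instead of left recursion: a term is consumed before looping.
    public function orQuery(): array
    {
        $children = [$this->term()];
        while ($this->peek() === 'OR') {
            $this->expect('OR');
            $children[] = $this->term();
        }
        return count($children) === 1 ? $children[0] : ['OR', $children];
    }

    // Left-factored: consume the shared <TermValue> first, then let the
    // next token decide whether the "-" alternative applies.
    private function term(): array
    {
        if ($this->peek() === 'LeftParen') {
            $this->expect('LeftParen');
            $query = $this->orQuery();
            $this->expect('RightParen');
            return $query;
        }
        $left = $this->expect('TermValue');
        if ($this->peek() === 'Dash') {
            $this->expect('Dash');
            return ['Dash', $left, $this->expect('TermValue')];
        }
        return ['Term', $left];
    }
}

$p = new FactoredParser([['TermValue', 'a'], ['Dash', '-'], ['TermValue', 'b'], ['OR', 'OR'], ['TermValue', 'c']]);
print_r($p->orQuery());
```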
  • 40. But I wanna parse
  • 41. PHP parsers:
      PHP_Depend: 1.0.0, PHP 5.4
      PHP-Parser: alpha, PHP 5.4
      phc: 0.3.0.1 (unmaintained), PHP 5.2 (?)
  • 42. PHP_Depend abstract syntax tree example
      $string = "Manuel $Pichler <{$email}>";
      PHP_Depend_Code_ASTString
      |-- ASTLiteral - "Manuel "
      |-- ASTVariable - $Pichler
      |-- ASTLiteral - " <"
      |-- ASTCompoundExpression - {...}
      |   |-- ASTVariable - $email
      |-- ASTLiteral - ">"
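PHP's own lexer is available from userland through `token_get_all()` (ext/tokenizer, enabled by default). This sketch lists the tokens for the string from the example; the AST groups the same pieces: literal text, `$Pichler`, and the `{$email}` part, which starts with a `T_CURLY_OPEN` token. It shows tokens only; building the tree on top of them is the job of a parser such as PHP_Depend or PHP-Parser.

```php
<?php
// Source code to tokenize; single quotes keep $Pichler and {$email} literal.
$code = '<?php $string = "Manuel $Pichler <{$email}>";';

foreach (token_get_all($code) as $token) {
    if (is_array($token)) {
        // Array tokens are [token id, text, line number].
        printf("%-28s %s\n", token_name($token[0]), var_export($token[1], true));
    } else {
        // Single-character tokens such as '=', '"' and ';' are plain strings.
        printf("%-28s %s\n", '(char)', var_export($token, true));
    }
}
```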
  • 43. Resources
  • 44. More resources
    Examples of modern parsers in PHP:
      Twig (predictive parser)
      Behat's Gherkin (predictive parser)
      Smarty 3 (LALR parser)
    More information:
      Rich Programmer Food, by Steve Yegge
      Let’s Build a Compiler, by Jack Crenshaw
      nathansuniversity.com
      Coursera: Compilers, by Stanford University
      SE-Radio: Episode 182: DSLs
  • 45. QUESTIONS?
      Joind.in: https://joind.in/6257
      Twitter: @relaxnow
      E-mail: boy@ibuildings.nl
      Slideshare: http://slidesha.re/INY43R
      GitHub: https://github.com/relaxnow/QueryLang
