SlideShare une entreprise Scribd logo
1  sur  17
Text parsing with Python and PLY 
Daniil Baturin <daniil@baturin.org> 
LVEE 2014
Formal language 
Set of strings composed of symbols of some alphabet. 
A language can be defined by a formal grammar. 
Two or more grammars may produce the same language. 
Forward problem: given a grammar, generate a string that 
belongs a language (trivial). 
Parsing is the inverse problem: given a string, decide if it 
belongs a language defined by a grammar.
Formal grammar 
Formal grammar is a set of production rules. 
A production rule has a one or more symbols at its left-hand and right-hand sides. 
You need to specify the start symbol. 
Example: 
<sentence> ::= <verb phrase> <noun phrase> 
<noun phrase> ::= <article> <adjective> <noun> 
<verb phrase> ::= <verb> <article> <noun> 
<article> ::= <”a” | “the” | empty> 
... 
“sentence” is the start symbol.
Lexers and tokens 
Lexer—breaks the text into tokens, string 
literals with metadata. 
Tokens are identified by regular expressions. 
Parsers usually operate on tokens rather than 
string literals. E.g. all numbers ([0-9]+) become 
TOKEN(NUMBER, value). 
Lexer can be autogenerated or hand-coded.
Symbols 
Terminal symbols are symbols that can't be 
further reduced (rewritten). 
Non-terminal symbols are composed from 
references to terminals and other non-terminals. 
<article> ::= <”a” | “the” | empty> 
“Article” is a non-terminal; “a”, “the”, empty are 
terminals.
Approaches to parsing 
● Hand-coded ad-hoc parser 
● Autogenerated parser 
● Hand-coded advanced algorithm
Regular grammar 
All production rules are of the following form: 
<some nonterminal> ::= <some terminal> 
<some nonterminal> ::= <some terminal> <another nonterminal> 
<some nonterminal> ::= <empty> 
Equivalent to regular expressions. 
“(a|b)*” regex: 
<start> ::= “a” <start> 
| “b” <start> 
| <empty> 
This is good for a hand-coded parser (or a regex library).
Context-free grammar 
All productions are of the following form: 
<some nonterminal> ::= <some terminals or nonterminals> 
Good for autogenerated parsers. Ad-hoc 
parsers tend to be messy.
More complex grammars 
Context-sensitive: rules may contain terminals 
and nonterminals in both sides. 
Can be conquered with special tools or lexer 
hacks. 
Not even context-sensitive: you are in trouble. ;)
Parser generators 
Usually use LALR(1) method. 
Read a token and push it on stack. 
If some rule is matched, empty the stack and 
proceed (reduce). 
If not, push the token on stack and proceed 
(shift).
PLY Lex—the lexer generator 
Token recognizers are functions. 
Token regexes are in docstrings. 
You should export a tuple of all tokens for use in 
a parser. 
def t_NUMBER(t): 
r'[0­9]+' 
t.value = int(t.value) 
return t
PLY YACC—the parser generator 
Production rule recognizers are functions. 
Rules are in docstrings. 
Rules can refer to other rules and can be recursive. 
You can refer to lexer tokens in rules. 
The argument is the token list. 
def p_expr(p): 
''' expr : NUMBER OPSIGN NUMBER ''' 
p[0] = (p[2], p[1], p[3])
Grammar “patterns” 
The rule form used by YACC doesn't have a 
notation for repetition etc. 
Empty: 
something : 
This or that: 
something : foo | bar 
One or more of something: 
list_of_something : list_of_something something 
| something
Common problems 
Shift-reduce conflict: two or more rules have more than 
one common token at the beginning. 
foo : baz quux xyzzy 
bar : baz quux fgsfds 
Usually can be solved by left factorization: 
foobar_start : baz quux 
foo : foobar_start xyzzy 
bar : foobar_start fgsfds 
Default resolution is shift. 
Reduce-reduce conflict: same string matches more 
than one rule. Often indicates a grammar design error.
What I didn't tell 
Exclusive lexer states. 
Precedence rules. 
Wrapping lexers and parsers into classes. 
Error recovery.
The sample code 
https://github.com/dmbaturin/ply-example
Questions?

Contenu connexe

Tendances

Regular Expressions in PHP
Regular Expressions in PHPRegular Expressions in PHP
Regular Expressions in PHPAndrew Kandels
 
Lesson 03 python statement, indentation and comments
Lesson 03   python statement, indentation and commentsLesson 03   python statement, indentation and comments
Lesson 03 python statement, indentation and commentsNilimesh Halder
 
The JavaScript Programming Language
The JavaScript Programming LanguageThe JavaScript Programming Language
The JavaScript Programming LanguageRaghavan Mohan
 
The Java Script Programming Language
The  Java Script  Programming  LanguageThe  Java Script  Programming  Language
The Java Script Programming Languagezone
 
Regular Expression in Compiler design
Regular Expression in Compiler designRegular Expression in Compiler design
Regular Expression in Compiler designRiazul Islam
 
Chapter Three(2)
Chapter Three(2)Chapter Three(2)
Chapter Three(2)bolovv
 
Convention-Based Syntactic Descriptions
Convention-Based Syntactic DescriptionsConvention-Based Syntactic Descriptions
Convention-Based Syntactic DescriptionsRay Toal
 
Context Free Grammar
Context Free GrammarContext Free Grammar
Context Free GrammarAkhil Kaushik
 
Chapter3pptx__2021_12_23_22_52_54.pptx
Chapter3pptx__2021_12_23_22_52_54.pptxChapter3pptx__2021_12_23_22_52_54.pptx
Chapter3pptx__2021_12_23_22_52_54.pptxDrIsikoIsaac
 
Chapter Two(1)
Chapter Two(1)Chapter Two(1)
Chapter Two(1)bolovv
 
Basic syntax supported by python
Basic syntax supported by pythonBasic syntax supported by python
Basic syntax supported by pythonJaya Kumari
 
Types of Language in Theory of Computation
Types of Language in Theory of ComputationTypes of Language in Theory of Computation
Types of Language in Theory of ComputationAnkur Singh
 

Tendances (19)

Regular Expressions in PHP
Regular Expressions in PHPRegular Expressions in PHP
Regular Expressions in PHP
 
Lesson 03 python statement, indentation and comments
Lesson 03   python statement, indentation and commentsLesson 03   python statement, indentation and comments
Lesson 03 python statement, indentation and comments
 
The JavaScript Programming Language
The JavaScript Programming LanguageThe JavaScript Programming Language
The JavaScript Programming Language
 
The Java Script Programming Language
The  Java Script  Programming  LanguageThe  Java Script  Programming  Language
The Java Script Programming Language
 
Javascript
JavascriptJavascript
Javascript
 
Regular Expression in Compiler design
Regular Expression in Compiler designRegular Expression in Compiler design
Regular Expression in Compiler design
 
Python revision tour II
Python revision tour IIPython revision tour II
Python revision tour II
 
Chapter 9 python fundamentals
Chapter 9 python fundamentalsChapter 9 python fundamentals
Chapter 9 python fundamentals
 
Chapter Three(2)
Chapter Three(2)Chapter Three(2)
Chapter Three(2)
 
Programming with Python
Programming with PythonProgramming with Python
Programming with Python
 
Convention-Based Syntactic Descriptions
Convention-Based Syntactic DescriptionsConvention-Based Syntactic Descriptions
Convention-Based Syntactic Descriptions
 
Context Free Grammar
Context Free GrammarContext Free Grammar
Context Free Grammar
 
Chapter3pptx__2021_12_23_22_52_54.pptx
Chapter3pptx__2021_12_23_22_52_54.pptxChapter3pptx__2021_12_23_22_52_54.pptx
Chapter3pptx__2021_12_23_22_52_54.pptx
 
Python revision tour i
Python revision tour iPython revision tour i
Python revision tour i
 
Chapter Two(1)
Chapter Two(1)Chapter Two(1)
Chapter Two(1)
 
Compiler lecture 05
Compiler lecture 05Compiler lecture 05
Compiler lecture 05
 
Python syntax
Python syntaxPython syntax
Python syntax
 
Basic syntax supported by python
Basic syntax supported by pythonBasic syntax supported by python
Basic syntax supported by python
 
Types of Language in Theory of Computation
Types of Language in Theory of ComputationTypes of Language in Theory of Computation
Types of Language in Theory of Computation
 

En vedette

Views Unlimited: Unleashing the Power of Drupal's Views Module
Views Unlimited: Unleashing the Power of Drupal's Views ModuleViews Unlimited: Unleashing the Power of Drupal's Views Module
Views Unlimited: Unleashing the Power of Drupal's Views ModuleRanel Padon
 
PyCon PH 2014 - GeoComputation
PyCon PH 2014 - GeoComputationPyCon PH 2014 - GeoComputation
PyCon PH 2014 - GeoComputationRanel Padon
 
An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...
An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...
An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...David Beazley (Dabeaz LLC)
 
WAD : A Module for Converting Fatal Extension Errors into Python Exceptions
WAD : A Module for Converting Fatal Extension Errors into Python ExceptionsWAD : A Module for Converting Fatal Extension Errors into Python Exceptions
WAD : A Module for Converting Fatal Extension Errors into Python ExceptionsDavid Beazley (Dabeaz LLC)
 
CKEditor Widgets with Drupal
CKEditor Widgets with DrupalCKEditor Widgets with Drupal
CKEditor Widgets with DrupalRanel Padon
 
SWIG : An Easy to Use Tool for Integrating Scripting Languages with C and C++
SWIG : An Easy to Use Tool for Integrating Scripting Languages with C and C++SWIG : An Easy to Use Tool for Integrating Scripting Languages with C and C++
SWIG : An Easy to Use Tool for Integrating Scripting Languages with C and C++David Beazley (Dabeaz LLC)
 
Generator Tricks for Systems Programmers, v2.0
Generator Tricks for Systems Programmers, v2.0Generator Tricks for Systems Programmers, v2.0
Generator Tricks for Systems Programmers, v2.0David Beazley (Dabeaz LLC)
 
Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...
Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...
Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...David Beazley (Dabeaz LLC)
 
Python Programming - X. Exception Handling and Assertions
Python Programming - X. Exception Handling and AssertionsPython Programming - X. Exception Handling and Assertions
Python Programming - X. Exception Handling and AssertionsRanel Padon
 
Python for text processing
Python for text processingPython for text processing
Python for text processingXiang Li
 
Python Programming - III. Controlling the Flow
Python Programming - III. Controlling the FlowPython Programming - III. Controlling the Flow
Python Programming - III. Controlling the FlowRanel Padon
 
The Synergy of Drupal Hooks/APIs (Custom Module Development with ChartJS)
The Synergy of Drupal Hooks/APIs (Custom Module Development with ChartJS)The Synergy of Drupal Hooks/APIs (Custom Module Development with ChartJS)
The Synergy of Drupal Hooks/APIs (Custom Module Development with ChartJS)Ranel Padon
 
Python Programming - VII. Customizing Classes and Operator Overloading
Python Programming - VII. Customizing Classes and Operator OverloadingPython Programming - VII. Customizing Classes and Operator Overloading
Python Programming - VII. Customizing Classes and Operator OverloadingRanel Padon
 
Switchable Map APIs with Drupal
Switchable Map APIs with DrupalSwitchable Map APIs with Drupal
Switchable Map APIs with DrupalRanel Padon
 
Python Programming - IX. On Randomness
Python Programming - IX. On RandomnessPython Programming - IX. On Randomness
Python Programming - IX. On RandomnessRanel Padon
 
Python Programming - VIII. Inheritance and Polymorphism
Python Programming - VIII. Inheritance and PolymorphismPython Programming - VIII. Inheritance and Polymorphism
Python Programming - VIII. Inheritance and PolymorphismRanel Padon
 

En vedette (20)

Text analysis using python
Text analysis using pythonText analysis using python
Text analysis using python
 
Views Unlimited: Unleashing the Power of Drupal's Views Module
Views Unlimited: Unleashing the Power of Drupal's Views ModuleViews Unlimited: Unleashing the Power of Drupal's Views Module
Views Unlimited: Unleashing the Power of Drupal's Views Module
 
PyCon PH 2014 - GeoComputation
PyCon PH 2014 - GeoComputationPyCon PH 2014 - GeoComputation
PyCon PH 2014 - GeoComputation
 
An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...
An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...
An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...
 
Generator Tricks for Systems Programmers
Generator Tricks for Systems ProgrammersGenerator Tricks for Systems Programmers
Generator Tricks for Systems Programmers
 
WAD : A Module for Converting Fatal Extension Errors into Python Exceptions
WAD : A Module for Converting Fatal Extension Errors into Python ExceptionsWAD : A Module for Converting Fatal Extension Errors into Python Exceptions
WAD : A Module for Converting Fatal Extension Errors into Python Exceptions
 
Understanding the Python GIL
Understanding the Python GILUnderstanding the Python GIL
Understanding the Python GIL
 
CKEditor Widgets with Drupal
CKEditor Widgets with DrupalCKEditor Widgets with Drupal
CKEditor Widgets with Drupal
 
SWIG : An Easy to Use Tool for Integrating Scripting Languages with C and C++
SWIG : An Easy to Use Tool for Integrating Scripting Languages with C and C++SWIG : An Easy to Use Tool for Integrating Scripting Languages with C and C++
SWIG : An Easy to Use Tool for Integrating Scripting Languages with C and C++
 
Generator Tricks for Systems Programmers, v2.0
Generator Tricks for Systems Programmers, v2.0Generator Tricks for Systems Programmers, v2.0
Generator Tricks for Systems Programmers, v2.0
 
Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...
Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...
Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...
 
Python introduction
Python introductionPython introduction
Python introduction
 
Python Programming - X. Exception Handling and Assertions
Python Programming - X. Exception Handling and AssertionsPython Programming - X. Exception Handling and Assertions
Python Programming - X. Exception Handling and Assertions
 
Python for text processing
Python for text processingPython for text processing
Python for text processing
 
Python Programming - III. Controlling the Flow
Python Programming - III. Controlling the FlowPython Programming - III. Controlling the Flow
Python Programming - III. Controlling the Flow
 
The Synergy of Drupal Hooks/APIs (Custom Module Development with ChartJS)
The Synergy of Drupal Hooks/APIs (Custom Module Development with ChartJS)The Synergy of Drupal Hooks/APIs (Custom Module Development with ChartJS)
The Synergy of Drupal Hooks/APIs (Custom Module Development with ChartJS)
 
Python Programming - VII. Customizing Classes and Operator Overloading
Python Programming - VII. Customizing Classes and Operator OverloadingPython Programming - VII. Customizing Classes and Operator Overloading
Python Programming - VII. Customizing Classes and Operator Overloading
 
Switchable Map APIs with Drupal
Switchable Map APIs with DrupalSwitchable Map APIs with Drupal
Switchable Map APIs with Drupal
 
Python Programming - IX. On Randomness
Python Programming - IX. On RandomnessPython Programming - IX. On Randomness
Python Programming - IX. On Randomness
 
Python Programming - VIII. Inheritance and Polymorphism
Python Programming - VIII. Inheritance and PolymorphismPython Programming - VIII. Inheritance and Polymorphism
Python Programming - VIII. Inheritance and Polymorphism
 

Similaire à LVEE 2014: Text parsing with Python and PLY

Chapter-3 compiler.pptx course materials
Chapter-3 compiler.pptx course materialsChapter-3 compiler.pptx course materials
Chapter-3 compiler.pptx course materialsgadisaAdamu
 
Lex and Yacc ppt
Lex and Yacc pptLex and Yacc ppt
Lex and Yacc pptpssraikar
 
3a. Context Free Grammar.pdf
3a. Context Free Grammar.pdf3a. Context Free Grammar.pdf
3a. Context Free Grammar.pdfTANZINTANZINA
 
9781284077247_PPTx_CH01.pptx
9781284077247_PPTx_CH01.pptx9781284077247_PPTx_CH01.pptx
9781284077247_PPTx_CH01.pptxmainakmail2585
 
CH 2.pptx
CH 2.pptxCH 2.pptx
CH 2.pptxObsa2
 
Python Programming Basics for begginners
Python Programming Basics for begginnersPython Programming Basics for begginners
Python Programming Basics for begginnersAbishek Purushothaman
 
Lexical analysis - Compiler Design
Lexical analysis - Compiler DesignLexical analysis - Compiler Design
Lexical analysis - Compiler DesignKuppusamy P
 
Programming_Language_Syntax.ppt
Programming_Language_Syntax.pptProgramming_Language_Syntax.ppt
Programming_Language_Syntax.pptAmrita Sharma
 
lex and yacc.pdf
lex and yacc.pdflex and yacc.pdf
lex and yacc.pdfSurajRavi16
 
Introduction of bison
Introduction of bisonIntroduction of bison
Introduction of bisonvip_du
 
match the following attributes to the parts of a compilerstrips ou.pdf
match the following attributes to the parts of a compilerstrips ou.pdfmatch the following attributes to the parts of a compilerstrips ou.pdf
match the following attributes to the parts of a compilerstrips ou.pdfarpitaeron555
 
Using ANTLR on real example - convert "string combined" queries into paramete...
Using ANTLR on real example - convert "string combined" queries into paramete...Using ANTLR on real example - convert "string combined" queries into paramete...
Using ANTLR on real example - convert "string combined" queries into paramete...Alexey Diyan
 
New compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdfNew compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdfeliasabdi2024
 

Similaire à LVEE 2014: Text parsing with Python and PLY (20)

Chapter-3 compiler.pptx course materials
Chapter-3 compiler.pptx course materialsChapter-3 compiler.pptx course materials
Chapter-3 compiler.pptx course materials
 
Control structure
Control structureControl structure
Control structure
 
Lex and Yacc ppt
Lex and Yacc pptLex and Yacc ppt
Lex and Yacc ppt
 
Lexical
LexicalLexical
Lexical
 
Pcd question bank
Pcd question bank Pcd question bank
Pcd question bank
 
3a. Context Free Grammar.pdf
3a. Context Free Grammar.pdf3a. Context Free Grammar.pdf
3a. Context Free Grammar.pdf
 
Syntax analysis
Syntax analysisSyntax analysis
Syntax analysis
 
9781284077247_PPTx_CH01.pptx
9781284077247_PPTx_CH01.pptx9781284077247_PPTx_CH01.pptx
9781284077247_PPTx_CH01.pptx
 
CH 2.pptx
CH 2.pptxCH 2.pptx
CH 2.pptx
 
Compiler Design
Compiler DesignCompiler Design
Compiler Design
 
Python Programming Basics for begginners
Python Programming Basics for begginnersPython Programming Basics for begginners
Python Programming Basics for begginners
 
Lexical analysis - Compiler Design
Lexical analysis - Compiler DesignLexical analysis - Compiler Design
Lexical analysis - Compiler Design
 
Parser
ParserParser
Parser
 
Quick start reg ex
Quick start reg exQuick start reg ex
Quick start reg ex
 
Programming_Language_Syntax.ppt
Programming_Language_Syntax.pptProgramming_Language_Syntax.ppt
Programming_Language_Syntax.ppt
 
lex and yacc.pdf
lex and yacc.pdflex and yacc.pdf
lex and yacc.pdf
 
Introduction of bison
Introduction of bisonIntroduction of bison
Introduction of bison
 
match the following attributes to the parts of a compilerstrips ou.pdf
match the following attributes to the parts of a compilerstrips ou.pdfmatch the following attributes to the parts of a compilerstrips ou.pdf
match the following attributes to the parts of a compilerstrips ou.pdf
 
Using ANTLR on real example - convert "string combined" queries into paramete...
Using ANTLR on real example - convert "string combined" queries into paramete...Using ANTLR on real example - convert "string combined" queries into paramete...
Using ANTLR on real example - convert "string combined" queries into paramete...
 
New compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdfNew compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdf
 

Dernier

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 

Dernier (20)

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 

LVEE 2014: Text parsing with Python and PLY

  • 1. Text parsing with Python and PLY Daniil Baturin <daniil@baturin.org> LVEE 2014
  • 2. Formal language Set of strings composed of symbols of some alphabet. A language can be defined by a formal grammar. Two or more grammars may produce the same language. Forward problem: given a grammar, generate a string that belongs a language (trivial). Parsing is the inverse problem: given a string, decide if it belongs a language defined by a grammar.
  • 3. Formal grammar Formal grammar is a set of production rules. A production rule has a one or more symbols at its left-hand and right-hand sides. You need to specify the start symbol. Example: <sentence> ::= <verb phrase> <noun phrase> <noun phrase> ::= <article> <adjective> <noun> <verb phrase> ::= <verb> <article> <noun> <article> ::= <”a” | “the” | empty> ... “sentence” is the start symbol.
  • 4. Lexers and tokens Lexer—breaks the text into tokens, string literals with metadata. Tokens are identified by regular expressions. Parsers usually operate on tokens rather than string literals. E.g. all numbers ([0-9]+) become TOKEN(NUMBER, value). Lexer can be autogenerated or hand-coded.
  • 5. Symbols Terminal symbols are symbols that can't be further reduced (rewritten). Non-terminal symbols are composed from references to terminals and other non-terminals. <article> ::= <”a” | “the” | empty> “Article” is a non-terminal; “a”, “the”, empty are terminals.
  • 6. Approaches to parsing ● Hand-coded ad-hoc parser ● Autogenerated parser ● Hand-coded advanced algorithm
  • 7. Regular grammar All production rules are of the following form: <some nonterminal> ::= <some terminal> <some nonterminal> ::= <some terminal> <another nonterminal> <some nonterminal> ::= <empty> Equivalent to regular expressions. “(a|b)*” regex: <start> ::= “a” <start> | “b” <start> | <empty> This is good for a hand-coded parser (or a regex library).
  • 8. Context-free grammar All productions are of the following form: <some nonterminal> ::= <some terminals or nonterminals> Good for autogenerated parsers. Ad-hoc parsers tend to be messy.
  • 9. More complex grammars Context-sensitive: rules may contain terminals and nonterminals in both sides. Can be conquered with special tools or lexer hacks. Not even context-sensitive: you are in trouble. ;)
  • 10. Parser generators Usually use LALR(1) method. Read a token and push it on stack. If some rule is matched, empty the stack and proceed (reduce). If not, push the token on stack and proceed (shift).
  • 11. PLY Lex—the lexer generator Token recognizers are functions. Token regexes are in docstrings. You should export a tuple of all tokens for use in a parser. def t_NUMBER(t): r'[0­9]+' t.value = int(t.value) return t
  • 12. PLY YACC—the parser generator Production rule recognizers are functions. Rules are in docstrings. Rules can refer to other rules and can be recursive. You can refer to lexer tokens in rules. The argument is the token list. def p_expr(p): ''' expr : NUMBER OPSIGN NUMBER ''' p[0] = (p[2], p[1], p[3])
  • 13. Grammar “patterns” The rule form used by YACC doesn't have a notation for repetition etc. Empty: something : This or that: something : foo | bar One or more of something: list_of_something : list_of_something something | something
  • 14. Common problems Shift-reduce conflict: two or more rules have more than one common token at the beginning. foo : baz quux xyzzy bar : baz quux fgsfds Usually can be solved by left factorization: foobar_start : baz quux foo : foobar_start xyzzy bar : foobar_start fgsfds Default resolution is shift. Reduce-reduce conflict: same string matches more than one rule. Often indicates a grammar design error.
  • 15. What I didn't tell Exclusive lexer states. Precedence rules. Wrapping lexers and parsers into classes. Error recovery.
  • 16. The sample code https://github.com/dmbaturin/ply-example