Parsers Error Recovery for Practical Use





                                           Alexander Azarov

                                      azarov@osinka.ru / Osinka.ru

                                           February 18, 2012
  Context




Osinka



              A bulletin-board ("BB") style forum
                      (2.5M+ pages/day, 8M+ posts total)
                             User-generated content: BBCode markup

              Migrating to Scala
                      Backend




Plan




              Why parser combinators?
              Why error recovery?
              Example of error recovery
              Results




BBCode



      A few BBCode tags

               BBCode                      HTML
               [b]bold[/b]                 <b>bold</b>
               [i]italic[/i]               <i>italic</i>
               [url=href]text[/url]        <a href="href">text</a>
               [img]href[/img]             <img src="href"/>




BBCode example


              example of BBCode

      Example

      [quote="Nick"]
      original [b]text[/b]
      [/quote]
      Here it is the reply with
      [url=http://www.google.com]link[/url]
  Task




Why parser combinators, 1




              Regexp maintenance is a headache
              Bugs are extremely hard to find
              No markup error reporting




Why parser combinators, 2



      One post source, many views

              HTML render for Web
              textual view for emails
              text-only short summary
              text-only for full-text search indexer




Why parser combinators, 3



      Post analysis algorithms

              links (e.g. spam automated analysis)
              images
              whatever structure analysis we’d want




Universal AST




      One AST

              different printers
              various traversal algorithms
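
      The idea of one AST feeding several printers and traversals can be sketched as follows. This is a minimal illustration, not the talk's code: the Bold and Url node types and the Views object are hypothetical names modelled on the BBCode tags listed earlier, and the HTML escaping is deliberately simplistic.

```scala
// Hypothetical AST, modelled on the [b] and [url] tags (not the talk's actual node types).
sealed trait Node
case class Text(text: String) extends Node
case class Bold(subnodes: List[Node]) extends Node
case class Url(href: String, subnodes: List[Node]) extends Node

object Views {
  // Simplistic HTML escaping, enough for the illustration.
  private def escape(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;")

  // Printer 1: HTML render for the Web.
  def html(nodes: List[Node]): String = nodes.map {
    case Text(t)        => escape(t)
    case Bold(sub)      => s"<b>${html(sub)}</b>"
    case Url(href, sub) => s"""<a href="${escape(href)}">${html(sub)}</a>"""
  }.mkString

  // Printer 2: text-only view (emails, summaries, search indexer).
  def plain(nodes: List[Node]): String = nodes.map {
    case Text(t)     => t
    case Bold(sub)   => plain(sub)
    case Url(_, sub) => plain(sub)
  }.mkString

  // Traversal: collect link targets, e.g. as input for spam analysis.
  def links(nodes: List[Node]): List[String] = nodes.flatMap {
    case Url(href, sub) => href :: links(sub)
    case Bold(sub)      => links(sub)
    case Text(_)        => Nil
  }
}
```

      Each view is a separate function over the same tree, so adding a new view or analysis does not touch the parser.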
  Problem




Sounds great. But.




                              This all looks like a perfect world.
                                   But what’s the catch??




Sounds great. But.




 Humans.
 They make mistakes.

      Example

      [quote]
      [url=http://www.google.com]
      [img]http://www.image.com
      [/url[/img]
      [/b]




User-Generated Content: Problem



      Erroneous markup

              People make mistakes,
              but no one wants to see an empty post.
              We have to show something meaningful in any case.




Black or White World




              Scala parser combinators assume valid input
                      Parser result: Success | NoSuccess
                             no error recovery out of the box
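
      A minimal sketch of that black-or-white behaviour, assuming the scala-parser-combinators module is on the classpath. The StrictBold object and its single [b] rule are hypothetical and not part of the talk's code; they only show that a well-formed input yields Success while a missing closing tag yields NoSuccess with nothing usable to render.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical strict parser: accepts exactly one well-formed [b]...[/b] element.
object StrictBold extends RegexParsers {
  // Whitespace is part of the content, so do not skip it.
  override def skipWhitespace = false

  lazy val bold: Parser[String] = "[b]" ~> """[^\[]*""".r <~ "[/b]"

  // Right(content) on Success, Left(error message) on NoSuccess.
  def parse(s: String): Either[String, String] =
    parseAll(bold, s) match {
      case Success(text, _) => Right(text)
      case ns: NoSuccess    => Left(ns.msg)
    }
}
```

      With this parser a post such as "[b]bold" produces only an error message, which is exactly the situation the following slides try to avoid.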
  Solution




Error recovery: our approach




              Our Parser never breaks
              It generates “error nodes” instead




Approach: Error nodes



              A FailNode is part of the AST and holds the possible causes of the
              failure
              Error nodes are meaningful:
                      for highlighting in an editor
                      for marking posts that have markup failures (so that
                      moderators and other users can see them)




Approach: input & unpaired tags




              Treat all input except tags as text
                      E.g. [tag]text[/tag] is a text node, since [tag] is not a known tag

              Unpaired tags are the last choice: markup errors
  Example








Trivial BBCode markup



      Example (Trivial "one tag" BBCode)

      Simplest [font=bold]BBCode [font=red]example[/font][/font]

              has only one tag, font
              though it may have an argument




Corresponding AST



      AST
      trait Node

      case class Text(text: String) extends Node
      case class Font(arg: Option[String], subnodes: List[Node]) extends Node




Parser

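      BBCode parser

          // fontOpen and fontClose are defined outside this slide; judging by its
          // use as an extractor below, fontOpen is presumably a Regex with two groups.
          lazy val nodes = rep(font | text)
          // Any run of characters that does not start an opening or closing font tag.
          lazy val text =
            rep1(not(fontOpen|fontClose) ~> "(?s).".r) ^^ {
              texts => Text(texts.mkString)
            }
          // A balanced font pair with nested nodes inside.
          lazy val font: Parser[Node] = {
            fontOpen ~ nodes <~ fontClose ^^ {
              case fontOpen(_, arg) ~ subnodes => Font(Option(arg), subnodes)
            }
          }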




Valid markup
      Scalatest
          describe("parser") {
            it("keeps spaces") {
               parse(" ") must equal(Right(Text(" ") :: Nil))
               parse(" \n ") must equal(Right(Text(" \n ") :: Nil))
            }
            it("parses text") {
               parse("plain text") must equal(Right(Text("plain text") :: Nil))
            }
            it("parses bbcode-like text") {
               parse("plain [tag] [fonttext") must equal(Right(Text("plain [tag] [fonttext") :: Nil))
            }
          }




Invalid markup



      Scalatest
          describe("error markup") {
            it("results in error") {
               parse("t[/font]") must be('left)
               parse("[font]t") must be('left)
            }
          }




Recovery: Extra AST node




      FailNode
      case class FailNode(reason: String, markup: String) extends Node
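
      Because FailNode is an ordinary AST node, flagging posts with broken markup is just another traversal. A minimal sketch under that assumption; the Failures object and its method names are hypothetical and do not come from the talk's code.

```scala
sealed trait Node
case class Text(text: String) extends Node
case class Font(arg: Option[String], subnodes: List[Node]) extends Node
case class FailNode(reason: String, markup: String) extends Node

// Hypothetical helpers, not from the talk.
object Failures {
  // Every FailNode in document order, descending into Font subnodes.
  def collect(nodes: List[Node]): List[FailNode] = nodes.flatMap {
    case f: FailNode  => List(f)
    case Font(_, sub) => collect(sub)
    case Text(_)      => Nil
  }

  // True when the post contains at least one markup failure.
  def hasFailures(nodes: List[Node]): Boolean = collect(nodes).nonEmpty
}
```

      An editor could use the markup field of each collected node to highlight the offending fragment, and a moderation view could use hasFailures to mark the post.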




Recovery: helper methods
      Explicitly return FailNode

          protected def failed(reason: String) = FailNode(reason, "")


      Enrich FailNode with markup

          protected def recover(p: => Parser[Node]): Parser[Node] =
            Parser { in =>
              val r = p(in)
              // The slice of input consumed by p, i.e. the offending markup.
              lazy val markup =
                in.source.subSequence(in.offset, r.next.offset).toString
              r match {
                case Success(node: FailNode, next) =>
                  Success(node.copy(markup = markup), next)
                case other =>
                  other
              }
            }




Recovery: Parser rules



              never break (provide parsers for lone tags)
              return FailNode explicitly if needed

      nodes
          lazy val nodes = rep(node | missingOpen)
          lazy val node = font | text | missingClose




“Missing open tag” parser




      Catching a lone [/font]

          def missingOpen = recover {
            fontClose ^^^ { failed("missing open") }
          }




Argument check


      font may restrict its argument

          lazy val font: Parser[Node] = recover {
            fontOpen ~ rep(node) <~ fontClose ^^ {
              case fontOpen(_, arg) ~ subnodes =>
                // No argument, or an allowed one: build a regular Font node.
                if (arg == null || allowedFontArgs.contains(arg))
                  Font(Option(arg), subnodes)
                else failed("arg incorrect")
            }
          }
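
      The fragments shown so far can be assembled into one runnable sketch. It assumes the scala-parser-combinators module is on the classpath, and several pieces are guesses that the slides do not show: the fontOpen regex, the fontClose literal, the contents of allowedFontArgs, the missingClose rule (written here by analogy with missingOpen), the skipWhitespace override, and the parse wrapper returning Either. The author's actual definitions are in the linked repository and may differ.

```scala
import scala.util.parsing.combinator.RegexParsers

sealed trait Node
case class Text(text: String) extends Node
case class Font(arg: Option[String], subnodes: List[Node]) extends Node
case class FailNode(reason: String, markup: String) extends Node

object BBCodeParser extends RegexParsers {
  // Assumption: whitespace belongs to the post text and must not be skipped.
  override def skipWhitespace = false

  // Assumed definitions; the slides use these names without showing them.
  val fontOpen = """\[font(=([^\]]+))?\]""".r // group 2 is the argument, null if absent
  val fontClose = "[/font]"
  val allowedFontArgs = Set("bold", "red")

  def failed(reason: String): Node = FailNode(reason, "")

  // Fills in the markup consumed by p when p produced a FailNode.
  def recover(p: => Parser[Node]): Parser[Node] = Parser { in =>
    p(in) match {
      case Success(node: FailNode, next) =>
        val markup = in.source.subSequence(in.offset, next.offset).toString
        Success(node.copy(markup = markup), next)
      case other => other
    }
  }

  lazy val nodes: Parser[List[Node]] = rep(node | missingOpen)
  lazy val node: Parser[Node] = font | text | missingClose

  lazy val text: Parser[Node] =
    rep1(not(fontOpen | fontClose) ~> "(?s).".r) ^^ (cs => Text(cs.mkString))

  lazy val font: Parser[Node] = recover {
    fontOpen ~ rep(node) <~ fontClose ^^ {
      case fontOpen(_, arg) ~ subnodes =>
        if (arg == null || allowedFontArgs.contains(arg)) Font(Option(arg), subnodes)
        else failed("arg incorrect")
    }
  }

  // Lone closing tag: reached only after every alternative in node has failed.
  lazy val missingOpen: Parser[Node] = recover(fontClose ^^^ failed("missing open"))
  // Lone opening tag: tried after font, so it matches only when no balanced pair exists.
  lazy val missingClose: Parser[Node] = recover(fontOpen ^^^ failed("missing close"))

  def parse(s: String): Either[String, List[Node]] =
    parseAll(nodes, s) match {
      case Success(result, _) => Right(result)
      case ns: NoSuccess      => Left(ns.msg)
    }
}
```

      The order of alternatives carries the recovery policy: a balanced font pair is preferred, plain text comes next, and the lone-tag parsers are the last resort, so every input ends up as some list of nodes.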




Passes markup error tests

      Scalatest
          describe("recovery") {
            it("reports incorrect arg") {
               parse("[font=b]t[/font]") must equal(Right(
                  FailNode("arg incorrect", "[font=b]t[/font]") :: Nil
               ))
            }
            it("recovers extra ending tag") {
               parse("t[/font]") must equal(Right(
                  Text("t") :: FailNode("missing open", "[/font]") :: Nil
               ))
            }
          }




Passes longer tests

      Scalatest
             it("recovers extra starting tag in a longer sequence") {
                parse("[font][font]t[/font]") must equal(Right(
                   FailNode("missing close", "[font]") :: Font(None, Text("t") :: Nil) :: Nil
                ))
             }
             it("recovers extra ending tag in a longer sequence") {
                parse("[font]t[/font][/font]") must equal(Right(
                   Font(None, Text("t") :: Nil) :: FailNode("missing open", "[/font]") :: Nil
                ))
             }




Examples source code




              Source code, specs:
              https://github.com/alaz/slides-err-recovery
  Results




Production use: summary




              It works reliably
              Lower maintenance costs
              Performance (see next slides)
              Beware: Scala parser combinators are not thread-safe.




Performance


              The biggest problem is performance.

      Benchmarks (real codebase)

                                 Input                  PHP      Scala
                                 Typical, 8k            5.3ms    51ms
                                 Big with errors, 76k   136ms    1245ms


              Workaround: caching
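
      The caching workaround could look roughly like the sketch below. RenderCache is a hypothetical name and the talk does not describe the actual cache; here render stands for "parse the post source and print one view", and entries are keyed by the raw source text.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical memoising wrapper around an expensive parse-and-render function.
final class RenderCache(render: String => String) {
  private val cache = TrieMap.empty[String, String]

  // Returns the cached view for this source, rendering it on first request.
  def apply(source: String): String =
    cache.getOrElseUpdate(source, render(source))
}
```

      This sketch never evicts entries, so a real deployment would need a size bound or an expiry policy.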




Surprise!




                                           Never give up

              find a good motivator instead (e.g. presentation for Scala.by)




Performance: success story

              Want performance? Do not use Lexical
              Forget those scary numbers!

      Benchmarks (real codebase)

                             Input                  PHP      Scala (before)   Scala (after)
                             Typical, 8k            5.3ms    51ms             16ms
                             Big with errors, 76k   136ms    1245ms           31ms


              Thank you, Scala.by!




Thank you




              Email: azarov@osinka.ru
              Twitter: http://twitter.com/aazarov
              Source code, specs:
              https://github.com/alaz/slides-err-recovery

Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServicePicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AI
 

"Error Recovery" by @alaz at scalaby#8

  • 1. Parsers Error Recovery for Practical Use. Alexander Azarov, azarov@osinka.ru / Osinka.ru. February 18, 2012.
  • 2. Context: Osinka. Forum of the “BB” kind (2.5M+ pages/day, 8M+ posts total). User-generated content: BBCode markup. Migrating to Scala (backend).
  • 3. Context: Plan. Why parser combinators? Why error recovery? Example of error recovery. Results.
  • 4. Context: BBCode. A few BBCode tags and their HTML equivalents:
      [b]bold[/b]             <b>bold</b>
      [i]italic[/i]           <i>italic</i>
      [url=href]text[/url]    <a href="href">text</a>
      [img]href[/img]         <img src="href"/>
  • 5. Context: BBCode example.
      [quote="Nick"]
      original [b]text[/b]
      [/quote]
      Here it is the reply with
      [url=http://www.google.com]link[/url]
  • 6. Task: Why parser combinators, 1. Regexp maintenance is a headache. Bugs are extremely hard to find. No markup errors.
  • 7. Task: Why parser combinators, 2. One post source, many views: HTML render for the Web; textual view for emails; text-only short summary; text-only view for the full-text search indexer.
  • 8. Task: Why parser combinators, 3. Post analysis algorithms: links (e.g. automated spam analysis); images; whatever structure analysis we’d want.
  • 9. Task: Universal AST. One AST; different printers; various traversal algorithms.
  • 10. Problem: Sounds great. But. This all looks like a perfect world. But what’s the catch?
  • 11. Problem: Sounds great. But. Humans. They make mistakes.
  • 12. Problem: Sounds great. But. Humans. They make mistakes. Example:
      [quote]
      [url=http://www.google.com]
      [img]http://www.image.com
      [/url[/img]
      [/b]
  • 13. Problem: User-generated content. Erroneous markup: people make mistakes, but no one wants to see an empty post. We have to show something meaningful in any case.
  • 14. Problem: Black or White World. Scala parser combinators assume valid input. Parser result: Success | NoSuccess. No error recovery out of the box.
  • 15. Solution: Error recovery, our approach. Our parser never breaks. It generates “error nodes” instead.
  • 16. Solution: Approach, error nodes. A FailNode is part of the AST and contains the possible causes of the failure. Error nodes are meaningful for highlighting in an editor and for marking posts that have markup failures (for moderators and other users to see).
  • 17. Solution: Approach, input & unpaired tags. Treat all input except tags as text, e.g. [tag]text[/tag] is a text node. Match unpaired tags as the last choice: these are the markup errors.
  • 18. Example.
  • 19. Example: Trivial BBCode markup. Trivial “one tag” BBCode:
      Simplest [font=bold]BBCode [font=red]example[/font][/font]
    It has only one tag, font, though the tag may have an argument.
  • 20. Example: Corresponding AST.
      trait Node
      case class Text(text: String) extends Node
      case class Font(arg: Option[String], subnodes: List[Node]) extends Node
  • 21. Example: Parser. BBCode parser:
      lazy val nodes = rep(font | text)
      lazy val text =
        rep1(not(fontOpen|fontClose) ~> "(?s).".r) ^^ {
          texts => Text(texts.mkString)
        }
      lazy val font: Parser[Node] = {
        fontOpen ~ nodes <~ fontClose ^^ {
          case fontOpen(_, arg) ~ subnodes => Font(Option(arg), subnodes)
        }
      }
  • 22. Example: Valid markup. Scalatest:
      describe("parser") {
        it("keeps spaces") {
          parse(" ") must equal(Right(Text(" ") :: Nil))
          parse(" \n ") must equal(Right(Text(" \n ") :: Nil))
        }
        it("parses text") {
          parse("plain text") must equal(Right(Text("plain text") :: Nil))
        }
        it("parses bbcode-like text") {
          parse("plain [tag] [fonttext") must equal(Right(Text("plain [tag] [fonttext") :: Nil))
        }
      }
  • 23. Example: Invalid markup. Scalatest:
      describe("error markup") {
        it("results in error") {
          parse("t[/font]") must be('left)
          parse("[font]t") must be('left)
        }
      }
  • 24. Example: Recovery, extra AST node. FailNode:
      case class FailNode(reason: String, markup: String) extends Node
  • 25. Example: Recovery, helper methods. Explicitly return a FailNode:
      protected def failed(reason: String) = FailNode(reason, "")
    Enrich the FailNode with the markup that was consumed:
      protected def recover(p: => Parser[Node]): Parser[Node] = Parser { in =>
        val r = p(in)
        lazy val markup = in.source.subSequence(in.offset, r.next.offset).toString
        r match {
          case Success(node: FailNode, next) =>
            Success(node.copy(markup = markup), next)
          case other => other
        }
      }
  • 26. Example: Recovery, parser rules. Rules never break (provide “alone tag” parsers) and return a FailNode explicitly if needed. nodes:
      lazy val nodes = rep(node | missingOpen)
      lazy val node = font | text | missingClose
  • 27. Example: “Missing open tag” parser. Catching a lone [/font]:
      def missingOpen = recover {
        fontClose ^^^ { failed("missing open") }
      }
  • 28. Example: Argument check. font may have limits on its argument:
      lazy val font: Parser[Node] = recover {
        fontOpen ~ rep(node) <~ fontClose ^^ {
          case fontOpen(_, arg) ~ subnodes =>
            if (arg == null || allowedFontArgs.contains(arg))
              Font(Option(arg), subnodes)
            else
              failed("arg incorrect")
        }
      }
  • 29. Example: Passes markup error tests. Scalatest:
      describe("recovery") {
        it("reports incorrect arg") {
          parse("[font=b]t[/font]") must equal(Right(
            FailNode("arg incorrect", "[font=b]t[/font]") :: Nil
          ))
        }
        it("recovers extra ending tag") {
          parse("t[/font]") must equal(Right(
            Text("t") :: FailNode("missing open", "[/font]") :: Nil
          ))
        }
      }
  • 30. Example: Passes longer tests. Scalatest:
      it("recovers extra starting tag in a longer sequence") {
        parse("[font][font]t[/font]") must equal(Right(
          FailNode("missing close", "[font]") :: Font(None, Text("t") :: Nil) :: Nil
        ))
      }
      it("recovers extra ending tag in a longer sequence") {
        parse("[font]t[/font][/font]") must equal(Right(
          Font(None, Text("t") :: Nil) :: FailNode("missing open", "[/font]") :: Nil
        ))
      }
  • 31. Example: Examples source code. Source code, specs: https://github.com/alaz/slides-err-recovery
  • 32. Results: Production use outlines. It works reliably. Lower maintenance costs. Performance (see next slides). Beware: Scala parser combinators are not thread-safe.
  • 33. Results: Performance. The biggest problem is performance. Benchmarks (real codebase):
                           PHP       Scala
      Typical      8k      5.3ms     51ms
      Big w/err    76k     136ms     1245ms
    Workaround: caching.
  • 34. Results: Surprise! Never give up; find a good motivator instead (e.g. a presentation for Scala.by).
  • 35. Results: Performance, success story. Want performance? Do not use Lexical. Forget those scary numbers! Benchmarks (real codebase):
                           PHP       Scala (before)    Scala (without Lexical)
      Typical      8k      5.3ms     51ms              16ms
      Big w/err    76k     136ms     1245ms            31ms
    Thank you, Scala.by!
  • 36. Results: Thank you. Email: azarov@osinka.ru. Twitter: http://twitter.com/aazarov. Source code, specs: https://github.com/alaz/slides-err-recovery
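
The fragments from slides 20 to 28 can be assembled into one compilable file. What follows is a minimal sketch under stated assumptions, not the author's exact code (that lives in the repository linked on the last slide). The `fontOpen` and `fontClose` patterns, the `allowedFontArgs` whitelist, the `missingClose` rule, the `parseMarkup` entry point and the `HtmlPrinter` object are hypothetical names filled in for illustration. `recover` is written with `ParseResult.map` instead of rebuilding `Success` by hand, which should behave the same way. The sketch also assumes the scala-parser-combinators module is on the classpath, since it has shipped separately from the standard library since Scala 2.11.

```scala
import scala.util.parsing.combinator.RegexParsers

// AST from the slides, including the FailNode used for recovery.
sealed trait Node
case class Text(text: String) extends Node
case class Font(arg: Option[String], subnodes: List[Node]) extends Node
case class FailNode(reason: String, markup: String) extends Node

object BBCodeParser extends RegexParsers {
  // Whitespace is part of a post, so the parser must not skip it.
  override def skipWhitespace = false

  // Assumptions: tag patterns and argument whitelist are not shown in the slides.
  val fontOpen  = """\[font(?:=([^\]]+))?\]""".r
  val fontClose = """\[/font\]""".r
  val allowedFontArgs = Set("bold", "red")

  // Explicitly return a FailNode; recover fills in the markup later.
  def failed(reason: String): Node = FailNode(reason, "")

  // If p produced a FailNode, attach the input text that p consumed.
  def recover(p: => Parser[Node]): Parser[Node] = Parser { in =>
    p(in) match {
      case s @ Success(node: FailNode, next) =>
        val markup = in.source.subSequence(in.offset, next.offset).toString
        s.map(_ => node.copy(markup = markup))
      case other => other
    }
  }

  // A lone closing tag is only accepted at the top level, so that
  // a [/font] inside a font body still terminates that body.
  lazy val nodes: Parser[List[Node]] = rep(node | missingOpen)
  lazy val node: Parser[Node] = font | text | missingClose

  // Everything that is not a font tag is text, one character at a time.
  lazy val text: Parser[Node] =
    rep1(not(fontOpen | fontClose) ~> "(?s).".r) ^^ { texts => Text(texts.mkString) }

  lazy val font: Parser[Node] = recover {
    fontOpen ~ rep(node) <~ fontClose ^^ {
      case fontOpen(arg) ~ subnodes =>
        // arg is null when the tag has no "=value" part.
        if (arg == null || allowedFontArgs.contains(arg)) Font(Option(arg), subnodes)
        else failed("arg incorrect")
    }
  }

  // Last-choice rules: unpaired tags become error nodes.
  def missingOpen: Parser[Node]  = recover { fontClose ^^^ failed("missing open") }
  def missingClose: Parser[Node] = recover { fontOpen ^^^ failed("missing close") }

  // Hypothetical entry point mirroring the Either used in the slide tests.
  def parseMarkup(input: String): Either[String, List[Node]] =
    parseAll(nodes, input) match {
      case Success(result, _) => Right(result)
      case ns: NoSuccess      => Left(ns.msg)
    }
}

// One possible printer over the same AST (hypothetical HTML mapping).
object HtmlPrinter {
  private def esc(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

  def render(nodes: List[Node]): String = nodes.map(renderNode).mkString

  private def renderNode(node: Node): String = node match {
    case Text(t)               => esc(t)
    case Font(Some(arg), subs) => s"""<span class="font-$arg">${render(subs)}</span>"""
    case Font(None, subs)      => s"<span>${render(subs)}</span>"
    // Error nodes stay visible: show the raw markup and flag it.
    case FailNode(reason, raw) => s"""<span class="markup-error" title="$reason">${esc(raw)}</span>"""
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val input = "[font=bold]ok[/font] t[/font]"
    println(BBCodeParser.parseMarkup(input))
    println(BBCodeParser.parseMarkup(input).map(HtmlPrinter.render))
  }
}
```

With these rules every input yields a `Right`, because any tag that cannot be paired is consumed by `missingOpen` or `missingClose` and kept in the tree as a `FailNode`. A printer can then decide how to show it, which is the "show something meaningful in any case" requirement from slide 13.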