Secrets of Regexp
      Hiro Asari
     Red Hat, Inc.
Let's Talk About
Regular Expressions

• There is no regular expression
• A good approximation as a name

Let's Talk About
Regexp
Some people, when confronted with a problem, think, "I know, I'll use regular expressions." Now they have two problems.

Jamie Zawinski
12 Aug, 1997

http://regex.info/blog/2006-09-15/247
http://www.codinghorror.com/blog/2008/06/regular-expressions-now-you-have-two-problems.html

The point is not so much the evils of regular expressions as the evils of overusing them.
Formal Language Theory

• The Language L
• Over Alphabet Σ
• Alphabet Σ={a, b, c, d, e, …, z, λ} (example)
• Words over Σ: "a", "b", "ab", "aequafdhfad"
• Σ*: The set of all words over Σ
Formal Language over Σ

• A subset L of Σ* (with various properties)
• L can be finite, enumerating its well-formed words, but is often infinite
Example

• Language L over Σ = {a,b}
• 'a' is a word
• a word may be obtained by appending 'ab'
  to an existing word
• only words thus formed are legal
Well-formed words
a
aab
aabab
Ill-formed words
b
aaaab
abb
Succinctly…


• a(ab)*
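
As a rough illustration (a sketch, not part of the original slides), the example language can be checked in Ruby with the succinct expression. The constant name LANGUAGE and the use of \A and \z to anchor the whole word are choices made for this sketch.

```ruby
# The language from the example: 'a', optionally followed by any number of 'ab'.
# \A and \z anchor the match to the whole word (an assumption of this sketch).
LANGUAGE = /\Aa(ab)*\z/

well_formed = %w[a aab aabab]
ill_formed  = %w[b aaaab abb]

(well_formed + ill_formed).each do |word|
  verdict = LANGUAGE.match?(word) ? 'well-formed' : 'ill-formed'
  puts "#{word}: #{verdict}"
end
```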
Expression

• A textual representation of a formal language; an input is tested against it to decide whether the input is a well-formed word in that language
Regular Languages

• ∅ (empty language) is regular
• For each a ∈ Σ (a belongs to Σ), the singleton language {a} is a regular language.
• If A and B are regular languages, then A ∪ B (union), A•B (concatenation), and A* (Kleene star) are regular languages
• No other languages over Σ are regular.
Regular Expressions

• Expressions of regular languages

Regular Expressions

• Not expressions of regular languages
Regular? Expressions

• It turns out that some expressions are more powerful and express non-regular languages
• Language of 'squares': (.*)\1
 • aa, aaaa, WikiWiki
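
A minimal Ruby sketch of the backreference at work, assuming the expression is anchored to the whole word with \A and \z. The constant name SQUARE and the sample words other than those on the slide are made up for illustration.

```ruby
# A "square" is some word written twice in a row.
# \1 refers back to whatever the first group captured.
SQUARE = /\A(.*)\1\z/

%w[aa aaaa WikiWiki abab aab Wiki abc].each do |word|
  puts "#{word}: #{SQUARE.match?(word) ? 'square' : 'not a square'}"
end
```

No finite automaton can recognize this language, because it would have to remember an arbitrarily long first half.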
How does Regexp work?

• Build a finite state automaton representing a given regular expression
• Feed the string to the automaton and see if the match succeeds
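
The two steps can be sketched by hand for one small expression. This is only an illustration: the automaton for ab* is written out manually rather than built from the expression, it checks the whole input instead of searching within it, and the names AB_STAR and accepts? are hypothetical. Real engines, Ruby's included, work differently in detail.

```ruby
# Hand-built automaton for ab*: state :s0 is the start, :s1 is accepting.
AB_STAR = {
  start: :s0,
  accept: [:s1],
  transitions: {
    [:s0, 'a'] => :s1, # the leading 'a'
    [:s1, 'b'] => :s1  # any number of 'b' afterwards
  }
}.freeze

# Feed the input to the automaton one character at a time.
def accepts?(automaton, input)
  state = automaton[:start]
  input.each_char do |ch|
    state = automaton[:transitions][[state, ch]]
    return false if state.nil? # no edge for this character
  end
  automaton[:accept].include?(state)
end

%w[a abbb ba aba].each do |word|
  puts "#{word}: #{accepts?(AB_STAR, word)}"
end
```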
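[Automaton diagrams, one per expression; only the edge labels survive]
• a: one edge labeled a
• ab*: edges labeled a and b
• .*: one edge labeled .
• a$: edges labeled a and $
• a?: edges labeled a and ε
• a|b: edges labeled a and b
• (ab|c): edges labeled a, b, and c
• (ab+|c): edges labeled a, b, b, and c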
Match is attempted at every character, left to right
/a$/
         zyxwvutsrqponmlkjihgfedcba
         ^
         zyxwvutsrqponmlkjihgfedcba
          ^
         zyxwvutsrqponmlkjihgfedcba
           ^
         zyxwvutsrqponmlkjihgfedcba
            ^
         ⋮
         zyxwvutsrqponmlkjihgfedcba
                                  ^

The regexp engine does not reason, "'a$' can match only at the end of the line, so we should fast-forward to the end of the line."
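
The left-to-right scan can be mimicked with a hand-written loop. This is a sketch of the idea only, specialized to the pattern /a$/ and ignoring newlines; the method name naive_a_at_end is hypothetical, and a real engine may apply optimizations this loop does not.

```ruby
# Try the pattern at every position, left to right, counting the attempts.
# Returns [match_index, attempts]; match_index is nil when nothing matches.
def naive_a_at_end(str)
  attempts = 0
  str.each_char.with_index do |ch, i|
    attempts += 1
    return [i, attempts] if ch == 'a' && i == str.length - 1
  end
  [nil, attempts]
end

subject = ('a'..'z').to_a.reverse.join # "zyxwvutsrqponmlkjihgfedcba"
index, attempts = naive_a_at_end(subject)
puts "matched at index #{index} after #{attempts} attempts"
```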
^\s*(.*)\s*$
         abc d a dfadg
^
     abc d a dfadg
 ^
      abc d a dfadg
     ^
      abc d a dfadg
      ^

# (.*) matches 'abc d a dfadg   ', trailing whitespace included
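
A small Ruby check of this behavior. The input string, with three spaces on each side, is an assumption reconstructed from the comment on the slide, and the lazy variant (.*?) is one possible fix added here for comparison, not something the slides propose.

```ruby
input = "   abc d a dfadg   "

# Greedy: (.*) grabs everything up to the end of the line, so the
# trailing \s* is left with nothing to match.
greedy = input[/^\s*(.*)\s*$/, 1]

# Lazy: (.*?) grows only as far as needed, leaving the trailing
# whitespace to \s*.
lazy = input[/^\s*(.*?)\s*$/, 1]

p greedy
p lazy
```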
a?a?a?…a?aaa…a
def pathological(n=5)
  Regexp.new('a?' * n + 'a' * n)
end

# print a timestamp after each successful match, so the growing gaps are visible
1.upto(40) do |n|
  print n, ": "
  print Time.now, "\n" if 'a'*n =~ pathological(n)
end
a?a?a?aaa
aaa
^
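
To see why this pattern is expensive for a backtracking matcher, here is a toy matcher that counts its own steps. It is a sketch under stated assumptions: it handles only literal characters and optional characters, always matches the whole string, and is not how any real engine is implemented. The names backtrack and steps_for are hypothetical. Newer Ruby versions include optimizations that may make the real pattern much faster than this toy suggests.

```ruby
# tokens is a list of [:lit, ch] or [:opt, ch]; counter tallies every call.
def backtrack(tokens, ti, str, si, counter)
  counter[:steps] += 1
  return si == str.length if ti == tokens.length

  kind, ch = tokens[ti]
  took = si < str.length && str[si] == ch
  case kind
  when :lit
    took && backtrack(tokens, ti + 1, str, si + 1, counter)
  when :opt
    # Greedy: first try to consume the character, then try skipping it.
    (took && backtrack(tokens, ti + 1, str, si + 1, counter)) ||
      backtrack(tokens, ti + 1, str, si, counter)
  end
end

# Match 'a' * n against a?{n}a{n}; returns [matched, steps].
def steps_for(n)
  tokens = [[:opt, 'a']] * n + [[:lit, 'a']] * n
  counter = { steps: 0 }
  matched = backtrack(tokens, 0, 'a' * n, 0, counter)
  [matched, counter[:steps]]
end

1.upto(16) do |n|
  matched, steps = steps_for(n)
  puts "#{n}: matched=#{matched} steps=#{steps}"
end
```

The match succeeds only when every a? matches nothing, and a greedy matcher tries that combination last.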
Regexp tips

Use /x

UP_TO_256 = /\b(?:25[0-5]   # 250-255
|2[0-4][0-9]                # 200-249
|1[0-9][0-9]                # 100-199
|[1-9][0-9]                 # 2-digit numbers
|[0-9])                     # single-digit numbers
\b/x

IPV4_ADDRESS = /#{UP_TO_256}(?:\.#{UP_TO_256}){3}/
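
A short usage sketch for the two constants, repeated here so the block runs on its own. The sample strings are made up. Note that the pattern is unanchored, so it finds an address anywhere in the input.

```ruby
UP_TO_256 = /\b(?:25[0-5]   # 250-255
|2[0-4][0-9]                # 200-249
|1[0-9][0-9]                # 100-199
|[1-9][0-9]                 # 2-digit numbers
|[0-9])                     # single-digit numbers
\b/x

# Interpolating a Regexp keeps its own flags, so /x still applies to
# the embedded part even though the outer literal does not use it.
IPV4_ADDRESS = /#{UP_TO_256}(?:\.#{UP_TO_256}){3}/

p 'host 10.0.0.255 is up'[IPV4_ADDRESS]
p IPV4_ADDRESS.match?('192.168.0.1')
p IPV4_ADDRESS.match?('256.1.1.1')
```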
\A, \z for strings
^, $ for lines
• \A: the beginning of the string
• \z: the end of the string
• ^: after \n (always in Ruby)
• $: before \n (always in Ruby)
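
The difference is easy to check in Ruby. The string and variable names below are chosen for illustration; =~ returns the index of the match, or nil when there is none.

```ruby
text = "abc\ndef"

line_start   = text =~ /^d/   # ^ matches right after the newline
string_start = text =~ /\Ad/  # \A matches only at index 0
line_end     = text =~ /c$/   # $ matches right before the newline
string_end   = text =~ /c\z/  # \z matches only at the very end

p line_start, string_start, line_end, string_end
```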
What's the problem?

#! /usr/bin/env perl
$a = "abc\ndef";
if ($a =~ /^d/) {
  print "yes\n";
}
if ($a =~ /^d/m) {
  print "yes now\n";
}
# prints 'yes now'

Also note the difference in what /m means: in Perl it makes ^ and $ match at line boundaries; in Ruby it makes . match a newline.

What's the problem?

#! /usr/bin/env ruby

a = "abc\ndef"
if a =~ /^d/
  p "yes"
end
# prints "yes"

http://guides.rubyonrails.org/security.html#regular-expressions
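
A brief Ruby sketch of what /m does and does not change. The variable names are chosen for illustration.

```ruby
text = "abc\ndef"

dot_plain  = text =~ /c.d/   # . does not match "\n" by default
dot_multi  = text =~ /c.d/m  # with /m, . matches "\n" as well
caret_any  = text =~ /^d/    # ^ is line-based with or without /m

p dot_plain, dot_multi, caret_any
```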
Security Implications

class File < ActiveRecord::Base
  validates :name, :format => /^[\w\.\-\+]+$/
end

http://guides.rubyonrails.org/security.html#regular-expressions

file.txt%0A<script>alert('hello')</script>

URL-decoded:

file.txt\n<script>alert('hello')</script>

/^[\w\.\-\+]+$/
Match succeeds
ActiveRecord validation succeeds

/\A[\w\.\-\+]+\z/
Match fails
ActiveRecord validation fails
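
The two outcomes can be reproduced in plain Ruby, without Rails. The constant names are hypothetical; the patterns and the input are the ones from the slides, with %0A already decoded to a newline.

```ruby
LINE_ANCHORED   = /^[\w\.\-\+]+$/
STRING_ANCHORED = /\A[\w\.\-\+]+\z/

name = "file.txt\n<script>alert('hello')</script>"

# ^ and $ are satisfied by the first line alone, so the script slips through.
puts LINE_ANCHORED.match?(name)

# \A and \z force the whole string to consist of allowed characters.
puts STRING_ANCHORED.match?(name)
```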
Prefer Character Class
to Alternations
require 'benchmark'

# simple benchmark for alternations and character class

n = 5_000

str = 'cafebabedeadbeef' * n

Benchmark.bmbm do |x|
  x.report('alternation') do
    str =~ /^(a|b|c|d|e|f)+$/
  end
  x.report('character class') do
    str =~ /^[a-f]+$/
  end
end
Benchmarks
Ruby 1.8.7
                      user     system      total         real
alternation       0.030000   0.010000   0.040000 (   0.036702)
character class   0.000000   0.000000   0.000000 (   0.004704)

Ruby 2.0.0
                      user     system      total         real
alternation       0.020000   0.010000   0.030000 (   0.023139)
character class   0.000000   0.000000   0.000000 (   0.009641)

JRuby 1.7.4.dev
                      user     system      total       real
alternation       0.030000   0.000000   0.030000 ( 0.021000)
character class   0.010000   0.000000   0.010000 ( 0.007000)
Beware of Character Classes

# case-insensitively match any non-word character…

# one is unlike the others
'r' =~ /(?i:[\W])/
's' =~ /(?i:[\W])/     # matches, even though 's' is a word character
't' =~ /(?i:[\W])/

https://bugs.ruby-lang.org/issues/4044
/^1?$|^(11+?)\1+$/

• ^1?$: matches '1' or ''
• (11+?): non-greedily matches 2 or more 1's
• \1+: 1 or more additional times
• ^(11+?)\1+$: matches a composite number of 1's
• Matches a string of 1's if and only if there is a non-prime number of 1's
Integer#prime?

class Integer
  def prime?
    "1" * self !~ /^1?$|^(11+?)\1+$/
  end
end

No performance guarantee

Attributed to the Perl hacker Abigail
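
A quick way to try the expression without reopening Integer. The method name unary_prime? and the constant NOT_PRIME are hypothetical, used here to avoid clashing with other definitions of prime?.

```ruby
# Matches a run of 1's whose length is 0, 1, or a composite number.
NOT_PRIME = /^1?$|^(11+?)\1+$/

# Write n in unary and ask whether the pattern fails to match.
def unary_prime?(n)
  ('1' * n) !~ NOT_PRIME
end

primes = (0..20).select { |n| unary_prime?(n) }
p primes # [2, 3, 5, 7, 11, 13, 17, 19]
```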
• @hiro_asari
• Github: BanzaiMan
