Regular Expressions Redux
Scope

• medium to advanced
• 30 minutes
• performance / backtracking irrelevant
• no compatibility charts (yet)
TOC

• basic matching, quantifiers
• character classes, types, properties, anchors
• groups, options, replace string
• look-ahead/behind
• subexpressions
RE overview

             match "foo"          replace with "bar"
Perl         /foo/ (on $_)        s/foo/bar/ (on $_)
Javascript   /foo/                "foolish".replace(/foo/, "bar")
Vi           /foo/                :s/foo/bar/
TextMate     ⌘-F, Find: foo       ⌘-F, Find: foo, Replace: bar
Quantifiers
• classic greedy: ?, *, +
• specific: {1,5}, {,5} (where {,5} is not supported, write {0,5})
  • ? == {0,1}
  • * == {0,}
  • + == {1,}
• non-greedy: ??, *?, +?, {5,7}?
Example
This reveals that plain text is in fact the
technical user's way to regard a file or a
sequence of bytes. In this sense, there is no
plain text.

              /reveal(.*)plain/
             /reveal(.*?)plain/
                  /t.{2,3}t/
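
The three patterns can be tried out directly. This is a minimal JavaScript sketch, assuming the sample text is joined into a single line (otherwise "." would stop at the slide's line breaks); the variable names are illustrative only.

```javascript
// Sample text from the slide, joined into one line so that "." can cross the
// original line breaks (without the s flag, "." does not match a newline).
const text =
  "This reveals that plain text is in fact the technical user's way to " +
  "regard a file or a sequence of bytes. In this sense, there is no plain text.";

// Greedy: (.*) grabs as much as possible, so the match ends at the LAST "plain".
const greedy = text.match(/reveal(.*)plain/)[1];

// Non-greedy: (.*?) grabs as little as possible, so it ends at the FIRST "plain".
const lazy = text.match(/reveal(.*?)plain/)[1];

// Counted: a "t", then two or three characters of any kind, then another "t".
const counted = text.match(/t.{2,3}t/g);

console.log(JSON.stringify(greedy)); // runs up to the final "plain"
console.log(JSON.stringify(lazy));   // "s that "
console.log(counted);                // [ 'that', 'text', 'the t', 'text' ]
```

Note that the counted quantifier is greedy too: it tries three characters first and only falls back to two when the closing "t" is not found.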
Character Classes / Properties
• [0-9a-z]       (classes)
  • \+420[0-9]{9} = simplified Czech phone number
  • don’t: [A-z0-] (A-z also spans [ \ ] ^ _ ` and the trailing - is a literal hyphen)
• [a-z&&[^j-n]] == [a-io-z]
• \p{Upper} (properties)
  • works great on Unicode text (Latin, Katakana)
• [:alnum:], [:^space:] (POSIX bracket)
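
A rough JavaScript sketch of these points. JavaScript is an assumption here, not the flavor on the slide: without the newer v flag it has no && intersection, so the intersection is emulated with a negative look-ahead, and \p{Upper} is written as \p{Lu} with the u flag. POSIX brackets are not available in JavaScript and are left out.

```javascript
// Simplified Czech phone number: a literal "+" (escaped), 420, then nine digits.
const czechPhone = /^\+420[0-9]{9}$/;

// Why [A-z] is a "don't": the range also spans the punctuation between "Z" and "a".
const sloppy = /^[A-z]$/;

// [a-z&&[^j-n]] written without intersection: the look-ahead rejects j-n first.
const withoutJtoN = /^(?![j-n])[a-z]$/;
const explicit = /^[a-io-z]$/;
const letters = "abcdefghijklmnopqrstuvwxyz".split("");
const sameSet = letters.every((c) => withoutJtoN.test(c) === explicit.test(c));

// Unicode properties need the u flag; \p{Lu} means "uppercase letter".
const upper = /^\p{Lu}$/u;
const katakana = /\p{Script=Katakana}/u;

console.log(czechPhone.test("+420123456789"));            // true
console.log(sloppy.test("_"));                            // true, although "_" is no letter
console.log(sameSet);                                     // true
console.log(upper.test("\u017D"), upper.test("\u017E"));  // true false (Ž, ž)
console.log(katakana.test("\u30AB"));                     // true (カ)
```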
Character Types
• . == anything (apart from newline)
• \s == space == [ \t\n\v\f\r]
  • more in Unicode
• \w == word char == approx. [0-9a-zA-Z_]
  • is complicated in Unicode
• \d == digit == [0-9]
  • \h == hexadecimal digit == [0-9a-fA-F] (Oniguruma/Ruby; in Perl \h is horizontal whitespace)
• \S, \W, \D == [^\s], [^\w], [^\d]
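
How these shorthands behave depends on the engine. The sketch below shows JavaScript's behavior only, as one data point: \s covers some Unicode spaces, while \w and \d stay ASCII-only, and there is no \h, so the hexadecimal class is spelled out.

```javascript
// \s: the ASCII whitespace from the slide, plus more in Unicode (e.g. no-break space).
const asciiSpaces = [" ", "\t", "\n", "\v", "\f", "\r"];
const allAsciiSpacesMatch = asciiSpaces.every((c) => /^\s$/.test(c));
const nbspIsSpace = /^\s$/.test("\u00A0");

// \w: in JavaScript this stays ASCII-only, even with the u flag.
const wordRuns = "caf\u00E9_42".match(/\w+/g); // "\u00E9" is "é"
const accentedIsWord = /\w/u.test("\u00E9");

// No \h in JavaScript, so hexadecimal digits are written as an explicit class.
const hex = /^[0-9a-fA-F]+$/;

// Upper-case types negate: \D removes everything that is not a digit here.
const digitsOnly = "a1b22c".replace(/\D/g, "");

console.log(allAsciiSpacesMatch, nbspIsSpace); // true true
console.log(wordRuns, accentedIsWord);         // [ 'caf', '_42' ] false
console.log(hex.test("C0ffee"), hex.test("xyz")); // true false
console.log(digitsOnly);                       // 122
```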
Example
This reveals that plain text is in fact the
technical user's way to regard a file or a
sequence of bytes. In this sense, there is no
plain text.

           /\b[\w&&[^aA]]+\b/
              /\W{2,}\w+\b/
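
Both patterns can be approximated in JavaScript. This is a sketch under two assumptions: the text is joined into one line, and the && intersection (not available without the v flag) is replaced by a negative look-ahead with the same meaning.

```javascript
const sample =
  "This reveals that plain text is in fact the technical user's way to " +
  "regard a file or a sequence of bytes. In this sense, there is no plain text.";

// [\w&&[^aA]] means "a word character that is not a or A"; emulated with a look-ahead.
// The \b on both sides forces whole words, so any word containing "a" is skipped.
const wordsWithoutA = sample.match(/\b(?:(?![aA])\w)+\b/g);

// Two or more non-word characters followed by a whole word.
const afterPunctuation = sample.match(/\W{2,}\w+\b/g);

console.log(wordsWithoutA.slice(0, 5)); // [ 'This', 'text', 'is', 'in', 'the' ]
console.log(afterPunctuation);          // [ '. In', ', there' ]
```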
Anchors

• ^ - beginning (line, string)
• $ - end (line, string)
• \b - word boundary ~ \w\W (almost)
  • \b.{5}\b != \W\w{5}\W
• zero width!
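
A small JavaScript sketch of what "zero width" means in practice, and of why \b is only almost the same as \w\W. The input strings are made up for illustration.

```javascript
const lines = "first line\nsecond line";

// Without the m flag, ^ anchors to the start of the whole string only.
const wholeString = lines.match(/^\w+/g);   // [ 'first' ]
// With the m flag it also anchors after every line break.
const everyLine = lines.match(/^\w+/gm);    // [ 'first', 'second' ]

// \b is zero width: replacing it inserts markers without consuming any characters.
const marked = "ab cd".replace(/\b/g, "|"); // "|ab| |cd|"

// \b.{5}\b can match at the very start of the string; \W\w{5}\W needs a real
// character on both sides and would include those characters in the match.
const s = "hello world";
const viaBoundary = s.match(/\b.{5}\b/);    // matches "hello"
const viaNonWord = s.match(/\W\w{5}\W/);    // null: nothing precedes "hello"

console.log(wholeString, everyLine, marked);
console.log(viaBoundary[0], viaNonWord);
```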
Options
• /foo/imsx
  • i - case insensitive
  • m - multiline (^ and $ also match at the start and end of each line)
  • s - single line (. matches newlines)
  • x - extended!
  • g - global
• can be written inline
  • (?imsx-imsx)
  • (?imsx-imsx:...)

(?x-i)
  # this is cool
  (
    foo   # my important value
    |     # don't forget the alternative
    bar
  )       # result equals (foo|bar)
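
A JavaScript sketch of the flags. JavaScript supports i, m, s and g but has no x flag, so the extended example is approximated with a hypothetical helper named extended (not a built-in) that strips comments and whitespace before compiling. It is deliberately naive and would break on an escaped # or on whitespace inside a character class.

```javascript
const ignoreCase = /foo/i.test("FOO");             // i: case insensitive
const multiline = /^bar$/m.test("foo\nbar");       // m: ^ and $ work per line
const dotAll = /foo.bar/s.test("foo\nbar");        // s: "." matches the newline
const noDotAll = /foo.bar/.test("foo\nbar");       // without s it does not
const once = "foo foo".replace(/foo/, "bar");      // no g: first match only
const replacedAll = "foo foo".replace(/foo/g, "bar"); // g: every match

// Hypothetical stand-in for the x flag: drop "# comments" and all whitespace.
function extended(source, flags = "") {
  const stripped = source
    .replace(/#.*$/gm, "") // remove from "#" to the end of each line
    .replace(/\s+/g, "");  // remove spaces and line breaks
  return new RegExp(stripped, flags);
}

const fooOrBar = extended(`
  (
    foo   # my important value
    |     # don't forget the alternative
    bar
  )
`);

console.log(ignoreCase, multiline, dotAll, noDotAll); // true true true false
console.log(once, "/", replacedAll);                  // bar foo / bar bar
console.log(fooOrBar.source);                         // (foo|bar)
```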
Groups/Replacing
• (...) - matched group
• $1 - $9
  • alternatively \1 - \9 (not recommended)
• nested groups ordered by left bracket
• (?:...) - non-captured group
  • useful for (?:foo)+ or (?:foo|bar)
Example
"foobarman".replace(
  /(?:f)((o)+)(bar)|(baz|man)/g,
  '$1, $2, $3, $4, $5')

    • foobar              • man
      • 1 -- oo             • 1 --
      • 2 -- o              • 2 --
      • 3 -- bar            • 3 --
      • 4 --                • 4 -- man
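
The same call, run as JavaScript, with the captures of each match listed separately. The pattern has only four capturing groups (the leading (?:f) does not count), so in JavaScript the $5 in the replacement string is left as literal text.

```javascript
const pattern = /(?:f)((o)+)(bar)|(baz|man)/g;

// Groups that did not take part in a match are replaced by an empty string.
const replaced = "foobarman".replace(pattern, "$1, $2, $3, $4, $5");

// One array of captures per match; groups are numbered by their opening bracket.
const captures = [..."foobarman".matchAll(pattern)].map((m) => m.slice(1));

console.log(replaced); // oo, o, bar, , $5, , , man, $5
console.log(captures);
// [ [ 'oo', 'o', 'bar', undefined ], [ undefined, undefined, undefined, 'man' ] ]
```

Group 2 holds "o" rather than "oo" because a repeated group keeps only its last iteration.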
Look-ahead/behind
• defines custom zero-width anchors

            positive    negative
  ahead     (?=...)     (?!...)
  behind    (?<=...)    (?<!...)
Example

zdenek@gooddata.com
   /.*?@gooddata/


zdenek@gooddata.com
 /.*?(?=@gooddata)/
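
A JavaScript sketch of the difference: the first pattern consumes "@gooddata", the look-ahead only checks for it. The look-behind and negative variants are extra illustrations that are not on the slide, and they assume an engine with look-behind support (ES2018 or later).

```javascript
const email = "zdenek@gooddata.com";

// Plain match: "@gooddata" is part of the result.
const consuming = email.match(/.*?@gooddata/)[0];      // "zdenek@gooddata"

// Positive look-ahead: "@gooddata" must follow, but is not part of the result.
const lookahead = email.match(/.*?(?=@gooddata)/)[0];  // "zdenek"

// Positive look-behind: letters that come directly after an "@".
const lookbehind = email.match(/(?<=@)[a-z]+/)[0];     // "gooddata"

// Negative look-ahead: "zdenek" not followed by "@example".
const negativeAhead = /^zdenek(?!@example)/.test(email); // true

// Negative look-behind: whole words that do not start right after an "@".
const negativeBehind = email.match(/(?<!@)\b[a-z]+/g); // [ 'zdenek', 'com' ]

console.log(consuming, lookahead, lookbehind, negativeAhead, negativeBehind);
```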
Recursive RE

• very important!
  • quote & bracket matching
  • technically not part of regular grammar
• two styles
  • \g<name> or \g<n> - TextMate
  • (?R) - Perl
Example
(?x:
  \(                  # match the initial opening parenthesis

  # Now make a named group 'balanced' which
  # matches a balanced substring.
  (?<balanced>
    [^()]             # A balanced substring is either something
                      # that is not a parenthesis:
    |                 # …or a parenthesised string:
    \(                # A parenthesised string begins with an opening parenthesis
      \g<balanced>*   # …followed by a sequence of balanced substrings
    \)                # …and ends with a closing parenthesis
  )*                  # Look for a sequence of balanced substrings

  \)                  # Finally, the outer closing parenthesis
)

or: \(([^()]|(?R))*\)
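
JavaScript's RegExp supports neither \g<name> nor (?R), so the pattern above cannot be used there as written. The sketch below swaps in a different technique: it expands the "balanced" rule a fixed number of times, which handles nesting only up to that depth. The function name balancedParens and the depth parameter are hypothetical.

```javascript
// Build a pattern for a parenthesised string whose contents nest at most
// `depth` further levels. Each loop pass allows one more level of nesting.
function balancedParens(depth) {
  let inner = "[^()]"; // depth 0: contents contain no parentheses at all
  for (let i = 0; i < depth; i++) {
    // Either a non-parenthesis character, or a parenthesised run of the
    // previous (shallower) rule.
    inner = `(?:[^()]|\\((?:${inner})*\\))`;
  }
  return new RegExp(`\\((?:${inner})*\\)`);
}

const input = "f(a(b)c(d(e)))g";
const deepEnough = balancedParens(3).exec(input)[0];
const tooShallow = balancedParens(0).exec("(a(b))")[0];

console.log(deepEnough); // (a(b)c(d(e)))
console.log(tooShallow); // (b)  (the outer pair is too deeply nested for depth 0)
```

A true recursive pattern has no such depth limit, which is exactly why recursion falls outside regular grammars.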

Advanced Regular Expressions Redux

  • 2. Scope • medium to advanced • 30 minutes • performance / backtracking irrelevant • no compatibility charts (yet)
  • 3. TOC • basic matching, quantifiers • character classes, types, properties, anchors • groups, options, replace string • look-ahead/behind • subexpressions
  • 5. RE overview match “foo” replace with “bar” Perl /foo/ (on $_) s/foo/bar/ (on $_) Javascript /foo/ “foolish”.replace(/foo/, “bar”) Vi /foo/ :s/foo/bar/ TextMate ⌘-F, Find: foo ⌘-F Find: foo, Replace: bar
  • 6. RE overview match “foo” replace with “bar” Perl /foo/ (on $_) s/foo/bar/ (on $_) Javascript /foo/ “foolish”.replace(/foo/, “bar”) Vi /foo/ :s/foo/bar/ TextMate ⌘-F, Find: foo ⌘-F Find: foo, Replace: bar
  • 7. RE overview match “foo” replace with “bar” Perl /foo/ (on $_) s/foo/bar/ (on $_) Javascript /foo/ “foolish”.replace(/foo/, “bar”) Vi /foo/ :s/foo/bar/ TextMate ⌘-F, Find: foo ⌘-F Find: foo, Replace: bar
  • 10. Quantifiers • classic greedy: ?, *, + • specific:{1,5}, {,5}
  • 11. Quantifiers • classic greedy: ?, *, + • specific:{1,5}, {,5} • ? == {0,1}
  • 12. Quantifiers • classic greedy: ?, *, + • specific:{1,5}, {,5} • ? == {0,1} • * == {0,}
  • 13. Quantifiers • classic greedy: ?, *, + • specific:{1,5}, {,5} • ? == {0,1} • * == {0,} • + == {1,}
  • 14. Quantifiers • classic greedy: ?, *, + • specific:{1,5}, {,5} • ? == {0,1} • * == {0,} • + == {1,} • non-greedy: ??, *?, +?, {5,7}?
  • 15. Example This reveals that plain text is in fact the technical user's way to regard a file or a sequence of bytes. In this sense, there is no plain text. /reveal(.*)plain/ /reveal(.*?)plain/ /t.{2,3}t/
  • 16. Example This reveals that plain text is in fact the technical user's way to regard a file or a sequence of bytes. In this sense, there is no plain text. /reveal(.*)plain/ /reveal(.*?)plain/ /t.{2,3}t/
  • 17. Example This reveals that plain text is in fact the technical user's way to regard a file or a sequence of bytes. In this sense, there is no plain text. /reveal(.*)plain/ /reveal(.*?)plain/ /t.{2,3}t/
  • 18. Example This reveals that plain text is in fact the technical user's way to regard a file or a sequence of bytes. In this sense, there is no plain text. /reveal(.*)plain/ /reveal(.*?)plain/ /t.{2,3}t/
  • 19. Character Classes / Properties
  • 20. Character Classes / Properties • [0-9a-z] (classes)
  • 21. Character Classes / Properties • [0-9a-z] (classes) • +420[0-9]{9} = simplified czech phone nr.
  • 22. Character Classes / Properties • [0-9a-z] (classes) • +420[0-9]{9} = simplified czech phone nr. • don’t: [A-z0-]
  • 23. Character Classes / Properties • [0-9a-z] (classes) • +420[0-9]{9} = simplified czech phone nr. • don’t: [A-z0-] • [a-z&&[^j-n]] == [a-io-z]
  • 24. Character Classes / Properties • [0-9a-z] (classes) • +420[0-9]{9} = simplified czech phone nr. • don’t: [A-z0-] • [a-z&&[^j-n]] == [a-io-z] • p{Upper} (properties)
  • 25. Character Classes / Properties • [0-9a-z] (classes) • +420[0-9]{9} = simplified czech phone nr. • don’t: [A-z0-] • [a-z&&[^j-n]] == [a-io-z] • p{Upper} (properties) • works great on Unicode text (Latin,Katakana)
  • 26. Character Classes / Properties • [0-9a-z] (classes) • +420[0-9]{9} = simplified czech phone nr. • don’t: [A-z0-] • [a-z&&[^j-n]] == [a-io-z] • p{Upper} (properties) • works great on Unicode text (Latin,Katakana) • [:alnum:], [:^space:] (POSIX bracket)
  • 28-32. Character Types
      • . == anything (apart from newline)
      • \s == space == [\t\n\v\f\r ]
          • more in Unicode
      • \w == word char == approx. [0-9a-zA-Z_]
          • is complicated in Unicode
      • \d == digit == [0-9]
      • \h == hexadecimal digit == [0-9a-fA-F]
      • \S, \W, \D == [^\s], [^\w], [^\d]
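The shorthand types can be checked quickly in JavaScript, sketched below under two caveats: JavaScript's `\w` is ASCII-only, and it has no `\h`, so the hexadecimal check is spelled out as a class. All variable names are illustrative.

```javascript
// \s matches tabs, newlines and spaces alike.
const parts = "a\tb\nc d".split(/\s+/);              // ["a", "b", "c", "d"]

// \w is [0-9A-Za-z_] in JavaScript, so accented letters are not word chars.
const asciiWord = /^\w+$/.test("foo_42");            // true
const czechWord = /^\w+$/.test("žluť");              // false

// \D is the negation of \d: strip everything that is not a digit.
const digits = "tel. 555-0199".replace(/\D/g, "");   // "5550199"

// No \h in JavaScript; write the class explicitly.
const hex = /^[0-9a-fA-F]+$/.test("c0ffee");         // true

// . does not match a newline by default.
const dotNewline = /./.test("\n");                   // false

console.log(parts, asciiWord, czechWord, digits, hex, dotNewline);
```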
  • 33-34. Example
      This reveals that plain text is in fact the technical user's way to regard a file or a sequence of bytes. In this sense, there is no plain text.
      • /\b[\w&&[^aA]]+\b/
      • /\W{2,}\w+\b/
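Both patterns can be run against the sample sentence. In this sketch the first one is rewritten as `[^\WaA]` (a word character that is neither `a` nor `A`), on the assumption that the target engine lacks the `&&` intersection, as plain JavaScript does.

```javascript
const text =
  "This reveals that plain text is in fact the technical user's way to " +
  "regard a file or a sequence of bytes. In this sense, there is no plain text.";

// Whole words containing no "a" or "A"; stands in for /\b[\w&&[^aA]]+\b/.
const wordsWithoutA = text.match(/\b[^\WaA]+\b/g);

// Two or more non-word characters followed by a word: the sentence break.
const afterBreak = text.match(/\W{2,}\w+\b/)[0];

console.log(wordsWithoutA.slice(0, 4)); // [ 'This', 'text', 'is', 'in' ]
console.log(JSON.stringify(afterBreak)); // ". In"
```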
  • 36-39. Anchors
      • ^ - beginning (line, string)
      • $ - end (line, string)
      • \b - word boundary ~ \w\W (almost)
      • \b.{5}\b != \W\w{5}\W
      • zero width!
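A short JavaScript sketch of why `\b` is only "almost" `\w\W`: the boundary consumes nothing and also holds at the edges of the string, whereas `\W` needs an actual character. Variable names are illustrative.

```javascript
// \b matches at the start of the string, so "plain" is found.
const withBoundary = "plain text".match(/\b\w{5}\b/)[0];   // "plain"

// \W needs a real non-word character on both sides: no match here.
const withNonWord = "plain text".match(/\W\w{5}\W/);       // null

// When \W does match, it consumes the surrounding characters.
const consumed = " plain ".match(/\W\w{5}\W/)[0];          // " plain "

// Zero width: replacing every boundary inserts text without removing any.
const marked = "plain".replace(/\b/g, "|");                // "|plain|"

console.log(withBoundary, withNonWord, JSON.stringify(consumed), marked);
```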
  • 41-43. Options
      • /foo/imsx
      • i - case insensitive
      • m - multiline (^ and $ also match at the start and end of each line, not only of the string/file)
      • s - single line (. matches newlines)
      • x - extended!
      • g - global
      • can be written inline
          • (?imsx-imsx)
          • (?imsx-imsx:...)
      • extended example:
          (?x-i)   # this is cool
          (
            foo    # my important value
            |      # don't forget the alternative
            bar
          )        # result equals to (foo|bar)
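The `i`, `m`, `s` and `g` options map directly onto JavaScript flags, as sketched below. JavaScript has no `x` flag, so the last part is only an approximation of extended mode: the pattern is assembled from commented string pieces (the `extended` name and this assembly trick are illustrative, not from the deck).

```javascript
const lines = "foo\nbar\nFOO";

// g + i + m: every line that is exactly "foo", ignoring case.
const perLine = lines.match(/^foo$/gim);          // ["foo", "FOO"]

// Without m, ^ and $ only match at the ends of the whole string.
const wholeString = lines.match(/^foo$/gi);       // null

// s: lets . match a newline.
const dotDefault = /foo.bar/.test("foo\nbar");    // false
const dotAll = /foo.bar/s.test("foo\nbar");       // true

// Rough stand-in for x mode: comments live in the source, not the pattern.
const extended = new RegExp([
  "(",
  "foo",   // my important value
  "|",     // don't forget the alternative
  "bar",
  ")",
].join(""));

console.log(perLine, wholeString, dotDefault, dotAll, extended.source);
```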
  • 46-48. Groups/Replacing
      • (...) - matched group
      • $1 - $9
      • alternatively \1 - \9 (not recommended)
      • nested groups ordered by left bracket
      • (?:...) - non-captured group
          • useful for (?:foo)+ or (?:foo|bar)
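A brief JavaScript sketch of the grouping rules above: `$n` in the replacement string, numbering by opening bracket, non-capturing groups, and `\n` as a backreference inside the pattern. The sample strings are made up for illustration.

```javascript
// $1..$3 in the replacement string refer to the captured groups.
const swapped = "2024-05-17".replace(/(\d{4})-(\d{2})-(\d{2})/, "$3.$2.$1");

// Nested groups are numbered by the position of their opening bracket.
const nested = /((a)(b))/.exec("ab");      // ["ab", "ab", "a", "b"]

// (?:...) groups for repetition without creating a capture.
const repeated = /(?:foo)+/.exec("foofoo"); // ["foofoo"], no group 1

// Inside the pattern, \1 refers back to what group 1 matched.
const doubled = /(\w)\1/.exec("foolish");   // ["oo", "o"]

console.log(swapped);          // 17.05.2024
console.log(nested.slice(1));  // [ 'ab', 'a', 'b' ]
console.log(repeated.length);  // 1
console.log(doubled[0]);       // oo
```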
  • 50-51. Example
      "foobarman".replace( /(?:f)((o)+)(bar)|(baz|man)/g, '$1, $2, $3, $4, $5')
      group   match "foobar"   match "man"
      $1      oo               (empty)
      $2      o                (empty)
      $3      bar              (empty)
      $4      (empty)          man
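The group table above can be reproduced by listing the captures of each match. This sketch uses `matchAll` and a replacement string limited to `$1`..`$4`, since the pattern defines only four capturing groups; the bracketed output format is an illustrative choice.

```javascript
const re = /(?:f)((o)+)(bar)|(baz|man)/g;

// One row per match: [whole match, $1, $2, $3, $4].
const rows = [..."foobarman".matchAll(re)].map((m) => m.slice(0, 5));

// Groups that did not take part in a match are undefined,
// and are substituted as empty strings in a replacement.
const replaced = "foobarman".replace(re, "[$1|$2|$3|$4]");

console.log(rows[0]);  // [ 'foobar', 'oo', 'o', 'bar', undefined ]
console.log(rows[1]);  // [ 'man', undefined, undefined, undefined, 'man' ]
console.log(replaced); // [oo|o|bar|][|||man]
```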
  • 53. Look-ahead/behind
      • defines custom zero-width anchors
                 positive    negative
        ahead    (?=...)     (?!...)
        behind   (?<=...)    (?<!...)
  • 54. Example
      • zdenek@gooddata.com with /.*?@gooddata/ (matches "zdenek@gooddata")
      • zdenek@gooddata.com with /.*?(?=@gooddata)/ (matches "zdenek" only; the look-ahead consumes nothing)
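The difference between consuming `@gooddata` and merely looking ahead for it shows up in the matched text. A small JavaScript sketch follows; the look-behind and negative look-ahead lines are extra illustrations that are not on the slide, and look-behind assumes a reasonably recent engine.

```javascript
const email = "zdenek@gooddata.com";

// Plain pattern: "@gooddata" is part of the match.
const consuming = email.match(/.*?@gooddata/)[0];        // "zdenek@gooddata"

// Look-ahead: "@gooddata" must follow, but is not consumed.
const lookahead = email.match(/.*?(?=@gooddata)/)[0];    // "zdenek"

// Look-behind: word characters directly preceded by "@".
const lookbehind = email.match(/(?<=@)\w+/)[0];          // "gooddata"

// Negative look-ahead: first whole word NOT followed by "@".
const negative = email.match(/\b\w+\b(?!@)/)[0];         // "gooddata"

console.log(consuming, lookahead, lookbehind, negative);
```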
  • 55. Recursive RE
      • very important!
      • quote & bracket matching
      • technically not part of regular grammar
      • two styles
          • \g<name> or \g<n> - TextMate
          • (?R) - Perl
  • 56-57. Example
      (?x:
        \(                   # match the initial opening parenthesis
        # Now make a named group 'balanced' which
        # matches a balanced substring.
        (?<balanced>
          [^()]              # A balanced substring is either something
                             # that is not a parenthesis:
        |                    # …or a parenthesised string:
          \(                 # A parenthesised string begins with an opening parenthesis
          \g<balanced>*      # …followed by a sequence of balanced substrings
          \)                 # …and ends with a closing parenthesis
        )*                   # Look for a sequence of balanced substrings
        \)                   # Finally, the outer closing parenthesis
      )
      or: \(([^()]|(?R))*\)
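JavaScript's built-in regular expressions offer neither `\g<name>` nor `(?R)`, so the recursive pattern above cannot be run there as written. As a stand-in, this sketch finds the same thing (the first balanced parenthesised substring) with an explicit depth counter. It is a swapped-in technique, not the recursive regex itself, and `firstBalanced` is a hypothetical helper name.

```javascript
// Returns the first balanced "(...)" substring, or null if there is none.
// Mirrors what \(([^()]|(?R))*\) would find, using a depth counter.
function firstBalanced(input) {
  // Try each "(" in turn as a possible starting point.
  for (let start = input.indexOf("("); start !== -1; start = input.indexOf("(", start + 1)) {
    let depth = 0;
    for (let i = start; i < input.length; i++) {
      if (input[i] === "(") {
        depth += 1;
      } else if (input[i] === ")") {
        depth -= 1;
        // Depth back at zero: the opening parenthesis has been closed.
        if (depth === 0) return input.slice(start, i + 1);
      }
    }
    // This "(" never closes; fall through and try the next one.
  }
  return null;
}

console.log(firstBalanced("f(a(b)c)d")); // (a(b)c)
console.log(firstBalanced("((a)"));      // (a)
console.log(firstBalanced("no parens")); // null
```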

Editor's notes

  1-3. escaping???
  4-9. examples! possessive (?+, *+, ++)
  10-16. Unicode compat table!
  17-21. notice the space at the end, capital reverses
  22-24. how about /g??