Regular expressions are undervalued, and most developers know only the basics. A thorough understanding of how regular expressions work is incredibly helpful when you need to parse structured data.
This presentation assumes you already know what regular expressions are, and sums up (with examples) some advanced features you may not have known were possible with regular expressions.
For a more detailed write-up, see http://www.mullie.eu/regular-expressions-basics/ and http://www.mullie.eu/regular-expressions-advanced/
This presentation is based on PHP's implementation of PCRE, but many programming languages support much of the same functionality, sometimes with their own twists.
4. Regular expressions are special characters that match or capture portions of a field, as well as the rules that govern all characters.
Google
Regular expressions 101 » Introduction
5. A regular expression provides a concise and flexible means for "matching" strings of text, such as particular characters, words, or patterns of characters.
Wikipedia
8. Neque porro quisquam est qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit...
‣ /ipsum/
‣ /[a-z]/i
‣ /(est|qui)/
‣ /[^w]/i
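As a rough illustration of how two of these patterns behave against the sample text, here is a minimal sketch using `preg_match_all`. The variable names are only for illustration and do not come from the slides.

```php
<?php
// Sample text from the slide.
$text = 'Neque porro quisquam est qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit...';

// A literal pattern: counts how often "ipsum" occurs.
$ipsumCount = preg_match_all('/ipsum/', $text);

// An alternation inside a subpattern: matches either "est" or "qui".
// $found[1] holds what the first subpattern captured for each match.
$altCount = preg_match_all('/(est|qui)/', $text, $found);

echo $ipsumCount, "\n";            // 1
echo $altCount, "\n";              // 4
echo implode(',', $found[1]), "\n"; // qui,est,qui,qui
```

Note that `qui` also matches inside longer words such as "quisquam" and "quia", because the pattern does not anchor on word boundaries.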
23. Back references
Solution: /href=(['"])(.*?)\1/i
\1 references the first subpattern!
Don’t forget to also string-escape in PHP:
preg_match('/href=([\'"])(.*?)\\1/i', ...);
Regular expressions 101 » Back references
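To see why the back reference matters, the following sketch runs the slide's pattern over a small HTML fragment. The fragment is made up for this example; the point is that the closing quote must be the same character as the opening one, so an apostrophe inside a double-quoted value does not end the match early.

```php
<?php
// Hypothetical input: one double-quoted and one single-quoted attribute.
$html = '<a href="it\'s.html">x</a> <a href=\'/contact\'>y</a>';

// ([\'"]) captures the opening quote; \\1 (i.e. \1 in the regex)
// requires the very same quote character to close the value.
preg_match_all('/href=([\'"])(.*?)\\1/i', $html, $matches);

// $matches[1] holds the quote characters, $matches[2] the attribute values.
$urls = $matches[2];

print_r($urls); // it's.html and /contact
```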
24. Named subpatterns
Scenario: parsing large CSV
1,a title,5.00,92,green
2,another title,3.50,4,blue
3,one more,33699.99,15,white
...
Regular expressions 101 » Named subpatterns
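One possible way to apply named subpatterns to a row like those above is sketched below. The column names (`id`, `title`, `price`, `stock`, `color`) are assumptions made for this example; the source only shows the raw rows, not what each column means.

```php
<?php
// Assumed column meanings: id, title, price, stock, color (hypothetical).
$pattern = '/^(?P<id>\d+),(?P<title>[^,]*),(?P<price>\d+\.\d{2}),(?P<stock>\d+),(?P<color>[a-z]+)$/i';

$line = '3,one more,33699.99,15,white';

if (preg_match($pattern, $line, $row)) {
    // Named subpatterns are available by name as well as by index,
    // so the code stays readable even if the column order changes.
    echo $row['title'], ' costs ', $row['price'], "\n";
}
```

Accessing `$row['price']` instead of `$row[3]` is the main benefit: adding or reordering subpatterns does not silently shift the indexes the calling code depends on.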
36. Conditional subpatterns
Solution (if / then / else, using named subpatterns):
/<(?P<tag>[a-z]+).*?(?P<self>\/)?>(?(self)|.*?<\/(?P=tag)>)/i
If self-closing, then do nothing; else, find the matching end tag.
Regular expressions 101 » Conditional subpatterns
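The sketch below tries the conditional pattern on a short, made-up fragment containing a self-closing tag, a properly closed tag, and an unclosed tag. It assumes the forward slashes are backslash-escaped, as they must be when `/` is the delimiter.

```php
<?php
// (?P<self>\/)? optionally captures the "/" of a self-closing tag.
// (?(self)|...) then does nothing if "self" matched,
// and otherwise requires the matching end tag via (?P=tag).
$pattern = '/<(?P<tag>[a-z]+).*?(?P<self>\/)?>(?(self)|.*?<\/(?P=tag)>)/i';

// Hypothetical input; <em> is never closed, so it should not match.
$html = '<br /><p>hello</p><em>unclosed';

preg_match_all($pattern, $html, $matches);

$found = $matches[0];     // full matches
$tags  = $matches['tag']; // tag names, via the named subpattern

print_r($found); // <br /> and <p>hello</p>
```

This is only meant to illustrate the conditional; a pattern like this does not handle nested tags of the same name and is not a substitute for an HTML parser.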
37. Conditional subpatterns
With subpattern (named or by id):
‣ (?(pattern)then)
‣ (?(pattern)then|else)
With lookahead/lookbehind:
‣ (?(?=assertion)then)
‣ (?(?=assertion)then|else)
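A small sketch of the lookahead form follows. The rule it encodes (a value is either exactly four digits or letters only) is invented purely to show the syntax and does not come from the slides.

```php
<?php
// If the next character is a digit (lookahead), require exactly four digits;
// otherwise require one or more letters.
$pattern = '/^(?(?=\d)\d{4}|[a-z]+)$/i';

$results = [];
foreach (['2024', 'abc', '12ab', '123'] as $input) {
    // preg_match returns 1 on a match and 0 otherwise.
    $results[$input] = preg_match($pattern, $input);
}

print_r($results); // 2024 => 1, abc => 1, 12ab => 0, 123 => 0
```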
38. Comments
/
# match currency symbols for USD, EUR, GBP and JPY
[$€£¥]
# must be followed by a digit to indicate a price
(?=[0-9])
# pattern modifiers:
# u for UTF-8 interpretation (currency symbols),
# x to ignore whitespace (for comments)
/ux
Regular expressions 101 » Comments
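For reference, this is roughly how the commented pattern could be used from PHP. The sample sentence is made up, and the sketch assumes the source file is saved as UTF-8 so that the currency symbols agree with the `u` modifier.

```php
<?php
// The x modifier makes PCRE ignore whitespace in the pattern and treat
// "#" up to the end of the line as a comment. Avoid the delimiter "/"
// inside those comments, since it would end the pattern early.
$pattern = '/
    # match currency symbols for USD, EUR, GBP and JPY
    [$€£¥]
    # must be followed by a digit to indicate a price
    (?=[0-9])
/ux';

// Hypothetical input: the "$" is followed by a space, so it is skipped.
preg_match_all($pattern, 'Costs €5, £12 or $ 3', $matches);

$symbols = $matches[0];

print_r($symbols); // € and £
```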