Regular Expressions in PHP /(?:dave@davidstockton\.com)/ Front Range PHP User Group David Stockton
What is a regular expression? A pattern used to describe a part of some text. “Regular” has some implications for how it can be built, but that’s not really part of this presentation. Extremely powerful and useful (and often abused).
Regex Joke A programmer says, “I have a problem that I can solve with regular expressions.” Now, he has two problems…
How to use regex in PHP The preg_* functions: Perl compatible regular expressions. Probably the most common regex syntax. The ereg_* functions: POSIX style regular expressions. I am not covering these functions. Don’t use the ereg ones. They are deprecated as of PHP 5.3 (and were removed entirely in PHP 7.0).
How can we use regex in PHP? preg_match( ) – Searches a subject for a match preg_match_all( ) – Searches a subject for all matches preg_replace( ) – Searches a subject for a pattern and replaces it with something else preg_split( ) – Splits a string into an array based on a regex delimiter preg_filter( ) – Identical to preg_replace except it returns only the subjects where there was a match preg_replace_callback( ) – Like preg_replace, but the replacement is defined in a callback preg_grep( ) – Returns the array elements that match a pattern
How can we use regex in PHP? preg_quote( ) – Quotes regular expression characters preg_last_error( ) – Returns the error code of the last PCRE (Perl Compatible Regular Expression) function execution
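As a rough tour of the functions listed above, here is a minimal sketch that calls most of them on one small string. The sample text and variable names are made up for illustration and are not from the slides.

```php
<?php
// Illustrative input; the variable names are assumptions, not from the slides.
$subject = 'pizza, steak, pizza';

// preg_match() stops at the first match and returns 1 (found) or 0 (not found).
$found = preg_match('/pizza/', $subject, $first);

// preg_match_all() returns how many matches were found.
$count = preg_match_all('/pizza/', $subject, $all);

// preg_replace() substitutes every match.
$replaced = preg_replace('/pizza/', 'salad', $subject);

// preg_split() cuts the subject wherever the delimiter pattern matches.
$parts = preg_split('/,\s*/', $subject);

// preg_grep() keeps the array elements that match (original keys are preserved).
$pFoods = preg_grep('/^p/', $parts);

// preg_quote() escapes regex metacharacters so text can be matched literally.
$quoted = preg_quote('1+1=2');

// preg_replace_callback() computes each replacement in a function.
$upper = preg_replace_callback('/steak/', function ($m) {
    return strtoupper($m[0]);
}, $subject);
```

Note that preg_match() returns false on error, so comparing against 1 with === is safer than a loose truthiness check.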
How can we use regex in PHP? Those are the function calls, and we’ll play with them later. First, we need to learn how to create regex patterns since we need those for any function call.
Starting Pattern /[A-Z0-9._+=-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i This matches a series of letters, numbers, pluses, dashes, dots, underscores and equals signs, followed by an “AT” (@) sign, followed by a series of letters, numbers, dots and dashes, followed by a dot, followed by 2 to 4 letters. In other words… It matches an email address… Or rather some email addresses.
Matching Email Addresses What about james@smithsonian.museum? What about freddie@wherecanI.travel? Both of those are valid email addresses, but they fail because our pattern only allows 2-4 character TLD parts for the email address. How can we match all valid email addresses and only valid email addresses?
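To see the TLD limitation in action, here is a small sketch. The pattern is reconstructed from the slide's prose description, and the ^ and $ anchors are an addition (an assumption on my part): without them preg_match() would happily match the "smithsonian.muse" part of the address and report success. The dave@example.com address is a made-up sample.

```php
<?php
// Reconstructed from the slide's description, with ^ and $ added (an assumption)
// so that the whole string has to match, not just part of it.
$pattern = '/^[A-Z0-9._+=-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i';

$addresses = [
    'dave@example.com',
    'james@smithsonian.museum',
    'freddie@wherecanI.travel',
];

$results = [];
foreach ($addresses as $address) {
    // true when the address fits the pattern, false otherwise
    $results[$address] = preg_match($pattern, $address) === 1;
}
```

Only the first address passes; the other two are rejected because "museum" and "travel" are longer than four letters.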
The “real” email address regex [The next three slides are filled by a single pattern several thousand characters long. The extracted text has lost its backslash escapes and no longer forms a valid pattern, so it is omitted here.]
So… How do we write this? Don’t. Much simpler patterns have been written and will match 99.9% of valid email addresses. Use something like Zend_Validate_EmailAddress.
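If Zend Framework is not available, one alternative in the same spirit is PHP's built-in filter_var() with FILTER_VALIDATE_EMAIL. This is a swapped-in technique, not the validator the slide names, and the sample inputs are made up.

```php
<?php
// filter_var() returns the address on success and false on failure.
$good = filter_var('james@smithsonian.museum', FILTER_VALIDATE_EMAIL);
$bad  = filter_var('not an email', FILTER_VALIDATE_EMAIL);
```

Because a valid result is the string itself, compare against false with !== rather than relying on truthiness.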
So now the real learnin’… Letters and numbers match…  letters and numbers /a/ - Matches a string that contains an “a” /7/ - Matches a string that contains a 7.
More learnin’ Match a word /regex/  - Matches a string with the word “regex” in it You can use a pipe character to give a choice /pizza|steak|cheeseburger/ - Matches a string with any of these foods
Delimiters The examples so far have started with / and ended with /. These are delimiters and let the regex engine know where the pattern starts and ends. You can choose another delimiter if you’d like or if it’s more convenient Match namespace: #/My/PHP/Namespace# If I used “/” in that example, I’d need to escape each of the forward slashes to differentiate them from the delimiter
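A short sketch of the delimiter trade-off described above. The $path value is a made-up sample.

```php
<?php
$path = '/My/PHP/Namespace/Thing';

// With "/" as the delimiter, every literal slash must be escaped.
$withSlashes = preg_match('/\/My\/PHP\/Namespace/', $path);

// With "#" as the delimiter, the slashes need no escaping.
$withHashes = preg_match('#/My/PHP/Namespace#', $path);
```

Both calls match the same text; the second is simply easier to read.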
Character Matching Continued You can match a selection of characters /[Pp][Hh][Pp]/  - Matches PHP in any mixture of upper and lowercase Ranges can be defined /[abcdefghijklmnopqrstuvwxyz]/ - Matches any lowercase alpha character /[a-z]/ - Matches any lowercase alpha character
Character Selection Ranges Ranges can be combined /[A-Za-z0-9]/ - Matches an alphanumeric character /[A-Fa-f0-9]/ - Matches any hex character Character selection can be inverted /[^0-9]/ - Matches any non-digit character /[^ ]/ - Matches any non space character /[.!@#$%^*]/ - Matches some punctuation
Special Characters Dot (.) matches any character (except a newline, by default) /./ - Matches any one character /../ - Matches any two characters To match an actual dot character, you must escape it /\./ - Matches a single dot character Unless it’s in a character selection /[.]/ - Matches a single dot character
Character classes \d means digits - [0-9] \D means non-digits - [^0-9] \w means word characters - [A-Za-z0-9_] \W means non word characters - [^A-Za-z0-9_] \s means a whitespace character - [ \t\r\n] \S means non white space characters - [^ \t\r\n]
Repeating Character Classes Match two digits in a row /\d\d/ /[0-9][0-9]/ /\d{2}/ /[0-9]{2}/ Match at least one digit (but as many as it can) /\d+/ Match 0 to infinite digits /\d*/
Repeating Character Classes cont. * means match 0 or more + means match 1 or more {x} where x is a number means match exactly x of the preceding selection {x,} means match at least x {x,y} means match between x and y {0,y} means match up to y (write the 0 explicitly: older PCRE versions treat {,y} as literal text)
More special characters ? Means the preceding selection is optional. Putting it together: Telephone Number /\(?(\d{3})\)?[-\s]?(\d{3})[-\s]?(\d{4})/ Matches 720-675-7471 or (720)675-7471 or (720) 675-7471 or 7206757471 or 720 675 7471 Find a misspelled word (and get great deals on EBay) /la[bp]topcomputer[s]?/
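The quantifiers above can be tried out with a few one-liners. This is only a sketch; the sample strings are made up.

```php
<?php
// \d{2} needs exactly two digits in a row somewhere in the subject.
$twoDigits = preg_match('/\d{2}/', 'room 42');      // 1
$oneDigit  = preg_match('/\d{2}/', 'room 4');       // 0

// \d+ is greedy: it grabs as many digits as it can.
preg_match('/\d+/', 'order 12345 shipped', $m);
$digits = $m[0];                                     // "12345"

// \d* can match zero digits, so it succeeds with an empty match at the very start.
preg_match('/\d*/', 'order 12345 shipped', $m);
$empty = $m[0];                                      // ""

// {2,4} bounds the repetition on both sides.
preg_match('/\d{2,4}/', '1234567', $m);
$bounded = $m[0];                                    // "1234"
```

The \d* case is a common surprise: a pattern that can match nothing will match nothing at the first position it is tried.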
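Here is a sketch that runs the telephone pattern against the five sample numbers. The extracted slide text lost its backslashes, so the pattern below is a reconstruction; in particular, [-\s] as the separator class is an assumption based on the listed examples.

```php
<?php
// Pattern as reconstructed from the slide; [-\s] as the separator class is an assumption.
$pattern = '/\(?(\d{3})\)?[-\s]?(\d{3})[-\s]?(\d{4})/';

$samples = [
    '720-675-7471',
    '(720)675-7471',
    '(720) 675-7471',
    '7206757471',
    '720 675 7471',
];

$parsed = [];
foreach ($samples as $sample) {
    if (preg_match($pattern, $sample, $m) === 1) {
        // $m[1], $m[2] and $m[3] hold the three captured digit groups.
        $parsed[$sample] = $m[1] . $m[2] . $m[3];
    }
}
```

All five formats reduce to the same ten digits, which is what makes the capture groups handy for normalizing input.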
Regex Anchors Anchors allow you to specify a position, like before, after or in between characters. /^ab/ matches abcdefg but not cab. Notice that it’s the caret character… It means start of the string in this context, but negates a character class when it is the first character inside the square brackets. /ab$/ matches cab but not abcdefg. /^[a-z]+$/ will match a string that consists only of lowercase characters.
Word Boundaries \b means word boundaries: Before first character if first character is a word character. After last character if last character is a word character. Between two characters if one is a word character and the other is not. /\bfish\b/ matches fish, but not fisherman or catfish. /fish\b/ matches fish and catfish.
Alternation /cow|boy/ - Matches cow, or boy or cowboy or coward, etc. /\b(cow|boy)\b/ - Matches cow or boy but not cowboy or coward. The above example also captures the matching word due to the parens. More on this later.
Greedy vs Lazy By default, regular expressions are greedy… That is, they will match as much as they can. Grab a starting html tag: /<.+>/ Applied to <h1>Welcome to FRPUG</h1>, it matches the entire string. Not what we want. Make it lazy: /<.+?>/ Now it matches only <h1>.
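A quick way to check the word boundary rules is to filter a list of words with preg_grep(). This is a sketch; the \b escapes are written out explicitly because the extracted slide text dropped them.

```php
<?php
$words = ['fish', 'fisherman', 'catfish'];

// \b on both sides: "fish" must stand alone as a word.
$whole = preg_grep('/\bfish\b/', $words);

// \b only at the end: the word has to end in "fish".
$suffix = preg_grep('/fish\b/', $words);
```

preg_grep() keeps the original array keys, so $suffix has keys 0 and 2; wrap it in array_values() if you need a re-indexed list.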
Another tag matching solution /<[^>]+>/ Literally match a less than character followed by one or more non-greater than characters followed by a greater than character This way eliminates the need for the engine to backtrack (almost certainly faster than the last example).
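The three tag patterns can be compared side by side. This is a minimal sketch using the heading from the slides as input.

```php
<?php
$html = '<h1>Welcome to FRPUG</h1>';

// Greedy: .+ runs to the end and backs up to the last ">".
preg_match('/<.+>/', $html, $greedy);

// Lazy: .+? stops at the first ">" it can.
preg_match('/<.+?>/', $html, $lazy);

// Negated class: can never run past a ">", so no backtracking is needed.
preg_match('/<[^>]+>/', $html, $negated);
```

Here $greedy[0] is the whole string, while $lazy[0] and $negated[0] are both just the opening tag.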
Capturing part of regex (backreference) /__(construct|destruct)/ Backreference will contain either construct or destruct so you can use it later. /([a-z]+)\1/ Matches groups of repeated characters that repeat an even number of times. Matches aa but not a. Matches aaaaa (the first four characters). /([a-z]{3})\1/ Matches words like booboo or bambam.
Backreference Continued… Very useful when performing regex search and replace: preg_replace('/\(?(\d{3})\)?[-\s]?(\d{3})[-\s]?(\d{4})/', '(\1) \2-\3', $phone) The above example will take any phone number from the previous example and return it formatted in (xxx) xxx-xxxx format.
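Backreferences inside the pattern itself can be sketched as follows. The \1 is written out explicitly here (the extracted text dropped it); it means "whatever group 1 just captured, again".

```php
<?php
// One or more letters, immediately followed by the same letters again.
$double = preg_match('/([a-z]+)\1/', 'aa');      // 1
$single = preg_match('/([a-z]+)\1/', 'a');       // 0

// On "aaaaa" the engine backtracks until the two halves fit: "aa" + "aa".
preg_match('/([a-z]+)\1/', 'aaaaa', $m);
$matched = $m[0];                                 // "aaaa"

// Exactly three letters, repeated.
preg_match('/([a-z]{3})\1/', 'booboo', $m2);
$half = $m2[1];                                   // "boo"
```

In a single-quoted PHP string '\1' reaches the regex engine unchanged; in a double-quoted string it would need to be written as "\\1".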
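A runnable sketch of the reformatting call follows. The pattern and the '(\1) \2-\3' replacement are reconstructed, since the extracted slide text lost its backslash sequences; PHP also accepts the $1 style, shown in the second call.

```php
<?php
// Reconstructed pattern; [-\s] as the separator class is an assumption.
$pattern = '/\(?(\d{3})\)?[-\s]?(\d{3})[-\s]?(\d{4})/';

// \1, \2 and \3 in the replacement refer to the three capture groups.
$formatted = preg_replace($pattern, '(\1) \2-\3', '720 675 7471');

// The $1 style is equivalent.
$alsoFormatted = preg_replace($pattern, '($1) $2-$3', '7206757471');
```

Both calls produce (720) 675-7471.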
More backreferences… Replace duplicated words that that have been inadvertently left in
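The slide's own code did not survive extraction, so here is one possible sketch of the idea, not necessarily the pattern the presenter used: capture a word, require whitespace and the same word again, and replace the pair with a single copy.

```php
<?php
$text = 'Replace duplicated words that that have been been left in';

// \b(\w+)   a whole word, captured as group 1
// \s+\1\b   whitespace, then the same word again
// The i modifier lets "The the" count as a duplicate too.
$fixed = preg_replace('/\b(\w+)\s+\1\b/i', '\1', $text);
```

Keep in mind that some doubled words are legitimate English ("had had"), so a pattern like this is better used for flagging than for blind replacement.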
Non-capturing groups Match an IPv4 address /((?:\d{1,3}\.){3}\d{1,3})/ We’re matching 1 to 3 digits followed by a dot 3 times, then 1 to 3 more digits. We don’t care (right now) about the octets, we just want to repeat the match, so ?: says to not capture the group.
Pattern Modifiers Modifiers go after the last delimiter (now you know why there are delimiters) and affect how the regex engine works. i – case insensitive matching (matches are case-sensitive by default) m – multiline matching (^ and $ match at the start and end of each line) s - dot matches all characters, including \n x – ignore all whitespace characters except if escaped or in a character class
Pattern Modifiers Continued… D – Anchor for the end of the string only; otherwise $ also matches just before a trailing \n. Allow username to be alphabetic only: /^[A-Za-z]+$/ - This will match "dave\n" (a trailing newline slips through). However, /^[A-Za-z]+$/D will not match. U – Invert the meaning of the greediness. With this on, matches are lazy by default and ? makes them greedy. There are lots of other modifiers and you can see them at http://us2.php.net/manual/en/reference.pcre.pattern.modifiers.php
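A sketch of the IPv4 pattern in use, with the \d and \. escapes written out (the extracted text dropped them). The sample sentence is made up.

```php
<?php
$pattern = '/((?:\d{1,3}\.){3}\d{1,3})/';

preg_match($pattern, 'Server at 192.168.1.10 is up', $m);
$address    = $m[1];       // the outer, capturing group
$groupCount = count($m);   // 2: the whole match plus one capture

// The pattern only checks the shape, not the range of each octet.
$loose = preg_match($pattern, '999.999.999.999');   // 1
```

Because the inner group is non-capturing, $m holds just the full match and the one outer capture. Range-checking the octets is a separate job, for example with filter_var() and FILTER_VALIDATE_IP.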
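Each modifier can be checked with a one-line experiment. This is a sketch with made-up inputs; the username pattern is assumed to carry a + quantifier so that it can match more than one letter.

```php
<?php
// i: case-insensitive matching.
$caseInsensitive = preg_match('/php/i', 'I love PHP');        // 1

// s: the dot also matches a newline.
$dotDefault = preg_match('/a.b/', "a\nb");                    // 0
$dotAll     = preg_match('/a.b/s', "a\nb");                   // 1

// m: ^ and $ match at the start and end of every line.
$lineCount = preg_match_all('/^\w+$/m', "one\ntwo");          // 2

// x: unescaped whitespace in the pattern is ignored.
$extended = preg_match('/ \d{3} - \d{4} /x', '675-7471');     // 1

// D: without it, $ also matches just before a trailing newline.
$username = "dave\n";
$withoutD = preg_match('/^[A-Za-z]+$/', $username);           // 1
$withD    = preg_match('/^[A-Za-z]+$/D', $username);          // 0

// U: quantifiers become lazy by default.
preg_match('/<.+>/U', '<h1>Welcome to FRPUG</h1>', $ungreedy);
$firstTag = $ungreedy[0];                                     // "<h1>"
```

The D case is the one that tends to bite in validation code: a username with a trailing newline passes the check unless D is set.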
Named Capture Groups Rather than get back a numbered array of matches, get back an associative array. If you add a new capture group, you don’t have to renumber where you use the capture group
Named Capture Groups cont… Use (?P<named_group>pattern)
Named Capture Groups cont… Combined numbered and associative array. Capture group 0 is the whole pattern that is matched. If our string to match against was abcde720-675 7471foobar, $matches[0] will contain 720-675 7471
Positive Look Ahead Matches Look for a pattern followed by another pattern /p(?=h)/ - Match a “p” followed by an “h” but don’t include the “h”
Negative Look Ahead Look for a pattern which is not followed by some other pattern /p(?!h)/ - p not followed by h.
Look Aheads Positive and negative look aheads do not capture anything. They just determine if the pattern match is possible. They are zero-width. /p[^h]/ is not the same as /p(?!h)/ /ph/ is not the same as /p(?=h)/
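The slides' own code for this part did not survive extraction, so the following is a sketch under assumptions: the phone pattern is reconstructed, and the group names area, prefix and line are invented for illustration.

```php
<?php
// Group names (area, prefix, line) are hypothetical; the slides' names are not known.
$pattern = '/\(?(?P<area>\d{3})\)?[-\s]?(?P<prefix>\d{3})[-\s]?(?P<line>\d{4})/';

preg_match($pattern, 'abcde720-675 7471foobar', $matches);

$whole  = $matches[0];        // the entire matched text
$byName = $matches['area'];   // named access
$byNum  = $matches[1];        // the same group, by number
```

$matches contains both the named keys and the numeric ones, so existing code that uses $matches[1] keeps working after names are added.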
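The difference between a look ahead and simply matching the next character shows up in what ends up in the match. Here is a small sketch on the string "php".

```php
<?php
// Positive look ahead: the "h" is checked but not consumed.
preg_match('/p(?=h)/', 'php', $positive);
// Plain pattern: the "h" is part of the match.
preg_match('/ph/', 'php', $plain);

// Negative look ahead: succeeds on the final "p", which has nothing after it.
preg_match('/p(?!h)/', 'php', $negative, PREG_OFFSET_CAPTURE);

// A negated class needs an actual character after the "p", so the final "p" fails.
$negatedClass = preg_match('/p[^h]/', 'php');
```

$positive[0] is "p" while $plain[0] is "ph"; $negative[0] is the "p" at offset 2, and $negatedClass is 0 because [^h] must consume a character and there is none left.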
Look behinds Positive look behind /(?<=oo)d/ - d which is preceded by oo Matches “food”, “mood”, match only contains the “d” Negative look behind /(?<!oo)d/ - d which is not preceded by oo Matches “dude”, “crude”, and “d”
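The look behind examples above can be checked with a short sketch like this one.

```php
<?php
// Positive look behind: only the "d" is in the match, found at offset 3.
preg_match('/(?<=oo)d/', 'food', $m, PREG_OFFSET_CAPTURE);
$text   = $m[0][0];
$offset = $m[0][1];

// Negative look behind: a "d" that does not follow "oo".
$dude = preg_match('/(?<!oo)d/', 'dude');   // 1
$food = preg_match('/(?<!oo)d/', 'food');   // 0
```

In PCRE a look behind must have a fixed length (or be an alternation of fixed-length branches), so something like (?<=o+) is rejected.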
With great power… Test your regular expressions before they go to production. It’s much easier to get them wrong than to get them right if you don’t test.
When to not use regex Whenever they aren’t needed. If you can use strstr or strpos or str_replace to do the job, do that. They are much faster, much simpler and easier to do correctly. However, if you cannot use those functions, regex may be your best bet. Don’t use regex when you really need a parser
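For plain substring work, the string functions named above are all that is needed. This is a sketch with a made-up input string.

```php
<?php
$haystack = 'Front Range PHP User Group';

// strpos() returns the offset, or false when not found.
// Compare with !== false, because a match at offset 0 is falsy.
$hasPhp = strpos($haystack, 'PHP') !== false;

// str_replace() swaps a literal substring.
$swapped = str_replace('PHP', 'Regex', $haystack);

// strstr() returns the rest of the string from the first occurrence.
$tail = strstr($haystack, 'PHP');
```

None of these calls involve compiling a pattern, which is where the speed and simplicity advantage comes from.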
Resources http://regular-expressions.info http://us2.php.net/manual/en/ref.pcre.php Spider Man from http://www.onlineseats.com/
Questions? dave@frontrangephp.org

 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 

Regular expressions and php

  • 1. Regular Expressions in PHP /(?:dave@davidstockton\.com)/ Front Range PHP User Group David Stockton
  • 2. What is a regular expression? A pattern used to describe a part of some text “Regular” has some implications to how it can be built, but that’s not really part of this presentation Extremely powerful and useful (And often abused)
  • 3. Regex Joke A programmer says, “I have a problem that I can solve with regular expressions.” Now, he has two problems…
  • 4. How to use regex in PHP. The preg_* functions: Perl-compatible regular expressions, probably the most common regex syntax. The ereg_* functions: POSIX-style regular expressions. I am not covering these functions. Don’t use the ereg ones. They are deprecated as of PHP 5.3.
  • 5. How can we use regex in PHP? preg_match( ) – Searches a subject for a match; preg_match_all( ) – Searches a subject for all matches; preg_replace( ) – Searches a subject for a pattern and replaces it with something else; preg_split( ) – Splits a string into an array based on a regex delimiter; preg_filter( ) – Identical to preg_replace except it returns only the subjects where there was a match; preg_replace_callback( ) – Like preg_replace, but the replacement is defined in a callback; preg_grep( ) – Returns an array of the array elements that match a pattern
  • 6. How can we use regex in PHP? preg_quote( ) – Quotes regular expression characters preg_last_error( ) – Returns the error code of the last PCRE (Perl Compatible Regular Expression) function execution
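A quick tour of several of the functions listed above may help before the pattern syntax is introduced. This is only a sketch: the sample strings and variable names are invented for illustration.

```php
<?php
// Sample data invented for this illustration.
$subject = 'apple, banana;cherry';

// preg_match: returns 1 if the pattern matches, 0 if not; $m gets the first match.
$found = preg_match('/[a-z]+/', $subject, $m);

// preg_match_all: returns the number of matches; $all[0] lists every match.
$count = preg_match_all('/[a-z]+/', $subject, $all);

// preg_split: split on a comma or semicolon followed by optional spaces.
$parts = preg_split('/[,;] */', $subject);

// preg_grep: keep only the array elements that match (original keys are kept).
$withDigits = preg_grep('/[0-9]/', ['abc', 'a1', '42']);

// preg_quote: escape characters that are special inside a regex.
$quoted = preg_quote('1.5+2');
```

Note that preg_grep preserves the keys of the input array, so the result here is indexed 1 and 2 rather than 0 and 1.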
  • 7. How can we use regex in PHP? Those are the function calls, and we’ll play with them later. First, we need to learn how to create regex patterns since we need those for any function call.
  • 8. Starting Pattern /[A-Z0-9._+=-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i This matches a series of letters, numbers, plus, dash, dots, underscores and equals, followed by an “AT” (@) sign, followed by a series of letters, numbers, dots and dashes, followed by a dot, followed by 2 to 4 letters. In other words… It matches an email address… Or rather some email addresses.
  • 9. Matching Email Addresses What about james@smithsonian.museum? What about freddie@wherecanI.travel? Both of those are valid email addresses, but neither is matched in full because our pattern only allows 2-4 character TLD parts for the email address. How can we match all valid email addresses and only valid email addresses?
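The TLD limitation can be checked directly. The sketch below writes the starting pattern out from its description and, as an assumption of this example, adds ^ and $ so the whole string has to match; without them preg_match would still report a partial match such as james@smithsonian.muse.

```php
<?php
// Starting pattern, anchored with ^ and $ (an addition for this sketch).
$pattern = '/^[A-Z0-9._+=-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i';

// dave@example.com is an invented sample address with a 3-letter TLD.
$short  = preg_match($pattern, 'dave@example.com');
// Both addresses from the slide have 6-letter TLDs, so {2,4} rejects them.
$museum = preg_match($pattern, 'james@smithsonian.museum');
$travel = preg_match($pattern, 'freddie@wherecanI.travel');
```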
  • 10. The “real” email address regex (?:(?:)?[ ])*(?:(?:(?:[^()<>@,;:". 00-31]+(?:(?:(?:)?[ ] )+||(?=["()<>@,;:".]))|"(?:[^quot;]|.|(?:(?:)?[ ]))*"(?:(?: )?[ ])*)(?:(?:(?:)?[ ])*(?:[^()<>@,;:". 00-31]+(?:(?:( ?:)?[ ])+||(?=["()<>@,;:".]))|"(?:[^quot;]|.|(?:(?:)?[ ]))*"(?:(?:)?[ ])*))*@(?:(?:)?[ ])*(?:[^()<>@,;:". 00- 31]+(?:(?:(?:)?[ ])+||(?=["()<>@,;:".]))|([^]|.)*](?:(?:)?[ ])*)(?:(?:(?:)?[ ])*(?:[^()<>@,;:". 00-31]+ (?:(?:(?:)?[ ])+||(?=["()<>@,;:".]))|([^]|.)*(?: (?:)?[ ])*))*|(?:[^()<>@,;:". 00-31]+(?:(?:(?:)?[ ])+| |(?=["()<>@,;:".]))|"(?:[^quot;]|.|(?:(?:)?[ ]))*"(?:(?:) ?[ ])*)*lt;(?:(?:)?[ ])*(?:@(?:[^()<>@,;:". 00-31]+(?:(?:(?:r)?[ ])+||(?=["()<>@,;:".]))|([^]|.)*(?:(?:)?[ ])*)(?:(?:(?:)?[ ])*(?:[^()<>@,;:". 00-31]+(?:(?:(?:) ?[ ])+||(?=["()<>@,;:".]))|([^]|.)*(?:(?:)?[ ] )*))*(?:,@(?:(?:)?[ ])*(?:[^()<>@,;:". 00-31]+(?:(?:(?:)?[ ])+||(?=["()<>@,;:".]))|([^]|.)*(?:(?:)?[ ])* )(?:(?:(?:)?[ ])*(?:[^()<>@,;:". 00-31]+(?:(?:(?:)?[ ] )+||(?=["()<>@,;:".]))|([^]|.)*(?:(?:)?[ ])*))*) *:(?:(?:)?[ ])*)?(?:[^()<>@,;:". 00-31]+(?:(?:(?:)?[ ])+ ||(?=["()<>@,;:".]))|"(?:[^quot;]|.|(?:(?:)?[ ]))*"(?:(?: )?[ ])*)(?:(?:(?:)?[ ])*(?:[^()<>@,;:". 00-31]+(?:(?:(?: )?[ ])+||(?=["()<>@,;:".]))|"(?:[^quot;]|.|(?:(?:)?[ ]))*"(?:(?:)?[ ])*))*@(?:(?:)?[ ])*(?:[^()<>@,;:". 00-31 ]+(?:(?:(?:)?[ ])+||(?=["()<>@,;:".]))|([^]|.)*( ?:(?:)?[ ])*)(?:(?:(?:)?[ ])*(?:[^()<>@,;:". 00-31]+(? :(?:(?:)?[ ])+||(?=["()<>@,;:".]))|([^]|.)*(?:(? :)?[ ])*))*gt;(?:(?:)?[ ])*)|(?:[^()<>@,;:". 00-31]+(?:(? :(?:)?[ ])+||(?=["()<>@,;:".]))|"(?:[^quot;]|.|(?:(?:)? [ ]))*"(?:(?:)?[ ])*)*:(?:(?:)?[ ])*(?:(?:(?:[^()<>@,;:". 00-31]+(?:(?:(?:)?[ ])+||(?=["()<>@,;:".]))|"(?:[^quot;]| .|(?:(?:)?[ ]))*"(?:(?:)?[ ])*)(?:(?:(?:)?[ ])*(?:[^()<> @,;:". 00-31]+(?:(?:(?:)?[ ])+||(?=["()<>@,;:".]))|" (?:[^quot;]|.|(?:(?:)?[ ]))*"(?:(?:)?[ ])*))*@(?:(?:)?[ ] )*(?:[^()<>@,;:". 00-31]+(?:(?:(?:)?[ ])+||(?=["()<>@,;: ".]))|([^]|.)*(?:(?:)?[ ])*)(?:(?:(?:)?[ ])*(? :[^()<>@,;:". 00-
  • 11. The “real” email address regex cont. 31]+(?:(?:(?:)?[ ])+||(?=["()<>@,;:". ]))|([^]|.)*(?:(?:)?[ ])*))*|(?:[^()<>@,;:". 00- 31]+(?:(?:(?:)?[ ])+||(?=["()<>@,;:".]))|"(?:[^quot;]|.|( ?:(?:)?[ ]))*"(?:(?:)?[ ])*)*lt;(?:(?:)?[ ])*(?:@(?:[^()<>@,; :". 00-31]+(?:(?:(?:)?[ ])+||(?=["()<>@,;:".]))|([ ^]|.)*(?:(?:)?[ ])*)(?:(?:(?:)?[ ])*(?:[^()<>@,;:" . 00-31]+(?:(?:(?:)?[ ])+||(?=["()<>@,;:".]))|([^]]|.)*(?:(?:)?[ ])*))*(?:,@(?:(?:)?[ ])*(?:[^()<>@,;:".[ 00-31]+(?:(?:(?:)?[ ])+||(?=["()<>@,;:".]))|([^r]|.)*(?:(?:)?[ ])*)(?:(?:(?:)?[ ])*(?:[^()<>@,;:". 00-31]+(?:(?:(?:)?[ ])+||(?=["()<>@,;:".]))|([^] |.)*(?:(?:)?[ ])*))*)*:(?:(?:)?[ ])*)?(?:[^()<>@,;:". 00-31]+(?:(?:(?:)?[ ])+||(?=["()<>@,;:".]))|"(?:[^quot;]| .|(?:(?:)?[ ]))*"(?:(?:)?[ ])*)(?:(?:(?:)?[ ])*(?:[^()<>@, ;:". 00-31]+(?:(?:(?:)?[ ])+||(?=["()<>@,;:".]))|"(? :[^quot;]|.|(?:(?:)?[ ]))*"(?:(?:)?[ ])*))*@(?:(?:)?[ ])* (?:[^()<>@,;:". 00-31]+(?:(?:(?:)?[ ])+||(?=["()<>@,;:". ]))|([^]|.)*(?:(?:)?[ ])*)(?:(?:(?:)?[ ])*(?:[ ^()<>@,;:". 00-31]+(?:(?:(?:)?[ ])+||(?=["()<>@,;:". ]))|([^]|.)*(?:(?:)?[ ])*))*gt;(?:(?:)?[ ])*)(?:,*( ?:(?:[^()<>@,;:". 00-31]+(?:(?:(?:)?[ ])+||(?=["()<>@,;: ".]))|"(?:[^quot;]|.|(?:(?:)?[ ]))*"(?:(?:)?[ ])*)(?:(?:( ?:)?[ ])*(?:[^()<>@,;:". 00-31]+(?:(?:(?:)?[ ])+||(?=[ "()<>@,;:".]))|"(?:[^quot;]|.|(?:(?:)?[ ]))*"(?:(?:)?[ ])*))*@(?:(?:)?[ ])*(?:[^()<>@,;:". 00-31]+(?:(?:(?:)?[ ])+||(?=["()<>@,;:".]))|([^]|.)*(?:(?:)?[ ])*)(? :(?:(?:)?[ ])*(?:[^()<>@,;:". 00-31]+(?:(?:(?:)?[ ])+| |(?=["()<>@,;:".]))|([^]|.)*(?:(?:)?[ ])*))*|(?: [^()<>@,;:". 00-31]+(?:(?:(?:)?[ ])+||(?=["()<>@,;:".]]))|"(?:[^quot;]|.|(?:(?:)?[ ]))*"(?:(?:)?[ ])*)*lt;(?:(?:) ?[ ])*(?:@(?:[^()<>@,;:". 00-31]+(?:(?:(?:)?[ ])+||(?=[" ()<>@,;:".]))|([^]|.)*(?:(?:)?[ ])*)(?:(?:(?:) ?[ ])*(?:[^()<>@,;:". 00-31]+(?:(?:(?:)?[ ])+||(?=["()<> @,;:".]))|([^]|.)*(?:(?:)?[ ])*))*(?:,@(?:(?:)?[ ])*(?:[^()<>@,;:". 00-31]+(?:(?:(?:)?[ ])+||(?=["()<>@, ;:".]))|([^]|.)*(?:(?:)?[ ])*)(?:(?:(?:)?[ ] )*(?:[^()<>@,;:". 
00-31]+(?:(?:(?:)?[ ])+||(?=["()<>@,;: ".]))|([^]|.)*(?:(?:)?[ ])*))*)*:(?:(?:)?[ ])*)? (?:[^()<>@,;:". 00-31]+(?:(?:(?:)?[ ])+||(?=["()<>@,;:". ]))|"(?:[^quot;]|.|(?:(?:)?[ ]))*"(?:(?:)?[ ])*)(?:(?:(?: )?[ ])*(?:[^()<>@,;:". 00-31]+(?:(?:(?:)?[ ])+||(?=[ "()<>@,;:".]))|"(?:[^quot;]|.|(?:(?:)?[ ]))*"(?:(?:)?[ ]) *))*@(?:(?:)?[ ])*(?:[^()<>@,;:". 00-31]+(?:(?:(?:)?[ ]) +||(?=["()<>@,;:".]))|([^]|.)*(?:(?:)?[ ])*)(?:.(?:(?:)?[ ])*(?:[^()<>@,;:". 00-31]+(?:(?:(?:)?[ ])+| |(?=["()<>@,;:".]))|([^]|.)*(?:(?:)?[ ])*))*gt;(?:( ?:)?[ ])*))*)?;*)
  • 12. So… How do we write this? Don’t. Much simpler patterns have been written that will match 99.9% of valid email addresses. Use something like Zend_Validate_EmailAddress.
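If a framework validator is not available, PHP's built-in filter_var() is one alternative. This swaps in a different tool than the Zend class named on the slide, so treat it as a sketch of the same idea rather than the presenter's recommendation.

```php
<?php
// filter_var() returns the address on success and false on failure.
$ok  = filter_var('james@smithsonian.museum', FILTER_VALIDATE_EMAIL);
$bad = filter_var('not an email', FILTER_VALIDATE_EMAIL);
```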
  • 13. So now the real learnin’… Letters and numbers match… letters and numbers /a/ - Matches a string that contains an “a” /7/ - Matches a string that contains a 7.
  • 14. More learnin’ Match a word /regex/ - Matches a string with the word “regex” in it You can use a pipe character to give a choice /pizza|steak|cheeseburger/ - Matches a string with any of these foods
  • 15. Delimiters The examples so far have started with / and ended with /. These are delimiters and let the regex engine know where the pattern starts and ends. You can choose another delimiter if you’d like or if it’s more convenient Match namespace: #/My/PHP/Namespace# If I used “/” in that example, I’d need to escape each of the forward slashes to differentiate them from the delimiter
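The two delimiter choices can be compared side by side. The sample path is made up; both calls look for the same literal text.

```php
<?php
$path = 'vendor/My/PHP/Namespace/Foo.php';   // invented sample path

// With "/" as the delimiter, every literal slash must be escaped.
$slashDelim = preg_match('/\/My\/PHP\/Namespace/', $path);

// With "#" as the delimiter, the slashes need no escaping.
$hashDelim = preg_match('#/My/PHP/Namespace#', $path);
```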
  • 16. Character Matching Continued You can match a selection of characters /[Pp][Hh][Pp]/ - Matches PHP in any mixture of upper and lowercase Ranges can be defined /[abcdefghijklmnopqrstuvwxyz]/ - Matches any lowercase alpha character /[a-z]/ - Matches any lowercase alpha character
  • 17. Character Selection Ranges Ranges can be combined /[A-Za-z0-9]/ - Matches an alphanumeric character /[A-Fa-f0-9]/ - Matches any hex character Character Selection can be inversed /[^0-9]/ - Matches any non-digit character /[^ ]/ - Matches any non space character /[.!@#$%^*]/ - Matches some punctuation
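A few of these character selections in action, as a small sketch with invented sample strings:

```php
<?php
// Any mixture of upper and lowercase letters spelling "php".
$anyCase = preg_match('/[Pp][Hh][Pp]/', 'I like pHp');

// A range: the first lowercase letter in the subject.
preg_match('/[a-z]/', 'ABCd', $lower);

// Combined ranges: the first hex character in the subject.
preg_match('/[A-Fa-f0-9]/', 'xyz7', $hex);

// Inverted selection: is there any non-digit character?
$nonDigit = preg_match('/[^0-9]/', '12345');

// Inverted selection: the first non-space character.
preg_match('/[^ ]/', '  x', $nonSpace);
```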
  • 18. Special Characters Dot (.) matches any character: /./ /../ - Matches any two characters. To match an actual dot character, you must escape it: /\./ - Matches a single dot character. Unless it’s in a character selection: /[.]/ - Matches a single dot character
  • 19. Character classes \d means digits - [0-9]; \D means non-digits - [^0-9]; \w means word characters - [A-Za-z0-9_]; \W means non word characters – [^A-Za-z0-9_]; \s means a whitespace character - [ \t\n\r]; \S means non white space characters
  • 20. Repeating Character Classes Match two digits in a row: /\d\d/ /[0-9][0-9]/ /\d{2}/ /[0-9]{2}/ Match at least one digit (but as many as it can): /\d+/ Match 0 to infinite digits: /\d*/
  • 21. Repeating Character Classes cont. * means match 0 or more; + means match 1 or more; {x} where x is a number means match exactly x of the preceding selection; {x,} means match at least x; {x,y} means match between x and y; {0,y} means match up to y (spell out the 0: older PCRE versions treat {,y} as literal text)
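The shorthand classes and repetition counts can be tried out as follows. This is a minimal sketch with invented sample strings.

```php
<?php
// Exactly two digits in a row.
preg_match('/\d{2}/', 'room 42', $two);

// One or more digits: + is greedy, so it takes the whole run.
preg_match('/\d+/', 'order 12345 shipped', $run);

// Zero or more digits: succeeds even with no digits, matching an empty string.
$star = preg_match('/\d*/', 'abc', $empty);

// Between 2 and 3 of the letter "a": greedy, so it takes 3 of the 4 available.
preg_match('/a{2,3}/', 'caaaat', $range);
```

The \d* case is a common surprise: a pattern that can match nothing always succeeds.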
  • 22. More special characters ? means the preceding selection is optional. Putting it together: Telephone Number /\(?(\d{3})\)?[-\s]?(\d{3})[-\s]?(\d{4})/ Matches 720-675-7471 or (720)675-7471 or (720) 675-7471 or 7206757471 or 720 675 7471. Find a misspelled word (and get great deals on eBay): /la[bp]top\scomputer[s]?/
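The telephone pattern can be run against each of the listed formats. The sketch assumes the pattern reads as below, with the escaped parentheses and the \d and \s shorthands written out; the listing text for the second pattern is invented.

```php
<?php
// Optional "(", 3 digits, optional ")", optional dash/space, 3 digits,
// optional dash/space, 4 digits.
$phonePattern = '/\(?(\d{3})\)?[-\s]?(\d{3})[-\s]?(\d{4})/';

$samples = [
    '720-675-7471',
    '(720)675-7471',
    '(720) 675-7471',
    '7206757471',
    '720 675 7471',
];

$results = [];
foreach ($samples as $sample) {
    $results[$sample] = preg_match($phonePattern, $sample);
}

// The misspelled-word pattern: "laptop" or "labtop", optional plural.
$laptop = preg_match('/la[bp]top\scomputer[s]?/', 'Used labtop computers, cheap');
```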
  • 23. Regex Anchors Anchors allow you to specify a position, like before, after or in between characters. /^ab/ matches abcdefg but not cab. Notice that it’s the caret character… It means start of the string in this context, but it negates a character class when it comes first inside the square brackets. /ab$/ matches cab but not abcdefg. /^[a-z]+$/ will match a string that consists only of lowercase characters
  • 24. Word Boundaries \b means word boundaries: before the first character if the first character is a word character; after the last character if the last character is a word character; between two characters if one is a word character and the other is not. /\bfish\b/ matches fish, but not fisherman or catfish. /fish\b/ matches fish and catfish
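The anchor and word-boundary claims can be verified with a few calls. This sketch assumes the fish patterns use \b as the boundary marker.

```php
<?php
// ^ anchors to the start of the string, $ to the end.
$startsAb  = preg_match('/^ab/', 'abcdefg');
$cabStart  = preg_match('/^ab/', 'cab');
$endsAb    = preg_match('/ab$/', 'cab');
$lowerOnly = preg_match('/^[a-z]+$/', 'regex');
$mixedCase = preg_match('/^[a-z]+$/', 'Regex');

// \b on both sides: "fish" must stand alone as a word.
$wholeWord = preg_match('/\bfish\b/', 'one fish');
$fisherman = preg_match('/\bfish\b/', 'fisherman');
$catfish   = preg_match('/\bfish\b/', 'catfish');

// \b only at the end: anything may come before "fish".
$endOnly = preg_match('/fish\b/', 'catfish');
```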
  • 25. Alternation /cow|boy/ - Matches cow, or boy or cowboy or coward, etc. /\b(cow|boy)\b/ - Matches cow or boy but not cowboy or coward. The above example also captures the matching word due to the parens. More on this later.
  • 26. Greedy vs Lazy By default, regular expressions are greedy… That is, they will match as much as they can. Grab a starting html tag: /<.+>/ matches the entire string <h1>Welcome to FRPUG</h1>. Not what we want. Make it lazy: /<.+?>/ Now it matches only the <h1> in <h1>Welcome to FRPUG</h1>
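A short check of the two alternation patterns, assuming the second one is wrapped in \b word boundaries (that is what makes it reject cowboy and coward). The sample sentence is invented.

```php
<?php
// Plain alternation matches anywhere inside a longer word.
$loose = preg_match('/cow|boy/', 'coward');

// With word boundaries, only the standalone words match.
$strictCoward = preg_match('/\b(cow|boy)\b/', 'coward');
$strictCowboy = preg_match('/\b(cow|boy)\b/', 'cowboy');

// The parentheses capture which alternative matched.
preg_match('/\b(cow|boy)\b/', 'a boy and his dog', $m);
```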
  • 27. Another tag matching solution /<[^>]+>/ Literally match a less than character followed by one or more non-greater than characters followed by a greater than character This way eliminates the need for the engine to backtrack (almost certainly faster than the last example).
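The three tag-matching approaches can be compared on the same input. A minimal sketch:

```php
<?php
$html = '<h1>Welcome to FRPUG</h1>';

// Greedy: .+ runs to the end, then backtracks to the LAST ">".
preg_match('/<.+>/', $html, $greedy);

// Lazy: .+? stops at the FIRST ">".
preg_match('/<.+?>/', $html, $lazy);

// Negated class: cannot run past a ">" in the first place.
preg_match('/<[^>]+>/', $html, $negated);
```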
  • 28. Capturing part of regex (backreference) /__(construct|destruct)/ - Backreference will contain either construct or destruct so you can use it later. /([a-z]+)\1/ - Matches groups of characters that repeat an even number of times. Matches aa but not a. Matches aaaaa (the first four characters). /([a-z]{3})\1/ - Matches words like booboo or bambam
  • 29. Backreference Continued… Very useful when performing regex search and replace: preg_replace('/\(?(\d{3})\)?[-\s]?(\d{3})[-\s]?(\d{4})/', '(\1) \2-\3', $phone) The above example will take any phone number from the previous example and return it formatted in (xxx) xxx-xxxx format
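A sketch of capturing and reusing a group inside the pattern itself. It assumes the repeat patterns use \1 to refer back to the first group; the sample strings are invented.

```php
<?php
// The captured alternative lands in $m[1].
preg_match('/__(construct|destruct)/', 'public function __destruct()', $m);

// \1 must repeat exactly what group 1 matched.
$aa     = preg_match('/([a-z]+)\1/', 'aa');
$a      = preg_match('/([a-z]+)\1/', 'a');
$bambam = preg_match('/([a-z]{3})\1/', 'bambam');
$bamboo = preg_match('/([a-z]{3})\1/', 'bamboo');
```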
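The search-and-replace call can be sketched as a runnable snippet. The pattern is the telephone pattern as assumed earlier; the replacement uses $1-style references, which the PHP manual prefers over the equivalent \1 form.

```php
<?php
$pattern = '/\(?(\d{3})\)?[-\s]?(\d{3})[-\s]?(\d{4})/';

// Each $n in the replacement is filled with the nth captured group.
$fromSpaces = preg_replace($pattern, '($1) $2-$3', '720 675 7471');
$fromDigits = preg_replace($pattern, '($1) $2-$3', '7206757471');
```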
  • 30. More backreferences… Replace duplicated words that that have been inadvertently left in
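The code for this slide did not survive in the transcript, so the following is only one possible way to do it, not necessarily the presenter's: capture a word, require whitespace and the same word again, and replace the pair with a single copy.

```php
<?php
$text = 'Replace duplicated words that that have been inadvertently left in';

// \b(\w+) captures a whole word, \s+\1\b requires the same word right after it.
$fixed = preg_replace('/\b(\w+)\s+\1\b/i', '$1', $text);
```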
  • 31. Non-capturing groups Match an IPv4 address: /((?:\d{1,3}\.){3}\d{1,3})/ We’re matching 1 to 3 digits followed by a dot 3 times. We don’t care (right now) about the octets, we just want to repeat the match, so ?: says to not capture the group.
  • 32. Pattern Modifiers Modifiers go after the last delimiter (now you know why there are delimiters) and affect how the regex engine works. i – case insensitive matching (matches are case-sensitive by default); m – multiline matching; s - dot matches all characters, including \n; x – ignore all whitespace characters except if escaped or in a character class
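A sketch of the IPv4 pattern showing that the inner (?:...) group leaves no entry in the match array. The sample sentence is invented, and note that this pattern does not range-check the octets.

```php
<?php
$ipPattern = '/((?:\d{1,3}\.){3}\d{1,3})/';

preg_match($ipPattern, 'Server at 192.168.1.10 is up', $m);

// Only the outer group is captured, so $m has two entries:
// $m[0] (whole match) and $m[1] (outer group).
$entries = count($m);

// No range check: values above 255 still match.
$outOfRange = preg_match($ipPattern, '999.999.999.999');
```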
  • 33. Pattern Modifiers Continued… D – Anchor for the end of the string only; otherwise $ also matches just before a trailing \n. Allow username to be alphabetic only: /^[A-Za-z]+$/ - This will match "dave\n". However, /^[A-Za-z]+$/D will not match. U – Invert the meaning of the greediness. With this on, matches are lazy by default and ? makes them greedy. There are lots of other modifiers and you can see them at http://us2.php.net/manual/en/reference.pcre.pattern.modifiers.php
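Each modifier's effect can be seen by running the same pattern with and without it. A minimal sketch with invented sample strings:

```php
<?php
// i: case-insensitive matching.
$insensitive = preg_match('/php/i', 'PHP');

// s: without it, the dot does not match a newline.
$dotPlain = preg_match('/a.b/', "a\nb");
$dotAll   = preg_match('/a.b/s', "a\nb");

// m: ^ and $ also match at line breaks inside the string.
$noMulti = preg_match('/^extra/', "dave\nextra stuff");
$multi   = preg_match('/^extra/m', "dave\nextra stuff");

// x: whitespace in the pattern is ignored.
$extended = preg_match('/ \d{3} - \d{4} /x', '675-7471');

// D: $ matches only at the very end, not before a trailing newline.
$trailing   = preg_match('/^[A-Za-z]+$/', "dave\n");
$dollarOnly = preg_match('/^[A-Za-z]+$/D', "dave\n");

// U: quantifiers become lazy by default.
preg_match('/<.+>/U', '<h1>Hi</h1>', $ungreedy);
```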
  • 34. Named Capture Groups Rather than get back a numbered array of matches, get back an associative array. If you add a new capture group, you don’t have to renumber where you use the capture group
  • 35. Named Capture Groups cont… Use (?P<named_group>pattern)
  • 36. Named Capture Groups cont… Combined numbered and associative array. Capture group 0 is the whole pattern that is matched. If our string to match against was abcde720-675 7471foobar, $matches[0] will contain 720-675 7471
  • 37. Positive Look Ahead Matches Look for a pattern followed by another pattern. /p(?=h)/ - Match a “p” followed by an “h” but don’t include the “h”
  • 38. Negative Look Ahead Look for a pattern which is not followed by some other pattern. /p(?!h)/ - p not followed by h.
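The code on these slides did not survive in the transcript. As a sketch, here is the telephone pattern with named groups; the names area, prefix and line are hypothetical choices for this example.

```php
<?php
// Same telephone pattern, with each group given a (hypothetical) name.
$pattern = '/\(?(?P<area>\d{3})\)?[-\s]?(?P<prefix>\d{3})[-\s]?(?P<line>\d{4})/';

preg_match($pattern, 'abcde720-675 7471foobar', $matches);

// $matches holds both forms: numbered entries and named entries.
$whole  = $matches[0];
$byName = $matches['area'];
$byNum  = $matches[1];
```

Because each group appears under its name as well as its number, code that reads $matches['line'] keeps working if another group is added in front of it.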
  • 39. Look Aheads Positive and negative look aheads do not capture anything. They just determine if the pattern match is possible They are zero-width /p[^h]/ is not the same as /p(?!h)/ /ph/ is not the same as /p(?=h)/
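The difference between a look ahead and an ordinary match shows up in what ends up in the match and in what happens at the end of the string. A small sketch using the string php:

```php
<?php
// Positive look ahead: the "h" is checked but not included in the match.
preg_match('/p(?=h)/', 'php', $lookahead);
preg_match('/ph/', 'php', $plain);

// Negative look ahead: the final "p" qualifies because nothing follows it.
preg_match('/p(?!h)/', 'php', $negative, PREG_OFFSET_CAPTURE);

// [^h] must consume a character, so the final "p" cannot match.
$negatedClass = preg_match('/p[^h]/', 'php');
```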
  • 40. Look behinds Positive look behind /(?<=oo)d/ - d which is preceded by oo Matches “food”, “mood”, match only contains the “d” Negative look behind /(?<!oo)d/ - d which is not preceded by oo Matches “dude”, “crude”, and “d”
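The look behind examples can be checked the same way. A minimal sketch:

```php
<?php
// Positive look behind: match the "d" only when "oo" comes right before it.
preg_match('/(?<=oo)d/', 'food', $positive, PREG_OFFSET_CAPTURE);
$dudePositive = preg_match('/(?<=oo)d/', 'dude');

// Negative look behind: match a "d" that is NOT preceded by "oo".
$dudeNegative = preg_match('/(?<!oo)d/', 'dude');
$foodNegative = preg_match('/(?<!oo)d/', 'food');
```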
  • 41. With great power… Test your regular expressions before they go to production. It’s much easier to get them wrong than to get them right if you don’t test
  • 42. When to not use regex Whenever they aren’t needed. If you can use strstr or strpos or str_replace to do the job, do that. They are much faster, much simpler and easier to do correctly. However, if you cannot use those functions, regex may be your best bet. Don’t use regex when you really need a parser
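For a fixed piece of text, the plain string functions do the same job without a pattern. A sketch with an invented sample string:

```php
<?php
$haystack = 'Regular expressions in PHP';

// Substring check: strpos() returns an offset or false, so compare with !==.
$viaStrpos = strpos($haystack, 'PHP') !== false;
$viaRegex  = preg_match('/PHP/', $haystack) === 1;

// Literal replacement: no pattern needed.
$replaced = str_replace('PHP', 'Perl', $haystack);
```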