Regular Expression Crash Course

REGULAR EXPRESSIONS IN ACTION
A brief overview of regular expression building blocks and tools, with a practical example

WHAT ARE REGULAR EXPRESSIONS
Regular expressions provide a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters.

The Regex Coach is a graphical application for Windows that can be used to experiment with regular expressions interactively.
◦ http://weitz.de/regex-coach/
Sublime Text is a text editor that supports find and replace using regular expressions.
Web-based regular expression tester:
◦ http://www.regular-expressions.info/

The most basic regular expression consists of a literal, which behaves just like plain string matching. For example:
◦ cat will match cat in About cats and dogs.
Special characters known as metacharacters need to be escaped with a backslash (\) in regular expressions if they are used as part of a literal:
◦ dogs\. will match dogs. in About cats and dogs.
The metacharacters are:
◦ \ [ ^ $ . | ? * + ( ) {
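
The literal and escaping rules above can be tried out in a few lines. This is a minimal sketch that assumes Python's standard `re` module; the slides themselves do not name a language, and the `dogsX` test string is made up for illustration.

```python
import re

text = "About cats and dogs."

# A literal pattern behaves like a plain substring search.
literal = re.search(r"cat", text)
print(literal.group())    # cat

# An unescaped dot is a metacharacter, so "dogs." also matches "dogsX".
unescaped = re.search(r"dogs.", "dogsX")
print(unescaped.group())  # dogsX

# Escaping the dot turns it into a literal period.
escaped = re.search(r"dogs\.", text)
print(escaped.group())    # dogs.
print(re.search(r"dogs\.", "dogsX"))  # None

# re.escape() adds the backslashes for an arbitrary literal string.
auto_escaped = re.escape("dogs.")
print(auto_escaped)       # dogs\.
```

`re.escape` is handy when the literal comes from user input and may contain any of the metacharacters listed above.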

With a "character class", also called a "character set", you can tell the regex engine to match only one out of several characters. For example:
◦ gr[ae]y will match both grey and gray.
Ranges can be specified using a dash. For example:
◦ [0-9] will match any single digit from 0 to 9.
◦ [0-9a-fA-F] will match any single hexadecimal digit.
A caret after the opening square bracket negates the character class. The result is that the character class will match any character that is not in the character class. For example:
◦ [^0-9] will match any single character that is not a digit.
◦ q[^u] will not match Iraq, because a negated class must still match a character, but it will match the q and the following space in Iraq is a country.
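
The character class examples above can be checked with a short sketch, again assuming Python's `re` module. The sample strings `grey or gray` and `x7Fz` are invented for illustration.

```python
import re

# One character out of several: both spellings match.
spellings = re.findall(r"gr[ae]y", "grey or gray")
print(spellings)    # ['grey', 'gray']

# A range inside a class: any single hexadecimal digit.
hex_digits = re.findall(r"[0-9a-fA-F]", "x7Fz")
print(hex_digits)   # ['7', 'F']

# A negated class still has to consume one character.
no_match = re.search(r"q[^u]", "Iraq")
print(no_match)     # None: nothing follows the q

with_space = re.search(r"q[^u]", "Iraq is a country")
print(repr(with_space.group()))  # 'q ' (the q plus the following space)
```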

Most metacharacters work without escaping inside character classes; only ], \, ^ and - keep a special meaning there. For example:
◦ [+*] is a valid expression and matches either * or +.
There are some predefined character classes known as shorthand character classes:
◦ \w stands for [A-Za-z0-9_]
◦ \s stands for [ \t\r\n]
◦ \d stands for [0-9]
If a character class is repeated by using the ?, * or + operators, the entire character class is repeated, and not just the character that it matched. For example:
◦ [0-9]+ can match 837 as well as 222.
◦ ([0-9])\1+ will match 222 but not 837.
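
A small sketch of the shorthand classes and of the difference between repeating a class and repeating a matched character, assuming Python's `re` module. Note that for Python `str` patterns the shorthands are Unicode-aware by default; the `re.ASCII` flag used here restricts them to the ASCII definitions listed above. The sample strings are hypothetical.

```python
import re

# Metacharacters inside a class are literals: match either + or *.
operators = re.findall(r"[+*]", "1+2*3")
print(operators)   # ['+', '*']

# Shorthand classes (re.ASCII keeps them to the ASCII ranges).
numbers = re.findall(r"\d+", "Room 837, floor 2", re.ASCII)
words = re.findall(r"\w+", "snake_case and 42", re.ASCII)
fields = re.split(r"\s+", "a \t b\r\nc", flags=re.ASCII)
print(numbers)     # ['837', '2']
print(words)       # ['snake_case', 'and', '42']
print(fields)      # ['a', 'b', 'c']

# [0-9]+ repeats the class, so each digit may differ.
class_repeat = [bool(re.fullmatch(r"[0-9]+", s)) for s in ("837", "222")]
print(class_repeat)    # [True, True]

# ([0-9])\1+ repeats the digit that was actually matched.
same_digit = [bool(re.fullmatch(r"([0-9])\1+", s)) for s in ("837", "222")]
print(same_digit)      # [False, True]
```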

The famous dot “.” operator matches any single character except, by default, a line break. For example:
◦ a.b will match abb, aab, a+b etc.
^ and $ are anchors that match at the start and the end of the subject string. For example:
◦ ^My.*\.$ will match anything starting with My and ending with a dot.
The pipe operator matches either the expression on its left or the one on its right. For example:
◦ (cat|dog) can match either cat or dog.
Question:
◦ If the expression is Get|GetValue|Set|SetValue and the string is SetValue, what will this match and why?
◦ What if the expression becomes Get(Value)?|Set(Value)?
* (or {0,}) and + (or {1,}) are used to control repetitions: zero or more and one or more, respectively.
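
One way to explore the dot, the anchors, and the alternation question above is to run them. The sketch below assumes Python's `re` module, which tries alternatives from left to right and stops at the first one that succeeds; other engines (POSIX-style ones, for instance) may prefer the longest alternative instead. The test strings other than SetValue are made up.

```python
import re

# The dot stands for exactly one character.
dot_results = [bool(re.fullmatch(r"a.b", s)) for s in ("abb", "aab", "a+b", "ab")]
print(dot_results)   # [True, True, True, False]

# Anchors tie the pattern to the start and end of the string.
anchored = re.search(r"^My.*\.$", "My name is regex.")
not_at_start = re.search(r"^My.*\.$", "Oh My.")
print(anchored is not None, not_at_start is None)   # True True

# Alternation: the first alternative that matches wins, so only Set is matched.
first_wins = re.match(r"Get|GetValue|Set|SetValue", "SetValue").group()
print(first_wins)    # Set

# With an optional group, the greedy ? takes Value when it can.
greedy = re.match(r"Get(Value)?|Set(Value)?", "SetValue").group()
print(greedy)        # SetValue

# * and {0,} are the same quantifier written two ways.
star = re.findall(r"ab*", "a ab abb")
braces = re.findall(r"ab{0,}", "a ab abb")
print(star == braces, star)   # True ['a', 'ab', 'abb']
```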

Round brackets, besides grouping part of a regular expression together, also create a "backreference". A backreference stores the part of the string matched by the part of the regular expression inside the parentheses. For example:
◦ ([0-9])\1+ will match 222 but not 837.
If a backreference is not required, you can optimize the expression with a non-capturing group, for example Set(?:Value)?
Backreferences can be used in the expression itself or in the replacement text. For example:
◦ <([A-Za-z][A-Za-z0-9]*)>.*</\1> will match a pair of matching opening and closing tags.
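
The following sketch illustrates capturing groups, non-capturing groups, and backreferences in both the pattern and the replacement text. It assumes Python's `re` module, where `\1` refers to the first group in both places; the HTML snippets and the `hello world` string are invented for illustration.

```python
import re

# Backreference inside the pattern: the closing tag must repeat the opening tag.
tag = re.compile(r"<([A-Za-z][A-Za-z0-9]*)>.*</\1>")
matched = tag.search("<b>bold</b>")
mismatched = tag.search("<b>bold</i>")
print(matched.group())   # <b>bold</b>
print(mismatched)        # None

# A capturing group stores its text; a non-capturing group does not.
capturing = re.match(r"Set(Value)?", "SetValue").groups()
non_capturing = re.match(r"Set(?:Value)?", "SetValue").groups()
print(capturing)         # ('Value',)
print(non_capturing)     # ()

# Backreferences in the replacement text: swap two words.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
print(swapped)           # world hello
```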

/i makes the regex match case-insensitively.
◦ [A-Z] will match both A and a with this modifier.
/s enables "single-line mode". In this mode, the dot matches newlines as well.
◦ .* will match all of sheraz\r\nattari with this modifier.
/m enables "multi-line mode". In this mode, the caret and dollar match before and after newlines in the subject string, not only at its start and end.
◦ ^.*$ will match sheraz and attari as separate lines in sheraz\r\nattari with this modifier, rather than failing on the line break (some engines include the \r in the first match).
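/x enables "free-spacing mode". In this mode, whitespace between regex tokens is ignored, and an unescaped # starts a comment.
◦ #sheraz\r\n\r\n.* (where \r\n stands for real line breaks in the pattern) behaves like .* with this modifier: #sheraz is read as a comment and the line breaks are ignored.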
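
The modifiers above are written in Perl-style notation. As a rough sketch, assuming Python's `re` module, they correspond to the flags `re.I`, `re.S`, `re.M` and `re.X`. The subject string here uses a plain `\n` instead of `\r\n` to keep the output simple, since Python's dot excludes only `\n`.

```python
import re

text = "sheraz\nattari"

# /i : case-insensitive matching.
case_insensitive = re.findall(r"[A-Z]", "Aa", re.I)
print(case_insensitive)   # ['A', 'a']

# Without /s the dot stops at the newline; with it the dot crosses lines.
default_dot = re.match(r".*", text).group()
single_line = re.match(r".*", text, re.S).group()
print(repr(default_dot))  # 'sheraz'
print(repr(single_line))  # 'sheraz\nattari'

# /m : ^ and $ also match at line boundaries inside the string.
without_m = re.findall(r"^\w+$", text)
with_m = re.findall(r"^\w+$", text, re.M)
print(without_m)          # []
print(with_m)             # ['sheraz', 'attari']

# /x : whitespace and # comments inside the pattern are ignored.
verbose = re.compile(r"""
    # this whole line is a comment
    \w+    # one or more word characters
""", re.X)
free_spacing = verbose.match(text).group()
print(free_spacing)       # sheraz
```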

A conditional is a special construct that first evaluates a lookaround, and then executes one sub-regex if the lookaround succeeds and another sub-regex if the lookaround fails. A lookaround is a zero-width assertion: it checks the text before or after the current position without making it part of the match.
An example of positive lookahead is:
◦ q(?=uv*) will match the q in quvvvv and in qu.
An example of negative lookahead is:
◦ q(?!uv*) will match a q that is not followed by u (and therefore not by uv either).
An example of positive lookbehind is:
◦ (?<=b)a will match an a preceded by b, as in ba.
An example of negative lookbehind is:
◦ (?<!b)a will match an a not preceded by b, as in ca and da.
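
The four lookaround forms above can be compared by printing where each one matches. This is a sketch assuming Python's `re` module; the sample strings `quvvvv qu qa` and `ba ca da` are made up so that each form has something to accept and something to reject. It covers only the lookarounds, not the conditional construct.

```python
import re

subject = "quvvvv qu qa"

# Positive lookahead: q followed by u; only the q itself is matched.
positive_ahead = [(m.start(), m.group()) for m in re.finditer(r"q(?=uv*)", subject)]
print(positive_ahead)    # [(0, 'q'), (7, 'q')]

# Negative lookahead: q not followed by u.
negative_ahead = [m.start() for m in re.finditer(r"q(?!uv*)", subject)]
print(negative_ahead)    # [10]

words = "ba ca da"

# Positive lookbehind: a preceded by b.
positive_behind = [m.start() for m in re.finditer(r"(?<=b)a", words)]
print(positive_behind)   # [1]

# Negative lookbehind: a not preceded by b.
negative_behind = [m.start() for m in re.finditer(r"(?<!b)a", words)]
print(negative_behind)   # [4, 7]
```

In every case the matched text is a single character, which shows that the lookaround itself consumes nothing.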

abc…        Letters
123…        Digits
\d          Any digit
\D          Any non-digit character
.           Any character
\.          Period
[abc]       Only a, b, or c
[^abc]      Not a, b, nor c
[a-z]       Characters a to z
[0-9]       Numbers 0 to 9
\w          Any alphanumeric character
\W          Any non-alphanumeric character
{m}         m repetitions
{m,n}       m to n repetitions
*           Zero or more repetitions
+           One or more repetitions
?           Optional character
\s          Any whitespace
\S          Any non-whitespace character
^…$         Starts and ends
(…)         Capture group
(a(bc))     Capture sub-group
(.*)        Capture all
(abc|def)   Matches abc or def
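
Several entries from the summary above can be combined in one pattern. The sketch below assumes Python's `re` module and uses a hypothetical date format (YYYY-MM-DD) purely as an illustration; it checks the shape of the text, not whether the date is valid.

```python
import re

# ^…$ anchors, \d, {m} repetitions and three capture groups.
date_pattern = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")

parts = date_pattern.match("2024-01-31").groups()
print(parts)         # ('2024', '01', '31')

rejected = date_pattern.match("24-1-31")
print(rejected)      # None

# (a(bc)): the outer group and the nested sub-group are both captured.
nested = re.match(r"(a(bc))", "abc").groups()
print(nested)        # ('abc', 'bc')

# (abc|def): alternation inside a capture group.
either = re.findall(r"(abc|def)", "abcdefabd")
print(either)        # ['abc', 'def']
```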

Most of the content is taken from
http://www.regular-expressions.info/
THANK YOU!
