Introduction to Regular
      Expressions

      Ben Brumfield
     RootsTech 2013
Our Texts
• My Texts
  – Manuscript Transcripts
  – OCR
• Our Texts
  – Name Variants
  – Abbreviations
  – Spelling Changes
  – “Mistakes”
What are Regular Expressions?
• Very small language for describing text.
• Not a programming language.
• Incredibly powerful tool for search/replace
  operations.
• Old (1950s-60s)
• Arcane art.
• Ubiquitous.
Why Use Regular Expressions?
• Finding every instance of a string in a file
  – e.g. every mention of “chickens” in a farm
  diary
• How many times does “sing” appear in a
  text in all tenses and conjugations?
• Reformatting dirty data
• Validating input.
• Command line work – listing files,
  grepping log files
The Basics
• A regex is a pattern enclosed within
  delimiters.
• Most characters match themselves.
• /rootstech/ is a regular expression that
  matches “rootstech”.
  – Slash is the delimiter enclosing the
    expression.
  – “rootstech” is the pattern.
/at/
• Matches strings with “a” followed by “t”.
• Test strings: at, hat, that, atlas, aft, Athens
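The /at/ example can be checked by running it against the six test strings. The sketch below assumes Python's re module (the slides do not name a language); Python writes the pattern as a string rather than between slashes.

```python
import re

# The six test strings from the slide.
candidates = ["at", "hat", "that", "atlas", "aft", "Athens"]

# re.search looks for the pattern anywhere inside the string.
pattern = re.compile(r"at")
matched = [s for s in candidates if pattern.search(s)]
unmatched = [s for s in candidates if not pattern.search(s)]

print("match:   ", matched)
print("no match:", unmatched)
```

“aft” fails because the “a” and the “t” are not adjacent, and “Athens” fails because matching is case sensitive.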
Some Theory
• Finite State Machine for the regex /at/
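The state machine for /at/ can be sketched by hand. The function below is an illustrative assumption with three states, not a description of how any particular regex engine is implemented; the name contains_at is made up for this example.

```python
def contains_at(text: str) -> bool:
    """Hand-written finite state machine equivalent to searching for /at/."""
    state = 0  # 0 = nothing seen, 1 = just saw "a", 2 = saw "at" (accept)
    for ch in text:
        if state == 0:
            state = 1 if ch == "a" else 0
        elif state == 1:
            if ch == "t":
                state = 2
                break  # accepting state reached; stop scanning
            # another "a" keeps us in state 1, anything else resets
            state = 1 if ch == "a" else 0
    return state == 2


for word in ["at", "that", "aft", "Athens", "aat"]:
    print(word, contains_at(word))
```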
Characters
• Matching is case sensitive.
• Special characters: ( ) ^ $ { } [ ] \ | . + ? *
• To match a special character in your text,
  precede it with \ in your pattern:
  – /snarky [sic]/ does not match “snarky [sic]”
  – /snarky \[sic\]/ matches “snarky [sic]”
• Regular expressions can support Unicode.
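The escaping rule can be tried out directly. This is a minimal sketch assuming Python's re module; re.escape is a Python convenience that adds the backslashes automatically and is not something the slides mention.

```python
import re

text = "snarky [sic]"

# Unescaped, [sic] is a character class: one of "s", "i", or "c".
unescaped = re.search(r"snarky [sic]", text)

# Escaped brackets match a literal "[" and "]".
escaped = re.search(r"snarky \[sic\]", text)

# re.escape builds the escaped pattern from the literal text.
auto = re.search(re.escape("snarky [sic]"), text)

print(unescaped)        # no match object
print(escaped.group())
print(auto.group())
```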
Character Classes
• Characters within [ ] are choices for a
  single-character match.
• Think of a set operation, or a type of or.
• Order within the set is unimportant.
• /x[01]/ matches “x0” and “x1”.
• /[10][23]/ matches “02”, “03”, “12” and
  “13”.
• An initial ^ negates the class:
  – /[^45]/ matches any single character except 4 or 5.
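The class examples above can be sketched as follows, assuming Python's re module. The sample strings are invented for illustration.

```python
import re

sample = "02 03 12 13 21 30"

# /[10][23]/: first character 1 or 0, second character 2 or 3.
two_char = re.findall(r"[10][23]", sample)

# Order inside the brackets does not matter.
reordered = re.findall(r"[01][32]", sample)

# /[^45]/: any single character except 4 or 5.
not_4_or_5 = re.findall(r"[^45]", "3456")

print(two_char, reordered, not_4_or_5)
```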
/[ch]at/
• Matches strings with “c” or “h”, followed by
  “a”, followed by “t”.
• Test strings: that, at, chat, cat, fat, phat
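To see which of the test strings /[ch]at/ matches, and which part of each string it matches, here is a small sketch assuming Python's re module.

```python
import re

candidates = ["that", "at", "chat", "cat", "fat", "phat"]
pattern = re.compile(r"[ch]at")

# Map each string to the text that matched, or None if nothing matched.
found = {}
for s in candidates:
    m = pattern.search(s)
    found[s] = m.group() if m else None

for s, hit in found.items():
    print(s, "->", hit)
```

Note that “chat” matches through its “hat”, not through the leading “c”: the class stands for exactly one character.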
Ranges
• Ranges define sets of characters within a
  class.
  – /[1-9]/ matches any non-zero digit.
  – /[a-zA-Z]/ matches any unaccented (ASCII) letter.
  – /[12][0-9]/ matches the two-digit numbers from 10
    to 29.
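The range example can be verified by brute force. This sketch assumes Python's re module and uses re.fullmatch so that the whole number has to fit the pattern; without that, the pattern also matches inside longer numbers.

```python
import re

# Which numbers from 0 to 39 fit /[12][0-9]/ exactly?
in_range = [n for n in range(40) if re.fullmatch(r"[12][0-9]", str(n))]

# An unanchored search happily matches part of a longer number.
inside_year = re.search(r"[12][0-9]", "1702").group()

print(in_range)
print(inside_year)
```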
Shortcuts
Shortcut   Name         Equivalent Class
  \d       digit        [0-9]
  \D       not digit    [^0-9]
  \w       word         [a-zA-Z0-9_]
  \W       not word     [^a-zA-Z0-9_]
  \s       space        [\t\n\r\f\v ]
  \S       not space    [^\t\n\r\f\v ]
  .        everything   [^\n] (depends on mode)
/\d\d\d[- ]\d\d\d\d/
• Matches strings with:
  – Three digits
  – Space or dash
  – Four digits
• Test strings: 501-1234, 234 1252, 652.2648,
  713-342-7452, PE6-5000, 653-6464x256
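Running the phone-number pattern over the six test strings shows both what matches and where. A minimal sketch, assuming Python's re module:

```python
import re

candidates = [
    "501-1234", "234 1252", "652.2648",
    "713-342-7452", "PE6-5000", "653-6464x256",
]
pattern = re.compile(r"\d\d\d[- ]\d\d\d\d")

# Map each string to the matched text, or None if nothing matched.
results = {}
for s in candidates:
    m = pattern.search(s)
    results[s] = m.group() if m else None

for s, hit in results.items():
    print(f"{s:14} -> {hit}")
```

The pattern is not anchored, so it finds a seven-digit number inside “713-342-7452” and “653-6464x256” and ignores the rest of the string.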
Repeaters
• Symbols indicating that the preceding
  element of the pattern can repeat.
• /runs?/ matches “runs” or “run”.
• /1\d*/ matches any number beginning
  with “1”.

Repeater   Count
   ?       zero or one
   +       one or more
   *       zero or more
  {n}      exactly n times
 {n,m}     between n and m times
  {,m}     no more than m times
  {n,}     at least n times
Repeaters
Strings:
1: “at”       2: “art”
3: “arrrrt”   4: “aft”
Patterns:
A: /ar?t/     B: /a[fr]?t/
C: /ar*t/     D: /ar+t/
E: /a.*t/     F: /a.+t/
Repeaters
•   /ar?t/ matches “at” and “art” but not “arrrt”.
•   /a[fr]?t/ matches “at”, “art”, and “aft”.
•   /ar*t/ matches “at”, “art”, and “arrrrt”
•   /ar+t/ matches “art” and “arrrt” but not “at”.
•   /a.*t/ matches anything with an ‘a’
    eventually followed by a ‘t’.
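The full grid of six patterns against four strings can be generated rather than worked out by hand. This sketch assumes Python's re module; the labels A to F follow the earlier slide, and the last line is an added illustration of the counted repeater {n}.

```python
import re

strings = ["at", "art", "arrrrt", "aft"]
patterns = {
    "A": r"ar?t", "B": r"a[fr]?t", "C": r"ar*t",
    "D": r"ar+t", "E": r"a.*t", "F": r"a.+t",
}

# For each pattern, list the strings it matches.
table = {
    label: [s for s in strings if re.search(p, s)]
    for label, p in patterns.items()
}
for label, hits in table.items():
    print(label, patterns[label], hits)

# Counted repeaters: \d{3} is shorthand for \d\d\d.
counted = re.fullmatch(r"\d{3}[- ]\d{4}", "501-1234") is not None
print(counted)
```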
Lab Session I
Try this URL:


tinyurl.com/rootstechlab
Lab Session I
Match “Brumfield” and “Bromfield” in

1702 John Bromfield's estate had been
  proved in Isle of Wight prior to 1702,
Anne Brumfield rec'd. more than her share
  from her father's estate.
Lab Reference
Repeater   Count
   ?       zero or one
   +       one or more
   *       zero or more
  {n}      exactly n times
 {n,m}     between n and m times
  {,m}     no more than m times
  {n,}     at least n times

Shortcut   Name
  \d       digit
  \D       not digit
  \w       word
  \W       not word
  \s       space
  \S       not space
  .        everything
Anchors
• Anchors match between characters.
• Used to assert that the characters you’re
  matching must appear in a certain place.
• /\bat\b/ matches “at work” but not “batch”.

Anchor   Matches
  ^      start of line
  $      end of line
  \b     word boundary
  \B     not boundary
  \A     start of string
  \Z     end of string
  \z     raw end of string (rare)
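A short sketch of the word-boundary example, assuming Python's re module. The ^ and $ lines are added illustrations using the same sample text.

```python
import re

# \b asserts a word boundary on each side of "at".
in_phrase = re.search(r"\bat\b", "at work")
in_word = re.search(r"\bat\b", "batch")

# ^ and $ pin a match to the start or end of the text.
starts_with_at = re.search(r"^at", "at work") is not None
ends_with_work = re.search(r"work$", "at work") is not None

print(in_phrase.group(), in_word, starts_with_at, ends_with_work)
```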
Alternation
• In Regex, | means “or”.
• You can put a full expression on the left
  and another full expression on the right.
• Either can match.
• /seeks?|sought/ matches “seek”, “seeks”,
  or “sought”.
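Alternation is easy to try on a sentence that contains all three forms. A minimal sketch assuming Python's re module; the sentence is invented for the example.

```python
import re

sentence = "He seeks what she sought; they seek it too."

# Either side of | may match: "seek" with optional "s", or "sought".
forms = re.findall(r"seeks?|sought", sentence)

print(forms)
```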
Grouping
• Everything within ( … ) is grouped into a
  single element for the purposes of
  repetition and alternation.
• The expression /(la)+/ matches “la”, “lala”,
  “lalalala” but not “all”.
• /schema(ta)?/ matches “schema” and
  “schemata”; in “schematic” it matches only the
  leading “schema”, never the whole word.
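The grouping examples can be sketched as below, assuming Python's re module. re.fullmatch requires the whole string to fit the pattern, while re.search accepts a match anywhere inside it, which is why “schematic” behaves differently under the two.

```python
import re

# (la)+ repeats the whole group "la", not just the "a".
whole_lala = re.fullmatch(r"(la)+", "lalalala") is not None
in_all = re.search(r"(la)+", "all")

# (ta)? makes the whole group "ta" optional.
schemata = re.fullmatch(r"schema(ta)?", "schemata") is not None
schematic_whole = re.fullmatch(r"schema(ta)?", "schematic")
schematic_part = re.search(r"schema(ta)?", "schematic").group()

print(whole_lala, in_all, schemata, schematic_whole, schematic_part)
```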
Grouping Example
• What regular expression matches “eat”,
  “eats”, “ate” and “eaten”?
• /eat(s|en)?|ate/

• Add word boundary anchors to exclude
  “sate” and “eating”: /\b(eat(s|en)?|ate)\b/
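The effect of the word-boundary anchors shows up when both versions of the pattern are run over the same word list. A sketch assuming Python's re module:

```python
import re

words = ["eat", "eats", "ate", "eaten", "sate", "eating"]

loose = re.compile(r"eat(s|en)?|ate")
anchored = re.compile(r"\b(eat(s|en)?|ate)\b")

# Words in which each pattern finds a match.
loose_hits = [w for w in words if loose.search(w)]
anchored_hits = [w for w in words if anchored.search(w)]

print(loose_hits)
print(anchored_hits)
```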
Lab Session II
Match “William” and “Wm.” in

1736 Robert Mosby and John Brumfield
  processioned the lands of Wm. Brittain
1739 … Witnesses: Richard Echols, William
  Brumfield, John Hendrick
Replacement
• Regex is most often used for search/replace.
• Syntax varies; Perl, sed, and many other
  tools use s/pattern/replacement/ .
• s/dog/hound/ converts “slobbery dogs” to
  “slobbery hounds”.
• s/\bsheeps\b/sheep/ converts
  – “sheepskin is made from sheeps” to
  – “sheepskin is made from sheep”
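In Python, which is an assumption here since the slides use the s/pattern/replacement/ notation, the same replacements are written with re.sub(pattern, replacement, text). The third call drops the word boundaries to show why they are needed.

```python
import re

hounds = re.sub(r"dog", "hound", "slobbery dogs")

sentence = "sheepskin is made from sheeps"
with_boundaries = re.sub(r"\bsheeps\b", "sheep", sentence)

# Without \b the pattern also hits the start of "sheepskin".
without_boundaries = re.sub(r"sheeps", "sheep", sentence)

print(hounds)
print(with_boundaries)
print(without_boundaries)
```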
Capture
• During searches, ( … ) groups capture
  patterns for use in replacement.
• Special variables $1, $2, $3 etc. contain
  the capture.
• /(\d\d\d)-(\d\d\d\d)/ applied to “123-4567”:
  – $1 contains “123”
  – $2 contains “4567”
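A sketch of the same capture in Python's re module, which is an assumption; Python exposes the captures through the match object's group() method rather than through $1 and $2 variables.

```python
import re

m = re.search(r"(\d\d\d)-(\d\d\d\d)", "123-4567")

first = m.group(1)    # what $1 would contain
second = m.group(2)   # what $2 would contain
everything = m.group(0)  # the whole match

print(first, second, everything)
```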
Capture
• How do you convert
  – “Smith, James” and “Jones, Sally” to
  – “James Smith” and “Sally Jones”?
• s/(\w+), (\w+)/$2 $1/
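The name swap can be sketched in Python's re module, assumed here for illustration. Python refers to captures in the replacement as \1 and \2 rather than $1 and $2.

```python
import re

names = ["Smith, James", "Jones, Sally"]

# Capture surname and given name, then emit them in reverse order.
swapped = [re.sub(r"(\w+), (\w+)", r"\2 \1", n) for n in names]

print(swapped)
```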
Caveats
• Check the language/application-specific
  documentation: some common shortcuts
  are not universal.
Questions

Ben Brumfield
benwbrum@gmail.com
FromThePage.com
ManuscriptTranscription.blogspot.com
