/Regular Expressions/

        In Java
Credits
• The Java Tutorials: Regular Expressions
• docs.oracle.com/javase/tutorial/essential/regex/
Regex
• Regular expressions are a way to describe a
  set of strings based on common
  characteristics shared by each string in the set.
• They can be used to search, edit, or
  manipulate text and data.
• They are created with a specific syntax.
Regex in Java
• Regex in Java is similar to Perl
• The java.util.regex package primarily consists
  of three classes: Pattern, Matcher,
  and PatternSyntaxException.
Pattern & PatternSyntaxException
• You can think of a Pattern as the compiled
  regular expression wrapper object.
• You get a Pattern by calling:
  – Pattern.compile("RegularExpressionString");
• If your "RegularExpressionString" is invalid,
  compile throws a PatternSyntaxException.
Matcher
• You can think of a Matcher as the search result
  object.
• You get a Matcher object by calling:
  – myPattern.matcher("StringToBeSearched");
• You use it by calling:
  – myMatcher.find()
• Then call any number of methods on
  myMatcher to see attributes of the result.
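Put together, the compile / matcher / find sequence might look like the sketch below. The class name FindDemo and the helper findAll are made up for illustration; only Pattern, Matcher, and their methods come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDemo {
    // Hypothetical helper: returns one description line per match.
    static List<String> findAll(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);  // throws PatternSyntaxException if invalid
        Matcher matcher = pattern.matcher(input);
        List<String> results = new ArrayList<>();
        while (matcher.find()) {                   // each call moves to the next match
            results.add(String.format("Found \"%s\" at index %d, ending at index %d.",
                    matcher.group(), matcher.start(), matcher.end()));
        }
        return results;
    }

    public static void main(String[] args) {
        findAll("foo", "foofoo").forEach(System.out::println);
    }
}
```

Note that end() is exclusive: it is the index just past the last matched character.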
Regex Test Harness
• The tutorials give a test harness that uses the
  Console class. It doesn't work in many IDEs,
  where System.console() returns null.
• So I rewrote it to use Basic I/O.
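The rewritten harness itself is not reproduced in these slides. A minimal sketch of the idea, assuming a java.util.Scanner over standard input in place of Console, could look like this (the class name RegexTestHarness and the method runOnce are assumptions, not the author's code):

```java
import java.io.PrintStream;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTestHarness {
    // Reads one regex and one input line, then prints every match.
    static void runOnce(Scanner in, PrintStream out) {
        out.print("Enter your regex: ");
        if (!in.hasNextLine()) return;             // no more input
        String regex = in.nextLine();
        out.print("Enter input string to search: ");
        if (!in.hasNextLine()) return;
        String input = in.nextLine();

        Matcher matcher = Pattern.compile(regex).matcher(input);
        boolean found = false;
        while (matcher.find()) {
            out.printf("Found \"%s\" at index %d, ending at index %d.%n",
                    matcher.group(), matcher.start(), matcher.end());
            found = true;
        }
        if (!found) {
            out.println("No match found.");
        }
    }

    public static void main(String[] args) {
        runOnce(new Scanner(System.in), System.out);
    }
}
```

Passing the Scanner and PrintStream in as parameters keeps the loop usable from an IDE run window, a terminal, or a test.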
It’s time for…

CODE DEMO
Regex
• Test harness output example.
• Input is given in Bold.

Enter your regex: foo
Enter input string to search: foofoo
Found ‘foo’ at index 0, ending at index 3.
Found ‘foo’ at index 3, ending at index 6.
Indexing
Metacharacters
• <([{\^-=$!|]})?*+.>
• Precede a metacharacter with a ‘\’ to treat it
  as an ordinary character.
• Or use \Q and \E to begin and end a literal
  quote.
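In Java source the backslash must itself be escaped inside a string literal, so the regex \. is written "\\.". A small sketch of the escaping options follows; the class EscapeDemo and helper found are illustrative names, while Pattern.quote is a real method that wraps a string as a literal pattern.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // True if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("cat.", "cats"));                  // true: '.' matches any character
        System.out.println(found("cat\\.", "cats"));                // false: escaped dot is literal
        System.out.println(found("cat\\.", "cat."));                // true
        System.out.println(found("1+1=2", "1+1=2"));                // false: '+' acts as a quantifier
        System.out.println(found("\\Q1+1=2\\E", "1+1=2"));          // true: quoted literally
        System.out.println(found(Pattern.quote("1+1=2"), "1+1=2")); // true: same idea via the API
    }
}
```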
Metacharacters
Enter your regex: cat.
Enter input string to search: cats
Found ‘cats’ at index 0, ending at index 4.
Character Classes
Construct               Description
[abc]                   a, b, or c (simple class)
[^abc]                  Any character except a, b, or c (negation)
[a-zA-Z]                a through z, or A through Z, inclusive (range)
[a-d[m-p]]              a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]            d, e, or f (intersection)
[a-z&&[^bc]]            a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]           a through z, and not m through p: [a-lq-z] (subtraction)
Character Class
Enter your regex: [bcr]at
Enter input string to search: rat
Found "rat" at index 0, ending at index 3.

Enter input string to search: cat
Found "cat" at index 0, ending at index 3.
Character Class: Negation
Enter your regex: [^bcr]at
Enter input string to search: rat
No match found.

Enter input string to search: hat
Found "hat" at index 0, ending at index 3.
Character Class: Range
Enter your regex: foo[1-5]
Enter input string to search: foo5
Found "foo5" at index 0, ending at index 4.

Enter input string to search: foo6
No match found.
Character Class: Union
Enter your regex: [0-4[6-8]]
Enter input string to search: 0
Found "0" at index 0, ending at index 1.

Enter input string to search: 5
No match found.

Enter input string to search: 6
Found "6" at index 0, ending at index 1.
Character Class: Intersection
Enter your regex: [0-9&&[345]]
Enter input string to search: 5
Found "5" at index 0, ending at index 1.

Enter input string to search: 2
No match found.
Character Class: Subtraction
Enter your regex: [0-9&&[^345]]
Enter input string to search: 5
No match found.
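The union, intersection, and subtraction examples above can also be checked from code. This sketch uses Pattern.matches, which requires the whole input to match; CharClassDemo and its helper are illustrative names only.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // True when the entire input is matched by the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[0-4[6-8]]", "6"));     // true: union of 0-4 and 6-8
        System.out.println(matches("[0-4[6-8]]", "5"));     // false: 5 is in neither range
        System.out.println(matches("[0-9&&[345]]", "5"));   // true: in both sets
        System.out.println(matches("[0-9&&[345]]", "2"));   // false: not in [345]
        System.out.println(matches("[0-9&&[^345]]", "5"));  // false: 5 is subtracted
        System.out.println(matches("[0-9&&[^345]]", "6"));  // true
    }
}
```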
Predefined Character Classes
Construct           Description
.                   Any character (may or may not match line terminators)
\d                  A digit: [0-9]
\D                  A non-digit: [^0-9]
\s                  A whitespace character: [ \t\n\x0B\f\r]
\S                  A non-whitespace character: [^\s]
\w                  A word character: [a-zA-Z_0-9]
\W                  A non-word character: [^\w]
Predefined Character Classes (cont.)
• To summarize:
  – \d matches digits
  – \s matches whitespace
  – \w matches word characters
• Whereas the capital letter is the opposite:
  – \D matches non-digits
  – \S matches non-whitespace
  – \W matches non-word characters
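One way to see each class in action is to replace everything it matches with a marker. In the sketch below, PredefinedDemo and mask are hypothetical names; String.replaceAll is the standard method that takes a regex. The doubled backslashes are needed because these are Java string literals.

```java
class PredefinedDemo {
    // Replaces every character matched by the regex with '#'.
    static String mask(String regex, String input) {
        return input.replaceAll(regex, "#");
    }

    public static void main(String[] args) {
        String input = "a1 b2";
        System.out.println(mask("\\d", input)); // a# b#
        System.out.println(mask("\\D", input)); // #1##2
        System.out.println(mask("\\s", input)); // a1#b2
        System.out.println(mask("\\S", input)); // ## ##
        System.out.println(mask("\\w", input)); // ## ##
        System.out.println(mask("\\W", input)); // a1#b2
    }
}
```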
Quantifiers
Greedy   Reluctant     Possessive   Meaning
X?       X??           X?+          X, once or not at all
X*       X*?           X*+          X, zero or more times
X+       X+?           X++          X, one or more times
X{n}     X{n}?         X{n}+        X, exactly n times
X{n,}    X{n,}?        X{n,}+       X, at least n times
X{n,m}   X{n,m}?       X{n,m}+      X, at least n but not more than m times
Ignore Greedy, Reluctant, and Possessive for now.
Zero Length Match
• The regexes ‘a?’ and ‘a*’ each allow for zero
  occurrences of the letter a.
• A zero-length match starts and ends at the same index.

Enter your regex: a*
Enter input string to search: aa
Found "aa" at index 0, ending at index 2.
Found "" at index 2, ending at index 2.
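The same zero-length behaviour can be reproduced in code. This is a sketch with made-up names (ZeroLengthDemo, spans) that records each match as "start-end":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroLengthDemo {
    // Returns the "start-end" index pair of every match.
    static List<String> spans(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.start() + "-" + matcher.end());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(spans("a*", "aa")); // [0-2, 2-2]
        System.out.println(spans("a?", "aa")); // [0-1, 1-2, 2-2]
        System.out.println(spans("a*", ""));   // [0-0]
    }
}
```

The trailing entry in each list is the empty match found at the very end of the input.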
Quantifiers: Exact
Enter your regex: a{3}
Enter input string to search: aa
No match found.

Enter input string to search: aaaa
Found "aaa" at index 0, ending at index 3.
Quantifiers: At Least, No Greater
Enter your regex: a{3,}
Enter input string to search: aaaaaaaaa
Found "aaaaaaaaa" at index 0, ending at index 9.

Enter your regex: a{3,6}
Enter input string to search: aaaaaaaaa
Found "aaaaaa" at index 0, ending at index 6.
Found "aaa" at index 6, ending at index 9.
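For reference, the counted quantifiers behave the same way when driven from code. The names QuantifierDemo and matchedText below are assumptions for this sketch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the text of every match, in order.
    static List<String> matchedText(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(matchedText("a{3}", "aa"));          // []
        System.out.println(matchedText("a{3}", "aaaa"));        // [aaa]
        System.out.println(matchedText("a{3,}", "aaaaaaaaa"));  // [aaaaaaaaa]
        System.out.println(matchedText("a{3,6}", "aaaaaaaaa")); // [aaaaaa, aaa]
    }
}
```

With a{3,6} the greedy quantifier takes six characters first, which leaves exactly three for a second match.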
Quantifiers
• "abc+"
  – Means "a, followed by b, followed by (c one or
    more times)".
  – "abcc" = match!, "abbc" = no match
• "[abc]+"
  – Means "(a, b, or c) one or more times".
  – "bba" = match!
Greedy, Reluctant, and Possessive
• Greedy
  – The matcher first consumes the whole input, then
    backs off one character at a time as needed
• Reluctant
  – The matcher first consumes nothing, then takes
    one more character at a time as needed
• Possessive
  – The matcher consumes the whole input and never
    backs off
Greedy
Enter your regex: .*foo
Enter input string to search: xfooxxxxxxfoo
Found "xfooxxxxxxfoo" at index 0, ending at index 13.
Reluctant
Enter your regex: .*?foo
Enter input string to search: xfooxxxxxxfoo
Found "xfoo" at index 0, ending at index 4.
Found "xxxxxxfoo" at index 4, ending at index 13.
Possessive
Enter your regex: .*+foo
Enter input string to search: xfooxxxxxxfoo
No match found.
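The three quantifier styles can be compared side by side on the same input. This is only a sketch; GreedinessDemo and matchedText are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedinessDemo {
    // Returns the text of every match, in order.
    static List<String> matchedText(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println(matchedText(".*foo", input));  // greedy:     [xfooxxxxxxfoo]
        System.out.println(matchedText(".*?foo", input)); // reluctant:  [xfoo, xxxxxxfoo]
        System.out.println(matchedText(".*+foo", input)); // possessive: []
    }
}
```

The possessive form fails because .*+ keeps the whole input and never gives back the final "foo".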
Capturing Group
• Capturing groups are a way to treat multiple
  characters as a single unit.
• They are created by placing the characters to
  be grouped inside a set of parentheses.
• “(dog)”
  – Means a single group containing the letters "d"
    "o" and "g".
Capturing Group w/ Quantifiers
• (abc)+
  – Means "abc" one or more times
Capturing Groups: Numbering
• ((A)(B(C)))
  1.   ((A)(B(C)))
  2.   (A)
  3.   (B(C))
  4.   (C)
• Groups are numbered by counting their opening
  parentheses from left to right.
Capturing Groups: Numbering Usage
• Some Matcher methods accept a group
  number as a parameter:
• int start(int group)
• int end(int group)
• String group(int group)
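A short sketch of those three methods applied to the ((A)(B(C))) pattern follows. GroupDemo and describeGroups are hypothetical names; group 0 always refers to the entire match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Lists "number:text@start-end" for group 0 and every capturing group.
    static List<String> describeGroups(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        if (matcher.find()) {
            for (int g = 0; g <= matcher.groupCount(); g++) {
                result.add(g + ":" + matcher.group(g)
                        + "@" + matcher.start(g) + "-" + matcher.end(g));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [0:ABC@0-3, 1:ABC@0-3, 2:A@0-1, 3:BC@1-3, 4:C@2-3]
        System.out.println(describeGroups("((A)(B(C)))", "ABC"));
    }
}
```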
Capturing Groups: Backreferences
• The section of input matching the capturing
  group is saved for recall via backreference.
• Specify a backreference with ‘\’ followed by
  the group number.
• ‘(\d\d)’
  – Can be recalled with the expression ‘\1’.
Capturing Groups: Backreferences
Enter your regex: (\d\d)\1
Enter input string to search: 1212
Found "1212" at index 0, ending at index 4.

Enter input string to search: 1234
No match found.
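In Java source the same backreference is written with doubled backslashes. A minimal sketch, with BackrefDemo and repeatsPair as made-up names:

```java
import java.util.regex.Pattern;

class BackrefDemo {
    // Two digits followed by the same two digits again.
    private static final Pattern REPEATED_PAIR = Pattern.compile("(\\d\\d)\\1");

    static boolean repeatsPair(String input) {
        return REPEATED_PAIR.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(repeatsPair("1212")); // true: group 1 is "12", then "12" again
        System.out.println(repeatsPair("1234")); // false: "34" is not a repeat of "12"
    }
}
```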
Boundary Matchers
Boundary Construct           Description
^                            The beginning of a line
$                            The end of a line
\b                           A word boundary
\B                           A non-word boundary
\A                           The beginning of the input
\G                           The end of the previous match
\Z                           The end of the input but for the final terminator, if any
\z                           The end of the input
Boundary Matchers
Enter your regex: ^dog$
Enter input string to search: dog
Found "dog" at index 0, ending at index 3.

Enter your regex: ^dog\w*
Enter input string to search: dogblahblah
Found "dogblahblah" at index 0, ending at index 11.
Boundary Matchers (cont.)
Enter your regex: \bdog\b
Enter input string to search: The doggie plays in the yard.
No match found.

Enter your regex: \Gdog
Enter input string to search: dog dog
Found "dog" at index 0, ending at index 3.
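Counting matches makes the effect of each boundary matcher easy to see. BoundaryDemo and count are illustrative names for this sketch:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Counts how many times the regex is found in the input.
    static int count(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count("\\bdog\\b", "The doggie plays in the yard.")); // 0
        System.out.println(count("\\bdog\\b", "The dog plays in the yard."));    // 1
        System.out.println(count("dog", "dog dog"));     // 2: both occurrences
        System.out.println(count("\\Gdog", "dog dog"));  // 1: the space breaks the chain
        System.out.println(count("\\Gdog", "dogdog"));   // 2: each match starts where the last ended
    }
}
```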
Pattern Class (cont.)
• There are a number of flags that can be
  passed to the ‘compile’ method.
• Embedded flag expressions such as (?i) are written
  inside the regex itself and duplicate these compile flags.
• Check out the ‘matches’, ‘split’, and ‘quote’
  methods as well.
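A quick tour of those extras, as a sketch: PatternExtrasDemo and its helper names are assumptions, while Pattern.CASE_INSENSITIVE, Pattern.matches, split, and Pattern.quote are the real API.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternExtrasDemo {
    // Compile flag passed as the second argument to compile.
    static boolean matchesWithFlag(String input) {
        return Pattern.compile("dog", Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    // The same flag embedded in the regex itself.
    static boolean matchesWithEmbeddedFlag(String input) {
        return Pattern.compile("(?i)dog").matcher(input).matches();
    }

    // Splits on a comma followed by optional whitespace.
    static String[] splitCsv(String input) {
        return Pattern.compile(",\\s*").split(input);
    }

    public static void main(String[] args) {
        System.out.println(matchesWithFlag("DOG"));                      // true
        System.out.println(matchesWithEmbeddedFlag("DOG"));              // true
        System.out.println(Pattern.matches("\\d+", "2024"));             // true: whole input is digits
        System.out.println(Arrays.toString(splitCsv("a, b,c")));         // [a, b, c]
        System.out.println(Pattern.matches(Pattern.quote("a.b"), "axb")); // false: dot is literal
    }
}
```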
Matcher Class (cont.)
• The Matcher class can slice input in a multitude
  of ways:
  – Index methods give the position of matches
  – Study methods give boolean results to queries
  – Replacement methods let you edit input
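A sketch touching one method from each family. MatcherMethodsDemo and its helpers are made-up names; lookingAt, matches, replaceAll, and replaceFirst are existing Matcher methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    private static final Pattern DOG = Pattern.compile("dog");

    // Study method: does the input start with a match?
    static boolean startsWithDog(String input) {
        return DOG.matcher(input).lookingAt();
    }

    // Study method: is the entire input one match?
    static boolean isExactlyDog(String input) {
        return DOG.matcher(input).matches();
    }

    // Replacement method: swap every match for "cat".
    static String dogsToCats(String input) {
        return DOG.matcher(input).replaceAll("cat");
    }

    public static void main(String[] args) {
        Matcher matcher = DOG.matcher("dog dog");
        if (matcher.find()) {
            // Index methods: position of the first match.
            System.out.println(matcher.start() + "-" + matcher.end()); // 0-3
        }
        System.out.println(startsWithDog("dog dog")); // true
        System.out.println(isExactlyDog("dog dog"));  // false
        System.out.println(dogsToCats("dog dog"));    // cat cat
        System.out.println(DOG.matcher("dog dog").replaceFirst("cat")); // cat dog
    }
}
```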
PatternSyntaxException (cont.)
• You get a little more than just an error
  message from the PatternSyntaxException.
• Check out the following methods:
  – public String getDescription()
  – public int getIndex()
  – public String getPattern()
  – public String getMessage()
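Catching the exception and reading those accessors might look like the sketch below. SyntaxErrorDemo and describe are hypothetical names, and the exact description text and index depend on the JDK.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorDemo {
    // Returns null for a valid regex, otherwise a one-line summary of the problem.
    static String describe(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription() + " at index " + e.getIndex()
                    + " in " + e.getPattern();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("foo"));  // null: the pattern is fine
        System.out.println(describe("[abc")); // summary of the unclosed character class
    }
}
```

getMessage() bundles the same information into a multi-line string, which is handy for logging.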
The End$

Contenu connexe

Tendances

Python advanced 2. regular expression in python
Python advanced 2. regular expression in pythonPython advanced 2. regular expression in python
Python advanced 2. regular expression in pythonJohn(Qiang) Zhang
 
Regular expressions in Python
Regular expressions in PythonRegular expressions in Python
Regular expressions in PythonSujith Kumar
 
Strings in Python
Strings in PythonStrings in Python
Strings in Pythonnitamhaske
 
Processing Regex Python
Processing Regex PythonProcessing Regex Python
Processing Regex Pythonprimeteacher32
 
Regular expressions
Regular expressionsRegular expressions
Regular expressionsRaj Gupta
 
Regular expressions in Ruby and Introduction to Vim
Regular expressions in Ruby and Introduction to VimRegular expressions in Ruby and Introduction to Vim
Regular expressions in Ruby and Introduction to VimStalin Thangaraj
 
Textpad and Regular Expressions
Textpad and Regular ExpressionsTextpad and Regular Expressions
Textpad and Regular ExpressionsOCSI
 
11. using regular expressions with oracle database
11. using regular expressions with oracle database11. using regular expressions with oracle database
11. using regular expressions with oracle databaseAmrit Kaur
 
Regular expression
Regular expressionRegular expression
Regular expressionLarry Nung
 
Regular expressions in oracle
Regular expressions in oracleRegular expressions in oracle
Regular expressions in oracleLogan Palanisamy
 
Regex Presentation
Regex PresentationRegex Presentation
Regex Presentationarnolambert
 
Python- Regular expression
Python- Regular expressionPython- Regular expression
Python- Regular expressionMegha V
 
Finaal application on regular expression
Finaal application on regular expressionFinaal application on regular expression
Finaal application on regular expressionGagan019
 

Tendances (20)

Python advanced 2. regular expression in python
Python advanced 2. regular expression in pythonPython advanced 2. regular expression in python
Python advanced 2. regular expression in python
 
Regular expressions in Python
Regular expressions in PythonRegular expressions in Python
Regular expressions in Python
 
Strings in Python
Strings in PythonStrings in Python
Strings in Python
 
Python strings
Python stringsPython strings
Python strings
 
Processing Regex Python
Processing Regex PythonProcessing Regex Python
Processing Regex Python
 
Regular expressions
Regular expressionsRegular expressions
Regular expressions
 
Regular Expression
Regular ExpressionRegular Expression
Regular Expression
 
Regular expressions in Ruby and Introduction to Vim
Regular expressions in Ruby and Introduction to VimRegular expressions in Ruby and Introduction to Vim
Regular expressions in Ruby and Introduction to Vim
 
Andrei's Regex Clinic
Andrei's Regex ClinicAndrei's Regex Clinic
Andrei's Regex Clinic
 
Textpad and Regular Expressions
Textpad and Regular ExpressionsTextpad and Regular Expressions
Textpad and Regular Expressions
 
Python : Regular expressions
Python : Regular expressionsPython : Regular expressions
Python : Regular expressions
 
Python data handling
Python data handlingPython data handling
Python data handling
 
11. using regular expressions with oracle database
11. using regular expressions with oracle database11. using regular expressions with oracle database
11. using regular expressions with oracle database
 
Regular expression
Regular expressionRegular expression
Regular expression
 
Regular expressions in oracle
Regular expressions in oracleRegular expressions in oracle
Regular expressions in oracle
 
Regular expressions
Regular expressionsRegular expressions
Regular expressions
 
Regex Presentation
Regex PresentationRegex Presentation
Regex Presentation
 
Regex posix
Regex posixRegex posix
Regex posix
 
Python- Regular expression
Python- Regular expressionPython- Regular expression
Python- Regular expression
 
Finaal application on regular expression
Finaal application on regular expressionFinaal application on regular expression
Finaal application on regular expression
 

Similaire à Regular expressions

FUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdf
FUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdfFUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdf
FUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdfBryan Alejos
 
Week-2: Theory & Practice of Data Cleaning: Regular Expressions in Practice
Week-2: Theory & Practice of Data Cleaning: Regular Expressions in PracticeWeek-2: Theory & Practice of Data Cleaning: Regular Expressions in Practice
Week-2: Theory & Practice of Data Cleaning: Regular Expressions in PracticeBertram Ludäscher
 
Introduction to Regular Expressions RootsTech 2013
Introduction to Regular Expressions RootsTech 2013Introduction to Regular Expressions RootsTech 2013
Introduction to Regular Expressions RootsTech 2013Ben Brumfield
 
Basta mastering regex power
Basta mastering regex powerBasta mastering regex power
Basta mastering regex powerMax Kleiner
 
An Introduction to Regular expressions
An Introduction to Regular expressionsAn Introduction to Regular expressions
An Introduction to Regular expressionsYamagata Europe
 
unit-4 regular expression.pptx
unit-4 regular expression.pptxunit-4 regular expression.pptx
unit-4 regular expression.pptxPadreBhoj
 
Php String And Regular Expressions
Php String  And Regular ExpressionsPhp String  And Regular Expressions
Php String And Regular Expressionsmussawir20
 
Mikhail Khristophorov "Introduction to Regular Expressions"
Mikhail Khristophorov "Introduction to Regular Expressions"Mikhail Khristophorov "Introduction to Regular Expressions"
Mikhail Khristophorov "Introduction to Regular Expressions"LogeekNightUkraine
 
0-Slot21-22-Strings.pdf
0-Slot21-22-Strings.pdf0-Slot21-22-Strings.pdf
0-Slot21-22-Strings.pdfssusere19c741
 
Regular_Expressions.pptx
Regular_Expressions.pptxRegular_Expressions.pptx
Regular_Expressions.pptxDurgaNayak4
 
Presentation more c_programmingcharacter_and_string_handling_
Presentation more c_programmingcharacter_and_string_handling_Presentation more c_programmingcharacter_and_string_handling_
Presentation more c_programmingcharacter_and_string_handling_KarthicaMarasamy
 
Class 5 - PHP Strings
Class 5 - PHP StringsClass 5 - PHP Strings
Class 5 - PHP StringsAhmed Swilam
 

Similaire à Regular expressions (20)

FUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdf
FUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdfFUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdf
FUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdf
 
Python - Lecture 7
Python - Lecture 7Python - Lecture 7
Python - Lecture 7
 
Week-2: Theory & Practice of Data Cleaning: Regular Expressions in Practice
Week-2: Theory & Practice of Data Cleaning: Regular Expressions in PracticeWeek-2: Theory & Practice of Data Cleaning: Regular Expressions in Practice
Week-2: Theory & Practice of Data Cleaning: Regular Expressions in Practice
 
Introduction to Regular Expressions RootsTech 2013
Introduction to Regular Expressions RootsTech 2013Introduction to Regular Expressions RootsTech 2013
Introduction to Regular Expressions RootsTech 2013
 
Intoduction to php strings
Intoduction to php  stringsIntoduction to php  strings
Intoduction to php strings
 
Basta mastering regex power
Basta mastering regex powerBasta mastering regex power
Basta mastering regex power
 
An Introduction to Regular expressions
An Introduction to Regular expressionsAn Introduction to Regular expressions
An Introduction to Regular expressions
 
unit-4 regular expression.pptx
unit-4 regular expression.pptxunit-4 regular expression.pptx
unit-4 regular expression.pptx
 
Php String And Regular Expressions
Php String  And Regular ExpressionsPhp String  And Regular Expressions
Php String And Regular Expressions
 
Reg EX
Reg EXReg EX
Reg EX
 
Mikhail Khristophorov "Introduction to Regular Expressions"
Mikhail Khristophorov "Introduction to Regular Expressions"Mikhail Khristophorov "Introduction to Regular Expressions"
Mikhail Khristophorov "Introduction to Regular Expressions"
 
0-Slot21-22-Strings.pdf
0-Slot21-22-Strings.pdf0-Slot21-22-Strings.pdf
0-Slot21-22-Strings.pdf
 
Lecture 10.pdf
Lecture 10.pdfLecture 10.pdf
Lecture 10.pdf
 
Bioinformatics p2-p3-perl-regexes v2014
Bioinformatics p2-p3-perl-regexes v2014Bioinformatics p2-p3-perl-regexes v2014
Bioinformatics p2-p3-perl-regexes v2014
 
Regular_Expressions.pptx
Regular_Expressions.pptxRegular_Expressions.pptx
Regular_Expressions.pptx
 
P3 2018 python_regexes
P3 2018 python_regexesP3 2018 python_regexes
P3 2018 python_regexes
 
Bioinformatica p2-p3-introduction
Bioinformatica p2-p3-introductionBioinformatica p2-p3-introduction
Bioinformatica p2-p3-introduction
 
P3 2017 python_regexes
P3 2017 python_regexesP3 2017 python_regexes
P3 2017 python_regexes
 
Presentation more c_programmingcharacter_and_string_handling_
Presentation more c_programmingcharacter_and_string_handling_Presentation more c_programmingcharacter_and_string_handling_
Presentation more c_programmingcharacter_and_string_handling_
 
Class 5 - PHP Strings
Class 5 - PHP StringsClass 5 - PHP Strings
Class 5 - PHP Strings
 

Dernier

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 

Dernier (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 

Regular expressions

  • 2. Credits • The Java Tutorials: Regular Expressions • docs.oracle.com/javase/tutorial /essential/regex/
  • 3. Regex • Regular expressions are a way to describe a set of strings based on common characteristics shared by each string in the set. • They can be used to search, edit, or manipulate text and data. • They are created with a specific syntax.
  • 4. Regex in Java • Regex in Java is similar to Perl • The java.util.regex package primarily consists of three classes: Pattern, Matcher, and PatternSyntaxException.
  • 5. Pattern & PatternSyntaxException • You can think of this as the regular expression wrapper object. • You get a Pattern by calling: – Pattern.compile(“RegularExpressionString”); • If your “RegularExpressionString” is invalid, you will get the PatternSyntaxException.
  • 6. Matcher • You can think of this as the search result object. • You can get a matcher object by calling: – myPattern.matcher(“StringToBeSearched”); • You use it by calling: – myMatcher.find() • Then call any number of methods on myMatcher to see attributes of the result.
  • 7. Regex Test Harness • The tutorials give a test harness that uses the Console class. It doesn’t work in any IDE. • So I rewrote it to use Basic I/O
  • 9. Regex • Test harness output example. • Input is given in Bold. Enter your regex: foo Enter input string to search: foofoo Found ‘foo’ at index 0, ending at index 3. Found ‘foo’ at index 3, ending at index 6.
  • 11. Metacharacters • <([{^-=$!|]})?*+.> • Precede a metacharacter with a ‘’ to treat it as a ordinary character. • Or use Q and E to begin and end a literal quote.
  • 12. Metacharacters Enter your regex: cat. Enter input string to search: cats Found ‘cats’ at index 0, ending at index 4.
  • 13. Character Classes Construct Description [abc] a, b, or c (simple class) Any character except a, b, or c [^abc] (negation) a through z, or A through Z, inclusive [a-zA-Z] (range) a through d, OR m through p: [a-dm-p] [a-d[m-p]] (union) [a-z&&[def]] d, e, f (intersection) a through z, except for b and c: [ad-z] [a-z&&[^bc]] (subtraction) a through z, and not m through p: [a-lq- [a-z&&[^m-p]] z] (subtraction)
  • 14. Character Class Enter your regex: [bcr]at Enter input string to search: rat I found the text "rat" starting at index 0 and ending at index 3. Enter input string to search: cat Found "cat" at index 0, ending at index 3.
  • 15. Character Class: Negation Enter your regex: [^bcr]at Enter input string to search: rat No match found. Enter input string to search: hat Found "hat" at index 0, ending at index 3.
  • 16. Character Class: Range Enter your regex: foo[1-5] Enter input string to search: foo5 Found "foo5" at index 0, ending at index 4. Enter input string to search: foo6 No match found.
  • 17. Character Class: Union Enter your regex: [0-4[6-8]] Enter input string to search: 0 Found "0" at index 0, ending at index 1. Enter input string to search: 5 No match found. Enter input string to search: 6 Found "6" starting at index 0, ending at index 1.
  • 18. Character Class: Intersection Enter your regex: [0-9&&[345]] Enter input string to search: 5 Found "5" at index 0, ending at index 1. Enter input string to search: 2 No match found.
  • 19. Character Class: Subtraction Enter your regex: [0-9&&[^345]] Enter input string to search: 5 No match found.
  • 20. Predefined Character Classes Construct: Description • .: Any character (may or may not match line terminators) • \d: A digit: [0-9] • \D: A non-digit: [^0-9] • \s: A whitespace character: [ \t\n\x0B\f\r] • \S: A non-whitespace character: [^\s] • \w: A word character: [a-zA-Z_0-9] • \W: A non-word character: [^\w]
  • 21. Predefined Character Classes (cont.) • To summarize: – \d matches digits – \s matches whitespace – \w matches word characters • The capitalized form matches the opposite: – \D matches non-digits – \S matches non-whitespace – \W matches non-word characters
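The predefined classes are written with a backslash (`\d`, `\s`, `\w`, and their capitalized opposites), which becomes a double backslash in Java source. The sketch below shows one possible use of each; the class and method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PredefinedClassesDemo {
    // Removes every non-digit (\D) from the input.
    static String digitsOnly(String input) {
        return input.replaceAll("\\D", "");
    }

    // Splits the input on runs of whitespace (\s+).
    static String[] words(String input) {
        return input.trim().split("\\s+");
    }

    // True if the whole input consists of word characters (\w).
    static boolean isWord(String input) {
        return Pattern.matches("\\w+", input);
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Room 42, floor 7"));   // 427
        System.out.println(Arrays.toString(words("a b\tc"))); // [a, b, c]
        System.out.println(isWord("foo_1"));                  // true
        System.out.println(isWord("foo-1"));                  // false: '-' is not a word character
    }
}
```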
  • 22. Quantifiers Greedy / Reluctant / Possessive: Meaning • X? / X?? / X?+: X, once or not at all • X* / X*? / X*+: X, zero or more times • X+ / X+? / X++: X, one or more times • X{n} / X{n}? / X{n}+: X, exactly n times • X{n,} / X{n,}? / X{n,}+: X, at least n times • X{n,m} / X{n,m}? / X{n,m}+: X, at least n but not more than m times
  • 23. Ignore Greedy, Reluctant, and Possessive For now.
  • 24. Zero Length Match • The regexes ‘a?’ and ‘a*’ each allow for zero occurrences of the letter a, so they can match the empty string. Enter your regex: a* Enter input string to search: aa Found "aa" at index 0, ending at index 2. Found "" at index 2, ending at index 2.
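A zero-length match shows up as a match whose start and end indexes are equal. The following sketch lists the start-end span of every match so the empty match at the end of the input is visible; `ZeroLengthDemo` and `spans` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroLengthDemo {
    // Returns "start-end" for every match found in the input.
    static List<String> spans(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> spans = new ArrayList<>();
        while (m.find()) {
            spans.add(m.start() + "-" + m.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        System.out.println(spans("a*", "aa")); // [0-2, 2-2]
        System.out.println(spans("a?", "aa")); // [0-1, 1-2, 2-2]
    }
}
```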
  • 25. Quantifiers: Exact Enter your regex: a{3} Enter input string to search: aa No match found. Enter input string to search: aaaa Found "aaa" at index 0, ending at index 3.
  • 26. Quantifiers: At Least, No Greater Enter your regex: a{3,} Enter input string to search: aaaaaaaaa Found "aaaaaaaaa" at index 0, ending at index 9. Enter your regex: a{3,6} Enter input string to search: aaaaaaaaa Found "aaaaaa" at index 0, ending at index 6. Found "aaa" at index 6, ending at index 9.
  • 27. Quantifiers • "abc+" – Means "a, followed by b, followed by (c one or more times)". – "abcc" = match!, "abbc" = no match • "[abc]+" – Means "(a, b, or c) one or more times". – "bba" = match!
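The quantifier examples from these slides can be checked with a small helper that collects the text of every match. This is a sketch under the assumption that matches are found with `find()`, as in the harness; the names `QuantifierDemo` and `allMatches` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the text of every match found in the input.
    static List<String> allMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(allMatches("a{3}", "aaaa"));        // [aaa]
        System.out.println(allMatches("a{3,6}", "aaaaaaaaa")); // [aaaaaa, aaa]
        System.out.println(allMatches("abc+", "abcc"));        // [abcc]: '+' applies to c only
        System.out.println(allMatches("abc+", "abbc"));        // []
        System.out.println(allMatches("[abc]+", "bba"));       // [bba]: '+' applies to the whole class
    }
}
```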
  • 28. Greedy, Reluctant, and Possessive • Greedy – The quantifier first consumes as much input as it can, then gives characters back one at a time from the end until the rest of the regex matches • Reluctant – The quantifier first consumes nothing, then takes characters one at a time from the beginning until the rest of the regex matches • Possessive – The quantifier consumes as much input as it can and never gives anything back
  • 29. Greedy Enter your regex: .*foo Enter input string to search: xfooxxxxxxfoo Found "xfooxxxxxxfoo" at index 0, ending at index 13.
  • 30. Reluctant Enter your regex: .*?foo Enter input string to search: xfooxxxxxxfoo Found "xfoo" at index 0, ending at index 4. Found "xxxxxxfoo" at index 4, ending at index 13.
  • 31. Possessive Enter your regex: .*+foo Enter input string to search: xfooxxxxxxfoo No match found.
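The three behaviors can be compared side by side on the same input. The sketch below reuses the regexes and input string from the slides; `GreedinessDemo` and `allMatches` are names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedinessDemo {
    // Returns the text of every match found in the input.
    static List<String> allMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println(allMatches(".*foo", input));  // greedy: [xfooxxxxxxfoo]
        System.out.println(allMatches(".*?foo", input)); // reluctant: [xfoo, xxxxxxfoo]
        System.out.println(allMatches(".*+foo", input)); // possessive: []
    }
}
```

The possessive version finds nothing because `.*+` swallows the whole input, including the final "foo", and never backs off to let the literal `foo` match.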
  • 32. Capturing Group • Capturing groups are a way to treat multiple characters as a single unit. • They are created by placing the characters to be grouped inside a set of parentheses. • “(dog)” – Means a single group containing the letters "d" "o" and "g".
  • 33. Capturing Group w/ Quantifiers • (abc)+ – Means "abc" one or more times
  • 34. Capturing Groups: Numbering • ((A)(B(C))) 1. ((A)(B(C))) 2. (A) 3. (B(C)) 4. (C) • The index is based on the opening parentheses.
  • 35. Capturing Groups: Numbering Usage • Some Matcher methods accept a group number as a parameter: • int start(int group) • int end (int group) • String group (int group)
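To make the numbering concrete, this sketch applies `((A)(B(C)))` to the input "ABC" and lists each group with its text and position, using the `group(int)`, `start(int)`, and `end(int)` methods named above. Group 0 always stands for the whole match. `GroupNumberingDemo` and `groups` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Returns "number=text@start-end" for group 0 and every numbered group of the first match.
    static List<String> groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                result.add(i + "=" + m.group(i) + "@" + m.start(i) + "-" + m.end(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // [0=ABC@0-3, 1=ABC@0-3, 2=A@0-1, 3=BC@1-3, 4=C@2-3]
        System.out.println(groups("((A)(B(C)))", "ABC"));
    }
}
```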
  • 36. Capturing Groups: Backreferences • The section of input matching the capturing group is saved for recall via backreference. • Specify a backreference with ‘\’ followed by the group number. • ‘(\d\d)’ – Can be recalled with the expression ‘\1’.
  • 37. Capturing Groups: Backreferences Enter your regex: (\d\d)\1 Enter input string to search: 1212 Found "1212" at index 0, ending at index 4. Enter input string to search: 1234 No match found.
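In Java source the backreference example is written `(\\d\\d)\\1`: two digits captured as group 1, followed by the same two digits again. A minimal sketch, with the class and method names chosen only for this example:

```java
import java.util.regex.Pattern;

class BackreferenceDemo {
    // True if the input contains a two-digit pair immediately repeated, such as "1212".
    static boolean hasRepeatedPair(String input) {
        return Pattern.compile("(\\d\\d)\\1").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(hasRepeatedPair("1212")); // true: "12" followed by "12"
        System.out.println(hasRepeatedPair("1234")); // false: "12" is not followed by "12"
    }
}
```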
  • 38. Boundary Matchers Boundary Construct: Description • ^: The beginning of a line • $: The end of a line • \b: A word boundary • \B: A non-word boundary • \A: The beginning of the input • \G: The end of the previous match • \Z: The end of the input but for the final terminator, if any • \z: The end of the input
  • 39. Boundary Matchers Enter your regex: ^dog$ Enter input string to search: dog Found "dog" at index 0, ending at index 3. Enter your regex: ^dog\w* Enter input string to search: dogblahblah Found "dogblahblah" at index 0, ending at index 11.
  • 40. Boundary Matchers (cont.) Enter your regex: \bdog\b Enter input string to search: The doggie plays in the yard. No match found. Enter your regex: \Gdog Enter input string to search: dog dog Found "dog" at index 0, ending at index 3.
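Counting matches is a simple way to see the effect of the boundary matchers, written in Java source as `\\b` and `\\G`. The helper below is a sketch; `BoundaryDemo` and `count` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Counts how many times the regex matches in the input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int matches = 0;
        while (m.find()) {
            matches++;
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(count("\\bdog\\b", "The doggie plays in the yard.")); // 0: "dog" is inside a longer word
        System.out.println(count("\\bdog\\b", "The dog plays in the yard."));    // 1
        System.out.println(count("dog", "dog dog"));                             // 2
        System.out.println(count("\\Gdog", "dog dog"));                          // 1: the second "dog" does not start where the first match ended
    }
}
```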
  • 41. Pattern Class (cont.) • There are a number of flags that can be passed to the ‘compile’ method. • Embedded flag expressions, such as (?i), enable the same flags from inside the regex itself. • Check out the ‘matches’, ‘split’, and ‘quote’ methods as well.
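The sketch below touches each of the Pattern features listed here: a compile flag, the equivalent embedded flag `(?i)`, and the `matches`, `split`, and `quote` methods. `Pattern.CASE_INSENSITIVE` is just one of the available flags, picked for illustration, and `PatternExtrasDemo` is a made-up class name.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternExtrasDemo {
    // Matches the whole input against the regex, ignoring case via a compile flag.
    static boolean matchesIgnoringCase(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesIgnoringCase("dog", "DOG"));   // true: flag passed to compile
        System.out.println(Pattern.matches("(?i)dog", "DOG"));   // true: same flag embedded in the regex
        System.out.println(Pattern.matches("dog", "DOG"));       // false: case-sensitive by default

        // split() cuts the input around each match of the pattern.
        System.out.println(Arrays.toString(Pattern.compile("\\d").split("one9two4three"))); // [one, two, three]

        // quote() turns arbitrary text into a literal pattern.
        System.out.println(Pattern.matches(Pattern.quote("1+1"), "1+1")); // true
        System.out.println(Pattern.matches("1+1", "1+1"));                // false: '+' is a quantifier here
    }
}
```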
  • 42. Matcher Class (cont.) • The Matcher class can slice input a multitude of ways: – Index methods give the position of matches – Study methods give boolean results to queries – Replacement methods let you edit input
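One representative method from each of the three families might look like this. It is a sketch, not a tour of the whole class: `start`/`end` stand in for the index methods, `lookingAt`/`matches` for the study methods, and `replaceAll` for the replacement methods. `MatcherMethodsDemo` and its helpers are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    // Replacement method: swaps every match of the regex for the replacement text.
    static String replaceEvery(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    // Study methods: {lookingAt, matches} for the given regex and input.
    static boolean[] study(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return new boolean[] { m.lookingAt(), m.matches() };
    }

    public static void main(String[] args) {
        // Index methods: position of the first match.
        Matcher m = Pattern.compile("dog").matcher("The dog says meow.");
        if (m.find()) {
            System.out.println(m.start() + " to " + m.end()); // 4 to 7
        }

        // lookingAt() only needs a match at the start; matches() needs the whole input to match.
        boolean[] result = study("foo", "fooooo");
        System.out.println(result[0] + " " + result[1]); // true false

        System.out.println(replaceEvery("dog", "The dog says meow. All dogs say meow.", "cat"));
        // The cat says meow. All cats say meow.
    }
}
```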
  • 43. PatternSyntaxException (cont.) • You get a little more than just an error message from the PatternSyntaxException. • Check out the following methods: – public String getDescription() – public int getIndex() – public String getPattern() – public String getMessage()
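A possible way to use these methods is to catch the exception and report its parts separately. In this sketch the invalid regex `?i)` is only an example input, and `SyntaxErrorDemo` and `diagnose` are invented names; the exact description text depends on the JDK, so it is printed rather than assumed.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorDemo {
    // Returns null when the regex compiles, otherwise a short diagnostic string.
    static String diagnose(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return "pattern=" + e.getPattern()
                    + " index=" + e.getIndex()
                    + " description=" + e.getDescription();
        }
    }

    public static void main(String[] args) {
        System.out.println(diagnose("foo")); // null: the regex is valid
        System.out.println(diagnose("?i)")); // a diagnostic: '?' has nothing to quantify
    }
}
```

`getMessage()` combines the description, the index, and the pattern into one multi-line string, which is convenient for logging.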