SlideShare une entreprise Scribd logo
1  sur  26
7/22/2014VYBHAVA TECHNOLOGIES 1
Regular expressions are a powerful language for
matching text patterns and standardized way of searching,
replacing, and parsing text with complex patterns of
characters
All modern languages have similar library packages
for regular expressions i.e., re built in module
7/22/2014VYBHAVA TECHNOLOGIES 2
 Hundreds of code could be reduced to a one line elegant
regular expression
 Used to construct compilers, interpreters and text editors
 Used to search and match text patterns
 Used to validate text data formats especially input data
 Popular programming languages have Regex capabilities
Python, Perl, JavaScript, Ruby ,Tcl, C++,C#
7/22/2014VYBHAVA TECHNOLOGIES 3
General uses of regular expressions to:
 Search a string (search and match)
 Replace parts of a string (sub)
 Break string into small pieces (split)
 Finding a string (findall)
Before using the regular expressions in your program, you
must import the library using "import re"
7/22/2014VYBHAVA TECHNOLOGIES 4
 Alternative : |
 Grouping : ()
 Quantification : ?*+{m,n}
 Anchors : ^ $
 Meta- characters : . [][-][^]
 Character classes : dDwW…..
7/22/2014VYBHAVA TECHNOLOGIES 5
Alternative:
Eg: “cat|mat” ==“cat” or “mat”
“python|jython” ==“python” or “jython”
Grouping:
Eg: gr(r|a)y==“grey” or “gray”
“ra(mil|n(ny|el))”==“ramil” or “ranny” or “ranel”
7/22/2014VYBHAVA TECHNOLOGIES 6
Quantification
? == zero or one of the preceding element
Eg: “rani?el”==“raniel” or “ranel”
“colou?r”==“colour” or “color”
* == zero or more of the preceding element
Eg: “fo*ot”==“foot” or “fooot” or “foooooot”
“94*9” ==“99” or “9449” or “9444449”
+ == one or more of the preceding element
Eg: “too+fan”==“toofan” or “tooooofan”
“36+40”==“3640” or “3666640”
{m,n} == m to n times of the preceding element
Eg: “go{2,3}gle”== “google” or “gooogle”
“6{3} “==“666”
“s{2,}”==“ss” or “sss” or “ssss” ……
7/22/2014VYBHAVA TECHNOLOGIES 7
Anchors
^ == matches the starting position with in the string
Eg: “^obje”==“object” or “object – oriented”
“^2014”==“2014” or “2014/20/07”
$ == matches the ending position with in the string
Eg: “gram$”==“program” or “kilogram”
“2014$” == “20/07/2014”, “2013-2014”
7/22/2014VYBHAVA TECHNOLOGIES 8
Meta-characters
.(dot)== matches any single character
Eg: “bat.”== “bat” or “bats” or “bata”
“87.1”==“8741” or “8751” or “8761”
[]== matches a single character that is contained with in the brackets
Eg: “[xyz]” == “x” or “y” or “z”
“[aeiou]”==any vowel
“[0123456789]”==any digit
[ - ] == matches a single character that is contained within the brackets and
the specified range.
Eg: “[a-c]” == “a” or “b” or “c”
“[a-zA-Z]” == all letters (lowercase & uppercase)
“[0-9]” == all digits
[^ ] == matches a single character that is not contained within the brackets.
Eg: “[^aeiou]” == any non-vowel
“[^0-9]” == any non-digit
“[^xyz]” == any character, but not “x”, “y”, or “z”
7/22/2014VYBHAVA TECHNOLOGIES 9
Character Classes
Character classes specifies a group of characters to match in a string
d Matches a decimal digit [0-9]
D Matches non digits
s Matches a single white space character [t-tab,n-newline,
r-return,v-space, f-form]
S Matches any non-white space character
w Matches alphanumeric character class ([a-zA-Z0-9_])
W Matches non-alphanumeric character class ([^a-zA-Z0-9_])
w+ Matches one or more words / characters
b Matches word boundaries when outside brackets.
Matches backspace when inside brackets
B Matches nonword boundaries
A Matches beginning of string
z Matches end of string
7/22/2014VYBHAVA TECHNOLOGIES 10
Search scans through the input string and tries to match at
any location
The search function searches for first occurrence of
RE pattern within string with optional flags.
syntax :
re.search(pattern, string, flags=0)
Pattern --This is the regular expression to be matched
String-- This is the string, which would be searched to match the pattern
anywhere in the string
Flags-- You can specify different flags using bitwise OR (|). These are
modifiers.
7/22/2014VYBHAVA TECHNOLOGIES 11
7/22/2014VYBHAVA TECHNOLOGIES 12
Modifier Description
re.I Performs case-insensitive matching
re.L Interprets words according to the current locale. This
interpretation affects the alphabetic group (w and W),
as well as word boundary behavior (b and B).
re.M Makes $ match the end of a line (not just the end of the
string) and makes ^ match the start of any line (not just
the start of the string).
re.S Makes a period (dot) match any character, including a
newline.
re.u Interprets letters according to the Unicode character set.
This flag affects the behavior of w, W, b, B.
re.x It ignores whitespace (except inside a set [] or when
escaped by a backslash) and treats unescaped # as a
comment marker.
OPTION FLAGS
The re.search function returns a match object on
success, None on failure. We would
use group(num)or groups() function of match object to get
matched expression.
7/22/2014VYBHAVA TECHNOLOGIES 13
Match Object Methods Description
group(num=0) This method returns entire match (or
specific subgroup num)
groups() This method returns all matching
subgroups in a tuple
 match object for information about the matching
string. match object instances also have several
methods and attributes; the most important ones are
7/22/2014VYBHAVA TECHNOLOGIES 14
Method Purpose
group( ) Return the string matched by the RE
start( ) Return the starting position of the match
end ( ) Return the ending position of the match
span ( ) Return a tuple containing the (start, end)
positions of the match
""" This program illustrates is
one of the regular expression method i.e., search()"""
import re
# importing Regular Expression built-in module
text = 'This is my First Regulr Expression Program'
patterns = [ 'first', 'that‘, 'program' ]
# In the text input which patterns you have to search
for pattern in patterns:
print 'Looking for "%s" in "%s" ->' % (pattern, text),
if re.search(pattern, text,re.I):
print 'found a match!'
# if given pattern found in the text then execute the 'if' condition
else:
print 'no match'
# if pattern not found in the text then execute the 'else' condition
7/22/2014VYBHAVA TECHNOLOGIES 15
 Till now we have done simply performed searches against a
static string. Regular expressions are also commonly used to
modify strings in various ways using the following pattern
methods.
7/22/2014VYBHAVA TECHNOLOGIES 16
Method Purpose
Split ( ) Split the string into a list, splitting it wherever the RE
matches
Sub ( ) Find all substrings where the RE matches, and replace
them with a different string
Subn ( ) Does the same thing as sub(), but returns the new string
and the number of replacements
Regular expressions are compiled into pattern objects,
which have methods for various operations such as searching
for pattern matches or performing string substitutions
Split Method:
Syntax:
re.split(string,[maxsplit=0])
Eg:
>>>p = re.compile(r'W+')
>>> p.split(‘This is my first split example string')
[‘This', 'is', 'my', 'first', 'split', 'example']
>>> p.split(‘This is my first split example string', 3)
[‘This', 'is', 'my', 'first split example']
7/22/2014VYBHAVA TECHNOLOGIES 17
Sub Method:
The sub method replaces all occurrences of the
RE pattern in string with repl, substituting all occurrences
unless max provided. This method would return modified
string
Syntax:
re.sub(pattern, repl, string, max=0)
7/22/2014VYBHAVA TECHNOLOGIES 18
import re
DOB = "25-01-1991 # This is Date of Birth“
# Delete Python-style comments
Birth = re.sub (r'#.*$', "", DOB)
print "Date of Birth : ", Birth
# Remove anything other than digits
Birth1 = re.sub (r'D', "", Birth)
print "Before substituting DOB : ", Birth1
# Substituting the '-' with '.(dot)'
New=re.sub (r'W',".",Birth)
print "After substituting DOB: ", New
7/22/2014VYBHAVA TECHNOLOGIES 19
 There are two types of Match Methods in RE
1) Greedy
2) Lazy
 Most flavors of regular expressions are greedy by default
 To make a Quantifier Lazy, append a question mark to it
 The basic difference is ,Greedy mode tries to find the last
possible match, lazy mode the first possible match.
7/22/2014VYBHAVA TECHNOLOGIES 20
Example for Greedy Match
>>>s = '<html><head><title>Title</title>'
>>> len(s)
32
>>> print re.match('<.*>', s).span( )
(0, 32) # it gives the start and end position of the match
>>> print re.match('<.*>', s).group( )
<html><head><title>Title</title>
Example for Lazy Match
>>> print re.match('<.*?>', s).group( )
<html>
7/22/2014VYBHAVA TECHNOLOGIES 21
 The RE matches the '<' in <html>, and the .* consumes the rest of
the string. There’s still more left in the RE, though, and the > can’t
match at the end of the string, so the regular expression engine has
to backtrack character by character until it finds a match for the >.
The final match extends from the '<' in <html>to the '>' in </title>,
which isn’t what you want.
 In the Lazy (non-greedy) qualifiers *?, +?, ??, or {m,n}?, which
match as little text as possible. In the example, the '>' is tried
immediately after the first '<' matches, and when it fails, the engine
advances a character at a time, retrying the '>' at every step.
7/22/2014VYBHAVA TECHNOLOGIES 22
Findall:
 Return all non-overlapping matches of pattern in string,
as a list of strings.
 The string is scanned left-to-right, and matches are
returned in the order found.
 If one or more groups are present in the pattern, return a
list of groups; this will be a list of tuples if the pattern
has more than one group.
 Empty matches are included in the result unless they
touch the beginning of another match.
Syntax:
re.findall (pattern, string, flags=0)
7/22/2014VYBHAVA TECHNOLOGIES 23
import re
line='Python Orientation course helps professionals fish best opportunities‘
Print “Find all words starts with p”
print re.findall(r"bp[w]*",line)
print "Find all five characthers long words"
print re.findall(r"bw{5}b", line)
# Find all four, six characthers long words
print “find all 4, 6 char long words"
print re.findall(r"bw{4,6}b", line)
# Find all words which are at least 13 characters long
print “Find all words with 13 char"
print re.findall(r"bw{13,}b", line)
7/22/2014VYBHAVA TECHNOLOGIES 24
Finditer
 Return an iterator yielding MatchObject instances over
all non-overlapping matches for the RE pattern in string.
 The string is scanned left-to-right, and matches are
returned in the order found.
 Empty matches are included in the result unless they
touch the beginning of another match.
Syntax:
re.finditer(pattern, string, flags=0)
7/22/2014VYBHAVA TECHNOLOGIES 25
import re
string="Python java c++ perl shell ruby tcl c c#"
print re.findall(r"bc[W+]*",string,re.M|re.I)
print re.findall(r"bp[w]*",string,re.M|re.I)
print re.findall(r"bs[w]*",string,re.M|re.I)
print re.sub(r'W+',"",string)
it = re.finditer(r"bc[(Ws)]*", string)
for match in it:
print "'{g}' was found between the indices
{s}".format(g=match.group(),s=match.span())
7/22/2014VYBHAVA TECHNOLOGIES 26

Contenu connexe

Tendances (20)

Python Exception Handling
Python Exception HandlingPython Exception Handling
Python Exception Handling
 
Python Flow Control
Python Flow ControlPython Flow Control
Python Flow Control
 
File handling in Python
File handling in PythonFile handling in Python
File handling in Python
 
Python Regular Expressions
Python Regular ExpressionsPython Regular Expressions
Python Regular Expressions
 
Arrays in python
Arrays in pythonArrays in python
Arrays in python
 
Data Structure (Queue)
Data Structure (Queue)Data Structure (Queue)
Data Structure (Queue)
 
Adv. python regular expression by Rj
Adv. python regular expression by RjAdv. python regular expression by Rj
Adv. python regular expression by Rj
 
Queue ppt
Queue pptQueue ppt
Queue ppt
 
Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)
 
Sets in python
Sets in pythonSets in python
Sets in python
 
NUMPY
NUMPY NUMPY
NUMPY
 
Python Pandas
Python PandasPython Pandas
Python Pandas
 
Vectors in Java
Vectors in JavaVectors in Java
Vectors in Java
 
Queues
QueuesQueues
Queues
 
Data Structures in Python
Data Structures in PythonData Structures in Python
Data Structures in Python
 
Threads in python
Threads in pythonThreads in python
Threads in python
 
File handling in c
File handling in cFile handling in c
File handling in c
 
Functions in python
Functions in pythonFunctions in python
Functions in python
 
Python basic
Python basicPython basic
Python basic
 
Modules and packages in python
Modules and packages in pythonModules and packages in python
Modules and packages in python
 

En vedette

Object Oriented Programming in Python
Object Oriented Programming in PythonObject Oriented Programming in Python
Object Oriented Programming in PythonSujith Kumar
 
Python Tricks That You Can't Live Without
Python Tricks That You Can't Live WithoutPython Tricks That You Can't Live Without
Python Tricks That You Can't Live WithoutAudrey Roy
 
Python Datatypes by SujithKumar
Python Datatypes by SujithKumarPython Datatypes by SujithKumar
Python Datatypes by SujithKumarSujith Kumar
 
Advance OOP concepts in Python
Advance OOP concepts in PythonAdvance OOP concepts in Python
Advance OOP concepts in PythonSujith Kumar
 
Basics of Object Oriented Programming in Python
Basics of Object Oriented Programming in PythonBasics of Object Oriented Programming in Python
Basics of Object Oriented Programming in PythonSujith Kumar
 
Introduction to python for Beginners
Introduction to python for Beginners Introduction to python for Beginners
Introduction to python for Beginners Sujith Kumar
 

En vedette (6)

Object Oriented Programming in Python
Object Oriented Programming in PythonObject Oriented Programming in Python
Object Oriented Programming in Python
 
Python Tricks That You Can't Live Without
Python Tricks That You Can't Live WithoutPython Tricks That You Can't Live Without
Python Tricks That You Can't Live Without
 
Python Datatypes by SujithKumar
Python Datatypes by SujithKumarPython Datatypes by SujithKumar
Python Datatypes by SujithKumar
 
Advance OOP concepts in Python
Advance OOP concepts in PythonAdvance OOP concepts in Python
Advance OOP concepts in Python
 
Basics of Object Oriented Programming in Python
Basics of Object Oriented Programming in PythonBasics of Object Oriented Programming in Python
Basics of Object Oriented Programming in Python
 
Introduction to python for Beginners
Introduction to python for Beginners Introduction to python for Beginners
Introduction to python for Beginners
 

Similaire à Regular expressions in Python

Python regular expressions
Python regular expressionsPython regular expressions
Python regular expressionsKrishna Nanda
 
Regular_Expressions.pptx
Regular_Expressions.pptxRegular_Expressions.pptx
Regular_Expressions.pptxDurgaNayak4
 
Unit 2 - Regular Expression.pptx
Unit 2 - Regular Expression.pptxUnit 2 - Regular Expression.pptx
Unit 2 - Regular Expression.pptxmythili213835
 
Unit 2 - Regular Expression .pptx
Unit 2 - Regular        Expression .pptxUnit 2 - Regular        Expression .pptx
Unit 2 - Regular Expression .pptxmythili213835
 
Module 3 - Regular Expressions, Dictionaries.pdf
Module 3 - Regular  Expressions,  Dictionaries.pdfModule 3 - Regular  Expressions,  Dictionaries.pdf
Module 3 - Regular Expressions, Dictionaries.pdfGaneshRaghu4
 
Regular expressionfunction
Regular expressionfunctionRegular expressionfunction
Regular expressionfunctionADARSH BHATT
 
Maxbox starter20
Maxbox starter20Maxbox starter20
Maxbox starter20Max Kleiner
 
unit-4 regular expression.pptx
unit-4 regular expression.pptxunit-4 regular expression.pptx
unit-4 regular expression.pptxPadreBhoj
 
Bioinformatica 06-10-2011-p2 introduction
Bioinformatica 06-10-2011-p2 introductionBioinformatica 06-10-2011-p2 introduction
Bioinformatica 06-10-2011-p2 introductionProf. Wim Van Criekinge
 
regular-expression.pdf
regular-expression.pdfregular-expression.pdf
regular-expression.pdfDarellMuchoko
 
Regular expressions
Regular expressionsRegular expressions
Regular expressionsRaghu nath
 
Class 5 - PHP Strings
Class 5 - PHP StringsClass 5 - PHP Strings
Class 5 - PHP StringsAhmed Swilam
 
Python - Regular Expressions
Python - Regular ExpressionsPython - Regular Expressions
Python - Regular ExpressionsMukesh Tekwani
 
Strings,patterns and regular expressions in perl
Strings,patterns and regular expressions in perlStrings,patterns and regular expressions in perl
Strings,patterns and regular expressions in perlsana mateen
 
Unit 1-strings,patterns and regular expressions
Unit 1-strings,patterns and regular expressionsUnit 1-strings,patterns and regular expressions
Unit 1-strings,patterns and regular expressionssana mateen
 
Ruby data types and objects
Ruby   data types and objectsRuby   data types and objects
Ruby data types and objectsHarkamal Singh
 
Manipulating strings
Manipulating stringsManipulating strings
Manipulating stringsNicole Ryan
 

Similaire à Regular expressions in Python (20)

Python regular expressions
Python regular expressionsPython regular expressions
Python regular expressions
 
Regular_Expressions.pptx
Regular_Expressions.pptxRegular_Expressions.pptx
Regular_Expressions.pptx
 
Unit 2 - Regular Expression.pptx
Unit 2 - Regular Expression.pptxUnit 2 - Regular Expression.pptx
Unit 2 - Regular Expression.pptx
 
Unit 2 - Regular Expression .pptx
Unit 2 - Regular        Expression .pptxUnit 2 - Regular        Expression .pptx
Unit 2 - Regular Expression .pptx
 
Module 3 - Regular Expressions, Dictionaries.pdf
Module 3 - Regular  Expressions,  Dictionaries.pdfModule 3 - Regular  Expressions,  Dictionaries.pdf
Module 3 - Regular Expressions, Dictionaries.pdf
 
Regular expressionfunction
Regular expressionfunctionRegular expressionfunction
Regular expressionfunction
 
Perl_Part4
Perl_Part4Perl_Part4
Perl_Part4
 
Maxbox starter20
Maxbox starter20Maxbox starter20
Maxbox starter20
 
unit-4 regular expression.pptx
unit-4 regular expression.pptxunit-4 regular expression.pptx
unit-4 regular expression.pptx
 
PHP Web Programming
PHP Web ProgrammingPHP Web Programming
PHP Web Programming
 
Bioinformatica 06-10-2011-p2 introduction
Bioinformatica 06-10-2011-p2 introductionBioinformatica 06-10-2011-p2 introduction
Bioinformatica 06-10-2011-p2 introduction
 
regular-expression.pdf
regular-expression.pdfregular-expression.pdf
regular-expression.pdf
 
Regular expressions
Regular expressionsRegular expressions
Regular expressions
 
JavaScript.pptx
JavaScript.pptxJavaScript.pptx
JavaScript.pptx
 
Class 5 - PHP Strings
Class 5 - PHP StringsClass 5 - PHP Strings
Class 5 - PHP Strings
 
Python - Regular Expressions
Python - Regular ExpressionsPython - Regular Expressions
Python - Regular Expressions
 
Strings,patterns and regular expressions in perl
Strings,patterns and regular expressions in perlStrings,patterns and regular expressions in perl
Strings,patterns and regular expressions in perl
 
Unit 1-strings,patterns and regular expressions
Unit 1-strings,patterns and regular expressionsUnit 1-strings,patterns and regular expressions
Unit 1-strings,patterns and regular expressions
 
Ruby data types and objects
Ruby   data types and objectsRuby   data types and objects
Ruby data types and objects
 
Manipulating strings
Manipulating stringsManipulating strings
Manipulating strings
 

Dernier

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 

Dernier (20)

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 

Regular expressions in Python

  • 2. Regular expressions are a powerful language for matching text patterns and standardized way of searching, replacing, and parsing text with complex patterns of characters All modern languages have similar library packages for regular expressions i.e., re built in module 7/22/2014VYBHAVA TECHNOLOGIES 2
  • 3.  Hundreds of code could be reduced to a one line elegant regular expression  Used to construct compilers, interpreters and text editors  Used to search and match text patterns  Used to validate text data formats especially input data  Popular programming languages have Regex capabilities Python, Perl, JavaScript, Ruby ,Tcl, C++,C# 7/22/2014VYBHAVA TECHNOLOGIES 3
  • 4. General uses of regular expressions to:  Search a string (search and match)  Replace parts of a string (sub)  Break string into small pieces (split)  Finding a string (findall) Before using the regular expressions in your program, you must import the library using "import re" 7/22/2014VYBHAVA TECHNOLOGIES 4
  • 5.  Alternative : |  Grouping : ()  Quantification : ?*+{m,n}  Anchors : ^ $  Meta- characters : . [][-][^]  Character classes : dDwW….. 7/22/2014VYBHAVA TECHNOLOGIES 5
  • 6. Alternative: Eg: “cat|mat” ==“cat” or “mat” “python|jython” ==“python” or “jython” Grouping: Eg: gr(r|a)y==“grey” or “gray” “ra(mil|n(ny|el))”==“ramil” or “ranny” or “ranel” 7/22/2014VYBHAVA TECHNOLOGIES 6
  • 7. Quantification ? == zero or one of the preceding element Eg: “rani?el”==“raniel” or “ranel” “colou?r”==“colour” or “color” * == zero or more of the preceding element Eg: “fo*ot”==“foot” or “fooot” or “foooooot” “94*9” ==“99” or “9449” or “9444449” + == one or more of the preceding element Eg: “too+fan”==“toofan” or “tooooofan” “36+40”==“3640” or “3666640” {m,n} == m to n times of the preceding element Eg: “go{2,3}gle”== “google” or “gooogle” “6{3} “==“666” “s{2,}”==“ss” or “sss” or “ssss” …… 7/22/2014VYBHAVA TECHNOLOGIES 7
  • 8. Anchors ^ == matches the starting position with in the string Eg: “^obje”==“object” or “object – oriented” “^2014”==“2014” or “2014/20/07” $ == matches the ending position with in the string Eg: “gram$”==“program” or “kilogram” “2014$” == “20/07/2014”, “2013-2014” 7/22/2014VYBHAVA TECHNOLOGIES 8
  • 9. Meta-characters .(dot)== matches any single character Eg: “bat.”== “bat” or “bats” or “bata” “87.1”==“8741” or “8751” or “8761” []== matches a single character that is contained with in the brackets Eg: “[xyz]” == “x” or “y” or “z” “[aeiou]”==any vowel “[0123456789]”==any digit [ - ] == matches a single character that is contained within the brackets and the specified range. Eg: “[a-c]” == “a” or “b” or “c” “[a-zA-Z]” == all letters (lowercase & uppercase) “[0-9]” == all digits [^ ] == matches a single character that is not contained within the brackets. Eg: “[^aeiou]” == any non-vowel “[^0-9]” == any non-digit “[^xyz]” == any character, but not “x”, “y”, or “z” 7/22/2014VYBHAVA TECHNOLOGIES 9
  • 10. Character Classes Character classes specifies a group of characters to match in a string d Matches a decimal digit [0-9] D Matches non digits s Matches a single white space character [t-tab,n-newline, r-return,v-space, f-form] S Matches any non-white space character w Matches alphanumeric character class ([a-zA-Z0-9_]) W Matches non-alphanumeric character class ([^a-zA-Z0-9_]) w+ Matches one or more words / characters b Matches word boundaries when outside brackets. Matches backspace when inside brackets B Matches nonword boundaries A Matches beginning of string z Matches end of string 7/22/2014VYBHAVA TECHNOLOGIES 10
  • 11. Search scans through the input string and tries to match at any location The search function searches for first occurrence of RE pattern within string with optional flags. syntax : re.search(pattern, string, flags=0) Pattern --This is the regular expression to be matched String-- This is the string, which would be searched to match the pattern anywhere in the string Flags-- You can specify different flags using bitwise OR (|). These are modifiers. 7/22/2014VYBHAVA TECHNOLOGIES 11
  • 12. 7/22/2014VYBHAVA TECHNOLOGIES 12 Modifier Description re.I Performs case-insensitive matching re.L Interprets words according to the current locale. This interpretation affects the alphabetic group (w and W), as well as word boundary behavior (b and B). re.M Makes $ match the end of a line (not just the end of the string) and makes ^ match the start of any line (not just the start of the string). re.S Makes a period (dot) match any character, including a newline. re.u Interprets letters according to the Unicode character set. This flag affects the behavior of w, W, b, B. re.x It ignores whitespace (except inside a set [] or when escaped by a backslash) and treats unescaped # as a comment marker. OPTION FLAGS
  • 13. The re.search function returns a match object on success, None on failure. We would use group(num)or groups() function of match object to get matched expression. 7/22/2014VYBHAVA TECHNOLOGIES 13 Match Object Methods Description group(num=0) This method returns entire match (or specific subgroup num) groups() This method returns all matching subgroups in a tuple
  • 14.  match object for information about the matching string. match object instances also have several methods and attributes; the most important ones are 7/22/2014VYBHAVA TECHNOLOGIES 14 Method Purpose group( ) Return the string matched by the RE start( ) Return the starting position of the match end ( ) Return the ending position of the match span ( ) Return a tuple containing the (start, end) positions of the match
  • 15. """ This program illustrates is one of the regular expression method i.e., search()""" import re # importing Regular Expression built-in module text = 'This is my First Regulr Expression Program' patterns = [ 'first', 'that‘, 'program' ] # In the text input which patterns you have to search for pattern in patterns: print 'Looking for "%s" in "%s" ->' % (pattern, text), if re.search(pattern, text,re.I): print 'found a match!' # if given pattern found in the text then execute the 'if' condition else: print 'no match' # if pattern not found in the text then execute the 'else' condition 7/22/2014VYBHAVA TECHNOLOGIES 15
  • 16.  Till now we have done simply performed searches against a static string. Regular expressions are also commonly used to modify strings in various ways using the following pattern methods. 7/22/2014VYBHAVA TECHNOLOGIES 16 Method Purpose Split ( ) Split the string into a list, splitting it wherever the RE matches Sub ( ) Find all substrings where the RE matches, and replace them with a different string Subn ( ) Does the same thing as sub(), but returns the new string and the number of replacements
  • 17. Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions Split Method: Syntax: re.split(string,[maxsplit=0]) Eg: >>>p = re.compile(r'W+') >>> p.split(‘This is my first split example string') [‘This', 'is', 'my', 'first', 'split', 'example'] >>> p.split(‘This is my first split example string', 3) [‘This', 'is', 'my', 'first split example'] 7/22/2014VYBHAVA TECHNOLOGIES 17
  • 18. Sub Method: The sub method replaces all occurrences of the RE pattern in string with repl, substituting all occurrences unless max provided. This method would return modified string Syntax: re.sub(pattern, repl, string, max=0) 7/22/2014VYBHAVA TECHNOLOGIES 18
  • 19. import re DOB = "25-01-1991 # This is Date of Birth“ # Delete Python-style comments Birth = re.sub (r'#.*$', "", DOB) print "Date of Birth : ", Birth # Remove anything other than digits Birth1 = re.sub (r'D', "", Birth) print "Before substituting DOB : ", Birth1 # Substituting the '-' with '.(dot)' New=re.sub (r'W',".",Birth) print "After substituting DOB: ", New 7/22/2014VYBHAVA TECHNOLOGIES 19
  • 20.  There are two types of Match Methods in RE 1) Greedy 2) Lazy  Most flavors of regular expressions are greedy by default  To make a Quantifier Lazy, append a question mark to it  The basic difference is ,Greedy mode tries to find the last possible match, lazy mode the first possible match. 7/22/2014VYBHAVA TECHNOLOGIES 20
  • 21. Example for Greedy Match >>>s = '<html><head><title>Title</title>' >>> len(s) 32 >>> print re.match('<.*>', s).span( ) (0, 32) # it gives the start and end position of the match >>> print re.match('<.*>', s).group( ) <html><head><title>Title</title> Example for Lazy Match >>> print re.match('<.*?>', s).group( ) <html> 7/22/2014VYBHAVA TECHNOLOGIES 21
  • 22.  The RE matches the '<' in <html>, and the .* consumes the rest of the string. There’s still more left in the RE, though, and the > can’t match at the end of the string, so the regular expression engine has to backtrack character by character until it finds a match for the >. The final match extends from the '<' in <html>to the '>' in </title>, which isn’t what you want.  In the Lazy (non-greedy) qualifiers *?, +?, ??, or {m,n}?, which match as little text as possible. In the example, the '>' is tried immediately after the first '<' matches, and when it fails, the engine advances a character at a time, retrying the '>' at every step. 7/22/2014VYBHAVA TECHNOLOGIES 22
  • 23. Findall:  Return all non-overlapping matches of pattern in string, as a list of strings.  The string is scanned left-to-right, and matches are returned in the order found.  If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.  Empty matches are included in the result unless they touch the beginning of another match. Syntax: re.findall (pattern, string, flags=0) 7/22/2014VYBHAVA TECHNOLOGIES 23
  • 24. import re line='Python Orientation course helps professionals fish best opportunities‘ Print “Find all words starts with p” print re.findall(r"bp[w]*",line) print "Find all five characthers long words" print re.findall(r"bw{5}b", line) # Find all four, six characthers long words print “find all 4, 6 char long words" print re.findall(r"bw{4,6}b", line) # Find all words which are at least 13 characters long print “Find all words with 13 char" print re.findall(r"bw{13,}b", line) 7/22/2014VYBHAVA TECHNOLOGIES 24
  • 25. Finditer  Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string.  The string is scanned left-to-right, and matches are returned in the order found.  Empty matches are included in the result unless they touch the beginning of another match. Syntax: re.finditer(pattern, string, flags=0) 7/22/2014VYBHAVA TECHNOLOGIES 25
  • 26. import re string="Python java c++ perl shell ruby tcl c c#" print re.findall(r"bc[W+]*",string,re.M|re.I) print re.findall(r"bp[w]*",string,re.M|re.I) print re.findall(r"bs[w]*",string,re.M|re.I) print re.sub(r'W+',"",string) it = re.finditer(r"bc[(Ws)]*", string) for match in it: print "'{g}' was found between the indices {s}".format(g=match.group(),s=match.span()) 7/22/2014VYBHAVA TECHNOLOGIES 26