Regular Expressions in Action
Brief overview of regular expression building blocks and tools, with a practical example
Muhammad Sheraz Siddiqi
http://www.sherazsiddiqi.com/

This Presentation
• What are Regular Expressions?
• Tools to learn
• Literal characters and special characters
• Building blocks of regular expressions
• Grouping and backreferences
• Unicode characters in regular expressions
• Regex matching modes
• Lookarounds
• Parse a log file

What are Regular Expressions?
Regular expressions provide a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters.

Tools to Learn
• The Regex Coach is a graphical application for Windows that can be used to experiment with regular expressions interactively: http://weitz.de/regex-coach/
• Notepad++ is a text editor that supports find and replace using regular expressions: http://notepad-plus-plus.org/
• A web-based regular expression tester: http://www.regular-expressions.info/javascriptexample.html

Literal and Special Characters
The most basic regular expression consists of a literal, which behaves just like plain string matching. For example, cat will match "cat" in "About cats and dogs."
Special characters, known as metacharacters, must be escaped with a backslash (\) when they are used as part of a literal. For example, dogs\. will match "dogs." in "About cats and dogs."
The metacharacters are: [ \ ^ $ . | ? * + ( ) {
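To try the literal and escaping rules outside an editor, here is a minimal sketch using Python's standard re module. Python is only an illustrative choice here, since the slides themselves use Regex Coach and Notepad++. re.escape is a real re function. The sample strings are made up.

```python
import re

text = "About cats and dogs."

# A literal pattern behaves like plain substring search:
# "cat" is found inside the word "cats".
literal = re.search(r"cat", text).group()

# "." is a metacharacter, so it must be escaped as "\." to match
# a literal period; unescaped, it matches any character.
escaped = re.search(r"dogs\.", text).group()
escaped_on_other = re.search(r"dogs\.", "dogsX")   # None: "X" is not "."
unescaped_on_other = re.search(r"dogs.", "dogsX")  # matches "dogsX"

# re.escape() backslash-escapes the metacharacters in a string,
# which is handy when the literal comes from user input.
safe = re.escape("1+1=2?")

print(literal, escaped, escaped_on_other, unescaped_on_other.group(), safe)
```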

Character Classes and Shorthands
With a "character class", also called a "character set", you can tell the regex engine to match only one out of several characters. For example, gr[ae]y will match both "grey" and "gray".
Ranges can be specified using a hyphen. For example, [0-9] will match any digit from 0 to 9, and [0-9a-fA-F] will match any single hexadecimal digit.
A caret (^) right after the opening square bracket negates the character class, so that it matches any character that is not in the class. For example, [^0-9] will match any character except a digit.
A negated class still has to match one character. Therefore q[^u] will not match "Iraq", because nothing follows the q. It will match "q " in "Iraq is a country".
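The character class examples can be checked with a short sketch, again using Python's re module as an assumed flavor. The word lists and the hexadecimal sample are made up for illustration.

```python
import re

# gr[ae]y: the class [ae] matches exactly one of "a" or "e".
candidates = ["gray", "grey", "graey", "gruy"]
spellings = [w for w in candidates if re.fullmatch(r"gr[ae]y", w)]

# [0-9a-fA-F] matches a single hexadecimal digit.
hex_digits = re.findall(r"[0-9a-fA-F]", "x=3F;z")

# [^0-9] matches any single character that is not a digit.
non_digits = re.findall(r"[^0-9]", "a1b2")

# q[^u] needs one character after the q, so it fails on "Iraq"
# (nothing follows the q) but matches "q " in "Iraq is a country".
no_match = re.search(r"q[^u]", "Iraq")
match = re.search(r"q[^u]", "Iraq is a country")

print(spellings, hex_digits, non_digits, no_match, repr(match.group()))
```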

Character Classes and Shorthands
Most metacharacters work fine without escaping inside a character class. For example, [+*] is a valid expression and matches either * or +. Inside a class, only ], \, ^ and - have a special meaning.
Some predefined character classes are known as shorthand character classes:
• \w stands for [A-Za-z0-9_] (a "word" character)
• \s stands for a whitespace character, such as [ \t\r\n]
• \d stands for [0-9]
In many flavors, \w and \d also match non-ASCII letters and digits when Unicode matching is enabled.
If a character class is repeated using the ?, * or + operators, the entire character class is repeated, not just the character that it matched. For example, [0-9]+ can match "837" as well as "222". To repeat the matched character instead, use a backreference: ([0-9])\1+ will match "222" but not "837".
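This Python sketch shows the shorthands and the difference between repeating a class and repeating a captured character. re.ASCII is passed so that \w behaves like [A-Za-z0-9_]. Without that flag, Python's str patterns are Unicode-aware. The sample strings are made up.

```python
import re

# Inside a class, + and * are ordinary characters.
ops = re.findall(r"[+*]", "2+3*4")

# Shorthands; re.ASCII restricts \w to [A-Za-z0-9_].
words = re.findall(r"\w+", "snake_case, CamelCase 42", flags=re.ASCII)
spaces = re.findall(r"\s", "a b\tc\nd")
digits = re.findall(r"\d", "r2d2")

# Repeating a class repeats the class: [0-9]+ accepts any run of digits.
any_digits = [s for s in ["837", "222"] if re.fullmatch(r"[0-9]+", s)]

# ([0-9])\1+ captures one digit and then requires that same digit again.
same_digit = [s for s in ["837", "222"] if re.fullmatch(r"([0-9])\1+", s)]

print(ops, words, spaces, digits, any_digits, same_digit)
```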

Building Blocks of Regular Expressions
The famous dot (.) matches any single character except a line break, unless single-line mode is on. For example, a.b will match "abb", "aab", "a+b", etc.
^ and $ match at the start and end of the string, or of each line in multi-line mode. For example, ^My.*\.$ will match any string that starts with "My" and ends with a dot.
The pipe (|) operator matches either everything to its left or everything to its right. For example, (cat|dog) can match either "cat" or "dog".
Question: if the expression is Get|GetValue|Set|SetValue and the string is "SetValue", what will it match, and why? What if the expression becomes Get(Value)?|Set(Value)?
* (same as {0,}), + (same as {1,}) and ? (same as {0,1}) are used to control repetition.
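You can check your answer to the question above, along with the dot, anchor and repetition rules, using this Python sketch. Python's re is assumed as the flavor, and it tries alternatives from left to right like most regex engines. All test strings are made up.

```python
import re

# "." matches any single character except a newline (by default).
dot = [s for s in ["abb", "aab", "a+b", "a\nb"] if re.fullmatch(r"a.b", s)]

# ^My.*\.$ : starts with "My" and ends with a literal period.
lines = ["My cat sleeps.", "My cat sleeps", "Your cat."]
sentences = [s for s in lines if re.search(r"^My.*\.$", s)]

# Alternatives are tried left to right and the first one that matches
# wins, so "Set" is returned before "SetValue" is ever tried.
first_alt = re.search(r"Get|GetValue|Set|SetValue", "SetValue").group()

# The optional group (Value)? is greedy, so it consumes "Value".
optional = re.search(r"Get(Value)?|Set(Value)?", "SetValue").group()

# * is {0,}, + is {1,}.
star = re.fullmatch(r"ab*", "a") is not None        # zero b's allowed
plus = re.fullmatch(r"ab+", "a") is not None        # needs at least one b
braces = re.fullmatch(r"ab{1,}", "abbb") is not None

print(dot, sentences, first_alt, optional, star, plus, braces)
```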

Grouping and Backreferences
Besides grouping part of a regular expression together, round brackets also create a "backreference". A backreference stores the part of the string matched by the part of the regular expression inside the parentheses. For example, in ([0-9])\1+ the \1 refers back to the digit captured by the group, so the pattern will match "222" but not "837".
If you do not need the backreference, you can optimize an expression such as Set(Value)? by using a non-capturing group: Set(?:Value)?
Backreferences can be used in the expression itself or in replacement text. For example, <([A-Za-z][A-Za-z0-9]*)>.*</\1> will match an opening tag together with its matching closing tag.
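This Python sketch uses backreferences in the pattern itself and in replacement text, and compares them with a non-capturing group. In re.sub, Python writes the replacement backreference as \1. Other tools may use $1 instead. The tag strings and the [tag] replacement format are made up for illustration.

```python
import re

# Capturing group: group(1) holds the text matched inside ( ).
m = re.fullmatch(r"Set(Value)?", "SetValue")
captured = m.group(1)

# Non-capturing group (?:...) groups without storing a backreference.
n = re.fullmatch(r"Set(?:Value)?", "SetValue")
group_count = n.re.groups  # number of capturing groups in the pattern

# \1 inside the pattern must repeat whatever group 1 matched,
# so the closing tag has to use the same name as the opening tag.
tag = r"<([A-Za-z][A-Za-z0-9]*)>.*</\1>"
ok = re.search(tag, "<b>bold</b>").group()
mismatch = re.search(tag, "<b>bold</i>")

# \1 and \2 in the replacement text insert the captured groups.
renamed = re.sub(r"<([A-Za-z][A-Za-z0-9]*)>(.*)</\1>", r"[\1]\2[/\1]", "<em>hi</em>")

print(captured, group_count, ok, mismatch, renamed)
```

Because .* is greedy, a line containing two <b> elements would be matched from the first <b> to the last </b>. The lazy form .*? stops at the first matching closing tag.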
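
Unicode Characters in Regular Expressions
Unicode characters can be written as \uxxxx in regular expressions, where xxxx is the hexadecimal code point. This is the syntax used by flavors such as JavaScript, Java, .NET and Python. Perl and PCRE use \x{xxxx} instead. For example, عطاری can be matched with the expression \u0639\u0637\u0627\u0631\u06cc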
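Python's re module has accepted \uXXXX escapes in patterns since Python 3.3, so it can be used to try the example. The pattern is a raw string, which means the regex engine decodes the escapes rather than Python. The subject string is built from the same code points so that the spelling of the letters is unambiguous. The "Name: " prefix is made up.

```python
import re

# Raw string: the regex engine itself interprets \uXXXX, so this
# pattern spells out the five code points of the word عطاری.
pattern = r"\u0639\u0637\u0627\u0631\u06cc"

# Non-raw string: Python decodes these escapes into the same letters.
subject = "Name: \u0639\u0637\u0627\u0631\u06cc"

found = re.search(pattern, subject).group()
print(found, [hex(ord(ch)) for ch in found])
```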
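
Regular Expression Matching Modes
The modifiers below use the Perl-style /x notation. Other tools usually expose the same modes as options or as inline flags such as (?i). In the examples, sheraz\nattari means "sheraz", a line break, then "attari".
• /i makes the regex match case-insensitively. With this modifier, [A-Z] will match both A and a.
• /s enables "single-line mode". In this mode, the dot matches newlines as well, so .* will match all of sheraz\nattari.
• /m enables "multi-line mode". In this mode, the caret and dollar match at the start and end of every line, not just of the whole subject string. With this modifier, ^.*$ matches sheraz and attari separately in sheraz\nattari. Without it, ^.*$ finds no match, because the dot cannot cross the line break.
• /x enables "free-spacing mode". In this mode, whitespace between regex tokens is ignored, and an unescaped # starts a comment. With this modifier, sheraz #.* will match only sheraz, because #.* is treated as a comment.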
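In Python's re module, these four modes correspond to the flags re.I, re.S, re.M and re.X. The sketch below runs each example from the slide on the two-line subject "sheraz\nattari".

```python
import re

text = "sheraz\nattari"

# re.I (/i): [A-Z] also matches lowercase letters.
ci = re.findall(r"[A-Z]", "Aa", flags=re.I)

# Without re.S the dot stops at the newline; with re.S (/s) it does not.
no_s = re.match(r".*", text).group()
with_s = re.match(r".*", text, flags=re.S).group()

# re.M (/m): ^ and $ match at every line boundary.
no_m = re.search(r"^.*$", text)
with_m = re.findall(r"^.*$", text, flags=re.M)

# re.X (/x): whitespace is ignored and an unescaped # starts a comment.
free = re.match(r"sheraz  # everything after the hash is a comment",
                text, flags=re.X).group()

print(ci, repr(no_s), repr(with_s), no_m, with_m, free)
```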

Lookarounds and Conditionals
A conditional is a special construct, written (?(?=regex)then|else), that first evaluates a lookaround. It then executes one sub-regex if the lookaround succeeds and another sub-regex if it fails. Not all regex flavors support conditionals.
• Positive lookahead: q(?=uv*) will match the q in "quvvvv" and in "qu".
• Negative lookahead: q(?!uv*) will match a q that is not followed by u. Because v* can match nothing, this behaves like q(?!u).
• Positive lookbehind: (?<=b)a will match an a that is preceded by b, as in "ba".
• Negative lookbehind: (?<!b)a will match an a that is not preceded by b, as in "ca" and "da".
The lookaround itself does not become part of the match, so only the q or the a is matched.
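The four lookaround examples can be checked with Python's re module, as in the sketch below. The sample strings are made up. Python's lookbehinds must be fixed-width, which is not a problem for (?<=b) and (?<!b).

```python
import re

# Positive lookahead: q only if followed by u and any number of v's.
# The lookahead is not part of the match, so only "q" is returned.
pos_ahead = re.findall(r"q(?=uv*)", "quvvvv qu qa")

# Negative lookahead: q not followed by u.
neg_ahead = [w for w in ["Iraq", "quit", "qatar"] if re.search(r"q(?!uv*)", w)]

# Positive and negative lookbehind; record where each "a" was found.
pos_behind = [m.start() for m in re.finditer(r"(?<=b)a", "ba ca da")]
neg_behind = [m.start() for m in re.finditer(r"(?<!b)a", "ba ca da")]

print(pos_ahead, neg_ahead, pos_behind, neg_behind)
```

Python's re module supports only conditionals that test whether a numbered or named group took part in the match, such as (?(1)yes|no). It does not support the lookaround-based form shown on the slide, so that form is not demonstrated here.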

Parse a Log File
Example 1: I have an access log (access.log) from a Helix DNA Server. I want to calculate how many times each content item is accessed, and then update the download and listen counts of each content item in the database.
Exp: ^(.*)asxgen/Data/Naat/Download(.*)/(\d+)\.(mp3|rm)(.*)$
Replace: UPDATE DB.TBL set col=col + COUNT where id=\3;
Example 2: I have an application-generated log (applog.txt) from a web application. I want to fetch the required information from the relevant rows. To remove the irrelevant rows, which are those that do not contain ID: followed by Status:, use this expression with ^ and $ matching at each line:
Exp: ^(?!((.*)ID:(.*)Status:(.*))).*$
Replace: empty string
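Below is a minimal Python sketch of both log-parsing examples. It is not the author's actual workflow, which used find and replace in an editor. The log lines are hypothetical, since the real Helix DNA Server format may differ. The sketch counts hits per content id, where group 3 is assumed to be the numeric id, and fills in the slide's COUNT placeholder. DB.TBL and col are the slide's own placeholder names.

```python
import re
from collections import Counter

# Hypothetical access-log lines made up for this sketch; the real
# Helix DNA Server log format may look different.
access_log = [
    '1.2.3.4 - - "GET /asxgen/Data/Naat/Download/2010/101.mp3 HTTP/1.0" 200',
    '1.2.3.5 - - "GET /asxgen/Data/Naat/Download/2010/101.mp3 HTTP/1.0" 200',
    '1.2.3.6 - - "GET /asxgen/Data/Naat/Download/2011/205.rm HTTP/1.0" 200',
    '1.2.3.7 - - "GET /index.html HTTP/1.0" 200',
]

# Example 1: group 3 captures the numeric content id, group 4 the extension.
download = re.compile(r"^(.*)asxgen/Data/Naat/Download(.*)/(\d+)\.(mp3|rm)(.*)$")

# Count how many times each content id was requested.
counts = Counter()
for line in access_log:
    m = download.match(line)
    if m:
        counts[m.group(3)] += 1

# One UPDATE per id, with the counted hits in place of COUNT.
updates = [
    f"UPDATE DB.TBL set col=col + {n} where id={cid};"
    for cid, n in sorted(counts.items())
]

# Example 2: hypothetical application log.
app_log = "10:01 ID: 7 Status: OK\n10:02 heartbeat\n10:03 ID: 9 Status: FAIL\n"

# With re.M, ^ and $ work per line, so every line that does NOT contain
# "ID:" followed later by "Status:" is matched and replaced with "".
irrelevant = re.compile(r"^(?!((.*)ID:(.*)Status:(.*))).*$", re.M)
cleaned = irrelevant.sub("", app_log)

# The deleted lines leave empty lines behind; drop them.
relevant = [line for line in cleaned.splitlines() if line]

print("\n".join(updates))
print(relevant)
```

Counting in code, rather than emitting one UPDATE per log line, sends a single statement per content id to the database.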

Questions, please…
Thank you for being here!
Most of the content is taken from: http://www.regular-expressions.info/
me@sherazsiddiqi.com
