Regular Expressions
Ben Simpson - <3 HUB
Introductions
● Working with web technologies for 10 years
● Former HUB supervisor
● Tour de jobs: http://tinyurl.com/kmsns38
● Graduated from CSU with a BAS in Technology Management, 2013
● Husband and proud father
● Presenter on regular expressions!
What Is a Regular Expression?
Pattern matching
What Could I Do With a RegExp?
● Searching
● Syntax highlighting
● Data validation
● Sanitization
● Data queries / extraction
● Many tasks that require matching a pattern
RegExps Won’t Let You Time Travel
Brain Teaser
Which of the following is a valid telephone
number?
1. 678 466 4000
2. (678) 466-4000
3. 1234
4. domain\user
5. 1 (800) 1234 567
How did you know?
Depends on who you ask...
We Pattern Match Every Day
● Telephone numbers follow a pattern that we recognize
● This pattern has rules (3 digit area code, 7 digit number, numeric only)
● There are often many variations to a pattern (e.g. an optional intl code)
Literal Characters
String: The cat in the hat
RegExp: /at/
The cat in the hat
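Regular Expressions in JavaScript
var haystack = "The cat in the hat";
var needle = /cat/; // a regex literal; wrapping it in new RegExp() is not needed
haystack.match(needle); // truthy: an array holding the match "cat"
needle = /dog/;
haystack.match(needle); // falsy: null, because nothing matched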
Well that wasn’t so bad
The best is yet to come!
Special Characters (Metacharacters)
● \ - escape character
● ^ - beginning of line (not inside brackets)
● $ - end of line
● . - wildcard
● | - or junction
● ? - zero or one
● * - zero or more
● + - one or more
● () - grouping
● [] - character set
● {} - repetition
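As a rough illustration of the list above, the sketch below pairs each metacharacter with one small pattern. The sample strings and the `examples` and `results` names are made up for this example and are not from the slides.

```javascript
// Each entry exercises one or two of the metacharacters listed above.
// `expected` records whether the pattern should match the text.
const examples = [
  { pattern: /^The/,     text: "The cat in the hat", expected: true  }, // ^ anchors to the start
  { pattern: /hat$/,     text: "The cat in the hat", expected: true  }, // $ anchors to the end
  { pattern: /c.t/,      text: "cut",                expected: true  }, // . matches any one character
  { pattern: /cat|dog/,  text: "hot dog",            expected: true  }, // | matches either alternative
  { pattern: /colou?r/,  text: "color",              expected: true  }, // ? makes the "u" optional
  { pattern: /ab*c/,     text: "ac",                 expected: true  }, // * allows zero or more "b"
  { pattern: /^(ha)+$/,  text: "hahaha",             expected: true  }, // () groups, + repeats the group
  { pattern: /^[ch]at$/, text: "bat",                expected: false }, // [] allows only "c" or "h" here
  { pattern: /^a{2,3}$/, text: "aaaa",               expected: false }, // {} bounds the repetition count
  { pattern: /^3\.14$/,  text: "3x14",               expected: false }, // \. is a literal dot, not a wildcard
];

// RegExp.prototype.test returns true when the pattern matches anywhere in the text.
const results = examples.map(({ pattern, text }) => pattern.test(text));
```

Comparing `results` against the `expected` column is a quick way to check an intuition about what each symbol does.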
Demonstration of Special Characters
String: ...To log in to your email use the username: "ben.simpson@mail.com" with a password "password123"...
RegExp: /username: "(.*)" .* password "(.*)"/
Results:
1. ben.simpson@mail.com
2. password123
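The demonstration can be tried in JavaScript roughly as follows. This is a minimal sketch: it assumes the sample text uses straight double quotes, and the pattern includes the colon after `username` so that it lines up with the sample string. The variable names are illustrative only.

```javascript
// Sample text from the slide, written with straight quotes (an assumption).
const text = 'To log in to your email use the username: "ben.simpson@mail.com" with a password "password123"';

// Each pair of parentheses captures whatever sits between the quotes.
const pattern = /username: "(.*)" .* password "(.*)"/;

const match = text.match(pattern);

// match[0] is the whole matched text; match[1] and match[2] are the two groups.
const username = match[1]; // "ben.simpson@mail.com"
const password = match[2]; // "password123"
```

The `.*` here is greedy, so on text containing more quoted phrases the groups could capture more than intended; a narrower class such as `[^"]*` is one common way to tighten it.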
Shorthand Character Classes
● \d - digit [0-9]
● \w - word character [A-Za-z0-9_]
● \s - whitespace

● \D - non-digit [^\d]
● \W - non-word character [^\w]
● \S - non-whitespace [^\s]
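A short sketch of the shorthand classes in use, with their backslashes written out. The sample string and variable names are invented for illustration.

```javascript
const sample = "Room 42, floor_3";

// \d with the g flag collects every digit in the string.
const digits = sample.match(/\d/g); // ["4", "2", "3"]

// \w+ collects runs of word characters (letters, digits, underscore).
const words = sample.match(/\w+/g); // ["Room", "42", "floor_3"]

// \s+ splits on runs of whitespace.
const parts = sample.split(/\s+/); // ["Room", "42,", "floor_3"]

// \D is the negation of \d, so removing every \D leaves only the digits.
const onlyDigits = sample.replace(/\D/g, ""); // "423"
```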
Wait a Second!
You said this was easy
Thinking about a Telephone Pattern
● Optional international code
● 3 digit area code
● 7 digit number
● Optional extension
● What about alpha phrases? (e.g. 678 466-HELP)
● What is the length of intl codes? (e.g. 358 for Finland)
● Are parentheses optional?
● Is spacing optional?
● Country specific formats (e.g. France 06 87 71 23 45)
Regular Expression - Telephone #
String: 678 466 4357
RegExp: \d{3} \d{3} \d{4}
String: (678) 466-4357
RegExp: \(\d{3}\) \d{3}-\d{4}
Telephone # - Two Variations
String: 678 466 4357
(678) 466-4357
RegExp: \(?\d{3}\)? \d{3}[\s-]\d{4}
Telephone # - Three Variations
String: 678 466 4357
(678) 466-4357
1 (678) 466-4357
RegExp: \d*\s?\(?\d{3}\)? \d{3}[\s-]?\d{4}
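The three-variation pattern can be checked against the earlier brain teaser with a sketch like this one. The backslash escapes are written out, and the `^` and `$` anchors are an addition for this example (so the whole string must match), not part of the slide.

```javascript
// Three-variation telephone pattern, anchored to the full string (anchors are an assumption).
const phone = /^\d*\s?\(?\d{3}\)? \d{3}[\s-]?\d{4}$/;

// Candidates taken from the slides.
const candidates = [
  "678 466 4357",
  "(678) 466-4357",
  "1 (678) 466-4357",
  "1234",
  "1 (800) 1234 567",
];

// Keep only the candidates the pattern accepts.
const accepted = candidates.filter((candidate) => phone.test(candidate));
// accepted: ["678 466 4357", "(678) 466-4357", "1 (678) 466-4357"]
```

Because each parenthesis is optional on its own, the pattern also accepts an unbalanced input such as `(678 466 4357`, which hints at how quickly the edge cases pile up.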
That Escalated Quickly
Surprisingly Difficult
● Seemingly simple patterns can become very complex.
● It's best to work against data that is consistent, or regular, in its implementation of patterns
● If the data is too dirty, a regular expression won't be much help
When RegExps Go Bad
● Websites that don't accept special characters in email addresses, URLs, telephone numbers, etc.
● The cause may be RegExps that are too restrictive
● They don't take into account all variations of a pattern
● Longer expressions are difficult to grok
In a Nutshell
“Some people, when confronted with a
problem, think ‘I know, I'll use regular
expressions.’ Now they have two problems.”
-Jamie Zawinski
Brain Teaser
Which of the following is a valid email address?
1. thehoagie@gmail.com
2. ben.simpson+work@analoganalytics.com
3. ben+email
4. http://www.clayton.edu
5. abc."defghi".xyz@example.com
Thinking about Email Address
● Has a local part (e.g. "thehub" in thehub@clayton.edu)
● Has a domain part (e.g. "clayton.edu" in thehub@clayton.edu)
● Has an @ symbol in the middle
● Do we need to support special characters?
● Can we verify based on minimum / maximum length?
Best to Keep It Simple!
String: thehoagie@gmail.com
RegExp: .*@.*
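A small sketch of how the simple pattern behaves on the brain teaser entries. The `stricter` variant is a hypothetical tweak added here for comparison and does not come from the slides.

```javascript
// The simple pattern from the slide: anything, an @, then anything.
const simpleEmail = /.*@.*/;

const hasAt = simpleEmail.test("thehoagie@gmail.com"); // true
const noAt = simpleEmail.test("ben+email");            // false: no @ at all
const quoted = simpleEmail.test('abc."defghi".xyz@example.com'); // true

// .* also matches zero characters, so a lone "@" is accepted.
const loneAt = simpleEmail.test("@"); // true

// Hypothetical stricter variant: at least one character on each side of the @.
const stricter = /^.+@.+$/;
const loneAtStrict = stricter.test("@"); // false
```

Even the stricter variant only checks the overall shape. Whether the address actually exists is a separate question that a pattern alone does not answer.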
Yeah, but isn't there an official email RegExp that takes all the patterns into account? Yes...
RFC 5322 - The Email RegExp
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*
| "(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]
| \\[\x01-\x09\x0b\x0c\x0e-\x7f])*")
@ (?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
| \[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:
(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]
| \\[\x01-\x09\x0b\x0c\x0e-\x7f])+)
\])
Maybe this instead?

(╯°□°)╯︵ ┻━┻)
┬─┬ ノ( ゜-゜ノ)

(Let me put that back for you)
Brain Teaser
Which is a valid zipcode?
1. 30022
2. 30022-7155
3. 300131
4. -7155
5. AB123XY
Thinking About a Zipcode
● Digits only
● 5 digits mandatory plus optional 4 digit code
● 4 digit code prefixed with a hyphen
● Do other countries use zip codes?
● Pattern is easier because there is less variation (Thank USPS!)
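The rules above could be turned into a pattern along these lines. The slides do not give a zipcode expression, so this is only one possible sketch, assuming US ZIP and ZIP+4 formats; the names are illustrative.

```javascript
// Five digits, optionally followed by a hyphen and four more digits.
// The anchors require the whole string to match.
const zip = /^\d{5}(-\d{4})?$/;

// Candidates from the brain teaser slide.
const zipCandidates = ["30022", "30022-7155", "300131", "-7155", "AB123XY"];

// Keep only the candidates the pattern accepts.
const validZips = zipCandidates.filter((candidate) => zip.test(candidate));
// validZips: ["30022", "30022-7155"]
```

The pattern only checks the format. It does not check that a code is actually assigned, and postal codes in other countries follow different rules.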
Brain Teaser
Which is a valid URL?
1. http://www.clayton.edu
2. www.clayton.edu
3. clayton.edu
4. thehub.clayton.edu
5. ben:pass@clayton.edu:80/foo?bar=baz#qux
Thinking about a URL
Ben Simpson
thehoagie@gmail.com
@mrfrosti
Extra Credit
● IP address
● HTML Tag contents
● Validating a password against requirements
● Dates
● Times
