Machine Learning Methods
 for CAPTCHA Recognition
       Rachel Shadoan
       Zachery Tidwell, II
CAPTCHA
Completely Automated Public Turing Test to tell Computers and Humans Apart


Why are they interesting?
  o Harder than normal text recognition
         On par with handwriting recognition,
         reading damaged text
  o Techniques translate well to other problems
         Facial recognition (Gonzaga, 2002)
         Weed identification (Yang, 2000)
  o Near infinite data sets
         Easier to avoid over-fitting
Hypothesis

CAPTCHA recognition can be
 accomplished to a high degree
 of accuracy using machine
 learning methods with minimal
 preprocessing of inputs.
Methods
Tools
  o JCaptcha
  o Image Processing
Learning Methods
  o Feed-forward Neural Nets
  o Self-Organizing Maps
  o K-Means
  o Cluster Classification
Segmentation Methods
  o Overlapping
  o Whitespace
  o K-Means
JCaptcha

o Open-source CAPTCHA
  generation software
o Highly configurable
   Can produce CAPTCHAs of
   many levels of difficulty

o Check it out at:
  http://jcaptcha.sourceforge.net
Image Processing
Sparse Image
  Represents an image as an unbounded set of pixels
  Each pixel has a value between 0 and 1 and a
    coordinate pair
  Each image is centered before being converted into a
    matrix of 0s and 1s

[Figure: original image and the same image after transformation]
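The centering step above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `center_and_rasterize`, the 20 x 20 grid size, the 0.5 threshold, and the convention that values near 1 mean ink are all assumptions.

```python
def center_and_rasterize(pixels, size=20, threshold=0.5):
    """Turn a sparse image into a size x size matrix of 0s and 1s.

    pixels maps (x, y) -> intensity in [0, 1]; 1 is assumed to mean ink.
    """
    grid = [[0] * size for _ in range(size)]
    ink = [(x, y) for (x, y), v in pixels.items() if v >= threshold]
    if not ink:
        return grid
    xs = [x for x, _ in ink]
    ys = [y for _, y in ink]
    # Offset that moves the centre of the bounding box to the grid centre.
    dx = (size - 1) // 2 - (min(xs) + max(xs)) // 2
    dy = (size - 1) // 2 - (min(ys) + max(ys)) // 2
    for x, y in ink:
        gx, gy = x + dx, y + dy
        if 0 <= gx < size and 0 <= gy < size:  # drop pixels that fall outside
            grid[gy][gx] = 1
    return grid


# Two ink pixels far from the origin, plus one faint pixel below the threshold.
sparse = {(100, 200): 1.0, (101, 200): 0.9, (100, 201): 0.2}
grid = center_and_rasterize(sparse)
```

Because the sparse form stores coordinates rather than a fixed canvas, the glyph can sit anywhere in the source image and still end up in the middle of the matrix.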
Feed-Forward Neural Nets




      As covered in class
Self-Organizing Maps
Training
    Initialize N buckets to random values
    For each input
       Find the bucket that is “closest” to the input
       Adjust the “closest” bucket to more closely
       match the input using an exponential average
Collection
    For many inputs
       Sort each input into the bucket it most closely matches
    For each bucket and each character
       Calculate the probability of that character going into
       that bucket
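The training and collection phases above might look something like this in code. It is a sketch under assumptions: squared Euclidean distance stands in for "closest", the smoothing factor `alpha` is an invented parameter, and only the winning bucket is updated, as the slide describes (a textbook SOM would also nudge neighboring buckets).

```python
import random


def closest(buckets, vec):
    """Index of the bucket with the smallest squared Euclidean distance to vec."""
    return min(range(len(buckets)),
               key=lambda i: sum((b - v) ** 2 for b, v in zip(buckets[i], vec)))


def train(inputs, n_buckets, alpha=0.1, seed=0):
    """Training: move the winning bucket toward each input (exponential average)."""
    rng = random.Random(seed)
    dim = len(inputs[0])
    buckets = [[rng.random() for _ in range(dim)] for _ in range(n_buckets)]
    for vec in inputs:
        winner = buckets[closest(buckets, vec)]
        for j, v in enumerate(vec):
            winner[j] = (1 - alpha) * winner[j] + alpha * v
    return buckets


def collect(buckets, labelled):
    """Collection: per bucket, the fraction of its inputs carrying each label."""
    counts = [{} for _ in buckets]
    for vec, label in labelled:
        bucket_counts = counts[closest(buckets, vec)]
        bucket_counts[label] = bucket_counts.get(label, 0) + 1
    probs = []
    for bucket_counts in counts:
        total = sum(bucket_counts.values())
        probs.append({k: n / total for k, n in bucket_counts.items()} if total else {})
    return probs
```

At classification time, a new chunk would be sorted into its closest bucket and assigned the character with the highest collected probability for that bucket.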
K-Means
• Very similar to Self-Organizing Maps (SOMs)
• Can use the same classifying mechanism
  as used for SOMs
Overlapping Segmentation
• Divide image into
  fixed number of
  overlapping tiles of
  the same size
• In our case, 20 x 20
  pixels with a 50%
  overlap
• Discard chunks
  under a certain size
  and chunks that are
  all white

Note: This is a B with part of it cut off, not
an E. Therein lies the rub.
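A rough sketch of the tiling described above, assuming a binary image stored as rows of 0s and 1s (1 = ink). The function name and the `min_ink` cutoff are hypothetical; "under a certain size" is interpreted here as "fewer ink pixels than `min_ink`", which may differ from the authors' rule.

```python
def overlapping_tiles(image, tile=20, overlap=0.5, min_ink=1):
    """Slide a tile x tile window over the image with the given overlap.

    Returns a list of (x, y, rows) for every tile holding at least min_ink
    ink pixels; all-white tiles are therefore dropped when min_ink >= 1.
    """
    step = max(1, int(tile * (1 - overlap)))  # 20 px tiles, 50% overlap -> 10
    height, width = len(image), len(image[0])
    kept = []
    for y in range(0, max(height - tile, 0) + 1, step):
        for x in range(0, max(width - tile, 0) + 1, step):
            rows = [row[x:x + tile] for row in image[y:y + tile]]
            if sum(map(sum, rows)) >= min_ink:
                kept.append((x, y, rows))
    return kept


# A 20 x 40 image with a single ink pixel near the right edge.
image = [[0] * 40 for _ in range(20)]
image[5][35] = 1
tiles = overlapping_tiles(image)
```

Tiles are cut at fixed positions regardless of where letters fall, so a tile can easily contain a clipped letter, as in the B-versus-E note above.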
Whitespace Segmentation
• Iterate through the
  image from left to
  right—segment
  when a full column
  of whitespace is
  encountered
• Works perfectly for
  well-spaced text
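The column scan above can be sketched as follows, assuming a binary image stored as rows of 0s and 1s (1 = ink). The function name is illustrative only.

```python
def whitespace_segments(image):
    """Split at fully blank columns.

    Returns a list of (start_col, end_col) pairs, end exclusive, one per
    run of consecutive columns that contain at least one ink pixel.
    """
    width = len(image[0])
    segments = []
    start = None
    for x in range(width):
        blank = all(row[x] == 0 for row in image)
        if not blank and start is None:
            start = x                      # a new segment begins
        elif blank and start is not None:
            segments.append((start, x))    # a blank column closes the segment
            start = None
    if start is not None:                  # segment running to the right edge
        segments.append((start, width))
    return segments


image = [
    [0, 1, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 1],
]
segments = whitespace_segments(image)
```

If two letters touch or overlap, no blank column separates them and they come out as a single segment, which is why this method only suits well-spaced text.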
K-Means Segmentation
• Performs better
  than heuristic
  segmentation on
  closely-packed
  inputs
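One way K-Means could be applied to segmentation is to cluster the ink pixels by position, one cluster per expected character. The sketch below assumes that approach; the slides do not say how the authors set it up. The number of characters `k` is assumed to be known in advance, and the evenly spaced starting centroids are a choice made here for determinism.

```python
def kmeans_segment(image, k, iterations=20):
    """Cluster ink pixels by (x, y) position; returns k lists of coordinates."""
    ink = [(x, y) for y, row in enumerate(image) for x, v in enumerate(row) if v]
    height, width = len(image), len(image[0])
    # Deterministic start: centroids spread evenly from left to right.
    centroids = [((i + 0.5) * width / k, height / 2) for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x, y in ink:
            nearest = min(range(k),
                          key=lambda i: (x - centroids[i][0]) ** 2
                                        + (y - centroids[i][1]) ** 2)
            clusters[nearest].append((x, y))
        for i, pts in enumerate(clusters):
            if pts:  # an empty cluster keeps its previous centroid
                centroids[i] = (sum(p[0] for p in pts) / len(pts),
                                sum(p[1] for p in pts) / len(pts))
    return clusters


image = [[0, 1, 1, 0, 0, 0, 1, 1, 1, 0] for _ in range(3)]
clusters = kmeans_segment(image, k=2)
```

Unlike the whitespace scan, this assigns every ink pixel to a character even when no blank column separates neighboring glyphs.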
Segmentation Comparison
[Figure: two example CAPTCHAs, each segmented with even-width, whitespace, and K-Means segmentation]
Experiment 1
Machine Learning Method:
  Self-Organizing Map
Topology:
  200 buckets, initialized randomly
Inputs:
  3-letter CAPTCHAs
  Random fonts
  Letters A-G
  “Chunked” using overlapping segmentation
Experiment 1 Results
Buckets fell into three primary categories:

  Distinguishable
  letters


  Chunks with halves
  of two letters

  Indistinguishable
  noise
Experiment 1 Results
Experiment 2
ML Method:
  Neural Net
Topology:
  Fully connected
  400 inputs
  50 node hidden layer
  7 outputs

[Figure: fully connected network with 400 input nodes, 50 hidden nodes, and 7 output nodes. Each output answers "Contains … ?" for one letter, A through G, with a value of 0 or 1.]

Inputs:
  Single-letter CAPTCHAs
  Random fonts
  Letters A-G
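The 400-50-7 topology can be illustrated with a forward pass. This is a sketch of the network's shape only: the sigmoid activation, the weight range, and the 0.5 decision threshold are assumptions, and the weights here are random and untrained (training by backpropagation is omitted).

```python
import math
import random

LETTERS = "ABCDEFG"


def init_layer(n_in, n_out, rng):
    """One weight per input plus a trailing bias, for each output node."""
    return [[rng.uniform(-0.05, 0.05) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def forward(layer, inputs):
    """Fully connected layer with a sigmoid activation (assumed)."""
    outputs = []
    for weights in layer:
        # zip stops at len(inputs), so the trailing bias is added separately.
        z = weights[-1] + sum(w * x for w, x in zip(weights, inputs))
        outputs.append(1.0 / (1.0 + math.exp(-z)))
    return outputs


def predict(hidden, output, pixels):
    """Map 400 pixel values to a 0-or-1 'contains this letter' flag per letter."""
    activations = forward(output, forward(hidden, pixels))
    return {letter: int(a >= 0.5) for letter, a in zip(LETTERS, activations)}


rng = random.Random(0)
hidden = init_layer(400, 50, rng)   # 400 inputs -> 50 hidden nodes
output = init_layer(50, 7, rng)     # 50 hidden nodes -> 7 outputs
result = predict(hidden, output, [0] * 400)
```

The 400 inputs match a flattened 20 x 20 chunk, and treating each output as an independent yes-or-no flag mirrors the "Contains … ?" labels on the slide.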
Experiment 2 Results




     Neural Net Learning Curve
Experiment 2 Results

Past a certain number of nodes in the hidden layer, the
topology ceases to have a huge impact on accuracy.

[Figure: Neural Net Accuracy vs. Size of Hidden Layer]
Experiment 3
ML Method:
  SOM
Topology:
  500 buckets
ML Method:
  Neural Net
Topology:
  Fully connected
  400 inputs
  1000 node hidden layer
  7 outputs
Inputs:
  4-letter CAPTCHAs
  Random fonts
  Letters A-G
Experiment 3




Neural Net vs. SOM on CAPTCHAs Length 4, Letters A‐G
Experiment 4
ML Method:
  SOM
Topology:
  500 buckets
ML Method:
  Neural Net
Topology:
  Fully connected
  400 inputs
  1000 node hidden layer
  7 outputs
Inputs:
  4-letter CAPTCHAs
  Random fonts
  Letters A-Z
Experiment 4




Neural Net vs. SOM on CAPTCHAs Length 4, Letters A‐Z
Experiment 5
ML Method:
  SOM
Topology:
  500 buckets
ML Method:
  Neural Net
Topology:
  Fully connected
  400 inputs
  1000 node hidden layer
  7 outputs
Inputs:
  5-letter CAPTCHAs
  Random fonts
  Letters A-Z
Experiment 5




Neural Net vs. SOM on CAPTCHAs Length 5, Letters A-Z
What it all means
• Increasing number of characters
  dramatically decreases total accuracy
  because segmentation quality decreases
• True positive rate goes down when
  segmentation quality decreases
• Hence, better segmentation is the key
Future Work
Improved Segmentation
   o Wirescreen segmentation
   o Ensemble techniques
Improved True Positive Rates with Current
  System
   o Ensemble techniques
New problems
   o Handwriting recognition
   o Bot net of doom
Questions?
