Introduction
   HTML parser choice
HTML5::Sanitizer internals
 HTML5::Sanitizer usage
             Conclusion




         HTML5::Sanitizer
  Sanitizing HTML 5 with Perl 5


                 Uwe Voelker

                     XING AG


            August 16th 2011




            Uwe Voelker    HTML5::Sanitizer
1   Introduction

2   HTML parser choice

3   HTML5::Sanitizer internals

4   HTML5::Sanitizer usage

5   Conclusion




1   Introduction
       Task: WYSIWYG editor
       Team
       Live example

Task: WYSIWYG editor



     integrate WYSIWYG editor in XING
     frontend architect researched open source solutions
     none was suitable, mostly for security reasons
     decision was made to build it in-house
     goals: secure, share profiles (allowed tags) between frontend
     and backend




Team




 Christopher Blum        Ingo Chao                           Uwe Voelker
 JavaScript              QA (HTML5/CSS)                      Perl


Live example




2   HTML parser choice
     CPAN modules
     Evaluation
     Final decision

HTML parsers on CPAN



     HTML::Parser
     HTML::TreeBuilder
     HTML::TreeBuilder::LibXML
     XML::LibXML
     HTML::HTML5::Parser
     Marpa::HTML
     ...




started with HTML::HTML5::Parser (HH5P)
because it understands the semantics of HTML 5 tags
but it also decoded "&copy" inside a URL as the entity for ©:
    input:  http://example.com/?section=2&copy=3&lang=en
    output: http://example.com/?section=2©=3&lang=en
final choice: XML::LibXML
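
The "&copy" effect can be reproduced without any parser. The sketch below is an illustration only: decode_legacy_entities is a hypothetical stand-in that mimics how HTML 5 style tokenizing treats a few legacy entity names whose trailing semicolon is optional in text content. It is not HH5P's code, and the three-entry table is an assumption made for brevity. It also shows that writing the ampersands as "&amp;" in the markup keeps the URL intact.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parser's entity decoding. Only three
# names are covered; a real HTML 5 parser knows many more.
my %LEGACY = (amp => '&', copy => "\x{A9}", lt => '<');

sub decode_legacy_entities {
    my ($text) = @_;
    # The trailing semicolon is optional for these legacy names.
    $text =~ s/&(amp|copy|lt);?/$LEGACY{$1}/g;
    return $text;
}

my $raw = 'http://example.com/?section=2&copy=3&lang=en';

# The same URL as it should be written inside HTML markup.
(my $escaped = $raw) =~ s/&/&amp;/g;

my $broken = decode_legacy_entities($raw);      # "&copy" becomes U+00A9
my $intact = decode_legacy_entities($escaped);  # round-trips to $raw

binmode STDOUT, ':encoding(UTF-8)';
print "$broken\n";
print "$intact\n";
```

Only "&copy" is affected here because "&section" and "&lang" do not start with one of the names in the table.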




3   HTML5::Sanitizer internals
     Processing Phases
     Parsing
     Converting
     Writing

Processing phases




      preprocessing (e.g. migration)
      parsing (HTML → DOM tree)
      converting (rebuild tree according to profile)
      writing (DOM tree → HTML)




Parsing HTML with XML::LibXML

  use XML::LibXML;

  my $parser = XML::LibXML->new(
      encoding          => 'UTF-8',
      recover           => 2,
      keep_blanks       => 1,
      no_cdata          => 1,
      expand_entities   => 1,
      no_network        => 1,
      suppress_errors   => 1,
      suppress_warnings => 1,
  );

  my $doc = $parser->parse_html_string(
      $html,
      {
          no_cdata          => 1,
          suppress_errors   => 1,
          suppress_warnings => 1,
      },
  );




Converting - rebuilding DOM tree



     loop through every node (only ELEMENT and TEXT)
     drop unwanted elements completely (e.g. <script>)
     change unknown elements to <span>
     possibly change tag name (profile)
     transform (or copy) attributes
     proceed recursively with child nodes
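
The steps above can be sketched on a plain data structure. This is a minimal sketch under stated assumptions, not the module's implementation: nested hashes stand in for the XML::LibXML DOM, %profile and convert are hypothetical names, the rule keys are spelled with underscores, and attributes without a check are simply dropped.

```perl
use strict;
use warnings;

# Assumed tree shape: a text node is a plain string, an element is
# { tag => ..., attrs => { ... }, children => [ ... ] }.
# Assumed profile shape: tag name => rule hashref (no entry = unknown tag).
my %profile = (
    p      => {},
    script => { remove => 1 },
    big    => { rename_tag => 'span', set_class => 'big' },
    a      => {
        check_attributes => {
            href => sub { $_[0] =~ m{^https?://} ? $_[0] : undef },
        },
        set_attributes => { rel => 'nofollow' },
    },
);

sub convert {
    my ($node) = @_;
    return $node unless ref $node;        # TEXT node: copied unchanged

    my $rule = $profile{ $node->{tag} };
    return if $rule && $rule->{remove};   # drop the element and its subtree

    # Unknown elements become <span>; known ones may be renamed.
    my $tag = $rule ? ($rule->{rename_tag} || $node->{tag}) : 'span';
    $rule ||= {};

    # Copy only attributes that pass a check, then add the fixed ones.
    my %attrs;
    my $checks = $rule->{check_attributes} || {};
    for my $name (sort keys %$checks) {
        my $old = $node->{attrs}{$name};
        next unless defined $old;
        my $new = $checks->{$name}->($old);
        $attrs{$name} = $new if defined $new;
    }
    %attrs = (%attrs, %{ $rule->{set_attributes} || {} });
    $attrs{class} = $rule->{set_class} if defined $rule->{set_class};

    # Recurse; a removed child returns an empty list and disappears.
    my @children = map { convert($_) } @{ $node->{children} || [] };
    return { tag => $tag, attrs => \%attrs, children => \@children };
}

my $tree = {
    tag      => 'p',
    attrs    => { onclick => 'evil()' },
    children => [
        'Hi ',
        { tag => 'script', attrs => {}, children => ['alert(1)'] },
        { tag => 'blink',  attrs => {}, children => ['there'] },
        { tag => 'a', attrs => { href => 'javascript:x' }, children => ['link'] },
    ],
};
my $clean = convert($tree);
```

In this sketch the result is a new tree, so nothing from the input survives unless a rule explicitly lets it through.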




Writing HTML

     mainly for additional escapes
     could not find a nice way to integrate this in XML::LibXML

  $text =~ s/&/&amp;/g;
  $text =~ s/'/&#39;/g;
  $text =~ s/"/&quot;/g;
  $text =~ s/</&lt;/g;
  $text =~ s/>/&gt;/g;
  $text =~ s/`/&#96;/g;
  $text =~ s/\{/&#123;/g;
  $text =~ s/\}/&#125;/g;
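
For illustration, the substitutions listed above can be wrapped in one function and tried on a small string. escape_text is a hypothetical name chosen for this sketch; the order matters, since the ampersand has to be handled before any entity is added.

```perl
use strict;
use warnings;

# Hypothetical helper; the eight substitutions are the ones from the slide.
sub escape_text {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;     # first, or the "&" of later entities is escaped again
    $text =~ s/'/&#39;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/`/&#96;/g;
    $text =~ s/\{/&#123;/g;   # braces escaped so they are always literal
    $text =~ s/\}/&#125;/g;
    return $text;
}

my $escaped = escape_text(q[<b title="x">Tom & 'Jerry' {{tpl}} `x`</b>]);
print "$escaped\n";
# prints: &lt;b title=&quot;x&quot;&gt;Tom &amp; &#39;Jerry&#39; &#123;&#123;tpl&#125;&#125; &#96;x&#96;&lt;/b&gt;
```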


4   HTML5::Sanitizer usage
     Usage
     Profile
     Examples
     Debugging

Usage



 # construct object
 my $sanitizer = HTML5::Sanitizer->new(
     profile => 'My::Profile',
 );

 # call process()
 my $clean = $sanitizer->process($html);




Profile


     you have to build your own
     class with just one method: element($tag)
     return undef or a hashref with:
       remove            remove complete subtree (boolean)
       rename_tag        rename tag (string)
       set_attributes    set these attributes (hashref)
       check_attributes  check/transform these attributes (hashref)
       set_class         set class (string)
       add_class         add class from other attributes (hashref)
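
A profile class along these lines might look as follows. This is a sketch under assumptions, not code from the module: the key names are assumed to be spelled with underscores, an empty hashref is assumed to mean "keep the element as it is", and the constructor is only there so the class can be used either by name or as an object.

```perl
use strict;
use warnings;

package My::Profile;    # class name as used on the usage slide

# Assumed rule table, looked up by lower-case tag name.
my %RULES = (
    p      => {},
    script => { remove => 1 },
    big    => { rename_tag => 'span', set_class => 'big' },
    a      => { set_attributes => { rel => 'nofollow', target => '_blank' } },
);

sub new { return bless {}, shift }

# The single method a profile has to provide.
sub element {
    my ($self, $tag) = @_;
    return $RULES{ lc $tag };    # undef for tags the profile does not know
}

package main;

my $profile = My::Profile->new;
for my $tag (qw(script big blink)) {
    my $rule = $profile->element($tag);
    printf "%-6s => %s\n", $tag,
        defined $rule ? join(',', sort keys %$rule) : 'undef';
}
```

Keeping the rules in one hash makes it easier to export the same list of allowed tags to the frontend, which was one of the stated goals.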



Examples - script



      completely remove <script> (including all children)

  {
      remove => 1,
  }

      otherwise it would be converted to <span>
      and all children processed recursively




Examples - big



      <big> → <span class="big">

  {
      rename_tag => 'span',
      set_class  => 'big',
  }




Examples - a



      add rel="nofollow" and target="_blank" to every link

  {
      set_attributes => {
          rel    => 'nofollow',
          target => '_blank',
      },
  }




Examples - font
  rename_tag => 'span',
  add_class  => { size => 'size_font' },

  sub class_size_font {
      my ($self, $val) = @_;
      return unless $val;
      return 'size-xx-large' if $val eq '7';
      # ...
      return 'size-xx-small' if $val eq '1';

      return 'size-larger'  if $val =~ /^\+/;
      return 'size-smaller' if $val =~ /^-/;
      return;
  }
Debugging

        if the result is not as expected, you can access intermediate
        results:

  my $res = $sanitizer->process($html, { return_result

  # see HTML5::Sanitizer::Result
  say $res->input;
  say $res->preprocessed;
  say $res->parsed_doc->toString;
  say $res->converted_doc->toString;
  say $res->output;

  print $res->debug_output;

Repositories



      HTML5::Sanitizer (backend)
      http://github.com/xing/html5-sanitizer
      wysihtml5 (JavaScript frontend)
      http://github.com/xing/wysihtml5
      Feedback? uwe@uwevoelker.de




Questions?




Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Dernier (20)

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

Sanitizing HTML 5 with Perl 5

  • 1. HTML5::Sanitizer: Sanitizing HTML 5 with Perl 5. Uwe Voelker, XING AG, August 16th 2011. Sections: Introduction; HTML parser choice; HTML5::Sanitizer internals; HTML5::Sanitizer usage; Conclusion.
  • 2. Outline: 1 Introduction; 2 HTML parser choice; 3 HTML5::Sanitizer internals; 4 HTML5::Sanitizer usage; 5 Conclusion
  • 3. Section 1, Introduction: Task: WYSIWYG editor; Team; Live example
  • 6. Task: WYSIWYG editor. Integrate a WYSIWYG editor into XING; the frontend architect researched open source solutions; none was suitable, mostly for security reasons; the decision was made to build it in-house; goals: secure, and share profiles (allowed tags) between frontend and backend.
  • 7. Team: Christopher Blum (JavaScript); Ingo Chao (QA, HTML5/CSS); Uwe Voelker (Perl)
  • 8. Live example
  • 9. Section 2, HTML parser choice: CPAN modules; Evaluation; Final decision
  • 10. HTML parsers on CPAN: HTML::Parser, HTML::TreeBuilder, HTML::TreeBuilder::LibXML, XML::LibXML, HTML::HTML5::Parser, Marpa::HTML, ...
  • 15. Started with HTML::HTML5::Parser (HH5P) because it understands the semantics of HTML 5 tags. But it also turned http://example.com/?section=2&copy=3&lang=en into http://example.com/?section=2&copy;=3&lang=en, reading the bare &copy in the query string as an entity. Final choice: XML::LibXML.
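
The URL problem above can be reproduced with a toy decoder. The sketch below is not HH5P's code; the tiny entity table and both function names are made up for illustration. It only shows why accepting named references without a trailing semicolon corrupts a query string that happens to contain `&copy`.

```perl
use strict;
use warnings;

# Toy entity table; a real parser knows far more named references.
my %NAMED = (amp => '&', lt => '<', gt => '>', copy => "\x{A9}");

# Lenient: the trailing semicolon is optional.
sub decode_lenient {
    my ($text) = @_;
    $text =~ s/&(amp|lt|gt|copy);?/$NAMED{$1}/g;
    return $text;
}

# Strict: only references terminated by a semicolon are decoded.
sub decode_strict {
    my ($text) = @_;
    $text =~ s/&(amp|lt|gt|copy);/$NAMED{$1}/g;
    return $text;
}

my $url = 'http://example.com/?section=2&copy=3&lang=en';

binmode STDOUT, ':encoding(UTF-8)';
print decode_lenient($url), "\n";   # "&copy" is swallowed and becomes U+00A9
print decode_strict($url),  "\n";   # the URL is left untouched
```

Writing the URL with `&amp;` in the markup avoids the ambiguity regardless of which parser is used.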
  • 16. Section 3, HTML5::Sanitizer internals: Processing phases; Parsing; Converting; Writing
  • 20. Processing phases: (1) preprocessing (e.g. migration); (2) parsing (HTML → DOM tree); (3) converting (rebuild the tree according to the profile); (4) writing (DOM tree → HTML)
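
The four phases can be pictured as a small pipeline that keeps every intermediate result. This is only a sketch of the idea: `run_phases` and the toy phases, which work on a list of words instead of a DOM tree, are hypothetical and not taken from HTML5::Sanitizer.

```perl
use strict;
use warnings;

# Run the four phases in order and keep every intermediate result.
# The phase implementations are passed in as code references.
sub run_phases {
    my ($input, %phase) = @_;
    my %result = (input => $input);
    $result{preprocessed} = $phase{preprocess}->($result{input});
    $result{parsed}       = $phase{parse}->($result{preprocessed});
    $result{converted}    = $phase{convert}->($result{parsed});
    $result{output}       = $phase{write}->($result{converted});
    return \%result;
}

# Toy phases: normalise line endings, split into words,
# drop an unwanted word, join the rest again.
my $res = run_phases(
    "Hello\r\nevil world",
    preprocess => sub { my ($s) = @_; $s =~ s/\r\n/\n/g; return $s },
    parse      => sub { return [ split ' ', $_[0] ] },
    convert    => sub { return [ grep { $_ ne 'evil' } @{ $_[0] } ] },
    write      => sub { return join ' ', @{ $_[0] } },
);

print "$res->{output}\n";
```

Keeping the intermediate values around makes it possible to inspect each phase separately when the output is not what was expected.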
  • 21. Parsing HTML with XML::LibXML:
        use XML::LibXML;
        my $parser = XML::LibXML->new(
            encoding          => 'UTF-8',
            recover           => 2,
            keep_blanks       => 1,
            no_cdata          => 1,
            expand_entities   => 1,
            no_network        => 1,
            suppress_errors   => 1,
            suppress_warnings => 1,
        );
  • 22. Parsing HTML with XML::LibXML:
        my $doc = $parser->parse_html_string(
            $html,
            {
                no_cdata          => 1,
                suppress_errors   => 1,
                suppress_warnings => 1,
            },
        );
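
To see what the recovering parser does with broken markup, a quick check like the following can help. It is a sketch that assumes XML::LibXML is installed and uses only a subset of the options shown on the slides. The sample fragment is made up, and the exact tree may depend on the libxml2 version, so no output is shown.

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new(
    recover           => 2,   # keep going on malformed input
    suppress_errors   => 1,
    suppress_warnings => 1,
);

# Deliberately broken fragment: <b> is never closed.
my $html = '<p>Hello <b>world</p><p>second paragraph</p>';
my $doc  = $parser->parse_html_string($html);

# Print every paragraph the parser recovered, with its text content.
for my $p ($doc->findnodes('//p')) {
    printf "%s: %s\n", $p->nodeName, $p->textContent;
}
```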
  • 26. Converting - rebuilding the DOM tree: loop through every node (only ELEMENT and TEXT nodes); drop unwanted elements completely (e.g. <script>); change unknown elements to <span>; possibly change the tag name (according to the profile); transform (or copy) attributes; proceed recursively with the child nodes
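
The conversion steps can be sketched as a recursive function. To stay self-contained, this sketch models the DOM as nested hashes rather than XML::LibXML nodes. The profile table is made up, only the rule keys `remove`, `rename_tag` and `set_attributes` follow the slides, and checking or copying of incoming attributes is left out.

```perl
use strict;
use warnings;

# Toy profile: tag name => rule (hypothetical table).
my %PROFILE = (
    p      => {},
    a      => { set_attributes => { rel => 'nofollow' } },
    big    => { rename_tag => 'span', set_attributes => { class => 'big' } },
    script => { remove => 1 },
);

# Rebuild one node; returns a new node, or nothing if the node is dropped.
sub convert {
    my ($node) = @_;

    # Text nodes are copied as they are.
    return { text => $node->{text} } if exists $node->{text};

    my $rule = $PROFILE{ $node->{tag} };
    return if $rule && $rule->{remove};      # drop the whole subtree

    my $tag = !$rule ? 'span'                # unknown element
            : $rule->{rename_tag} // $node->{tag};

    # Incoming attributes are not copied in this sketch; only the
    # attributes named by the rule are set.
    my %attrs    = %{ ($rule && $rule->{set_attributes}) || {} };
    my @children = map { convert($_) } @{ $node->{children} || [] };

    return { tag => $tag, attrs => \%attrs, children => \@children };
}

my $tree = {
    tag      => 'p',
    children => [
        { text => 'Hi ' },
        { tag => 'big',    children => [ { text => 'there' } ] },
        { tag => 'script', children => [ { text => 'alert(1)' } ] },
        { tag => 'blink',  attrs => { onclick => 'x()' }, children => [ { text => '!' } ] },
    ],
};

my $clean = convert($tree);
printf "%d children kept\n", scalar @{ $clean->{children} };
```

Because a new tree is built from scratch, anything the profile does not explicitly produce (such as the `onclick` attribute) never reaches the output.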
  • 28. Writing HTML: a custom writer is used, mainly for additional escapes; could not find a nice way to integrate this in XML::LibXML:
        $text =~ s/&/&amp;/g;     # must come first
        $text =~ s/'/&#39;/g;
        $text =~ s/"/&quot;/g;
        $text =~ s/</&lt;/g;
        $text =~ s/>/&gt;/g;
        $text =~ s/`/&#96;/g;
        $text =~ s/\{/&#123;/g;
        $text =~ s/\}/&#125;/g;
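
The same escapes can also be applied in a single pass with a lookup table. This is an alternative sketch rather than the module's code, and `escape_text` is a hypothetical name. Each character is examined once, so the order of the entries does not matter.

```perl
use strict;
use warnings;

my %ESCAPE = (
    '&' => '&amp;',  '<' => '&lt;',   '>' => '&gt;',
    '"' => '&quot;', "'" => '&#39;',  '`' => '&#96;',
    '{' => '&#123;', '}' => '&#125;',
);

# Single-pass variant: an "&" produced by one replacement is never
# seen again, so nothing gets escaped twice.
sub escape_text {
    my ($text) = @_;
    $text =~ s/([&<>"'`{}])/$ESCAPE{$1}/g;
    return $text;
}

print escape_text(q{<a href="x">{{ 'a' & `b` }}</a>}), "\n";
```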
  • 29. Section 4, HTML5::Sanitizer usage: Usage; Profile; Examples; Debugging
  • 30. Usage:
        # construct object
        my $sanitizer = HTML5::Sanitizer->new(
            profile => 'My::Profile',
        );
        # call process()
        my $clean = $sanitizer->process($html);
  • 33. Profile: you have to build your own. It is a class with just one method, element($tag), which returns undef or a hashref with:
        remove            remove the complete subtree (boolean)
        rename_tag        rename the tag (string)
        set_attributes    set these attributes (hashref)
        check_attributes  check/transform these attributes (hashref)
        set_class         set the class (string)
        add_class         add a class derived from other attributes (hashref)
  • 36. Examples - script: completely remove <script> (including all children) with { remove => 1 }; otherwise it would be converted to <span> and all its children processed recursively
  • 38. Examples - big: <big> → <span class="big"> with { rename_tag => 'span', set_class => 'big' }
  • 40. Examples - a: add rel="nofollow" and target="_blank" to every link with { set_attributes => { rel => 'nofollow', target => '_blank' } }
  • 42. Examples - font:
        rename_tag => 'span',
        add_class  => { size => 'size_font' },

        sub class_size_font {
            my ($self, $val) = @_;
            return unless $val;
            return 'size-xx-large' if $val eq '7';
            # ...
            return 'size-xx-small' if $val eq '1';
            return 'size-larger'   if $val =~ /^\+/;   # relative sizes such as "+1"
            return 'size-smaller'  if $val =~ /^-/;
            return;
        }
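
Put together, the script, big and a examples above could live in one profile class. This is a minimal sketch under assumptions: that `element` is called as a class method, that an empty hashref means "keep the element unchanged", and that `undef` marks an unknown tag. The font rule is left out because its size table is elided on the slide.

```perl
package My::Profile;
use strict;
use warnings;

# Rules per tag, using the keys listed on the slides. The concrete
# table is an illustration, not the profile used at XING.
my %RULES = (
    p      => {},
    script => { remove => 1 },
    big    => { rename_tag => 'span', set_class => 'big' },
    a      => { set_attributes => { rel => 'nofollow', target => '_blank' } },
);

# The single method a profile has to provide.
sub element {
    my ($class, $tag) = @_;
    return $RULES{ lc $tag };   # undef for tags the profile does not know
}

package main;

for my $tag (qw(script big a blink)) {
    my $rule = My::Profile->element($tag);
    printf "%-6s => %s\n", $tag,
        $rule ? join(', ', sort keys %$rule) || '(keep as is)' : 'undef';
}
```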
  • 43. Debugging: if the result is not as expected, you can access the intermediate results (the first line is cut off on the slide):
        my $res = $sanitizer->process($html, { return_result ...
        # see HTML5::Sanitizer::Result
        say $res->input;
        say $res->preprocessed;
        say $res->parsed_doc->toString;
        say $res->converted_doc->toString;
        say $res->output;
        print $res->debug_output;
  • 46. Repositories: HTML5::Sanitizer (backend), http://github.com/xing/html5-sanitizer; wysihtml5 (JavaScript frontend), http://github.com/xing/wysihtml5. Feedback? uwe@uwevoelker.de
  • 47. Questions?