Search As A Service

Full-text searching with Marjory

Markus Wolff

What's Marjory?

- A webservice for full-text indexing and searching of documents
- Written in PHP
- Based on Zend Framework
- (Very) roughly comparable to Solr
- BSD-licensed, available on Google Code

How does Marjory work?

Your application -> Marjory:
    - sends document data, or the document's location, via POST
    - sends search terms via GET
Marjory -> Your application:
    - returns results in the desired output format (default: XML)

Marjory (ReST-based webservice) -> Search engine:
    - stores document data in the search engine
    - queries the search engine
Search engine -> Marjory:
    - returns query results

Search engine (default: Lucene)

Features

- Search engine abstraction
    - Use the engine that suits your needs; just write a small adaptor class
    - Zend_Search_Lucene built in by default
- Multiple search catalogs
    - Index many sites with one dedicated search server
    - Put all documents matching any criteria into separate search indexes to speed up search

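Marjory's own adaptor classes are PHP, and their interface is not shown on the slides. As a rough illustration of the idea only, here is a Python sketch in which every name (`SearchEngineAdaptor`, `InMemoryAdaptor`, `CatalogRegistry`) is hypothetical: each engine implements one small adaptor interface, and each catalog gets its own independent index instance.

```python
from abc import ABC, abstractmethod


class SearchEngineAdaptor(ABC):
    """Hypothetical adaptor interface: one subclass per search engine."""

    @abstractmethod
    def add(self, doc_id: str, fields: dict[str, str]) -> None: ...

    @abstractmethod
    def search(self, query: str) -> list[str]: ...


class InMemoryAdaptor(SearchEngineAdaptor):
    """Toy engine: a document matches if every query term occurs in some field."""

    def __init__(self) -> None:
        self._docs: dict[str, dict[str, str]] = {}

    def add(self, doc_id: str, fields: dict[str, str]) -> None:
        self._docs[doc_id] = fields

    def search(self, query: str) -> list[str]:
        terms = query.lower().split()
        return [
            doc_id
            for doc_id, fields in self._docs.items()
            if all(any(t in v.lower() for v in fields.values()) for t in terms)
        ]


class CatalogRegistry:
    """Keeps one independent index (adaptor instance) per catalog name."""

    def __init__(self, engine_factory) -> None:
        self._factory = engine_factory
        self._catalogs: dict[str, SearchEngineAdaptor] = {}

    def create(self, name: str) -> None:
        if name not in self._catalogs:
            self._catalogs[name] = self._factory()

    def get(self, name: str = "default") -> SearchEngineAdaptor:
        return self._catalogs[name]


registry = CatalogRegistry(InMemoryAdaptor)  # swap in another adaptor class here
registry.create("default")
registry.create("shop")
registry.get("default").add("doc1", {"title": "Marjory: Search as a service"})
registry.get("shop").add("p1", {"name": "Apple 60 GB iPod Black"})
print(registry.get("default").search("marjory"))  # ['doc1']
print(registry.get("shop").search("marjory"))     # []
```

Because each catalog is its own index, a query against one catalog never has to look at documents stored in another.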

                                     
More features

- Two ways to index documents:
    - Submit an XML snippet containing any content you want to index
    - Or just submit a URI (any valid PHP stream resource) and let Marjory extract the content from the document
        - HTML supported by default (for now)
        - Add your own document parser class to extract plain text from any other document format (or special markup structures)

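The slides do not show Marjory's parser interface. To illustrate what an HTML-to-plain-text parser class does, here is a Python sketch built on the standard library's `html.parser.HTMLParser`; `HtmlTextExtractor` and `extract_text` are hypothetical names. It keeps text nodes and drops the contents of `<script>` and `<style>`.

```python
from html.parser import HTMLParser


class HtmlTextExtractor(HTMLParser):
    """Collects visible text from an HTML document, skipping <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def extract_text(html: str) -> str:
    parser = HtmlTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


print(extract_text("<p>Hello <b>world</b></p><script>x = 1</script>"))  # Hello world
```

A parser for another format would expose the same "document in, plain text out" contract, which is what lets the indexer stay format-agnostic.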
                                         
Even more features

- Index documents asynchronously, using Dropr as a messaging service
    - Dropr: PHP-based durable messaging service
    - Example webservice and Dropr client included with Marjory
    - The application does not need to wait for document retrieval, parsing and adding to the index
    - More info: www.dropr.org

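Dropr itself is a PHP messaging service, and its API is not shown here. The decoupling it provides can be sketched with Python's in-memory `queue.Queue` and a worker thread: the application enqueues a message and moves on, while a worker does the slow parsing and indexing. Unlike Dropr, this queue is not durable (messages are lost if the process dies); `start_indexer` and the `(doc_id, raw)` message shape are assumptions for illustration.

```python
import queue
import threading


def start_indexer(jobs: queue.Queue, index: dict, parse) -> threading.Thread:
    """Start a worker that turns (doc_id, raw) messages into index entries."""

    def run() -> None:
        while True:
            msg = jobs.get()
            try:
                if msg is None:  # sentinel message: stop the worker
                    return
                doc_id, raw = msg
                index[doc_id] = parse(raw)  # slow work happens here, not in the caller
            finally:
                jobs.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


jobs: queue.Queue = queue.Queue()
index: dict = {}
worker = start_indexer(jobs, index, parse=str.upper)  # str.upper stands in for a real parser

# The application only enqueues messages and carries on immediately.
jobs.put(("doc1", "hello marjory"))
jobs.put(("doc2", "full-text search"))

jobs.join()     # demo only: block until both messages have been processed
jobs.put(None)  # ask the worker to stop
worker.join()
print(index)  # {'doc1': 'HELLO MARJORY', 'doc2': 'FULL-TEXT SEARCH'}
```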
        




                                   
Latest additions

- Search results as a Dojo.Data-compatible JSON data source
- API exposure via JSON-RPC as an alternative to XML over ReST (experimental!)

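A common Dojo.Data JSON source is the layout read by `dojo.data.ItemFileReadStore`: an object with `identifier`, `label` and an `items` array. The slides do not show Marjory's exact JSON, so this Python sketch merely assumes that layout; `to_dojo_data` is a hypothetical helper, and the sample hits are taken from the search response example in this deck.

```python
import json


def to_dojo_data(docs: list[dict], identifier: str = "id", label: str = "name") -> str:
    """Wrap search hits in the identifier/label/items layout used by ItemFileReadStore."""
    return json.dumps({"identifier": identifier, "label": label, "items": docs})


hits = [
    {"id": "MA147LL/A", "name": "Apple 60 GB iPod Black"},
    {"id": "EN7800GTX/2DHTV/256M", "name": "ASUS Extreme N7800GTX"},
]
print(to_dojo_data(hits))
```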




                               
How to add a catalog

- Send a POST request to:
    http://marjory.example.com/rest/catalog/
- Containing this XML snippet:
    <add catalog="MyGloriousCatalog" />
- Et voilà, you've got yourself a new search index

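The request above can be built with Python's standard library. This sketch uses the example host from the slide; the `Content-Type` header is an assumption, since the slides do not say which one Marjory expects, and `create_catalog_request` is a hypothetical helper.

```python
import urllib.request
from xml.sax.saxutils import quoteattr

MARJORY = "http://marjory.example.com/rest"  # example host from the slide


def create_catalog_request(name: str) -> urllib.request.Request:
    """Build the POST that asks Marjory to create a catalog called `name`."""
    body = f"<add catalog={quoteattr(name)} />".encode("utf-8")  # quoteattr adds quotes and escapes
    return urllib.request.Request(
        f"{MARJORY}/catalog/",
        data=body,
        method="POST",
        headers={"Content-Type": "text/xml; charset=utf-8"},  # assumed content type
    )


req = create_catalog_request("MyGloriousCatalog")
print(req.get_method(), req.full_url)  # POST http://marjory.example.com/rest/catalog/
print(req.data)                        # b'<add catalog="MyGloriousCatalog" />'
# Sending it would be: urllib.request.urlopen(req)
```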
    




                               
Adding a document

- Make a POST request to:
    http://marjory.example.com/rest/add/
- Send the document content as XML like this:

    <add catalog="default">
      <doc uri="MyUniqueDocumentId">
        <field name="title">Marjory: Search as a service</field>
        <field name="abstract">
          An epic novel about full-text indexing in an SOA environment
        </field>
        <field name="content">Lorem ipsum dolor sit amet... (to be continued)</field>
      </doc>
    </add>
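Building that payload by hand invites escaping bugs (an `&` or `<` in the content would break the XML). A minimal Python sketch using `xml.etree.ElementTree`, which escapes text and attribute values for you; `build_add_document` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def build_add_document(catalog: str, uri: str, fields: dict[str, str]) -> bytes:
    """Build the <add><doc uri=...><field name=...> payload for POST /rest/add/."""
    add = ET.Element("add", catalog=catalog)
    doc = ET.SubElement(add, "doc", uri=uri)
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="utf-8")


payload = build_add_document(
    "default",
    "MyUniqueDocumentId",
    {
        "title": "Marjory: Search as a service",
        "abstract": "An epic novel about full-text indexing in an SOA environment",
        "content": "Lorem ipsum dolor sit amet... (to be continued)",
    },
)
print(payload.decode("utf-8"))
```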


                                                
Adding a document, the easy way

- Or, if Marjory should retrieve and parse the document itself:

    <add catalog="default">
      <doc src="http://my.website.tld/my/document.html" />
    </add>

- If you have many and/or complex documents, you had better use Dropr to send messages to Marjory
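The "easy way" payload only carries a `src` attribute. This Python sketch assumes it is POSTed to the same `/rest/add/` endpoint as the previous slide and assumes a `text/xml` content type; `add_by_source_request` is a hypothetical helper.

```python
import urllib.request
import xml.etree.ElementTree as ET


def add_by_source_request(src: str, catalog: str = "default") -> urllib.request.Request:
    """POST <add catalog=...><doc src=... /></add> so Marjory fetches and parses src itself."""
    add = ET.Element("add", catalog=catalog)
    ET.SubElement(add, "doc", src=src)
    return urllib.request.Request(
        "http://marjory.example.com/rest/add/",  # example host; endpoint assumed
        data=ET.tostring(add, encoding="utf-8"),
        method="POST",
        headers={"Content-Type": "text/xml; charset=utf-8"},  # assumed content type
    )


req = add_by_source_request("http://my.website.tld/my/document.html")
print(req.data.decode("utf-8"))
```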


                                
Searching for documents

- Make a GET request including the query terms:
    http://marjory.example.com/rest/select?q=Marjory
- Additional parameters to...
    - Limit the number of results
    - Include only specific fields in the response
    - Specify a search catalog
        - Default catalog name: "default" (who would have guessed?)
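Only the `q` parameter is named on the slide. In the Python sketch below, the names for the other three parameters are assumptions: `rows` and `fl` are borrowed from Solr, and `catalog` is a guess, so check them against Marjory's documentation before use. `select_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE = "http://marjory.example.com/rest/select"  # example host from the slide


def select_url(q: str, limit=None, fields=None, catalog=None) -> str:
    """Build a search URL; urlencode takes care of escaping the query terms."""
    params = {"q": q}
    # Assumed parameter names: "rows" (limit), "fl" (fields), "catalog".
    if limit is not None:
        params["rows"] = str(limit)
    if fields:
        params["fl"] = ",".join(fields)
    if catalog:
        params["catalog"] = catalog
    return f"{BASE}?{urlencode(params)}"


print(select_url("Marjory"))  # http://marjory.example.com/rest/select?q=Marjory
```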


                                         
Search response format

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <responseHeader>
        <status>0</status><QTime>1</QTime>
      </responseHeader>
      <result numFound="2" start="0">
        <doc>
          <str name="id">MA147LL/A</str>
          <str name="name">Apple 60 GB iPod Black</str>
        </doc>
        <doc>
          <str name="id">EN7800GTX/2DHTV/256M</str>
          <str name="name">ASUS Extreme N7800GTX</str>
        </doc>
      </result>
    </response>
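A client can read this response with `xml.etree.ElementTree`. The Python sketch below treats `status` 0 as success (as Solr does) and only reads `<str>` fields, so typed fields such as numbers would need extra handling; `parse_select_response` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<response>
  <responseHeader><status>0</status><QTime>1</QTime></responseHeader>
  <result numFound="2" start="0">
    <doc><str name="id">MA147LL/A</str><str name="name">Apple 60 GB iPod Black</str></doc>
    <doc><str name="id">EN7800GTX/2DHTV/256M</str><str name="name">ASUS Extreme N7800GTX</str></doc>
  </result>
</response>"""


def parse_select_response(xml_bytes: bytes) -> tuple[int, list[dict[str, str]]]:
    """Return (numFound, docs); each doc maps a <str> field's name to its text."""
    root = ET.fromstring(xml_bytes)
    status = root.findtext("responseHeader/status")
    if status != "0":
        raise RuntimeError(f"search failed with status {status}")
    result = root.find("result")
    docs = [
        {s.get("name"): s.text or "" for s in doc.findall("str")}
        for doc in result.findall("doc")
    ]
    return int(result.get("numFound")), docs


num_found, docs = parse_select_response(SAMPLE)
print(num_found, docs[0]["name"])  # 2 Apple 60 GB iPod Black
```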


                                             
Looks familiar?

- Blatantly stolen from Solr :-)
- Why reinvent the wheel?
- Makes switching between the two projects easy, if need be
- Don't like it? Try JSON-RPC instead.
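The slides do not document the JSON-RPC method names or which protocol version Marjory speaks. Assuming a JSON-RPC 2.0 envelope, a request would look like the Python sketch below; the method name `select` and its parameters are hypothetical, chosen only to mirror the ReST endpoint.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC lets the client pick request ids


def jsonrpc_request(method: str, params: dict) -> str:
    """Build a JSON-RPC 2.0 request envelope."""
    return json.dumps({"jsonrpc": "2.0", "method": method, "params": params, "id": next(_ids)})


# Hypothetical method and parameter names.
payload = jsonrpc_request("select", {"q": "Marjory", "catalog": "default"})
print(payload)
```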
    




                                 
Access control

- No access control provided by Marjory
- Use your webserver's authentication and ACL capabilities
- There are currently no plans to add anything built in, unless someone convinces me otherwise :-)
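If, for example, the webserver in front of Marjory is configured for HTTP Basic authentication (an assumption; any scheme your server supports works), a client only needs to send the matching `Authorization` header. A minimal Python sketch with hypothetical credentials; since Basic credentials are only base64-encoded, not encrypted, this belongs behind HTTPS.

```python
import base64
import urllib.request


def with_basic_auth(req: urllib.request.Request, user: str, password: str) -> urllib.request.Request:
    """Add an HTTP Basic Authorization header for a webserver-protected Marjory."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", f"Basic {token}")
    return req


req = with_basic_auth(
    urllib.request.Request("http://marjory.example.com/rest/select?q=Marjory"),
    "indexer",  # hypothetical account
    "s3cret",   # hypothetical password
)
print(req.get_header("Authorization"))
```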



                               
Things to do

- Fully unit-test the beast
- Add a nice admin GUI (currently in progress)
- Add other engines
- Support more document formats out of the box (PDF is likely to be the next addition)
- Fine-tuning (how about renaming or removing catalogs, for example?)

Is it production-ready?

- Yes, and it's already being used on production websites

That's all, folks!

- More information:
    - http://code.google.com/p/marjory/
    - http://www.dropr.org/
    - http://blog.wolff-hamburg.de/