Enterprise search in Plone using Solr

Out of the box, Plone includes an integrated and powerful search engine with features such as live search and full-text indexing. Sometimes this isn't enough, or you need more robust search features to give your site visitors a more customized search experience. In this talk, Six Feet Up CTO Calvin Hendryx-Parker goes into the details of implementing Solr with Plone for a large project. Solr is an enterprise search engine that can be deployed alongside Plone. Topics to be discussed include weighted search, thesaurus, spell check, flexible query parsing, faster search performance, and more.

Enterprise Search in Plone using Solr
Calvin Hendryx-Parker
Plone Symposium East 2010

nowhere to go but
open source
sixfeetup.com

What is Solr?
• Java Based
• Full-Text Search
• Web Services API
• Standards Based Interfaces
• Scalable
• XML Configuration
• Extensible

sixfeetup.com/deploy

Playing with Solr

• Indexing
• Query
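The notes for this slide describe the interface as XML over HTTP, with an HTTP GET returning XML results. A minimal sketch of both halves follows. It assumes a Solr instance at the hypothetical address http://localhost:8983/solr and made-up field names (id, title); it only builds the request payload and URL and does not contact a server.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

SOLR_URL = "http://localhost:8983/solr"  # assumed address, adjust to your setup


def build_add_xml(doc):
    """Build the XML body of an update request that adds one document."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_select_url(query, rows=10):
    """Build the GET URL for a query; wt=xml asks for an XML response."""
    params = urlencode({"q": query, "rows": rows, "wt": "xml"})
    return "%s/select?%s" % (SOLR_URL, params)


# Field names and values here are illustrative only.
add_xml = build_add_xml({"id": "doc-1", "title": "Plone and Solr"})
select_url = build_select_url("title:plone")
```

In a real setup the XML body would be POSTed to Solr's update handler and followed by a commit before the document shows up in query results.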

Solr Features
• Data Schema
• Faceted Search
• Administrative Interface
• Incremental Updates
• Supports Sharding
• Index Databases, Local Files and Web Pages
• Supports Multiple Indexes
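As an illustration of faceted search, the sketch below builds the request parameters for a faceted query. The `facet`, `facet.field`, and `facet.mincount` parameters are standard Solr request parameters; the field names `portal_type` and `Subject` are assumptions borrowed from common Plone catalog indexes and would have to exist in your Solr schema.

```python
from urllib.parse import urlencode


def facet_params(query, facet_fields, min_count=1):
    """Return request parameters for a faceted query as (name, value) pairs."""
    params = [
        ("q", query),
        ("facet", "true"),
        ("facet.mincount", str(min_count)),
    ]
    # facet.field may be repeated, once per field whose values should be counted.
    params.extend(("facet.field", name) for name in facet_fields)
    return params


# Hypothetical field names; replace with fields from your own schema.
query_string = urlencode(facet_params("plone", ["portal_type", "Subject"]))
```

A list of pairs is used instead of a dictionary because the same parameter name appears more than once.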

Solr Features
• Stopwords
• Synonyms
• Highlighted Context Snippets
• Spelling Suggestions
• More Like This Suggestions
• Supports Rich Documents
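To make the first two bullets concrete, here is a toy illustration of what stopword removal and synonym expansion do to a piece of text at analysis time. This is not how Solr implements them (Solr configures these steps as analysis filters in its schema); the word lists and the `analyze` function are invented for the example.

```python
STOPWORDS = {"a", "an", "and", "of", "the"}  # illustrative list only
SYNONYMS = {"tv": ["television"]}            # illustrative mapping only


def analyze(text):
    """Lowercase, drop stopwords, and append synonyms for each kept token."""
    tokens = []
    for word in text.lower().split():
        if word in STOPWORDS:
            continue
        tokens.append(word)
        tokens.extend(SYNONYMS.get(word, []))
    return tokens


tokens = analyze("The history of TV")
```

With these lists, a search for "television" can match a document that only ever says "TV", and common words such as "the" never take up space in the index.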

Solr Performance
• Wiktionary Dataset
• 49.5 million lines of XML
• 1.3 GB of data
• 1.7 million pages indexed in 5.5 hours
• ZODB size after import: 1.1 GB
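As a rough back-of-the-envelope reading of these figures, the arithmetic below derives an average indexing rate. It assumes the 5.5 hours covers the whole import of all 1.7 million pages, which the slide does not state explicitly.

```python
pages = 1_700_000
hours = 5.5
xml_lines = 49_500_000

seconds = hours * 3600                 # 19,800 seconds
pages_per_second = pages / seconds     # average indexing throughput
lines_per_page = xml_lines / pages     # average XML lines per page

summary = "about %d pages/s, about %d XML lines per page" % (
    round(pages_per_second),
    round(lines_per_page),
)
```

That works out to roughly 86 pages per second, each page averaging about 29 lines of XML.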

Integration Options with Plone

• collective.solr

collective.solr Issues
• Monkey Patching
• Relies on collective.indexing
• Duplicates all indexes
• Sub-Optimal Integration with Zope Transactions
• Relies on Thread Locals

What to do?

Reevaluate

Solr Integration as a Catalog Index

• No Monkey Patching
• Simpler Code

Enter alm.solrindex
• ZCatalog Index
• Doesn't depend on Plone
• Utilizes new foreign_connections Connection Method
• Pass through Solr Queries
• Direct access to the Solr Response
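The notes for this slide say that Solr parameters are passed through a dictionary, that a value for "q" is required, and that a `solr_callback` function receives the parsed Solr response. A minimal sketch of assembling such a query follows. The key name `solr_param` is spelled as in the notes and may differ in the released package, so check the alm.solrindex documentation; `build_catalog_query` is a hypothetical helper, not part of the package.

```python
def build_catalog_query(solr_params, callback=None, sort_on=None,
                        params_key="solr_param"):
    """Assemble keyword arguments for a catalog search that passes
    parameters through to Solr. Key names follow the speaker notes."""
    if not solr_params.get("q"):
        raise ValueError('a value for "q" is required')
    query = {params_key: dict(solr_params)}
    if callback is not None:
        # Called with the parsed Solr response, according to the notes.
        query["solr_callback"] = callback
    if sort_on is not None:
        # Sorting is still handled by the ZCatalog itself.
        query["sort_on"] = sort_on
    return query


responses = []
query = build_catalog_query(
    {"q": "title:plone^2 OR description:plone"},  # ^2 boosts title matches
    callback=responses.append,
    sort_on="modified",
)
# Inside a Zope site the search would then be: results = catalog(**query)
```

The `^2` boost in the query string is one way to get the weighting of terms mentioned in the notes.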

Sorting

• Still handled by the ZCatalog
• Could change in the future

alm.solrindex Field Handlers

• Handle Parsing Attributes for Indexing
• Translate field-specific queries to Solr
• Zope Utilities

Example Handler

    # DefaultFieldHandler and quote_query are provided by alm.solrindex;
    # parseIndexRequest is Zope's helper for parsing index queries.
    class TextFieldHandler(DefaultFieldHandler):

        def parse_query(self, field, field_query):
            name = field.name
            request = {name: field_query}
            record = parseIndexRequest(request, name, ('query',))
            if not record.keys:
                # No search terms were supplied for this field.
                return None
            query_str = ' '.join(record.keys)
            if not query_str:
                return None
            # Require a match on this field, with the terms quoted for Solr.
            return {'q': u'+%s:%s' % (name, quote_query(query_str))}
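To see what the handler's query translation produces without a Zope environment, the same logic can be sketched as a standalone function. `quote_query_stub` is a hypothetical stand-in for the package's `quote_query` helper; its behavior (wrapping the terms in a quoted phrase) is an assumption, and the real helper may quote differently.

```python
def quote_query_stub(text):
    """Hypothetical stand-in for quote_query: wrap the text in double
    quotes and escape embedded backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def parse_text_query(name, keys, quote=quote_query_stub):
    """Mirror the handler's logic for a field name and a list of terms."""
    if not keys:
        return None
    query_str = " ".join(keys)
    if not query_str:
        return None
    # The leading + marks the clause as required in the Solr query.
    return {"q": "+%s:%s" % (name, quote(query_str))}


params = parse_text_query("SearchableText", ["enterprise", "search"])
```

An empty list of terms yields `None`, which mirrors the handler returning `None` when no search terms were supplied.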

Other alm.solrindex Features

• GenericSetup Profile
• Tests
• Uses solrpy instead of the unsupported solr.py

Tips
• Can replace several ZCatalog indexes
• Remove any indexes you have replaced
• Use it for all Text Indexes
• Still Utilize the ZCatalog Indexes for Everything Else
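The second tip, removing the indexes you have replaced, could be scripted roughly as follows. This is a sketch under assumptions: `remove_replaced_indexes` is a hypothetical helper, the index names are examples of text indexes that might move to Solr, and `FakeCatalog` only imitates the two ZCatalog methods used (`indexes()` and `delIndex()`) so that the example runs outside Zope.

```python
def remove_replaced_indexes(catalog, replaced):
    """Delete catalog indexes whose job the Solr index has taken over."""
    removed = []
    existing = set(catalog.indexes())
    for name in replaced:
        if name in existing:
            catalog.delIndex(name)
            removed.append(name)
    return removed


class FakeCatalog:
    """Tiny stand-in exposing the two catalog methods used above."""

    def __init__(self, names):
        self._names = list(names)

    def indexes(self):
        return list(self._names)

    def delIndex(self, name):
        self._names.remove(name)


# Example index names only; use the ones you actually moved to Solr.
catalog = FakeCatalog(["SearchableText", "Title", "Description", "portal_type"])
removed = remove_replaced_indexes(
    catalog, ["SearchableText", "Title", "Description"]
)
```

Non-text indexes such as `portal_type` stay in the ZCatalog, in line with the last tip.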

Demo
• Project Gutenberg Data

Questions?

Editor's notes

[What is Solr?] Enterprise search server. XML, JSON, HTTP (REST). Efficient replication to other Solr search servers. Full plugin architecture.
[Playing with Solr] XML over HTTP. An HTTP GET returns XML results.
[Solr Features] The data schema is very similar to the ZCatalog. Numeric types, dynamic fields, unique keys. Rich set of debugging tools.
[Solr Performance] 17 GB before packing. Still using Zope in the middle for this. Did the same import with ZCTextIndex and ZCatalog, and it crashed and burned at 900k items.
[collective.solr Issues] Designed to work with collective.indexing, which is a collection of more monkey patches. Duplicates all indexes except SearchableText, which it removes from the catalog. Any time you add an index to the catalog, you have to add it to Solr also. Global or thread-local variables usually make code much harder to understand: horrible code readability, and it could lead to unexpected results, for example your Solr connection may not be there anymore.
[What to do? / Reevaluate] This is open source software, so we can learn from the past experiences of others. Six Feet Up embarked on a large project where flexible search features were a great match for Solr. We shouldn't throw away everything in collective.solr.
[Solr Integration as a Catalog Index] Less indirection. Allows us to refactor our connection to Solr.
[Enter alm.solrindex] Only one index is needed per catalog. Uses the Solr schema to determine which columns to index. Maintains a persistent connection to an external database from a ZODB object. The connection won't be dropped by object deactivations, as happens with the _v_ approach, and won't break when you pass control between threads, as happens with thread locals. Pass Solr query parameters directly via the solr_param dictionary; this requires that some value be passed for "q". It can be used to access features like weighting of terms in a query. Pass a solr_callback function in your query and SolrIndex will call it, passing the parsed Solr response object.
[Sorting] Not handled by SolrIndex at this time. Just use the sort_on parameter like you normally would.
[alm.solrindex Field Handlers] Write your own and register them via ZCML.
[Tips] Avoids indexing the same attribute multiple times. ZCatalog falls short on full-text indexing when it comes to performance and features. Native ZCatalog indexes are faster than any network-bound service. ZCatalog indexes share transaction-aware ZODB caches.
[Demo] 32K books, with richer metadata than the Wiktionary data.
