SlideShare une entreprise Scribd logo
1  sur  25
Télécharger pour lire hors ligne
Apache Solr 101
Killing the Vampires of Search
Cluj, 2013
Olivier Dobberkau
Some Vampirology first
●
●
●
●
●

Nosferatu
Dracula
van Helsing
Selene
Edward & Bella

http://en.wikipedia.org/wiki/Vampire_film
Agenda
●
●
●
●
●
●

About me
History of EXT:solr
Current status
Solr Basics
Caveats
Books & Documents
About me
Olivier Dobberkau
CEO of dkd Internet Service GmbH
Research and Development
over 10 years of TYPO3 CMS
Member of the T3A EAB
olivier.dobberkau@dkd.de
Twitter: T3RevNeverEnd
Scratching ...
.. the TYPO3 CMS search itch
History of EXT:solr
We all know when a solution fails ...
History of EXT:solr
●
●
●
●
●
●
●

Indexed Search gave us some pain
First prototype 2009
What you get in one or two days of work
Started Funding of Development
over 70 Sponsors
Its possible to offer services around it
Support and Consulting available
Current Status
Version 2.8.2 was released November 2012
Introduced the Add-ons for additional features
Supported TYPO3 CMS Versions
4.5, 4.6 & 4.7
Supported Solr Server
3.6.2 (Time flies when you are having fun!)
The last TER Release
TER: 2.8.3
Introduce support for TYPO3 CMS Versions 4.5
- 6.1
Loads of bug-fixes
Maintenance Release
Next Major Version
EXT:solr 3.x will be the next version
Release will be hopefully soon(tm)
Will have no new features on the TYPO3 side
Support for TYPO3 CMS 4.5 - 6.1
Add Apache Solr 4.4 as a Server
Roadmap for EXT:solr 4.x
●
●
●
●
●

Backend parts of the EXT all in Extbase
Templates go FLUID
Frontend goes Extbase
4.x will be 6.2 only!
Effort estimated 2 to 4 man months
The EXT:solr ecosystem
The base is EXT:solr
Features are added thru Add-ons
● EXT:solrfile (File-Indexing for CMS 4.5 - 4.7)
● EXT:solrdam (File-Indexing with DAM)
● EXT:solrfal (File-Indexing for CMS 6.1 & 6.2)
● EXT:solrmlt (More like this)
● EXT:solrgrouping
● EXT:tika (Extracting Service)
EXT:solr
So what does it do?
● Indexing
● Querying
● Results Listing
● Logging / Analysis
Indexing
●
●
●
●

Indexing of pages
Indexing of TCA records
Indexing of Files (Add-On)
Index Queue
○ List of all to be indexed items
○ Every time an items is touched/changed an update
is sent to the solr server
○ No need for a crawler / instant results
Indexing
● Indexing is very easy and can be achieved
thru simple typoscript configuration
● Additionally you can use Apache Nutch to
index non TYPO3 websites
● Support for more than 30 Languages
Querying
● Easy to set up
● Apply Lucene query language if you want to
search for specific items (only news i.e)
● You can tell solr to boost results if query
terms are in the fields you are searching
● Use elevation to rank terms
● Correct Stemming available
● Range queries (Intelligent dates)
Results Listing
● Results can be fully individualized
○ Templates for different results types

● Sorting of the Results List
○
○
○
○

Relevance
Date
Title
any other field

● Can be toggled
Result Listings
● Facettes
○ Filter the results based of attributes
○ Hierarchical Facettes

●
●
●
●

Suggestions / Autocomplete
Stopwords
Protected words
Did you mean?
Logging / Analysis
● Built in query logging
● Can be used with your favorite Analytics
suite
● Feature rich analysis & debugging options
Caveats
● Junk in / Junk out
● Get your data right
● A String is not Text
○ Be aware of the difference between Strings and Text
○ Protect proper names from stemming
○ Example
Caveats
● Synonyms are nice, but don't abuse them
● Don't confuse Solr with a Database
○ %WORD% does not work

● Search with “WORD” if you want your query
to remain untouched
● * work only at the end of a word
○ cat* will find catapult, cats, catastrophe etc
○ *cat will yield with no results
Caveats
● Beware of indexing time
○ Pages index slower than TCA records
○ Files might be too big for initial settings
Some web resources
● You will find a lot of infos around the Apache
Solr Extension: www.typo3-solr.com
● http://forge.typo3.
org/projects/show/extension-solr
● Mailing List / Newsgroup / Forums
● Afraid of Solr? try www.hosted-solr.com
Books & Documentation
● Taming Text
● Apache Solr Cookbook
● Administering Solr
● Apache Solr 4.x
● WIKI of Apache Solr
https://cwiki.apache.
org/confluence/display/solr/Apache+Solr+Refer
ence+Guide
Merci!
Thank you!

Contenu connexe

Similaire à Killing the Vampires of Search with Apache Solr 101

2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies
2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies
2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for NewbiesTYPO3 CertiFUNcation
 
Status & Outlook on EXT:solr for TYPO3 CMS
Status & Outlook on EXT:solr for TYPO3 CMSStatus & Outlook on EXT:solr for TYPO3 CMS
Status & Outlook on EXT:solr for TYPO3 CMSOlivier Dobberkau
 
Introduction to Apache solr
Introduction to Apache solrIntroduction to Apache solr
Introduction to Apache solrKnoldus Inc.
 
OSMC 2019 | How to improve database Observability by Charles Judith
OSMC 2019 | How to improve database Observability by Charles JudithOSMC 2019 | How to improve database Observability by Charles Judith
OSMC 2019 | How to improve database Observability by Charles JudithNETWAYS
 
Destination Documentation: How Not to Get Lost in Your Org
Destination Documentation: How Not to Get Lost in Your OrgDestination Documentation: How Not to Get Lost in Your Org
Destination Documentation: How Not to Get Lost in Your Orgcsupilowski
 
Using Search API, Search API Solr and Facets in Drupal 8
Using Search API, Search API Solr and Facets in Drupal 8Using Search API, Search API Solr and Facets in Drupal 8
Using Search API, Search API Solr and Facets in Drupal 8Websolutions Agency
 
OpenSearch.pdf
OpenSearch.pdfOpenSearch.pdf
OpenSearch.pdfAbhi Jain
 
What Goes In Must Come Out: Egress-Assess and Data Exfiltration
What Goes In Must Come Out: Egress-Assess and Data ExfiltrationWhat Goes In Must Come Out: Egress-Assess and Data Exfiltration
What Goes In Must Come Out: Egress-Assess and Data ExfiltrationCTruncer
 
Turbo charge your logs
Turbo charge your logsTurbo charge your logs
Turbo charge your logsJeremy Cook
 
Solr in drupal 7 index and search more entities
Solr in drupal 7  index and search more entitiesSolr in drupal 7  index and search more entities
Solr in drupal 7 index and search more entitiesBiglazy
 
Query and audit logging in cassandra
Query and audit logging in cassandraQuery and audit logging in cassandra
Query and audit logging in cassandraVinay Kumar Chella
 
Ukoug webinar - testing PLSQL APIs with utPLSQL v3
Ukoug webinar - testing PLSQL APIs with utPLSQL v3Ukoug webinar - testing PLSQL APIs with utPLSQL v3
Ukoug webinar - testing PLSQL APIs with utPLSQL v3Jacek Gebal
 
Get the most out of Solr search with PHP
Get the most out of Solr search with PHPGet the most out of Solr search with PHP
Get the most out of Solr search with PHPPaul Borgermans
 
The Professional Programmer
The Professional ProgrammerThe Professional Programmer
The Professional ProgrammerDave Cross
 
High performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodbHigh performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodbWei Shan Ang
 
Apache Solr - An Experience Report
Apache Solr - An Experience ReportApache Solr - An Experience Report
Apache Solr - An Experience ReportNetcetera
 
Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6DEEPAK KHETAWAT
 
Presentation of OpenNLP
Presentation of OpenNLPPresentation of OpenNLP
Presentation of OpenNLPRobert Viseur
 
Improve your SQL workload with observability
Improve your SQL workload with observabilityImprove your SQL workload with observability
Improve your SQL workload with observabilityOVHcloud
 

Similaire à Killing the Vampires of Search with Apache Solr 101 (20)

2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies
2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies
2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies
 
Status & Outlook on EXT:solr for TYPO3 CMS
Status & Outlook on EXT:solr for TYPO3 CMSStatus & Outlook on EXT:solr for TYPO3 CMS
Status & Outlook on EXT:solr for TYPO3 CMS
 
Introduction to Apache solr
Introduction to Apache solrIntroduction to Apache solr
Introduction to Apache solr
 
OSMC 2019 | How to improve database Observability by Charles Judith
OSMC 2019 | How to improve database Observability by Charles JudithOSMC 2019 | How to improve database Observability by Charles Judith
OSMC 2019 | How to improve database Observability by Charles Judith
 
Destination Documentation: How Not to Get Lost in Your Org
Destination Documentation: How Not to Get Lost in Your OrgDestination Documentation: How Not to Get Lost in Your Org
Destination Documentation: How Not to Get Lost in Your Org
 
Using Search API, Search API Solr and Facets in Drupal 8
Using Search API, Search API Solr and Facets in Drupal 8Using Search API, Search API Solr and Facets in Drupal 8
Using Search API, Search API Solr and Facets in Drupal 8
 
OpenSearch.pdf
OpenSearch.pdfOpenSearch.pdf
OpenSearch.pdf
 
What Goes In Must Come Out: Egress-Assess and Data Exfiltration
What Goes In Must Come Out: Egress-Assess and Data ExfiltrationWhat Goes In Must Come Out: Egress-Assess and Data Exfiltration
What Goes In Must Come Out: Egress-Assess and Data Exfiltration
 
Turbo charge your logs
Turbo charge your logsTurbo charge your logs
Turbo charge your logs
 
Handout: 'Open Source Tools & Resources'
Handout: 'Open Source Tools & Resources'Handout: 'Open Source Tools & Resources'
Handout: 'Open Source Tools & Resources'
 
Solr in drupal 7 index and search more entities
Solr in drupal 7  index and search more entitiesSolr in drupal 7  index and search more entities
Solr in drupal 7 index and search more entities
 
Query and audit logging in cassandra
Query and audit logging in cassandraQuery and audit logging in cassandra
Query and audit logging in cassandra
 
Ukoug webinar - testing PLSQL APIs with utPLSQL v3
Ukoug webinar - testing PLSQL APIs with utPLSQL v3Ukoug webinar - testing PLSQL APIs with utPLSQL v3
Ukoug webinar - testing PLSQL APIs with utPLSQL v3
 
Get the most out of Solr search with PHP
Get the most out of Solr search with PHPGet the most out of Solr search with PHP
Get the most out of Solr search with PHP
 
The Professional Programmer
The Professional ProgrammerThe Professional Programmer
The Professional Programmer
 
High performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodbHigh performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodb
 
Apache Solr - An Experience Report
Apache Solr - An Experience ReportApache Solr - An Experience Report
Apache Solr - An Experience Report
 
Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6
 
Presentation of OpenNLP
Presentation of OpenNLPPresentation of OpenNLP
Presentation of OpenNLP
 
Improve your SQL workload with observability
Improve your SQL workload with observabilityImprove your SQL workload with observability
Improve your SQL workload with observability
 

Plus de Olivier Dobberkau

Meet TYPO3 Vienna - Solr die Suchmachine für TYPO3
Meet TYPO3 Vienna - Solr die Suchmachine für TYPO3Meet TYPO3 Vienna - Solr die Suchmachine für TYPO3
Meet TYPO3 Vienna - Solr die Suchmachine für TYPO3Olivier Dobberkau
 
Apache Solr for TYPO3: More than a search engine
Apache Solr for TYPO3: More than a search engineApache Solr for TYPO3: More than a search engine
Apache Solr for TYPO3: More than a search engineOlivier Dobberkau
 
With a little help from my friends (english)
With a little help  from my friends (english)With a little help  from my friends (english)
With a little help from my friends (english)Olivier Dobberkau
 
With a little help from my friends
With a little help from my friendsWith a little help from my friends
With a little help from my friendsOlivier Dobberkau
 
Sonnenschein für ihre Website
Sonnenschein für ihre WebsiteSonnenschein für ihre Website
Sonnenschein für ihre WebsiteOlivier Dobberkau
 
TYPO3 Camp Poznan - Solr Usecases with Hosted Solr
TYPO3 Camp Poznan - Solr Usecases with Hosted SolrTYPO3 Camp Poznan - Solr Usecases with Hosted Solr
TYPO3 Camp Poznan - Solr Usecases with Hosted SolrOlivier Dobberkau
 
Your Content hides a treasure (and you might have not found it) - ForgetIT Pr...
Your Content hides a treasure (and you might have not found it) - ForgetIT Pr...Your Content hides a treasure (and you might have not found it) - ForgetIT Pr...
Your Content hides a treasure (and you might have not found it) - ForgetIT Pr...Olivier Dobberkau
 
ForgetIT: Beyond the page: Giving content a meaning and value
ForgetIT: Beyond the page: Giving content a meaning and valueForgetIT: Beyond the page: Giving content a meaning and value
ForgetIT: Beyond the page: Giving content a meaning and valueOlivier Dobberkau
 
ForgetIT Project TYPO3Camp Milano 2014
ForgetIT Project TYPO3Camp Milano 2014ForgetIT Project TYPO3Camp Milano 2014
ForgetIT Project TYPO3Camp Milano 2014Olivier Dobberkau
 
Explain TYPO3 Association March 2014
Explain TYPO3 Association March 2014Explain TYPO3 Association March 2014
Explain TYPO3 Association March 2014Olivier Dobberkau
 
Outside the Box - Panel on CMS at TYPO3 Camp Mallorca
Outside the Box - Panel on CMS at TYPO3 Camp MallorcaOutside the Box - Panel on CMS at TYPO3 Camp Mallorca
Outside the Box - Panel on CMS at TYPO3 Camp MallorcaOlivier Dobberkau
 
The future of CMS @T3UNI 2013 Annecy France
The future of CMS @T3UNI 2013 Annecy FranceThe future of CMS @T3UNI 2013 Annecy France
The future of CMS @T3UNI 2013 Annecy FranceOlivier Dobberkau
 
Digital dark age - Are we doing enough to preserve our website heritage?
Digital dark age - Are we doing enough to preserve our website heritage?Digital dark age - Are we doing enough to preserve our website heritage?
Digital dark age - Are we doing enough to preserve our website heritage?Olivier Dobberkau
 
Everything you always wanted to know about search in typo3
Everything you always wanted to know about search in typo3Everything you always wanted to know about search in typo3
Everything you always wanted to know about search in typo3Olivier Dobberkau
 
Alles was-sie-ueber-suche-wissen-wollten
Alles was-sie-ueber-suche-wissen-wolltenAlles was-sie-ueber-suche-wissen-wollten
Alles was-sie-ueber-suche-wissen-wolltenOlivier Dobberkau
 

Plus de Olivier Dobberkau (20)

Meet TYPO3 Vienna - Solr die Suchmachine für TYPO3
Meet TYPO3 Vienna - Solr die Suchmachine für TYPO3Meet TYPO3 Vienna - Solr die Suchmachine für TYPO3
Meet TYPO3 Vienna - Solr die Suchmachine für TYPO3
 
Apache Solr for TYPO3: More than a search engine
Apache Solr for TYPO3: More than a search engineApache Solr for TYPO3: More than a search engine
Apache Solr for TYPO3: More than a search engine
 
TYPO3 v8 LTS in the cloud
TYPO3 v8 LTS in the cloudTYPO3 v8 LTS in the cloud
TYPO3 v8 LTS in the cloud
 
With a little help from my friends (english)
With a little help  from my friends (english)With a little help  from my friends (english)
With a little help from my friends (english)
 
With a little help from my friends
With a little help from my friendsWith a little help from my friends
With a little help from my friends
 
TYPO3 & You
TYPO3 & YouTYPO3 & You
TYPO3 & You
 
Sonnenschein für ihre Website
Sonnenschein für ihre WebsiteSonnenschein für ihre Website
Sonnenschein für ihre Website
 
Apache Solr Revisited 2015
Apache Solr Revisited 2015Apache Solr Revisited 2015
Apache Solr Revisited 2015
 
TYPO3 Camp Poznan - Solr Usecases with Hosted Solr
TYPO3 Camp Poznan - Solr Usecases with Hosted SolrTYPO3 Camp Poznan - Solr Usecases with Hosted Solr
TYPO3 Camp Poznan - Solr Usecases with Hosted Solr
 
Your Content hides a treasure (and you might have not found it) - ForgetIT Pr...
Your Content hides a treasure (and you might have not found it) - ForgetIT Pr...Your Content hides a treasure (and you might have not found it) - ForgetIT Pr...
Your Content hides a treasure (and you might have not found it) - ForgetIT Pr...
 
TYPO3 and CMIS
TYPO3 and CMISTYPO3 and CMIS
TYPO3 and CMIS
 
ForgetIT: Beyond the page: Giving content a meaning and value
ForgetIT: Beyond the page: Giving content a meaning and valueForgetIT: Beyond the page: Giving content a meaning and value
ForgetIT: Beyond the page: Giving content a meaning and value
 
ForgetIT Project TYPO3Camp Milano 2014
ForgetIT Project TYPO3Camp Milano 2014ForgetIT Project TYPO3Camp Milano 2014
ForgetIT Project TYPO3Camp Milano 2014
 
Explain TYPO3 Association March 2014
Explain TYPO3 Association March 2014Explain TYPO3 Association March 2014
Explain TYPO3 Association March 2014
 
EXPLAIN #t3a
EXPLAIN #t3aEXPLAIN #t3a
EXPLAIN #t3a
 
Outside the Box - Panel on CMS at TYPO3 Camp Mallorca
Outside the Box - Panel on CMS at TYPO3 Camp MallorcaOutside the Box - Panel on CMS at TYPO3 Camp Mallorca
Outside the Box - Panel on CMS at TYPO3 Camp Mallorca
 
The future of CMS @T3UNI 2013 Annecy France
The future of CMS @T3UNI 2013 Annecy FranceThe future of CMS @T3UNI 2013 Annecy France
The future of CMS @T3UNI 2013 Annecy France
 
Digital dark age - Are we doing enough to preserve our website heritage?
Digital dark age - Are we doing enough to preserve our website heritage?Digital dark age - Are we doing enough to preserve our website heritage?
Digital dark age - Are we doing enough to preserve our website heritage?
 
Everything you always wanted to know about search in typo3
Everything you always wanted to know about search in typo3Everything you always wanted to know about search in typo3
Everything you always wanted to know about search in typo3
 
Alles was-sie-ueber-suche-wissen-wollten
Alles was-sie-ueber-suche-wissen-wolltenAlles was-sie-ueber-suche-wissen-wollten
Alles was-sie-ueber-suche-wissen-wollten
 

Dernier

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 

Dernier (20)

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 

Killing the Vampires of Search with Apache Solr 101

  • 1. Apache Solr 101 Killing the Vampires of Search Cluj, 2013 Olivier Dobberkau
  • 2. Some Vampirology first ● ● ● ● ● Nosferatu Dracula van Helsing Selene Edward & Bella http://en.wikipedia.org/wiki/Vampire_film
  • 3. Agenda ● ● ● ● ● ● About me History of EXT:solr Current status Solr Basics Caveats Books & Documents
  • 4. About me Olivier Dobberkau CEO of dkd Internet Service GmbH Research and Development over 10 years of TYPO3 CMS Member of the T3A EAB olivier.dobberkau@dkd.de Twitter: T3RevNeverEnd
  • 5. Scratching ... .. the TYPO3 CMS search itch
  • 6. History of EXT:solr We all know when a solution fails ...
  • 7. History of EXT:solr ● ● ● ● ● ● ● Indexed Search gave us some pain First prototype 2009 What you get in one or two days of work Started Funding of Development over 70 Sponsors Its possible to offer services around it Support and Consulting available
  • 8. Current Status Version 2.8.2 was released November 2012 Introduced the Add-ons for additional features Supported TYPO3 CMS Versions 4.5, 4.6 & 4.7 Supported Solr Server 3.6.2 (Time flies when you are having fun!)
  • 9. The last TER Release TER: 2.8.3 Introduce support for TYPO3 CMS Versions 4.5 - 6.1 Loads of bug-fixes Maintenance Release
  • 10. Next Major Version EXT:solr 3.x will be the next version Release will be hopefully soon(tm) Will have no new features on the TYPO3 side Support for TYPO3 CMS 4.5 - 6.1 Add Apache Solr 4.4 as a Server
  • 11. Roadmap for EXT:solr 4.x ● ● ● ● ● Backend parts of the EXT all in Extbase Templates go FLUID Frontend goes Extbase 4.x will be 6.2 only! Effort estimated 2 to 4 man months
  • 12. The EXT:solr ecosystem The base is EXT:solr Features are added thru Add-ons ● EXT:solrfile (File-Indexing for CMS 4.5 - 4.7) ● EXT:solrdam (File-Indexing with DAM) ● EXT:solrfal (File-Indexing for CMS 6.1 & 6.2) ● EXT:solrmlt (More like this) ● EXT:solrgrouping ● EXT:tika (Extracting Service)
  • 13. EXT:solr So what does it do? ● Indexing ● Querying ● Results Listing ● Logging / Analysis
  • 14. Indexing ● ● ● ● Indexing of pages Indexing of TCA records Indexing of Files (Add-On) Index Queue ○ List of all to be indexed items ○ Every time an items is touched/changed an update is sent to the solr server ○ No need for a crawler / instant results
  • 15. Indexing ● Indexing is very easy and can be achieved thru simple typoscript configuration ● Additionally you can use Apache Nutch to index non TYPO3 websites ● Support for more than 30 Languages
  • 16. Querying ● Easy to set up ● Apply Lucene query language if you want to search for specific items (only news i.e) ● You can tell solr to boost results if query terms are in the fields you are searching ● Use elevation to rank terms ● Correct Stemming available ● Range queries (Intelligent dates)
  • 17. Results Listing ● Results can be fully individualized ○ Templates for different results types ● Sorting of the Results List ○ ○ ○ ○ Relevance Date Title any other field ● Can be toggled
  • 18. Result Listings ● Facettes ○ Filter the results based of attributes ○ Hierarchical Facettes ● ● ● ● Suggestions / Autocomplete Stopwords Protected words Did you mean?
  • 19. Logging / Analysis ● Built in query logging ● Can be used with your favorite Analytics suite ● Feature rich analysis & debugging options
  • 20. Caveats ● Junk in / Junk out ● Get your data right ● A String is not Text ○ Be aware of the difference between Strings and Text ○ Protect proper names from stemming ○ Example
  • 21. Caveats ● Synonyms are nice, but don't abuse them ● Don't confuse Solr with a Database ○ %WORD% does not work ● Search with “WORD” if you want your query to remain untouched ● * work only at the end of a word ○ cat* will find catapult, cats, catastrophe etc ○ *cat will yield with no results
  • 22. Caveats ● Beware of indexing time ○ Pages index slower than TCA records ○ Files might be too big for initial settings
  • 23. Some web resources ● You will find a lot of infos around the Apache Solr Extension: www.typo3-solr.com ● http://forge.typo3. org/projects/show/extension-solr ● Mailing List / Newsgroup / Forums ● Afraid of Solr? try www.hosted-solr.com
  • 24. Books & Documentation ● Taming Text ● Apache Solr Cookbook ● Administering Solr ● Apache Solr 4.x ● WIKI of Apache Solr https://cwiki.apache. org/confluence/display/solr/Apache+Solr+Refer ence+Guide