http://www.dkd.de
Friday, 10 June 2011
dkd: development, kommunikation, design
Welcome
Olivier Dobberkau
CEO
dkd Internet Service GmbH
Frankfurt am Main, Germany
Agenda
What is search?
Search in TYPO3
Search expectations today
Apache Solr
Why and how?
Watch out!
About me
Olivier Dobberkau
Founder of dkd Internet Service GmbH
aka "the reverend never-end"
Met TYPO3 with version 3.2 beta 3
Member of T3A BCC
43 years old
olivier.dobberkau@dkd.de
Twitter: @T3RevNeverEnd
What is Search?
Definition of Information Retrieval
Information retrieval (IR) is the area of study
concerned with searching for documents, for
information within documents, and for metadata
about documents, as well as that of searching
relational databases and the World Wide Web.
Wikipedia:
http://en.wikipedia.org/wiki/Information_retrieval
Factors in Information Retrieval
Recall
Precision
Fall-out
Scalability
Performance
Simplicity
Flexibility
Recall
Percentage of the relevant documents that are returned
400 relevant documents exist, 100 of them are returned
25% recall (100 / 400)
Precision
Percentage of the returned documents that are relevant
500 returned, 100 of them relevant
20% precision (100 / 500)
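Both ratios can be computed directly from sets of document IDs. A minimal Python sketch follows; the function names and the synthetic ID ranges are illustrative assumptions, chosen only to reproduce the numbers from the two slides.

```python
def recall(relevant: set, returned: set) -> float:
    """Share of the relevant documents that the search returned."""
    if not relevant:
        return 0.0  # convention chosen here: nothing relevant means recall 0
    return len(relevant & returned) / len(relevant)


def precision(relevant: set, returned: set) -> float:
    """Share of the returned documents that are actually relevant."""
    if not returned:
        return 0.0  # convention chosen here: nothing returned means precision 0
    return len(relevant & returned) / len(returned)


# Recall slide: 400 relevant documents, 100 of them returned.
relevant_docs = set(range(400))
returned_docs = set(range(100))
print(f"recall: {recall(relevant_docs, returned_docs):.0%}")        # recall: 25%

# Precision slide: 500 documents returned, 100 of them relevant.
returned_docs = set(range(500))
relevant_docs = set(range(100))
print(f"precision: {precision(relevant_docs, returned_docs):.0%}")  # precision: 20%
```

Returning every document pushes recall to 100% but ruins precision, which is why the two are always judged together.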
Ideally:
100% recall with 100% precision
Index
The purpose of storing an index is to optimize
speed and performance in finding relevant
documents for a search query.
Index
[Figure: five example documents (Document 1 to Document 5), each split into its words, among them My, cat, is, cool, Baseball, Sport, San, Francisco, TYPO3, Extbase and T3CON]
Posting File
Word       Documents
My         1, 2
cat        1
is         1, 2, 5
cool       1
Baseball   2
Sport      2
San        3
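A posting file like this is an inverted index: it maps each word to the documents containing it, so a query only has to look up its words instead of scanning every document. The following minimal Python sketch builds one and answers an AND query by intersecting posting lists. The slides only show loose words, so the five document texts are hypothetical, picked so that the output matches the rows listed above; the tokenizer is deliberately naive and lowercases everything, which is why "My" and "my" share one row.

```python
from collections import defaultdict


def tokenize(text: str) -> list:
    # Deliberately naive: lowercase and split on whitespace.
    # No stemming, stopwords or punctuation handling.
    return text.lower().split()


def build_posting_file(docs: dict) -> dict:
    """Map every word to the sorted IDs of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def search(postings: dict, query: str) -> list:
    """AND query: IDs of the documents that contain every query word."""
    result = None
    for word in tokenize(query):
        ids = set(postings.get(word, []))
        result = ids if result is None else result & ids  # intersect lists
    return sorted(result) if result else []


# Hypothetical documents, chosen so the output matches the slide's rows.
docs = {
    1: "My cat is cool",
    2: "Baseball is my Sport",
    3: "T3CON San Francisco Fort Mason",
    4: "TYPO3 rocks",
    5: "Extbase is a framework",
}
postings = build_posting_file(docs)
print(postings["is"])             # [1, 2, 5]
print(search(postings, "my is"))  # [1, 2]
```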
Search in TYPO3
Indexed Search
Part of TYPO3 since version 3.5
Indexes pages while they are rendered in the frontend
Searches pages and some file types
Supports languages and access rights
Indexed Search
Index stored in the database
Problems with large websites:
Slow
No sorting
No templating
OK for small websites
Search Expectations
Expectation vs. Experience
Users expect a "Google-like" interface and behaviour in search
No one navigates through an online shop
up to 30% of users use the search instead of
going through text or navigation
Search is mediocre on a lot of websites
Slow and incomplete
Lots of improvement possible
Apache Solr
Enterprise Search Server
Apache Solr
Apache Software Foundation
Enterprise Search Server
Built on the Apache Lucene index
Lots of great features
Used by CNET, Netflix, Zappos.com and many more...
Solr Key Features
Synonyms
Stopwords
Boosting / Weighting
Faceting
Paid Content / Elevation
Spellchecking / Did you mean?
Speed
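Synonyms and stopwords are handled while text is analyzed, at index time and usually also at query time. The Python sketch below imitates such an analysis chain to show the effect. It is not Solr's implementation (in Solr this is configured as tokenizer and filter steps of a field type, typically fed from files such as stopwords.txt and synonyms.txt), and the word lists are made-up examples.

```python
# Made-up example lists, not Solr defaults.
STOPWORDS = {"a", "an", "the", "is", "of"}
SYNONYMS = {
    "laptop": ["notebook"],
    "notebook": ["laptop"],
    "tv": ["television"],
    "television": ["tv"],
}


def analyze(text: str) -> list:
    """Tokenize, lowercase, drop stopwords and expand synonyms."""
    terms = []
    for token in text.lower().split():
        if token in STOPWORDS:
            continue                             # stopwords are never indexed
        terms.append(token)
        terms.extend(SYNONYMS.get(token, []))    # add equivalent terms
    return terms


def matches(query: str, document: str) -> bool:
    """OR semantics: match if query and document share any analyzed term."""
    return bool(set(analyze(query)) & set(analyze(document)))


print(analyze("The laptop is cheap"))          # ['laptop', 'notebook', 'cheap']
print(matches("notebook", "A cheap laptop"))   # True
print(matches("the", "The laptop"))            # False
```

The last call shows the flip side of stopwords: a query consisting only of stopwords finds nothing.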
How does it work?
REST-like interface
Indexing with POST
Search with GET
Results in XML, JSON, PHP and many more formats
Libraries for many programming languages
SolrPhpClient
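Because the interface is plain HTTP, any language can talk to Solr. This Python sketch only builds the two kinds of requests without sending them. Assumptions: a Solr server on localhost with the default port 8983, a hypothetical core named core_en, hypothetical field names type and language, and a recent Solr version that accepts JSON documents on the /update handler (older versions may expect XML or a different update path). The TYPO3 extension itself talks to Solr from PHP through SolrPhpClient.

```python
import json
import urllib.parse
import urllib.request

# Assumption: local Solr, default port 8983, hypothetical core "core_en".
SOLR_BASE = "http://localhost:8983/solr/core_en"


def select_url(query: str, facet_fields: list, rows: int = 10) -> str:
    """Build a search (GET) URL for Solr's /select handler."""
    params = {
        "q": query,
        "rows": rows,
        "wt": "json",                 # response format
        "facet": "true",
        "facet.field": facet_fields,  # repeated once per field (doseq=True)
    }
    return f"{SOLR_BASE}/select?{urllib.parse.urlencode(params, doseq=True)}"


def update_request(docs: list) -> urllib.request.Request:
    """Build an indexing (POST) request that sends documents as JSON."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        f"{SOLR_BASE}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


print(select_url("typo3", ["type", "language"]))
req = update_request([{"id": "pages-42", "title": "Hello Solr"}])
print(req.get_method(), req.full_url)
# Sending needs a running server: urllib.request.urlopen(req)
```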
Why and how?
Scratching our Itch
Why?
Indexed Search was too slow
It lacks many of today's requirements
History
Prototype in summer 2008
Kick-off February 2009
"Acts like Indexed Search"
Early Access Program
Version 1.0 at T3CON, September 2009
Components
Indexing
Search
Flexible Templating
Analysis and Statistics
Administration
Challenges
Page Rendering in TYPO3
Access Rights
File Indexing
Easy Setup for Non-Java People
Integrating Solr in general
Solutions
Record Monitor and Indexing Queue
Solr Query Parser Plugin
Integration of Apache Tika
Fully automated bash install script
SolrPhpClient
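The idea behind a record monitor with an indexing queue can be sketched in a few lines: a save hook only notes which record changed, and a separate worker indexes the queued records in batches later. This Python sketch is a hypothetical illustration of that pattern, not the extension's actual PHP code; all class and function names are made up, and "tt_news" is just an example table.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueItem:
    table: str  # database table of the changed record, e.g. "pages"
    uid: int    # ID of the record within that table


class IndexQueue:
    """Hypothetical queue of records that still need to be (re)indexed."""

    def __init__(self):
        self._items = deque()  # FIFO order of pending items
        self._pending = set()  # fast duplicate check

    def add(self, table, uid):
        item = QueueItem(table, uid)
        if item not in self._pending:  # a record saved twice is queued once
            self._pending.add(item)
            self._items.append(item)

    def process(self, index_document, limit=50):
        """Pass up to `limit` queued items to `index_document`; return the count."""
        done = 0
        while self._items and done < limit:
            item = self._items.popleft()
            self._pending.discard(item)
            index_document(item)
            done += 1
        return done


def on_record_saved(queue, table, uid):
    """Record monitor: meant to be called from a save hook of the CMS."""
    queue.add(table, uid)


queue = IndexQueue()
on_record_saved(queue, "pages", 42)
on_record_saved(queue, "tt_news", 7)
on_record_saved(queue, "pages", 42)  # saved again before indexing
print(queue.process(print))          # prints both items, then 2
```

Because such a monitor is driven by a save hook, records written directly to the database never reach the queue.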
Features
Faceted Search
File Indexing
Multi-language Support
Did you mean?
Features
Search Word Highlighting
Autocomplete / Suggestions
Access Rights Support
More to come
Watch out!
"I do not have any solution. I admire the problem."
Ashleigh Brilliant, cartoonist and author
Common Problems
Relevancy Perception Trap
Assumption: the search should display a certain result, such as an employee name
Query: Mike Miller
Results: Mill 100% relevancy
Miller 75% relevancy
Possible issue: stemming on proper names
Solution: don't stem fields that contain names
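The Mill/Miller effect appears with any stemmer that strips an "-er" suffix. The Python sketch below uses a deliberately naive, hypothetical suffix stemmer (far cruder than the stemmers shipped with Solr) and a hypothetical set of unstemmed fields to show why name fields should be indexed without stemming. In Solr the equivalent is giving name fields a field type whose analyzer has no stemming filter.

```python
# Hypothetical, deliberately naive suffix stemmer for illustration only.
SUFFIXES = ("ers", "er", "s")


def naive_stem(word: str) -> str:
    word = word.lower()
    for suffix in SUFFIXES:
        # Strip the first matching suffix if at least 3 characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical field configuration: fields holding proper names.
UNSTEMMED_FIELDS = {"name"}


def index_terms(field: str, text: str) -> list:
    tokens = text.lower().split()
    if field in UNSTEMMED_FIELDS:
        return tokens                      # keep names exactly as written
    return [naive_stem(t) for t in tokens]


print(naive_stem("Miller"), naive_stem("Mill"))  # mill mill
print(index_terms("content", "Mike Miller"))     # ['mike', 'mill']
print(index_terms("name", "Mike Miller"))        # ['mike', 'miller']
```

Once "Miller" and "Mill" collapse into the same term, a document about a mill can outrank the employee, which is exactly the perception trap on the slide.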
Common Problems
Finding Corpses in your Corpus
While searching you find "interesting" results
You have forgotten to hide content
You have not set the "no search" flag
You have made copies of records and forgotten them
Common Problems
Data updates without using TCEmain
You wonder: why do the new records of table XY not show up?
You have updated the tables directly, e.g. with phpMyAdmin
You might have forgotten to set the language ID in the records
Common Problems
Can't access the Solr server
You cannot access the Solr server running on another machine
Possible Solution
Common Problems
Help, my index gets deleted!
Symptom: your index is empty
Possible cause: your Solr server is not secured
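One quick check is whether the Solr port can be reached from outside your network at all. The Python sketch below only tests whether a TCP connection can be opened; 8983 is Solr's default port, but your installation may use another one, and the host name in the comment is a placeholder.

```python
import socket


def port_is_reachable(host: str, port: int = 8983, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable or timeout
        return False


# Run from a machine outside your network against your public host name
# (placeholder below). True means anyone can reach Solr's HTTP API,
# including requests that modify or delete the index, unless something
# else restricts access.
# print(port_is_reachable("search.example.com"))
```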
Common Problems
My news records are not being indexed
News records stored in a sysfolder do not show up in your results
The folder is not in the rootline of the website
Configure the PID of the sysfolder correctly
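The rootline of a page is the chain of parent pages from that page up to the root of the page tree. The following Python sketch walks a hypothetical page tree via each page's parent ID (pid) and checks whether a sysfolder lies below a given site root. It is a simplified model that assumes the tree has no cycles, not the check the extension actually performs.

```python
# Hypothetical page tree: uid -> (pid, title); pid 0 is the tree root.
PAGES = {
    1: (0, "Website root"),
    2: (1, "News"),
    3: (1, "News storage"),    # sysfolder inside the site
    4: (0, "Global storage"),  # sysfolder outside the site
}


def rootline(uid: int) -> list:
    """Walk up the pid chain from a page to the tree root."""
    line = []
    while uid != 0:
        line.append(uid)
        uid = PAGES[uid][0]
    return line


def in_site(uid: int, site_root: int) -> bool:
    """A page or sysfolder belongs to a site if the site root is in its rootline."""
    return site_root in rootline(uid)


print(rootline(3))    # [3, 1]
print(in_site(3, 1))  # True
print(in_site(4, 1))  # False: records stored here are outside the site
```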
Questions?
Thank you.

Searching does not mean finding Stuff - Apache Solr for TYPO3