This document summarizes a presentation about search capabilities in TYPO3 using Apache Solr. It defines information retrieval and discusses factors like recall, precision and indexing. It introduces Apache Solr's features for enterprise search and how it can be integrated with TYPO3 using extensions. Finally, it outlines some common problems users may face and potential solutions.
6. Olivier Dobberkau
Founder of dkd Internet Service GmbH
aka „the reverend never-end“
Met TYPO3 with Version 3.2 beta 3
Member of T3A BCC
43 years old
olivier.dobberkau@dkd.de
Twitter: @T3RevNeverEnd
Friday, 10 June 2011
8. Definition of Information Retrieval
Information retrieval (IR) is the area of study
concerned with searching for documents, for
information within documents, and for metadata
about documents, as well as that of searching
relational databases and the World Wide Web.
Wikipedia:
http://en.wikipedia.org/wiki/Information_retrieval
14. Index
The purpose of storing an index is to optimize
speed and performance in finding relevant
documents for a search query.
15. Index
[Figure: an index linking Documents 1 to 5 to the terms they contain. Terms shown: Extbase, TYPO3, San, Francisco, Baseball, My, my, is, a, cat, T3CON, rocks, Fort, Mason, cool, Ghetto, Sport]
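The idea of an index that maps terms to the documents containing them can be sketched as a small inverted index. The five example documents below are invented for illustration from words on the slide; how the original slide assigned words to documents is not known, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents, assembled from words that appear on the slide.
DOCUMENTS = {
    1: "TYPO3 rocks",
    2: "Extbase is cool",
    3: "My cat is cool",
    4: "T3CON San Francisco Fort Mason",
    5: "Baseball is my Sport",
}


def build_index(documents):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


index = build_index(DOCUMENTS)
print(sorted(search(index, "is my")))  # prints [3, 5]
```

A lookup only touches the entries for the query terms instead of scanning every document, which is the speed benefit the previous slide describes.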
18. Indexed Search
Indexed Search available since TYPO3 version 3.5
Indexes pages as they are rendered in the frontend
Searches pages and some file types
Works with languages and access rights
21. Expectation vs. Experience
Users expect a „Google-like“ interface and
behaviour in search
No one navigates through an online shop
Up to 30% of users use search instead of
going through text or navigation
Search is mediocre on a lot of websites
Slow and incomplete
Lots of improvement possible
27. How does it work?
REST-like interface
Indexing with POST
Search with GET
Results in XML, JSON, PHP and many more
Libraries for many programming languages
SolrPhpClient
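As a rough illustration of "indexing with POST, search with GET", the sketch below builds the two HTTP requests with Python's standard library without sending them. The host, port and core name in `SOLR_BASE` are assumptions, as are the helper function names; the exact handler paths depend on the Solr version and core setup.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed location of a Solr core; adjust to the actual installation.
SOLR_BASE = "http://localhost:8983/solr/core_en"


def build_index_request(docs):
    """Build a POST request that sends documents to the update handler."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        SOLR_BASE + "/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_search_request(query, rows=10):
    """Build a GET request against the select handler, asking for JSON."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return Request(SOLR_BASE + "/select?" + params, method="GET")


index_req = build_index_request([{"id": "1", "title": "TYPO3 rocks"}])
search_req = build_search_request("TYPO3 rocks")
print(index_req.get_method(), index_req.full_url)
print(search_req.get_method(), search_req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it to a running server; the `wt` parameter selects the response format, which corresponds to the XML, JSON and PHP outputs listed on the slide.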
30. History
Prototype in summer 2008
Kick-off February 2009
„Acts like Indexed Search“
Early Access Program
Version 1.0 at T3CON, September 2009
32. Challenges
Page Rendering in TYPO3
Access Rights
File Indexing
Easy Setup for Non-Java People
Integrating Solr in general
33. Solutions
Record Monitor and Indexing Queue
Solr Query Parser Plugin
Integration of Apache Tika
Fully Automated bash Install Script
SolrPhpClient
37. „I do not have any solution. I admire the problem.“
Ashleigh Brilliant, cartoonist and author.
38. Common Problems
Relevancy Perception Trap
Assumption: Search should display a certain
result, like an employee name
Query: Mike Miller
Results: Mill 100% relevancy
Miller 75% relevancy
Possible issue: Stemming on proper names
Solution: Don't stem fields with names
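To see why stemming can hurt on proper names, here is a deliberately naive sketch. The suffix stripper below is a toy written for this illustration and is not the algorithm any Solr analyzer uses; the function names are hypothetical. It only shows the effect: with stemming, "Miller" and "Mill" collapse to the same token, while an unstemmed name field keeps them apart.

```python
def naive_stem(token):
    """Toy stemmer: strip a few common English suffixes (not Solr's algorithm)."""
    for suffix in ("ers", "er", "ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text, stem):
    """Lowercase and split the text; optionally stem each token."""
    tokens = text.lower().split()
    return [naive_stem(t) if stem else t for t in tokens]


# With stemming, the query term "Miller" becomes indistinguishable from "Mill".
print(analyze("Mike Miller", stem=True))   # prints ['mike', 'mill']
print(analyze("Mill", stem=True))          # prints ['mill']

# Without stemming, the name stays intact and no longer matches "Mill".
print(analyze("Mike Miller", stem=False))  # prints ['mike', 'miller']
```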
39. Common Problems
Finding Corpses in your Corpus
While Searching you find „interesting“ Results
You have forgotten to hide content
You have not set the „no search“ Flag
You have made copies of records and
forgotten them
40. Common Problems
Data updates without using TCEmain
You wonder: Why do my new records of table
XY not show up?
You have updated the tables directly, e.g. with
phpMyAdmin
You might have forgotten to set the language
ID in the records
41. Common Problems
Can't access the Solr server
You cannot access the Solr server on another
machine
Possible Solution
42. Common Problems
Help, my index gets deleted
Symptom: Your index is empty
Possible cause: Your Solr server is not secured
43. Common Problems
My news is not being indexed
News records that you keep in a Sysfolder do not
show up in your results
The folder is not in the rootline of the website
Configure the PID of the Sysfolder correctly