APACHE SOLR CMS INTEGRATION
Ingo Renner
Software Engineer
we build smart.
ID INFIELD DESIGN
MAY.01.2013
LUCENE/SOLR REVOLUTION
TYPO3 CMS and Solr. How we did it.
ABOUT ID
What we do and who we do it for
• Strategy Planning
• Design
• UX
• Development & Integration
WHO IS THIS GUY?
• Committer TYPO3 CMS
• Committer and PMC member Apache Tika
• Release Manager TYPO3 CMS 4.2
• New San Franciscan
• Snowboarding, mountain biking
• Software Engineer, Architect at Infield Design
- Caution -
TYPO3-Evangelist
TYPO3 CMS
TYPO3 CMS
• Free and Open Source Enterprise CMS
• Estimated 500,000+ installations worldwide
• 6,000+ public extensions
• 6,000,000+ downloads
• Content Management Framework
• Multi-Site, Multi-Language, Versioning, Workflows, ...
• Stable, Secure, Scalable
TYPO3 COMMUNITY
• Community driven development
• Conferences in North America, Europe, Asia
• Barcamps, Developer Days, Snowboard Tour
• Four-time Google Summer of Code participant
• Backed by TYPO3 Association
• Several other projects under the TYPO3 brand
SOLR & CMS
INTEGRATION
Integration Challenges & Solutions
PAGE RENDERING
• Different template engines
• (too) flexible page rendering engine
• Identify relevant content on websites
• Exclude navigation and common page elements
• Content generated by plugins
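
One way to picture the "identify relevant content, exclude navigation" problem is with marker comments around the parts of a page that should be indexed. The sketch below is in Python purely for illustration (the extension itself is PHP). It assumes the `<!--TYPO3SEARCH_begin-->` / `<!--TYPO3SEARCH_end-->` comment pair, and the function name and the crude tag stripping are hypothetical simplifications, not the extension's actual code.

```python
import re

# Assumed marker comments; treat the exact strings as configuration.
BEGIN = "<!--TYPO3SEARCH_begin-->"
END = "<!--TYPO3SEARCH_end-->"

_MARKED = re.compile(re.escape(BEGIN) + r"(.*?)" + re.escape(END), re.DOTALL)
_TAGS = re.compile(r"<[^>]+>")


def extract_relevant_text(html: str) -> str:
    """Return the text between marker pairs, or the whole page if unmarked."""
    sections = _MARKED.findall(html)
    body = " ".join(sections) if sections else html
    text = _TAGS.sub(" ", body)      # crude tag stripping, good enough for a sketch
    return " ".join(text.split())    # collapse runs of whitespace


page = (
    "<nav>Home | About</nav>"
    "<!--TYPO3SEARCH_begin--><h1>Solr</h1><p>Search  made easy.</p><!--TYPO3SEARCH_end-->"
    "<footer>Imprint</footer>"
)
print(extract_relevant_text(page))
```

Navigation and footer never reach the index because they sit outside the markers; a real implementation would use a proper HTML parser instead of a regular expression.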
Integration Challenges & Solutions
INDEX QUEUE
• Index Queue to track and index content
• Record Monitor to update Index Queue
• Crawl pages, index unstructured content marked relevant
• Exclude pages with plugin-generated content
• Index structured plugin data directly from DB
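
The interplay of Record Monitor and Index Queue can be sketched as follows. This is a minimal in-memory model in Python, not the extension's PHP implementation: the class, the function names, the monitored table names, and the document shape are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical set of tables whose records should end up in the index.
MONITORED_TABLES = {"pages", "tx_example_item"}


@dataclass
class IndexQueue:
    """Tracks which records need (re)indexing, one entry per record."""
    items: dict = field(default_factory=dict)  # (table, uid) -> last change time

    def enqueue(self, table, uid, changed):
        key = (table, uid)
        self.items.pop(key, None)   # drop an older entry so the record is queued once
        self.items[key] = changed

    def next_batch(self, limit):
        batch = list(self.items.items())[:limit]
        for key, _changed in batch:
            del self.items[key]
        return batch


def record_monitor(queue, table, uid, changed):
    """Called whenever the CMS saves a record; only monitored tables are queued."""
    if table in MONITORED_TABLES:
        queue.enqueue(table, uid, changed)


def run_indexer(queue, load_record, index_document, limit=50):
    """Load each queued record from the database and hand it to the indexer."""
    indexed = 0
    for (table, uid), _changed in queue.next_batch(limit):
        record = load_record(table, uid)
        if record is not None:      # the record may have been deleted meanwhile
            index_document({"id": f"{table}/{uid}", **record})
            indexed += 1
    return indexed


queue = IndexQueue()
record_monitor(queue, "pages", 1, changed=100)
record_monitor(queue, "pages", 1, changed=120)        # second edit, still one item
record_monitor(queue, "be_sessions", 7, changed=130)  # not monitored, ignored

database = {("pages", 1): {"title": "Home"}}
sent = []
count = run_indexer(queue, lambda t, u: database.get((t, u)), sent.append)
print(count, sent)
```

Decoupling change tracking from indexing means an editor's save stays fast, and the indexer can work through the queue in batches at its own pace.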
Integration Challenges & Solutions
ACCESS RIGHTS
• Intranet, Extranet, ...
• Not everybody may see everything
• Flexible user groups and permissions
• Permissions extended to sub-pages
Integration Challenges & Solutions
SOLR ACCESS FILTER PLUGIN
• Custom Solr access filter plugin
• Query Parser and Filter
• User group IDs stored in documents
• Current user’s groups submitted with query
• Plugin matches document groups with user’s groups
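
The matching step in the last bullet can be illustrated with a few lines of Python. The real plugin runs inside Solr as a query parser and filter; this sketch only models the group comparison. The field name `access_groups` and the rule "no groups means public, otherwise one shared group is enough" are assumptions made for the example, and TYPO3's actual permission semantics are richer.

```python
def is_visible(document_groups, user_groups):
    """Assumed rule: ungrouped documents are public; otherwise one shared group suffices."""
    required = set(document_groups)
    return not required or bool(required & set(user_groups))


def filter_results(documents, user_groups):
    """Keep only the documents the current user may see."""
    return [
        doc for doc in documents
        if is_visible(doc.get("access_groups", []), user_groups)  # hypothetical field name
    ]


documents = [
    {"id": "pages/1", "access_groups": []},       # public
    {"id": "pages/2", "access_groups": [3]},      # group 3 only
    {"id": "pages/3", "access_groups": [5, 8]},   # group 5 or 8
]
visible = filter_results(documents, user_groups=[8])
print([doc["id"] for doc in visible])
```

Doing this check inside Solr rather than after the query keeps result counts, facets, and paging consistent with what the user is allowed to see.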
Integration Challenges & Solutions
FILE INDEXING
• Finding file links in page content
• Core file links vs. plugin file links
• Track files for indexing
• Reading file content
• Separate tools for different file formats
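
Finding file links in rendered page content, the first challenge above, could look roughly like this. The sketch uses Python's standard `html.parser`; the class name, the function name, and the extension whitelist are hypothetical, and links produced by plugins may need their own detection logic.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed whitelist of file types worth indexing.
INDEXABLE_EXTENSIONS = {".pdf", ".doc", ".docx", ".odt"}


class FileLinkDetector(HTMLParser):
    """Collects the paths of linked files with an indexable extension."""

    def __init__(self):
        super().__init__()
        self.files = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        path = urlparse(href).path                     # drop query string and fragment
        extension = posixpath.splitext(path)[1].lower()
        if extension in INDEXABLE_EXTENSIONS and path not in self.files:
            self.files.append(path)


def detect_file_links(html):
    detector = FileLinkDetector()
    detector.feed(html)
    detector.close()
    return detector.files


content = (
    '<p><a href="/fileadmin/docs/report.PDF?v=2">Report</a> '
    '<a href="/about/">About</a> '
    '<a href="fileadmin/specs.docx">Specs</a></p>'
)
print(detect_file_links(content))
```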
Integration Challenges & Solutions
FILE INDEXING
• File Detectors & File Index Queue
• File system abstraction layer
• Apache Tika
• Knows 1,200+ file formats, reads about half of them
• Content & metadata extraction
• Language detection
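
As a rough illustration of handing a file to Apache Tika for text, metadata, and language detection, the sketch below talks to a Tika Server over HTTP. It assumes a Tika Server running locally on its default port 9998 and uses the server's `/tika`, `/meta`, and `/language/stream` endpoints; the helper names are hypothetical, and this is not necessarily how the extension itself invokes Tika.

```python
import json
import urllib.request

TIKA_SERVER = "http://localhost:9998"  # assumption: local Tika Server, default port


def build_tika_request(endpoint, payload, accept):
    """Prepare a PUT request that uploads the raw file bytes to Tika Server."""
    return urllib.request.Request(
        url=f"{TIKA_SERVER}{endpoint}",
        data=payload,
        method="PUT",
        headers={"Accept": accept},
    )


def extract_text(payload):
    """Plain-text content of the file."""
    request = build_tika_request("/tika", payload, "text/plain")
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def extract_metadata(payload):
    """Metadata (title, author, content type, ...) as a dictionary."""
    request = build_tika_request("/meta", payload, "application/json")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def detect_language(payload):
    """Language code of the submitted text, as reported by Tika."""
    request = build_tika_request("/language/stream", payload, "text/plain")
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8").strip()
```

With a server running, `extract_text(open("report.pdf", "rb").read())` would return the document text, which can then be added to the Solr document alongside the metadata.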
Integration Challenges & Solutions
THE REST
• PHP people vs. Java technology
• Talking to Solr
• Learning from mistakes
Integration Challenges & Solutions
THE REST
• Fully automated bash install script
• SolrPhpClient
• Separate your languages
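
"Talking to Solr" ultimately means sending HTTP requests, which is what keeps the PHP side and the Java side cleanly separated. The sketch below builds a query URL for Solr's standard `/select` handler in Python; it does not show SolrPhpClient's API. The core URL, the function name, and the field names are hypothetical, while `q`, `rows`, `wt`, `facet`, and `facet.field` are standard Solr request parameters.

```python
from urllib.parse import urlencode

SOLR_CORE_URL = "http://localhost:8983/solr/core_en"  # hypothetical core location


def build_select_url(keywords, facet_fields=(), rows=10):
    """Build a search URL with optional field faceting."""
    params = [("q", keywords), ("rows", str(rows)), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", name) for name in facet_fields)
    return f"{SOLR_CORE_URL}/select?{urlencode(params)}"


url = build_select_url("typo3 solr", facet_fields=["type"])
print(url)
```

Any HTTP client can fetch this URL and decode the JSON response, so the CMS needs no Java code and Solr needs no PHP.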
EXT:solr - Apache Solr for TYPO3
FEATURES
• Faceted Search
• File Indexing
• Multi-Language & Multi-Site Support
• Did you mean, More Like This
• Search Word Highlighting
• Auto Complete
• Access Rights Support
• Many More ...
QUESTIONS?
THANKS.
T3CON North America
San Francisco, May 30-31
20% off regular ticket price, use:
LUCENETYPO3
INFIELD DESIGN is hiring!
CONFERENCE PARTY
The Tipsy Crow: 770 5th Ave
Starts after Stump The Chump
Your conference badge gets
you in the door
TOMORROW
Breakfast starts at 7:30
Keynotes start at 8:30
CONTACT
@irnnr
ingo@typo3.org, ingo@apache.org
