From Lucene to Solr 4 Trunk

 We Made It!
 SF Bay Lucene / Solr Meetup

 Troy Thomas
 17 Jan 2013



© Synopsys 2013    1
Lucene to Solr 4 Trunk
 Agenda
 •    Company - Background
 •    Project Inspiration
 •    Why Solr 4 – Why Trunk?
 •    Architecture (Front to Back)
 •    Trunk to Beta
 •    Future
 •    Demo
 •    Q and A




Company - Background
 Synopsys – What?
 • Synopsys – 25-year-old company / $1.8B revenue in 2012
      – Electronic Design Automation (EDA)
      – Electrical engineers design computer chips using Synopsys
           – Verilog, VHDL - High level design
           – Simulation
           – Test
           – Power
           – Place and route
           – IP blocks


      – Nearly every semiconductor built uses Synopsys…
        microprocessors, RAM, etc.



Company - Background
 Synopsys – SolvNet ®
 • SolvNet ® - online knowledge base system used by
   customers and employees
      – Dedicated engineering team
 • 20 year history
      –    1993 Email
      –    1995 A “Patchy” NCSA Web server + Perl CGI
      –    1997 Verity Netscape Search
      –    2001 Java – Netscape iPlanet Portal + Verity
      –    2005 Apache Lucene
      –    2007 Pure Apache
      –    2012 Solr 4



Lucene
 It’s complicated…
 • Moved to Lucene in 2005
      – Custom tokenization helped results
           – Ex: +delay_mode_zero
 • Auto-complete function 2008
      – Yahoo UI Widget
 •    Tomcat w/ RMI callback
 •    PDF Text extraction using PDFBox
 •    HTML parser
 •    Generate Lucene documents
      – Add to index
 • Separate collections – Articles, Docs
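
To illustrate why custom tokenization can matter for tool options such as +delay_mode_zero, here is a minimal Python sketch. It is not the SolvNet tokenizer, which the slides do not show; the regular expression and both function names are assumptions made purely for illustration. A naive splitter breaks the option into three unrelated words, while an option-aware pattern keeps it as one searchable token.

```python
import re

# Hypothetical pattern: an optional leading '+' or '-' followed by word
# characters (letters, digits, underscore). Not taken from the slides.
OPTION_TOKEN = re.compile(r"[+-]?\w+")


def naive_tokens(text):
    """Split on every run of characters that is not a letter or digit."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]


def option_aware_tokens(text):
    """Keep tool options such as '+delay_mode_zero' together as one token."""
    return [m.group(0).lower() for m in OPTION_TOKEN.finditer(text)]


if __name__ == "__main__":
    print(naive_tokens("set +delay_mode_zero"))
    print(option_aware_tokens("set +delay_mode_zero"))
```

With the naive splitter, a search for the option competes with every document that mentions "delay", "mode", or "zero"; with the option-aware pattern it matches only documents that contain the exact option.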

Project Inspiration
 ApacheCon - Solr
 •    Advanced Full-Text Search Capabilities
 •    Optimized for High Volume Web Traffic
 •    Standards Based Open Interfaces - XML, JSON and HTTP
 •    Comprehensive HTML Administration Interfaces
 •    Server statistics exposed over JMX for monitoring
 •    Scalability - Efficient Replication to other Solr Search Servers
 •    Flexible and Adaptable with XML configuration
 •    Extensible Plugin Architecture
 •    Solr Uses the Lucene Search Library and Extends it!
 •    A Real Data Schema, with Numeric Types, Dynamic Fields, Unique Keys
 •    Powerful Extensions to the Lucene Query Language
 •    Faceted Search and Filtering
 •    Advanced, Configurable Text Analysis
 •    Highly Configurable and User Extensible Caching
 •    Performance Optimizations
 •    External Configuration via XML
 •    An Administration Interface
 •    Monitorable Logging
 •    Fast Incremental Updates and Index Replication
 •    Highly Scalable Distributed search with sharded index across multiple hosts
 •    XML, CSV/delimited-text, and binary update formats
 •    Easy ways to pull in data from databases and XML files from local disk and HTTP sources
 •    Rich Document Parsing and Indexing (PDF, Word, HTML, etc) using Apache Tika
 •    Multiple search indices
Solr 4
 Why?
 • Solr
      – Faceting
      – Modernize GUI
      – Deprecate custom code
           – Auto-complete using Yahoo UI
           – Did you mean?
      – Use Tika for more MIME types
           – ExtractingRequestHandler (Solr Cell)
 • Solr 4 (trunk)
      –    DirectSolrSpellChecker
      –    More like this
      –    Synonym list
      –    Save migration
Front-End
 Screenshot




Front-End
 Research
 • How should we build new front-end?
      – Classic
           – JSF
           – JSP / Servlet (MVC)
      – Leverage framework
           – Apache Velocity
           – SolrJ
           – SolrJS
           – MyFaces
           – Ajax Solr




Front-End
 Research
 • Ajax Solr versus SolrJS
      – SolrJS (deprecated)
           – not fully IE 6, 7, 8 compatible
           – No highlight / sorting support
      – Ajax Solr
           – AbstractFacetWidget methods for faceting
           – AbstractTextWidget
           – PagerWidget for pagination
           – AutoComplete
           – Community weak




Front-End
 Ajax Solr
 • Ajax Solr
      – Advantage: Widgets
           – Save settings
           – Auto Complete
           – Query submit
           – Sort / display results
           – Pagination
           – Facet by product
           – Facet by doc type
           – jQuery / JSON friendly
      – Challenges:
           – Session management
           – Proxy solution
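
The faceting and pagination widgets listed above ultimately translate into ordinary Solr request parameters and a JSON response. The following Python sketch shows roughly what such a request and its facet data look like. The field names `product` and `doc_type` are assumptions based on the "Facet by product" and "Facet by doc type" bullets, and the helper names are hypothetical; Ajax Solr itself is a JavaScript library and is not used here.

```python
import json
from urllib.parse import urlencode


def facet_query_params(user_query, page=0, rows=10):
    """Build the query string for one page of results with two facets."""
    return urlencode([
        ("q", user_query),
        ("wt", "json"),                # ask Solr for a JSON response
        ("start", page * rows),        # pagination offset
        ("rows", rows),
        ("facet", "true"),
        ("facet.field", "product"),    # hypothetical field name
        ("facet.field", "doc_type"),   # hypothetical field name
        ("facet.mincount", 1),
    ])


def facet_counts(response_text, field):
    """Turn Solr's flat [term, count, term, count, ...] list into a dict."""
    data = json.loads(response_text)
    flat = data["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    print(facet_query_params("clock gating", page=2))
```

The flat term/count list is Solr's default JSON layout for field facets, which is why the helper pairs up even and odd positions.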

Front-End
 Screenshot




Front-End
 Ajax Solr – JSON Object data - Firebug




Front-End
 DirectSolrSpellChecker – Auto Suggest
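
DirectSolrSpellChecker draws its suggestions from the terms already in the main index instead of maintaining a separate spelling index. The Python sketch below illustrates only that idea: it scans a small term dictionary by brute force and ranks candidates by edit distance, then by frequency. It is not Solr's implementation, which is considerably more efficient, and the function names, the `max_edits` default, and the sample data are assumptions for illustration.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1]


def suggest(word, term_freqs, max_edits=2, count=3):
    """Suggest indexed terms within max_edits of word, closest and most frequent first."""
    candidates = []
    for term, freq in term_freqs.items():
        if term == word:
            continue
        distance = edit_distance(word, term)
        if distance <= max_edits:
            candidates.append((distance, -freq, term))
    return [term for _, _, term in sorted(candidates)[:count]]


if __name__ == "__main__":
    # Hypothetical index terms with document frequencies.
    terms = {"verilog": 120, "verify": 40, "vhdl": 90}
    print(suggest("verilgo", terms))
```

Because the candidates come straight from the index, suggestions reflect newly indexed terms without rebuilding a separate dictionary.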




Front-End
 Extend Solr Highlighter




Back-end
 Tokenization
 • Carry custom tokenization work forward from Lucene
      – Change functionality – operator (ex: +delay_mode_zero)
 • Used the text_rev XML configuration to reverse tokens for
   the reverse index feature
      – Enables wildcard searching in front of string
      – *lock* *lock clock*
      – Apache Solr Mailing list community was very helpful
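
The reversed-token idea can be sketched in a few lines of Python. When each term is also stored reversed, a leading wildcard query such as `*lock` becomes a cheap prefix scan over the reversed terms. This is a simplified illustration under that assumption, not Solr's filter, which handles additional details such as where wildcards may appear; the function names and sample terms are hypothetical.

```python
from bisect import bisect_left


def build_reversed_index(terms):
    """Store every term reversed, in sorted order, so suffixes become prefixes."""
    return sorted(term[::-1] for term in terms)


def leading_wildcard_search(reversed_terms, pattern):
    """Answer a '*suffix' query as a prefix scan over the reversed terms."""
    if not pattern.startswith("*") or "*" in pattern[1:]:
        raise ValueError("this sketch only handles patterns of the form '*suffix'")
    prefix = pattern[1:][::-1]                    # '*lock' -> 'kcol'
    start = bisect_left(reversed_terms, prefix)   # jump to the first candidate
    matches = []
    for reversed_term in reversed_terms[start:]:
        if not reversed_term.startswith(prefix):
            break                                 # sorted order: no more matches
        matches.append(reversed_term[::-1])
    return matches


if __name__ == "__main__":
    index = build_reversed_index(["clock", "lock", "block", "locker", "reset"])
    print(leading_wildcard_search(index, "*lock"))
```

The trade-off is index size: storing reversed tokens alongside the originals costs extra disk space in exchange for faster leading-wildcard queries.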




Back-end
 Tokenizer – text_rev configuration
 <!-- Similar to fieldType "text", except that text_rev reverses the characters of
      each token to enable more efficient leading wildcard queries. -->
 <fieldType name="text_rev" class="solr.TextField" sortMissingLast="true" omitNorms="true">
   <analyzer type="index">
     <tokenizer class="com.synopsys.ies.solr.backend.analysis.standard.SolvNetTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
     <filter class="com.synopsys.ies.solr.backend.analysis.standard.SolvNetFilterFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="com.synopsys.ies.solr.backend.analysis.standard.SpecialCharSynonymFilterFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <!-- Disable reverse indexing to save disk space and improve speed! -->
     <!-- filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
          maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/-->
   </analyzer>
   <analyzer type="query">
     <tokenizer class="com.synopsys.ies.solr.backend.analysis.standard.SolvNetTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
     <filter class="com.synopsys.ies.solr.backend.analysis.standard.SolvNetFilterFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
   </analyzer>
 </fieldType>
Back-end
 Strip out the noise
 Custom Input Stream Filter – strip out the noise




Back-end
 Sharding
 • A different way to shard
      – Many shards mapped to one collection
      – Shards used for easy maintenance (not performance)
           – One shard per documentation version (12 total)
           – One shard for articles
           – One for release notes
           – One shard for internal only articles
      – Full re-index Articles, Release Notes every few hours
           – Simpler implementation
      – Index Documentation – as needed
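
One way to picture "many shards, one logical collection" is a distributed query that lists every shard in Solr's `shards` request parameter. The Python sketch below builds such a URL. The host, the core names, and the idea of adding the internal-only shard only for employees are all assumptions for illustration; the slides do not say how the shards are actually addressed.

```python
from urllib.parse import urlencode

SOLR_HOST = "localhost:8983"  # hypothetical host and port


def shard_cores(include_internal=False):
    """List the cores to search: 12 documentation versions, articles, release notes."""
    cores = ["docs_v%02d" % n for n in range(1, 13)]   # hypothetical core names
    cores += ["articles", "release_notes"]
    if include_internal:
        cores.append("internal_articles")              # assumed: employees only
    return cores


def distributed_query_url(query, include_internal=False):
    """Build a select URL that fans the query out to every listed shard."""
    shards = ",".join("%s/solr/%s" % (SOLR_HOST, core)
                      for core in shard_cores(include_internal))
    params = urlencode({"q": query, "shards": shards, "wt": "json"})
    return "http://%s/solr/articles/select?%s" % (SOLR_HOST, params)


if __name__ == "__main__":
    print(distributed_query_url("clock", include_internal=True))
```

Splitting by content type rather than by hash means a single documentation version can be rebuilt or dropped without touching the other shards, which matches the maintenance goal stated above.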




Trunk to Beta
 Minor annoyance
 • After go live – Solr 4 beta shipped
      – Minor changes
      – Tika and Zookeeper upgraded
      – ContentStreamUpdateRequest.addFile()
           – addFile(File file) became addFile(File file, String contentType)
      – New setLuceneMatchVersion
           – LUCENE_4
           – Added to make unit tests work properly
 • Production remains on Solr 4 beta
      – Will migrate to Solr 4.1 in production mid-year




Future
 What remains
 • More tuning
      – Human and machine learning approaches
 • NRT indexing
      – Use article hits to boost results (Most popular sort)
      – Leverage article rating data
 • NoSQL-like features
      – Customer profile data




Demo




Special Thanks…
 Thank you Chris and Erik - ApacheCon 2010




Final Thoughts
 Thank you Lucid Works
 Thank you for hosting this Meetup and your commitment to
 the Apache Community…




Q and A / Contact Me

  Questions?




  • 9. Front-End Research • How should we build new front-end? – Classic – JSF – JSP / Servlet (MVC) – Leverage framework – Apache Velocity – SolrJ – SolrJS – Myfaces – Ajax Solr © Synopsys 2013 9
  • 10. Front-End Research • Ajax Solr versus SolrJS – SolrJS (deprecated) – not fully IE 6, 7, 8 compatible – No highlight / sorting support – Ajax Solr – AbstractFacetWidget methods for faceting – AbstractTextWidget – PagerWidget for pagination – AutoComplete – Community weak © Synopsys 2013 10
  • 11. Front-End Ajax Solr • Ajax Solr – Advantage: Widgets – Save settings – Auto Complete – Query submit – Sort /display results – Pagination – Facet by product – Facet by doc type – JQUERY / JSON friendly – Challenges: – Session management – Proxy solution © Synopsys 2013 11
  • 13. Front-End Ajax Solr – JSON Object data - Firebug © Synopsys 2013 13
  • 14. Front-End DirectSolrSpellChecker – Auto Suggest © Synopsys 2013 14
  • 15. Front-End Extend Solr Highlighter © Synopsys 2013 15
  • 16. Back-end Tokenization • Carry custom tokenization work forward from Lucene – Change functionality – operator (ex: +delay_mode_zero) • Used text_rev xml configuration to reverse tokens for reverse index feature – Enables wildcard searching in front of string – *lock* *lock clock* – Apache Solr Mailing list community was very helpful © Synopsys 2013 16
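The reverse-index idea on slide 16 can be sketched in a few lines of plain Java. This is only an illustration of why reversing tokens helps, not the SolvNet code or Solr's own filter: the class and method names are hypothetical, only a single leading `*` is handled, and the real `solr.ReversedWildcardFilterFactory` additionally marks reversed tokens so they do not collide with ordinary ones.

```java
public class ReversedWildcardSketch {

    // Reverse the characters of a token, as an index-time reversing filter would.
    static String reverse(String token) {
        return new StringBuilder(token).reverse().toString();
    }

    // True if the token matches a pattern with exactly one wildcard, at the front ("*lock").
    // Hypothetical helper: it shows the rewrite, it does not query an index.
    static boolean matchesLeadingWildcard(String pattern, String token) {
        if (!pattern.startsWith("*") || pattern.indexOf('*', 1) >= 0) {
            throw new IllegalArgumentException("expected a single leading '*': " + pattern);
        }
        // "*lock" reversed is "kcol*": the expensive leading wildcard becomes a cheap prefix.
        String reversedPattern = reverse(pattern);
        String prefix = reversedPattern.substring(0, reversedPattern.length() - 1);
        // The index would hold reversed tokens, so "clock" is stored as "kcolc".
        return reverse(token).startsWith(prefix);
    }

    public static void main(String[] args) {
        System.out.println(reverse("*lock"));                           // kcol*
        System.out.println(matchesLeadingWildcard("*lock", "clock"));   // true
        System.out.println(matchesLeadingWildcard("*lock", "locker"));  // false
    }
}
```

A prefix lookup can walk the sorted term dictionary directly, whereas a leading wildcard over unreversed tokens has to scan every term; the cost is a larger index, because each token is stored a second time in reversed form.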
  • 17. Back-end Tokenizer – text_rev configuration
    <!-- Similar to fieldtype text except text_rev reverses the characters of each token, to enable more efficient leading wildcard queries. -->
    <fieldType name="text_rev" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer type="index">
        <tokenizer class="com.synopsys.ies.solr.backend.analysis.standard.SolvNetTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        <filter class="com.synopsys.ies.solr.backend.analysis.standard.SolvNetFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="com.synopsys.ies.solr.backend.analysis.standard.SpecialCharSynonymFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Disable reverse indexing to save disk space and improve speed! -->
        <!-- filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/-->
      </analyzer>
      <analyzer type="query">
        <tokenizer class="com.synopsys.ies.solr.backend.analysis.standard.SolvNetTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        <filter class="com.synopsys.ies.solr.backend.analysis.standard.SolvNetFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      </analyzer>
    </fieldType>
    © Synopsys 2013 17
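To make the order of the analysis chain on slide 17 concrete, here is a rough stand-alone sketch in plain Java of a tokenizer followed by a case-insensitive stop filter and a lowercase step. It does not reproduce `SolvNetTokenizerFactory`, whose rules the slides do not show. The token pattern (keep a leading `+` and any underscores, as in `+delay_mode_zero`), the class name, and the stopword set are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnalyzerChainSketch {

    // Assumed token shape: optional leading '+', then letters, digits, or underscores.
    private static final Pattern TOKEN = Pattern.compile("\\+?[A-Za-z0-9_]+");

    // Tokenize, drop stopwords ignoring case, then lowercase what remains.
    // The stopword set is expected to contain lowercase entries.
    static List<String> analyze(String text, Set<String> stopwords) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            String lowered = matcher.group().toLowerCase(Locale.ROOT);
            if (stopwords.contains(lowered)) {
                continue; // stop filter with ignoreCase="true"
            }
            tokens.add(lowered); // lowercase filter
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stopwords = new HashSet<>(Arrays.asList("the"));
        System.out.println(analyze("Set the +delay_mode_zero option", stopwords));
    }
}
```

A stock tokenizer would typically split `+delay_mode_zero` or drop the `+`, which is presumably why a custom tokenizer was carried forward from the Lucene implementation.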
  • 18. Back-end Strip out the noise Custom Input Stream Filter – strip out the noise © Synopsys 2013 18
  • 19. Back-end Sharding • A different way to shard – Many shards mapped to one collection – Shards used for easy maintenance (not performance) – One shard per documentation version (12 total) – One shard for articles – One for release notes – One shard for internal only articles – Full re-index Articles, Release Notes every few hours – Simpler implementation – Index Documentation – as needed © Synopsys 2013 19
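One common way to have a single Solr query span several separately maintained indexes is the `shards` request parameter, which takes a comma-separated list of `host:port/path` entries. The slides do not say whether SolvNet used this mechanism, so the following plain-Java sketch is only an assumption about how such a request could be assembled. The host, the core names, and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShardQuerySketch {

    // Build the value of the `shards` parameter: one "host:port/solr/core" entry per shard.
    static String shardsParam(String hostAndPort, List<String> cores) {
        List<String> entries = new ArrayList<>();
        for (String core : cores) {
            entries.add(hostAndPort + "/solr/" + core);
        }
        return String.join(",", entries);
    }

    // Build a select URL that sends the query to one core and fans it out to all shards.
    static String selectUrl(String hostAndPort, String entryCore, List<String> cores, String query)
            throws UnsupportedEncodingException {
        return "http://" + hostAndPort + "/solr/" + entryCore + "/select"
                + "?q=" + URLEncoder.encode(query, "UTF-8")
                + "&shards=" + URLEncoder.encode(shardsParam(hostAndPort, cores), "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical core names mirroring the slide: articles, release notes, one doc version.
        List<String> cores = Arrays.asList("articles", "release_notes", "docs_2012_12");
        System.out.println(shardsParam("localhost:8983", cores));
        System.out.println(selectUrl("localhost:8983", "articles", cores, "delay_mode_zero"));
    }
}
```

With a layout like this, re-indexing articles every few hours touches only the articles index, while a documentation shard is rebuilt only when that documentation version changes.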
  • 20. Trunk to Beta Minor annoyance • After go live – Solr 4 beta shipped – Minor changes – Tika and Zookeeper upgraded – ContentStreamUpdateRequest.addFile() – addFile(File file) became addFile(File file, String contentType) – New setLuceneMatchVersion – LUCENE_4 – Added to make unit tests work properly • Production remains on Solr 4 beta – Will migrate to Solr 4.1 production mid year © Synopsys 2013 20
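The `addFile` change on slide 20 means the caller now has to supply a content type for each file. A small helper along the following lines could choose one from the file extension. It is a hypothetical sketch in plain Java, not part of SolrJ or of the SolvNet code, and the extension table is an assumption that would need to cover whatever formats are actually indexed.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ContentTypeSketch {

    private static final String FALLBACK = "application/octet-stream";
    private static final Map<String, String> TYPES = new HashMap<>();

    static {
        // Assumed mapping; extend as needed.
        TYPES.put("pdf", "application/pdf");
        TYPES.put("html", "text/html");
        TYPES.put("htm", "text/html");
        TYPES.put("txt", "text/plain");
        TYPES.put("doc", "application/msword");
    }

    // Pick a MIME type from the file extension, falling back to a generic binary type.
    static String contentTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return FALLBACK;
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String type = TYPES.get(extension);
        return type != null ? type : FALLBACK;
    }

    public static void main(String[] args) {
        // With SolrJ on the classpath, the two-argument form named on the slide would be
        // called roughly as: request.addFile(file, contentTypeFor(file.getName()));
        System.out.println(contentTypeFor("release_notes.PDF"));
        System.out.println(contentTypeFor("README"));
    }
}
```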
  • 21. Future What remains • More tuning – Human and machine learning approaches • NRT indexing – Use article hits to boost results (Most popular sort) – Leverage article rating data • NoSQL-like features – Customer profile data © Synopsys 2013 21
  • 23. Special Thanks… Thank you Chris and Erik - Apachecon 2010 © Synopsys 2013 23
  • 24. Final Thoughts Thank you Lucid Works Thank you for hosting this Meetup and your commitment to the Apache Community… © Synopsys 2013 24
  • 25. Q and A / Contact Me Questions? © Synopsys 2013 25