cominvent as
                     Enterprise Search Specialists




               Migrating FAST to Solr
                   by Jan Høydahl



Consulting

    – Cominvent delivers independent search consulting
    – Focus on Apache Lucene/Solr & Microsoft FAST ESP
    – We know both the proprietary and Open Source worlds,
      their benefits and disadvantages. We help you choose.
      We help you maximize your chosen engine, and we
      help you migrate as your requirements change.




Training

    – Cominvent AS delivers public and on-site training
    – Certified Solr Training Partner for Lucid Imagination
    – Certified FAST ESP Training Partner

    – Read more: http://www.cominvent.com/training/




                                                       Photo: fluidpowerzone.com
Commercial Support

    – When community & mailing list support is not enough
    – Paid support agreement for Apache Solr/Lucene
    – In cooperation with Lucid Imagination

    – Read more: http://www.cominvent.com/support/




Jan Høydahl – experience

                          ●    IT architect, 15 years with
                               search, telecom, mobile
                          ●    Helped build FAST's Global
                               Services as first engineer
                          ●    Founder of Cominvent AS
                          ●    Search consultant 10 years
                          ●    Certified Solr instructor




Recommendations




    «His skills on Fast ESP is in-depth, thorough, and
    probably amongst the best you can get. Jan is
    working independently, but also well in teams.
    Whether it is technical or business work, Jan does
    not fall behind. His excellent skills to see things from
    the holistic perspective is great.»

    -Knut Stenmark, DPM AS



Sample consulting projects

    World wide news agency
    Chief architect of FAST ESP search solution, migrating from Autonomy
    IDOL. Real-time news, alerting etc.
    Major Swedish newspaper
    Architect for new Topic Page solution, letting editors define topics based on
    keywords and regex rules.
    Norwegian Yellow Pages actor
    Architect for migrating traditional DB backed catalog search to modern one-
    search box solution.
    Classifieds and real estate online broker
    Advised on migrating from DB to search. Architect for FAST ESP solution
    with Norwegian linguistics, search middleware and relevance tuning.
    Leading news surveillance company
    Helped implement and tune real-time search using FAST ESP and real-time
    alerting using FAST RTA.
Sample Solr Training references


    – Danish national library organization serving all Danish libraries
    – Migrating from in-house search to Apache Solr for all their search
    – Delivered Solr training course in 2010

    Library organization
    – Global library org, serving hundreds of libraries worldwide
    – Helping them migrate from FAST to Solr
    – First step is Classroom Training in March 2010




About Apache Solr

    –   Open Source enterprise search server
    –   Built on the popular Apache Lucene library
    –   100% Java, runs on all platforms and env.
    –   Supports billions of documents, high scalability and
        advanced features like faceting, highlighting,
        document format conversions, GEO search etc
    –   Indexes most languages including CJK
    –   The platform is not language-aware, but each field can be
        configured for language-specific tokenization,
        stemming, stop-word processing etc
    –   Very active developer and user communities
    –   Apache 2.0 license – commercially friendly
    –   Rapid growth in companies providing support etc

Solr-user community growth

    [Chart: Solr-user growth. Y-axis: Messages (0 to 1600). X-axis: Month (2006 Jan to 2010 Feb).]
Lucene/Solr deployments




    – More: http://wiki.apache.org/solr/PublicServers
                                              Thanks to Lucid Imagination for logo collection
Solr in media & newspapers

                   – News search. Also exposes API

                   – Danish news search


                   – Swedish news search

                   – Swedish news search

                   – Faceted search through classifieds

                   – Eastern european classifieds


Sample FAST-Solr switchers

                   – Human Rights search
                      • hurisearch.org (blog)

                   – FINN katalog (former Sesam)
                      • katalog.finn.no (announce)

                   – Mocality – African business search
                      • mocality.co.ke (linkedin)

                   – International library search
                      • Large multi-lingual index

                   – Norwegian media house
                      • Multiple newspapers
Solr Architecture




The migration...




Migration objectives

    – Possible objectives include:
        •   Lower maintenance cost
        •   Deeper in-house competency
        •   Less dependent on external consultants
        •   Ownership and visibility of source code
        •   Shorter time to market for new features
        •   Bugs fixed faster – or even fix ourselves
        •   Larger community, mailing lists that work!
        •   More choice in external consultants
        •   Contribute back to Open Source
        •   Lower HW footprint



Migration steps

    – Knowledge gathering & Training
    – Review current features & arch
        • Want to keep all features? Add new?
    – Migration areas:
        •   Index profile
        •   Content
        •   Feeding
        •   Document Processing
        •   Querying
        •   Search middleware?
        •   Admin & Operational
    – What to do in Application space vs Search space?

Feature comparison ESP – Solr (similarities)

 Feature                                 ESP                  Solr

 Full-text, boolean, range search,       Yes                  Yes
 sorting, sub-second, facets,
 did-you-mean, synonyms

 Scaling for QPS                         Add rows             Add rows

 Scaling for document volume             Add columns          Add shards

 Synonyms                                Index/query side     Index/query side

 GEO search                              Yes                  Yes (1.5)

 Boolean query language                  Yes (FQL)            Yes (Lucene or
                                                              (e)DisMax)

 APIs                                    HTTP, Java, .NET,    HTTP, Java, .NET,
                                         C++, PHP             Ruby, Python, PHP,
                                                              Perl, JS

Feature comparison ESP – Solr (differences)

 Feature                   ESP                Solr

 Admin server              Yes                No (coming 1.5)

 Processes                 Many (C++, Java,   One WAR in Java
                           Python)            app-server, 100%
                                              Java

 Navigators / Facets       Index-time         Query-time

 Did-you-mean              Dictionary based   Dictionary or
                                              index based

 Feeding                   API only           HTTP POST or API

 Document processing       Pipeline (py)      Simple pipeline
                                              (Java, JS, Groovy,
                                              Jython, JRuby..)

 Multi field querying      Composite fields   DisMax handler


Feature comparison ESP – Solr (differences)

 Feature                            ESP                   Solr

 Relevancy tuning                   Rank profiles, term   Dynamic function
                                    boosting              queries and boost
                                                          functions

 XRANK                              XRANK operator        Function Queries

 Freshness boost                    Freshness in rank     Function Queries
                                    profile

 Boost GEO distance                 Rank profile and      Function Queries
                                    special

 Major schema or software updates   Cold update, use      Stage new content
                                    stage environment     into new Solr core

 Pluggability                       Docprocs, clients     Everything :)
                                                          Request Handlers,
                                                          Query Parsers,
                                                          Docprocs, Rank,
                                                          Spell, tokenizer++
Feature comparison ESP – Solr (differences)

 Feature                   ESP                    Solr

 Lemmatization             Can be licensed        Can be licensed
                           for many               from 3rd party
                           languages

 Query syntax              and(a:foo, b:bar)      a:foo AND b:bar

                           i:range(0, 100)        i:[0 TO 100]

                           d:range(2000-01-       d:[2000-01-
                           01T00:00:00,           01T00:00:00Z TO
                           2010-03-               NOW]
                           03T12:00:00)

 Query params              query=                 q=
                           offset=                start=
                           hits=                  rows=
                           spell=1                spellcheck=true

 What fields to return     view=viewname          fl=title,price,body...
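
The parameter and range-syntax rows above can be turned into a small translation layer in the application. What follows is a minimal sketch, not a complete FQL translator: it only renames the parameters listed in the table and rewrites the `range(...)` operator. The names `esp_to_solr_params`, `translate_ranges` and `VIEW_FIELDS`, and the `listview` view with its field list, are hypothetical and chosen for illustration.

```ruby
# Hypothetical lookup from an ESP view name to the Solr field list (fl).
# ESP defines views on the server; Solr takes the field list per request.
VIEW_FIELDS = {
  'listview' => %w[id title price]
}.freeze

# ESP parameter name => Solr parameter name, taken from the table.
PARAM_MAP = { 'offset' => 'start', 'hits' => 'rows' }.freeze

# Rewrites the FQL range operator, e.g. i:range(0, 100) => i:[0 TO 100].
# Other FQL operators such as and(...) are passed through untouched, and
# date values are not given the trailing "Z" that Solr expects.
def translate_ranges(fql)
  fql.gsub(/range\(\s*([^,()\s]+)\s*,\s*([^,()\s]+)\s*\)/) { "[#{$1} TO #{$2}]" }
end

# Translates a hash of ESP request parameters into Solr request parameters.
# Unknown parameters are copied through unchanged.
def esp_to_solr_params(esp)
  solr = {}
  esp.each do |name, value|
    case name
    when 'query' then solr['q'] = translate_ranges(value)
    when 'spell' then solr['spellcheck'] = (value.to_s == '1').to_s
    when 'view'  then solr['fl'] = VIEW_FIELDS.fetch(value).join(',')
    else solr[PARAM_MAP.fetch(name, name)] = value
    end
  end
  solr
end

p esp_to_solr_params('query' => 'i:range(0, 100)', 'offset' => '10',
                     'hits' => '20', 'spell' => '1', 'view' => 'listview')
```

Keeping the mapping in one place like this makes it easier to run the old and new engines side by side during the migration.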

Your FAST system - overview

                       Your web-app


                                      Search middleware?




                                              Graphics diagram: www.microsoft.com
Migrating index profile

    – ESP index profile -> Solr schema.xml
    – Set up field types, use defaults or create your own
    – Set up the static fields. ESP:



    – Solr equivalent:



    – No need for generic*, use dynamic fields:



Migrating index profile

    – Composite fields?
        • Solr can use <copyField> to copy multiple fields into
          one, e.g. as we did to map many attributes into one
          field
        • However, to rank with a different boost for each
          field, Solr does not need a composite field. Use the
          DisMax query handler instead. Very powerful!
    – No need to edit schema to add new fields. Using
      dynamic fields, it is easy to e.g. introduce a color facet
      for cars or an Mpixels facet for digital cameras




DisMax query example

    – This Solr query can replace use of composite-field
        • qt=dismax
        • q=oslo
        • qf=title^0.7 highpriorityfields^1.5
          mediumpriorityfields^0.6 lowpriorityfields^0.2
          recallfields^0.0 body^0.0
        • bf=recip(rord(creationDate),1,1000,1000)
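
The parameters above can be assembled into a request URL, and the `bf` boost can be checked by hand. This is a minimal sketch: the host, port and `/solr/select` path are assumed from the other examples in this deck, and `dismax_url` and the local `recip` helper are illustrative names. Solr's `recip(x,m,a,b)` function evaluates `a / (m*x + b)`, and `rord` gives the newest document the value 1, so the boost decays as documents get older.

```ruby
require 'uri'

# Parameters copied from the slide; the field names are the slide's own.
DISMAX_PARAMS = {
  'qt' => 'dismax',
  'q'  => 'oslo',
  'qf' => 'title^0.7 highpriorityfields^1.5 mediumpriorityfields^0.6 ' \
          'lowpriorityfields^0.2 recallfields^0.0 body^0.0',
  'bf' => 'recip(rord(creationDate),1,1000,1000)'
}.freeze

# Builds the full query URL. The default base URL is an assumption.
def dismax_url(base = 'http://localhost:8080/solr/select')
  "#{base}?#{URI.encode_www_form(DISMAX_PARAMS)}"
end

# Local re-implementation of the formula behind Solr's recip(x, m, a, b),
# used here only to show how the boost value falls off.
def recip(x, m, a, b)
  a.to_f / (m * x + b)
end

puts dismax_url
puts recip(1, 1, 1000, 1000)     # boost for the newest document (rord = 1)
puts recip(1000, 1, 1000, 1000)  # boost for the 1000th newest => 0.5
```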




Migrating content

    – If using FAST ContentAPI to push programmatically
        • Use Solr's clients (Java, .NET, Ruby, Python, PHP...)
    – If feeding FastXML using FileTraverser
        • Feed as Solr XML using HTTP POST or a POST client




    – If you feed custom XML with XMLMapper
        • Have a look at DIH's import and mapping features
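
For the FastXML case, the existing feed has to be rewritten as Solr's `<add><doc><field name="...">` update format. The following is a minimal sketch of generating that format from plain Ruby hashes, assuming the documents have already been read from the old feed. `solr_add_xml` and `xml_escape` are hypothetical helper names, and the field names in the usage line are examples only.

```ruby
# Characters that must be escaped in XML text and attribute values.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

def xml_escape(value)
  value.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Builds a Solr <add> message from an array of hashes (field name => value).
# Array values become repeated <field> elements, i.e. multi-valued fields.
def solr_add_xml(documents)
  docs = documents.map do |doc|
    fields = doc.flat_map do |name, value|
      Array(value).map do |v|
        %(<field name="#{xml_escape(name)}">#{xml_escape(v)}</field>)
      end
    end
    "<doc>#{fields.join}</doc>"
  end
  "<add>#{docs.join}</add>"
end

puts solr_add_xml([{ 'id' => 1, 'title' => 'Fish & chips', 'tag' => %w[food uk] }])
```

The output can be saved to a file and posted to Solr's update handler with curl or any HTTP client.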


Push Feeding example

    – Feed XML using HTTP POST:
        • curl "http://localhost:8080/solr/update?commit=true" \
            -H "Content-Type: text/xml" \
            --data-binary @mydoc.xml
    – Ruby example:
        • > gem sources -a http://gemcutter.org
          > sudo gem install rsolr
          require 'rsolr'
          # The URL includes the same /solr path as the curl example
          solr = RSolr.connect :url=>'http://localhost:8080/solr'
          documents = [{:id=>1, :price=>1.00},
                       {:id=>2, :price=>10.50}]
          solr.add documents
          solr.commit


Pull: DataImportHandler (DIH)




Querying examples

    – http://localhost:8080/solr/select?q=car&fl=id,title




    – Ruby
        • res=solr.select :q=>'roses', :fq=>['red','white']
          res['response']['docs'].each do |doc|
            puts doc['title']
          end

Migrating document processing

    – Solr lacks a sophisticated pipeline with entity
      extraction etc. Alternatives:
        • Do extraction in Application space (Ruby)
        • Write own stage in Solr pipeline for simple cases
        • Integrate OpenPipeline to do more advanced stuff
    – Matchers/extractors
        • LingPipe NamedEntityExtractor inside of OpenPipeline
    – Synonyms:
        • Use Solr's synonym handling index/query side
    – Custom stages:
        • Write a Solr UpdateProcessor (in Java, Jython etc)
    – Got a LOT of custom FAST docproc stages?
        • Have a look at SESAT's PY ProcServer for Solr (GPL)
Migrating linguistics (lemmatization)

    – Solr ships with Stemming instead of Lemmatization
    – Stemming has limitations
        • Biler, bilen, bilene -> bil
          BUT
        • Bøker, bøkene -> bøk; boka, bok -> bok
    – KStem is better. Free with LucidWorks for Solr
    – If you need singular/plural handling only
        • Free dictionaries? Check lucene-hunspell
    – Lemmatization can be licensed from 3rd party
      such as Basistech, who also has language
      identification & entity extraction
    – Language identification also from Sematext
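
The stemming limitation in the Norwegian examples above can be reproduced with a toy suffix stripper. This is an illustration only: it is not the stemmer that ships with Solr, and the suffix list is a hand-picked, hypothetical subset. It shows why a rule-based approach conflates "biler" and "bil" correctly but leaves "bøker" and "bok" with different stems, which is the gap a dictionary-based lemmatizer closes.

```ruby
# Toy suffix stripper, for illustration only. NOT Solr's actual stemmer;
# the suffix list is a hypothetical subset, ordered longest first.
SUFFIXES = %w[ene en er a].freeze

# Removes the first matching suffix, keeping at least two characters of stem.
def naive_stem(word)
  suffix = SUFFIXES.find { |s| word.end_with?(s) && word.length > s.length + 1 }
  suffix ? word[0...-suffix.length] : word
end

%w[biler bilen bilene bøker bøkene boka bok].each do |word|
  puts "#{word} -> #{naive_stem(word)}"
end
```

Because "bøker" stems to "bøk" while "bok" stays "bok", a query for one form will not match documents containing the other unless a lemmatizer or a dictionary maps them together.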

Basistech Rosette for Lucene

    – High-end linguistics capabilities for
      19 languages
    – Language Identification
    – Segmentation and tokenization
    – Lemmatization
    – Noun decompounding
    – Part-of-speech tagging
    – Entity extraction

    – Easily integrated with Lucene/Solr

    – More: http://www.basistech.com/lucene/

Migrating search middleware

    – Using FAST Unity?
        • Consider migrating middleware logic such as external
          source querying and federation to SESAT (AGPL)
    – Using Comperio Front?
        • Must migrate custom query and response formats
        • Consider SESAT as well for migrating flow logic
    – Or is plain Solr enough?
        • Solr has built-in support for shards
        • A shard query will query multiple shards
          and merge the results into one
        • Add custom processing as Query
          Components in Solr
        • Check contrib & patches!
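
A shard query is an ordinary query with an extra `shards` parameter listing the Solr instances to search. The sketch below builds such a URL under stated assumptions: the host names `solr1` and `solr2`, the port and the `/solr` path are placeholders, and `sharded_query_url` is a hypothetical helper. The instance that receives the request fans the query out to the listed shards and merges the results.

```ruby
require 'uri'

# Builds a distributed query URL. Solr's `shards` parameter takes a
# comma-separated list of host:port/path entries without the URL scheme.
def sharded_query_url(coordinator, shards, query, rows: 10)
  params = { 'q' => query, 'rows' => rows, 'shards' => shards.join(',') }
  "#{coordinator}/select?#{URI.encode_www_form(params)}"
end

# Placeholder hosts; replace with the real shard addresses.
puts sharded_query_url('http://solr1:8080/solr',
                       ['solr1:8080/solr', 'solr2:8080/solr'],
                       'oslo')
```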

Migrating Web Crawler

    – Solr has no built-in web crawler
        • Instead you can choose from several integrations
    – The Apache Nutch crawler
        • Proven with hundreds of millions of pages
        • http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
    – Apache Droids
        • Still in the Apache Incubator, but aims at becoming a full crawler
        • http://incubator.apache.org/droids/
    – Heritrix + Solr (example in Solr 1.4 book)
    – OpenPipeline has a (very) simple crawler
    – Lucene Connectors Framework
        • Preparing crawler support

Migrating Connectors

    – Solr handles these sources internally through DIH:
        • Database, RSS, Web-services, Local filesystem
    – Additionally through Lucene Connectors Framework:
        • EMC Documentum, FileNet, JDBC, LiveLink, Patriarch
          (Memex), Meridio, SharePoint, RSS
        • New connectors should be written for LCF
    – Another option: OpenPipeline, supporting:
        • SharePoint, IMAP, Documentum, Vignette, Filesystem



Operations

    –   Solr has no admin-server (coming in 1.5)
    –   Possible to run multiple Tomcat instances on same server
    –   Multiple cores in same Tomcat – easier migration
    –   No built-in query reports, use 3rd party tools
    –   No built-in monitoring, have a look at Nagios?




More info

    –   Solr WIKI: http://wiki.apache.org/solr/
    –   Deployments: http://wiki.apache.org/solr/PublicServers
    –   Reference Guide: http://tinyurl.com/ygj3q9j
    –   Solr Book: http://tinyurl.com/solrbook
    –   Solr training: http://www.solrtraining.com/




Thank You



               www.cominvent.com




               jh@cominvent.com




               www.twitter.com/cominvent

                  This presentation is licensed under a CC-by-sa license.
                  You must attribute Cominvent with name and link.

Contenu connexe

Tendances

Search Options in SharePoint 2010
Search Options in SharePoint 2010Search Options in SharePoint 2010
Search Options in SharePoint 2010milanchauhan
 
Data Governance Initiative
Data Governance InitiativeData Governance Initiative
Data Governance InitiativeDataWorks Summit
 
OData, External objects & Lightning Connect
OData, External objects & Lightning ConnectOData, External objects & Lightning Connect
OData, External objects & Lightning ConnectPrasanna Deshpande ☁
 
Electronic patients records system based on oracle apex
Electronic patients records system based on oracle apexElectronic patients records system based on oracle apex
Electronic patients records system based on oracle apexJan Karremans
 
Oracle APEX Introduction (release 18.1)
Oracle APEX Introduction (release 18.1)Oracle APEX Introduction (release 18.1)
Oracle APEX Introduction (release 18.1)Michael Hichwa
 
ATG - Web Commerce @ Your Figertips
ATG - Web Commerce @ Your FigertipsATG - Web Commerce @ Your Figertips
ATG - Web Commerce @ Your FigertipsKeyur Shah
 
APEX Alpe Adria Mike Hichwa Keynote April 11th 2019- Zagreb
APEX Alpe Adria Mike Hichwa Keynote April 11th 2019- ZagrebAPEX Alpe Adria Mike Hichwa Keynote April 11th 2019- Zagreb
APEX Alpe Adria Mike Hichwa Keynote April 11th 2019- ZagrebMichael Hichwa
 
ATG - Common Terminologies
ATG - Common TerminologiesATG - Common Terminologies
ATG - Common TerminologiesKeyur Shah
 
Oracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud Services
Oracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud ServicesOracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud Services
Oracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud ServicesMichael Hichwa
 
DBCS Office Hours - Modernization through Migration
DBCS Office Hours - Modernization through MigrationDBCS Office Hours - Modernization through Migration
DBCS Office Hours - Modernization through MigrationTammy Bednar
 
Building Search Driven Applications in SharePoint 2010 - SharePoint Fest 2012
Building Search Driven Applications in SharePoint 2010 - SharePoint Fest 2012Building Search Driven Applications in SharePoint 2010 - SharePoint Fest 2012
Building Search Driven Applications in SharePoint 2010 - SharePoint Fest 2012Nik Patel
 
Overview of atg framework
Overview of atg frameworkOverview of atg framework
Overview of atg frameworkYousuf Roushan
 
Oracle Solaris Build and Run Applications Better on 11.3
Oracle Solaris  Build and Run Applications Better on 11.3Oracle Solaris  Build and Run Applications Better on 11.3
Oracle Solaris Build and Run Applications Better on 11.3OTN Systems Hub
 
Oracle Application Extensions for Oracle Endeca - for Application DBA's
Oracle Application Extensions for Oracle Endeca - for Application DBA'sOracle Application Extensions for Oracle Endeca - for Application DBA's
Oracle Application Extensions for Oracle Endeca - for Application DBA'sRavi Madabhushanam
 
UNYOUG - APEX 19.2 New Features
UNYOUG - APEX 19.2 New FeaturesUNYOUG - APEX 19.2 New Features
UNYOUG - APEX 19.2 New Featuresmsewtz
 
#dbhouseparty - Graph Technologies - More than just Social (Distancing) Networks
#dbhouseparty - Graph Technologies - More than just Social (Distancing) Networks#dbhouseparty - Graph Technologies - More than just Social (Distancing) Networks
#dbhouseparty - Graph Technologies - More than just Social (Distancing) NetworksTammy Bednar
 

Tendances (20)

Search Options in SharePoint 2010
Search Options in SharePoint 2010Search Options in SharePoint 2010
Search Options in SharePoint 2010
 
Data Governance Initiative
Data Governance InitiativeData Governance Initiative
Data Governance Initiative
 
OBIEE Architecture
OBIEE ArchitectureOBIEE Architecture
OBIEE Architecture
 
OData, External objects & Lightning Connect
OData, External objects & Lightning ConnectOData, External objects & Lightning Connect
OData, External objects & Lightning Connect
 
Electronic patients records system based on oracle apex
Electronic patients records system based on oracle apexElectronic patients records system based on oracle apex
Electronic patients records system based on oracle apex
 
Oracle APEX Introduction (release 18.1)
Oracle APEX Introduction (release 18.1)Oracle APEX Introduction (release 18.1)
Oracle APEX Introduction (release 18.1)
 
ATG - Web Commerce @ Your Figertips
ATG - Web Commerce @ Your FigertipsATG - Web Commerce @ Your Figertips
ATG - Web Commerce @ Your Figertips
 
Oracle APEX
Oracle APEXOracle APEX
Oracle APEX
 
APEX Alpe Adria Mike Hichwa Keynote April 11th 2019- Zagreb
APEX Alpe Adria Mike Hichwa Keynote April 11th 2019- ZagrebAPEX Alpe Adria Mike Hichwa Keynote April 11th 2019- Zagreb
APEX Alpe Adria Mike Hichwa Keynote April 11th 2019- Zagreb
 
ATG - Common Terminologies
ATG - Common TerminologiesATG - Common Terminologies
ATG - Common Terminologies
 
Oracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud Services
Oracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud ServicesOracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud Services
Oracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud Services
 
DBCS Office Hours - Modernization through Migration
DBCS Office Hours - Modernization through MigrationDBCS Office Hours - Modernization through Migration
DBCS Office Hours - Modernization through Migration
 
Building Search Driven Applications in SharePoint 2010 - SharePoint Fest 2012
Building Search Driven Applications in SharePoint 2010 - SharePoint Fest 2012Building Search Driven Applications in SharePoint 2010 - SharePoint Fest 2012
Building Search Driven Applications in SharePoint 2010 - SharePoint Fest 2012
 
Overview of atg framework
Overview of atg frameworkOverview of atg framework
Overview of atg framework
 
Oracle data integrator (odi) online training
Oracle data integrator (odi) online trainingOracle data integrator (odi) online training
Oracle data integrator (odi) online training
 
Bots & Teams: el poder de Grayskull
Bots & Teams: el poder de GrayskullBots & Teams: el poder de Grayskull
Bots & Teams: el poder de Grayskull
 
Oracle Solaris Build and Run Applications Better on 11.3
Oracle Solaris  Build and Run Applications Better on 11.3Oracle Solaris  Build and Run Applications Better on 11.3
Oracle Solaris Build and Run Applications Better on 11.3
 
Oracle Application Extensions for Oracle Endeca - for Application DBA's
Oracle Application Extensions for Oracle Endeca - for Application DBA'sOracle Application Extensions for Oracle Endeca - for Application DBA's
Oracle Application Extensions for Oracle Endeca - for Application DBA's
 
UNYOUG - APEX 19.2 New Features
UNYOUG - APEX 19.2 New FeaturesUNYOUG - APEX 19.2 New Features
UNYOUG - APEX 19.2 New Features
 
#dbhouseparty - Graph Technologies - More than just Social (Distancing) Networks
#dbhouseparty - Graph Technologies - More than just Social (Distancing) Networks#dbhouseparty - Graph Technologies - More than just Social (Distancing) Networks
#dbhouseparty - Graph Technologies - More than just Social (Distancing) Networks
 

En vedette

Presentation sql server to oracle a database migration roadmap
Presentation    sql server to oracle a database migration roadmapPresentation    sql server to oracle a database migration roadmap
Presentation sql server to oracle a database migration roadmapxKinAnx
 
Database migration
Database migrationDatabase migration
Database migrationOpris Monica
 
Got Personally-Owned Devices? Manage Them with System Center
Got Personally-Owned Devices? Manage Them with System CenterGot Personally-Owned Devices? Manage Them with System Center
Got Personally-Owned Devices? Manage Them with System CenterC/D/H Technology Consultants
 
Intro to Apache Solr for Drupal
Intro to Apache Solr for DrupalIntro to Apache Solr for Drupal
Intro to Apache Solr for DrupalChris Caple
 
Database migration
Database migrationDatabase migration
Database migrationOpris Monica
 
Channeling Collaborative Spirit
Channeling Collaborative SpiritChanneling Collaborative Spirit
Channeling Collaborative SpiritBenjamin Good
 
Open source breakfast norge findwise
Open source breakfast norge findwiseOpen source breakfast norge findwise
Open source breakfast norge findwiseCominvent AS
 
Scripps bioinformatics seminar_day_2
Scripps bioinformatics seminar_day_2Scripps bioinformatics seminar_day_2
Scripps bioinformatics seminar_day_2Benjamin Good
 
Buyer Remorse
Buyer RemorseBuyer Remorse
Buyer Remorsesmfox
 
Citizen sciencepanel2015 pdf
Citizen sciencepanel2015 pdfCitizen sciencepanel2015 pdf
Citizen sciencepanel2015 pdfBenjamin Good
 
Eishi Company Profile 修改好的
Eishi Company Profile 修改好的Eishi Company Profile 修改好的
Eishi Company Profile 修改好的eishimachinery
 
Fedora Iptables
Fedora IptablesFedora Iptables
Fedora Iptableszubin71
 
Gene Wiki and Mark2Cure update for BD2K
Gene Wiki and Mark2Cure update for BD2KGene Wiki and Mark2Cure update for BD2K
Gene Wiki and Mark2Cure update for BD2KBenjamin Good
 
Resume 2009 Compatible V2 1
Resume 2009 Compatible V2 1 Resume 2009 Compatible V2 1
Resume 2009 Compatible V2 1 schelby
 
First oslo solr community meetup lightning talk janhoy
First oslo solr community meetup lightning talk janhoyFirst oslo solr community meetup lightning talk janhoy
First oslo solr community meetup lightning talk janhoyCominvent AS
 
2016 bd2k bgood_wikidata
2016 bd2k bgood_wikidata2016 bd2k bgood_wikidata
2016 bd2k bgood_wikidataBenjamin Good
 

En vedette (20)

Presentation sql server to oracle a database migration roadmap
Presentation    sql server to oracle a database migration roadmapPresentation    sql server to oracle a database migration roadmap
Presentation sql server to oracle a database migration roadmap
 
Database migration
Database migrationDatabase migration
Database migration
 
Got Personally-Owned Devices? Manage Them with System Center
Got Personally-Owned Devices? Manage Them with System CenterGot Personally-Owned Devices? Manage Them with System Center
Got Personally-Owned Devices? Manage Them with System Center
 
FAST Search for SharePoint
FAST Search for SharePointFAST Search for SharePoint
FAST Search for SharePoint
 
Intro to Apache Solr for Drupal
Intro to Apache Solr for DrupalIntro to Apache Solr for Drupal
Intro to Apache Solr for Drupal
 
Database migration
Database migrationDatabase migration
Database migration
 
Apache Solr
Apache SolrApache Solr
Apache Solr
 
Channeling Collaborative Spirit
Channeling Collaborative SpiritChanneling Collaborative Spirit
Channeling Collaborative Spirit
 
2to3
2to32to3
2to3
 
Open source breakfast norge findwise
Open source breakfast norge findwiseOpen source breakfast norge findwise
Open source breakfast norge findwise
 
Scripps bioinformatics seminar_day_2
Scripps bioinformatics seminar_day_2Scripps bioinformatics seminar_day_2
Scripps bioinformatics seminar_day_2
 
Buyer Remorse
Buyer RemorseBuyer Remorse
Buyer Remorse
 
Citizen sciencepanel2015 pdf
Citizen sciencepanel2015 pdfCitizen sciencepanel2015 pdf
Citizen sciencepanel2015 pdf
 
Eishi Company Profile 修改好的
Eishi Company Profile 修改好的Eishi Company Profile 修改好的
Eishi Company Profile 修改好的
 
Fedora Iptables
Fedora IptablesFedora Iptables
Fedora Iptables
 
Gene Wiki and Mark2Cure update for BD2K
Gene Wiki and Mark2Cure update for BD2KGene Wiki and Mark2Cure update for BD2K
Gene Wiki and Mark2Cure update for BD2K
 
Resume 2009 Compatible V2 1
Resume 2009 Compatible V2 1 Resume 2009 Compatible V2 1
Resume 2009 Compatible V2 1
 
First oslo solr community meetup lightning talk janhoy
First oslo solr community meetup lightning talk janhoyFirst oslo solr community meetup lightning talk janhoy
First oslo solr community meetup lightning talk janhoy
 
2016 bd2k bgood_wikidata
2016 bd2k bgood_wikidata2016 bd2k bgood_wikidata
2016 bd2k bgood_wikidata
 
(Bio)Hackathons
(Bio)Hackathons(Bio)Hackathons
(Bio)Hackathons
 

Similaire à Migrating Fast to Solr

Oslo Enterprise MeetUp May 12th 2010 - Jan Høydahl
Oslo Enterprise MeetUp May 12th 2010 - Jan HøydahlOslo Enterprise MeetUp May 12th 2010 - Jan Høydahl
Oslo Enterprise MeetUp May 12th 2010 - Jan HøydahlCominvent AS
 
Get involved with the Apache Software Foundation
Get involved with the Apache Software FoundationGet involved with the Apache Software Foundation
Get involved with the Apache Software FoundationShalin Shekhar Mangar
 
Key topics when migrating from FAST to Solr, EuroCon 2010
Migrating Fast to Solr

  • 1. Migrating FAST to Solr, by Jan Høydahl. Cominvent AS, Enterprise Search Specialists
  • 3. Consulting
      – Cominvent delivers independent search consulting
      – Focus on Apache Lucene/Solr & Microsoft FAST ESP
      – We know both the proprietary and Open Source worlds, their benefits and disadvantages. We help you choose. We help you maximize your chosen engine, and we help you migrate as your requirements change.
  • 4. Training
      – Cominvent AS delivers public and on-site training
      – Certified Solr Training Partner for Lucid Imagination
      – Certified FAST ESP Training Partner
      – Read more: http://www.cominvent.com/training/
      – Photo: fluidpowerzone.com
  • 5. Commercial Support
      – When community & mailing list support is not enough
      – Paid support agreement for Apache Solr/Lucene
      – In cooperation with Lucid Imagination
      – Read more: http://www.cominvent.com/support/
  • 6. Jan Høydahl – experience
      ● IT architect, 15 years with search, telecom, mobile
      ● Helped build FAST's Global Services as first engineer
      ● Founder of Cominvent AS
      ● Search consultant 10 years
      ● Certified Solr instructor
  • 7. Recommendations
      «His skills on Fast ESP is in-depth, thorough, and probably amongst the best you can get. Jan is working independently, but also well in teams. Whether it is technical or business work, Jan does not fall behind. His excellent skills to see things from the holistic perspective is great.»
      -Knut Stenmark, DPM AS
  • 8. Sample consulting projects
      – World wide news agency: Chief architect of FAST ESP search solution, migrating from Autonomy IDOL. Real-time news, alerting etc.
      – Major Swedish newspaper: Architect for new Topic Page solution, letting editors define topics based on keywords and regex rules.
      – Norwegian Yellow Pages actor: Architect for migrating traditional DB-backed catalog search to a modern one-search-box solution.
      – Classifieds and real estate online broker: Advice on migrating from DB to search. Architect for FAST ESP solution with Norwegian linguistics, search middleware and relevance tuning.
      – Leading news surveillance company: Helped implement and tune real-time search using FAST ESP and real-time alerting using FAST RTA.
  • 9. Sample Solr Training references
      – Danish national library organization serving all Danish libraries
          • Migrating from in-house search to Apache Solr for all their search
          • First step is Classroom Training in March 2010
      – Global library org, serving hundreds of libraries world wide
          • Helping them migrate from FAST to Solr
          • Delivered Solr training course in 2010
  • 11. About Apache Solr
      – Open Source enterprise search server
      – Built on the popular Apache Lucene library
      – 100% Java, runs on all platforms and environments
      – Supports billions of documents, high scalability and advanced features like faceting, highlighting, document format conversions, GEO search etc
      – Indexes most languages including CJK
      – The platform is not language aware, but each field can be configured for language-specific tokenization, stemming, stop word processing etc
      – Very active developer and user communities
      – Apache 2.0 license, commercially friendly
      – Rapid growth in companies providing support etc
  • 12. Solr-user community growth [Chart: messages per month on the solr-user mailing list, January 2006 to February 2010; vertical axis 0 to 1600 messages]
  • 13. Lucene/Solr deployments
      – More: http://wiki.apache.org/solr/PublicServers
      – Thanks to Lucid Imagination for logo collection
  • 14. Solr in media & newspapers
      – News search. Also exposes API
      – Danish news search
      – Swedish news search
      – Swedish news search
      – Faceted search through classifieds
      – Eastern European classifieds
  • 15. Sample FAST-Solr switchers
      – Human Rights search
          • hurisearch.org (blog)
      – FINN katalog (former Sesam)
          • katalog.finn.no (announce)
      – Mocality – African business search
          • mocality.co.ke (linkedin)
      – International library search
          • Large multi-lingual index
      – Norwegian media house
          • Multiple newspapers
  • 18. Migration objectives
      – Possible objectives include:
          • Lower maintenance cost
          • Deeper in-house competency
          • Less dependent on external consultants
          • Ownership and visibility of source code
          • Shorter time to market for new features
          • Bugs fixed faster, or even fix ourselves
          • Larger community, mailing lists that work!
          • More choice in external consultants
          • Contribute back to Open Source
          • Lower HW footprint
  • 19. Migration steps
      – Knowledge gathering & Training
      – Review current features & architecture
          • Want to keep all features? Add new?
      – Migration areas:
          • Index profile
          • Content
          • Feeding
          • Document Processing
          • Querying
          • Search middleware?
          • Admin & Operational
      – What to do in Application space vs Search space?
  • 20. Feature comparison ESP – Solr (similarities)
      – Full-text, boolean, range search, sorting, sub-second, facets, did-you-mean, synonyms, faceting. ESP: Yes. Solr: Yes
      – Scaling for QPS. ESP: Add rows. Solr: Add rows
      – Scaling for document volume. ESP: Add columns. Solr: Add shards
      – Synonyms. ESP: Index/query side. Solr: Index/query side
      – GEO search. ESP: Yes. Solr: Yes (1.5)
      – Boolean query language. ESP: Yes (FQL). Solr: Yes (Lucene or (e)DisMax)
      – APIs. ESP: HTTP, Java, .NET, C++, PHP. Solr: HTTP, Java, .NET, Ruby, Python, PHP, Perl, JS
  • 21. Feature comparison ESP – Solr (differences)
      – Admin server. ESP: Yes. Solr: No (coming 1.5)
      – Processes. ESP: Many (C++, Java, Python). Solr: One WAR in Java app-server, 100% Java
      – Navigators / Facets. ESP: Index-time. Solr: Query-time
      – Did-you-mean. ESP: Dictionary based. Solr: Dictionary or index based
      – Feeding. ESP: API only. Solr: HTTP POST or API
      – Document processing. ESP: Pipeline (py). Solr: Simple pipeline (Java, JS, Groovy, Jython, JRuby..)
      – Multi field querying. ESP: Composite fields. Solr: DisMax handler
  • 22. Feature comparison ESP – Solr (differences)
      – Relevancy tuning. ESP: Rank profiles, term boosting. Solr: Dynamic function queries and boost functions
      – XRANK. ESP: XRANK operator. Solr: Function Queries
      – Freshness boost. ESP: Freshness in rank profile. Solr: Function Queries
      – Boost GEO distance. ESP: Rank profile and special. Solr: Function Queries
      – Major schema or software updates. ESP: Cold update, use stage environment. Solr: Stage new content into new Solr core
      – Pluggability. ESP: Docprocs, clients. Solr: Everything :) Request Handlers, Query Parsers, Docprocs, Rank, Spell, tokenizer++
  • 23. Feature comparison ESP – Solr (differences)
      – Lemmatization. ESP: Can be licensed for many languages. Solr: Can be licensed from 3rd party
      – Query syntax
          • ESP: and(a:foo, b:bar). Solr: a:foo AND b:bar
          • ESP: i:range(0, 100). Solr: i:[0 TO 100]
          • ESP: d:range(2000-01-01T00:00:00, 2010-03-03T12:00:00). Solr: d:[2000-01-01T00:00:00Z TO NOW]
      – Query params
          • ESP: query=. Solr: q=
          • ESP: offset=. Solr: start=
          • ESP: hits=. Solr: rows=
          • ESP: spell=1. Solr: spellcheck=true
      – What fields to return. ESP: view=viewname. Solr: fl=title,price,body...
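The parameter mapping on slide 23 can be sketched as a small translation helper for the application layer. This is a minimal illustration, not part of the original deck: the helper name `esp_to_solr_params` is hypothetical, and it only renames parameters. The query text itself (FQL) still has to be rewritten into Lucene or DisMax syntax separately.

```ruby
require 'uri'

# ESP parameter names and their Solr counterparts, as listed on the slide.
ESP_TO_SOLR_PARAMS = {
  'query'  => 'q',
  'offset' => 'start',
  'hits'   => 'rows'
}.freeze

# Hypothetical helper: translate a hash of ESP-style request parameters
# into Solr-style request parameters. Values are converted to strings.
def esp_to_solr_params(esp_params)
  solr = {}
  esp_params.each do |name, value|
    key = name.to_s
    if key == 'spell'
      # ESP uses spell=1, Solr uses spellcheck=true
      solr['spellcheck'] = (value.to_s == '1').to_s
    elsif ESP_TO_SOLR_PARAMS.key?(key)
      solr[ESP_TO_SOLR_PARAMS[key]] = value.to_s
    else
      # Parameters without a known mapping pass through unchanged
      solr[key] = value.to_s
    end
  end
  solr
end

params = esp_to_solr_params('query' => 'car', 'offset' => 10, 'hits' => 20, 'spell' => 1)
puts '/solr/select?' + URI.encode_www_form(params)
```

The ESP `view=` parameter has no one-to-one counterpart here, since Solr's `fl=` expects an explicit field list, so it is left out of the mapping on purpose.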
  • 24. Your FAST system - overview [Diagram: your web-app, optional search middleware, and the FAST system. Graphics: www.microsoft.com]
  • 25. Migrating index profile
      – ESP index profile -> Solr schema.xml
      – Set up field types, use defaults or create your own
      – Set up the static fields. ESP:
      – Solr equivalent:
      – No need for generic*, use dynamic fields:
  • 26. Migrating index profile
      – Composite fields?
          • Solr can use <copyField> to copy multiple fields into one, e.g. as we did to map many attributes into one field
          • However, to achieve ranking with a different boost for each field, Solr does not need a composite field. Use the DisMax query handler instead. Very powerful!
      – No need to edit the schema to add new fields. Using dynamic fields, it is easy to e.g. introduce a color facet for cars or an Mpixels facet for digital cameras
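As a rough illustration of the dynamic-field point on slide 26, the sketch below adds a color attribute to a car document and builds a facet request for it. It assumes the schema defines a `*_s` dynamic field rule for strings (as Solr's example schema does); the field name `color_s`, the sample document and the helper `facet_query` are all hypothetical.

```ruby
require 'uri'

# Hypothetical car document. "color_s" matches the assumed "*_s" dynamic
# field rule, so it can be indexed without editing schema.xml.
car = { 'id' => 'car-1', 'title' => 'Used station wagon', 'color_s' => 'red' }

# Build the query string for a faceted search on a given field.
def facet_query(query, facet_field)
  URI.encode_www_form(
    'q'           => query,
    'facet'       => 'true',
    'facet.field' => facet_field
  )
end

puts car.keys.join(', ')
puts '/solr/select?' + facet_query('wagon', 'color_s')
```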
  • 27. DisMax query example
      – This Solr query can replace use of composite-field
          • qt=dismax
          • q=oslo
          • qf=title^0.7 highpriorityfields^1.5 mediumpriorityfields^0.6 lowpriorityfields^0.2 recallfields^0.0 body^0.0
          • bf=recip(rord(creationDate),1,1000,1000)
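The DisMax parameters on slide 27 need URL encoding before they go on the wire, since `qf` and `bf` contain spaces, carets and parentheses. A minimal sketch of building that request, assuming Solr listens on localhost:8080 as in the deck's other examples; the helper name `dismax_params` is hypothetical:

```ruby
require 'uri'

# Per-field boosts copied from the slide. A weight of 0.0 presumably lets a
# field contribute matches (recall) without adding to the score.
FIELD_BOOSTS = {
  'title'                => 0.7,
  'highpriorityfields'   => 1.5,
  'mediumpriorityfields' => 0.6,
  'lowpriorityfields'    => 0.2,
  'recallfields'         => 0.0,
  'body'                 => 0.0
}.freeze

# Assemble the DisMax request parameters from a user query, a boost table
# and a boost function.
def dismax_params(user_query, boosts, boost_function)
  {
    'qt' => 'dismax',
    'q'  => user_query,
    'qf' => boosts.map { |field, weight| "#{field}^#{weight}" }.join(' '),
    'bf' => boost_function
  }
end

params = dismax_params('oslo', FIELD_BOOSTS, 'recip(rord(creationDate),1,1000,1000)')
puts params['qf']
puts 'http://localhost:8080/solr/select?' + URI.encode_www_form(params)
```

Keeping the boosts in a hash rather than a hand-written string makes relevance tuning a data change in the application, which is roughly what a rank profile edit was in ESP.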
  • 28. Migrating content
      – If using FAST ContentAPI to push programmatically
          • Use Solr's clients (Java, .NET, Ruby, Python, PHP...)
      – If feeding FastXML using FileTraverser
          • Feed as Solr XML using HTTP POST or a POST client
      – If you feed custom XML with XMLMapper
          • Have a look at DIH's import and mapping features
  • 29. Push Feeding example
      – Feed XML using HTTP POST:
          curl 'http://localhost:8080/solr/update?commit=true' -H "Content-Type: text/xml" --data-binary @mydoc.xml
      – Ruby example:
          > gem sources -a http://gemcutter.org
          > sudo gem install rsolr

          require 'rsolr'
          # Point the client at the same Solr webapp as the curl example
          solr = RSolr.connect :url=>'http://localhost:8080/solr'
          documents = [{:id=>1, :price=>1.00}, {:id=>2, :price=>10.50}]
          # Send the documents, then commit to make them searchable
          solr.add documents
          solr.commit
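The curl command on slide 29 posts a file called mydoc.xml, whose content the slide does not show. As a sketch of what such a file could contain, the code below renders the same two documents as a Solr `<add>` message using plain Ruby. The helper names `xml_escape` and `solr_add_xml` are hypothetical, and a real feeder would more likely rely on a client library such as rsolr.

```ruby
# Escape the characters that may not appear verbatim in XML text or attributes.
def xml_escape(value)
  value.to_s
       .gsub('&', '&amp;')
       .gsub('<', '&lt;')
       .gsub('>', '&gt;')
       .gsub('"', '&quot;')
end

# Render an array of hashes as a Solr <add> update message.
def solr_add_xml(documents)
  docs = documents.map do |doc|
    fields = doc.map do |name, value|
      %(<field name="#{xml_escape(name)}">#{xml_escape(value)}</field>)
    end
    "<doc>#{fields.join}</doc>"
  end
  "<add>#{docs.join}</add>"
end

documents = [{ id: 1, price: 1.00 }, { id: 2, price: 10.50 }]
puts solr_add_xml(documents)
```

Writing this output to mydoc.xml gives a file that the curl command can post to the update handler.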
  • 31. Querying examples
      – http://localhost:8080/solr/select?q=car&fl=id,title
      – Ruby
          res=solr.select :q=>'roses', :fq=>['red','white']
          res['response']['docs'].each do |doc|
            puts doc['title']
          end
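The response structure that the Ruby loop on slide 31 walks through can also be consumed without a client library by requesting `wt=json` and parsing the body. The sketch below works on a hard-coded sample so it runs without a Solr server. The sample body is hypothetical and abbreviated, and `titles_from` is an illustrative helper name.

```ruby
require 'json'

# Hypothetical, abbreviated response body in the shape Solr returns for wt=json.
SAMPLE_RESPONSE = <<~BODY
  {
    "responseHeader": { "status": 0, "QTime": 3 },
    "response": {
      "numFound": 2,
      "start": 0,
      "docs": [
        { "id": "1", "title": "Red roses" },
        { "id": "2", "title": "White roses" }
      ]
    }
  }
BODY

# Extract the title of every returned document.
def titles_from(response_body)
  parsed = JSON.parse(response_body)
  parsed.fetch('response').fetch('docs').map { |doc| doc['title'] }
end

puts titles_from(SAMPLE_RESPONSE)
```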
  • 32. Migrating document processing
      – Solr lacks a sophisticated pipeline with entity extraction etc. Alternatives:
          • Do extraction in Application space (Ruby)
          • Write own stage in Solr pipeline for simple cases
          • Integrate to do more advanced stuff
      – Matchers/extractors
          • LingPipe NamedEntityExtractor inside of OpenPipeline
      – Synonyms:
          • Use Solr's synonym handling index/query side
      – Custom stages:
          • Write a Solr UpdateProcessor (in Java, Jython etc)
      – Got a LOT of custom FAST docproc stages?
          • Have a look at SESAT's PY ProcServer for Solr (GPL)
  • 33. Migrating linguistics (lemmatization)
      – Solr ships with Stemming instead of Lemmatization
      – Stemming has limitations
          • Biler, bilen, bilene -> bil, BUT
          • Bøker, bøkene -> bøk; boka, bok -> bok
      – Kstem is better. Free with LucidWorks for Solr
      – If you need singular/plural handling only
          • Free dictionaries? Check lucene-hunspell
      – Lemmatization can be licensed from 3rd parties such as Basistech, who also have language identification & entity extraction
      – Language identification is also available from Sematext
  • 34. Basistech Rosette for Lucene
      – High-end linguistics capabilities for 19 languages
      – Language Identification
      – Segmentation and tokenization
      – Lemmatization
      – Noun decompounding
      – Part-of-speech tagging
      – Entity extraction
      – Easily integrated with Lucene/Solr
      – More: http://www.basistech.com/lucene/
  • 35. Migrating search middleware
      – Using FAST Unity?
          • Consider migrating middleware logic such as external source querying and federation to SESAT (AGPL)
      – Using Comperio Front?
          • Must migrate custom query and response formats
          • Consider SESAT as well for migrating flow logic
      – Or is plain Solr enough?
          • Solr has built-in support for shards
          • A shard query will query multiple shards and merge the results into one
          • Add custom processing as Query Components in Solr
          • Check contrib & patches!
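Solr's built-in shard support from slide 35 is driven by the `shards` request parameter, which lists the shards to query and merge. A minimal sketch of building such a request follows. The host names are placeholders and `distributed_query_url` is a hypothetical helper, so treat this as an illustration of the parameter's shape only.

```ruby
require 'uri'

# Placeholder shard locations, given as host:port/path without a scheme,
# which is the form the "shards" parameter takes.
SHARDS = ['solr1.example.com:8080/solr', 'solr2.example.com:8080/solr'].freeze

# Build a select URL that asks one Solr node to query all listed shards
# and merge their results into a single response.
def distributed_query_url(coordinator, query, shards)
  params = { 'q' => query, 'shards' => shards.join(',') }
  "#{coordinator}/select?#{URI.encode_www_form(params)}"
end

puts distributed_query_url('http://solr1.example.com:8080/solr', 'car', SHARDS)
```

Any node can act as the coordinator, so an application that previously talked to a FAST query dispatcher only needs one Solr URL plus the shard list.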
  • 36. Migrating Web Crawler
      – Solr has no built-in web crawler
          • Instead you can choose from several integrations
      – The Apache Nutch crawler
          • Proven with hundreds of millions of pages
          • http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
      – Apache Droids
          • Still in the incubator, but aims at becoming a full crawler
          • http://incubator.apache.org/droids/
      – Heritrix + Solr (example in Solr 1.4 book)
      – OpenPipeline has a (very) simple crawler
      – Lucene Connectors Framework
          • Preparing crawler support
  • 37. Migrating Connectors
      – Solr handles these sources internally through DIH:
          • Database, RSS, Web-services, Local filesystem
      – Additionally through Lucene Connectors Framework:
          • EMC Documentum, FileNet, JDBC, LiveLink, Patriarch (Memex), Meridio, SharePoint, RSS
          • New connectors should be written for LCF
      – Another option: OpenPipeline, supporting:
          • SharePoint, IMAP, Documentum, Vignette, Filesystem
  • 38. Operations
      – Solr has no admin-server (coming in 1.5)
      – Possible to run multiple Tomcat instances on the same server
      – Multiple cores in the same Tomcat, for easier migration
      – No built-in query reports, use 3rd party tools
      – No built-in monitoring, have a look at Nagios?
  • 39. More info
      – Solr WIKI: http://wiki.apache.org/solr/
      – Deployments: http://wiki.apache.org/solr/PublicServers
      – Reference Guide: http://tinyurl.com/ygj3q9j
      – Solr Book: http://tinyurl.com/solrbook
      – Solr training: http://www.solrtraining.com/
  • 40. Thank You
      – www.cominvent.com
      – jh@cominvent.com
      – www.twitter.com/cominvent
      – This presentation is licensed under a CC-by-sa license. You must attribute Cominvent with name and link.