Key topics when migrating from FAST to Solr

By Jan Høydahl
Cominvent AS
Apache Lucene EuroCon   05/21/10
Agenda
     About Cominvent & Jan Høydahl
     Quick overview of FAST ESP
     The migration step by step
     Pain points
     Q&A

Jan Høydahl: BIO
     ●  Enterprise search consultant since 2000
     ●  Background in telecom, mobile services & software development
     ●  Second FAST Global Services engineer
     ●  Founder of Cominvent AS
     ●  Lucid Imagination certified instructor & partner
     ●  FAST certified instructor

Cominvent AS: Consulting
     Vendor-independent search consulting

Cominvent AS: Training
     Certified Solr Training Partner with Lucid Imagination
     Certified FAST ESP Training Partner

Solr training Oslo June 1-3
     Photo: fluidpowerzone.com

Assumptions
     The decision to migrate to Solr has already been made
          This is not a "sales talk" for any particular technology
     Basic knowledge of Solr
     No or limited knowledge of FAST ESP
     Migration to plain Solr or LucidWorks
          (LucidWorks Enterprise edition not considered)

Introduction to FAST ESP... for Solr people

[Diagram: FAST ESP overview, including Security and Connectors]

FAST ESP architecture
     [Diagram: FAST ESP architecture. Source: www.microsoft.com]

Very strong & scalable document processing framework
     [Diagram: processing stages (format conversion, language detection, linguistic normalization, entities, taxonomy, sentiment, ontology, custom plug-in) feeding search and alerts, illustrated with a news item: "PARIS (Reuters) - Venus Williams raced into the second round of the $11.25 million French Open Monday, brushing aside Bianka Lamade, 6-3, 6-3, in 65 minutes."]

FAST Document Processors (DP)
     DPs transform documents prior to indexing
     This is different from Solr's field-centric analysis
     Examples of stages:
          Encoding normalization, language identification
          Text extraction (HTML, PDF, MS Office, etc.)
          Tokenization, lemmatization, entity extraction
     DPs are chained in pipelines
     ESP ships with lots of useful DPs and pipelines
     Written in Python, very easy to script new ones

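For Solr developers, the document-centric pipeline idea can be sketched in a few lines of Python: each stage receives the whole document (here a plain dict of fields), may read one field and write another, and stages run in a fixed order. This is a minimal illustration under stated assumptions, not FAST's document processor API; the stage functions, field names and the toy language check are hypothetical.

```python
from typing import Callable, Dict, List

Document = Dict[str, str]
Stage = Callable[[Document], Document]


def normalize_whitespace(doc: Document) -> Document:
    """Collapse runs of whitespace in every field value."""
    return {name: " ".join(value.split()) for name, value in doc.items()}


def detect_language(doc: Document) -> Document:
    """Toy language guess (hypothetical): a real stage would call a detector library."""
    body = doc.get("body", "").lower()
    doc["language"] = "no" if " og " in f" {body} " else "en"
    return doc


def copy_title_to_teaser(doc: Document) -> Document:
    """Fill an empty teaser from the title: a step that reads one field and writes another."""
    if not doc.get("teaser"):
        doc["teaser"] = doc.get("title", "")
    return doc


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Apply each stage in order, like a chain of document processors."""
    for stage in stages:
        doc = stage(doc)
    return doc


pipeline = [normalize_whitespace, detect_language, copy_title_to_teaser]
result = run_pipeline({"title": "  Solr   and FAST ", "body": "Søk og indeksering"}, pipeline)
print(result)
```

The third stage is the kind of cross-field step that Solr's per-field analyzer chains cannot express, which is why this sort of logic belongs in a pipeline that runs before indexing.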
Terminology
     Lucene/Solr                          FAST
     Replica                              Search row
     Shard                                Column
     Facet                                Navigator
     Spellcheck                           Did you mean
     Update processor                     Document processor
     Request Handler                      Query Transformer (QT)
     Response Writer                      Result Processor (RP)/TWM

Terminology
     Lucene/Solr                          FAST
     Schema                               Index profile
     Index segment                        Index partition
     Lucene IndexWriter/IndexReader       indexer/fsearch (RTS)
     ~Multi core                          ~Multi cluster
     (Documents receiving same            Collection
      processing)

Important differences
     Lucene/Solr                                      FAST
     Most features query-time                         Most features index-time
     Field-centric analysis                           Document-centric analysis
     One language per field                           Multi-lingual fields
     One update handler per input type (XML, CSV)     Format conversion in document pipeline
     Slim disk & memory footprint                     Quite fat disk & memory footprint
     One Java web app                                 15-20 processes

Solr Architecture
     [Diagram: Solr architecture. Thanks to Christian Moen/ATILIKA for graphics]

The migration...

Steps of the migration
     Review current features & architecture
          Keep all features? Add new ones?
     Install Solr and do a quick iteration (1-2 days):
          Draft schema.xml & solrconfig.xml
          Dump & index some real data
          Play around with queries (Solritas is nice here)
     Write a design spec covering all migration areas:
          Schema, content, feeding & analysis
          Front ends, querying & API
          Admin & operational
     Implement :)

Spreadsheet for planning the schema
     [Image: schema planning spreadsheet]

Migrating index profile -> Solr schema
     ESP index profile -> Solr schema.xml
     FAST fields example:
          [Image: FAST field definitions]
     Solr equivalent:
          [Image: equivalent Solr field definitions]
     Example: a field with tokenize="auto" in FAST → type="text" in Solr
     Create new <fieldType>s as needed

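A sketch of how the mapping rule above could be automated for simple fields. The FAST field element used here is a simplified, hypothetical fragment (real index profiles carry many more attributes), and the mapping covers only the tokenize="auto" → type="text" rule from the slide; everything else falls back to a string field. Treat it as a starting point for a migration script, not a complete converter.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical, simplified FAST index-profile fragment (not a complete profile).
FAST_FIELDS = """
<fields>
  <field name="title" tokenize="auto"/>
  <field name="category"/>
</fields>
"""


def fast_to_solr_fields(fast_xml: str) -> List[str]:
    """Translate each FAST <field> into a Solr schema.xml <field> element string."""
    solr_fields = []
    for field in ET.fromstring(fast_xml.strip()).iter("field"):
        # Rule from the slide: tokenized FAST fields become Solr "text" fields.
        solr_type = "text" if field.get("tokenize") == "auto" else "string"
        solr_field = ET.Element(
            "field",
            {"name": field.get("name"), "type": solr_type, "indexed": "true", "stored": "true"},
        )
        solr_fields.append(ET.tostring(solr_field, encoding="unicode"))
    return solr_fields


for line in fast_to_solr_fields(FAST_FIELDS):
    print(line)
```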
Product facets & generic fields
     With FAST you often use «generic1», «generic2» etc. to model product facets that may vary between product groups. Front ends need logic to convert them.

Product facets & generic fields
     With Solr, using dynamic fields, each document can have as many facets as you like.
     This makes it easy to e.g. introduce a new «color» facet for cars or a «MegaPixels» facet for digital cameras

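A minimal sketch of the dynamic-field approach: the feeding side turns arbitrary product attributes into suffixed field names, and the query side asks for facets on whichever of those fields apply. It assumes schema.xml declares dynamic fields such as *_s (string) and *_f (float); those suffixes, the attribute names and both helper functions are hypothetical.

```python
from urllib.parse import urlencode


def to_dynamic_fields(attributes: dict) -> dict:
    """Map product attributes to dynamic field names (assumes *_s and *_f dynamic fields)."""
    fields = {}
    for name, value in attributes.items():
        key = name.lower()
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            fields[f"{key}_f"] = value  # numeric attribute -> float dynamic field
        else:
            fields[f"{key}_s"] = value  # anything else -> string dynamic field
    return fields


def facet_params(query: str, facet_fields: list) -> str:
    """Build a Solr query string requesting one facet per product-specific field."""
    params = [("q", query), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    return urlencode(params)


camera = to_dynamic_fields({"Brand": "Canon", "MegaPixels": 12.1})
print(camera)
print(facet_params("camera", sorted(camera)))
```

Adding a new facet for a new product group then only means sending a new attribute; no schema change and no «genericN» bookkeeping in the front end.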
Composite fields -> DisMax ReqHandler
     FAST uses composite fields to search across multiple fields, with weighting defined in rank profiles
     FAST's composite fields & rank profiles can be modelled as Solr «DisMax» queries
     Set suitable defaults in solrconfig.xml using named request handler instances
     In case of many fields & performance issues, use <copyField> to group similarly ranked fields!
     Freshness boost, geo boost etc. are handled through function queries

Composite fields -> DisMax ReqHandler
     Given a FAST composite field / rank profile:
          [Image: FAST composite field and rank profile definition]

Composite fields -> DisMax ReqHandler
     This Solr query does the same, configurable per query:
          qt=dismax
          q=oslo
          qf=title^5.0 teaser^1.5 body^0.1
          bf=recip(rord(last_modified),1,1000,1000)

     Parsed query (excerpt):
          ...
          DisjunctionMaxQuery((teaser:foo^1.5 | title:foo^5.0 | body:foo^0.1)~0.01)
          DisjunctionMaxQuery((teaser:bar^1.5 | title:bar^5.0 | body:bar^0.1)~0.01)
          FunctionQuery(1000.0/(1.0*float(top(rord(last_modified)))
          ...

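To sanity-check the freshness boost, recall that Solr's recip(x,m,a,b) function computes a/(m*x+b), and that rord() gives the highest (newest) last_modified value rank 1. The Python sketch below reproduces that arithmetic and assembles the same DisMax request as a URL query string; it assumes a handler registered under the name dismax, as the qt parameter implies, and the helper function names are illustrative only.

```python
from urllib.parse import urlencode


def recip(x: float, m: float, a: float, b: float) -> float:
    """Same arithmetic as Solr's recip(x,m,a,b) function: a / (m*x + b)."""
    return a / (m * x + b)


def dismax_query(user_query: str) -> str:
    """Build the DisMax request from the slide; field weights emulate a FAST rank profile."""
    params = {
        "qt": "dismax",  # assumes a request handler named "dismax" in solrconfig.xml
        "q": user_query,
        "qf": "title^5.0 teaser^1.5 body^0.1",
        "bf": "recip(rord(last_modified),1,1000,1000)",
    }
    return urlencode(params)


print(dismax_query("oslo"))
# Freshness boost for the newest document (rord=1) and the 1000th newest (rord=1000)
print(recip(1, 1, 1000, 1000), recip(1000, 1, 1000, 1000))
```

With a = b = 1000, the newest document gets a boost of about 0.999 and the thousandth-newest exactly 0.5, so the boost halves after b/m documents.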
Static document boosts
     FAST uses the «hwboost» field to add a static quality boost to each document.
     In Solr, you have more flexibility:
          Add a boost to each document
               <doc boost="10.0">
          Add a boost to each field
               <field name="title" boost="10.0">
          Include any numeric document field in a boost function
               bf=sqrt(popularity)^100.0 staticboost^20.0

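A sketch of generating a Solr XML update message that carries both kinds of index-time boost listed above. The <add>/<doc boost>/<field boost> layout follows Solr's XML update format; the field names, values and the add_message helper are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_message(docs: list) -> str:
    """Build a Solr <add> message; each doc is (doc_boost, {field: (value, field_boost or None)})."""
    add = ET.Element("add")
    for doc_boost, fields in docs:
        doc = ET.SubElement(add, "doc", {"boost": str(doc_boost)})
        for name, (value, field_boost) in fields.items():
            attrs = {"name": name}
            if field_boost is not None:
                attrs["boost"] = str(field_boost)  # optional per-field boost
            ET.SubElement(doc, "field", attrs).text = str(value)
    return ET.tostring(add, encoding="unicode")


message = add_message([
    (10.0, {"id": ("doc1", None), "title": ("Solr migration", 10.0), "popularity": (42, None)}),
])
print(message)
```

Index-time boosts are folded into the field norms, so they have no effect on fields declared with omitNorms="true"; the query-time bf approach avoids that constraint and can be changed without reindexing.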
Navigator statistics
     FAST navigators provide statistics metadata (min/max/avg/sum)
     Solution: use the StatsComponent

Navigator auto-buckets
     FAST numeric navigators give auto-bucketing based on:
          equal frequency, equal width, or manual
     Solution:
          Create a new field which is pre-computed
          Example: document A has price=200.000; add pricerange="150.000 – 1.299.999"
          Or use facet queries (expensive)
          Or implement auto-bucketing and contribute the patch :-)

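A minimal sketch of the pre-computed bucket approach, assuming bucket boundaries are computed offline from a sample of prices and the resulting label is stored in a field at feeding time. It implements the equal-width and equal-frequency modes from the slide; the pricerange field name, the label format and the helper functions are hypothetical.

```python
from bisect import bisect_right
from typing import List


def equal_width(values: List[float], buckets: int) -> List[float]:
    """Interior boundaries that split [min, max] into equally wide buckets."""
    low, high = min(values), max(values)
    step = (high - low) / buckets
    return [low + step * i for i in range(1, buckets)]


def equal_frequency(values: List[float], buckets: int) -> List[float]:
    """Interior boundaries that put roughly the same number of values in each bucket."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // buckets] for i in range(1, buckets)]


def bucket_label(value: float, bounds: List[float]) -> str:
    """Label to store in a pre-computed facet field such as pricerange (hypothetical name)."""
    i = bisect_right(bounds, value)  # number of boundaries <= value
    low = "*" if i == 0 else f"{bounds[i - 1]:.0f}"  # rounded to whole currency units
    high = "*" if i == len(bounds) else f"{bounds[i]:.0f}"
    return f"{low} - {high}"


prices = [100, 150, 200, 250, 300, 1000, 5000, 9000]
print(equal_width(prices, 2))      # one interior boundary halfway between min and max
print(equal_frequency(prices, 2))  # boundary at the middle element of the sorted list
for price in (200, 300, 5000):
    print(price, bucket_label(price, equal_frequency(prices, 2)))
```

Because the label is an ordinary string value, a plain facet.field on pricerange returns the bucket counts without issuing one facet query per range.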
XRANK
     FAST has a feature to boost documents satisfying an "XRANK" sub-query with a certain static boost
     In Solr, you can solve most XRANK use cases using function queries

Scope search
     FAST offers a field type which holds arbitrary XML
     Search in XPath style:
          xml:companies:company:and(revenue:>1000, employees:>=100)
     Have not found a similar field type in Lucene.
     Anyone?

Migrating connectors
     FAST's connectors are many and mature
     For simple use cases, consider Solr's DataImportHandler (DIH):
          Supports DB, RSS, web services, local filesystem
     Additionally through the Lucene Connectors Framework (LCF):
          EMC Documentum, FileNet, JDBC, LiveLink, Patriarch (Memex), Meridio, SharePoint, RSS
          New connectors should be written for LCF, and be submitted back to the community :)

Migrating the web crawler
     FAST's crawler is mature, performant & scalable
     Solr has no built-in web crawler
     Prepare for a lot of extra work migrating the crawler
     Alternatives:
          The Apache Nutch crawler (steep learning curve)
          Apache Droids
          Heritrix + Solr (example in the Solr 1.4 book)
          OpenPipeline has a (very) simple crawler

Migrating document processing
     Solr lacks a sophisticated processing pipeline.
     Alternatives:
          Solr's UpdateRequestProcessorChain for simple pipelines:
               Write a Solr UpdateRequestProcessor (in Java, Jython etc., see SOLR-1725)
          OpenPipeline for more advanced requirements:
               Check out Findwise's talk
               Integrated with Solr
               LingPipe NamedEntityExtractor plugin

Document processing examples
     Binary documents with metadata
          Actual customer request: enrich library records with PDF content
          Use OpenPipeline with an Apache Tika processor
          Implement Tika as an UpdateRequestProcessor (SOLR-1763)
     Custom XML using FAST's XMLMapper
          DIH's built-in XPath support
          XSLT to Solr input XML
          Write a new XMLMapper update request handler?

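As a stand-in for what an XSLT or an XMLMapper-style step would do, here is a Python sketch that maps a custom XML record to Solr's input XML using simple path expressions (ElementTree supports a limited XPath subset). The record format, the path-to-field mapping and the function name are hypothetical; repeated source elements become a multi-valued Solr field.

```python
import xml.etree.ElementTree as ET

# Hypothetical customer record format
RECORD = """
<record>
  <meta><id>lib-42</id><title>Search engines</title></meta>
  <authors><author>Ada</author><author>Bo</author></authors>
</record>
"""

# Solr field name -> ElementTree path relative to <record>
FIELD_MAP = {"id": "meta/id", "title": "meta/title", "author": "authors/author"}


def record_to_solr(record_xml: str) -> str:
    """Convert one custom record into a Solr <add><doc> message."""
    record = ET.fromstring(record_xml.strip())
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for field_name, path in FIELD_MAP.items():
        for element in record.findall(path):  # every match becomes one field value
            ET.SubElement(doc, "field", {"name": field_name}).text = element.text
    return ET.tostring(add, encoding="unicode")


print(record_to_solr(RECORD))
```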
Multi-lingual
     FAST is state of the art in linguistics
     FAST is language aware, e.g. the title field is analyzed depending on the detected language
     Solr is not language aware
     Each field type has one and only one language
     Most common solution:
          One field type per language: text_no, text_en, text_de
          Dynamic fields: <dynamicField name="*_en" type="text_en" ... />
          Implement language awareness in the application layer (feeding + querying)

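A sketch of the application-layer approach: the feeder writes each language-dependent field into a dynamic field named after the detected language, and the query side picks the matching fields. It assumes dynamic fields such as *_en, *_no and *_de are declared as on the slide, that a language field exists in the schema, and that the language code comes from an external detector; the helper names and the fallback language are hypothetical.

```python
SUPPORTED = {"en", "no", "de"}  # must match the *_xx dynamic fields declared in schema.xml
FALLBACK = "en"  # hypothetical default when detection fails or the language is unsupported


def language_fields(doc: dict, lang: str, analyzed: tuple = ("title", "body")) -> dict:
    """Feeding side: rename analyzed fields to title_<lang>, body_<lang>; others pass through."""
    lang = lang if lang in SUPPORTED else FALLBACK
    out = {"language": lang}
    for name, value in doc.items():
        out[f"{name}_{lang}" if name in analyzed else name] = value
    return out


def query_fields(lang: str) -> str:
    """Query side: qf value for a user whose query language is known."""
    lang = lang if lang in SUPPORTED else FALLBACK
    return f"title_{lang}^2.0 body_{lang}"


print(language_fields({"id": "1", "title": "Søk", "body": "Tekst"}, "no"))
print(query_fields("de"))
```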
Multi-lingual – advanced
     FAST ships with lemmatization for most languages
     Solr ships with stemming, which has limitations
     Solutions for multi-lingual needs:
          KStem is tighter. Free with [Image: product logo]
          License 3rd party linguistics
          Example: BasisTech Rosette Linguistics Platform (lemmatization, POS etc.)

Multi-lingual – very advanced
     FAST allows lemmatization by index expansion
     This can be useful if your front end does not know which languages are being queried, as all the word inflections are stored in the index.
     There is no solution for this in Solr today
     Workaround: a DisMax query spanning all languages:
          q=eurocon&qf=text_en^2.0 text_no text_de text_it
     Downside: this gets ugly and slow with an increasing number of languages

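The cross-language workaround can be generated rather than hand-written. A small sketch, assuming the text_<lang> field naming from the example query and a DisMax handler named dismax; the helper name and default boost are illustrative. Each query term becomes one DisjunctionMaxQuery over all N language fields, so the work per term grows with the number of languages, which is the downside noted above.

```python
from urllib.parse import urlencode


def all_language_qf(languages: list, preferred: str = "", boost: float = 2.0) -> str:
    """qf listing text_<lang> for every language, boosting the preferred one."""
    parts = []
    for lang in languages:
        field = f"text_{lang}"
        parts.append(f"{field}^{boost}" if lang == preferred else field)
    return " ".join(parts)


qf = all_language_qf(["en", "no", "de", "it"], preferred="en")
print(qf)
print(urlencode({"qt": "dismax", "q": "eurocon", "qf": qf}))
```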
Migrating front ends / query
     Using a search middleware with Solr support? Lucky you!
     If not, consider introducing one now: [Image: middleware logos]
     Using FAST Java/.NET APIs?
          Choose SolrJ or SolrNet/SolrSharp
          Query language differences: &fq= instead of filter()
          Solr facets do not require session/state as FAST's do

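A sketch of a front-end helper that turns structured filters into Solr fq parameters, one fq per filter. Keeping filters separate lets Solr cache each of them independently in its filterCache, and no session state is needed because facet counts come back with every response. The filter field names and helper functions are hypothetical; values are quoted (with backslash escaping) so that spaces do not split them.

```python
from urllib.parse import urlencode


def quote_value(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and embedded quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def search_params(query: str, filters: dict, facets: list) -> list:
    """Build q, one fq per filter (each cached separately) and facet.field parameters."""
    params = [("q", query)]
    params += [("fq", f"{field}:{quote_value(value)}") for field, value in filters.items()]
    if facets:
        params.append(("facet", "true"))
        params += [("facet.field", field) for field in facets]
    return params


params = search_params("laptop", {"brand_s": "Acme", "color_s": "dark grey"}, ["brand_s"])
print(params)
print(urlencode(params))
```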
Result views
     FAST uses "result views" and "search profiles" to specify which fields to return.
     Migrate FAST's «views» into named RequestHandler configs with all default presets
     No need to define fields to return up front; use fl=a,b,c...

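Named request handlers in solrconfig.xml accept preset parameters in three lists: defaults (used only when the client does not send the parameter), appends (added to whatever the client sends) and invariants (always override the client). The Python below is only a model of how those lists combine, not Solr code, to show why a migrated FAST view maps naturally onto defaults: the client can still override fl. The view name, its parameters and the effective_params helper are hypothetical.

```python
from typing import Dict, List

Params = Dict[str, List[str]]


def effective_params(request: Params, defaults: Params, appends: Params, invariants: Params) -> Params:
    """Model of how handler presets combine with the parameters a client sends."""
    merged = {name: list(values) for name, values in defaults.items()}
    merged.update({name: list(values) for name, values in request.items()})  # request overrides defaults
    for name, values in appends.items():
        merged[name] = merged.get(name, []) + list(values)  # appended values are added, not replaced
    merged.update({name: list(values) for name, values in invariants.items()})  # invariants always win
    return merged


# Hypothetical "newsview" handler emulating a FAST result view
NEWS_VIEW = {
    "defaults": {"fl": ["id,title,teaser"], "rows": ["10"]},
    "appends": {"fq": ["type:news"]},
    "invariants": {"wt": ["xml"]},
}

client = {"q": ["oslo"], "fl": ["id,title"], "fq": ["lang:no"], "wt": ["json"]}
print(effective_params(client, **NEWS_VIEW))
```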
Operations
     Solr has no central admin server (until "SolrCloud")
     For a GUI installer, use [Image: product logo]
     Multiple cores allow smooth schema upgrades etc.
     No built-in query reporting, log analysis or monitoring.
          But have a look at: [Image: product logos]

Summary
     Many migrations are (quite) straightforward!
     Warning flags:
          Multi-lingual and advanced linguistics
          Heavy use of document processing, including entity extraction
          Scope search
          Other enterprise complexities (security, connectors etc.)
     Follow a structured process
          Quick prototyping
          Design spec for each area
     Don't forget to analyze logs and measure user satisfaction!

Thank You
     www.cominvent.com
     jh@cominvent.com
     www.twitter.com/cominvent
     linkedin.com/in/janhoy

This presentation is licensed under the CC-by-sa license.
You must attribute Cominvent with name and link.

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 

Recently uploaded (20)

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

Key topics when migrating from FAST to Solr, EuroCon 2010

  • 1. Key topics when migrating from FAST to Solr. By Jan Høydahl, Cominvent AS. Apache Lucene EuroCon 05/21/10
  • 2. Agenda
    - About Cominvent & Jan Høydahl
    - Quick overview of FAST ESP
    - The migration step by step
    - Pain points
    - Q&A
  • 3. Jan Høydahl: Bio
    - Enterprise search consultant since 2000
    - Background in telecom, mobile services & software development
    - Second FAST Global Services engineer
    - Founder of Cominvent AS
    - Lucid Imagination certified instructor & partner
    - FAST certified instructor
    (Logos represent projects I've been involved in; trademarks are © their respective companies.)
  • 4. Cominvent AS: Consulting
    - Vendor-independent search consulting
  • 5. Cominvent AS: Training
    - Certified Solr Training Partner with Lucid Imagination
    - Certified FAST ESP Training Partner
    (Photo: fluidpowerzone.com)
  • 6. Solr training Oslo June 1-3
  • 7. Assumptions
    - The decision to migrate to Solr has already been made
    - This is not a "sales talk" for any particular technology
    - Basic knowledge of Solr
    - No or limited knowledge of FAST ESP
    - Migration to plain Solr or LucidWorks (LucidWorks Enterprise edition not considered)
  • 8. Introduction to... ...for Solr people
  • 9. Security Connectors
  • 11. FAST ESP architecture (Source: www.microsoft.com)
  • 12. Very strong & scalable document processing framework
    (Diagram: a sample news document passes through stages such as Format Conversion, Language Detection, Linguistic Normalization, Entities, Taxonomy, Sentiment, Ontology and a Custom Plug-in, and on to Search and Alert. Sample text: "PARIS (Reuters) - Venus Williams raced into the second round of the $11.25 million French Open Monday, brushing aside Bianka Lamade, 6-3, 6-3, in 65 minutes.")
  • 13. FAST Document Processors (DP)
    - DPs transform documents prior to indexing
    - This is different from Solr's field-centric analysis
    - Examples of stages:
      - Encoding normalization, language identification
      - Text extraction (HTML, PDF, MS Office, etc.)
      - Tokenization, lemmatization, entity extraction
    - DPs are chained in pipelines
    - ESP ships with lots of useful DPs and pipelines
    - Written in Python, very easy to script new ones
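The pipeline idea on slide 13 can be illustrated with plain Python functions. In this rough sketch, each stage takes a document and returns a transformed one. It does not use FAST's actual document processor API. The stage names, the `body` field and the toy stop-word language heuristic are all hypothetical.

```python
import unicodedata
from typing import Callable, Dict, List

Document = Dict[str, object]
Stage = Callable[[Document], Document]


def normalize_encoding(doc: Document) -> Document:
    # Unicode NFC normalization, e.g. "e" + combining accent becomes a single "é".
    doc["body"] = unicodedata.normalize("NFC", str(doc.get("body", "")))
    return doc


def identify_language(doc: Document) -> Document:
    # Toy heuristic (hypothetical): common Norwegian words => "no", otherwise "en".
    words = set(str(doc["body"]).lower().split())
    doc["language"] = "no" if words & {"og", "ikke", "det", "er"} else "en"
    return doc


def tokenize(doc: Document) -> Document:
    # Lowercase whitespace tokenization; real pipelines do much more here.
    doc["tokens"] = str(doc["body"]).lower().split()
    return doc


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    # Each stage receives the output of the previous one, like chained DPs.
    for stage in stages:
        doc = stage(doc)
    return doc


PIPELINE: List[Stage] = [normalize_encoding, identify_language, tokenize]

if __name__ == "__main__":
    result = run_pipeline({"id": "1", "body": "Solr er raskt og enkelt"}, PIPELINE)
    print(result["language"], result["tokens"])
```

Adding a stage means appending another function to `PIPELINE`. This roughly mirrors inserting a new DP into an ESP pipeline.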
  • 14. Terminology (Lucene/Solr = FAST)
    - Replica = Search row
    - Shard = Column
    - Facet = Navigator
    - Spellcheck = Did you mean
    - Update processor = Document processor
    - Request Handler = Query Transformer (QT)
    - Response Writer = Result Processor (RP)/TWM
  • 15. Terminology, continued (Lucene/Solr = FAST)
    - Schema = Index profile
    - Index segment = Index partition
    - Lucene IndexWriter/IndexReader = indexer/fsearch (RTS)
    - ~Multi core = ~Multi cluster
    - (Documents receiving the same processing) = Collection
  • 16. Important differences (Lucene/Solr vs. FAST)
    - Most features query-time vs. most features index-time
    - Field-centric analysis vs. document-centric analysis
    - One language per field vs. multilingual fields
    - One update handler per input type (XML, CSV) vs. format conversion in the document pipeline
    - Slim disk & memory footprint vs. quite fat disk & memory footprint
    - One Java web app vs. 15-20 processes
  • 17. Solr Architecture (Thanks to Christian Moen/ATILIKA for graphics)
  • 18. The migration...
  • 19. Steps of the migration
    - Review current features & architecture
      - Keep all features? Add new?
    - Install Solr and do a quick iteration (1-2 days):
      - Draft schema.xml & solrconfig.xml
      - Dump & index some real data
      - Play around with queries – Solritas is nice here
    - Design spec covering all migration areas:
      - Schema, Content, Feeding & Analysis
      - Frontends, Querying & API
      - Admin & Operational
    - Implement :)
  • 20. Spreadsheet for planning the schema
  • 21. Migrating index-profile -> Solr schema
    - ESP index profile -> Solr schema.xml
    - FAST fields example:
    - Solr equivalent:
    - Example: A field with "tokenize=auto" in FAST → type="text"
    - Create new <fieldType>s as needed
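The field mapping on slide 21 can be scripted rather than done by hand. The sketch below emits Solr `<field>` elements from a simplified description of each FAST field. Only the `tokenize=auto` → `text` rule comes from the slide. The input dictionary layout, the `string` fallback and the `indexed`/`stored` choices are illustrative assumptions, to be adapted to the real index profile.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical, simplified description of FAST index-profile fields.
FAST_FIELDS: List[Dict[str, str]] = [
    {"name": "title", "tokenize": "auto", "index": "yes"},
    {"name": "url", "tokenize": "none", "index": "yes"},
]


def solr_type_for(fast_field: Dict[str, str]) -> str:
    # Rule from the slide: tokenize=auto maps to a tokenized "text" type.
    # Assumption: anything else becomes an untokenized "string" field.
    return "text" if fast_field.get("tokenize") == "auto" else "string"


def to_schema_fields(fast_fields: List[Dict[str, str]]) -> ET.Element:
    # Build a <fields> element with one <field> child per FAST field.
    fields = ET.Element("fields")
    for f in fast_fields:
        ET.SubElement(
            fields,
            "field",
            name=f["name"],
            type=solr_type_for(f),
            indexed="true" if f.get("index", "yes") == "yes" else "false",
            stored="true",
        )
    return fields


if __name__ == "__main__":
    print(ET.tostring(to_schema_fields(FAST_FIELDS), encoding="unicode"))
```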
  • 22. Product facets & generic fields
    - With FAST you often use «generic1», «generic2» etc. to model product facets which may vary between product groups. Front ends need logic to convert them.
  • 23. Product facets & generic fields
    - With Solr, using dynamic fields, each document can have as many facets as you like.
    - Makes it easy to e.g. introduce a new «color» facet for cars or a «MegaPixels» facet for digital cameras
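Here is a minimal sketch of feeding the product attributes from slide 23 into dynamic fields. It assumes schema.xml declares `<dynamicField name="*_s" type="string" .../>` and `<dynamicField name="*_f" type="float" .../>`. The product data, the lowercase naming rule and the helper names are hypothetical. A front end would then request one `facet.field` per attribute relevant to the product group.

```python
from typing import Dict, List, Sequence, Tuple, Union

Value = Union[str, int, float]


def facet_field_name(attribute: str, value: Value) -> str:
    # Assumes schema.xml maps "*_f" to a float type and "*_s" to a string type.
    suffix = "_f" if isinstance(value, (int, float)) else "_s"
    return attribute.lower() + suffix


def to_solr_doc(product: Dict[str, Value]) -> Dict[str, Value]:
    # "id" is copied as-is; every other attribute becomes its own dynamic field.
    doc: Dict[str, Value] = {"id": product["id"]}
    for attribute, value in product.items():
        if attribute != "id":
            doc[facet_field_name(attribute, value)] = value
    return doc


def facet_params(field_names: Sequence[str]) -> List[Tuple[str, str]]:
    # One facet.field parameter per facet the front end wants for this product group.
    return [("facet", "true")] + [("facet.field", name) for name in field_names]


if __name__ == "__main__":
    camera = to_solr_doc({"id": "cam-1", "MegaPixels": 12.1, "Color": "black"})
    print(camera)
    print(facet_params(["megapixels_f", "color_s"]))
```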
  • 24. Composite fields -> DisMax ReqHandler
    - FAST uses composite fields to search across multiple fields, with weighting defined in Rank Profiles
    - FAST's composite fields & rank profiles can be modelled as Solr «DisMax» queries
    - Set suitable defaults in solrconfig.xml using named requestHandler instances
    - In case of many fields & performance issues, use <copyField> to group similarly ranked fields!
    - Freshness boost, GEO boost etc. are handled through Function Queries
  • 25. Composite fields -> DisMax ReqHandler
    - Given a FAST composite field / Rank Profile
  • 26. Composite fields -> DisMax ReqHandler
    - This Solr query will do the same, configurable per query:
      qt=dismax
      q=oslo
      qf=title^5.0 teaser^1.5 body^0.1
      bf=recip(rord(last_modified),1,1000,1000)
    - Resulting parsed query (debug output, for the terms foo and bar):
      ...
      DisjunctionMaxQuery((teaser:foo^1.5 | title:foo^5.0 | body:foo^0.1)~0.01)
      DisjunctionMaxQuery((teaser:bar^1.5 | title:bar^5.0 | body:bar^0.1)~0.01)
      FunctionQuery(1000.0/(1.0*float(top(rord(last_modified))) ...
      ...
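The request on slide 26 can also be assembled programmatically. The sketch below builds the slide's parameters with the standard library. It also evaluates Solr's `recip(x,m,a,b)` function, which computes `a/(m*x + b)`, to show how the freshness boost decays. `rord(last_modified)` is the reverse ordinal of the date, so recent documents get small values and a boost close to 1; here it is represented by a plain number. The host, port and path in the URL are assumptions.

```python
from urllib.parse import urlencode

# Assumed URL of a local Solr instance with the dismax handler from the slide.
SOLR_SELECT = "http://localhost:8983/solr/select"


def recip(x: float, m: float, a: float, b: float) -> float:
    # Same formula as Solr's recip(x,m,a,b) function: a / (m*x + b).
    return a / (m * x + b)


def dismax_url(query: str) -> str:
    params = [
        ("qt", "dismax"),
        ("q", query),
        ("qf", "title^5.0 teaser^1.5 body^0.1"),
        ("bf", "recip(rord(last_modified),1,1000,1000)"),
    ]
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(dismax_url("oslo"))
    # A small reverse ordinal (recent date) gives a boost near 1.0; larger ones decay.
    for x in (1, 1000, 9000):
        print(x, recip(x, 1, 1000, 1000))
```

With `m=1, a=b=1000` the boost halves once the reverse ordinal reaches 1000. Changing `a` and `b` tunes how quickly older documents fall off.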
  • 27. Static document boosts
    - FAST uses the «hwboost» field to add a static quality boost to each document.
    - In Solr, you have more flexibility:
      - Add a boost to each document: <doc boost="10.0">
      - Add a boost to each field: <field name="title" boost="10.0">
      - Include any numeric document field in a boost function: bf=sum(sqrt(popularity)^100.0, staticboost^20.0)
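The index-time boosts on slide 27 go into the XML update message itself. The sketch below builds such an `<add>` message with `xml.etree.ElementTree`. The document id, field names and boost values are hypothetical. Index-time boosts apply to the Solr versions of this era only: Solr 7 removed them. The suggested replacement is to index the boost as a numeric field and use it in a function query, as in the `bf` example on the slide.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

# Field name -> (value, optional field-level boost).
Fields = Dict[str, Tuple[str, Optional[float]]]


def add_message(doc_id: str, fields: Fields, doc_boost: Optional[float] = None) -> str:
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    if doc_boost is not None:
        doc.set("boost", str(doc_boost))  # document-level boost: <doc boost="...">
    ET.SubElement(doc, "field", name="id").text = doc_id
    for name, (value, boost) in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        if boost is not None:
            field.set("boost", str(boost))  # field-level boost: <field ... boost="...">
        field.text = value
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(add_message("42", {"title": ("Migrating from FAST", 10.0), "body": ("Key topics", None)}, doc_boost=10.0))
```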
  • 28. Navigator statistics
    - FAST navigators provide statistics metadata (min/max/avg/sum)
    - Solution: Use the StatsComponent
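The StatsComponent from slide 28 is enabled per request with `stats=true` and one `stats.field` parameter per numeric field. The sketch builds such a request and, for comparison, computes FAST-style min/max/avg/sum values locally. The `price` field name is an assumption. The local function only mirrors part of the StatsComponent output, which also reports values such as count, missing, sumOfSquares and stddev.

```python
from typing import Dict, List, Sequence, Tuple


def stats_params(query: str, fields: Sequence[str]) -> List[Tuple[str, str]]:
    # rows=0: only the statistics are wanted, not the documents themselves.
    params = [("q", query), ("rows", "0"), ("stats", "true")]
    return params + [("stats.field", f) for f in fields]


def navigator_stats(values: Sequence[float]) -> Dict[str, float]:
    # Local reference computation of the min/max/avg/sum a FAST navigator exposes.
    if not values:
        raise ValueError("navigator_stats needs at least one value")
    total = sum(values)
    return {"min": min(values), "max": max(values), "sum": total, "mean": total / len(values)}


if __name__ == "__main__":
    print(stats_params("*:*", ["price"]))
    print(navigator_stats([100.0, 200.0, 300.0]))
```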
  • 29. Navigator auto-buckets
    - FAST numeric navigators give auto-bucketing based on equal-frequency, equal-width or manual boundaries
    - Solution:
      - Create a new field which is pre-computed
        Example: Document A has price=200.000; add pricerange="150.000 – 1.299.999"
      - Or use facet queries (expensive)
      - Or implement auto-bucketing and contribute the patch :-)
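The bucket field on slide 29 can be pre-computed at feeding time using either of FAST's automatic strategies. The sketch below computes equal-width and equal-frequency boundaries and assigns a range label, in the spirit of the `pricerange` example. The function names and label format are hypothetical. In practice the boundaries would be computed once over the whole corpus, then applied to each document as it is fed.

```python
from bisect import bisect_right
from typing import List, Sequence


def equal_width_edges(lo: float, hi: float, buckets: int) -> List[float]:
    # Inner boundaries splitting [lo, hi] into equally wide intervals.
    step = (hi - lo) / buckets
    return [lo + step * i for i in range(1, buckets)]


def equal_frequency_edges(values: Sequence[float], buckets: int) -> List[float]:
    # Inner boundaries chosen so each bucket holds roughly the same number of values.
    ordered = sorted(values)
    return [ordered[len(ordered) * i // buckets] for i in range(1, buckets)]


def bucket_label(value: float, edges: Sequence[float], lo: float, hi: float) -> str:
    # Locate the bucket, then render its lower and upper bounds as the label.
    i = bisect_right(edges, value)
    bounds = [lo, *edges, hi]
    return f"{bounds[i]:.0f}-{bounds[i + 1]:.0f}"


if __name__ == "__main__":
    prices = [120_000, 200_000, 450_000, 900_000, 1_500_000, 2_000_000]
    edges = equal_frequency_edges(prices, 3)
    print(edges, bucket_label(200_000, edges, min(prices), max(prices)))
```

The resulting label is then stored in its own string field and faceted like any other field, which avoids the cost of many facet queries.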
  • 30. XRANK
    - FAST has a feature to boost documents satisfying an "XRANK" sub-query with a certain static boost
    - In Solr, you can solve most XRANK use cases using FunctionQueries
  • 31. Scope search
    - FAST offers a field type which holds arbitrary XML
    - Search in XPath style: xml:companies:company:and(revenue:>1000, employees:>=100)
    - I have not found a similar field type in Lucene. Anyone?
  • 32. Migrating Connectors
    - FAST's connectors are many and mature
    - For simple use cases, consider Solr's DIH (DataImportHandler):
      - Supports DB, RSS, web services, local filesystem
    - Additionally through the Lucene Connectors Framework (LCF):
      - EMC Documentum, FileNet, JDBC, LiveLink, Patriarch (Memex), Meridio, SharePoint, RSS
    - New connectors should be written for LCF, and be submitted back to the community :)
  • 33. Migrating Web Crawler
    - FAST's crawler is mature, high-performing & scalable
    - Solr has no built-in web crawler
    - Prepare for a lot of extra work migrating the crawler
    - Alternatives:
      - The Apache Nutch crawler (steep learning curve)
      - Apache Droids
      - Heritrix + Solr (example in the Solr 1.4 book)
      - OpenPipeline has a (very) simple crawler
  • 34. Migrating document processing
    - Solr lacks a sophisticated processing pipeline.
    - Alternatives:
      - Solr's UpdateRequestProcessorChain for simple pipelines:
        - Write a Solr UpdateRequestProcessor (in Java, Jython etc., see SOLR-1725)
      - OpenPipeline for more advanced requirements:
        - Check out FindWise's talk
        - Integrated with Solr
        - LingPipe NamedEntityExtractor plugin
  • 35. Document processing examples
    - Binary documents with metadata
      - Actual customer request: Enrich library records with PDF content
      - Use OpenPipeline with an Apache Tika processor
      - Implement Tika as an UpdateRequestProcessor (SOLR-1763)
    - Custom XML using FAST's XMLMapper
      - DIH's built-in XPath support
      - XSLT to Solr input XML
      - Write a new XMLMapper Update Request Handler?
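Mapping custom XML onto Solr's input format, as on slide 35, comes down to evaluating one path expression per target field. The sketch below does this with `xml.etree.ElementTree`'s limited XPath support. It stands in for, and does not reproduce, FAST's XMLMapper, DIH's XPathEntityProcessor or an XSLT stylesheet. The source layout, sample record and mapping table are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical mapping: Solr field name -> ElementTree path inside one source record.
MAPPING: Dict[str, str] = {"id": "id", "title": "meta/title", "author": "meta/authors/author"}

SAMPLE = """<record>
  <id>lib-001</id>
  <meta>
    <title>Annual report</title>
    <authors><author>A. Hansen</author><author>B. Olsen</author></authors>
  </meta>
</record>"""


def to_solr_add(source_xml: str, mapping: Dict[str, str]) -> str:
    record = ET.fromstring(source_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for field_name, path in mapping.items():
        # findall() returns every match, so repeated elements become a multi-valued field.
        for node in record.findall(path):
            if node.text and node.text.strip():
                ET.SubElement(doc, "field", name=field_name).text = node.text.strip()
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(to_solr_add(SAMPLE, MAPPING))
```

This assumes `author` is declared `multiValued="true"` in schema.xml. Otherwise Solr would reject the second value.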
  • 36. Multilingual
    - FAST is state of the art on linguistics
    - FAST is language aware, e.g. the title field is analyzed depending on the detected language
    - Solr is not language aware
      - Each field type has one and only one language
    - Most common solution:
      - One field type per language: text_no, text_en, text_de
      - Dynamic fields: <dynamicField name="*_en" type="text_en"..../>
      - Implement language awareness in the application layer (feeding + querying)
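On the feeding side, the application-layer language awareness from slide 36 typically routes each text field to a per-language dynamic field. This sketch assumes `*_no`, `*_en` and `*_de` dynamic fields bound to the `text_no`, `text_en` and `text_de` types, as on the slide. It also assumes the language code is already known, for example from an upstream detector. The English fallback and the extra `language` field are hypothetical choices.

```python
from typing import Dict, Tuple

# Languages with a matching "*_<lang>" dynamic field in schema.xml (assumed).
SUPPORTED = {"no", "en", "de"}
FALLBACK = "en"  # assumption: unsupported languages are analyzed as English


def localized_fields(doc: Dict[str, str], lang: str,
                     text_fields: Tuple[str, ...] = ("title", "body")) -> Dict[str, str]:
    suffix = lang if lang in SUPPORTED else FALLBACK
    out: Dict[str, str] = {}
    for name, value in doc.items():
        # Text fields get the language suffix; all other fields pass through unchanged.
        out[f"{name}_{suffix}" if name in text_fields else name] = value
    # Hypothetical "language" field, handy for filtering or choosing qf at query time.
    out["language"] = suffix
    return out


if __name__ == "__main__":
    print(localized_fields({"id": "1", "title": "Hei verden"}, "no"))
```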
  • 37. Multilingual – advanced
    - FAST ships with lemmatization for most languages
    - Solr ships with stemming, which has limitations
    - Solutions for multilingual needs:
      - KStem is tighter. Free with
      - License 3rd-party linguistics
        Example: BasisTech Rosette Linguistics Platform: lemmatization, POS etc.
  • 38. Multilingual – very advanced
    - FAST allows lemmatization by index expansion
    - This can be useful if your frontend does not know which languages are being queried, as all the word inflections are stored in the index.
    - There is no solution for this in Solr today
    - Workaround: a DisMax query spanning all languages: q=eurocon&qf=text_en^2.0 text_no text_de text_it
    - Downside: This gets ugly and slow with an increasing number of languages
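The workaround on slide 38 amounts to generating a `qf` that lists the same logical field once per language, optionally weighting a preferred language. The sketch reproduces the slide's example. The helper names and the convention of omitting the boost when none is configured are assumptions. Adding `qt=dismax` assumes a dismax handler like the one on slide 26.

```python
from typing import Dict, Sequence
from urllib.parse import urlencode


def multilingual_qf(base: str, languages: Sequence[str], weights: Dict[str, float]) -> str:
    parts = []
    for lang in languages:
        field = f"{base}_{lang}"
        weight = weights.get(lang)
        # Append an explicit boost only when one is configured for this language.
        parts.append(f"{field}^{weight}" if weight is not None else field)
    return " ".join(parts)


def multilingual_query(q: str, languages: Sequence[str], weights: Dict[str, float]) -> str:
    # qt=dismax assumes a dismax handler like the one on slide 26.
    qf = multilingual_qf("text", languages, weights)
    return urlencode([("qt", "dismax"), ("q", q), ("qf", qf)])


if __name__ == "__main__":
    print(multilingual_query("eurocon", ["en", "no", "de", "it"], {"en": 2.0}))
```

Every added language adds another clause to each DisjunctionMaxQuery, which is why the slide warns that this grows ugly and slow.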
  • 39. Migrating Front ends / Query
    - Using a search middleware with Solr support? Lucky you!
    - If not, consider introducing one now:
    - Using FAST Java/.NET APIs?
      - Choose SolrJ or SolrNet/SolrSharp
    - Query language differences: &fq= instead of filter()
    - Solr facets do not require session/state as FAST's do
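Where a FAST query wraps restrictions in `filter()` (slide 39), a Solr request sends each restriction as its own `fq` parameter. Solr caches each one independently of the main query in its filter cache. Facets are simply requested again on every call, with no session. The sketch builds such a request; the field names and values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple
from urllib.parse import urlencode


def search_params(q: str, filters: Dict[str, str], facets: Sequence[str]) -> List[Tuple[str, str]]:
    params = [("q", q)]
    # One fq per restriction; Solr caches each fq separately in its filter cache.
    # Values are quoted as phrases; values containing quotes would need escaping.
    params += [("fq", f'{field}:"{value}"') for field, value in filters.items()]
    if facets:
        # Facets are recomputed per request, so no session state is kept.
        params.append(("facet", "true"))
        params += [("facet.field", name) for name in facets]
    return params


if __name__ == "__main__":
    print(urlencode(search_params("oslo", {"category": "news", "lang": "no"}, ["author"])))
```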
  • 40. Result views
    - FAST uses "result-view" and "search profile" to specify what fields to return.
    - Migrate FAST's «views» into named RequestHandler configs with all default presets
    - No need to define fields to return up front! Use fl=a,b,c...
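A named RequestHandler as described on slide 40 combines three kinds of preset. Its `defaults` apply only when the request omits a parameter, `appends` are added to whatever the request sends, and `invariants` always win. The sketch models that precedence in a few lines. It shows how a FAST result view can become a handler whose preset `fl` a client may still override. The handler contents are hypothetical, and this is a model of the behaviour, not Solr's own code.

```python
from typing import Dict, List

Params = Dict[str, List[str]]

# Hypothetical handler standing in for a FAST "result view".
NEWS_VIEW: Dict[str, Params] = {
    "defaults": {"fl": ["id,title,teaser,url"], "rows": ["10"]},
    "appends": {"fq": ["type:news"]},
    "invariants": {"wt": ["json"]},
}


def effective_params(request: Params, handler: Dict[str, Params]) -> Params:
    merged: Params = {k: list(v) for k, v in handler.get("defaults", {}).items()}
    # Request values replace defaults for the same parameter name.
    merged.update({k: list(v) for k, v in request.items()})
    # Appends are added alongside whatever the request sent.
    for k, v in handler.get("appends", {}).items():
        merged[k] = merged.get(k, []) + list(v)
    # Invariants cannot be overridden by the client.
    merged.update({k: list(v) for k, v in handler.get("invariants", {}).items()})
    return merged


if __name__ == "__main__":
    print(effective_params({"q": ["oslo"], "fl": ["id,title"]}, NEWS_VIEW))
```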
  • 41. Operations
    - Solr has no central admin server (until "SolrCloud")
    - For a GUI installer, use
    - Multiple cores: allow smooth schema upgrades etc.
    - No built-in query reporting, log analysis or monitoring. But have a look at:
  • 42. Summary
    - Many migrations are (quite) straightforward!
    - Warning flags:
      - Multilingual and advanced linguistics
      - Heavy use of document processing, including entity extraction
      - Scope search
      - Other enterprise complexities (security, connectors etc.)
    - Follow a structured process:
      - Quick prototyping
      - Design spec for each area
    - Don't forget to analyze logs and measure user satisfaction!
  • 43. Thank You
    www.cominvent.com
    jh@cominvent.com
    www.twitter.com/cominvent
    linkedin.com/in/janhoy
    This presentation is licensed under a CC-by-sa license. You must attribute Cominvent with name and link.