SlideShare une entreprise Scribd logo
1  sur  36
Télécharger pour lire hors ligne
Solr Update
                              code4lib conference, 13 Feburary '13, Chicago
                                                   presented by
                                                   Erik Hatcher




© Copyright 2013 LucidWorks
Abstract

      Solr is continually improving. Solr 4 was recently released,
      bringing dramatic changes in the underlying Lucene library and
      Solr-level features. It's tough for us all to keep up with the
      various versions and capabilities.

      This talk will blaze through the highlights of new features and
      improvements in Solr 4 (and up). Topics will include: SolrCloud,
      direct spell checking, surround query parser, and many other
      features. We will focus on the features library coders really need
      to know about.




    © 2013 LucidWorks

2
About: È  ℛ  Ỉ  Ķ  Ḫ  Ằ  Ţ  Ḉ  Ḣ  Ể  Ŕ

    • Co-author “Lucene in Action”
    • Lucene/Solr Committer and PMC, ASF Member
    • Senior Solutions Architect and co-founder, LucidWorks
       - (formerly Lucid Imagination)
    • Library Cred:
       - developer for Rossetti Archive and NINES
       - originator/namer of Blacklight




    © 2013 LucidWorks

3
© 2013 LucidWorks

4
Lucene 4 Highlights

    • Flexible index formats
    • Pluggable scoring
    • String -> BytesRef
    • DWPT (Document Writer Per Thread)
       - faster, more consistent indexing speed
    • NRT (Near Real-Time)
       - per-segment loading of FieldCache, soft commits
    • Spatial overhaul
    • FST/FSA
       - FuzzyQuery over 100x faster
       - also reduces memory footprint for Terms index
    • And much much more!
       - See http://lucene.apache.org/core/4_1_0/changes/Changes.html


    © 2013 LucidWorks

5
Flexible index formats

    •For terms, postings lists, stored fields, term vectors, etc
    •Several new posting list codecs
      - Pulsing (inlines low doc freq)
      - Block (packed int blocks)
      - SimpleText (debugging, transparency)
      - Bloom (experimental, also inlines low doc freq)
      - Appending (for append-only filesystems such as HDFS)
      - Memory (terms as FST)
    •Compressed stored fields




    © 2013 LucidWorks

6
Pluggable scoring

    •Decoupled from traditional vector space (TF/IDF)
    •Additional index statistics
      - number of tokens for a term or field
      - number of postings for a field
      - number of documents with a posting for a field
    •Several built-in alternatives:
      - BM25
      - DFR – divergence from randomness
      - Information-based models




    © 2013 LucidWorks

7
Indexing performance

    • http://people.apache.org/~mikemccand/lucenebench/
      indexing.html




    © 2013 LucidWorks

8
QPS (primary key lookup)

    • http://people.apache.org/~mikemccand/lucenebench/
      PKLookup.html




    © 2013 LucidWorks

9
FuzzyQuery

     • http://people.apache.org/~mikemccand/lucenebench/
       Fuzzy2.html




     © 2013 LucidWorks

10
© 2013 LucidWorks

11
Solr 4 Highlights

     • Requires Java 1.6+
     • Pivot facets
        - http://wiki.apache.org/solr/SimpleFacetParameters#facet.pivot
     • DirectSpellChecker support
        - http://wiki.apache.org/solr/SpellCheckComponent
     • Improved document response
        - DocTransformer: [shard], [explain], [value], [docid]
        - Function query results
        - http://wiki.apache.org/solr/DocTransformers
     • Pseudo-join
        - http://wiki.apache.org/solr/Join
     • Surround query parser



     © 2013 LucidWorks

12
More Solr 4 Highlights

     • Transaction log
     • Several new update processors, including a “script” one
        - http://wiki.apache.org/solr/ScriptUpdateProcessor
     • Spatial overhaul
        - http://wiki.apache.org/solr/SpatialSearch
     • Content-type savvy /update handler
     • SolrCloud
        - http://wiki.apache.org/solr/SolrCloud
     • And more!
        - See http://lucene.apache.org/solr/4_1_0/changes/Changes.html




     © 2013 LucidWorks

13
Solr 4.1

     • Enhanced document routing (custom sharding)
     • Compressed stored fields
     • MoreLikeThis distributed capability
     • AnalyzingSuggester
        - http://blog.mikemccandless.com/2012/09/lucenes-new-analyzing-
          suggester.html
        - via lookupImpl = org.apache.solr.spelling.suggest.fst.AnalyzingLookupFactory
        - and FuzzyLookupFactory
     • Many SolrCloud fixes and improvements
     • Stanford! - _query_ no longer needed to specify nested query
       parsers




     © 2013 LucidWorks

14
Looks Good!




     © 2013 LucidWorks

15
Pivot Faceting

     • Finds the top N constraints for field1, then for each of those,
       finds the top N constraints for field2, etc
     • Syntax: facet.pivot=field1,field2,field3,…

     facet.pivot=cat,inStock
                             #docs #docs w/             #docs w/
                                   inStock:true         instock:false
     cat:electronics         14      10                 4
     cat:memory              3       3                  0
     cat:connector           2       0                  2
     cat:graphics card       2       0                  2
     cat:hard drive          2       2                  0
     © 2013 LucidWorks

16
DirectSpellChecker

     • Automaton-based
     • Candidates are presented directly from the term dictionary,
       based on Levenshtein distance.
     • A practical benefit of this spellchecker is that it requires no
       additional datastructures (neither in RAM nor on disk) to do its
       work.
        - http://lucene.apache.org/core/4_1_0/suggest/org/apache/lucene/search/spell/
          DirectSpellChecker.html




     © 2013 LucidWorks

17
Improved document response

     • Returns other info along with document stored fields
     • Function queries
        - fl=name,location,geodist(),add(myfield,10)
     • Fieldname globs
        - fl=id,attr_*
     • Multiple “fl” (field list) values
        - &fl=id,attr_*
        - &fl=geodist()
        - &fl=termfreq(text,’solr’)
     • Aliasing
        - fl=id,location:loc,_dist_:geodist()
     • fl=id,[explain],[shard]



     © 2013 LucidWorks

18
Improved document response example
     $ curl http://localhost:8983/solr/query?
         q=solr
         &fl=id,apache_mentions:termfreq(text,’apache’)
         &fl=my_constant:”this is cool!”
         &fl=inStock, not(inStock)
         &fl=other_query_score:query($qq)
         &qq=text:search

     { "response":{"numFound":1,"start":0,"docs":[
           {
             "id":"SOLR1000",
             "apache_mentions":1,
             "my_constant":"this is cool!",
             "inStock":true,
             "not(inStock)":false,
             "other_query_score":0.84178084
           }]}




      © 2013 LucidWorks

19
Query Parsing

     • _query_ no longer needed for nested queries
        - https://issues.apache.org/jira/browse/SOLR-4093
     • "surround" query parser
        - enables the use of Lucene's SpanQuery family, sophisticated proximity
          matching
        - Examples:
            »5n(dog cat)
            »dog 5w cat
        - http://wiki.apache.org/solr/SurroundQueryParse




     © 2013 LucidWorks

20
New Spatial Support

     • wiki.apache.org/solr/SpatialSearch
     • Multiple values per field
     • Index shapes other than points (circles, polygons, etc)
     • Indexing:
        - "geo”:”43.17614,-90.57341”
        - “geo”:”Circle(4.56,1.23 d=0.0710)”
        - “geo”:”POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30))”


     • Searching:
        - fq=geo:"Intersects(-74.093 41.042 -69.347 44.558)"
        - fq=geo:"Intersects(POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30)))"




     © 2013 LucidWorks

21
Add and Retrieve document

     $ curl http://localhost:8983/solr/update -H 'Content-type:application/
     json' -d '
     [
       { "id" : "book1",
         "title" : "Infinite Jest",
         "author" : "David Foster Wallace"
       }
     ]'




     $ curl http://localhost:8983/solr/get?id=book1
     {
       "doc": {
         "id" : "book1",
         "author": "David Foster Wallace",
         "title" : "Infinite Jest",
         "_version_": 1410390803582287872
       }
     }


       © 2013 LucidWorks

22
Atomic Updates
         $ curl http://localhost:8983/solr/update
                  -H 'Content-type:application/json' -d '
         [
           {"id"         : "book1",
             "pubyear_i" : { "add" : 2006 },
             "ISBN_s"    : { "add" : "0-380-97365-1"}
           }
         ]'




        $ curl http://localhost:8983/solr/update
                 -H 'Content-type:application/json' -d '
        [
          {"id"         : "book1",
           "copies_i" : { "inc" : 1},
           "cat"        : { "add" : "fiction"},
           "ISBN_s"     : { "set" : "0-316-92004-5"}
           "remove_s" : { "set" : null } }
        ]'


     © 2013 LucidWorks

23
Pseudo-Join
                                               id: post1
id: blog1
                                               blog_id: blog1
name: Blog 1
                                               author: John Doe
owner: c4l
                                               title: Pseudo-join can be handy!
Started: 2007-10-26
                                               body: Here's how to use {!join....}


id: blog2                                      id: post2
name: Blog 2                                   blog_id: blog1
owner: zoia                                    author: John Doe
started: 2005-1-31                             title: Solr Update
                                               body: Live streaming today!


                                               id: post3
                                               blog_id: blog2
                                               author: Jane Doe
                                               title: What's New at code4lib

  Restrict to blogs mentioning netflix:
    - How it works:
                               fq={!join from=blog_id to=id}body:code4lib
             - Finds all documents matching “code4lib”
             - Maps to different docs by following blog_id to id
     © 2013 LucidWorks
Pseudo-Join Examples

• Only show posts from blogs started after 2010
         &fq={!join from=id to=blog_id}started:[2010 TO *]
• If any post in a blog mentions “Chicago”, then search all posts in
  that blog for “conference” (self-join)
         q=conference
         &fq={!join from=blog_id to=blog_id}Chicago
• If any blog post mentions “Chicago”, then search all emails with
  the same blog owner for “conference”
         q=email_body:conference
         &fq={!join from=owner_email_user to=email_user}{!join from=blog_id to=id}
         Chicago




© 2013 LucidWorks
Cross-Core Join

http://localhost:8983/solr/collection1/select?q=*:*
&fq={!join fromIndex=sec1 from=security_groups to=security}user:john



id: doc1
security: managers
                                              id: mary
title: doc for managers only
                                              security_groups: managers, employees
body: …

                                              id: john
id: doc1                                      security_groups: employees
security: managers, employees
title: doc for everyone
body: …


                    collection1                               sec1

                                  Single Solr Server

© 2013 LucidWorks
New UpdateProcessor's

     • FieldMutatingUpdateProcessor family:
        - ConcatField, CountField, FieldLength, HTMLStripField, IgnoreField,
          RegexReplace, RemoveBlankField, TrimField, TruncateField
     • ScriptUpdateProcessor
        - enables update processing code to be written in a scripting language. The
          script can be written in any scripting language supported by your JVM (such as
          JavaScript), and executed dynamically so no pre-compilation is necessary.
        - http://wiki.apache.org/solr/ScriptUpdateProcessor




     © 2013 LucidWorks

27
SolrCloud: Solr 4’s scalability

     • Sharded leaders and replicas
     • ZooKeeper used for cluster management
     • Distributed indexing
        - Automatically distributes updates to appropriate shard
        - Facilitates Near Real-Time (NRT) searching
     • Distributed search
        - Automatically distributes to nodes of each shard
     • Robust, automatic update recovery
     • Real-time /get
        - Leverages transaction log
     • No single point of failure
     • Large scale NRT using soft commits
     • Transaction log uses:
       - Durability for updates that have not yet been committed
       - Peer syncing in SolrCloud
       - Real-time get


     © 2013 LucidWorks

28
SolrCloud Visualization




                         Image from http://bit.ly/X4E5H9
     © 2013 LucidWorks

29
Near Real Time (NRT) softCommit

      • softCommit opens a new view of the index without flushing +
        fsyncing files to disk
         - Decouples update visibility from update durability
      • commitWithin now implies a soft commit
      • Current autoCommit defaults from solrconfig.xml:

     <autoCommit>
       <maxTime>15000</maxTime>
       <openSearcher>false</openSearcher>
     </autoCommit>

     <!--
       <autoSoftCommit>
          <maxTime>1000</maxTime>
       </autoSoftCommit>
     -->



      © 2013 LucidWorks

30
Solr is NoSQL

     • Update durability
        - A transaction log ensures that even uncommitted documents are never lost.
     • Real-time Get
        - The ability to quickly retrieve the latest version of a document, without the need
          to commit or open a new searcher
     • Versioning and Optimistic Locking
        - combined with real-time get, this allows read-update-write functionality that
          ensures no conflicting changes were made concurrently by other clients.
     • Atomic updates
        - the ability to add, remove, change, and increment fields of an existing
          document without having to send in the complete document again.
     • Real-time /get combined with SolrCloud make a very powerful
       key/value pair database



     © 2013 LucidWorks

31
Ṁ  Ȇ  Ʈ  Ẳ  Ḍ  Â  Ṭ  Ä




     © 2013 LucidWorks

32
Future

     • JSON Query Parser
        - https://issues.apache.org/jira/browse/SOLR-4351
     • Shard splitting
        - https://issues.apache.org/jira/browse/SOLR-3755




     © 2013 LucidWorks

33
Credits

     • LucidWorks
        - lucidworks.com
     • Manning Publications
        - manning.com/lucene
     • Apache Software Foundation
        - apache.org
     • Apache Lucene
        - lucene.apache.org




     © 2013 LucidWorks

34
Contact Info

     •IRC: erikhatcher
     •erik dot hatcher @ lucidworks dot com
     •@ErikHatcher
     •http://searchhub.org/author/erik/
     •http://erikhatcher.tumblr.com/




     © 2013 LucidWorks

35
Get at me...




                         @ErikHatcher




     © 2013 LucidWorks

36

Contenu connexe

Tendances

MongoDB Advanced Topics
MongoDB Advanced TopicsMongoDB Advanced Topics
MongoDB Advanced TopicsCésar Rodas
 
Beyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and SolrBeyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and SolrBertrand Delacretaz
 
Database Programming with Perl and DBIx::Class
Database Programming with Perl and DBIx::ClassDatabase Programming with Perl and DBIx::Class
Database Programming with Perl and DBIx::ClassDave Cross
 
쉽게 이해하는 LOD
쉽게 이해하는 LOD쉽게 이해하는 LOD
쉽게 이해하는 LODMyungjin Lee
 
Introduction to PostgreSQL
Introduction to PostgreSQLIntroduction to PostgreSQL
Introduction to PostgreSQLMark Wong
 
Hands On Spring Data
Hands On Spring DataHands On Spring Data
Hands On Spring DataEric Bottard
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Intro to Linked, Dutch Ships and Sailors and SPARQL handson
Intro to Linked, Dutch Ships and Sailors and SPARQL handson Intro to Linked, Dutch Ships and Sailors and SPARQL handson
Intro to Linked, Dutch Ships and Sailors and SPARQL handson Victor de Boer
 
Enterprise Search Solution: Apache SOLR. What's available and why it's so cool
Enterprise Search Solution: Apache SOLR. What's available and why it's so coolEnterprise Search Solution: Apache SOLR. What's available and why it's so cool
Enterprise Search Solution: Apache SOLR. What's available and why it's so coolEcommerce Solution Provider SysIQ
 
第2回 Hadoop 輪読会
第2回 Hadoop 輪読会第2回 Hadoop 輪読会
第2回 Hadoop 輪読会Toshihiro Suzuki
 
Solr Flair: Search User Interfaces Powered by Apache Solr (ApacheCon US 2009,...
Solr Flair: Search User Interfaces Powered by Apache Solr (ApacheCon US 2009,...Solr Flair: Search User Interfaces Powered by Apache Solr (ApacheCon US 2009,...
Solr Flair: Search User Interfaces Powered by Apache Solr (ApacheCon US 2009,...Erik Hatcher
 
Building your own search engine with Apache Solr
Building your own search engine with Apache SolrBuilding your own search engine with Apache Solr
Building your own search engine with Apache SolrBiogeeks
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development TutorialErik Hatcher
 
SuRf – Tapping Into The Web Of Data
SuRf – Tapping Into The Web Of DataSuRf – Tapping Into The Web Of Data
SuRf – Tapping Into The Web Of Datacosbas
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes WorkshopErik Hatcher
 

Tendances (20)

MongoDB Advanced Topics
MongoDB Advanced TopicsMongoDB Advanced Topics
MongoDB Advanced Topics
 
Beyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and SolrBeyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and Solr
 
Database Programming with Perl and DBIx::Class
Database Programming with Perl and DBIx::ClassDatabase Programming with Perl and DBIx::Class
Database Programming with Perl and DBIx::Class
 
MongoDB (Advanced)
MongoDB (Advanced)MongoDB (Advanced)
MongoDB (Advanced)
 
Mongodb hackathon 02
Mongodb hackathon 02Mongodb hackathon 02
Mongodb hackathon 02
 
쉽게 이해하는 LOD
쉽게 이해하는 LOD쉽게 이해하는 LOD
쉽게 이해하는 LOD
 
Introduction to PostgreSQL
Introduction to PostgreSQLIntroduction to PostgreSQL
Introduction to PostgreSQL
 
Hands On Spring Data
Hands On Spring DataHands On Spring Data
Hands On Spring Data
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Intro to Linked, Dutch Ships and Sailors and SPARQL handson
Intro to Linked, Dutch Ships and Sailors and SPARQL handson Intro to Linked, Dutch Ships and Sailors and SPARQL handson
Intro to Linked, Dutch Ships and Sailors and SPARQL handson
 
Enterprise Search Solution: Apache SOLR. What's available and why it's so cool
Enterprise Search Solution: Apache SOLR. What's available and why it's so coolEnterprise Search Solution: Apache SOLR. What's available and why it's so cool
Enterprise Search Solution: Apache SOLR. What's available and why it's so cool
 
第2回 Hadoop 輪読会
第2回 Hadoop 輪読会第2回 Hadoop 輪読会
第2回 Hadoop 輪読会
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Solr Flair: Search User Interfaces Powered by Apache Solr (ApacheCon US 2009,...
Solr Flair: Search User Interfaces Powered by Apache Solr (ApacheCon US 2009,...Solr Flair: Search User Interfaces Powered by Apache Solr (ApacheCon US 2009,...
Solr Flair: Search User Interfaces Powered by Apache Solr (ApacheCon US 2009,...
 
Building your own search engine with Apache Solr
Building your own search engine with Apache SolrBuilding your own search engine with Apache Solr
Building your own search engine with Apache Solr
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development Tutorial
 
Apache Solr
Apache SolrApache Solr
Apache Solr
 
SuRf – Tapping Into The Web Of Data
SuRf – Tapping Into The Web Of DataSuRf – Tapping Into The Web Of Data
SuRf – Tapping Into The Web Of Data
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes Workshop
 
Catmandu / LibreCat Project
Catmandu / LibreCat ProjectCatmandu / LibreCat Project
Catmandu / LibreCat Project
 

En vedette

Gimme shelter: Tips on protecting proprietary and open source code
Gimme shelter: Tips on protecting proprietary and open source codeGimme shelter: Tips on protecting proprietary and open source code
Gimme shelter: Tips on protecting proprietary and open source codeRogue Wave Software
 
What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0Erik Hatcher
 
Open source applied: Real-world uses
Open source applied: Real-world usesOpen source applied: Real-world uses
Open source applied: Real-world usesRogue Wave Software
 
Solr Black Belt Pre-conference
Solr Black Belt Pre-conferenceSolr Black Belt Pre-conference
Solr Black Belt Pre-conferenceErik Hatcher
 
Faceted Search – the 120 Million Documents Story
Faceted Search – the 120 Million Documents StoryFaceted Search – the 120 Million Documents Story
Faceted Search – the 120 Million Documents StorySourcesense
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Сергей Моренец: "Gradle. Write once, build everywhere"
Сергей Моренец: "Gradle. Write once, build everywhere"Сергей Моренец: "Gradle. Write once, build everywhere"
Сергей Моренец: "Gradle. Write once, build everywhere"Provectus
 
Apache Solr Changes the Way You Build Sites
Apache Solr Changes the Way You Build SitesApache Solr Changes the Way You Build Sites
Apache Solr Changes the Way You Build SitesPeter
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksErik Hatcher
 
Why I want to Kazan
Why I want to KazanWhy I want to Kazan
Why I want to KazanProvectus
 
Meet Solr For The Tirst Again
Meet Solr For The Tirst AgainMeet Solr For The Tirst Again
Meet Solr For The Tirst AgainVarun Thacker
 
Solr 6 Feature Preview
Solr 6 Feature PreviewSolr 6 Feature Preview
Solr 6 Feature PreviewYonik Seeley
 
Call me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksCall me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksShalin Shekhar Mangar
 
Multi faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & loggingMulti faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & logginglucenerevolution
 
Solr Powered Libraries
Solr Powered LibrariesSolr Powered Libraries
Solr Powered LibrariesErik Hatcher
 
Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Erik Hatcher
 

En vedette (20)

Gimme shelter: Tips on protecting proprietary and open source code
Gimme shelter: Tips on protecting proprietary and open source codeGimme shelter: Tips on protecting proprietary and open source code
Gimme shelter: Tips on protecting proprietary and open source code
 
Hackathon
HackathonHackathon
Hackathon
 
What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0
 
Open source applied: Real-world uses
Open source applied: Real-world usesOpen source applied: Real-world uses
Open source applied: Real-world uses
 
Solr Black Belt Pre-conference
Solr Black Belt Pre-conferenceSolr Black Belt Pre-conference
Solr Black Belt Pre-conference
 
Solr 4
Solr 4Solr 4
Solr 4
 
Faceted Search – the 120 Million Documents Story
Faceted Search – the 120 Million Documents StoryFaceted Search – the 120 Million Documents Story
Faceted Search – the 120 Million Documents Story
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Сергей Моренец: "Gradle. Write once, build everywhere"
Сергей Моренец: "Gradle. Write once, build everywhere"Сергей Моренец: "Gradle. Write once, build everywhere"
Сергей Моренец: "Gradle. Write once, build everywhere"
 
Apache Solr Changes the Way You Build Sites
Apache Solr Changes the Way You Build SitesApache Solr Changes the Way You Build Sites
Apache Solr Changes the Way You Build Sites
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis Tricks
 
Why I want to Kazan
Why I want to KazanWhy I want to Kazan
Why I want to Kazan
 
Meet Solr For The Tirst Again
Meet Solr For The Tirst AgainMeet Solr For The Tirst Again
Meet Solr For The Tirst Again
 
Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014
 
Solr 6 Feature Preview
Solr 6 Feature PreviewSolr 6 Feature Preview
Solr 6 Feature Preview
 
Call me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksCall me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networks
 
Multi faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & loggingMulti faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & logging
 
Solr Powered Libraries
Solr Powered LibrariesSolr Powered Libraries
Solr Powered Libraries
 
Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)
 

Similaire à "Solr Update" at code4lib '13 - Chicago

What's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.xWhat's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.xGrant Ingersoll
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered LuceneErik Hatcher
 
Oslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaOslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaCominvent AS
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solrthelabdude
 
Solr search engine with multiple table relation
Solr search engine with multiple table relationSolr search engine with multiple table relation
Solr search engine with multiple table relationJay Bharat
 
Solr introduction
Solr introductionSolr introduction
Solr introductionLap Tran
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Rails and the Apache SOLR Search Engine
Rails and the Apache SOLR Search EngineRails and the Apache SOLR Search Engine
Rails and the Apache SOLR Search EngineDavid Keener
 
Extensions on PostgreSQL
Extensions on PostgreSQLExtensions on PostgreSQL
Extensions on PostgreSQLAlpaca
 
Rapid prototyping with solr - By Erik Hatcher
Rapid prototyping with solr -  By Erik Hatcher Rapid prototyping with solr -  By Erik Hatcher
Rapid prototyping with solr - By Erik Hatcher lucenerevolution
 
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...Lucidworks
 
2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies
2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies
2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for NewbiesTYPO3 CertiFUNcation
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Oleksiy Panchenko
 
Javascript done right - Open Web Camp III
Javascript done right - Open Web Camp IIIJavascript done right - Open Web Camp III
Javascript done right - Open Web Camp IIIDirk Ginader
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 

Similaire à "Solr Update" at code4lib '13 - Chicago (20)

What's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.xWhat's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.x
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered Lucene
 
Oslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaOslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alpha
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solr
 
Solr search engine with multiple table relation
Solr search engine with multiple table relationSolr search engine with multiple table relation
Solr search engine with multiple table relation
 
Solr introduction
Solr introductionSolr introduction
Solr introduction
 
Solr4 nosql search_server_2013
Solr4 nosql search_server_2013Solr4 nosql search_server_2013
Solr4 nosql search_server_2013
 
iSoligorsk #3 2013
iSoligorsk #3 2013iSoligorsk #3 2013
iSoligorsk #3 2013
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Rails and the Apache SOLR Search Engine
Rails and the Apache SOLR Search EngineRails and the Apache SOLR Search Engine
Rails and the Apache SOLR Search Engine
 
Extensions on PostgreSQL
Extensions on PostgreSQLExtensions on PostgreSQL
Extensions on PostgreSQL
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Rapid prototyping with solr - By Erik Hatcher
Rapid prototyping with solr -  By Erik Hatcher Rapid prototyping with solr -  By Erik Hatcher
Rapid prototyping with solr - By Erik Hatcher
 
Solr 3.1 and beyond
Solr 3.1 and beyondSolr 3.1 and beyond
Solr 3.1 and beyond
 
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
 
2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies
2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies
2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
 
Elastic search
Elastic searchElastic search
Elastic search
 
Javascript done right - Open Web Camp III
Javascript done right - Open Web Camp IIIJavascript done right - Open Web Camp III
Javascript done right - Open Web Camp III
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 

Plus de Erik Hatcher

Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query ParsingErik Hatcher
 
Query Parsing - Tips and Tricks
Query Parsing - Tips and TricksQuery Parsing - Tips and Tricks
Query Parsing - Tips and TricksErik Hatcher
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
code4lib 2011 preconference: What's New in Solr (since 1.4.1)
code4lib 2011 preconference: What's New in Solr (since 1.4.1)code4lib 2011 preconference: What's New in Solr (since 1.4.1)
code4lib 2011 preconference: What's New in Solr (since 1.4.1)Erik Hatcher
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Solr Flair: Search User Interfaces Powered by Apache Solr
Solr Flair: Search User Interfaces Powered by Apache SolrSolr Flair: Search User Interfaces Powered by Apache Solr
Solr Flair: Search User Interfaces Powered by Apache SolrErik Hatcher
 

Plus de Erik Hatcher (15)

Ted Talk
Ted TalkTed Talk
Ted Talk
 
Solr Payloads
Solr PayloadsSolr Payloads
Solr Payloads
 
it's just search
it's just searchit's just search
it's just search
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query Parsing
 
Query Parsing - Tips and Tricks
Query Parsing - Tips and TricksQuery Parsing - Tips and Tricks
Query Parsing - Tips and Tricks
 
Solr Recipes
Solr RecipesSolr Recipes
Solr Recipes
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Solr Flair
Solr FlairSolr Flair
Solr Flair
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
code4lib 2011 preconference: What's New in Solr (since 1.4.1)
code4lib 2011 preconference: What's New in Solr (since 1.4.1)code4lib 2011 preconference: What's New in Solr (since 1.4.1)
code4lib 2011 preconference: What's New in Solr (since 1.4.1)
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Solr Flair: Search User Interfaces Powered by Apache Solr
Solr Flair: Search User Interfaces Powered by Apache SolrSolr Flair: Search User Interfaces Powered by Apache Solr
Solr Flair: Search User Interfaces Powered by Apache Solr
 

Dernier

"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 

Dernier (20)

"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 

"Solr Update" at code4lib '13 - Chicago

  • 1. Solr Update code4lib conference, 13 Feburary '13, Chicago presented by Erik Hatcher © Copyright 2013 LucidWorks
  • 2. Abstract Solr is continually improving. Solr 4 was recently released, bringing dramatic changes in the underlying Lucene library and Solr-level features. It's tough for us all to keep up with the various versions and capabilities. This talk will blaze through the highlights of new features and improvements in Solr 4 (and up). Topics will include: SolrCloud, direct spell checking, surround query parser, and many other features. We will focus on the features library coders really need to know about. © 2013 LucidWorks 2
  • 3. About: È  ℛ  Ỉ  Ķ  Ḫ  Ằ  Ţ  Ḉ  Ḣ  Ể  Ŕ • Co-author “Lucene in Action” • Lucene/Solr Committer and PMC, ASF Member • Senior Solutions Architect and co-founder, LucidWorks - (formerly Lucid Imagination) • Library Cred: - developer for Rossetti Archive and NINES - originator/namer of Blacklight © 2013 LucidWorks 3
  • 5. Lucene 4 Highlights • Flexible index formats • Pluggable scoring • String -> BytesRef • DWPT (Document Writer Per Thread) - faster, more consistent indexing speed • NRT (Near Real-Time) - per-segment loading of FieldCache, soft commits • Spatial overhaul • FST/FSA - FuzzyQuery over 100x faster - also reduces memory footprint for Terms index • And much much more! - See http://lucene.apache.org/core/4_1_0/changes/Changes.html © 2013 LucidWorks 5
  • 6. Flexible index formats •For terms, postings lists, stored fields, term vectors, etc •Several new posting list codecs - Pulsing (inlines low doc freq) - Block (packed int blocks) - SimpleText (debugging, transparency) - Bloom (experimental, also inlines low doc freq) - Appending (for append-only filesystems such as HDFS) - Memory (terms as FST) •Compressed stored fields © 2013 LucidWorks 6
  • 7. Pluggable scoring •Decoupled from traditional vector space (TF/IDF) •Additional index statistics - number of tokens for a term or field - number of postings for a field - number of documents with a posting for a field •Several built-in alternatives: - BM25 - DFR – divergence from randomness - Information-based models © 2013 LucidWorks 7
  • 8. Indexing performance • http://people.apache.org/~mikemccand/lucenebench/ indexing.html © 2013 LucidWorks 8
  • 9. QPS (primary key lookup) • http://people.apache.org/~mikemccand/lucenebench/ PKLookup.html © 2013 LucidWorks 9
  • 10. FuzzyQuery • http://people.apache.org/~mikemccand/lucenebench/ Fuzzy2.html © 2013 LucidWorks 10
  • 12. Solr 4 Highlights • Requires Java 1.6+ • Pivot facets - http://wiki.apache.org/solr/SimpleFacetParameters#facet.pivot • DirectSpellChecker support - http://wiki.apache.org/solr/SpellCheckComponent • Improved document response - DocTransformer: [shard], [explain], [value], [docid] - Function query results - http://wiki.apache.org/solr/DocTransformers • Pseudo-join - http://wiki.apache.org/solr/Join • Surround query parser © 2013 LucidWorks 12
  • 13. More Solr 4 Highlights • Transaction log • Several new update processors, including a “script” one - http://wiki.apache.org/solr/ScriptUpdateProcessor • Spatial overhaul - http://wiki.apache.org/solr/SpatialSearch • Content-type savvy /update handler • SolrCloud - http://wiki.apache.org/solr/SolrCloud • And more! - See http://lucene.apache.org/solr/4_1_0/changes/Changes.html © 2013 LucidWorks 13
  • 14. Solr 4.1 • Enhanced document routing (custom sharding) • Compressed stored fields • MoreLikeThis distributed capability • AnalyzingSuggester - http://blog.mikemccandless.com/2012/09/lucenes-new-analyzing- suggester.html - via lookupImpl = org.apache.solr.spelling.suggest.fst.AnalyzingLookupFactory - and FuzzyLookupFactory • Many SolrCloud fixes and improvements • Stanford! - _query_ no longer needed to specify nested query parsers © 2013 LucidWorks 14
  • 15. Looks Good! © 2013 LucidWorks 15
  • 16. Pivot Faceting • Finds the top N constraints for field1, then for each of those, finds the top N constraints for field2, etc • Syntax: facet.pivot=field1,field2,field3,… facet.pivot=cat,inStock #docs #docs w/ #docs w/ inStock:true instock:false cat:electronics 14 10 4 cat:memory 3 3 0 cat:connector 2 0 2 cat:graphics card 2 0 2 cat:hard drive 2 2 0 © 2013 LucidWorks 16
  • 17. DirectSpellChecker • Automaton-based • Candidates are presented directly from the term dictionary, based on Levenshtein distance. • A practical benefit of this spellchecker is that it requires no additional datastructures (neither in RAM nor on disk) to do its work. - http://lucene.apache.org/core/4_1_0/suggest/org/apache/lucene/search/spell/ DirectSpellChecker.html © 2013 LucidWorks 17
  • 18. Improved document response • Returns other info along with document stored fields • Function queries - fl=name,location,geodist(),add(myfield,10) • Fieldname globs - fl=id,attr_* • Multiple “fl” (field list) values - &fl=id,attr_* - &fl=geodist() - &fl=termfreq(text,’solr’) • Aliasing - fl=id,location:loc,_dist_:geodist() • fl=id,[explain],[shard] © 2013 LucidWorks 18
  • 19. Improved document response example $ curl http://localhost:8983/solr/query? q=solr &fl=id,apache_mentions:termfreq(text,’apache’) &fl=my_constant:”this is cool!” &fl=inStock, not(inStock) &fl=other_query_score:query($qq) &qq=text:search { "response":{"numFound":1,"start":0,"docs":[ { "id":"SOLR1000", "apache_mentions":1, "my_constant":"this is cool!", "inStock":true, "not(inStock)":false, "other_query_score":0.84178084 }]} © 2013 LucidWorks 19
  • 20. Query Parsing • _query_ no longer needed for nested queries - https://issues.apache.org/jira/browse/SOLR-4093 • "surround" query parser - enables the use of Lucene's SpanQuery family, sophisticated proximity matching - Examples: »5n(dog cat) »dog 5w cat - http://wiki.apache.org/solr/SurroundQueryParse © 2013 LucidWorks 20
  • 21. New Spatial Support • wiki.apache.org/solr/SpatialSearch • Multiple values per field • Index shapes other than points (circles, polygons, etc) • Indexing: - "geo”:”43.17614,-90.57341” - “geo”:”Circle(4.56,1.23 d=0.0710)” - “geo”:”POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30))” • Searching: - fq=geo:"Intersects(-74.093 41.042 -69.347 44.558)" - fq=geo:"Intersects(POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30)))" © 2013 LucidWorks 21
  • 22. Add and Retrieve document $ curl http://localhost:8983/solr/update -H 'Content-type:application/ json' -d ' [ { "id" : "book1", "title" : "Infinite Jest", "author" : "David Foster Wallace" } ]' $ curl http://localhost:8983/solr/get?id=book1 { "doc": { "id" : "book1", "author": "David Foster Wallace", "title" : "Infinite Jest", "_version_": 1410390803582287872 } } © 2013 LucidWorks 22
  • 23. Atomic Updates $ curl http://localhost:8983/solr/update -H 'Content-type:application/json' -d ' [ {"id" : "book1", "pubyear_i" : { "add" : 2006 }, "ISBN_s" : { "add" : "0-380-97365-1"} } ]' $ curl http://localhost:8983/solr/update -H 'Content-type:application/json' -d ' [ {"id" : "book1", "copies_i" : { "inc" : 1}, "cat" : { "add" : "fiction"}, "ISBN_s" : { "set" : "0-316-92004-5"} "remove_s" : { "set" : null } } ]' © 2013 LucidWorks 23
  • 24. Pseudo-Join id: post1 id: blog1 blog_id: blog1 name: Blog 1 author: John Doe owner: c4l title: Pseudo-join can be handy! Started: 2007-10-26 body: Here's how to use {!join....} id: blog2 id: post2 name: Blog 2 blog_id: blog1 owner: zoia author: John Doe started: 2005-1-31 title: Solr Update body: Live streaming today! id: post3 blog_id: blog2 author: Jane Doe title: What's New at code4lib Restrict to blogs mentioning netflix: - How it works: fq={!join from=blog_id to=id}body:code4lib - Finds all documents matching “code4lib” - Maps to different docs by following blog_id to id © 2013 LucidWorks
  • 25. Pseudo-Join Examples • Only show posts from blogs started after 2010 &fq={!join from=id to=blog_id}started:[2010 TO *] • If any post in a blog mentions “Chicago”, then search all posts in that blog for “conference” (self-join) q=conference &fq={!join from=blog_id to=blog_id}Chicago • If any blog post mentions “Chicago”, then search all emails with the same blog owner for “conference” q=email_body:conference &fq={!join from=owner_email_user to=email_user}{!join from=blog_id to=id} Chicago © 2013 LucidWorks
  • 26. Cross-Core Join http://localhost:8983/solr/collection1/select?q=*:* &fq={!join fromIndex=sec1 from=security_groups to=security}user:john id: doc1 security: managers id: mary title: doc for managers only security_groups: managers, employees body: … id: john id: doc1 security_groups: employees security: managers, employees title: doc for everyone body: … collection1 sec1 Single Solr Server © 2013 LucidWorks
  • 27. New UpdateProcessor's • FieldMutatingUpdateProcessor family: - ConcatField, CountField, FieldLength, HTMLStripField, IgnoreField, RegexReplace, RemoveBlankField, TrimField, TruncateField • ScriptUpdateProcessor - enables update processing code to be written in a scripting language. The script can be written in any scripting language supported by your JVM (such as JavaScript), and executed dynamically so no pre-compilation is necessary. - http://wiki.apache.org/solr/ScriptUpdateProcessor © 2013 LucidWorks 27
  • 28. SolrCloud: Solr 4’s scalability • Sharded leaders and replicas • ZooKeeper used for cluster management • Distributed indexing - Automatically distributes updates to appropriate shard - Facilitates Near Real-Time (NRT) searching • Distributed search - Automatically distributes to nodes of each shard • Robust, automatic update recovery • Real-time /get - Leverages transaction log • No single point of failure • Large scale NRT using soft commits • Transaction log uses: - Durability for updates that have not yet been committed - Peer syncing in SolrCloud - Real-time get © 2013 LucidWorks 28
  • 29. SolrCloud Visualization Image from http://bit.ly/X4E5H9 © 2013 LucidWorks 29
  • 30. Near Real Time (NRT) softCommit • softCommit opens a new view of the index without flushing + fsyncing files to disk - Decouples update visibility from update durability • commitWithin now implies a soft commit • Current autoCommit defaults from solrconfig.xml: <autoCommit> <maxTime>15000</maxTime> <openSearcher>false</openSearcher> </autoCommit> <!-- <autoSoftCommit> <maxTime>1000</maxTime> </autoSoftCommit> --> © 2013 LucidWorks 30
  • 31. Solr is NoSQL • Update durability - A transaction log ensures that even uncommitted documents are never lost. • Real-time Get - The ability to quickly retrieve the latest version of a document, without the need to commit or open a new searcher • Versioning and Optimistic Locking - combined with real-time get, this allows read-update-write functionality that ensures no conflicting changes were made concurrently by other clients. • Atomic updates - the ability to add, remove, change, and increment fields of an existing document without having to send in the complete document again. • Real-time /get combined with SolrCloud make a very powerful key/value pair database © 2013 LucidWorks 31
  • 33. Future • JSON Query Parser - https://issues.apache.org/jira/browse/SOLR-4351 • Shard splitting - https://issues.apache.org/jira/browse/SOLR-3755 © 2013 LucidWorks 33
  • 34. Credits • LucidWorks - lucidworks.com • Manning Publications - manning.com/lucene • Apache Software Foundation - apache.org • Apache Lucene - lucene.apache.org © 2013 LucidWorks 34
  • 35. Contact Info •IRC: erikhatcher •erik dot hatcher @ lucidworks dot com •@ErikHatcher •http://searchhub.org/author/erik/ •http://erikhatcher.tumblr.com/ © 2013 LucidWorks 35
  • 36. Get at me... @ErikHatcher © 2013 LucidWorks 36