How SolrCloud Changes the User Experience In a Sharded Environment
Erick Erickson, Lucid Imagination
Lucene Revolution, 9-May-2012
Who am I?
!   “Erick is just some guy, you know”
    •  Your geekiness score is increased if you know where that quote comes
       from, and your age is hinted at
!   30+ years in the programming business, mostly as a developer
!   Currently employed by Lucid Imagination in Professional Services
    •  I get to see how various organizations interpret “search” and I’m
       amazed at the different problems Solr is used to solve
!   Solr/Lucene committer
! ErickErickson@lucidimagination.com
!   Sailor, anybody need crew for sailboat delivery?




What we’ll cover

!   Briefly, what else is coming in 4.0
! SolrCloud (NOT Solr-in-the-cloud), upcoming in 4.0
    •  What it is
    •  Why you may care
!   Needs SolrCloud addresses
    •  DR/HA
    •  Distributed indexing
    •  Distributed searching
!   I’m assuming basic familiarity with Solr




I’m not the implementer, Mark is

!   Well, Mark Miller and others
!   Mark’s talk (tomorrow) is a deeper technical dive; I recommend it
    highly

    •  If anything I say contradicts what
       Mark says, believe Mark
        − After all, he wrote much of the code
!   Mark insisted on the second slide after this one




When and Where can we get 4.0?

!   When will it be released? Hopefully 2012
    •  Open Source; have you ever tried herding cats?
    •  Alpha/Beta releases are planned, which is unusual
    •  3.6 is probably the last 3.x release
!   How usable are nightly builds?
    •  LucidWorks Enterprise runs on trunk, so trunk is quite stable and in
       production
!   There’s lots of new stuff!
    •  “unstable” doesn’t really mean unstable code
         − Changing APIs, index format may change
!   Nightly builds: https://builds.apache.org//view/S-Z/view/Solr/
!   Source code and build instructions: http://wiki.apache.org/solr/HowToContribute

Cool stuff in addition to SolrCloud in 4.0




Other cool 4.0 (trunk) features

!   Similarity calculations decoupled from Lucene.
     !   Scoring is pluggable
     !   There are several different out-of-the-box (OOB) implementations now (e.g. BM25)
!   FST (Finite State Automata/Transducer) based work. Speed and size
    improvements http://www.slideshare.net/otisg/finite-state-queries-in-lucene
     !   FST for fuzzy queries, 100x faster (McCandless’ blog)
!   You can plug in your own index codec. See pulsing and
    SimpleTextCodec. This is really your own index format
    •  Can be done on a per field basis
    •  Text output as an example
!   Much more efficient in-memory structures
!   NRT (Near Real Time) searching and “soft commits”
!   Spatial support via LSP (Lucene Spatial Playground) rather than the spatial contrib


More cool new features

!   PivotFacetComponent for hierarchical faceting. See Yonik's
    presentation in the “Useful URLs” section
!   Pseudo-join queries. See Yonik’s presentation URL in the “Useful URLs”
    section
!   New Admin UI
!   Can’t over-emphasize the importance of CHANGES.txt
    •  Solr
    •  Lucene
    •  Please read them when upgrading. Really




SolrCloud setup and use




What is SolrCloud

!   SolrCloud is a set of new distributed capabilities in Solr that:
     •  Automatically distributes updates (i.e. indexes documents) to the
        appropriate shard
     •  Uses transaction logs for robust update recovery
     •  Automatically distributes searches in a sharded environment
     •  Automatically assigns replicas to shards when available
     •  Supports Near Real Time searching (NRT)
     •  Uses Zookeeper as a repository for cluster state




Common pain points (why you may care)

!   Every large organization seems to have a recurring set of issues:
    •  Sharding – have to do it yourself, usually through SolrJ or similar.
    •  Capacity expansion – what to do when you need more capacity
    •  System status – getting alerts when machines die
    •  Replication – configuration
    •  Finding recently-indexed data – everyone wants “real time”
         − Often not as important as people think, but...
    •  Inappropriate configuration
         − Trying for “real time” by replicating every 5 seconds
         − Committing every document/second/packet
         − Mismatched schema or config files on masters and slaves




Common Pain Points (Why you may care)

!   Maintaining different configuration files (and coordinating them) for
    masters and slaves
! SolrCloud addresses most of these.
! SolrCloud is currently “a work in progress”




Typical sharding setup

!   Multiple Indexers
!   Query Slaves
     •    1 or more per indexer
!   Yes, you can shard & distribute
[Diagram: Indexing, Load Balancer, Searching]

Steps to set this up

!   Figure out how many shards are required
!   Configure all masters, which may be complex
    •  Point your indexing at the appropriate master
!   Configure all slaves
    •  Configure distributed searching
    •  Make sure the slaves point at the correct master
    •  Find out where you misconfigured something, e.g. “I’m getting duplicate
       documents”... because you indexed the same doc to two shards?
    •  Deal with your manager wanting to know why the doc she just indexed
       isn’t showing up in the search (replication delay)
    •  Rinse, Repeat…




How is this different with SolrCloud?

!   Decide how many shards you need
!   Ask the ops folks how many machines you can have
!   Start your servers:
   •  On the Zookeeper machine(s): java -Dbootstrap_confdir=./solr/conf -DzkRun -DnumShards=### -jar start.jar
   •  On all the other machines: java -DzkHost=<ZookeeperMachine:port>[,<ZookeeperMachine:port>…] -jar start.jar
!   Index any way you want
   •  To any machine you want, perhaps in parallel
!   Send search to any machine you want
!   Note: Demo uses embedded Zookeeper
   •  Most production installations will probably use “ensembles”



Diving a little deeper (indexing)




Diving a little deeper (indexing)

!   How are shard machines assigned?
   •  It’s magic, ask Mark.
   •  As each machine is started, it’s assigned shard N+1 until numShards is
      reached
   •  The information is recorded in Zookeeper, where it’s available to all nodes
!   How are leaders elected?
   •  Initially, on a first-come-first-served basis, so at initial setup each shard
      machine will be a leader (numShards == num available machines)
!   How are replicas assigned?
   •  See above (magic), but conceptually it’s on a “round robin” basis
   •  As each machine is started for the first time, it’s assigned to the shard
      with the fewest replicas (tie-breaking on lowest shard ID)
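The assignment rules above can be mimicked with a small simulation. This is a toy model, not SolrCloud's code: the `Shard` class and `assign` function are hypothetical, and the sketch assumes nodes start one at a time in the order given.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Shard:
    shard_id: int
    leader: Optional[str] = None
    replicas: List[str] = field(default_factory=list)


def assign(nodes: List[str], num_shards: int) -> List[Shard]:
    """Hypothetical model of the shard/leader/replica assignment rules on the slide."""
    shards = [Shard(shard_id=i + 1) for i in range(num_shards)]
    for node in nodes:
        # Until numShards is reached, each new node gets the next shard and,
        # being the first to arrive there, becomes its leader.
        leaderless = [s for s in shards if s.leader is None]
        if leaderless:
            leaderless[0].leader = node
            continue
        # After that, a new node joins the shard with the fewest replicas,
        # breaking ties on the lowest shard ID (round robin in practice).
        target = min(shards, key=lambda s: (len(s.replicas), s.shard_id))
        target.replicas.append(node)
    return shards


if __name__ == "__main__":
    for s in assign(["n1", "n2", "n3", "n4", "n5", "n6"], num_shards=3):
        print(f"shard{s.shard_id}: leader={s.leader} replicas={s.replicas}")
```

With six nodes and `num_shards=3`, each shard ends up with one leader and one replica, the same layout as the “Assigning machines” slides.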




Assigning machines
[Diagram, built up over several slides: ZK host(s) coordinating the Solr machines]
!   First machine, started with -DnumShards=3 -Dbootstrap_confdir=./solr/conf
    -DzkHost=<host>:<port>[,<host>:<port>]: becomes Leader shard1
!   Second machine, started with -DzkHost=<host>:<port>[,<host>:<port>]: becomes Leader shard2
!   Third machine (same -DzkHost flag): becomes Leader shard3
    •  At this point you can index and search; you have one machine per shard
!   Fourth, fifth and sixth machines (same -DzkHost flag): become Replica shard1,
    Replica shard2 and Replica shard3, in that order
Diving a little deeper (indexing)

!   Let’s break this up a bit
!   There really aren’t any masters/slaves in SolrCloud
    •  “Leaders” and “replicas”. Leaders are automatically elected
          − A leader is just a replica with some coordination responsibilities for
            the other replicas of its shard
    •  If a leader goes down, one of the associated replicas is elected as the
       new leader
    •  You don’t have to do anything for this to work
!   When you send a document to a machine for indexing, the code
    (DistributedUpdateProcessor) does several things:
    •  If I’m a replica, forward the request to my leader
    •  If I’m a leader
          − Determine which shard each document should go to and forward
             the doc (currently in batches of 10) to that shard’s leader
          − Index any documents for my own shard locally and send them to my replicas
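The routing decision on this slide can be sketched as follows. The hash function (`zlib.crc32` modulo the shard count) is an assumption for illustration only, since SolrCloud uses its own hashing scheme. The `route` function and its action names are also hypothetical. The batch size of 10 comes from the slide.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

BATCH_SIZE = 10  # "in batches of 10 presently", per the slide

Action = Tuple[str, int, List[Dict[str, str]]]


def shard_for(doc_id: str, num_shards: int) -> int:
    # Hypothetical hash for illustration: crc32 of the id, mod numShards (1-based).
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards + 1


def route(docs: List[Dict[str, str]], my_shard: int, i_am_leader: bool,
          num_shards: int) -> List[Action]:
    """Describe what a node would do with an incoming update request."""
    if not i_am_leader:
        # A replica simply hands the whole request to its own shard leader.
        return [("forward_to_leader", my_shard, list(docs))]
    by_shard: Dict[int, List[Dict[str, str]]] = defaultdict(list)
    for doc in docs:
        by_shard[shard_for(doc["id"], num_shards)].append(doc)
    actions: List[Action] = []
    for shard, shard_docs in sorted(by_shard.items()):
        for i in range(0, len(shard_docs), BATCH_SIZE):
            batch = shard_docs[i:i + BATCH_SIZE]
            if shard == my_shard:
                # My own shard: index locally, then send to my replicas.
                actions.append(("index_locally_and_send_to_replicas", shard, batch))
            else:
                # Another shard: forward the batch to that shard's leader.
                actions.append(("forward_to_leader", shard, batch))
    return actions
```

Because every node can compute the same shard for a given id, it does not matter which machine the client sends the update to.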

Diving a little deeper (indexing)

!   When new machines are added and get assigned to a shard
     •  An old-style replication will probably occur initially, since it’s most efficient for
        bulk updates
          − This doesn’t require user intervention
     •  Any differences between the replicated index and the current state of the
        leader will be replayed from the transaction log until the new machine’s
        index is identical to the leader’s
     •  When this is complete, search requests are forwarded to the new
        machine




Diving a little deeper (indexing)

!   Transaction log, huh?
!   A record of updates is kept in the “transaction log”. This allows for
    more robust indexing
    •  Any time the indexing process is interrupted, any uncommitted updates
       can be replayed from the transaction log
!   Some heuristics are applied when synchronizing replicas:
    •  If there are “a lot” of updates (currently 100) to be synchronized, then an
       old-style replication is triggered
    •  Otherwise, the transaction log is “replayed” to synchronize the replica
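The replay-or-replicate heuristic can be illustrated with a toy decision function. This is a sketch under simplifying assumptions, not Solr's recovery code. It models the leader's log as a plain list of `(version, update)` pairs and assumes that exactly 100 missing updates is still replayed. The name `sync_replica` is hypothetical.

```python
from typing import List, Tuple, Union

# "A lot" of updates, per the slide ("currently 100"); treating exactly 100
# as still replayable is an assumption of this sketch.
REPLAY_THRESHOLD = 100

Update = Tuple[int, str]  # (version, update payload)


def sync_replica(leader_log: List[Update],
                 replica_version: int) -> Tuple[str, Union[int, List[str]]]:
    """Decide how a lagging replica catches up with its leader (toy model).

    leader_log holds the leader's updates in increasing version order;
    replica_version is the newest version the replica already has.
    """
    missing = [(v, u) for v, u in leader_log if v > replica_version]
    if len(missing) > REPLAY_THRESHOLD:
        # Too far behind: fall back to copying the index (old-style replication).
        return ("full_replication", len(missing))
    # Close enough: replay just the missing updates from the transaction log.
    return ("replay", [u for _, u in missing])
```

A replica that was down briefly is caught up by replaying a handful of updates. A replica that missed many updates gets a full copy of the index instead.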




Diving a little deeper (indexing)

!   “Soft commits”, huh?
!   Solr 4.0 introduces the idea of “soft commits” to handle “near real
    time” searching
    •  Historically, Solr required a “commit” to close segments. At that point:
        − New searchers were opened so those documents could be seen
        − Slaves couldn’t search new documents until after replication
!   Think of soft commits as adding documents to an in-memory,
    writeable segment
    •  On a hard commit, the currently-open segment is closed and the in-memory
       structures are reset
!   Soft commits can happen as often as every second
!   Soft commits (and NRT) are used by SolrCloud, but can be used
    outside of the SolrCloud framework
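The mental model on this slide (soft commits expose an in-memory, writeable segment; hard commits close it) can be written down as a toy class. This is only an illustration of that description, not how Lucene implements segments. The `ToyIndex` name is hypothetical, and the assumption that a hard commit also makes pending documents visible is made for simplicity.

```python
from typing import Callable, List


class ToyIndex:
    """Conceptual model of soft vs. hard commits; not Lucene's implementation."""

    def __init__(self) -> None:
        self.pending: List[str] = []               # added, not yet searchable
        self.open_segment: List[str] = []          # in-memory segment, visible after soft commit
        self.closed_segments: List[List[str]] = [] # segments closed by hard commits

    def add(self, doc: str) -> None:
        self.pending.append(doc)

    def soft_commit(self) -> None:
        # Make pending docs searchable without closing the open segment.
        self.open_segment.extend(self.pending)
        self.pending = []

    def hard_commit(self) -> None:
        # Close the currently-open segment and reset the in-memory structures.
        self.soft_commit()
        if self.open_segment:
            self.closed_segments.append(self.open_segment)
        self.open_segment = []

    def search(self, pred: Callable[[str], bool]) -> List[str]:
        visible = [d for seg in self.closed_segments for d in seg] + self.open_segment
        return [d for d in visible if pred(d)]
```

Because a soft commit only moves documents into memory, it is cheap enough to run every second, while hard commits can be scheduled far less often.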


Diving a little deeper (searching) and all the rest




Diving a little deeper (searching)

!   Searching “just happens”
    •  There’s no distinction between masters and slaves, so any request can
       be sent to any machine in the cluster
!   Searching is NRT. Since replication isn’t as significant now, this is
    automatic
    •  There is a small delay while the documents are forwarded to all the
       replicas
!   Shard information does not need to be configured in Solr
    configuration files
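Whichever machine receives a search fans the query out to the shards and merges what comes back. The sketch below covers only the merge of per-shard top-N lists by score. `merge_shard_results` is hypothetical, and real Solr distributed search involves more steps, for example a second round trip to fetch stored fields for the winning documents.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, doc_id)


def merge_shard_results(shard_hits: Dict[str, List[Hit]], rows: int) -> List[Hit]:
    """Merge per-shard result lists into a global top-`rows`, highest score first."""
    all_hits = (hit for hits in shard_hits.values() for hit in hits)
    return heapq.nlargest(rows, all_hits, key=lambda h: h[0])


if __name__ == "__main__":
    results = {
        "shard1": [(0.9, "a"), (0.4, "b")],
        "shard2": [(0.8, "c"), (0.7, "d")],
        "shard3": [(0.95, "e")],
    }
    print(merge_shard_results(results, rows=3))
```

Each shard only needs to return its own top `rows` hits, because no document outside a shard's local top `rows` can make the global top `rows`.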




Diving a little deeper (the rest)

!       Capacity expansion
!       System status
!       Replication
!       NRT
!       Zookeeper




Capacity expansion

!   Whew! Let’s say that you have your system running just fine, and
    you discover that you are running close to the edge of your capacity.
    What do you need to do to expand capacity?
    •    Install Solr on N more machines
    •    Start them up with the -DzkHost parameter
    •    Register them with your fronting load balancer
    •    Sit back and watch the magic
!   Well, what about reducing capacity?
    •  Shut the machines down




System Status

!   There is a new Admin UI that graphically shows the state of your
    cluster, especially active machines
!   But overall, sending alerts etc. isn’t in place today, although it’s
    under discussion




Replication

!   But we’ve spent a long time understanding replication!
!   Well, it’s largely irrelevant now. When using SolrCloud, replication is
    automatically handled
    •  This includes machines being temporarily down. When they come back
       up, SolrCloud re-synchronizes them with their shard leader and forwards
       queries to them after they are synchronized
    •  This includes temporary glitches (say your network burps)




Finding Recently-indexed Docs (NRT)

!   NRT has been a long time coming, but it’s here
!   It’s “near” real time because there are still slight delays from two sources
    •  Until a “soft commit” happens, which can be every second
    •  Some propagation delay while incoming index requests are:
        − Perhaps forwarded to the shard leader
        − Forwarded to the proper shard
        − Forwarded to the replicas from the shard leader
    •  But these delays probably won’t be noticed




Zookeeper

!   ZooKeeper is “a centralized service for maintaining configuration
    information, naming, providing distributed synchronization, and
    providing group services.”
!   A lot of complexity for maintaining Solr installations is solved with
    Zookeeper
!   Zookeeper is the repository for cluster state information
!   See: http://zookeeper.apache.org/




Using Zookeeper with SolrCloud

!   The -DzkRun flag (in the demo) causes an embedded Zookeeper
    server to run inside that Solr server
   •  Simple to use in the tutorials, but probably not the right option for
      production
   •  An enterprise installation will probably run Zookeeper as an “ensemble”,
      external to Solr servers
!   Zookeeper works on a quorum model where a majority of the N Zookeepers
    (N/2+1, rounding N/2 down) must be running
   •  It’s best to run an odd number of them (and three or more!) to avoid
      Zookeeper being a single point of failure
!   Yes, setting up Zookeeper and making SolrCloud aware of them is
    an added bit of complexity, but TANSTAAFL (more age/geek points if
    you know where that comes from)
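A little arithmetic shows why an odd ensemble size is recommended. If a quorum is a majority, floor(N/2)+1 servers, then adding a fourth server to a three-server ensemble raises the quorum without letting you survive any more failures. The function names below are illustrative only.

```python
def quorum_size(ensemble_size: int) -> int:
    # A majority of the ensemble: floor(N/2) + 1 servers must be up.
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    # How many servers can be lost while a majority is still running.
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"ensemble={n} quorum={quorum_size(n)} can_lose={tolerated_failures(n)}")
```

Three servers tolerate one failure, and so do four; five tolerate two. A single server (or two) tolerates none, which is why a lone Zookeeper is a single point of failure.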


Gotchas

!   This is new and changing
         •  Optimistic locking not fully in place yet
         •  At least one machine/shard must be running.
!       _version_ is a magic field; don’t change it
!       It’s a whole new world; some of your infrastructure is obsolete
!       We’re on the front end of the learning curve
!       Some indexing speed penalty
!       This is trunk, index formats may change etc.




Useful URLs

!   The Solr Wiki: http://wiki.apache.org/solr/
!   Source code, builds, etc:
    http://wiki.apache.org/solr/HowToContribute
!   Main Solr/Lucene website: http://wiki.apache.org/solr/
!   Really good blogs:
    •  Simon Willnauer: http://www.searchworkings.org/blog/-/blogs/
    •  Mike McCandless: http://blog.mikemccandless.com/
    •  Lucid Imagination: http://www.lucidimagination.com/blog/
!   Lucene Spatial Playground/Spatial4J:
    http://code.google.com/p/lucene-spatial-playground/




More useful URLs

!   DocumentWriterPerThread (DWPT) writeup (Simon Willnauer):
    http://www.searchworkings.org/blog/-/blogs/gimme-all-resources-you-have-i-can-use-them!/
!   FST and fuzzy query 100X faster:
    http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html
!   Solr Cloud: http://wiki.apache.org/solr/SolrCloud
    •  NOT Solr-in-the-cloud
!   Lucene JIRA: https://issues.apache.org/jira/browse/lucene
!   Solr JIRAs: https://issues.apache.org/jira/browse/SOLR




Even more useful URLs

!   Yonik Seeley presentations:
    http://people.apache.org/~yonik/presentations/
     •  See particularly the LuceneRevolution2011 presentation, re: pivot
        faceting.
!   Grant Ingersoll’s memory estimator prototype (trunk):
    http://www.lucidimagination.com/blog/2011/09/14/estimating-memory-and-storage-for-lucenesolr/
!   Memory improvements:
    http://www.lucidimagination.com/blog/2012/04/06/memory-comparisons-between-solr-3x-and-trunk/
!   Zookeeper http://zookeeper.apache.org/




Thank You, Questions?
Erick Erickson
Erick.Erickson@lucidimagination.com

Webinar: Solr & Spark for Real Time Big Data AnalyticsWebinar: Solr & Spark for Real Time Big Data Analytics
Webinar: Solr & Spark for Real Time Big Data AnalyticsLucidworks
 
Node.js und die Oracle-Datenbank
Node.js und die Oracle-DatenbankNode.js und die Oracle-Datenbank
Node.js und die Oracle-DatenbankCarsten Czarski
 

Similar to How SolrCloud Changes the User Experience In a Sharded Environment (20)

Spark Tips & Tricks
Spark Tips & TricksSpark Tips & Tricks
Spark Tips & Tricks
 
Scylla Summit 2018: Make Scylla Fast Again! Find out how using Tools, Talent,...
Scylla Summit 2018: Make Scylla Fast Again! Find out how using Tools, Talent,...Scylla Summit 2018: Make Scylla Fast Again! Find out how using Tools, Talent,...
Scylla Summit 2018: Make Scylla Fast Again! Find out how using Tools, Talent,...
 
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
 
Big Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and MesosBig Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and Mesos
 
Boulder dev ops-meetup-11-2012-rundeck
Boulder dev ops-meetup-11-2012-rundeckBoulder dev ops-meetup-11-2012-rundeck
Boulder dev ops-meetup-11-2012-rundeck
 
Cobbler, Func and Puppet: Tools for Large Scale Environments
Cobbler, Func and Puppet: Tools for Large Scale EnvironmentsCobbler, Func and Puppet: Tools for Large Scale Environments
Cobbler, Func and Puppet: Tools for Large Scale Environments
 
Cobbler, Func and Puppet: Tools for Large Scale Environments
Cobbler, Func and Puppet: Tools for Large Scale EnvironmentsCobbler, Func and Puppet: Tools for Large Scale Environments
Cobbler, Func and Puppet: Tools for Large Scale Environments
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
 
Solr on Windows: Does it Work? Does it Scale? - Teun Duynstee
Solr on Windows: Does it Work? Does it Scale? - Teun DuynsteeSolr on Windows: Does it Work? Does it Scale? - Teun Duynstee
Solr on Windows: Does it Work? Does it Scale? - Teun Duynstee
 
mtl_rubykaigi
mtl_rubykaigimtl_rubykaigi
mtl_rubykaigi
 
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
 
The age of orchestration: from Docker basics to cluster management
The age of orchestration: from Docker basics to cluster managementThe age of orchestration: from Docker basics to cluster management
The age of orchestration: from Docker basics to cluster management
 
You Can't Correlate what you don't have - ArcSight Protect 2011
You Can't Correlate what you don't have - ArcSight Protect 2011You Can't Correlate what you don't have - ArcSight Protect 2011
You Can't Correlate what you don't have - ArcSight Protect 2011
 
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2
 
Automated Hadoop Cluster Construction on EC2
Automated Hadoop Cluster Construction on EC2Automated Hadoop Cluster Construction on EC2
Automated Hadoop Cluster Construction on EC2
 
Racing with Droids
Racing with DroidsRacing with Droids
Racing with Droids
 
Webinar: Solr & Spark for Real Time Big Data Analytics
Webinar: Solr & Spark for Real Time Big Data AnalyticsWebinar: Solr & Spark for Real Time Big Data Analytics
Webinar: Solr & Spark for Real Time Big Data Analytics
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Node.js und die Oracle-Datenbank
Node.js und die Oracle-DatenbankNode.js und die Oracle-Datenbank
Node.js und die Oracle-Datenbank
 

More from lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 

More from lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 

Recently uploaded

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 

Recently uploaded (20)

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 

How SolrCloud Changes the User Experience In a Sharded Environment

  • 1. How SolrCloud Changes the User Experience In a Sharded Environment. Erick Erickson, Lucid Imagination. Lucene Revolution, 9-May-2012
  • 2. Who am I?
    - “Erick is just some guy, you know”
      - Your geekiness score is increased if you know where that quote comes from, and your age is hinted at
    - 30+ years in the programming business, mostly as a developer
    - Currently employed by Lucid Imagination in Professional Services
      - I get to see how various organizations interpret “search”, and I’m amazed at the different problems Solr is used to solve
    - Solr/Lucene committer
    - ErickErickson@lucidimagination.com
    - Sailor: anybody need crew for a sailboat delivery?
  • 3. What we’ll cover
    - Briefly, what else is coming in 4.0
    - SolrCloud (NOT Solr-in-the-cloud), upcoming in 4.0
      - What it is
      - Why you may care
    - Needs SolrCloud addresses
      - DR/HA
      - Distributed indexing
      - Distributed searching
    - I’m assuming basic familiarity with Solr
  • 4. I’m not the implementer, Mark is
    - Well, Mark Miller and others
    - Mark’s talk (tomorrow) is a deeper technical dive; I recommend it highly
      - Anything I say that contradicts anything Mark says, believe Mark
        - After all, he wrote much of the code
    - Mark insisted on the second slide after this one
  • 5. (image-only slide)
  • 6. (image-only slide)
  • 7. When and where can we get 4.0?
    - When will it be released? Hopefully in 2012
      - Open source; have you ever tried herding cats?
      - Alpha/Beta releases are planned, which is unusual
      - 3.6 is probably the last 3.x release
    - How usable are nightly builds?
      - LucidWorks Enterprise runs on trunk, so trunk is quite stable and in production
    - There’s lots of new stuff!
      - “Unstable” doesn’t really mean unstable code
        - APIs are changing, and the index format may change
    - Nightly builds: https://builds.apache.org//view/S-Z/view/Solr/
    - Source code and build instructions: http://wiki.apache.org/solr/HowToContribute
  • 8. Cool stuff in addition to SolrCloud in 4.0
  • 9. Other cool 4.0 (trunk) features
    - Similarity calculations are decoupled from Lucene’s core
    - Scoring is pluggable
    - There are several different out-of-the-box implementations now (e.g. BM25)
    - FST (Finite State Transducer) based work: speed and size improvements; see http://www.slideshare.net/otisg/finite-state-queries-in-lucene
    - Automaton-based fuzzy queries, roughly 100x faster (see McCandless’ blog)
    - You can plug in your own index codec, which is really your own index format; see the pulsing codec and SimpleTextCodec
      - Can be done on a per-field basis
      - SimpleTextCodec writes the index as plain text, as an example
    - Much more efficient in-memory structures
    - NRT (Near Real Time) searching and “soft commits”
    - Spatial (LSP) rather than the spatial contrib
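To make “scoring is pluggable” a little more concrete, here is the textbook Okapi BM25 per-term score, the kind of formula a pluggable similarity can supply. It is a minimal sketch, not Lucene’s code: k1 = 1.2 and b = 0.75 are the commonly used defaults, and Lucene’s own implementation differs in details such as how document lengths are encoded in the index.

```python
import math


def bm25_idf(num_docs: int, doc_freq: int) -> float:
    # Smoothed IDF; the "1 +" keeps it positive even for very common terms.
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(tf: float, doc_len: float, avg_doc_len: float,
                    num_docs: int, doc_freq: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """Contribution of one query term to one document's score (textbook BM25)."""
    if tf <= 0:
        return 0.0
    # Length normalisation: documents longer than average are penalised, scaled by b.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    # Term-frequency saturation: grows with tf but levels off at idf * (k1 + 1).
    return bm25_idf(num_docs, doc_freq) * tf * (k1 + 1.0) / (tf + norm)


if __name__ == "__main__":
    for tf in (1, 2, 5, 20):
        print(tf, round(bm25_term_score(tf, 100, 100, 1000, 10), 3))
```

Unlike raw TF-IDF, repeating a term twenty times buys only a little more than repeating it five times, which is a big part of why people ask for BM25.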
  • 10. More cool new features
    - PivotFacetComponent for hierarchical faceting; see Yonik’s presentation in the “useful URLs” section
    - Pseudo-join queries; see Yonik’s presentation URL in the “useful URLs” section
    - New Admin UI
    - Can’t over-emphasize the importance of CHANGES.txt
      - Solr
      - Lucene
      - Please read them when upgrading. Really.
  • 12. What is SolrCloud?
    - SolrCloud is a set of new distributed capabilities in Solr that:
      - Automatically distributes updates (i.e. indexes documents) to the appropriate shard
      - Uses transaction logs for robust update recovery
      - Automatically distributes searches in a sharded environment
      - Automatically assigns replicas to shards when available
      - Supports Near Real Time (NRT) searching
      - Uses Zookeeper as a repository for cluster state
  • 13. Common pain points (why you may care)
    - Every large organization seems to have a recurring set of issues:
      - Sharding: you have to do it yourself, usually through SolrJ or similar
      - Capacity expansion: what to do when you need more capacity
      - System status: getting alerts when machines die
      - Replication: configuration
      - Finding recently-indexed data: everyone wants “real time”
        - Often not as important as people think, but...
      - Inappropriate configuration
        - Trying for “real time” by replicating every 5 seconds
        - Committing every document/second/packet
        - Mismatched schema or config files on masters and slaves
  • 14. Common pain points (why you may care)
    - Maintaining different configuration files (and coordinating them) for masters and slaves
    - SolrCloud addresses most of these
    - SolrCloud is currently “a work in progress”
  • 15. Typical sharding setup
    - Multiple indexers
    - Query slaves
      - 1 or more per indexer
    - Yes, you can shard & distribute
    (Diagram: indexers on the indexing side; query slaves behind a load balancer on the searching side)
  • 16. Steps to set this up
    - Figure out how many shards are required
    - Configure all masters, which may be complex
      - Point your indexing at the appropriate master
    - Configure all slaves
      - Configure distributed searching
      - Make sure the slaves point at the correct master
      - Find out where you misconfigured something, e.g. “I’m getting duplicate documents.” Because you indexed the same doc to two shards?
      - Deal with your manager wanting to know why the doc she just indexed isn’t showing up in the search (replication delay)
      - Rinse, repeat…
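The “duplicate documents” problem in the do-it-yourself setup usually comes from routing the same unique key to different masters. A minimal sketch of client-side routing, assuming hypothetical master addresses and using CRC32 only as an illustration (it is not the hash SolrCloud uses), picks the shard from the document id so a re-sent document always lands on the same master. The `shards` value is the comma-separated host:port/path list that pre-SolrCloud distributed search expects on each query.

```python
import zlib

# Hypothetical master (indexing) cores, one per shard; not from the talk.
SHARD_MASTERS = [
    "master1:8983/solr",
    "master2:8983/solr",
    "master3:8983/solr",
]


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard from the document's unique key.

    CRC32 is stable across processes and machines (unlike Python's built-in
    hash() on str), so the same id always maps to the same shard.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def master_for(doc_id: str) -> str:
    # Every client must use the same function and the same shard list.
    return SHARD_MASTERS[shard_for(doc_id, len(SHARD_MASTERS))]


def shards_param(slaves: list) -> str:
    # Value for the "shards" request parameter: one host:port/path per shard.
    return ",".join(slaves)


if __name__ == "__main__":
    print(master_for("doc-42"))
    print(shards_param(["slave1:8983/solr", "slave2:8983/solr", "slave3:8983/solr"]))
```

The catch is that changing the number of shards moves most ids to a different shard, so adding capacity this way means re-indexing, which is one of the pain points SolrCloud is meant to take over.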
  • 17. How is this different with SolrCloud?
    - Decide how many shards you need
    - Ask the ops folks how many machines you can have
    - Start your servers:
      - On the Zookeeper machine(s): java -Dbootstrap_confdir=./solr/conf -DzkRun -DnumShards=### -jar start.jar
      - On all the other machines: java -DzkHost=<ZookeeperMachine:port>[,<ZookeeperMachine:port>…] -jar start.jar
    - Index any way you want
      - To any machine you want, perhaps in parallel
    - Send searches to any machine you want
    - Note: the demo uses embedded Zookeeper
      - Most production installations will probably use “ensembles”
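If you script cluster start-up, the two command lines above are the only per-node difference. A small sketch that builds them as argument lists (suitable for `subprocess`), using only the system properties shown on the slide; the Zookeeper addresses in the example are hypothetical placeholders.

```python
import shlex


def bootstrap_command(num_shards: int, conf_dir: str = "./solr/conf") -> list:
    # First node: runs embedded Zookeeper (-DzkRun) and uploads its config.
    return ["java", f"-Dbootstrap_confdir={conf_dir}", "-DzkRun",
            f"-DnumShards={num_shards}", "-jar", "start.jar"]


def join_command(zk_hosts: list) -> list:
    # Every other node only needs to know where Zookeeper lives.
    if not zk_hosts:
        raise ValueError("at least one Zookeeper host:port is required")
    return ["java", f"-DzkHost={','.join(zk_hosts)}", "-jar", "start.jar"]


if __name__ == "__main__":
    print(shlex.join(bootstrap_command(3)))
    # Hypothetical three-node ensemble addresses.
    print(shlex.join(join_command(["zk1:2181", "zk2:2181", "zk3:2181"])))
```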
  • 18. Diving a little deeper (indexing)
  • 19. Diving a little deeper (indexing)
    - How are shard machines assigned?
      - It’s magic, ask Mark.
      - As each machine is started, it’s assigned shard N+1 until numShards is reached
      - The information is recorded in Zookeeper, where it’s available to all
    - How are leaders elected?
      - Initially, on a first-come-first-served basis, so at initial setup each shard machine will be a leader (numShards == number of available machines)
    - How are replicas assigned?
      - See above (magic), but conceptually it’s on a “round robin” basis
      - As each machine is started for the first time, it’s assigned to the shard with the fewest replicas (tie-breaking on lowest shard ID)
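The assignment rules above are easy to simulate. This is a toy model of the rules as stated on the slide, not SolrCloud’s actual code; `ToyClusterState` is a hypothetical stand-in for the cluster state kept in Zookeeper.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Shard:
    name: str
    leader: Optional[str] = None
    replicas: list[str] = field(default_factory=list)


class ToyClusterState:
    """Hypothetical stand-in for the cluster state recorded in Zookeeper."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [Shard(f"shard{i}") for i in range(1, num_shards + 1)]

    def register(self, node: str) -> str:
        """Assign a newly started node to a shard and return the shard's name."""
        # Until numShards is reached, each new node takes the next shard and,
        # being first there, becomes its leader.
        for shard in self.shards:
            if shard.leader is None:
                shard.leader = node
                return shard.name
        # Afterwards, join the shard with the fewest replicas. min() keeps the
        # first of several equal candidates, so ties go to the lowest shard ID.
        target = min(self.shards, key=lambda s: len(s.replicas))
        target.replicas.append(node)
        return target.name


if __name__ == "__main__":
    state = ToyClusterState(num_shards=3)
    for n in range(1, 7):
        print(f"node{n} ->", state.register(f"node{n}"))
```

Starting six machines with numShards=3 gives three leaders (shard1 to shard3) followed by one replica per shard, in shard order.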
  • 20. Assigning machines (diagram): the first machine, started with -DnumShards=3 -Dbootstrap_confdir=./solr/conf -DzkHost=<host>:<port>[,<host>:<port>], registers with the ZK host(s) and becomes Leader of shard1
  • 21. Assigning machines (diagram): the second machine, started with -DzkHost=<host>:<port>[,<host>:<port>] like every later machine, becomes Leader of shard2
  • 22. Assigning machines (diagram): the third machine becomes Leader of shard3. At this point you can index and search; you have one machine per shard
  • 23. Assigning machines (diagram): the fourth machine becomes a Replica of shard1
  • 24. Assigning machines (diagram): the fifth machine becomes a Replica of shard2
  • 25. Assigning machines (diagram): the sixth machine becomes a Replica of shard3; each shard now has a leader and one replica
  • 26. Diving a little deeper (indexing)
    - Let’s break this up a bit
    - There really aren’t any masters/slaves in SolrCloud
      - “Leaders” and “replicas”. Leaders are automatically elected
        - Leaders are just replicas with some coordination responsibilities for the associated replicas
      - If a leader goes down, one of the associated replicas is elected as the new leader
      - You don’t have to do anything for this to work
    - When you send a document to a machine for indexing, the code (DistributedUpdateProcessor) does several things:
      - If I’m a replica, forward the request to my leader
      - If I’m a leader:
        - Determine which shard each document should go to, and forward the doc (presently in batches of 10) to that shard’s leader
        - Index any documents that belong to my shard locally and send them to my replicas
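The forwarding rules on this slide can be sketched as a small in-memory simulation. Everything here is hypothetical (`ToyCloud`, `Node`, and the CRC32-based `shard_of`, which is an illustration rather than SolrCloud’s hash); only the replica-to-leader hop, the per-shard grouping, and the batch size of 10 come from the talk.

```python
from __future__ import annotations

import zlib
from collections import defaultdict
from dataclasses import dataclass, field

BATCH_SIZE = 10  # the talk: leaders currently forward docs in batches of 10


def shard_of(doc_id: str, num_shards: int) -> int:
    # Illustrative stable hash of the unique key; not SolrCloud's actual function.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


@dataclass
class Node:
    name: str
    shard: int
    is_leader: bool
    index: list[str] = field(default_factory=list)  # doc ids indexed on this node


class ToyCloud:
    def __init__(self, num_shards: int) -> None:
        self.num_shards = num_shards
        self.leaders: dict[int, Node] = {}
        self.replicas: dict[int, list[Node]] = defaultdict(list)
        self.forwards: list[tuple[str, str, list[str]]] = []  # (from, to, doc ids)

    def add(self, node: Node) -> None:
        if node.is_leader:
            self.leaders[node.shard] = node
        else:
            self.replicas[node.shard].append(node)

    def update(self, node: Node, doc_ids: list[str]) -> None:
        if not node.is_leader:
            # A replica hands the whole request to its shard leader.
            leader = self.leaders[node.shard]
            self.forwards.append((node.name, leader.name, list(doc_ids)))
            self.update(leader, doc_ids)
            return
        # A leader groups the documents by the shard they belong to.
        by_shard: dict[int, list[str]] = defaultdict(list)
        for doc_id in doc_ids:
            by_shard[shard_of(doc_id, self.num_shards)].append(doc_id)
        for shard, ids in by_shard.items():
            if shard == node.shard:
                # Own shard: index locally and copy to this shard's replicas.
                node.index.extend(ids)
                for replica in self.replicas[shard]:
                    replica.index.extend(ids)
            else:
                # Another shard: forward to that shard's leader in batches.
                target = self.leaders[shard]
                for start in range(0, len(ids), BATCH_SIZE):
                    batch = ids[start:start + BATCH_SIZE]
                    self.forwards.append((node.name, target.name, batch))
                    self.update(target, batch)
```

Whichever node the client happens to hit, every document ends up on exactly one shard’s leader and that shard’s replicas, which is why you can “index to any machine you want”.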
  • 27. Diving a little deeper (indexing)
    - When new machines are added and get assigned to a shard
      • An old-style replication will probably occur first, since that’s the most efficient approach for bulk updates
        − This doesn’t require user intervention
      • Any differences between the replicated index and the leader’s current state are then replayed from the transaction log until the new machine’s index is identical to the leader’s
      • When this is complete, search requests are forwarded to the new machine
  • 28. Diving a little deeper (indexing)
    - Transaction log, huh?
    - A record of updates is kept in the “transaction log”. This allows for more robust indexing
      • Any time the indexing process is interrupted, any uncommitted updates can be replayed from the transaction log
    - Synchronizing replicas has some heuristics applied
      • If there are “a lot” of updates (currently 100) to be synchronized, an old-style replication is triggered
      • Otherwise, the transaction log is “replayed” to synchronize the replica
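The replay-or-replicate heuristic might be sketched like this. Treating the leader’s transaction log as a plain list of (version, update) pairs, the function name, and reading “a lot” as strictly more than 100 missing updates are all assumptions; a real transaction log only keeps a window of recent updates.

```python
SYNC_THRESHOLD = 100  # "a lot" of updates, per the slide; currently 100

def sync_replica(leader_log, replica_version):
    """Decide how to bring a lagging replica up to date with its leader.

    leader_log: (version, update) pairs in ascending version order, a stand-in
        for the leader's transaction log (assumption: it covers every update).
    replica_version: highest version the replica has already applied.
    Returns (strategy, updates_to_replay).
    """
    missing = [(version, update) for version, update in leader_log
               if version > replica_version]
    # Assumption: strictly more than the threshold triggers a full copy.
    if len(missing) > SYNC_THRESHOLD:
        return "full_replication", []
    # Otherwise replay the missing updates, oldest first.
    return "replay_log", [update for _, update in missing]

if __name__ == "__main__":
    log = [(v, f"update-{v}") for v in range(1, 301)]
    print(sync_replica(log, 295))  # 5 missing updates: replayed
    print(sync_replica(log, 50))   # 250 missing updates: full replication
```

The design point is that replaying a handful of updates is cheap, while shipping whole index files wins once the gap gets large.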
  • 29. Diving a little deeper (indexing)
    - “Soft commits”, huh?
    - Solr 4.0 introduces the idea of “soft commits” to handle “near real time” (NRT) searching
      • Historically, Solr required a “commit” to close segments. At that point:
        − New searchers were opened so those documents could be seen
        − Slaves couldn’t search new documents until after replication
    - Think of soft commits as adding documents to an in-memory, writeable segment
      • On a hard commit, the currently open segment is closed and the in-memory structures are reset
    - Soft commits can happen as often as every second
    - Soft commits (and NRT) are used by SolrCloud, but can also be used outside the SolrCloud framework
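As a mental model only (Lucene’s real segment handling is more involved), the visibility rules for soft and hard commits can be sketched as a toy class. `ToyIndex` and its methods are hypothetical names that follow the slide’s “in-memory, writeable segment” analogy.

```python
class ToyIndex:
    """Conceptual model of soft vs. hard commits; not Lucene's internals."""

    def __init__(self):
        self.buffered = []         # added, not yet visible to searches
        self.in_memory = []        # open in-memory segment, visible after a soft commit
        self.closed_segments = []  # segments closed by hard commits

    def add(self, doc):
        self.buffered.append(doc)

    def soft_commit(self):
        # Make buffered docs searchable without closing a segment.
        self.in_memory.extend(self.buffered)
        self.buffered = []

    def hard_commit(self):
        # Close the currently open segment and reset the in-memory structures.
        self.soft_commit()
        if self.in_memory:
            self.closed_segments.append(list(self.in_memory))
        self.in_memory = []

    def search(self):
        # Searches see closed segments plus the soft-committed in-memory docs.
        closed = [doc for segment in self.closed_segments for doc in segment]
        return closed + self.in_memory

if __name__ == "__main__":
    idx = ToyIndex()
    idx.add("doc1")
    print(idx.search())   # nothing visible yet
    idx.soft_commit()
    print(idx.search())   # doc1 visible, still in memory
    idx.hard_commit()
    print(idx.closed_segments)
```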
  • 30. Diving a little deeper (searching) and all the rest
  • 31. Diving a little deeper (searching)
    - Searching “just happens”
      • There’s no distinction between masters and slaves, so any request can be sent to any machine in the cluster
    - Searching is NRT, and since replication isn’t as significant now, this is automatic
      • There is a small delay while the documents are forwarded to all the replicas
    - Shard information does not need to be configured in Solr configuration files
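Since any machine can take a query, the machine that receives it has to collect results from every shard and merge them. Here’s a minimal sketch of that fan-out/merge step, assuming each shard hands back hits already sorted by descending score; the cluster layout and function names are hypothetical, and real distributed search does considerably more.

```python
import heapq
import itertools
import random

def pick_replicas(cluster, rng=random):
    """Choose one member (leader or replica) of each shard to query.

    cluster: hypothetical {shard: [machine, ...]} layout.
    """
    return {shard: rng.choice(members) for shard, members in cluster.items()}

def merge_results(shard_results, rows):
    """Merge per-shard hit lists into one ranked page of doc ids.

    shard_results: {shard: [(score, doc_id), ...]}, each list already sorted
        by descending score (assumption about what each shard returns).
    """
    # heapq.merge expects ascending keys, so sort on the negated score.
    merged = heapq.merge(*shard_results.values(), key=lambda hit: -hit[0])
    return [doc_id for _, doc_id in itertools.islice(merged, rows)]

if __name__ == "__main__":
    cluster = {"shard1": ["node1", "node4"], "shard2": ["node2", "node5"]}
    print(pick_replicas(cluster))
    hits = {"shard1": [(9.0, "a"), (3.0, "b")], "shard2": [(7.5, "c"), (5.0, "d")]}
    print(merge_results(hits, rows=3))
```

Because every member of a shard holds the same documents, it doesn’t matter which one `pick_replicas` chooses; that’s what lets any machine answer any request.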
  • 32. Diving a little deeper (the rest)
    - Capacity expansion
    - System status
    - Replication
    - NRT
    - ZooKeeper
  • 33. Capacity expansion
    - Whew! Let’s say your system is running just fine, and you discover that you’re running close to the edge of your capacity. What do you need to do to expand capacity?
      • Install Solr on N more machines
      • Start them up with the -DzkHost parameter
      • Register them with your fronting load balancer
      • Sit back and watch the magic
    - Well, what about reducing capacity?
      • Shut the machines down
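When reducing capacity, keep in mind the gotcha that at least one machine per shard must be running. Here’s a small hypothetical check for that, using a made-up {shard: [machines]} layout rather than anything actually read from ZooKeeper.

```python
def safe_to_remove(cluster, nodes_to_remove):
    """True if every shard keeps at least one running member afterward.

    cluster: hypothetical {shard: [machine, ...]} layout.
    """
    doomed = set(nodes_to_remove)
    return all(any(node not in doomed for node in members)
               for members in cluster.values())

if __name__ == "__main__":
    cluster = {"shard1": ["node1", "node4"], "shard2": ["node2", "node5"],
               "shard3": ["node3"]}
    print(safe_to_remove(cluster, ["node4", "node5"]))  # spare replicas only
    print(safe_to_remove(cluster, ["node3"]))           # shard3's only machine
```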
  • 34. System Status
    - There is a new Admin UI that graphically shows the state of your cluster, especially which machines are active
    - Overall, though, alerting and the like isn’t in place today, although it’s under discussion
  • 35. Replication
    - But we’ve spent a long time understanding replication!
    - Well, it’s largely irrelevant now. When using SolrCloud, replication is handled automatically
      • This includes machines being temporarily down. When they come back up, SolrCloud re-synchronizes them with their shard leader and forwards queries to them once they’re synchronized
      • This includes temporary glitches (say your network burps)
  • 36. Finding Recently-indexed Docs (NRT)
    - NRT has been a long time coming, but it’s here
    - It’s “near” real time because there are still slight delays from 2 sources
      • Waiting until the next “soft commit” happens, which can be every second
      • Some propagation delay while incoming index requests are:
        − Perhaps forwarded to the shard leader
        − Forwarded to the proper shard
        − Forwarded to the replicas from the shard leader
      • But these delays probably won’t be noticed
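A back-of-the-envelope bound on those two delays: a document that reaches a replica just after that replica’s soft commit waits almost a full soft-commit interval, plus whatever the forwarding hops took. The function name and the hop latencies below are made-up illustrations, not measurements.

```python
def worst_case_visibility_ms(soft_commit_interval_ms, hop_latencies_ms):
    """Rough upper bound on how long until a new doc is searchable.

    Assumes the doc arrives just after a soft commit, so it waits a full
    interval, plus the sum of the forwarding hops (to the leader, to the
    right shard, to the replicas).
    """
    return soft_commit_interval_ms + sum(hop_latencies_ms)

if __name__ == "__main__":
    # Hypothetical numbers: 1 s soft commits, three 5 ms forwarding hops.
    print(worst_case_visibility_ms(1000, [5, 5, 5]))
```

With those numbers the bound is 1015 ms, so the soft-commit interval dominates, which is why the forwarding delays probably won’t be noticed.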
  • 37. ZooKeeper
    - ZooKeeper is “a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.”
    - ZooKeeper solves a lot of the complexity of maintaining Solr installations
    - ZooKeeper is the repository for cluster state information
    - See: http://zookeeper.apache.org/
  • 38. Using ZooKeeper with SolrCloud
    - The -DzkRun flag (in the demo) causes an embedded ZooKeeper server to run in that Solr server
      • Simple to use in the tutorials, but probably not the right option for production
      • An enterprise installation will probably run ZooKeeper as an “ensemble”, external to the Solr servers
    - ZooKeeper works on a quorum model: with N ZooKeeper servers, a majority (N/2+1, rounding N/2 down) must be running
      • It’s best to run an odd number of them (and three or more!) to avoid ZooKeeper being a single point of failure
    - Yes, setting up ZooKeeper and making SolrCloud aware of it adds a bit of complexity, but TANSTAAFL (more age/geek points if you know where that comes from)
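The quorum arithmetic shows why an odd number of ZooKeeper servers is recommended: going from three servers to four raises the quorum without letting you lose any more servers. A quick sketch (the function names are just for illustration):

```python
def quorum_size(ensemble_size):
    # A majority must be running: floor(N/2) + 1 servers.
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size):
    # How many servers can be down while a majority is still running.
    return ensemble_size - quorum_size(ensemble_size)

if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} servers: quorum {quorum_size(n)}, "
              f"survives {tolerated_failures(n)} failure(s)")
```

For example, 3 servers tolerate 1 failure, 4 still tolerate only 1, and 5 tolerate 2, while a single server tolerates none, which is the single point of failure the slide warns about.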
  • 39. Gotchas
    - This is new and changing
      • Optimistic locking isn’t fully in place yet
      • At least one machine per shard must be running
    - _version_ is a magic field; don’t change it
    - It’s a whole new world; some of your infrastructure is obsolete
    - We’re on the front end of the learning curve
    - There’s some indexing speed penalty
    - This is trunk; index formats may change, etc.
  • 40. Useful URLs
    - The Solr Wiki: http://wiki.apache.org/solr/
    - Source code, builds, etc.: http://wiki.apache.org/solr/HowToContribute
    - Main Solr/Lucene website: http://lucene.apache.org/
    - Really good blogs:
      • Simon Willnauer: http://www.searchworkings.org/blog/-/blogs/
      • Mike McCandless: http://blog.mikemccandless.com/
      • Lucid Imagination: http://www.lucidimagination.com/blog/
    - Lucene Spatial Playground/Spatial4J: http://code.google.com/p/lucene-spatial-playground/
  • 41. More useful URLs
    - DocumentsWriterPerThread (DWPT) writeup (Simon Willnauer): http://www.searchworkings.org/blog/-/blogs/gimme-all-resources-you-have-i-can-use-them!/
    - FST and fuzzy query 100X faster: http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html
    - SolrCloud: http://wiki.apache.org/solr/SolrCloud
      • NOT Solr-in-the-cloud
    - Lucene JIRA: https://issues.apache.org/jira/browse/lucene
    - Solr JIRA: https://issues.apache.org/jira/browse/SOLR
  • 42. Even more useful URLs
    - Yonik Seeley presentations: http://people.apache.org/~yonik/presentations/
      • See particularly the LuceneRevolution2011 presentation, re: pivot faceting
    - Grant Ingersoll’s memory estimator prototype (trunk): http://www.lucidimagination.com/blog/2011/09/14/estimating-memory-and-storage-for-lucenesolr/
    - Memory improvements: http://www.lucidimagination.com/blog/2012/04/06/memory-comparisons-between-solr-3x-and-trunk/
    - ZooKeeper: http://zookeeper.apache.org/
  • 43. Thank You, Questions?
    Erick Erickson
    Erick.Erickson@lucidimagination.com