Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks

Presented at Lucene/Solr Revolution 2017

  1. Lifecycle of a Solr Search Request
     Chris "Hoss" Hostetter - 2017-09-14
     https://home.apache.org/~hossman/rev2017/
     https://twitter.com/_hossman
     https://www.lucidworks.com/
     Abstract: This intermediate session for existing Solr users will provide a Deep Dive look into the lifecycle of a Solr Search Request. We will drill down through each layer of code, discussing what happens at each stage -- including when & how inter-node communication takes place in a multi-node SolrCloud cluster. Along the way, we will also review the various places where users can configure existing (or custom written) plugins to override or amend the default behavior.
     Lifecycle of a Solr Search Request https://people.apache.org/~hossman/rev2017/ 1 of 24 10/4/17, 4:32 PM
  2. Agenda
     Deep Dive look into the lifecycle of 4 Solr Search Requests...
     Single Node: Single SolrCore
       1. Simple Query
       2. Facet Query
     SolrCloud: 2 Shards + 2 Replicas
       3. Simple Query
       4. Facet Query
     ...and where various types of Plugins can be used.
  3. Simple Query
     Single Node: Single SolrCore
     bin/solr -e techproducts
     http://localhost:8983/solr/techproducts/select ? q = ipod & sort = inStock desc, score desc & fl = id, name & rows = 10
     This sample paginated query is based on the techproducts example configs & data that have been included in every release of Solr since it was first open sourced. I have a nostalgic affection for this silly little dataset.
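
The request on slide 3 can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper name `build_select_url` is hypothetical, and the base URL assumes the local techproducts example from the slide is running.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URL taken from the slide; assumes `bin/solr -e techproducts` on localhost.
BASE = "http://localhost:8983/solr/techproducts/select"

def build_select_url(base, params):
    """Hypothetical helper: append URL-encoded query params to the handler path."""
    return base + "?" + urlencode(params)

params = {
    "q": "ipod",
    "sort": "inStock desc, score desc",
    "fl": "id, name",
    "rows": 10,
}
url = build_select_url(BASE, params)

# Round-trip the query string to confirm the encoding preserved every param.
decoded = parse_qs(urlsplit(url).query)
```

Encoding the parameters (rather than pasting the spaced-out form shown on the slide) matters because the sort and fl values contain spaces and commas.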
  4. Diagram labels:
     HTTP (Jetty) ...:8983/solr/techproducts/select?...
     Solr Webapp
       UI: HTML, Javascript, Images, CSS
       SolrDispatchFilter
         /solr ➔ CoreContainer
         /techproducts ➔ SolrCore
         /select? ➔ RequestHandler
         wt=json ➔ ResponseWriter
     SolrCore techproducts, SolrCore foo, SolrCore etc...
     Legend:
     Purple: The HTTP layer, currently implemented by Jetty
     Blue: Solr runs as a "webapp" inside the Jetty Servlet container (but that's just an implementation detail)
     Black: The key pieces of the Solr webapp: misc "flat files" that power the Solr UI, and the SolrDispatchFilter which is responsible for mapping all HTTP requests/responses into their internal Solr representations and executing them
     Red: CoreContainer is a singleton responsible for managing the lifecycle of SolrCores
     Green: each SolrCore encapsulates the configs & data for a single "index" (which in a SolrCloud configuration would be a replica of some shard of some collection)
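
The path mapping on slide 4 can be pictured as a small lookup. This is only an illustrative sketch: the function `resolve` and its error handling are assumptions, and the real SolrDispatchFilter handles many more cases (admin paths, collections, aliases) than this shows.

```python
def resolve(path, core_names):
    """Split '/solr/<core>/<handler>' into (core name, handler path).

    Hypothetical simplification of the dispatch step described on the slide.
    """
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "solr":
        raise ValueError("expected a path of the form /solr/<core>/<handler>")
    _, core_name, handler = parts
    if core_name not in core_names:
        raise KeyError(core_name)
    return core_name, "/" + handler

# Core names as drawn on the slide.
core, handler = resolve("/solr/techproducts/select", {"techproducts", "foo"})
```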
  5. Diagram labels:
     SolrCore: techproducts
     SolrRequestHandlers
       SearchHandler: /select
         - initParams - df = text (default)
         - components (implicit) - query - etc...
       SearchHandler: /etc...
       UpdateRequestHandler: /etc...
     SearchComponents
       QueryComponent: query
         - prepare() - df=text&q=ipod ➔ Query - etc...
         - process() - etc...
       FacetComponent: facet
       etc...
     Legend:
     Green: The SolrCore used for this (HTTP) request
     Black: Named instances of (pluggable) SolrRequestHandlers. SearchHandler is the most common, and it uses a configurable list of SearchComponents
     Red: Named instances of (pluggable) SearchComponents. QueryComponent is the only one used in this simple request
     All SearchComponents implement prepare() & process() methods, which are called by SearchHandler
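
The prepare()/process() contract from slide 5 can be sketched as two passes over the component list. The classes below are hypothetical Python stand-ins, not Solr's Java API; the dict `rb` is an assumed placeholder for the shared per-request state.

```python
class QueryComponentSketch:
    name = "query"

    def prepare(self, rb):
        # Parse q against the default field (df), as in the slide's df=text&q=ipod.
        rb["query"] = (rb["params"].get("df", "text"), rb["params"]["q"])
        rb["trace"].append("query.prepare")

    def process(self, rb):
        rb["trace"].append("query.process")


class FacetComponentSketch:
    name = "facet"

    def prepare(self, rb):
        # Only ask for the DocSet when faceting was actually requested.
        if rb["params"].get("facet") == "true":
            rb["needDocSet"] = True
        rb["trace"].append("facet.prepare")

    def process(self, rb):
        rb["trace"].append("facet.process")


def handle_request(components, params):
    """Call prepare() on every component, then process() on every component."""
    rb = {"params": params, "trace": [], "needDocSet": False}
    for component in components:
        component.prepare(rb)
    for component in components:
        component.process(rb)
    return rb


rb = handle_request(
    [QueryComponentSketch(), FacetComponentSketch()],
    {"q": "ipod", "df": "text"},
)
```

Running every prepare() before any process() is what lets a later component (such as faceting) influence how an earlier one executes the search.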
  6. Diagram labels:
     QueryComponent.prepare() (query)
       rows=10 ➔ ok?
       fl=id,name ➔ ok?
       q ➔ LuceneQParser + (df=text ➔ text) + "ipod" ➔ TermQuery
       ( "inStock desc" ➔ bool ➔ BoolField.getSortField(inStock,desc) + "score desc" ➔ SortField.SCORE ) ➔ Sort
     LuceneQParser, DismaxQParser, etc...
     IndexSchema - SchemaFields ➔ FieldTypes
       TextField: text - Analyzer - Similarity - etc...
       TextField: etc.. - Analyzer - Similarity - etc...
       BoolField: bool - Analyzer - Similarity - getSortField - etc...
     SolrIndexSearcher
     Legend:
     Red: QueryComponent.prepare() and its basic logic for validating & parsing the basic request params
     Green: Named instances of (pluggable) QParserPlugins for parsing query strings (q & fq params). Here the (implicit) default is LuceneQParser
     Orange: The IndexSchema, which contains named SchemaFields (or dynamicFields), which map to...
     Purple: Named instances of (pluggable) FieldTypes which dictate how the field names mapped to them are parsed, indexed, sorted, queried, etc...
     Blue: The SolrIndexSearcher is ultimately what will be queried with these parsed queries & sort objects
  7. Diagram labels:
     QueryComponent.process() (query)
       search(Query,filters[],start,rows,Sort,...) ➔ DocList
     SolrIndexSearcher.search(...)
       window(start, rows, windowSize)
       (queryResultCache? | Index) ➔ DocList
       documentCache, queryResultCache, filterCache
       IndexReader - InvertedIndex - Stored Fields
     JsonResponseWriter
       DocList { + searcher.doc(#) ➔ Stored Fields } ➔ Bytes ➔ HTTP...
     XmlResponseWriter, etc...
  8. Legend:
     Red: QueryComponent.process(), which uses the SolrIndexSearcher to execute the Query created by its prepare() method
     Blue: the SolrIndexSearcher includes several caches in addition to the InvertedIndex, and when executing a query, first expands the requested start/rows to fit a configured "window size" so that "page #2" type requests can result in a cache hit & re-use the results computed for "page #1"
     Orange: The low-level InvertedIndex, the queryResultCache that can be used in its place when executing basic searches, and the DocList containing a sorted list of (internal) doc#s and their scores for the requested start+rows of this query
     Purple: The Stored Fields of the documents in the index & the documentCache used by SolrIndexSearcher to reduce disk reads when popular documents are frequently matched by searches
     Green: Named instances of (pluggable) QueryResponseWriters which dictate how the data structures produced once a request is processed get serialized into bytes (for the HTTP response returned to the original client by Jetty)
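
The "window size" behaviour described on slide 8 can be illustrated with a toy cache. This sketch assumes the window is applied by rounding start+rows up to the next multiple of the window size; the class and function names are hypothetical, and Solr's real queryResultCache keys on more than the query string.

```python
def superset_size(start, rows, window_size):
    """Round start+rows up to the next multiple of window_size (assumed rule)."""
    needed = start + rows
    return -(-needed // window_size) * window_size  # ceiling division


class WindowedSearcher:
    def __init__(self, index_search, window_size):
        self.index_search = index_search  # callable(query, limit) -> sorted doc ids
        self.window_size = window_size
        self.cache = {}                   # query -> (limit collected, doc ids)
        self.index_hits = 0

    def search(self, query, start, rows):
        needed = start + rows
        limit, docs = self.cache.get(query, (0, []))
        if limit < needed:
            # Cache miss: collect a whole window so that nearby pages can reuse it.
            limit = superset_size(start, rows, self.window_size)
            docs = self.index_search(query, limit)
            self.cache[query] = (limit, docs)
            self.index_hits += 1
        return docs[start:needed]


# Toy "index": every query matches doc ids 0..99 in sorted order.
searcher = WindowedSearcher(lambda query, limit: list(range(100))[:limit], window_size=20)
page1 = searcher.search("ipod", 0, 10)
page2 = searcher.search("ipod", 10, 10)   # served from the cached window
hits_after_two_pages = searcher.index_hits
page3 = searcher.search("ipod", 20, 10)   # falls outside the window, hits the index
```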
  9. More Complex Query
     Single Node: Single SolrCore
     http://localhost:8983/solr/techproducts/select ? q = ipod & fq = price:[* TO 1000] & sort = div(popularity,price) asc, score desc & fl = id, name, why:[explain style=nl] & facet = true & facet.field = cat
     This slightly more interesting query builds off the previous example by:
       - Adding a "filter query" on the (numeric) price field
       - Changing the primary sort criteria to be a mathematical function against 2 fields
       - Requesting an additional pseudo-field explaining the score of each document
       - Faceting on the "cat" (aka: category) field
  10. Diagram labels:
      HTTP (Jetty) ...:8983/solr/techproducts/select?...
      Solr Webapp
        UI: HTML, Javascript, Images, CSS
        SolrDispatchFilter
          /solr ➔ CoreContainer
          /techproducts ➔ SolrCore
          /select? ➔ RequestHandler
          wt=json ➔ ResponseWriter
      SolrCore techproducts, SolrCore foo, SolrCore etc...
      The HTTP, Webapp, DispatchFilter, CoreContainer, SolrCore, and RequestHandler layers all function exactly as in our previous (simpler) example. It's only once the SearchHandler starts looping over the components that things get more interesting....
  11. Diagram labels:
      QueryComponent.prepare() (query)
        etc...
        "price:[* TO 1000]" ➔ float ➔ PointRangeQuery(...) ➔ filters[]
        div(popularity,price) ➔ ValueSource(IntFieldSource,...)
      FacetComponent.prepare()
        facet=true ✔
        facet.field=cat ➔ ok?
        needDocSet = true
      div(), sum(), etc...
      IndexSchema - SchemaFields ➔ FieldTypes
        FloatPointField: float - ValueSource - getRangeQuery() - etc...
        IntPointField: int - ValueSource - etc...
      SolrIndexSearcher
      Most items identical to those shown in the "simple" query are omitted for brevity. Of the new items shown here...
      Red: In addition to some extra logic in the QueryComponent.prepare() method (to parse the filter query and more complex sort), we now also see the FacetComponent.prepare() method, which does its own validation & sets a flag indicating that it needs extra info (the DocSet) once SolrIndexSearcher is asked to execute the Query
      Green: Named instances of (pluggable) ValueSourceParsers for parsing function strings -- used here in our sort, but could also be used in queries
      Orange: As before, the IndexSchema, now showing that FieldTypes are also responsible for providing the range query (filter) and ValueSources (used by the functions)
  12. Diagram labels:
      QueryComponent.process() (query)
        search(...) ➔ 〈DocList,DocSet〉
        etc...
      FacetComponent.process()
        For Each "cat" Index Terms: ➔ Intersect with DocSet
      SolrIndexSearcher
        searcher.explain(#)
        documentCache, queryResultCache, filterCache
        IndexReader - InvertedIndex - Stored Fields
      ExplainAugmenter, ChildDocTransformer, SubQueryAugmenter, etc...
      JsonResponseWriter
        DocList { + searcher.doc(#) ➔ Stored Fields + [explain ...] } + Facet Counts ➔ Bytes ➔ HTTP...
      Most items identical to those shown in the "simple" query are omitted for brevity. Of the new items shown here...
      Red: Now when QueryComponent.process() executes the search, the "needDocSet" flag set by FacetComponent.prepare() is also used. FacetComponent.process() can then use the resulting DocSet (an unordered set of all matching doc#s -- regardless of sort) to compute the facet counts.
      Olive: Named instances of (pluggable) DocTransformers (or Augmenters) which can be used to annotate individual documents returned in the results. For this query in particular we see the ExplainAugmenter, which uses the SolrIndexSearcher to get a (debugging) data structure "explaining" how the score of each document was computed.
      Green: the JsonResponseWriter not only returns the Stored Fields of each document, but also the results of any DocTransformers. It also serializes the Facet Counts.
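
The "for each cat term, intersect with DocSet" step on slide 12 can be sketched with plain Python sets. The postings and doc ids below are made-up illustration data, and `facet_counts` is a hypothetical name; Solr has several faceting implementations, and this mirrors only the description on the slide.

```python
def facet_counts(term_postings, doc_set):
    """Count, for each indexed term, how many matching docs contain it.

    term_postings: term -> set of doc ids containing that term (toy inverted index)
    doc_set: unordered set of all doc ids matching the query and filters
    """
    counts = {term: len(docs & doc_set) for term, docs in term_postings.items()}
    # Highest count first; ties ordered by term for a deterministic result.
    return sorted(counts.items(), key=lambda tc: (-tc[1], tc[0]))


# Made-up postings for the "cat" field and a made-up DocSet.
cat_postings = {
    "electronics": {1, 2, 3, 5, 8},
    "music": {2, 3},
    "camera": {9},
}
matching_docs = {2, 3, 5, 7}
counts = facet_counts(cat_postings, matching_docs)
```

The DocSet is needed in full (not only the top rows of the DocList) because facet counts cover every match, not just the page being returned.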
  13. Simple Query
      SolrCloud: 4 Nodes, 2 Shards, 2 Replicas
      bin/solr -e cloud ...
      http://localhost:8983/solr/techproducts/select ? q = ipod & sort = inStock desc, score desc & fl = id, name & rows = 10
      This is the same as our original simple query, still using the techproducts sample configs & data, but from here on we'll assume we're using a 4 node SolrCloud cluster, with the techproducts collection configured to have 2 shards, with a replication factor of 2.
  14. Diagram labels:
      Jetty: http://host1:8983 - SolrDispatchFilter /techproducts ➔ tech_s1_r2
      Jetty: http://host3:8983 - SolrDispatchFilter /techproducts ?➔ host4
      Jetty: http://host2:8983 - SolrDispatchFilter /techproducts ?➔ tech_s2_r2
      Jetty: http://host4:8983 - SolrDispatchFilter /techproducts ➔ tech_s2_r1
      SolrCores (collection, core): techproducts tech_s1_r2, foo foo_s1_r1, foo foo_s2_r1, techproducts tech_s1_r1, techproducts tech_s2_r1, foo foo_s1_r2, techproducts tech_s2_r2, foo foo_s2_r2
      Legend:
      Purple: 4 Jetty instances, running on (the same port 8983 of) 4 different hosts
      Black: The 4 SolrDispatchFilters running inside each of these 4 Jetty instances, and how each of them resolves requests for the techproducts collection.
      Green: the individual SolrCores (which are each a replica of some shard of a collection) running in each Solr node. Note that for the purposes of illustrating the different possible ways a Solr request may be routed, host3 does not contain any SolrCores that are part of the techproducts collection.
      (Other layers such as the Solr webapp and the CoreContainer have been omitted to save space)
  15. Diagram labels:
      coordinator
        SearchHandler: /select
          Repeat until done:
            query.distributedProcess ➔ ShardRequests (α,β)
            Loop: ShardRequests
            query.handleResponse
        QueryComponent: distributedProcess()
          α: shard top10 + sort values
          β: full fl for final top10 ids
        FacetComponent
      shard1
        QueryComponent: prepare() + process()
          α: q=ipod&fl=id&fsv=true ➔ top ids + sort values
          β1: ids=X,Y,Z&fl=name ➔ ...
      shard2
        QueryComponent: prepare() + process()
          α: q=ipod&fl=id&fsv=true ➔ top ids + sort values
          β2: ids=A,..,G&fl=name ➔ ...
  16. Legend:
      Purple: The HTTP Layer showing 3 hosts: an arbitrary 'coordinator' node, and 2 nodes each hosting a replica of one of the 2 shards for the collection
      Black: SearchHandler. On the coordinator node, SearchHandler executes new logic to send sub-requests created by its SearchComponents to arbitrarily selected replicas of each shard. On the replicas handling these sub-requests, the SearchHandler processes these requests just as if they were simple (single node) queries.
      Red: SearchComponent methods. On the coordinator node SearchHandler loops over every component calling SearchComponent.distributedProcess() to create/modify sub-requests for the individual shards, and then calls SearchComponent.handleResponse() to merge the results from each shard and decide if/when/what additional information may be needed. This process repeats until all calls to distributedProcess() on all SearchComponents indicate that they are finished.
      Green & Blue: The 2 stages (α & β) of shard sub-requests needed to process this simple query. Note that the α-requests are identical for both shards, but the β-requests are slightly different to request the fl fields for the matches specific to that shard.
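
The "repeat until every component is finished" loop from slide 16 can be sketched as a small stage machine. Everything here is an illustrative assumption: the stage constants, the `rb` dict, and the `send` callback are hypothetical, and unlike the slide this toy sends every request to every shard.

```python
import sys

# Illustrative stage numbers; a component returns the next stage it cares about.
STAGE_START, STAGE_EXECUTE_QUERY, STAGE_GET_FIELDS, STAGE_DONE = 0, 2000, 3000, sys.maxsize


class QueryComponentSketch:
    def distributed_process(self, rb):
        if rb["stage"] < STAGE_EXECUTE_QUERY:
            return STAGE_EXECUTE_QUERY
        if rb["stage"] == STAGE_EXECUTE_QUERY:
            rb["outgoing"].append("alpha")   # ask shards for top ids + sort values
            return STAGE_GET_FIELDS
        if rb["stage"] == STAGE_GET_FIELDS:
            rb["outgoing"].append("beta")    # ask shards for the fl fields
        return STAGE_DONE

    def handle_response(self, rb, request, responses):
        rb["log"].append((request, sorted(responses)))


def coordinate(components, shards, send):
    rb = {"stage": STAGE_START, "outgoing": [], "log": []}
    next_stage = STAGE_START
    while next_stage != STAGE_DONE:
        rb["stage"] = next_stage
        next_stage = STAGE_DONE
        # Every component may add or modify sub-requests for this stage.
        for component in components:
            next_stage = min(next_stage, component.distributed_process(rb))
        # Send the queued sub-requests, then let every component merge the responses.
        while rb["outgoing"]:
            request = rb["outgoing"].pop(0)
            responses = {shard: send(shard, request) for shard in shards}
            for component in components:
                component.handle_response(rb, request, responses)
    return rb["log"]


log = coordinate(
    [QueryComponentSketch()],
    ["shard1", "shard2"],
    send=lambda shard, request: f"{shard}:{request}",
)
```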
  17. Shard Request α: q=ipod&fl=id&fsv=true&rows=10 sort=inStock desc, score desc
        Shard 1 (numFound=42): F〈true,6〉 B〈true,6〉 D〈true,5〉 C〈true,3〉 G〈true,2〉 A〈true,1〉 E〈false,5〉
        Shard 2 (numFound=314): Z〈true,6〉 X〈true,3〉 Y〈false,9〉
      Shard Request β: q=ipod&ids=...&fl=name
        Shard 1: A, Apple; B, Boat; C, Car; D, Deer; E, Ear; F, Frog; G, Gong
        Shard 2: X, X-Ray; Y, Yo-Yo; Z, Zebra
      Merged (numFound=42+314=356): Z, Zebra; F, Frog; B, Boat; D, Deer; C, Car; X, X-Ray; G, Gong; A, Apple; Y, Yo-Yo; E, Ear
      Here we see hypothetical α requests+responses, hypothetical β requests+responses, & the final Merged results from both -- showing how the IDs and sort values from the α request are used to determine which documents will be in the final results, and in which order. For these specific documents, the β requests+responses fill in the fl fields for the final client.
      Red & Blue: The responses from shard1 & shard2 for the α request
      Green & Purple: The responses from shard1 & shard2 for the β request
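
The merge on slide 17 can be reproduced with the slide's own hypothetical ids and sort values. This is a sketch, not Solr's merge code: the function names are made up, and the order of documents with identical sort values (F, B and Z here) is left arbitrary.

```python
# (id, inStock, score) tuples copied from the slide's hypothetical α responses.
shard_responses = {
    "shard1": [("F", True, 6), ("B", True, 6), ("D", True, 5), ("C", True, 3),
               ("G", True, 2), ("A", True, 1), ("E", False, 5)],
    "shard2": [("Z", True, 6), ("X", True, 3), ("Y", False, 9)],
}


def merge_alpha(responses, rows):
    """Merge per-shard hits by 'inStock desc, score desc' and keep the top rows."""
    tagged = [
        (doc_id, in_stock, score, shard)
        for shard, hits in responses.items()
        for doc_id, in_stock, score in hits
    ]
    # `not in_stock` sorts True before False; `-score` sorts high scores first.
    tagged.sort(key=lambda hit: (not hit[1], -hit[2]))
    return tagged[:rows]


def beta_requests(merged):
    """Group the winning ids by the shard that returned them."""
    ids_by_shard = {}
    for doc_id, _, _, shard in merged:
        ids_by_shard.setdefault(shard, []).append(doc_id)
    return ids_by_shard


merged = merge_alpha(shard_responses, rows=10)
merged_ids = [hit[0] for hit in merged]
ids_by_shard = beta_requests(merged)
```

Grouping the winners by shard is what makes the β requests differ per shard: each one asks only for the fl fields of the documents that shard contributed.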
  18. Complex Query*
      SolrCloud: 4 Nodes, 2 Shards, 2 Replicas
      http://localhost:8983/solr/techproducts/select ? q = ipod & sort = inStock desc, score desc & fl = id, name & facet = true & facet.field = cat
      In the interest of time, this query is not as "Complex" as the "Complex" Single Core query we looked at before. I've omitted things like fq params, sorting on functions, and the use of DocTransformers in the fl because nothing about how those are handled in a Single Core query changes when they are requested by a coordinator node in a SolrCloud query.
  19. Diagram labels:
      coordinator
        SearchHandler: /select ➔ ShardRequests (α, β)
        QueryComponent: distributedProcess()
          α: shard top10 + sort values
          β: full fl for final top10 ids
        FacetComponent: distributedProcess()
          α: facet.field=cat w/facet.limit overrequest
          β: request missing counts for final top terms
      shard1
        QueryComponent: prepare() + process()
          α: q=ipod&fl=id&fsv=true ➔ top ids + sort values
          β1: ids=X,Y,Z&fl=name ➔ ...
        FacetComponent: prepare() + process()
          α: facet.limit=N + extra ➔ top terms w/counts
          β1: ..._terms=aa,qq,... ➔ ...
      shard2
  20. Legend:
      Purple: The HTTP Layer showing 3 hosts: an arbitrary 'coordinator' node, and 2 nodes each hosting a replica of one of the 2 shards for the collection. To save space, the (largely redundant) details of the requests to shard2 are not shown.
      Black: SearchHandler. To save space, the details (shown in previous diagrams) regarding how SearchHandler processes requests when acting as a coordinator have been omitted -- the key thing to note is that even with the added complexity of the FacetComponent, there are still only 2 stages of sub-requests to each shard (α & β)
      Red: SearchComponent methods: QueryComponent behaves exactly as before. Now that FacetComponent is in use, it can modify the sub-requests created by QueryComponent to "piggy back" on them and request additional information from each shard.
      Green & Blue: The 2 stages (α & β) of shard sub-requests needed to process this query. Although the details of the requests to shard2 are omitted for brevity, the α-requests are identical for both shards, and (as before) the β-requests are slightly different to request both the fl fields for the document matches specific to that shard, as well as the facet counts for any "candidate" terms that were not included in the α response from that shard.
  21. Shard Request α: facet.field=cat facet.limit=N+OVERREQUEST
        Shard 1: games: 40 ... lawn: 20 books: 10 DVD: 5 ... beach: 4 toys: 3
        Shard 2: auto: 250 lawn: 170 ... food: 100 DVD: 97 ... books: 90 clothing: 90
      Merge α: N auto: 250-253 (? + 250) lawn: 190 (20 + 170) ... games: 40-130 (40 + ?) food: 100-103 (? + 100) DVD: 102 (5 + 97) ...
      Shard Request β: facet.field={!_terms=...}cat
        Shard 1: auto: 3 food: 0
        Shard 2: games: 45
      Final (Merge α+β): auto: 253 (3 + 250) lawn: 190 (20 + 170) ... DVD: 102 (5 + 97)
  22. Here we see the additional information involved in α & β requests+responses+merging for our more complex queries compared to what we looked at before. The information requested & merged by QueryComponent is omitted for brevity, and we focus solely on how FacetComponent modifies those requests to "overrequest" the original facet.limit and what it does with the results.
      In the α request, over-request additional terms from each shard beyond what the user asked for.
      In the β request, ask each shard for the details about any terms that are "candidates" for the final results but were NOT already returned by this shard in the α response.
      Each term that is a candidate for the final response is shown in a unique color. Black/Grey is used to indicate terms where incomplete information is available to the coordinator, but enough is known to be confident that they can't possibly be candidates for the final results. Faded terms (in italics) show at what stage the coordinating FacetComponent knows that a particular term can be eliminated from consideration.
      (While the "..." ellipses are used to denote the possibility of many additional terms depending on the value of facet.limit=N (which defaults to 100), viewers may find the easiest way to understand how these results are merged & refined is to assume N=3 and imagine the ellipses do not exist in the diagram)
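
The α merge and β refinement on slides 21 and 22 can be worked through with the slide's numbers, taking N=3 and ignoring the ellipses as the slide suggests. This is a simplified sketch, not Solr's FacetComponent: the function names are hypothetical, and it assumes a term a shard did not return can count at most as much as the smallest count that shard did return.

```python
# α responses copied from the slide (N=3 plus over-requested terms, no ellipses).
alpha = [
    {"games": 40, "lawn": 20, "books": 10, "DVD": 5, "beach": 4, "toys": 3},          # shard 1
    {"auto": 250, "lawn": 170, "food": 100, "DVD": 97, "books": 90, "clothing": 90},  # shard 2
]


def merge_alpha(shard_counts):
    """Sum known counts and bound the unknown ones for every term seen."""
    floors = [min(counts.values()) for counts in shard_counts]
    merged = {}
    for term in set().union(*shard_counts):
        known = sum(counts.get(term, 0) for counts in shard_counts)
        missing = [i for i, counts in enumerate(shard_counts) if term not in counts]
        upper = known + sum(floors[i] for i in missing)
        merged[term] = {"known": known, "upper": upper, "missing": missing}
    return merged


def refinement_requests(merged, limit):
    """Pick the terms each shard must be asked about in the β request."""
    known = sorted((info["known"] for info in merged.values()), reverse=True)
    threshold = known[limit - 1] if len(known) >= limit else 0
    requests = {}
    for term, info in merged.items():
        # A term is a candidate if its best case could still reach the top `limit`.
        if info["missing"] and info["upper"] >= threshold:
            for shard in info["missing"]:
                requests.setdefault(shard, set()).add(term)
    return requests


def merge_final(merged, beta_responses, limit):
    totals = {term: info["known"] for term, info in merged.items()}
    for counts in beta_responses.values():
        for term, count in counts.items():
            totals[term] += count
    return sorted(totals.items(), key=lambda tc: (-tc[1], tc[0]))[:limit]


merged = merge_alpha(alpha)
requests = refinement_requests(merged, limit=3)
# β responses copied from the slide, keyed by shard index.
beta = {0: {"auto": 3, "food": 0}, 1: {"games": 45}}
final = merge_final(merged, beta, limit=3)
```

With these numbers the sketch asks shard 1 about auto and food and shard 2 about games, matching the β requests drawn on the slide, while beach, toys and clothing are eliminated because even their best case falls below the third-highest known count.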
  23. Q & A
  24. Me: https://twitter.com/_hossman
      My Company: https://www.lucidworks.com/
      These Slides: https://home.apache.org/~hossman/rev2017/
      Solr Docs & Mailing List: https://lucene.apache.org/solr/resources.html
