Native Code & Off-Heap Data Structures for Solr
Yonik Seeley
Lucene/Solr Revolution 2014, Washington, D.C.
Native Code & Off-Heap Data Structures for Solr: Presented by Yonik Seeley, Heliosearch

Presented at Lucene/Solr Revolution 2014

  1. Native Code & Off-Heap Data Structures for Solr
      Yonik Seeley
      Lucene/Solr Revolution 2014, Washington, D.C.
  2. My Background
      • Creator of Solr
      • Heliosearch Founder
      • LucidWorks Co-Founder
      • Lucene/Solr committer, PMC member
      • Apache Software Foundation member
      • M.S. in Computer Science, Stanford
  3. Heliosearch Project
      • The Next Evolution of Solr
      • Forked from Solr, developing at GitHub
        – Started Jan 2014
        – Well aligned community
        – Open Source, Apache licensed
      • Bring back to Apache in the future?
      • Currently drop-in replacement for Solr at the HTTP-API level
        – A super-set… we continually merge in upstream changes
        – Latest version of Heliosearch includes latest Solr
      • Current Features: off-heap filters, off-heap fieldcache, facet-by-function, sub-facets, native code performance enhancements
  4. Garbage Collection
  5. Garbage Collection Basics
      [Diagram: heap regions Eden Space, Survivor Space 1, Survivor Space 2, Tenured Space, Permanent Space, with a Thread as a GC root]
      • New objects allocated in Eden
      • Find live objects by tracing from GC “roots” (threads, stack locals, etc)
      • Make a copy of live objects, leaving “garbage” behind
      • Eden + Survivor Space copied together to other Survivor space
      • Tenured from Survivor when old enough
      • “Stop-the-world” needed when GC can’t keep up
      • Out of memory when too much time spent in GC
  6. Java Memory Waste
      - Need to size for worst case scenario
      - OS needs free memory to cache index files
      - JVMs aren’t good at “sharing” with rest of the system
      - mmap allocations managed by OS, can be immediately reused on free
      [Diagram: OS real memory divided among two JVMs (each with a max heap split into unused heap and heap in use), two C processes (each with unused C heap and C heap in use), and mmap allocations. “Free” memory includes the buffer cache, which is important for caching index files.]
  7. GC Impact
      • GC reduces throughput
        – Time to copy all that memory around could be spent better!
      • Stop-the-world pauses
        – Seconds to minutes long
        – Pause time proportional to heap size
        – Still exists in all Hotspot GCs… CMS, G1GC, etc
        – Breaks application SLAs (request timeouts, etc)
        – Can cause SolrCloud ZooKeeper session timeouts
        – Reducing max pause size normally means reduced throughput
      • Non-graceful degradation
        – If you don't size your heap big enough… BOOM!
  8. 8. GC Tuning UseSerialGC UseParallelGC UseParallelOldGC UseParallelOldGCCompacting UseParallelDensePrefixUpdate HeapMaximumCompactionInterval HeapFirstMaximumCompactionCount UseMaximumCompactionOnSystemGC ParallelOldDeadWoodLimiterMean ParallelOldDeadWoodLimiterStdDev UseParallelOldGCDensePrefix ParallelGCThreads ParallelCMSThreads YoungPLABSize OldPLABSize GCTaskTimeStampEntries AlwaysTenure NeverTenure ScavengeBeforeFullGC UseConcMarkSweepGC ExplicitGCInvokesConcurrent UseCMSBestFit UseCMSCollectionPassing UseParNewGC ParallelGCVerbose ParallelGCBufferWastePct ParallelGCRetainPLAB TargetPLABWastePct PLABWeight ResizePLAB PrintPLAB ParGCArrayScanChunk ParGCDesiredObjsFromOverflowList CMSParPromoteBlocksToClaim AlwaysPreTouch CMSUseOldDefaults CMSYoungGenPerWorker CMSIncrementalMode CMSIncrementalDutyCycle CMSIncrementalPacing CMSIncrementalDutyCycleMin CMSIncrementalSafetyFactor CMSIncrementalOffset CMSExpAvgFactor CMS_FLSWeight CMS_FLSPadding FLSCoalescePolicy CMS_SweepWeight CMS_SweepPadding CMS_SweepTimerThresholdMillis CMSClassUnloadingEnabled CMSCompactWhenClearAllSoftRefs UseCMSCompactAtFullCollection CMSFullGCsBeforeCompaction CMSIndexedFreeListReplenish CMSLoopWarn CMSMarkStackSize CMSMarkStackSizeMax CMSMaxAbortablePrecleanLoops CMSMaxAbortablePrecleanTime CMSAbortablePrecleanMinWorkPerIteration CMSAbortablePrecleanWaitMillis CMSRescanMultiple CMSConcMarkMultiple CMSRevisitStackSize CMSAbortSemantics CMSParallelRemarkEnabled CMSParallelSurvivorRemarkEnabled CMSPLABRecordAlways CMSConcurrentMTEnabled CMSPermGenPrecleaningEnabled CMSPermGenSweepingEnabled CMSPrecleaningEnabled CMSPrecleanIter CMSPrecleanNumerator CMSPrecleanDenominator CMSPrecleanRefLists1 CMSPrecleanRefLists2 CMSPrecleanSurvivors1 CMSPrecleanSurvivors2 CMSPrecleanThreshold CMSCleanOnEnter CMSRemarkVerifyVariant CMSScheduleRemarkEdenSizeThreshold CMSScheduleRemarkEdenPenetration CMSScheduleRemarkSamplingRatio CMSSamplingGrain CMSScavengeBeforeRemark CMSWorkQueueDrainThreshold 
CMSWaitDuration CMSYield CMSBitMapYieldQuantum UseGCLogFileRotation NumberOfGCLogFiles GCLogFileSize LargePageSizeInBytes LargePageHeapSizeThreshold PrintGCApplicationConcurrentTime PrintGCApplicationStoppedTime OnOutOfMemoryError ClassUnloading BlockOffsetArrayUseUnallocatedBlock RefDiscoveryPolicy ParallelRefProcEnabled CMSTriggerRatio CMSBootstrapOccupancy CMSInitiatingOccupancyFraction UseCMSInitiatingOccupancyOnly HandlePromotionFailure PreserveMarkStackSize ZeroTLAB PrintTLAB TLABStats AlwaysActAsServerClassMachine DefaultMaxRAM DefaultMaxRAMFraction DefaultInitialRAMFraction UseAutoGCSelectPolicy AutoGCSelectPauseMillis UseAdaptiveSizePolicy UsePSAdaptiveSurvivorSizePolicy UseAdaptiveGenerationSizePolicyAtMinorCollection UseAdaptiveGenerationSizePolicyAtMajorCollection UseAdaptiveSizePolicyWithSystemGC UseAdaptiveGCBoundary AdaptiveSizeThroughPutPolicy AdaptiveSizePausePolicy AdaptiveSizePolicyInitializingSteps AdaptiveSizePolicyOutputInterval UseAdaptiveSizePolicyFootprintGoal AdaptiveSizePolicyWeight AdaptiveTimeWeight PausePadding PromotedPadding SurvivorPadding AdaptivePermSizeWeight PermGenPadding ThresholdTolerance AdaptiveSizePolicyCollectionCostMargin YoungGenerationSizeIncrement YoungGenerationSizeSupplement YoungGenerationSizeSupplementDecay TenuredGenerationSizeIncrement TenuredGenerationSizeSupplement TenuredGenerationSizeSupplementDecay MaxGCPauseMillis MaxGCMinorPauseMillis GCTimeRatio AdaptiveSizeDecrementScaleFactor UseAdaptiveSizeDecayMajorGCCost AdaptiveSizeMajorGCDecayTimeScale MinSurvivorRatio InitialSurvivorRatio BaseFootPrintEstimate UseGCOverheadLimit GCTimeLimit GCHeapFreeLimit PrintAdaptiveSizePolicy DisableExplicitGC CollectGen0First BindGCTaskThreadsToCPUs UseGCTaskAffinity ProcessDistributionStride CMSCoordinatorYieldSleepCount CMSYieldSleepCount PrintGCTaskTimeStamps TraceClassLoadingPreorder TraceGen0Time TraceGen1Time PrintTenuringDistribution PrintHeapAtSIGBREAK TraceParallelOldGCTasks PrintParallelOldGCPhaseTimes MaxHeapSize 
MaxNewSize PretenureSizeThreshold MinTLABSize TLABAllocationWeight TLABWasteTargetPercent TLABRefillWasteFraction TLABWasteIncrement MaxLiveObjectEvacuationRatio OldSize MinHeapFreeRatio MaxHeapFreeRatio SoftRefLRUPolicyMSPerMB MinHeapDeltaBytes MinPermHeapExpansion MaxPermHeapExpansion QueuedAllocationWarningCount MaxTenuringThreshold InitialTenuringThreshold TargetSurvivorRatio MarkSweepDeadRatio PermMarkSweepDeadRatio MarkSweepAlwaysCompactCount PrintCMSStatistics PrintCMSInitiationStatistics PrintFLSStatistics PrintFLSCensus DeferThrSuspendLoopCount DeferPollingPageLoopCount SafepointSpinBeforeYield UseDepthFirstScavengeOrder GCDrainStackTargetSize ThreadSafetyMargin CodeCacheMinimumFreeSpace MaxDirectMemorySize PerfDataMemorySize AggressiveHeap UseCompressedStrings UseStringCache HeapDumpOnOutOfMemoryError HeapDumpPath PrintGC PrintGCDetails PrintGCTimeStamps PG1HeapRegionSize G1ReservePercent G1ConfidencePercent PrintPromotionFailure PrintGCDateStamps -XX:InitiatingHeapOccupancyPercent=n -XX:MaxGCPauseMillis=n -XX:ConcGCThreads=n -XX:MaxHeapFreeRatio=70 -XX:MaxTenuringThreshold=n -XX:+ScavengeBeforeFullGC
  9. GC Reduction
      • Reuse objects – cause less garbage
      • Move certain things off-heap (invisible to GC)
        – Option 1: Direct ByteBuffers
            – Limited to “int” (2GB)
            – No way to directly “free” – still relies on GC
        – Option 2: sun.misc.Unsafe
            – malloc() + free() + direct memory access
            – Supported on all major JVMs
            – Widely used: Java (nio, concurrent), JSR166, Google Guava, objenesis (which is used in Kryo, which is used in Twitter Storm), Apache DirectMemory, Lightning, Hazelcast, snappy, gson, …
            – Being considered for Java 9
  10. Off-Heap Filters
      Test setup:
      • 50M docs (3.8 GB index)
      • 8GB RAM
      • 20K requests
      • 8 request threads
      • 500 filters
      JVM options: -Xmx4G (Solr)
  11. Off-Heap Filters Test
      Observed max process sizes:
      • Solr: 3.8GB – 4.3GB
      • Heliosearch: 3.6GB – 3.7GB
  12. Off-Heap FieldCache
      Normal (on-heap) FieldCache
      • Typically the largest data structures kept on the heap
      • Used for sorting, function query values, single-valued faceting, grouping
      • Uses weak references
      Heliosearch nCache (n is for “native”)
      • Allocated off-heap
      • First-class managed Solr cache
        – Configure size, warming policies
        – View statistics
      • Per-segment (NRT friendly)
      • No weak references
  13. nCache admin stats
      item_id:{
        "field":"id",
        "uses":8,
        "class":"StrTopValues",
        "refcount":2,
        "numSegments":7,
        "carriedOver":6,
        "size":612}
      item_popularity:{
        "field":"popularity",
        "uses":5,
        "class":"IntTopValues",
        "refcount":2,
        "numSegments":7,
        "carriedOver":6,
        "size":106}
      item_price:{
        "field":"price",
        "uses":0,            -- the number of top-level uses for searcher
        "class":"FloatTopValues",
        "refcount":2,
        "numSegments":5,     -- number of segments populated
        "carriedOver":5,     -- number of segments carried over from last searcher
        "size":272           -- size in bytes for all populated segments
      }
  14. Off-Heap Integer Field
      • 50M document index
      • Sorting on 6 different integer fields (10, 100, 1000, 10000, 1M unique values)
      • 4 request threads
      Results
      • 42% faster sorting
      • 73% faster functions
  15. String Field Sorting
      • 10M document index
      • 10 different string fields, each field 80% populated
      • Median latency
  16. String Field Sorting Throughput
      • Concurrent throughput sorting on random fields in random order (asc/desc)
      • ~50% performance gain
  17. Native Code
  18. Native Code
      • The idea: create native accelerators for CPU hotspots
        – Faceting anyone?
      • But…. JNI Sucks! (and it’s GC’s fault again)
          jint *buf = (*env)->GetIntArrayElements(env, arr, 0);
          for (i=0; i<len; i++) {
            sum += buf[i];
          }
        – GetArrayElements() – makes a *copy* of the array!
        – GetPrimitiveArrayCritical() – blocks garbage collection!
        – Tons of other restrictions… it’s a “critical section”
        – Defeats the purpose of going to native code in the first place
      • But… our data is already off-heap, we’re good!
  19. Native Single Valued String Faceting
      • Top-Level off-heap String cache
        – Improves sorting and faceting speed
        – Eliminates FieldCache “insanity”
      • Native Code
        – Written in C++, compiled with GCC 4.7, 4.8
        – Currently supports 64 bit Windows, OS-X, Linux (x86)
        – Static compilation avoids JVM hotspot warmup period, mis-compilation bugs, and variations between runs
  20. Native Faceting Performance
  21. Terms Query Optimization
  22. New Facet Module
  23. Facet Module Goals
      • Replace the aging “SimpleFacets”
      • First class JSON support
        – Easier programmatic construction of complex nested facet commands
        – Canonical response format that is easier for clients to parse
      • First class analytics support
      • Cleaner distributed search support
      • Fully pluggable
      • Better base for integration of other search features
      Heliosearch is a Solr super-set, so you can still choose to use the old faceting or mix-n-match.
  24. API Comparison
      Old Style:
        &facet=true
        &facet.range={!key=age_ranges}age
        &f.age_ranges.facet.range.start=0
        &f.age_ranges.facet.range.end=100
        &f.age_ranges.facet.range.gap=10
        &facet.range={!key=price_ranges}price
        &f.price_ranges.facet.range.start=0
        &f.price_ranges.facet.range.end=1000
        &f.price_ranges.facet.range.gap=50
      New JSON API:
        {
          age_ranges: {        // facet name
            range: {           // facet type
              field : age,     // facet params
              start : 0,
              end : 100,
              gap : 10
            }
          },
          price_ranges: {
            range: {
              field : price,
              start : 0,
              end : 1000,
              gap : 50
            }
          }
        }
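The correspondence between the two styles on slide 24 can be sketched in Python. This is only an illustration, not Heliosearch code: the helper name `range_params_to_json_facet` is hypothetical, and it handles just the `{!key=...}` range parameters with integer values that appear on the slide.

```python
import re

def range_params_to_json_facet(params):
    """Convert old-style facet.range parameters, given as (name, value) pairs,
    into a dict shaped like the new JSON facet API. Hypothetical helper."""
    facets = {}
    # First pass: each facet.range={!key=NAME}FIELD declares one named range facet.
    for name, value in params:
        if name == "facet.range":
            m = re.fullmatch(r"\{!key=(\w+)\}(\w+)", value)
            if not m:
                raise ValueError("unsupported facet.range value: " + value)
            key, field = m.groups()
            facets[key] = {"range": {"field": field}}
    # Second pass: f.NAME.facet.range.(start|end|gap) fills in the facet params.
    for name, value in params:
        m = re.fullmatch(r"f\.(\w+)\.facet\.range\.(start|end|gap)", name)
        if m and m.group(1) in facets:
            facets[m.group(1)]["range"][m.group(2)] = int(value)
    return facets

old_style = [
    ("facet", "true"),
    ("facet.range", "{!key=age_ranges}age"),
    ("f.age_ranges.facet.range.start", "0"),
    ("f.age_ranges.facet.range.end", "100"),
    ("f.age_ranges.facet.range.gap", "10"),
    ("facet.range", "{!key=price_ranges}price"),
    ("f.price_ranges.facet.range.start", "0"),
    ("f.price_ranges.facet.range.end", "1000"),
    ("f.price_ranges.facet.range.gap", "50"),
]
new_style = range_params_to_json_facet(old_style)
```

The nested dict keeps each facet's parameters together under its name, which is the structural difference the slide is pointing at: the old style repeats the facet key in every parameter name.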
  25. Facet Functions
      • Sort/Report by things other than “count”
      Aggregation Functions / Stats:
        count
        sum(function)
        avg(function)
        sumsq(function)
        min(function)
        max(function)
        unique(string_field)
      (function: any “function query” that yields a numeric value! Example: sum(mul(num_units, unit_price)))
      • Stats are calculated “per bucket”
      • Buckets created by Query, Range, or Terms (field) facets
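What “stats are calculated per bucket” means can be shown with a small in-memory sketch. It is not how Heliosearch computes anything internally; the function `bucket_stats` and the sample documents are made up for illustration, and the value function mirrors the slide's `sum(mul(num_units, unit_price))` example.

```python
def bucket_stats(docs, bucket_field, value_fn, unique_field):
    """Group docs by bucket_field and compute count, sum, avg and unique per bucket."""
    acc = {}
    for doc in docs:
        b = acc.setdefault(doc[bucket_field], {"count": 0, "sum": 0.0, "seen": set()})
        b["count"] += 1
        b["sum"] += value_fn(doc)          # analogous to sum(function)
        b["seen"].add(doc[unique_field])   # analogous to unique(string_field)
    return {
        val: {
            "count": b["count"],
            "sum": b["sum"],
            "avg": b["sum"] / b["count"],
            "unique": len(b["seen"]),
        }
        for val, b in acc.items()
    }

# Invented sample data, not taken from the talk.
sales = [
    {"genre": "Fantasy", "num_units": 2, "unit_price": 10.0, "author": "A"},
    {"genre": "Fantasy", "num_units": 1, "unit_price": 30.0, "author": "B"},
    {"genre": "Mystery", "num_units": 3, "unit_price": 5.0, "author": "A"},
]
stats = bucket_stats(sales, "genre", lambda d: d["num_units"] * d["unit_price"], "author")
```

Each bucket carries its own independent set of statistics, which is what makes it possible to sort buckets by something other than document count.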
  26. Simple Request + Response
      $ curl http://localhost:8983/solr/query -d 'q=widgets&
      json.facet=
      {
        // Comments can help with clarity
        /* traditional C-style comments are also supported */
        x : "avg(price)" ,     // Simple strings can occur unquoted
        y : 'unique(brand)'    // Strings can also use single quotes
      }
      '
      […]
      "facets" : {
        "count" : 314,
        "x" : 102.5,
        "y" : 28
      }
      Callout on "count": number of documents in the facet bucket
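The same request can be assembled from a program rather than typed into curl. The sketch below only builds the form-encoded body and reads the response fragment shown on the slide; it does not contact a server, and the helper name `build_facet_request` is hypothetical.

```python
import json
from urllib.parse import urlencode, parse_qs

def build_facet_request(query, facets):
    """Return a form-encoded request body carrying q and json.facet."""
    return urlencode({"q": query, "json.facet": json.dumps(facets)})

body = build_facet_request("widgets", {"x": "avg(price)", "y": "unique(brand)"})

# Sending is left out so the sketch runs without a server. With one running,
# urllib.request.urlopen("http://localhost:8983/solr/query", body.encode())
# would POST the body, much like the curl -d call.

# Response fragment copied from the slide.
response = json.loads('{"facets": {"count": 314, "x": 102.5, "y": 28}}')
avg_price = response["facets"]["x"]
```

Using `json.dumps` produces strictly quoted JSON; the relaxed syntax on the slide (comments, unquoted or single-quoted strings) is a convenience for hand-written requests.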
  27. Terms Facet Example
      Request:
        json.facet={
          shoes:{
            terms:{
              field: shoe_style,
              sort: {x : desc},
              facet:{
                x : "avg(price)",
                y : "unique(brand)"
              }
            }
          }
        }
      Callout on the nested facet block: executed per-bucket
      Response:
        "facets": {
          "count" : 472,
          "shoes": {
            "buckets" : [
              {
                "val" : "Hiking",
                "count" : 34,
                "x" : 135.25,
                "y" : 17
              },
              {
                "val" : "Running",
                "count" : 45,
                "x" : 110.75,
                "y" : 24
              },
  28. Sub-Facets
      • Any facet that produces buckets can have sub-facets (terms/field, range, query)
      • Sub-facets can have facet functions (stats) or their own sub-facets (no limit to nesting).
      • A subfacet can be any type (field, range, query)
      • Multiple subfacets can be added to any given facet
      • Subfacets are first-class facets - can be configured independently like any other facet.
        – Different offsets, limits, stats, sorts, etc
  29. Sub-Facet Example
      Request:
        json.facet={
          shoes:{
            terms:{
              field: shoe_style,
              sort: {x : desc},
              facet:{
                x : "avg(price)",
                y : "unique(brand)",
                colors :{terms:color}
              }
            }
          }
        }
      Callout on colors:{terms:color}: short-form for terms facet simply specifies the field. Sorts buckets by count descending.
      Response:
        "facets": {
          "count" : 472,
          "shoes": {
            "buckets" : [
              {
                "val" : "Hiking",
                "count" : 34,
                "x" : 135.25,
                "y" : 17,
                "colors" : {
                  "buckets" : [
                    { "val" : "brown", "count" : 12 },
                    { "val" : "black", "count" : 10 },
                    […]
                  ]
                } // end of colors sub-facet
              }, // end of Hiking bucket
              {
                "val" : "Running",
                "count" : 45,
                "x" : 110.75,
                "y" : 24,
                "colors" : {
                  "buckets" : […]
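Because every bucket can hold further facets, a client usually ends up walking the response recursively. The generator below is one possible way to do that, assuming the response shape shown on slide 29; `walk_buckets` is a hypothetical name, and the elided parts of the slide's response are replaced by an empty list here.

```python
def walk_buckets(facet_result, path=()):
    """Yield (path, bucket) for every bucket in a nested facet result.

    path is a tuple of (facet_name, bucket_value) pairs from the root down.
    """
    for name, value in facet_result.items():
        if isinstance(value, dict) and "buckets" in value:
            for bucket in value["buckets"]:
                here = path + ((name, bucket["val"]),)
                yield here, bucket
                # Sub-facets sit next to "val" and "count" inside the bucket.
                yield from walk_buckets(bucket, here)

facets = {
    "count": 472,
    "shoes": {"buckets": [
        {"val": "Hiking", "count": 34, "x": 135.25, "y": 17,
         "colors": {"buckets": [{"val": "brown", "count": 12},
                                {"val": "black", "count": 10}]}},
        # The slide elides the Running colors; an empty list stands in for them.
        {"val": "Running", "count": 45, "x": 110.75, "y": 24,
         "colors": {"buckets": []}},
    ]},
}
rows = [(path, bucket["count"]) for path, bucket in walk_buckets(facets)]
```

Statistics such as `x` and `y` are plain numbers inside the bucket, so the walker skips them and only descends into values that themselves contain a `buckets` list.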
  30. Terms Facet
      Terms facet creates buckets of docs with the same value in a field.
      - field – The field name to facet over.
      - offset – Used for paging, this skips the first N buckets. Defaults to 0.
      - limit – Limits the number of buckets returned. Defaults to 10.
      - mincount – Only return buckets with a count of at least this number. Defaults to 1.
      - sort – Specifies how to sort the buckets produced. “count” specifies document count, “index” sorts by the index (natural) order of the bucket value. One can also sort by any facet function / statistic that occurs in the bucket. The default is “count desc”. This parameter may also be specified in JSON like sort:{count:desc}. The sort order may either be “asc” or “desc”.
      - missing – A boolean that specifies if a special “missing” bucket should be returned that is defined by documents without a value in the field. Defaults to false.
      - numBuckets – A boolean. If true, adds “numBuckets” to the response, an integer representing the number of buckets for the facet (as opposed to the number of buckets returned). Defaults to false.
      - allBuckets – A boolean. If true, adds an “allBuckets” bucket to the response, representing the union of all of the buckets. For multi-valued fields, this is different than a bucket for all of the documents in the domain since a single document can belong to multiple buckets. Defaults to false.
      - prefix – Only produce buckets for terms starting with the specified prefix.
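The interaction of these parameters is easier to see on a toy data set. The following is an in-memory approximation, assuming documents are plain dicts; it covers only field, offset, limit, mincount, missing, prefix and the “count” and “index” sorts, and the shape of the returned “missing” entry is an assumption rather than the documented response format.

```python
from collections import Counter

def terms_facet(docs, field, offset=0, limit=10, mincount=1,
                sort="count desc", missing=False, prefix=None):
    """Toy terms facet over a list of dicts (illustration only)."""
    counts = Counter()
    n_missing = 0
    for doc in docs:
        value = doc.get(field)
        if value is None:
            n_missing += 1
            continue
        values = value if isinstance(value, list) else [value]
        for v in set(values):  # a doc counts once per distinct value
            if prefix is None or v.startswith(prefix):
                counts[v] += 1
    key, direction = sort.split()
    items = sorted((v, c) for v, c in counts.items() if c >= mincount)
    if key == "count":
        # Stable sort keeps the value order among buckets with equal counts.
        items.sort(key=lambda vc: vc[1], reverse=(direction == "desc"))
    elif direction == "desc":  # "index desc"
        items.reverse()
    result = {"buckets": [{"val": v, "count": c}
                          for v, c in items[offset:offset + limit]]}
    if missing:
        result["missing"] = {"count": n_missing}  # assumed response shape
    return result

docs = [
    {"shoe_style": "Hiking"},
    {"shoe_style": "Running"},
    {"shoe_style": "Running"},
    {"shoe_style": "Dress"},
    {"color": "red"},  # no shoe_style, so it lands in "missing"
]
all_styles = terms_facet(docs, "shoe_style", missing=True)
second_page = terms_facet(docs, "shoe_style", offset=1, limit=1)
```

Paging is just a slice over the sorted bucket list, which is why offset and limit only make sense together with a deterministic sort.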
  31. Query Facet
      Query facet creates a single bucket of documents matching the query.
        { // simple example
          highpop:{
            query:{
              q:"inStock:true AND popularity:[8 TO 10]"
            }
          }
        }
        { // example with multiple sub-facets
          highpop:{
            query:{
              q : "inStock:true AND popularity:[8 TO 10]",
              facet : {
                average_price : "avg(price)",
                available_colors : { terms : color },
                price_ranges : { range : { field:price, start:0, end:200, gap:10 }}
              }
            }
          }
        }
  32. Range Facet
      Creates buckets over ranges on a numeric or date field.
      Parameter names/values "in sync" with Solr range parameters:
      - field – The numeric field or date field to produce range buckets from
      - start – Lower bound of the ranges
      - end – Upper bound of the ranges
      - gap – Size of each range bucket produced
      - hardend – A boolean, which if true means that the last bucket will end at “end” even if it is less than “gap” wide. If false, the last bucket will be “gap” wide, which may extend past “end”.
      - other – This param indicates that in addition to the counts for each range constraint between facet.range.start and facet.range.end, counts should also be computed for…
        – "before" all records with field values lower than the lower bound of the first range
        – "after" all records with field values greater than the upper bound of the last range
        – "between" all records with field values between the start and end bounds of all ranges
        – "none" compute none of this information
        – "all" shortcut for before, between, and after
      - include – By default, the ranges used to compute range faceting between facet.range.start and facet.range.end are inclusive of their lower bounds and exclusive of the upper bounds. The “before” range is exclusive and the “after” range is inclusive. This default, equivalent to "lower" below, will not result in double counting at the boundaries. This behavior can be modified by the facet.range.include param, which can be any combination of the following options…
        – "lower" all gap based ranges include their lower bound
        – "upper" all gap based ranges include their upper bound
        – "edge" the first and last gap ranges include their edge bounds (ie: lower for the first one, upper for the last one) even if the corresponding upper/lower option is not specified
        – "outer" the “before” and “after” ranges will be inclusive of their bounds, even if the first or last ranges already include those boundaries.
        – "all" shorthand for lower, upper, edge, outer
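The effect of hardend and of the default lower-inclusive bounds can be checked with a few lines of Python. This sketch works on plain integers and covers only start, end, gap, hardend and the default include behaviour; the function names are made up for the example.

```python
def range_buckets(start, end, gap, hardend=False):
    """Return the (lower, upper) bounds of each gap-based bucket."""
    bounds = []
    lower = start
    while lower < end:
        upper = lower + gap
        if hardend and upper > end:
            upper = end  # clip the last bucket at "end"
        bounds.append((lower, upper))
        lower = upper
    return bounds

def bucket_index(value, bounds):
    """Default include behaviour: lower bound inclusive, upper bound exclusive."""
    for i, (lower, upper) in enumerate(bounds):
        if lower <= value < upper:
            return i
    return None  # falls into "before" or "after"

soft = range_buckets(0, 100, 30)                # last bucket extends past 100
hard = range_buckets(0, 100, 30, hardend=True)  # last bucket clipped at 100
```

With a gap of 30 over 0 to 100 the last bucket is (90, 120) by default and (90, 100) with hardend set, and a value sitting exactly on a boundary such as 30 is counted only in the bucket that starts there, so nothing is double counted.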
  33. Sub-Facets + Facet-Functions = Business Intelligence / Analytics
  34. [Dashboard mock-up]
      Fantasy ($1045)
        Top Authors: $423 George R.R. Martin; $347 Brandon Sanderson; $155 JK Rowling
        Top Books: $252 A Game of Thrones; $113 Emperor of Thorns; $101 Nine Princes in Amber; $82 Steel Heart
      Sci-Fi ($898)
        Top Authors: $321 Iain M Banks; $218 Neal Asher; $155 Neal Stephenson
        Top Books: $113 Gridlinked; $101 Use of Weapons; $93 Snow Crash; $82 The Skinner
      Mystery ($645)
        Top Authors: $191 James Patterson; $145 Patricia Cornwell; $126 John Grisham
        Top Books: $85 One for the Money; $77 Angels & Daemons; $64 Shutter Island; $35 The Firm
      Filter By
        State: $852 NJ (14 stores); $658 NY (11 stores); $421 CT (8 stores)
        Chain: $984 Amazoon (14 stores); $734 Houses&Royalty (9 stores); $387 Books-r-us (7 stores)
        Store: $108 Amazoon Branchburg; $93 Books-r-us Bridgewater; $87 H&R NYC
      Number of Books
        Chain: 201K Houses&Royalty; 183K Amazoon; 98K Books-r-us
        Store: 193K H&R NYC; 77K Books-r-us Bridgewater; 68K Amazoon Branchburg
  35. [Dashboard implementation: date breakout]
        date_breakout : {
          range: {
            field: sale_date,
            start : ...,
            end : ...,
            gap : "+1MONTH",
            facet : {
              top_genre : {
                terms : {
                  field : genre,
                  sort : "revenue desc",
                  limit : 4,
                  facet : {
                    revenue : "sum(sales)"
                  }
              }},
              by_chain: {
                terms : {
                  field : chain,
                  facet : {
                    revenue : "sum(sales)"
                  }
              }}
              […]
      Implementation
      • Creates series of facet buckets based on date
      • For each date bucket, facet by genre, taking the top 4 by revenue
      • For each genre bucket, report revenue
  36. [Dashboard implementation: genre panels]
      Panels: Fantasy ($1045), Sci-Fi ($898) and Mystery ($645), each with Top Authors and Top Books by revenue (figures as on slide 34).
        top_genres:{
          terms:{
            field: genre,
            facet : {
              rev : "sum(sales)",
              top_authors:{
                terms:{
                  field : author,
                  sort : "rev desc",
                  limit : 3,
                  facet : {
                    rev : "sum(sales)"
                  }
              }},
              top_books:{
                terms:{
                  field : title,
                  sort : "rev desc",
                  limit : 4,
                  facet : {
                    rev : "sum(sales)"
                  }
              }}
              […]
  37. [Dashboard implementation: Filter By panels]
      Panels: Filter By State, Chain and Store, with revenue and store counts (figures as on slide 34).
        state_breakout:{
          terms:{
            field: state,
            sort: "rev desc",
            facet : {
              rev : "sum(sales)",
              num_stores : "unique(store)"
        }}},
        chain_breakout:{
          terms:{
            field: chain,
            sort: "rev desc",
            facet : {
              rev : "sum(sales)",
              num_stores : "unique(store)"
        }}},
        store_breakout:{
          terms:{
            field: store,
            sort: "rev desc",
            facet : {
              rev : "sum(sales)"
        }}}
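Since the facet request is plain JSON, repetitive structures like the three breakouts above can be generated rather than written by hand. The sketch below assumes nothing beyond the request shape shown on the slide; the `terms` helper is a hypothetical convenience function, not part of any Solr or Heliosearch client.

```python
import json

def terms(field, sort=None, limit=None, **sub_facets):
    """Build a terms facet spec; keyword arguments become sub-facets or stats."""
    spec = {"field": field}
    if sort is not None:
        spec["sort"] = sort
    if limit is not None:
        spec["limit"] = limit
    if sub_facets:
        spec["facet"] = sub_facets
    return {"terms": spec}

# State and chain breakouts share the same stats, so build them in a loop.
dashboard = {
    name: terms(field, sort="rev desc", rev="sum(sales)", num_stores="unique(store)")
    for name, field in [("state_breakout", "state"), ("chain_breakout", "chain")]
}
dashboard["store_breakout"] = terms("store", sort="rev desc", rev="sum(sales)")

json_facet = json.dumps(dashboard)  # value for the json.facet request parameter
```

Nesting works the same way: a sub-facet is just another `terms(...)` result passed as a keyword argument, for example `terms("genre", rev="sum(sales)", top_authors=terms("author", sort="rev desc", limit=3, rev="sum(sales)"))`.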
  38. Misc Features
  39. Parameter Substitution
      • Parameters / macros substituted across whole request
      • Happens before any parsing, so usable in any context
          q=price:[ ${low} TO ${high} ]
          &low=100
          &high=200
      • Default values
          q=price:[ ${low:0} TO ${high:100} ]
      • Nested
          q=${price_query}
          &price_query=${price_field}:[ ${low} TO ${high} ] AND inStock:true
          &price_field=specialPrice
          &low=50
          &high=100
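The substitution rules above (plain `${name}`, `${name:default}`, and nesting) can be mimicked with a short Python function. This is a sketch of the idea only, not the Heliosearch implementation: the name `expand`, the depth limit, and the choice to leave unknown macros without a default untouched are all assumptions.

```python
import re

_MACRO = re.compile(r"\$\{(\w+)(?::([^}]*))?\}")

def expand(text, params, max_depth=10):
    """Expand ${name} and ${name:default} macros, repeating to resolve nesting."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in params:
            return params[name]
        if default is not None:
            return default
        return match.group(0)  # assumption: unknown macros are left as-is

    for _ in range(max_depth):
        expanded = _MACRO.sub(replace, text)
        if expanded == text:
            return text  # nothing left to substitute
        text = expanded
    raise ValueError("macro nesting too deep (possible cycle)")

simple = expand("price:[ ${low} TO ${high} ]", {"low": "100", "high": "200"})
defaults = expand("price:[ ${low:0} TO ${high:100} ]", {})
nested = expand("${price_query}", {
    "price_query": "${price_field}:[ ${low} TO ${high} ] AND inStock:true",
    "price_field": "specialPrice",
    "low": "50",
    "high": "100",
})
```

Because expansion is purely textual and runs before the query is parsed, a macro can stand for a field name, a range bound, or a whole clause, as the nested example shows.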
  40. New Query Parser Features
      • Filters in queries - just like “fq” parameters, but may appear anywhere in a query
          q=(text:elephant -(filter(*:* -price:[ 0 TO 100 ]) OR filter(date:[0 TO 2013])))
      • Constant Score Queries
          q=color:(blue OR green)^=1 text:shoes
      • Comments in Queries (can nest)
          q=+text:elephant /* the main query */
            /* boosting part – WIP {!func}mul(pop,rank)^10 */
  41. Thank You
      Help Develop the Next Generation of Solr!
      Resources:
      • http://heliosearch.org
      • https://github.com/Heliosearch/heliosearch
      • https://groups.google.com/forum/#!forum/heliosearch
      • https://groups.google.com/forum/#!forum/heliosearch-dev