4. 7.X
1. Metrics
2. Autoscaling
3. CDCR
4. Time Routed Aliases
5. Replica types
6. Streaming expressions
7. JSON facet API
8. Configset / schema
9. Text Analysis / ML
10. Collections API
11. Queries
12. Large index segment merging
13. Replication / recovery / rolling upgrades
14. Block-join / nested docs
15. Miscellaneous
5. 7.X: Metrics
• Continuation of 6.X work to support Autoscaling efforts
• 7.0: - Aggregated metrics collected in overseer
- solrconfig.xml <jmx> ➞ solr.xml <metrics><reporter>
• 7.1: Prometheus metrics exporter contrib
• 7.4: /admin/metrics/history API: basic long-term key metric
time series aggregation
• Fixed-width windows at several resolutions
• Not yet in Admin UI: SOLR-12426
6. 7.X: Autoscaling
• 7.0: - Preferences and policy DSL: flexible replica placement
[ { minimize: cores }, { maximize: freedisk } ]
{ replica: "<2", shard: "#EACH", node: "#ANY" }
- Diagnostics API: return sorted nodes, policy violations
• 7.1: - autoAddReplicas ported to autoscaling framework
- Add/remove/suspend/resume triggers and listeners
- Triggers for added and lost nodes
- ComputePlanAction / ExecutePlanAction
- /autoscaling/history API: cluster events and actions
• 7.2: - Search rate trigger
- /autoscaling/suggestions API
- UTILIZENODE collections API command
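The preference and policy snippets on this slide can be wrapped into autoscaling API commands. The following is a minimal sketch that only builds and prints the JSON bodies. The command names `set-cluster-preferences` and `set-cluster-policy` come from the Solr autoscaling API; the endpoint path and the helper function names are assumptions for illustration and should be checked against the reference guide for your version.

```python
import json

# Assumed v1 endpoint path; verify against your Solr version's reference guide.
AUTOSCALING_PATH = "/solr/admin/autoscaling"


def cluster_preferences_command():
    """Wrap the slide's preferences: fewest cores first, then most free disk."""
    return {
        "set-cluster-preferences": [
            {"minimize": "cores"},
            {"maximize": "freedisk"},
        ]
    }


def cluster_policy_command():
    """Wrap the slide's policy rule: fewer than 2 replicas of each shard on any node."""
    return {
        "set-cluster-policy": [
            {"replica": "<2", "shard": "#EACH", "node": "#ANY"},
        ]
    }


if __name__ == "__main__":
    # Print the request bodies instead of sending them, so no cluster is needed.
    for command in (cluster_preferences_command(), cluster_policy_command()):
        print("POST", AUTOSCALING_PATH)
        print(json.dumps(command, indent=2))
```

Each body would be sent as a separate POST; the sketch stops short of the network call so it can be run anywhere.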
8. 7.X: Autoscaling
• 7.4: - Periodic house-keeping task: cleans up inactive shards
- Index size trigger: document count or size in bytes
• 7.5: - Policy replica attribute: #ALL, #EQUAL, percentage,
range, and floating point values
- Policy cores attribute: #EQUAL, percentage,
range, and floating point values
- Percentage in freedisk policy attribute
- Simulation framework: test scaling up to 1 billion docs
9. 7.X: Cross Data Center Replication
• 7.2: Support bi-directional syncing of CDCR clusters
This is not active-active, but rather passive-active or
active-passive: only one active cluster at a time.
10. 7.X: Time Routed Aliases
• 7.3: - Specialization of Solr’s collection alias feature
- Support time series data, e.g. logs / sensor data
- Maintain performance under continuous indexing
- CREATEALIAS: start, interval, retention policy
- Automatically create new collections
- Automatically delete old collections (optional)
- Route updates based on timestamp
- Search against all aliased collections*
• 7.5: Preemptively create the next collection when updates
are near the latest collection’s end date (optional)
* Pending optimization: minimize queried collections (SOLR-9562)
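To make "route updates based on timestamp" concrete, here is a small simulation of the idea, not Solr's implementation: given an alias start time and a fixed interval, it finds the time window that contains a document's timestamp. The function name and the `<alias>_<date>` collection naming are hypothetical and used only for illustration.

```python
from datetime import datetime, timedelta, timezone


def route_collection(alias, start, interval, timestamp):
    """Return the (hypothetical) name of the collection whose window holds timestamp."""
    if timestamp < start:
        raise ValueError("timestamp precedes the alias start")
    # Whole intervals elapsed since the start identify the target window.
    elapsed_windows = (timestamp - start) // interval
    window_start = start + elapsed_windows * interval
    return f"{alias}_{window_start:%Y-%m-%d}"


if __name__ == "__main__":
    start = datetime(2018, 10, 1, tzinfo=timezone.utc)
    doc_time = datetime(2018, 10, 3, 15, 30, tzinfo=timezone.utc)
    print(route_collection("logs", start, timedelta(days=1), doc_time))
```

With a one-day interval, every document lands in the collection for its calendar day; a longer interval simply widens each window.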
11. 7.X: Replica types
• 7.0: Replica types NRT, TLOG and PULL (capabilities in the table below)
• 7.4: Query param to prioritize replicas by type, e.g.
shards.preference=replica.type:PULL,replica.type:TLOG

Type        | Indexes locally | Supports soft commit & RTG | Pulls segments from leader | Writes to TLog | Can become shard leader | Queryable
NRT         | ✅ | ✅ | -  | ✅ | ✅ | ✅
TLOG leader | ✅ | ✅ | -  | ✅ | ✅ | ✅
TLOG        | -  | -  | ✅ | ✅ | ✅ | ✅
PULL        | -  | -  | ✅ | -  | -  | ✅
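As an illustration of the 7.4 `shards.preference` parameter, the sketch below assembles a select URL that prefers PULL replicas and falls back to TLOG replicas. The parameter name and value format are taken from the slide; the base URL, collection name, and helper function are placeholders.

```python
from urllib.parse import urlencode


def select_url(base_url, collection, query, replica_types):
    """Build a /select URL that prefers the given replica types, in order."""
    preference = ",".join(f"replica.type:{t}" for t in replica_types)
    params = {"q": query, "shards.preference": preference}
    # urlencode percent-encodes ':' and ',' so the value survives transport intact.
    return f"{base_url}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder host and collection name.
    print(select_url("http://localhost:8983/solr", "products", "*:*", ["PULL", "TLOG"]))
```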
12. 7.X: Streaming expressions
• Parallel computation function suite
• Some use cases: MapReduce, aggregations, parallel SQL,
pub/sub messaging, graph traversal, machine learning,
statistical programming
• Each 7.X release has added many new functions
• 7.5: Ref guide: Math Expressions User Guide
13. 7.X: JSON Facet API
• 7.0: Terms facets: added optional refinement support
• 7.4: Semantic Knowledge Graph support via new
relatedness() aggregate function
• Finds ad-hoc relationships by scoring documents
relative to foreground and background document
sets
• 7.5: Heatmap facet support
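A request using `relatedness()` might look like the sketch below: a terms facet with refinement enabled, sorted by a relatedness score computed against foreground and background queries passed as parameters. The field name and the two queries are invented for illustration; the request is only built and printed, not sent.

```python
import json


def relatedness_request(foreground, background, field, limit=5):
    """Build a JSON Facet request body that ranks terms by relatedness()."""
    return {
        "query": "*:*",
        "limit": 0,  # only facet results are of interest here
        "params": {"fore": foreground, "back": background},
        "facet": {
            "related_terms": {
                "type": "terms",
                "field": field,
                "limit": limit,
                "refine": True,  # optional refinement, added in 7.0
                "sort": {"r": "desc"},
                "facet": {"r": "relatedness($fore,$back)"},
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical field and queries.
    body = relatedness_request("age:[35 TO *]", "*:*", "hobby_s")
    print(json.dumps(body, indent=2))
```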
15. 7.X: Text analysis / machine learning
• 7.1: Bengali normalizer and stemmer
• 7.2: Enable off-ZooKeeper storage of large (>1MB) LTR models
• 7.3: OpenNLP integration: tokenization, POS tagging, phrase
chunking, lemmatization, NER, language detection
• 7.4: - ProtectedTermFilterFactory: don’t filter protected terms
- TaggerRequestHandler (a.k.a. SolrTextTagger): NER
• 7.5: - "nori" Korean morphological text analysis: "*_txt_ko"
- PhrasesIdentificationComponent: identify and score
candidate query phrases based on index statistics
- UIMA integration removed
16. 7.X: Collections API
• 7.3: Add collection level properties similar to cluster properties
• 7.4: Cluster-wide defaults for numShards, nrtReplicas,
tlogReplicas, pullReplicas
• 7.5: - Support co-locating replicas of two or more collections
together in a node via the withCollection parameter
to the CREATE and MODIFYCOLLECTION commands
- SPLITSHARD: New split method using hard links: splitMethod=link
• 3-5 times faster than the original splitMethod=rewrite
• Slows down replication
• Increases disk usage on replica nodes
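For reference, a SPLITSHARD call that opts into the hard-link method could be assembled as follows. This is a sketch: `splitMethod=link` is from the slide, while the host, collection, and shard names are placeholders.

```python
from urllib.parse import urlencode


def splitshard_url(base_url, collection, shard, split_method="link"):
    """Build a Collections API SPLITSHARD URL with an explicit split method."""
    if split_method not in ("link", "rewrite"):
        raise ValueError("split_method must be 'link' or 'rewrite'")
    params = {
        "action": "SPLITSHARD",
        "collection": collection,
        "shard": shard,
        "splitMethod": split_method,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder host, collection, and shard.
    print(splitshard_url("http://localhost:8983/solr", "products", "shard1"))
```

Given the trade-offs listed on the slide, `rewrite` remains a reasonable choice when replica disk space or replication speed matters more than split time.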
18. 7.X: Queries
• 7.2: New synonymQueryStyle field type option: enable
generation of appropriate queries for hierarchical
relations between overlapping terms
• as_same_term (default): SynonymQuery(bird,robin)
• pick_best: Dismax(bird,robin)
• as_distinct_terms: (bird OR robin)
• 7.4: JSON query DSL: Enable query/filter tagging,
e.g. { "#colorfilt" : "color:blue" }
equivalent to local-param {!tag=colorfilt}color:blue
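The tagged filter from the slide is typically paired with a facet that excludes that tag, which is what makes multi-select faceting work. The sketch below builds such a request body; the facet name and field are illustrative, and the request is printed rather than sent.

```python
import json


def tagged_filter_request():
    """Build a JSON request with a tagged filter and a facet that ignores it."""
    return {
        "query": "*:*",
        # "#colorfilt" tags the filter, like {!tag=colorfilt}color:blue.
        "filter": [{"#colorfilt": "color:blue"}],
        "facet": {
            "colors": {
                "type": "terms",
                "field": "color",
                # Count all colors as if the tagged filter were not applied.
                "domain": {"excludeTags": "colorfilt"},
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(tagged_filter_request(), indent=2))
```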
19. 7.X: Large index segment merging
• Problem: Overly large segments (e.g. as a result of
force-merge/optimize) stop being eligible for merging,
and can start accumulating >50% deleted
documents, wasting space and skewing index stats.
• 7.5: - TieredMergePolicy now respects maxSegmentSizeMB
by default when executing force-merge/optimize and
expunge-deletes
- TieredMergePolicy’s reclaimDeletesWeight has been
replaced with a new deletesPctAllowed setting to
control how aggressively deletes should be reclaimed
20. 7.X: Replication/recovery/rolling upgrades
• 7.3: The old Leader-Initiated-Recovery (LIR) implementation
is deprecated and replaced
• To perform a rolling upgrade to Solr 8, you must be on
Solr 7.3 or higher
• 7.4: - IndexFetcher now skips fetching identical files
- Buffering updates are written to a separate TLog
- Parallel replay of buffering TLogs
21. 7.X: Block-join / nested documents
• 7.3: Added filters and excludeTags local-params for
{!parent} and {!child} query parsers, usable for
multi-select faceting
• 7.5: WIP: Allow Solr to more faithfully represent deeply
nested document relationships, rather than requiring
reconstruction based on the flattened list of child docs
returned by Solr
22. 7.X: Miscellaneous
• 7.3: add-distinct atomic updates
• 7.4: - Ignore large document URP
- TLog: maxSize auto hard-commit setting
(in addition to maxDocs & maxTime)
• 7.5: Custom cluster properties allowed with ext. prefix
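The `add-distinct` atomic update adds values to a multi-valued field only when they are not already present. The sketch below shows an update payload plus a tiny local simulation of that behaviour; the document id, field name, and both helper functions are hypothetical, and the simulation is not Solr's code.

```python
import json


def add_distinct_update(doc_id, field, values):
    """Build an atomic-update payload using the add-distinct modifier."""
    return [{"id": doc_id, field: {"add-distinct": list(values)}}]


def apply_add_distinct(existing, incoming):
    """Simulate the effect: append only values that are not already present."""
    result = list(existing)
    for value in incoming:
        if value not in result:
            result.append(value)
    return result


if __name__ == "__main__":
    print(json.dumps(add_distinct_update("doc1", "tags", ["solr", "lucene"])))
    print(apply_add_distinct(["solr"], ["solr", "lucene"]))
```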
24. 8.0: Autoscaling
• Suggestions API: rebalance options even if no violations
• Suggestions API: add-replica for lost replicas
• maxOps limit for index size trigger
• Autoscaling policy framework will be the default replica
placement strategy
25. 8.0: Index upgrades
• 7.0: Lucene indexes record the major Lucene version that
created the index, and the minimum Lucene version
that contributed to segments.
• 8.0: Version N-2 or older indexes will now fail to open,
even if they have been merged into an N-1 index.
• IndexUpgrader will not upgrade 6.X or earlier indexes
• Re-indexing will be required to upgrade
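The 8.0 rule can be stated as a one-line check on the recorded creation version. The sketch below is a simplified model of that rule, not Lucene's actual code, and the function name is hypothetical.

```python
def can_open(index_created_major, lucene_major):
    """Model of the 8.0 rule: only indexes created by version N or N-1 open."""
    # The creation version is what counts, even if later merges were
    # performed by a newer release.
    return index_created_major >= lucene_major - 1


if __name__ == "__main__":
    for created in (6, 7, 8):
        print(f"created by {created}.x, opened by 8.x:", can_open(created, 8))
```

An index first created on 6.X and later merged under 7.X still counts as created by 6, which is why re-indexing is the only upgrade path in that case.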
26. 8.0: HTTP/2
• May 2018: Mark Miller announced his Star Burst effort:
many cleanups and performance enhancements
• July 2018: Cao Manh Dat took up the HTTP/2 aspects: SOLR-12639
• Indexing test: 33M docs, 1 shard, 2 replicas (SOLR-12642)
• Garbage: Leader: 26% less; replica: 76% less
• Indexing throughput: 54% higher
• CPU time: Leader: 39% higher; replica: 76% lower
• Ready to merge back to master, pending release of
Jetty 9.4.13, containing SPNEGO HTTP/2 implementation
27. 8.0: Miscellaneous
• Lucene: scores must be non-negative
• Function(Score)Query-s convert negative scores to zero
• TODO: remove deprecations
• Trie fields? Removal effectively blocked by:
• SOLR-12074: Add numeric equivalent to StrField
• SOLR-11127: Mechanism to migrate the .system collection
(a.k.a. blob store) schema from
Trie (pre-7.0) to Points (7.0+)
29. 8.X: Lucene/Solr minimum JDK
• Oracle will end free JDK 8 support in January 2019
• Both JDK 9 & 10 are already EOL, no more Oracle support
• JDK 11 will very likely be next minimum supported JDK, no
schedule yet
• Under JDK 9+, Solr’s Hadoop-related functionality has
problems, including with Kerberos
• Uwe Schindler’s Jenkins server tests Lucene/Solr on Oracle
9+10+11+12 JDKs
• All have higher Solr test failure rates than on JDK 8
30. 8.X: Luke: UI framework & licensing
• Andrzej Bialecki: Initial implementation: Thinlet, GPL
• Mark Harwood: GWT
• Mark Miller: Apache Pivot
• Dmitry Kan and Tomoko Uchida took ownership on GitHub
• Tomoko Uchida: JavaFX (bundled w/JDK 8)
• LUCENE-2562: Make Luke a Lucene/Solr Module
• JavaFX/OpenJFX unbundled from Java 11 JDK, GPL + Classpath Exception
• Tomoko Uchida: Swing (7.5 release available)
31. 8.X: New Lucene features
• Index impacts, Block-Max WAND, similarity cleanups
• Some queries (especially term queries and disjunctions)
are much faster when number of hits is not required
• FeatureField: incorporate static relevance signals, e.g.
PageRank
• Soft deletes
• Merge policy retains deleted docs according to policy
• Enables document history, e.g. for time-travel indexes
• RAMDirectory replaced by ByteBuffersDirectory