What’s New in Solr
Solr 4.7 & 4.8
June 12, 2014
Search | Discover | Analyze
Speaker: Steve Rowe
• Software Engineer at LucidWorks
• Lucene/Solr committer and PMC member
• Previously worked on search and NLP at the
Center for Natural Language Processing at
Syracuse University’s iSchool
• Twitter: @steven_a_rowe
Agenda
• A short history of Solr 4
• Solr 4.7 and 4.8: new features
• Solr 4.9 and beyond
A short history of Solr 4
• Solr 4.0 released October 2012
A short history of Solr 4
• SolrCloud
– Distributed indexing and searching, plus NRT and NoSQL
features, e.g. real-time get, optimistic concurrency, and
durable updates
– Sharding, replication, ZooKeeper ensemble
– High availability with no single point of failure
• Real-time Get: access the latest version of a document;
no commit or new searcher required
• Atomic updates: add/update/increment individual fields;
relies on stored fields
• NRT: “soft” commits
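Atomic updates go to the ordinary update handler, with each changed field wrapped in a modifier object. A minimal sketch, assuming Solr 4's JSON update syntax with the `set`, `add`, and `inc` modifiers and a uniqueKey named `id`; the helper `atomic_update` and the field names are hypothetical.

```python
import json

# Modifiers assumed for Solr 4 JSON atomic updates.
MODIFIERS = {"set", "add", "inc"}


def atomic_update(doc_id, **changes):
    """Build a JSON body that atomically updates fields of one document.

    Each keyword maps a field name to a (modifier, value) pair,
    e.g. price=("set", 12). The uniqueKey field is assumed to be "id".
    """
    doc = {"id": doc_id}
    for field, (modifier, value) in changes.items():
        if modifier not in MODIFIERS:
            raise ValueError(f"unsupported modifier: {modifier}")
        doc[field] = {modifier: value}
    # The JSON update handler accepts an array of documents.
    return json.dumps([doc])


body = atomic_update("book1", price=("set", 12), views=("inc", 1), tags=("add", "solr"))
print(body)
```

POSTing `body` to `/update` with `Content-Type: application/json` would apply the change. Because Solr rebuilds the whole document from its stored fields, any field that is not stored would be lost.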
A short history of Solr 4
• Solr Reference Guide now released with each
feature release:
– Live (targeting next Solr release):
http://s.apache.org/SolrReferenceGuide
– Most recent released PDF:
http://s.apache.org/Solr-Ref-Guide-PDF
– Previous release PDFs:
http://s.apache.org/Older-Solr-Ref-Guide-PDFs
A short history of Solr 4
• Flexible indexing
– Solr core = Lucene index
• Lucene index = 1 or more segments
– Codec: per-segment suite of formats
• Flexible scoring
– A similarity implementation can be specified per fieldType
in schema.xml when using SchemaSimilarityFactory
– Built-in Similarities (other than the default TF-IDF):
• Okapi BM25
• Divergence from Randomness
• Information-Based
• Language Models (with two smoothing implementations)
• SweetSpot
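To make the difference from TF-IDF concrete, here is the textbook Okapi BM25 contribution of a single query term, sketched with Lucene's default parameters k1 = 1.2 and b = 0.75. This is an illustration, not Lucene's BM25Similarity: the real implementation also encodes document lengths lossily and applies boosts.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, doc_freq, doc_count, k1=1.2, b=0.75):
    """Textbook Okapi BM25 contribution of one query term to one document."""
    # Probabilistic IDF; the "1 +" keeps it positive even for very common terms.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: b controls how strongly long documents are penalized.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Saturating TF: the score approaches idf * (k1 + 1) as tf grows.
    return idf * tf * (k1 + 1) / (tf + norm)


for tf in (1, 2, 5, 50):
    score = bm25_term_score(tf, doc_len=100, avg_doc_len=100, doc_freq=10, doc_count=1000)
    print(tf, round(score, 3))
```

Raising tf from 5 to 50 increases the score only modestly; this term-frequency saturation, together with explicit length normalization, is what sets BM25 apart from the default TF-IDF similarity.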
A short history of Solr 4
• DocValues: typed column-stride fields
– Document-to-value mapping built at index time
– Reduced memory usage compared to the FieldCache
– Good for faceting and sorting
– Missing values now supported as of Solr 4.5
• Pseudo-fields
– Field aliasing, e.g. &fl=result:indexed
– Function queries, aliasable too, e.g. &fl=price:sum(a,b)
– Document transformers
• Standard: [explain], [value], [shard], [docid]
• Pseudo-joins, e.g. ?q={!join+from=manu+to=id}ipod
• Pivot faceting: automatic drill-down (not yet supported in distributed search)
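Pseudo-fields, function queries, transformers and joins are all plain request parameters, so a request is built by URL-encoding them. A sketch; the host, the `collection1` name, and the fields `a` and `b` are hypothetical placeholders.

```python
from urllib.parse import urlencode


def select_url(base, collection, **params):
    """Build a /select URL; urlencode percent-encodes braces, colons, etc."""
    return f"{base}/{collection}/select?{urlencode(params)}"


url = select_url(
    "http://localhost:8983/solr",  # hypothetical host
    "collection1",                 # hypothetical collection
    q="{!join from=manu to=id}ipod",
    # alias a field, alias a function query, and add a doc transformer
    fl="id,result:indexed,price:sum(a,b),[explain]",
    wt="json",
)
print(url)
```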
A short history of Solr 4
• Schema API
– GET /collection/schema/fields/fieldname
– PUT /collection/schema/fields/name with a JSON body:
{ "type":"text_general", "stored":true, "indexed":true }
• Schemaless mode
– a.k.a. data-driven schema or field guessing
– Value class(es) guessed from the field values, then mapped
to a fieldType; the first matching fieldType is used when
the new field is added to the schema
– Supported value classes: Boolean, Integer, Long, Float,
Double, and Date
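The Schema API calls above are plain HTTP. A sketch that prepares the PUT with Python's standard library; the host, collection and field name are hypothetical, and the write assumes a mutable managed schema, which is, to our understanding, required for Schema API modifications.

```python
import json
import urllib.request


def add_field_request(base, collection, name, field_props):
    """Prepare (but do not send) a Schema API request that adds a field.

    Mirrors PUT /collection/schema/fields/<name> with a JSON body.
    """
    url = f"{base}/{collection}/schema/fields/{name}"
    body = json.dumps(field_props).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = add_field_request(
    "http://localhost:8983/solr", "collection1", "summary",
    {"type": "text_general", "stored": True, "indexed": True},
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it to a running Solr.
```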
A short history of Solr 4
• Document routing
– CompositeId router, e.g. id=tenant!docid
• Used by default when numShards is specified at
collection creation
• Restrict queries to shard(s): &_route_=tenant!
– Implicit router
• Online shard splitting
– Lets collections scale, instead of requiring a decision
up front on how much to overshard
– Split a shard in two, split with custom hash ranges, or use
the split.key param to split one route key out to a dedicated shard
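The CompositeId router hashes each part of the id separately and splices the bits together, which is why documents sharing a `tenant!` prefix occupy one slice of the hash range and `_route_=tenant!` can target just the shard(s) covering it. A sketch, assuming the router uses 32-bit MurmurHash3 (x86 variant, seed 0) over UTF-8 bytes, taking the top 16 bits from the shard key and the bottom 16 from the rest of the id. Only two-part ids are modeled, and Solr works with signed ints whereas this returns unsigned values; the function names are hypothetical.

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """32-bit MurmurHash3 (x86 variant), returned as an unsigned int."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF

    def rotl(x, r):
        return ((x << r) | (x >> (32 - r))) & mask

    h = seed & mask
    nblocks = len(data) // 4
    for i in range(nblocks):  # body: 4-byte little-endian blocks
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = rotl((k * c1) & mask, 15) * c2 & mask
        h = rotl(h ^ k, 13)
        h = (h * 5 + 0xE6546B64) & mask
    tail = data[4 * nblocks:]
    if tail:  # remaining 1-3 bytes
        k = int.from_bytes(tail, "little")
        h ^= rotl((k * c1) & mask, 15) * c2 & mask
    # finalization mix
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def composite_id_hash(doc_id: str) -> int:
    """Hash a two-part compositeId such as "tenant!docid"."""
    shard_key, sep, rest = doc_id.partition("!")
    if not sep:  # no "!": the whole id is hashed
        return murmur3_x86_32(doc_id.encode("utf-8"))
    upper = murmur3_x86_32(shard_key.encode("utf-8")) & 0xFFFF0000
    lower = murmur3_x86_32(rest.encode("utf-8")) & 0x0000FFFF
    return upper | lower


for doc_id in ("acme!1", "acme!2", "globex!1"):
    print(doc_id, hex(composite_id_hash(doc_id)))
```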
A short history of Solr 4
• Nested documents, a.k.a. Block Join
– Adding a parent document with a nested child document:
<add>
  <doc>
    <field name="id">1</field>
    <field name="title">Solr adds block join support</field>
    <field name="content_type">parentDocument</field>
    <doc>
      <field name="id">2</field>
      <field name="comments">SolrCloud supports it too!</field>
    </doc>
  </doc>
</add>
– Queries:
• Child query parser, e.g.
q={!child of="content_type:parentDocument"}title:Solr
• Parent query parser, e.g.
q={!parent which="content_type:parentDocument"}comments:SolrCloud
A short history of Solr 4
• solr.xml legacy & discovery modes
– Legacy mode (cores listed in solr.xml) is
deprecated; support will be removed in Solr 5.
– Discovery mode (new as of Solr 4.3):
• No cores are listed in solr.xml
• Cores are discovered by a recursive walk of the Solr
home directory; each core directory is marked by a core.properties file
• Nested core directories are not allowed
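Discovery mode can be modeled as a directory walk that stops descending once it finds a core.properties file. A simplified sketch with the hypothetical function `discover_cores`; Solr's real core locator also reads each core's properties, and how it reports a nested core.properties is not modeled here (nested ones are simply skipped).

```python
import os


def discover_cores(solr_home):
    """Return core directories (relative to solr_home) that hold core.properties.

    Once a core directory is found, the walk does not descend into it,
    reflecting the rule that core directories may not be nested.
    """
    cores = []
    for dirpath, dirnames, filenames in os.walk(solr_home):
        if "core.properties" in filenames:
            cores.append(os.path.relpath(dirpath, solr_home))
            dirnames[:] = []  # prune: never look for cores inside a core
    return sorted(cores)
```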
A short history of Solr 4
• New web admin UI with SolrCloud support
Solr 4.7 and 4.8: new features
• As of Solr 4.8, Java 7 is the minimum supported
JVM version. Recommended: Oracle 1.7.0_60
• <fields> and <types> tags are no longer necessary in
schema.xml
• Collections API improvements
– Working toward “ZooKeeper = Truth” mode
• legacyCloud=false cluster property
– New actions:
• CLUSTERSTATUS, LIST, ADDROLE, DELETEROLE,
ADDREPLICA, DELETEREPLICA, OVERSEERSTATUS,
MIGRATE, CLUSTERPROP
– Core properties can be specified with CREATE and
SPLITSHARD actions
Solr 4.7 and 4.8: new features
• Asynchronous execution of long-running
actions
– SolrCloud Collections API:
• CREATE, SPLITSHARD, MIGRATE
– CoreAdminHandler:
• CREATE, RENAME, UNLOAD, SWAP, MERGEINDEXES,
SPLIT
– The caller supplies a tracking request ID via the async param
– Track status via the new REQUESTSTATUS action,
using the tracking request ID
• Possible states: running, complete, failed, notfound
– Clear stored statuses with special request ID -1
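The submit-then-poll pattern can be sketched as follows. Assumptions: the host and collection are hypothetical, the terminal state names are taken from the list above, and `fetch_state` is a hypothetical caller-supplied function that sends the request and extracts the state string from Solr's response (the exact response shape may vary by version).

```python
import time
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr"  # hypothetical Solr base URL
TERMINAL_STATES = {"complete", "failed", "notfound"}  # state names as listed above


def collections_url(params):
    # A dict is used because "async" is a reserved word in Python.
    return f"{BASE}/admin/collections?{urlencode(params)}"


# Submit a long-running action with a caller-chosen tracking ID.
submit_url = collections_url({"action": "SPLITSHARD", "collection": "collection1",
                              "shard": "shard1", "async": "1000"})


def wait_for_async(fetch_state, request_id, poll_seconds=1.0, max_polls=60):
    """Poll REQUESTSTATUS until the tracked request reaches a terminal state."""
    url = collections_url({"action": "REQUESTSTATUS", "requestid": request_id})
    for _ in range(max_polls):
        state = fetch_state(url)
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"request {request_id} did not finish in time")
```

Per the slide, the same REQUESTSTATUS action with `requestid=-1` clears the stored statuses.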
Solr 4.7 and 4.8: new features
• Cursors: Efficient Deep Paging
– The request must include a sort that includes the
uniqueKey field, so a uniqueKey must be defined
– First page: ?q=…&sort=id+asc&rows=N&cursorMark=*
• Response contains "nextCursorMark":"<base64encoded>"
– Following pages:
?q=…&sort=id+asc&rows=N&cursorMark=<from response>
– Repeat until the returned nextCursorMark equals the cursorMark
sent in the request; then there are no more results
– No server-side state
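The cursor loop above can be sketched as a generator. Assumptions: the uniqueKey is `id`, and `fetch_page` is a hypothetical caller-supplied function that sends the params to `/select` and returns the parsed JSON response.

```python
def iterate_all(fetch_page, query="*:*", rows=100):
    """Yield every matching document using cursorMark deep paging."""
    cursor = "*"  # "*" requests the first page
    while True:
        params = {
            "q": query,
            "sort": "id asc",  # must include the uniqueKey (assumed to be "id")
            "rows": rows,
            "cursorMark": cursor,
        }
        response = fetch_page(params)
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # unchanged mark: no more results
            return
        cursor = next_cursor
```

Because the cursor mark encodes the sort values of the last document returned, the client carries all the state and any replica can serve the next page.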
Solr 4.7 and 4.8: new features
Solr 4.7 and 4.8: new features
• Document expiration and Time To Live (TTL)
– Auto-delete expired documents
• DocExpirationUpdateProcessorFactory can periodically
wake up and delete expired documents
– Compute expiration date from TTL
• Update request _ttl_ param, or
• Document _ttl_ field
• Both names are configurable, defaulting to _ttl_.
• _ttl_ values are interpreted as Date Math Expressions
relative to NOW, e.g. “+1YEAR”.
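How a `_ttl_` value becomes an expiration timestamp can be sketched with a small subset of Solr's date math: only `+`/`-` offsets in SECOND, MINUTE, HOUR, DAY, MONTH and YEAR units (plural forms accepted), with no rounding such as `/DAY`. The function `expiration` is hypothetical, and the month-end clamping is assumed to match Solr's behavior; Solr's own date math parser supports more.

```python
import calendar
import re
from datetime import datetime, timedelta, timezone

_EXPR = re.compile(r"(?:[+-]\d+[A-Z]+)+")
_TERM = re.compile(r"([+-])(\d+)([A-Z]+)")
_UNITS = {"SECOND", "MINUTE", "HOUR", "DAY", "MONTH", "YEAR"}


def _add_months(dt, months):
    total = dt.month - 1 + months
    year, month = dt.year + total // 12, total % 12 + 1
    # Clamp the day, e.g. Jan 31 + 1 MONTH -> Feb 28.
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def expiration(now, ttl):
    """Apply a simplified date-math TTL such as "+1YEAR" or "+1DAY-6HOURS"."""
    if not _EXPR.fullmatch(ttl):
        raise ValueError(f"unsupported date math: {ttl!r}")
    result = now
    for sign, amount, unit in _TERM.findall(ttl):
        n = int(amount) if sign == "+" else -int(amount)
        if unit not in _UNITS and unit.endswith("S"):
            unit = unit[:-1]  # accept plurals: DAYS, HOURS, ...
        if unit not in _UNITS:
            raise ValueError(f"unsupported unit: {unit}")
        if unit in ("YEAR", "MONTH"):
            result = _add_months(result, n * (12 if unit == "YEAR" else 1))
        else:
            result += timedelta(**{unit.lower() + "s": n})
    return result


now = datetime(2014, 6, 12, 12, 0, tzinfo=timezone.utc)
print(expiration(now, "+1YEAR"))
```

The expiration processor would store a timestamp like this on the document; the periodic sweep then deletes documents whose expiration has passed.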
Solr 4.7 and 4.8: new features
• Dynamic synonyms and stopwords
– “Managed” resources: configuration and content for
synonyms and stopwords, persistence managed by Solr
– Specified as ManagedSynonymFilterFactory and
ManagedStopFilterFactory on analyzers in schema.xml
– CRUD operations are enabled via a REST endpoint per
managed resource.
– The “managed” attribute names the REST endpoint, e.g.
<filter class="solr.ManagedStopFilterFactory"
managed="french" />
– E.g. to delete stopword “le” from the “french” managed
stoplist:
curl -X DELETE "…/solr/colln/schema/analysis/stopwords/french/le"
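The same REST endpoint can be driven from code. A sketch that prepares the DELETE from the curl command plus a PUT that adds words; the host is hypothetical, "colln" is kept from the slide, and the JSON-array PUT for adding words is our reading of the managed-resource API rather than something stated on the slide.

```python
import json
import urllib.request
from urllib.parse import quote

BASE = "http://localhost:8983/solr/colln"  # hypothetical host; "colln" as on the slide


def stopwords_url(list_name, word=None):
    url = f"{BASE}/schema/analysis/stopwords/{quote(list_name, safe='')}"
    return f"{url}/{quote(word, safe='')}" if word is not None else url


def delete_stopword(list_name, word):
    """Request equivalent to the curl DELETE above."""
    return urllib.request.Request(stopwords_url(list_name, word), method="DELETE")


def add_stopwords(list_name, words):
    """PUT a JSON array of words to add them to the managed list (assumed API)."""
    return urllib.request.Request(
        stopwords_url(list_name),
        data=json.dumps(words).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

`urllib.request.urlopen(...)` would send either request. To our understanding, analyzers only pick up the change after the core or collection is reloaded.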
Solr 4.7 and 4.8: new features
• SSL support in SolrCloud
– URL scheme stored in ZooKeeper
– SSL certificates can be specified via system properties,
which enables authentication
• Nested documents may be specified in JSON format
• Tri-level compositeId routing
– E.g. “tenant!group!docid”, 8/8/16 hash bits per component
• Build Solr indexes with Hadoop’s MapReduce
– Mark Miller’s blog: http://bit.ly/1oh0fWq
• GitHub solr-map-reduce-example: http://bit.ly/1pnDAao
• Named config sets in non-SolrCloud mode
– Default base directory is SOLR_HOME/configsets/
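The nested-document example shown earlier in XML can also be sent as JSON. A sketch, assuming Solr 4.8's convention of listing children under a `_childDocuments_` key; the helper `nested_json` is hypothetical.

```python
import json


def nested_json(parent, children):
    """Return a JSON update body with children attached under _childDocuments_."""
    doc = dict(parent)  # copy so the caller's dict is not modified
    doc["_childDocuments_"] = list(children)
    return json.dumps([doc])


body = nested_json(
    {"id": "1", "title": "Solr adds block join support", "content_type": "parentDocument"},
    [{"id": "2", "comments": "SolrCloud supports it too!"}],
)
print(body)
```

The parent and child query parsers shown earlier work the same way regardless of whether the block was indexed from XML or JSON.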
Solr 4.7 and 4.8: new features
• Suggester v2
– Added BlendedInfixSuggester
– Added FreeTextSuggester
– Queries can use multiple suggesters
• New query parsing features
– SimpleQParserPlugin: parser for human-entered
queries with selectable operators
– ComplexPhraseQParserPlugin: wildcards, ORs, etc.
inside phrase queries
• E.g. {!complexphrase inOrder=true}name:"Jo* Smith"
Solr 4.7 and 4.8: new features
• CollapsingQParserPlugin
– High-performance alternative implementation of grouping/field
collapsing, for fields with many distinct groups
• ExpandComponent
– Expands collapsed groups
– Can also expand nested documents
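Collapsing is applied as a filter query and expansion is switched on with request parameters. A sketch, assuming the `{!collapse field=...}` local-params syntax and the ExpandComponent's `expand` and `expand.rows` parameters, with the ExpandComponent available to the request handler; the field `manu_s` is hypothetical.

```python
from urllib.parse import urlencode


def collapsed_query(q, group_field, expand_rows=None):
    """Query params that collapse results on group_field, optionally expanding groups."""
    params = [("q", q), ("fq", "{!collapse field=" + group_field + "}")]
    if expand_rows is not None:
        # ExpandComponent params: return up to expand_rows docs per collapsed group
        params += [("expand", "true"), ("expand.rows", str(expand_rows))]
    return urlencode(params)


print(collapsed_query("ipod", "manu_s", expand_rows=3))
```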
Solr 4.9 and beyond
• ZooKeeper = Truth / legacyCloud=false
• MODIFYCOLLECTION collections API
– Modify maxShardsPerNode, replicationFactor for the
entire collection
• Incremental Field Updates on numeric
DocValues
– Binary DocValues IFUs also coming
• Multi-valued DocValues sort fields
• Legacy numeric/date field types deprecated in favor of
Trie field types; to be removed in Solr 5
Solr 4.9 and beyond
• In Solr 5, the .war will no longer be shipped
• Index integrity: checksums
– Integrity check on merge is off by default; controlled by
the solrconfig.xml option <indexConfig><checkIntegrityAtMerge>
• New update request param min_rf will let clients set the
minimum number of replicas that must successfully receive
the update
• Return Block Join child documents when parents
match, via a new DocTransformer
[child parentFilter="field:value"]
Solr 4.9 and beyond
• AnalyticsQuery: pluggable, pipelineable analytics,
ordered via the “cost” parameter, like PostFilters
• ReRankingQParserPlugin
– Re-ranks the top N results
LucidWorks Open Source Platform
• Effortless AWS deployment and monitoring:
http://www.github.com/lucidworks/solr-scale-tk
• Logstash for Solr:
https://github.com/LucidWorks/solrlogmanager
• Banana (Kibana for Solr):
https://github.com/LucidWorks/banana
• Data Quality Toolkit: https://github.com/LucidWorks/data-quality
• Coming soon for big data: two-way Hadoop, Pig, and Hive
support with Lucene and Solr, different file formats, pipelines,
Logstash
Links
Solr website: http://lucene.apache.org/solr
Solr Reference Guide:
• Live (targeting next Solr release):
http://s.apache.org/SolrReferenceGuide
• Most recent released PDF: http://s.apache.org/Solr-Ref-Guide-PDF
• Previous release PDFs: http://s.apache.org/Older-Solr-Ref-Guide-PDFs
Lucene/Solr Revolution: http://www.LuceneRevolution.org
Q & A
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 

What's New in Solr, June 2014

  • 1. 1 What’s New in Solr Solr 4.7 & 4.8 June 12, 2014 Search | Discover | Analyze
  • 2. Speaker • Software Engineer at LucidWorks • Lucene/Solr committer and PMC member • Previously worked on search and NLP at the Center for Natural Language Processing at Syracuse University’s iSchool • Twitter: @steven_a_rowe Steve Rowe 2
  • 3. Agenda • A short history of Solr 4 • Solr 4.7 and 4.8: new features • Solr 4.9 and beyond 3
  • 4. A short history of Solr 4 • Solr 4.0 released October 2012 4
  • 5. A short history of Solr 4 • SolrCloud – Distributed indexing and searching, NRT and NoSQL features, e.g. realtime-get, optimistic concurrency and durable updates – Sharding, replication, ZooKeeper ensemble – High availability with no single point of failure • Real-time Get: Access the latest document version without a commit or opening a new searcher • Atomic updates: incremental field add/update/increment via stored fields • NRT: “soft” commits 5
  • 6. A short history of Solr 4 • Solr Reference Guide now released with each feature release: – Live (targeting next Solr release): http://s.apache.org/SolrReferenceGuide – Most recent released PDF: http://s.apache.org/Solr-Ref-Guide-PDF – Previous release PDFs: http://s.apache.org/Older-Solr-Ref-Guide-PDFs 6
  • 7. A short history of Solr 4 • Flexible indexing – Solr core = Lucene index • Lucene index = 1 or more segments – Codec: per-segment suite of formats • Flexible scoring – You can specify similarity implementation per fieldType in your schema.xml if you use SchemaSimilarityFactory – Built-in Similarities (other than the default TF-IDF): • Okapi BM25 • Divergence from Randomness • Information-Based • Language Models (with two smoothing implementations) • SweetSpot 7
  • 8. A short history of Solr 4 • DocValues: typed column-stride fields – Document-to-value mapping built at index time – Reduced memory usage compared to the field cache – Good for faceting and sorting – Missing values now supported as of Solr 4.5 • Pseudo-fields – Field aliasing, e.g. &fl=result:indexed – Function queries, aliasable too, e.g. &fl=price:sum(a,b) – Document transformers • Standard: [explain], [value], [shard], [docid] • Pseudo-joins, e.g. ?q={!join+from=manu+to=id}ipod • Pivot faceting: automatic drill-down (no distributed support) 8
  • 9. A short history of Solr 4 • Schema API • GET /collection/schema/fields/fieldname • PUT /collection/schema/fields/name • JSON body: { "type":"text_general", "stored":true, "indexed":true } • Schemaless mode • a.k.a. data-driven schema or field guessing • Value class(es) guessed from the field's values, then mapped to a fieldType; the new field is added to the schema with that type • Supported value classes: Boolean, Integer, Long, Float, Double, and Date 9
  • 10. A short history of Solr 4 • Document routing – CompositeId router, e.g. id=tenant!docid • Used by default when numShards is specified while creating a collection • Restrict queries to shard(s): &_route_=tenant! – Implicit router • Online shard splitting – Lets collections scale instead of forcing an up-front decision on how much to overshard – Split in two, split by custom hash ranges, or use the split.key param to split a route key out to a dedicated shard 10
  • 11. A short history of Solr 4 • Nested documents, a.k.a. Block Join – Nested doc to be added:
    <add>
      <doc>
        <field name="id">1</field>
        <field name="title">Solr adds block join support</field>
        <field name="content_type">parentDocument</field>
        <doc>
          <field name="id">2</field>
          <field name="comments">SolrCloud supports it too!</field>
        </doc>
      </doc>
    </add>
    – Queries: • Child query parser, e.g. q={!child of="content_type:parentDocument"}title:Solr • Parent query parser, e.g. q={!parent which="content_type:parentDocument"}comments:SolrCloud 11
  • 12. A short history of Solr 4 • solr.xml legacy & discovery modes – Legacy mode (cores listed in solr.xml) is deprecated; support will be removed in Solr 5. – Discovery mode (new as of Solr 4.3): • No cores are listed in solr.xml • Cores are discovered by a recursive walk of the solr home directory, marked by core.properties files • Nested core directories are not allowed 12
  • 13. A short history of Solr 4 • New web admin UI with SolrCloud support 13
  • 14. Solr 4.7 and 4.8: new features • As of Solr 4.8, Java 7 is the minimum supported JVM version. Recommended: Oracle 1.7.0_60 • <fields> and <types> tags are no longer necessary in schema.xml • Collections API improvements – Working toward “ZooKeeper = Truth” mode • legacyCloud=false cluster property – New actions: • CLUSTERSTATUS, LIST, ADDROLE, DELETEROLE, ADDREPLICA, DELETEREPLICA, OVERSEERSTATUS, MIGRATE, CLUSTERPROP – Core properties can be specified with CREATE and SPLITSHARD actions 14
  • 15. Solr 4.7 and 4.8: new features • Asynchronous execution of long-running actions – SolrCloud Collections API: • CREATE, SPLITSHARD, MIGRATE – CoreAdminHandler: • CREATE, RENAME, UNLOAD, SWAP, MERGEINDEXES, SPLIT – A tracking request ID is supplied via the async param – Track status via the new REQUESTSTATUS action, using the tracking request ID • Possible states: running, completed, failed, notfound – Clear stored statuses with the special request ID -1 15
  • 16. Solr 4.7 and 4.8: new features • Cursors: Efficient Deep Paging – The request must include a sort that includes the uniqueKey field (so a uniqueKey must be defined in the schema) – First page: ?q=…&sort=id+asc&rows=N&cursorMark=* • Response contains "nextCursorMark":"<base64encoded>" – Following pages: ?q=…&sort=id+asc&rows=N&cursorMark=<from response> – Repeat; when nextCursorMark equals the cursorMark sent in the request, there are no more results – No server-side state 16
  • 17. Solr 4.7 and 4.8: new features 17
  • 18. Solr 4.7 and 4.8: new features • Document expiration and Time To Live (TTL) – Auto-delete expired documents • DocExpirationUpdateProcessorFactory can periodically wake up and delete expired documents – Compute expiration date from TTL • Update request _ttl_ param, or • Document _ttl_ field • Both names are configurable, defaulting to _ttl_. • _ttl_ values are interpreted as Date Math Expressions relative to NOW, e.g. “+1YEAR”. 18
  • 19. Solr 4.7 and 4.8: new features • Dynamic synonyms and stopwords – “Managed” resources: configuration and content for synonyms and stopwords, persistence managed by Solr – Specified as ManagedSynonymFilterFactory and ManagedStopFilterFactory on analyzers in schema.xml – CRUD operations are enabled via a REST endpoint per managed resource. – The “managed” attribute names the REST endpoint, e.g. <filter class="solr.ManagedStopFilterFactory" managed="french" /> – E.g. to delete stopword “le” from the “french” managed stoplist: curl -X DELETE "…/solr/colln/schema/analysis/stopwords/french/le" 19
  • 20. Solr 4.7 and 4.8: new features • SSL support in SolrCloud – URL scheme stored in ZooKeeper – SSL certificates can be specified via system properties, to enable authentication • Nested documents may be specified in JSON format • Tri-level compositeId routing – E.g. “tenant!group!docid”, 8/8/16 hash bits per component • Build Solr indexes with Hadoop’s MapReduce – Mark Miller’s blog: http://bit.ly/1oh0fWq • GitHub solr-map-reduce-example: http://bit.ly/1pnDAao • Named config sets in non-SolrCloud mode – Default base directory is SOLR_HOME/configsets/ 20
  • 21. Solr 4.7 and 4.8: new features • Suggester v2 – Added BlendedInfixSuggester – Added FreeTextSuggester – Queries can use multiple suggesters • New query parsing features – SimpleQParserPlugin: parser for human-entered queries with selectable operators – ComplexPhraseQParserPlugin: wildcards, ORs, etc. inside phrase queries • E.g. {!complexphrase inOrder=true}name:"Jo* Smith" 21
  • 22. Solr 4.7 and 4.8: new features • CollapsingQParserPlugin – High-performance alternative grouping/field collapsing implementation for fields with many distinct groups (high cardinality) • ExpandComponent – Expands collapsed groups – Can also expand nested documents 22
  • 23. Solr 4.9 and beyond • ZooKeeper = Truth / legacyCloud=false • MODIFYCOLLECTION collections API – Modify maxShardsPerNode, replicationFactor for the entire collection • Incremental Field Updates on numeric DocValues – Binary DocValues IFUs also coming • Multi-valued DocValues sort fields • Legacy numeric/date field types deprecated, removed in Solr 5 in favor of Trie field types 23
  • 24. Solr 4.9 and beyond • In Solr 5, the .war will no longer be shipped • Index integrity: checksums • Integrity check on merge off by default • solrconfig.xml option <indexConfig><checkIntegrityAtMerge> • New update request param min_rf will let clients state the minimum replication factor they want for the request; the achieved replication factor is reported in the response • Return Block Join child documents when parents match, via a new DocTransformer [child parentFilter="field:value"] 24
  • 25. Solr 4.9 and beyond • AnalyticsQuery: support for pluggable, pipelineable analytics, ordered via the “cost” parameter, like PostFilters • ReRankQParserPlugin • Re-ranks the top n results 25
  • 26. Platform LucidWorks Open Source 26 • Effortless AWS deployment and monitoring: http://www.github.com/lucidworks/solr-scale-tk • Logstash for Solr: https://github.com/LucidWorks/solrlogmanager • Banana (Kibana for Solr): https://github.com/LucidWorks/banana • Data Quality Toolkit: https://github.com/LucidWorks/data-quality • Coming Soon for Big Data: Hadoop, Pig, Hive 2-way support w/ Lucene and Solr, different file formats, pipelines, Logstash
  • 27. Links Solr website: http://lucene.apache.org/solr Solr Reference Guide: • Live (targeting next Solr release): http://s.apache.org/SolrReferenceGuide • Most recent released PDF: http://s.apache.org/Solr-Ref-Guide-PDF • Previous release PDFs: http://s.apache.org/Older-Solr-Ref-Guide-PDFs Lucene/Solr Revolution: http://www.LuceneRevolution.org Q & A 27
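Slide 5 mentions atomic updates. The sketch below builds a JSON update body using the `set`, `add` and `inc` modifiers that, as far as we know, Solr 4.x's JSON update format accepts. The document ID and field names are hypothetical, the request is only built (not sent), and, as the slide says, atomic updates rely on the fields being stored.

```python
import json


def atomic_update(doc_id, set_fields=None, add_fields=None, inc_fields=None):
    """Build a JSON atomic-update body for a single document.

    set -> replace a field's value, add -> append to a multi-valued field,
    inc -> increment a numeric field.
    """
    doc = {"id": doc_id}
    for modifier, fields in (("set", set_fields), ("add", add_fields), ("inc", inc_fields)):
        for name, value in (fields or {}).items():
            doc[name] = {modifier: value}
    # The JSON update handler accepts a list of documents.
    return json.dumps([doc])


body = atomic_update(
    "book1",
    set_fields={"title": "Solr in Action"},
    add_fields={"tags": "search"},
    inc_fields={"copies_i": 1},
)
print(body)
```

Only the named fields change; Solr rebuilds the rest of the document from its stored values.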
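Slide 7 lists Okapi BM25 among the built-in similarities (enabled per fieldType with `<similarity class="solr.BM25SimilarityFactory"/>` when the global similarity is `solr.SchemaSimilarityFactory`). As a rough guide to what it computes, here is the textbook BM25 term score, using the IDF form and the defaults (k1 = 1.2, b = 0.75) that, to the best of our knowledge, Lucene's BM25Similarity uses. Lucene also stores document length lossily in norms and multiplies in query boosts, so real scores will differ slightly.

```python
import math


def bm25_idf(doc_count, doc_freq):
    # The "1 +" inside the log keeps IDF positive even for very common terms.
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(freq, doc_len, avg_doc_len, doc_count, doc_freq, k1=1.2, b=0.75):
    # Length normalization: with b > 0, longer-than-average documents are penalized.
    norm = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    # Saturating term-frequency component, bounded above by (k1 + 1).
    tf = freq * (k1 + 1) / (freq + norm)
    return bm25_idf(doc_count, doc_freq) * tf


# A term occurring once in an average-length document scores exactly its IDF,
# because the TF component reduces to (k1 + 1) / (1 + k1) = 1.
print(bm25_term_score(1, 100, 100, 1000, 10), bm25_idf(1000, 10))
```

The saturation is the main practical difference from the default TF-IDF: repeating a term many times yields diminishing returns.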
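Slide 9 shows the Schema API call for adding a field. A minimal sketch that builds that PUT request with Python's standard library; the host, collection and field name are hypothetical, and the request is not sent. As we understand it, Solr 4.x only accepts schema modifications when the core uses a managed (mutable) schema.

```python
import json
import urllib.request


def add_field_request(base_url, collection, field_name, field_props):
    """Build (but do not send) a Schema API request that adds one field."""
    url = f"{base_url}/{collection}/schema/fields/{field_name}"
    return urllib.request.Request(
        url,
        data=json.dumps(field_props).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical host, collection and field name.
req = add_field_request(
    "http://localhost:8983/solr", "collection1", "summary",
    {"type": "text_general", "stored": True, "indexed": True},
)
print(req.get_method(), req.full_url)
# Against a running Solr with a managed schema: urllib.request.urlopen(req)
```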
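Slides 10 and 20 describe compositeId routing. The sketch below shows how such IDs can map to a 32-bit hash, assuming (as we understand Solr's CompositeIdRouter) MurmurHash3 x86_32 with seed 0 over the UTF-8 bytes of each part, with the top 16 bits taken from the tenant and the bottom 16 from the document ID, or 8/8/16 bits for three-part IDs. The function names are hypothetical, and the `tenant/bits!docid` syntax for custom bit counts is not handled. Because all of a tenant's documents share the high bits, their hashes fall into one contiguous range, which is why they tend to land on the same shard and why `_route_=tenant!` can narrow a query.

```python
def murmurhash3_x86_32(data: bytes, seed: int = 0) -> int:
    """32-bit MurmurHash3 (x86 variant), returned as an unsigned int."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF
    h1 = seed & mask
    rounded_end = len(data) & ~0x3

    def mix_k1(k1: int) -> int:
        k1 = (k1 * c1) & mask
        k1 = ((k1 << 15) | (k1 >> 17)) & mask
        return (k1 * c2) & mask

    # Body: 4-byte little-endian blocks.
    for i in range(0, rounded_end, 4):
        h1 ^= mix_k1(int.from_bytes(data[i:i + 4], "little"))
        h1 = ((h1 << 13) | (h1 >> 19)) & mask
        h1 = (h1 * 5 + 0xE6546B64) & mask

    # Tail: remaining 1-3 bytes, if any.
    if len(data) & 0x3:
        h1 ^= mix_k1(int.from_bytes(data[rounded_end:], "little"))

    # Finalization (fmix32).
    h1 ^= len(data)
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & mask
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & mask
    h1 ^= h1 >> 16
    return h1


def composite_hash(doc_id: str) -> int:
    """Combine per-part hashes of "tenant!docid" (16/16) or "tenant!group!docid" (8/8/16)."""
    parts = doc_id.split("!")
    if len(parts) == 1:
        return murmurhash3_x86_32(doc_id.encode("utf-8"))
    if len(parts) == 2:
        bits = [16, 16]
    elif len(parts) == 3:
        bits = [8, 8, 16]
    else:
        raise ValueError("at most three route-key levels are supported")

    result, shift = 0, 32
    for part, nbits in zip(parts, bits):
        shift -= nbits
        part_mask = ((1 << nbits) - 1) << shift
        result |= murmurhash3_x86_32(part.encode("utf-8")) & part_mask
    return result


a, b = composite_hash("acme!doc1"), composite_hash("acme!doc2")
print(hex(a), hex(b))  # same top 16 bits: both documents share the tenant's range
```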
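Slide 11 shows the nested-document syntax and the two block join query parsers. The simplified simulation below (plain Python lists and hypothetical helper names, not Lucene's actual block join query code) illustrates the index layout those parsers rely on: each block is written with the children first and the parent last in contiguous positions, so a child can find its parent by scanning forward. This is also why, as we understand it, the `which`/`of` filter should match all parent documents and no children.

```python
def index_block(parent, children):
    """Lay out one block the way block join expects: children, then parent."""
    return list(children) + [parent]


def is_parent(doc):
    # Plays the role of the parent filter, e.g. content_type:parentDocument.
    return doc.get("content_type") == "parentDocument"


def parent_query(docs, child_matches):
    """Like {!parent which=...}: parents of blocks containing a matching child."""
    parents = []
    for pos, doc in enumerate(docs):
        if is_parent(doc) or not child_matches(doc):
            continue
        nxt = pos + 1
        while not is_parent(docs[nxt]):  # scan forward to the block's parent
            nxt += 1
        if docs[nxt]["id"] not in parents:
            parents.append(docs[nxt]["id"])
    return parents


def child_query(docs, parent_matches):
    """Like {!child of=...}: children of matching parents."""
    children, block = [], []
    for doc in docs:
        if is_parent(doc):
            if parent_matches(doc):
                children.extend(d["id"] for d in block)
            block = []
        else:
            block.append(doc)
    return children


docs = index_block(
    {"id": "1", "title": "Solr adds block join support", "content_type": "parentDocument"},
    [{"id": "2", "comments": "SolrCloud supports it too!"}],
) + index_block(
    {"id": "3", "title": "Unrelated post", "content_type": "parentDocument"},
    [{"id": "4", "comments": "Nice write-up"}],
)

print(parent_query(docs, lambda d: "SolrCloud" in d.get("comments", "")))  # ['1']
print(child_query(docs, lambda d: "Solr" in d.get("title", "")))  # ['2']
```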
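Slide 12 describes discovery mode. The sketch below approximates that walk with `os.walk`: any directory containing a `core.properties` file is treated as a core, and the walk does not descend below it. Solr's own handling of a nested core may differ (for example, by reporting an error); the function name and directory layout are hypothetical.

```python
import os
import tempfile


def discover_cores(solr_home):
    """Return core directories (relative to solr_home) marked by core.properties."""
    cores = []
    for dirpath, dirnames, filenames in os.walk(solr_home):
        if "core.properties" in filenames:
            cores.append(os.path.relpath(dirpath, solr_home))
            dirnames[:] = []  # do not look for cores nested inside a core
        else:
            dirnames.sort()  # deterministic traversal order
    return cores


def touch(path):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()


with tempfile.TemporaryDirectory() as home:
    touch(os.path.join(home, "collection1", "core.properties"))
    touch(os.path.join(home, "tenants", "acme", "core.properties"))
    # Skipped by this sketch because it sits inside another core directory.
    touch(os.path.join(home, "collection1", "nested", "core.properties"))
    found = discover_cores(home)

print(found)
```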
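Slide 15 describes asynchronous Collections API calls. The sketch below builds the SPLITSHARD and REQUESTSTATUS URLs and polls until a terminal state, assuming the states are reported as the strings `completed`, `failed` and `notfound`. `fetch_state` is a hypothetical caller-supplied function (here a stub) that would issue the REQUESTSTATUS call and extract the state from the response; the host, collection and request ID are made up.

```python
import time
from urllib.parse import urlencode

TERMINAL_STATES = {"completed", "failed", "notfound"}


def collections_api_url(base_url, params):
    # A dict is used because "async" is a reserved word in Python 3.7+.
    return f"{base_url}/admin/collections?{urlencode(params)}"


def wait_for_async(fetch_state, request_id, poll_seconds=1.0, timeout=300.0):
    """Poll until the async request reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state(request_id)
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"request {request_id} still {state!r}")
        time.sleep(poll_seconds)


base = "http://localhost:8983/solr"
split_url = collections_api_url(base, {
    "action": "SPLITSHARD", "collection": "collection1", "shard": "shard1", "async": "1000",
})
status_url = collections_api_url(base, {"action": "REQUESTSTATUS", "requestid": "1000"})

# Stub standing in for HTTP calls to status_url.
fake_states = iter(["running", "running", "completed"])
final_state = wait_for_async(lambda request_id: next(fake_states), "1000", poll_seconds=0)
print(split_url)
print(final_state)
```

Polling with a timeout keeps a client from hanging forever if the request ID was never registered or the overseer is stuck.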
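Slide 16 describes cursor-based deep paging. The sketch below pairs a client loop with an in-memory stand-in for Solr (the hypothetical `fake_search`, sorting on `id asc`) to show why no server-side state is needed: the mark itself encodes where the previous page ended. Real cursorMark values are opaque and clients should never decode them; the base64-of-last-id encoding here is only an illustrative assumption.

```python
import base64


def fake_search(sorted_ids, cursor_mark, rows):
    """In-memory stand-in for Solr with sort=id asc (illustration only)."""
    if cursor_mark == "*":
        page = sorted_ids[:rows]
    else:
        # Here the mark encodes the last id returned; real marks are opaque.
        last_id = base64.urlsafe_b64decode(cursor_mark.encode()).decode()
        page = [doc_id for doc_id in sorted_ids if doc_id > last_id][:rows]
    # With no new results, the mark is echoed back unchanged.
    next_mark = (base64.urlsafe_b64encode(page[-1].encode()).decode()
                 if page else cursor_mark)
    return {"docs": page, "nextCursorMark": next_mark}


def fetch_all(search, rows):
    """Client loop: start at "*", follow nextCursorMark until it stops changing."""
    cursor_mark, results = "*", []
    while True:
        response = search(cursor_mark, rows)
        results.extend(response["docs"])
        if response["nextCursorMark"] == cursor_mark:
            return results
        cursor_mark = response["nextCursorMark"]


ids = sorted(f"doc{n:03d}" for n in range(10))
all_docs = fetch_all(lambda mark, rows: fake_search(ids, mark, rows), rows=3)
print(len(all_docs), all_docs[0], all_docs[-1])  # 10 doc000 doc009
```

Including the uniqueKey in the sort matters because it makes "where the last page ended" unambiguous even when other sort values tie.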
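Slide 18 describes computing an expiration date from a TTL. Solr's DateMathParser supports far more (YEAR and MONTH units, rounding, subtraction); the sketch below handles only a hypothetical subset, `+N` followed by SECOND, MINUTE, HOUR or DAY (optionally plural), and stores the result in a made-up `expire_at` field. The cleanup function mirrors the periodic deletion: documents whose expiration is at or before NOW are removed.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical subset of Solr date math: "+<N><UNIT>", UNIT optionally plural.
_TTL_PATTERN = re.compile(r"^\+(\d+)(SECOND|MINUTE|HOUR|DAY)S?$")
_UNIT_TO_KWARG = {"SECOND": "seconds", "MINUTE": "minutes", "HOUR": "hours", "DAY": "days"}


def expiration_from_ttl(ttl, now):
    """Return now + ttl for the supported subset, else raise ValueError."""
    match = _TTL_PATTERN.match(ttl)
    if match is None:
        raise ValueError(f"unsupported TTL expression: {ttl!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return now + timedelta(**{_UNIT_TO_KWARG[unit]: amount})


def expired_ids(docs, now):
    """IDs of documents whose expiration time is at or before now."""
    return [doc["id"] for doc in docs if doc["expire_at"] <= now]


now = datetime(2014, 6, 12, 12, 0, tzinfo=timezone.utc)
docs = [
    {"id": "a", "expire_at": expiration_from_ttl("+30MINUTES", now)},
    {"id": "b", "expire_at": expiration_from_ttl("+2DAYS", now)},
]
print(expired_ids(docs, now + timedelta(hours=1)))  # ['a']
```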
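Slide 19 shows deleting a word from the "french" managed stop list with curl. The same REST calls can be built with Python's standard library, as sketched below; the host is a hypothetical placeholder, the collection name follows the slide, and nothing is sent. Adding words as a PUT of a JSON array reflects our understanding of the managed-resource API, and as far as we know the changes only affect analysis after the core or collection is reloaded.

```python
import json
import urllib.request
from urllib.parse import quote


def stopwords_url(base_url, collection, list_name, word=None):
    url = f"{base_url}/{collection}/schema/analysis/stopwords/{quote(list_name)}"
    return url if word is None else f"{url}/{quote(word)}"


def add_stopwords_request(base_url, collection, list_name, words):
    """PUT a JSON array of words to add them to a managed stop list."""
    return urllib.request.Request(
        stopwords_url(base_url, collection, list_name),
        data=json.dumps(words).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def delete_stopword_request(base_url, collection, list_name, word):
    """DELETE one word, equivalent to the curl -X DELETE call on the slide."""
    return urllib.request.Request(
        stopwords_url(base_url, collection, list_name, word), method="DELETE"
    )


base = "http://localhost:8983/solr"  # hypothetical host
add_req = add_stopwords_request(base, "colln", "french", ["le", "la", "les"])
delete_req = delete_stopword_request(base, "colln", "french", "le")
print(delete_req.get_method(), delete_req.full_url)
# Send with urllib.request.urlopen(...) against a running Solr, then reload the core.
```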
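Slide 20 notes that nested documents may be specified in JSON. Below is the slide 11 example in JSON form, assuming Solr 4.8's special `_childDocuments_` key for the child list; the body would be sent as a JSON update request.

```python
import json

parent = {
    "id": "1",
    "title": "Solr adds block join support",
    "content_type": "parentDocument",
    # Children are listed under the special _childDocuments_ key.
    "_childDocuments_": [
        {"id": "2", "comments": "SolrCloud supports it too!"},
    ],
}

# A JSON update body is a list of documents.
body = json.dumps([parent], indent=2)
print(body)
```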
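Slide 22 introduces CollapsingQParserPlugin and ExpandComponent, used through request parameters such as `fq={!collapse field=group_s}` and `expand=true` (`group_s` is a hypothetical field). The in-memory sketch below shows the default semantics as we understand them: collapse keeps the highest-scoring document per group value, and expand lists the remaining members of each head's group, best first, up to `expand.rows` (5 by default). It ignores details such as null-group handling and ties.

```python
from collections import defaultdict


def collapse(docs, field):
    """Keep the highest-scoring document per value of `field`, best group first."""
    heads = {}
    for doc in docs:
        key = doc[field]
        if key not in heads or doc["score"] > heads[key]["score"]:
            heads[key] = doc
    return sorted(heads.values(), key=lambda d: d["score"], reverse=True)


def expand(docs, heads, field, rows=5):
    """Map each head's group value to the IDs of the other members, best first."""
    head_ids = {head["id"] for head in heads}
    members = defaultdict(list)
    for doc in docs:
        if doc["id"] not in head_ids:
            members[doc[field]].append(doc)
    return {
        head[field]: [d["id"] for d in sorted(members[head[field]],
                                              key=lambda d: d["score"], reverse=True)[:rows]]
        for head in heads
    }


docs = [
    {"id": "a1", "group_s": "A", "score": 3.0},
    {"id": "a2", "group_s": "A", "score": 5.0},
    {"id": "b1", "group_s": "B", "score": 4.0},
    {"id": "a3", "group_s": "A", "score": 1.0},
]
heads = collapse(docs, "group_s")
print([h["id"] for h in heads])        # ['a2', 'b1']
print(expand(docs, heads, "group_s"))  # {'A': ['a1', 'a3'], 'B': []}
```

Collapsing needs only one pass with a map from group value to current head, which is why it scales better than classic grouping when there are many distinct groups.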
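Slide 25 mentions re-ranking the top n results. As we understand the Solr 4.9 feature (requested with a parameter like `rq={!rerank reRankQuery=$rqq reRankDocs=1000 reRankWeight=3}`), each of the top `reRankDocs` documents gets original score + reRankWeight * re-rank query score, those documents are re-sorted, and documents below the cutoff keep their original order. The sketch below models that with plain lists; the document IDs and scores are invented.

```python
def rerank(results, rerank_scores, rerank_docs, weight):
    """Re-rank the top `rerank_docs` of `results` (list of (doc_id, score), best first).

    rerank_scores maps doc_id -> score of the re-rank query; a missing entry
    means the re-rank query did not match, so the original score is kept.
    """
    top, rest = results[:rerank_docs], results[rerank_docs:]
    rescored = [(doc_id, score + weight * rerank_scores.get(doc_id, 0.0))
                for doc_id, score in top]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    # Documents beyond the cutoff are not rescored and keep their original order.
    return rescored + rest


results = [("d1", 3.0), ("d2", 2.5), ("d3", 2.0), ("d4", 1.0)]
reranked = rerank(results, {"d3": 1.0, "d4": 5.0}, rerank_docs=3, weight=2.0)
print(reranked)  # [('d3', 4.0), ('d1', 3.0), ('d2', 2.5), ('d4', 1.0)]
```

Note that d4 matches the re-rank query strongly but stays last: only the top `reRankDocs` are rescored, which is what keeps an expensive re-rank query cheap.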

Editor's Notes

  1. Asynchronous collection API calls in the Solr Reference Guide: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-AsynchronousCalls REQUESTSTATUS action in the Solr Reference Guide: http://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RequestStatus
  2. See Pagination of Results in the Solr Reference Guide: https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
  3. Chris Hostetter’s scripts to produce the graph: https://github.com/LucidWorks/blog-deep-paging-perf
  4. Date Math Expressions in Solr Javadocs: https://lucene.apache.org/solr/4_8_1/solr-core/org/apache/solr/util/DateMathParser.html See Chris Hostetter’s blog post “New in Solr 4.8: Document Expiration”: http://searchhub.org/2014/05/07/document-expiration/
  5. See the “Managed Resources” page in the Solr Reference Guide: https://cwiki.apache.org/confluence/display/solr/Managed+Resources See also Tim Potter’s blog “Using Solr’s REST APIs to manage stop words and synonyms”: http://searchhub.org/2014/03/31/introducing-solrs-restmanager-and-managed-stop-words-and-synonyms/
  6. For info on Tri-level compositeId routing, see Anshum Gupta’s blog “Multi level composite-id routing in SolrCloud”: http://searchhub.org/2014/01/06/10590/ See the Config Sets page in the Solr Reference Guide: https://cwiki.apache.org/confluence/display/solr/Config+Sets
  7. Suggester v2 JIRA issue: https://issues.apache.org/jira/browse/SOLR-5378 Simple Query Parser in the Solr Reference Guide: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-SimpleQueryParser Complex Phrase Query Parser in the Solr Reference Guide: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
  8. See the Collapse & Expand page in the Solr Reference Guide: https://cwiki.apache.org/confluence/display/solr/Collapse+%26+Expand See also Joel Bernstein’s blog post “The CollapsingQParserPlugin: Solr’s New High Performance Field Collapsing PostFilter”: http://heliosearch.org/the-collapsingqparserplugin-solrs-new-high-performance-field-collapsing-postfilter/ See also Joel Bernstein’s blog post “Solr’s New Expand Component”: http://heliosearch.org/solrs-new-expand-component/ See also Joel Bernstein’s blog post “Using the ExpandComponent to expand a Solr Block Join”: http://heliosearch.org/expand-block-join/
  9. See Joel Bernstein’s blog post “Solr’s New AnalyticsQuery API”: http://heliosearch.org/solrs-new-analyticsquery-api/ See Joel Bernstein’s blog post “New in Solr 4.9: Query Re-Ranking”: http://heliosearch.org/solrs-new-re-ranking-feature/