1© Cloudera, Inc. All rights reserved.
Solr 6 Feature Preview
Yonik Seeley
3/09/2016
My Background
• Creator of Solr
• Cloudera Engineer
• LucidWorks Co-Founder
• Lucene/Solr committer, PMC member
• Apache Software Foundation member
• M.S. in Computer Science, Stanford
Solr 6
• Happy Birthday Solr!
• 10 Years at the Apache Software Foundation as of 1/2016
• Release branch has been cut
• ETA before April
• Java 8+ only
Streaming Expressions
Solr Streaming Expressions
• Generic platform for distributed computation
• The basis for implementing distributed SQL
• Works across entire result sets (or subsets)
• normal search operations are designed for fast top-N operations
• Map-reduce like "shuffle" partitions result sets for greater scalability
• Worker nodes can be allocated from a collection for parallelism
Tuple Streams
• A streaming expression compiles/parses to a tuple stream
• direct mapping from a streaming expression function->tuple_stream
• Stream Sources – produce a tuple stream
• Stream Decorators – operate on tuple streams
• Designed to include streams from non-Solr systems
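The source/decorator split can be pictured with plain Python generators. This is only a conceptual sketch, not Solr code: `list_source` and `unique` are hypothetical names standing in for a stream source and a stream decorator, and the sample tuples are made up.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]  # one tuple in the stream, modeled as a dict


def list_source(docs: List[Row]) -> Iterator[Row]:
    """Stream source: produces a tuple stream (here from an in-memory list)."""
    for doc in docs:
        yield doc


def unique(stream: Iterable[Row], over: str) -> Iterator[Row]:
    """Stream decorator: passes through the first tuple of each run of `over`.

    Assumes the incoming stream is already sorted on `over`.
    """
    first = True
    previous = None
    for row in stream:
        value = row.get(over)
        if first or value != previous:
            yield row
        first = False
        previous = value


# Made-up tuples, already sorted by "manu".
DOCS = [
    {"id": "a1", "manu": "apple"},
    {"id": "b1", "manu": "belkin"},
    {"id": "b2", "manu": "belkin"},
    {"id": "c1", "manu": "canon"},
]

# A decorator wraps a source (or another decorator), the way unique(search(...)) nests.
heads = list(unique(list_source(DOCS), over="manu"))
print([row["id"] for row in heads])
```

Because each stage only pulls the next tuple when asked, nothing is buffered beyond the current tuple, which is the property the "fully streaming" design relies on.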
search() expression
$ curl http://localhost:8983/solr/techproducts/stream -d \
  'expr=search(techproducts, q="*:*", fl="id,price,score", sort="id asc")'
{"result-set":{"docs":[
{"score":1.0,"id":"0579B002","price":179.99},
{"score":1.0,"id":"100-435805","price":649.99},
{"score":1.0,"id":"3007WFP","price":2199.0},
{"score":1.0,"id":"VDBDB1A16"},
{"score":1.0,"id":"VS1GB400C3","price":74.99},
{"EOF":true,"RESPONSE_TIME":6}]}}
resulting tuple stream
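The same request can be issued from a client program. The sketch below uses only the Python standard library; the helper names (`build_stream_request`, `read_tuples`, `run_stream`) are hypothetical, and it assumes the response has the `result-set`/`docs` shape shown above with a final `EOF` marker tuple. For simplicity it reads the whole response before parsing, so it does not preserve the streaming behavior.

```python
import json
import urllib.parse
import urllib.request


def build_stream_request(base_url, collection, expr):
    """Return (url, body) for a POST to the collection's /stream handler."""
    url = f"{base_url}/{collection}/stream"
    body = urllib.parse.urlencode({"expr": expr}).encode("utf-8")
    return url, body


def read_tuples(response_text):
    """Return the tuples of a /stream response, dropping the EOF marker tuple."""
    docs = json.loads(response_text)["result-set"]["docs"]
    return [doc for doc in docs if not doc.get("EOF")]


def run_stream(base_url, collection, expr):
    """Send the expression to Solr. Needs a running Solr, so it is not called here."""
    url, body = build_stream_request(base_url, collection, expr)
    request = urllib.request.Request(url, data=body)
    with urllib.request.urlopen(request) as response:
        return read_tuples(response.read().decode("utf-8"))


EXPR = 'search(techproducts, q="*:*", fl="id,price,score", sort="id asc")'
url, body = build_stream_request("http://localhost:8983/solr", "techproducts", EXPR)

# Two tuples copied from the response above, used to exercise the parser offline.
SAMPLE = (
    '{"result-set":{"docs":['
    '{"score":1.0,"id":"0579B002","price":179.99},'
    '{"EOF":true,"RESPONSE_TIME":6}]}}'
)
tuples = read_tuples(SAMPLE)
print(url)
print(tuples)
```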
Search Tuple Stream
[Figure: the replicas of shards 1 to 3 each send a tuple stream to a /stream worker executing the "search" expression]
• search() is a stream source
• SolrCloud aware (CloudSolrStream java class)
• Fully streaming (no big buffers)
• Worker node doesn't need to be a Solr node
search expression args
search(                            // parses to CloudSolrStream java class
  techproducts,                    // name of the collection to search
  zkHost="localhost:9983",         // (opt) zookeeper address of collection to search
  qt="/select",                    // (opt) the request handler to use (/export is also available)
  rows=1000000,                    // (opt) number of rows to retrieve
  q="*:*",                         // query to match returned documents
  fl="id,price,score",             // which fields to return
  sort="id asc, price desc",       // how to sort the results
  aliases="id=myid,price=myprice"  // (opt) renames output fields
)
reduce() streaming expression
• Groups tuples by common field values
• Emits one group-head per group
• Each group-head contains list of tuples
• "by" parameter must match up with "sort" parameter
• Any partitioning should be done on the same group field
reduce(
  search(collection1, qt="/export",
         q="*:*",
         fl="id,manu,price",
         sort="manu asc, price desc"),
  by="manu",
  group(sort="price desc", n=100)   // stream operation
)
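Why "by" has to line up with "sort" is easiest to see in a small model. The sketch below is a pure-Python imitation of the grouping semantics, not Solr's implementation; `reduce_by`, the `group` field name on the head tuple, and the sample tuples are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def reduce_by(stream, by, group_sort_key, n, descending=True):
    """Emit one group head per run of equal `by` values.

    Only adjacent tuples are grouped, so the stream must arrive sorted on `by`.
    The head is the first tuple after sorting the run, carrying up to `n`
    tuples of the run under the (assumed) field name "group".
    """
    for _, run in groupby(stream, key=itemgetter(by)):
        members = sorted(run, key=itemgetter(group_sort_key), reverse=descending)[:n]
        head = dict(members[0])
        head["group"] = members
        yield head


# Made-up tuples, sorted by "manu" as the inner search would deliver them.
SORTED = [
    {"id": "1", "manu": "belkin", "price": 19.95},
    {"id": "2", "manu": "belkin", "price": 11.50},
    {"id": "3", "manu": "canon", "price": 329.95},
]
heads = list(reduce_by(SORTED, by="manu", group_sort_key="price", n=100))

# The same tuples in an order that does not match the grouping field.
UNSORTED = [SORTED[0], SORTED[2], SORTED[1]]
broken = list(reduce_by(UNSORTED, by="manu", group_sort_key="price", n=100))

print(len(heads), len(broken))
```

With the unsorted input the "belkin" tuples are no longer adjacent, so the model emits two separate "belkin" heads; a streaming grouper never looks back, which is why the sort has to match.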
rollup() expression
• Groups tuples by common field values
• Emits rollup value along with metrics
• Closest equivalent to faceting
rollup(
  search(collection1, qt="/export",
         q="*:*",
         fl="id,manu,price",
         sort="manu asc"),
  over="manu",
  count(*),     // metrics
  max(price)
)
{"result-set":{"docs":[
{"manu":"apple","count(*)":1.0},
{"manu":"asus","count(*)":1.0},
{"manu":"ati","count(*)":1.0},
{"manu":"belkin","count(*)":2.0},
{"manu":"canon","count(*)":2.0},
{"manu":"corsair","count(*)":3.0},
[...]
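A rollup over a sorted stream needs only one running set of metrics at a time. The following pure-Python model illustrates that idea under stated assumptions: `rollup` here is a hypothetical stand-in restricted to `count(*)` and a single `max(...)` metric, the sample prices are made up, and counts come out as integers rather than the floats shown in the response above.

```python
from itertools import groupby
from operator import itemgetter


def rollup(stream, over, max_field):
    """Collapse each run of equal `over` values into one tuple of metrics.

    Metrics are updated tuple by tuple, so memory use does not grow with the
    size of a group. Tuples without `max_field` still count toward count(*).
    """
    for value, run in groupby(stream, key=itemgetter(over)):
        count = 0
        maximum = None
        for row in run:
            count += 1
            candidate = row.get(max_field)
            if candidate is not None and (maximum is None or candidate > maximum):
                maximum = candidate
        yield {over: value, "count(*)": count, f"max({max_field})": maximum}


# Made-up tuples sorted by "manu"; one of them has no price.
SORTED = [
    {"id": "1", "manu": "belkin", "price": 19.95},
    {"id": "2", "manu": "belkin", "price": 11.50},
    {"id": "3", "manu": "corsair", "price": 74.99},
    {"id": "4", "manu": "corsair"},
    {"id": "5", "manu": "corsair", "price": 185.0},
]
rolled = list(rollup(SORTED, over="manu", max_field="price"))
print(rolled)
```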
facet() expression
• Like search+rollup, but pushes down computation to JSON Facet API
facet(
  techproducts,
  q="*:*",
  buckets="manu",
  bucketSorts="count(*) desc",
  bucketSizeLimit=1000,
  count(*),
  avg(price),
  max(popularity)
)
{"result-set":{"docs":[
{"avg(price)":129.99, "max(popularity)":7.0,"manu":"corsair","count(*)":3},
{"avg(price)":15.72,"max(popularity)":1.0,"manu":"belkin","count(*)":2},
{"avg(price)":254.97,"max(popularity)":7.0,"manu":"canon","count(*)":2},
{"avg(price)":399.0,"max(popularity)":10.0,"manu":"apple","count(*)":1},
{"avg(price)":479.95,"max(popularity)":7.0,"manu":"asus","count(*)":1},
{"avg(price)":649.98,"max(popularity)":7.0,"manu":"ati","count(*)":1},
{"avg(price)":0.0,"max(popularity)":"NaN","manu":"boa","count(*)":1},
[...]
Parallel Tuple Stream
[Figure: the shard replicas feed two workers (Partition 1 and Partition 2); a further worker emits the combined tuple stream]
Streaming Expressions – parallel
• Wraps a stream and sends it to N worker nodes
• The first parameter is the collection to use for the intermediate worker nodes
• partitionKeys must be provided to underlying workers
• usually makes sense to partition by what you are grouping on
• inner and outer sorts should match
parallel(collection1,
  rollup(
    search(techproducts,
           q="*:*",
           fl="id,manu,price",
           sort="manu asc",
           partitionKeys="manu"),
    over="manu"),
  workers=2,
  zkHost="localhost:9983",
  sort="manu asc")
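The reason for partitioning on the grouping field can be shown with a small simulation. This is a sketch under assumptions, not how Solr does it internally: CRC32 is an arbitrary hash chosen for the example, `partition`, `count_over`, and `parallel_count` are hypothetical names, a `count(*)` metric is added so there is something to compare, and the workers run sequentially in one process.

```python
import heapq
import zlib
from itertools import groupby
from operator import itemgetter


def partition(tuples, key, workers):
    """Assign each tuple to a worker by hashing its partition key."""
    parts = [[] for _ in range(workers)]
    for row in tuples:
        slot = zlib.crc32(str(row[key]).encode("utf-8")) % workers
        parts[slot].append(row)
    return parts


def count_over(stream, over):
    """Per-worker rollup: one tuple with count(*) per run of equal `over` values."""
    for value, run in groupby(stream, key=itemgetter(over)):
        yield {over: value, "count(*)": sum(1 for _ in run)}


def parallel_count(tuples, key, workers):
    """Partition, roll up on each worker, then merge the sorted worker outputs."""
    parts = partition(tuples, key, workers)
    outputs = [count_over(sorted(part, key=itemgetter(key)), key) for part in parts]
    # The outer merge relies on every worker emitting in the same sort order.
    return list(heapq.merge(*outputs, key=itemgetter(key)))


# Made-up tuples in arbitrary order.
DOCS = [
    {"id": "1", "manu": "corsair"},
    {"id": "2", "manu": "belkin"},
    {"id": "3", "manu": "corsair"},
    {"id": "4", "manu": "canon"},
    {"id": "5", "manu": "belkin"},
    {"id": "6", "manu": "corsair"},
]
result = parallel_count(DOCS, key="manu", workers=2)
print(result)
```

Because every tuple with the same "manu" value hashes to the same worker, each group is counted exactly once and the merged output matches a single-worker rollup. Partitioning on a different field would split groups across workers and produce duplicate, partial counts.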
Joins!
innerJoin(
search(people, q=*:*, fl="personId,name", sort="personId asc"),
search(pets, q=type:cat, fl="personId,petName", sort="personId asc"),
on="personId"
)
Also available: leftOuterJoin, hashJoin, outerHashJoin
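Both inner searches above sort on the join field, which is what lets a join run as a single forward pass over two streams. The sketch below is a generic sort-merge inner join in Python, offered as an illustration of that technique rather than as Solr's implementation; `inner_join` and the people/pets tuples are made up.

```python
def inner_join(left, right, on):
    """Sort-merge inner join of two tuple streams, both sorted ascending on `on`."""
    left_iter, right_iter = iter(left), iter(right)
    l = next(left_iter, None)
    r = next(right_iter, None)
    while l is not None and r is not None:
        if l[on] < r[on]:
            l = next(left_iter, None)
        elif l[on] > r[on]:
            r = next(right_iter, None)
        else:
            key = l[on]
            # Buffer the right-hand run for this key so that every left tuple
            # with the same key can be paired with all of its matches.
            right_run = []
            while r is not None and r[on] == key:
                right_run.append(r)
                r = next(right_iter, None)
            while l is not None and l[on] == key:
                for match in right_run:
                    yield {**l, **match}
                l = next(left_iter, None)


PEOPLE = [
    {"personId": 1, "name": "Ann"},
    {"personId": 2, "name": "Bob"},
    {"personId": 3, "name": "Cy"},
]
PETS = [
    {"personId": 1, "petName": "Tom"},
    {"personId": 3, "petName": "Felix"},
    {"personId": 3, "petName": "Salem"},
]
joined = list(inner_join(PEOPLE, PETS, on="personId"))
print(joined)
```

Only the tuples sharing the current key are held in memory. A hash join, by contrast, drops the sort requirement at the cost of holding one whole side in memory.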
More decorators
• complement – emits tuples from A which do not exist in B
• intersect – emits tuples from A which do exist in B
• merge
• top – reorders the stream and returns the top N tuples
• unique – emits only the first tuple for each value
• select – select, rename, or give default values to fields in a tuple
Interesting streams
• update stream – indexes input into another SolrCloud collection!
• daemon stream – blocks until more data is available from underlying stream
• topic stream – a publish/subscribe messaging service
• checkpoints are persisted in a Solr collection
• resubmit to get new stuff
• combine with daemon stream to automatically get continuous updates over time
• further combine with update stream to push all matches to another collection
topic(checkpointCollection, dataCollection, id="topicA",
      q="solr rocks", checkpointEvery="1000")
jdbc() expression stream
join with other data sources!
innerJoin( // example from JDBCStreamTest
  select(
    search(collection1, fl="personId_i,rating_f", q="rating_f:*", sort="personId_i asc"),
    personId_i as personId, rating_f as rating ),
  select(
    jdbc(connection="jdbc:hsqldb:mem:.",
         sql="select PEOPLE.ID as PERSONID, PEOPLE.NAME, COUNTRIES.COUNTRY_NAME from PEOPLE inner join COUNTRIES on PEOPLE.COUNTRY_CODE = COUNTRIES.CODE order by PEOPLE.ID",
         sort="ID asc", get_column_name=true),
    ID as personId, NAME as personName, COUNTRY_NAME as country ),
  on="personId"
)
Parallel SQL
/sql Handler
• /sql handler is there by default on all Solr nodes
• Translates SQL -> parallel streaming expressions
• SQL tables map to SolrCloud collections
• Query planner / optimizer
• Currently uses Presto parser
• May switch to Apache Calcite?
Simplest SQL Example
$ curl http://localhost:8983/solr/techproducts/sql -d "stmt=select id from techproducts"
{"result-set":{"docs":[
{"id":"EN7800GTX/2DHTV/256M"},
{"id":"100-435805"},
{"id":"UTF8TEST"},
{"id":"SOLR1000"},
{"id":"9885A004"},
[...]
(tables map to collections)
SQL handler HTTP parameters
curl http://localhost:8983/solr/techproducts/sql -d '
&stmt=<sql_statement>
&numWorkers=4 // currently used by GROUP BY and DISTINCT (via parallel stream)
&workerCollection=collection1 // where to create intermediate workers
&workerZkhost=localhost:9983 // cluster (zookeeper ensemble) address
&aggregationMode=map_reduce | facet
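A client has to form-encode these parameters rather than paste them in with comments. The sketch below builds the request URL and body with the Python standard library; `build_sql_request` is a hypothetical helper, the parameter names are the ones listed above, and nothing is sent over the network.

```python
import urllib.parse


def build_sql_request(base_url, collection, stmt, num_workers=None,
                      worker_collection=None, worker_zkhost=None,
                      aggregation_mode=None):
    """Return (url, form-encoded body) for a POST to the /sql handler.

    Optional parameters are left out of the body when not given.
    """
    url = f"{base_url}/{collection}/sql"
    params = {"stmt": stmt}
    optional = {
        "numWorkers": num_workers,
        "workerCollection": worker_collection,
        "workerZkhost": worker_zkhost,
        "aggregationMode": aggregation_mode,
    }
    params.update({name: value for name, value in optional.items() if value is not None})
    return url, urllib.parse.urlencode(params)


sql_url, sql_body = build_sql_request(
    "http://localhost:8983/solr",
    "techproducts",
    "select id from techproducts",
    num_workers=4,
    aggregation_mode="facet",
)
print(sql_url)
print(sql_body)
```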
The WHERE clause
• WHERE clauses are all pushed down to the search layer
select id
where popularity=10 // simple match on numeric field "popularity"
where popularity='[5 TO 10]' // solr range query (note the quotes)
where name='hard drive' // phrase query on the "name" field
where name='((memory retail) AND popularity:[5 TO 10])' // arbitrary solr query
where name='(memory retail)' AND popularity='[5 TO 10]' // boolean logic
Ordering and Limiting
select id,score from techproducts
where text='(memory hard drive)'
ORDER BY popularity desc // default order is score desc for limited queries
LIMIT 100
• Limited queries use /select handler
• Unlimited queries use /export handler
• fields selected need to be docValues
• fields in "order by" need to be docValues
• no "score" field allowed
More SQL examples
select distinct fieldA as fa, fieldB as fb from tableA order by fa desc, fb desc
// simple stats
select count(fieldA) as count, sum(fieldB) as sum from tableA where fieldC = 'Hello'
select fieldA, fieldB, count(*), sum(fieldC), avg(fieldY) from tableA
where fieldC = 'term1 term2'
group by fieldA, fieldB
having ((sum(fieldC) > 1000) AND (avg(fieldY) <= 10))
order by sum(fieldC) asc
Solr JDBC Driver
Solr JDBC driver works with Zeppelin
More Solr6 Features
Graph Query
• Basic (non-distributed) graph traversal query
• Follows nodes to edges, optionally filtering during traversal
• Currently only a "filter" query (produces a set of documents)
• Parameters: from, to, traversalFilter, returnRoot, returnOnlyLeaf, maxDepth
• This example query matches “Philip J. Fry” and all of his ancestors:
fq={!graph from=parent_id to=id}id:"Philip J. Fry"
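The from/to traversal can be modeled as a breadth-first search: take the `from` field values of each visited document and visit every document whose `to` field matches one of them. The sketch below is an in-memory illustration of that reading, not Solr's implementation; `graph_query` is a hypothetical name, the family data is made up, and traversalFilter, returnRoot, returnOnlyLeaf, and maxDepth are not modeled.

```python
from collections import deque


def graph_query(docs, root_ids, from_field="parent_id", to_field="id"):
    """Return the ids of the roots plus every document reachable from them.

    An edge leads from a document to each document whose `to_field` equals
    one of the values in the first document's `from_field`.
    """
    by_id = {doc["id"]: doc for doc in docs}
    by_to = {}
    for doc in docs:
        by_to.setdefault(doc[to_field], []).append(doc)

    seen = set(root_ids)
    frontier = deque(root_ids)
    while frontier:
        current = by_id[frontier.popleft()]
        for edge in current.get(from_field, []):
            for target in by_to.get(edge, []):
                if target["id"] not in seen:
                    seen.add(target["id"])
                    frontier.append(target["id"])
    return seen


# Made-up documents; parent_id is multi-valued.
DOCS = [
    {"id": "Philip J. Fry", "parent_id": ["Yancy Fry Sr.", "Mrs. Fry"]},
    {"id": "Yancy Fry Sr.", "parent_id": ["Grandpa Fry"]},
    {"id": "Mrs. Fry", "parent_id": []},
    {"id": "Grandpa Fry", "parent_id": []},
    {"id": "Bender", "parent_id": []},
]
matches = graph_query(DOCS, ["Philip J. Fry"])
print(sorted(matches))
```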
Scoring changes
• For docCount (i.e. idf) in scoring, use the number of documents with that field rather than the number of documents in the whole index (maxDoc).
• can add documents of a different type and not disturb/skew scoring
• BM25 scoring by default
• tweakable on a per-fieldType basis ("k1" and "b" factors)
• classic tf-idf still available
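To show what "k1" and "b" control, here is the BM25 score of a single term written out in Python. It follows the commonly published form of the formula with defaults k1=1.2 and b=0.75. Treat it as an approximation of what Lucene computes: the function name and arguments are illustrative, and Lucene's compressed encoding of document length is ignored.

```python
import math


def bm25_term_score(freq, doc_len, avg_doc_len, doc_freq, doc_count,
                    k1=1.2, b=0.75):
    """Score one term in one document.

    freq        occurrences of the term in the document's field
    doc_len     length of that field in this document
    avg_doc_len average field length across documents
    doc_freq    number of documents containing the term
    doc_count   number of documents that have the field at all
    k1          term-frequency saturation (higher = slower saturation)
    b           strength of length normalization (0 disables it)
    """
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * freq * (k1 + 1) / (freq + length_norm)


short_doc = bm25_term_score(freq=2, doc_len=50, avg_doc_len=100, doc_freq=10, doc_count=1000)
long_doc = bm25_term_score(freq=2, doc_len=200, avg_doc_len=100, doc_freq=10, doc_count=1000)

# Same term statistics, but idf based on 100 documents with the field
# versus 1000 documents in the whole index.
field_scoped = bm25_term_score(freq=2, doc_len=100, avg_doc_len=100, doc_freq=10, doc_count=100)
whole_index = bm25_term_score(freq=2, doc_len=100, avg_doc_len=100, doc_freq=10, doc_count=1000)

print(short_doc > long_doc, field_scoped < whole_index)
```

The last comparison reflects the docCount change above: counting only the documents that have the field keeps idf from inflating when unrelated document types are added to the index.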
Cross DC Replication
Thank you
yonik@cloudera.com

More Related Content

What's hot

Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Lucidworks
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache SolrChristos Manios
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Beyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and SolrBeyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and SolrBertrand Delacretaz
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development TutorialErik Hatcher
 
Solr vs. Elasticsearch, Case by Case: Presented by Alexandre Rafalovitch, UN
Solr vs. Elasticsearch,  Case by Case: Presented by Alexandre Rafalovitch, UNSolr vs. Elasticsearch,  Case by Case: Presented by Alexandre Rafalovitch, UN
Solr vs. Elasticsearch, Case by Case: Presented by Alexandre Rafalovitch, UNLucidworks
 
Enterprise Search Solution: Apache SOLR. What's available and why it's so cool
Enterprise Search Solution: Apache SOLR. What's available and why it's so coolEnterprise Search Solution: Apache SOLR. What's available and why it's so cool
Enterprise Search Solution: Apache SOLR. What's available and why it's so coolEcommerce Solution Provider SysIQ
 
Integrating the Solr search engine
Integrating the Solr search engineIntegrating the Solr search engine
Integrating the Solr search engineth0masr
 
Tutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginTutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginsearchbox-com
 
20130310 solr tuorial
20130310 solr tuorial20130310 solr tuorial
20130310 solr tuorialChris Huang
 
Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Erik Hatcher
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesRahul Jain
 
An Introduction to Basics of Search and Relevancy with Apache Solr
An Introduction to Basics of Search and Relevancy with Apache SolrAn Introduction to Basics of Search and Relevancy with Apache Solr
An Introduction to Basics of Search and Relevancy with Apache SolrLucidworks (Archived)
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash courseTommaso Teofili
 
Get the most out of Solr search with PHP
Get the most out of Solr search with PHPGet the most out of Solr search with PHP
Get the most out of Solr search with PHPPaul Borgermans
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseAlexandre Rafalovitch
 
Ingesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScriptIngesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScriptLucidworks
 

What's hot (20)

Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Beyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and SolrBeyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and Solr
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development Tutorial
 
Solr vs. Elasticsearch, Case by Case: Presented by Alexandre Rafalovitch, UN
Solr vs. Elasticsearch,  Case by Case: Presented by Alexandre Rafalovitch, UNSolr vs. Elasticsearch,  Case by Case: Presented by Alexandre Rafalovitch, UN
Solr vs. Elasticsearch, Case by Case: Presented by Alexandre Rafalovitch, UN
 
Enterprise Search Solution: Apache SOLR. What's available and why it's so cool
Enterprise Search Solution: Apache SOLR. What's available and why it's so coolEnterprise Search Solution: Apache SOLR. What's available and why it's so cool
Enterprise Search Solution: Apache SOLR. What's available and why it's so cool
 
Integrating the Solr search engine
Integrating the Solr search engineIntegrating the Solr search engine
Integrating the Solr search engine
 
Tutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginTutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component plugin
 
20130310 solr tuorial
20130310 solr tuorial20130310 solr tuorial
20130310 solr tuorial
 
Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
 
An Introduction to Basics of Search and Relevancy with Apache Solr
An Introduction to Basics of Search and Relevancy with Apache SolrAn Introduction to Basics of Search and Relevancy with Apache Solr
An Introduction to Basics of Search and Relevancy with Apache Solr
 
Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
Get the most out of Solr search with PHP
Get the most out of Solr search with PHPGet the most out of Solr search with PHP
Get the most out of Solr search with PHP
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
 
Ingesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScriptIngesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScript
 

Viewers also liked

Call me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksCall me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksShalin Shekhar Mangar
 
Why I want to Kazan
Why I want to KazanWhy I want to Kazan
Why I want to KazanProvectus
 
Solr Powered Libraries
Solr Powered LibrariesSolr Powered Libraries
Solr Powered LibrariesErik Hatcher
 
Open source applied: Real-world uses
Open source applied: Real-world usesOpen source applied: Real-world uses
Open source applied: Real-world usesRogue Wave Software
 
Multi faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & loggingMulti faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & logginglucenerevolution
 
Faceted Search – the 120 Million Documents Story
Faceted Search – the 120 Million Documents StoryFaceted Search – the 120 Million Documents Story
Faceted Search – the 120 Million Documents StorySourcesense
 
What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0Erik Hatcher
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksErik Hatcher
 
Сергей Моренец: "Gradle. Write once, build everywhere"
Сергей Моренец: "Gradle. Write once, build everywhere"Сергей Моренец: "Gradle. Write once, build everywhere"
Сергей Моренец: "Gradle. Write once, build everywhere"Provectus
 
Meet Solr For The Tirst Again
Meet Solr For The Tirst AgainMeet Solr For The Tirst Again
Meet Solr For The Tirst AgainVarun Thacker
 
Gimme shelter: Tips on protecting proprietary and open source code
Gimme shelter: Tips on protecting proprietary and open source codeGimme shelter: Tips on protecting proprietary and open source code
Gimme shelter: Tips on protecting proprietary and open source codeRogue Wave Software
 
Apache Solr Changes the Way You Build Sites
Apache Solr Changes the Way You Build SitesApache Solr Changes the Way You Build Sites
Apache Solr Changes the Way You Build SitesPeter
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
"Solr Update" at code4lib '13 - Chicago
"Solr Update" at code4lib '13 - Chicago"Solr Update" at code4lib '13 - Chicago
"Solr Update" at code4lib '13 - ChicagoErik Hatcher
 
Дима Гадомский (Юскутум) “Можно ли позаимствовать дизайн и функционал так, чт...
Дима Гадомский (Юскутум) “Можно ли позаимствовать дизайн и функционал так, чт...Дима Гадомский (Юскутум) “Можно ли позаимствовать дизайн и функционал так, чт...
Дима Гадомский (Юскутум) “Можно ли позаимствовать дизайн и функционал так, чт...Provectus
 
Solr introduction
Solr introductionSolr introduction
Solr introductionLap Tran
 

Viewers also liked (20)

Hackathon
HackathonHackathon
Hackathon
 
Call me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksCall me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networks
 
Why I want to Kazan
Why I want to KazanWhy I want to Kazan
Why I want to Kazan
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Solr Powered Libraries
Solr Powered LibrariesSolr Powered Libraries
Solr Powered Libraries
 
Solr 4
Solr 4Solr 4
Solr 4
 
Open source applied: Real-world uses
Open source applied: Real-world usesOpen source applied: Real-world uses
Open source applied: Real-world uses
 
Multi faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & loggingMulti faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & logging
 
Faceted Search – the 120 Million Documents Story
Faceted Search – the 120 Million Documents StoryFaceted Search – the 120 Million Documents Story
Faceted Search – the 120 Million Documents Story
 
What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis Tricks
 
Сергей Моренец: "Gradle. Write once, build everywhere"
Сергей Моренец: "Gradle. Write once, build everywhere"Сергей Моренец: "Gradle. Write once, build everywhere"
Сергей Моренец: "Gradle. Write once, build everywhere"
 
Meet Solr For The Tirst Again
Meet Solr For The Tirst AgainMeet Solr For The Tirst Again
Meet Solr For The Tirst Again
 
Gimme shelter: Tips on protecting proprietary and open source code
Gimme shelter: Tips on protecting proprietary and open source codeGimme shelter: Tips on protecting proprietary and open source code
Gimme shelter: Tips on protecting proprietary and open source code
 
Apache Solr Changes the Way You Build Sites
Apache Solr Changes the Way You Build SitesApache Solr Changes the Way You Build Sites
Apache Solr Changes the Way You Build Sites
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
"Solr Update" at code4lib '13 - Chicago
"Solr Update" at code4lib '13 - Chicago"Solr Update" at code4lib '13 - Chicago
"Solr Update" at code4lib '13 - Chicago
 
Дима Гадомский (Юскутум) “Можно ли позаимствовать дизайн и функционал так, чт...
Дима Гадомский (Юскутум) “Можно ли позаимствовать дизайн и функционал так, чт...Дима Гадомский (Юскутум) “Можно ли позаимствовать дизайн и функционал так, чт...
Дима Гадомский (Юскутум) “Можно ли позаимствовать дизайн и функционал так, чт...
 
Top Node.js Metrics to Watch
Top Node.js Metrics to WatchTop Node.js Metrics to Watch
Top Node.js Metrics to Watch
 
Solr introduction
Solr introductionSolr introduction
Solr introduction
 

Similar to Solr 6 Feature Preview

Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Lucidworks
 
Real-Time Analytics with Solr: Presented by Yonik Seeley, Cloudera
Real-Time Analytics with Solr: Presented by Yonik Seeley, ClouderaReal-Time Analytics with Solr: Presented by Yonik Seeley, Cloudera
Real-Time Analytics with Solr: Presented by Yonik Seeley, ClouderaLucidworks
 
Parallel SQL and Analytics with Solr: Presented by Yonik Seeley, Cloudera
Parallel SQL and Analytics with Solr: Presented by Yonik Seeley, ClouderaParallel SQL and Analytics with Solr: Presented by Yonik Seeley, Cloudera
Parallel SQL and Analytics with Solr: Presented by Yonik Seeley, ClouderaLucidworks
 
Solr As A SparkSQL DataSource
Solr As A SparkSQL DataSourceSolr As A SparkSQL DataSource
Solr As A SparkSQL DataSourceSpark Summit
 
Impala tech-talk by Dimitris Tsirogiannis
Impala tech-talk by Dimitris TsirogiannisImpala tech-talk by Dimitris Tsirogiannis
Impala tech-talk by Dimitris TsirogiannisFelicia Haggarty
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered LuceneErik Hatcher
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentationMurat Çakal
 
Scaling Web Applications with Cassandra Presentation.ppt
Scaling Web Applications with Cassandra Presentation.pptScaling Web Applications with Cassandra Presentation.ppt
Scaling Web Applications with Cassandra Presentation.pptssuserbad56d
 
Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than EverApache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than EverLucidworks (Archived)
 
Solr as a Spark SQL Datasource
Solr as a Spark SQL DatasourceSolr as a Spark SQL Datasource
Solr as a Spark SQL DatasourceChitturi Kiran
 
Webinar: Solr 6 Deep Dive - SQL and Graph
Webinar: Solr 6 Deep Dive - SQL and GraphWebinar: Solr 6 Deep Dive - SQL and Graph
Webinar: Solr 6 Deep Dive - SQL and GraphLucidworks
 
OSDC 2015: Mitchell Hashimoto | Automating the Modern Datacenter, Development...
OSDC 2015: Mitchell Hashimoto | Automating the Modern Datacenter, Development...OSDC 2015: Mitchell Hashimoto | Automating the Modern Datacenter, Development...
OSDC 2015: Mitchell Hashimoto | Automating the Modern Datacenter, Development...NETWAYS
 
Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6Shalin Shekhar Mangar
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr WorkshopJSGB
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScyllaDB
 
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Kai Chan
 
Cassandra
CassandraCassandra
Cassandraexsuns
 

Similar to Solr 6 Feature Preview (20)

Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Webinar: What's New in Solr 6
Webinar: What's New in Solr 6
 
Spark etl
Spark etlSpark etl
Spark etl
 
Real-Time Analytics with Solr: Presented by Yonik Seeley, Cloudera
Real-Time Analytics with Solr: Presented by Yonik Seeley, ClouderaReal-Time Analytics with Solr: Presented by Yonik Seeley, Cloudera
Real-Time Analytics with Solr: Presented by Yonik Seeley, Cloudera
 
Parallel SQL and Analytics with Solr: Presented by Yonik Seeley, Cloudera
Parallel SQL and Analytics with Solr: Presented by Yonik Seeley, ClouderaParallel SQL and Analytics with Solr: Presented by Yonik Seeley, Cloudera
Parallel SQL and Analytics with Solr: Presented by Yonik Seeley, Cloudera
 
Solr As A SparkSQL DataSource
Solr As A SparkSQL DataSourceSolr As A SparkSQL DataSource
Solr As A SparkSQL DataSource
 
Impala tech-talk by Dimitris Tsirogiannis
Impala tech-talk by Dimitris TsirogiannisImpala tech-talk by Dimitris Tsirogiannis
Impala tech-talk by Dimitris Tsirogiannis
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered Lucene
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentation
 
Solr4 nosql search_server_2013
Solr4 nosql search_server_2013Solr4 nosql search_server_2013
Solr4 nosql search_server_2013
 
What's New in Apache Hive
What's New in Apache HiveWhat's New in Apache Hive
What's New in Apache Hive
 
Scaling Web Applications with Cassandra Presentation.ppt
Scaling Web Applications with Cassandra Presentation.pptScaling Web Applications with Cassandra Presentation.ppt
Scaling Web Applications with Cassandra Presentation.ppt
 
Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than EverApache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever
 
Solr as a Spark SQL Datasource
Solr as a Spark SQL DatasourceSolr as a Spark SQL Datasource
Solr as a Spark SQL Datasource
 
Webinar: Solr 6 Deep Dive - SQL and Graph
Webinar: Solr 6 Deep Dive - SQL and GraphWebinar: Solr 6 Deep Dive - SQL and Graph
Webinar: Solr 6 Deep Dive - SQL and Graph
 
OSDC 2015: Mitchell Hashimoto | Automating the Modern Datacenter, Development...
OSDC 2015: Mitchell Hashimoto | Automating the Modern Datacenter, Development...OSDC 2015: Mitchell Hashimoto | Automating the Modern Datacenter, Development...
OSDC 2015: Mitchell Hashimoto | Automating the Modern Datacenter, Development...
 
Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
 
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
 
Cassandra
CassandraCassandra
Cassandra
 

Solr 6 Feature Preview

  • 1. Solr 6 Feature Preview. Yonik Seeley, 3/09/2016. © Cloudera, Inc. All rights reserved.
  • 2. My Background
      • Creator of Solr
      • Cloudera Engineer
      • LucidWorks Co-Founder
      • Lucene/Solr committer, PMC member
      • Apache Software Foundation member
      • M.S. in Computer Science, Stanford
  • 3. Solr 6
      • Happy Birthday Solr!
      • 10 years at the Apache Software Foundation as of 1/2016
      • Release branch has been cut
      • ETA before April
      • Java 8+ only
  • 4. Streaming Expressions
  • 5. Solr Streaming Expressions
      • Generic platform for distributed computation
      • The basis for implementing distributed SQL
      • Works across entire result sets (or subsets)
          • Normal search operations are designed for fast top-N operations
      • Map-reduce-like "shuffle" partitions result sets for greater scalability
      • Worker nodes can be allocated from a collection for parallelism
  • 6. Tuple Streams
      • A streaming expression compiles/parses to a tuple stream
          • Direct mapping from a streaming expression function to a tuple stream
      • Stream Sources – produce a tuple stream
      • Stream Decorators – operate on tuple streams
      • Designed to include streams from non-Solr systems
  • 7. search() expression
      $ curl http://localhost:8983/solr/techproducts/stream -d \
          'expr=search(techproducts, q="*:*", fl="id,price,score", sort="id asc")'
      Resulting tuple stream:
      {"result-set":{"docs":[
      {"score":1.0,"id":"0579B002","price":179.99},
      {"score":1.0,"id":"100-435805","price":649.99},
      {"score":1.0,"id":"3007WFP","price":2199.0},
      {"score":1.0,"id":"VDBDB1A16"},
      {"score":1.0,"id":"VS1GB400C3","price":74.99},
      {"EOF":true,"RESPONSE_TIME":6}]}}
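The response shape on this slide can be consumed with a few lines of client code. The following is a minimal Python sketch, not part of the deck: the helper name `read_tuples` is hypothetical, and it loads the whole body at once for simplicity, whereas a real client would more likely read the stream incrementally. It only assumes what the slide shows: tuples sit under `result-set` / `docs`, and a final tuple with `"EOF": true` marks the end.

```python
import json

def read_tuples(response_text):
    """Yield data tuples from a /stream response body, stopping at the EOF marker."""
    docs = json.loads(response_text)["result-set"]["docs"]
    for doc in docs:
        if doc.get("EOF"):
            return  # the EOF tuple carries metadata such as RESPONSE_TIME, not data
        yield doc

# Shortened copy of the response shown on the slide.
response = """{"result-set":{"docs":[
{"score":1.0,"id":"0579B002","price":179.99},
{"score":1.0,"id":"VDBDB1A16"},
{"EOF":true,"RESPONSE_TIME":6}]}}"""

tuples = list(read_tuples(response))
for t in tuples:
    # Fields missing from a document (here "price") are simply absent from the tuple.
    print(t["id"], t.get("price"))
```

Note that a tuple does not have to carry every requested field, so client code should use `get` rather than direct indexing for optional fields.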
  • 8. Search Tuple Stream
      [Diagram: shard replicas (Shard 1 to Shard 3) → tuple streams → worker → tuple stream. Caption: /stream worker executing the "search" expression]
      • search() is a stream source
      • SolrCloud aware (CloudSolrStream java class)
      • Fully streaming (no big buffers)
      • Worker node doesn't need to be a Solr node
  • 9. search expression args
      search(                            // parses to CloudSolrStream java class
        techproducts,                    // name of the collection to search
        zkHost="localhost:9983",         // (opt) zookeeper address of collection to search
        qt="/select",                    // (opt) the request handler to use (/export is also available)
        rows=1000000,                    // (opt) number of rows to retrieve
        q=*:*,                           // query to match returned documents
        fl="id,price,score",             // which fields to return
        sort="id asc, price desc",       // how to sort the results
        aliases="id=myid,price=myprice"  // (opt) renames output fields
      )
  • 10. reduce() streaming expression
      • Groups tuples by common field values
      • Emits one group-head per group
      • Each group-head contains a list of tuples
      • The "by" parameter must match up with the "sort" parameter
      • Any partitioning should be done on the same group field
      reduce(
        search(collection1, qt="/export", q="*:*",      // stream
               fl="id,manu,price",
               sort="manu asc, price desc"),
        by="manu",
        group(sort="price desc", n=100)                 // operation
      )
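Why "by" has to line up with "sort" becomes clearer with a small simulation. This Python sketch is not Solr code: `reduce_by` is a hypothetical helper that groups only adjacent tuples, which is the behaviour one would expect from a fully streaming reducer, and the `group` field name on the group head is an assumption made for illustration.

```python
from itertools import groupby
from operator import itemgetter

def reduce_by(tuples, by, n):
    """Yield one group head per run of equal `by` values, holding up to n tuples.

    Only adjacent tuples are grouped, so the input must already be sorted on `by`.
    """
    for _, run in groupby(tuples, key=itemgetter(by)):
        members = list(run)[:n]
        head = dict(members[0])      # the first tuple of the run becomes the group head
        head["group"] = members      # assumed field name for the grouped tuples
        yield head

# Input sorted by "manu asc, price desc", as in the slide's search() call.
stream = [
    {"id": "a1", "manu": "apple", "price": 399.0},
    {"id": "b1", "manu": "belkin", "price": 19.95},
    {"id": "b2", "manu": "belkin", "price": 11.5},
]
heads = list(reduce_by(stream, by="manu", n=100))

# The same tuples in unsorted order split "belkin" into two separate groups.
unsorted_heads = list(reduce_by([stream[1], stream[0], stream[2]], by="manu", n=100))
```

With sorted input there is one head per manufacturer; with unsorted input the same manufacturer shows up more than once, which is the failure the slide's "must match up" bullet warns about.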
  • 11. rollup() expression
      • Groups tuples by common field values
      • Emits rollup value along with metrics
      • Closest equivalent to faceting
      rollup(
        search(collection1, qt="/export", q="*:*",
               fl="id,manu,price", sort="manu asc"),
        over="manu",
        count(*), max(price)                            // metrics
      )
      {"result-set":{"docs":[
      {"manu":"apple","count(*)":1.0},
      {"manu":"asus","count(*)":1.0},
      {"manu":"ati","count(*)":1.0},
      {"manu":"belkin","count(*)":2.0},
      {"manu":"canon","count(*)":2.0},
      {"manu":"corsair","count(*)":3.0},
      [...]
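The rollup idea (one output tuple per distinct "over" value, carrying metrics instead of the member tuples) can be sketched in Python as follows. This is an illustrative simulation, not the Solr implementation: the function name and its single `max_field` parameter are hypothetical, and the sample prices are made up.

```python
from itertools import groupby
from operator import itemgetter

def rollup(tuples, over, max_field):
    """Emit one tuple per run of equal `over` values with count(*) and max(max_field)."""
    for value, run in groupby(tuples, key=itemgetter(over)):
        count = 0
        maximum = None
        for t in run:
            count += 1
            field_value = t.get(max_field)
            if field_value is not None and (maximum is None or field_value > maximum):
                maximum = field_value
        yield {over: value, "count(*)": count, f"max({max_field})": maximum}

# Input must be sorted on the rollup field, matching sort="manu asc" on the slide.
stream = [
    {"manu": "apple", "price": 399.0},
    {"manu": "belkin", "price": 19.95},
    {"manu": "belkin", "price": 11.5},
    {"manu": "canon"},  # a tuple without a price does not affect max(price)
]
rolled = list(rollup(stream, over="manu", max_field="price"))
```

Only one running count and one running maximum are kept per group, so memory use stays constant regardless of group size, in contrast to reduce(), which keeps the grouped tuples.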
  • 12. facet() expression
      • Like search+rollup, but pushes down computation to the JSON Facet API
      facet(
        techproducts,
        q="*:*",
        buckets="manu",
        bucketSorts="count(*) desc",
        bucketSizeLimit=1000,
        count(*), avg(price), max(popularity)
      )
      {"result-set":{"docs":[
      {"avg(price)":129.99,"max(popularity)":7.0,"manu":"corsair","count(*)":3},
      {"avg(price)":15.72,"max(popularity)":1.0,"manu":"belkin","count(*)":2},
      {"avg(price)":254.97,"max(popularity)":7.0,"manu":"canon","count(*)":2},
      {"avg(price)":399.0,"max(popularity)":10.0,"manu":"apple","count(*)":1},
      {"avg(price)":479.95,"max(popularity)":7.0,"manu":"asus","count(*)":1},
      {"avg(price)":649.98,"max(popularity)":7.0,"manu":"ati","count(*)":1},
      {"avg(price)":0.0,"max(popularity)":"NaN","manu":"boa","count(*)":1},
      [...]
  • 13. Parallel Tuple Stream
      [Diagram: shard replicas (Shard 1 to Shard 3) → Partition 1 and Partition 2 → two workers → worker → tuple stream]
  • 14. Streaming Expressions – parallel
      • Wraps a stream and sends it to N worker nodes
      • The first parameter is the collection to use for the intermediate worker nodes
      • partitionKeys must be provided to underlying workers
          • It usually makes sense to partition by what you are grouping on
      • Inner and outer sorts should match
      parallel(collection1,
        rollup(
          search(techproducts, q="*:*", fl="id,manu,price",
                 sort="manu asc", partitionKeys="manu"),
          over="manu"),
        workers=2,
        zkHost="localhost:9983",
        sort="manu asc")
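The reason for partitioning on the grouping field, and for matching inner and outer sorts, can be seen in a small Python simulation. Everything here is a sketch under stated assumptions: the CRC32 hash stands in for whatever hash Solr actually uses, the helper names are hypothetical, and "workers" are just in-memory lists.

```python
import heapq
import zlib
from itertools import groupby
from operator import itemgetter

def partition(tuples, key, workers):
    """Assign each tuple to a worker by hashing the partition key (CRC32 is a stand-in)."""
    parts = [[] for _ in range(workers)]
    for t in tuples:
        slot = zlib.crc32(str(t[key]).encode("utf-8")) % workers
        parts[slot].append(t)
    return parts

def count_runs(tuples, key):
    """Per-worker rollup: count adjacent tuples that share the same key value."""
    for value, run in groupby(tuples, key=itemgetter(key)):
        yield {key: value, "count(*)": sum(1 for _ in run)}

def parallel_count(tuples, key, workers):
    """Partition, roll up on each worker, then merge the sorted partial results."""
    partials = [list(count_runs(p, key)) for p in partition(tuples, key, workers)]
    # Each partial result is sorted on `key`, so a sorted merge restores the global order.
    return list(heapq.merge(*partials, key=itemgetter(key)))

manus = ["apple", "belkin", "belkin", "canon", "canon", "corsair", "corsair", "corsair"]
stream = [{"manu": m} for m in manus]
result = parallel_count(stream, key="manu", workers=2)
```

Because every tuple with a given "manu" value hashes to the same worker, each group is counted completely by exactly one worker, and because each worker's output is sorted the same way as the outer sort, the final step is a cheap merge rather than a re-sort.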
  • 15. Joins!
      innerJoin(
        search(people, q=*:*, fl="personId,name", sort="personId asc"),
        search(pets, q=type:cat, fl="personId,petName", sort="personId asc"),
        on="personId"
      )
      Other joins: leftOuterJoin, hashJoin, outerHashJoin
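Both inputs in the innerJoin example are sorted on the join field, which is what allows a streaming merge join. The Python sketch below illustrates that technique under assumptions: it is not Solr's implementation, the sample people and pets are made up, and how Solr resolves conflicting field names in the merged tuple is not stated on the slide (here the right side wins).

```python
from itertools import groupby
from operator import itemgetter

def inner_join(left, right, on):
    """Merge join of two streams that are both sorted ascending on `on`."""
    key = itemgetter(on)
    left_groups = groupby(left, key=key)
    right_groups = groupby(right, key=key)
    end = (None, None)
    lk, lrun = next(left_groups, end)
    rk, rrun = next(right_groups, end)
    while lrun is not None and rrun is not None:
        if lk < rk:
            lk, lrun = next(left_groups, end)      # left key has no partner, skip it
        elif lk > rk:
            rk, rrun = next(right_groups, end)     # right key has no partner, skip it
        else:
            right_rows = list(rrun)                # buffer only the rows for this one key
            for l in lrun:
                for r in right_rows:
                    yield {**l, **r}
            lk, lrun = next(left_groups, end)
            rk, rrun = next(right_groups, end)

people = [
    {"personId": 1, "name": "Ann"},
    {"personId": 2, "name": "Bob"},
    {"personId": 3, "name": "Cy"},
]
pets = [
    {"personId": 1, "petName": "Tom"},
    {"personId": 1, "petName": "Kit"},
    {"personId": 3, "petName": "Mog"},
]
joined = list(inner_join(people, pets, on="personId"))
```

Only the rows for the current key are buffered, so memory use is bounded by the largest single key group rather than by the size of either stream.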
  • 16. More decorators
      • complement – emits tuples from A which do not exist in B
      • intersect – emits tuples from A which do exist in B
      • merge
      • top – reorders the stream and returns the top N tuples
      • unique – emits only the first tuple for each value
      • select – select, rename, or give default values to fields in a tuple
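Two of these decorators are easy to picture as generator wrappers around an input stream. The Python sketch below is an illustration only, with hypothetical function signatures: `unique` assumes the input is sorted on the field so that equal values are adjacent, and `top` uses a heap to keep the N best tuples.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def unique(tuples, over):
    """Emit only the first tuple of each run of equal `over` values (input sorted on `over`)."""
    for _, run in groupby(tuples, key=itemgetter(over)):
        yield next(run)

def top(tuples, n, field):
    """Return the n tuples with the largest `field` value, largest first."""
    return heapq.nlargest(n, tuples, key=itemgetter(field))

stream = [
    {"manu": "belkin", "price": 19.95},
    {"manu": "belkin", "price": 11.5},
    {"manu": "canon", "price": 329.95},
    {"manu": "corsair", "price": 185.0},
]
first_per_manu = list(unique(stream, over="manu"))
two_most_expensive = top(stream, n=2, field="price")
```

Decorators compose naturally in this style, for example `top(unique(stream, over="manu"), n=2, field="price")`, mirroring how streaming expressions nest.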
  • 17. Interesting streams
      • update stream – indexes input into another SolrCloud collection!
      • daemon stream – blocks until more data is available from the underlying stream
      • topic stream – a publish/subscribe messaging service
          • checkpoints are persisted in a Solr collection
          • resubmit to get new stuff
          • combine with daemon stream to automatically get continuous updates over time
          • further combine with update stream to push all matches to another collection
      topic(checkpointCollection,
            dataCollection,
            id="topicA",
            q="solr rocks",
            checkpointEvery="1000")
  • 18. jdbc() expression stream
      Join with other data sources!
      innerJoin(                       // example from JDBCStreamTest
        select(
          search(collection1, fl="personId_i,rating_f", q="rating_f:*", sort="personId_i asc"),
          personId_i as personId,
          rating_f as rating
        ),
        select(
          jdbc(connection="jdbc:hsqldb:mem:.",
               sql="select PEOPLE.ID as PERSONID, PEOPLE.NAME, COUNTRIES.COUNTRY_NAME
                    from PEOPLE inner join COUNTRIES on PEOPLE.COUNTRY_CODE = COUNTRIES.CODE
                    order by PEOPLE.ID",
               sort="ID asc",
               get_column_name=true),
          ID as personId,
          NAME as personName,
          COUNTRY_NAME as country
        ),
        on="personId"
      )
  • 19. Parallel SQL
  • 20. /sql Handler
      • /sql handler is there by default on all Solr nodes
      • Translates SQL -> parallel streaming expressions
      • SQL tables map to SolrCloud collections
      • Query planner / optimizer
      • Currently uses Presto parser
          • May switch to Apache Calcite?
  • 22. Simplest SQL Example
      $ curl http://localhost:8983/solr/techproducts/sql -d "stmt=select id from techproducts"
      {"result-set":{"docs":[
      {"id":"EN7800GTX/2DHTV/256M"},
      {"id":"100-435805"},
      {"id":"UTF8TEST"},
      {"id":"SOLR1000"},
      {"id":"9885A004"},
      [...]
      Tables map to collections.
  • 23. SQL handler HTTP parameters
      curl http://localhost:8983/solr/techproducts/sql -d '
        &stmt=<sql_statement>
        &numWorkers=4                    // currently used by GROUP BY and DISTINCT (via parallel stream)
        &workerCollection=collection1    // where to create intermediate workers
        &workerZkhost=localhost:9983     // cluster (zookeeper ensemble) address
        &aggregationMode=map_reduce | facet
      '
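When these parameters are sent from a program rather than from curl, the statement needs to be form-encoded. The following Python sketch only builds the request and does not contact a server; the helper name `sql_request` and the base URL are assumptions, while the parameter names are the ones listed on the slide.

```python
from urllib.parse import urlencode, parse_qs

def sql_request(base_url, collection, stmt, **params):
    """Return (url, form-encoded body) for a POST to a collection's /sql handler."""
    url = f"{base_url}/{collection}/sql"
    body = urlencode({"stmt": stmt, **params})
    return url, body

url, body = sql_request(
    "http://localhost:8983/solr",   # assumed local Solr base URL, as in the slides
    "techproducts",
    "select id from techproducts",
    numWorkers=4,
    workerCollection="collection1",
    workerZkhost="localhost:9983",
    aggregationMode="map_reduce",
)
print(url)
print(body)
```

The resulting body can be passed to any HTTP client as an `application/x-www-form-urlencoded` POST payload; encoding matters as soon as the statement contains characters such as `=`, `&`, or quotes.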
  • 24. The WHERE clause
      • WHERE clauses are all pushed down to the search layer
      select id
      where popularity=10                                         // simple match on numeric field "popularity"
      where popularity='[5 TO 10]'                                // solr range query (note the quotes)
      where name='hard drive'                                     // phrase query on the "name" field
      where name='((memory retail) AND popularity:[5 TO 10])'     // arbitrary solr query
      where name='(memory retail)' AND popularity='[5 TO 10]'     // boolean logic
  • 25. Ordering and Limiting
      select id, score from techproducts
        where text='(memory hard drive)'
        ORDER BY popularity desc     // default order is score desc for limited queries
        LIMIT 100
      • Limited queries use the /select handler
      • Unlimited queries use the /export handler
          • fields selected need to be docValues
          • fields in "order by" need to be docValues
          • no "score" field allowed
  • 26. More SQL examples
      select distinct fieldA as fa, fieldB as fb
        from tableA
        order by fa desc, fb desc

      // simple stats
      select count(fieldA) as count, sum(fieldB) as sum
        from tableA
        where fieldC = 'Hello'

      select fieldA, fieldB, count(*), sum(fieldC), avg(fieldY)
        from tableA
        where fieldC = 'term1 term2'
        group by fieldA, fieldB
        having ((sum(fieldC) > 1000) AND (avg(fieldY) <= 10))
        order by sum(fieldC) asc
  • 27. Solr JDBC Driver
  • 28. Solr JDBC driver works with Zeppelin
  • 29. More Solr 6 Features
  • 30. Graph Query
      • Basic (non-distributed) graph traversal query
      • Follows nodes to edges, optionally filtering during traversal
      • Currently only a "filter" query (produces a set of documents)
      • Parameters: from, to, traversalFilter, returnRoot, returnOnlyLeaf, maxDepth
      • This example query matches "Philip J. Fry" and all of his ancestors:
      fq={!graph from=parent_id to=id}id:"Philip J. Fry"
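The traversal in the example query can be pictured as a breadth-first walk: start from the documents matching the root query, collect their "from" field values, and match documents whose "to" field holds one of those values. The Python sketch below simulates that reading over an in-memory list; it is an interpretation of the slide rather than Solr's implementation, the parent and sibling documents are invented, and `max_depth=None` stands in for an unlimited depth.

```python
def graph_query(docs, root_ids, from_field, to_field, max_depth=None):
    """Return the ids of the root docs plus every doc reached by following from -> to."""
    frontier = [d for d in docs if d["id"] in root_ids]
    matched = {d["id"] for d in frontier}
    depth = 0
    while frontier and (max_depth is None or depth < max_depth):
        # Collect the edge values leaving the current frontier (parent ids in this example).
        edge_values = {v for d in frontier for v in d.get(from_field, [])}
        # Follow each edge to documents whose `to` field matches and that are not yet visited.
        frontier = [d for d in docs
                    if d.get(to_field) in edge_values and d["id"] not in matched]
        matched.update(d["id"] for d in frontier)
        depth += 1
    return matched

# Hypothetical family tree; only the root id comes from the slide.
docs = [
    {"id": "Philip J. Fry", "parent_id": ["parent-1", "parent-2"]},
    {"id": "parent-1", "parent_id": ["grandparent-1"]},
    {"id": "parent-2", "parent_id": []},
    {"id": "grandparent-1", "parent_id": []},
    {"id": "sibling-1", "parent_id": ["parent-1", "parent-2"]},
]
ancestors = graph_query(docs, {"Philip J. Fry"}, from_field="parent_id", to_field="id")
parents_only = graph_query(docs, {"Philip J. Fry"}, "parent_id", "id", max_depth=1)
```

The sibling shares both parents but is never reached, because edges are followed only from a document's own parent_id values toward matching id values.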
  • 31. Scoring changes
      • For docCount (i.e. idf) in scoring, use the number of documents with that field rather than the number of documents in the whole index (maxDoc)
          • can add documents of a different type and not disturb/skew scoring
      • BM25 scoring by default
          • tweakable on a per-fieldType basis ("k1" and "b" factors)
      • classic tf-idf still available
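To make the roles of docCount, "k1", and "b" concrete, here is a Python sketch of the BM25 formula in the form commonly attributed to Lucene's BM25 similarity. It is a simplified illustration: the function name and the sample numbers are hypothetical, 1.2 and 0.75 are the commonly used default values for k1 and b, and details such as how document length is stored are not covered by the slides and are ignored here.

```python
import math

def bm25_term_score(freq, doc_len, avg_doc_len, doc_freq, doc_count, k1=1.2, b=0.75):
    """Score one term in one document with the standard BM25 formula.

    doc_count is the number of documents considered for idf; the slide says Solr 6
    uses the number of documents that have the field rather than maxDoc.
    """
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = 1 - b + b * doc_len / avg_doc_len   # b controls length normalization
    return idf * freq * (k1 + 1) / (freq + k1 * length_norm)

# Same term statistics, but idf computed against two different document counts:
# 100 documents that have the field versus a hypothetical maxDoc of 10000.
per_field = bm25_term_score(freq=2, doc_len=100, avg_doc_len=100, doc_freq=10, doc_count=100)
whole_index = bm25_term_score(freq=2, doc_len=100, avg_doc_len=100, doc_freq=10, doc_count=10000)
print(per_field, whole_index)
```

The two scores differ only through idf, which shows how counting unrelated documents would inflate scores for the field, and why using the per-field document count keeps scoring stable when documents of a different type are added.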
  • 32. Cross DC Replication
  • 33. Thank you. yonik@cloudera.com