SlideShare une entreprise Scribd logo
1  sur  42
Télécharger pour lire hors ligne
O C T O B E R 1 3 - 1 6 , 2 0 1 6 • A U S T I N , T X
Lessons from Sharding Solr at Etsy
Gregg Donovan
@greggdonovan
Senior Software Engineer, etsy.com
• 5.5 Years Solr & Lucene at Etsy.com
• 3 Years Solr & Lucene at TheLadders.com
• Speaker at LuceneRevolution 2011 & 2013
Jeff Dean, Challenges in Building Large-Scale Information Retrieval Systems
Jeff Dean, Challenges in Building Large-Scale Information Retrieval Systems
1.5Million Active Shops
32Million Items Listed
21.7Million Active Buyers
Agenda
• Sharding Solr at Etsy V0 — No sharding
• Sharding Solr at Etsy V1 — Local sharding
• Sharding Solr at Etsy V2 (*) — Distributed sharding
• Questions
* —What we’re about to launch.
Sharding V0 — Not Sharding
• Why do we shard?
• Data size grows beyond RAM on a single box
• Lucene can handle this, but there’s a performance cost
• Data size grows beyond local disk
• Latency requirements
• Not sharding allowed us to avoid many problems we’ll discuss later.
Sharding V0 — Not Sharding
• How to keep data size small enough for one host?
• Don’t store anything other than IDs
• fl=pk_id,fk_id,score
• Keep materialized objects in memcached
• Only index fields needed
• Prune index after experiments add fields
• Get more RAM
Sharding V0 — Not Sharding
• How does it fail?
• GC
• Solution
• “Banner” protocol
• Client-side load balancer
• Client connects, waits for 4-bytes — OxCODEA5CF— from the server within 1-10ms before
sending query. Otherwise, try another server.
Sharding V1 — Local Sharding
• Motivations
• Better latency
• Smaller JVMs
• Tough to open a 31gb heap dump on your laptop
• Working set still fit in RAM on one box.
• What’s the simplest system we can built?
Sharding V1 — Local Sharding
• Lucene parallelism
• Shikhar Bhushan at Etsy experimented with segment level parallelism
• See Search-time Parallelism at Lucene Revolution 2014
• Made its way into LUCENE-6294 (Generalize how IndexSearcher parallelizes collection
execution). Committed in Lucene 5.1.
• Ended up with eight Solr shards per host, each in its own small JVM
• Moved query generation and re-ranking to separate process: the “mixer”
Sharding V1 — Local Sharding
• Based on Solr distributed search
• By default, Solr does two-pass distributed search
• First pass gets top IDs
• Second pass fetches stored fields for each top document
• Implemented distrib.singlePass mode (SOLR-5768)
• Does not make sense if individual documents are expensive to fetch
• Basic request tracing via HTTP headers (SOLR-5969)
Sharding V1 — Local Sharding
• Required us to fetch 1000+ results from each shard for reranking layer
• How to efficiently fetch 1000 documents per shard?
• Use Solr’s field syntax to fetch data from FieldCache
• e.g. fl=pk_id:field(pk_id),fk_id:field(fk_id),score
• When all fields are “pseudo” fields, no need to fetch stored fields per document.
Sharding V1 — Local Sharding
• Result
• Very large latency win
• Easy system to manage
• Well understood failure and recovery
• Avoided solving many distributed systems issues
Sharding V2 — Distributed Sharding
• Motivation
• Further latency improvements
• Prepare for data to exceed a single node’s capacity
• Significant latency improvements require finer sharding, more CPUs per request
• Requires a real distributed system and sophisticated RPC
• Before proceeding, stop what you’re doing and read everything by Google’s Jeff Dean and
Twitter’s Marius Eriksen
Sharding V2 — Distributed Sharding
• New problems
• Partial failures
• Lagging shards
• Synchronizing cluster state and configuration
• Network partitions
• Jespen
• Distributed IDF issues exacerbated
Solving Distributed IDF
• Inverse Document Frequency (IDF) now varies across shards, biasing ranking
• Calculate IDF offline in Hadoop
• IDFReplacedSimilarityFactory
• Offline data populates cache of Map<BytesRef,Float> (term —> score)
• Override SimilarityFactory#idfExplain
• Cache misses given rare document constant
• Can be extended to solve i18n IDF issues
Sharding V2 — Distributed Sharding
• ShardHandler
• Solr’s abstraction for fanning out queries to shards
• Ships with default implementation (HttpShardHandler) based on HTTP 1.1
• Does fanout (distrib=true) and processes requests coming from other Solr nodes
(distrib=false).
• Reads shards.rows and shards.start parameters
ShardHandler API
Solr’s SearchHandler calls submit for each shard and then either takeCompletedIncludingErrors
or takeCompletedOrError depending on partial results tolerance.
public abstract class ShardHandler {

public abstract void checkDistributed(ResponseBuilder rb);


public abstract void submit(ShardRequest sreq, String shard, ModifiableSolrParams params);

public abstract ShardResponse takeCompletedIncludingErrors();

public abstract ShardResponse takeCompletedOrError();

public abstract void cancelAll();

public abstract ShardHandlerFactory getShardHandlerFactory();

}
Sharding V2 — Distributed Sharding
Distributed query requirements
• Distributed tracing
• E.g.: Google’s Dapper, Twitter’s Zipkin, Etsy’s CrossStich
• Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
• Handle node failures, slowness
Better Know Your Switches
Have a clear understanding of your networking requirements and whether your hardware meets
them.
• Prefer line-rate switches
• Prefer cut-through to store-and-forward
• No buffering, just read the IP packet header and move packet to the destination
• Track and graph switch statistics in the same dashboard you display your search latency stats
• errors, retransmits, etc.
Sharding V2 — Distributed Sharding
First experiment, Twitter’s Finagle
• Built on Netty
• Mux RPC multiplexing protocol
• SeeYour Server as a Function by Marius Eriksen
• Built-in support for Zipkin distributed tracing
• Served as inspiration for Facebook’s futures-based RPC Wangle
• Implemented a FinagleShardHandler
Sharding V2 — Distributed Sharding
Second experiment, custom Thrift-based protocol
• Blocking I/O easier to integrate with SolrJ API
• Able to integrate our own distributed tracing
• LZ4 compression via a custom Thrift TTransport
Sharding V2 — Distributed Sharding
Future experiment: HTTP/2
• One TCP connection for all requests between two servers
• Libraries
• Square’s OkHttp
• Google’s gRpc
• Jetty client in 9.3+ — appears to be Solr’s choice
Sharding V2 — Distributed Sharding
Implementation note
• Separated fanout from individual request processing
• SolrJ client via an EmbeddedSolrServer containing empty RAM directory.
• Saves a network hop
• Makes shards easier to profile, tune
• Can return result to SolrJ without sending merged results over the network
Sharding V2 — Distributed Sharding
• Good
• Individual shard times demonstrate very low average latency
• Bad
• Overall p95, p99 nowhere near averages
• Why? Lagging shards due to GC, filterCache misses, etc.
• More shards means more chances to hit outliers
Sharding V2 — Distributed Sharding
• Solutions
• See The Tail at Scale by Jeff Dean, CACM 2013.
• Eliminate all sources of inter-host variability
• No filter or other cache misses
• No GC
• Eliminate OS pauses, networking hiccups, deploys, restarts, etc.
• Not realistic
Sharding V2 — Distributed Sharding
• Backup Requests
• Methods
• Brute force — send two copies of every request to different hosts, take the fastest
response
• Less crude — wait X milliseconds for the first server to respond, then send a backup
request.
• Adaptive — choose X based on the first Y% of responses to return.
• Cancellation — Cancel the slow request to save CPU once you’re sure you don’t need it.
Sharding V2 — Distributed Sharding
• “Good enough”
• Return results to user after X% of results return if there are enough results. Don’t issue
backup requests, just cancel laggards.
• Only applicable in certain domains.
• Poses questions:
• Should you cache partial results?
• How is paging effected?
Resilience Testing
Now you own a distributed system. How do you know it works?
• “The Troublemaker”
• Inspired by Netflix’s Chaos Monkey
• Authored by Etsy’s Toria Gibbs
• Make sure humans can operate it
• Failure simulation — don’t wait until 3am
• Gameday exercises and Runbooks
Bonus material!
Better Know Your Kernel
A lesson not about sharding learned while sharding…
• Linux’s futex_wait() was broken in CentOS 6.6
• Backported patches needed from Linux 3.18
• Future direction: make kernel updates independent from distribution updates
• E.g. Plenty of good stuff (e.g. networking improvements, kernel introspection [see
@brendangregg]) between 3.10 and 4.2+, but it won’t come to CentOS for years
• Updating kernel alone easier to roll out
What else are we working on?
• Mesos for cluster orchestration
• GPUs for massive increases in per query computational capacity
Thanks for coming.
gregg@etsy.com
@greggdonovan
Questions?
@greggdonovan
gregg@etsy.com

Contenu connexe

Tendances

Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...
Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...
Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...Lucidworks
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrRahul Jain
 
Searching for Better Code: Presented by Grant Ingersoll, Lucidworks
Searching for Better Code: Presented by Grant Ingersoll, LucidworksSearching for Better Code: Presented by Grant Ingersoll, Lucidworks
Searching for Better Code: Presented by Grant Ingersoll, LucidworksLucidworks
 
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...Lucidworks
 
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...Lucidworks
 
Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Rahul Jain
 
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBM
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBMBuilding and Running Solr-as-a-Service: Presented by Shai Erera, IBM
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBMLucidworks
 
Solrcloud Leader Election
Solrcloud Leader ElectionSolrcloud Leader Election
Solrcloud Leader Electionravikgiitk
 
Introduction to SolrCloud
Introduction to SolrCloudIntroduction to SolrCloud
Introduction to SolrCloudVarun Thacker
 
Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterLucidworks
 
Solr + Hadoop = Big Data Search
Solr + Hadoop = Big Data SearchSolr + Hadoop = Big Data Search
Solr + Hadoop = Big Data SearchMark Miller
 
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...Databricks
 
Data Engineering with Solr and Spark
Data Engineering with Solr and SparkData Engineering with Solr and Spark
Data Engineering with Solr and SparkLucidworks
 
Mail Search As A Sercive: Presented by Rishi Easwaran, Aol
Mail Search As A Sercive: Presented by Rishi Easwaran, AolMail Search As A Sercive: Presented by Rishi Easwaran, Aol
Mail Search As A Sercive: Presented by Rishi Easwaran, AolLucidworks
 
GIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big DataGIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big DataShalin Shekhar Mangar
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataRahul Jain
 
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis TechnologySimple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis TechnologyLucidworks
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudthelabdude
 
Eventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and Hadoop
Eventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and HadoopEventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and Hadoop
Eventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and HadoopAyon Sinha
 
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...Lucidworks
 

Tendances (20)

Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...
Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...
Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache Solr
 
Searching for Better Code: Presented by Grant Ingersoll, Lucidworks
Searching for Better Code: Presented by Grant Ingersoll, LucidworksSearching for Better Code: Presented by Grant Ingersoll, Lucidworks
Searching for Better Code: Presented by Grant Ingersoll, Lucidworks
 
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
 
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
 
Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )
 
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBM
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBMBuilding and Running Solr-as-a-Service: Presented by Shai Erera, IBM
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBM
 
Solrcloud Leader Election
Solrcloud Leader ElectionSolrcloud Leader Election
Solrcloud Leader Election
 
Introduction to SolrCloud
Introduction to SolrCloudIntroduction to SolrCloud
Introduction to SolrCloud
 
Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, Twitter
 
Solr + Hadoop = Big Data Search
Solr + Hadoop = Big Data SearchSolr + Hadoop = Big Data Search
Solr + Hadoop = Big Data Search
 
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
 
Data Engineering with Solr and Spark
Data Engineering with Solr and SparkData Engineering with Solr and Spark
Data Engineering with Solr and Spark
 
Mail Search As A Sercive: Presented by Rishi Easwaran, Aol
Mail Search As A Sercive: Presented by Rishi Easwaran, AolMail Search As A Sercive: Presented by Rishi Easwaran, Aol
Mail Search As A Sercive: Presented by Rishi Easwaran, Aol
 
GIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big DataGIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big Data
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
 
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis TechnologySimple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloud
 
Eventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and Hadoop
Eventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and HadoopEventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and Hadoop
Eventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and Hadoop
 
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
 

Similaire à Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy

Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudAnshum Gupta
 
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Lucidworks
 
AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...
AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...
AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...Amazon Web Services
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Newlink
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Newlink
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640LLC NewLink
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Newlink
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Newlink
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterJohn Adams
 
What's new in Solr 5.0
What's new in Solr 5.0What's new in Solr 5.0
What's new in Solr 5.0Anshum Gupta
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInLinkedIn
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitterRoger Xia
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...xlight
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudyJohn Adams
 
VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012Eonblast
 
Spil Storage Platform (Erlang) @ EUG-NL
Spil Storage Platform (Erlang) @ EUG-NLSpil Storage Platform (Erlang) @ EUG-NL
Spil Storage Platform (Erlang) @ EUG-NLThijs Terlouw
 
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?DATAVERSITY
 
Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraJon Haddad
 

Similaire à Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy (20)

Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloud
 
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
 
AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...
AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...
AWS re:Invent 2016| GAM302 | Sony PlayStation: Breaking the Bandwidth Barrier...
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640
 
Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640Xen and-the-art-of-rails-deployment2640
Xen and-the-art-of-rails-deployment2640
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling Twitter
 
What's new in Solr 5.0
What's new in Solr 5.0What's new in Solr 5.0
What's new in Solr 5.0
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012
 
Spil Storage Platform (Erlang) @ EUG-NL
Spil Storage Platform (Erlang) @ EUG-NLSpil Storage Platform (Erlang) @ EUG-NL
Spil Storage Platform (Erlang) @ EUG-NL
 
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
 
Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - Cassandra
 

Plus de Lucidworks

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategyLucidworks
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceLucidworks
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsLucidworks
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesLucidworks
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Lucidworks
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...Lucidworks
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Lucidworks
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Lucidworks
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteLucidworks
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentLucidworks
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeLucidworks
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Lucidworks
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchLucidworks
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Lucidworks
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyLucidworks
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Lucidworks
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceLucidworks
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchLucidworks
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondLucidworks
 

Plus de Lucidworks (20)

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 

Dernier

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 

Dernier (20)

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 

Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy

  • 1. O C T O B E R 1 3 - 1 6 , 2 0 1 6 • A U S T I N , T X
  • 2. Lessons from Sharding Solr at Etsy Gregg Donovan @greggdonovan Senior Software Engineer, etsy.com
  • 3. • 5.5 Years Solr & Lucene at Etsy.com • 3 Years Solr & Lucene at TheLadders.com • Speaker at LuceneRevolution 2011 & 2013
  • 4. Jeff Dean, Challenges in Building Large-Scale Information Retrieval Systems
  • 5. Jeff Dean, Challenges in Building Large-Scale Information Retrieval Systems
  • 6.
  • 10. Agenda • Sharding Solr at Etsy V0 — No sharding • Sharding Solr at Etsy V1 — Local sharding • Sharding Solr at Etsy V2 (*) — Distributed sharding • Questions * —What we’re about to launch.
  • 11. Sharding V0 — Not Sharding • Why do we shard? • Data size grows beyond RAM on a single box • Lucene can handle this, but there’s a performance cost • Data size grows beyond local disk • Latency requirements • Not sharding allowed us to avoid many problems we’ll discuss later.
  • 12. Sharding V0 — Not Sharding • How to keep data size small enough for one host? • Don’t store anything other than IDs • fl=pk_id,fk_id,score • Keep materialized objects in memcached • Only index fields needed • Prune index after experiments add fields • Get more RAM
  • 13.
  • 14. Sharding V0 — Not Sharding • How does it fail? • GC • Solution • “Banner” protocol • Client-side load balancer • Client connects, waits for 4-bytes — OxCODEA5CF— from the server within 1-10ms before sending query. Otherwise, try another server.
  • 15. Sharding V1 — Local Sharding • Motivations • Better latency • Smaller JVMs • Tough to open a 31gb heap dump on your laptop • Working set still fit in RAM on one box. • What’s the simplest system we can built?
  • 16. Sharding V1 — Local Sharding • Lucene parallelism • Shikhar Bhushan at Etsy experimented with segment level parallelism • See Search-time Parallelism at Lucene Revolution 2014 • Made its way into LUCENE-6294 (Generalize how IndexSearcher parallelizes collection execution). Committed in Lucene 5.1. • Ended up with eight Solr shards per host, each in its own small JVM • Moved query generation and re-ranking to separate process: the “mixer”
  • 17. Sharding V1 — Local Sharding • Based on Solr distributed search • By default, Solr does two-pass distributed search • First pass gets top IDs • Second pass fetches stored fields for each top document • Implemented distrib.singlePass mode (SOLR-5768) • Does not make sense if individual documents are expensive to fetch • Basic request tracing via HTTP headers (SOLR-5969)
  • 18. Sharding V1 — Local Sharding • Required us to fetch 1000+ results from each shard for reranking layer • How to efficiently fetch 1000 documents per shard? • Use Solr’s field syntax to fetch data from FieldCache • e.g. fl=pk_id:field(pk_id),fk_id:field(fk_id),score • When all fields are “pseudo” fields, no need to fetch stored fields per document.
  • 19. Sharding V1 — Local Sharding • Result • Very large latency win • Easy system to manage • Well understood failure and recovery • Avoided solving many distributed systems issues
  • 20. Sharding V2 — Distributed Sharding • Motivation • Further latency improvements • Prepare for data to exceed a single node’s capacity • Significant latency improvements require finer sharding, more CPUs per request • Requires a real distributed system and sophisticated RPC • Before proceeding, stop what you’re doing and read everything by Google’s Jeff Dean and Twitter’s Marius Eriksen
  • 21. Sharding V2 — Distributed Sharding • New problems • Partial failures • Lagging shards • Synchronizing cluster state and configuration • Network partitions • Jespen • Distributed IDF issues exacerbated
  • 22. Solving Distributed IDF • Inverse Document Frequency (IDF) now varies across shards, biasing ranking • Calculate IDF offline in Hadoop • IDFReplacedSimilarityFactory • Offline data populates cache of Map<BytesRef,Float> (term —> score) • Override SimilarityFactory#idfExplain • Cache misses given rare document constant • Can be extended to solve i18n IDF issues
  • 23. Sharding V2 — Distributed Sharding • ShardHandler • Solr’s abstraction for fanning out queries to shards • Ships with default implementation (HttpShardHandler) based on HTTP 1.1 • Does fanout (distrib=true) and processes requests coming from other Solr nodes (distrib=false). • Reads shards.rows and shards.start parameters
  • 24. ShardHandler API Solr’s SearchHandler calls submit for each shard and then either takeCompletedIncludingErrors or takeCompletedOrError depending on partial results tolerance. public abstract class ShardHandler {
 public abstract void checkDistributed(ResponseBuilder rb); 
 public abstract void submit(ShardRequest sreq, String shard, ModifiableSolrParams params);
 public abstract ShardResponse takeCompletedIncludingErrors();
 public abstract ShardResponse takeCompletedOrError();
 public abstract void cancelAll();
 public abstract ShardHandlerFactory getShardHandlerFactory();
 }
  • 25. Sharding V2 — Distributed Sharding Distributed query requirements • Distributed tracing • E.g.: Google’s Dapper, Twitter’s Zipkin, Etsy’s CrossStich • Dapper, a Large-Scale Distributed Systems Tracing Infrastructure • Handle node failures, slowness
  • 26.
  • 27. Better Know Your Switches Have a clear understanding of your networking requirements and whether your hardware meets them. • Prefer line-rate switches • Prefer cut-through to store-and-forward • No buffering, just read the IP packet header and move packet to the destination • Track and graph switch statistics in the same dashboard you display your search latency stats • errors, retransmits, etc.
  • 28. Sharding V2 — Distributed Sharding First experiment, Twitter’s Finagle • Built on Netty • Mux RPC multiplexing protocol • SeeYour Server as a Function by Marius Eriksen • Built-in support for Zipkin distributed tracing • Served as inspiration for Facebook’s futures-based RPC Wangle • Implemented a FinagleShardHandler
  • 29. Sharding V2 — Distributed Sharding Second experiment, custom Thrift-based protocol • Blocking I/O easier to integrate with SolrJ API • Able to integrate our own distributed tracing • LZ4 compression via a custom Thrift TTransport
  • 30. Sharding V2 — Distributed Sharding Future experiment: HTTP/2 • One TCP connection for all requests between two servers • Libraries • Square’s OkHttp • Google’s gRpc • Jetty client in 9.3+ — appears to be Solr’s choice
  • 31. Sharding V2 — Distributed Sharding Implementation note • Separated fanout from individual request processing • SolrJ client via an EmbeddedSolrServer containing empty RAM directory. • Saves a network hop • Makes shards easier to profile, tune • Can return result to SolrJ without sending merged results over the network
  • 32. Sharding V2 — Distributed Sharding • Good • Individual shard times demonstrate very low average latency • Bad • Overall p95, p99 nowhere near averages • Why? Lagging shards due to GC, filterCache misses, etc. • More shards means more chances to hit outliers
  • 33. Sharding V2 — Distributed Sharding • Solutions • See The Tail at Scale by Jeff Dean, CACM 2013. • Eliminate all sources of inter-host variability • No filter or other cache misses • No GC • Eliminate OS pauses, networking hiccups, deploys, restarts, etc. • Not realistic
  • 34. Sharding V2 — Distributed Sharding • Backup Requests • Methods • Brute force — send two copies of every request to different hosts, take the fastest response • Less crude — wait X milliseconds for the first server to respond, then send a backup request. • Adaptive — choose X based on the first Y% of responses to return. • Cancellation — Cancel the slow request to save CPU once you’re sure you don’t need it.
  • 35. Sharding V2 — Distributed Sharding • “Good enough” • Return results to user after X% of results return if there are enough results. Don’t issue backup requests, just cancel laggards. • Only applicable in certain domains. • Poses questions: • Should you cache partial results? • How is paging effected?
  • 36. Resilience Testing Now you own a distributed system. How do you know it works? • “The Troublemaker” • Inspired by Netflix’s Chaos Monkey • Authored by Etsy’s Toria Gibbs • Make sure humans can operate it • Failure simulation — don’t wait until 3am • Gameday exercises and Runbooks
  • 38. Better Know Your Kernel A lesson not about sharding learned while sharding… • Linux’s futex_wait() was broken in CentOS 6.6 • Backported patches needed from Linux 3.18 • Future direction: make kernel updates independent from distribution updates • E.g. Plenty of good stuff (e.g. networking improvements, kernel introspection [see @brendangregg]) between 3.10 and 4.2+, but it won’t come to CentOS for years • Updating kernel alone easier to roll out
  • 39. What else are we working on? • Mesos for cluster orchestration • GPUs for massive increases in per query computational capacity