Benchmarking Solr Performance

SFBay Area Solr Meetup - June 18th.

1. Benchmarking Solr Performance
   Timothy Potter, June 18, 2014
   Search | Discover | Analyze
   Confidential and Proprietary © Copyright 2013
2. My SolrCloud Experience
   • At LucidWorks, mostly focused on hardening SolrCloud; Lucene/Solr committer
   • Operated a 36-node cluster in AWS for Dachis Group (1.5 years ago; 18 shards, ~900M docs)
   • Built a Fabric/boto framework for deploying and managing a cluster in EC2
   • Co-author of Solr in Action
3. Agenda
   • Indexing performance tests
   • Solr Scale Toolkit
   • Next steps
4. Cluster sizing
   How many servers do I need to index X docs? How many shards? How many replicas?
   I need N queries per second over M docs; how many servers do I need?
   It depends?!?
5. Methodology
   • Transparent, repeatable results
     – Ideally hoping for something owned by the community
   • Synthetic docs, ~1 KB each on disk, with a mix of field types
     – Data set created using code borrowed from PigMix
     – English text fields generated using a Zipfian distribution
   • Java 1.7u55, Amazon Linux, r3.2xlarge nodes
     – Enhanced networking enabled, placement group, same AZ
   • Stock Solr 4.8.1 (SolrCloud)
     – Using Shawn Heisey’s GC tuning parameters
   • Use Elastic MapReduce to generate load
     – As many nodes as I need to drive Solr!
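The slide does not include the PigMix-derived generator. To illustrate what "English text fields generated using a Zipfian distribution" means, here is a minimal Python sketch (not the benchmark's code) that draws words so the k-th most common word has probability proportional to 1/k^s. The vocabulary, the exponent `s`, and the function names are illustrative assumptions.

```python
from __future__ import annotations

import random
from itertools import accumulate

def zipf_weights(vocab_size: int, s: float = 1.0) -> list[float]:
    """Unnormalized Zipf weights: rank k (1-based) gets weight 1 / k**s."""
    return [1.0 / (k ** s) for k in range(1, vocab_size + 1)]

def zipfian_text(vocab: list[str], n_words: int, s: float = 1.0,
                 rng: random.Random | None = None) -> str:
    """Return n_words words from vocab; vocab[0] is treated as the most frequent word."""
    rng = rng or random.Random()
    # Precompute cumulative weights once instead of letting choices() redo it.
    cum_weights = list(accumulate(zipf_weights(len(vocab), s)))
    return " ".join(rng.choices(vocab, cum_weights=cum_weights, k=n_words))

if __name__ == "__main__":
    # Hypothetical vocabulary, ordered from most to least frequent.
    vocab = ["the", "of", "and", "solr", "index", "shard", "replica", "leader"]
    print(zipfian_text(vocab, 15, rng=random.Random(42)))
```

With `s = 1.0`, the first word appears about twice as often as the second and ten times as often as the tenth, which is the long-tailed shape real English text tends to have.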
6. Indexing Results
   Cluster size  Shards  Replicas  Reducers  Time (secs)  Docs/sec
   10            10      1         48        1762         73,780
   10            10      2         34        3727         34,881
   10            20      1         48        1282         101,404
   10            20      2         34        3207         40,536
   10            30      1         72        1070         121,495
   10            30      2         60        3159         41,152
   15            15      1         60        1106         117,541
   15            15      2         42        2465         52,738
   15            30      1         60        827          157,195
   15            30      2         42        2129         61,062
   (Reducers: number of Elastic MapReduce reducers sending documents to Solr.)
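The table can be sanity-checked with simple arithmetic: Time × Docs/sec gives the number of documents indexed per run, and dividing the 1-replica rate by the 2-replica rate for the same cluster size and shard count gives the cost of adding a replica. A small sketch using only the numbers in the table:

```python
from __future__ import annotations

# Rows copied from the "Indexing Results" table:
# (cluster_size, shards, replicas, reducers, time_secs, docs_per_sec)
ROWS = [
    (10, 10, 1, 48, 1762, 73_780),
    (10, 10, 2, 34, 3727, 34_881),
    (10, 20, 1, 48, 1282, 101_404),
    (10, 20, 2, 34, 3207, 40_536),
    (10, 30, 1, 72, 1070, 121_495),
    (10, 30, 2, 60, 3159, 41_152),
    (15, 15, 1, 60, 1106, 117_541),
    (15, 15, 2, 42, 2465, 52_738),
    (15, 30, 1, 60, 827, 157_195),
    (15, 30, 2, 42, 2129, 61_062),
]

def implied_doc_count(time_secs: int, docs_per_sec: int) -> int:
    """Documents indexed in one run = elapsed seconds x average docs/sec."""
    return time_secs * docs_per_sec

def replication_slowdown(rows) -> dict[tuple[int, int], float]:
    """For each (cluster_size, shards) pair: docs/sec with 1 replica / docs/sec with 2."""
    rates: dict[tuple[int, int], dict[int, int]] = {}
    for size, shards, replicas, _reducers, _secs, rate in rows:
        rates.setdefault((size, shards), {})[replicas] = rate
    return {key: r[1] / r[2] for key, r in rates.items() if 1 in r and 2 in r}

if __name__ == "__main__":
    for size, shards, replicas, _, secs, rate in ROWS:
        print(f"{size} nodes, {shards} shards, {replicas} replica(s): "
              f"{implied_doc_count(secs, rate):,} docs")
    for (size, shards), ratio in sorted(replication_slowdown(ROWS).items()):
        print(f"{size} nodes / {shards} shards: {ratio:.2f}x faster with 1 replica")
```

Every row works out to roughly 130 million documents, so all runs appear to have indexed the same data set, and going from 1 to 2 replicas reduced throughput by about 2.1x to 3.0x. Note that the 2-replica runs also used fewer reducers, so those ratios mix the cost of replication with a smaller client-side load.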
7. Direct Updates
   [Diagram] Indexing Client 1 uses CloudSolrServer (SolrJ), which watches /clusterstate.json in ZooKeeper. The client computes the shard assignment for each batch and sends each <doc> directly to the leader of its shard (Shard 1, Shard 2, Shard 3).
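CloudSolrServer can send each document straight to the right leader because it knows each shard's hash range from the cluster state. Solr's default compositeId router hashes the document id with MurmurHash3 and picks the shard whose range contains the result. The sketch below shows the idea only: it substitutes `zlib.crc32` for MurmurHash3 and splits an unsigned 32-bit space evenly, so its assignments will not match a real Solr cluster; all function names are hypothetical.

```python
from __future__ import annotations

import zlib

def shard_ranges(num_shards: int) -> list[tuple[int, int]]:
    """Split the unsigned 32-bit hash space into contiguous, inclusive ranges."""
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    space = 1 << 32
    step = space // num_shards
    return [
        (i * step, space - 1 if i == num_shards - 1 else (i + 1) * step - 1)
        for i in range(num_shards)
    ]

def route(doc_id: str, ranges: list[tuple[int, int]]) -> int:
    """Index of the shard whose hash range contains hash(doc_id)."""
    # Stand-in for Solr's MurmurHash3; crc32 is unsigned in Python 3.
    h = zlib.crc32(doc_id.encode("utf-8"))
    for i, (lo, hi) in enumerate(ranges):
        if lo <= h <= hi:
            return i
    raise ValueError("ranges do not cover the hash space")

def batch_by_shard(doc_ids: list[str], num_shards: int) -> dict[int, list[str]]:
    """Split one client-side batch into per-shard batches, one per shard leader."""
    ranges = shard_ranges(num_shards)
    batches: dict[int, list[str]] = {i: [] for i in range(num_shards)}
    for doc_id in doc_ids:
        batches[route(doc_id, ranges)].append(doc_id)
    return batches
```

Because routing happens on the client, no Solr node has to spend time forwarding documents to another shard's leader; each per-shard batch goes directly to the node that owns it.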
8. Replication
   [Diagram] CloudSolrServer (SolrJ) watches /clusterstate.json in ZooKeeper and sends each <doc> to its shard leader (Shard 1, 2, 3). Each leader forwards the update to its replica (Shard 1, 2, 3 replica) and blocks until the replica(s) respond.
9. Don’t swamp your servers!
10. Lessons Learned
    • Know what throughput your client side is capable of generating
      – If in MapReduce, index from reducers with speculative execution disabled (otherwise duplicate task attempts send the same documents twice)
    • Don’t change the Solr config without a good reason
    • Overshard (but not too much)
    • Near-linear scalability as I added nodes!
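The first lesson can be checked before Solr is involved: time how fast one client builds documents, then work out how many clients (for example, reducers) are needed to offer a target load. In this sketch `hypothetical_doc` stands in for real document-building code, and the function names are assumptions. The measured rate is an upper bound, since serializing and sending updates adds cost; the 157,195 docs/sec target in the usage lines is the best result from the table.

```python
from __future__ import annotations

import math
import time
from typing import Callable

def measure_client_rate(make_doc: Callable[[int], dict], n_docs: int) -> float:
    """Docs/sec a single client can produce on its own, with no Solr request sent."""
    start = time.perf_counter()
    for i in range(n_docs):
        make_doc(i)
    elapsed = time.perf_counter() - start
    return n_docs / elapsed if elapsed > 0 else float("inf")

def clients_needed(target_docs_per_sec: float, per_client_docs_per_sec: float) -> int:
    """Minimum number of client processes needed to offer the target load."""
    return math.ceil(target_docs_per_sec / per_client_docs_per_sec)

def hypothetical_doc(i: int) -> dict:
    # Hypothetical document builder; replace with your real generation/parsing code.
    return {"id": f"doc-{i}", "title_t": f"synthetic title {i}", "rank_i": i % 100}

if __name__ == "__main__":
    rate = measure_client_rate(hypothetical_doc, 100_000)
    print(f"one client generates ~{rate:,.0f} docs/sec")
    print("clients needed for 157,195 docs/sec:", clients_needed(157_195, rate))
```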
11. Query Performance Tests
    • All nodes in SolrCloud perform indexing and execute queries
    • Use the TermsComponent to build queries based on the terms in each field
    • It is harder to simulate user queries accurately over synthetic data
      – You need a mix of faceting, paging, sorting, grouping, boolean clauses, range queries, boosting, filters (some cached, some not), etc.
    • Does the randomness in your test queries model (expected) user behavior?
    • Start with one server (1 shard) to determine baseline query performance
      – Look for inefficiencies in your schema and other config settings
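A minimal sketch of the TermsComponent approach: request the most frequent terms of a field, then combine random field:term pairs into test queries. It assumes the collection exposes a `/terms` request handler backed by the TermsComponent and that the JSON response uses Solr's default flat list format (`[term, count, term, count, ...]`). The core URL, field names, and function names are hypothetical, and the HTTP fetch itself is left out.

```python
from __future__ import annotations

import random
from urllib.parse import urlencode

def terms_request_url(core_url: str, field: str, limit: int = 100) -> str:
    """TermsComponent request listing the most frequent terms in one field."""
    params = {"terms.fl": field, "terms.limit": limit, "terms.sort": "count", "wt": "json"}
    return f"{core_url.rstrip('/')}/terms?{urlencode(params)}"

def parse_terms(response: dict, field: str) -> list[str]:
    """Terms arrive as a flat [term, count, term, count, ...] list; keep the terms."""
    return response["terms"][field][0::2]

def quote_term(term: str) -> str:
    # Escape backslashes and double quotes, then wrap the term as a phrase.
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'

def random_query(terms_by_field: dict[str, list[str]], n_clauses: int,
                 rng: random.Random) -> str:
    """Join n_clauses randomly chosen field:term clauses with AND."""
    fields = sorted(terms_by_field)
    clauses = []
    for _ in range(n_clauses):
        field = rng.choice(fields)
        clauses.append(f"{field}:{quote_term(rng.choice(terms_by_field[field]))}")
    return " AND ".join(clauses)
```

As the slide warns, uniformly random term combinations are a crude model of real user behavior; weighting the choice by term frequency or mixing in facets, sorts, and filters gets closer.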
12. Solr Scale Toolkit
    • Fabric/Python-based toolset for deploying and managing SolrCloud clusters
    • SolrJ-based client application for building tools that need access to cluster state information in ZooKeeper
    • Code to support benchmarks for Solr
13. Python-based Tools
    • boto – Python API for AWS (EC2, S3, etc.)
    • Fabric – Python-based tool for automating system admin tasks over SSH
    • pysolr – Python library for Solr (sending commits, queries, ...)
    • kazoo – Python client tools for ZooKeeper
    Supporting cast:
    • JMeter – run tests, generate reports
    • collectd – system monitoring
    • Logstash4Solr – log aggregation
    • JConsole/VisualVM – monitor the JVM during indexing/queries
14. Solr Scale Toolkit: Demo
    • Launch a meta node
      – Log aggregation / basic monitoring using SiLK
    • Launch a ZooKeeper ensemble
      – 3 nodes to establish a quorum
      – Set up a cron job to clean up snapshots
    • Launch a SolrCloud cluster
    • Create a new collection and index some docs
      – Attach JConsole while indexing
    • Run a healthcheck on the collection
    • Check out the Banana dashboard
    • Backup / Restore
      – Requires the patch for SOLR-5956
      – Use fab patch_jars to update the JARs and do a rolling restart
15. Provisioning machines
    fab new_ec2_instances:test1,n=3,instance_type=m3.xlarge
    • Custom-built AMI?
    • Block device mapping
      – Dedicated disk per Solr node
    • Launch, then poll status until the instances are live
      – Verify SSH connectivity
    • Tag each instance with a cluster ID and username
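The "poll status until live, then verify SSH connectivity" step boils down to a retry loop with a deadline. The toolkit is built on boto for the EC2 side; this stdlib-only sketch covers just the loop and a TCP check of the SSH port. The function names, host names, and timeouts are placeholders.

```python
import socket
import time

def wait_until(predicate, timeout: float, interval: float = 5.0) -> bool:
    """Call predicate() until it returns True or timeout seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

def ssh_port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Cheap connectivity check: can we open a TCP connection to sshd?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, `wait_until(lambda: all(ssh_port_open(h) for h in hosts), timeout=600, interval=10)` waits up to ten minutes for every host in a hypothetical `hosts` list to accept connections on port 22. An open port only shows sshd is listening; a real check would still log in once.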
16. ZooKeeper
    fab new_zk_ensemble:zk1,n=3
    • Two options:
      – Provision 1 to N nodes when you launch the Solr cluster
      – Use an existing named ensemble
    • The Fabric command simply creates the myid files and the zoo.cfg file for the ensemble
      – and some cron scripts for managing snapshots
    • Basic health checking of ZooKeeper status:
      – echo srvr | nc localhost 2181
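The slide says the Fabric command only writes each node's myid file and a shared zoo.cfg. Below is a sketch of what those files contain, plus a Python equivalent of `echo srvr | nc localhost 2181` that pulls out the `Mode:` line (leader, follower, or standalone). The tickTime/initLimit/syncLimit values are the ones from ZooKeeper's sample configuration and 2888/3888 are the conventional quorum and election ports; the data directory, host names, and function names are assumptions, not the toolkit's settings.

```python
from __future__ import annotations

import socket

def render_zoo_cfg(hosts: list[str], data_dir: str = "/var/lib/zookeeper",
                   client_port: int = 2181) -> str:
    """zoo.cfg shared by every member of the ensemble."""
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
    ]
    # server.N=host:quorumPort:electionPort, one line per ensemble member.
    lines += [f"server.{i}={h}:2888:3888" for i, h in enumerate(hosts, start=1)]
    return "\n".join(lines) + "\n"

def myid_contents(hosts: list[str], host: str) -> str:
    """Each node's dataDir/myid holds the N from its own server.N line."""
    return f"{hosts.index(host) + 1}\n"

def parse_mode(srvr_output: str) -> str | None:
    """Return the value of the 'Mode:' line from srvr output, if present."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None

def zk_mode(host: str = "localhost", port: int = 2181, timeout: float = 3.0) -> str | None:
    """Send the 'srvr' four-letter command and read until ZooKeeper closes the socket."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while data := sock.recv(4096):
            chunks.append(data)
    return parse_mode(b"".join(chunks).decode("utf-8", "replace"))
```

A healthy three-node ensemble should report exactly one `leader` and two `follower` nodes.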
17. SolrCloud
    fab new_solrcloud:test1,zk=zk1,nodesPerHost=2
    • Upload a BASH script that starts/stops Solr
    • Set system properties: jetty.port, host, zkHost, JVM opts
    • One or more Solr nodes per machine
    • JVM memory options depend on the instance type and the number of Solr nodes per instance
    • Optionally configure log4j.properties to append messages to RabbitMQ for Logstash4Solr integration
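JVM memory options depend on the instance type and the number of nodes per host, and each node on a machine needs its own jetty.port. The sketch below derives per-node arguments from those inputs. The splitting policy (half of RAM for heaps, divided evenly, the rest left to the OS page cache), the starting port 8984, and the function name are hypothetical; they are not the toolkit's actual formula.

```python
from __future__ import annotations

def solr_node_args(ram_gb: float, nodes_per_host: int, host: str, zk_host: str,
                   base_port: int = 8984, heap_fraction: float = 0.5) -> list[list[str]]:
    """Build per-node JVM args for the Solr nodes sharing one machine (hypothetical policy).

    heap_fraction of the machine's RAM is split evenly across the node heaps; the rest
    is left to the OS page cache, which Lucene relies on for reading index files.
    """
    if nodes_per_host < 1:
        raise ValueError("nodes_per_host must be >= 1")
    heap_mb = int(ram_gb * 1024 * heap_fraction / nodes_per_host)
    return [
        [
            f"-Xms{heap_mb}m",                # fixed-size heap: min == max
            f"-Xmx{heap_mb}m",
            f"-Djetty.port={base_port + i}",  # one port per node on this host
            f"-Dhost={host}",
            f"-DzkHost={zk_host}",
        ]
        for i in range(nodes_per_host)
    ]

if __name__ == "__main__":
    for args in solr_node_args(30, 2, "10.0.0.5", "zk1:2181,zk2:2181,zk3:2181"):
        print(" ".join(args))
```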
18. solr-ctl.sh
    • BASH script that:
      – starts/stops Solr nodes on each EC2 instance
      – sets JVM memory options and system properties (jetty.port), enables remote JMX, etc.
      – backs up log files before restarting nodes
      – ensures the JVM is killed correctly before restarting
    • Environment variables are set in solr-ctl-env.sh
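"Ensure the JVM is killed correctly" is usually the terminate-then-kill pattern: ask the process to exit, wait a bounded time, and only then force it. solr-ctl.sh does this in BASH; the sketch below shows the same idea in Python, assuming we started the process ourselves and hold its `Popen` handle (a control script would typically track a PID instead). The function name and grace period are hypothetical.

```python
import subprocess
import sys

def stop_process(proc: subprocess.Popen, grace_secs: float = 30.0) -> int:
    """SIGTERM first; if the process is still alive after grace_secs, SIGKILL it.

    Returns the exit status (a negative signal number on POSIX when killed by a signal).
    """
    if proc.poll() is not None:      # already exited
        return proc.returncode
    proc.terminate()                 # SIGTERM on POSIX: the JVM runs its shutdown hooks
    try:
        return proc.wait(timeout=grace_secs)
    except subprocess.TimeoutExpired:
        proc.kill()                  # SIGKILL: cannot be caught or ignored
        return proc.wait()

if __name__ == "__main__":
    # Stand-in for a Solr JVM: a child process that just sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit status:", stop_process(child, grace_secs=10))
```

Waiting for the old process to be gone before starting a new one avoids two JVMs competing for the same port and index directory.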
19. Miscellaneous Utility Tasks
    • Deploy a configuration directory to ZooKeeper
    • Create a new collection
    • Attach a local JConsole/VisualVM to a remote JVM
    • Rolling restart (with Overseer awareness)
    • Build Solr locally and patch remote nodes
      – Use a relay server: scp the JARs into the Amazon network once, then scp them to the other nodes from within the network
    • Put/get files
    • Grep over all log files (across the cluster)
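One way to read "rolling restart with Overseer awareness": restart nodes one at a time, waiting for each to recover, and restart the node currently hosting the Overseer last so the Overseer role has to move only once. This is a sketch under that assumption; `restart` and `wait_until_active` are hypothetical callbacks (for example, Fabric tasks), and discovering the current Overseer is left out.

```python
from __future__ import annotations

from typing import Callable

def rolling_restart_order(nodes: list[str], overseer: str | None) -> list[str]:
    """All non-Overseer nodes first, the current Overseer node last."""
    others = [n for n in nodes if n != overseer]
    return others + ([overseer] if overseer in nodes else [])

def rolling_restart(nodes: list[str], overseer: str | None,
                    restart: Callable[[str], None],
                    wait_until_active: Callable[[str], bool]) -> None:
    """Restart one node at a time, waiting for it to recover before moving on."""
    for node in rolling_restart_order(nodes, overseer):
        restart(node)
        if not wait_until_active(node):
            # Stop early rather than take down more replicas while one is still missing.
            raise RuntimeError(f"{node} did not recover; halting the rolling restart")
```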
20. Other useful stuff ...
    • fab mine: see the clusters I’m running (or those of other users)
    • fab kill_mine: terminate all instances I’m running
      – Use termination protection in production
    • fab ssh_to: quick way to SSH to one of the nodes in a cluster
    • fab stop/recover/kill: basic commands for controlling specific Solr nodes in the cluster
    • fab jmeter: execute a JMeter test plan against your cluster
      – An example test plan and Java sampler are included with the source
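Since each instance is tagged with a cluster ID and username at launch, listing "my" clusters reduces to a filter and group-by over those tags. The sketch works on plain dicts rather than boto objects; the tag keys `cluster` and `username`, the instance ids, and the function name are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

def clusters_for_user(instances: list[dict], username: str) -> dict[str, list[str]]:
    """Group instance ids by cluster tag, keeping only instances tagged with username."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for inst in instances:
        tags = inst.get("tags", {})
        if tags.get("username") == username and "cluster" in tags:
            clusters[tags["cluster"]].append(inst["id"])
    return dict(clusters)
```

The same grouping also gives a kill_mine-style command its target list: every instance id in the result belongs to the current user.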
21. SolrCloud Tools (SolrJ client app)
    ./tools.sh –tool healthcheck
    • Java-based command-line application that uses SolrJ’s CloudSolrServer to perform advanced cluster management operations:
      – healthcheck: collect metadata and health information from all replicas for a collection from ZooKeeper
      – backup: create a snapshot of each shard in a collection for backing up to remote storage (S3)
    • Framework for building complex tools that benefit from having access to cluster state information in ZooKeeper
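The healthcheck tool is Java/SolrJ, but its core check can be sketched against the cluster state directly. This Python sketch (hypothetical function name) reads a Solr 4.x-style /clusterstate.json document, in which each collection has `shards`, each shard has `replicas`, and each replica has a `state`, a `node_name` and, for the leader, `"leader": "true"`. It compares that with the set of live nodes (the children of /live_nodes in ZooKeeper). Fetching both from ZooKeeper, for example with kazoo, is left out.

```python
from __future__ import annotations

import json

def health_problems(clusterstate_json: str, live_nodes: set[str], collection: str) -> list[str]:
    """Report shards without a leader and replicas that are not active on a live node."""
    shards = json.loads(clusterstate_json)[collection]["shards"]
    problems = []
    for shard_name, shard in sorted(shards.items()):
        replicas = shard.get("replicas", {})
        if not any(r.get("leader") == "true" for r in replicas.values()):
            problems.append(f"{shard_name}: no leader")
        for name, replica in sorted(replicas.items()):
            node = replica.get("node_name")
            state = replica.get("state")
            # A replica marked "active" on a node that is no longer live is also a problem.
            if state != "active" or node not in live_nodes:
                problems.append(f"{shard_name}/{name} on {node}: state={state}, live={node in live_nodes}")
    return problems
```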
22. SiLK Integration
    • SiLK: Solr integrated with Logstash and Kibana
      – Index time-series data, such as log data (collectd, Solr logs, ...)
      – Build cool dashboards with Banana (a fork of Kibana)
    • Easily aggregate all WARN and more severe log messages from all Solr servers into logstash4solr
    • Send collectd metrics to logstash4solr
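Shipping only WARN and more severe messages means filtering on the log4j level before anything leaves the server. The sketch below assumes the level is the first whitespace-separated token on each line; adjust it to your own log4j ConversionPattern. The function name is hypothetical, and multi-line entries such as stack traces would lose their continuation lines with this simple filter.

```python
from __future__ import annotations

from typing import Iterable, Iterator

# log4j levels from least to most severe.
LEVELS = {"TRACE": 0, "DEBUG": 1, "INFO": 2, "WARN": 3, "ERROR": 4, "FATAL": 5}

def at_least(lines: Iterable[str], min_level: str = "WARN") -> Iterator[str]:
    """Yield lines whose first token is a log4j level at or above min_level."""
    threshold = LEVELS[min_level]
    for line in lines:
        tokens = line.split(maxsplit=1)
        if tokens and LEVELS.get(tokens[0], -1) >= threshold:
            yield line
```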
23. SiLK Integration
24. What’s Next?
    • Migrate to Apache Libcloud instead of using boto directly
    • Benchmark mixed workloads (queries and indexing)
    • SiLK is improving rapidly!
    • Chaos monkey tests – integrate Jepsen?
    • It’s open source, so please kick the tires!
25. Wrap-up
    • Solr Scale Toolkit: https://github.com/LucidWorks/solr-scale-tk
    • LucidWorks: http://www.lucidworks.com
    • SiLK: http://www.lucidworks.com/lucidworks-silk/
    • Solr in Action: http://www.manning.com/grainger/
    • Connect: @thelabdude / tim.potter@lucidworks.com
    Questions?