Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.

Best practices for highly available and large scale SolrCloud

3 341 vues

Publié le

From my talk at eBay HQ, this talk covers best practices for running SolrCloud in production at a large scale with high availability

Publié dans : Logiciels
  • Soyez le premier à commenter

Best practices for highly available and large scale SolrCloud

  1. 1. Best practices for highly available SolrCloud Anshum Gupta Apache Lucene/Solr committer, PMC member Search Guy @ IBM Watson
  2. 2. • Anshum Gupta, Apache Lucene/Solr committer and PMC member, IBM Watson Search team. • Interested in search and related stuff. • Apache Lucene since 2006 and Solr since 2010. • Organizations I am or have been a part of: About me
  3. 3. Apache Solr is the most widely-used search solution on the planet. Solr has tens of thousands of applications in production. You use everyday. 8,000,000+ Total downloads Solr is both established and growing. 250,000+ Monthly downloads 2,500+ Open Solr jobs and the largest community of developers.
  4. 4. 01 SolrCloud Logical Architecture Shard 1 (leader) Followers Shard 2 (leader) Followers ZooKeeper ZooKeeper instance Solr Instance
  5. 5. 01 SolrCloud - Physical Architecture ZooKeeper Node 1 Node 2 Load Balancer Client Client Client Client Client Client Client Client Client Lots Of Interaction Coins by Creative Stall from the Noun Project
  6. 6. • Not just config repo but a lot more! • No Zk = Stale clusterstate, and other things + No writes • Watches & GC! Solr <> ZK interaction
  7. 7. • NEVER use embedded zk in production • ZK ensemble - (2n + 1) nodes • ZK chroot, especially if sharing • Use an OOM hook - shipped with Solr ZooKeeper best practices
  8. 8. • Be frugal with watches - For every watch on the ZK server, there’s a 300 bytes memory footprint • ZK - not built for 1000’s of watchers on a single node. Break it down! e.g. Clusterstate Also remember - for custom code
  9. 9. Data and Indexing
  10. 10. • Shard your data - It generally helps • Sharding is almost = Splitting into different collections • Use different nodes for replicas - Replica placement strategy • Use a composite key or a custom router • Distributed IDF - Sharding > Different collections Sharding and Routing
  11. 11. • Batching • Reuse the http and solr client • CloudSolrClient • Atomic updates - It’s wrapped and expensive • Omit norms, term freq, and positions if you don’t need them Indexing best practices
  12. 12. • Replication Bandwidth limiting • Think about what you want indexed vs stored Other things to look at
  13. 13. • Soft commits = visibility • delay as much as you can • Hard commits = durability • Durability • autoCommit • openSearcher • initiate background merges if needed • Only in times of desperation : updateLog config - syncLevel=fsync Commits and transaction log
  14. 14. Indexing recommendations Bulk Indexing Heavy Indexing Heavy querying Crazy! soft commit Long! Best: -1 As long as possible -1 As long as possible hard commit 15 sec 15 sec 10 min 15 sec openSearch er FALSE FALSE TRUE TRUE/FALSE
  15. 15. Querying
  16. 16. • DocValues - Don’t forget there are 3 of those: • default • memory • direct • Large heaps - Bad idea generally, unless you know what you're doing • OS Cache - It’s important Memory usage
  17. 17. • Only retrieve what you want! • Fields (fl=*) • Rows (rows=0, when all you want is hit count) • timeAllowed • Partial results • ReRankQueryParser - Only recent releases Tuning Queries
  18. 18. • Warm up caches • UI ! UI ! UI ! - It’s got almost everything you need! • Efficiently use caches - Hit/eviction stats • Non-cached - specify cost • Postfilters can be your friend Caches
  19. 19. • Don’t run a regular query if all you need is to export the data! • Cursormark • /export handler - not distributed, sans ranking Deep paging
  20. 20. • Have more than 1 replicas • HDFS - High availability, but at a cost! • Great work • Way more redundancy, on its way to being fixed • Use sharding • Hostname - More reliable than IP addresses at times. • Jepsen tests came back fine! More things to note…
  21. 21. • Overestimating heap size? ~ index-size + delta for new generation • Watch out for increasing major GCs - Red flag! • Turn off swapping • Consider explicit GC if it comes to that • The OS needs memory, as much as the JVM… JVM tuning
  22. 22. • Rolling restarts to upgrade • Watch out back-compat issues • Don’t kill the leader unless need be. Ditto with the Overseer • Outsource it all to solr-scale-toolkit Upgrading and restarts
  23. 23. • Protect your cluster • Kerberos, BasicAuth • Role based • Protect your ZooKeeper Security
  24. 24. Connect @ http://www.twitter.com/anshumgupta http://www.linkedin.com/in/anshumgupta/ anshum@apache.org

×