Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.

Solr Anti - patterns

4 061 vues

Publié le

Solr Anti-patterns talk given during Lucene Revolution 2014 in Washington DC.

Publié dans : Internet
  • DOWNLOAD FULL BOOKS, INTO AVAILABLE FORMAT ......................................................................................................................... ......................................................................................................................... 1.DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... ......................................................................................................................... ......................................................................................................................... .............. Browse by Genre Available eBooks ......................................................................................................................... Art, Biography, Business, Chick Lit, Children's, Christian, Classics, Comics, Contemporary, Cookbooks, Crime, Ebooks, Fantasy, Fiction, Graphic Novels, Historical Fiction, History, Horror, Humor And Comedy, Manga, Memoir, Music, Mystery, Non Fiction, Paranormal, Philosophy, Poetry, Psychology, Religion, Romance, Science, Science Fiction, Self Help, Suspense, Spirituality, Sports, Thriller, Travel, Young Adult,
       Répondre 
    Voulez-vous vraiment ?  Oui  Non
    Votre message apparaîtra ici
  • DOWNLOAD FULL BOOKS, INTO AVAILABLE FORMAT ......................................................................................................................... ......................................................................................................................... 1.DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... ......................................................................................................................... ......................................................................................................................... .............. Browse by Genre Available eBooks ......................................................................................................................... Art, Biography, Business, Chick Lit, Children's, Christian, Classics, Comics, Contemporary, Cookbooks, Crime, Ebooks, Fantasy, Fiction, Graphic Novels, Historical Fiction, History, Horror, Humor And Comedy, Manga, Memoir, Music, Mystery, Non Fiction, Paranormal, Philosophy, Poetry, Psychology, Religion, Romance, Science, Science Fiction, Self Help, Suspense, Spirituality, Sports, Thriller, Travel, Young Adult,
       Répondre 
    Voulez-vous vraiment ?  Oui  Non
    Votre message apparaîtra ici
  • DOWNLOAD FULL BOOKS, INTO AVAILABLE FORMAT ......................................................................................................................... ......................................................................................................................... 1.DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... ......................................................................................................................... ......................................................................................................................... .............. Browse by Genre Available eBooks ......................................................................................................................... Art, Biography, Business, Chick Lit, Children's, Christian, Classics, Comics, Contemporary, Cookbooks, Crime, Ebooks, Fantasy, Fiction, Graphic Novels, Historical Fiction, History, Horror, Humor And Comedy, Manga, Memoir, Music, Mystery, Non Fiction, Paranormal, Philosophy, Poetry, Psychology, Religion, Romance, Science, Science Fiction, Self Help, Suspense, Spirituality, Sports, Thriller, Travel, Young Adult,
       Répondre 
    Voulez-vous vraiment ?  Oui  Non
    Votre message apparaîtra ici

Solr Anti - patterns

  1. 1. Solr Anti - patterns Rafał Kuć, Sematext Group, Inc. @kucrafal @sematext http://sematext.com
  2. 2. About me Sematext consultant & engineer Solr.pl co-founder Father & husband
  3. 3. The (not so) perfect migration http://en.wikipedia.org/wiki/Bird_migration http://www.likesbooks.com/aarafterhours/?p=750
  4. 4. From 3.1 to 4.10 (and hopefully not back) March 2011 September 2014
  5. 5. The lonely solrconfig.xml <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" /> <requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" /> <requestHandler name="/update/csv" class="solr.CSVRequestHandler" /> <requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler" /> <luceneMatchVersion>LUCENE_31</luceneMatchVersion> <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
  6. 6. DOC DOC DOC And faulty indexing EXCEPTIONS :)
  7. 7. And faulty indexing <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">400</int> <int name="QTime">0</int> </lst> <lst name="error"> <str name="msg">missing content stream</str> <int name="code">400</int> </lst> </response> 109173 [qtp1223685984-20] ERROR org.apache.solr.core.SolrCore ľ org.apache.solr.common.SolrException: missing content stream at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:647) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Unknown Source)
  8. 8. Let’s make that right <requestHandler name="/update" class="solr.UpdateRequestHandler" /> <requestHandler name="/update/json" class="solr.UpdateRequestHandler"> <lst name="defaults"> <str name="stream.contentType">application/json</str> </lst> </requestHandler> <luceneMatchVersion>LUCENE_4.10.0</luceneMatchVersion> <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/> <nrtMode>true</nrtMode> <updateLog> <str name="dir"> ${solr.ulog.dir:} </str> </updateLog>
  9. 9. The old schema.xml <fieldType name="int" class="solr.IntField" omitNorms="true"/> <fieldType name="long" class="solr.LongField" omitNorms="true"/> <fieldType name="float" class="solr.FloatField" omitNorms="true"/> <fieldType name="double" class="solr.DoubleField" omitNorms="true"/> <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/> <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
  10. 10. <fieldType name="int" class="solr.IntField" omitNorms="true"/> <fieldType name="long" class="solr.LongField" omitNorms="true"/> <fieldType name="float" class="solr.FloatField" omitNorms="true"/> <fieldType name="double" class="solr.DoubleField" omitNorms="true"/> <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/> <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/> The old schema.xml
  11. 11. The new schema.xml <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/> <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/> <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/> <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/> <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
  12. 12. Threads? What threads? <Set name="ThreadPool"> <New class="org.eclipse.jetty.util.thread.QueuedThreadPool"> <Set name="minThreads">10</Set> <Set name="maxThreads">200</Set> <Set name="detailedDump">false</Set> </New> </Set>
  13. 13. I see deadlocks
  14. 14. Threads? What threads? <Set name="ThreadPool"> <New class="org.eclipse.jetty.util.thread.QueuedThreadPool"> <Set name="minThreads">10</Set> <Set name="maxThreads">200</Set> <Set name="detailedDump">false</Set> </New> </Set>
  15. 15. OK, so now we can actually run queries <Set name="ThreadPool"> <New class="org.eclipse.jetty.util.thread.QueuedThreadPool"> <Set name="minThreads">10</Set> <Set name="maxThreads">10000</Set> <Set name="detailedDump">false</Set> </New> </Set>
  16. 16. The ZooKeeper
  17. 17. The ZooKeeper
  18. 18. The ZooKeeper
  19. 19. The ZooKeeper
  20. 20. The ZooKeeper
  21. 21. The ZooKeeper – production
  22. 22. The ZooKeeper – production -DzkHost=zk1:2181,zk2:2181,zk3:2181
  23. 23. The ZooKeeper – production -DzkHost=zk1:2181,zk2:2181,zk3:2181
  24. 24. The ZooKeeper – production -DzkHost=zk1:2181,zk2:2181,zk3:2181
  25. 25. The ZooKeeper – production -DzkHost=zk1:2181,zk2:2181,zk3:2181
  26. 26. Let’s cache everything <filterCache class="solr.LRUCache" size="1048576" initialSize="1048576" autowarmCount="524288"/> <queryResultCache class="solr.LRUCache" size="1048576" initialSize="1048576" autowarmCount="524288"/><documentCache class="solr.LRUCache" size="1048576" initialSize="1048576" autowarmCount="0"/>
  27. 27. And now let’s look at the warmup times
  28. 28. And now let’s look at the warmup times
  29. 29. OK, show us the way „Mr. Consultant” <filterCache class="solr.FastLRUCache" size="1024" initialSize="1024" autowarmCount="512"/> <queryResultCache class="solr.LRUCache" size="16000" initialSize="16000" autowarmCount="8000"/><documentCache class="solr.LRUCache" size="16384" initialSize="16384" autowarmCount="0"/>
  30. 30. Let’s look at the warmup times again
  31. 31. Let’s look at the warmup times again
  32. 32. Bulks are for noobs Application Application Application Doc Doc Doc
  33. 33. Bulks are for noobs Application Application Application Doc Doc Doc
  34. 34. But let’s use bulks, just in case
  35. 35. But let’s use bulks, just in case
  36. 36. We need to refresh and hard commit <autoCommit> <maxTime>1000</maxTime> <openSearcher>true</openSearcher> </autoCommit> <autoSoftCommit> <maxTime>1000</maxTime> </autoSoftCommit>
  37. 37. Maybe we should only refresh? <autoCommit> <maxTime>60000</maxTime> <openSearcher>false</openSearcher> </autoCommit> <autoSoftCommit> <maxTime>1000</maxTime> </autoSoftCommit>
  38. 38. OK, let’s go easy with refreshing <autoCommit> <maxTime>60000</maxTime> <openSearcher>false</openSearcher> </autoCommit> <autoSoftCommit> <maxTime>30000</maxTime> </autoSoftCommit>
  39. 39. But I really need all that data curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
  40. 40. <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">9418</int> <lst name="params"> <str name="start">3000000</str> <str name="q">*:*</str> <str name="rows">100</str> </lst> </lst> <result name="response" numFound="3284000" start="3000000"> . . . </result> </response> But I really need all that data curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
  41. 41. <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">9418</int> <lst name="params"> <str name="start">3000000</str> <str name="q">*:*</str> <str name="rows">5</str> </lst> </lst> <result name="response" numFound="3284000" start="3000000"> . . . </result> </response> But I really need all that data <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="error"> <str name="msg">java.lang.OutOfMemoryError: Java heap space</str> <str name="trace">java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:796) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:448) . . . Caused by: java.lang.OutOfMemoryError: Java heap space . . . </str> <int name="code">500</int> </lst> </response> curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
  42. 42. But I really need all that data Query
  43. 43. But I really need all that data
  44. 44. But I really need all that data
  45. 45. But I really need all that data Response
  46. 46. Use the scroll Luke curl -XGET 'localhost:8983/solr/select?q=*:*&cursorMark=*&sort=score+desc,id+desc'
  47. 47. Use the scroll Luke curl -XGET 'localhost:8983/solr/select?q=*:*&cursorMark=*&sort=score+desc,id+desc' <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">189</int> <lst name="params"> <str name="sort">score desc,id desc</str> <str name="q">*:*</str> <str name="cursorMark">*</str> </lst> </lst> <result name="response" numFound="3284000" start="0"> <doc> ... </doc> . . . </result> <str name="nextCursorMark">AoIIP4AAACY5OTk5OTA=</str> </response>
  48. 48. Use the scroll Luke curl -XGET 'localhost:8983/solr/select?q=*:*&sort=score+desc,id+desc &cursorMark=AoIIP4AAACY5OTk5OTA='
  49. 49. Use the scroll Luke curl -XGET 'localhost:8983/solr/select?q=*:*&sort=score+desc,id+desc &cursorMark=AoIIP4AAACY5OTk5OTA=' <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">184</int> <lst name="params"> <str name="sort">score desc,id desc</str> <str name="q">*:*</str> <str name="cursorMark">AoIIP4AAACY5OTk5OTA=</str> </lst> </lst> <result name="response" numFound="3284000" start="0"> <doc> ... </doc> . . . </result> <str name="nextCursorMark">AoIIP4AAACY5OTk5ODE=</str> </response>
  50. 50. Limiting faceting, why bother? curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=-1&facet.mincount=0'
  51. 51. Limiting faceting, why bother? curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=-1&facet.mincount=0' <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">9967</int> <lst name="params"> ... </lst> </lst> <result name="response" numFound="3284000" start="0"> . . . </result> <lst name="facet_counts"> <lst name="facet_fields"> <lst name="tag"> ... </lst> </lst> </lst> </response>
  52. 52. Limiting faceting, why bother? curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=-1&facet.mincount=0' <?xml version="1.0" encoding="UTF-8"?> <response> . . . <lst name="error"> <str name="msg">Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space</str> <str name="trace">org.apache.solr.common.SolrException: Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space . . . Caused by: java.lang.OutOfMemoryError: Java heap space at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:685) . . . </str> <int name="code">500</int> </lst> </response>
  53. 53. Now let’s look at performance
  54. 54. Now let’s look at performance
  55. 55. Now let’s look at performance
  56. 56. Now let’s look at performance
  57. 57. Now let’s look at performance
  58. 58. Magic happens with small changes curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=100&facet.mincount=1'
  59. 59. Magic happens with small changes
  60. 60. Magic happens with small changes
  61. 61. Magic happens with small changes
  62. 62. Magic happens with small changes
  63. 63. Magic happens with small changes
  64. 64. Magic happens with small changes
  65. 65. Magic happens with small changes
  66. 66. Monitoring in production http://sematext.com/spm/index.html
  67. 67. And remember… <luceneMatchVersion> 3.1 </luceneMatchVersion>
  68. 68. Quick summary http://www.soothetube.com/2013/12/29/thats-all-folks/
  69. 69. We are hiring! Dig Search? Dig Analytics? Dig Big Data? Dig Performance? Dig Logging? Dig working with and in open – source? We’re hiring world – wide! http://sematext.com/about/jobs.html
  70. 70. Thank you! Rafał Kuć @kucrafal rafal.kuc@sematext.com Sematext @sematext http://sematext.com http://blog.sematext.com

×