ElasticSearch in action

"ElasticSearch in action" by Thijs Feryn.
ElasticSearch is a really powerful search engine, NoSQL database & analytics engine. It is fast, it scales and it's a child of the Cloud/BigData generation. This talk will show you how to get things done using ElasticSearch. The focus is on doing actual work, creating actual queries and achieving actual results. Topics that will be covered: - Filters and queries - Cluster, shard and index management - Data mapping - Analyzers and tokenizers - Aggregations - ElasticSearch as part of the ELK stack - Integration in your code.

  1. 1. Elasticsearch in action By Thijs Feryn
  2. 2. Explain in 1 slide
  3. 3. •Full-text search engine •NoSQL database •Analytics engine •Written in Java •Lucene-based (~Solr) •Inverted indices •Easy to scale (~Elastic) •RESTful interface (HTTP/JSON) •Schemaless •Real-time •ELK stack
  4. 4. Still with me?
  5. 5. Hi, I’m Thijs
  6. 6. I’m @ThijsFeryn on Twitter
  7. 7. I’m an Evangelist At
  8. 8. I’m a board member at
  9. 9. https://www.elastic.co/downloads/elasticsearch
  10. 10. http://localhost:9200 { "name" : "node-1", "cluster_name" : "elasticsearch", "version" : { "number" : "2.2.0", "build_hash" : "8ff36d139e16f8720f2947ef62c8167a888992fe", "build_timestamp" : "2016-01-27T13:32:39Z", "build_snapshot" : false, "lucene_version" : "5.4.1" }, "tagline" : "You Know, for Search" }
  11. 11. RDBMS → Elasticsearch: Database → Index, Table → Type, Row → Document
  12. 12. POST /blog {"acknowledged":true} Confirmation
  13. 13. POST /blog/post/6160 { "language": "en-US", "title": "WordPress 4.4 is available! And these are the new features…", "date": "Tue, 15 Dec 2015 13:28:23 +0000", "author": "Romy", "category": [ "News", "PHP", "Sector news", "Webdesign & development", "CMS", "content management system", "wordpress", "WordPress 4.4" ], "guid": "6160" }
  14. 14. { "_index": "blog", "_type": "post", "_id": "6160", "_version": 1, "created": true } Confirmation
  15. 15. GET /blog/post/6160 { "_index": "blog", "_type": "post", "_id": "6160", "_version": 1, "found": true, "_source": { "language": "en-US", "title": "WordPress 4.4 is available! And these are the new features…", "date": "Tue, 15 Dec 2015 13:28:23 +0000", "author": "Romy", "category": [ "News", "PHP", "Sector news", "Webdesign & development", "CMS", "content management system", "wordpress", "WordPress 4.4" ], "guid": "6160" } } Retrieve document by id Document & meta data
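The index-then-retrieve round trip on the slides above can be sketched with Python's standard library. This is a minimal sketch, not an official client: `BASE_URL`, `index_request` and `get_request` are hypothetical names, the paths follow the 2.x-style `/index/type/id` layout used in the talk, and the requests are only built, not sent, so the block runs without a cluster.

```python
import json
from urllib.request import Request

# Assumption: a local node on the default port, as on the welcome-page slide.
BASE_URL = "http://localhost:9200"


def index_request(index, doc_type, doc_id, document):
    """Build (but do not send) the HTTP request that indexes one document."""
    body = json.dumps(document).encode("utf-8")
    return Request(
        f"{BASE_URL}/{index}/{doc_type}/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_request(index, doc_type, doc_id):
    """Build the request that retrieves a document by id."""
    return Request(f"{BASE_URL}/{index}/{doc_type}/{doc_id}", method="GET")


# Shortened version of the blog post shown on the slide.
post = {"language": "en-US", "author": "Romy", "guid": "6160"}
req = index_request("blog", "post", 6160, post)
print(req.get_method(), req.full_url)
```

To actually send a request, pass it to `urllib.request.urlopen` while a node is running.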
  16. 16. GET /blog/_mapping { "blog": { "mappings": { "post": { "properties": { "author": { "type": "string" }, "category": { "type": "string" }, "date": { "type": "string" }, "guid": { "type": "string" }, "language": { "type": "string" }, "title": { "type": "string" } } } } } } Schemaless? Not really … “Guesses” mapping on insert
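Why every field above ended up as `string` can be illustrated with a toy type guesser. This is a simplified sketch of the idea only: `guess_type` and `guess_mapping` are hypothetical helpers, and the real dynamic mapping does more (for example, it can detect dates in certain formats, which the RFC 2822-style date in this document does not match).

```python
def guess_type(value):
    """Guess a 2.x-style field type from a JSON value (toy version)."""
    # bool must be tested before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, list):
        # Arrays take the type of their elements; an empty array tells us nothing.
        return guess_type(value[0]) if value else None
    return "string"


def guess_mapping(document):
    """Return a {field: {"type": ...}} mapping guessed from one document."""
    return {field: {"type": guess_type(value)} for field, value in document.items()}


doc = {
    "language": "en-US",
    "date": "Tue, 15 Dec 2015 13:28:23 +0000",
    "category": ["News", "PHP"],
    "guid": "6160",  # sent as a JSON string, so it is guessed as a string
}
print(guess_mapping(doc))
```

Because `guid` and `date` arrive as JSON strings, a guesser like this has no reason to treat them as a number or a date, which is the motivation for the explicit mapping on the next slides.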
  17. 17. Explicit mapping
  18. 18. POST /blog { "mappings" : { "post" : { "properties": { "title" : { "type" : "string" }, "date" : { "type" : "date", "format": "E, dd MMM YYYY HH:mm:ss Z" }, "author": { "type": "string" }, "category": { "type": "string" }, "guid": { "type": "integer" } } } } } Explicit mapping at index creation time
  19. 19. POST /blog { "mappings": { "post": { "properties": { "author": { "type": "string", "index": "not_analyzed" }, "category": { "type": "string", "index": "not_analyzed" }, "date": { "type": "date", "format": "E, dd MMM YYYY HH:mm:ss Z" }, "guid": { "type": "integer" }, "language": { "type": "string", "index": "not_analyzed" }, "title": { "type": "string", "fields": { "en": { "type": "string", "analyzer": "english" }, "nl": { "type": "string", "analyzer": "dutch" }, "raw": { "type": "string", "index": "not_analyzed" } } } } } } } Alternative mapping
  20. 20. "type": "integer" }, "language": { "type": "string", "index": "not_analyzed" }, "title": { "type": "string", "fields": { "en": { "type": "string", "analyzer": "english" }, "nl": { "type": "string", "analyzer": "dutch" }, "raw": { "type": "string", "index": "not_analyzed" } } } } } } } What’s with the analyzers?
  21. 21. Analyzed vs non-analyzed
  22. 22. Full-text vs exact value
  23. 23. By default strings are analyzed … unless you mention it in the mapping
  24. 24. Analyzer •Character filters: replace characters in the text to be analyzed •Tokenizers: break text down into terms •Token filters: add/modify/delete tokens
  25. 25. Built-in analyzers •Standard •Simple •Whitespace •Stop •Keyword •Pattern •Language •Snowball •Custom | Building blocks: Standard tokenizer, Lowercase token filter, English stop word token filter
  26. 26. Input: Hey man, how are you doing? | Standard: hey man how are you doing | Whitespace: Hey man, how are you doing? | English: hei man how you do
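The three outputs on the slide above can be reproduced with a toy imitation of each analyzer. This is a rough sketch for illustration, not how Lucene implements them: the function names, the tiny stop word list and the two stemming rules are all assumptions chosen to cover this one sentence.

```python
import re

# Tiny illustrative stop word list; real language analyzers ship a longer one.
STOP_WORDS = {"a", "an", "and", "are", "is", "the", "to"}


def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def standard_analyzer(text):
    """Split on non-word characters and lowercase each term."""
    return [term.lower() for term in re.findall(r"\w+", text)]


def _has_vowel(s):
    return any(c in "aeiou" for c in s)


def toy_stem(term):
    """Two stemming rules in the spirit of Porter: strip -ing, turn final y into i."""
    if term.endswith("ing") and _has_vowel(term[:-3]):
        term = term[:-3]
    if term.endswith("y") and _has_vowel(term[:-1]):
        term = term[:-1] + "i"
    return term


def toy_english_analyzer(text):
    """Standard tokens, minus stop words, then stemmed."""
    return [toy_stem(t) for t in standard_analyzer(text) if t not in STOP_WORDS]


TEXT = "Hey man, how are you doing?"
print(standard_analyzer(TEXT))
print(whitespace_analyzer(TEXT))
print(toy_english_analyzer(TEXT))
```

The point of the comparison is that a search for "do" can only hit this sentence if the field was indexed with the English-style analysis.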
  27. 27. POST /blog/post/_search { "fields": ["title"], "query": { "match": { "title": "working" } } }
  28. 28. "total": 1, "max_score": 1.7562683, "hits": [ { "_index": "blog", "_type": "post", "_id": "2742", "_score": 1.7562683, "fields": { "title": [ "Hosted SharePoint 2010: working efficiently as a team" ] } } ] } }
  29. 29. POST /blog/post/_search { "fields": ["title"], "query": { "match": { "title.en": "working" } } }
  30. 30. "failed": 0 }, "hits": { "total": 6, "max_score": 2.4509864, "hits": [ { "_index": "blog", "_type": "post", "_id": "828", "_score": 2.4509864, "fields": { "title": [ "Still a lot of work in store" ] } }, { "_index": "blog", "_type": "post", "_id": "3873", "_score": 2.144613, "fields": { "title": [ "SSL: what is it and how does it work?" ] } }, { "_index": "blog",
  31. 31. Search
  32. 32. GET /blog/post/_search?pretty { "took": 2, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 963, "max_score": 1, "hits": [ { "_index": "blog", "_type": "post", "_id": "6067", "_score": 1, "_source": { "language": "en-US", "title": "My Combell Power Tips: Registrant Templates and new domain name overview", "date": "Tue, 24 Nov 2015 15:58:48 +0000", "author": "Romy", "category": [ "Combell news", "Domain names", "News", "Tools", "control panel", "domain name", "my combell", "register", "templates" ], "guid": "6067"
  33. 33. Search “lite” vs full query DSL | Lite: GET /blog/post/_search?pretty | Full DSL: POST /blog/post/_search?pretty { "query": { "match_all": {} } }
  34. 34. Search “lite” vs full query DSL | Lite: GET /blog/post/_search?pretty&q=title:Thijs | Full DSL: POST /blog/post/_search?pretty { "query": { "match": { "title": "Thijs" } } }
  35. 35. Match query: POST /blog/post/_count { "query": { "match": { "title": "PROXY protocol support in Varnish" } } } returns 162 posts | Term filter on the non-analyzed field: POST /blog/post/_count { "query": { "filtered": { "filter": { "term": { "title.raw": "PROXY protocol support in Varnish" } } } } } returns 1 post
  36. 36. Filter vs Query
  37. 37. Filter: •Does it match? Yes or no •When relevance doesn’t matter •Faster & cacheable •For non-analyzed data | Query: •How well does it match? •For full-text search •On analyzed/tokenized data
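The gap between a full-text query and an exact-value filter can be imitated in memory. The sketch below is an illustration under assumptions: `match_count` and `term_count` are hypothetical helpers, the four titles are taken from result slides in this deck, and the match logic only models the default behaviour of a match query (any analyzed term may match).

```python
import re


def analyze(text):
    """Standard-analyzer-like tokens: split on non-word characters, lowercase."""
    return [term.lower() for term in re.findall(r"\w+", text)]


def match_count(titles, query):
    """Count titles sharing at least one analyzed term with the query."""
    wanted = set(analyze(query))
    return sum(1 for title in titles if wanted & set(analyze(title)))


def term_count(titles, value):
    """Count titles whose raw, non-analyzed value equals the given value exactly."""
    return sum(1 for title in titles if title == value)


titles = [
    "PROXY protocol support in Varnish",
    "Thijs Feryn gave a demo of Varnish Cache on WordPress during a Future Insights webinar",
    "SSL: what is it and how does it work?",
    "Still a lot of work in store",
]
needle = "PROXY protocol support in Varnish"
print(match_count(titles, needle), term_count(titles, needle))
```

Common terms such as "in" are enough for the match variant to count a title, which is why the full-text count is so much larger than the exact-value count.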
  38. 38. Match Query Multi Match Query Bool Query Boosting Query Common Terms Query Constant Score Query Dis Max Query Filtered Query Fuzzy Like This Query Fuzzy Like This Field Query Function Score Query Fuzzy Query GeoShape Query Has Child Query Has Parent Query Ids Query Indices Query Match All Query More Like This Query Nested Query Prefix Query Query String Query Simple Query String Query Range Query Regexp Query Span First Query Span Multi Term Query Span Near Query Span Not Query Span Or Query Span Term Query Term Query Terms Query Top Children Query Wildcard Query Minimum Should Match Multi Term Query Rewrite Template Query
  39. 39. And Filter Bool Filter Exists Filter Geo Bounding Box Filter Geo Distance Filter Geo Distance Range Filter Geo Polygon Filter GeoShape Filter Geohash Cell Filter Has Child Filter Has Parent Filter Ids Filter Indices Filter Limit Filter Match All Filter Missing Filter Nested Filter Not Filter Or Filter Prefix Filter Query Filter Range Filter Regexp Filter Script Filter Term Filter Terms Filter Type Filter
  40. 40. Filter examples
  41. 41. POST /blog/post/_search?pretty { "query": { "filtered": { "filter": { "ids": { "values": [231,234,258] } } } } }
  42. 42. POST /blog/_search { "query": { "filtered": { "filter": { "bool": { "must" : [ { "term" : { "language" : "en-US" } }, { "range" : { "date" : { "gte" : "2016-01-01", "format" : "yyyy-MM-dd" } } } ], "must_not" : [ { "term" : { "category" : "joomla" } } ], "should" : [ { "term" : { "category" : "Hosting" } }, { "term" : { "category" : "evangelist" } } ] } } } } }
  43. 43. POST /blog/_search?pretty { "query": { "filtered": { "filter": { "prefix": { "title.raw": "Combell" } } } } }
  44. 44. POST /cities/city/_search { "size": 200, "sort": [ { "city": { "order": "asc" } } ], "query": { "filtered": { "filter": { "geo_distance_range": { "lt": "5km", "location": { "lat": 51.033333, "lon": 2.866667 } } } } } } Requires “geo point” typed field
  45. 45. POST /cities/city/_search { "size": 200, "query": { "filtered": { "query": { "match_all": {} }, "filter": { "geo_bounding_box": { "location": { "bottom_left": { "lat": 51.1, "lon": 2.6 }, "top_right": { "lat": 51.2, "lon": 2.7 } } } } } } } Requires “geo point” typed field Draw a “box”
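What the bounding box filter decides for each document can be written as a plain comparison. This is a minimal sketch assuming a box that does not cross the antimeridian; `in_bounding_box` is a hypothetical helper, and the corner values are the ones from the slide.

```python
def in_bounding_box(lat, lon, bottom_left, top_right):
    """Return True when the point lies inside the box spanned by the two corners."""
    return (
        bottom_left["lat"] <= lat <= top_right["lat"]
        and bottom_left["lon"] <= lon <= top_right["lon"]
    )


# Corners taken from the geo_bounding_box example above.
BOTTOM_LEFT = {"lat": 51.1, "lon": 2.6}
TOP_RIGHT = {"lat": 51.2, "lon": 2.7}

print(in_bounding_box(51.15, 2.65, BOTTOM_LEFT, TOP_RIGHT))
print(in_bounding_box(51.033333, 2.866667, BOTTOM_LEFT, TOP_RIGHT))
```

The second point is the centre used in the distance example on the previous slide; it falls outside this particular box.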
  46. 46. Relevance
  47. 47. POST /blog/_search { "fields": ["title"], "query": { "bool": { "must": [ { "match": { "title": "varnish thijs" } }, { "filtered": { "filter": { "term": { "language": "en-US" } } } } ] } } }
  48. 48. { "took": 3, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 8, "max_score": 1.984594, "hits": [ { "_index": "blog", "_type": "post", "_id": "4275", "_score": 1.984594, "fields": { "title": [ "Thijs Feryn gave a demo of Varnish Cache on WordPress during a Future Insights webinar" ] } }, { "_index": "blog", "_type": "post", "_id": "6238", "_score": 0.8335616, "fields": { "title": [ "PROXY protocol support in Varnish" ] } }, { "_index": "blog", Hits both terms. More relevant
  49. 49. POST /blog/_search?_source=false { "query": { "filtered": { "filter": { "term": { "category": "PHPBenelux" } } } } } Using a filter instead of a query We don’t care about the source
  50. 50. { "took": 3, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 2, "max_score": 1, "hits": [ { "_index": "blog", "_type": "post", "_id": "6254", "_score": 1 }, { "_index": "blog", "_type": "post", "_id": "11749", "_score": 1 } ] } } No relevance on filters Score is always 1
  51. 51. POST /blog/_search { "fields": ["title", "category"], "query": { "bool": { "must": [ { "match": { "title": "thijs feryn" } } ], "should": [ { "match": { "category": "Varnish" } } ] } } } Only search for “thijs feryn” Increase relevance if category contains “Varnish”
  52. 52. POST /blog/_search { "fields": ["title", "category"], "query": { "bool": { "must_not": [ { "filtered": { "filter": { "term": { "author": "Romy" } } } } ], "should": [ { "match": { "category": "Magento" } } ] } } } Increase relevance Combining filters & queries
  53. 53. POST /blog/_search { "query": { "bool": { "should": [ { "match": { "title": { "query": "Magento", "boost" : 3 } } }, { "match": { "title": { "query": "Wordpress", "boost" : 2 } } } ] } } } Increase relevance: query-time boosting
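When boosts come from application code, the request body above can be generated rather than typed by hand. A small sketch, assuming the boosts live in a dictionary; `boosted_title_query` is a hypothetical helper and only builds the JSON body.

```python
import json


def boosted_title_query(boosts):
    """Build a bool/should query with one boosted match clause per term."""
    should = [
        {"match": {"title": {"query": term, "boost": boost}}}
        for term, boost in boosts.items()
    ]
    return {"query": {"bool": {"should": should}}}


body = boosted_title_query({"Magento": 3, "Wordpress": 2})
print(json.dumps(body, indent=2))
```

The resulting dictionary has the same shape as the body on the slide and can be sent as the payload of `POST /blog/_search`.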
  54. 54. Multi index multi type
  55. 55. /_search /products/_search /products/product/_search /products,clients/_search /pro*/_search /pro*,cli*/_search /products/product,invoice/_search /products/pro*/_search /_all/product/_search /_all/product,invoice/_search /_all/pro*/_search
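The URL patterns listed above follow one rule: a comma-separated index part, an optional comma-separated type part, then `_search`. A minimal sketch of that rule, where `search_path` is a hypothetical helper and wildcards are passed through untouched:

```python
def search_path(indices=(), types=()):
    """Build a search path for any combination of indices and types."""
    indices = list(indices)
    types = list(types)
    if types and not indices:
        # Searching types across every index needs the _all placeholder.
        indices = ["_all"]
    parts = []
    if indices:
        parts.append(",".join(indices))
    if types:
        parts.append(",".join(types))
    return "/" + "/".join(parts + ["_search"])


print(search_path())
print(search_path(["products", "clients"]))
print(search_path(["pro*"], ["product"]))
print(search_path(types=["product", "invoice"]))
```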
  56. 56. Multi “all the things”
  57. 57. Aggregations
  58. 58. Group by on steroids
  59. 59. Aggregations in SQL: SELECT author, COUNT(guid) FROM blog.post GROUP BY author | Metric: COUNT(guid) | Bucket: author
  60. 60. SELECT author, COUNT(guid) FROM blog.post GROUP BY author | POST /blog/post/_search?pretty&search_type=count { "aggs": { "popular_bloggers": { "terms": { "field": "author" } } } } Only aggs, no docs
  61. 61. "aggregations": { "popular_bloggers": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "Romy", "doc_count": 415 }, { "key": "Combell", "doc_count": 184 }, { "key": "Tom", "doc_count": 184 }, { "key": "Jimmy Cappaert", "doc_count": 157 }, { "key": "Christophe", "doc_count": 23 } ] } } Aggregation output
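The bucket-and-count idea behind the terms aggregation above is the same as a counter over one field. The sketch below runs in memory on made-up documents; `terms_aggregation` is a hypothetical helper that only mimics the shape of the `buckets` list, not the distributed computation.

```python
from collections import Counter


def terms_aggregation(docs, field, size=10):
    """Group documents by a field and return the largest buckets first."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common(size)]


# Made-up sample data, not the blog index from the talk.
posts = [
    {"author": "Romy", "guid": 1},
    {"author": "Romy", "guid": 2},
    {"author": "Romy", "guid": 3},
    {"author": "Tom", "guid": 4},
    {"author": "Tom", "guid": 5},
    {"author": "Christophe", "guid": 6},
]
print(terms_aggregation(posts, "author"))
```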
  62. 62. POST /blog/_search { "query": { "match": { "title": "varnish" } }, "aggs": { "popular_bloggers": { "terms": { "field": "author", "size": 10 }, "aggs": { "used_languages": { "terms": { "field": "language", "size": 10 } } } } } } Nested multi-group by alongside query
  63. 63. "aggregations": { "popular_bloggers": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "Romy", "doc_count": 4, "used_languages": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "en-US", "doc_count": 3 }, { "key": "nl-NL", "doc_count": 1 } ] } }, { "key": "Combell", "doc_count": 3, "used_languages": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "nl-NL", "doc_count": 3 } ] } }, Aggregation output
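Nested aggregation output like the response above is usually easier to consume once flattened into rows. A minimal sketch, assuming the aggregation names from the slide (`popular_bloggers`, `used_languages`); `flatten_buckets` is a hypothetical helper and the sample dictionary keeps only the fields it reads.

```python
def flatten_buckets(aggregations, outer="popular_bloggers", inner="used_languages"):
    """Turn a two-level terms aggregation into (outer key, inner key, count) rows."""
    rows = []
    for outer_bucket in aggregations[outer]["buckets"]:
        for inner_bucket in outer_bucket[inner]["buckets"]:
            rows.append((outer_bucket["key"], inner_bucket["key"], inner_bucket["doc_count"]))
    return rows


# Trimmed copy of the aggregation output shown on the slide.
aggregations = {
    "popular_bloggers": {
        "buckets": [
            {"key": "Romy", "doc_count": 4, "used_languages": {"buckets": [
                {"key": "en-US", "doc_count": 3},
                {"key": "nl-NL", "doc_count": 1},
            ]}},
            {"key": "Combell", "doc_count": 3, "used_languages": {"buckets": [
                {"key": "nl-NL", "doc_count": 3},
            ]}},
        ]
    }
}
for row in flatten_buckets(aggregations):
    print(row)
```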
  64. 64. Min Aggregation Max Aggregation Sum Aggregation Avg Aggregation Stats Aggregation Extended Stats Aggregation Value Count Aggregation Percentiles Aggregation Percentile Ranks Aggregation Cardinality Aggregation Geo Bounds Aggregation Top hits Aggregation Scripted Metric Aggregation Global Aggregation Filter Aggregation Filters Aggregation Missing Aggregation Nested Aggregation Reverse nested Aggregation Children Aggregation Terms Aggregation Significant Terms Aggregation Range Aggregation Date Range Aggregation IPv4 Range Aggregation Histogram Aggregation Date Histogram Aggregation Geo Distance Aggregation GeoHash grid Aggregation
  65. 65. Managing Elasticsearch
  66. 66. Plenty of ways … for which we don’t have enough time
  67. 67. Clustering
  68. 68. Single node 2 node cluster 3 node cluster
  69. 69. Example config settings:
      node.rack: my-location
      node.master: true
      node.data: true
      http.enabled: true
      cluster.name: my-cluster
      node.name: my-node
      index.number_of_shards: 5
      index.number_of_replicas: 1
      discovery.zen.minimum_master_nodes: 2
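The value of `discovery.zen.minimum_master_nodes` in the settings above is normally chosen as a quorum of the master-eligible nodes, which guards against split brain. A small sketch of that rule of thumb; `minimum_master_nodes` is a hypothetical helper name.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes: more than half of them."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


for nodes in (1, 2, 3, 5):
    print(nodes, "->", minimum_master_nodes(nodes))
```

With three master-eligible nodes this gives 2, which matches the example setting.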
  70. 70. GET /_cat
  71. 71. GET /_cat =^.^= /_cat/allocation /_cat/shards /_cat/shards/{index} /_cat/master /_cat/nodes /_cat/indices /_cat/indices/{index} /_cat/segments /_cat/segments/{index} /_cat/count /_cat/count/{index} /_cat/recovery /_cat/recovery/{index} /_cat/health /_cat/pending_tasks /_cat/aliases /_cat/aliases/{alias} /_cat/thread_pool /_cat/plugins /_cat/fielddata /_cat/fielddata/{fields} Non-JSON output
  72. 72. GET /_cat/shards?v (5 shards & a single replica by default)
      index    shard prirep state   docs store ip             node
      my-index 2     r      STARTED 6    7.2kb 192.168.10.142 node3
      my-index 2     p      STARTED 6    9.5kb 192.168.10.142 node2
      my-index 0     p      STARTED 4    7.1kb 192.168.10.142 node3
      my-index 0     r      STARTED 4    4.8kb 192.168.10.142 node2
      my-index 3     r      STARTED 5    7.1kb 192.168.10.142 node1
      my-index 3     p      STARTED 5    7.2kb 192.168.10.142 node3
      my-index 1     p      STARTED 1    2.4kb 192.168.10.142 node1
      my-index 1     r      STARTED 1    2.4kb 192.168.10.142 node2
      my-index 4     p      STARTED 5    9.5kb 192.168.10.142 node1
      my-index 4     r      STARTED 5    9.4kb 192.168.10.142 node3
  73. 73. Cluster health: GET /_cat/health?v&h=cluster,status,node.total,shards,pri,unassign,init
      cluster   status node.total shards pri unassign init
      mycluster green  3          12     6   0        0
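Because the `_cat` endpoints return plain text rather than JSON, scripts need a small parsing step. The sketch below assumes the `?v` flag was used so that the first line is a header, and that no column value contains whitespace; `parse_cat` is a hypothetical helper.

```python
def parse_cat(text):
    """Parse whitespace-aligned _cat output (with a ?v header) into dictionaries."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


# Output copied from the cluster health slide.
HEALTH = """
cluster   status node.total shards pri unassign init
mycluster green  3          12     6   0        0
"""
rows = parse_cat(HEALTH)
print(rows[0]["status"], rows[0]["node.total"])
```

All values stay strings; convert the numeric columns with `int` where the script needs to compare them.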
  74. 74. The ELK stack
  75. 75. Logs → Parse & ship → Store → Visualize
  76. 76. Beats •Filebeat •Topbeat •Packetbeat •Winlogbeat
  77. 77. Logs → Ship → Parse → Store → Visualize
  78. 78. Integrating Elasticsearch
  79. 79. It’s REST, deal with it!
  80. 80. Or just use an API: PHP, Java, Perl, Python, Ruby, .NET
  81. 81. Try it yourself! http://github.com/thijsferyn/elasticsearch_tutorial
  82. 82. https://blog.feryn.eu https://talks.feryn.eu https://youtube.com/thijsferyn https://soundcloud.com/thijsferyn https://twitter.com/thijsferyn http://itunes.feryn.eu
