
ElasticSearch 5.x - New Tricks - 2017-02-08 - Elasticsearch Meetup

My presentation at the Elasticsearch Meetup in Rome on 8 February 2017 about new tricks in Elasticsearch 5.x


  1. Rome, 8 February 2017. Alberto Paro (Seacom) presents: ElasticSearch 5.x New Tricks
  2. Alberto Paro
     • Degree in Computer Engineering (POLIMI)
     • Author of 3 books on ElasticSearch, from 1 to 5.x, plus 6 tech reviews
     • I work mainly in Scala and on big data technologies (Akka, Spray.io, Playframework, Apache Spark) and NoSQL (Accumulo, Cassandra, ElasticSearch and MongoDB)
     • Evangelist for the Scala and Scala.JS languages
  3. Tip 1: Shrink - 1/5 - Why?
     • The number of shards was wrong in the initial design sizing. Sizing the shards without knowing the real data/text distribution often leads to too many shards.
     • Reducing the number of shards reduces memory and resource usage
     • Reducing the number of shards speeds up searching
  4. Tip 1: Shrink - 2/5 - Where is your data?
     We can retrieve the node names via the _nodes API:
       curl -XGET 'http://localhost:9200/_nodes?pretty'
     The result will contain a section similar to this:
       .... "nodes" : { "5Sei9ip8Qhee3J0o9dTV4g" : { "name" : "Gin Genie", "transport_address" : "127.0.0.1:9300", "host" : "127.0.0.1", "ip" : "127.0.0.1", "version" : "5.1.1",....
     The name of my node is Gin Genie.
  5. Tip 1: Shrink - 3/5 - Relocate your data
     We can change the index settings to force allocation of our index to a single node and to block writes to it:
       curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
         "settings": {
           "index.routing.allocation.require._name": "Gin Genie",
           "index.blocks.write": true
         }
       }'
     We can then check for the green status:
       curl -XGET 'http://localhost:9200/_cluster/health?pretty'
  6. Tip 1: Shrink - 4/5 - Shrink our shards
     Writes to the index must be disabled before shrinking:
       curl -XPUT 'http://localhost:9200/myindex/_settings?index.blocks.write=true'
     The shrink call that creates reduced_index is:
       curl -XPOST 'http://localhost:9200/myindex/_shrink/reduced_index' -d '{
         "settings": {
           "index.number_of_replicas": 1,
           "index.number_of_shards": 1,
           "index.codec": "best_compression"
         },
         "aliases": {"my_search_indices": {}}
       }'
  7. Tip 1: Shrink - 5/5 - Post Shrinking
     We can wait for a yellow status to know when the index is ready to work:
       curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=yellow'
     Now we can remove the write block by changing the index settings:
       curl -XPUT 'http://localhost:9200/myindex/_settings?index.blocks.write=false'
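When choosing index.number_of_shards for a shrink call, note that Elasticsearch only accepts a target shard count that is a factor of the source shard count. The following is a minimal Python sketch of that check. The helper names valid_shrink_targets and shrink_body are hypothetical, not part of any Elasticsearch client, and the code only builds the request body without contacting a cluster.

```python
import json


def valid_shrink_targets(source_shards):
    """Return the shard counts (smaller than the source) that divide it evenly."""
    if source_shards < 1:
        raise ValueError("source_shards must be >= 1")
    return [n for n in range(1, source_shards) if source_shards % n == 0]


def shrink_body(source_shards, target_shards, replicas=1):
    """Build a _shrink request body, rejecting invalid target shard counts."""
    if target_shards not in valid_shrink_targets(source_shards):
        raise ValueError(
            "%d is not a factor of %d" % (target_shards, source_shards)
        )
    return {
        "settings": {
            "index.number_of_replicas": replicas,
            "index.number_of_shards": target_shards,
        }
    }


# Hypothetical example: an index that was created with 8 primary shards.
print(valid_shrink_targets(8))
print(json.dumps(shrink_body(8, 2)))
```

The printed body can be passed to curl with -d in place of the hand-written JSON.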
  8. Tip 2: Reindex - 1/2 - Why?
     • Changing an analyzer for a mapping
     • Adding a new subfield to a mapping, which requires reprocessing all the records so that the new subfield is searchable
     • Removing an unused mapping
     • Changing a record structure that requires a new mapping
  9. Tip 2: Reindex - 2/2
       curl -XPOST 'http://localhost:9200/_reindex?pretty=true' -d '{
         "source": { "index": "myindex", "type": "mytype", "query": "…" },
         "dest": { "index": "myindex2" },
         "script": "…"
       }'
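The layout of a _reindex body is easy to get wrong by hand: the optional query belongs inside "source", while the optional script sits at the top level next to "source" and "dest". Here is a small Python sketch that assembles such a body. The function name reindex_body, the match_all query and the legacy_field name are illustrative assumptions, and nothing is sent to a server.

```python
import json


def reindex_body(source_index, dest_index, doc_type=None, query=None, script=None):
    """Assemble a _reindex request body as a plain dict."""
    source = {"index": source_index}
    if doc_type is not None:
        source["type"] = doc_type      # optional type filter (5.x)
    if query is not None:
        source["query"] = query        # only matching documents are copied
    body = {"source": source, "dest": {"index": dest_index}}
    if script is not None:
        # The script is a top-level key, not part of "dest".
        body["script"] = {"inline": script, "lang": "painless"}
    return body


# Hypothetical usage: copy everything and drop an assumed unused field.
body = reindex_body(
    "myindex",
    "myindex2",
    doc_type="mytype",
    query={"match_all": {}},
    script="ctx._source.remove('legacy_field')",
)
print(json.dumps(body, indent=2))
```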
  10. Tip 3: Update By Query with painless
      Add a new field:
      1. Create your mapping (e.g. modified: date)
      2. Call an update by query:
        curl -XPOST "http://$server/$index/$mapping/_update_by_query" -d '{
          "script": {
            "inline": "ctx._source.modified=\"2015-10-06T00:00:00.000+00:00\"",
            "lang": "painless"
          },
          "query": { "bool": {"must_not": [{"exists": {"field": "modified"}}]} }
        }'
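The script in this request is a string that itself contains a quoted string, so the inner quotes must be escaped for the body to be valid JSON. One way to avoid escaping by hand is to let a JSON library serialize the body. This is a minimal Python sketch under that assumption. The helper name add_field_body is hypothetical, and the code only produces the request text.

```python
import json


def add_field_body(field, value):
    """Build an _update_by_query body that sets `field` where it is missing."""
    # json.dumps(value) renders the value as a double-quoted string literal,
    # which is also a valid painless string literal.
    script = "ctx._source.%s=%s" % (field, json.dumps(value))
    body = {
        "script": {"inline": script, "lang": "painless"},
        "query": {"bool": {"must_not": [{"exists": {"field": field}}]}},
    }
    # The outer json.dumps escapes the quotes inside the script string.
    return json.dumps(body)


request_text = add_field_body("modified", "2015-10-06T00:00:00.000+00:00")
print(request_text)
```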
  11. Tip 4: Use search_after
      Step 1:
        curl -XGET "http://$server/$index/$type/_search" -d '{
          "size": 100,
          "query": { "match_all": {} },
          "sort": [{"_uid": "desc"}]
        }'
      Step n, n>1 (search_after carries the sort value of the last hit of the previous page):
        curl -XGET "http://$server/$index/$type/_search" -d '{
          "size": 100,
          "query": { "match_all": {} },
          "search_after": ["$type#100"],
          "sort": [{"_uid": "desc"}]
        }'
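The two steps generalize to a loop: each request reuses the same query and sort, and passes the "sort" values of the last hit of the previous page as search_after. Below is a sketch of that loop in Python. The search argument is an assumed callable that takes a request body and returns a response shaped like an Elasticsearch search response. make_fake_search is a hypothetical in-memory stand-in used so the loop can run without a cluster.

```python
def scan_with_search_after(search, page_size=100):
    """Yield every hit, paging with search_after instead of from/size."""
    body = {
        "size": page_size,
        "query": {"match_all": {}},
        "sort": [{"_uid": "desc"}],
    }
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Continue after the sort values of the last hit of this page.
        body = dict(body, search_after=hits[-1]["sort"])


def make_fake_search(uids):
    """In-memory stand-in for the _search endpoint (illustration only)."""
    ordered = sorted(uids, reverse=True)

    def search(body):
        after = body.get("search_after")
        remaining = [u for u in ordered if after is None or u < after[0]]
        page = remaining[: body["size"]]
        return {"hits": {"hits": [{"_id": u, "sort": [u]} for u in page]}}

    return search


fake = make_fake_search(["mytype#%03d" % i for i in range(7)])
for hit in scan_with_search_after(fake, page_size=3):
    print(hit["_id"])
```

Against a real cluster, search would wrap the HTTP call shown in the curl commands.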
  12. Tip 5: Reindex from a remote node - 1/2 - Why?
      • A backup is a safe Lucene index copy, so it depends on the Elasticsearch version used. If you are switching from a version of Elasticsearch prior to 5.x, it's not possible to restore old indices.
      • It's not possible to restore backups of a newer Elasticsearch version in an older version. The restore is only forward-compatible.
      • It's not possible to restore partial data from a backup.
  13. Tip 5: Reindex from a remote node - 2/2
      In config/elasticsearch.yml add:
        reindex.remote.whitelist: ["192.168.1.227:9200"]
      Then:
        curl -XPOST "http://$server/_reindex" -d '{
          "source": {
            "remote": { "host": "http://192.168.1.227:9200" },
            "index": "test-source"
          },
          "dest": { "index": "test-dest" }
        }'
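The whitelist entry and the remote host in the request must agree on host and port, while the scheme is not part of the whitelist entry. A small Python sketch that checks this before building the body follows. The function name remote_reindex_body is hypothetical, and the check is a simplified exact match that does not model wildcard whitelist entries.

```python
from urllib.parse import urlsplit


def remote_reindex_body(remote_url, source_index, dest_index, whitelist):
    """Build a reindex-from-remote body after a simple whitelist check."""
    parts = urlsplit(remote_url)
    # Whitelist entries are host:port pairs; the URL scheme is not included.
    host_port = "%s:%s" % (parts.hostname, parts.port)
    if host_port not in whitelist:
        raise ValueError("%s is not in reindex.remote.whitelist" % host_port)
    return {
        "source": {"remote": {"host": remote_url}, "index": source_index},
        "dest": {"index": dest_index},
    }


whitelist = ["192.168.1.227:9200"]
print(remote_reindex_body(
    "http://192.168.1.227:9200", "test-source", "test-dest", whitelist
))
```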
  14. Tip 6: Ingest Pipeline - 1/2 - Why?
      • Adding/removing fields without changing your code
      • Manipulating your records before ingesting them
      • Computed fields
      • Scripting is also supported
  15. Tip 6: Ingest Pipeline - 2/2
        curl -XPUT 'http://127.0.0.1:9200/_ingest/pipeline/add-user-john' -d '{
          "description": "Add user john field",
          "processors": [ { "set": { "field": "user", "value": "john" } } ],
          "version": 1
        }'
        curl -XPUT "http://$server/$index/$type/$id?pipeline=add-user-john" -d '{}'
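To make the effect of this pipeline concrete, here is a toy Python model of what happens to the empty document indexed in the second command. It is not Elasticsearch code: apply_set_processors is a hypothetical function that imitates only the set processor on top-level fields, and it ignores every other processor type and option.

```python
import copy


def apply_set_processors(pipeline, document):
    """Toy model: apply each `set` processor of a pipeline to a document."""
    result = copy.deepcopy(document)   # leave the caller's document untouched
    for processor in pipeline["processors"]:
        for name, config in processor.items():
            if name != "set":
                raise NotImplementedError("only 'set' is modelled here")
            result[config["field"]] = config["value"]
    return result


pipeline = {
    "description": "Add user john field",
    "processors": [{"set": {"field": "user", "value": "john"}}],
    "version": 1,
}
print(apply_set_processors(pipeline, {}))
```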
  16. Thank you for your attention. Alberto Paro
  17. Q&A
