03. ElasticSearch: Data In, Data Out

Published in: Data & Analytics

1. ElasticSearch: Data In, Data Out
   http://elastic.openthinklabs.com/
2. What Is a Document?
   {
     "name": "John Smith",
     "age": 42,
     "confirmed": true,
     "join_date": "2014-06-01",
     "home": { "lat": 51.5, "lon": 0.1 },
     "accounts": [
       { "type": "facebook", "id": "johnsmith" },
       { "type": "twitter", "id": "johnsmith" }
     ]
   }
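A document is simply a JSON object, and it may contain nested objects ("home") and arrays of objects ("accounts"). As a minimal sketch using only Python's standard `json` module (no Elasticsearch client is assumed), the same document can be built as a dict and serialized into the text an index request would carry:

```python
import json

# The example document from the slide as a Python dict.
# Nested objects and arrays of objects map directly onto JSON.
document = {
    "name": "John Smith",
    "age": 42,
    "confirmed": True,
    "join_date": "2014-06-01",
    "home": {"lat": 51.5, "lon": 0.1},
    "accounts": [
        {"type": "facebook", "id": "johnsmith"},
        {"type": "twitter", "id": "johnsmith"},
    ],
}

# Serialize to the JSON text that would travel in an HTTP request body.
body = json.dumps(document)

# Round-tripping shows that nothing is lost in the conversion.
assert json.loads(body) == document
print(body)
```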
3. Document Metadata
   ● _index :: the collection of documents that should be grouped together for a common reason
   ● _type :: the class of object that the document represents
   ● _id :: the unique identifier for the document
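Together, `_index`, `_type` and `_id` identify exactly one document, and they appear as the three path segments of a document URL such as `/website/blog/123`. A small sketch follows; `doc_path` is a hypothetical helper name, not part of any Elasticsearch client, and percent-encoding the segments is an assumption about how one would keep unusual IDs intact:

```python
from urllib.parse import quote


def doc_path(index, doc_type, doc_id=None):
    """Build the /{_index}/{_type}/{_id} path used by the document APIs.

    Hypothetical helper for illustration. Each segment is percent-encoded,
    so an _id containing '/' or spaces stays a single path segment.
    Without an _id, the path points at the type (as used for POST).
    """
    parts = [index, doc_type]
    if doc_id is not None:
        parts.append(str(doc_id))
    return "/" + "/".join(quote(p, safe="") for p in parts)


print(doc_path("website", "blog", 123))  # /website/blog/123
print(doc_path("website", "blog"))       # /website/blog
```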
4. Indexing a Document Using Our Own ID
   PUT verb: store this document at this URL.
   Index request:
     PUT /website/blog/123
     { "title": "My first blog entry", "text": "Just trying this out...", "date": "2014/01/01" }
   Elasticsearch responds:
     { "_index": "website", "_type": "blog", "_id": "123", "_version": 1, "created": true }
5. Indexing a Document: Autogenerating IDs
   POST verb: store this document under this URL; Elasticsearch generates the _id.
   Index request:
     POST /website/blog/
     { "title": "My second blog entry", "text": "Still trying this out...", "date": "2014/01/01" }
   Elasticsearch responds:
     { "_index": "website", "_type": "blog", "_id": "AVeTjE9FnhloyZ20gpEj", "_version": 1, "created": true }
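The two index requests differ only in the HTTP verb and whether the URL ends with an ID. As a sketch using Python's standard `urllib.request`, the function below prepares either request without sending it; `index_request` is a hypothetical name and `BASE_URL` assumes a local node on the default port 9200:

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default port


def index_request(index, doc_type, doc, doc_id=None):
    """Prepare (but do not send) an index request.

    With an ID: PUT /index/type/id ("store this document at this URL").
    Without one: POST /index/type/ and let Elasticsearch generate the _id.
    """
    if doc_id is None:
        url, method = f"{BASE_URL}/{index}/{doc_type}/", "POST"
    else:
        url, method = f"{BASE_URL}/{index}/{doc_type}/{doc_id}", "PUT"
    return Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


req = index_request("website", "blog", {"title": "My first blog entry"}, doc_id=123)
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/website/blog/123
```

Against a running node, `urllib.request.urlopen(req)` would send the request and return the JSON response shown on the slides.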
6. Retrieving a Document
   GET /website/blog/123?pretty
     { "_index": "website", "_type": "blog", "_id": "123", "_version": 1, "found": true,
       "_source": { "title": "My first blog entry", "text": "Just trying this out...", "date": "2014/01/01" } }
   A document that does not exist returns 404 Not Found with "found": false:
     curl -i -XGET 'http://localhost:9200/website/blog/124?pretty'
     HTTP/1.1 404 Not Found
     Content-Type: application/json; charset=UTF-8
     Content-Length: 83
     { "_index" : "website", "_type" : "blog", "_id" : "124", "found" : false }
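Because a missing document is reported through `"found": false` rather than through a missing `_source`, client code should check that flag first. A minimal sketch with the standard `json` module; `source_or_none` is a hypothetical helper, and the response strings are shortened copies of the slide's output:

```python
import json


def source_or_none(response_body):
    """Return the _source of a GET /index/type/id response, or None.

    A missing document comes back with "found": false and no _source field.
    """
    result = json.loads(response_body)
    return result["_source"] if result.get("found") else None


hit = ('{"_index": "website", "_type": "blog", "_id": "123", "_version": 1, '
       '"found": true, "_source": {"title": "My first blog entry"}}')
miss = '{"_index": "website", "_type": "blog", "_id": "124", "found": false}'

print(source_or_none(hit))   # {'title': 'My first blog entry'}
print(source_or_none(miss))  # None
```

With `urllib`, the 404 surfaces as `urllib.error.HTTPError`, whose `.read()` still returns this JSON body, so the same helper can be applied to it.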
7. Retrieving Part of a Document
   Selected fields only, using the _source parameter:
     GET /website/blog/123?_source=title,text
     { "_index": "website", "_type": "blog", "_id": "123", "_version": 1, "found": true,
       "_source": { "text": "Just trying this out...", "title": "My first blog entry" } }
   The _source field only, without any metadata:
     GET /website/blog/123/_source
     { "title": "My first blog entry", "text": "Just trying this out...", "date": "2014/01/01" }
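Both forms are plain URLs. The sketch below builds them with the standard `urllib.parse.urlencode`. The function names are hypothetical, and keeping the comma literal (`safe=","`) is only a choice to match the slide, since a percent-encoded comma is equally valid in a query string:

```python
from urllib.parse import urlencode


def source_filter_url(base, index, doc_type, doc_id, fields):
    """URL that requests only some _source fields, e.g. ?_source=title,text."""
    query = urlencode({"_source": ",".join(fields)}, safe=",")
    return f"{base}/{index}/{doc_type}/{doc_id}?{query}"


def source_only_url(base, index, doc_type, doc_id):
    """URL of the /_source endpoint, which returns the body without metadata."""
    return f"{base}/{index}/{doc_type}/{doc_id}/_source"


print(source_filter_url("http://localhost:9200", "website", "blog", 123, ["title", "text"]))
# http://localhost:9200/website/blog/123?_source=title,text
```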
8. Checking Whether a Document Exists
   Send a HEAD request: the status code answers the question (200 if the document exists, 404 if not) and the body is empty.
     curl -I http://localhost:9200/website/blog/123
     HTTP/1.1 200 OK
     Content-Type: text/plain; charset=UTF-8
     Content-Length: 0

     curl -I http://localhost:9200/website/blog/124
     HTTP/1.1 404 Not Found
     Content-Type: text/plain; charset=UTF-8
     Content-Length: 0
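The same existence check in Python, sketched with the standard `urllib`: a HEAD request whose status code is the whole answer. `document_exists` is a hypothetical helper, and the `opener` parameter is an assumption added so the logic can be exercised without a running node; by default it is `urllib.request.urlopen`:

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def document_exists(url, opener=urlopen):
    """Return True for a 2xx answer to HEAD, False for 404.

    urllib raises HTTPError for 4xx/5xx; anything other than 404
    (for example a 503 from an unavailable node) is re-raised.
    """
    try:
        with opener(Request(url, method="HEAD")) as response:
            return 200 <= response.status < 300
    except HTTPError as err:
        if err.code == 404:
            return False
        raise
```

Usage against a live node would simply be `document_exists("http://localhost:9200/website/blog/123")`.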
9. Updating a Whole Document
   ● Documents in Elasticsearch are immutable; we cannot change them. Instead, if we need to update an existing document, we reindex (replace) it, which we can do using the same index API:
     PUT /website/blog/123
     { "title": "My first blog entry", "text": "I am starting to get the hang of this...", "date": "2014/01/02" }
   ● Elasticsearch responds with an incremented _version and "created": false, because an existing document was replaced:
     { "_index": "website", "_type": "blog", "_id": "123", "_version": 2, "created": false }
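The versioning behaviour on this slide can be modelled in a few lines. The class below is a toy in-memory model written for illustration (`TinyIndex` is a hypothetical name, not Elasticsearch code): indexing under an existing ID replaces the whole document, bumps `_version`, and reports `created: False`:

```python
class TinyIndex:
    """Toy in-memory model of the index API's versioning (not Elasticsearch).

    Documents are never changed in place: indexing under an existing ID
    replaces the whole document and increments _version.
    """

    def __init__(self):
        self._docs = {}  # id -> (version, source)

    def index(self, doc_id, source):
        created = doc_id not in self._docs
        version = self._docs.get(doc_id, (0, None))[0] + 1
        # Store a copy: the new document fully replaces the old one.
        self._docs[doc_id] = (version, dict(source))
        return {"_id": doc_id, "_version": version, "created": created}

    def get(self, doc_id):
        version, source = self._docs[doc_id]
        return {"_id": doc_id, "_version": version, "_source": source}


idx = TinyIndex()
print(idx.index("123", {"title": "My first blog entry", "date": "2014/01/01"}))
# {'_id': '123', '_version': 1, 'created': True}
print(idx.index("123", {"title": "My first blog entry", "date": "2014/01/02"}))
# {'_id': '123', '_version': 2, 'created': False}
```

Since the whole document is replaced, any field left out of the new body is gone after reindexing.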
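10. Creating a New Document
   Three ways to make sure a request creates a new document instead of overwriting an existing one:
     1. POST /website/blog/ { ... }                  (Elasticsearch generates a new, unique _id)
     2. PUT /website/blog/123?op_type=create { ... }
     3. PUT /website/blog/123/_create { ... }
   If a document with that _id already exists, the request fails with 409 Conflict:
     PUT /website/blog/123?op_type=create
     { "title": "My first blog entry", "text": "Just trying this out...", "date": "2014/01/01" }
     { "error": "DocumentAlreadyExistsException[[website][4] [blog][123]: document already exists]", "status": 409 }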
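The `op_type=create` rule amounts to "insert only if the ID is unused". Here is a toy sketch on a plain dict; `create` and `ConflictError` are hypothetical names, with the exception standing in for the 409 response:

```python
class ConflictError(Exception):
    """Stands in for Elasticsearch's 409 Conflict response."""


def create(docs, doc_id, source):
    """op_type=create semantics on a plain dict: fail if the ID is taken."""
    if doc_id in docs:
        raise ConflictError(f"[{doc_id}]: document already exists")
    docs[doc_id] = source
    return {"_id": doc_id, "created": True}


docs = {}
create(docs, "123", {"title": "My first blog entry"})
try:
    create(docs, "123", {"title": "My first blog entry"})
except ConflictError as err:
    print("409:", err)  # 409: [123]: document already exists
```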
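11. Deleting a Document
   DELETE /website/blog/123
   If the document was found, it is deleted and the response reports "found": true:
     { "found": true, "_index": "website", "_type": "blog", "_id": "123", "_version": 3 }
   If the document does not exist, Elasticsearch returns 404 Not Found with "found": false:
     { "found": false, "_index": "website", "_type": "blog", "_id": "123", "_version": 1 }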
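A client has to interpret both outcomes. The sketch below treats "already gone" as success, which makes retried deletes safe. `delete_succeeded` is a hypothetical helper, and `missing_ok` is a policy of this sketch, not an Elasticsearch parameter:

```python
import json


def delete_succeeded(status, response_body, missing_ok=True):
    """Interpret a DELETE /index/type/id response.

    200 + "found": true  -> the document was deleted.
    404 + "found": false -> there was nothing to delete; with missing_ok
                            (hypothetical policy) this also counts as success.
    """
    found = json.loads(response_body).get("found", False)
    if status == 200 and found:
        return True
    if status == 404 and not found:
        return missing_ok
    raise ValueError(f"unexpected delete response: {status} {response_body}")
```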
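12. Dealing with Conflicts
   (Diagram: the consequence of having no concurrency control.)
   When two processes read, modify, and reindex the same document at the same time, whichever write arrives last is stored, and the other change is silently lost.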
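The lost update can be reproduced without Elasticsearch at all. This toy simulation uses a stock counter, an illustrative assumption rather than the slide's diagram. Two processes each read the document, subtract one item, and write the whole document back:

```python
# Toy simulation (not Elasticsearch code): two processes perform a
# read-modify-write on the same document with no concurrency control.
store = {"stock_count": 100}

# Both processes read the document before either one writes.
seen_by_a = dict(store)
seen_by_b = dict(store)

# Each process sells one item and writes back its whole copy.
seen_by_a["stock_count"] -= 1
store = seen_by_a  # A writes 99
seen_by_b["stock_count"] -= 1
store = seen_by_b  # B also writes 99, so A's sale is lost

print(store["stock_count"])  # 99, although two items were sold
```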
13. Optimistic Concurrency Control
   1. Create the document:
      PUT /website/blog/1/_create
      { "title": "My first blog entry", "text": "Just trying this out..." }
   2. Retrieve it; the response carries its current _version (1):
      GET /website/blog/1
      { "_index": "website", "_type": "blog", "_id": "1", "_version": 1, "found": true,
        "_source": { "title": "My first blog entry", "text": "Just trying this out..." } }
   3. Reindex it, passing the version we read; the change is applied only if the document's current version is still 1:
      PUT /website/blog/1?version=1
      { "title": "My first blog entry", "text": "Starting to get the hang of this..." }
      { "_index": "website", "_type": "blog", "_id": "1", "_version": 2, "created": false }
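Passing `?version=` turns the write into a compare-and-set. Below is a toy in-memory model of that rule (`VersionedStore` and `VersionConflict` are hypothetical names): a write that names a version succeeds only when that version matches the current one, so a stale writer gets a conflict instead of overwriting newer data:

```python
class VersionConflict(Exception):
    """Stands in for a 409 version conflict."""


class VersionedStore:
    """Toy model of internal versioning with compare-and-set writes."""

    def __init__(self):
        self._docs = {}  # id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def index(self, doc_id, source, version=None):
        current = self._docs.get(doc_id, (0, None))[0]
        if version is not None and version != current:
            raise VersionConflict(f"current [{current}], provided [{version}]")
        self._docs[doc_id] = (current + 1, source)
        return current + 1


s = VersionedStore()
s.index("1", {"text": "Just trying this out..."})
v, _ = s.get("1")  # two clients both read version 1
s.index("1", {"text": "Starting to get the hang of this..."}, version=v)  # now version 2
try:
    s.index("1", {"text": "A stale edit"}, version=v)  # still claims version 1
except VersionConflict as err:
    print("409:", err)  # 409: current [2], provided [1]
```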
14. Using Versions from an External System
   With version_type=external, a request succeeds only if the supplied version is greater than the document's current version, and the supplied version is stored as the new _version:
   1. PUT /website/blog/2?version=5&version_type=external
      { "title": "My first external blog entry", "text": "Starting to get the hang of this..." }
      { "_index": "website", "_type": "blog", "_id": "2", "_version": 5, "created": true }
   2. PUT /website/blog/2?version=10&version_type=external
      { "title": "My first external blog entry", "text": "This is a piece of cake..." }
      { "_index": "website", "_type": "blog", "_id": "2", "_version": 10, "created": false }
   3. Repeating the version=10 request fails, because 10 is not greater than the current version 10:
      PUT /website/blog/2?version=10&version_type=external
      { "title": "My first external blog entry", "text": "This is a piece of cake..." }
      { "error": "VersionConflictEngineException[[website][3] [blog][2]: version conflict, current [10], provided [10]]", "status": 409 }
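The external-version rule can be written as a single comparison. In this sketch, `accept_external` is a hypothetical name and `None` stands for "document does not exist yet". It replays the three requests from the slide:

```python
def accept_external(current, provided):
    """version_type=external as shown on the slide: the supplied version must
    be strictly greater than the stored one (None = no document yet)."""
    return current is None or provided > current


history = []
current = None
for provided in (5, 10, 10):
    if accept_external(current, provided):
        current = provided  # the external version is stored as-is
        history.append(("ok", current))
    else:
        history.append(("409", provided))

print(history)  # [('ok', 5), ('ok', 10), ('409', 10)]
```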
15. Partial Updates to Documents
   The fields in "doc" are merged into the existing document, which is then reindexed:
   1. POST /website/blog/1/_update
      { "doc" : { "tags" : [ "testing" ], "views": 0 } }
      { "_index": "website", "_type": "blog", "_id": "1", "_version": 3 }
   2. GET /website/blog/1
      { "_index": "website", "_type": "blog", "_id": "1", "_version": 3, "found": true,
        "_source": { "title": "My first blog entry", "text": "Starting to get the hang of this...", "views": 0, "tags": [ "testing" ] } }
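The merge step can be sketched as a recursive dictionary merge. Nested objects are combined, while scalars and arrays in the partial document replace the stored values. The result is a new document and the original is left untouched, which mirrors reindexing a replacement. `merge_partial` is a hypothetical name for this sketch:

```python
import copy


def merge_partial(source, partial):
    """Merge a partial "doc" into a stored _source (sketch).

    Nested objects are merged recursively; scalars and arrays in the
    partial document replace existing values. Returns a new dict.
    """
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


stored = {"title": "My first blog entry", "text": "Starting to get the hang of this..."}
print(merge_partial(stored, {"tags": ["testing"], "views": 0}))
```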
16. Using Scripts to Make Partial Updates
   1. Increment a counter:
      POST /website/blog/1/_update
      { "script" : "ctx._source.views+=1" }
      { "_index": "website", "_type": "blog", "_id": "1", "_version": 4 }
   2. Add a tag, passing the value as a script parameter:
      POST /website/blog/1/_update
      { "script" : "ctx._source.tags+=new_tag", "params" : { "new_tag" : "search" } }
      { "_index": "website", "_type": "blog", "_id": "1", "_version": 5 }
   3. GET /website/blog/1
      { "_index": "website", "_type": "blog", "_id": "1", "_version": 6, "found": true,
        "_source": { "title": "My first blog entry", "text": "Starting to get the hang of this...", "views": 1, "tags": [ "testing", "search" ] } }
17. Using Scripts to Make Partial Updates (continued)
   Delete a document based on its contents by setting ctx.op to delete:
     POST /website/blog/1/_update
     { "script" : "ctx.op = ctx._source.views == count ? 'delete' : 'none'", "params" : { "count": 1 } }
     GET /website/blog/1
     { "_index": "website", "_type": "blog", "_id": "1", "found": false }
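Conceptually, a scripted update retrieves `_source`, lets the script change `ctx`, and then reindexes the result or, if `ctx.op` says so, deletes the document. The toy model below mimics that cycle with Python callables standing in for the slides' script strings; both the callables and the default `"index"` op are assumptions of this sketch, not the engine's scripting API:

```python
def run_update_script(source, script, params=None):
    """Toy model of a scripted update: copy _source into ctx, run the
    script, and return (op, new_source). Not Elasticsearch's engine."""
    ctx = {"_source": dict(source), "op": "index"}
    script(ctx, params or {})
    return ctx["op"], ctx["_source"]


def add_view(ctx, params):  # like "ctx._source.views+=1"
    ctx["_source"]["views"] += 1


def add_tag(ctx, params):  # like "ctx._source.tags+=new_tag"
    ctx["_source"]["tags"] = ctx["_source"]["tags"] + [params["new_tag"]]


def delete_if_count(ctx, params):  # like "ctx.op = ctx._source.views == count ? 'delete' : 'none'"
    ctx["op"] = "delete" if ctx["_source"]["views"] == params["count"] else "none"


doc = {"views": 0, "tags": ["testing"]}
_, doc = run_update_script(doc, add_view)
_, doc = run_update_script(doc, add_tag, {"new_tag": "search"})
op, _ = run_update_script(doc, delete_if_count, {"count": 1})
print(doc, op)  # {'views': 1, 'tags': ['testing', 'search']} delete
```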
18. Updating a Document That May Not Yet Exist
   If the document does not exist yet, the upsert document is indexed as-is; otherwise the script runs against the existing document:
     POST /website/pageviews/1/_update
     { "script" : "ctx._source.views+=1", "upsert": { "views": 1 } }
     { "_index": "website", "_type": "pageviews", "_id": "1", "_version": 1 }
     GET /website/pageviews/1
     { "_index": "website", "_type": "pageviews", "_id": "1", "_version": 1, "found": true, "_source": { "views": 1 } }
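The upsert rule is easy to model on a plain dict: the first call stores the upsert document without running the script, and each later call runs the script. `upsert_update` and `increment_views` are hypothetical names; the callable stands in for the script string:

```python
def upsert_update(docs, doc_id, script, upsert):
    """Sketch of update-with-upsert: store `upsert` if the document is
    missing (script not run); otherwise apply the script to a copy."""
    if doc_id not in docs:
        docs[doc_id] = dict(upsert)
    else:
        updated = dict(docs[doc_id])
        script(updated)
        docs[doc_id] = updated
    return docs[doc_id]


def increment_views(source):  # stands in for "ctx._source.views+=1"
    source["views"] += 1


pageviews = {}
print(upsert_update(pageviews, "1", increment_views, {"views": 1}))  # {'views': 1}
print(upsert_update(pageviews, "1", increment_views, {"views": 1}))  # {'views': 2}
```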
19. Update and Conflicts
   retry_on_conflict sets how many times the update is retried after a version conflict before it fails:
     POST /website/pageviews/1/_update?retry_on_conflict=5
     { "script" : "ctx._source.views+=1", "upsert": { "views": 0 } }
     { "_index": "website", "_type": "pageviews", "_id": "1", "_version": 2, "found": true, "_source": { "views": 2 } }
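The retry loop behind `retry_on_conflict` can be sketched as get, change, then compare-and-set with the version read, re-reading and retrying on conflict. Everything here is a toy model with hypothetical names. `RacyStore` simulates another process that writes once between our read and our write, so the first attempt conflicts and the second succeeds, and neither increment is lost:

```python
class VersionConflict(Exception):
    """Stands in for a 409 version conflict."""


class Store:
    """Toy versioned document with compare-and-set writes."""

    def __init__(self):
        self.version, self.source = 1, {"views": 1}

    def get(self):
        return self.version, dict(self.source)

    def put_if_version(self, expected, source):
        if expected != self.version:
            raise VersionConflict(f"current [{self.version}], provided [{expected}]")
        self.version, self.source = self.version + 1, source


class RacyStore(Store):
    """Simulates another process incrementing views once, just before our first write."""

    def __init__(self):
        super().__init__()
        self.interfered = False

    def put_if_version(self, expected, source):
        if not self.interfered:
            self.interfered = True
            super().put_if_version(self.version, {"views": self.source["views"] + 1})
        super().put_if_version(expected, source)


def update_with_retry(store, change, retry_on_conflict=5):
    """Get, change, write with the version read; on conflict, re-read and
    retry up to retry_on_conflict more times. Returns retries used."""
    for attempt in range(retry_on_conflict + 1):
        version, source = store.get()
        change(source)
        try:
            store.put_if_version(version, source)
            return attempt
        except VersionConflict:
            if attempt == retry_on_conflict:
                raise


def add_view(source):
    source["views"] += 1


store = RacyStore()
retries = update_with_retry(store, add_view)
print(retries, store.get())  # 1 (3, {'views': 3})
```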
20. Retrieving Multiple Documents
   GET /_mget
   { "docs" : [
       { "_index" : "website", "_type" : "blog", "_id" : 2 },
       { "_index" : "website", "_type" : "pageviews", "_id" : 1, "_source": "views" }
   ] }
   Elasticsearch responds:
   { "docs": [
       { "_index": "website", "_type": "blog", "_id": "2", "_version": 10, "found": true,
         "_source": { "title": "My first external blog entry", "text": "This is a piece of cake..." } },
       { "_index": "website", "_type": "pageviews", "_id": "1", "_version": 3, "found": true,
         "_source": { "views": 3 } }
   ] }
21. Retrieving Multiple Documents
   An index and type in the URL become the defaults for every entry, and each entry can still override them:
   GET /website/blog/_mget
   { "docs" : [
       { "_id" : 2 },
       { "_type" : "pageviews", "_id" : 1 }
   ] }
   Elasticsearch responds:
   { "docs": [
       { "_index": "website", "_type": "blog", "_id": "2", "_version": 10, "found": true,
         "_source": { "title": "My first external blog entry", "text": "This is a piece of cake..." } },
       { "_index": "website", "_type": "pageviews", "_id": "1", "_version": 3, "found": true,
         "_source": { "views": 3 } }
   ] }
22. Retrieving Multiple Documents
   When all documents share the same _index and _type, an ids array is enough; a document that does not exist is reported with "found": false:
   GET /website/blog/_mget
   { "ids" : [ "2", "1" ] }
   Elasticsearch responds:
   { "docs": [
       { "_index": "website", "_type": "blog", "_id": "2", "_version": 10, "found": true,
         "_source": { "title": "My first external blog entry", "text": "This is a piece of cake..." } },
       { "_index": "website", "_type": "blog", "_id": "1", "found": false }
   ] }
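Building the two `_mget` body shapes and reading the response can both be sketched with the standard `json` module. `mget_body` and `found_sources` are hypothetical helpers. Note that documents which were not found still appear in the `docs` array, so the `found` flag has to be checked:

```python
import json


def mget_body(ids=None, docs=None):
    """Build an _mget body: {"ids": [...]} when the URL supplies _index and
    _type for every document, or {"docs": [...]} with per-entry metadata."""
    if (ids is None) == (docs is None):
        raise ValueError("pass exactly one of ids or docs")
    return json.dumps({"ids": [str(i) for i in ids]} if ids is not None else {"docs": docs})


def found_sources(response_body):
    """Map _id -> _source for found documents, skipping "found": false."""
    return {d["_id"]: d["_source"] for d in json.loads(response_body)["docs"] if d.get("found")}


print(mget_body(ids=[2, 1]))  # {"ids": ["2", "1"]}
response = ('{"docs": [{"_index": "website", "_type": "blog", "_id": "2", "_version": 10, '
            '"found": true, "_source": {"title": "My first external blog entry"}}, '
            '{"_index": "website", "_type": "blog", "_id": "1", "found": false}]}')
print(found_sources(response))  # {'2': {'title': 'My first external blog entry'}}
```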
23. Cheaper in Bulk
   The bulk request body has the following format; every line, including the last one, must end with a newline character (\n):
     { action: { metadata }}\n
     { request body }\n
     { action: { metadata }}\n
     { request body }\n
     ...
   A delete action has no request body line; create, index and update are each followed by one.
   POST /_bulk
   { "delete": { "_index": "website", "_type": "blog", "_id": "123" }}
   { "create": { "_index": "website", "_type": "blog", "_id": "123" }}
   { "title": "My first blog post" }
   { "index": { "_index": "website", "_type": "blog" }}
   { "title": "My second blog post" }
   { "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict" : 3} }
   { "doc" : {"title" : "My updated blog post"} }
   Elasticsearch responds:
   { "took": 4, "errors": false, "items": [
       { "delete": { "_index": "website", "_type": "blog", "_id": "123", "_version": 1, "status": 404, "found": false } },
       { "create": { "_index": "website", "_type": "blog", "_id": "123", "_version": 2, "status": 201 } },
       { "create": { "_index": "website", "_type": "blog", "_id": "AVeVu4ZmPwPQAxVyMVtH", "_version": 1, "status": 201 } },
       { "update": { "_index": "website", "_type": "blog", "_id": "123", "_version": 3, "status": 200 } }
   ] }
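The newline-delimited format is easy to produce by hand. This sketch serializes `(action, metadata, body)` triples, which are a hypothetical input shape chosen for illustration. Each JSON object goes on one line (`json.dumps` emits no raw newlines unless `indent` is set), delete actions get no body line, and the payload ends with the mandatory final newline:

```python
import json


def bulk_body(actions):
    """Serialize (action, metadata, body) triples into the bulk format."""
    lines = []
    for action, metadata, body in actions:
        lines.append(json.dumps({action: metadata}))
        if action != "delete":  # delete has no request body line
            lines.append(json.dumps(body))
    return "\n".join(lines) + "\n"  # the last line must end with \n too


payload = bulk_body([
    ("delete", {"_index": "website", "_type": "blog", "_id": "123"}, None),
    ("create", {"_index": "website", "_type": "blog", "_id": "123"}, {"title": "My first blog post"}),
    ("index", {"_index": "website", "_type": "blog"}, {"title": "My second blog post"}),
    ("update", {"_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict": 3},
     {"doc": {"title": "My updated blog post"}}),
])
print(payload, end="")
```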
24. Cheaper in Bulk
   Each action succeeds or fails on its own. Here the create fails with 409 because the document already exists, while the index action still succeeds; "errors": true signals that at least one action failed.
   POST /_bulk
   { "create": { "_index": "website", "_type": "blog", "_id": "123" }}
   { "title": "Cannot create - it already exists" }
   { "index": { "_index": "website", "_type": "blog", "_id": "123" }}
   { "title": "But we can update it" }
   Elasticsearch responds:
   { "took": 2, "errors": true, "items": [
       { "create": { "_index": "website", "_type": "blog", "_id": "123", "status": 409,
                     "error": "DocumentAlreadyExistsException[[website][4] [blog][123]: document already exists]" } },
       { "index": { "_index": "website", "_type": "blog", "_id": "123", "_version": 4, "status": 200 } }
   ] }
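Because `"errors": true` only says that *some* action failed, a client has to walk the items to find out which ones. A sketch with the standard `json` module: `failed_items` is a hypothetical helper that treats an item as failed when it carries an `"error"` field, so a 404 on a delete, as in the previous bulk response, is not counted:

```python
import json


def failed_items(response_body):
    """Return (action, _id, status, error) for each bulk item with an error."""
    response = json.loads(response_body)
    if not response.get("errors"):
        return []
    failures = []
    for item in response["items"]:
        action, result = next(iter(item.items()))  # one key per item: the action name
        if "error" in result:
            failures.append((action, result.get("_id"), result.get("status"), result["error"]))
    return failures


response = json.dumps({"took": 2, "errors": True, "items": [
    {"create": {"_index": "website", "_type": "blog", "_id": "123", "status": 409,
                "error": "DocumentAlreadyExistsException[[website][4] [blog][123]: document already exists]"}},
    {"index": {"_index": "website", "_type": "blog", "_id": "123", "_version": 4, "status": 200}},
]})
print(failed_items(response))
```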
25. Don’t Repeat Yourself
   An index in the URL (/website/_bulk) becomes the default _index, so the metadata lines can omit it:
   POST /website/_bulk
   { "index": { "_type": "log" }}
   { "event": "User logged in" }
   Elasticsearch responds:
   { "took": 3, "errors": false, "items": [
       { "create": { "_index": "website", "_type": "log", "_id": "AVeVyqWVPwPQAxVyMV3_", "_version": 1, "status": 201 } }
   ] }
26. Don’t Repeat Yourself
   An index and type in the URL (/website/log/_bulk) become the defaults for every action, and an action can still override either one in its metadata:
   POST /website/log/_bulk
   { "index": {}}
   { "event": "User logged in" }
   { "index": { "_type": "blog" }}
   { "title": "Overriding the default type" }
   Elasticsearch responds:
   { "took": 2, "errors": false, "items": [
       { "create": { "_index": "website", "_type": "log", "_id": "AVeVzBQjPwPQAxVyMV4_", "_version": 1, "status": 201 } },
       { "create": { "_index": "website", "_type": "blog", "_id": "AVeVzBQjPwPQAxVyMV5A", "_version": 1, "status": 201 } }
   ] }
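The defaulting rule can be applied on the client side when writing the payload. This sketch drops `_index` and `_type` from an action's metadata whenever they equal the defaults that the URL will supply, and keeps them when they differ so that they override the URL. The function name and input shape are hypothetical. Run on the slide's two actions with defaults `website`/`log`, it reproduces the request body shown above:

```python
import json


def bulk_lines_with_defaults(actions, default_index=None, default_type=None):
    """Serialize bulk actions, omitting _index/_type equal to the URL defaults."""
    lines = []
    for action, metadata, body in actions:
        meta = {k: v for k, v in metadata.items()
                if not ((k == "_index" and v == default_index)
                        or (k == "_type" and v == default_type))}
        lines.append(json.dumps({action: meta}))
        if action != "delete":
            lines.append(json.dumps(body))
    return "\n".join(lines) + "\n"


payload = bulk_lines_with_defaults(
    [("index", {"_index": "website", "_type": "log"}, {"event": "User logged in"}),
     ("index", {"_index": "website", "_type": "blog"}, {"title": "Overriding the default type"})],
    default_index="website", default_type="log")
print(payload, end="")
# {"index": {}}
# {"event": "User logged in"}
# {"index": {"_type": "blog"}}
# {"title": "Overriding the default type"}
```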
27. How Big Is Too Big?
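The slide leaves the question open. One common approach, sketched here as an assumption rather than as the deck's recommendation, is to cap each bulk request at a byte budget and measure indexing throughput while adjusting that budget. The right size depends on the documents and the cluster, so `max_bytes` is a tuning knob, not a recommended figure. `chunk_bulk` is a hypothetical helper:

```python
import json


def chunk_bulk(pairs, max_bytes):
    """Split (action, body) pairs into bulk payloads of at most max_bytes
    (UTF-8). A single pair larger than max_bytes still gets its own payload."""
    chunks, current, size = [], [], 0
    for action, body in pairs:
        piece = json.dumps(action) + "\n" + json.dumps(body) + "\n"
        piece_size = len(piece.encode("utf-8"))
        if current and size + piece_size > max_bytes:
            chunks.append("".join(current))  # flush the full chunk
            current, size = [], 0
        current.append(piece)
        size += piece_size
    if current:
        chunks.append("".join(current))
    return chunks


pairs = [({"index": {}}, {"n": i}) for i in range(10)]
unit = len('{"index": {}}\n{"n": 0}\n'.encode("utf-8"))
for chunk in chunk_bulk(pairs, max_bytes=3 * unit):
    print(chunk.count("\n") // 2, "actions,", len(chunk.encode("utf-8")), "bytes")
```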
28. References
   ● ElasticSearch: The Definitive Guide, A Distributed Real-Time Search and Analytics Engine, Clinton Gormley & Zachary Tong, O’Reilly
