Elastic Search

  1. Elasticsearch (Karan Arora, Dec 20, 2017)
  2. Contents
     ● Introduction
     ● Installation
     ● Configuration
     ● Concepts
     ● Search Types
     ● Analysis and Analyzers
     ● Search in depth
  3. Introduction
     ● Elasticsearch is a highly scalable open-source full-text search and analytics engine.
     ● It allows you to store, search, and analyze large volumes of data quickly and in near real time.
     ● It is generally used as the underlying engine that powers applications with complex search features and requirements.
  4. Installation
     ● Java (required)
     ● Elasticsearch "6.1.0", downloadable from https://elastic.co/downloads/elasticsearch
     ● Steps
       – Download and unzip Elasticsearch
       – Run bin/elasticsearch
       – Run curl http://localhost:9200/ to check that the node responds
  5. Configuration
     ● path.data
     ● path.logs
     ● cluster.name
     ● node.name
     ● network.host (127.0.0.1 by default)
  6. Why Elasticsearch
     ● Easy to scale
     ● Everything is one JSON call away
     ● Support for advanced search features
     ● Document oriented
     ● Schema free
     ● Conflict management
     ● The power of Lucene under the hood
     ● Multi-tenancy
  7. A few popular tech companies that use Elasticsearch
  8. Basic Concepts
     ● Near Realtime (NRT)
     ● Cluster
     ● Node
     ● Index
     ● Type
     ● Document
     ● Shards & Replicas
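  9. Search Types
     ● Count: returns only the number of matching documents
       – GET - http://localhost:9200/_search?search_type=count
     ● Scan: disables sorting in order to allow very efficient scrolling through large result sets
       – GET - /index/_search?search_type=scan&scroll=1m
     ● Search
       – GET - http://localhost:9200/_search
     Note: the count and scan search types were deprecated in 2.x and removed in 5.0. On the 6.1.0 release named on the Installation slide, use size=0 to get only a count, and a scroll request sorted by _doc in place of scan.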
  10. Indexing a document
      ● Using our own ID
        – PUT (POST is also accepted) – http://localhost:9200/index/type/id
      ● Autogenerating an ID
        – POST – http://localhost:9200/index/type
  11. Retrieving a document
      ● GET – http://localhost:9200/index/type/id
      Retrieving part of a document
      ● GET – http://localhost:9200/index/type/id?_source=commaSeparatedFieldNames
  12. Checking whether a document exists
      ● HEAD – http://localhost:9200/index/type/id
        – Elasticsearch returns a 200 OK status if the document exists
        – And a 404 Not Found if it doesn't exist
      Deleting a document
      ● DELETE – http://localhost:9200/index/type/id
        – If the document isn't found, we get a 404 Not Found response code.
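The single-document calls on the last three slides (index, retrieve, check, delete) can be sketched with Python's standard library. This is a minimal sketch rather than a client: the helper `document_request`, the index name `website`, the type `blog`, and the field names are hypothetical, and the requests are only built, not sent. PUT is used for indexing under a chosen ID; the index API accepts POST on the same URL as well.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local node, as in the slides


def document_request(method, index, doc_type, doc_id=None, body=None, source_fields=None):
    """Build (but do not send) a request for a single-document operation."""
    url = f"{BASE_URL}/{index}/{doc_type}"
    if doc_id is not None:
        url += f"/{doc_id}"
    if source_fields:
        # Ask for only some fields of the stored document.
        url += "?_source=" + ",".join(source_fields)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    # Elasticsearch 6.x expects an explicit JSON content type on requests with a body.
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=method)


# Hypothetical index, type, and document used purely for illustration.
index_req = document_request("PUT", "website", "blog", 123, body={"title": "My first post"})
auto_id_req = document_request("POST", "website", "blog", body={"title": "Another post"})
get_req = document_request("GET", "website", "blog", 123)
partial_req = document_request("GET", "website", "blog", 123, source_fields=["title", "date"])
exists_req = document_request("HEAD", "website", "blog", 123)
delete_req = document_request("DELETE", "website", "blog", 123)

for req in (index_req, auto_id_req, get_req, partial_req, exists_req, delete_req):
    print(req.get_method(), req.full_url)
```

To actually send one of these, pass it to `urllib.request.urlopen`; a 404 response is raised as `urllib.error.HTTPError`, which is how a missing document would surface for the HEAD and DELETE calls.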
  13. Updating a whole document
      ● PUT (or POST) – http://localhost:9200/index/type/id
        – Documents in Elasticsearch are immutable; we cannot change them. Instead, if we need to update an existing document, we reindex or replace it.
        – Internally, Elasticsearch marks the old document as deleted and adds an entirely new document. It cleans up the deleted documents in the background as you continue to index more data.
  14. Searching
      ● Multi-index, multi-type
        – GET - http://localhost:9200/_search
        – GET - http://localhost:9200/index/_search
        – GET - http://localhost:9200/idx1,idx2/_search
        – GET - http://localhost:9200/s*a*/_search
        – GET - http://localhost:9200/index/type/_search
        – GET - http://localhost:9200/idx1,idx2/t1,t2/_search
        – GET - http://localhost:9200/_all/type1,type2/_search
  15. Pagination
      ● GET – http://localhost:9200/_search?size=5 (first page of five results)
      ● GET – http://localhost:9200/_search?size=5&from=5 (second page)
      ● GET – http://localhost:9200/_search?size=5&from=10 (third page)
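The pattern behind the three pagination URLs is that `size` is the page length and `from` is the number of results to skip. A small sketch of that arithmetic follows; the helper name `page_query` and the 1-based page numbering are assumptions made for illustration.

```python
from urllib.parse import urlencode


def page_query(page, size=5):
    """Return the query string for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    # Skip all results that belong to earlier pages.
    return urlencode({"size": size, "from": (page - 1) * size})


for page in (1, 2, 3):
    print(f"http://localhost:9200/_search?{page_query(page)}")
```

The first URL on the slide omits `from`, which is equivalent to `from=0`.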
  16. Inverted Index
      An inverted index consists of a list of all the unique words that appear in any document, and for each word, a list of the documents in which it appears.
      Example documents:
        1. The quick brown fox jumped over the lazy dog
        2. Quick brown foxes leap over lazy dogs in summer
  17. Inverted Index (contd.)
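The definition on the Inverted Index slide can be sketched in a few lines of Python, using the two example sentences from that slide. This is only an illustration of the data structure, not how Lucene stores it; the function name `build_inverted_index` is hypothetical, and terms are split on whitespace with no normalization.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each unique term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():  # naive whitespace split, no normalization
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(index.items())}


docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}

index = build_inverted_index(docs)
for term, doc_ids in index.items():
    print(f"{term:<8} {doc_ids}")
```

Because nothing is normalized here, `Quick` and `quick` end up as separate terms, as do `fox` and `foxes`. That gap is what the analysis step on the following slides addresses.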
  18. Analysis
      ● First tokenize, then normalize
        – Character filters
          ● e.g. strip HTML, convert & to "and"
        – Tokenizer
          ● e.g. split on whitespace or punctuation
        – Token filters
          ● Lowercase, add synonyms, remove stopwords
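The three analysis stages can be mimicked in plain Python to show the order in which they run. This is a rough sketch under stated assumptions, not the behaviour of any built-in Elasticsearch analyzer: the stopword set, the synonym map, and all function names are hypothetical.

```python
import re

STOPWORDS = {"a", "an", "the"}                 # hypothetical stopword list
SYNONYMS = {"jumped": "jump", "leap": "jump"}  # hypothetical synonym map


def char_filter(text):
    """Character filters: strip HTML tags and spell out ampersands."""
    text = re.sub(r"<[^>]+>", " ", text)
    return text.replace("&", " and ")


def tokenize(text):
    """Tokenizer: split on whitespace and punctuation."""
    return re.findall(r"\w+", text)


def token_filters(tokens):
    """Token filters: lowercase, drop stopwords, map synonyms."""
    result = []
    for token in tokens:
        token = token.lower()
        if token in STOPWORDS:
            continue
        result.append(SYNONYMS.get(token, token))
    return result


def analyze(text):
    return token_filters(tokenize(char_filter(text)))


print(analyze("<p>The QUICK brown fox & the lazy dog</p>"))
print(analyze("Foxes leap"))
```

The same pipeline has to run on both the indexed text and the query text, otherwise the terms in the query would not line up with the terms in the inverted index.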
  19. Built-in Analyzers
      ● Standard analyzer
        – Lowercase, remove punctuation, split on word boundaries
      ● Simple analyzer
        – Lowercase, split on anything that isn't a letter
      ● Whitespace analyzer
        – Split on whitespace only (no lowercasing)
      ● Language analyzers
  20. Search in depth
      ● Full-text search
      ● Multifield search
      ● Proximity matching
      ● Fuzzy search
  21. Full Text Search
      ● Require every term to match:
        { "query": { "match": { "title": { "query": "BROWN DOG!", "operator": "and" } } } }
      ● Require at least 75% of the terms to match:
        { "query": { "match": { "title": { "query": "BROWN DOG!", "minimum_should_match": "75%" } } } }
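How a percentage such as "75%" turns into a required number of terms can be worked through with a short sketch. It assumes the documented behaviour that the number computed from a percentage is rounded down; the function name `required_matches` is hypothetical, and only plain integers and positive percentages are handled (the negative and combined forms of `minimum_should_match` are left out).

```python
def required_matches(num_terms, minimum_should_match):
    """Number of query terms that must match, for an int or a positive 'NN%' value."""
    if isinstance(minimum_should_match, str) and minimum_should_match.endswith("%"):
        percent = int(minimum_should_match[:-1])
        # Integer division rounds the computed number down.
        return num_terms * percent // 100
    return int(minimum_should_match)


for terms in (2, 3, 4):
    print(terms, "terms at 75% ->", required_matches(terms, "75%"), "required")
```

Under that assumption, the two-term query "BROWN DOG!" with "75%" needs only one of its terms to match, whereas `"operator": "and"` always needs both.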
  22. Combining Queries
      { "bool": { "must": { "match": { "title": "quick" } }, "should": [ { "match": { "title": "brown" } }, { "match": { "title": "dog" } } ], "must_not": { "match": { "title": "lazy" } } } }
  23. Controlling Precision
      { "bool": { "must": { "match": { "title": "quick" } }, "should": [ { "match": { "title": "brown" } }, { "match": { "author": "Leo Tolstoy" } } ], "minimum_should_match": 2 } }
  24. Multifield Search
      { "bool": { "must": { "match": { "title": "quick" } }, "should": [ { "match": { "title": "brown" } }, { "match": { "author": "Leo Tolstoy" } } ] } }
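  25. Proximity Matching
      ● Phrase query:
        { "query": { "match_phrase": { "title": "quick brown fox" } } }
      ● Older equivalent using match with "type": "phrase" (this parameter was deprecated in 5.0 and removed in 6.0, so prefer match_phrase on the 6.1.0 release named on the Installation slide):
        { "query": { "match": { "title": { "query": "quick brown fox", "type": "phrase" } } } }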
  26. Partial Matching
      ● Prefix query
        { "query": { "prefix": { "postcode": "W1" } } }
      ● Wildcard query
        { "query": { "wildcard": { "postcode": "F?N*BS" } } }
      ● Regexp query
        { "query": { "regexp": { "postcode": "W[0-9].+" } } }
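What each of the three partial-matching patterns would accept can be approximated locally with Python's standard library. This is a sketch only: the postcode values are made up for illustration, Python's `fnmatch` and `re` syntaxes are stand-ins that differ in detail from Lucene's wildcard and regexp syntaxes, and the comparison is against whole, unanalyzed terms.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical postcode terms, stored as single unanalyzed values.
postcodes = ["W1V 3DG", "W2F 8HW", "W1F 7HW", "WC1N 1LZ", "F1N 2BS", "F9N 0BS"]


def prefix_match(term, prefix):
    """Prefix query: the term starts with the given characters."""
    return term.startswith(prefix)


def wildcard_match(term, pattern):
    """Wildcard query: ? matches one character, * matches zero or more."""
    return fnmatchcase(term, pattern)


def regexp_match(term, pattern):
    """Regexp query: the pattern must match the whole term."""
    return re.fullmatch(pattern, term) is not None


print([p for p in postcodes if prefix_match(p, "W1")])
print([p for p in postcodes if wildcard_match(p, "F?N*BS")])
print([p for p in postcodes if regexp_match(p, "W[0-9].+")])
```

All three queries work on the terms in the index rather than on the original text, so they are normally run against fields that are indexed as exact values.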
  27. Typoes and Mispelings
      ● Fuzziness: the single-character edits that are counted
        – Substitution of one character for another: _f_ox → _b_ox
        – Insertion of a new character: sic → sic_k_
        – Deletion of a character: b_l_ack → back
        – Transposition of two adjacent characters: _st_ar → _ts_ar
      ● Damerau observed that 80% of human misspellings have an edit distance of 1. In other words, 80% of misspellings could be corrected with a single edit to the original string.
      ● AUTO fuzziness parameter
        – 0 for strings of one or two characters
        – 1 for strings of three, four, or five characters
        – 2 for strings of more than five characters
  28. Fuzzy Query
      { "query": { "fuzzy": { "text": "surprize" } } }
      1. Surprise me! (MATCH)
      2. That was surprising (NO MATCH)
      3. I wasn't surprised (MATCH)
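The match and no-match results on the Fuzzy Query slide follow from the four edit types and the AUTO thresholds listed on the previous slide. The sketch below computes the edit distance with the optimal string alignment variant of Damerau-Levenshtein, which is an assumption chosen for brevity and not necessarily the algorithm Lucene uses internally. The function names are hypothetical, and real fuzzy queries have further options (such as prefix length) that are ignored here.

```python
def edit_distance(a, b):
    """Edit distance counting substitution, insertion, deletion, and adjacent transposition."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def auto_fuzziness(term):
    """Maximum allowed edits under AUTO, using the thresholds from the slide."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_match(query_term, indexed_term):
    return edit_distance(query_term, indexed_term) <= auto_fuzziness(query_term)


for indexed in ("surprise", "surprising", "surprised"):
    print(indexed, edit_distance("surprize", indexed), fuzzy_match("surprize", indexed))
```

Since "surprize" has eight characters, AUTO allows two edits: "surprise" is one edit away and "surprised" is two, while "surprising" is four edits away and falls outside the limit.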
  29. Fuzzy Query (contd.)
      ● Match query with fuzziness:
        { "query": { "match": { "text": { "query": "SURPRIZE ME!", "fuzziness": "AUTO", "operator": "and" } } } }
      ● Fuzzy query with an explicit fuzziness:
        { "query": { "fuzzy": { "text": { "value": "surprize", "fuzziness": 1 } } } }
  30. Questions? Thanks
