Intro to elasticsearch

  1. Your Data, Your Search! 问志光 2016-06-27
  2. Outline
      ● Information retrieval
      ● Indexing & Searching
      ● Elasticsearch
  3. Information retrieval
      ● Information Retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).
      ● A search engine is a software system designed to search for information. It is one kind of implementation of IR.
  4. What is a search engine?
      ● A search engine is
          – An index engine for documents
          – A search engine on indexes
      ● A search engine is more powerful at searching: it is designed for it!
  5. Search Engine Architecture
  9. Problems?
      ● How to store the data?
      ● How to index the data?
      ● How to search the data?
  10. How to store the data? INVERTED LIST
  11. How to index the data? INDEX
  12. Consider the following two files
      ● File1: Students should be allowed to go out with their friends, but not allowed to drink beer.
      ● File2: My friend Jerry went to school to see his students but found them drunk which is not allowed.
  13. Step 1: Tokenizer
      ● Split the document into words
      ● Remove the punctuation
      ● Remove stop words (the, a, this, that, etc.)
      Result: "Students", "allowed", "go", "their", "friends", "allowed", "drink", "beer", "My", "friend", "Jerry", "went", "school", "see", "his", "students", "found", "them", "drunk", "allowed"
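The tokenizer step can be sketched in a few lines of Python. This is only an illustration: the `tokenize` function and the `STOP_WORDS` set are assumptions, with the stop-word list chosen so that the output matches the tokens listed on the slide. A real analyzer uses a much larger list.

```python
import re

# Assumption: a tiny stop-word list picked to reproduce the slide's output.
STOP_WORDS = {
    "the", "a", "this", "that", "should", "be", "to",
    "out", "with", "but", "not", "which", "is",
}

FILE1 = ("Students should be allowed to go out with their friends, "
         "but not allowed to drink beer.")
FILE2 = ("My friend Jerry went to school to see his students "
         "but found them drunk which is not allowed.")


def tokenize(text):
    """Split text into words, dropping punctuation and stop words."""
    # Keeping only alphanumeric runs removes the punctuation.
    words = re.findall(r"[A-Za-z0-9]+", text)
    # The stop-word check ignores case; the tokens keep their original case.
    return [w for w in words if w.lower() not in STOP_WORDS]


if __name__ == "__main__":
    print(tokenize(FILE1))
    print(tokenize(FILE2))
```

Case is left untouched here on purpose, since lowercasing belongs to the next step.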
  14. Step 2: Linguistic Processor
      ● Lowercase
      ● Stemming: cars -> car, etc.
      ● Lemmatization: drove -> drive, etc.
      Resulting terms: "student", "allow", "go", "their", "friend", "allow", "drink", "beer", "my", "friend", "jerry", "go", "school", "see", "his", "student", "find", "them", "drink", "allow"
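A rough sketch of the linguistic processor, under heavy simplifying assumptions: the `normalize` function, the two suffix rules, and the `IRREGULAR` lookup table are hypothetical and only cover the words from the two example files. Real stemmers and lemmatizers (for example the ones shipped with Lucene) are far more elaborate.

```python
# Assumption: a hand-written lemma table for the irregular verbs in the example.
IRREGULAR = {"went": "go", "found": "find", "drunk": "drink"}

TOKENS = [
    "Students", "allowed", "go", "their", "friends", "allowed", "drink",
    "beer", "My", "friend", "Jerry", "went", "school", "see", "his",
    "students", "found", "them", "drunk", "allowed",
]


def normalize(token):
    """Turn a token into a term: lowercase, then lemmatize or stem."""
    word = token.lower()
    # Lemmatization: irregular forms are looked up in the table.
    if word in IRREGULAR:
        return IRREGULAR[word]
    # Naive stemming: strip a trailing "ed" from longer words.
    if word.endswith("ed") and len(word) > 4:
        return word[:-2]
    # Naive stemming: strip a plural "s", leaving short words like "his" alone.
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


if __name__ == "__main__":
    print([normalize(t) for t in TOKENS])
```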
  15. Step 3: Index

      Term      Document ID
      student   1
      allow     1
      go        1
      their     1
      friend    1
      allow     1
      …         …

      ● Dict
      ● Sort
      ● Posting list
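The three index steps (dictionary of term/document pairs, sort, merge into posting lists) might look like this in Python. The function name `build_inverted_index` and the `DOCS` layout are assumptions for illustration; the terms are the ones produced by the previous step.

```python
from collections import defaultdict

# Terms per document ID, as produced by the linguistic processor.
DOCS = {
    1: ["student", "allow", "go", "their", "friend", "allow", "drink", "beer"],
    2: ["my", "friend", "jerry", "go", "school", "see", "his", "student",
        "find", "them", "drink", "allow"],
}


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs containing it."""
    # Dict: one (term, document ID) pair per occurrence.
    pairs = [(term, doc_id) for doc_id, terms in docs.items() for term in terms]
    # Sort: by term first, then by document ID.
    pairs.sort()
    # Posting list: merge equal terms, keeping each document ID once.
    index = defaultdict(list)
    for term, doc_id in pairs:
        postings = index[term]
        if not postings or postings[-1] != doc_id:
            postings.append(doc_id)
    return dict(index)


if __name__ == "__main__":
    for term, postings in build_inverted_index(DOCS).items():
        print(term, postings)
```

Because the pairs are sorted before merging, every posting list comes out ordered by document ID, which is what makes the merge-style intersection used at search time possible.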
  17. How to search the data? SEARCH
  18. Step 1: User search query
      ● Suppose you have the following query: lucene AND learned NOT hadoop
  19. Step 2: Lexical & Syntax Analysis
      ● Identify words and keywords
          – Words: lucene, learned, hadoop
          – Keywords: AND, NOT
      ● Build a syntax tree (diagram nodes: AND, NOT, lucene, learned, hadoop)
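As an illustration of this step, the sketch below splits the query into words and keywords and folds them into a small tree. It is a deliberate simplification and not Lucene's query parser: `lex` and `parse` are hypothetical names, NOT is treated as a binary "and not" operator, operators are applied strictly left to right, and parentheses are not supported.

```python
KEYWORDS = {"AND", "NOT"}


def lex(query):
    """Lexical analysis: tag each token as a KEYWORD or a WORD."""
    return [("KEYWORD" if tok in KEYWORDS else "WORD", tok)
            for tok in query.split()]


def parse(tokens):
    """Syntax analysis: build a left-leaning tree of (operator, left, right)."""
    if not tokens or tokens[0][0] != "WORD":
        raise ValueError("query must start with a word")
    tree = tokens[0][1]
    rest = tokens[1:]
    if len(rest) % 2 != 0:
        raise ValueError("each keyword must be followed by a word")
    for i in range(0, len(rest), 2):
        (op_kind, op), (word_kind, word) = rest[i], rest[i + 1]
        if op_kind != "KEYWORD" or word_kind != "WORD":
            raise ValueError("expected a keyword followed by a word")
        # The tree built so far becomes the left child of the new operator.
        tree = (op, tree, word)
    return tree


if __name__ == "__main__":
    print(parse(lex("lucene AND learned NOT hadoop")))
```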
  20. Step 3: Search
      ● Search in the inverted list
      ● Sort, conjunction, disjunction
      ● Scorer
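A minimal sketch of evaluating the example query against posting lists. The posting lists in `INDEX` are made up for illustration, the tree shape `(operator, left, right)` is an assumption, and scoring is left out entirely; only conjunction, disjunction and exclusion over sorted lists are shown.

```python
# Assumption: hypothetical posting lists (sorted document IDs) for three terms.
INDEX = {
    "lucene": [1, 2, 4, 7],
    "learned": [2, 4, 5, 7],
    "hadoop": [4, 6],
}


def intersect(a, b):
    """Conjunction: walk two sorted posting lists in step."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a, b):
    """Disjunction: documents present in either list."""
    return sorted(set(a) | set(b))


def difference(a, b):
    """Exclusion: documents in a that are not in b."""
    excluded = set(b)
    return [doc_id for doc_id in a if doc_id not in excluded]


def evaluate(tree, index):
    """Evaluate a tree of (operator, left, right) tuples with term leaves."""
    if isinstance(tree, str):
        return index.get(tree, [])
    op, left, right = tree
    combine = {"AND": intersect, "OR": union, "NOT": difference}[op]
    return combine(evaluate(left, index), evaluate(right, index))


if __name__ == "__main__":
    query = ("NOT", ("AND", "lucene", "learned"), "hadoop")
    print(evaluate(query, INDEX))
```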
  21. Elasticsearch
      ● Full-text search
      ● Real-time search and analytics engine
      ● Open source
      ● High availability
      ● Schema free
      ● JSON over HTTP
      ● Lucene based
      ● Distributed
      ● RESTful API
  22. Elasticsearch
      ● Distributed and highly available search engine.
          – Each index is fully sharded with a configurable number of shards.
          – Each shard can have one or more replicas.
          – Read / search operations are performed on any one of the replica shards.
      ● Multi-tenant with multiple types.
          – Support for more than one index.
          – Support for more than one type per index.
          – Index-level configuration (number of shards, index storage, ...).
      ● Document oriented.
          – No need for upfront schema definition.
          – Schema can be defined per type to customize the indexing process.
      ● Varied set of APIs.
          – HTTP RESTful API.
          – Native Java API.
          – All APIs perform automatic node operation rerouting.
      ● (Near) real-time search.
      ● Reliable, asynchronous write-behind for long-term persistency.
      ● Built on top of Lucene.
          – Each shard is a fully functional Lucene index.
          – All the power of Lucene easily exposed through simple configuration / plugins.
      ● Per-operation consistency.
          – Single-document-level operations are atomic, consistent, isolated and durable.
      ● Open source under the Apache License, version 2 ("ALv2").
  23. Terminologies of Elasticsearch
      ● Cluster
      ● Node
      ● Index
      ● Shard
  24. Cluster
      ● A cluster is a collection of one or more nodes (servers) that together hold your entire data and provide federated indexing and search capabilities across all nodes.
      ● A cluster is identified by a unique name, which by default is "elasticsearch".
  25. Node
      ● It is an Elasticsearch instance (a Java process).
      ● A node is created when an Elasticsearch instance is started.
      ● A random Marvel character name is allocated to it by default.
  26. Index
      ● An index is a collection of documents that have somewhat similar characteristics, e.g. customer data, product catalog.
      ● The index name is crucial: it is used to refer to the index when performing indexing, search, update, and delete operations against the documents in it.
      ● You can define as many indexes as you want in a single cluster.
  27. Document
      ● It is the most basic unit of information that can be indexed.
      ● It is expressed in JSON (key: value pairs), e.g. '{"user": "nullcon"}'.
      ● Every document is associated with a type and a unique id.
  28. Shard
      ● Every index can be split into multiple shards to be able to distribute data.
      ● The shard is the atomic part of an index, which can be distributed over the cluster if you add more nodes.
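To make the idea of splitting an index into shards concrete, here is a sketch of how a document could be assigned to a shard by hashing its routing value (by default the document id) modulo the number of primary shards. The `pick_shard` function is hypothetical, and `zlib.crc32` is only a stand-in for the hash function Elasticsearch actually uses internally.

```python
import zlib


def pick_shard(routing_value, number_of_primary_shards):
    """Choose a shard: hash(routing) modulo the number of primary shards."""
    # Assumption: CRC32 stands in for the real hash function.
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_primary_shards


if __name__ == "__main__":
    for doc_id in ("1", "2", "3", "4"):
        print(doc_id, "-> shard", pick_shard(doc_id, 5))
```

The same id always lands on the same shard, which is why the number of primary shards has to be fixed when the index is created, while replicas can be added later.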
  31. A terminology comparison

      Relational database    Elasticsearch
      Database               Index
      Table                  Type
      Row                    Document
      Column                 Field
      Schema                 Mapping
      Index                  Everything is indexed
      SQL                    Query DSL
      SELECT * FROM tb …     GET http://
      UPDATE tb SET …        PUT http://
  32. Playing with Elasticsearch
      ● REST API: http://host:port/[index]/[type]/[_action/id]
      ● HTTP methods: GET, POST, PUT, DELETE
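The URL pattern above can be expressed as a small helper. `build_url` is a hypothetical function written for this illustration, not part of any Elasticsearch client; every path segment is optional, as the square brackets in the pattern suggest.

```python
def build_url(host, port, index=None, doc_type=None, action_or_id=None):
    """Assemble http://host:port/[index]/[type]/[_action/id]."""
    # Only the segments that were actually given end up in the path.
    parts = [p for p in (index, doc_type, action_or_id) if p]
    return f"http://{host}:{port}/" + "/".join(parts)


if __name__ == "__main__":
    print(build_url("localhost", 9200, "my_index", "test", "1"))
    print(build_url("localhost", 9200, "my_index", action_or_id="_search"))
    print(build_url("localhost", 9200, action_or_id="_search"))
```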
  33. Playing with Elasticsearch
      ● Search
          – curl -XGET http://localhost:9200/my_index/test/_search
          – curl -XGET http://localhost:9200/my_index/_search
          – curl -XGET http://localhost:9200/_search
      ● Meta Data
          – curl -XGET http://localhost:9200/my_index/_status
      ● Documents
          – curl -XPUT http://localhost:9200/my_index/test/1
          – curl -XGET http://localhost:9200/my_index/test/1
          – curl -XDELETE http://localhost:9200/my_index/test/1
  34. Example: Index
      curl -XPUT http://localhost:9200/my_index/test/1 -d '{ "name": "joeywen", "value": 100 }'
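For readers who prefer Python to curl, the same index request can be prepared with the standard library. This sketch only builds the request object; actually sending it requires a running node on localhost:9200. The function name `build_index_request` is hypothetical, and the Content-Type header is an addition that newer Elasticsearch versions require but the slide's curl command does not set.

```python
import json
import urllib.request


def build_index_request(url, document):
    """Prepare a PUT request that indexes one JSON document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumption: header added for newer versions; not on the slide.
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    request = build_index_request(
        "http://localhost:9200/my_index/test/1",
        {"name": "joeywen", "value": 100},
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To send it against a running node:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```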
  35. Example: Search
      curl -XGET http://localhost:9200/my_index/_search -d '{ "query": { "match_all": {} } }'
      Annotated in the response: total number of docs, relevance, search time, max score.
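The four annotated values live in well-known fields of the search response: `took`, `hits.total`, `hits.max_score`, and the `_score` of each hit. The sketch below reads them from a hand-written sample; `SAMPLE_RESPONSE` is illustrative data and not real output, and `summarize` is a hypothetical helper. It accepts `hits.total` either as a plain number (older versions) or as an object with a `value` key (newer versions).

```python
# Assumption: a made-up response in the shape returned by _search.
SAMPLE_RESPONSE = {
    "took": 3,
    "timed_out": False,
    "hits": {
        "total": 1,
        "max_score": 1.0,
        "hits": [
            {
                "_index": "my_index",
                "_type": "test",
                "_id": "1",
                "_score": 1.0,
                "_source": {"name": "joeywen", "value": 100},
            },
        ],
    },
}


def summarize(response):
    """Extract search time, total docs, max score and per-hit relevance."""
    hits = response["hits"]
    total = hits["total"]
    # Newer versions wrap the count in an object with a "value" key.
    if isinstance(total, dict):
        total = total["value"]
    return {
        "search_time_ms": response["took"],
        "total_docs": total,
        "max_score": hits["max_score"],
        "relevance": [(hit["_id"], hit["_score"]) for hit in hits["hits"]],
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_RESPONSE))
```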
  36. Creating, indexing, or deleting a single document
  37. Plugins: Kopf
  38. Plugins: head
  39. Web
