Search and analyze your data with Elasticsearch

The importance of search for modern applications is evident, and today it is higher than ever. Many projects use a search form as the primary interface for communicating with users. However, implementing intelligent search functionality is still a challenge, and it requires a good set of tools.
In this presentation, I will walk through the high-level architecture and benefits of Elasticsearch with some examples. We will also look at its existing competitors, their similarities and their differences.

  1. SEARCH AND ANALYZE YOUR DATA WITH ELASTICSEARCH. Anton Udovychenko, JEEConf, May 20, 2016
  2. ABOUT ME: Software Architect @ Levi9 • 8+ years of Java experience • Passionate about agile methodology and clean code • http://ua.linkedin.com/in/antonudovychenko • http://www.slideshare.net/antonudovychenko
  3. AGENDA • Why does search matter to you • Why Elasticsearch • Basic concepts • Comparison with SQL • Elasticsearch usage • Elasticsearch and Java • Q&A
  4. WHY DOES SEARCH MATTER TO YOU
  5. WHY DOES SEARCH MATTER TO YOU
  6. WHAT IS IT ABOUT: Elasticsearch is a distributed, open-source, document-oriented, schema-free, RESTful, full-text search and analytics engine, designed for horizontal scalability and high availability.
  7. WHY ELASTICSEARCH
  8. WHY ELASTICSEARCH
  9. WHY ELASTICSEARCH: open source (Apache 2.0 License)
  10. WHY ELASTICSEARCH: document-oriented (a document is plain JSON): { "title": "My blogpost", "body": "Having a lot of text...", "user": "es_user", "postDate": "2016-01-01 15:03:32" }
  11. WHY ELASTICSEARCH: RESTful (REST API)
  12. WHY ELASTICSEARCH
  13. Image via batman-news.com
  14. WHY ELASTICSEARCH - ALTERNATIVES: Lucene
       – Complex logic (no additional level of abstraction)
       + More fine-grained control
       = Elasticsearch is based on Lucene
  15. WHY ELASTICSEARCH - ALTERNATIVES: Sphinx
       – Proprietary protocol
       – Real-time caveats
       – Harder to move to the cloud
       – Harder to start using
       – Smaller community
       + Faster on a cold start
       + Uses less memory
       = Not Java-based (C++)
  16. WHY ELASTICSEARCH - ALTERNATIVES: Solr
       + Truly open source
       + Primary support from Hadoop distributors
       + ZooKeeper is more mature than Zen
       = Near real-time search
       = Similar performance
       – Harder to start using
       – Clustering requires SolrCloud (vs. out of the box in ES)
       – ZooKeeper is harder to use than Zen
       – Weaker operational tools
       – Weaker monitoring tools
       – Weaker analytical abilities
  17. WHY ELASTICSEARCH
  18. BASIC CONCEPTS • Near real-time • Cluster • Node • Index • Type • Document • Shards and replicas
  19. BASIC CONCEPTS: Cluster
  20. BASIC CONCEPTS: Node (diagram: three nodes)
  21. BASIC CONCEPTS: Shard (diagram: eight shards across the nodes)
  22. BASIC CONCEPTS: Index (diagram: the eight shards together form an index)
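Slides 21 and 22 show an index split into shards. According to the Elasticsearch documentation, a document is routed to a primary shard with `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document ID; this is also why the number of primary shards has to be chosen when the index is created. The Java sketch below illustrates the formula under one stated assumption: it uses `String.hashCode()` in place of the Murmur3 hash Elasticsearch actually uses, so its shard numbers will not match a real cluster. The class name is hypothetical.

```java
import java.util.List;

/** Simplified illustration of document-to-shard routing (not Elasticsearch's real hash). */
final class ShardRouting {

    /** Picks a primary shard: hash(routing) mod numberOfPrimaryShards. */
    static int shardFor(String routing, int numberOfPrimaryShards) {
        if (numberOfPrimaryShards <= 0) {
            throw new IllegalArgumentException("numberOfPrimaryShards must be positive");
        }
        // Assumption: String.hashCode() stands in for Elasticsearch's Murmur3 hash.
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        // With no custom routing, the document ID is the routing value.
        for (String id : List.of("1", "2", "3", "42")) {
            System.out.println("doc " + id + " -> shard " + shardFor(id, 5));
        }
        // The same ID can land on a different shard once the shard count changes.
        System.out.println("doc 42 with 5 shards: " + shardFor("42", 5));
        System.out.println("doc 42 with 8 shards: " + shardFor("42", 8));
    }
}
```

Running `main` shows document 42 moving from shard 2 to shard 6 when the shard count goes from 5 to 8, which is what would strand existing documents if the count could simply be changed in place.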
  23. BASIC CONCEPTS: a shard is a Lucene index made of segments (diagram: one shard with four segments)
  24. BASIC CONCEPTS: Segment core
       Inverted index (term: freq, doc IDs): brown: 2, [0, 1] • dog: 2, [0, 1] • fox: 2, [0, 1] • in: 1, [1] • jump: 2, [0, 1] • lazy: 2, [0, 1] • over: 2, [0, 1] • quick: 2, [0, 1] • summer: 1, [1] • the: 2, [0]
       Document store (doc ID: fields): 0: Text "The quick brown fox jumped over the lazy dog", Author "Bob" • 1: Text "Quick brown foxes leap over lazy dogs in summer", Author "Bill"
       Column store: 0: 210 • 1: 90
       Likes (Shared): 0: 59 • 1: 23
  25. BASIC CONCEPTS: the same segment core, queried with the search term "Leaping brown Fox"
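To make slides 24 and 25 concrete, here is a toy inverted index in Java. It is not Lucene's data structure: the hypothetical `analyze()` method only lowercases, splits on non-letters and applies a small hard-coded table (foxes → fox, dogs → dog, jumped/leap/leaping → jump) to imitate the stemming and synonym handling the slide implies. Queries go through the same analyzer, so "Leaping brown Fox" becomes the terms jump, brown and fox, and each document is ranked by how many of those terms it contains.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index: term -> sorted doc IDs, plus a simple "count matching terms" search. */
final class ToyInvertedIndex {

    // Hypothetical normalization table standing in for a real stemmer and synonym filter.
    private static final Map<String, String> NORMALIZE = Map.of(
            "foxes", "fox",
            "dogs", "dog",
            "jumped", "jump",
            "leap", "jump",
            "leaping", "jump");

    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    /** Lowercases, splits on anything that is not a letter a-z and normalizes each token. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                terms.add(NORMALIZE.getOrDefault(token, token));
            }
        }
        return terms;
    }

    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    SortedSet<Integer> docsFor(String term) {
        return postings.getOrDefault(term, new TreeSet<>());
    }

    /** Doc IDs containing at least one query term, most matching terms first, ties by doc ID. */
    List<Integer> search(String query) {
        Map<Integer, Integer> matches = new HashMap<>();
        // LinkedHashSet drops repeated query terms so each term counts once.
        for (String term : new LinkedHashSet<>(analyze(query))) {
            for (int docId : docsFor(term)) {
                matches.merge(docId, 1, Integer::sum);
            }
        }
        List<Integer> result = new ArrayList<>(matches.keySet());
        result.sort((a, b) -> {
            int byMatches = Integer.compare(matches.get(b), matches.get(a));
            return byMatches != 0 ? byMatches : Integer.compare(a, b);
        });
        return result;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(0, "The quick brown fox jumped over the lazy dog");
        index.add(1, "Quick brown foxes leap over lazy dogs in summer");
        System.out.println(index.postings);                    // the inverted index
        System.out.println(index.search("Leaping brown Fox")); // matching doc IDs
    }
}
```

Both documents contain all three query terms, so both match even though neither contains the literal phrase. Lucene's real scoring (TF/IDF in this era) weighs terms by frequency and rarity rather than using a plain term count.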
  26. SQL vs. ELASTIC
  27. COMPARISON WITH SQL (SQL → Elasticsearch): Database → Index • Table → Type • Row → Document • Column (field) → Field
  28. COMPARISON WITH SQL (SQL → Elasticsearch): Database → Index • Table → Type • Row → Document with properties • Column (field) → Field
  29. COMPARISON WITH SQL
       id | title | body | user | postDate
       1 | My first blogpost | Having a lot of text... | es_user | 2016-01-01 15:03:32
       2 | About search | The search data sometimes has a peculiar property… | es_user | 2016-01-01 19:22:03
       3 | Introduction to Elasticsearch | Once I have stumbled upon this idea… | es_user | 2016-01-03 11:55:41
  30. COMPARISON WITH SQL
       SQL: CREATE DATABASE blog; USE blog; CREATE TABLE post( id bigint(20) AUTO_INCREMENT, title varchar(250), body text, user varchar(50), postDate timestamp, PRIMARY KEY(id) );
       Elasticsearch: POST http://localhost:9200/blog with a mapping (not obligatory): { "mappings": { "post": { "properties": { "title": { "type": "string" }, "body": { "type": "string" }, "user": { "type": "string" }, "postDate": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" } } } } }
  31. COMPARISON WITH SQL (CREATE)
       SQL: INSERT INTO post( title, body, user, postDate ) VALUES( 'My blogpost', 'Having a lot of text...', 'es_user', '2016-01-01 15:03:32' );
       Elasticsearch: POST http://localhost:9200/blog/post { "title": "My blogpost", "body": "Having a lot of text...", "user": "es_user", "postDate": "2016-01-01 15:03:32" }
  32. COMPARISON WITH SQL (UPDATE)
       SQL: UPDATE post SET title='My blogpost' WHERE id=1;
       Elasticsearch: POST http://localhost:9200/blog/post/1/_update { "doc": { "title": "My blogpost" } }
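Slide 32's `_update` call is a partial update: Elasticsearch reads the stored `_source`, merges the `doc` fields into it and reindexes the whole document. According to the Elasticsearch guide, objects are merged, existing scalar fields are overwritten and new fields are added. The Java sketch below mimics that merge on plain maps; it illustrates the semantics only, is not Elasticsearch's implementation, and its class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Mimics how a partial-update "doc" is merged into a stored _source (illustration only). */
final class PartialUpdate {

    /** Returns a new map: nested objects merge recursively, other values from doc replace or add fields. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> merge(Map<String, Object> source, Map<String, Object> doc) {
        Map<String, Object> result = new LinkedHashMap<>(source);
        for (Map.Entry<String, Object> entry : doc.entrySet()) {
            Object existing = result.get(entry.getKey());
            Object update = entry.getValue();
            if (existing instanceof Map && update instanceof Map) {
                // Both sides are objects: merge them field by field.
                result.put(entry.getKey(),
                        merge((Map<String, Object>) existing, (Map<String, Object>) update));
            } else {
                // Scalars, JSON arrays (Lists) and brand-new fields: the value from doc wins.
                result.put(entry.getKey(), update);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("title", "My first blogpost");
        stored.put("body", "Having a lot of text...");
        stored.put("user", "es_user");
        stored.put("postDate", "2016-01-01 15:03:32");

        // Same effect as: POST /blog/post/1/_update { "doc": { "title": "My blogpost" } }
        Map<String, Object> updated = merge(stored, Map.of("title", "My blogpost"));
        System.out.println(updated);
    }
}
```

Because the whole document is reindexed, a partial update still costs a full write of the document; it only saves sending the unchanged fields over the network.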
  33. COMPARISON WITH SQL (DELETE)
       SQL: DELETE FROM post WHERE id=1
       Elasticsearch: DELETE http://localhost:9200/blog/post/1
  34. COMPARISON WITH SQL (READ)
       SELECT * FROM post WHERE id=1 → GET http://localhost:9200/blog/post/1
       SELECT * FROM post → GET http://localhost:9200/blog/post/_search (returns the first 10 hits by default)
       SELECT * FROM post WHERE user='es_user' → GET http://localhost:9200/blog/post/_search?q=user:es_user
  35. COMPARISON WITH SQL (READ)
       SQL: SELECT * FROM post WHERE body LIKE '%Having %';
       Elasticsearch: POST http://localhost:9200/blog/post/_search { "query": { "match": { "body": "Having" } } }
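The CRUD calls from slides 30 to 35 can be issued from Java with the JDK's own `java.net.http` client (Java 11+), without any Elasticsearch library. The sketch below only builds the requests, assuming a node at `http://localhost:9200` and the `blog`/`post` index and type used in the talk; the `/index/type/id` URLs and the `string` mapping type belong to the Elasticsearch 2.x era and changed in later major versions. Three deliberate deviations from the slides: the index is created with PUT, the verb the Elasticsearch docs use; the `postDate` field gets an explicit `format`, since the default date format does not accept a space between date and time; and the document is indexed with `PUT /blog/post/1` so the update, read and delete calls refer to a known ID (the slide's POST lets Elasticsearch generate one). The class name is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds (but does not send) the REST requests from the SQL comparison slides. */
final class BlogRequests {

    // Assumption: a local node on the default HTTP port, as in the slides.
    static final String BASE = "http://localhost:9200";

    private static HttpRequest withJson(String method, String path, String json) {
        return HttpRequest.newBuilder(URI.create(BASE + path))
                .header("Content-Type", "application/json")
                .method(method, BodyPublishers.ofString(json))
                .build();
    }

    private static HttpRequest withoutBody(String method, String path) {
        return HttpRequest.newBuilder(URI.create(BASE + path))
                .method(method, BodyPublishers.noBody())
                .build();
    }

    /** One request per SQL statement from the slides, keyed by a short label. */
    static Map<String, HttpRequest> all() {
        Map<String, HttpRequest> r = new LinkedHashMap<>();
        // CREATE TABLE post (...): create the index with an (optional) mapping.
        r.put("createIndex", withJson("PUT", "/blog",
                "{\"mappings\":{\"post\":{\"properties\":{"
                        + "\"title\":{\"type\":\"string\"},"
                        + "\"body\":{\"type\":\"string\"},"
                        + "\"user\":{\"type\":\"string\"},"
                        + "\"postDate\":{\"type\":\"date\",\"format\":\"yyyy-MM-dd HH:mm:ss\"}}}}}"));
        // INSERT INTO post ...: index the document under ID 1.
        r.put("create", withJson("PUT", "/blog/post/1",
                "{\"title\":\"My blogpost\",\"body\":\"Having a lot of text...\","
                        + "\"user\":\"es_user\",\"postDate\":\"2016-01-01 15:03:32\"}"));
        // UPDATE post SET title='My blogpost' WHERE id=1: partial update.
        r.put("update", withJson("POST", "/blog/post/1/_update",
                "{\"doc\":{\"title\":\"My blogpost\"}}"));
        // SELECT * FROM post WHERE id=1
        r.put("getById", withoutBody("GET", "/blog/post/1"));
        // SELECT * FROM post WHERE user='es_user': URI search with a query string.
        r.put("searchByUser", withoutBody("GET", "/blog/post/_search?q=user:es_user"));
        // SELECT * FROM post WHERE body LIKE '%Having %': full-text match query.
        r.put("match", withJson("POST", "/blog/post/_search",
                "{\"query\":{\"match\":{\"body\":\"Having\"}}}"));
        // DELETE FROM post WHERE id=1
        r.put("delete", withoutBody("DELETE", "/blog/post/1"));
        return r;
    }

    public static void main(String[] args) {
        // Prints the requests only; nothing is sent over the network.
        all().forEach((name, request) ->
                System.out.println(name + ": " + request.method() + " " + request.uri()));
    }
}
```

To actually send one of these requests, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` with a node running.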
  36. DEMO TIME
  37. ELASTICSEARCH AND JAVA • Native Java client • Spring Data Elasticsearch • REST endpoints • Jest (https://github.com/searchbox-io/Jest) • https://github.com/terrafant/es-feeder
  38. DEMO TIME
  39. ELASTICSEARCH USAGE (diagram): the Application sends SQL over JDBC to the DB, and uses an ES client to send requests to the Elasticsearch cluster, either over REST (JSON) or through the native protocol (binary).
  40. ELASTICSEARCH USAGE (DETAILS) (diagram): a load balancer and an Elasticsearch cluster made of three client nodes, one master node, two master-eligible nodes and twelve data nodes.
  41. ELASTICSEARCH USAGE (ELK) (diagram): Frontend, Backend and DB as data sources; three Logstash instances and a Broker; Elasticsearch; Kibana, opened in a Browser.
  51. TOP 10 PRODUCTION RECOMMENDATIONS
       1. Take care of security
       2. Avoid split-brain
       3. Use dedicated master nodes
       4. Use unicast (not multicast) discovery
       5. Configure recovery settings
       6. Use at least 2 replicas
       7. Allocate enough physical memory
       8. Configure the OS user
       9. Use monitoring tools
       10. Use Oracle JDKs
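Two of these recommendations reduce to simple arithmetic. With Zen discovery (Elasticsearch versions before 7.0), split-brain is avoided by setting `discovery.zen.minimum_master_nodes` to a quorum of the master-eligible nodes, `(N / 2) + 1`. For memory, Elastic's guidance is to give the JVM heap no more than half of the machine's RAM, leaving the rest to the file-system cache Lucene relies on, and to stay below roughly 32 GB so the JVM keeps using compressed object pointers. The helper names below are hypothetical, and the 31 GB cap is a conservative assumption rather than an exact threshold.

```java
/** Back-of-the-envelope helpers for two production recommendations (illustration only). */
final class ProductionSizing {

    /** Quorum for discovery.zen.minimum_master_nodes: (masterEligible / 2) + 1. */
    static int minimumMasterNodes(int masterEligibleNodes) {
        if (masterEligibleNodes < 1) {
            throw new IllegalArgumentException("need at least one master-eligible node");
        }
        return masterEligibleNodes / 2 + 1; // integer division rounds down
    }

    // Assumption: 31 GB as a safety margin below the ~32 GB compressed-oops limit.
    static final long MAX_HEAP_GB = 31;

    /** Heap suggestion: at most half of physical RAM, capped at MAX_HEAP_GB. */
    static long suggestedHeapGb(long physicalRamGb) {
        return Math.min(physicalRamGb / 2, MAX_HEAP_GB);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 3, 4, 5}) {
            System.out.println(n + " master-eligible -> minimum_master_nodes = " + minimumMasterNodes(n));
        }
        for (long ram : new long[] {8, 16, 64, 128}) {
            // Elastic also recommends setting -Xms and -Xmx to the same value.
            System.out.println(ram + " GB RAM -> heap " + suggestedHeapGb(ram) + " GB");
        }
    }
}
```

With two master-eligible nodes the quorum is 2, so losing either node blocks master election; this is why three dedicated master-eligible nodes are the usual minimum.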
  52. THANK YOU! Get social: @elastic • Explore the docs: elastic.co/guide • Give it a try: elastic.co/downloads/elasticsearch • Join the community: discuss.elastic.co • Check out the ELK stack: demo.elastic.co
