Elasticsearch workshop presentation

2. I’ve done things
- Used Elasticsearch since v.0.18 (2011)
- Been on-call for production systems using Elasticsearch since 2013
- Paired it with (mostly) Python, also Ruby and JavaScript
- Used it as the sole place to hold data
- Also used it in a more usual way - paired with a database

3. Elasticsearch is a really fast and easily scalable open source, distributed, RESTful search and analytics engine. Part of an ecosystem of tools for analytics (massage, store and graph data).

4. The features of Elasticsearch

5. A walk through the woods. Features.
Many features that can be categorised as:
- Indexing
- Querying
- Aggregating (Analysing)

6. Indexing
- Receive raw data
- Analyse
- Record
You can just throw data at it.

7. Querying
- Receive the query
- Analyse the query
- Search
- Fetch (return results)
- Control paging and sorting
Many types of query to support many use cases.

8. Aggregating
An aggregation is some analysis over some documents.
- Types
- Buckets are very useful
- You can nest aggregations
- They’re cleverly cached

11. You can do quite a lot with Elasticsearch

12. Search through Natural Language
~30 minutes to prototype
Ingredients
- The text you want to search through
- The searches you want to do (queries)
- Elasticsearch
Preparation
- Put text into Elasticsearch. No schema or configuration necessary (for basics).
- Put queries into Elasticsearch.
- Get results.
Let me show you quickly.
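
The preparation steps on this slide can be sketched as two HTTP requests. This is a minimal illustration rather than the demo from the talk: the node address, the index name `talks` and the field name `body` are all assumptions, and the `_doc` endpoint shown is the one used by recent Elasticsearch versions. The requests are only built here, not sent.

```python
import json
from urllib.request import Request

# Assumption: a local, unsecured Elasticsearch node; adjust to your setup.
ES_URL = "http://localhost:9200"
INDEX = "talks"  # hypothetical index name


def index_request(doc_id, doc):
    """Build (but do not send) a request that stores one document."""
    return Request(
        url=f"{ES_URL}/{INDEX}/_doc/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def search_request(text):
    """Build (but do not send) a full-text search over the 'body' field."""
    query = {"query": {"match": {"body": text}}}
    return Request(
        url=f"{ES_URL}/{INDEX}/_search",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


put = index_request(1, {"body": "Elasticsearch is a search and analytics engine"})
search = search_request("analytics engine")
print(put.get_method(), put.full_url)
print(search.get_method(), search.full_url)
```

Passing either request to `urllib.request.urlopen` would send it to a running node; no mapping needs to be created first for this basic case.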

13. Logs
~60 minutes to prototype
Put logs in. Run aggregations. Get insight into app and traffic.
The Elastic Stack is geared towards this with multiple products tackling log formats, ingestion and analysis.

14. Custom Dashboards
~180 minutes to prototype
Put data in. Run aggregations. Get insight.
Plays really well with D3 and other common visualisation libraries.
Can also use Kibana + Elasticsearch.

15. Further use cases
- Search
- Faceting
- “Did you mean?”
- Autocomplete
- Sounds-like suggestions
- “People who buy this also buy...”

16. Do you have a nail? Elasticsearch is a hammer
ES is not great at:
● Relational integrity
● Transactions
Problems you should not try to solve with ES:
● Calculate inventory
● Grand totals
● Rollback-able stuff
● User accounts

17. Let’s play!

18. I was your host and would love feedback
Emanuil Tolev
emanuil@cottagelabs.com
@emanuil_tolev on Twitter
Link to slides: http://tinyurl.com/es-intro-slides
Really, really good intro blog post to ES with use cases and further reading, like securing your Elasticsearch: http://tinyurl.com/es-intro-blog
The US State map came from http://greasethewheels.org/cpi/ , which is actually a US corruption research paper.

Editor's notes

I am a consultant, specialising in performance and robust technical architecture. The right tools for the right problems, etc. I work in a loose partnership of other consultants and freelancers called Cottage Labs.
About to use it a lot more alongside an RDBMS.
Open source - 1-2 of the usual positives. Strong, resilient community in this case.
Distributed - stuff can go down and the system rebalances itself automatically.
RESTful - very easy to use, you only need a browser. Very good, simple HTTP API speaking JSON.
Note the Search vs. Analytics distinction.
The Elastic Stack is more than Elasticsearch, but out of scope here.
Indexing (= putting data in).
Querying (= finding a needle in a haystack). Includes things like searching, fuzzy searching, autocompletion and instant searches (train apps).
Aggregating (= analysing data and counting things).
Throw data at it: ES will guess data types and enforce them for you. You can’t save a number into a field that ES has learned is a date. Of course, you can also be much more careful and thorough: use Mappings.
ES will always analyse by default. Is it possible that we might not always want that?
Advanced: asciifolding, tokenisation, finding a document by its translation, and more. Index-time analysis and analysers.
Common pitfall: exact string matches, which need a field that skips analysis.
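
As a rough illustration of the "careful and thorough" route, here is a sketch of an explicit mapping. The field names are hypothetical, and the layout assumes the typeless mapping format of Elasticsearch 7 and later. A `keyword` field is stored without analysis, which is what exact string matching needs, while a `text` field goes through analysis.

```python
import json

# Hypothetical field names; the layout assumes Elasticsearch 7 or later.
mapping = {
    "mappings": {
        "properties": {
            "description": {"type": "text"},   # analysed, for full-text search
            "us_state": {"type": "keyword"},   # not analysed, for exact matches
            "opened": {"type": "date"},        # non-date values are rejected
        }
    }
}


def exact_match_query(field, value):
    """A term query compares against the stored value without analysing it."""
    return {"query": {"term": {field: value}}}


print(json.dumps(mapping, indent=2))
print(json.dumps(exact_match_query("us_state", "NY")))
```

The mapping would be sent as the body of the request that creates the index; the term query is then a reasonable fit for the `us_state` field but not for `description`.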
Paging and sorting directly in the URL, or in JSON: ?sort ?size Queries: match, terms, geo, More Like This (takes doc as input to return similar docs)
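
A small sketch of both forms, query string and JSON body. The helper names and the fields `description` and `price` are made up for illustration; `from`, `size` and `sort` are the paging and sorting parameters themselves. Pages are counted from zero here.

```python
from urllib.parse import urlencode


def page_params(page, per_page, sort_field, order="asc"):
    """Query-string form, e.g. ?from=20&size=10&sort=price:desc"""
    params = {
        "from": page * per_page,
        "size": per_page,
        "sort": f"{sort_field}:{order}",
    }
    # Keep the colon readable instead of percent-encoding it.
    return urlencode(params, safe=":")


def page_body(text, page, per_page, sort_field, order="asc"):
    """JSON-body form of the same paging and sorting, plus a match query."""
    return {
        "query": {"match": {"description": text}},
        "from": page * per_page,
        "size": per_page,
        "sort": [{sort_field: {"order": order}}],
    }


print("?" + page_params(2, 10, "price", "desc"))
print(page_body("no pets", 2, 10, "price", "desc"))
```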
Types: matrix, metrics, bucket, pipeline.
Buckets are very useful, especially Terms buckets. Aggregations are cached by default, with some very clever algorithms and great cache management, ensuring both low resource use and no stale results.
Say we have a field called “us_state” in some data we’ve got. A Terms aggregation over that data will tell us the unique US state codes which are present in our data. If it’s a comprehensive dataset, we’ll essentially just get a list of the US states. Not that useful, right? But you can nest aggregations, so you have sub-aggregations. That means we could ask for a Terms aggregation that drills further and further down into some category. Fashion may be a good metaphor, e.g. All Stock -> Shoes -> Ladies’ -> Red -> Size 6.5. TODO replace with housing example.
Bucketing: every bucket's criterion is evaluated on every document in the context, and when a criterion matches, the document is considered to "fall in" the relevant bucket. By the end of the aggregation process, we’ll end up with a list of buckets, each one with a set of documents that "belong" to it.
Metric: aggregations that keep track of and compute metrics over a set of documents. Min, max, avg, sum, ranking, geo bounds and geo centroid. (If asked) Geo bounds gives you the box containing all locations. Geo centroid gives you the centre of a given set of points.
Matrix: operate on multiple fields and produce a matrix result based on the values. Experimental. Statistics (variance, covariance, correlation).
Pipeline: aggregations that aggregate the output of other aggregations and their associated metrics. More advanced.
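
The drill-down idea can be sketched as a request body in which each Terms aggregation carries the next one as a sub-aggregation. The helper `nested_terms` and the field names are hypothetical; `"size": 0` asks for the aggregation results only, without the matching documents.

```python
import json


def nested_terms(fields):
    """Nest one terms aggregation per field, outermost field first."""
    aggs = {}
    # Build from the innermost level outwards.
    for field in reversed(fields):
        level = {"terms": {"field": field}}
        if aggs:
            level["aggs"] = aggs
        aggs = {f"by_{field}": level}
    return aggs


# Hypothetical fields for the fashion example: Shoes -> Ladies' -> Red.
body = {"size": 0, "aggs": nested_terms(["category", "audience", "colour"])}
print(json.dumps(body, indent=2))
```

In the response, each bucket of the outer aggregation would contain its own list of buckets for the next level down.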
Just an example. Example aggregation using geo centroid and the number of, say, museums in the USA - the exact data is not important. But now, let’s see what bucketing the documents by US state gives us.
So this is what “bucketing” is. You’ll find it very useful for building intuitive analytics dashboards and user interfaces that deal with search and discovery. I’ll give you a sneak peek of what the data, the request and the response might look like. The Elastic example is museums in Europe. https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-geocentroid-aggregation.html
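
To make the bucketing idea concrete without a running cluster, here is a local simulation in plain Python next to the request body it roughly corresponds to. The sample documents are invented, the field names follow the “us_state” example, and `naive_centroid` is a plain average of coordinates used for illustration, not a claim about how Elasticsearch computes its centroid.

```python
from collections import defaultdict

# Invented sample data: three museums with a state code and a location.
museums = [
    {"name": "A", "us_state": "NY", "location": {"lat": 40.0, "lon": -74.0}},
    {"name": "B", "us_state": "NY", "location": {"lat": 42.0, "lon": -76.0}},
    {"name": "C", "us_state": "CA", "location": {"lat": 34.0, "lon": -118.0}},
]

# Roughly equivalent request: bucket by state, then a centroid per bucket.
request_body = {
    "size": 0,
    "aggs": {
        "states": {
            "terms": {"field": "us_state"},
            "aggs": {"centroid": {"geo_centroid": {"field": "location"}}},
        }
    },
}


def bucket_by(docs, field):
    """Put every document into the bucket named after its field value."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc)
    return dict(buckets)


def naive_centroid(docs):
    """Average latitude and longitude (an illustrative simplification)."""
    n = len(docs)
    return {
        "lat": sum(d["location"]["lat"] for d in docs) / n,
        "lon": sum(d["location"]["lon"] for d in docs) / n,
    }


buckets = bucket_by(museums, "us_state")
for state in sorted(buckets):
    print(state, len(buckets[state]), naive_centroid(buckets[state]))
```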
Predefined aggregations are available. Logstash is capable of understanding many log formats, and you can add custom ones.
Why the ugly dashboard? Dashboards should be useful first, pretty … later. Netflix built an open source application metrics project based on Java and ES, called Servo.
Search: searching a large number of descriptions for the best match for a specific phrase (e.g. property search, say “no pets”) and returning the best results.
Faceting: get a breakdown of the types of dwelling that forbid pets :(
“Did you mean …?” suggestions.
Auto-completing a search box from partially typed words, based on previously issued searches, while accounting for misspellings.
Searching text for words that sound like another word.
Product and information suggestions: “People who were interested in / bought this also look at…”
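
The first two use cases can be sketched as one request: a phrase search plus a Terms aggregation for the facet breakdown. A second helper shows a typo-tolerant match using the `fuzziness` option. The field names `description` and `dwelling_type` and both function names are assumptions for illustration.

```python
import json


def phrase_with_facets(phrase):
    """Search for an exact phrase and facet the hits by dwelling type."""
    return {
        "query": {"match_phrase": {"description": phrase}},
        "aggs": {"dwelling_types": {"terms": {"field": "dwelling_type"}}},
    }


def typo_tolerant(text):
    """Match query that tolerates small misspellings in the search text."""
    return {
        "query": {
            "match": {"description": {"query": text, "fuzziness": "AUTO"}}
        }
    }


print(json.dumps(phrase_with_facets("no pets"), indent=2))
print(json.dumps(typo_tolerant("no petz"), indent=2))
```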
Not great at:
- Instant availability in search results after indexing
- High cardinality & high precision analysis
Problems you should not try to solve:
- Very limited resource projects (embedded devices, tiny websites)
Elasticsearch is generally fantastic at providing approximate answers from data, such as scoring the results by quality. While Elasticsearch can perform exact matching and statistical calculations, its primary task of search is an inherently approximate task. Finding approximate answers is a property that separates Elasticsearch from more traditional databases. That being said, traditional relational databases excel at precision and data integrity.
The Elastic website has a lot of blog posts and videos with user stories, including senior people from Netflix, Rightmove, banks, supercomputer and AI people, fighting Ebola, the BBC and many more!
It was a pleasure! I hope you had fun. Please leave a comment on the meetup page or send me an email with feedback.
