9. Use case 2
• Aggregate-oriented repository
• ...as in DDD
http://ptgmedia.pearsoncmg.com/images/chap10_9780321834577/elementLinks/10fig05.jpg
10. Elasticsearch
Distributed RESTful search and analytics
real time data and analytics
distributed
high availability
multi tenancy
full-text search
schema free
RESTful, JSON API
28. Mapping
• JSON data is parsed on indexing
• Mapping is done on first field indexing
• Inferred if not configured (!)
• Types: float, long, boolean, date (+formatting), object, nested
• String type can have arbitrary analyzers
• A single field can be split into multiple fields
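As a rough illustration of the bullets above, an explicit mapping can be written as a plain Python dict and serialized to JSON before the first document is indexed, so that nothing is left to inference. This is only a sketch assuming the 0.90-era mapping syntax; the "book" type and every field name are hypothetical.

```python
import json

# Hypothetical "book" type; all field names are made up for illustration.
book_mapping = {
    "book": {
        "properties": {
            # One source field indexed two ways (0.90-era "multi_field"):
            # analyzed for full-text search, not_analyzed for exact values.
            "title": {
                "type": "multi_field",
                "fields": {
                    "title": {"type": "string", "analyzer": "standard"},
                    "raw": {"type": "string", "index": "not_analyzed"},
                },
            },
            "price": {"type": "float"},
            "pages": {"type": "long"},
            "in_print": {"type": "boolean"},
            # Dates accept an explicit format.
            "published": {"type": "date", "format": "yyyy-MM-dd"},
        }
    }
}

# JSON body that would be sent to the put-mapping endpoint.
mapping_body = json.dumps(book_mapping, indent=2)
print(mapping_body)
```

Declaring the mapping up front avoids surprises such as a field being inferred as long from its first value and later receiving a float.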
33. Date Histogram Facet
The histogram facet works with numeric data by
building a histogram across intervals of the field values:
each value is placed in a “bucket”.
The date histogram facet does the same for date fields, using time intervals.
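The bucketing idea can be sketched in a few lines of plain Python. This is a toy imitation, not Elasticsearch code: each timestamp is rounded down to the start of its interval and the values per bucket are counted. The function names and the "time"/"count" keys are choices made for this example.

```python
from collections import Counter
from datetime import datetime, timezone

DAY_MS = 24 * 60 * 60 * 1000  # one-day interval, in milliseconds


def to_ms(iso_date):
    """Convert a UTC 'YYYY-MM-DDTHH:MM:SS' string to epoch milliseconds."""
    dt = datetime.strptime(iso_date, "%Y-%m-%dT%H:%M:%S")
    return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)


def date_histogram(timestamps_ms, interval_ms=DAY_MS):
    """Place each timestamp in the bucket starting at its interval boundary."""
    counts = Counter((ts // interval_ms) * interval_ms for ts in timestamps_ms)
    return [{"time": key, "count": counts[key]} for key in sorted(counts)]


events = [
    to_ms("2013-01-01T09:00:00"),
    to_ms("2013-01-01T17:30:00"),
    to_ms("2013-01-03T08:15:00"),
]
buckets = date_histogram(events)
print(buckets)
```

Two events fall in the bucket for 1 January and one in the bucket for 3 January; days without events produce no bucket in this sketch.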
35. Facets - lessons
• Bug in 0.90.x: https://github.com/elasticsearch/elasticsearch/issues/1305 *
• Solutions: use 1 shard, or ask for top 100 instead of 10
* will be solved in 1.0 with the aggregation module
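The two workarounds suggest that the problem comes from each shard returning only its own top N terms before the results are merged. Assuming that reading is right, the effect can be reproduced with a toy simulation in plain Python; the `shard_size` parameter is just a name used in this sketch, and the term counts are invented.

```python
from collections import Counter


def sharded_top_terms(shards, size, shard_size=None):
    """Merge per-shard top lists, as a coordinating node would.

    Each shard contributes only its `shard_size` most frequent terms,
    so terms that are never in a shard's local top list lose counts.
    """
    shard_size = shard_size or size
    merged = Counter()
    for shard in shards:
        for term, count in Counter(shard).most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)


# Invented term counts on two shards: "c" is the real winner (8 + 8 = 16),
# but it is only third on each individual shard.
shards = [
    {"a": 10, "b": 9, "c": 8},
    {"d": 10, "e": 9, "c": 8},
]

approx = sharded_top_terms(shards, size=2)               # per-shard top 2
exact = sharded_top_terms(shards, size=2, shard_size=3)  # ask each shard for more
print(approx, exact)
```

With a single shard there is nothing to merge, and asking each shard for a longer list makes it far more likely that the true top terms are included, which matches the two solutions on the slide.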
39. Nested Documents
Declare the Book type as “nested” in the Author mapping.
We can then query Authors by the properties
of their nested Books:
“Authors who published at least one book with
Penguin in the scifi genre”
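A minimal sketch of what this could look like, written as Python dicts that serialize to the JSON bodies. The type name "author" and the field names ("books", "publisher", "genre") are assumptions made for this example; only the "nested" type and the nested query with its "path" come from Elasticsearch itself.

```python
import json

# Mapping: books are declared as nested objects inside the author type.
author_mapping = {
    "author": {
        "properties": {
            "name": {"type": "string"},
            "books": {
                "type": "nested",
                "properties": {
                    "title": {"type": "string"},
                    "publisher": {"type": "string"},
                    "genre": {"type": "string"},
                },
            },
        }
    }
}

# Query: both conditions must hold for the SAME nested book.
penguin_scifi_authors = {
    "query": {
        "nested": {
            "path": "books",
            "query": {
                "bool": {
                    "must": [
                        {"match": {"books.publisher": "Penguin"}},
                        {"match": {"books.genre": "scifi"}},
                    ]
                }
            },
        }
    }
}

print(json.dumps(author_mapping, indent=2))
print(json.dumps(penguin_scifi_authors, indent=2))
```

Without the nested type, an author with one Penguin romance and one scifi book from another publisher would also match, because the inner objects would be flattened into a single document.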
44. Data Design
Index Configurations
• One index “per user”
• Single index
• SI + Routing: 1 index + custom doc routing to shards
• Time: 1 index per time window *
* we can search across indices
45. One Index per user
[Diagram: nodes Hulk and Thor hosting shards User1 s0, User1 s1 and User2 s0]
+ different sharding per user
- even small users own (and pay the overhead of) at least 1 shard
47. Single Index + routing
[Diagram: nodes Hulk and Thor hosting shards Users s0, Users s2 and Users s3]
+ all of a user’s data lives in one shard
+ allows large overallocation
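Routing can be pictured as hash(routing value) modulo the number of shards. The sketch below uses zlib.crc32 as a stand-in hash, which is an assumption for illustration and not the hash function Elasticsearch uses; the documents and the shard count are invented.

```python
import zlib

NUMBER_OF_SHARDS = 4  # invented for this example


def shard_for(routing_value, number_of_shards=NUMBER_OF_SHARDS):
    """Pick a shard from a routing value (crc32 is only a stand-in hash)."""
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_shards


docs = [
    {"id": "doc-1", "user": "user1"},
    {"id": "doc-2", "user": "user1"},
    {"id": "doc-3", "user": "user2"},
]

# Default behaviour: route by document id, so a user's docs may be scattered.
by_id = {d["id"]: shard_for(d["id"]) for d in docs}

# Custom routing: route by user, so all of a user's docs share one shard.
by_user = {d["id"]: shard_for(d["user"]) for d in docs}

print("routed by id:  ", by_id)
print("routed by user:", by_user)
```

A search that passes the same routing value only needs to visit that one shard, which is what makes a heavily overallocated single index affordable.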
48. Index per time range
[Diagram: nodes Hulk and Thor hosting shards 2013_01 s1, 2013_01 s2 and 2013_02 s1]
+ allows changing the configuration of future indices
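With one index per month, the application has to derive index names from dates. A small sketch, assuming the "YYYY_MM" naming seen in the diagram; the helper names are hypothetical. Searching several indices at once uses a comma-separated list in the request path.

```python
from datetime import date


def index_for(day):
    """Name of the monthly index that a document dated `day` belongs to."""
    return f"{day.year:04d}_{day.month:02d}"


def indices_between(start, end):
    """All monthly index names covering the range start..end (inclusive)."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{year:04d}_{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


# Write path: pick the index from the document date.
write_index = index_for(date(2013, 2, 14))

# Read path: search every index that overlaps the requested window.
search_path = "/" + ",".join(
    indices_between(date(2012, 12, 15), date(2013, 2, 1))
) + "/_search"

print(write_index)
print(search_path)
```

Because each month gets a fresh index, settings such as the number of shards can be adjusted for the next index without touching the existing ones.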
49. Data Design - lessons
Test, test, test your use case!
Take a single node with one shard and
throw load at it to measure the shard capacity.
The shard is the scaling unit:
overallocate to enable future scaling
(#shards > #nodes)
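The arithmetic behind "#shards > #nodes" can be sketched as follows. This is a simplification that ignores replicas and assumes shards are spread as evenly as possible; the function name is made up for the example.

```python
def shards_per_node(number_of_shards, number_of_nodes):
    """Spread shards as evenly as possible; returns the count on each node."""
    base, extra = divmod(number_of_shards, number_of_nodes)
    return [base + (1 if i < extra else 0) for i in range(number_of_nodes)]


# Too few shards: a third node has nothing to host.
print(shards_per_node(2, 3))

# Overallocated index: the same 6 shards keep spreading as nodes are added.
for nodes in (2, 4, 6):
    print(nodes, "nodes:", shards_per_node(6, nodes))
```

Once there are as many nodes as shards, adding more nodes no longer helps that index, so the shard count chosen at creation time sets the ceiling for scaling out.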
50. ...ES has lots of other features!
• Bulk operations
• Percolator (alerts, classification, …)
• Suggesters (“Did you mean …?”)
• Index templates (automatic index configuration)
• Monitoring API (amount of memory used, number of operations, …)
• Plugins
• ...
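As an example of the first item, a bulk request body is newline-delimited JSON: one action line followed by one source line per document, with a trailing newline. The sketch below only builds that body; the index name "library", the type "book" and the documents are invented.

```python
import json


def bulk_body(index, doc_type, docs):
    """Build a bulk body that indexes each (id, source) pair in `docs`."""
    lines = []
    for doc_id, source in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))  # action and metadata line
        lines.append(json.dumps(source))  # document source line
    return "\n".join(lines) + "\n"        # the body must end with a newline


body = bulk_body(
    "library",
    "book",
    {"1": {"title": "Dune"}, "2": {"title": "Neuromancer"}},
)
print(body, end="")
```

The resulting string would be sent in a single request to the _bulk endpoint instead of issuing one request per document.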