How EverTrue is building a donor CRM on top of ElasticSearch. We cover some of the issues we hit while scaling ElasticSearch and which of its features we use to deliver value to our customers.
Building a CRM on top of ElasticSearch
1. How we’re building a CRM on top of ElasticSearch
2. About me (quickly)
Mark Greene / @markjgreene
Director of Engineering @ EverTrue
Love distributed data stores, love them!
Using ElasticSearch for ~1 year
3. What does EverTrue do?
We help nonprofits raise more money
by allowing them to identify and build relationships
with potential donors
4. How do we do that?
Resolving identities across third party data sources
Obligatory database tube
5. Cluster Setup
• 3 Masters, 2 data nodes, AZ aware
• ~40m documents, ~25GB
• 1 index, 7 types
• 5 shards, 1 replica
• Peak workloads run at 4-5k ops/s
• Using mostly default settings
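A minimal sketch of creating an index shaped like this one (5 shards, 1 replica) with the elasticsearch-py client; the index name "contacts" and the host are assumptions, not our actual values.

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])

    es.indices.create(
        index="contacts",
        body={
            "settings": {
                "number_of_shards": 5,    # fixed at index creation time
                "number_of_replicas": 1,  # can be changed later
            }
        },
    )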
6. Data Model
• Mapping contains ~50 default fields.
• Most fields are stored as both analyzed
and not analyzed
• Leverage dynamic templates for custom
fields created by our customers
• Each custom field is also stored as both analyzed
and not analyzed
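A sketch of what such a dynamic template looks like in the ES 1.x mapping API: every new string field matching an assumed "custom_*" naming convention gets indexed analyzed (for full-text search) plus a not_analyzed "raw" sub-field (for sorting and exact matching). The type and field names are illustrative, not our real mapping.

    # reuses the `es` client from the sketch above
    es.indices.put_mapping(
        index="contacts",
        doc_type="contact",
        body={
            "contact": {
                "dynamic_templates": [
                    {
                        "custom_strings": {
                            "match": "custom_*",           # assumed naming convention
                            "match_mapping_type": "string",
                            "mapping": {
                                "type": "string",          # analyzed by default
                                "fields": {
                                    "raw": {
                                        "type": "string",
                                        "index": "not_analyzed",
                                    }
                                },
                            },
                        }
                    }
                ]
            }
        },
    )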
10. Fielddata Cache: Our first scaling issue
Turns out the fielddata cache is unbounded by default...
11. First Solution
• We set indices.fielddata.cache.size
to 50%
• No more OOME Crashes
• Then something else happened... really slow
queries (problem sign #1)
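One way to watch for this, sketched with the nodes stats API: once the fielddata cache is bounded, evictions are the number to track, since constant evict-and-reload churn is exactly what turns a bounded cache into slow queries. Field paths follow the 1.x stats response format.

    stats = es.nodes.stats(metric="indices")
    for node_id, node in stats["nodes"].items():
        fd = node["indices"]["fielddata"]
        print(
            node.get("name", node_id),
            "fielddata bytes:", fd["memory_size_in_bytes"],
            "evictions:", fd["evictions"],  # rising evictions = cache churn
        )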
13. Slow Query?... More Hardware, Right?!
Type                  m1.xlarge           r3.2xlarge   r3.2xlarge
Hardware              4 CPU               8 CPU        8 CPU
                      15GB RAM            60GB RAM     60GB RAM
                      round disk thingy   SSDs         SSDs
ES version            v1.1.2              v1.1.2       v1.3.2
has_child query time  12-15s              6-8s         ~100ms
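For context, a sketch of the kind of has_child query we were measuring, with assumed parent/child types (a "contact" parent, a "gift" child) and an assumed "amount" field:

    result = es.search(
        index="contacts",
        doc_type="contact",
        body={
            "query": {
                "has_child": {
                    "type": "gift",
                    # parents with at least one child gift >= $1,000
                    "query": {"range": {"amount": {"gte": 1000}}},
                }
            }
        },
    )
    print(result["hits"]["total"])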
14. Lessons Learned
• Watch the release notes & GH issues like a
hawk
• Don’t fall too far behind w/r/t versions
• We waited too long (6 months)
• Keep ES fed with plenty of memory
• Need monitoring to have any hope of
understanding operational issues
15. Settings We Tweaked
• indices.store.throttle.max_bytes_per_sec
• Default 20mb -> 60mb (SSDs can handle it)
• indices.fielddata.cache.size
• Set to 70% of heap
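A sketch of applying these tweaks on the 1.x line: the store throttle is a dynamic cluster setting, while the fielddata cache size is a static node setting that belongs in elasticsearch.yml (shown as a comment, since it needs a restart).

    # dynamic: takes effect immediately; "persistent" survives restarts
    es.cluster.put_settings(
        body={
            "persistent": {
                "indices.store.throttle.max_bytes_per_sec": "60mb"
            }
        }
    )

    # static, in elasticsearch.yml on each node (requires restart):
    # indices.fielddata.cache.size: 70%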
16. ES Hadoop Integration
• We use it for a lot of our offline jobs
• One map task per shard
• Deployments with few shards may underutilize
your Hadoop cluster
• Mapper inputs do not contain meta fields
like _version
• Forces another read in write-back
scenarios
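A sketch of that extra read in a write-back job: because the mapper input lacks _version, we re-fetch the document to learn its current version, then index with version checking so a concurrent writer isn't silently overwritten. The doc ID and "score" field are illustrative.

    doc_id = "contact-123"

    current = es.get(index="contacts", doc_type="contact", id=doc_id)
    updated = dict(current["_source"], score=0.87)  # assumed computed field

    es.index(
        index="contacts",
        doc_type="contact",
        id=doc_id,
        body=updated,
        version=current["_version"],  # conflicts if the doc changed meanwhile
    )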