Who doesn't love building highly available, scalable systems that hold multiple terabytes of data? We recently had the pleasure of cracking some tough nuts, and we'd love to share with the community our findings from designing, building, and operating a 120-node, 6 TB Elasticsearch (and Hadoop) cluster.
25. • Shard allocation
• Avoid rebalancing (discovery timeout); see the settings sketch below
• Uncached Facets
https://github.com/lovelysystems/elasticsearch-ls-plugins
• LUCENE-2205: rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory-efficient data structure.
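Below is a minimal sketch, not taken from the deck, of how the allocation and rebalancing knobs mentioned above can be driven through Elasticsearch's cluster settings API. The host, the chosen values, and the choice of exactly these two settings are assumptions; the setting names follow the current Elasticsearch documentation and may differ in the 0.x release used at the time of the talk.

```python
# Hedged illustration: throttle shard recoveries and pause rebalancing
# via the _cluster/settings API. Host and values are placeholders.
import json
import urllib.request

ES = "http://localhost:9200"  # hypothetical coordinating node

body = {
    "transient": {
        # limit concurrent recoveries so a restarted node does not
        # trigger a storm of shard relocations
        "cluster.routing.allocation.node_concurrent_recoveries": 2,
        # temporarily disable rebalancing, e.g. during a rolling restart
        "cluster.routing.rebalance.enable": "none",
    }
}

req = urllib.request.Request(
    ES + "/_cluster/settings",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
print(urllib.request.urlopen(req).read().decode())
```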
26. • 3 AP servers / MC (c1.xlarge)
• 6 ES master nodes + 6-node Hadoop cluster (c1.xlarge + spot instances)
• 40 ES nodes per zone (m1.large, 8 EBS volumes)
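The slide shows ES data nodes spread across availability zones. One common way to keep replicas distributed across zones is shard allocation awareness; whether the original cluster used it is not stated in the deck, so the sketch below is purely illustrative. The attribute name "zone" and the host are assumptions; setting names follow current Elasticsearch documentation (older 0.x releases used node.zone instead of node.attr.zone).

```python
# Hedged illustration of zone-aware shard allocation. Each data node
# would also need a zone attribute in elasticsearch.yml, e.g.
#   node.attr.zone: us-east-1a
import json
import urllib.request

ES = "http://localhost:9200"  # placeholder

body = {
    "persistent": {
        # spread primaries and replicas across the declared zones
        "cluster.routing.allocation.awareness.attributes": "zone",
    }
}

req = urllib.request.Request(
    ES + "/_cluster/settings",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
print(urllib.request.urlopen(req).read().decode())
```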
33. Cutting the cost
• Reduce the amount of data: use Hadoop/MapReduce transforms to eliminate spam, irrelevant languages, ... (see the mapper sketch after this list)
• No more time-based indices
• Dedicated hardware
• SSD disks
• Share hardware between ES and Hadoop
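As a hedged sketch of the "reduce the amount of data" idea, the following Hadoop Streaming mapper drops spam and documents in languages that are not indexed. The field names (is_spam, lang), the language whitelist, and the JSON-per-line input format are assumptions for illustration, not details from the original pipeline.

```python
#!/usr/bin/env python3
# Hypothetical map-only filter: keep only non-spam documents in the
# languages we actually index, so far less data reaches Elasticsearch.
import json
import sys

KEEP_LANGUAGES = {"de", "en"}  # placeholder whitelist

for line in sys.stdin:
    try:
        doc = json.loads(line)
    except ValueError:
        continue  # skip malformed records
    if doc.get("is_spam"):
        continue  # drop spam
    if doc.get("lang") not in KEEP_LANGUAGES:
        continue  # drop irrelevant languages
    # surviving documents are written unchanged to stdout, which Hadoop
    # Streaming forwards to the job output
    sys.stdout.write(json.dumps(doc) + "\n")
```

Run as the mapper of a map-only Streaming job (zero reducers) so the filtered records go straight to the output.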
38. That's thirty minutes away. I'll be there in ten.
@jodok
Editor's notes
How do I work?
* Agile leader: I say what I do, I do what I say, hands-on.
* Quality over speed; responsibility to the team.
* Attract specialists; not trying to sell something, but DO IT. DELIVER.