Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Without Downtime
1. MIGRATING A 130TB CLUSTER FROM ELASTICSEARCH 2 TO 5 IN 20 HOURS WITHOUT DOWNTIME
FRED DE VILLAMIL
@FDEVILLAMIL
OCTOBER 2017
2. ABOUT ME
FRED DE VILLAMIL, FORMER DIRECTOR OF INFRASTRUCTURE
@SYNTHESIO
FIRST ELASTICSEARCH IN PRODUCTION WAS 0.17.6
LINUX / (FREE)BSD USER SINCE 1996,
OPEN SOURCE CONTRIBUTOR SINCE 1998,
LOVES COOL TECHS, TENNIS, PHOTOGRAPHY, CUTE OTTERS,
INAPPROPRIATE HUMOR AND ELASTICSEARCH CLUSTERS OF UNUSUAL
SIZE.
WRITES ABOUT ES & MORE AT HTTPS://THOUGHTS.T37.NET
3. ABOUT SYNTHESIO
SYNTHESIO IS THE LEADING SOCIAL INTELLIGENCE TOOL FOR
SOCIAL MEDIA MONITORING & SOCIAL ANALYTICS
SYNTHESIO CRAWLS THE WEB FOR RELEVANT DATA, ENRICHES
IT WITH SENTIMENT ANALYSIS AND DEMOGRAPHICS TO BUILD
SOCIAL ANALYTICS DASHBOARDS.
4. ELASTICSEARCH @SYNTHESIO
8 production clusters:
• +600 hosts, all bare metal
• 3 data centers
• 1.7PB storage SSD / NVME
• 37.5TB RAM
Hardware:
• 6-core Xeon E5 v3 or dual 12-core Xeon E5-2687W v4 (160 watts!!!)
• 64GB to 256GB RAM
• 4 x 800GB SSD / 2 x 1.2TB NVME
• RAID0 everywhere
We aggregate data from various cold storage and make it searchable in a jiffy.
Average cluster stats
• writes: 85k documents / second, 1.5M at peak
• 800 searches / second, with some clusters sustaining a continuous 25k searches / second
• Doc size from 150KB to 200MB
5. THE BLACKHOLE CLUSTER
Topology
• 68 data nodes
• 3 master nodes
• 6 ingest nodes
• 200TB storage SSD
• 2.4TB heap
• 924 cores
Cluster stats:
• 1137 indices (daily)
• 27266 shards
• 130TB data
• 201 billion documents
• 7000 new documents / second
• 800 searches / second on the whole dataset
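The cluster stats above imply an average shard size and a shard count per daily index. The arithmetic below is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 10**12 bytes) and that both the 130TB and the 27266 shards include replicas, neither of which the slide states.

```python
# Figures copied from the cluster stats above.
# Assumption: decimal units, and both numbers include replica shards.
DATA_BYTES = 130 * 10**12
SHARDS = 27266
INDICES = 1137

avg_shard_gb = DATA_BYTES / SHARDS / 10**9
shards_per_index = SHARDS / INDICES

print(f"average shard size: {avg_shard_gb:.2f} GB")
print(f"shards per index:   {shards_per_index:.1f}")
```

Under those assumptions this works out to a little under 5 GB per shard and about 24 shards per daily index.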
8. USING THE REINDEX API?
REINDEX API:
• NO SLICED SCROLL UNTIL ES
6.0
• SLOW
• MIGHT LOSE SOME DOCUMENTS,
NEEDS LOTS OF ERROR CONTROL
LOGSTASH:
• NO SLICED SCROLLS UNTIL ES
6.0
• FASTER THAN THE REINDEX API
• REALLY DOESN’T LIKE ERRORS
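"Needs lots of error control" can be illustrated with a small check on the body returned by a `_reindex` call. This is a minimal sketch, not the tooling used for this migration: the helper name `reindex_problems`, the `expected_docs` parameter (for example taken from a count on the source index) and the sample response are hypothetical.

```python
def reindex_problems(response, expected_docs):
    """List the problems found in a parsed _reindex response body."""
    problems = []
    if response.get("timed_out"):
        problems.append("request timed out")
    if response.get("failures"):
        problems.append(f"{len(response['failures'])} failures reported")
    if response.get("version_conflicts", 0) > 0:
        problems.append(f"{response['version_conflicts']} version conflicts")
    # Documents actually written to the destination index.
    copied = response.get("created", 0) + response.get("updated", 0)
    if copied != expected_docs:
        problems.append(f"copied {copied} of {expected_docs} documents")
    return problems


# Made-up response for illustration only.
sample_response = {
    "timed_out": False,
    "total": 1000,
    "created": 990,
    "updated": 0,
    "version_conflicts": 10,
    "failures": [],
}
for problem in reindex_problems(sample_response, expected_docs=1000):
    print(problem)
```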
9. BEFORE UPGRADING
• USE THE UPGRADE CHECK PLUGIN TO VALIDATE THE COMPATIBILITY OF CURRENT INDEXES
• UPGRADE YOUR MAPPING TEMPLATES TO BE ES 5 COMPLIANT
• CREATE THE INDEXES FOR THE NEXT 10 DAYS (JUST IN CASE)
• TELL YOUR HOSTING PROVIDER YOU’RE GOING TO TRANSFER 130TB
IN 17 HOURS
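Two of the items above lend themselves to a quick sketch: listing the daily indexes to pre-create, and estimating the bandwidth behind "130TB in 17 hours". The index naming pattern (`<prefix>-YYYY.MM.DD`) and the `docs` prefix are assumptions made for illustration, as are the decimal units.

```python
from datetime import date, timedelta


def upcoming_daily_indices(prefix, start, days=10):
    """Names of the daily indexes for the `days` days following `start`."""
    return [
        f"{prefix}-{(start + timedelta(days=n)).strftime('%Y.%m.%d')}"
        for n in range(1, days + 1)
    ]


def required_throughput_gbit_s(terabytes, hours):
    """Sustained network throughput needed to move `terabytes` in `hours`."""
    bytes_per_second = terabytes * 10**12 / (hours * 3600)
    return bytes_per_second * 8 / 10**9


print(upcoming_daily_indices("docs", date(2017, 10, 1)))
print(f"{required_throughput_gbit_s(130, 17):.1f} Gbit/s sustained")
```

The second figure comes out at roughly 17 Gbit/s sustained across the whole cluster, which is why warning the hosting provider is on the list.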
10. EXPANDING BLACKHOLE
OPS:
• +90 NEW SERVERS IN 2 NEW RACKS
• RAISED THE REPLICATION FACTOR TO 3
RESULT:
• 167 NODES
• 53626 SHARDS
• 279TB DATA
• 391TB STORAGE
• 5.42TB HEAP
• 2004 CORES
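Raising the replication factor is an index settings update. The sketch below builds the request body only. It assumes that "replication factor 3" maps to `index.number_of_replicas: 3` and that the indexes previously had 1 replica; the slides do not say so explicitly, but it would be consistent with the shard count roughly doubling from 27266 to 53626.

```python
import json


def replica_settings(number_of_replicas):
    """Body for PUT /<indices>/_settings that changes the replica count."""
    return {"index": {"number_of_replicas": number_of_replicas}}


def copies_per_shard(number_of_replicas):
    """One primary plus its replicas."""
    return 1 + number_of_replicas


# Assumption: going from 1 replica to 3 replicas per primary.
expected_growth = copies_per_shard(3) / copies_per_shard(1)
observed_growth = 53626 / 27266

print(json.dumps(replica_settings(3)))
print(f"expected shard growth: x{expected_growth:.2f}")
print(f"observed shard growth: x{observed_growth:.2f}")
```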
11. SETTINGS UPDATE DURING THE REPLICA INIT
"indices.recovery.max_bytes_per_sec": "4096mb"
"indices.recovery.concurrent_streams": "50"
"cluster.routing.allocation.disk.watermark.low": "98%"
"cluster.routing.allocation.disk.watermark.high": "99%"
"cluster.routing.rebalance.enable": "none"
12. PROBLEMS
• THE TRANSFER BROUGHT THE WHOLE CLUSTER TO ITS KNEES.
• THIS SLOWED DOWN THE WRITES.
• THE BULK THREAD POOL STARTED TO FILL UP.
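One way to see the bulk thread pool filling up is to look at the per-node `thread_pool.bulk` section of a node stats response (`GET /_nodes/stats/thread_pool`). The helper below is a hedged sketch that works on an already parsed response; the function name, the threshold and the sample data are made up for illustration.

```python
def busy_bulk_nodes(node_stats, max_queue=0):
    """Return nodes whose bulk queue exceeds max_queue or that rejected bulks."""
    busy = {}
    for node in node_stats.get("nodes", {}).values():
        bulk = node.get("thread_pool", {}).get("bulk", {})
        queue = bulk.get("queue", 0)
        rejected = bulk.get("rejected", 0)
        if queue > max_queue or rejected > 0:
            busy[node.get("name", "unknown")] = {"queue": queue, "rejected": rejected}
    return busy


# Made-up sample shaped like a node stats response; not real cluster data.
sample_stats = {
    "nodes": {
        "a1": {"name": "data-01", "thread_pool": {"bulk": {"queue": 0, "rejected": 0}}},
        "b2": {"name": "data-02", "thread_pool": {"bulk": {"queue": 47, "rejected": 12}}},
    }
}
print(busy_bulk_nodes(sample_stats))
```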
13. SOLUTION: ZONING FOR FUN & PROFIT
• ALLOCATE THE FRESHEST DATA AND ONGOING WRITES TO ONE ZONE
• SEGREGATE EVERYTHING ELSE IN A DIFFERENT ZONE
• WAIT FOR THE CLUSTER TO CALM DOWN
• TOTAL TIME SPENT ON THE TRANSFER: 17 HOURS
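The zoning idea can be sketched with index-level shard allocation filtering (`index.routing.allocation.require.<attribute>`). Everything specific here is an assumption: the node attribute name `zone`, the zone values `fresh` and `archive`, the two-day cutoff and the `<prefix>-YYYY.MM.DD` index naming are hypothetical and not taken from the slides.

```python
from datetime import date, datetime, timedelta


def zone_for_index(index_name, today, fresh_days=2):
    """Pick a zone from the date suffix of a daily index name."""
    day = datetime.strptime(index_name.rsplit("-", 1)[1], "%Y.%m.%d").date()
    return "fresh" if (today - day) < timedelta(days=fresh_days) else "archive"


def zoning_settings(index_names, today, fresh_days=2):
    """Map each index to the body of a PUT /<index>/_settings request."""
    return {
        name: {"index.routing.allocation.require.zone": zone_for_index(name, today, fresh_days)}
        for name in index_names
    }


today = date(2017, 10, 10)
plan = zoning_settings(["docs-2017.10.10", "docs-2017.10.09", "docs-2017.09.01"], today)
for name, body in plan.items():
    print(name, body)
```

Pre-created future indexes have a negative age and therefore also land in the `fresh` zone, which keeps ongoing writes away from the nodes busy receiving replicas.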
14. SPLITTING THE CLUSTER IN 2
• SET "cluster.routing.allocation.enable" TO "all"
• SHUT DOWN 2 OF THE RACKS
• SHUT DOWN ONE OF THE MASTERS
• SWITCH THE NUMBER OF REPLICAS TO 1
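Shutting down one of three masters is safe only if the remaining two still form a quorum. The check below is a small sketch assuming the cluster uses the usual `discovery.zen.minimum_master_nodes` value of (N / 2) + 1; the slides do not show the actual setting.

```python
def master_quorum(master_eligible_nodes):
    """Usual minimum_master_nodes value: a strict majority."""
    return master_eligible_nodes // 2 + 1


def can_elect_master(total_masters, masters_running):
    """True if the running masters still reach the configured quorum."""
    return masters_running >= master_quorum(total_masters)


print(master_quorum(3))        # quorum for the original 3 masters
print(can_elect_master(3, 2))  # one master shut down
print(can_elect_master(3, 1))  # two masters shut down
```

The master that was shut down would need its own quorum of 1 once it is reconfigured to run the second cluster alone.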
15. BUILDING BLACKHOLE02
• RECONFIGURE THE 2 SHUT-DOWN RACKS AND THE MASTER SO THEY TALK TO EACH OTHER
• START THE MASTER, ALONE, CLOSE THE INDEXES
• UPGRADE THE MASTER TO ES 5.1.1
• UPGRADE ALL THE PLUGINS
• START THE MASTER: THE WHOLE UPGRADE TOOK 32 SECONDS
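"Reconfigure so they talk to each other" roughly means giving the split-off nodes their own cluster name and discovery settings. The sketch below renders a minimal `elasticsearch.yml` fragment using zen discovery settings; the host name `master-03.example` is hypothetical, and the cluster name is borrowed from the slide title as an assumption.

```python
def split_cluster_config(cluster_name, master_hosts):
    """Render the discovery part of elasticsearch.yml for the new cluster."""
    quorum = len(master_hosts) // 2 + 1
    hosts = ", ".join(f'"{host}"' for host in master_hosts)
    return "\n".join([
        f"cluster.name: {cluster_name}",
        f"discovery.zen.ping.unicast.hosts: [{hosts}]",
        f"discovery.zen.minimum_master_nodes: {quorum}",
    ])


# Hypothetical single master for the second cluster.
config = split_cluster_config("blackhole02", ["master-03.example"])
print(config)
```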
16. BRINGING BACK THE DATA
• UPGRADE ES AND THE PLUGINS ON THE DATA NODES
• START ELASTICSEARCH
• WAIT 30 MINUTES FOR THE CLUSTER TO GO BACK TO GREEN
• PLUG IN A WORK UNIT TO CATCH UP ON THE PAST 18 HOURS OF DATA
• UPDATE THE LOAD BALANCER CONFIGURATION TO USE THE
NEWLY UPGRADED CLUSTER
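Waiting for the cluster to go back to green before switching the load balancer can be automated by polling `_cluster/health`. This is a minimal sketch, not the procedure used in the talk: `fetch_health` is a hypothetical callable that returns the parsed health response, so the polling logic stays independent of any HTTP client.

```python
import time


def wait_for_green(fetch_health, timeout_s=1800, interval_s=10,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_health() until the status is green or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        health = fetch_health()
        if health.get("status") == "green":
            return health
        if clock() >= deadline:
            raise TimeoutError(
                f"cluster still {health.get('status')} after {timeout_s}s"
            )
        sleep(interval_s)


# fetch_health could wrap an HTTP GET on /_cluster/health of the upgraded
# cluster; here a canned sequence of statuses stands in for the real calls.
statuses = iter([{"status": "red"}, {"status": "yellow"}, {"status": "green"}])
result = wait_for_green(lambda: next(statuses), sleep=lambda seconds: None)
print(result["status"])
```

Only once this returns would the load balancer configuration be pointed at the upgraded cluster.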