1. Confidential © Arm 2017
Automation of Hadoop cluster operations
in Arm Treasure Data
Yan Wang
Arm Treasure Data
March 14, 2019
2. Who am I?
● Yan Wang (王岩)
● May 2018 〜 Arm Treasure Data
Hadoop team, Software Engineer
● Contributing to Hadoop
● Likes Japanese Mahjong
● Blog: https://tiana528.github.io/
3. Agenda
● Hadoop in Arm Treasure Data
● Hadoop Cluster Operation Automation
○ Reduce hadoop cluster creation time significantly
○ Simplify hadoop cluster recreation
○ Modernize instance type of slaves
○ Create patches to fast fail jobs consuming too much disk
○ Simplify incident handling
○ Make it easy to know when to scale out
○ Simplify shutting down nodes
○ Replace Chef with Debian packaging and CodeDeploy
● Future roadmap
● Summary
4. Arm Treasure Data Product
Customers don’t
need to operate
hadoop clusters.
We do.
5. Hadoop Usage
● Clusters: multi-cloud, highly multi-tenant, permanent storage, HA
● Cluster structure: one master (M) and many slaves (S)
● Patched hadoop: PTD-2.7.3-xxx
● Operation tool: self-developed (not CDH or HDP)
● Operation point of view
○ Recreate cluster on incident
○ The self-developed operation tool is the key point for operation; it was improved in the past year
6. Agenda
7. Reduce hadoop cluster creation time significantly
-- by making use of AWS Auto Scaling Group
● Before: environment setup, then launch the 100 nodes of the cluster one by one
○ Too slow
■ Client side: 1 hour
■ Cluster ready: 1 hour
● After: environment setup, then create an AWS Auto Scaling Group
○ Much faster
■ Client side: 3 minutes
■ Cluster ready: 15 minutes
(9 months ago)
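The speed-up comes from letting the Auto Scaling Group launch the whole fleet in parallel rather than node by node. A minimal sketch of the idea, building the parameters that would be passed to boto3's `autoscaling` client as `create_auto_scaling_group(**params)`; the group name, tag, and launch-template id are illustrative, not Treasure Data's real configuration:

```python
# Illustrative sketch (not the actual tooling): build the request for
# boto3.client("autoscaling").create_auto_scaling_group(**params).
def asg_request(cluster_name: str, node_count: int, launch_template_id: str) -> dict:
    return {
        "AutoScalingGroupName": f"hadoop-slaves-{cluster_name}",
        "LaunchTemplate": {"LaunchTemplateId": launch_template_id},
        # the ASG launches all slaves in parallel instead of one by one
        "MinSize": node_count,
        "MaxSize": node_count,
        "DesiredCapacity": node_count,
        "Tags": [{"Key": "cluster", "Value": cluster_name}],
    }

params = asg_request("ClusterB", 100, "lt-0123example")
# boto3.client("autoscaling").create_auto_scaling_group(**params)  # real call
```

With fixed Min/Max/Desired sizes the group holds the cluster at exactly 100 nodes and replaces any instance that dies.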
8. Agenda
9. General flow of how to recreate a hadoop cluster
● No downtime: A/B switch
1. Create the new cluster (ClusterB) alongside the running ClusterA
2. Switch job server traffic from ClusterA to ClusterB
3. Shut down the old cluster (ClusterA)
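The three-step A/B flow above can be sketched as a small function; the router dict and the create/delete callbacks are illustrative stand-ins for the real operation tool, not its actual API:

```python
# Minimal sketch of the no-downtime A/B recreation flow.
def recreate(router: dict, create, delete, new_cluster: str) -> dict:
    """router maps a route key to the currently active cluster name."""
    old = router["default"]
    create(new_cluster)                           # 1. create the new cluster
    router = {**router, "default": new_cluster}   # 2. switch job-server traffic
    delete(old)                                   # 3. shut down the old cluster
    return router

events = []
router = recreate(
    {"default": "ClusterA"},
    create=lambda c: events.append(("create", c)),
    delete=lambda c: events.append(("delete", c)),
    new_cluster="ClusterB",
)
```

The ordering matters: the old cluster is only deleted after traffic points at the new one, so jobs never see a missing cluster.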
10. Simplify hadoop cluster recreation
-- by creating our own wrapper script around the SRE tool (7 months ago)
● Before: use the SRE team's tool directly
service create -S aws -s development -c ClusterB ...
service delete -S aws -s development -c ClusterA ...
○ Issues: too many parameters; stressful to shut down
● After: use our wrapper script (= SRE tool + verification + config)
cluster create ClusterB
cluster delete ClusterA
○ Improved: 1 parameter; stress-free to shut down
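A sketch of what such a wrapper does: expand the one-parameter command into the full SRE-tool invocation, filling the provider and stage from config instead of flags. The config source and function names here are hypothetical:

```python
# Hypothetical wrapper: "cluster create ClusterB" expands into the full
# SRE-tool command line, with fixed flags supplied from configuration.
CONFIG = {"provider": "aws", "stage": "development"}  # assumed config source

def wrap(action: str, cluster: str, config: dict = CONFIG) -> list:
    return [
        "service", action,
        "-S", config["provider"],
        "-s", config["stage"],
        "-c", cluster,
    ]

cmd = wrap("create", "ClusterB")
```

Baking the environment into config is what reduces the surface from many parameters to one, and a verification step can run before the command is ever executed.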
11. Agenda
12. Gained many benefits by changing the instance type of slaves
● Before: c3.8xlarge (very old model)
● After: m5d.12xlarge (latest model), 6 months ago
● Improved
○ Larger per-container memory
○ Larger & faster local disk
○ Lower cost
○ ...
● But …
13. But… a new issue occurred
● New issue
○ Amazon doesn't have that many m5d instances available for on-demand allocation
○ Insufficient instances to do an A/B switch in one availability zone when recreating a cluster
● Asked Amazon support for help
○ They suggested buying more reserved instances or temporarily using other instance types
● Other approaches?
14. Handle insufficient instances in one AZ
-- by supporting a cross-AZ environment
● Cross-AZ environment: the job server reaches clusters in both AZ_1 and AZ_2 through a REST API, so recreation follows the same A/B flow across zones
1. Create the new cluster in the other AZ (AZ_2)
2. Switch job server traffic to it
3. Shut down the old cluster in AZ_1
● Key point: no large network traffic between AZs, which can be expensive
15. Agenda
16. Create patches to fast fail jobs consuming too much disk
● Before (task timeline, 0h〜40h): the task fails around 10h, is retried, fails again, is retried… the job only fails around 40h. Retrying is meaningless.
● After: fail fast. The job fails around 10h, at the first failure.
● We created two patches (4 months ago)
○ For local disk: MAPREDUCE-7022 Fast fail rogue jobs based on task scratch dir size
○ For HDFS (when a disk quota is configured): MAPREDUCE-7148 Fast fail jobs when exceeds dfs quota limitation
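Both patches share the same core idea: measure disk usage against a limit and kill the job immediately instead of letting retries burn another 30 hours. A sketch of that check in Python rather than the actual Java patch code; the function names and limit are illustrative:

```python
# Sketch of the fast-fail idea behind MAPREDUCE-7022 (names are illustrative,
# not the real patch): measure the task scratch directory, and if it exceeds
# the configured limit, fail the job now instead of retrying.
import os

def scratch_dir_bytes(path: str) -> int:
    """Total size of all files under a task's scratch directory."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for f in files:
            total += os.path.getsize(os.path.join(root, f))
    return total

def check_scratch_limit(used_bytes: int, limit_bytes: int) -> str:
    # "FAIL_FAST" kills the job immediately; retrying a rogue job is pointless
    return "FAIL_FAST" if used_bytes > limit_bytes else "OK"
```

MAPREDUCE-7148 applies the same principle to HDFS: a quota violation is treated as a permanent failure, not a retriable one.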
17. Agenda
18. Simplify incident handling by creating health check scripts (4 months ago)
● Before: runbook (check A, run command B, check C, if … else…, open URL ...)
○ When an incident happens, follow the complex runbook; info must be collected first
● After: health check script, installed on all nodes, checking very detailed status; Datadog metrics trigger alerts
○ When an incident happens, run the health check and know where the issue is
● Future
○ Integrate with the Auto Scaling Group health check
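A minimal sketch of such a health-check runner: run a series of named checks and report the failing ones, so on-call can skip the runbook triage. The individual checks here are placeholders, not the real ones:

```python
# Sketch of a node health-check script; the checks are placeholders.
def run_health_checks(checks: dict) -> list:
    """checks: name -> zero-arg callable returning True when healthy."""
    failures = []
    for name, check in checks.items():
        try:
            if not check():
                failures.append(name)
        except Exception:
            failures.append(name)  # a crashing check counts as unhealthy
    return failures

failures = run_health_checks({
    "datanode_process": lambda: True,   # e.g. is the DataNode process running?
    "disk_free": lambda: 5 < 10,        # e.g. free space above a threshold?
    "namenode_rpc": lambda: False,      # e.g. a probe RPC; failing here
})
```

Returning a list of named failures is also what makes the future ASG integration natural: a non-empty list can map directly to an "Unhealthy" instance status.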
19. Agenda
20. Make it easy to know when to scale out
-- by creating capacity metrics based on machine learning (ongoing, POC)
● Before: an alert comes; manually scale out if there is a performance issue
○ Issues: a little late…; hard for juniors to understand
● After: feed HDFS put/get latency, price plan & slots in use, probe queries, HDFS usage, and CPU I/O wait into a linear regression that produces a capacity metric
● Expected improvement: know when to scale out immediately and easily
● Future plan: use it for auto scaling
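A toy sketch of the capacity-metric idea, reduced to a single feature with made-up numbers (the real POC combines the several signals listed above): fit a line from an observable signal to a capacity score, then alert when the predicted score crosses a threshold.

```python
# Toy single-feature linear regression for a capacity metric.
# Data and threshold are invented for illustration.
def fit_line(xs, ys):
    """Least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def needs_scale_out(slope, intercept, x, threshold=0.9):
    # alert when the predicted capacity score crosses the threshold
    return slope * x + intercept > threshold

# e.g. x = HDFS usage fraction, y = observed capacity score
slope, intercept = fit_line([0.2, 0.4, 0.6, 0.8], [0.25, 0.45, 0.65, 0.85])
```

A single explainable number ("capacity score, alert above 0.9") is also easier for juniors to act on than a wall of raw dashboards.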
21. Agenda
22. Simplify shutting down slaves
-- by using the Auto Scaling Group shutdown hook (ongoing)
● Before: shut down 2 nodes at a time, wait for block replication to finish, then shut down 2 more…
○ Issues: tedious operation; potential job retries
● After: hadoop node decommission script + AWS Auto Scaling Group shutdown hook
○ Expected improvement: safe & fast
● Future plan: find a “proper” node to kill (e.g. one running only short tasks)
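A sketch of the lifecycle-hook flow: when the ASG picks an instance to terminate, the hook pauses termination until the Hadoop decommission finishes, then completes the lifecycle action so the ASG can proceed. The AWS calls are abstracted behind callbacks here, and all names are illustrative:

```python
# Sketch of handling an ASG termination lifecycle hook for a Hadoop slave.
def on_termination_notice(instance_id, decommission, complete_lifecycle):
    # drain blocks and stop accepting tasks before the instance goes away
    decommission(instance_id)
    # tell the ASG it may now terminate the instance
    complete_lifecycle(instance_id, result="CONTINUE")

log = []
on_termination_notice(
    "i-0abc",
    decommission=lambda i: log.append(("decommission", i)),
    complete_lifecycle=lambda i, result: log.append((result, i)),
)
```

In a real setup `complete_lifecycle` would call the Auto Scaling `CompleteLifecycleAction` API; the key property is that decommission always finishes first, which is what removes the job-retry risk.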
23. Agenda
24. Replace Chef with Debian packaging and CodeDeploy
● Before: Chef. We met many issues using it
○ Ruby only
○ Unnecessarily complicated
○ Stateful
○ 15 override rules for attributes
○ Slow
○ Fails silently
○ Dependent on another team's release cycle
○ Two-pass model
○ 5 years of additions, little by little
○ ...
● After (ongoing)
○ Debian packaging: the standard way on Linux
○ AWS CodeDeploy: fast, easy to maintain, and usable in other clouds
● Expected improvement
○ Much easier to maintain
○ Cluster creation: 15 minutes => 5 minutes
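For flavor, a minimal appspec.yml sketch of what a CodeDeploy deployment of pre-built packages could look like; the destination path and hook scripts are hypothetical, not the actual deployment:

```yaml
version: 0.0
os: linux
files:
  - source: /
    destination: /opt/td-hadoop        # hypothetical install path
hooks:
  AfterInstall:
    - location: scripts/install_debs.sh   # e.g. dpkg -i the built .deb packages
      timeout: 300
  ApplicationStart:
    - location: scripts/restart_hadoop.sh
      timeout: 300
```

Because the packages are built once and only installed at deploy time, the per-node work shrinks to copy-and-install, which is where the 15-to-5-minute gain comes from.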
25. Agenda
● Hadoop in Arm Treasure Data
● Hadoop Cluster Operation Automation
● Future roadmap
○ API-based routing and workflow-based hadoop recreation
○ Usage history based account routing
● Summary
26. API-based routing and workflow-based hadoop recreation
● Before: change routing manually
○ Submit a git pull request, review, merge, upload the databag, run chef-client on all nodes
○ Issues: very manual; depends on manual validation
● After: API-based routing; change routing with one REST API call
curl -X PUT .../hadoop_routes -d '{"default":"ClusterB"}'
● Expected improvement
○ Totally automate hadoop cluster recreation through a workflow
○ Server-side validation
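A sketch of the server-side validation such an API enables: reject a routing change unless the target cluster exists and is healthy. The cluster registry, payload shape, and status codes here are illustrative:

```python
# Sketch of validating a PUT to a hypothetical /hadoop_routes endpoint.
KNOWN_CLUSTERS = {
    "ClusterA": {"healthy": True},
    "ClusterB": {"healthy": True},
}

def put_hadoop_routes(payload: dict, clusters: dict = KNOWN_CLUSTERS):
    """Return (status_code, message) for a routing-change request."""
    target = payload.get("default")
    if target not in clusters:
        return 400, f"unknown cluster: {target}"
    if not clusters[target]["healthy"]:
        return 409, f"cluster not healthy: {target}"
    return 200, f"routing default -> {target}"

status, _msg = put_hadoop_routes({"default": "ClusterB"})
```

This is the check that a git-PR workflow left to human reviewers: a workflow engine calling the API can rely on the server refusing a switch to a missing or unhealthy cluster.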
27. Agenda
28. Usage history based account routing
● Before: fixed routing to fixed-size clusters
○ Busy clusters and idle clusters coexist, so resources are not fully utilized
○ One big cluster makes it easy to hit the insufficient-instance issue when recreating it
● After: dynamic account routing
○ Route more accounts to the idle cluster, so resource utilization increases
○ Easy to split a big cluster into smaller clusters (cluster1, cluster2, even across AZ_1 / AZ_2) when instances are insufficient
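At its simplest, the routing decision is just picking the least-utilized cluster for an account's jobs; a sketch with made-up utilization numbers (the real plan would weigh usage history, not a single snapshot):

```python
# Sketch of dynamic account routing: instead of a fixed account->cluster map,
# send work to the cluster with the lowest current utilization.
def pick_cluster(utilization: dict) -> str:
    """utilization: cluster name -> fraction of slots in use (0.0-1.0)."""
    return min(utilization, key=utilization.get)

target = pick_cluster({"busy-cluster": 0.92, "idle-cluster": 0.31})
```

Because no cluster needs to be sized for the whole load, clusters can stay small, which also sidesteps the insufficient-instance problem at creation time.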
29. Agenda
30. Summary
● Common ideas
○ Use a modernized, cloud-based approach
○ API-based operation
○ Start small: many small changes lead to a large impact
31. We are hiring
https://www.treasuredata.com/company/careers/jobs/positions/?job=f6fd040b-c843-4991-bd49-bc674aab9a9e&team=Engineering
32. Thank You!
Danke!
Merci!
谢谢!
ありがとう!
Gracias!
Kiitos!