A talk on Hadoop operations and best practices for building and maintaining Hadoop clusters.
The talk was held at the data2day conference in Karlsruhe, Germany, on 27.11.2014.
14. Rack Design (without HA)
Rack 1:
• 1 x ToR switch (Nexus 3K), 1 x mgmt. network (Cisco Catalyst 2960)
• 5 x master nodes: NameNode, Secondary NameNode, ResourceManager, Mgmt. Server, Gateway Server
• 5 x worker nodes
Rack 2:
• 1 x ToR switch (Nexus 3K), 1 x mgmt. network (Cisco Catalyst 2960)
• 6 x worker nodes
15. Rack Design (with HA)
Rack 1:
• 1 x ToR switch (Nexus 3K), 1 x mgmt. network (Cisco Catalyst 2960)
• 4 x master nodes: NameNode (Active), ResourceManager (Active), Mgmt. Server, Gateway Server
• 5 x worker nodes
Rack 2:
• 1 x ToR switch (Nexus 3K), 1 x mgmt. network (Cisco Catalyst 2960)
• 2 x standby HA nodes: NameNode (Passive), ResourceManager (Passive)
• 6 x worker nodes
16. Service Mapping
• Worker Nodes: HDFS DataNode, YARN NodeManager, Hadoop Client Libraries
• HDFS NameNode (Active): NameNode (Active), ZooKeeper Server, Journal Node, Hadoop Client Libraries
• HDFS NameNode (Passive): NameNode (Passive), ZooKeeper Server, Journal Node, Hadoop Client Libraries
• YARN ResourceManager (Active): ResourceManager, App Timeline Server, MapReduce2 History Server, ZooKeeper Server, Journal Node, Hadoop Client Libraries
• YARN ResourceManager (Passive): ResourceManager, App Timeline Server, MapReduce2 History Server, ZooKeeper Server, Journal Node, Hadoop Client Libraries
• Management Server: MySQL Server (Hive MetaStore, Oozie, Ganglia), HiveServer2, Oozie Server, Ganglia Server, Nagios Server, ZooKeeper Server, Journal Node, Kerberos, Hadoop Client Libraries
• Gateway Server: Hue Server, Ambari Server, NFS Gateway Server, WebHCat Server, WebHDFS, Falcon, Sqoop, Solr, Hadoop Client Libraries
18. Hardware
• Get good-quality commodity hardware!
• Buy the sweet spot in pricing: 3 TB disks, 128 GB RAM, 8-12 core CPUs
  – More memory is better. Always.
• Scale horizontally first, then vertically (1U with 6 disks vs. 2U with 12 disks)
  – Get to at least 30-40 machines or 3-4 racks
• Don't forget about rack size (42U) and power consumption.
• Use a pilot cluster to learn about your load patterns
  – Balanced workload
  – Compute intensive
  – I/O intensive
19. It's about storage
Per disk: 3.00 TB
Intermediate (non-HDFS) data: ~25%, i.e. 0.75 TB, leaving 2.25 TB for HDFS
HDFS replication factor 3: 2.25 TB / 3 = 0.75 TB of usable capacity per disk
x 12 disks
x 11 data nodes
= 99 TB of usable capacity
Compression: …well, it depends…
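As a general formula (my summary of the calculation above, not from the slides): usable capacity ≈ raw TB per disk x disks per node x data nodes x (1 - intermediate data share) / replication factor = 3 TB x 12 x 11 x 0.75 / 3 = 99 TB.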
20. It's about Zen
A balanced node configuration:
• CPU: Xeon 10C, model E5-2660v2 (10 cores, 4 memory channels)
• Memory: 8 x 16 GB
• Disks: 12
21. Hardware
Master nodes (HDFS NameNode + HDFS Secondary NN + YARN ResourceManager):
• CPU: 2 x 3+ GHz with 8+ cores
• Memory: 128 GB (DDR3, ECC)
• Storage: 2 x 1+ TB (RAID 1, OS), 1 x 1 TB (Hadoop logs), 1 x 1 TB (ZooKeeper), 1 x 3 TB (HDFS)
• Network: 2 x bonded 10 GbE NICs, 1 x 1 GbE NIC (for mgmt.)
Management server + Gateway server:
• CPU: 2 x 3+ GHz with 8+ cores
• Memory: 128 GB (DDR3, ECC)
• Storage: 2 x 1+ TB (RAID 1, OS), 1 x 1 TB (Hadoop logs), 1 x 3 TB (HDFS)
• Network: 2 x bonded 10 GbE NICs, 1 x 1 GbE NIC (for mgmt.)
Worker nodes:
• CPU: 2 x 2.6+ GHz with 8+ cores
• Memory: 128 GB (DDR3, ECC)
• Storage: 2 x 1+ TB (RAID 1, OS), 10 x 3 TB (HDFS) – if the disk chassis allows: 12 x 3 TB (HDFS)
• Network: 2 x bonded 10 GbE NICs, 1 x 1 GbE NIC (for mgmt.)
25. Linux File System
• Ext3
• Ext4
• XFS with the -noatime, -inode64, -nobarrier options
  – Possibly better performance, but be aware of delayed data allocation (consider turning off the delalloc option in /etc/fstab)
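A minimal sketch of what such mount options could look like in /etc/fstab (the device names and mount points are placeholders, not from the talk; on ext4, delayed allocation is turned off with nodelalloc):

  /dev/sdc1  /grid/01  xfs   noatime,inode64,nobarrier  0 0
  /dev/sdd1  /grid/02  ext4  noatime,nodelalloc         0 0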
26. OS Optimizations
• Of course depending on your OS choice
• Specific recommendations are available from the OS vendors
• Common recommendations (a sketch of applying them follows below):
  – No physical I/O scheduling (it competes with virtual/HDFS I/O scheduling), e.g. use the NOOP scheduler
  – Adjust vm.swappiness to 0
  – Set the number of file handles (ulimit, soft+hard) to 16384 (data nodes) / 65536 (master nodes)
  – Set the number of pending connections (net.core.somaxconn) to 1024
  – Use jumbo frames (MTU=9000)
  – Consider network bonding (802.3ad)
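A rough sketch of applying these settings on a RHEL/CentOS-style system (the device sdb, the interface eth0 and the hdfs user are assumptions; persist the values in /etc/sysctl.conf, /etc/security/limits.conf and the interface configuration so they survive reboots):

  echo noop > /sys/block/sdb/queue/scheduler    # leave I/O scheduling to HDFS
  sysctl -w vm.swappiness=0                     # keep Hadoop daemons out of swap
  sysctl -w net.core.somaxconn=1024             # allow more pending connections
  echo "hdfs - nofile 65536" >> /etc/security/limits.conf   # file handles (65536 master / 16384 data nodes)
  ip link set dev eth0 mtu 9000                 # jumbo frames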
28. Java Optimizations
• Use a 64-bit JVM for all daemons
  – Compressed OOPs are enabled by default (Java 6u23+)
• Java heap size
  – Set Xmx == Xms
  – Avoid the Java defaults for NewSize and MaxNewSize
    • Use 1/8 to 1/6 of the max heap size for JVMs larger than 4 GB
  – Configure -XX:PermSize=128m, -XX:MaxPermSize=256m
• Use a low-latency GC collector
  – Set -XX:+UseConcMarkSweepGC, -XX:ParallelGCThreads=<N>
    • Use a high <N> on the NameNode & ResourceManager
• Useful for debugging
  – -verbose:gc -Xloggc:<file> -XX:+PrintGCDetails
  – -XX:ErrorFile=<file>
  – -XX:+HeapDumpOnOutOfMemoryError
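A sketch of how these flags could be combined for the NameNode in hadoop-env.sh (the 16 GB heap, the 2 GB new generation and the 8 GC threads are placeholder values chosen to match the 1/8 rule above, not figures from the talk):

  export HADOOP_NAMENODE_OPTS="-Xms16g -Xmx16g -XX:NewSize=2g -XX:MaxNewSize=2g \
    -XX:PermSize=128m -XX:MaxPermSize=256m \
    -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=8 \
    -verbose:gc -Xloggc:/var/log/hadoop/namenode-gc.log -XX:+PrintGCDetails \
    -XX:+HeapDumpOnOutOfMemoryError $HADOOP_NAMENODE_OPTS"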
29. Hadoop Configuration
• Use multiple redundant directories for NameNode metadata
  – One of the dfs.namenode.name.dir entries should be on NFS
  – Soft-mount the NFS share with -tcp,soft,intr,timeo=20,retrans=5
• Take periodic backups of the NameNode metadata
  – Make copies of the entire storage directory
• Set dfs.datanode.failed.volumes.tolerated to a value > 0 (it takes the number of volumes allowed to fail, not a boolean)
  – A disk failure is then no longer a complete DataNode failure
  – Especially important for high-density nodes
• Set dfs.namenode.name.dir.restore=true
  – Restores failed NN storage directories during checkpointing
• Reserve a lot of disk space for NameNode logs
  – Hadoop logging is verbose – set aside multiple GBs
  – NameNode logs roll over within minutes – without enough retained logs, issues are hard to debug
• Use version control for your configuration!
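A sketch of the corresponding hdfs-site.xml entries (the local paths, the NFS mount point and the tolerated-volume count of 1 are example values):

  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/1/dfs/nn,/data/2/dfs/nn,/mnt/nfs/dfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.failed.volumes.tolerated</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir.restore</name>
    <value>true</value>
  </property>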
34. Monitoring
• The basics: Nagios, Ganglia, Ambari/Cloudera Manager, Hue
• Admins need to understand the principles behind Hadoop and learn about their tool set: fsck, dfsadmin, … (a few examples follow below)
• Monitor the hardware usage for your workload
  – Disk I/O, network I/O, CPU and memory usage
  – Use this information when expanding cluster capacity
• Monitor the usage with Hadoop metrics
  – JVM metrics: GC times, memory used, thread status
  – RPC metrics: especially latency, to track slowdowns
  – HDFS metrics: used storage, # of files & blocks, cluster load, file system operations
  – Job metrics: slot utilization and job status
• Tweak configurations during upgrades & maintenance windows on an ongoing basis
• Establish regular performance tests
  – Use Oozie to run standard tests like TeraSort, TestDFSIO, HiBench, …
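A few of the command-line checks this refers to (the namenode hostname is a placeholder; 50070 was the default NameNode HTTP port in Hadoop 2.x):

  hdfs fsck / -files -blocks -locations   # file system health and block placement
  hdfs dfsadmin -report                   # per-DataNode capacity, usage and state
  curl http://namenode:50070/jmx          # JVM, RPC and HDFS metrics as JSON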
36. Security today
• Authentication – control access to the cluster
  – Kerberos in native Apache Hadoop
  – Perimeter security with Apache Knox (LDAP, SSO)
• Authorization – restrict access to explicit data
  – Native in Apache Hadoop: HDFS permissions + ACLs, queues + job ACLs
  – Fine-grained, role-based authorization: Hive, Apache Sentry, Apache Accumulo
  – Service-level authorization with Knox
  – Central security policies with Ranger
• Audit – understand who did what
  – Native in Apache Hadoop: process execution audit trail
• Data Protection – encrypt data at rest & in motion
  – Wire encryption in native Apache Hadoop
  – Wire encryption with Knox
  – Orchestrated encryption with 3rd-party tools
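As a minimal, hedged example of the "native Apache Hadoop" pieces above: Kerberos authentication and service-level authorization are switched on in core-site.xml (a real cluster additionally needs principals, keytabs and further per-service settings not shown here):

  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>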
38. Data Boxing
[Diagram: a shared raw data layer plus separate areas for Division 1 and Division 2, each granting Read & Write or read-only access depending on who uses it]
Set up data boxing using
• Users & groups
• HDFS permissions & ACLs
• Higher-level mechanisms where applicable
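A hedged sketch of such boxing with HDFS permissions and ACLs (the paths, users and group names are made up for illustration; ACLs require dfs.namenode.acls.enabled=true in hdfs-site.xml):

  hdfs dfs -mkdir -p /data/raw /data/division1 /data/division2
  hdfs dfs -chown -R etl:etl /data/raw
  hdfs dfs -chmod 750 /data/raw
  hdfs dfs -setfacl -m group:division1:r-x /data/raw     # divisions may only read the raw layer
  hdfs dfs -setfacl -m group:division2:r-x /data/raw
  hdfs dfs -chown -R div1owner:division1 /data/division1  # each division reads & writes its own box
  hdfs dfs -chmod 770 /data/division1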