
Deployment and Management of Hadoop Clusters

This presentation explains the end-to-end architecture of a Hadoop cluster and the procedures required for deploying Hadoop clusters.



  1. 1. Deployment and Management of Hadoop Clusters • Amal G Jose, Big Data Analytics • http://www.coderfox.com/ • http://amalgjose.wordpress.com/ • in.linkedin.com/in/amalgjose/
  2. 2. Agenda • Introduction • Cluster design and deployment • Backup and Recovery • Hadoop Upgrade • Routine Administration Tasks
  3. 3. Introduction • What is Hadoop? • What makes Hadoop different? • Why do we need a Hadoop cluster?
  4. 4. Cluster Installation • This has four parts: cluster planning, OS installation & hardening, cluster software installation, and cluster configuration.
  5. 5. Cluster Planning (daemon-by-daemon configuration) • Namenode: dedicated server; OS installed on a RAID device; dfs.name.dir resides on the same RAID device, with one more copy configured on NFS. • Secondary Namenode: dedicated server; OS installed on a RAID device. • Jobtracker: dedicated server; OS installed in a JBOD configuration. • Datanode/Tasktracker: individual servers; OS installed in a JBOD configuration.
  6. 6. Workload Patterns For Hadoop • Balanced Workload • Compute Intensive • I/O Intensive • Unknown or evolving workload patterns
  7. 7. Cluster Topology
  8. 8. Typical Hadoop Cluster Topology (diagram) • Master Node (MN): Name Node, Job Tracker, Ganglia daemon. • Client Node (CN): Hive, Pig, Oozie, Mahout, Ganglia master. • Slave Nodes (SN): Task Tracker, Data Node, Ganglia daemon.
  9. 9. Creating Instances (in case of cloud) • Create the instances based on the cluster requirements.
  10. 10. Operating System Hardening • We will be installing Hadoop on RHEL6 64-bit servers. • The OS should be hardened based on the RHEL6 hardening document. • Set the iptables rules necessary for the Hadoop services. • In the case of Amazon EC2 instances, create key pairs for logging in. • The GUI can be disabled to free resources for Hadoop. • Clocks should be synchronized across all servers (for example, with NTP).
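The iptables rules mentioned above might look like the following fragment of /etc/sysconfig/iptables. This is a minimal sketch: the port numbers are the stock Hadoop 1.x defaults, and should be adjusted if your *-site.xml files override them or if your distribution uses different ports.

```
# Sketch: allow Hadoop 1.x service ports (default values)
-A INPUT -p tcp --dport 8020  -j ACCEPT   # Namenode RPC (fs.default.name)
-A INPUT -p tcp --dport 50070 -j ACCEPT   # Namenode web UI
-A INPUT -p tcp --dport 8021  -j ACCEPT   # Jobtracker RPC
-A INPUT -p tcp --dport 50030 -j ACCEPT   # Jobtracker web UI
-A INPUT -p tcp --dport 50010 -j ACCEPT   # Datanode data transfer
-A INPUT -p tcp --dport 50075 -j ACCEPT   # Datanode web UI
-A INPUT -p tcp --dport 50060 -j ACCEPT   # Tasktracker web UI
```

In practice it is common to restrict these rules to the cluster's own subnet rather than accepting from any source.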
  11. 11. Cluster Software Installation • Choosing the distribution of Hadoop. • Creation of a local yum repository. • Java installation on all the machines.
  12. 12. Hadoop Ecosystem
  13. 13. Installation Methods • Hadoop can be installed either manually or automatically using tools such as Cloudera Manager or Apache Ambari. • One-click installation tools let users install Hadoop on clusters without any pain.
  14. 14. Manual Installation • Install the Hadoop daemons on the nodes. • We can use either the tarball or the RPM package for installation. • RPM installation is usually easier.
  15. 15. Setting up a Client Node • What is a client node? • Why is a client node necessary? • How to configure a client node? • Which services are installed on it? • Why is multiuser segregation needed?
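A client node mainly needs the Hadoop client libraries plus configuration pointing at the cluster. A minimal sketch of that configuration, assuming Hadoop 1.x property names and placeholder hostnames (namenode.example.com and jobtracker.example.com are illustrative):

```xml
<!-- core-site.xml on the client node: where HDFS lives -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>

<!-- mapred-site.xml on the client node: where to submit jobs -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:8021</value>
  </property>
</configuration>
```

With these two files in place, users on the client node can run hadoop fs commands and submit MapReduce jobs without logging in to the cluster machines themselves, which is the main reason for having a dedicated client node.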
  16. 16. Cluster Configuration • Storage locations for the namenode, secondary namenode and datanodes. • Number of task slots (map/reduce slots): task slots per node = memory available for tasks / child JVM heap size. • Backup location for the namenode metadata. • Configuring MySQL for Hive and Oozie.
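The slot formula above can be sketched as a quick calculation. The numbers below are illustrative, not a recommendation for any particular hardware:

```python
def task_slots_per_node(total_ram_gb, reserved_gb, child_jvm_gb):
    """Rule-of-thumb slot count from the formula on this slide:
    slots = memory available for tasks / child JVM heap size."""
    available = total_ram_gb - reserved_gb  # memory left for task JVMs
    return int(available // child_jvm_gb)

# Example: a 32 GB node with ~8 GB reserved for the OS, datanode and
# tasktracker daemons, and a 1 GB child JVM heap
# (e.g. mapred.child.java.opts = -Xmx1024m).
slots = task_slots_per_node(32, 8, 1)
print(slots)  # 24 slots in total, to be divided between map and reduce slots
```

The resulting total is then split between mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum according to the expected workload mix.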
  17. 17. Namenode: Single Point of Failure • Why is the namenode a single point of failure? • How can this issue be resolved? • How can backup be achieved?
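One common mitigation, matching the cluster-planning slide earlier, is to have the namenode write its metadata to more than one directory. A minimal hdfs-site.xml sketch, with placeholder paths (the local RAID path and the NFS mount point are illustrative):

```xml
<!-- hdfs-site.xml: the namenode writes fsimage and edits to every
     directory in this comma-separated list, so a copy survives the
     loss of the namenode's local disks. -->
<property>
  <name>dfs.name.dir</name>
  <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
</property>
```

The NFS copy protects the metadata, but it does not by itself make the namenode highly available; a failed namenode still has to be replaced manually from the surviving copy.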
  18. 18. Implementing Schedulers • Capacity scheduler • Fair scheduler
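Enabling one of these schedulers is a configuration change on the jobtracker. A sketch for the Fair Scheduler, using the Hadoop 1.x property and class names (the allocation-file path is a placeholder):

```xml
<!-- mapred-site.xml on the jobtracker: replace the default FIFO
     scheduler with the Fair Scheduler. -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
  <name>mapred.fairscheduler.allocation.file</name>
  <value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>
```

Per-pool minimum shares and weights then go into the allocation file, which the scheduler rereads periodically without a jobtracker restart.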
  19. 19. Monitoring Hadoop Cluster • For a manual installation, we can use Ganglia. • Automated installation tools have built-in monitoring mechanisms.
  20. 20. Ganglia
  21. 21. Cluster Maintenance • Managing Hadoop processes: starting/stopping processes. • HDFS maintenance: adding/decommissioning datanodes; checking file system integrity with fsck; balancing HDFS block data; dealing with a failed disk. • MapReduce maintenance: adding/decommissioning tasktrackers; killing a MapReduce job/task; dealing with a blacklisted tasktracker.
  22. 22. Backup and Recovery • Data Backup – Distributed copy (distcp) – Parallel Ingestion • Namenode Metadata • Hive metastore backup.
  23. 23. Hadoop Upgrades • Data Backup • Software upgrade • HDFS upgrade • Finalize upgrade
  24. 24. Steps for Hadoop Upgrade • Make sure that any previous upgrade is finalized before proceeding with another upgrade. • Shut down MapReduce and kill any orphaned task processes on the tasktrackers. • Shut down HDFS and back up the namenode directories. • Install the new versions of Hadoop HDFS and MapReduce on the cluster and on the clients. • Start HDFS with the -upgrade option. • Wait until the upgrade is complete. • Perform some sanity checks on HDFS. • Start MapReduce. • Roll back or finalize the upgrade (optional).
  25. 25. Routine Administration Procedures • Checking every node • Metadata backups • Data backups • File system check • File system balancer
  26. 26. Summary • Hadoop Cluster design • Hadoop Cluster Installation • Backup and Recovery • Hadoop Upgrade • Routine Administration Procedures
  27. 27. For more info, visit: http://amalgjose.wordpress.com http://coderfox.com http://in.linkedin.com/in/amalgjose Additional Information
  28. 28. Thank You