1. The document discusses setting up a Hadoop cluster using Cloudera Manager. It outlines the requirements for Cloudera Manager, including supported operating systems, browsers, databases, and Java versions.
2. The process of setting up the Hadoop cluster with Cloudera Manager is described. It involves installing the Cloudera Manager installer, logging into the admin console, specifying hosts, and configuring services.
3. Flume is introduced as a data collection tool that can run independently or on Hadoop clusters. Its important settings - sources, channels, and sinks - are defined along with example types for each.
3. Before Starting
▪ Ask yourself what you want!
▪ To be an expert who makes Hadoop itself better
▪ To provide services by using Hadoop
Co-graph confidential
4. As a Hadoop Expert
Better to know Hadoop in as much detail as possible
Companies like Cloudera and MapR
5. Other Usages on Hadoop
1. Learn how to use Hadoop to solve problems more effectively and efficiently
2. Find the easiest way to make sure your Hadoop works properly
6. Desired Skills
▪ Network knowledge is imperative
▪ Every node in a cluster communicates with the others over the network
▪ Even with Cloudera Manager, you still need to handle it on your own
▪ Linux administration
▪ Everyone knows that!!
7. Requirements for Cloudera Manager (1)
▪ Prepare Your Machines
▪ Supported OS version
▪ Only 64bit Linux-based
▪ Supported Browsers
▪ For admin console
▪ Supported Database
▪ Only relevant if you use a custom database instead of the embedded PostgreSQL database
▪ Supported JDK version
▪ Cloudera Manager installs a JDK for you if none is installed
▪ Repositories
▪ All hosts must have access to the standard package repositories and the Cloudera Hadoop repositories
8. Requirements for Cloudera Manager (2)
▪ Networking and Security
▪ Configure DNS or /etc/hosts properly
▪ Every host should know who’s who
▪ Use the root account, or an account with password-less sudo permission, for SSH access to all cluster machines
▪ No blocking by iptables or firewalls
▪ Port 7180 is used to access Cloudera Manager
▪ No blocking by Security-Enhanced Linux (SELinux)
▪ Set SELinux to disabled
▪ There are more details on cloudera.com
▪ If there is a problem, don’t feel ashamed to Google it!
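The two networking checks above (name resolution and an unblocked port 7180) can be tried out before installation with a small script. This is only a sketch: the function names `resolves` and `port_open` and the host list are illustrative assumptions, not part of Cloudera Manager.

```python
import socket

CM_PORT = 7180  # admin console port named on the slide


def resolves(hostname):
    """Return True if the name resolves through DNS or /etc/hosts."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or filtered (for example by iptables).
        return False


if __name__ == "__main__":
    # Hypothetical host list; replace it with your own cluster machines.
    hosts = ["localhost"]
    for h in hosts:
        print(h, "resolves:", resolves(h))
    print("Port", CM_PORT, "reachable:", port_open(hosts[0], CM_PORT))
```

A `False` from `port_open` before Cloudera Manager is installed is expected; the check becomes useful once the server is running and a browser still cannot reach the admin console.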
9. Set Up a Hadoop Cluster
▪ After everything is prepared, download cloudera-manager-installer.bin from the Cloudera Downloads page
▪ Make the file executable and run it
▪ Log in to the admin console at http://<Server host>:7180
▪ Follow the steps shown by Cloudera Manager
▪ Done!
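The "change the permission and install" step can be sketched as follows. The helper names `make_executable` and `installer_command` are made up for illustration, the installer file name is assumed to be the hyphenated `cloudera-manager-installer.bin`, and the file is assumed to sit in the current directory. The script only prints the command rather than running it, since the installer needs root.

```python
import os
import stat


def make_executable(path):
    """Add the owner-execute bit, equivalent to `chmod u+x path`."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)


def installer_command(path):
    """Build the command line that runs the installer as root via sudo."""
    return ["sudo", os.path.abspath(path)]


if __name__ == "__main__":
    # Assumed file name and location; adjust to where you saved the download.
    installer = "cloudera-manager-installer.bin"
    if os.path.exists(installer):
        make_executable(installer)
        print("Run:", " ".join(installer_command(installer)))
    else:
        print("Download the installer first:", installer)
```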
22. More about Cloudera Manager
▪ Easy to upgrade your CDH version
▪ Easy to add/delete a host and a cluster
▪ Easy to configure High Availability (HA)
▪ Supports Hadoop security by using Kerberos
▪ Support backup and disaster recovery
26. Two Ways to Use Flume
Independent of a Hadoop cluster
• Flume can run entirely by itself
• Configure flume.conf in /etc/flume-ng/conf
On a Hadoop cluster, or on a node managed by Cloudera Manager
• Easy to keep the agent nodes under control
• Start, stop, and restart the service from the admin console
• Configure Flume from the admin console
• Convenient for checking log files
27. 3 Important Settings
Source
• Defines which kinds of events from external sources the agent accepts
Channel
• Defines how an event is held until a Flume sink consumes it
Sink
• Defines where events held in the channel are delivered: a repository such as HDFS, or another Flume agent
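To make the three settings concrete, the sketch below generates a minimal flume.conf that wires one source to one sink through one channel. The function name `flume_conf` is hypothetical. The agent and component names (`a1`, `r1`, `c1`, `k1`) and the choice of a netcat source, memory channel, and logger sink are assumptions borrowed from the style of the Flume user guide's introductory example, not from these slides.

```python
def flume_conf(agent="a1", source="r1", channel="c1", sink="k1"):
    """Return the text of a minimal Flume agent configuration."""
    lines = [
        # Name the components of this agent.
        f"{agent}.sources = {source}",
        f"{agent}.channels = {channel}",
        f"{agent}.sinks = {sink}",
        # Source: accept lines of text sent to a local TCP port.
        f"{agent}.sources.{source}.type = netcat",
        f"{agent}.sources.{source}.bind = localhost",
        f"{agent}.sources.{source}.port = 44444",
        # Channel: hold events in memory until the sink takes them.
        f"{agent}.channels.{channel}.type = memory",
        f"{agent}.channels.{channel}.capacity = 1000",
        # Sink: write events to the agent's log (swap for hdfs in practice).
        f"{agent}.sinks.{sink}.type = logger",
        # Wiring: a source can feed several channels, a sink reads from one.
        f"{agent}.sources.{source}.channels = {channel}",
        f"{agent}.sinks.{sink}.channel = {channel}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(flume_conf(), end="")
```

Note the asymmetry in the last two properties: the source uses the plural `channels`, while the sink uses the singular `channel`.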