2. Another way to monitor HBase processes
• org.apache.hadoop.hbase.tool.Canary
– Used for "canary monitoring" of a running HBase cluster.
– For each region, it tries to get one row per column family and
reports either the failure or the read latency.
Usage: bin/hbase org.apache.hadoop.hbase.tool.Canary [opts] [table 1 [table 2...]]
where [opts] are:
  -help          Show this help and exit.
  -daemon        Continuous check at defined intervals (default interval: 6 sec).
  -interval <N>  Interval between checks, in seconds.
https://issues.apache.org/jira/browse/HBASE-4393
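The -daemon and -interval options amount to running the same check repeatedly with a pause in between. A minimal Java sketch of that idea follows. It is not the Canary source: the class IntervalLoopSketch, the method runChecks, and the rounds bound are hypothetical and exist only so that the example terminates.

```java
// Hypothetical sketch of "continuous check at defined intervals".
// Nothing here is taken from org.apache.hadoop.hbase.tool.Canary.
class IntervalLoopSketch {

    /**
     * Runs the check `rounds` times, sleeping intervalSec seconds between runs.
     * Returns the number of checks that completed.
     */
    static int runChecks(Runnable check, long intervalSec, int rounds) {
        int completed = 0;
        for (int i = 0; i < rounds; i++) {
            check.run();
            completed++;
            if (i < rounds - 1) {
                try {
                    Thread.sleep(intervalSec * 1000L);
                } catch (InterruptedException e) {
                    // Stop early and preserve the interrupt for the caller.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        // Three checks, one second apart; a real daemon would loop until stopped.
        runChecks(() -> System.out.println("checking cluster..."), 1, 3);
    }
}
```

A real daemon mode would loop until the process is stopped rather than after a fixed number of rounds.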
4. Canary Tool study
private void sniffRegion(HRegionInfo region, HTable table) throws Exception {
  HTableDescriptor tableDesc = table.getTableDescriptor();
  for (HColumnDescriptor column : tableDesc.getColumnFamilies()) {
    Get get = new Get(region.getStartKey());
    get.addFamily(column.getName());
    try {
      long startTime = System.currentTimeMillis();
      table.get(get);
      long time = System.currentTimeMillis() - startTime;
      sink.publishReadTiming(region, column, time);
    } catch (Exception e) {
      sink.publishReadFailure(region, column);
    }
  }
}
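The pattern in sniffRegion is: one timed read per column family, where success is reported as a latency and any exception as a failure. The sketch below reproduces only that control flow so it can be run without HBase on the classpath. It makes these assumptions: plain strings stand in for HRegionInfo and HColumnDescriptor, and RowReader, Sink, and RecordingSink are hypothetical stand-ins for table.get(get) and Canary.Sink.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in types only: strings replace HRegionInfo / HColumnDescriptor.
class SniffSketch {

    /** Hypothetical stand-in for table.get(get). */
    interface RowReader {
        void read(String startKey, String family) throws Exception;
    }

    /** Hypothetical stand-in for Canary.Sink, keyed by strings. */
    interface Sink {
        void publishReadFailure(String region, String family);
        void publishReadTiming(String region, String family, long msTime);
    }

    /** Sink that records one line per probe result. */
    static class RecordingSink implements Sink {
        final List<String> events = new ArrayList<>();

        public void publishReadFailure(String region, String family) {
            events.add("FAIL " + region + " " + family);
        }

        public void publishReadTiming(String region, String family, long msTime) {
            events.add("OK " + region + " " + family);
        }
    }

    /** One timed read per column family; an exception becomes a failure event. */
    static void sniffRegion(String region, String startKey, List<String> families,
                            RowReader reader, Sink sink) {
        for (String family : families) {
            try {
                long start = System.currentTimeMillis();
                reader.read(startKey, family);
                long elapsed = System.currentTimeMillis() - start;
                sink.publishReadTiming(region, family, elapsed);
            } catch (Exception e) {
                sink.publishReadFailure(region, family);
            }
        }
    }

    public static void main(String[] args) {
        RecordingSink sink = new RecordingSink();
        // Simulated reader: reads of family "cf2" always fail.
        RowReader reader = (key, family) -> {
            if (family.equals("cf2")) {
                throw new java.io.IOException("simulated read failure");
            }
        };
        sniffRegion("region-a", "row0", Arrays.asList("cf1", "cf2"), reader, sink);
        sink.events.forEach(System.out::println);
    }
}
```

Because the failure is caught inside the loop, one bad column family does not stop the remaining families from being probed.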
5. Canary Tool study
public interface Sink {
  public void publishReadFailure(HRegionInfo region);
  public void publishReadFailure(HRegionInfo region, HColumnDescriptor column);
  public void publishReadTiming(HRegionInfo region, HColumnDescriptor column, long msTime);
}

public static class StdOutSink implements Sink {
  public void publishReadFailure(HRegionInfo region) {
    LOG.error(String.format("read from region %s failed", region.getRegionNameAsString()));
  }

  public void publishReadFailure(HRegionInfo region, HColumnDescriptor column) {
    LOG.error(String.format("read from region %s column family %s failed",
        region.getRegionNameAsString(), column.getNameAsString()));
  }

  public void publishReadTiming(HRegionInfo region, HColumnDescriptor column, long msTime) {
    LOG.info(String.format("read from region %s column family %s in %dms",
        region.getRegionNameAsString(), column.getNameAsString(), msTime));
  }
}
6. Canary Tool study
// Constructors
public Canary() {
  this(new StdOutSink());
}

public Canary(Sink sink) {
  this.sink = sink;
}
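Because Canary takes its Sink through the constructor, the reporting behaviour can be swapped without touching the probe logic. The sketch below illustrates that design with a sink that aggregates results instead of logging them. It is an illustration under assumptions: the Sink interface here uses strings in place of HRegionInfo and HColumnDescriptor, and ThresholdSink, Probe, and the slowMs threshold are hypothetical names, not part of HBase.

```java
// Hypothetical sketch of constructor-injected sinks; strings replace HBase types.
class ThresholdSinkSketch {

    /** Stand-in for Canary.Sink with the same three methods. */
    interface Sink {
        void publishReadFailure(String region);
        void publishReadFailure(String region, String family);
        void publishReadTiming(String region, String family, long msTime);
    }

    /** Counts failures and slow reads instead of writing log lines. */
    static class ThresholdSink implements Sink {
        private final long slowMs;
        private int failures;
        private int slowReads;
        private long maxMs;

        ThresholdSink(long slowMs) {
            this.slowMs = slowMs;
        }

        public void publishReadFailure(String region) {
            failures++;
        }

        public void publishReadFailure(String region, String family) {
            failures++;
        }

        public void publishReadTiming(String region, String family, long msTime) {
            if (msTime > slowMs) {
                slowReads++;
            }
            maxMs = Math.max(maxMs, msTime);
        }

        int failures() { return failures; }
        int slowReads() { return slowReads; }
        long maxMs() { return maxMs; }
    }

    /** Mirrors the Canary(Sink) constructor: the probe only knows the interface. */
    static class Probe {
        private final Sink sink;

        Probe(Sink sink) {
            this.sink = sink;
        }

        void reportTiming(String region, String family, long msTime) {
            sink.publishReadTiming(region, family, msTime);
        }

        void reportFailure(String region) {
            sink.publishReadFailure(region);
        }
    }

    public static void main(String[] args) {
        ThresholdSink sink = new ThresholdSink(100);
        Probe probe = new Probe(sink);
        probe.reportTiming("region-a", "cf1", 20);
        probe.reportTiming("region-a", "cf2", 250);
        probe.reportFailure("region-b");
        System.out.println("failures=" + sink.failures()
            + " slowReads=" + sink.slowReads() + " maxMs=" + sink.maxMs());
    }
}
```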
7. Canary Tool in Circus
[Diagram: monitoring workflow. Recoverable labels: "Start here", Nagios Server, Tm-puppet, operation server, Canary-tool, "hbase-" (label truncated), "Write to canary.log", "Read from", /var/log/hbase/, "Send mail if any abnormal", "Fix problem".]
8. Canary Tool in Circus
com.trendmicro.spn.ops.hbase.RunCanaryTool
private static class CustomSink implements Canary.Sink {
  public void publishReadFailure(HRegionInfo regionInfo) {
    //...
    LOG.error(String.format("Read from table:%s, region:%s failed", tableName, regionName));
  }

  public void publishReadFailure(HRegionInfo regionInfo, HColumnDescriptor colDescriptor) {
    //...
    LOG.error(String.format("Read from table:%s, region:%s, columnFamily:%s failed",
        tableName, regionName, colFamilyName));
  }

  public void publishReadTiming(HRegionInfo regionInfo, HColumnDescriptor colDescriptor,
      long msTime) {
    //...
    LOG.info(String.format("Read from table:%s, region:%s, columnFamily:%s in %dms",
        tableName, regionName, colFamilyName, msTime));
  }
}
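The fixed message formats in CustomSink make the log easy to check mechanically. The sketch below shows one way a monitoring check could classify such lines. It is an assumption, not the deck's actual Nagios check: the class CanaryLogCheck, the method status, and the warnMs threshold are hypothetical, and the return values follow the usual Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log check; only the message formats come from CustomSink.
class CanaryLogCheck {

    // Matches "Read from table:..., region:..., columnFamily:... in <N>ms".
    private static final Pattern TIMING =
        Pattern.compile("Read from table:.+, region:.+, columnFamily:.+ in (\\d+)ms");

    // Matches both failure messages, with or without a column family.
    private static final Pattern FAILURE =
        Pattern.compile("Read from table:.+ failed");

    /** Returns 2 if any read failed, 1 if any read exceeded warnMs, else 0. */
    static int status(List<String> lines, long warnMs) {
        boolean slow = false;
        for (String line : lines) {
            if (FAILURE.matcher(line).find()) {
                return 2;
            }
            Matcher m = TIMING.matcher(line);
            if (m.find() && Long.parseLong(m.group(1)) > warnMs) {
                slow = true;
            }
        }
        return slow ? 1 : 0;
    }

    public static void main(String[] args) {
        // Sample lines invented for illustration; the prefix is arbitrary.
        List<String> lines = Arrays.asList(
            "INFO Read from table:t1, region:r1, columnFamily:cf in 12ms",
            "INFO Read from table:t1, region:r2, columnFamily:cf in 900ms");
        System.out.println("status=" + status(lines, 500));
    }
}
```

The patterns use find() rather than matches() so that any logger prefix (timestamp, level) in front of the message is ignored.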
9. Canary Tool in Circus
com.trendmicro.spn.ops.hbase.RunCanaryTool
public static void main(String[] args) throws Exception {
  Canary canary = new Canary(new CustomSink());
  int exitCode = ToolRunner.run(canary, args);
  System.exit(exitCode);
}
hbase-canary-monitor.sh
su - hbase <<EOF
kinit -kt /etc/hbase/conf/hbase.keytab hbase/$(hostname -f)
java -cp $CLASSPATH com.trendmicro.spn.ops.hbase.RunCanaryTool $@
EOF