"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
HDFS metadata (fsimage and edits): differences between CDH3 and CDH4
1. fsimage and edits in CDH3 and CDH4
Tatsuo Kawasaki
tatsuo@cloudera.com
2. objective
HDFS metadata (fsimage and edits) management differs
between CDH3 and CDH4.
This presentation introduces these differences.
Please let me know if you find any issues.
3. HDFS metadata in CDH3
[root@localhost ~]# ls -ltr /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current/
total 1100
-rw-r--r-- 1 hdfs hdfs 101 Jan 30 00:21 VERSION
-rw-r--r-- 1 hdfs hdfs 8 Jan 30 00:21 fstime
-rw-r--r-- 1 hdfs hdfs 57248 Jan 30 00:21 fsimage
-rw-r--r-- 1 hdfs hdfs 1048580 Jan 31 16:16 edits
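after checkpoint (the contents of edits have been merged into fsimage: edits shrinks from 1048580 to 4 bytes and fsimage grows from 57248 to 66760 bytes)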
[root@localhost ~]# ls -ltr /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current/
total 84
-rw-r--r-- 1 hdfs hdfs 101 Feb 5 14:37 VERSION
-rw-r--r-- 1 hdfs hdfs 8 Feb 5 14:37 fstime
-rw-r--r-- 1 hdfs hdfs 66760 Feb 5 14:37 fsimage
-rw-r--r-- 1 hdfs hdfs 4 Feb 5 14:37 edits
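In both listings fstime is exactly 8 bytes. Assuming, as in the Hadoop 0.20 FSImage code, that it holds a single big-endian 64-bit integer with the last checkpoint time in milliseconds since the epoch, a short sketch can decode it. `fstime_ms` is a hypothetical helper, not a Hadoop command.

```shell
# Sketch: decode a CDH3 fstime file.
# ASSUMPTION: fstime holds one big-endian 64-bit integer, the last
# checkpoint time in milliseconds since the epoch.
# fstime_ms is a hypothetical helper, not part of Hadoop.
fstime_ms() {
  f=$1
  # Read the first 8 bytes as hex with no separators, e.g. 00000000000003e8
  hex=$(od -An -v -tx1 -N8 "$f" | tr -d ' \n')
  if [ ${#hex} -ne 16 ]; then
    echo "fstime shorter than 8 bytes: $f" >&2
    return 1
  fi
  # printf accepts a 0x prefix and prints the value in decimal.
  printf '%d\n' "0x$hex"
}
```

For example, `fstime_ms /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current/fstime` prints the milliseconds; dividing by 1000 gives seconds that GNU `date -d @SECONDS` can render.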
4. timeline (CDH3)
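Figure: checkpoint timeline (t0 to t4) across the NameNode and the Secondary NameNode:
put file: the NameNode updates the metadata in memory and records the change in edits.
checkpoint start: the NameNode creates edits.new; changes made from now on are written to edits.new.
The Secondary NameNode gets fsimage and edits from the NameNode, merges them into fsimage.ckpt, and transfers fsimage.ckpt back to the NameNode.
checkpoint done: the NameNode renames fsimage.ckpt to fsimage and edits.new to edits, and updates the checkpoint time in fstime.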
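The file-level steps of the CDH3 checkpoint can be mimicked on two local directories. This is a toy simulation, not Hadoop code: transactions are plain text lines, the merge is simple concatenation, fstime is written as text, and `simulate_cdh3_checkpoint` is a hypothetical name. Only the order of the create, copy, and rename steps follows the timeline above.

```shell
# Toy simulation of the CDH3 checkpoint file rotation (NOT Hadoop code).
# nn  = stand-in for the NameNode's current/ directory
# snn = stand-in for the Secondary NameNode's working directory
# during = optional "transaction" that arrives while the checkpoint runs
simulate_cdh3_checkpoint() {
  nn=$1 snn=$2 during=${3:-}
  # Checkpoint start: roll the log; new changes now go to edits.new.
  : > "$nn/edits.new"
  # Secondary NameNode gets fsimage and edits from the NameNode.
  cp "$nn/fsimage" "$nn/edits" "$snn/"
  # A change made mid-checkpoint lands in edits.new, not in edits.
  if [ -n "$during" ]; then echo "$during" >> "$nn/edits.new"; fi
  # Merge (toy: concatenation) and send fsimage.ckpt back.
  cat "$snn/fsimage" "$snn/edits" > "$snn/fsimage.ckpt"
  cp "$snn/fsimage.ckpt" "$nn/fsimage.ckpt"
  # Checkpoint done: rename into place and record the time.
  mv "$nn/fsimage.ckpt" "$nn/fsimage"
  mv "$nn/edits.new" "$nn/edits"
  date +%s > "$nn/fstime"   # toy: seconds as text, not the real format
}
```

Passing a third argument shows why edits.new exists: a change made during the checkpoint is not part of the merged image, yet it survives because edits.new becomes the new edits.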
6. HDFS metadata in CDH4
After formatting HDFS
-bash-4.1$ ls -l /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/
total 1040
-rw-r--r-- 1 hdfs hdfs 1048576 Feb 5 01:35 edits_inprogress_0000000000000000001
-rw-rw-r-- 1 hdfs hdfs 119 Feb 5 01:33 fsimage_0000000000000000000
-rw-rw-r-- 1 hdfs hdfs 62 Feb 5 01:33 fsimage_0000000000000000000.md5
-rw-r--r-- 1 hdfs hdfs 2 Feb 5 01:35 seen_txid
-rw-rw-r-- 1 hdfs hdfs 202 Feb 5 01:33 VERSION
-bash-4.1$ hexdump -C /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/seen_txid
00000000 31 0a |1.|
00000002
seen_txid stores a transaction ID as ASCII text (here "1" followed by a newline, matching edits_inprogress_0000000000000000001).
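Because CDH4 encodes transaction IDs in file names, the state of a name directory can be read from the listing alone. The sketch below assumes the naming seen above (fsimage_&lt;txid&gt;, edits_inprogress_&lt;first txid&gt;, seen_txid) plus edits_&lt;first&gt;-&lt;last&gt; for finalized segments, with numbers zero-padded to 19 digits. `txid_summary` and `dec` are hypothetical helpers, not Hadoop commands.

```shell
# Sketch: print the transaction IDs encoded in a CDH4 name directory.
# Assumed naming (19-digit zero padding in real directories):
#   fsimage_<txid>            image containing all txids <= txid
#   edits_<first>-<last>      finalized edit log segment
#   edits_inprogress_<first>  segment currently being written
#   seen_txid                 ASCII transaction ID
# Hypothetical helpers, not part of Hadoop.

# Strip leading zeros so "0000000000000000000" prints as 0.
dec() {
  n=$(printf '%s' "$1" | sed 's/^0*//')
  echo "${n:-0}"
}

txid_summary() {
  for f in "$1"/*; do
    name=${f##*/}
    case $name in
      *.md5) ;;  # checksum of an fsimage; no txid of its own
      fsimage_*) echo "fsimage $(dec "${name#fsimage_}")" ;;
      edits_inprogress_*) echo "inprogress $(dec "${name#edits_inprogress_}")" ;;
      edits_*-*)
        range=${name#edits_}
        echo "edits $(dec "${range%-*}")-$(dec "${range#*-}")" ;;
      seen_txid) echo "seen_txid $(dec "$(cat "$f")")" ;;
    esac
  done
}
```

Run against the freshly formatted directory above, this would report fsimage 0, inprogress 1, and seen_txid 1.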
7. try to add a new file
[training@localhost ~]$ hadoop fs -put /etc/hosts hosts
[training@localhost ~]$
8. oiv - offline image viewer for fsimage
-bash-4.1$ hdfs oiv -i /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/fsimage_0000000000000000000 -o aaa
-bash-4.1$ cat aaa
drwxr-xr-x - hdfs supergroup 0 1969-12-31 19:00 /
The 'hosts' file has not been written to fsimage yet;
it appears there only after a checkpoint.
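The oiv output above uses an `ls -l`-like layout whose last field is the path. Assuming that layout, a small check can confirm whether a file has reached fsimage yet. `in_oiv_dump` is a hypothetical helper, and paths containing spaces are not handled.

```shell
# Sketch: succeed if PATH appears in an oiv dump in ls-style layout.
# ASSUMPTION: one entry per line, absolute path in the last field.
# in_oiv_dump is a hypothetical helper, not part of Hadoop.
in_oiv_dump() {
  awk -v p="$2" '$NF == p { found = 1 } END { exit !found }' "$1"
}
```

For example, `in_oiv_dump aaa /user/training/hosts || echo "not in fsimage yet"`, where /user/training/hosts is an assumption about where `hadoop fs -put /etc/hosts hosts` stores the file (the training user's home directory).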
17. parameters (CDH4)
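Example storage directory (names abbreviated; real names are zero-padded to 19 digits):
fsimage_0, fsimage_33
edits_1-10, edits_11-32, edits_33-33, edits_inprogress_34
dfs.namenode.num.checkpoints.retained: the number of image checkpoint files (fsimage_N) that will be retained
dfs.namenode.num.extra.edits.retained: the number of extra transactions which should be retained beyond what is needed to restart from the oldest retained fsimage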
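How the two retention parameters combine can be sketched as follows. This assumes the NameNode keeps the newest N images and every finalized edit segment whose last transaction ID is at least (oldest kept image txid + 1 - extra edits), and never purges the in-progress segment. It approximates the NameNode's retention behavior rather than reproducing its code, and `retention_plan` is a hypothetical helper.

```shell
# Sketch: decide which files survive retention.
# retention_plan N EXTRA "IMAGE_TXIDS" "SEGMENTS"
#   N           ~ dfs.namenode.num.checkpoints.retained
#   EXTRA       ~ dfs.namenode.num.extra.edits.retained
#   IMAGE_TXIDS space-separated, e.g. "0 33"
#   SEGMENTS    finalized first-last ranges, e.g. "1-10 11-32 33-33"
# The in-progress segment is never purged, so it is not listed.
retention_plan() {
  n=$1 extra=$2
  # Oldest image to keep = the N-th newest image (word splitting intended).
  min_image=$(printf '%s\n' $3 | sort -n | tail -n "$n" | head -n 1)
  # Every txid after that image is needed for a restart; keep EXTRA more.
  purge_from=$((min_image + 1 - extra))
  if [ "$purge_from" -lt 0 ]; then purge_from=0; fi
  for t in $3; do
    if [ "$t" -ge "$min_image" ]; then echo "keep fsimage_$t"; else echo "purge fsimage_$t"; fi
  done
  for s in $4; do
    last=${s#*-}
    if [ "$last" -ge "$purge_from" ]; then echo "keep edits_$s"; else echo "purge edits_$s"; fi
  done
}
```

With N=2 and no extra edits, every file in the example directory is kept; with N=1, fsimage_0 and all three finalized segments would be purged, since fsimage_33 already covers transactions up to 33.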
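Checkpoint triggers (a checkpoint starts when either limit is reached):
dfs.namenode.checkpoint.period: interval between checkpoints, in seconds
dfs.namenode.checkpoint.txns: number of uncheckpointed transactions that forces a checkpoint
dfs.namenode.checkpoint.check.period: interval, in seconds, at which the Secondary NameNode polls the NameNode for the number of uncheckpointed transactions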
*fstime is no longer necessary since it’s all encapsulated in the transaction IDs
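The trigger rule behind the checkpoint parameters fits in one test. `should_checkpoint` is a hypothetical helper that assumes the Secondary NameNode checkpoints when dfs.namenode.checkpoint.period seconds have elapsed since the last checkpoint or when the uncheckpointed transaction count reaches dfs.namenode.checkpoint.txns; in HDFS this check is repeated every dfs.namenode.checkpoint.check.period seconds.

```shell
# Sketch of the checkpoint trigger (hypothetical helper, not Hadoop code).
# should_checkpoint NOW LAST PERIOD PENDING TXNS
#   NOW, LAST  times in seconds (current time, last checkpoint)
#   PERIOD     ~ dfs.namenode.checkpoint.period
#   PENDING    number of uncheckpointed transactions
#   TXNS       ~ dfs.namenode.checkpoint.txns
# Exit status 0 means "checkpoint now".
should_checkpoint() {
  now=$1 last=$2 period=$3 pending=$4 txns=$5
  [ $((now - last)) -ge "$period" ] || [ "$pending" -ge "$txns" ]
}
```

The numbers in a call such as `should_checkpoint 4000 0 3600 10 1000000` are illustrative only; either condition alone is enough to start a checkpoint.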