Twitter stores hundreds of petabytes across multiple datacenters and multiple Hadoop clusters. ViewFileSystem makes interacting with our HDFS infrastructure as simple as using a single namespace that spans all datacenters and clusters. Users no longer need lengthy URIs; instead, they access files with a path as simple as /datacenter/cluster/user/cecily. We greatly extended ViewFileSystem to 1) automatically detect new underlying primitive namespaces (such as the local node filesystem and an HDFS namespace) and add them to the global namespace, 2) provide namespace merging, which lets us transparently split existing namespaces when they are full, and 3) enable synchronous writes to multiple datacenters with transparent failover for reads. This effort boosted productivity for both Hadoop users and the @TwitterHadoop team through a much better user experience and higher reliability. It also reduced technical debt: we almost eliminated use of the web-based Hftp filesystem and migrated most use cases to native HDFS clients with HA enabled. We conclude the presentation with our solutions for offering enhanced metrics and different SLAs for cross-datacenter traffic in Twitter's multi-tenant datacenter infrastructure.
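The mapping from a global path such as /datacenter/cluster/user/cecily to a concrete cluster can be pictured as a client-side mount table with longest-prefix resolution. This is a minimal sketch, not ViewFileSystem's code: the class name MountTable and the example mount points and targets are hypothetical. In stock Hadoop, the equivalent links are declared with configuration properties of the form fs.viewfs.mounttable.<name>.link.<path>.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of mount-table resolution; not the ViewFileSystem implementation.
public class MountTable {
    // mount point (e.g. "/dcA/revenue") -> target URI (e.g. "hdfs://revenue-nn-dcA")
    private final Map<String, String> links = new TreeMap<>();

    public void addLink(String mountPoint, String target) {
        links.put(mountPoint, target);
    }

    // Resolves a global path through the longest mount point that is a whole-component prefix.
    public String resolve(String path) {
        String bestMount = null;
        for (String mount : links.keySet()) {
            boolean covers = path.equals(mount) || path.startsWith(mount + "/");
            if (covers && (bestMount == null || mount.length() > bestMount.length())) {
                bestMount = mount;
            }
        }
        if (bestMount == null) {
            throw new IllegalArgumentException("No mount point for " + path);
        }
        // Append the part of the path below the mount point to the target URI.
        return links.get(bestMount) + path.substring(bestMount.length());
    }

    public static void main(String[] args) {
        MountTable table = new MountTable();
        table.addLink("/dcA/revenue", "hdfs://revenue-nn-dcA"); // hypothetical link
        table.addLink("/dcB/dw", "hdfs://dw-user-dc2");          // hypothetical link
        System.out.println(table.resolve("/dcA/revenue/user/cecily"));
        // prints hdfs://revenue-nn-dcA/user/cecily
    }
}
```

Matching on whole path components keeps /dcA/revenueX from resolving through /dcA/revenue, and preferring the longest match lets a more specific mount point override its parent.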
17.
hadoop --config /etc/hadoop/revenue-dcA fs -get /user/bob/fileA
hdfs --config /etc/hadoop/dw-dcB fsck -fs hdfs://dw-user-dc2 /
# find all "fileC" files on all clusters
for conf in /etc/hadoop/*/; do
  hadoop --config "$conf" fs -ls fileC
done
21.
Description | $sourcePath | Result
Global namespace | viewfs://revenue-nn-dcA/logs/dirB /logs/dirB | Source path unresolvable
Active namenode | hdfs://revenue-log-nn1-dcA/logs/dirB /logs/dirB | Not reliable due to hard-coded namenode
Hftp & DNS alias | hftp://revenue-nn-dcA/logs/dirB | Hftp reliability and efficiency issues
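The "Active namenode" row is unreliable presumably because the path pins a single namenode host, which stops serving requests once an HA failover makes another namenode active. A minimal sketch of the alternative, assuming a client that knows every namenode of the nameservice and tries them in order, follows. The class FailoverClient and the host revenue-log-nn2-dcA are hypothetical; real HDFS HA clients get this behavior from a logical nameservice URI and a failover proxy provider such as ConfiguredFailoverProxyProvider, not from code like this.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of client-side namenode failover; not Hadoop's failover proxy.
public class FailoverClient {
    // Runs op against each namenode in order and returns the first successful result.
    public static <T> T callWithFailover(List<String> namenodes, Function<String, T> op) {
        RuntimeException last = null;
        for (String nn : namenodes) {
            try {
                return op.apply(nn);
            } catch (RuntimeException e) {
                last = e; // e.g. this namenode is down or in standby; try the next one
            }
        }
        throw new IllegalStateException("All namenodes failed", last);
    }

    public static void main(String[] args) {
        // Hypothetical HA pair; nn1 plays the role of the hard-coded, now-standby host.
        List<String> nns = List.of("revenue-log-nn1-dcA", "revenue-log-nn2-dcA");
        String result = callWithFailover(nns, nn -> {
            if (nn.endsWith("nn1-dcA")) {
                throw new RuntimeException(nn + " is standby");
            }
            return "listed /logs/dirB via " + nn;
        });
        System.out.println(result); // prints listed /logs/dirB via revenue-log-nn2-dcA
    }
}
```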
32.
// no need to remember copyFromLocal/copyToLocal (aka put/get)
scalding> fsShell("-cp /local/user/cecily/file.txt /user/cecily/hdfs_file.txt")
res6: Int = 0
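Because /local and /user live in one namespace, a single -cp can resolve its source and destination to different filesystems. The sketch below illustrates that idea with two local directories standing in for the local node filesystem and HDFS; the class CrossMountCopy and its mount() and cp() methods are hypothetical and are not the fsShell helper shown on the slide.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: directories stand in for the filesystems behind each mount point.
public class CrossMountCopy {
    private final Map<String, Path> mounts = new LinkedHashMap<>();

    public void mount(String mountPoint, Path root) {
        mounts.put(mountPoint, root);
    }

    // First matching mount point wins; assumes mount points do not overlap.
    Path resolve(String path) {
        for (Map.Entry<String, Path> e : mounts.entrySet()) {
            String mp = e.getKey();
            if (path.startsWith(mp + "/")) {
                return e.getValue().resolve(path.substring(mp.length() + 1));
            }
        }
        throw new IllegalArgumentException("No mount point for " + path);
    }

    // Same call whichever filesystems src and dst land on: no get/put distinction.
    public void cp(String src, String dst) throws IOException {
        Path target = resolve(dst);
        Files.createDirectories(target.getParent());
        Files.copy(resolve(src), target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path localRoot = Files.createTempDirectory("local"); // stands in for the node's disk
        Path hdfsRoot = Files.createTempDirectory("hdfs");   // stands in for an HDFS mount
        Path src = localRoot.resolve("user/cecily/file.txt");
        Files.createDirectories(src.getParent());
        Files.writeString(src, "hello");

        CrossMountCopy fs = new CrossMountCopy();
        fs.mount("/local", localRoot);
        fs.mount("/user", hdfsRoot);
        fs.cp("/local/user/cecily/file.txt", "/user/cecily/hdfs_file.txt");
        System.out.println(Files.readString(hdfsRoot.resolve("cecily/hdfs_file.txt"))); // prints hello
    }
}
```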
41.
client@dc1:
FileSystem nfly = FileSystem.get(conf);
FSDataOutputStream out = nfly.create(new Path("/nfly/C/user/cecily/file.txt"));
out.write(....);
out.close(); // spans close() BEGIN to close() END
FSDataInputStream in = nfly.open(new Path("/nfly/C/user/cecily/file.txt"));
c@dc1: write /user/cecily/_nfly_tmp_file.txt; on close, rename to /user/cecily/file.txt
c@dc2: write /user/cecily/_nfly_tmp_file.txt; on close, rename to /user/cecily/file.txt
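The write path on the slide (write _nfly_tmp_file.txt on every cluster, then rename it to file.txt on close) can be sketched with java.nio.file, using local directories as stand-ins for c@dc1 and c@dc2. This is a minimal sketch under that assumption: NflySketch, its methods, and the read-failover order are hypothetical, and the sketch has no cleanup or partial-failure policy, which a real multi-datacenter implementation would need.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: local directories stand in for clusters such as c@dc1 and c@dc2.
public class NflySketch {
    private final List<Path> replicas; // one root directory per "cluster"

    public NflySketch(List<Path> replicas) {
        this.replicas = replicas;
    }

    // dir must be relative (e.g. "user/cecily") so it resolves under each replica root.
    public void write(String dir, String name, byte[] data) throws IOException {
        List<Path> tmps = new ArrayList<>();
        // Phase 1: write the full contents under a temporary name on every replica.
        for (Path root : replicas) {
            Path tmp = root.resolve(dir).resolve("_nfly_tmp_" + name);
            Files.createDirectories(tmp.getParent());
            Files.write(tmp, data);
            tmps.add(tmp);
        }
        // Phase 2 ("close"): rename each temporary file to its final name.
        for (Path tmp : tmps) {
            Files.move(tmp, tmp.resolveSibling(name), StandardCopyOption.ATOMIC_MOVE);
        }
    }

    // Read failover: return the first replica's copy that can be read.
    public byte[] read(String dir, String name) throws IOException {
        IOException last = new NoSuchFileException(dir + "/" + name);
        for (Path root : replicas) {
            try {
                return Files.readAllBytes(root.resolve(dir).resolve(name));
            } catch (IOException e) {
                last = e; // this replica is missing or unreadable; try the next one
            }
        }
        throw last;
    }

    public static void main(String[] args) throws IOException {
        Path dc1 = Files.createTempDirectory("c-dc1");
        Path dc2 = Files.createTempDirectory("c-dc2");
        NflySketch nfly = new NflySketch(List.of(dc1, dc2));
        nfly.write("user/cecily", "file.txt", "data".getBytes(StandardCharsets.UTF_8));
        Files.delete(dc1.resolve("user/cecily/file.txt")); // simulate losing dc1's copy
        byte[] bytes = nfly.read("user/cecily", "file.txt"); // served from dc2
        System.out.println(new String(bytes, StandardCharsets.UTF_8)); // prints data
    }
}
```

Renaming only after every temporary file is fully written means a reader of /user/cecily/file.txt on either cluster never sees a half-written file. The rename on one cluster can still finish before the other, so the sketch does not make both clusters change at exactly the same instant.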