1. Introduction to Hadoop
Hadoop Distributed File System (HDFS)
Public 2009/5/13
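3. What is Hadoop?
• Hadoop is an Apache top-level project
• Hadoop includes
– Hadoop Distributed File System (HDFS)
– MapReduce
• Written in Java
• Interfaces: C++/Java/Shell/Command…
• Supported platforms
– Linux, Mac OS/X, Windows, Solaris
[Figure: layered stack, top to bottom: Cloud Applications; MapReduce and HBase; Hadoop Distributed File System (HDFS); A Cluster of Machines]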
Copyright 2009 - Trend Micro Inc.
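4. Hadoop History
• February 2003
– Google writes its first MapReduce library
• October 2003
– Google publishes the Google File System (GFS) paper
• December 2004
– Google publishes the MapReduce paper
• July 2005
– Doug Cutting reports that Nutch is using a new MapReduce implementation
• February 2006
– Hadoop code moves out of Nutch into a new Lucene subproject
• November 2006
– Google publishes the Bigtable paper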
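5. Hadoop History (continued)
• February 2007
– Mike Cafarella contributes the first HBase code
• April 2007
– Yahoo! runs Hadoop on a 1,000-node cluster
• January 2008
– Hadoop becomes an Apache top-level project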
6. Who uses Hadoop?
• Yahoo!
– Hadoop 2 CPU 10
• Google
– Hadoop
• Amazon
– Amazon Hadoop
–
• IBM
– Blue Cloud
• Trend Micro
– Hadoop
• Hadoop …
– http://wiki.apache.org/hadoop/PoweredBy
7. Hadoop Distributed File System (HDFS)
8. HDFS
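• Single namespace for the entire cluster
• Very large distributed file system
– 10K nodes, 100 million files, 10 Peta Bytes
• Data coherency
– Write-once-read-many access model
• Files are broken up into blocks
– Typically 128 MB per block
– Each block is replicated (replica) on multiple DataNodes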
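As a rough illustration of the block model above, the following sketch splits a file of a given size into fixed-size blocks. The 128 MB block size comes from the slide; the replica count of 3 and the function name block_layout are assumptions made for this example, not part of Hadoop's API.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the block size named on the slide
REPLICATION = 3                 # assumed replica count for this sketch


def block_layout(file_size, block_size=BLOCK_SIZE):
    """Return a list of (offset, length) pairs, one per block of the file."""
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


layout = block_layout(300 * 1024 * 1024)   # a 300 MB file
print(len(layout))                         # 3 blocks
print(layout[-1][1] // (1024 * 1024))      # last block holds 44 MB
print(len(layout) * REPLICATION)           # 9 block replicas across the cluster
```

Only the final block is shorter than the block size, so a 300 MB file occupies two full blocks and one 44 MB block.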
9. HDFS
•
–
• File replication
– 3 replicas by default
–
•
–
–
• Designed for throughput over response time
– Not intended for low-latency access
– Optimized for batch processing
12. NameNode
• The NameNode manages the HDFS file system namespace
– Maps a file name to a set of blocks
– Maps each block to the DataNodes that hold it
• Hadoop cluster
•
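13. NameNode Metadata
• The NameNode keeps its metadata in memory
– The entire metadata set is held in main memory
• Types of metadata
– List of files
– List of blocks for each file
– List of DataNodes for each block
– File attributes, e.g. creation time, replication factor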
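The metadata types listed above can be pictured as two in-memory maps. This is a minimal sketch only: the class and method names (FileMeta, NamespaceSketch, add_file, report_block, locate) and the block and node names are hypothetical and do not correspond to Hadoop's actual classes.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FileMeta:
    """File attributes plus the ordered list of block IDs for one file."""
    replication: int = 3
    creation_time: float = field(default_factory=time.time)
    blocks: list = field(default_factory=list)


class NamespaceSketch:
    """Toy in-memory namespace: files -> blocks -> DataNodes."""

    def __init__(self):
        self.files = {}            # path -> FileMeta
        self.block_locations = {}  # block ID -> set of DataNode names

    def add_file(self, path, block_ids, replication=3):
        self.files[path] = FileMeta(replication=replication, blocks=list(block_ids))
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set())

    def report_block(self, block_id, datanode):
        # Record that a DataNode holds a replica of this block.
        self.block_locations.setdefault(block_id, set()).add(datanode)

    def locate(self, path):
        # For each block of the file, list the DataNodes that hold it.
        return [(b, sorted(self.block_locations[b])) for b in self.files[path].blocks]


ns = NamespaceSketch()
ns.add_file("/foodir/myfile.txt", ["blk_1", "blk_2"])
ns.report_block("blk_1", "datanode-a")
ns.report_block("blk_1", "datanode-b")
ns.report_block("blk_2", "datanode-c")
print(ns.locate("/foodir/myfile.txt"))
```

Keeping both maps in plain dictionaries mirrors the point of the slide: every lookup is answered from memory.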
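14. NameNode Metadata (continued)
• Transaction log (EditLog)
– Records every change made to file system metadata
• FsImage
– Stored on the NameNode; contains
• The file system namespace (Name Space)
• The mapping of blocks (Block) to files (File)
– The NameNode keeps both the FsImage and the EditLog in its local file system
• Checkpoint
– Performed when the NameNode starts up
– The NameNode reads the FsImage and the EditLog, applies the EditLog transactions to the FsImage, and writes out a new FsImage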
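The checkpoint described above amounts to replaying a log over a snapshot. The sketch below assumes a toy namespace stored as a dictionary and a made-up edit format of (operation, path, arguments); the real FsImage and EditLog are binary files with a different layout.

```python
def apply_edit(namespace, edit):
    """Apply one logged change to the in-memory namespace (toy edit format)."""
    op, path, *args = edit
    if op == "create":
        namespace[path] = {"replication": args[0]}
    elif op == "set_replication":
        namespace[path]["replication"] = args[0]
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError("unknown operation: " + op)


def checkpoint(fsimage, editlog):
    """Merge the edit log into the image; return (new image, empty log)."""
    # Copy the old image so it stays intact until the new one is complete.
    namespace = {path: dict(attrs) for path, attrs in fsimage.items()}
    for edit in editlog:
        apply_edit(namespace, edit)
    return namespace, []


old_image = {"/foodir/a.txt": {"replication": 3}}
log = [
    ("create", "/foodir/b.txt", 3),
    ("set_replication", "/foodir/a.txt", 2),
    ("delete", "/foodir/b.txt"),
]
new_image, new_log = checkpoint(old_image, log)
print(new_image)
print(new_log)
```

Returning an empty log reflects that, once the new image is written, the old transactions no longer need to be replayed.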
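15. Secondary NameNode
• Copies the FsImage and EditLog from the NameNode
• Merges the FSImage and EditLog into a new FSImage
• Uploads the new FSImage to the NameNode
– The EditLog on the NameNode is then purged
• The Secondary NameNode does not provide fail over for the NameNode
– Hadoop does not currently support automatic Name Node failover
[Figure: FsImage and EditLog are merged into FsImage (new)]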
16. NameNode
• The NameNode is a SPOF (single point of failure)
• High availability requires eliminating this SPOF
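17. DataNode
• Block storage
– Stores data in the local file system (e.g. ext3)
– Stores block metadata (e.g. CRC checksum)
• Block report
– Periodically reports all of its blocks to the NameNode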
– NameNode
block
NameNode
block
18. HDFS Replication
• 3 replicas per block by default
• Block size and replication factor are configurable per file
• Replica placement is rack-aware
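To make rack-aware placement concrete, here is a minimal sketch of one commonly described policy: first replica on the writer's node, second on a node in a different rack, third on another node in that same remote rack. The exact policy varies between Hadoop versions, so treat this as an assumption; the function place_replicas and the rack and node names are hypothetical.

```python
import random


def place_replicas(topology, writer, replication=3, rng=random):
    """Pick target nodes for one block.

    topology maps rack name -> list of node names; writer is the node
    that is writing the block.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    local_rack = rack_of[writer]
    chosen = [writer]  # first replica stays on the writer's node

    remote_racks = [r for r in topology if r != local_rack and topology[r]]
    if replication >= 2 and remote_racks:
        remote_rack = rng.choice(remote_racks)
        chosen.append(rng.choice(topology[remote_rack]))  # second: another rack
        if replication >= 3:
            others = [n for n in topology[remote_rack] if n not in chosen]
            if others:
                chosen.append(rng.choice(others))  # third: same remote rack

    # Fill any remaining slots from nodes not used yet.
    remaining = [n for n in rack_of if n not in chosen]
    rng.shuffle(remaining)
    while len(chosen) < replication and remaining:
        chosen.append(remaining.pop())
    return chosen[:replication]


topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
print(place_replicas(topology, writer="dn1"))
```

Spreading replicas over two racks means the block survives the loss of a whole rack, while keeping two replicas in one rack limits cross-rack write traffic.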
20. Heartbeats
• DataNodes send Heartbeats to the NameNode
– Once every 3 seconds
• The NameNode uses Heartbeats to detect DataNode failure
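Heartbeat-based failure detection can be sketched as a table of last-seen timestamps. The 3-second interval is from the slide; the 30-second timeout, the class HeartbeatMonitor, and the node names are assumptions chosen for illustration and are not Hadoop's real settings.

```python
HEARTBEAT_INTERVAL = 3.0  # seconds between Heartbeats, from the slide
DEAD_AFTER = 30.0         # hypothetical timeout chosen for this sketch


class HeartbeatMonitor:
    """Tracks the last Heartbeat time of each DataNode."""

    def __init__(self, dead_after=DEAD_AFTER):
        self.dead_after = dead_after
        self.last_seen = {}  # DataNode name -> time of last Heartbeat

    def heartbeat(self, datanode, now):
        self.last_seen[datanode] = now

    def dead_nodes(self, now):
        # A node is considered failed once it has been silent too long.
        return sorted(
            dn for dn, seen in self.last_seen.items()
            if now - seen > self.dead_after
        )


monitor = HeartbeatMonitor()
monitor.heartbeat("datanode-a", now=0.0)
monitor.heartbeat("datanode-b", now=0.0)
monitor.heartbeat("datanode-a", now=27.0)  # datanode-b has gone quiet
print(monitor.dead_nodes(now=31.0))        # ['datanode-b']
```

Passing the current time in explicitly keeps the sketch deterministic; a real monitor would read a clock instead.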
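21. Data Correctness
• Checksums validate data
– Cyclic Redundancy Check (CRC32)
• File creation
– The client computes a Checksum for every 512 bytes
– The DataNode stores the Checksum
• File access
– The client retrieves the data and Checksum from the DataNode
– If validation fails, the client tries another replica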
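The per-chunk checksum scheme above can be sketched with the CRC32 function in Python's standard zlib module. The 512-byte chunk size is from the slide; the function names chunk_checksums and first_corrupt_chunk are hypothetical, and the on-disk checksum format used by HDFS is not modeled here.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # chunk size named on the slide


def chunk_checksums(data, chunk_size=BYTES_PER_CHECKSUM):
    """Compute one CRC32 value per chunk of the data."""
    return [
        zlib.crc32(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def first_corrupt_chunk(data, checksums, chunk_size=BYTES_PER_CHECKSUM):
    """Return the index of the first chunk that fails validation, or None."""
    actual = chunk_checksums(data, chunk_size)
    for index, (got, expected) in enumerate(zip(actual, checksums)):
        if got != expected:
            return index
    if len(actual) != len(checksums):
        # Data is shorter or longer than the stored checksums cover.
        return min(len(actual), len(checksums))
    return None


original = bytes(range(256)) * 5          # 1280 bytes -> 3 chunks
stored = chunk_checksums(original)        # what the DataNode would keep

damaged = bytearray(original)
damaged[600] ^= 0xFF                      # flip bits inside the second chunk
print(first_corrupt_chunk(original, stored))        # None
print(first_corrupt_chunk(bytes(damaged), stored))  # 1
```

A reader that gets a non-None result would discard this copy and fetch the block from another replica, as the slide describes.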
22. User Interface
• API
– Java API
– A C language wrapper for the Java API is also available
• POSIX-like commands
– hadoop dfs -mkdir /foodir
– hadoop dfs -cat /foodir/myfile.txt
– hadoop dfs -rm /foodir/myfile.txt
• DFSAdmin
– bin/hadoop dfsadmin -safemode
– bin/hadoop dfsadmin -report
– bin/hadoop dfsadmin -refreshNodes
• Web interface
– http://host:port/dfshealth.jsp
23. Web
26. Java API
28. References
• Hadoop documentation and installation
– http://hadoop.apache.org/
• Hadoop Wiki
– http://wiki.apache.org/hadoop/
• Google File System Paper
– http://labs.google.com/papers/gfs.html