Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.
HDFS(Hadoop DistributedFile System)                      Thiru
Typical Work     Writing file in           flow            to HDFS          Reading file       RackAgenda    from HDFS    ...
Client                                                                                    Masters                         ...
Hadoop ClientHadoopCluster   Name Node   Job Tracker   Secondary NN   Hadoop Client           DN + TT           DN + TT   ...
Write data in to cluster (HDFS)                              Analyze the data (Map Reduce)Sample HDFS                   St...
I want to                                                           File size 200 MB                      write file      ...
Name Node                DN 1               DN 5           DN 9                       A       C              C            ...
File.txt                     File size 200 MB              Hadoop Client                                     Name Node    ...
 Data node sends hearth beats             Every 10th heart beat is Block report             Name node builds meta data ...
File System Metadata:                               File.txt = A0 {1,5,7}                                    A1 {1,7,9}   ...
Primary Name Node          Secondary Name Node                edits         fsimage                                       ...
I want to read file                             file.txt                      Hadoop Client                               ...
Single Point of                                       Failure                 # Task per                 node             ...
Practice at Yahoo!
 HDFS clusters at Yahoo! include about 3500 nodes               A typical cluster node has:               · 2 quad core...
 70 percent of the disk space is allocated to HDFS. The remainder is                reserved for the operating system (Re...
 Durability of Data              uncorrelated node failures              Replication of data three times is a robust gua...
 BenchmarksPractice atYAHoo!
 BenchmarksPractice atYAHoo!                       NameNode Throughput benchmark
 Automated failover                plan: Zookeeper, Yahoo’s distributed consensus technology to              build an aut...
Thank you
Prochain SlideShare
Chargement dans…5
×

4

Partager

Télécharger pour lire hors ligne

Understanding hdfs

Télécharger pour lire hors ligne

Understanding hdfs

  1. 1. HDFS(Hadoop DistributedFile System) Thiru
  2. 2. Typical Work Writing file in flow to HDFS Reading file RackAgenda from HDFS Awareness Planning for a Q&A Cluster
  3. 3. Client Masters HDFS Map Reduce {Name Node} {Job Tracker} {Secondary Name Node}Hadoop Server Data Node Data Node Data Node Data NodeRoles Task Tracker Task Tracker Task Tracker Task Tracker Slaves Data Node Data Node Data Node Data Node Task Tracker Task Tracker Task Tracker Task Tracker
  4. 4. Hadoop ClientHadoopCluster Name Node Job Tracker Secondary NN Hadoop Client DN + TT DN + TT DN + TT DN + TT DN + TT DN + TT DN + TT DN + TT DN + TT DN + TT DN + TT DN + TT DN + TT DN + TT DN + TT DN + TT DN + TT DN + TT DN + TT DN + TT DN + TT
  5. 5. Write data in to cluster (HDFS) Analyze the data (Map Reduce)Sample HDFS Store the result in to cluster (HDFS)Workflow Read the result from cluster (HDFS) Sample scenario: How many times customer called to customer care enquiring about a recently launched product? Compare it against the AD campaign in the television. Correlate both and find the best time to run the AD HDFS CRM Data entry SQOOP Map Reduce Result Result
  6. 6. I want to File size 200 MB write file Hadoop Client Name NodeWrite data in Data Node 1 Data Node 2 Ok! Block size is 64 MB. Split theto cluster file in to 3 and(HDFS Data Node 3 Data Node 4 write in to node 1,4,5 Data Node 5 Data Node 6 Data node replicates as Client write Client Consults per replication Cycle repeats data to one Name node factor and for every block data node intimates Name node
  7. 7. Name Node DN 1 DN 5 DN 9 A C C Rack Aware: Rack 1: B DN 2 DN 6 B DN 10 Data node 1 A Data node 2 Rack 2: DN 3 DN 7 C DN 11Rack DN 4 A DN 8 B DN 12 Data node 5Awareness Never loose data when a rack is down Keep bulky flows within Rack when possible Assumption in rack has higher bandwidth, and low latency
  8. 8. File.txt File size 200 MB Hadoop Client Name Node A B C Replicate in 3,8Multi BlockReplication DN 1 DN 5 DN 9 A DN 2 DN 6 DN 10 DN 3 A DN 7 DN 11 DN 4 DN 8 A DN 12
  9. 9.  Data node sends hearth beats  Every 10th heart beat is Block report  Name node builds meta data from block report  If name node is down, HDFS is downName Node  Missing heartbeats signify lost nodes  Name node consults metadata and finds affected data  Name node consults rack awareness script  Name node tells data node to replicate
  10. 10. File System Metadata: File.txt = A0 {1,5,7} A1 {1,7,9} A2{5,10,15} Primary Name Secondary Name Node nodeName Node & It’s been 1 hr, giveSecondary your dataName node  Not a hot standby for the name node* (Zoo keeper)  Connects to name node every one hour* (Configurable)  Housekeeping, backup of Name node meta data  Saved meta data can be used to rebuild name node
  11. 11. Primary Name Node Secondary Name Node edits fsimage edits fsimageUnderstanding edits- newSecondaryname node Fsimage.ckpt Fsimage.ckpthouse keeping edits Fsimage.ckpt
  12. 12. I want to read file file.txt Hadoop Client Name NodeReading data Data Node 1 Data Node 2 Data Node 7 Ok! File.txt = blck afrom HDFS A B B {1,5,6}Cluster Data Node 3 Data Node 4 Data Node 8 Blck b {8,1,2} Blck c {5,8,9} C B Data Node 5 Data Node 6 Data Node 9 C A A C Client Client Client picks Client reads receives DN Consults first node of data list for each Name node list sequentially block
  13. 13. Single Point of Failure # Task per node Dual power •1 core can run supply for 1.5 Mapper or redundancy ReducerChoosing right Masterhardware node RAM thumb rule – 1 GB/ No Commodity Million blocks hardware of data Regular Data backup
  14. 14. Practice at Yahoo!
  15. 15.  HDFS clusters at Yahoo! include about 3500 nodes  A typical cluster node has:  · 2 quad core Xeon processors @ 2.5ghzPractice at  · Red Hat Enterprise Linux Server Release 5.1YAHoo!  · Sun Java JDK 1.6.0_13-b03  · 4 directly attached SATA drives (one terabyte each)  · 16G RAM  · 1-gigabit Ethernet
  16. 16.  70 percent of the disk space is allocated to HDFS. The remainder is reserved for the operating system (Red Hat Linux), logs, and space to spill the output of map tasks. (MapReduce intermediate data are not stored in HDFS.)  For each cluster, the NameNode and the BackupNode hosts arePractice at specially provisioned with up to 64GB RAM; application tasks are never assigned to those hosts.YAHoo!  In total, a cluster of 3500 nodes has 9.8 PB of storage available as blocks that are replicated three times yielding a net 3.3 PB of storage for user applications. As a convenient approximation, one thousand nodes represent one PB of application storage.
  17. 17.  Durability of Data uncorrelated node failures Replication of data three times is a robust guard against loss of data due to uncorrelated node failures.Practice at correlated node failures, the failure of a rack or core switch.YAHoo! HDFS can tolerate losing a rack switch (each block has a replica on some other rack). loss of electrical power to the cluster a large cluster will lose a handful of blocks during a power-on restart.
  18. 18.  BenchmarksPractice atYAHoo!
  19. 19.  BenchmarksPractice atYAHoo! NameNode Throughput benchmark
  20. 20.  Automated failover plan: Zookeeper, Yahoo’s distributed consensus technology to build an automated failover solution  Scalability of the NameNodeFuture work Solution: Our near-term solution to scalability is to allow multiple namespaces (and NameNodes) to share the physical storage within a cluster. Drawbacks: The main drawback of multiple independent namespaces is the cost of managing them.
  21. 21. Thank you
  • PuneetChavan

    May. 27, 2019
  • rajakasturi

    Sep. 9, 2016
  • itworks

    Jan. 6, 2015
  • binlijin

    Apr. 16, 2013

Vues

Nombre de vues

1 947

Sur Slideshare

0

À partir des intégrations

0

Nombre d'intégrations

467

Actions

Téléchargements

58

Partages

0

Commentaires

0

Mentions J'aime

4

×