HISTORY
• Distributed file systems (DFS) have been around for a long time
• Every DFS battles to optimize its trade-offs under the CAP theorem
• Hadoop's DFS implementation is called HDFS
• With the wide adoption of Hadoop, users were forced to use HDFS as the only alternative
• HDFS has technical trade-offs and limitations

HDFS ISSUES
Handy
• Locking around metadata operations permitted by the single NameNode
• File locking permitted by the single NameNode
Frustrating
• Difficult to get data in and out (ingest)
• NameNode is a single point of failure
• NameNode is a system bottleneck

GLUSTER FILESYSTEM
Gluster is an open source, multi-purpose DFS
Features:
• Data striping
• Global elastic hashing for file placement
• Basic and geo-replication
• Fully POSIX-compliant interface
• Flexible architecture
• Supports storage-resident apps: compute and data on the same machine
More info: www.gluster.org

HCFS
HCFS: Hadoop Compatible File System
• Implementing the o.a.h.fs.FileSystem interface is not enough for existing Hadoop jobs to run on a different file system
• The HDFS architecture created semantics and assumptions that jobs depend on
• HCFS defines these semantics so that any file system can replace HDFS without fear of incompatibility
• Open, ongoing effort to define file system semantics decoupled from architecture
JIRA: issues.apache.org/jira/browse/HADOOP-9371

COMMON FILESYSTEM ATTRIBUTES
• Hierarchical structure of directories containing directories and files
• Files contain between 0 and MAX_SIZE bytes of data
• Directories contain 0 or more files or directories
• Directories have no data, only child elements

NETWORK ASSUMPTIONS
• The final state of a file system after a network failure is undefined
• The immediate consistency state of a file system after a network failure is undefined
• If a network failure can be reported to the client, the failure MUST be an instance of IOException

NETWORK FAILURE
• Any operation on a file system MAY signal an error by throwing an instance of IOException
• File system operations MUST NOT throw RuntimeException exceptions on the failure of remote operations, authentication, or other operational problems
• Stream read operations MAY fail if the read channel has been idle for a file-system-specific period of time
• Stream write operations MAY fail if the write channel has been idle for a file-system-specific period of time
• Network failures MAY be raised in the stream close() operation

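One way a file system client could honor the "IOException, never RuntimeException" rule is to funnel every remote call through a small translation wrapper. The sketch below is only an illustration under that assumption: RemoteCalls and its call() method are hypothetical names, not part of Hadoop or of any HCFS implementation.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper (not a Hadoop class) that reports every failure as an IOException. */
final class RemoteCalls {
    private RemoteCalls() {}

    /**
     * Runs a remote operation. An IOException passes through unchanged;
     * any other exception is wrapped in an IOException with its cause kept.
     */
    static <T> T call(String description, Callable<T> op) throws IOException {
        try {
            return op.call();
        } catch (IOException e) {
            throw e; // already the type callers are told to expect
        } catch (Exception e) {
            // RuntimeException, authentication errors, and similar operational failures
            throw new IOException("Operation failed: " + description, e);
        }
    }
}
```

A caller would write something like RemoteCalls.call("listStatus", () -> client.list(path)), where client is whatever hypothetical remote handle the file system uses; the original exception stays available through getCause().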
ATOMICITY
• Rename of a file MUST be atomic
• Rename of a directory SHOULD be atomic
• Delete of a file MUST be atomic
• Delete of an empty directory MUST be atomic
• Recursive directory deletion MAY be atomic. Although HDFS offers atomic recursive directory deletion, none of the other file systems that Hadoop supports offers such a guarantee, including the local file systems
• mkdir() SHOULD be atomic
• mkdirs() MAY be atomic. [It is currently atomic on HDFS, but this is not the case for most other file systems, and it cannot be guaranteed for future versions of HDFS]

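Atomic file rename matters because writers commonly publish output by writing to a temporary path and then renaming it into place, so readers never observe a half-written file. The sketch below shows that pattern on the local file system with java.nio.file as a stand-in; it does not use the Hadoop FileSystem API, and AtomicPublish and publish() are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical write-then-rename helper on the local file system (not Hadoop code). */
final class AtomicPublish {
    private AtomicPublish() {}

    /** Writes content to a temporary file in dir, then renames it to its final name. */
    static Path publish(Path dir, String name, String content) throws IOException {
        Path tmp = Files.createTempFile(dir, name, ".tmp");
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        Path target = dir.resolve(name);
        // ATOMIC_MOVE requests an atomic rename; if the file system cannot
        // provide one, an AtomicMoveNotSupportedException is thrown instead
        // of silently falling back to a non-atomic copy.
        return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

On a file system that only offers the MAY-level guarantees above (for example recursive delete or mkdirs()), the same publish step cannot be assumed safe for whole directory trees, which is why the slide separates file and directory cases.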
CONCURRENCY
• The data added to a file during a write or append MAY be visible while the write operation is in progress
• If a client opens a file for a read() operation while another read() operation is in progress, the second operation MUST succeed. Both clients MUST have a consistent view of the same data
• If a file is deleted while a read() operation is in progress, the read() operation MAY complete successfully. Implementations MAY cause read() operations to fail with an IOException instead
• Multiple writers MAY open a file for writing. If this occurs, the outcome is undefined
• Undefined: the action of delete() while a write or append operation is in progress

CONSISTENCY
The consistency model of a Hadoop file system is one-copy-update semantics: generally that of a traditional POSIX file system.
• Create: once the close() operation on an output stream writing a newly created file has completed, in-cluster operations querying the file metadata and contents MUST immediately see the file and its data
• Update: once the close() operation on an output stream writing a newly created file has completed, in-cluster operations querying the file metadata and contents MUST immediately see the new data
• Delete: once a delete() operation on a file has completed, listStatus(), open(), rename() and append() operations MUST fail
• When a file is deleted and then overwritten, listStatus(), open(), rename() and append() operations MUST succeed: the file is visible
• Rename: after a rename has completed, operations against the new path MUST succeed; operations against the old path MUST fail
• The consistency semantics for out-of-cluster clients MUST be the same as for in-cluster clients: all clients calling read() on a closed file MUST see the same metadata and data until it is changed by a create(), append(), or rename() operation
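The create, delete, and rename rules above lend themselves to small conformance checks. The following is a minimal sketch of such checks, run against the local file system through java.nio.file as a stand-in; a real HCFS test would go through the Hadoop FileSystem interface instead. ConsistencyChecks and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical consistency checks on the local file system (not Hadoop code). */
final class ConsistencyChecks {
    private ConsistencyChecks() {}

    /** Create: after close() completes, the file and its data must be visible. */
    static boolean createIsVisible(Path dir) throws IOException {
        Path f = dir.resolve("created.txt");
        byte[] data = "data".getBytes(StandardCharsets.UTF_8);
        try (OutputStream out = Files.newOutputStream(f)) {
            out.write(data);
        } // close() has completed at this point
        return Files.exists(f) && Arrays.equals(Files.readAllBytes(f), data);
    }

    /** Delete: after delete() completes, opening the file must fail. */
    static boolean deleteIsVisible(Path dir) throws IOException {
        Path f = Files.createFile(dir.resolve("deleted.txt"));
        Files.delete(f);
        try {
            Files.newInputStream(f).close();
            return false; // open unexpectedly succeeded
        } catch (NoSuchFileException expected) {
            return true;
        }
    }

    /** Rename: the new path must resolve and the old path must not. */
    static boolean renameIsVisible(Path dir) throws IOException {
        Path src = Files.createFile(dir.resolve("old.txt"));
        Path dst = dir.resolve("new.txt");
        Files.move(src, dst);
        return Files.exists(dst) && Files.notExists(src);
    }
}
```

Each method expects an empty scratch directory, for example one returned by Files.createTempDirectory("hcfs"), and returns true when the observed behavior matches the rule it names.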