
HDFS Tiered Storage: Mounting Object Stores in HDFS


Most users know HDFS as the reliable store of record for big data analytics. HDFS is also used to store transient and operational data when working with cloud object stores, as in Azure HDInsight and Amazon EMR deployments. In these settings, but also in more traditional, on-premises deployments, applications often manage data spread across multiple storage systems or clusters, requiring a complex workflow to synchronize data between filesystems in order to meet goals for durability, performance, and coordination.

Building on HDFS's existing heterogeneous storage support, we add a storage tier that works with external stores, allowing remote namespaces to be "mounted" in HDFS. This capability not only enables transparent caching of remote data as HDFS blocks, it also supports synchronous writes to remote clusters for business continuity planning (BCP) and hybrid cloud architectures.

This idea was presented at last year's Hadoop Summit in San Jose. Significant progress has been made since then, and the feature is under active development at the Apache Software Foundation on branch HDFS-9806, driven by Microsoft and Western Digital. We will discuss the refined design and implementation and show how end users and admins will be able to use this functionality.
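As a concrete illustration of the mount workflow, the sketch below uses the hdfs dfsadmin -mount <source> <dest> [-ephemeral|-backup] syntax proposed in the talk (slide 13); this subcommand is part of the HDFS-9806 work, not released HDFS, and the bucket, container, and paths are hypothetical.

    # Ephemeral mount: expose a remote S3 bucket under a local HDFS path,
    # with data paged in on demand and cached as HDFS blocks (Use Case I)
    hdfs dfsadmin -mount s3a://analytics-datasets/logs hdfs://nn.example.com:8020/mounts/logs -ephemeral

    # Backup mount: mirror a local HDFS subtree to an Azure blob container (Use Case II)
    hdfs dfsadmin -mount hdfs://nn.example.com:8020/user/hadoop/workloads \
      wasb://container@storageAccount.blob.core.windows.net/backup/user/hadoop/workloads -backup

Once the ephemeral mount exists, applications would address the data through ordinary HDFS paths, e.g. hdfs dfs -ls /mounts/logs, with HDFS coordinating reads and writes against the remote store.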


HDFS Tiered Storage: Mounting Object Stores in HDFS (slide transcript)

  1. HDFS Tiered Storage. Thomas Demoor (Western Digital), Virajith Jalaparti (Microsoft)
  2. >id
     Thomas Demoor
     • PO/Architect @ Western Digital
     • S3-compatible object storage
     • Hadoop:
       – S3a optimizations: fast uploader (streams from memory), Hadoop 2/YARN support; coming up: object-store committer
       – HDFS Tiered Storage
     Virajith Jalaparti
     • Scientist @ Microsoft CISL
     • Hadoop
       – HDFS Tiered Storage
  3. Overview
     • HDFS Tiered Storage
       – Mount and manage remote stores through HDFS
     • Earlier talks
       – Hadoop Summit ’16, San Jose
       – DataWorks Summit ’17, Munich
     • This talk
       – Introduce Tiered Storage in HDFS (design, read path, …)
       – Focus on progress since earlier talks (mounting in HDFS, write path, …)
       – Demo
     [Diagram: application on a Hadoop cluster running HDFS, backed by a remote store]
  4. Use Case I: Ephemeral Hadoop Clusters
     • EMR on S3, HDInsight over WASB, …
     • Several workarounds used today
       – DistCp
       – Use only remote storage
       – Explicitly manage local and cloud storage
     • Goal: seamlessly use local and remote (cloud) stores as one instance of HDFS
       – Retrieve data to the local cluster on demand
       – Use local storage to cache data
     [Diagram: Hadoop clusters reading/writing data held in a cloud store (e.g., S3, WASB)]
  5. Use Case II: Backup data to object stores
     • Business value of Hadoop + object storage:
       – Data retention: very high fault tolerance (erasure coding)
       – Economics: cheap storage for cold data
       – Business continuity planning: backup, migrate, …
     • Public clouds: Microsoft Azure, AWS S3, GCS, …
     • Private clouds: WD ActiveScale Object Storage
       – S3-compatible object storage system
       – Linear scalability in number of racks, objects, and throughput
       – Entry level (100s of TB) to scale-out (5 PB+ per rack)
       – http://www.hgst.com/products/systems
  6. Use Case II: Backup data to object stores
     • Today: Hadoop Compatible FileSystems (s3a://, wasb://)
       – Direct I/O between Hadoop apps and the object store
       – Scalable & resilient: outsources NameNode functions
     • Compatible does not mean identical
       – Most are not even filesystems (notion of directories, append, …)
       – No data locality: less performant for hot/real-time data
       – Hadoop admin tools require HDFS: permissions/quota/security/…
       – Workaround: explicitly manage local HDFS and remote cloud storage
     • Goal: integrate better with HDFS
       – Data locality for hot data + object storage for cold data
       – Offer familiar HDFS admin abstractions
     [Diagram: application on the Hadoop cluster reading/writing the object store directly]
  7. Solution: “Mount” remote storage in HDFS
     • Use HDFS to manage remote storage
       – HDFS coordinates reads/writes to the remote store
       – Mount the remote store as a PROVIDED tier in HDFS (details later in the talk)
       – Set a StoragePolicy to move data between the tiers
     [Diagram: a remote namespace mounted at a mount point in the HDFS namespace; data is loaded on demand for reads and written through to the remote store for writes]
  8. Solution: “Mount” remote storage in HDFS
     • Use HDFS to manage remote storage
       – HDFS coordinates reads/writes to the remote store
       – Mount the remote store as a PROVIDED tier in HDFS (details later in the talk)
       – Set a StoragePolicy to move data between the tiers
     • Benefits
       – Transparent to users/applications
       – Provides a unified namespace
       – Can extend HDFS support for quotas, security, etc.
       – Enables caching/prefetching
     [Diagram: application accessing the remote store through HDFS on the Hadoop cluster]
  9. Challenges
     • Synchronize metadata without copying data
       – Dynamically page in “blocks” on demand
       – Define policies to prefetch and evict local replicas
     • Mirror changes in the remote namespace
       – Handle out-of-band churn in remote storage
       – Avoid dropping valid, cached data (e.g., on rename)
     • Handle writes consistently
       – Writes committed to the backing store must “make sense”
     • Dynamic mounting
       – Efficient/clean mount and unmount behavior
       – One object store mapping to multiple NameNodes
  10. Outline
     • Use cases
     • Mounting remote stores in HDFS
     • Demo
       1. Backup from an on-prem HDFS cluster to Azure Blob Storage
       2. Spin up an ephemeral HDFS cluster on Azure
     • Types of mounts
     • Reads in Tiered HDFS
     • Writes in Tiered HDFS
  11. Demo summary
     • Backup /user/hadoop/workloads/ from an on-prem HDFS cluster to Azure Blob Storage (wasb://container@storageAccount/backup/user/hadoop/workloads/) using -setStoragePolicy PROVIDED (an illustrative command follows the transcript)
     • Generate an FSImage from the backed-up data and use it to start a Hadoop cluster on Azure that exposes /user/hadoop/workloads/
  12. Outline
     • Use cases
     • Mounting remote stores in HDFS
     • Demo
       1. Backup from an on-prem HDFS cluster to Azure Blob Storage
       2. Spin up an ephemeral HDFS cluster on Azure
     • Types of mounts
     • Reads in Tiered HDFS
     • Writes in Tiered HDFS
  13. Types of mounts
     hdfs dfsadmin -mount <source> <dest> [-ephemeral|-backup]
     • Ephemeral mounts
       – Access data in a remote store using HDFS (Use Case I)
       – <source>: remoteFS://remote/path
       – <dest>: hdfs://local/path
       – Changes are bi-directional
     • Backup mounts
       – Backup data from HDFS to a remote store (Use Case II)
       – <source>: hdfs://local/path
       – <dest>: remoteFS://remote/path
       – Changes are uni-directional
     [Diagram: ephemeral mount (application reads through HDFS from the remote store) vs. backup mount (application writes to HDFS, which copies to the remote store)]
  14. Reads in ephemeral mounts
     [Diagram: a client read against the HDFS cluster (NN, DN1, DN2) is resolved via the mounted remote namespace (remoteFS://); a DataNode fetches the corresponding file data from the remote store and returns it to the client]
  15. Enabled using the PROVIDED storage type
     • Peer to RAM, SSD, DISK in HDFS (HDFS-2832)
     • Data in the remote store is mapped to HDFS blocks on PROVIDED storage
       – Each block is associated with a BlockAlias = (REF, nonce)
         • REF = (file URI, offset, length); nonce = GUID, used to detect changes on the external store
         • E.g., REF = (s3a://bucket/file, 0, 1024); nonce = <ETag>
       – The mapping is stored in an AliasMap
         • Can use a KV store external to, or inside, the NN
     • A PROVIDEDVolume on DataNodes reads/writes data from/to the remote store
     [Diagram: NameNode (FSNamesystem, BlockManager, AliasMap) mapping files to blocks; local blocks live on DataNode storage (RAM_DISK, SSD, DISK) while PROVIDED blocks resolve through the AliasMap to the remote store]
  16. Example: Using an immutable cloud store
     • Create the FSImage and AliasMap (an illustrative image-generation command follows the transcript)
       – The block StoragePolicy can be set as required, e.g., {rep=2, PROVIDED, DISK}
     [Diagram: the FSImage maps paths such as /d/f/z1 to blocks with {rep=1, PROVIDED}; the AliasMap maps each block to (remote://c/d/f/z1, offset, length) plus its inode id in the remote namespace remoteFS://]
  17. Example: Using an immutable cloud store
     • Start the NN with the FSImage
     • All blocks become reachable when a DN with PROVIDED storage heartbeats in
     [Diagram: NN (FSImage, AliasMap, BlockManager) with DN1 and DN2 serving the mounted remote namespace remoteFS://]
  18. Example: Using an immutable cloud store
     • The DN uses the BlockAlias to read from the external store
       – Data can be cached locally as it is read (read-through cache)
     [Diagram: the DFSClient calls getBlockLocations(“/d/f/z1”, 0, L); the NN returns LocatedBlocks {{DN2, b_i, PROVIDED}}; DN2 looks up b_i in the AliasMap and opens remote:///c/d/f/z1 with the stored nonce (GUID) to serve the read]
  19. Writes in ephemeral mounts
     • Metadata operations: create(), mkdir(), chown, etc.
       – Synchronous on the remote store
       – For FileSystems: the NameNode performs the operation on the remote store first
       – For blob stores: metadata operations need not be propagated (e.g., clients accessing S3 directly have no notion of directories)
     • Data operations (an illustrative write-through example follows the transcript)
       – One of the DataNodes in the write pipeline writes to the remote store
       – The BlockAlias is passed along the write pipeline
     [Diagram: DFSClient write pipeline across DN1, DN2, DN3, with one DataNode forwarding the block and its Alias to the remote store]
  20. Writes in backup mounts
     • A daemon on the NameNode backs up metadata/data in the mount
     • Work is delegated to DataNodes (similar to the SPS, HDFS-10285)
     • Backup of data depends on remote store capabilities
       – For FileSystems: write block by block
       – For blob stores: multi-part upload to upload blocks in parallel
     [Diagram: a coordinator DataNode and peer DataNodes (DN1, DN2) uploading data from HDFS to the remote store]
  21. Writes in backup mounts
     • A daemon on the NameNode backs up metadata/data in the mount
     • Work is delegated to DataNodes (similar to the SPS, HDFS-10285)
     • Backup of data depends on remote store capabilities
       – For FileSystems: write block by block
       – For blob stores: multi-part upload to upload blocks in parallel
     • Use snapshots to maintain a consistent view (illustrative snapshot commands follow the transcript)
       – Backup a particular snapshot
       – Backup changes from the previous snapshot
  22. Assumptions
     • Churn is rare and relatively predictable
       – Analytic workloads: ETL into external/cloud storage, compute in the cluster
     • Clusters are either consumers or producers for a subtree/region
       – The FileSystem API has too little information to resolve conflicts
     [Diagram: ingest and ETL feed a raw data bucket; analytics read it and write to an analytic results bucket]
  23. Conflict resolution
     • Conflicts occur when the remote store is modified directly
     • Detection
       – On read operations: e.g., using an open-by-nonce operation
       – On write operations: e.g., the file to be created already exists
     • Pluggable policy to resolve conflicts
       – “HDFS wins”
       – “Remote store wins”
       – Rename files under conflict
  24. Status
     • Read-only ephemeral mounts
       – HDFS-9806 branch on Apache Hadoop
     • Backup mounts
       – Prototype available on GitHub
     • Next:
       – Writes in ephemeral mounts
       – Conflict resolution
       – Create mounts in a running NameNode
  25. Resources + Q&A
     • HDFS Tiered Storage: HDFS-9806
       – Design documentation
       – List of subtasks, lots of linked tickets – take one!
       – Discussion of scope, implementation, and feedback
     • Joint work: Microsoft and Western Digital
       – {thomas.demoor, ewan.higgs}@wdc.com
       – {cdoug, vijala}@microsoft.com
  26. Backup slides
  27. Benefits of the PROVIDED design
     • Use existing HDFS features to enforce quotas and limits on storage tiers
       – Simpler implementation; no mismatch between HDFS invariants and the framework
     • Supports different types of back-end storage
       – org.apache.hadoop.fs.FileSystem implementations, blob stores, etc.
     • Credentials hidden from the client
       – Only the NN and DNs require credentials for the external store
       – HDFS can be used to enforce access controls for the remote store
     • Enables several policies to improve performance
       – Set replication in the FSImage to pre-fetch (an illustrative command follows the transcript)
       – Read-through cache
       – Actively pre-fetch while the cluster is running
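The command sketches below expand on steps referenced in the transcript. For the backup demo (slide 11), the storage-policy step could look as follows; the PROVIDED policy name is taken from the slides, while the path and the use of the stock hdfs storagepolicies tool are assumptions about how the feature is driven.

    # Assign the PROVIDED storage policy to the subtree that should be backed up
    # to the mounted object store (policy name from the slides; path illustrative)
    hdfs storagepolicies -setStoragePolicy -path /user/hadoop/workloads -policy PROVIDED

    # Verify that the policy took effect
    hdfs storagepolicies -getStoragePolicy -path /user/hadoop/workloads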
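Slides 16-18 generate an FSImage and AliasMap over an immutable remote namespace. The HDFS-9806 work includes an image-generation tool (the hadoop-fs2img module); the invocation below is only a sketch: the class name reflects that work, but the flag and output location are assumptions and may not match the branch's exact CLI.

    # Hypothetical sketch: walk the remote namespace and emit an FSImage plus
    # AliasMap whose blocks reference the remote data as PROVIDED storage.
    # The -o (output directory) flag and paths are illustrative assumptions.
    hadoop org.apache.hadoop.hdfs.server.namenode.FileSystemImage \
      -o file:///tmp/provided-fsimage \
      wasb://container@storageAccount.blob.core.windows.net/backup/user/hadoop/workloads/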
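For writes in ephemeral mounts (slide 19), an ordinary HDFS write under the mount point is expected to be written through to the remote store by a DataNode in the pipeline; the commands are stock HDFS, and the mount point and paths are illustrative.

    # Writing under an ephemeral mount point; per the design, one DataNode in the
    # write pipeline also pushes the block data (with its BlockAlias) to the remote store
    hdfs dfs -put daily-results.csv /mounts/logs/2017/06/daily-results.csv

    # Metadata operations such as mkdir are applied synchronously on the remote
    # store when it is a FileSystem, and need not be propagated for pure blob stores
    hdfs dfs -mkdir /mounts/logs/2017/07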
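For backup mounts (slide 21), the consistent view can be captured with standard HDFS snapshots; the commands below are stock HDFS snapshot tooling, and how the backup daemon consumes the snapshots is the feature's concern, not shown here.

    # Enable snapshots on the backed-up subtree and take one to back up
    hdfs dfsadmin -allowSnapshot /user/hadoop/workloads
    hdfs dfs -createSnapshot /user/hadoop/workloads s0

    # Later, a subsequent snapshot lets the backup copy only the changes
    hdfs dfs -createSnapshot /user/hadoop/workloads s1
    hdfs snapshotDiff /user/hadoop/workloads s0 s1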
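Slide 27 mentions setting replication to pre-fetch remote data. Under the tiered design, raising the replication factor on a file whose storage policy spans PROVIDED and DISK is intended to pull additional local replicas from the remote tier; the command below is stock HDFS, while the prefetching effect is the feature's intent rather than stock behavior, and the path is illustrative.

    # Request an additional (local) replica; with a {PROVIDED, DISK} storage policy
    # this is intended to pre-fetch the data from the remote store onto cluster disks
    hdfs dfs -setrep -w 2 /mounts/logs/2017/06/daily-results.csv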
