1. © Hortonworks Inc. 2011 - 2015
Democratizing Memory Storage
Arpit Agarwal
arp@apache.org
@aagarw
2. © Hortonworks Inc. 2011 - 2015
HDFS Heterogeneous Storage Media
3. © Hortonworks Inc. 2011 - 2015
HDFS Heterogeneous Storage (Continued)
• Introduced in Apache Hadoop 2.3
• Memory introduced as a storage medium
–RAM Disk provides retention across process restarts
• Memory is treated differently due to its transient nature
–More on this later
4. © Hortonworks Inc. 2011 - 2015
HDFS Heterogeneous Storage (Continued)
• Rich storage media policies introduced in Hadoop 2.6
• Applications can target different storage media
• Set the policy on an individual file or directory sub-tree
–setStoragePolicy API
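A minimal sketch of setting a policy through the Java API (the path is illustrative; the built-in policy names in HDFS are spelled ONE_SSD, ALL_SSD, etc., and in Hadoop 2.6 the call lives on DistributedFileSystem):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class StoragePolicyExample {
  public static void main(String[] args) throws Exception {
    // Assumes fs.defaultFS points at an HDFS cluster with SSD-tagged volumes.
    DistributedFileSystem dfs =
        (DistributedFileSystem) FileSystem.get(new Configuration());
    // Tag a directory sub-tree; files created under it inherit the policy.
    dfs.setStoragePolicy(new Path("/warehouse/hot"), "ONE_SSD");
  }
}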
5. © Hortonworks Inc. 2011 - 2015
HDFS Heterogeneous Storage (Continued)
• Example built-in policies
– DEFAULT – All replicas on DISK
– ONESSD – One replica on SSD, rest on DISK
– ALLSSD – All replicas on SSD
– COLD – All replicas on Archival Storage
– LAZY_PERSIST – 1 replica in local memory, lazy write to disk
6. © Hortonworks Inc. 2011 - 2015
• Why not rely on the OS page cache?
7. © Hortonworks Inc. 2011 - 2015
• Scan workloads invalidate the page cache
–HDFS uses buffered IO for reads and writes
• Control the eviction scheme
• Permit further optimizations
–Checksum computation off the hot path
–Collocate data and computation
8. © Hortonworks Inc. 2011 - 2015
Centralized Cache Management (CCM)
• Introduced in Hadoop 2.3
• Pin hot data to memory
9. © Hortonworks Inc. 2011 - 2015
CCM (Continued)
• Administrator configures cache pools
• User issues commands to manage the contents of pools
• Users specify which files or directories are hot
–HDFS loads file contents into memory
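A hedged sketch of the admin and user steps through the Java API (the pool name, limit, and path below are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;
import org.apache.hadoop.hdfs.protocol.CachePoolInfo;

public class CcmExample {
  public static void main(String[] args) throws Exception {
    DistributedFileSystem dfs =
        (DistributedFileSystem) FileSystem.get(new Configuration());
    // Admin step: a pool bounds how much memory its users may pin.
    dfs.addCachePool(new CachePoolInfo("reports")
        .setLimit(10L * 1024 * 1024 * 1024)); // 10 GB
    // User step: ask HDFS to load the directory's blocks into memory.
    long id = dfs.addCacheDirective(new CacheDirectiveInfo.Builder()
        .setPath(new Path("/warehouse/hot"))
        .setPool("reports")
        .build());
    System.out.println("Cache directive id: " + id);
  }
}

The hdfs cacheadmin CLI exposes the same pool and directive operations.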
10. © Hortonworks Inc. 2011 - 2015
CCM (Continued)
11. © Hortonworks Inc. 2011 - 2015
CCM (Continued)
• Eliminate checksum computations during read
–Checksums used to flag disk and network errors
–HDFS will pre-verify checksums when caching data from disk
• The DataNode and the HDFS client use shared memory segments to communicate which blocks are cached
12. © Hortonworks Inc. 2011 - 2015
CCM (Continued)
• Enables short-circuit and zero-copy reads from memory to avoid RPC overhead
• Short-circuit reads are transparent to applications
• Zero-copy read API
–ByteBuffer read(ByteBufferPool factory, int maxLength, EnumSet<ReadOption> opts);
–void releaseBuffer(ByteBuffer buffer);
• E.g. Apache Hive uses ZCR for ORC files
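A minimal sketch of the zero-copy read path (the file path and buffer size are illustrative; the call transparently falls back to a copying read when zero-copy is not possible):

import java.nio.ByteBuffer;
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.ReadOption;
import org.apache.hadoop.io.ByteBufferPool;
import org.apache.hadoop.io.ElasticByteBufferPool;

public class ZcrExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    ByteBufferPool pool = new ElasticByteBufferPool();
    try (FSDataInputStream in = fs.open(new Path("/warehouse/hot/part-0"))) {
      // SKIP_CHECKSUMS is safe for cached blocks: they were pre-verified.
      ByteBuffer buf = in.read(pool, 4 * 1024 * 1024,
          EnumSet.of(ReadOption.SKIP_CHECKSUMS));
      if (buf != null) {
        // ... process the buffer, then return it to the stream ...
        in.releaseBuffer(buf);
      }
    }
  }
}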
13. © Hortonworks Inc. 2011 - 2015
HDFS Lazy Persist Writes
• HDFS feature introduced in Apache Hadoop 2.6
• Exposed via Storage Policies
–Set the LAZY_PERSIST policy on a file or directory
14. © Hortonworks Inc. 2011 - 2015
HDFS Lazy Persist Writes (continued)
• Applications can write to files in memory
• HDFS writes the data to persistent storage off the hot path
–Applications see memory-speed write latency
• Expected to be used with single replica writes
–Latency benefits negated by pipeline replication over the network
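A minimal sketch of a lazy persist write using the CreateFlag variant, an alternative to setting the policy on the parent directory (the path and sizes are illustrative, and the DataNode must have a RAM disk configured):

import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class LazyPersistExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    try (FSDataOutputStream out = fs.create(
        new Path("/tmp/scratch/part-0"),
        FsPermission.getFileDefault(),
        EnumSet.of(CreateFlag.CREATE, CreateFlag.LAZY_PERSIST),
        4096,               // buffer size
        (short) 1,          // single replica: no pipeline over the network
        128 * 1024 * 1024,  // block size
        null)) {            // no progress callback
      out.write("written at memory speed".getBytes("UTF-8"));
    }
  }
}

If RAM disk space is exhausted, HDFS falls back to writing the replica to disk.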
15. © Hortonworks Inc. 2011 - 2015
HDFS Lazy Persist Writes (Continued)
16. © Hortonworks Inc. 2011 - 2015
HDFS Lazy Persist Writes (Continued)
• Best-effort persistence with retention across process restarts
• Data loss is rare but possible – e.g. node restart, network partition
–Recovery pushed to compute framework layers
• Adoption by Apache projects
–Hive in-memory tables
–Low latency persistence for Spark RDDs
17. © Hortonworks Inc. 2011 - 2015
Areas of Improvement
• Cache data on read as opposed to pinning on demand
• Short-circuit writes
–Eliminate Hadoop RPC overhead for writes
• Isolate applications from HDFS APIs
18. © Hortonworks Inc. 2011 - 2015
Areas of Improvement
• Challenging to modify compute frameworks to use memory storage
• Address use cases beyond intermediate data
–When to cache?
–Frameworks do not know
• The application or the user knows
• Let the user decide
–E.g. jobfoo input=memfs://… tmp=memfs://… output=hdfs://…
19. © Hortonworks Inc. 2011 - 2015
Memfs – A Layered File System
• Planned for Apache Hadoop 2.9
• A thin HCFS that can layer over any other HCFS
• Transparently uses HDFS memory features when available
• HDFS has used layered FS approach before
–ViewFS, ChecksumFS
20. © Hortonworks Inc. 2011 - 2015
• Memfs paths correspond 1:1 to underlying FS paths
–E.g. memfs://results.txt maps to hdfs://results.txt
• Reading a file via Memfs loads it into DataNode RAM
• Writing a file via Memfs transparently uses the LAZY_PERSIST storage policy for low latency writes
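A hypothetical sketch of the layered approach (not the real Memfs code; FilterFileSystem is the existing Hadoop base class for wrapping another FS):

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FilterFileSystem;
import org.apache.hadoop.fs.Path;

public class MemFileSystem extends FilterFileSystem {
  public MemFileSystem(FileSystem base) {
    super(base); // every operation funnels through the base FS 1:1
  }

  @Override
  public String getScheme() {
    return "memfs";
  }

  @Override
  public FSDataInputStream open(Path f, int bufferSize) throws IOException {
    // A real Memfs would also request caching of f's blocks here so
    // that subsequent reads are served from DataNode memory.
    return super.open(f, bufferSize);
  }
}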
21. © Hortonworks Inc. 2011 - 2015
Memfs Benefits
• Beyond the typical use case of intermediate data
• Isolate applications from HDFS APIs
–Lets us evolve HDFS support over time
• Lightweight - no state maintained outside of the base FS
22. © Hortonworks Inc. 2011 - 2015
Memfs Benefits (Continued)
• All IO is channeled through the base FS in the user’s security context
• Behavior can be controlled by configuration
–E.g. Administrator configures separate cache pools for Memfs
–Move the pool selection logic to Memfs
• Future Memfs implementations using other base HCFS are possible
–May not be as lightweight
23. © Hortonworks Inc. 2011 - 2015
Spark RDD
• Spark Resilient Distributed Datasets
• Lineage information for fault tolerance is recorded with the RDD
–Lost data recomputed via Lineage
• HDFS Lazy Persist writes can complement Spark RDD as a low latency
backing store (SPARK-6479)
24. © Hortonworks Inc. 2011 - 2015
Tachyon
• Tachyon is also a layered file system
–Powerful idea
• Works best when data is guaranteed to fit in memory
• Introduces the concept of Lineage
–Optional, but required for persistence and recovery
–Memfs is designed to rely on recovery built into the framework layers for rare failures
25. © Hortonworks Inc. 2011 - 2015
Credits
• Heterogeneous Storage Media
– Tsz Wo (Nicholas) Sze, Hortonworks (szetszwo@apache.org)
– Sanjay Radia, Hortonworks (sradia@apache.org)
– Suresh Srinivas, Hortonworks (suresh@apache.org)
– Junping Du, Hortonworks (junping_du@apache.org)
• Rich Storage Policies
– Jing Zhao, Hortonworks (jing9@apache.org)
– Tsz Wo (Nicholas) Sze, Hortonworks (szetszwo@apache.org)
• CCM
– Andrew Wang, Cloudera (wang@apache.org)
– Colin Mccabe, Cloudera (cmccabe@apache.org)
– Chris Nauroth, Hortonworks (cnauroth@apache.org)
• Lazy Persist Writes
– Jitendra Pandey, Hortonworks (jitendra@apache.org)
– Sanjay Radia, Hortonworks (sradia@apache.org)
– Xiaoyu Yao, Hortonworks (xyao@apache.org)
– Gopal V, Hortonworks (gopalv@apache.org)
26. © Hortonworks Inc. 2011 - 2015
Slides URL
• http://s.apache.org/mem-2015
28. © Hortonworks Inc. 2011 - 2015
Apache Hadoop File Systems primer (Bonus)
• FileSystem interface captures common FS operations
• Any conforming implementation is a Hadoop Compatible File System (HCFS)
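As a small illustration (argument handling is minimal), any HCFS is resolved from the path's URI scheme through the same interface:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HcfsExample {
  public static void main(String[] args) throws Exception {
    // e.g. hdfs://nn:8020/data or file:///tmp/data
    Path p = new Path(args[0]);
    FileSystem fs = p.getFileSystem(new Configuration());
    System.out.println(p.toUri().getScheme() + " -> " + fs.getClass().getName());
  }
}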
29. © Hortonworks Inc. 2011 - 2015
• HDFS is the canonical Hadoop FS
• Ships with Apache Hadoop and implements the complete set of features exposed by the FileSystem interface, e.g.
–Snapshots
–Heterogeneous Storage Media
–Extended Attributes
–POSIX ACLs
• Supports Kerberos Authentication in Secure Mode
Editor's Notes
• Storage policies can be set by unprivileged users. HDFS also supports quotas on storage media, which are set by the administrator.
• Memory-mapped files are another option. They work well for reads but do not work well with the existing HDFS write pipeline.
• Cache pools are analogous to HDFS quotas, but not quite the same. Cache pools allow administrators to control which users can use memory resources.
• These two problems are relatively easy to solve. We don't want to indiscriminately target all input or output data to memory.
• Frameworks lack application context, such as which data will be accessed often or the expected output size of a given job.
• Let's say we have a hypothetical file system called memfs which performs caching IO on both the read and write paths.