2. Questions I Have Been Asked
• I have a system that currently has 10 billion
files
• Files range from 10K to 100MB and average
about 1MB
• Access is from a legacy parallel app currently running on thousands of machines
• I want to expand to 200 billion files of about
the same size range
3. World-wide Grid
• I have 100,000 machines spread across 10
data centers around the world
• I have a current grid architecture local to each
data center (it works OK)
• But I want to load-level by mirroring data from
data center to data center
• And I want to run both legacy and Hadoop
programs
4. Little Records in Real Time
• I have a high rate stream of incoming small
records
• I need to have several years of data on-line
– about 10-30 PB
• But I have to have aggregates and alarms computed in real time
– maximum response should be less than 10s end to
end
– typical response should be much faster (200ms)
5. Model Deployment
• I am building recommendation models off-line
and would like to scale that using Hadoop
• But these models need to be deployed
transactionally to machines around the world
• And I need to keep reference copies of every
model and the exact data it was trained on
• But I can’t afford to keep many copies of my
entire training data
6. Video Repository
• I have video output from thousands of sources
indexed by source and time
• Each source wanders around a lot, but I know where it is
• Each source has about 100,000 video
snippets, each of which is 10-100MB
• I need to run map-reduce-like programs on all
video for particular locations
• 10,000 sources x 100,000 snippets x 100MB = 10^17 B = 100 PB
• 10,000 x 100,000 = 10^9 files
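A quick check of the arithmetic above (a plain calculation, nothing MapR-specific):

```python
# Back-of-the-envelope sizing for the video repository.
sources = 10_000                 # video sources
snippets_per_source = 100_000    # snippets per source
snippet_bytes = 100e6            # upper end of the 10-100 MB snippet range

total_bytes = sources * snippets_per_source * snippet_bytes
total_files = sources * snippets_per_source

print(f"{total_bytes:.0e} bytes = {total_bytes / 1e15:.0f} PB")   # 1e+17 bytes = 100 PB
print(f"{total_files:.0e} files")                                 # 1e+09 files
```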
7. And then they say
• Oh, yeah… we need to have backups, too
• Say, every 10 minutes for the last day or so
• And every hour for the last month
• You know… like Time Machine on my Mac
9. Scenario 1
• 10 billion files x 1MB average
– 100 federated name nodes?
• Legacy code access
• Expand to 200 billion files
– 2,000 name nodes?! Let’s not talk about HA
• Or a 400-node MapR cluster – no special adaptation
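The name-node counts above follow from a rough per-NameNode capacity; the ~100 million files per name node used here is an assumption chosen to match the slide's figures, not an HDFS constant:

```python
# Rough federated NameNode count; ~100 million files per NameNode is an
# assumption chosen to match the slide's "100 name nodes" figure.
FILES_PER_NAMENODE = 100_000_000

for total_files in (10e9, 200e9):
    namenodes = total_files / FILES_PER_NAMENODE
    print(f"{total_files:.0e} files -> ~{namenodes:.0f} federated name nodes")
# 1e+10 files -> ~100 federated name nodes
# 2e+11 files -> ~2000 federated name nodes
```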
10. World Wide Grid
• 10 x 10,000 machines + legacy code
• 10 clusters of 10,000 nodes, or 10 x 10 clusters of 1,000 nodes
• NFS for legacy apps
• Scheduled mirrors move data at end of shift
11. Little Records in Real Time
• Real-time + 10-30 PB
• Storm with pluggable services
• Or Ultra-messaging on the commercial side
• Key requirement is that real-time processors
need distributed, mutable state storage
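A minimal sketch of what "aggregates and alarms in real time" implies for the processors; the key and threshold are made up, and the plain dict stands in for the distributed, mutable state store the slide calls for:

```python
import time

# Stand-in for a distributed, mutable state store reachable from every
# real-time processor; a plain dict here for illustration only.
state = {}

ALARM_THRESHOLD = 1000      # illustrative threshold, not from the deck

def handle_record(key, value):
    """Update a running aggregate for `key` and raise an alarm if needed."""
    count, total = state.get(key, (0, 0.0))
    count, total = count + 1, total + value
    state[key] = (count, total)
    if total > ALARM_THRESHOLD:
        print(f"{time.time():.3f} ALARM {key}: total={total}")

# Example: a burst of small records for one key trips the alarm.
for v in [300, 400, 500]:
    handle_record("sensor-42", v)
```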
12. Model Deployment
• Snapshots
– of training data at start of training
– of models at the end of training
• Mirrors and data placement allow precise
deployment
• Redundant snapshots require no extra space (unchanged blocks are shared)
13. Video Repository
• Media repositories typically have low average
bandwidth
• This allows very high density machines
– 36 x 3 TB ≈ 100 TB raw per 4U ≈ 25 TB net per 4U after replication and overhead
• 100 PB ≈ 4,000 nodes ≈ 400 racks
• Can be organized as one cluster or several
pods
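The density figures above, spelled out; the 10-nodes-per-rack packing and the raw-to-net ratio are assumptions chosen to reproduce the slide's numbers:

```python
# Storage density and cluster sizing, using the slide's figures.
raw_tb_per_node = 36 * 3              # 108 TB per 4U node, ~100 TB on the slide
net_tb_per_node = 25                  # slide's net figure after replication and overhead

target_pb = 100
nodes = target_pb * 1000 / net_tb_per_node   # 100 PB / 25 TB per node
racks = nodes / 10                           # assuming ~10 x 4U nodes per rack

print(f"{raw_tb_per_node} TB raw, {net_tb_per_node} TB net per node")
print(f"~{nodes:.0f} nodes, ~{racks:.0f} racks for {target_pb} PB")   # ~4000 nodes, ~400 racks
```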
14. And Backups?
• Snapshots can be scheduled at high frequency
• Expiration allows complex retention schedules
• You know… like Time Machine on my Mac
• (and off-site backups work as well)
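A sketch of a Time-Machine-style retention rule like the one described above; this illustrates the policy only, not MapR's actual snapshot expiration mechanism:

```python
from datetime import timedelta

def keep_snapshot(age: timedelta, spacing_to_previous: timedelta) -> bool:
    """Time-Machine-style retention: dense recent snapshots, sparse older ones.

    Keep a snapshot if it is at least as far from the previously kept one
    as its age tier requires.  Illustrative only.
    """
    if age <= timedelta(days=1):
        return spacing_to_previous >= timedelta(minutes=10)   # every 10 min for a day
    if age <= timedelta(days=30):
        return spacing_to_previous >= timedelta(hours=1)      # hourly for a month
    return False                                              # expire anything older

# Example: a 3-hour-old snapshot taken 10 minutes after the last kept one.
print(keep_snapshot(timedelta(hours=3), timedelta(minutes=10)))   # True
```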
16. MapR Areas of Development
(Diagram: MapR areas of development spanning Map-Reduce, HBase, the ecosystem, storage, management, and services)
17. MapR Improvements
• Faster file system
– Fewer copies
– Multiple NICs
– No file descriptor or page-buf competition
• Faster map-reduce
– Uses distributed file system
– Direct RPC to receiver
– Very wide merges
18. MapR Innovations
• Volumes
– Distributed management
– Data placement
• Read/write random access file system
– Allows distributed meta-data
– Improved scaling
– Enables NFS access
• Application-level NIC bonding
• Transactionally correct snapshots and mirrors
19. MapR's Containers
Files/directories are sharded into blocks, which are placed into mini NNs (containers) on disks
• Each container contains
– Directories & files
– Data blocks
• Replicated on servers
• No need to manage directly
• Containers are 16-32 GB segments of disk, placed on nodes
20. MapR's Containers
• Each container has a replication chain
• Updates are transactional
• Failures are handled by rearranging replication
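A toy sketch of the chain idea above: writes are applied along an ordered replication chain, and a failure is handled by splicing the dead node out of the chain. This illustrates chain replication in general, not MapR's internal protocol:

```python
class Container:
    """One container with an ordered replication chain of node names."""

    def __init__(self, chain):
        self.chain = list(chain)                  # e.g. ["N1", "N2", "N3"]
        self.replicas = {n: {} for n in chain}    # per-node copy of the data

    def write(self, key, value):
        # The update is applied along the whole chain before it is
        # acknowledged, so the client sees it as transactional.
        for node in self.chain:
            self.replicas[node][key] = value

    def handle_failure(self, failed_node):
        # Failure handling: splice out the dead node; a replacement replica
        # can be re-synced onto another node later to restore the count.
        self.chain = [n for n in self.chain if n != failed_node]
        self.replicas.pop(failed_node, None)

c = Container(["N1", "N2", "N3"])
c.write("block-7", b"...")
c.handle_failure("N2")
print(c.chain)   # ['N1', 'N3']
```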
21. Container locations and replication
(Diagram: containers replicated across nodes N1, N2, and N3, with their locations tracked by the CLDB)
The container location database (CLDB) keeps track of the nodes hosting each container and the order of each replication chain
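In data-structure terms, the CLDB described above is roughly a map from container ID to an ordered replication chain; the container IDs and node names below are made up:

```python
# Toy model of the CLDB: container ID -> ordered replication chain.
cldb = {
    1: ["N1", "N2"],          # container 1 is hosted on N1 (head) and N2
    2: ["N1", "N3", "N2"],
    3: ["N3", "N2"],
}

def locate(container_id):
    """Return the nodes hosting a container, head of the chain first."""
    return cldb[container_id]

print(locate(2))   # ['N1', 'N3', 'N2']
```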
22. MapR Scaling
• Containers represent 16-32 GB of data
– Each can hold up to 1 billion files and directories
– 100M containers = ~2 exabytes (a very large cluster)
• 250 bytes of DRAM to cache a container
– 25 GB to cache all containers for a 2 EB cluster
– But not necessary, can page to disk
– Typical large 10 PB cluster needs 2 GB
• Container reports are 100x-1000x smaller than HDFS block reports
– Serve 100x more data nodes
– Increase container size to 64 GB to serve a 4 EB cluster
– Map-reduce not affected
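The cache arithmetic behind these numbers, using the slide's figures (the ~20 GB average container size is an assumption within the stated 16-32 GB range):

```python
# CLDB cache sizing from the slide's figures.
container_gb = 20                    # assumed average within the 16-32 GB range
bytes_per_container = 250            # DRAM to cache one container's location

# A ~2 EB cluster:
containers_2eb = 2e18 / (container_gb * 1e9)            # ~100 million containers
print(f"{containers_2eb:.0e} containers, "
      f"{containers_2eb * bytes_per_container / 1e9:.0f} GB cache")   # 1e+08 containers, 25 GB cache

# A typical large 10 PB cluster:
containers_10pb = 10e15 / (container_gb * 1e9)
print(f"{containers_10pb * bytes_per_container / 1e6:.0f} MB cache")  # 125 MB; the slide's 2 GB leaves headroom
```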
23. Export to the world
(Diagram: multiple NFS servers in the cluster export data to an outside client)
24. Local server
(Diagram: the application and NFS server run together on the client, which accesses the cluster nodes directly)