The Path to Migrating off MapR

MapR Future and Offload Solutions
Uncertainty about the future of MapR means evaluating alternative ways to offload HDFS data
Possible Offload Solutions:
• Move to a different Hadoop distribution (Cloudera/Hortonworks)
• Move to the cloud: EMR (AWS), Dataproc (GCP), HDInsight (Azure)
• Move to an object store

Why Object Store
• Object storage solves the scalability problem at a fraction of the cost of scale-out file systems (HDFS)
• Economies of scale also make it cheaper in the long term than public cloud as data sizes grow
• Object store providers offer the flexibility to deploy both on-premises and in the cloud

Migration Solution Overview w/ Alluxio
[Diagram 1: Migration to Cloud Object Store. Components: Presto/Spark, Alluxio, HDFS, Cloud Object Store]
[Diagram 2: Migration to On-Prem Object Store. Components: Presto/Spark, Alluxio, HDFS, On-Prem Object Store]

Data Orchestration for the Cloud
Application interfaces: Java File API, HDFS Interface, S3 Interface, REST API, POSIX Interface
Storage drivers: HDFS Driver, Swift Driver, S3 Driver, NFS Driver
Independent scaling of compute & storage

Alluxio – Key innovations
• Data Locality with Intelligent Multi-tiering: accelerate big data workloads with transparent tiered local data
• Data Accessibility for popular APIs & API translation: run Spark, Hive, Presto, ML workloads on your data located anywhere
• Data Elasticity with a unified namespace: abstract data silos & storage systems to independently scale data on-demand with compute

Environment Setup with Alluxio
[Diagram: S3, Alluxio, HDFS]
Alluxio with 2 mount points:
- First: an AWS S3 bucket
- Second: an on-premises HDFS cluster
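
The two mount points could be expressed as `alluxio fs mount <alluxioPath> <ufsURI>` invocations. The snippet below is a minimal sketch that only builds the command lines and does not run them. The bucket name `example-bucket`, the NameNode address `namenode:8020`, and the choice of `/s3` and `/hdfs` as mount paths are assumptions for illustration (the mount paths mirror the ones used on the later slides), and credentials are assumed to be configured separately.

```python
from typing import Dict, List

# Hypothetical under-store URIs; replace with real values for your environment.
# Older Alluxio releases use the s3a:// scheme instead of s3://.
MOUNTS: Dict[str, str] = {
    "/s3": "s3://example-bucket/data",  # AWS S3 bucket (name is an assumption)
    "/hdfs": "hdfs://namenode:8020/",   # on-premises HDFS (address is an assumption)
}


def mount_commands(mounts: Dict[str, str],
                   alluxio_bin: str = "./bin/alluxio") -> List[List[str]]:
    """Build one `alluxio fs mount <alluxioPath> <ufsURI>` argv per mount point."""
    return [[alluxio_bin, "fs", "mount", path, ufs] for path, ufs in mounts.items()]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run by hand.
    for argv in mount_commands(MOUNTS):
        print(" ".join(argv))
```

Keeping the commands as argument lists makes it easy to review them first and hand them to `subprocess.run` later.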

Data Movement via Alluxio (HDFS to Alluxio)
Map HDFS Tables to Alluxio File Location
Assuming the table already exists in HDFS and HDFS is mounted as the root under store, an internal table can be moved to Alluxio as follows (shown via Hive):
hive> alter table u_user set location
"alluxio://master_hostname:port/hdfs/hive/warehouse/u_user";
After the location is altered, the first read is served from HDFS; subsequent reads are served from data cached in Alluxio.
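
When many tables need the same change, the statements can be generated instead of typed. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the warehouse path `/hdfs/hive/warehouse` is taken from the statement above, and the table name `u_data` is an invented second table. It only produces the HiveQL text and does not connect to Hive.

```python
from typing import Iterable, List


def alluxio_location(master: str, port: int, alluxio_path: str) -> str:
    """Return an alluxio:// URI for a path in the Alluxio namespace."""
    return f"alluxio://{master}:{port}/{alluxio_path.lstrip('/')}"


def alter_location_statements(tables: Iterable[str], master: str, port: int,
                              warehouse: str = "/hdfs/hive/warehouse") -> List[str]:
    """Build one ALTER TABLE ... SET LOCATION statement per table name."""
    statements = []
    for table in tables:
        location = alluxio_location(master, port, f"{warehouse}/{table}")
        statements.append(f'ALTER TABLE {table} SET LOCATION "{location}";')
    return statements


if __name__ == "__main__":
    # 19998 is the default Alluxio master RPC port; "u_data" is hypothetical.
    for stmt in alter_location_statements(["u_user", "u_data"],
                                          "master_hostname", 19998):
        print(stmt)
```

The printed statements can be pasted into the Hive shell or saved to a script file for review.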

Data Movement via Alluxio (Alluxio to S3)
Create a table against an Alluxio file location backed by S3
hive> CREATE EXTERNAL TABLE u_user (
userid INT, age INT, gender CHAR(1), occupation STRING,
zipcode STRING) ROW FORMAT DELIMITED FIELDS
TERMINATED BY '|' LOCATION
'alluxio://master_hostname:port/s3/u_user';

Data Movement via Alluxio (HDFS to S3)
The Alluxio cp command copies a file or directory within the Alluxio file system, or between the local file system and the Alluxio file system.
Because each under store is mounted into the Alluxio namespace, cp can also copy files between under storage systems.
$ ./bin/alluxio fs cp -R /hdfs/hive/warehouse/u_user /s3/u_user
The command above copies the files belonging to the u_user table on HDFS to the location of the table created against S3. The -R flag is included because the source is a directory.
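
The copy step can be wrapped so that each table's command is reviewed before anything moves. The snippet below is a minimal sketch, not a migration tool: the function names are hypothetical, the source and destination layouts are assumed to follow the `/hdfs/hive/warehouse/<table>` and `/s3/<table>` paths used above, and the `-R` (recursive) flag is included on the assumption that each table location is a directory. It defaults to a dry run that only prints the command.

```python
import subprocess
from typing import List


def cp_command(src: str, dst: str, recursive: bool = True,
               alluxio_bin: str = "./bin/alluxio") -> List[str]:
    """Build the argv for `alluxio fs cp [-R] <src> <dst>`."""
    argv = [alluxio_bin, "fs", "cp"]
    if recursive:
        argv.append("-R")  # copy a directory and its contents
    argv += [src, dst]
    return argv


def copy_table(table: str, dry_run: bool = True) -> List[str]:
    """Copy one table directory from the HDFS mount to the S3 mount."""
    argv = cp_command(f"/hdfs/hive/warehouse/{table}", f"/s3/{table}")
    if dry_run:
        print(" ".join(argv))  # show the command without running it
    else:
        subprocess.run(argv, check=True)  # raises if the copy fails
    return argv


if __name__ == "__main__":
    copy_table("u_user")
```

Running with `dry_run=False` requires an Alluxio installation in the working directory; `check=True` makes a failed copy raise an exception rather than pass silently.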

Data Movement Options via Alluxio
Through Alluxio you can seamlessly move data off HDFS and into the object store
Alluxio gives end users continued access to the data without waiting for the data migration to complete
Alluxio can also apply policies to move data on demand (a 2.0 Enterprise Edition feature)

Alluxio Reference Architecture
[Diagram components:
• Applications, each with an Alluxio Client
• Alluxio Master and Standby Master (Zookeeper / RAFT)
• Alluxio Workers (RAM / SSD / HDD)
• WAN
• Under Store 1, Under Store 2]

Data Elasticity with a Global Namespace
[Diagram labels: hdfs://host:port/directory/, Reports, Sales]

Interacting with data in Alluxio – flexible app patterns
Reading Data
• From under store
• From a co-located Alluxio node
• From a different Alluxio node
Writing Data
• Write only to Alluxio
• Write only to Under Store
• Write synchronously to Alluxio and Under Store
• Write to Alluxio and asynchronously write to Under Store
• Write to Alluxio and replicate to N other workers
• Write to Alluxio and async write to multiple Under Stores
Applications have great flexibility to read and write data with many options
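
Several of the write patterns above appear to correspond to Alluxio write types, which a client can select through the `alluxio.user.file.writetype.default` property. The mapping below is a minimal sketch based on that reading, not something stated on the slide: the pattern keys are hypothetical labels, and the replication and multiple-under-store patterns are left out because they involve additional settings.

```python
# Hypothetical pattern labels mapped to Alluxio write type names.
WRITE_TYPES = {
    "alluxio_only": "MUST_CACHE",          # write only to Alluxio
    "under_store_only": "THROUGH",         # write only to the under store
    "sync_both": "CACHE_THROUGH",          # write synchronously to both
    "async_under_store": "ASYNC_THROUGH",  # write to Alluxio, persist asynchronously
}


def write_type_property(pattern: str) -> str:
    """Return the client property line that selects the given write pattern."""
    try:
        write_type = WRITE_TYPES[pattern]
    except KeyError:
        raise ValueError(f"unknown write pattern: {pattern!r}") from None
    return f"alluxio.user.file.writetype.default={write_type}"


if __name__ == "__main__":
    for name in WRITE_TYPES:
        print(write_type_property(name))
```

The resulting line can be placed in the client's site properties or passed as a JVM system property, depending on how the application is configured.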

Enterprises moving towards independent compute & storage

Incredible Open Source Momentum with growing community
• 1000+ contributors & growing
• 4000+ GitHub stars
• Apache 2.0 Licensed
• Hundreds of thousands of downloads
Join the conversation on Slack: alluxio.io/slack

Questions?
Join the Alluxio Community
www.alluxio.org | www.alluxio.com | @alluxio
