Spark China Summit 2015 Guancheng Chen

SuperVessel: Enabling Spark as a Service with OpenStack and Docker
Guan Cheng (G.C.) Chen
IBM Research - China
weibo.com/parallellabs

SuperVessel Cloud
4/18/15 IBM Research - China
•  Public cloud built on the POWER7/POWER8 servers with OpenStack
•  It provides free access for students, researchers, and developers across the world, and helps grow the OpenPOWER ecosystem (used in 30+ universities now)
•  It provides advanced technology services such as Spark as a Service, Docker Services, Cognitive Computing Service, IoT Service, and Accelerator as a Service (FPGA and GPU)

Spark as a Service
Step 1: Login
Step 2: Create
Step 3: Ready!
3 steps to launch a Spark cluster, easy!
Try it at: www.ptopenlab.com

Why OpenStack?
•  Most popular IaaS software
•  Supports Docker*
•  Heat can orchestrate Docker containers easily, which is good for provisioning a Spark cluster

Picture source: http://www.qyjohn.net/?p=3801

Why Docker?
•  Less resource consumption than KVM
–  We can provision more Spark clusters!
•  Boot faster than KVM
–  Users like fast provisioning!
•  Incrementally build, revert and reuse your container
–  We love Git and AUFS!
•  However, Docker is not officially supported in OpenStack yet
–  Nova Docker is an external component of OpenStack
•  Port Docker to POWER Architecture (ppc64 and ppc64le)
–  Ubuntu 15.04 includes Docker for POWER8 ppc64le
Picture source: http://www.zdnet.com/article/what-is-docker-and-why-is-it-so-darn-popular/

Why Spark?
•  Fast
•  Unified
•  Ecosystem

Porting to POWER
–  Bugfix submitted to the community
–  Spark 1.3 works smoothly! :)


Why not Sahara?
Sahara is a component for Hadoop/Spark as a Service in OpenStack
•  We started from OpenStack Icehouse …
•  Dockerization
– Better service deployment and isolation for the Big Data dashboard server
•  Customization
– Waiting for Sahara's improvements is somewhat *SLOW*
•  Docker, user analytics, Spark 1.4, Spark IDE, scheduling, data visualization etc.

Architecture Design
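[Figure: architecture diagram. Components shown: Big Data Dashboard with Billing&Auth; OpenStack services Keystone, Glance, Neutron, Heat, Nova, Nova Docker, and Cinder/Manila; a Spark cluster built from one Docker image and spread over Container 1, Container 2 and Container 3, with the labels Spark Master, Spark Worker, Spark Worker, NameNode, Spark Driver, DataNode, DataNode; numbered steps 1, 2, 3.]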

Dockerize everything!
•  We use containers to
– run applications
– run other containers
– run OpenStack python daemons
– run OpenStack services inside OpenStack (with a special trunk link in neutron/OVS)
– run OpenStack inside OpenStack
•  Good for multi-site expansion


Heat template design
Heat is a component for orchestration in OpenStack
•  Parameters
– Cinder/Manila/Neutron UUID
– Size of Cinder/Manila resources
•  Resources
– Master/slave node
– Neutron
– Cinder/Manila
•  Need to modify nova-docker to mount the Cinder/Manila resources when booting the Docker container
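
The parameter and resource groups above could map onto a Heat Orchestration Template (HOT) roughly as follows. This is a minimal sketch under stated assumptions, not the SuperVessel template: the parameter names (`image`, `flavor`, `network_id`, `volume_size_gb`), their defaults, the `spark_role` metadata key, and the choice of `OS::Nova::Server` and `OS::Cinder::Volume` as resource types are all illustrative. The code only builds the template as a Python dict and prints it as JSON; the nova-docker mount change mentioned above is not modeled.

```python
import json


def build_spark_template(worker_count):
    """Return a HOT-style template dict with one master and N workers."""

    def node(name, role):
        # One server resource per Spark node; every node uses the same image.
        return {
            "type": "OS::Nova::Server",
            "properties": {
                "name": name,
                "image": {"get_param": "image"},
                "flavor": {"get_param": "flavor"},
                "networks": [{"network": {"get_param": "network_id"}}],
                # Hypothetical metadata key used here to tell the roles apart.
                "metadata": {"spark_role": role},
            },
        }

    resources = {
        "data_volume": {
            "type": "OS::Cinder::Volume",
            "properties": {"size": {"get_param": "volume_size_gb"}},
        },
        "spark_master": node("spark-master", "master"),
    }
    for i in range(worker_count):
        resources[f"spark_worker_{i}"] = node(f"spark-worker-{i}", "worker")

    return {
        "heat_template_version": "2013-05-23",
        "parameters": {
            "image": {"type": "string", "default": "spark-docker"},
            "flavor": {"type": "string", "default": "m1.medium"},
            "network_id": {"type": "string"},
            "volume_size_gb": {"type": "number", "default": 10},
        },
        "resources": resources,
    }


if __name__ == "__main__":
    print(json.dumps(build_spark_template(worker_count=2), indent=2))
```

Generating the worker resources in a loop keeps the cluster size a single input, which matches the idea of the user choosing a cluster size in the dashboard.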


Spark Docker Image
•  Built from Ubuntu 14.04.1
•  All Spark nodes use the same image, with different initialization scripts by using cloud-init
•  Initialization scripts will
– Sync /etc/hosts across all nodes
– Set HDFS and Spark configurations accordingly
– Format HDFS and launch HDFS
– Launch Spark
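
The initialization steps above might be generated per node along these lines. This is a sketch, not the actual SuperVessel scripts: the function names, the host names, and the exact commands are assumptions (a Hadoop 2.x layout with `hdfs` and `hadoop-daemon.sh` on the PATH, and a Spark standalone install under `$SPARK_HOME` with the master on port 7077). The step that writes the HDFS and Spark configuration files is left out.

```python
def render_hosts(nodes):
    """Render /etc/hosts content from a {hostname: ip} mapping."""
    lines = ["127.0.0.1 localhost"]
    lines += [f"{ip} {host}" for host, ip in sorted(nodes.items())]
    return "\n".join(lines) + "\n"


def init_steps(role, master_host):
    """Return the ordered shell commands for one node, chosen by role."""
    if role == "master":
        return [
            "hdfs namenode -format",            # format HDFS once, on the master
            "hadoop-daemon.sh start namenode",  # launch HDFS (NameNode)
            "$SPARK_HOME/sbin/start-master.sh", # launch the Spark master
        ]
    if role == "worker":
        return [
            "hadoop-daemon.sh start datanode",  # launch HDFS (DataNode)
            "$SPARK_HOME/bin/spark-class org.apache.spark.deploy.worker.Worker "
            f"spark://{master_host}:7077",      # join the Spark master
        ]
    raise ValueError(f"unknown role: {role}")


if __name__ == "__main__":
    cluster = {"spark-master": "10.0.0.2", "spark-worker-0": "10.0.0.3"}
    print(render_hosts(cluster))
    for command in init_steps("worker", "spark-master"):
        print(command)
```

Because every node boots from the same image, only the role passed to the script differs between the master and the workers.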

Big Data Dashboard Development
•  2 Developers (front end + backend)
– Online in 2 months
– Separates Heat related stuff and Dashboard
– RESTful API (for billing and authentication etc.)
– Dockerize the big data dashboard server
– Separates development and production environment

Where should I put the data?
Shared File System for Cloud and Spark as a Service
[Figure: shared file system architecture. Labels shown: Horizon; OpenStack controller with HEAT, Neutron, Glance, Manila and Nova; Cloud Infrastructure Service; HEAT template for big data cluster; several Docker (Symphony) instances, one Docker (SPARK) instance and KVM/Docker (Web app) instances on POWER7/POWER8; Folder A and Folder B attached to instances of User A and User B.]
Big Data Service
•  Select Big data computing framework (MapReduce, SPARK)
•  Select cluster size
•  Select data folder size
•  HEAT will orchestrate Docker instances, subnet and data folder based on the user's request
•  Manila provides the NFS service using GPFS as backend, and the folder will be mounted via nova-docker (with -v support)
•  A folder created by Manila can be accessed by the KVM/Docker instances created for big data and other purposes
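
The `-v` mount mentioned above corresponds to Docker's bind-mount flag, `docker run -v <host path>:<container path>`. The sketch below only shows what such a command line could look like once the Manila-backed folder is available on the host. The function name, the image and container names, and the paths `/mnt/manila/user_a` and `/data` are hypothetical, and the way the modified nova-docker driver passes the mount is not shown in the slides.

```python
import shlex


def docker_run_args(image, name, host_folder, container_folder="/data"):
    """Build a `docker run` argv that bind-mounts one host folder."""
    return [
        "docker", "run", "-d",
        "--name", name,
        "-v", f"{host_folder}:{container_folder}",  # host path : container path
        image,
    ]


if __name__ == "__main__":
    args = docker_run_args("spark-docker", "spark-master", "/mnt/manila/user_a")
    # Print a shell-safe version of the command instead of running it.
    print(" ".join(shlex.quote(a) for a in args))
```

Mounting the same host folder into several containers is what would let a Spark cluster and other instances of one user see the same data.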

[Figure labels, continued: GPFS FPO on POWER7/POWER8 servers (three GPFS FPO server nodes shown); KeyStone; Cinder.]

SuperVessel Services Roadmap
SuperVessel Cloud Infrastructure: Storage, IBM POWER servers, OpenPOWER server, FPGA/GPU, Docker
SuperVessel Cloud Service
1.  VM and container service
2.  Storage service
3.  Network service
4.  Accelerator as service
5.  Image service
SuperVessel Big Data and HPC Service
1.  Big Data: MapReduce (Symphony), SPARK
2.  Performance tuning service
OpenPOWER Enablement Service
1.  X-to-P migration: AutoPort tool
2.  OpenPOWER new system test service
Super Class Service
1.  On-line video courses
2.  Teacher course management
3.  User contribution management
Super Project Team Service
1.  Project management service
2.  DevOps automation
Super Marketplace
Status labels on the slide: (Online), (Online), (Preparing), (Online)

Summary
•  Spark + OpenStack + Docker work very well on OpenPOWER servers
•  Dockerized services made DevOps easier
•  Docker issues
–  Zombie processes
–  Can't dynamically attach a volume
–  Commit and attach operations are not supported on Nova-docker
•  Monitoring everything (API, resources etc.)
–  key for operating a cloud
•  TODO: Spark 1.4, Spark IDE, SWIFT, Data Visualization
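
The slides do not say how the zombie processes listed under Docker issues arose. One common cause in containers is that the process running as PID 1 never waits on exited children. Assuming that is the situation, a minimal sketch of the usual workaround, a non-blocking reaper that the top-level process calls periodically, could look like this (the function name `reap_children` is hypothetical):

```python
import os


def reap_children():
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped
```

A long-running entry point could call `reap_children()` from a `SIGCHLD` handler or on a timer; either way the exited children are waited on, so they do not linger as zombies.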

Join Us!
www.ptopenlab.com
QQ group: SuperVessel
SuperVessel WeChat group
@冠诚
chengc@cn.ibm.com
