1. SuperVessel: Enabling Spark as a
Service with OpenStack and Docker
Guan Cheng (G.C.) Chen
IBM Research - China
weibo.com/parallellabs
2. SuperVessel Cloud
4/18/15 IBM
Research
-‐
China 2
• Public
cloud
built
on
the
POWER7/POWER8
servers
with
OpenStack
• It
provides
free
access
for
students,
researchers,
developers
across
the
world,
and
helps
grow
OpenPOWER
ecosystem
(used
in
30+
universiNes
now)
• It
provides
advanced
technology
services
such
as
Spark
as
a
Service,
Docker
Services,
CogniNve
CompuNng
Service,
IoT
Service,
Accelerator
as
a
Service
(FPGA
and
GPU)
3. Spark as a Service
4/18/15 IBM
Research
-‐
China 3
Step
1:
Login Step
2:
Create Step
3:
Ready!
3 steps to launch a Spark cluster, easy!
Try
it
at:
www.ptopenlab.com
4. Why OpenStack?
4/18/15 IBM
Research
-‐
China 4
• Most
popular
IaaS
soXware
• Supports
Docker*
• Heat
can
orchestra
docker
containers
easily
–
good
for
provision
a
Spark
cluster
Picture
source:
h-p://www.qyjohn.net/?p=3801
5. Why Docker?
• Less
resource
consumpNon
than
KVM
– We
can
provision
more
Spark
clusters!
• Boot
faster
than
KVM
– Users
like
fast
provision!
• Incrementally
build,
revert
and
reuse
your
container
– We
love
Git
and
AUFS!
• However,
Docker
is
not
officially
supported
in
OpenStack
yet
– Nova
Docker
is
an
external
component
of
OpenStack
• Port
Docker
to
POWER
Architecture
(ppc64
and
ppc64le)
– Ubuntu
15.04
includes
Docker
for
POWER8
ppc64le
4/18/15 IBM
Research
-‐
China 5
Picture
source:
h-p://www.zdnet.com/ar@cle/what-‐is-‐docker-‐and-‐why-‐is-‐it-‐so-‐
darn-‐popular/
6. Why Spark?
• Fast
• Unified
• Ecosystem
PorNng
to
POWER
– Bugfix
submieed
to
the
community
– Spark
1.3
works
smoothly!
J
4/18/15 IBM
Research
-‐
China 6
7. Why not Sahara?
Sahara is a component for Hadoop/Spark as a Service in OpenStack
• We
started
from
OpenStack
Icehouse
…
• DockerizaNon
– Beeer
service
deployment
and
isolaNon
for
Big
Data
dashboard
server
• CustomizaNon
– WaiNng
for
Sahara’s
improvements
is
somehow
*SLOW*
• Docker,
user
analyNcs,
Spark
1.4,
Spark
IDE,
scheduling,
data
visualizaNon
etc.
4/18/15 IBM
Research
-‐
China 7
8. 4/18/15 IBM
Research
-‐
China 8
Architecture Design
Big Data
Dashboard
Keystone
Glance
Neutron
Heat Nova
Nova
Docker
Cinder/
Manila
Spark Cluster
Docker
Image
Spark Master
Spark Worker
Spark Worker
Container 1
Container 2
Container 3
NameNode
Spark Driver
DataNode
DataNode
Billing&Auth
1
2
2 3
9. Dockerize everything!
• We
use
containers
to
– run
applica<ons
– run
other
containers
– run
OpenStack
python
daemons
– run
OpenStack
services
inside
OpenStack
(with
a
special
trunk
link
in
neutron/OVS)
– run
OpenStack
inside
OpenStack
• Good
for
mulN-‐site
expansion
4/18/15 IBM
Research
-‐
China 9
10. Heat template design
Heat is a component for orchestration in OpenStack
• Parameters
– Cinder/Malina/Neutron
uuid
– Size
of
Cinder/Malina
resources
• Resources
– Master/slave
node
– Neutron
– Cinder/Manila
• Need
to
modify
nova-‐docker
to
mount
the
cinder/
manila
resources
when
booNng
the
docker
container
4/18/15 IBM
Research
-‐
China 10
11. Spark Docker Image
• Built
from
Ubuntu
14.04.1
• All
Spark
nodes
use
the
same
image,
with
different
iniNalizaNon
scripts
by
using
cloudinit
• IniNalizaNon
scripts
will
– Sync
/etc/hosts
across
all
nodes
– Set
HDFS
and
Spark
configuraNons
accordingly
– Format
HDFS
and
launch
HDFS
– Launch
Spark
4/18/15 IBM
Research
-‐
China 11
12. Big Data Dashboard Development
• 2
Developers
(frond
end
+
backend)
– Online
in
2
months
– Separates
Heat
related
stuff
and
Dashboard
– Reskul
API
(for
billing
and
authenNcaNon
etc)
– Dockerize
the
big
data
dashboard
server
– Separates
development
and
producNon
environment
4/18/15 IBM
Research
-‐
China 12
13. Where should I put the data?
Shared File System for Cloud and Spark as a Service
4/18/15 IBM
Research
-‐
China 13
Docker
(Symphon
y)
Horizon
OpenStack controller
HEAT
NeutronGlance Manila
Nova
Cloud Infrastructure
Service
Big Data Service
• Select
Big
data
compuNng
framework
(Mapreduce,
SPARK
• Select
cluster
size
• Select
data
folder
size
HEAT
template
for
big
data
cluster
Docker
(Symphon
y)
Docker
(Symphon
y)
Docker
(Symphon
y)
Docker
(Symphon
y)
Docker
(SPARK)
POWER7/POWER8
KVM/
Docker
(Web app)
Folder
A
User
BUser
A
Folder
B
User
A
• HEAT
will
orchestrate
docker
instances,
subnet
and
data
folder
based
on
user’s
request
• Manila
provides
the
NFS
service
using
GPFS
as
backend,
and
the
folder
will
be
mounted
via
nova-‐docker
(with
-‐v
support)
• Folder
created
by
Manila
could
be
accessed
by
the
KVM/docker
instances
created
for
big
data
and
other
purpose
GPFS
FPO
POWER7/POWER8
Servers
GPFS
FPO
Servers
GPFS
FPO
KeyStone
Cinder
14. SuperVessel Services Roadmap
4/18/15 IBM
Research
-‐
China 14
SuperVessel
Cloud
Infrastructure
SuperVessel
Cloud
Service
SuperVessel
Big
Data
and
HPC
Service
Super
Class
Service
OpenPOWER
Enablement
Service
Super
Project
Team
Service
Super
Marketplace
1. VM
and
container
service
2. Storage
service
3. Network
service
4. Accelerator
as
service
5. Image
service
1. Big
Data:
MapReduce
(Symphony),
SPARK
2. Performance
tuning
service
1. X-‐to-‐P
migraNon:
AutoPort
tool
2. OpenPOWER
new
system
test
service
1. On-‐line
video
courses
2. Teacher
course
management
3. User
contribuNon
management
1. Project
management
service
2. DevOps
automaNon
Storage IBM
POWER
servers OpenPOWER
server FPGA/GPU
Docker
(Online)
(Online) (Preparing)
(Online)
15. Summary
• Spark
+
OpenStack
+
Docker
works
very
well
on
OpenPOWER
servers
• Dockerized
services
made
DevOps
easier
• Docker
issues
– Zombie
process
– Can’t
dynamically
aeach
volume
– Commit,
aeach
operaNons
is
not
supported
on
Nova-‐docker
• Monitoring
everything
(API,
resources
etc)
– key
for
operaNng
a
cloud
• TODO:
Spark
1.4,
Spark
IDE,
SWIFT,
Data
VisualizaNon
4/18/15 IBM
Research
-‐
China 15
16. 4/18/15 IBM
Research
-‐
China 16
Join Us!
www.ptopenlab.com
QQ
group:
SuperVessel
SuperVessel
WeChat
group
@冠诚
chengc@cn.ibm.com