Big Data Technologies
Experimenting with OpenStack Sahara on Docker
Weiting Chen
weiting.chen@intel.com
BIG DATA TECHNOLOGY
Legal Disclaimers
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this
document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of
merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from
course of performance, course of dealing, or usage in trade.
This document contains information on products, services and/or processes in development. All information
provided here is subject to change without notice. Contact your Intel representative to obtain the latest
forecast, schedule, specifications and roadmaps.
The products and services described may contain defects or errors known as errata which may cause
deviations from published specifications. Current characterized errata are available on request.
© 2015 Intel Corporation.
Agenda
Background
How to use Docker with Sahara
Performance Testing
Conclusion
Who We Are
We are from the Intel Big Data Technology Group.
We push big data technology forward into OpenStack.
We contribute Sahara source code to OpenStack and
brought the Cloudera CDH 5.3 plugin to Kilo.
Sahara Background
• Sahara became an integrated OpenStack project in Juno
• Brings Hadoop into OpenStack
• More features added in the Kilo release
• Two Key Features
1. Let users provision Hadoop clusters easily by specifying a few parameters
2. Analytics as a Service for data scientists and analysts
Sahara Key Features - Provision Cluster
Create/Terminate Cluster
• Heat API/Nova Direct API
• Integrate with Neutron/Nova Network
• Use Guide as a template
• Anti-affinity
Cluster Scaling
• Add Node/Remove Node
Support More Plugins in Kilo
• Vanilla/Hortonworks Data Platform/Cloudera/Spark/MapR/Storm
Sahara Key Features - Elastic Data Processing
Supported Job Types
• Hive/Pig/MapReduce/MapReduce Streaming/Java/Spark/Shell/HBase
Supported Data Locality
• Rack/Hypervisor/Swift
Data Source
• Internal: Internal HDFS (Ephemeral Disk/Cinder)
• External: Swift/HDFS
Run Job in Transient Cluster
*Different plugins provide different capabilities
Sahara Working Flow
Fast Cluster Provisioning
Select Hadoop Version → Select Base Image w/ Hadoop → Define Cluster Configuration → Provision Cluster → Operate Cluster → Terminate Cluster
• Provide the details of the Hadoop configuration, like size, topology, and others
• Sahara will provision VMs, then install and configure Hadoop
• Supports scaling the cluster by adding/removing nodes
Analytics as a Service using Elastic Data Processing
Select Hadoop Version → Configure Jobs → Set Limit for Cluster → Execute Jobs → Get The Result
• Choose the type of job: pig, hive, jar-file, etc.
• Select input and output data locations (Swift support)
• The cluster is removed automatically after job completion
Sahara Data Processing
Pattern 1: Internal - HDFS Only
OpenStack supports creating HDFS on Cinder or ephemeral disk. Ephemeral disk provides better
data processing performance; Cinder persists the data, with lower performance.
Pattern 2: External - Swift
OpenStack uses Swift as a data source to store input and output data. The benefit is that the data
is processed directly and persisted via Swift.
[Figure labels: Collector Agent, Collecting Data, Data Stream, HDFS, Cinder, Ephemeral Disk, Swift, MapReduce, OpenStack Virtual Clusters]
Docker Background
• An open source project
• The latest version is v1.6 (as of April 2015)
• Automates the deployment of applications inside software containers
• Provides speed and application portability
• Uses the libcontainer library to access virtualization facilities of the Linux kernel
• Resource isolation using cgroups, kernel namespaces, etc.
Sahara + Docker
• Deliver Better Performance (compared with hypervisors)
• Optimize Resource Utilization
• Reduce Cost
• Fast Deployment
Sahara Architecture
[Architecture diagram. Components shown: Horizon (Sahara Pages), Python Sahara Client, Sahara REST API, Auth (Keystone), DAL, Image Registry, Provisioning Engine, Vendor Plugins, EDP, Hadoop VMs, Nova | Heat | Cinder, Glance, Neutron]
Sahara + Docker Architecture
[Architecture diagram. Components shown: Horizon (Sahara Pages), Python Sahara Client, Sahara REST API, Auth (Keystone), DAL, Image Registry, Provisioning Engine, Vendor Plugins, EDP, Hadoop VMs, Nova | Heat | Cinder, Glance, nova docker driver, Docker Registry, Docker Image, Docker, Neutron]
Sahara CDH Plugin
Step 1: Create VMs via Heat by using a Cluster Template. CM (Cloudera Manager) must be included on one master machine.
Step 2: Use the CM API client to connect to CM and provision the other services in the cluster.
[Diagram. Controller: Horizon (OpenStack Dashboard), Sahara Service, CDH Plugin, Cloudera Manager API Python Client (Migrate from CM-API Client). Computing Node1: CDH Cluster with VM1 - Master and VM2 - Slave. Services shown: Cloudera Manager (Cloudera Express v5.1.3, CDH v5.0.0 & CM API v7), Job History, Resource Manager, Oozie Server, Name Node, Secondary Name Node, Data Node, Node Manager. Arrows labeled STEP1, STEP2, STEP3; End Customer]
Nova Docker Driver
Introduced in Havana; moved out of the Nova tree for Icehouse and Juno
For Juno,
• Must install an older version of novadocker
# git checkout -b pre-i18n 9045ca43b645e72751099491bf5f4f9e4bddbb91
• Implements a RESTful client via httplib to communicate with Docker
For Kilo (upstream),
• Need to install docker-py
• Uses the Docker API client to communicate with Docker
Authentication & Hostname Issues
Use a username and password instead of injecting an authorized key into the instance
• There is no cloud-init in the Docker image, so key injection is not available
Upgrade Docker to support changing the hostname
• Docker v1.2 or later supports changing the hostname
Change “sudo mv etc-host /etc/hosts” to “sudo cp etc-host /etc/hosts”
• Docker v1.3 responds that the device is busy when “mv” is used. Replacing “mv” with “cp” lets
the change succeed
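The difference between “cp” and “mv” can be illustrated outside Docker. This is a minimal sketch, assuming the busy error arises because Docker bind-mounts /etc/hosts into the container, so the file can be rewritten in place but not replaced by a rename. The temporary files below are hypothetical stand-ins for etc-host and /etc/hosts.

```shell
#!/bin/sh
# cp rewrites an existing target in place (same inode);
# mv replaces the target through a rename (different inode).
workdir=$(mktemp -d)
printf '127.0.0.1 localhost\n' > "$workdir/hosts"      # stands in for /etc/hosts
printf '10.0.0.5 master-001\n' > "$workdir/etc-host"   # stands in for the generated file

inode_before=$(ls -i "$workdir/hosts" | awk '{print $1}')

cp "$workdir/etc-host" "$workdir/hosts"
inode_after_cp=$(ls -i "$workdir/hosts" | awk '{print $1}')

mv "$workdir/etc-host" "$workdir/hosts"
inode_after_mv=$(ls -i "$workdir/hosts" | awk '{print $1}')

echo "before=$inode_before after_cp=$inode_after_cp after_mv=$inode_after_mv"
rm -rf "$workdir"
```

The inode is unchanged after cp and different after mv. A rename has to swap out the target file itself, which is the operation a bind-mounted file refuses.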
Network Port Issue
Enable privileged mode to expose all the ports in the container
• Modify the nova docker driver source code to add “privileged=True” and publish
all ports
Docker Image
Build a Docker image by using a Dockerfile
• Refer to sahara-image-elements to build a CDH5 Docker image
• Building a Docker image may take a lot of time (trial and error)
• Use the Dockerfile build cache to reduce image build time
Copy the Docker image to every compute node manually
• The image must be copied to all compute nodes; Glance currently cannot copy
the image to the compute nodes
• If the image is not listed in docker images, nova raises an error when starting an
instance
Build Docker Image - using Dockerfile
Use docker build to build the image from a Dockerfile
# docker build -t $image_name:$tag .
Dockerfile Example
FROM centos:centos6
MAINTAINER Weiting Chen weiting.chen@intel.com
# Add ENV variables at the beginning
ENV http_proxy http://xxx:1080
…
# 1. Add proxy settings to the individual software configuration
RUN echo 'proxy=http://xxx:1080' >> /etc/yum.conf
# 2. Install required software
RUN yum install -y cloudera-manager-agent …
…
# Expose the required service ports
EXPOSE 21
…
Register & Copy Docker Image to Compute Nodes
Register the Docker image in Glance
# docker save cdh5:20150425 | glance image-create --is-public=True --container-format=docker --disk-format=raw --name cdh5:20150425
Save the image to a tar file and copy it to all compute nodes
# docker save -o cdh5_20150425.tar cdh5:20150425
# scp cdh5_20150425.tar $compute_node:./
Load the image into the local Docker image store on each compute node
# docker load -i cdh5_20150425.tar
If the image is not available on a compute node, nova raises an error.
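The manual copy step can be scripted. This is a minimal sketch, assuming passwordless SSH to every compute node and docker on the remote PATH. The function name distribute_image and the node names in the usage line are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: save a local Docker image once, then copy it to
# each compute node and load it into that node's local image store.
# Usage: distribute_image IMAGE[:TAG] NODE...
distribute_image() {
    image=$1
    shift
    # Replace ':' and '/' so scp does not read the file name as host:path.
    tarball="$(echo "$image" | tr ':/' '__').tar"
    docker save -o "$tarball" "$image" || return 1
    for node in "$@"; do
        scp "$tarball" "$node:./" || return 1
        ssh "$node" docker load -i "$tarball" || return 1
    done
}

# Example (not executed here): distribute_image cdh5:20150425 compute1 compute2
```

The function stops at the first failing node, so a partial rollout is visible immediately instead of surfacing later as a nova error at instance start.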
Nova Docker Driver Network
• Set the Docker network mode to “none”
• The nova docker driver leverages the existing network configuration from Neutron
• Supports Linux Bridge or OVS
• Does NOT use docker0
• We used VXLAN in our experiment
• A bridge to OVS is created automatically
• Set privileged mode to True for convenience
• Port mappings must be set during docker run if privileged mode is not used
Docker Network
Bridge Mode
• Default mode
• Supports multiple namespaces
[Figure: Host1 running Docker with Container1, Container2, Container3; each container has eth0 (172.17.42.10, 172.17.42.11, 172.17.42.12); docker0 172.17.42.1]
Host Mode
• Only one namespace
[Figure: Host1 running Docker with Container1, Container2, Container3; eth0 192.168.0.1, eth1 10.10.10.1; docker0 172.17.42.1]
None Mode
• The Nova Docker Driver uses this mode
• The network is configured and connected to the bridge via the driver
[Figure: Host1 running Docker with Container1, Container2, Container3; docker0 172.17.42.1]
Docker Network Performance
BACKGROUND
• OpenStack Juno using VXLAN
• Docker v1.3
• 1Gb Ethernet
Measured throughput
• Host to Host: 941 Mb
• Container to different Host: 941 Mb
• Container to Container in different Hosts: 900 Mb
• Container to the same Host: 941 Mb
• Container to Container in the same Host: 14 Gb
[Figure labels: phy. network, br-ex (floating ip), br-tun, qbr~, 14Gb w/ DVR]
Neutron VXLAN without DVR
[Network diagram. Controller/Network Node: br-ex (eth1), br-int, br-tun (eth2), patch-int, patch-tun, tap devices, namespaces qdhcp, qrouter~ (qr~, qg~), snat- (sg~). Compute Node: VM (vm0), tap~, qbr~, qvb~, qvo~, br-int, patch-tun, patch-int, br-tun (eth2). Networks: 172.16.0.0/16, 192.168.0.0/16, 10.0.0.0/16]
Neutron VXLAN with DVR
[Network diagram. Controller/Network Node: br-ex (eth1), br-int, br-tun (eth2), patch-int, patch-tun, tap devices, namespaces qdhcp, qrouter~ (qr~, qg~), snat- (sg~). Compute Node: VM (vm0), tap~, qbr~, qvb~, qvo~, br-int, patch-tun, patch-int, br-tun (eth2), br-ex (eth1), namespaces qrouter~ (qr~, rfp~) and fip- (fpr~, fg~). Networks: 172.16.0.0/16, 192.168.0.0/16, 10.0.0.0/16]
Change MTU Size
• Change the MTU size if you are using VXLAN
• Impact:
MTU size affects network performance. If the MTU size is not
changed, instances can still be created, but network performance
drops to 1MB.
• Solution:
Change the MTU size in the VM
# sudo ifconfig eth1 mtu 1400 up
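The arithmetic behind the MTU change can be sketched as follows, assuming VXLAN over an IPv4 underlay with no extra VLAN tags or IP options. The function name vxlan_instance_mtu is hypothetical.

```shell
#!/bin/sh
# VXLAN over IPv4 adds 50 bytes to every packet on the underlay:
# inner Ethernet header (14) + VXLAN header (8) + outer UDP (8) + outer IPv4 (20).
VXLAN_OVERHEAD=50

# Print the largest instance MTU that fits the given physical (underlay) MTU.
vxlan_instance_mtu() {
    echo $(( $1 - VXLAN_OVERHEAD ))
}

vxlan_instance_mtu 1500   # prints 1450
```

With a 1500-byte physical MTU the instance MTU must not exceed 1450, so the value of 1400 used above leaves some additional margin.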
Container Disk Space
• By default, a container gets only 10 GB of disk space
• Impact:
The default HDFS configuration reserves 10 GB, so a 10 GB container has no space
left to put data in HDFS
• Solution:
Pass storage parameters when starting the Docker service
# sudo ./docker -d --storage-opt dm.basesize=20G --storage-opt dm.loopdatasize=200G &
*For the parameters to take effect, clean up /var/lib/docker/ and restart Docker
vCPU Numbers
The number of vCPUs reported is always 1.
• Impact:
vCPU calculations may fail.
• Solution:
In Juno, change the number in the nova docker driver source code and set it
equal to the number of physical cores.
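One way to obtain the physical core count for this setting is sketched below. It assumes a Linux host whose /proc/cpuinfo exposes "physical id" and "core id" fields (typical on x86; some virtualized hosts omit them). The function name count_physical_cores is hypothetical.

```shell
#!/bin/sh
# Count distinct (physical id, core id) pairs in cpuinfo-format text on stdin.
# Hyper-threaded siblings share a pair, so they are counted once.
count_physical_cores() {
    awk -F': *' '
        /^physical id/ { pkg = $2 }
        /^core id/     { cores[pkg "/" $2] = 1 }
        END { n = 0; for (k in cores) n++; print n }
    '
}

# Usage on a compute node: count_physical_cores < /proc/cpuinfo
```

Counting unique pairs rather than "processor" entries avoids reporting logical CPUs when hyper-threading is enabled.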
Docker in OpenStack Performance
Network Performance
Instance Boot/Cluster Provision
Disk Performance using DD
HiBench Testing
Our Testing Environment
CLUSTER CONFIGURATION
Role | Details
Controller w/ Compute x 1 | Controller, Network, Compute
Compute x 5 | Compute
HARDWARE CONFIGURATION
Items | Details
CPU | Intel Xeon X5670 2.93GHz
Memory | 64GB (1333MHz 8GB x 8)
Storage | 1TB SATA HDD
SOFTWARE CONFIGURATION
Software Name | Versions
CentOS | 7.0
Docker | v1.6
OpenStack | Juno
Create an instance/Provision a cluster
Assume the image has been copied to all the compute nodes. Create an instance
and check the log to capture the response time.
• Docker: 1 sec
• KVM: 10 sec
Provisioning a CDH cluster still takes a long time; this issue comes from the Sahara CDH
plugin.
DD Test
The Docker container uses CentOS 6.6 on a host running CentOS 7.
The file system is XFS.
dd command: dd if=/dev/zero of=test1 bs=1M count=8192 conv=fdatasync
Host: 140~160MB/s
Host w/ OpenStack: 100~130MB/s (Controller), 140~160MB/s (Compute)
Container: 100~140MB/s
Docker provides disk I/O performance close to bare metal
Conclusion
• Docker is beneficial for booting instances in bulk
• Docker provides good disk and network performance with little
overhead
• Optimizing resource utilization will be the next focus
Call-For-Action
• Contribute more to Docker and OpenStack
• Find the critical components for big data in the cloud and improve them
• More customer use cases are needed for Sahara
Contact: weiting.chen@intel.com
20150425 experimenting with openstack sahara on docker

More Related Content

What's hot

Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using El...
Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using El...Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using El...
Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using El...Sergey Lukjanov
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersDataWorks Summit/Hadoop Summit
 
OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Sahara
OpenStack Trove Day (19 Aug 2014, Cambridge MA)  - SaharaOpenStack Trove Day (19 Aug 2014, Cambridge MA)  - Sahara
OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Saharaspinningmatt
 
Hong Kong OpenStack Summit: Savanna - Hadoop on OpenStack
Hong Kong OpenStack Summit: Savanna - Hadoop on OpenStackHong Kong OpenStack Summit: Savanna - Hadoop on OpenStack
Hong Kong OpenStack Summit: Savanna - Hadoop on OpenStackSergey Lukjanov
 
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Cloudera, Inc.
 
Savanna - Elastic Hadoop on OpenStack
Savanna - Elastic Hadoop on OpenStackSavanna - Elastic Hadoop on OpenStack
Savanna - Elastic Hadoop on OpenStackSergey Lukjanov
 
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And CloudYARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And CloudDataWorks Summit
 
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Hortonworks
 
Hive on spark berlin buzzwords
Hive on spark berlin buzzwordsHive on spark berlin buzzwords
Hive on spark berlin buzzwordsSzehon Ho
 
MySQL Applier for Apache Hadoop: Real-Time Event Streaming to HDFS
MySQL Applier for Apache Hadoop: Real-Time Event Streaming to HDFSMySQL Applier for Apache Hadoop: Real-Time Event Streaming to HDFS
MySQL Applier for Apache Hadoop: Real-Time Event Streaming to HDFSMats Kindahl
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...DataWorks Summit/Hadoop Summit
 
Data Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopData Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopGwen (Chen) Shapira
 
Low latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduLow latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduDataWorks Summit
 
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...Lessons Learned from Building an Enterprise Big Data Platform from the Ground...
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...DataWorks Summit
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016alanfgates
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...DataWorks Summit/Hadoop Summit
 
GCP Meetup #3 - Approaches to Cloud Native Architectures
GCP Meetup #3 - Approaches to Cloud Native ArchitecturesGCP Meetup #3 - Approaches to Cloud Native Architectures
GCP Meetup #3 - Approaches to Cloud Native Architecturesnine
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 

What's hot (20)

Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using El...
Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using El...Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using El...
Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using El...
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Sahara
OpenStack Trove Day (19 Aug 2014, Cambridge MA)  - SaharaOpenStack Trove Day (19 Aug 2014, Cambridge MA)  - Sahara
OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Sahara
 
Hong Kong OpenStack Summit: Savanna - Hadoop on OpenStack
Hong Kong OpenStack Summit: Savanna - Hadoop on OpenStackHong Kong OpenStack Summit: Savanna - Hadoop on OpenStack
Hong Kong OpenStack Summit: Savanna - Hadoop on OpenStack
 
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
 
Savanna - Elastic Hadoop on OpenStack
Savanna - Elastic Hadoop on OpenStackSavanna - Elastic Hadoop on OpenStack
Savanna - Elastic Hadoop on OpenStack
 
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And CloudYARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
 
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
 
Hive on spark berlin buzzwords
Hive on spark berlin buzzwordsHive on spark berlin buzzwords
Hive on spark berlin buzzwords
 
MySQL Applier for Apache Hadoop: Real-Time Event Streaming to HDFS
MySQL Applier for Apache Hadoop: Real-Time Event Streaming to HDFSMySQL Applier for Apache Hadoop: Real-Time Event Streaming to HDFS
MySQL Applier for Apache Hadoop: Real-Time Event Streaming to HDFS
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
 
Data Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopData Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for Hadoop
 
Low latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduLow latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache Kudu
 
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...Lessons Learned from Building an Enterprise Big Data Platform from the Ground...
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
 
GCP Meetup #3 - Approaches to Cloud Native Architectures
GCP Meetup #3 - Approaches to Cloud Native ArchitecturesGCP Meetup #3 - Approaches to Cloud Native Architectures
GCP Meetup #3 - Approaches to Cloud Native Architectures
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 

Similar to 20150425 experimenting with openstack sahara on docker

ContainerDayVietnam2016: Dockerize a small business
ContainerDayVietnam2016: Dockerize a small businessContainerDayVietnam2016: Dockerize a small business
ContainerDayVietnam2016: Dockerize a small businessDocker-Hanoi
 
Use Docker to Deliver Cognitive Services Running Cross Platform and Multi Clo...
Use Docker to Deliver Cognitive Services Running Cross Platform and Multi Clo...Use Docker to Deliver Cognitive Services Running Cross Platform and Multi Clo...
Use Docker to Deliver Cognitive Services Running Cross Platform and Multi Clo...Docker, Inc.
 
Detailed Introduction To Docker
Detailed Introduction To DockerDetailed Introduction To Docker
Detailed Introduction To Dockernklmish
 
'DOCKER' & CLOUD: ENABLERS For DEVOPS
'DOCKER' & CLOUD:  ENABLERS For DEVOPS'DOCKER' & CLOUD:  ENABLERS For DEVOPS
'DOCKER' & CLOUD: ENABLERS For DEVOPSACA IT-Solutions
 
Docker and Cloud - Enables for DevOps - by ACA-IT
Docker and Cloud - Enables for DevOps - by ACA-ITDocker and Cloud - Enables for DevOps - by ACA-IT
Docker and Cloud - Enables for DevOps - by ACA-ITStijn Wijndaele
 
Cassandra and Docker Lessons Learned
Cassandra and Docker Lessons LearnedCassandra and Docker Lessons Learned
Cassandra and Docker Lessons LearnedDataStax Academy
 
Killer Docker Workflows for Development
Killer Docker Workflows for DevelopmentKiller Docker Workflows for Development
Killer Docker Workflows for DevelopmentChris Tankersley
 
Docker Introduction
Docker IntroductionDocker Introduction
Docker IntroductionPeng Xiao
 
What's New in Docker - February 2017
What's New in Docker - February 2017What's New in Docker - February 2017
What's New in Docker - February 2017Patrick Chanezon
 
WebSphere and Docker
WebSphere and DockerWebSphere and Docker
WebSphere and DockerDavid Currie
 
Docker Seattle Meetup April 2015 - The Docker Orchestration Ecosystem on Azure
Docker Seattle Meetup April 2015 - The Docker Orchestration Ecosystem on AzureDocker Seattle Meetup April 2015 - The Docker Orchestration Ecosystem on Azure
Docker Seattle Meetup April 2015 - The Docker Orchestration Ecosystem on AzurePatrick Chanezon
 
Docker New York Meetup May 2015 - The Docker Orchestration Ecosystem on Azure
Docker New York Meetup May 2015 - The Docker Orchestration Ecosystem on Azure Docker New York Meetup May 2015 - The Docker Orchestration Ecosystem on Azure
Docker New York Meetup May 2015 - The Docker Orchestration Ecosystem on Azure Patrick Chanezon
 
Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018Anthony Dahanne
 
Practical guide to Oracle Virtual environments
Practical guide to Oracle Virtual environmentsPractical guide to Oracle Virtual environments
Practical guide to Oracle Virtual environmentsNelson Calero
 
Docker Networking - Common Issues and Troubleshooting Techniques
Docker Networking - Common Issues and Troubleshooting TechniquesDocker Networking - Common Issues and Troubleshooting Techniques
Docker Networking - Common Issues and Troubleshooting TechniquesSreenivas Makam
 
Getting Started with MariaDB with Docker
Getting Started with MariaDB with DockerGetting Started with MariaDB with Docker
Getting Started with MariaDB with DockerMariaDB plc
 
Devoxx France 2015 - The Docker Orchestration Ecosystem on Azure
Devoxx France 2015 - The Docker Orchestration Ecosystem on AzureDevoxx France 2015 - The Docker Orchestration Ecosystem on Azure
Devoxx France 2015 - The Docker Orchestration Ecosystem on AzurePatrick Chanezon
 
Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018
Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018
Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018Mandi Walls
 
Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...
Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...
Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...Patrick Chanezon
 

Similar to 20150425 experimenting with openstack sahara on docker (20)

ContainerDayVietnam2016: Dockerize a small business
ContainerDayVietnam2016: Dockerize a small businessContainerDayVietnam2016: Dockerize a small business
ContainerDayVietnam2016: Dockerize a small business
 
Use Docker to Deliver Cognitive Services Running Cross Platform and Multi Clo...
Use Docker to Deliver Cognitive Services Running Cross Platform and Multi Clo...Use Docker to Deliver Cognitive Services Running Cross Platform and Multi Clo...
Use Docker to Deliver Cognitive Services Running Cross Platform and Multi Clo...
 
Detailed Introduction To Docker
Detailed Introduction To DockerDetailed Introduction To Docker
Detailed Introduction To Docker
 
'DOCKER' & CLOUD: ENABLERS For DEVOPS
'DOCKER' & CLOUD:  ENABLERS For DEVOPS'DOCKER' & CLOUD:  ENABLERS For DEVOPS
'DOCKER' & CLOUD: ENABLERS For DEVOPS
 
Docker and Cloud - Enables for DevOps - by ACA-IT
Docker and Cloud - Enables for DevOps - by ACA-ITDocker and Cloud - Enables for DevOps - by ACA-IT
Docker and Cloud - Enables for DevOps - by ACA-IT
 
Cassandra and Docker Lessons Learned
Cassandra and Docker Lessons LearnedCassandra and Docker Lessons Learned
Cassandra and Docker Lessons Learned
 
Docker Introduction
Docker IntroductionDocker Introduction
Docker Introduction
 
Killer Docker Workflows for Development
Killer Docker Workflows for DevelopmentKiller Docker Workflows for Development
Killer Docker Workflows for Development
 
Docker Introduction
Docker IntroductionDocker Introduction
Docker Introduction
 
What's New in Docker - February 2017
What's New in Docker - February 2017What's New in Docker - February 2017
What's New in Docker - February 2017
 
WebSphere and Docker
WebSphere and DockerWebSphere and Docker
WebSphere and Docker
 
Docker Seattle Meetup April 2015 - The Docker Orchestration Ecosystem on Azure
Docker Seattle Meetup April 2015 - The Docker Orchestration Ecosystem on AzureDocker Seattle Meetup April 2015 - The Docker Orchestration Ecosystem on Azure
Docker Seattle Meetup April 2015 - The Docker Orchestration Ecosystem on Azure
 
Docker New York Meetup May 2015 - The Docker Orchestration Ecosystem on Azure
Docker New York Meetup May 2015 - The Docker Orchestration Ecosystem on Azure Docker New York Meetup May 2015 - The Docker Orchestration Ecosystem on Azure
Docker New York Meetup May 2015 - The Docker Orchestration Ecosystem on Azure
 
Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018
 
Practical guide to Oracle Virtual environments
Practical guide to Oracle Virtual environmentsPractical guide to Oracle Virtual environments
Practical guide to Oracle Virtual environments
 
Docker Networking - Common Issues and Troubleshooting Techniques
Docker Networking - Common Issues and Troubleshooting TechniquesDocker Networking - Common Issues and Troubleshooting Techniques
Docker Networking - Common Issues and Troubleshooting Techniques
 
Getting Started with MariaDB with Docker
Getting Started with MariaDB with DockerGetting Started with MariaDB with Docker
Getting Started with MariaDB with Docker
 
Devoxx France 2015 - The Docker Orchestration Ecosystem on Azure
Devoxx France 2015 - The Docker Orchestration Ecosystem on AzureDevoxx France 2015 - The Docker Orchestration Ecosystem on Azure
Devoxx France 2015 - The Docker Orchestration Ecosystem on Azure
 
Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018
Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018
Habitat talk at CodeMonsters Sofia, Bulgaria Nov 27 2018
 
Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...
Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...
Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...
 


20150425 experimenting with openstack sahara on docker

  • 1. Big Data Technologies: Experimenting with OpenStack Sahara on Docker. Weiting Chen, weiting.chen@intel.com
  • 2. BIG DATA TECHNOLOGY Legal Disclaimers No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request. © 2015 Intel Corporation.
  • 3. Agenda
      • Background
      • How to use Docker with Sahara
      • Performance Testing
      • Conclusion
  • 4. Who We Are
      • We are from the Intel Big Data Technology Group.
      • We push big data technology forward in OpenStack.
      • We contribute source code to Sahara in OpenStack and brought the Cloudera CDH 5.3 plugin to Kilo.
  • 5. Sahara Background
      • Sahara became a core project in Juno
      • Brings Hadoop into OpenStack
      • More features added in the Kilo release
      • Two key features
          1. Let users provision Hadoop clusters easily by specifying a few parameters
          2. Analytics as a Service for data scientists and analysts
  • 6. Sahara Key Features - Provision Cluster
      Create/Terminate Cluster
      • Heat API/Nova Direct API
      • Integrates with Neutron/Nova Network
      • Use Guide as a template
      • Anti-affinity
      Cluster Scaling
      • Add Node/Remove Node
      Support for More Plugins in Kilo
      • Vanilla/Hortonworks Data Platform/Cloudera/Spark/MapR/Storm
  • 7. Sahara Key Features - Elastic Data Processing
      Supported Job Types
      • Hive/Pig/MapReduce/MapReduce Streaming/Java/Spark/Shell/HBase
      Supported Data Locality
      • Rack/Hypervisor/Swift
      Data Source
      • Internal: Internal HDFS (Ephemeral Disk/Cinder)
      • External: Swift/HDFS
      Run Job in Transient Cluster
      * Different plugins provide different capabilities
  • 8. Sahara Working Flow
      Fast Cluster Provisioning
      Steps: Select Hadoop Version, Select Base Image w/ Hadoop, Define Cluster Configuration, Provision Cluster, Operate Cluster, Terminate Cluster
      • Provide the detailed Hadoop configuration, such as size, topology, and others
      • Sahara provisions the VMs, then installs and configures Hadoop
      • Supports scaling the cluster out to add/remove nodes
      Analytics as a Service using Elastic Data Processing
      Steps: Select Hadoop Version, Configure Jobs, Set Limit for Cluster, Execute Jobs, Get The Result
      • Choose the type of job: pig, hive, jar-file, etc.
      • Select the input and output data locations (Swift support)
      • The cluster is removed automatically after the job completes
  • 9. Sahara Data Processing
      [Diagram labels: Collector Agent, Collecting Data, Data Stream, OpenStack Virtual Clusters, MapReduce, HDFS, Cinder, Ephemeral Disk, Swift]
      Pattern 1: Internal - HDFS Only
      • OpenStack supports creating HDFS on Cinder or Ephemeral Disk. This method can provide better data processing performance via Ephemeral Disk, or persist the data via Cinder with lower performance.
      Pattern 2: External - Swift
      • OpenStack uses Swift as a data source to store input and output data. The benefit is that the data can be processed directly and persisted via Swift.
  • 10. Docker Background
      • An open source project
      • The latest version is v1.6 (at the time of this talk)
      • Automates the deployment of applications inside software containers
      • Provides fast deployment and application portability
      • Uses the libcontainer library to access virtualization facilities in the Linux kernel
      • Resource isolation using cgroups, kernel namespaces, etc.
  • 11. Sahara + Docker
      • Deliver Better Performance (compared with hypervisors)
      • Optimize Resource Utilization
      • Reduce Cost
      • Fast Deployment
  • 12. Sahara Architecture
      [Architecture diagram. Labels: Horizon (Sahara Pages), Python Sahara Client, Sahara REST API, Auth (Keystone), DAL, Image Registry, Provisioning Engine, Vendor Plugins, EDP, Nova | Heat | Cinder, Glance, Neutron, four Hadoop VMs]
  • 13. Sahara + Docker Architecture
      [Architecture diagram. Labels: Horizon (Sahara Pages), Python Sahara Client, Sahara REST API, Auth (Keystone), DAL, Image Registry, Provisioning Engine, Vendor Plugins, EDP, Nova | Heat | Cinder, Glance, Neutron, four Hadoop VMs, plus the Docker additions: nova docker driver, Docker Registry, Docker Image, Docker]
  • 14. Sahara CDH Plugin
      [Diagram labels: End Customer; Controller: Horizon (OpenStack Dashboard), Sahara Service, CDH Plugin, Cloudera Manager API Python Client (migrated from CM-API Client); Computing Node1: CDH Cluster; arrows STEP1, STEP2, STEP3]
      • Step 1: Create the VMs via Heat using the Cluster Template. CM must be included on one master machine.
      • Step 2: Use the CM API Client to connect to CM and provision the other services in the cluster.
      CDH Cluster
      • VMs: VM1 - Master, VM2 - Slave
      • Services shown: Cloudera Manager (Cloudera Express v5.1.3, CDH v5.0.0 & CM API v7), Job History, Resource Manager, Oozie Server, Name Node, Secondary Name Node, Data Node, Node Manager
  • 15. Nova Docker Driver
      Introduced in Havana; moved out of the Nova tree for Icehouse and Juno
      For Juno
      • Must install an older version of novadocker
          # git checkout -b pre-i18n 9045ca43b645e72751099491bf5f4f9e4bddbb91
      • Implements a RESTful client via httplib to communicate with Docker
      For Kilo (upstream)
      • Need to install docker-py
      • Uses the Docker API client to communicate with Docker
  • 16. Authentication & Hostname Issues
      Use a username and password instead of injecting an authorized key into the instance
      • There is no cloud-init in the Docker image, so key injection is not available
      Upgrade Docker to a version that supports changing the hostname
      • Docker v1.2 or later supports changing the hostname
      Change “sudo mv etc-host /etc/hosts” to “sudo cp etc-host /etc/hosts”
      • Docker v1.3 responds that the device is busy when “mv” is used; replacing “mv” with “cp” lets the change succeed
  • 17. Network Port Issue
      Enable privileged mode to expose all the ports in the container
      • Modify the nova docker driver source code to add “privileged=True” and publish all ports
  • 18. Docker Image
      Build a Docker image using a Dockerfile
      • Refer to sahara-image-elements to build a CDH5 Docker image
      • Building a Docker image may take a lot of time (trial and error)
      • Use the Dockerfile build cache to reduce the time spent building the image
      Copy the Docker image to every compute node manually
      • The image must be copied to all the compute nodes; currently Glance cannot copy the image to a compute node
      • If the image cannot be found in docker images, Nova raises an error when starting an instance
  • 19. Build Docker Image - using Dockerfile
      Use docker build to build the image from a Dockerfile (run from the directory that contains it):
          # docker build -t $image_name:$tag .
      Dockerfile example:
          FROM centos:centos6
          MAINTAINER Weiting Chen weiting.chen@intel.com
          # Add ENV variables at the beginning
          ENV http_proxy http://xxx:1080
          …
          # 1. Add the proxy setting to each individual software configuration
          RUN echo 'proxy=http://xxx:1080' >> /etc/yum.conf
          # 2. Install the required software
          RUN yum install -y cloudera-manager-agent
          …
          …
          # Expose the required service ports
          EXPOSE 21
          …
  • 20. Register & Copy Docker Image to Compute Nodes
      Register the Docker image in Glance
          # docker save cdh5:20150425 | glance image-create --is-public=True --container-format=docker --disk-format=raw --name cdh5:20150425
      Copy the image to all compute nodes (the leading ./ keeps scp from reading "cdh5" as a host name)
          # scp ./cdh5:20150425.tar $compute_node:./
      Load the image into the local Docker image store on each compute node
          # docker load -i cdh5:20150425.tar
      If no usable image is present on a compute node, Nova raises an error.
  • 21. Nova Docker Driver Network
      • Set network to “none”
      • The nova docker driver leverages the existing network configuration from Neutron
      • Supports Linux Bridge or OVS
      • Does NOT use docker0
      • VXLAN was used in our experiment
      • Creates a bridge to OVS automatically
      • Set privileged mode to True for convenience
      • Port mappings must be set during docker run if privileged mode is not used
  • 22. Docker Network
      [Diagram: three copies of Host1 running Docker with Container1, Container2 and Container3, one per network mode; docker0 is 172.17.42.1 in each]
      Bridge Mode
      • Default mode
      • Supports multiple namespaces
      • Containers get eth0 addresses 172.17.42.10, 172.17.42.11 and 172.17.42.12 behind docker0
      Host Mode
      • Only one namespace
      • Host interfaces shown: eth0 192.168.0.1, eth1 10.10.10.1
      None Mode
      • The Nova Docker Driver uses this mode
      • The network is configured and connected to the bridge by the driver
  • 23. Docker Network Performance
      Path                                        Throughput
      Host to Host                                941 Mb/s
      Container to different Host                 941 Mb/s
      Container to Container in different Host    900 Mb/s
      Container to the same Host                  941 Mb/s (14 Gb/s w/ DVR)
      Container to Container in the same Host     14 Gb/s
      Background
      • OpenStack Juno using VXLAN
      • Docker v1.3
      • 1Gb Ethernet
      [Diagram labels: Host1, Host2, C1, C2, phy. network, br-ex (floating ip), br-tun, qbr~]
  • 24. Neutron VXLAN without DVR
      [Network diagram: Controller/Network Node and Compute Node. Labels: br-ex, br-int, br-tun, patch-tun, patch-int, eth1, eth2, tap~, qvo~, qvb~, qbr~, qdhcp ns~, snat-, sg~, qg~, qrouter~, qr~, VM vm0; networks 172.16.0.0/16, 192.168.0.0/16, 10.0.0.0/16]
  • 25. Neutron VXLAN with DVR
      [Network diagram: Controller/Network Node and Compute Node, each with its own br-ex. Labels: br-ex, br-int, br-tun, patch-tun, patch-int, eth1, eth2, tap~, qvo~, qvb~, qbr~, fip-, fg~, fpr~, rfp~, qdhcp ns~, snat-, sg~, qg~, qrouter~, qr~, VM vm0; networks 172.16.0.0/16, 192.168.0.0/16, 10.0.0.0/16]
  • 26. Change MTU Size
      • Change the MTU size if you are using VXLAN
      • Impact: the MTU size can affect network performance. If the MTU size is not changed, creating instances still works, but network performance drops to about 1MB.
      • Solution: change the MTU size in the VM
          # sudo ifconfig eth1 mtu 1400 up
  • 27. Container Disk Space
      • By default, the image disk space is only 10 GB
      • Impact: the default HDFS configuration reserves 10GB of space, so there is no room left to put data in HDFS
      • Solution: pass parameters when starting the Docker service
          # sudo ./docker -d --storage-opt dm.basesize=20G --storage-opt dm.loopdatasize=200G &
      * For the parameters to take effect, clean up /var/lib/docker/ and restart Docker
  • 28. vCPU Numbers
      The number of vCPUs is always reported as 1.
      • Impact: vCPU calculations may fail.
      • Solution: in Juno, change the number in the nova docker driver source code and set it equal to the number of physical cores.
  • 29. Docker in OpenStack Performance
      • Network Performance
      • Instance Boot/Cluster Provision
      • Disk Performance using DD
      • HiBench Testing
  • 30. Our Testing Environment
      CLUSTER CONFIGURATION
      Role                        Details
      Controller w/ Compute x 1   Controller, Network, Compute
      Compute x 5                 Compute

      HARDWARE CONFIGURATION
      Items      Details
      CPU        Intel Xeon X5670 2.93GHz
      Memory     64GB (1333MHz 8GB x 8)
      Storage    1TB SATA HDD

      SOFTWARE CONFIGURATION
      Software Name   Versions
      CentOS          7.0
      Docker          v1.6
      OpenStack       Juno
  • 31. Create an Instance/Provision a Cluster
      • Assume the image has already been copied to all the compute nodes.
      • Create an instance and check the log to capture the response time.
      • With Docker: 1 sec
      • With KVM: 10 sec
      • Provisioning a CDH cluster still takes a long time; this issue comes from the Sahara CDH plugin.
  • 32. DD Test
      • The Docker container uses CentOS 6.6 on a host running CentOS 7. The file system is XFS.
      • DD command: dd if=/dev/zero of=test1 bs=1M count=8192 conv=fdatasync
      • Host: 140~160MB/s
      • Host w/ OpenStack: 100~130MB/s (Controller), 140~160MB/s (Compute)
      • Container: 100~140MB/s
      • Docker can provide disk I/O performance close to bare metal
  • 33. Conclusion
      • Docker is beneficial when booting large numbers of instances
      • Docker can provide good disk and network performance with little overhead
      • How to optimize resource utilization will be the next focus
  • 34. Call-For-Action
      • Contribute more to Docker and OpenStack
      • Find the critical components for Big Data on Cloud and make them better
      • More customer use cases are needed for Sahara
      Contact: weiting.chen@intel.com
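
The MTU advice on the "Change MTU Size" slide can be checked with a little arithmetic. The following is a minimal sketch, not something from the talk: it assumes VXLAN over IPv4 with no IP options and no outer VLAN tag, and a 1500-byte MTU on the physical network. The function and constant names are hypothetical.

```python
# Per-packet bytes added by VXLAN encapsulation over IPv4.
# Assumption: no IP options and no outer VLAN tag.
OUTER_IPV4_HEADER = 20
OUTER_UDP_HEADER = 8
VXLAN_HEADER = 8
INNER_ETHERNET_HEADER = 14

VXLAN_OVERHEAD = (OUTER_IPV4_HEADER + OUTER_UDP_HEADER
                  + VXLAN_HEADER + INNER_ETHERNET_HEADER)


def max_instance_mtu(physical_mtu: int = 1500) -> int:
    """Largest instance MTU whose encapsulated packets fit the physical MTU."""
    return physical_mtu - VXLAN_OVERHEAD


def fits_without_fragmentation(instance_mtu: int, physical_mtu: int = 1500) -> bool:
    """True if packets of this instance MTU fit after VXLAN encapsulation."""
    return instance_mtu <= max_instance_mtu(physical_mtu)


if __name__ == "__main__":
    print(max_instance_mtu())                # 1450
    print(fits_without_fragmentation(1400))  # True
    print(fits_without_fragmentation(1500))  # False
```

Under these assumptions the value 1400 used on the slide sits 50 bytes below the 1450-byte limit, while the default 1500 does not fit once the packet is encapsulated.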

Editor's Notes

  1. External HDFS is supported, but it needs some manual configuration
  2. Performance for “Container to the same Host” can be better
  3. DVR can enhance the performance of “Container to the same Host”, from 941Mb to 14Gb
  4. Test results from another machine using an SSD (Write Through): Host: 180MB/s, 1.3GB/s; VM (Ubuntu): 111MB/s, 107MB/s
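
To relate the MB/s figures in the DD test to wall-clock time, the arithmetic behind the dd command (bs=1M count=8192) can be sketched as follows. This is an illustration only: the helper names are hypothetical, and it assumes the reported rates are decimal megabytes per second (1 MB = 10**6 bytes), which the slides do not state.

```python
def dd_bytes_written(block_size_mib: int, count: int) -> int:
    """Total bytes written by dd with bs=<block_size_mib>M and count=<count>."""
    return block_size_mib * 1024 * 1024 * count


def throughput_mb_per_s(total_bytes: int, elapsed_seconds: float) -> float:
    """Throughput in decimal MB/s (assumption: 1 MB = 10**6 bytes)."""
    return total_bytes / elapsed_seconds / 1_000_000


def expected_seconds(total_bytes: int, mb_per_s: float) -> float:
    """Time a run of this size would take at the given decimal MB/s rate."""
    return total_bytes / (mb_per_s * 1_000_000)


if __name__ == "__main__":
    total = dd_bytes_written(block_size_mib=1, count=8192)
    print(total)  # 8589934592
    # Rates taken from the ranges quoted on the DD Test slide.
    for rate in (100, 140, 160):
        print(rate, "MB/s ->", round(expected_seconds(total, rate), 1), "s")
```

With an 8 GiB write, the gap between the low end of the container range and the high end of the host range corresponds to a difference of roughly half a minute per run.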