3. SSG / STO / BDT
WHO WE ARE
Bring Cloudera CDH 5.3 Plugin into OpenStack Sahara
Completed adding all the services in Cloudera CDH 5.3 and integrating them into the Sahara CDH Plugin
Provide Complete Integration Tests for a Better User Experience
Complete integration testing in OpenStack Sahara helps deliver a good user experience in the Sahara
CDH Plugin
Ranked #3 Company by Commits to Sahara
Ranked after #1 Mirantis and #2 Red Hat
4. SSG / STO / BDT
OPENSTACK HISTORY
Austin
Bexar
Cactus
Diablo
Essex
Folsom
Grizzly
Havana
Icehouse
Juno
Kilo
Nova
Swift
Glance
Horizon
Keystone
Quantum
Cinder
Ceilometer
Trove
Sahara
Ironic
• Zaqar
• Manila
• Designate
• Barbican
Incubation
2010
2011
2012
2013
2014
2015
5. SSG / STO / BDT
Move Focus from IaaS to PaaS and SaaS
More and more applications (xxx-as-a-service) are built on OpenStack infrastructure
6. SSG / STO / BDT
FROM THE MARKET
Source: IDC, 2014
Big Data: ~25.9% CAGR
The Big Data market is expected to grow from 16.5 billion (2014) to 41.5 billion (2018). This includes the cloud infrastructure segment, which is expected to grow from 1.2 billion (2014) to 4.7 billion (2018).
Cloud Market: 200 Billion
The cloud market will hit 118 billion in 2015 and 200 billion by 2018, up from the 95.8 billion reached in 2014.
X-as-a-Service: Trend
Cloud-based solutions will shape IT spending for years. IDC estimates that cloud services spending will continue to grow at double-digit rates for the next few years.
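As a quick sanity check on the ~25.9% CAGR quoted on this slide, the compound annual growth rate can be recomputed from the two Big Data market endpoints. This is only a sketch: the function name `cagr` is ours, and the inputs are the 2014 and 2018 figures from the slide.

```python
def cagr(start_value, end_value, years):
    """Return the compound annual growth rate as a fraction (0.25 means 25%)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Big Data market figures quoted on the slide (billions, 2014 -> 2018).
big_data_cagr = cagr(16.5, 41.5, 2018 - 2014)
print(f"Big Data market CAGR: {big_data_cagr:.1%}")
```

The same function can be applied to the other start/end pairs on the slide to compare growth rates across segments.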
7. SSG / STO / BDT
THE VISION
Internet of Things
Data will come from a diverse range of devices.
Big Data
Data processing models process the data and turn it into high-value information.
Cloud Computing
A shared-resource infrastructure supports a flexible IT environment and fulfills requirements on demand.
8. SSG / STO / BDT
OpenStack vs Hadoop
Most companies that run an OpenStack cluster in their IT environment are
also preparing a separate Hadoop cluster for Big Data analytics.
Sahara is a solution to bring Hadoop and OpenStack together.
9. SSG / STO / BDT
SAHARA BACKGROUND
The basic idea comes from Amazon Elastic MapReduce (EMR)
Lets users easily provision Hadoop clusters by specifying
a few parameters
Analytics as a Service for data scientists and analysts
11. SSG / STO / BDT
Sahara Key Features - Provision Cluster
Create/Terminate Cluster
• Heat API/Nova Direct API
• Neutron/Nova Network
• Floating IP Management
• Anti-affinity
Cluster Scaling
• Add Node/Remove Node
Supported Plugins
• Vanilla/Hortonworks Data Platform/Cloudera/Spark/MapR
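To make the provisioning features above more concrete, here is a minimal sketch of the request bodies a client might send to create and then scale a cluster. The field names (`plugin_name`, `hadoop_version`, `cluster_template_id`, `default_image_id`, `user_keypair_id`, `neutron_management_network`, `anti_affinity`, `resize_node_groups`) are recalled from the Sahara v1.1 REST API and should be checked against the API reference for your release. All IDs, the node group name, and the process name are placeholders. No HTTP call is made.

```python
import json


def build_cluster_request(name, plugin_name, hadoop_version,
                          cluster_template_id, image_id,
                          keypair=None, network_id=None, anti_affinity=()):
    """Build a cluster-creation body (field names assumed, see lead-in)."""
    body = {
        "name": name,
        "plugin_name": plugin_name,
        "hadoop_version": hadoop_version,
        "cluster_template_id": cluster_template_id,
        "default_image_id": image_id,
        # Processes listed here should be spread across different hosts.
        "anti_affinity": list(anti_affinity),
    }
    if keypair:
        body["user_keypair_id"] = keypair
    if network_id:
        body["neutron_management_network"] = network_id
    return body


def build_scale_request(resize):
    """Build a scaling body that sets new instance counts per node group."""
    return {"resize_node_groups": [
        {"name": group, "count": count} for group, count in resize.items()
    ]}


if __name__ == "__main__":
    create = build_cluster_request(
        name="demo-cluster", plugin_name="cdh", hadoop_version="5.3.0",
        cluster_template_id="TEMPLATE_ID", image_id="IMAGE_ID",
        network_id="NETWORK_ID", anti_affinity=["DATANODE"])  # placeholders
    print(json.dumps(create, indent=2))
    print(json.dumps(build_scale_request({"workers": 5}), indent=2))
```

Keeping the body construction separate from the transport makes it easy to review what will be sent before wiring it to a real client.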
12. SSG / STO / BDT
Sahara Key Features - Elastic Data Processing
Supported Job Types
• Hive/Pig/MapReduce/MapReduce Streaming/Java/Spark/Shell/HBase
Support Data Locality
• Rack/Hypervisor/Swift
Data Source
• Internal: Ephemeral Disk/Cinder
• External: Swift
Run Job in Transient Cluster
*Different plugins provide different capabilities
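The Elastic Data Processing features above revolve around two objects: a job definition and a job execution that binds the job to a cluster and to data sources. The sketch below builds both as plain dictionaries. The field names (`type`, `mains`, `libs`, `cluster_id`, `input_id`, `output_id`, `job_configs`) are recalled from the Sahara v1.1 REST API and are assumptions to verify. The job type labels are copied from the slide and may not match the exact strings the API expects.

```python
# Job type labels as they appear on the slide; the exact API strings may differ.
SLIDE_JOB_TYPES = ("Hive", "Pig", "MapReduce", "MapReduce Streaming",
                   "Java", "Spark", "Shell", "HBase")


def build_job(name, job_type, main_ids=(), lib_ids=()):
    """Describe a job: its type plus the IDs of uploaded scripts and libraries."""
    if job_type not in SLIDE_JOB_TYPES:
        raise ValueError(f"unsupported job type: {job_type}")
    return {"name": name, "type": job_type,
            "mains": list(main_ids), "libs": list(lib_ids)}


def build_job_execution(cluster_id, input_id, output_id, configs=None, args=None):
    """Describe one run of a job on a cluster with input/output data sources."""
    return {
        "cluster_id": cluster_id,
        "input_id": input_id,
        "output_id": output_id,
        "job_configs": {"configs": dict(configs or {}), "args": list(args or [])},
    }


# Placeholder IDs; real values come from earlier create calls.
job = build_job("wordcount", "Pig", main_ids=["SCRIPT_ID"])
run = build_job_execution("CLUSTER_ID", "INPUT_SOURCE_ID", "OUTPUT_SOURCE_ID")
```

Since different plugins provide different capabilities, a real client would also check the chosen job type against what the target plugin supports.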
13. SSG / STO / BDT
WORKING FLOW
Fast Cluster Provisioning
Select Hadoop Version
Select Base Image w/ Hadoop
Define Cluster Configuration
Provision Cluster
Operate Cluster
Terminate Cluster
• Provide the detailed Hadoop configuration, such as size, topology, and others
• Sahara will provision VMs, then install and configure Hadoop
• Supports scaling the cluster by adding/removing nodes
Analytics as a Service using Elastic Data Processing
Select Hadoop Version
Configure Jobs
Set Limit for Cluster
Execute Jobs
Get the Result
• Choose the type of job: pig, hive, jar-file, etc.
• Select input and output data location (Swift support)
• Cluster will be removed automatically after job completion
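Between the "Provision Cluster" and "Operate Cluster" steps of the workflow above, a client typically has to wait while VMs are created and Hadoop is installed and configured. The sketch below polls a status callback until the cluster is ready. The helper name `wait_for_cluster` and its parameters are ours. The status strings `Active` and `Error` are the Sahara cluster states as we recall them. The status source is injected, so the example runs without an OpenStack deployment.

```python
import time


def wait_for_cluster(get_status, timeout_s=1800, poll_s=10, sleep=time.sleep):
    """Poll get_status() until it returns "Active"; return seconds waited.

    Raises RuntimeError if the cluster reports "Error" and TimeoutError
    if it is still not active after timeout_s seconds.
    """
    waited = 0
    while True:
        status = get_status()
        if status == "Active":
            return waited
        if status == "Error":
            raise RuntimeError("cluster provisioning failed")
        if waited >= timeout_s:
            raise TimeoutError(f"cluster still '{status}' after {waited}s")
        sleep(poll_s)
        waited += poll_s


# Demo with a fake status sequence instead of a real API call.
fake_statuses = iter(["Spawning", "Configuring", "Active"])
seconds = wait_for_cluster(lambda: next(fake_statuses), sleep=lambda s: None)
```

In a real client, `get_status` would wrap whatever call returns the cluster's current status.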
14. SSG / STO / BDT
CLOUDERA CDH PLUGIN
End Customer
Horizon (OpenStack Dashboard)
Sahara Service
CDH Plugin
Cloudera Manager API Python Client (migrated from CM-API Client)
Controller
Computing Node1
CDH Cluster
VM1 - Master: Cloudera Manager (Cloudera Express v5.1.3, CDH v5.0.0 & CM API v7), Job History, Resource Manager, Oozie Server, Name Node, Secondary Name Node
VM2 - Slave: Data Node, Node Manager
Step 1: Create the VMs via Heat using a cluster template. Cloudera Manager (CM) must be included on one master machine.
Step 2: Use the CM API client to connect to CM and provision the other services in the cluster.
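Step 1 on this slide carries a constraint: Cloudera Manager must be placed on one master machine before Step 2 can connect to it. A minimal sketch of such a layout check follows. The process labels and the node group structure are hypothetical illustrations, not the plugin's actual identifiers or data model.

```python
MANAGER = "CLOUDERA_MANAGER"  # hypothetical label for the Cloudera Manager process


def manager_count(node_groups):
    """Count how many instances would run Cloudera Manager."""
    return sum(g["count"] for g in node_groups if MANAGER in g["processes"])


def validate_layout(node_groups):
    """Raise ValueError unless exactly one instance hosts Cloudera Manager."""
    found = manager_count(node_groups)
    if found != 1:
        raise ValueError(
            f"expected exactly one Cloudera Manager instance, found {found}")


# Layout mirroring the slide: one master VM and one slave VM.
layout = [
    {"name": "master", "count": 1,
     "processes": [MANAGER, "NAMENODE", "SECONDARY_NAMENODE",
                   "RESOURCE_MANAGER", "JOB_HISTORY", "OOZIE_SERVER"]},
    {"name": "slave", "count": 1,
     "processes": ["DATANODE", "NODE_MANAGER"]},
]
validate_layout(layout)  # passes: one manager on the master
```

Running a check like this before creating any VMs avoids discovering in Step 2 that there is no manager to connect to.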
15. SSG / STO / BDT
DATA PROCESSING MODEL
Diagram components: Collector Agent (collecting data), Data Stream, MapReduce on OpenStack Virtual Clusters
Pattern 1: Internal - HDFS Only (HDFS on Cinder or Ephemeral Disk)
OpenStack supports creating HDFS on Cinder or on ephemeral disk. Ephemeral disk provides better data processing performance, while Cinder persists the data at lower performance.
Pattern 2: External - Swift
OpenStack uses Swift as a data source to store input and output data. The benefit is that data can be processed directly and persisted in Swift.
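The two patterns on this slide differ mainly in where a job's input and output live. The sketch below builds a data source description for each. The dictionary fields (`name`, `type`, `url`, `credentials`) and the `.sahara` suffix on the Swift container are recalled from Sahara documentation of this period and should be treated as assumptions. The NameNode port 8020 is a common default, not a value from the slides, and the host and path names are placeholders.

```python
def hdfs_data_source(name, namenode_host, path, port=8020):
    """Pattern 1: data kept in the cluster's own HDFS (Cinder or ephemeral disk)."""
    return {
        "name": name,
        "type": "hdfs",
        "url": f"hdfs://{namenode_host}:{port}/{path.lstrip('/')}",
    }


def swift_data_source(name, container, path, user, password):
    """Pattern 2: data kept externally in Swift and read directly by the job."""
    return {
        "name": name,
        "type": "swift",
        # The ".sahara" container suffix is an assumption, see lead-in.
        "url": f"swift://{container}.sahara/{path.lstrip('/')}",
        "credentials": {"user": user, "password": password},
    }


internal = hdfs_data_source("job-output", "master-001", "/user/out")
external = swift_data_source("job-input", "logs", "/raw/day1", "demo", "secret")
```

A job can mix both, for example reading input from Swift and writing intermediate results to HDFS on ephemeral disk for speed.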
16. SSG / STO / BDT
Current Issue
~30%
Performance Loss
We used Sahara with KVM to create a Hadoop
cluster (HDFS on ephemeral disk) and compared it
with bare-metal Hadoop on the same servers.
Different workloads (HiBench) may show
different results.
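The slide does not say how the ~30% figure was computed. One plausible reading, stated here as an assumption, is throughput loss: how much less work per unit time the virtualized cluster completes for the same workload. The timings below are hypothetical and chosen only to illustrate the formula; they are not measurements from this comparison.

```python
def performance_loss(bare_metal_seconds, virtual_seconds):
    """Fraction of throughput lost when the same workload runs slower on VMs."""
    return 1.0 - bare_metal_seconds / virtual_seconds


# Hypothetical timings for one workload, NOT data from the slide.
loss = performance_loss(bare_metal_seconds=700.0, virtual_seconds=1000.0)
print(f"performance loss: {loss:.0%}")
```

Under the alternative "slowdown" definition, `(virtual - bare_metal) / bare_metal`, the same timings give a larger number, so it matters which definition a benchmark report uses.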
17. SSG / STO / BDT
Beyond The Performance…
Performance may always be an issue when comparing a hypervisor with bare metal
18. SSG / STO / BDT
IT Integration
Sahara must provide an elastic platform
that fulfills customer requests and
adopts big data infrastructure.
Supporting more technologies helps
Sahara integrate seamlessly into the
customer’s IT environment.
Workload-based EDP
EDP should provide a simple interface
so that data scientists can focus on
their own expertise without worrying
about how to deploy clusters.
Analytics-as-a-Service is a future
trend.
19. SSG / STO / BDT
MORE …
Bare Metal Support
• OpenStack Ironic
Docker Support
• Nova-docker driver, OpenStack Magnum
Support More Storage Backend
• OpenStack Manila, External HDFS
Support More Data Processing Models
• Hadoop, Spark, etc.
20. SSG / STO / BDT
WHAT’S NEW IN KILO
• Vanilla plugin supports Hadoop v1.2.1 and Hadoop v2.6
• Spark Plugin
• Cloudera CDH Plugin
• MapR Plugin
• Storm Plugin
• New Horizon UI with New Guide Panel
• Default Template Support
Editor's notes
IOT-BIG DATA-CLOUD COMPUTING
By 2016, 11% of IT budgets will shift away from traditional in-house IT toward cloud-based solutions
By 2017, 35% of new applications will use cloud-enabled
External HDFS is supported, but it requires some manual configuration.
The root cause of the performance loss is the difference between KVM and bare metal.