4. OpenStack Data Processing - Savanna
Mission:
To provide the OpenStack community with an open,
cutting-edge, performant, and scalable data
processing stack and associated management
interfaces:
● provision and operate Hadoop clusters
● schedule and operate Hadoop jobs
7. Use Cases
● Self-service provisioning of Hadoop clusters
● Utilization of unused compute capacity for
bursty workloads
● Run Hadoop workloads in a few clicks without
expertise in Hadoop ops
9. Savanna Status
● Official incubated OpenStack project
● v0.3 released 17 Oct 2013
● Supported Hadoop distros:
○ Vanilla Apache Hadoop (reference implementation)
○ Hortonworks Data Platform 1.3.x
○ Intel Distribution under review
○ Cloudera Distribution in blueprint
● Included in OpenStack distros:
○ RDO - http://openstack.redhat.com
○ Mirantis OpenStack - http://software.mirantis.com
13. EDP Overview
● End users have data and questions
○ The data lives in a data repository
○ The questions are embodied in code
● Savanna Elastic Data Processing (EDP) brings the
Hadoop ecosystem to the end user
○ Hides all cluster management behind the scenes
15. EDP
● The variety and depth of value-add offerings on top
of clouds are growing
● These offerings are rarely open and rarely allow for choice
● Examples - Google Cloud, Azure, AWS
16. EDP
Savanna with EDP can match, and in some cases
exceed, the use cases served by most
public clouds
17. EDP in Savanna v0.3
● UI, integrated into Horizon, for ad-hoc analytics
queries based on Hive or Pig
● API to execute MapReduce jobs without exposing
details of underlying infrastructure
● Pluggable data sources: Swift
● Supported job types: Jar, Pig, Hive
● Integration with Oozie for workflow management
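The EDP API flow above (register a Swift data source, define a job, run it on a cluster) can be sketched as a set of request builders. This is a minimal sketch, not the project's client: the URL paths and JSON field names are assumptions based on a recollection of the Savanna v1.1 REST API, the helper names are hypothetical, and nothing is sent over the network.

```python
import json

# Assumption: URL paths and JSON field names follow the Savanna v1.1 REST API
# as recalled here; check them against the documentation for your release.
JOB_TYPES = ("Jar", "Pig", "Hive")  # job types listed for Savanna v0.3


def _path(tenant_id, suffix):
    """Build a tenant-scoped API path (hypothetical helper)."""
    return "/v1.1/{0}/{1}".format(tenant_id, suffix)


def data_source_request(tenant_id, name, swift_url, user, password):
    """Describe the call that registers a Swift object as a data source."""
    body = {
        "name": name,
        "type": "swift",
        "url": swift_url,
        "credentials": {"user": user, "password": password},
    }
    return ("POST", _path(tenant_id, "data-sources"), body)


def job_request(tenant_id, name, job_type, main_ids, lib_ids=()):
    """Describe the call that defines a job from uploaded binaries."""
    if job_type not in JOB_TYPES:
        raise ValueError("unsupported job type: %r" % (job_type,))
    body = {
        "name": name,
        "type": job_type,
        "mains": list(main_ids),
        "libs": list(lib_ids),
    }
    return ("POST", _path(tenant_id, "jobs"), body)


def execute_request(tenant_id, job_id, cluster_id, input_id, output_id):
    """Describe the call that runs a job on a cluster against two data sources."""
    body = {
        "cluster_id": cluster_id,
        "input_id": input_id,
        "output_id": output_id,
    }
    return ("POST", _path(tenant_id, "jobs/{0}/execute".format(job_id)), body)


if __name__ == "__main__":
    method, path, body = job_request("demo-tenant", "wordcount", "Pig", ["script-id"])
    print(method, path)
    print(json.dumps(body, indent=2, sort_keys=True))
```

Note that the caller names only a cluster ID and data source IDs; hosts, HDFS paths, and the Oozie workflow stay hidden, which is the point of the "without exposing details of underlying infrastructure" bullet.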
19. Cluster Ops in Savanna 0.3
● REST API
● Configuration templates
● Manual cluster scaling
● Data node anti-affinity and location control
● Full support of data locality - rack and 4-level
awareness for HDFS and Swift
● Swift integration
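To make the data locality bullet concrete, here is a small sketch of how distance is typically computed in a Hadoop-style network topology. It assumes that "4-level" means an extra hypervisor (node group) layer between rack and VM, so a location looks like /datacenter/rack/hypervisor/vm; that path layout and the function names are illustrative assumptions, not Savanna code.

```python
def split_path(location):
    """Split a topology path such as '/dc1/rack1/hv1/vm1' into its levels."""
    return [part for part in location.strip("/").split("/") if part]


def distance(a, b):
    """Hops from each node up to the closest common ancestor, summed.

    This follows the usual Hadoop network-topology rule: identical
    locations are 0 apart, and every level that differs adds 2.
    """
    pa, pb = split_path(a), split_path(b)
    if len(pa) != len(pb):
        raise ValueError("locations must have the same number of levels")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def closest_replica(reader, replicas):
    """Pick the replica location nearest to the reader."""
    return min(replicas, key=lambda replica: distance(reader, replica))


if __name__ == "__main__":
    reader = "/dc1/rack1/hv1/vm1"
    replicas = ["/dc1/rack2/hv5/vm9", "/dc1/rack1/hv1/vm2", "/dc1/rack1/hv2/vm3"]
    print(closest_replica(reader, replicas))
```

With only rack awareness, two VMs on the same hypervisor look no closer than any other pair in the rack; the extra level lets a scheduler prefer the co-located VM and lets replica placement avoid putting two copies on one physical host.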
20. OpenStack Integration in Savanna 0.3
● OpenStack Dashboard plugin
● Both Neutron and Nova Network support
● Keystone trusts used for async operations
● Python client