SlideShare a Scribd company logo
1 of 26
Download to read offline
Creating Pools of 10s or
100s of Virtual Machines
        Andrei Savu
     ApacheCon NA 2013
Who is this guy?

●   Founder of Axemblr.com
●   Organizer of Bucharest JUG (bjug.ro)
●   Apache Whirr PMC, ZooKeeper contributor
●   Passion for DevOps & Data Analysis

●   Connect with me on LinkedIn
@ Axemblr

●   Data Processing Infrastructure
●   Deployment Automation

●   Product: Hadoop On-Demand Appliance
●   Open Source (part of our DNA)
●   Fair amount of consulting (bootstrapping)
Agenda

●   What is Provisionr?
●   Challenges & Architecture
●   Demo (HDFS on EC2)
●   Future @ Apache Incubator
What is Provisionr?

.. and how does it help me create
    pools of virtual machines?
What?

●   Simple Service for Managing Pools of 10s
    or 100s of Virtual Machines

●   A way to create clusters of machines that
    share a common set of characteristics on
    multiple cloud providers
Characteristics like?

●   Operating system     ●   Network settings
●   Pre-installed        ●   Firewall
    packages &           ●   SSH config
    binaries
                         ●   Admin access
●   Sane DNS settings
    (forward & reverse   ●   VPN access
    dns resolution)      ●   etc.
●   NTP settings
Why? (initially)

●   Setup on-demand Hadoop clusters
    (Axemblr)

●   Handles basic setup for large clusters
●
    Service config by using 3rd party apps like
    Ambari or Cloudera Manager
Why? (long term)

       Core functionality is generic
                                                          Next generation
                                                           Apache Whirr?

  External                            Configuration
Specification         Events


                                                 Events
                Provisionr


                             Events         Monitoring
FAQ: Looks like Puppet?

●   No
●   Provisionr is actually using Puppet

●   Focus: Interact with IaaS APIs to start
    machines in groups with minimal configs
    (as listed before). Simple & reliable.
Challenges

How is the game different when we
work with 50-100+ virtual machines?
Challenges #1

●   API Throttling (batch calls)

●   Concurrency Control (across multiple
    instances)

●   Error handling, partial failures and
    automatic retries (idempotency)
Challenges #2

●   Granular internal workflows (short
    transactions)

●   State persistence across restarts and
    upgrades

●   Audit & Logging
Challenges #3

●   Integrating multiple native provider SDKs

●   Provide a plugin architecture (run just a
    sub-set of all the features)

●   Semi-automated and fully automated
    modes
Challenges #4

●   Automatic creation of gold images
Architecture

   Building Blocks, Internals,
Persistence, Packaging, Plugins
Activiti (from Alfresco)

●   Light-weight workflow engine (BPM)

●   Has a nice Java API
●   Has a nice set of tools
●   Handles persistence as expected
●   Good error handling (retryable activities)
Activiti – Process Execution
Activiti – Interactive View
Apache Karaf

●   Using it as an application server

●   Provides an interactive shell
●   Integrated with Activiti
●   Solves the packaging problem (custom
    distribution)
Apache Karaf - Shell
IaaS SDKs

●   AWS SDK for Java
    –   http://aws.amazon.com/sdkforjava/


●   jclouds (for CloudStack)
    –   http://www.jclouds.org/
Demo Time (video)

  Provisionr & Rundeck
CDH4 HDFS cluster on EC2
Summary

●   Provisionr solves the problem of creating
    large pools of virtual machines (100s)

●   Cloud portability by making the machines &
    the cluster indistinguishable from an
    application perspective on multiple clouds
You're invited to vote!

●   Apache Provisionr proposal (wiki)
●   Check general@incubator.apache.org

●   Feedback at asavu@apache.org
●   Looking for mentors & contributors
Thanks! Questions?
     Andrei Savu
  asavu@apache.org

  Twitter: @andreisavu

More Related Content

What's hot

Zero Code Multi-Cloud Automation with Ansible and Terraform
Zero Code Multi-Cloud Automation with Ansible and TerraformZero Code Multi-Cloud Automation with Ansible and Terraform
Zero Code Multi-Cloud Automation with Ansible and TerraformAvi Networks
 
Wido den hollander cloud stack and ceph
Wido den hollander   cloud stack and cephWido den hollander   cloud stack and ceph
Wido den hollander cloud stack and cephShapeBlue
 
OpenNebulaconf2017US: Configuration management with OpenNebula and Ansible by...
OpenNebulaconf2017US: Configuration management with OpenNebula and Ansible by...OpenNebulaconf2017US: Configuration management with OpenNebula and Ansible by...
OpenNebulaconf2017US: Configuration management with OpenNebula and Ansible by...OpenNebula Project
 
OpenNebula Conf 2014 | ONE BIT to rule them all - Stefan Kooman
OpenNebula Conf 2014 | ONE BIT to rule them all - Stefan KoomanOpenNebula Conf 2014 | ONE BIT to rule them all - Stefan Kooman
OpenNebula Conf 2014 | ONE BIT to rule them all - Stefan KoomanNETWAYS
 
NGINX Installation and Tuning
NGINX Installation and TuningNGINX Installation and Tuning
NGINX Installation and TuningNGINX, Inc.
 
Network Automation - Interconnection tools
Network Automation - Interconnection toolsNetwork Automation - Interconnection tools
Network Automation - Interconnection toolsAndy Davidson
 
2013-cloudconnect-OpenStack@BT
2013-cloudconnect-OpenStack@BT2013-cloudconnect-OpenStack@BT
2013-cloudconnect-OpenStack@BTuictamale
 
Glass fish performance tuning tips from the field
Glass fish performance tuning tips from the fieldGlass fish performance tuning tips from the field
Glass fish performance tuning tips from the fieldPayara
 
Dave Williams - Nagios Log Server - Practical Experience
Dave Williams - Nagios Log Server - Practical ExperienceDave Williams - Nagios Log Server - Practical Experience
Dave Williams - Nagios Log Server - Practical ExperienceNagios
 
Ansible & Cumulus Networks - Simplify Network Automation
Ansible & Cumulus Networks - Simplify Network AutomationAnsible & Cumulus Networks - Simplify Network Automation
Ansible & Cumulus Networks - Simplify Network AutomationCumulus Networks
 
Nagios XI Best Practices
Nagios XI Best PracticesNagios XI Best Practices
Nagios XI Best PracticesNagios
 
What is NetDevOps? How? Leslie Carr PuppetConf 2015
What is NetDevOps? How? Leslie Carr PuppetConf 2015What is NetDevOps? How? Leslie Carr PuppetConf 2015
What is NetDevOps? How? Leslie Carr PuppetConf 2015Leslie Carr
 
Mobile 3: Launch Like a Boss!
Mobile 3: Launch Like a Boss!Mobile 3: Launch Like a Boss!
Mobile 3: Launch Like a Boss!MongoDB
 
.Net Core Fall update
.Net Core Fall update.Net Core Fall update
.Net Core Fall updateMSDEVMTL
 
Infrastructure Management in GCP
Infrastructure Management in GCPInfrastructure Management in GCP
Infrastructure Management in GCPDana Hoffman
 
Nagios Conference 2014 - Leland Lammert - Distributed Heirarchical Nagios
Nagios Conference 2014 - Leland Lammert - Distributed Heirarchical NagiosNagios Conference 2014 - Leland Lammert - Distributed Heirarchical Nagios
Nagios Conference 2014 - Leland Lammert - Distributed Heirarchical NagiosNagios
 
DevOpsDays Taipei 2019 - Mastering IaC the DevOps Way
DevOpsDays Taipei 2019 - Mastering IaC the DevOps WayDevOpsDays Taipei 2019 - Mastering IaC the DevOps Way
DevOpsDays Taipei 2019 - Mastering IaC the DevOps Waysmalltown
 
Moving mongo db to the cloud strategies and points to consider
Moving mongo db to the cloud  strategies and points to considerMoving mongo db to the cloud  strategies and points to consider
Moving mongo db to the cloud strategies and points to considerVinicius M Grippa
 
Open escalar presentation
Open escalar presentationOpen escalar presentation
Open escalar presentationMiguel Zuniga
 

What's hot (20)

Zero Code Multi-Cloud Automation with Ansible and Terraform
Zero Code Multi-Cloud Automation with Ansible and TerraformZero Code Multi-Cloud Automation with Ansible and Terraform
Zero Code Multi-Cloud Automation with Ansible and Terraform
 
Wido den hollander cloud stack and ceph
Wido den hollander   cloud stack and cephWido den hollander   cloud stack and ceph
Wido den hollander cloud stack and ceph
 
OpenNebulaconf2017US: Configuration management with OpenNebula and Ansible by...
OpenNebulaconf2017US: Configuration management with OpenNebula and Ansible by...OpenNebulaconf2017US: Configuration management with OpenNebula and Ansible by...
OpenNebulaconf2017US: Configuration management with OpenNebula and Ansible by...
 
OpenNebula Conf 2014 | ONE BIT to rule them all - Stefan Kooman
OpenNebula Conf 2014 | ONE BIT to rule them all - Stefan KoomanOpenNebula Conf 2014 | ONE BIT to rule them all - Stefan Kooman
OpenNebula Conf 2014 | ONE BIT to rule them all - Stefan Kooman
 
NGINX Installation and Tuning
NGINX Installation and TuningNGINX Installation and Tuning
NGINX Installation and Tuning
 
Network Automation - Interconnection tools
Network Automation - Interconnection toolsNetwork Automation - Interconnection tools
Network Automation - Interconnection tools
 
2013-cloudconnect-OpenStack@BT
2013-cloudconnect-OpenStack@BT2013-cloudconnect-OpenStack@BT
2013-cloudconnect-OpenStack@BT
 
Glass fish performance tuning tips from the field
Glass fish performance tuning tips from the fieldGlass fish performance tuning tips from the field
Glass fish performance tuning tips from the field
 
Ansible for networks
Ansible for networksAnsible for networks
Ansible for networks
 
Dave Williams - Nagios Log Server - Practical Experience
Dave Williams - Nagios Log Server - Practical ExperienceDave Williams - Nagios Log Server - Practical Experience
Dave Williams - Nagios Log Server - Practical Experience
 
Ansible & Cumulus Networks - Simplify Network Automation
Ansible & Cumulus Networks - Simplify Network AutomationAnsible & Cumulus Networks - Simplify Network Automation
Ansible & Cumulus Networks - Simplify Network Automation
 
Nagios XI Best Practices
Nagios XI Best PracticesNagios XI Best Practices
Nagios XI Best Practices
 
What is NetDevOps? How? Leslie Carr PuppetConf 2015
What is NetDevOps? How? Leslie Carr PuppetConf 2015What is NetDevOps? How? Leslie Carr PuppetConf 2015
What is NetDevOps? How? Leslie Carr PuppetConf 2015
 
Mobile 3: Launch Like a Boss!
Mobile 3: Launch Like a Boss!Mobile 3: Launch Like a Boss!
Mobile 3: Launch Like a Boss!
 
.Net Core Fall update
.Net Core Fall update.Net Core Fall update
.Net Core Fall update
 
Infrastructure Management in GCP
Infrastructure Management in GCPInfrastructure Management in GCP
Infrastructure Management in GCP
 
Nagios Conference 2014 - Leland Lammert - Distributed Heirarchical Nagios
Nagios Conference 2014 - Leland Lammert - Distributed Heirarchical NagiosNagios Conference 2014 - Leland Lammert - Distributed Heirarchical Nagios
Nagios Conference 2014 - Leland Lammert - Distributed Heirarchical Nagios
 
DevOpsDays Taipei 2019 - Mastering IaC the DevOps Way
DevOpsDays Taipei 2019 - Mastering IaC the DevOps WayDevOpsDays Taipei 2019 - Mastering IaC the DevOps Way
DevOpsDays Taipei 2019 - Mastering IaC the DevOps Way
 
Moving mongo db to the cloud strategies and points to consider
Moving mongo db to the cloud  strategies and points to considerMoving mongo db to the cloud  strategies and points to consider
Moving mongo db to the cloud strategies and points to consider
 
Open escalar presentation
Open escalar presentationOpen escalar presentation
Open escalar presentation
 

Similar to Creating pools of Virtual Machines - ApacheCon NA 2013

Apache Provisionr (incubating) - Bucharest JUG 10
Apache Provisionr (incubating) - Bucharest JUG 10Apache Provisionr (incubating) - Bucharest JUG 10
Apache Provisionr (incubating) - Bucharest JUG 10Andrei Savu
 
Devops with Python by Yaniv Cohen DevopShift
Devops with Python by Yaniv Cohen DevopShiftDevops with Python by Yaniv Cohen DevopShift
Devops with Python by Yaniv Cohen DevopShiftYaniv cohen
 
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst ITThings You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst ITOpenStack
 
Dark launching with Consul at Hootsuite - Bill Monkman
Dark launching with Consul at Hootsuite - Bill MonkmanDark launching with Consul at Hootsuite - Bill Monkman
Dark launching with Consul at Hootsuite - Bill MonkmanAmbassador Labs
 
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthUSENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthNicolas Brousse
 
Velocity NYC 2016 - Containers @ Netflix
Velocity NYC 2016 - Containers @ NetflixVelocity NYC 2016 - Containers @ Netflix
Velocity NYC 2016 - Containers @ Netflixaspyker
 
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius SchumacherOSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius SchumacherNETWAYS
 
DevOps, A brief introduction to Vagrant & Ansible
DevOps, A brief introduction to Vagrant & AnsibleDevOps, A brief introduction to Vagrant & Ansible
DevOps, A brief introduction to Vagrant & AnsibleArnaud LEMAIRE
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...javier ramirez
 
Writing and deploying serverless python applications
Writing and deploying serverless python applicationsWriting and deploying serverless python applications
Writing and deploying serverless python applicationsCesar Cardenas Desales
 
Netflix oss season 2 episode 1 - meetup Lightning talks
Netflix oss   season 2 episode 1 - meetup Lightning talksNetflix oss   season 2 episode 1 - meetup Lightning talks
Netflix oss season 2 episode 1 - meetup Lightning talksRuslan Meshenberg
 
20141111_SOS3_Gallo
20141111_SOS3_Gallo20141111_SOS3_Gallo
20141111_SOS3_GalloAndrea Gallo
 
Automating using Ansible
Automating using AnsibleAutomating using Ansible
Automating using AnsibleAlok Patra
 
Backroll: Production Grade KVM Backup Solution Integrated in CloudStack
Backroll: Production Grade KVM Backup Solution Integrated in CloudStackBackroll: Production Grade KVM Backup Solution Integrated in CloudStack
Backroll: Production Grade KVM Backup Solution Integrated in CloudStackShapeBlue
 
Infrastructure Automation with Chef & Ansible
Infrastructure Automation with Chef & AnsibleInfrastructure Automation with Chef & Ansible
Infrastructure Automation with Chef & Ansiblewajrcs
 
Ansible Tutorial.pdf
Ansible Tutorial.pdfAnsible Tutorial.pdf
Ansible Tutorial.pdfNigussMehari4
 
Build cloud like Rackspace with OpenStack Ansible
Build cloud like Rackspace with OpenStack AnsibleBuild cloud like Rackspace with OpenStack Ansible
Build cloud like Rackspace with OpenStack AnsibleJirayut Nimsaeng
 

Similar to Creating pools of Virtual Machines - ApacheCon NA 2013 (20)

Apache Provisionr (incubating) - Bucharest JUG 10
Apache Provisionr (incubating) - Bucharest JUG 10Apache Provisionr (incubating) - Bucharest JUG 10
Apache Provisionr (incubating) - Bucharest JUG 10
 
Devops with Python by Yaniv Cohen DevopShift
Devops with Python by Yaniv Cohen DevopShiftDevops with Python by Yaniv Cohen DevopShift
Devops with Python by Yaniv Cohen DevopShift
 
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst ITThings You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
 
Dark launching with Consul at Hootsuite - Bill Monkman
Dark launching with Consul at Hootsuite - Bill MonkmanDark launching with Consul at Hootsuite - Bill Monkman
Dark launching with Consul at Hootsuite - Bill Monkman
 
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthUSENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
 
Velocity NYC 2016 - Containers @ Netflix
Velocity NYC 2016 - Containers @ NetflixVelocity NYC 2016 - Containers @ Netflix
Velocity NYC 2016 - Containers @ Netflix
 
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius SchumacherOSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
OSDC 2018 | Highly Available Cloud Foundry on Kubernetes by Cornelius Schumacher
 
DevOps, A brief introduction to Vagrant & Ansible
DevOps, A brief introduction to Vagrant & AnsibleDevOps, A brief introduction to Vagrant & Ansible
DevOps, A brief introduction to Vagrant & Ansible
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
 
Microservices at Mercari
Microservices at MercariMicroservices at Mercari
Microservices at Mercari
 
Writing and deploying serverless python applications
Writing and deploying serverless python applicationsWriting and deploying serverless python applications
Writing and deploying serverless python applications
 
Netflix oss season 2 episode 1 - meetup Lightning talks
Netflix oss   season 2 episode 1 - meetup Lightning talksNetflix oss   season 2 episode 1 - meetup Lightning talks
Netflix oss season 2 episode 1 - meetup Lightning talks
 
20141111_SOS3_Gallo
20141111_SOS3_Gallo20141111_SOS3_Gallo
20141111_SOS3_Gallo
 
PaaS options for .NET
PaaS options for .NETPaaS options for .NET
PaaS options for .NET
 
Automating using Ansible
Automating using AnsibleAutomating using Ansible
Automating using Ansible
 
Backroll: Production Grade KVM Backup Solution Integrated in CloudStack
Backroll: Production Grade KVM Backup Solution Integrated in CloudStackBackroll: Production Grade KVM Backup Solution Integrated in CloudStack
Backroll: Production Grade KVM Backup Solution Integrated in CloudStack
 
Infrastructure Automation with Chef & Ansible
Infrastructure Automation with Chef & AnsibleInfrastructure Automation with Chef & Ansible
Infrastructure Automation with Chef & Ansible
 
Ansible Tutorial.pdf
Ansible Tutorial.pdfAnsible Tutorial.pdf
Ansible Tutorial.pdf
 
Ansible - Hands on Training
Ansible - Hands on TrainingAnsible - Hands on Training
Ansible - Hands on Training
 
Build cloud like Rackspace with OpenStack Ansible
Build cloud like Rackspace with OpenStack AnsibleBuild cloud like Rackspace with OpenStack Ansible
Build cloud like Rackspace with OpenStack Ansible
 

More from Andrei Savu

The Evolving Landscape of Data Engineering
The Evolving Landscape of Data EngineeringThe Evolving Landscape of Data Engineering
The Evolving Landscape of Data EngineeringAndrei Savu
 
The Evolving Landscape of Data Engineering
The Evolving Landscape of Data EngineeringThe Evolving Landscape of Data Engineering
The Evolving Landscape of Data EngineeringAndrei Savu
 
Recap on AWS Lambda after re:Invent 2015
Recap on AWS Lambda after re:Invent 2015Recap on AWS Lambda after re:Invent 2015
Recap on AWS Lambda after re:Invent 2015Andrei Savu
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupAndrei Savu
 
Introducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashIntroducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashAndrei Savu
 
APIs & Underlying Protocols #APICraftSF
APIs & Underlying Protocols #APICraftSFAPIs & Underlying Protocols #APICraftSF
APIs & Underlying Protocols #APICraftSFAndrei Savu
 
Challenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS MeetupChallenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS MeetupAndrei Savu
 
Cloud as a Data Platform
Cloud as a Data PlatformCloud as a Data Platform
Cloud as a Data PlatformAndrei Savu
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist ToolboxAndrei Savu
 
Axemblr Provisionr 0.3.x Overview
Axemblr Provisionr 0.3.x OverviewAxemblr Provisionr 0.3.x Overview
Axemblr Provisionr 0.3.x OverviewAndrei Savu
 
2012 in Review - Bucharest JUG
2012 in Review - Bucharest JUG2012 in Review - Bucharest JUG
2012 in Review - Bucharest JUGAndrei Savu
 
Metrics for Web Applications - Netcamp 2012
Metrics for Web Applications - Netcamp 2012Metrics for Web Applications - Netcamp 2012
Metrics for Web Applications - Netcamp 2012Andrei Savu
 
Counters with Riak on Amazon EC2 at Hackover
Counters with Riak on Amazon EC2 at HackoverCounters with Riak on Amazon EC2 at Hackover
Counters with Riak on Amazon EC2 at HackoverAndrei Savu
 
Simple REST with Dropwizard
Simple REST with DropwizardSimple REST with Dropwizard
Simple REST with DropwizardAndrei Savu
 
Guava Overview Part 2 Bucharest JUG #2
Guava Overview Part 2 Bucharest JUG #2 Guava Overview Part 2 Bucharest JUG #2
Guava Overview Part 2 Bucharest JUG #2 Andrei Savu
 
Guava Overview. Part 1 @ Bucharest JUG #1
Guava Overview. Part 1 @ Bucharest JUG #1 Guava Overview. Part 1 @ Bucharest JUG #1
Guava Overview. Part 1 @ Bucharest JUG #1 Andrei Savu
 
Polyglot Persistence & Big Data in the Cloud
Polyglot Persistence & Big Data in the CloudPolyglot Persistence & Big Data in the Cloud
Polyglot Persistence & Big Data in the CloudAndrei Savu
 
Building a Great Team in Open Source - Open Agile 2011
Building a Great Team in Open Source - Open Agile 2011Building a Great Team in Open Source - Open Agile 2011
Building a Great Team in Open Source - Open Agile 2011Andrei Savu
 
Automated Testing for Web Applications - Wurbe #36
Automated Testing for Web Applications - Wurbe #36Automated Testing for Web Applications - Wurbe #36
Automated Testing for Web Applications - Wurbe #36Andrei Savu
 

More from Andrei Savu (20)

The Evolving Landscape of Data Engineering
The Evolving Landscape of Data EngineeringThe Evolving Landscape of Data Engineering
The Evolving Landscape of Data Engineering
 
The Evolving Landscape of Data Engineering
The Evolving Landscape of Data EngineeringThe Evolving Landscape of Data Engineering
The Evolving Landscape of Data Engineering
 
Recap on AWS Lambda after re:Invent 2015
Recap on AWS Lambda after re:Invent 2015Recap on AWS Lambda after re:Invent 2015
Recap on AWS Lambda after re:Invent 2015
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data Meetup
 
Introducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashIntroducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data Bash
 
APIs & Underlying Protocols #APICraftSF
APIs & Underlying Protocols #APICraftSFAPIs & Underlying Protocols #APICraftSF
APIs & Underlying Protocols #APICraftSF
 
Challenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS MeetupChallenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS Meetup
 
Cloud as a Data Platform
Cloud as a Data PlatformCloud as a Data Platform
Cloud as a Data Platform
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist Toolbox
 
Axemblr Provisionr 0.3.x Overview
Axemblr Provisionr 0.3.x OverviewAxemblr Provisionr 0.3.x Overview
Axemblr Provisionr 0.3.x Overview
 
2012 in Review - Bucharest JUG
2012 in Review - Bucharest JUG2012 in Review - Bucharest JUG
2012 in Review - Bucharest JUG
 
Metrics for Web Applications - Netcamp 2012
Metrics for Web Applications - Netcamp 2012Metrics for Web Applications - Netcamp 2012
Metrics for Web Applications - Netcamp 2012
 
Counters with Riak on Amazon EC2 at Hackover
Counters with Riak on Amazon EC2 at HackoverCounters with Riak on Amazon EC2 at Hackover
Counters with Riak on Amazon EC2 at Hackover
 
Simple REST with Dropwizard
Simple REST with DropwizardSimple REST with Dropwizard
Simple REST with Dropwizard
 
Guava Overview Part 2 Bucharest JUG #2
Guava Overview Part 2 Bucharest JUG #2 Guava Overview Part 2 Bucharest JUG #2
Guava Overview Part 2 Bucharest JUG #2
 
Guava Overview. Part 1 @ Bucharest JUG #1
Guava Overview. Part 1 @ Bucharest JUG #1 Guava Overview. Part 1 @ Bucharest JUG #1
Guava Overview. Part 1 @ Bucharest JUG #1
 
Polyglot Persistence & Big Data in the Cloud
Polyglot Persistence & Big Data in the CloudPolyglot Persistence & Big Data in the Cloud
Polyglot Persistence & Big Data in the Cloud
 
Building a Great Team in Open Source - Open Agile 2011
Building a Great Team in Open Source - Open Agile 2011Building a Great Team in Open Source - Open Agile 2011
Building a Great Team in Open Source - Open Agile 2011
 
Apache Whirr
Apache WhirrApache Whirr
Apache Whirr
 
Automated Testing for Web Applications - Wurbe #36
Automated Testing for Web Applications - Wurbe #36Automated Testing for Web Applications - Wurbe #36
Automated Testing for Web Applications - Wurbe #36
 

Creating pools of Virtual Machines - ApacheCon NA 2013

  • 1. Creating Pools of 10s or 100s of Virtual Machines Andrei Savu ApacheCon NA 2013
  • 2. Who is this guy? ● Founder of Axemblr.com ● Organizer of Bucharest JUG (bjug.ro) ● Apache Whirr PMC, ZooKeeper contributor ● Passion for DevOps & Data Analysis ● Connect with me on LinkedIn
  • 3. @ Axemblr ● Data Processing Infrastructure ● Deployment Automation ● Product: Hadoop On-Demand Appliance ● Open Source (part of our DNA) ● Fair amount of consulting (bootstrapping)
  • 4. Agenda ● What is Provisionr? ● Challenges & Architecture ● Demo (HDFS on EC2) ● Future @ Apache Incubator
  • 5. What is Provisionr? .. and how does it help me create pools of virtual machines?
  • 6. What? ● Simple Service for Managing Pools of 10s or 100s of Virtual Machines ● A way to create clusters of machines that share a common set of characteristics on multiple cloud providers
  • 7. Characteristics like? ● Operating system ● Network settings ● Pre-installed ● Firewall packages & ● SSH config binaries ● Admin access ● Sane DNS settings (forward & reverse ● VPN access dns resolution) ● etc. ● NTP settings
  • 8. Why? (initially) ● Setup on-demand Hadoop clusters (Axemblr) ● Handles basic setup for large clusters ● Service config by using 3rd party apps like Ambari or Cloudera Manager
  • 9. Why? (long term) Core functionality is generic Next generation Apache Whirr? External Configuration Specification Events Events Provisionr Events Monitoring
  • 10. FAQ: Looks like Puppet? ● No ● Provisionr is actually using Puppet ● Focus: Interact with IaaS APIs to start machines in groups with minimal configs (as listed before). Simple & reliable.
  • 11. Challenges How is the game different when we work with 50-100+ virtual machines?
  • 12. Challenges #1 ● API Throttling (batch calls) ● Concurrency Control (across multiple instances) ● Error handling, partial failures and automatic retries (idempotency)
  • 13. Challenges #2 ● Granular internal workflows (short transactions) ● State persistence across restarts and upgrades ● Audit & Logging
  • 14. Challenges #3 ● Integrating multiple native provider SDKs ● Provide a plugin architecture (run just a sub-set of all the features) ● Semi-automated and fully automated modes
  • 15. Challenges #4 ● Automatic creation of gold images
  • 16. Architecture Building Blocks, Internals, Persistence, Packaging, Plugins
  • 17. Activiti (from Alfresco) ● Light-weight workflow engine (BPM) ● Has a nice Java API ● Has a nice set of tools ● Handles persistence as expected ● Good error handling (retryable activities)
  • 18. Activiti – Process Execution
  • 20. Apache Karaf ● Using it as an application server ● Provides an interactive shell ● Integrated with Activiti ● Solves the packaging problem (custom distribution)
  • 21. Apache Karaf - Shell
  • 22. IaaS SDKs ● AWS SDK for Java – http://aws.amazon.com/sdkforjava/ ● jclouds (for CloudStack) – http://www.jclouds.org/
  • 23. Demo Time (video) Provisionr & Rundeck CDH4 HDFS cluster on EC2
  • 24. Summary ● Provisionr solves the problem of creating large pools of virtual machines (100s) ● Cloud portability by making the machines & the cluster indistinguishable from an application perspective on multiple clouds
  • 25. You're invited to vote! ● Apache Provisionr proposal (wiki) ● Check general@incubator.apache.org ● Feedback at asavu@apache.org ● Looking for mentors & contributors
  • 26. Thanks! Questions? Andrei Savu asavu@apache.org Twitter: @andreisavu