Building and Deploying Netflix in the Cloud

  @bmoyles @garethbowles #netflixcloud
Who Are These Guys?
Brian Moyles, Gareth Bowles
What We Build

Large number of loosely coupled Java web services
Common code in libraries that can be shared across apps
Each service is "baked": installed onto a base Amazon Machine Image and then created as a new AMI ...
... and then deployed into a Service Cluster (a set of Auto Scaling Groups running a particular service)
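The bake-then-deploy flow can be modelled roughly as below. This is a minimal sketch, not Netflix's tooling: the AMI IDs, the `-vNNN` ASG naming scheme and the `bake`/`deploy` helpers are hypothetical, and no AWS API is called.

```python
import itertools
from dataclasses import dataclass, field

_ami_counter = itertools.count(1)  # hypothetical source of fake AMI IDs


@dataclass(frozen=True)
class Ami:
    ami_id: str
    base_ami: str
    package: str


@dataclass
class AutoScalingGroup:
    name: str
    ami: Ami


@dataclass
class ServiceCluster:
    """A set of Auto Scaling Groups that all run the same service."""
    service: str
    asgs: list = field(default_factory=list)


def bake(base_ami: str, package: str) -> Ami:
    # Hypothetical bake: record "package installed onto base_ami" as a new,
    # immutable image. A real bake would build and snapshot an EC2 volume.
    return Ami(ami_id=f"ami-{next(_ami_counter):08x}", base_ami=base_ami, package=package)


def deploy(cluster: ServiceCluster, ami: Ami) -> AutoScalingGroup:
    # Each deployment adds a new ASG to the cluster instead of modifying
    # running instances; the version suffix is an illustrative naming scheme.
    asg = AutoScalingGroup(name=f"{cluster.service}-v{len(cluster.asgs):03d}", ami=ami)
    cluster.asgs.append(asg)
    return asg
```

In this sketch, deploying two bakes of `helloworld` yields ASGs named `helloworld-v000` and `helloworld-v001` side by side in one Service Cluster, each pinned to its own AMI.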
Getting Built
Build Pipeline
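[Diagram: source from Perforce and GitHub is synced into Jenkins, which runs the Common Build Framework (CBF) steps sync, resolve, check, compile, build, test, publish and report, with Artifactory (libraries) and yum as the artifact repositories.]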
build.xml
<project name="helloworld">
    <import file="../../../Tools/build/webapplication.xml"/>
</project>

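ivy.xml
<ivy-module version="2.0" xmlns:e="http://ant.apache.org/ivy/extra">
  <info organisation="netflix" module="helloworld"/>
  <!-- assumed: declared here so the conf="compile" mappings below are valid -->
  <configurations>
    <conf name="compile"/>
  </configurations>
  <publications>
    <artifact name="helloworld" type="package"
              e:classifier="package" ext="tgz"/>
    <artifact name="helloworld" type="javadoc"
              e:classifier="javadoc" ext="jar"/>
  </publications>
  <dependencies>
    <dependency org="netflix" name="resourceregistry"
                rev="latest.${input.status}" conf="compile"/>
    <dependency org="netflix" name="platform"
                rev="latest.${input.status}" conf="compile"/>
    <!-- ... -->
  </dependencies>
</ivy-module>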
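The dependencies above use `rev="latest.${input.status}"`, so which revision gets pulled depends on the `input.status` property. In Ivy, `latest.<status>` selects the newest revision whose status is at least that mature; the default status list is integration, milestone, release. The sketch below models that rule in simplified form: it assumes purely numeric dotted revisions and the default status list, whereas real Ivy uses configurable statuses and a richer revision-comparison strategy.

```python
from __future__ import annotations

# Ivy's default status list, from least to most mature.
STATUSES = ["integration", "milestone", "release"]


def parse_rev(rev: str) -> tuple:
    # Simplification: compare dotted numeric revisions as integer tuples,
    # so "1.10" sorts after "1.2".
    return tuple(int(part) for part in rev.split("."))


def resolve_latest(requested: str, published: dict[str, str]) -> str | None:
    """Pick the newest revision whose status is at least the requested one.

    `requested` looks like "latest.release"; `published` maps revision -> status.
    Returns None when no revision is mature enough.
    """
    prefix, _, status = requested.partition(".")
    if prefix != "latest" or status not in STATUSES:
        raise ValueError(f"unsupported revision constraint: {requested!r}")
    floor = STATUSES.index(status)
    eligible = [rev for rev, st in published.items() if STATUSES.index(st) >= floor]
    return max(eligible, key=parse_rev, default=None)
```

With revisions `1.0` (release), `1.2` (milestone) and `1.10` (integration) published, `latest.integration` resolves to `1.10`, `latest.milestone` to `1.2`, and `latest.release` to `1.0`.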
Jenkins at Netflix
Jenkins Statistics
1,600 job definitions, 50% of them triggered by SCM changes
2,000 builds per day
Common Build Framework updates trigger 800 rebuilds; by scaling up to 20 cloud slaves we can complete the flood of new builds in 30 minutes
2 TB of build data
Jenkins Architecture
[Diagram]
us-west-1 VPC: Standard slave group (x86_64 slaves such as buildnode01; Amazon Linux; m1.xlarge) and Custom slave group (x86_64 slaves; Amazon Linux; various instance types)
Netflix data center: Single Master (Red Hat Linux; 2x quad-core x86_64; 26 GB RAM)
Netflix data center and office: ~40 custom slaves maintained by product teams (misc. architectures) and ad-hoc slaves (misc. O/S & architectures)
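The slave groups differ by OS, architecture and ownership, and Jenkins typically routes a job to the right group through node labels. The sketch below shows that matching idea under a simplifying assumption: a job's requirement is a plain AND of labels (real Jenkins label expressions also support `||`, `!` and parentheses). The node names beyond `buildnode01` and all label names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    labels: frozenset
    idle_executors: int


def eligible_nodes(required: set, nodes: list) -> list:
    """Return nodes that carry every required label and have a free executor.

    Simplification: only handles an AND of labels, not full label expressions.
    """
    return [n for n in nodes if required <= n.labels and n.idle_executors > 0]
```

For example, a job requiring `{"x86_64"}` would land only on nodes in this sketch that both carry that label and have an idle executor; a fully busy custom slave is skipped even if its labels match.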
Other Uses of Jenkins
Monitoring of our test and production Cassandra clusters
Automated integration tests, including bake and deploy
Production bake and deployment
Housekeeping of the build / deploy infrastructure:
  Reap unreferenced artifacts in Artifactory
  Disable Jenkins jobs with no recent successful builds
  Mark Jenkins builds as permanent if they are used by an active deployment in prod or test
  Alert owners when slaves get disconnected
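The selection logic behind three of these housekeeping tasks can be sketched as plain set and date arithmetic. This is a hypothetical illustration, not Netflix's scripts (which the slides describe as Jenkins jobs and system Groovy scripts): the 30-day staleness window, the record shapes and the artifact names are assumptions, and no Jenkins or Artifactory API is called.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    name: str
    last_success: datetime | None  # None if the job has never succeeded


def stale_jobs(jobs: list[Job], now: datetime,
               max_age: timedelta = timedelta(days=30)) -> list[str]:
    """Names of jobs with no successful build within max_age (candidates to disable)."""
    cutoff = now - max_age
    return [j.name for j in jobs if j.last_success is None or j.last_success < cutoff]


def unreferenced_artifacts(artifacts: set[str], referenced: set[str]) -> set[str]:
    """Artifacts that nothing references any more (candidates to reap)."""
    return artifacts - referenced


def builds_to_mark_permanent(deployments) -> set[tuple[str, int]]:
    """deployments: iterable of (job_name, build_number, is_active) tuples."""
    return {(job, build) for job, build, active in deployments if active}
```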
Jenkins Scaling Challenges
A flood of simultaneous builds can quickly exhaust all build executors and clog the pipeline
A flood of simultaneous builds can hammer the rest of the infrastructure (especially Artifactory)
Making global changes to all jobs
Some plugins don't scale to our number of jobs / builds
Hard to test every job before upgrading the master or plugins
The large amount of state encapsulated in build data makes restoring from backup time-consuming
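One common mitigation for the "hammer Artifactory" problem is to cap how many builds may hit a shared service at once, independently of how many executors exist. The sketch below shows that idea with a bounded semaphore; it is not presented as Netflix's solution, and `fake_resolve`, the limit and the timings are hypothetical stand-ins.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Throttle:
    """Cap how many callers may use a shared service concurrently."""

    def __init__(self, limit: int):
        self._sem = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.active = 0  # callers currently inside the throttle
        self.peak = 0    # highest concurrency observed

    def __enter__(self):
        self._sem.acquire()
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self.active -= 1
        self._sem.release()
        return False


def fake_resolve(throttle: Throttle, build_id: int) -> int:
    # Hypothetical stand-in for a dependency download; the sleep simulates I/O.
    with throttle:
        time.sleep(0.01)
    return build_id


def run_flood(n_builds: int, executors: int, limit: int) -> Throttle:
    """Run n_builds on `executors` threads, but let only `limit` hit the service at once."""
    throttle = Throttle(limit)
    with ThreadPoolExecutor(max_workers=executors) as pool:
        list(pool.map(lambda i: fake_resolve(throttle, i), range(n_builds)))
    return throttle
```

Even with 20 executors busy, `run_flood(40, 20, 5)` never lets more than 5 simulated downloads run at once; the trade-off is that builds queue on the throttle instead of on the executors.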
Netflix Extensions to Jenkins

 Job DSL plugin: allows jobs to be set up with a minimal definition, using templates and a Groovy-based DSL
 Housekeeping and maintenance processes implemented as Jenkins jobs and system Groovy scripts
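The "minimal definition plus template" idea behind the Job DSL plugin can be illustrated outside Groovy. The Python sketch below is not the plugin's API; it only shows how a few per-job fields can be merged over shared defaults. The template keys and the `expand`/`expand_all` helpers are hypothetical, and the step names are borrowed from the CBF steps listed in the build pipeline.

```python
from copy import deepcopy

# Hypothetical template: defaults shared by every job.
TEMPLATE = {
    "scm": {"type": "git", "branch": "master"},
    "triggers": ["scm-poll"],
    "steps": ["sync", "resolve", "check", "compile", "build", "test", "publish", "report"],
    "label": "standard",
}


def expand(minimal: dict, template: dict = TEMPLATE) -> dict:
    """Merge a minimal job definition over the template.

    Nested dicts are merged one level deep; any other value replaces the default.
    The template itself is never mutated.
    """
    job = deepcopy(template)
    for key, value in minimal.items():
        if isinstance(value, dict) and isinstance(job.get(key), dict):
            job[key].update(value)
        else:
            job[key] = value
    return job


def expand_all(definitions: list) -> dict:
    """Expand a list of minimal definitions into full job specs keyed by name."""
    return {d["name"]: expand(d) for d in definitions}
```

Because every job is generated from the same template, a global change (one of the scaling challenges listed above) becomes a single edit to the template followed by regeneration.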
The DynaSlave Plugin
Our cloud-based army of build nodes
The DynaSlave Plugin
Genesis
Original build fleet: 15 VMs on datacenter hardware, 8 GB RAM, single vCPU, 2 executors per node
Many jobs build on SCM change. Changes to our Common Build Framework create a massive thundering herd, since everything depends on it.
Ask for more VMs? Modify CBF less frequently?
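A back-of-the-envelope model makes the thundering-herd problem concrete. The original fleet had 15 VMs × 2 executors = 30 executors. The slides give neither an average build time nor the executor count per cloud slave, so the 5-minute build time below is a hypothetical figure, and the model idealises every build as taking the same time with executors as the only bottleneck.

```python
import math


def drain_minutes(builds: int, executors: int, avg_build_minutes: float) -> float:
    """Time to clear a queue of equal-length, independent builds."""
    waves = math.ceil(builds / executors)  # rounds of builds that must run back to back
    return waves * avg_build_minutes


def executors_needed(builds: int, target_minutes: float, avg_build_minutes: float) -> int:
    """Executors required to clear `builds` within `target_minutes`."""
    waves = math.floor(target_minutes / avg_build_minutes)
    if waves < 1:
        raise ValueError("target is shorter than a single build")
    return math.ceil(builds / waves)
```

Under the 5-minute assumption, 800 rebuilds on 30 executors take ceil(800 / 30) = 27 waves, i.e. 135 minutes. Finishing in 30 minutes allows 6 waves, so ceil(800 / 6) = 134 executors are needed, or ceil(134 / 20) = 7 executors per slave across 20 cloud slaves.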
The DynaSlave Plugin
What We Wanted

Leverage our extensive AWS infrastructure, tooling, and experience
No manual fiddling with machines once they launch
Quick and easy to maintain a fixed pool of slave nodes that can grow/shrink to meet build demand
The DynaSlave Plugin
What We Have
Exposes a new endpoint in Jenkins that EC2 instances in the VPC use for registration
Allows a slave to name itself, label itself, and tell Jenkins how many executors it can support
EC2 == ephemeral: disconnected nodes that are gone for > 30 mins are reaped
Sizing is handled by EC2 ASGs; tweaks (labels, names, etc.) are passed through via user data
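The registration and reaping behaviour described above can be sketched as a small in-memory registry. This is not the plugin's code: the slides do not show the endpoint or its payload, so the `key=value` user-data format, the `name`/`labels`/`executors` keys and the fallback to the instance ID are all assumptions. Only the 30-minute reap threshold comes from the slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field

REAP_AFTER_SECONDS = 30 * 60  # "gone for > 30 mins" per the slide


def parse_user_data(text: str) -> dict[str, str]:
    """Parse hypothetical key=value lines from EC2 user data, skipping comments."""
    out = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            out[key.strip()] = value.strip()
    return out


@dataclass
class SlaveRecord:
    name: str
    labels: set[str]
    executors: int
    disconnected_since: float | None = None  # epoch seconds, None while connected


@dataclass
class Registry:
    slaves: dict[str, SlaveRecord] = field(default_factory=dict)

    def register(self, user_data: str, instance_id: str) -> SlaveRecord:
        cfg = parse_user_data(user_data)
        rec = SlaveRecord(
            name=cfg.get("name", instance_id),  # the slave names itself, else use the instance ID
            labels=set(cfg.get("labels", "").split()),
            executors=int(cfg.get("executors", "1")),
        )
        self.slaves[rec.name] = rec
        return rec

    def mark_disconnected(self, name: str, now: float) -> None:
        self.slaves[name].disconnected_since = now

    def mark_connected(self, name: str) -> None:
        self.slaves[name].disconnected_since = None

    def reap(self, now: float) -> list[str]:
        """Remove and return slaves disconnected for longer than the threshold."""
        gone = [n for n, r in self.slaves.items()
                if r.disconnected_since is not None
                and now - r.disconnected_since > REAP_AFTER_SECONDS]
        for n in gone:
            del self.slaves[n]
        return sorted(gone)
```

Keeping reaping time-based rather than immediate tolerates brief network blips, while still cleaning up after instances that an ASG has terminated for good.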
The DynaSlave Plugin
What’s Next
Dynamic resource management: have Jenkins respond to build demand and manage its own slave pools
Slave groups: allow us to create specialized pools of build nodes, isolated from the general population
Refresh mechanism for slave tools (JDKs, Ant versions, etc.)
Enhanced security/registration of nodes
Give it back to the community (watch techblog.netflix.com!)
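Dynamic resource management is listed as future work with no design given. One hypothetical way Jenkins could size its own pool is to cover running plus queued builds, clamped to pool bounds, and hand the result to the ASG as its desired capacity. The function name, parameters and bounds below are illustrative assumptions.

```python
import math


def desired_pool_size(queued_builds: int, busy_executors: int, executors_per_slave: int,
                      min_slaves: int, max_slaves: int) -> int:
    """Slaves needed to cover running plus queued builds, clamped to [min_slaves, max_slaves]."""
    if executors_per_slave < 1:
        raise ValueError("executors_per_slave must be positive")
    demand = queued_builds + busy_executors
    wanted = math.ceil(demand / executors_per_slave)
    return max(min_slaves, min(max_slaves, wanted))
```

With a hypothetical ceiling of 20 slaves (the scale-up figure from the statistics slide) and 4 executors per slave, a sudden queue of 800 rebuilds would drive the pool straight to 20, and an empty queue would let it shrink back to the floor.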
Further Reading

http://techblog.netflix.com
http://www.slideshare.net/adrianco
   @netflixoss
https://github.com/netflix

http://jobs.netflix.com
Thank you
Questions?



  @bmoyles @garethbowles

Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 

Dernier (20)

Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 

Building Cloud Tools for Netflix with Jenkins

  • 1. Building and Deploying Netflix in the Cloud @bmoyles @garethbowles #netflixcloud
  • 2. Who Are These Guys? Brian Moyles Gareth Bowles
  • 3. What We Build: a large number of loosely coupled Java web services; common code in libraries that can be shared across apps; each service is “baked” (installed onto a base Amazon Machine Image and then saved as a new AMI) and then deployed into a Service Cluster (a set of Auto Scaling Groups running a particular service)
  • 5. Build Pipeline: source from Perforce or GitHub → Jenkins runs the Common Build Framework (CBF) steps: sync, resolve, check, compile, build, test, publish, report → outputs go to Artifactory (libraries) and yum
  • 6. [Screenshot: Jenkins job configuration for a typical project]
  • 7. build.xml:
        <project name="helloworld">
            <import file="../../../Tools/build/webapplication.xml"/>
        </project>
      ivy.xml:
        <info organisation="netflix" module="helloworld"/>
        <publications>
            <artifact name="helloworld" type="package" e:classifier="package" ext="tgz"/>
            <artifact name="helloworld" type="javadoc" e:classifier="javadoc" ext="jar"/>
        </publications>
        <dependencies>
            <dependency org="netflix" name="resourceregistry" rev="latest.${input.status}" conf="compile"/>
            <dependency org="netflix" name="platform" rev="latest.${input.status}" conf="compile"/>
            ...
  • 9. Jenkins Statistics: 1600 job definitions, 50% SCM triggered; 2000 builds per day; Common Build Framework updates trigger 800 rebuilds, and by scaling up to 20 cloud slaves we can complete the flood of new builds in 30 minutes; 2TB of build data
  • 10. Jenkins Architecture: single master (Red Hat Linux, 2x quad core x86_64, 26G RAM) in the Netflix data center; standard x86_64 slaves (Amazon Linux, m1.xlarge) and ~40 custom slaves in slave groups maintained by product teams (Amazon Linux, various), all in a us-west-1 VPC; ad-hoc slaves (misc. O/S & architectures) in the Netflix data center and office
  • 11. Other Uses of Jenkins: monitoring of our test and production Cassandra clusters; automated integration tests, including bake and deploy; production bake and deployment; housekeeping of the build/deploy infrastructure (reap unreferenced artifacts in Artifactory; disable Jenkins jobs with no recent successful builds; mark Jenkins builds as permanent if they are used by an active deployment in prod or test; alert owners when slaves get disconnected)
  • 12. Jenkins Scaling Challenges: a flood of simultaneous builds can quickly exhaust all build executors and clog the pipeline; a flood of simultaneous builds can hammer the rest of the infrastructure (especially Artifactory); making global changes to all jobs; some plugins don’t scale to our number of jobs/builds; it is hard to test every job before upgrading the master or plugins; the large amount of state encapsulated in build data makes restoring from backup time-consuming
  • 13. Netflix Extensions to Jenkins: Job DSL plugin, which allows jobs to be set up with a minimal definition using templates and a Groovy-based DSL; housekeeping and maintenance processes implemented as Jenkins jobs and system Groovy scripts
  • 15. The DynaSlave Plugin: Genesis. Original build fleet: 15 VMs on datacenter hardware, 8G RAM, single vCPU, 2 executors per node. Many jobs build on SCM change. Changes to our common build framework create a massive thundering herd, since everything depends on it. Ask for more VMs? Modify CBF less frequently?
  • 16. The DynaSlave Plugin: What We Wanted. Leverage our extensive AWS infrastructure, tooling, and experience; no manual fiddling with machines once they launch; a fixed pool of slave nodes that is quick and easy to maintain and can grow/shrink to meet build demand
  • 17. The DynaSlave Plugin: What We Have. Exposes a new endpoint in Jenkins that EC2 instances in the VPC use for registration; allows a slave to name itself, label itself, and tell Jenkins how many executors it can support; EC2 == ephemeral, so disconnected nodes that are gone for > 30 mins are reaped; sizing is handled by EC2 ASGs, with tweaks (labels, names, etc.) passed through via user data
  • 18. The DynaSlave Plugin: What’s Next. Dynamic resource management: have Jenkins respond to build demand and manage its own slave pools; slave groups: specialized pools of build nodes, isolated from the general population; a refresh mechanism for slave tools (JDKs, Ant versions, etc.); enhanced security/registration of nodes; give it back to the community (watch techblog.netflix.com!)
  • 19. Further Reading: http://techblog.netflix.com, http://www.slideshare.net/adrianco, @netflixoss, https://github.com/netflix, http://jobs.netflix.com
  • 20. Thank you @bmoyles @garethbowles
  • 21. Thank you Questions? @bmoyles @garethbowles
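
The `rev="latest.${input.status}"` attributes on slide 7 use Ivy dynamic revisions: `latest.integration` matches the newest published revision of any status, while `latest.release` matches only revisions published as `release`. With Ivy's default status ordering (release, milestone, integration, from most to least mature), `latest.<status>` picks the newest revision whose status is at least that mature. The sketch below models only that selection rule; the `Published` record and the dotted-integer revision comparison are simplifying assumptions, since Ivy's own latest-revision strategy also handles qualifiers such as `-rc1`.

```python
from dataclasses import dataclass
from typing import Optional

# Ivy's default statuses, ordered from most to least mature.
STATUS_ORDER = ["release", "milestone", "integration"]


@dataclass(frozen=True)
class Published:
    """Hypothetical record of one published module revision."""
    revision: str
    status: str


def revision_key(revision: str) -> tuple[int, ...]:
    # Simplification: treat revisions as dotted integers, so "1.10" > "1.9".
    return tuple(int(part) for part in revision.split("."))


def resolve_latest(candidates: list[Published], requested_status: str) -> Optional[Published]:
    """Pick the newest revision whose status is at least as mature as requested."""
    max_rank = STATUS_ORDER.index(requested_status)
    eligible = [c for c in candidates if STATUS_ORDER.index(c.status) <= max_rank]
    if not eligible:
        return None
    return max(eligible, key=lambda c: revision_key(c.revision))


if __name__ == "__main__":
    history = [
        Published("1.0", "release"),
        Published("1.1", "milestone"),
        Published("1.2", "integration"),
    ]
    print(resolve_latest(history, "integration").revision)  # 1.2
    print(resolve_latest(history, "milestone").revision)    # 1.1
    print(resolve_latest(history, "release").revision)      # 1.0
```

Parameterizing the status with `${input.status}` lets the same ivy.xml resolve bleeding-edge dependencies in a CI build and only released ones in a release build.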
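
Slide 9 notes that a Common Build Framework change triggers about 800 rebuilds. Because each project declares only its immediate dependencies, the jobs affected by a change are the transitive set of downstream dependents. The sketch below assumes a hypothetical in-memory map from each module to its direct dependencies (a stand-in for real repository metadata, not Netflix's data model) and finds that set with a breadth-first walk over the reversed graph.

```python
from collections import defaultdict, deque


def downstream_modules(direct_deps: dict[str, list[str]], changed: str) -> set[str]:
    """Return every module that transitively depends on `changed`."""
    # Reverse the edges: for each dependency, record who depends on it.
    dependents: dict[str, set[str]] = defaultdict(set)
    for module, deps in direct_deps.items():
        for dep in deps:
            dependents[dep].add(module)

    # Breadth-first walk from the changed module over the reversed edges.
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for user in dependents[current]:
            if user not in affected:  # enqueue each module only once
                affected.add(user)
                queue.append(user)
    return affected


if __name__ == "__main__":
    # Hypothetical graph; "cbf" stands in for the Common Build Framework.
    graph = {
        "cbf": [],
        "platform": ["cbf"],
        "resourceregistry": ["cbf", "platform"],
        "helloworld": ["platform", "resourceregistry"],
        "billing": ["platform"],
    }
    print(sorted(downstream_modules(graph, "cbf")))
    # ['billing', 'helloworld', 'platform', 'resourceregistry']
```

Since everything depends on the framework, a change to it puts every module in the affected set at once, which is exactly the thundering herd described on slide 15.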
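
One housekeeping job on slide 11 disables Jenkins jobs with no recent successful builds. The talk implements these as system Groovy scripts against the Jenkins runtime; the Python sketch below models only the selection rule, using a hypothetical `JobSummary` snapshot and an arbitrary example threshold rather than any real Jenkins API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class JobSummary:
    """Hypothetical snapshot of a Jenkins job collected by a housekeeping script."""
    name: str
    disabled: bool
    created: datetime
    last_success: Optional[datetime] = None  # None if the job has never succeeded


def jobs_to_disable(jobs: list[JobSummary], now: datetime, max_age: timedelta) -> list[str]:
    """Names of enabled jobs with no successful build within `max_age`."""
    stale = []
    for job in jobs:
        if job.disabled:
            continue
        # A job that never succeeded is measured from its creation time,
        # so a brand-new job is not disabled before it has had a chance to run.
        reference = job.last_success if job.last_success is not None else job.created
        if now - reference > max_age:
            stale.append(job.name)
    return stale


if __name__ == "__main__":
    now = datetime(2012, 6, 1)
    jobs = [
        JobSummary("helloworld", False, datetime(2011, 1, 1), datetime(2012, 5, 30)),
        JobSummary("old-prototype", False, datetime(2011, 1, 1), datetime(2012, 1, 15)),
        JobSummary("never-green", False, datetime(2012, 2, 1)),
        JobSummary("new-job", False, datetime(2012, 5, 25)),
        JobSummary("archived", True, datetime(2010, 1, 1)),
    ]
    # 60 days is an arbitrary threshold chosen for this example.
    print(jobs_to_disable(jobs, now, timedelta(days=60)))  # ['old-prototype', 'never-green']
```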
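
Slide 17 describes DynaSlave nodes that name and label themselves, report their executor count, and are reaped after being disconnected for more than 30 minutes. The real plugin exposes an HTTP endpoint inside Jenkins; the sketch below models only the bookkeeping behind it, and `SlaveRegistry`, `Node`, and the example labels are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

# Slide 17: disconnected nodes that are gone for more than 30 minutes are reaped.
REAP_AFTER = timedelta(minutes=30)


@dataclass
class Node:
    name: str
    labels: set[str]
    executors: int
    disconnected_since: Optional[datetime] = None


@dataclass
class SlaveRegistry:
    """Hypothetical in-memory model of self-registering build nodes."""
    nodes: dict[str, Node] = field(default_factory=dict)

    def register(self, name: str, labels: set[str], executors: int) -> None:
        # The node supplies its own name, labels and executor count;
        # re-registering under the same name replaces the old entry.
        self.nodes[name] = Node(name, set(labels), executors)

    def mark_disconnected(self, name: str, when: datetime) -> None:
        node = self.nodes.get(name)
        if node is not None and node.disconnected_since is None:
            node.disconnected_since = when

    def reap(self, now: datetime) -> list[str]:
        """Remove nodes disconnected for longer than REAP_AFTER; return their names."""
        doomed = [
            n.name for n in self.nodes.values()
            if n.disconnected_since is not None and now - n.disconnected_since > REAP_AFTER
        ]
        for name in doomed:
            del self.nodes[name]
        return doomed

    def executors_for(self, label: str) -> int:
        """Total executors on connected nodes that carry `label`."""
        return sum(n.executors for n in self.nodes.values()
                   if label in n.labels and n.disconnected_since is None)


if __name__ == "__main__":
    reg = SlaveRegistry()
    reg.register("buildnode01", {"standard"}, 4)
    reg.register("buildnode02", {"standard"}, 4)
    reg.register("cpp-node01", {"cpp"}, 2)
    t0 = datetime(2012, 6, 1, 12, 0)
    reg.mark_disconnected("buildnode02", t0)
    print(reg.reap(t0 + timedelta(minutes=10)))  # []
    print(reg.reap(t0 + timedelta(minutes=31)))  # ['buildnode02']
    print(reg.executors_for("standard"))         # 4
```

Keeping registration state in Jenkins while leaving pool size to the ASG matches the slide's split: the registry never needs to know the nodes run on EC2.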

Speaker Notes

  1. Abstract: Over the last couple of years, Netflix's streaming service has become almost completely cloud-based, using Amazon's AWS. This talk will delve into our build and deployment architecture, detailing the evolution of our continuous integration systems, which helped prepare us for the cloud move.
  2. We work on the Engineering Tools team at Netflix. Both of us came a long way to be here.
     Our team is all about creating tools and systems for our engineers to use to build, test and deploy their apps to the cloud (and to the DC if they reaaaaally have to :)).
     I'll give an overview of our continuous integration system and how Jenkins fits into it, then Brian will talk about how we've extended Jenkins and some of the challenges we've found running it at such a large scale.
  3. To get to the cloud, we rearchitected the Netflix streaming service into many individual modules, usually web applications implemented as web services, or shared libraries (jars).
     Our team was responsible for creating a set of easy-to-use tools to simplify and automate the build of the applications and shared libraries.
     We were also responsible for building the base machine image, creating the architecture for automating the assembly (aka baking, nothing to do with Qwikster!) of the individual application images, and building the web-based tool that is used to deploy and manage the application clusters, but we'll concentrate on our build process for this talk.
     Note that a key aspect of using so many shared services is that each service team has to rebuild often in order to pick up changes from the other services that they depend on. This is the CONTINUOUS part of continuous integration, and it is where Jenkins comes in.
  4. Here are a few details on how we build all those cloud services.
  5. We wrote a Common Build Framework, based on Ant with some custom Groovy scripting, that's used by all our development teams to build different kinds of libraries and apps.
     For the continuous integration that runs all those builds, we picked Jenkins because it's very feature-rich, easy to extend, and has a very active community.
     We use Perforce for our version control system, as it's arguably the best centralized VCS available. But we're making increasing use of Git; for example, our many open-sourced projects are all hosted on GitHub, and we use Jenkins to build them.
     We publish library JARs and application WAR files to the Artifactory binary repository. This gives us access to the build metadata and lets us add Ivy to Ant to abstract the build and runtime jars into a dynamic dependency graph, so each project only has to know about its immediate dependencies.
     Unlike many shops, we don't use Jenkins plugins for build tasks such as publishing to Artifactory; these are implemented in our Common Build Framework to give us finer-grained control over functionality without having to patch a bunch of plugins.
  6. Here is all you need to do in Jenkins to set up a typical project's build job: tell Jenkins where to find the source code, add in the Common Build Framework, then specify which targets to call from your Ant build file.
  7. And here is most of a typical project's Ant and Ivy files. The Ant file simply pulls in one of the standard framework entry points, such as library or webapplication.
     The Ivy file then specifies what needs to get built and what the dependencies are. We have some extra Groovy code added to our Ant scripts that can drive Ant targets based on the Ivy artifact definitions. This helps make the build definition declarative and yet flexible.
     Yes, XML makes your eyes bleed, and there is a lot of redundancy here. But at least it's small and manageable.
  8. Let's take a closer look at how we use Jenkins as the core of our build infrastructure, plus a few other interesting uses we've come up with.
  9. The other 50% of jobs are run manually or on a fixed schedule.
  10. Our Jenkins master runs on a physical server in our data center. The master provides the UI for defining build jobs, plus controlling and monitoring their execution.
      Slave servers execute the actual builds. Our standard slaves can each run 4 simultaneous builds. Custom slave groups are set up for requirements such as C/C++ builds or jobs with high CPU or memory needs.
      We vary the number of slaves from 15 to 30 depending on demand. This is currently a manual operation, but we're working on autoscaling.
      Our cloud slaves are set up in an AWS Virtual Private Cloud (VPC), which provides common network access between our data center and AWS. Amazon's us-west-1 region is physically located close to our data center, so latency is not an issue.
      Ad-hoc slaves in our data center or office are used by individual teams if they need an O/S variant other than those on our standard slaves, or a specific tool or licensed app.
      We keep our standard slaves updated by maintaining a common set of tools (JDKs, Ant, Groovy, etc.) on the master and syncing the tools to the slaves when they are restarted. Custom slaves can also use this mechanism if they choose.
  11. At its heart, Jenkins is just a really nice job scheduler, so we've found lots of other uses for it. Here are some of the main ones; in the interest of time I'm not going to describe each one in detail, but please hit us up with questions if you're interested.
      Housekeeping jobs usually use system Groovy scripts for access to the Jenkins runtime. We're looking at posting some of these to the public scripts repository.
      Now I'll hand it over to Brian, who is going to talk about some scalability challenges and how we're addressing them.
  12. We've run into a number of scaling challenges as we've evolved our build pipeline: thundering herd problems, modifying and managing 1600 jobs, making sure those 1600 jobs keep working from one Jenkins version to the next, one plugin version to the next, and so on.
      Our goal, of course, is one-button build/test/deploy with as little human intervention as possible, making the developer's life as pain-free as we can. All of these get in our way.
  13. We've enhanced Jenkins with a few plugins and odd jobs:
      - We're working on a job DSL that will allow us to create job templates and simplify the process of configuring new jobs.
      - We've got a number of housekeeping and maintenance jobs running via Jenkins and system Groovy scripts, doing everything from disabling builds that have consistently failed for a long period with no intervention (abandoned jobs) to enforcing consistency in job configuration.
  14. And we created the DynaSlave plugin, our cloud-based army of build nodes, to directly address one of our scalability problems: executor exhaustion and deep build queues during thundering herds/build storms.
  15. When we started the project, our build node fleet was a set of virtual machines in our data center.
      As I mentioned, when we change the build framework, everything tries to rebuild (which sounds crazy but is a good thing: *continuous integration*; the sooner we can find a problem, the sooner we can fix it).
      We could've bounded our changes or restricted them to off hours, but at Netflix there isn't really such a thing as off hours, and you're bound to get in someone's way! We could've deployed more VMs, but that involves other teams and leaves us with excess capacity and wasted resources during lulls...
  16. Plus we had this great platform built on top of AWS and EC2. Why not leverage that?
      We get to take advantage of our tooling and our experience with the service, we can add and remove capacity on demand, and maybe even make Jenkins master of its own domain and let it control the build node population directly.
      At the time we started building this (mid-2011), we found no plugin that could maintain a small fixed fleet of AWS resources for us. Plugins seemed to aim at using EC2 only for spikes in demand, whereas we wanted to forklift the whole fleet into the cloud.
  17. We put together a plugin that accomplished some of those goals. The DynaSlave plugin currently allows an EC2 node to launch and register itself with Jenkins, totally hands-free. The slaves can tell Jenkins what they want to be, what they can build, and so on. We can tailor nodes to specific needs and create custom pools of nodes with different instance sizes. The plugin, today, has no idea these nodes are even in EC2: pool sizing is managed by AWS ASGs and our cloud management tools like ASGARD, our Amazon management console (soon to be open sourced!).
  18. We're not done, though. We have a number of enhancements in the pipeline, but one of the bigger pieces is dynamic resource management.
      We're still doing some things manually, like controlling the pool size. If someone wants to make a change to our framework, they have to remember to scale the pool up, but not too far, as that can kill other systems by proxy. They also have to remember to scale down after the event, which is EXTRA tedious because resizing ASGs will swat away nodes that are still executing jobs.
      Jenkins knows what the queue looks like and how many slaves are doing work, so we want to make the plugin intelligent enough to manage its own pools. When it scales a pool down, Jenkins can pause idle nodes and make sure those are the ones the ASG removes, as well as bleed off traffic from busy nodes that need to be reaped.
      We're planning on giving this back, so keep an eye on our blog at techblog.netflix.com for announcements.
  19. Here are some places to look for more info.
      Adrian's presentations on SlideShare are a great resource if you want to know more about our cloud architecture in general.
      We're hiring!
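
Speaker note 7 mentions Groovy code in the Ant scripts that drives Ant targets from the Ivy artifact definitions. The Python sketch below illustrates that idea only: it reads the `<publications>` section of an ivy.xml and maps each artifact type to a target. The `TARGET_FOR_TYPE` mapping and target names are hypothetical, since the talk does not list the framework's real targets; the `xmlns:e` URI is Ivy's standard extra-attributes namespace, which the `e:classifier` attributes on slide 7 require.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Ivy artifact type to the Ant target that produces it.
TARGET_FOR_TYPE = {
    "jar": "jar",
    "package": "package",
    "javadoc": "javadoc",
}


def targets_from_ivy(ivy_xml: str) -> list[str]:
    """Derive an ordered, de-duplicated list of Ant targets from <publications>."""
    root = ET.fromstring(ivy_xml.strip())
    targets: list[str] = []
    # Only published artifacts count, not <artifact> entries under <dependencies>.
    for artifact in root.findall("./publications/artifact"):
        artifact_type = artifact.get("type")
        target = TARGET_FOR_TYPE.get(artifact_type)
        if target is None:
            raise ValueError(f"no target known for artifact type {artifact_type!r}")
        if target not in targets:
            targets.append(target)
    return targets


SAMPLE = """
<ivy-module version="2.0" xmlns:e="http://ant.apache.org/ivy/extra">
  <info organisation="netflix" module="helloworld"/>
  <publications>
    <artifact name="helloworld" type="package" e:classifier="package" ext="tgz"/>
    <artifact name="helloworld" type="javadoc" e:classifier="javadoc" ext="jar"/>
  </publications>
</ivy-module>
"""

if __name__ == "__main__":
    print(targets_from_ivy(SAMPLE))  # ['package', 'javadoc']
```

Deriving targets from the publications list keeps the project's own files declarative: adding a javadoc artifact to ivy.xml is enough to get the javadoc step run.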
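
Slide 9 says 20 cloud slaves clear the 800-build flood in 30 minutes, and speaker note 10 says each standard slave runs 4 simultaneous builds. The arithmetic below, assuming every slave is a standard 4-executor node kept fully busy, gives the implied average build time; the drain estimate for the original 15-VM fleet (2 executors each, per slide 15) additionally assumes builds there would take the same time, which is optimistic for single-vCPU VMs.

```python
import math


def implied_minutes_per_build(builds: int, slaves: int, executors_per_slave: int,
                              window_minutes: float) -> float:
    """Average executor time per build if `builds` finish within `window_minutes`.

    Assumes every executor is busy for the whole window (perfect packing),
    so the result is an upper bound on the average build duration.
    """
    executors = slaves * executors_per_slave
    builds_per_executor = builds / executors
    return window_minutes / builds_per_executor


def minutes_to_drain(builds: int, executors: int, minutes_per_build: float) -> float:
    """Time to drain a queue in 'waves' of `executors` equal-length builds."""
    return math.ceil(builds / executors) * minutes_per_build


if __name__ == "__main__":
    per_build = implied_minutes_per_build(800, 20, 4, 30)
    print(per_build)                              # 3.0 (80 executors, 10 builds each)
    print(minutes_to_drain(800, 20 * 4, per_build))  # 30.0
    print(minutes_to_drain(800, 15 * 2, per_build))  # 81.0 (original fleet, same build time)
```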
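
Speaker note 18 describes the planned dynamic resource management: size the pool from the queue and busy executors, cap it so other systems are not overloaded by proxy, and on scale-down remove idle nodes first so running builds are not killed. The sketch below is one possible reading of that plan, not the plugin's implementation; `NodeState`, the ceiling-based sizing rule, and the min/max bounds are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical view of one build node."""
    name: str
    busy_executors: int


def desired_pool_size(queued: int, nodes: list[NodeState], executors_per_node: int,
                      min_nodes: int, max_nodes: int) -> int:
    """Nodes needed for all queued and running builds, clamped to [min_nodes, max_nodes].

    The upper bound stands in for the note's warning that an oversized pool
    can hurt other systems (e.g. Artifactory) by proxy.
    """
    demand = queued + sum(n.busy_executors for n in nodes)
    needed = math.ceil(demand / executors_per_node)
    return max(min_nodes, min(max_nodes, needed))


def nodes_to_remove(nodes: list[NodeState], target: int) -> list[str]:
    """Choose which nodes to take offline when shrinking to `target` nodes.

    Idle nodes sort first; if more must go, the least busy are picked so their
    remaining work can be drained before the ASG terminates them.
    """
    surplus = len(nodes) - target
    if surplus <= 0:
        return []
    ordered = sorted(nodes, key=lambda n: n.busy_executors)
    return [n.name for n in ordered[:surplus]]


if __name__ == "__main__":
    pool = [NodeState("bn01", 4), NodeState("bn02", 0), NodeState("bn03", 1), NodeState("bn04", 0)]
    target = desired_pool_size(0, pool, 4, 2, 20)
    print(target)                          # 2
    print(nodes_to_remove(pool, target))   # ['bn02', 'bn04']
    print(desired_pool_size(100, pool, 4, 2, 20))  # 20 (capped)
```

Choosing the victims inside Jenkins, rather than letting the ASG pick arbitrarily, is what addresses the note's complaint that resizing swats away nodes still executing jobs.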