Paychex, a recognized leader in the payroll, human resource, and benefits outsourcing industry, found that the demand for application deployments had increased beyond what could be supported by manual configuration. Keeping up with this demand required a shift from manually providing a service to developing an automated platform for self-service resulting in a culture change with new partnering across their DEV, OPS and Architecture teams.
David Jozis, Automation Engineer at Paychex, discusses the challenges they encountered when making these significant changes and how they were able to overcome them to accomplish 5x as many deployments as before.
2. Housekeeping
This webinar is being recorded
Links to the slides and the recording will be made available after the presentation
You can post questions via the GoToWebinar Control Panel
Follow the conversation on Twitter
3. About me
David Jozis
15+ years experience in coding
Professionally for about 8 years
Enjoying the innovation culture at Paychex for just over 3 years
1 year as a developer
2 years in the EBA group at Paychex (Enterprise Build Automation)
Technology/coding is both my job and my passion
4. Agenda
Introduction and context
Our build/deploy pipeline
What was wrong with how we configured it
What we did to configure it faster
Standardization
Automation
What we learned
This is not about how we sped up deployments, this is about how we sped up configuring new deployments
6. About Paychex
Industry leader in Human Capital Management (HCM)
services with solutions for payroll, human resources,
insurance, and retirement
Established 1971
Headquartered in Rochester, NY; 100+ U.S. locations
600,000+ small- to medium-sized business clients
Technology-enabled service company
24/7/365 support
13,000+ Employees
1,500 IT professionals
Paychex Flex | www.paychex.com
7. The Enterprise Build Automation team (EBA)
A “dev service” team
Develops/maintains/supports the build/deploy/release pipeline for consistent re-use across Paychex
We don’t write business application code
We don’t run the releases
Configure and write plugins for Bitbucket, Jenkins, Gradle, XL Deploy, XL Release
Made up of 17 members, 3 agile teams
Serves ~1,500 IT staff, ~80 agile development teams
~90:1 staff ratio
We interact with lots of different teams
11. Challenges around Bitbucket phase
Application naming inconsistent with Jenkins job or XL Deploy application
Fragile link between where build artifacts are created and where Jenkins looks for them (maintained in two places)
Build/test logic in Jenkins is different from running locally
12. Challenges around Jenkins phase
Definition of XL Deploy metadata done through the Jenkins UI
No testing outside of Jenkins (locally building XLD packages?)
No loops or other dynamic configuration
Copy/paste mistakes
Not source-controlled – makes change tracking more difficult
No consistency enforced
Filling up XL Deploy with build artifacts, not just metadata
Devs have access to misconfigure things, such as removing tags
13. Challenges around XL Deploy phase
Configuration of “Infrastructure” and “Environments” in XL Deploy done manually
Each team member of our shared services team configured applications slightly differently
Tag usage inconsistent – mainly based around infrastructure
Infrastructure based tags problematic for new datacenter with phased onboarding of applications
Missing information in onboarding requests
Needed to reconfigure every WebLogic deployment for a phased technology upgrade – too slow
14. Challenges around real infrastructure
Cluster naming inconsistent
Incorrect information turned over
WebLogic admin servers down
The infrastructure does not actually exist
Uses slightly different configuration due to manual setup
Deployment failures due to inconsistent/incorrect configuration
15.
Developer: I need my application configured for deployment.
EBA member: What sort of application? Does it have a properties file or static web content?
Developer: It’s a WebLogic application. I think it needs these files.
EBA member: Which clusters does it go to? Which web servers does the static content go to?
Developer: It’s my-example-cluster and web-server-x.
EBA member: I only found a cluster “my-exmpl”, is it that? Also, web-server-x doesn’t exist.
Developer: I don’t know. CC: WebEngineer.
(Discussion gets lost between teams.)
16. Summary of struggles
Lack of consistency
Lack of source of truth
Lack of accountability
Ineffective communication, too much back-and-forth, too many handoffs, etc.
Too slow
18. Inspiration from Netflix
Nebula is an opinionated set of plugins for the Gradle build system created by Netflix
Code is built and tested locally using Nebula
Changes are committed to a central Git repository
A Jenkins job executes Nebula, which builds, tests, and packages the application for deployment
Gradle chosen because:
easy to write testable plugins
reducing the size of a project’s build file
More reading: https://medium.com/netflix-techblog/how-we-build-code-at-netflix-c5d9bd727f15
19. Inspiration from Amazon
Mandate at Amazon, from around 2002
All teams will henceforth expose their data and functionality through service interfaces
Teams must communicate with each other through these interfaces
It doesn’t matter what technology they use
How to apply this to our manual onboarding process?
More reading: https://apievangelist.com/2012/01/12/the-secret-to-amazons-success-internal-apis/
20. Standardization
A standard that isn’t enforced by code is a suggestion
Developed standard with agreement from
Architecture
Development
Infrastructure/Ops teams
Wrote a Java library that does standard validation
Reusable set of classes with common methods for validating different standardized elements
Different classes for different standards – WebLogic, OpenShift, .NET, Java stand-alones, Pro*C artifacts, etc.
Throws helpful errors that guide the user to adherence
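A minimal sketch of what one reusable validation class of this kind could look like. The class name, the naming rule (lowercase words joined by hyphens), and the error text are assumptions for illustration; the slides do not show the actual Paychex standards or library code.

```java
import java.util.regex.Pattern;

/** Hypothetical standard: application names are lowercase words joined by hyphens. */
final class ApplicationNameStandard {
    // Assumed rule for illustration only; not the real Paychex standard.
    private static final Pattern VALID_NAME =
            Pattern.compile("[a-z][a-z0-9]*(-[a-z0-9]+)*");

    private ApplicationNameStandard() {}

    /** Returns the name unchanged if it is valid, otherwise throws with guidance. */
    static String validate(String name) {
        if (name == null || !VALID_NAME.matcher(name).matches()) {
            throw new IllegalArgumentException(
                    "Application name '" + name + "' does not follow the standard. "
                    + "Use lowercase letters and digits separated by single hyphens, "
                    + "for example 'my-example-app'.");
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(validate("my-example-app"));
        try {
            validate("My_Example App");
        } catch (IllegalArgumentException e) {
            // The message tells the user how to fix the input, not just that it failed.
            System.out.println(e.getMessage());
        }
    }
}
```

Because the check lives in a plain library class, the same rule could be called from a web tool, a build plugin, or a unit test without duplication.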
21.
22. Totem
The Onboarding Tool EBA Made
Java/GWT (Google Web Toolkit) Web Application
Exposes scripts that do what we used to do manually
Consistently
Automatically
Instantly
Has both Web UI and REST endpoints for every script
Absolutely minimal inputs
Didn’t use Jenkins jobs, primarily to get finer-grained permission control
Example: Anyone can run a “validate” on a script, but only LDAP group X can run Execute
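The validate/execute split can be sketched as a per-action permission check. This is an illustrative sketch only: the enum, the group name `ldap-group-x`, and the in-memory group set are assumptions, and a real implementation would look the user's groups up in LDAP.

```java
import java.util.Set;

/** Sketch of per-action permissions: everyone may validate, only one group may execute. */
final class ScriptPermissions {
    enum Action { VALIDATE, EXECUTE }

    // Hypothetical group name; the slide only refers to "LDAP group X".
    static final String EXECUTE_GROUP = "ldap-group-x";

    private ScriptPermissions() {}

    /** userGroups stands in for the groups an LDAP lookup would return. */
    static boolean isAllowed(Action action, Set<String> userGroups) {
        switch (action) {
            case VALIDATE:
                return true; // a dry run is safe for any logged-in user
            case EXECUTE:
                return userGroups.contains(EXECUTE_GROUP);
            default:
                return false;
        }
    }
}
```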
23. Totem – How it works
Reads information from canonical source of truth in source control
Checks the source of truth against the real world for consistency
Uses the standards library to protect against non-standard configuration and input
Application naming
Tags
Infrastructure configuration
Permissions
Detects current state and only does what is necessary – can rerun
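The last point, detecting current state and doing only what is necessary, is what makes a script safe to rerun. A minimal sketch of that idea, assuming the state can be modelled as a set of named items; the class and the host names are hypothetical and the in-memory set stands in for state that would really live in the deployment tool.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch of a rerunnable step: compare desired state to current state, apply only the difference. */
final class IdempotentOnboarding {
    private IdempotentOnboarding() {}

    /**
     * Adds to 'current' whatever 'desired' contains that is missing,
     * and returns the list of changes made.
     */
    static List<String> apply(Set<String> desired, Set<String> current) {
        List<String> changes = new ArrayList<>();
        for (String item : desired) {
            if (current.add(item)) { // add() returns false if the item already exists
                changes.add("created " + item);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Set<String> desired = new LinkedHashSet<>(List.of("host-a", "host-b"));
        Set<String> current = new LinkedHashSet<>(List.of("host-a"));
        System.out.println(apply(desired, current)); // first run creates only what is missing
        System.out.println(apply(desired, current)); // second run finds nothing to do
    }
}
```

Returning the list of changes also gives the caller something to show the user after a run.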
25. Wins
Enabled the customers!
No more EBA team involvement with onboarding common technologies
Culture change: Instead of providing a service, develop a platform for self-service
Much more scalable
5x more onboardings today than 10 people could perform in the same time
No deployment failures due to inconsistent XL Deploy configuration
Application specific tags means we control where things go
26. Shift left
Shift validation to run sooner
Created a Gradle plugin for publishing to XL Deploy
Parsed that metadata, applied the same standards library
Get build failures in Jenkins due to standards violations
Shift validation to run sooner again
Run the same Gradle script locally, before check-in
Get failures before committing configuration
27. Package task
Publish task
No duplicate maintenance of artifact paths
Generate/validate application specific tags
28. Additional improvements
Dynamic/flexible configuration
Can manipulate artifacts during the publish script to publish slightly different versions
Can publish in a loop
Used the above to publish slightly different artifacts for internal/external service deployments
They can be separately deployed, and be tagged separately
Created a Gradle plugin to publish the artifacts to Artifactory
No configuration, apply the plugin and it handles the rest dynamically by reading the XL Deploy package configuration (and updating it)
Make the Gradle validation plugin check Jenkins job name and Bitbucket repository name in the future?
29. Current State: Paychex Build/Deploy Pipeline
Environment
Infrastructure
Git repository manager/store
Runs build, runs Gradle publish script
Publishes build artifacts to Artifactory, and package to XL Deploy
Stores binaries
Stores deployment metadata
30.
Developer: I’ve configured my Application for deployment. Configure the Infrastructure?
Totem: I checked Web Engineering’s files in source control. Because of your application name, I can see that it goes to servers X/Y, but X doesn’t exist in reality, and there’s another server Z that is in that cluster. Please work with them to resolve it and try again.
Developer: I’ve configured my Application for deployment. Configure the Infrastructure?
Totem: Done. Here’s all the changes I made.
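The kind of check behind this exchange can be sketched as two set differences between what source control declares and what actually exists. This is a sketch under assumptions: the class and method names are hypothetical, and the server sets are hard-coded where a real tool would read source control and query the cluster.

```java
import java.util.Set;
import java.util.TreeSet;

/** Sketch of checking a declared cluster definition against what actually exists. */
final class ConsistencyCheck {
    private ConsistencyCheck() {}

    /** Servers named in source control that were not found in the real cluster. */
    static Set<String> missingInReality(Set<String> declared, Set<String> actual) {
        Set<String> result = new TreeSet<>(declared);
        result.removeAll(actual);
        return result;
    }

    /** Servers found in the real cluster that source control does not mention. */
    static Set<String> undeclared(Set<String> declared, Set<String> actual) {
        Set<String> result = new TreeSet<>(actual);
        result.removeAll(declared);
        return result;
    }

    public static void main(String[] args) {
        Set<String> declared = Set.of("X", "Y"); // hypothetical data from source control
        Set<String> actual = Set.of("Y", "Z");   // hypothetical data from the real cluster
        System.out.println("Declared but missing: " + missingInReality(declared, actual));
        System.out.println("Present but undeclared: " + undeclared(declared, actual));
    }
}
```

If either set is non-empty, the tool can refuse to proceed and name the owning team, instead of configuring a deployment that would fail later.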
31. Efficiency gains
One EBA team’s velocity in 2015 was about 15 points per 2 week sprint
3 onboardings per sprint with nothing else
Totem puts through ~147 points worth of onboardings per 2 weeks
~10x more throughput
Equivalent of ~40 EBA members
Doesn’t need EBA involvement
EBA team now spends their time on projects that will advance Paychex CI/CD capabilities
Velocity has increased 3x or more
Much less back-and-forth
[Bar chart: story points worth of onboardings per 2 weeks, comparing one (of the two) EBA teams in 2015 with Totem in 2017; vertical axis from 0 to 160]
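The arithmetic behind the "~10x" figure can be worked through from the numbers on this slide. The sketch below uses only the quoted values (15 points per sprint, 3 onboardings per sprint, ~147 points per 2 weeks); the class and method names are illustrative. The "~40 EBA members" equivalence is not derived here because the slide does not state the team size it assumes.

```java
/** Worked arithmetic for the throughput figures quoted on the slide. */
final class ThroughputArithmetic {
    private ThroughputArithmetic() {}

    /** How many times more points Totem puts through than one team did. */
    static double ratio(double totemPointsPerSprint, double teamPointsPerSprint) {
        return totemPointsPerSprint / teamPointsPerSprint;
    }

    /** Implied size of one onboarding, if a team did nothing else in the sprint. */
    static double pointsPerOnboarding(double teamPointsPerSprint, int onboardingsPerSprint) {
        return teamPointsPerSprint / onboardingsPerSprint;
    }

    public static void main(String[] args) {
        double teamPoints = 15;   // one EBA team in 2015, per 2-week sprint
        double totemPoints = 147; // Totem in 2017, per 2 weeks
        double perOnboarding = pointsPerOnboarding(teamPoints, 3);
        System.out.printf("Points per onboarding: %.1f%n", perOnboarding);
        System.out.printf("Throughput ratio: %.1f%n", ratio(totemPoints, teamPoints));
        System.out.printf("Onboardings per 2 weeks via Totem: %.1f%n", totemPoints / perOnboarding);
    }
}
```

With these inputs, each onboarding works out to 5 points, the ratio is 147 / 15 = 9.8 (rounded to ~10x on the slide), and Totem's 147 points correspond to roughly 29 onboardings per 2 weeks.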
32. Lessons learned
A standard that isn’t automatically enforced is a suggestion
Groovy DSLs can offer great flexibility for configuration
Self-service apps need both a UI and a REST API
Good automation can result in buy-in and culture change
Demand scales to meet supply
Everyone expects everything Just In Time now
More than 5x the output required compared with 6 months ago
Assumptions will be made about what it does or doesn’t do
Fear of the unknown – training and communication are important!
33. Lessons learned
After you improve, get ready to improve again
Eat your own dogfood
We build and deploy Totem via the same pipeline
Good abstractions give you convenient points to extend from
We’re using the same Gradle scripts now to gather code coverage metrics and associate them with artifacts
We were able to trivially add publishing binaries to Artifactory instead of XL Deploy because we put our configuration for deployments in Gradle
Take what you do today, write it in code, then iterate
No need to start with perfection
34. Questions?
Don’t miss our next webinar – Nov. 7th
Guest speaker: Robert E Stroud CGEIT CRISC – Principal Analyst, Forrester Research
35.
Thank you!
Editor's notes
My name is David Jozis
From Australia
Automation engineer in EBA team
Been coding for 15+ years
Professionally for about 8 years
I’ve really been enjoying the innovation culture at Paychex for just over 3 years
1 year as a developer
2 years in the EBA group at Paychex
Technology/coding is both my job and my passion
Coder by day, coder by night kind of thing
For the flow, starting with introduction and context
Paychex, my team
Our build/deployment pipeline
The challenges we faced after setting it up
How we overcame those problems through standardization and automation
What we learned
This is not about how we sped up deployments, this is about how we sped up configuring new deployments and removed ourselves from that process by automation
DevOps doesn’t stop at buying and using tools; it’s not “happily ever after”. After that you have to support those tools (with consideration for people, culture, process, etc.), and that’s what we are going to talk about here
Let’s start with who we are and the context of this story
Some background
Paychex is an industry leader in Human Capital Management
Provides payroll, human resources, insurance and retirement services
The only player in the market to provide all those services through a single platform
Called “Paychex Flex”
Important to us
Gives an advantage through being able to innovate and pivot quickly when circumstances or trends change
Lets clients start with what they need, and expand services as they go without changing the platform they interact with
We have 600,000 clients
Services catered to 1-1,000 employee businesses
Within Paychex, I work in a dev service team called Enterprise Build Automation (EBA)
Slightly outdated name
We develop/maintain/support the build/deploy/release pipeline
Consistent across Paychex dev teams
We don’t write business application code
We don’t execute the actual releases
Includes writing plugins for Bitbucket, Jenkins, Gradle, XL Deploy and XL Release
Bitbucket is another team, but we do have a plugin that lives there
Not just writing plugins, also support services to development
Lots of considerations
Platform performance
Ease of use
Security
Will get into some of the challenges we’ve faced
Made up of 17 team members split between 3 agile teams
Our customer base is 1,500 IT staff split into 80 agile development teams
That’s up to 90 customers to one of us
We interact with lots of teams, not just dev
Means lots of communication and switching gears
For context, the tools we interact with most at EBA span the following
(Show box) These are the tools we host and support directly
Shows the elements of the build/deploy/release pipeline that are common to all software that our team supports
Bitbucket is a Git repository manager
Source code repository
Many of the other tools are similar – script runners
Jenkins farms out software builds to agent hosts
XL Deploy stores the state of what it has done and performs diffs against that to determine what to do next
XL Release allows you to see and modify in-flight releases and support manual interaction
Artifactory
Repository store
Previously used just for dependencies
During this effort, moved to deploying out of Artifactory
From those tools, we’re going to narrow in on the old build/deploy pipeline
Highlighting challenges which stem from manually configuring our automated pipeline
We have since resolved these challenges; they date from 2015-2016 and were finally solved in early 2017
Developer checks in code
Jenkins builds it
XL Deploy deploys it
We call Jenkins or XL Deploy configuration for a new application either “build” or “deploy” onboarding respectively
It’s a great ideal, but by itself, has many problems
How to manage it?
Each tool configured separately
No consistency between repository name, Jenkins job name or XL Deploy application
Each configured separately
In Bitbucket, build scripts are told to put the build output in one particular place
Is Jenkins configured to look at the same place?
Separately maintained, fragile link
Build/test logic in Jenkins is different from running locally
Developers don’t run the Jenkins build logic before checking in
If it fails, is it a problem with the EBA platform, or developer change?
Previously, all of the configuration for XL Deploy deployment packages was done through the Jenkins UI
Using the XL Deploy plugin for Jenkins
Which now supports pipeline, but didn’t then
Can’t test creation of XL Deploy packages outside of Jenkins
Being a support team for this, this can matter
Had to keep re-running Jenkins jobs to test, or use other work-arounds
Can’t add loops or other dynamic behavior, such as publishing dynamically
All statically configured
XL Deploy configurations get copy/pasted, tags get copy/pasted too
Can lead to deploying things to the wrong hosts
Not source controlled, so it’s not always easy to see who last made changes, or have a review process like pull requests
No consistency for application naming conventions or tagging conventions or anything else
Publishing artifacts into XL Deploy instead of Artifactory
Devs can misconfigure things, such as removing the tags
Which makes XL Deploy deploy their software to every host it can
Configuration of “Infrastructure” and “Environments” in XL Deploy done manually
Each EBA member configured applications slightly differently
Tag usage inconsistent – mainly based around infrastructure
Infrastructure based tags problematic for new datacenter with phased onboarding of applications
Missing information in onboarding requests
Needed to reconfigure every WebLogic deployment for a phased technology upgrade – too slow
Cluster naming inconsistent
Incorrect information turned over
WebLogic admin servers down
The infrastructure does not actually exist
Uses slightly different configuration due to manual setup
Deployment failures due to inconsistent/incorrect configuration
The previous flow went something like this
Lack of consistency
Standards not followed consistently
Lack of source of truth
Depending on user input
Lack of accountability
Who owns which parts of the process?
Ineffective communication, too much back-and-forth, too many handoffs, etc.
Too slow
… And we had to stand up a new datacentre, and re-onboard everything we had already onboarded for a new version of WebLogic being introduced in a phased fashion
We couldn’t even control which applications went where due to shared tags
OPTION: Add information about what the impact of these things were?
Fortunately, we conquered those issues through standardization and automation
Standardization being a prerequisite for automation
The result of standardizing and automating was
Increasing capacity to onboard applications to 5 times what it was
Removing EBA from the process entirely
So we saved the equivalent of 40 full time employees
Removed all deployment failures related to EBA configuration inconsistencies
Going to talk about how we accomplished that
Not in a vacuum
Other teams also were standardizing and automating at the same time
WebLogic setup for example
Architectural standards
Starting with some of what inspired us to take our approach
We liked the consistency wins Netflix had in some of the areas we struggled with initially
In order to automate, first we had to standardize
We learned fast that a standard that isn’t enforced by code is a suggestion
We had standards in place
They weren’t always applied, and were sometimes interpreted differently
Sometimes by developers, sometimes by members of EBA
Teams collaborated cross-functionally to develop a standard that would add benefits for everyone
Architecture
Development
Infrastructure/Ops teams
When that standard was developed, EBA developed a Java library that could validate against it
Reusable set of classes with common methods for validating different standardized elements
Different classes for different standards – WebLogic, OpenShift, .NET, Java stand-alones, Pro*C artifacts, etc.
Throws helpful errors that guide the user to adherence
Not necessarily the prettiest code initially
Here is a small glimpse of it
After standardizing, we automated
Main pain-point: configuring XL Deploy which we previously did via the UI, completely manually
Created Totem
Which stands for The Onboarding Tool EBA Made
Written using Java and Google Web Toolkit
Exposes what we used to do manually, but instead does it
Consistently
Automatically
Instantly
Before long we found people wanted to call Totem scripts from XL Release
Added a REST endpoint that supported any script
All scripts use only string input, which made this easy
The REST endpoint supports all future scripts as well
We emphasized minimal inputs
Makes the scripts safer to run since they depend more on the single source of truth instead of user input
Fewer questions to us about what to use for input
Most deploy onboarding scripts only take 2 arguments
The deployment package id
The environment tier
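Why string-only inputs make one generic endpoint sufficient can be sketched with a small registry: every script shares the same signature, so a single entry point can run any of them by name, including scripts added later. The class, the script name, and the parameter keys `packageId` and `tier` are assumptions for illustration, not Totem's actual API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Sketch of one generic entry point that can run any registered script by name. */
final class ScriptRegistry {
    // Every script takes only string inputs, so one signature covers all of them.
    private final Map<String, Function<Map<String, String>, String>> scripts =
            new LinkedHashMap<>();

    void register(String name, Function<Map<String, String>, String> script) {
        scripts.put(name, script);
    }

    /** What a generic REST handler could call: script name plus string parameters. */
    String run(String name, Map<String, String> inputs) {
        Function<Map<String, String>, String> script = scripts.get(name);
        if (script == null) {
            throw new IllegalArgumentException(
                    "Unknown script '" + name + "'. Known scripts: " + scripts.keySet());
        }
        return script.apply(inputs);
    }

    public static void main(String[] args) {
        ScriptRegistry registry = new ScriptRegistry();
        // Hypothetical script name and parameter keys.
        registry.register("weblogic-onboarding", inputs ->
                "onboard " + inputs.get("packageId") + " to " + inputs.get("tier"));
        System.out.println(registry.run("weblogic-onboarding",
                Map.of("packageId", "example-package", "tier", "test")));
    }
}
```

Registering a new script needs no change to the entry point, which matches the note that the endpoint supports future scripts as well.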
We didn’t use Jenkins
We needed finer grained permission control, such as all users can run Validate but only certain LDAP groups can run Execute
You get a better UI this way
In the future we can customize the UI to suit the use
We don’t need to occupy agents
Here’s Totem
Fairly simple UI
The flow is
Log in
Select a script, in this case our WebLogic onboarding script
Fill out input
Validate or execute or anything else
This example is trying to onboard Totem as a WebLogic application, which it isn’t, so it is throwing a validation error because it doesn’t follow the WebLogic standards
The next step was to shift validation to run sooner
Created a Gradle plugin for publishing to XL Deploy
Parsed that metadata, applied the same standards library
Get build failures in Jenkins due to standards violations
We found we can run it even sooner
Run the same Gradle script locally, before check-in
Get failures before committing configuration
This example is a snippet from how Totem itself is configured to publish to XL Deploy for deployment
Package task, publish task
Within the package task, defining your deployables, etc
No duplicate maintenance of artifact paths, we can find the file from the “war” task’s output here directly
We generate application specific tags – this method prepends the application name automatically
You just provide a classifier that splits up the different deployable types
Properties files, static web content, war file itself, etc, so that we can deploy the deployables to different infrastructure as required
Validates that the classifier you select is part of a known list of classifiers to support automatic and consistent onboarding
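The tag generation described in these notes can be sketched as follows. This is an illustrative sketch only: the classifier list, the `application-classifier` tag format, and the class and method names are assumptions, since the real list and format are not shown in the talk.

```java
import java.util.Set;

/** Sketch of generating an application-specific tag from a classifier. */
final class DeploymentTags {
    // Hypothetical classifier list; the real list is not shown in the talk.
    static final Set<String> KNOWN_CLASSIFIERS = Set.of("war", "properties", "static-content");

    private DeploymentTags() {}

    /** Prepends the application name so the tag cannot be shared with another application. */
    static String tagFor(String applicationName, String classifier) {
        if (!KNOWN_CLASSIFIERS.contains(classifier)) {
            throw new IllegalArgumentException(
                    "Unknown classifier '" + classifier + "'. Choose one of "
                    + KNOWN_CLASSIFIERS + ".");
        }
        return applicationName + "-" + classifier; // assumed tag format
    }

    public static void main(String[] args) {
        System.out.println(tagFor("totem", "war"));
        System.out.println(tagFor("totem", "properties"));
    }
}
```

Because the developer supplies only the classifier, a copied configuration cannot carry another application's tag with it.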