This case study describes how we leveraged serverless technology and the AWS Serverless Application Model (SAM) to support virtual training classes for a major US Federal agency. Our firm was excited to be selected as the main training partner to help the agency roll out Agile and DevOps processes across an organization of more than 1,500 people. Then the pandemic hit, and what was to have been a series of in-person classes went 100% virtual. We created a set of fully populated Docker images containing all of the test data, plugins, and scenarios required for the student exercises. For our initial implementation, we simply pre-loaded our Docker images into Elastic Beanstalk and replicated them as many times as needed to provide the necessary number of instances for a given class. This worked fine at first, but we found a number of shortcomings as we scaled up to more students and more classes. Eventually we arrived at a much easier solution using serverless technology: a single-page application that kicks off tasks using AWS Step Functions to run Docker images in Elastic Container Service (ECS), all running under AWS Fargate. This application is a perfect fit for serverless technology, and describing our evolution to serverless and SAM may help you see how these technologies could benefit your situation.
2. Outline
1. About me
2. Customer Context
3. Training Solution
4. Initial Deployment Topology
5. Drawbacks
6. Re-Architecting to Serverless
7. Final Product: TIM
8. Next Steps for TIM
9. Summary & Key Takeaways
3. Craeg Strong
§ Software Development since 1988
§ Large Commercial & Government Projects
§ Kanban Coach / DevOps Engineer
§ Kanban Trainer / SpecFlow Trainer
§ Performance & Scalability Architect
§ Certified Ethical Hacker
§ New York & Washington DC Area
CTO, Ariel Partners
AKT, KCP, KMP, CSM, CSP, CSPO,
ITILv3, PMI-ACP, PMP, CLP, SPC,
ICP-ACC, ICP-ATF, PSM-II, PSK
CEH, ACP-JSW, AC-JPA, ACP-MJCP
www.arielpartners.com
cstrong@arielpartners.com
@ckstrong1
4. US Air Force Business and Enterprise Systems Directorate
5. Mission: Train all 1,200 in Directorate, ASAP
Constraints
1. All working from home
2. GFE cameras inoperative or disabled
3. All roles, all ranks
4. Widely disparate needs
Maintenance vs Development
non-Software teams
5. Many have no background in Agile
6. Many have no access to Jira
7. Borrowing Jira from another Agency (DI2E)
8. Training in one shot
Cannot be stretched over half-days
Solution Design
1. Two-day intensive course
2. Hourly Zoom-fatigue breaks
3. Browser-only, zero install
4. Everyone gets their own instance
5. Way more hands-on, less lecture
6. Highly realistic scenarios
7. Don’t ignore the elephants in the room
Dependencies
Forecasting
Documentation
Program Initiation & Customer Discovery
...Let’s Bring Some Friends
7. Curated Set of Jira Plugins
No. | Vendor | App Name | Category | Type
1 | Ascend Integrated | Color Cards for Jira | Visual Boards | Free
2 | Okapya Software Solutions | Checklist for Jira | Visual Boards | Commercial
3 | Easy Agile | User Story Maps for Jira | Visual Boards | Commercial
4 | Atlassian | Automation for Jira - Server | Scripting | Commercial
5 | Adaptavist | ScriptRunner for Jira | Scripting | Commercial
6 | Beecom Products | JSU Automation Suite for Jira Workflows | Scripting | Commercial
7 | eazyBI | Reports and Charts for Jira | Reports | Commercial
8 | 55 Degrees AB | ActionableAgile for Jira | Metrics | Commercial
9 | ALM Works | Structure – Product Management at Scale | Scaling | Commercial
10 | Fine Software | JXL Spreadsheet Table Issue Editor | Bulk Editing | Commercial
9. Labor Intensive
Trainer Operator Support
Drawbacks with Initial Solution
• Operations person to start/stop instances with script
• Operator must have AWS CLI tools installed
full computer required, no mobile
• Provisioning a class is very slow
20 instances take up to 3 hours to spin up and validate
• Operator has to time the startup perfectly
Too soon: AWS $$ for idle instances
Too late: class is not ready
• Extra instances required in case instances crash
• Manual coordination to match students to URLs
• Trainer cannot see if student instances are healthy
• Switching a student from a dead instance is clumsy
• Primary instance constantly running: AWS $$
• If Operator does not spin down training instances
in a timely manner: AWS $$$
Costly
10. Re-architecting to Serverless
AWS Cloud
Infrastructure: API Gateway, CloudFront, Elastic Container Service, Elastic Container Registry, Simple Storage Service
Infrastructure as Code (IaC): Serverless Application Model
Container orchestration: Elastic Beanstalk, Fargate, Lambda, Step Functions
App Services: DynamoDB, Cognito, CloudWatch
Web App: Redux, Websockets
Application Lifecycle Management
11. Why Not Kubernetes?
• Kubernetes is designed to run distributed systems resiliently
This is not a distributed system. Quite the opposite: everything is packaged in one Docker image
• Kubernetes automatically scales your app
We don’t need scaling. Everyone has their own unique instance
• Kubernetes automatically restarts in case of failure
We don’t need auto-restart. It is a better user experience to switch to a hot standby instance
• Kubernetes has a steep learning curve
• Kubernetes would increase complexity and cost
• Kubernetes could be added later if needed, under Fargate
Fargate has hypervisor isolation; Kubernetes alone only has kernel-level isolation
Bottom Line: Kubernetes is overkill;
it is not the right solution for this use case
12. Web App Improvements
Why?
1. React: Preact is not compatible with
some libraries
2. Redux: Need Sophisticated State
Management
3. MaterialUI: Need Rich Widget Set
4. Typescript: Helps Manage
Complexity
5. Websockets: superior performance
and scalability over polling
Benefits
1. Much more user friendly and
intuitive
2. Supports Non-Technical Users
3. Nothing to install, 100% browser
based
4. No more “security through
obscurity” → Proper Authentication
and Authorization
13. App Services Improvements
Why?
1. Go: Natively supported by AWS Lambda
(as are Node.js, Python, Java, and others).
Go has the fastest startup time and better
inherent reliability
2. DynamoDB: For storing metadata
about an instance e.g., student
name, title, email, instance URL
3. CloudWatch: For monitoring
instance metrics e.g., CPU, disk,
memory
Benefits
1. Student instances start up faster
2. Trainers can see list of students with
their instances
3. Trainers can see ranks and titles
4. Trainers can monitor instance health
5. Trainers can instantly move a
student to an available instance
6. Trainers can take attendance during
class
14. Container Orchestration Improvements
Why?
1. Fargate: Auto-manages and
provisions instances, provides
metrics via CloudWatch, and pay as
you go. Full hypervisor isolation
2. Step Functions: Trigger workflows
via events or via a schedule. Keeps
Lambdas very simple by capturing
workflow and orchestration
3. Lambda: serverless, single-task
microservice; very low overhead
Benefits
1. Pay as you go, only when services
are active.
A delayed start does not cost money per
minute: Step Functions are billed per step (state transition), not per minute of waiting
2. Trainers can schedule a class far in
advance, with confidence the
instances will be there when needed
3. Trainers can schedule a time after
the class to automatically delete the instances
15. Infrastructure as Code (IaC) Improvements
Why?
1. SAM: Much higher-level
constructs than Terraform.
Purpose-built to support
serverless architectures.
Benefits
1. No need to purchase or
support Terraform
New Docker Image?
Just define it in YAML
16. Application Lifecycle Management (ALM) Improvements
Why?
1. GitHub Actions: Much simpler
model (YAML versus a Groovy-based
Jenkinsfile). Extensive library of
third-party actions
2. GitHub Issues: Simpler version of
Jira tickets, good enough for small
teams.
3. GitHub Wiki: Simple wiki pages
suffice for most projects
Benefits
1. Huge labor saver: no need to set up
an environment to host the build
server.
2. Much lower learning curve
3. No extra licensing costs; modest
costs if free minutes are exceeded
4. Everything is in one place: one
password to remember
5. READMEs can link to wiki pages for
more information (two-way links)
17. Logical Architecture
Serverless Architecture
TIM Users → Amazon CloudFront → static website assets (HTML, JS, CSS)
Amazon API Gateway (REST) → Lambda REST function
Amazon API Gateway (WebSockets) → Lambda onConnect, onMessage, and onDisconnect functions
Lambda Auth function (custom authentication); Amazon Cognito (authentication)
AWS Step Function Start Class → Lambda Start Class function
AWS Step Function Stop Class → Lambda Stop Class function
AWS Fargate, with Amazon CloudWatch Logs and Container Insights
Amazon DynamoDB (persistence layer)
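The three WebSocket functions share one job: tracking which browsers are connected so that status updates can be pushed rather than polled. The sketch below collapses onConnect, onMessage, and onDisconnect into a single Go switch on API Gateway's routeKey ($connect, $disconnect, and everything else) purely for illustration; in the architecture above they are separate Lambda functions. WSEvent mirrors only two fields of the real event, and the in-memory ConnStore is a hypothetical stand-in for a DynamoDB connections table.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"sync"
)

// WSEvent mirrors only the fields of an API Gateway WebSocket event that this
// sketch needs: requestContext.routeKey and requestContext.connectionId.
type WSEvent struct {
	RequestContext struct {
		RouteKey     string `json:"routeKey"`
		ConnectionID string `json:"connectionId"`
	} `json:"requestContext"`
}

// ConnStore is a hypothetical in-memory stand-in for a DynamoDB table.
type ConnStore struct {
	mu    sync.Mutex
	conns map[string]bool
}

func NewConnStore() *ConnStore { return &ConnStore{conns: map[string]bool{}} }

// Handle does what the separate onConnect/onMessage/onDisconnect functions
// would do, returning an HTTP-style status code.
func (s *ConnStore) Handle(ev WSEvent) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	id := ev.RequestContext.ConnectionID
	switch ev.RequestContext.RouteKey {
	case "$connect": // onConnect: remember the connection
		s.conns[id] = true
	case "$disconnect": // onDisconnect: forget it
		delete(s.conns, id)
	default: // onMessage: reject unknown connections; a real handler would act on the message here
		if !s.conns[id] {
			return 403
		}
	}
	return 200
}

// Connected lists open connection IDs (the browsers to push updates to).
func (s *ConnStore) Connected() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	ids := make([]string, 0, len(s.conns))
	for id := range s.conns {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	store := NewConnStore()
	raw := []string{
		`{"requestContext":{"routeKey":"$connect","connectionId":"trainer-1"}}`,
		`{"requestContext":{"routeKey":"$default","connectionId":"trainer-1"}}`,
		`{"requestContext":{"routeKey":"$default","connectionId":"stranger"}}`,
	}
	for _, r := range raw {
		var ev WSEvent
		if err := json.Unmarshal([]byte(r), &ev); err != nil {
			fmt.Println("bad event:", err)
			continue
		}
		fmt.Println(ev.RequestContext.RouteKey, ev.RequestContext.ConnectionID, store.Handle(ev))
	}
	fmt.Println("open:", store.Connected())
}
```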
19. AWS Step Function Coordinates Delayed Spin-up
1) SAM Configuration File Registers Our State Machine
2) State Machine Specifies Wait Until LaunchTimestamp
3) AWS State Machine Console Confirms Delayed Start
21. Vision Going Forward
Training Image Manager: Instructor-Led Training, Tools, Agile Practices, Technical Practices, Programming Languages, Certified Training
Learning Management System: Video On Demand, Self-Service, Setup Learning Path, Track Progress, Grading, Calendar, Course Enrollments, Print Certificates / Diplomas
Training Management System: Class Management, Rostering, Signups, Postings, Collect Dues, Earlybird Discounts, Marketing Campaigns
22. Key Takeaways
Will Serverless technology work for me? Yes, if:
§ Work is bursty; you don’t pay for downtime
§ You have workflow steps and coordination
§ You are free to re-architect logic into Lambdas
§ You can use the best-supported platforms (Go, Java, JavaScript)
Is re-architecting to optimize your use of cloud technology worthwhile? YES
§ Huge Usability, Responsiveness Gains
§ Significant Performance and Reliability Improvements
§ Excellent tool and API support
We started here... We ended here
23. Cost Savings
Will serverless technology save money? Yes:
§ You don’t pay for downtime
§ Services only run when they are actually needed
§ We realized 45% savings, but YMMV