Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2LAL9sv.
Brian Chambers and Caleb Hurd share how Chick-fil-A manages connections and deployments using two to-be-announced open source projects. They explain how they obtain operational visibility to their services, including logging, monitoring, and tracing. They also share early lessons and battle stories learned from running Kubernetes at the Edge. Filmed at qconnewyork.com.
Caleb Hurd is a Site Reliability Engineer at Chick-fil-A. His passion is enabling small development teams to deliver their software reliably and rapidly to customers by empowering them with the tools and pipelines to do so. Brian Chambers is an Enterprise Architect at Chick-fil-A. He focuses on delivering new platforms and capabilities like Self-Service Analytics, Cloud, IoT to the business.
2. InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
chick-fil-a-k8-clusters
3. Presented at QCon New York
www.qconnewyork.com
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
4. What to expect from the session
• Intro
• How is CFA using K8s?
• What does our
architecture look like?
• How are we
engineering around
K8s for our business?
• Q&A
6. AT PEAK HOUR
1 sandwich every 16 seconds
1 box of nuggets every 25 seconds
1 order of waffle fries every 14 seconds
1 car through the drive thru every 22 seconds
267 total transactions
14. Engineering Around K8s
• How we build and repair bare
metal clusters
• SRE Lessons Learned
• How we deploy applications to
thousands of clusters
15. Challenges of Bare Metal K8s clustering at scale
• Goal: #code2prod
• Simple enough for a non-
technologist to install
• Manageable remotely
• Automated device discovery
and self-clustering
• Self healing & HA
16. How we Bare Metal Cluster K8s at scale
Highlander Hooves Up
TOOLS
Sherlock FleetRKEImage
PROCESS
17. Bootstrapping Clusters
• Highlander
– Node coordination and
clustering leader election
using UDP
– Execute clustering (RKE)
– Swap KubeDNS for CoreDNS
– Base OAuth identity
negotiation
– Controller Pods (control
plane activity/Istio)
18. Initializing Clusters
What we considered
• Kops = love it, no bare metal
• Kubespray = slow + brittle
• kubeadmin = maybe in the future
• RKE = fairly simple, works for us
Future State?
• Stick w/ RKE, Kubeadmin, or roll our own to meet our needs
19. Resetting Cluster State
• Requirement: Need to be
able to re-image remotely
• Solution: Overlay FS + HAMS
– Manages wiping clusters
and restoring to base
20. Hooves Up
• Self-healing AWS SSM
Registration
• Free even for non-AWS
deployments
• Able to do remote
commands and patch
reporting/management
21. Lessons learned
• Use K8s feature set and don’t reinvent the wheel
• MVP. MVP. MVP.
• Ensure aggregated and searchable logging
• Deep health checks are a must --> Use /healthz
• Every service needs “/metrics”
endpoint
22. How do we deploy to our restaurants?
• Large number of
deployment targets
• Complex success/fail
criteria
• Array of application types
• What approaches did we
consider?
kubectl
/
23. Introducing Fleet
• Design Goals
– Simple to use / reason about
– Use declarative approach
– Support for variety of deployment
models (canary, blue/green)
– Rollout over flexible time period
– Sane rollback behaviors
– Leverage standard k8s API
– Full visibility
24. Fleet Ecosystem Components
• Fleet Client
– Git webhook, REST call, CLI
• Fleet Server API
– Code generation for
deployment, service,
ingress files
– Git management for cluster
repositories
– Deployment status tracking
• Atlas
– Repository of deploy-ready,
k8s compliant application
files
• Vessel
– Deployed on cluster, git
pull, kubectl apply, report
status
• Dashboards
31. Where you can find us
www.linkedin.com/in/brian-chambers
www.linkedin.com/in/calebrhurd
@brianchambers21
@calebrhurd
https://medium.com/@cfatechblog
https://github.com/chick-fil-a
32. Watch the video with slide synchronization on
InfoQ.com!
https://www.infoq.com/presentations/chick-fil-
a-k8-clusters