Billy Yuen
billy_yuen@intuit.com
Canary Release in K8s
Making Performance environment obsolete
Agenda
● Who we are
● Our journey into Kubernetes
● Why Canary Release
● How we solve it
● Next Step
3 Intuit Confidential and Proprietary
Intuit mission
Powering Prosperity Around the World
Who we are
● Founded: 1983
● IPO: 1993
● Employees: 9,000
● Customers: 50M
● FY18 Revenue: $6B
● Locations: 21
Challenges in our cloud journey
● Too much time spent on infrastructure tasks.
○ AWS/Chef expertise
○ No standard deployment pipeline
● High cost for cross-team contributions.
● Engineers just want to get features out to customers
as soon as possible without worrying about deployment or infrastructure.
Intuit Development Platform (Modern SaaS)
[Architecture diagram; labels recovered from the figure:]
● Splunk (Logging), PagerDuty (Alerts), AppDynamics (Monitoring), Wavefront (Monitoring), ServiceNow (CM), IDPS (Secrets)
● Intuit Kubernetes Service (IKS): core Kubernetes with Intuit network & security policies & best practices; EKS, Kops, Security & Compliance
● Continuous Operations (Monitoring, Analytics, Remediation), Olympus (SSO & AWS Roles), NetGenie (Certs)
● GitHub (Apps as Code), IBP 2.0 Jenkins (Build & Test, CI/CD), Quality Frameworks (TDS, Overwatch, TrinityJS, Hubble…), JFrog Artifactory (CDP), Argo CD (GitOps)
● JSK + Config + Experimentation, Intuit API (v4), Streaming/Messaging, Dev Patterns, Serverless Framework, Argo workflows, UX Fabric
● Multi-Cluster Service Mesh and Gateway Service Catalog
● AWS Infrastructure: VPC, ALB/NLB, S3, RDS, DynamoDB, Elasticache, ...
● Developer and Operations Experience: Onboarding, Monitoring, Management, Multi-Cluster Mgmt (IKSM), Discover, Lean/Play, Metrics/Analytics (Team Speed Dashboards)
Key Components of Modern SaaS platform
● CI/CD pipeline supporting GitOps for containers
○ Jenkins 2.0 for pipeline
○ Artifactory as Docker image repo
○ Argo CD for deployment
● Monitoring
○ Pod metrics in Wavefront using Heapster
○ Splunk for log analysis
○ AppDynamics as APM
What is a performance environment?
● Solving for
○ Identifying bottlenecks
○ Performance/Latency/Capacity
● Challenges
○ Very difficult to simulate production traffic
○ Hard to replicate production dataset
○ Dependencies not like production
What is a Canary Release?
● “ ... a small set of end users selected for testing act
as the canaries ... negative results from a canary
release can be inferred from telemetry and metrics in
relation to key performance indicators … ”
● What we measure:
○ Pod metrics
○ JVM metrics
○ App metrics
Common questions on Canary Release
● How is Canary Release different from Blue/Green?
○ Blue/Green switches 100% of the traffic to the new version at once and is
used for quick fallback to minimize potential downtime.
● How can I release software that’s not fully tested?
○ Your functional tests are supposed to catch functional issues.
○ A canary is there to catch performance drift and other scale issues in
prod.
Canary Analysis Tools
● Netflix Kayenta (hosting)
○ Requires minimum 60 data points per metric.
○ Calculates mean and Std. Dev. per metric.
○ Score = sum of (weight × group metric score), aka the model.
○ Support for custom Judge implementation.
● Wavefront as data store for canary and prod metrics.
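The scoring rule on this slide can be sketched in a few lines of Python. This is an illustration only and not Kayenta's judge: the pass/fail rule (`metric_passes`, canary mean within a tolerance of baseline standard deviations), the group names, and the weights are all assumptions made for the example. Only the 60-point minimum, the mean/standard-deviation inputs, and the weighted sum come from the slide.

```python
from statistics import mean, stdev

MIN_POINTS = 60  # from the slide: minimum data points per metric


def metric_passes(canary, baseline, tolerance=2.0):
    """Hypothetical rule: pass if the canary mean lies within
    `tolerance` baseline standard deviations of the baseline mean."""
    if len(canary) < MIN_POINTS or len(baseline) < MIN_POINTS:
        raise ValueError("need at least 60 data points per metric")
    return abs(mean(canary) - mean(baseline)) <= tolerance * stdev(baseline)


def group_score(results):
    """Percentage (0-100) of passing metrics within one group."""
    return 100.0 * sum(results) / len(results)


def canary_score(group_scores, weights):
    """Weighted sum of group scores; weights are percentages totalling 100."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must total 100")
    return sum(weights[g] * group_scores[g] for g in weights) / 100.0


# Hypothetical groups and weights, for illustration only.
weights = {"pod": 20, "jvm": 30, "app": 50}
scores = {"pod": 100.0, "jvm": 50.0, "app": 0.0}
print(canary_score(scores, weights))
```

Keeping the weights outside the scoring function means the same collected metrics can be rescored under a different model without rerunning the canary.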
Changes to production pipeline
● Collect JVM and App Metrics
○ Jolokia (JVM) and Telegraf (WF integration) sidecars
○ Netflix Servo (MBeans) for App Metrics
● Support Canary Deployment (Jenkins pipeline)
○ Canary Deployment Stage using Argo CD
○ Wait and Compute Score
○ Approval Stage for prod deployment (if score > 90)
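The three canary stages above can be sketched as a single gate function. This is a minimal sketch, not the Jenkins pipeline itself: `run_canary_stage`, its parameters, and the stub callables are hypothetical names, and the one-hour default wait is an assumption borrowed from the run time mentioned later in the deck. Only the "score > 90" threshold comes from the slide.

```python
import time
from typing import Callable

APPROVAL_THRESHOLD = 90  # from the slide: approve prod only if score > 90


def run_canary_stage(
    deploy_canary: Callable[[], None],
    compute_score: Callable[[], float],
    wait_seconds: float = 3600,  # assumed default: one hour of metrics
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Deploy the canary, wait for metrics, and report whether prod may proceed."""
    deploy_canary()
    sleep(wait_seconds)  # let canary and prod metrics accumulate
    return compute_score() > APPROVAL_THRESHOLD


# Dry run with stubs so nothing is deployed and no time is spent waiting.
approved = run_canary_stage(
    deploy_canary=lambda: None,
    compute_score=lambda: 95.0,
    sleep=lambda seconds: None,
)
print("approve prod deployment:", approved)
```

Passing the deploy, scoring, and sleep steps in as callables keeps the gate logic testable without touching a cluster.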
Canary Release Flow
[Flow diagram. Pipeline stages: PR → Jenkins Pipeline (Deploy Stage → Sanity Test → Deploy Canary → Wait & Compute Score → Approval → Deploy Prod). Other labels: K8s, Service, Canary Pod, Prod Pods, Jolokia, Telegraf, Wavefront (metrics), Kayenta, Compute Score, Model A, Model B.]
The Canary Analysis Model
● Pod (Heapster)
○ CPU, memory, Page Fault
● JVM Heap Usage (Jolokia & Telegraf)
○ Thread Count, GC Count
● Application Level (Jolokia & Telegraf & Servo)
○ Business metrics
○ Server Errors Count
○ 200, 400, 500 Count
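One way to picture this model is as plain data: a list of metric groups, each with its metrics and a weight. The sketch below is an assumption about shape only. The `MetricGroup` class, the metric identifiers, and the 30/30/40 weights are hypothetical; the deck does not state the actual weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricGroup:
    name: str
    metrics: tuple
    weight: int  # percentage of the final score (hypothetical values below)


MODEL = (
    MetricGroup("pod", ("cpu", "memory", "page_faults"), 30),
    MetricGroup("jvm", ("heap_usage", "thread_count", "gc_count"), 30),
    MetricGroup(
        "app",
        ("business_metrics", "server_errors", "http_200", "http_400", "http_500"),
        40,
    ),
)


def validate(model):
    """Reject models whose weights do not total 100 or whose names repeat."""
    names = [group.name for group in model]
    if len(set(names)) != len(names):
        raise ValueError("duplicate metric group name")
    if sum(group.weight for group in model) != 100:
        raise ValueError("weights must total 100")


validate(MODEL)
```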
Canary Model Refinement
● Start with Happy Path (in DR)
○ Assert on similar result (=100)
● Test for “Unhappy Paths” (in DR)
○ Spike in Application errors (< 100)
○ Spike in memory or threads, visible in GC/thread count (< 100)
○ Combine the two spikes and assert that the aggregate
score is lower still.
● Refine using prod traffic with manual gate
○ Assert Canary Score against other monitoring tools.
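The happy-path and unhappy-path checks can be written as assertions against a scoring function. The sketch below uses a deliberately simple stand-in judge (a metric passes if the canary mean is within 10% of the baseline mean, all metrics weighted equally). That rule, the metric names, and the sample values are assumptions for illustration, not the model used in the deck.

```python
from statistics import mean


def score(canary, baseline, tolerance=0.10):
    """Stand-in judge: percentage of metrics whose canary mean
    stays within `tolerance` (relative) of the baseline mean."""
    passed = 0
    for name, base_values in baseline.items():
        base = mean(base_values)
        if abs(mean(canary[name]) - base) <= tolerance * abs(base):
            passed += 1
    return 100.0 * passed / len(baseline)


# Hypothetical one-hour samples (60 points per metric).
baseline = {
    "errors": [2.0] * 60,
    "gc_count": [10.0] * 60,
    "thread_count": [50.0] * 60,
}
happy = dict(baseline)                                # same behaviour as prod
error_spike = {**baseline, "errors": [20.0] * 60}     # application errors jump
gc_spike = {**baseline, "gc_count": [40.0] * 60}      # GC activity jumps
both = {**error_spike, "gc_count": [40.0] * 60}       # both spikes together

assert score(happy, baseline) == 100.0
assert score(error_spike, baseline) < 100.0
assert score(gc_spike, baseline) < 100.0
assert score(both, baseline) < score(error_spike, baseline)
```

If the combined case does not score lower than either single spike, the weights or groupings in the model probably need another pass.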
What we have learned
● Start with as many metrics as possible because:
○ Each run will take time (minimum one hour).
○ “What if” scenarios can be applied to Collected
Metrics.
● At least ten metric groups are needed for a meaningful score.
● How do you compare a set of latency metrics in a one-minute
window? Mean, TP50, TP99?
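To make the latency question concrete, the sketch below summarizes one window of request latencies three ways. The nearest-rank percentile definition and the sample values are assumptions chosen for the example; monitoring systems differ in how they compute TP values.

```python
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceil(p * n / 100)
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Mean, TP50 and TP99 for one window of latencies (milliseconds)."""
    return {
        "mean": mean(latencies_ms),
        "tp50": percentile(latencies_ms, 50),
        "tp99": percentile(latencies_ms, 99),
    }


# Hypothetical one-minute windows of 100 requests each.
prod = [10.0] * 100
canary = [10.0] * 98 + [1000.0] * 2  # two very slow requests

print(summarize(prod))
print(summarize(canary))
```

In this made-up window the two slow requests leave TP50 unchanged, roughly triple the mean, and move TP99 by two orders of magnitude, so the choice of statistic decides whether the canary looks healthy.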
In Summary
● Performance Environment is never the same as production.
● Canary Release detects performance drift and bottlenecks
using production environment and traffic.
● Canary Release Process
○ Define Metrics and Model
○ Orchestrate the canary release
○ Collect Metrics
○ Compute/Validate the score
Next Step (Making it Scale!)
● Argo Rollouts for canary deployment
○ Eliminate custom deployment in Jenkins pipeline.
○ Enable scaling up the canary and scaling down prod.
○ Add Baseline support.
● Prometheus for Metric Collection
○ Eliminate sidecars like Jolokia and Telegraf.
● Service Mesh to throttle Canary (5%) and Baseline (5%).
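The 5% / 5% split can be pictured as weighted routing over 100 buckets. In a real service mesh this would be expressed as routing configuration and not application code; the sketch below only illustrates the arithmetic. The `route` function, the hashing scheme, and the assumption that the remaining 90% goes to prod are hypothetical.

```python
import hashlib

# Canary and baseline percentages are from the slide; prod taking the rest is assumed.
SPLIT = (("canary", 5), ("baseline", 5), ("prod", 90))


def route(request_id: str) -> str:
    """Map a request to a target by hashing its id into one of 100 buckets."""
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    upper = 0
    for target, percent in SPLIT:
        upper += percent
        if bucket < upper:
            return target
    raise AssertionError("SPLIT percentages must total 100")


print(route("user-42"))
```

Hashing the request id keeps the assignment stable, so the same caller lands on the same target for the whole analysis window.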
Thank you!
Thanks also to Parin Shah and Danny Thomson for their
contributions!
Billy Yuen
billy_yuen@intuit.com
We’re Hiring!!

Container world 2019 Canary Release
