De-Centralizing Operations with APM
Speakers: Kevin Evans, VP of DevOps and Cloud Services; Donnell Baker, Sr. Manager SRE; Angus Claus, Director SRE, Concur
De-Centralizing Operations with APM [FutureStack16]
1. Concur, an SAP Company
De-Centralizing Operations with APM
Kevin Evans, VP of DevOps and Cloud Services, Concur
Angus Claus, Director Service Management, Concur
Donnell Baker, Sr. Manager SRE, Concur
2. Concur, an SAP Company
De-Centralizing Operations with APM
Kevin Evans, VP of DevOps and Cloud Services
Angus Claus, Director Service Management and SRE
Donnell Baker, Sr. Manager SRE
November 2016
4. A very brief review of the Concur service delivery journey
2003
• SDLC* methodology: Waterfall
• Release cadence: 9 – 18 months
• Architecture: On premise as a service
• Ops model: IT ops
2008
• SDLC* methodology: Mixed waterfall / agile
• Release cadence: Initial: 4 months; Eventual: 1 month
• Architecture: Monolithic hybrid
• Ops model: Centralized hosted ops
2014
• SDLC* methodology: Agile
• Release cadence: Initial: 1 month; Eventual: 1 day
• Architecture: Microservice aspirations
• Ops model: Evolving End 2 End DevOps
* Systems development lifecycle
5. Foundational principles
• SaaS == Software as a Service
– We are building a service, not just software
• Decentralization and empowerment
– Decompose system into a set of services
– Dedicated team owns each service
– Team is enabled and responsible for every aspect of that service
• Simplicity, simplicity, simplicity
6. Motivation for transitioning to the DevOps model
Why?
• Innovation velocity
• Operational accountability
• Architecture change
7. Core Practices
• Automation is Key to everything – Pets vs. Cattle
– Scale, Consistency, Velocity, Manageability
• Engineering vs. Administration
– Centralized group focus moves from managing servers and infrastructure to building tools that enable others to do this work
• Frictionless and the lesson of the free Market
– Mandates don’t exist. Make the desired path the easiest one, so that compliance follows
• Build Culture where Everyone is accountable to the Service
– Direct ownership enables teams to be accountable for their service
• Learn from Others
– We are not the first ones to travel this path. What have other companies done to solve these problems? How can we take these learnings, adjust, and apply them?
8. What does DevOps mean at Concur?
End-to-end (E2E) teams
High degree of freedom
• Technology
• Velocity of release
Self-service for operational services
Operationally accountable for service
• Performance
• Availability
• Security
• Quality
• Cost of Ownership
[Diagram: Customers, Auditors, Go to Market; many E2E teams; limited centralized ownership: Delivery pipeline, Security, QE, Cloud Services, Production environments]
13. Our Strategy - The “Rails”
Centralized Control and Standardization
• Standard Naming Convention
– App Names
– Labels
• Developers are First Class Citizens
– API Driven Configuration
• Centralized services
– Plug-in Abstraction
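A standard naming convention is easiest to keep when it is checked by code rather than by review. The sketch below is only an illustration: the `<team>-<service>-<env>` pattern, the environment names, and the `key:value` label format are hypothetical, since the slides do not state the actual Concur convention.

```python
import re

# Hypothetical convention (not from the slides): <team>-<service>-<env>,
# lowercase alphanumerics, with a fixed set of environment names.
APP_NAME_PATTERN = re.compile(
    r"^(?P<team>[a-z0-9]+)-(?P<service>[a-z0-9]+)-(?P<env>dev|test|prod)$"
)


def parse_app_name(name):
    """Split an app name into its parts, or raise ValueError if it does not conform."""
    match = APP_NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"app name {name!r} does not follow <team>-<service>-<env>")
    return match.groupdict()


def labels_for(name):
    """Derive labels from a conforming app name (label format is an assumption)."""
    parts = parse_app_name(name)
    return [f"team:{parts['team']}", f"service:{parts['service']}", f"env:{parts['env']}"]


if __name__ == "__main__":
    print(labels_for("expense-receipts-prod"))
```

A check like this could run in the delivery pipeline, so that a non-conforming name is rejected before an agent ever reports under it.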
14. Our coverage
UI
MT
DB
• 4 Billion Transactions/Month
• 460+ Unique logins from 96 Agile teams
• New Relic charts and data used in weekly Service Reviews
• 99%+ of Major incidents reference New Relic data
• 8,700+ Alerts/Month
40,000+
519 Servers
80 Apps
2 Plug-ins
436 Servers
29 Apps
1 Plug-ins
4 Plug-ins
17. Zero Touch Configuration
• Provide - Configuration as Code:
– Role Type
– SLA KPIs (Apdex Thresholds)
– Escalation Path (PagerDuty)
• You get Out of Box:
– Default Alarm / Notification Channel Set Up
– Basic Alarming
– Monitored SLA (Apdex)
– Dashboards
• Custom:
– Bespoke Alarms – Error Rates, Response Times, Min / Max Throughput
– Error Code Exclusion
– Auto Remediation
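The "provide" and "out of box" lists above can be sketched as a small piece of configuration as code. This is a minimal illustration, not the tooling used at Concur: the `ServiceConfig` fields, the `default_monitoring` output shape, and all sample values are hypothetical. The `apdex` function follows the standard Apdex definition (satisfied at or below T, tolerating up to 4T, tolerating samples counted at half weight).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    """What a team provides. Field names are hypothetical."""
    app_name: str
    role_type: str            # e.g. "UI", "MT", "DB"
    apdex_t_seconds: float    # Apdex threshold T
    apdex_target: float       # SLA: minimum acceptable Apdex score
    pagerduty_service: str    # escalation path


def apdex(response_times, t):
    """Standard Apdex score: (satisfied + tolerating / 2) / total samples."""
    if not response_times:
        raise ValueError("at least one response time is required")
    satisfied = sum(1 for r in response_times if r <= t)
    tolerating = sum(1 for r in response_times if t < r <= 4 * t)
    return (satisfied + tolerating / 2) / len(response_times)


def default_monitoring(config):
    """Derive the out-of-box channel, alarm, and dashboard from the config."""
    return {
        "notification_channel": {"type": "pagerduty", "service": config.pagerduty_service},
        "alerts": [{
            "name": f"{config.app_name} apdex below target",
            "metric": "apdex",
            "operator": "below",
            "threshold": config.apdex_target,
        }],
        "dashboard": f"{config.app_name} overview",
    }


if __name__ == "__main__":
    cfg = ServiceConfig("expense-receipts-prod", "MT", 0.5, 0.85, "receipts-oncall")
    print(default_monitoring(cfg))
    print(apdex([0.1, 0.3, 0.6, 1.0, 3.0], cfg.apdex_t_seconds))  # 0.6
```

Keeping the team-supplied input this small is the point of the sketch: everything else is derived, so custom alarms become additions to the defaults rather than a prerequisite for being monitored.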
18. What Will Keep Us Successful?
• Role Based Access Control (RBAC)
• Pipeline Delivery (Control Plane)
– Build
– Ship
– Run
Distributed and diverse teams
Part of the solution versus a consumer
Mention beta and early adopters like Infra
Average with bumpers is 65-70
Note on min bar reqs
We want to automate this to facilitate the E2E implementation through a delivery pipeline, because the topology is highly distributed with many interdependencies and manual implementation doesn’t scale