9. Code Test & Stabilize
Code
Complete
We wrote all the code months before
we shipped.
10. Planning M1 M2
We had a perfect schedule and knew
exactly when it would be ready!
11. Planning
Customer feedback – we should
change the way a feature works. We
didn’t get it quite right…
… but we’re booked solid already.
M1
12. “Great feedback. Thanks! We’ll take a
look in planning for the next release. We
should get it to you….
in a few years.”
13. Diego Lo Giudice and Dave West, Forrester
February 2011
Transforming Application Delivery
“Firms today experience a much
higher velocity of business change.
Market opportunities appear or
dissolve in months or weeks instead
of years.”
14. 4-6 month milestones → 3-week sprints
Horizontal teams → Vertical teams
Personal offices → Team rooms
Long planning cycles → Continual Planning & Learning
PM, Dev, Test → PM & Engineering
Yearly customer engagement → Continual customer engagement
Feature branches → Everyone in master
20+ person teams → 8-12 person teams
Secret roadmap → Publicly shared roadmap
Bug debt → Zero debt
100 page spec documents → Specs in PPT
Private repositories → Open source
Deep organizational hierarchy → Flattened organization hierarchy
Success is a measure of install numbers → User satisfaction determines success
Features shipped once a year → Features shipped every sprint
16. Let’s try to give our teams three things….
Autonomy, Mastery, and Purpose.
17. Group A
• Business plan
• Established in the market
• Well funded
• Hiring the best people
Group B
• Working for free
• In their spare time
• Because they want to
22. Program Management is responsible for:
WHAT we’re building, and
WHY we’re building it
Engineering is responsible for
HOW we’re building it, and that
we’re building it with QUALITY
23. Cross discipline
10-12 people
Self managing
Clear charter and goals
Intact for 12-18 months
Physical team rooms
Own features in production
Own deployment of features
29. S1 S2 S3 S4 S5 Stabilization S6
A
B
“Let’s do this Agile thing… but we should probably
reserve some time to stabilize things.”
30. We all follow a simple rule we call the “Bug Cap”:
Rule: If your bug count exceeds your bug cap… stop working
on new features until you’re back under the cap.
5 × 10 = 50 (bugs per engineer × engineers on the team = bug cap)
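The rule above can be sketched in a few lines. This is only an illustration, not the team's actual tooling: the function names are hypothetical, and the allowance of 5 bugs per engineer is an assumption read from the slide's arithmetic.

```python
BUGS_PER_ENGINEER = 5  # assumption taken from the slide's example


def bug_cap(engineers, per_engineer=BUGS_PER_ENGINEER):
    """Bug cap for a team: a fixed allowance per engineer."""
    return engineers * per_engineer


def may_work_on_features(open_bugs, engineers):
    """Feature work stops while the bug count exceeds the cap."""
    return open_bugs <= bug_cap(engineers)
```

With these assumptions, a 10-person team that reaches 51 open bugs would stop feature work until the count drops back to 50 or below.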
31. We are delivering value to customers at an
increased velocity.
• More features in the 2016 calendar year (262 features)…
• Than the previous 4 years combined (256 features).
• 249 features already in 2017… with three months left.
https://www.visualstudio.com/en-us/articles/news/features-timeline
[Chart: features shipped per year] 2012: 22, 2013: 58, 2014: 65, 2015: 111, 2016: 262, 2017: 249
40. All the heavy lifting is done by this custom command-line tool,
which is the same tool we use to deploy in dev and test environments.
Delays are achieved via manual intervention tasks
configured to continue after the delay.
45. • DevOps efforts are usually motivated by:
– Faster time to market
– Happier customers
– Improved efficiency
– Increased reliability
• Metrics help combat “when I ship it, I’m done”
• Recommended steps
1. Establish shared objectives
2. Identify metrics for each category
3. Watch for unintended consequences
4. Establish culture of learning
• Use composable metrics (Engaged users, Satisfaction, etc) to
empower teams
Business
Customer
Operations
Health
47. [Architecture diagram: Service Fabric Cluster]
Components shown: VM Scale Set, SPS, Public IP, FQDN, Azure Key Vault, Blob Storage Accounts, Database Cluster, Azure Load Balancers, Docker Build Machines, Docker Registry, TFS
Editor's notes
When we first started our own agile transformation 4 years ago….
Started with what we had
Evolved in flight
Multi-tenancy
Online upgrade
If binaries and SQL schema are both updated at once, there will be downtime
Instead, binaries are versioned and know how to talk to both the old and the new SQL schema
Then update the schema of the SQL Server
Read lock – taken by application-tier requests for any information
The database upgrade takes the write lock, updates the schema, and releases it
It’s similar to blue-green deployment
It’s not like a rolling update
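The online-upgrade idea in these notes (binaries that understand both schemas, readers holding a read lock, the upgrade holding the write lock) might be sketched as follows. This is a toy in-memory model under stated assumptions: `ReadWriteLock`, `Database`, `read_names`, and `upgrade_schema` are hypothetical names, and the "schema change" is just a reshaping of Python dicts, not real SQL.

```python
import threading


class ReadWriteLock:
    """Minimal readers-writer lock: many readers or one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class Database:
    def __init__(self):
        self.lock = ReadWriteLock()
        self.schema_version = 1
        self.rows = [{"name": "Ada Lovelace"}]  # schema v1: a single name column


def read_names(db):
    """Application tier: already deployed, understands schema v1 and v2."""
    db.lock.acquire_read()
    try:
        if db.schema_version == 1:
            return [row["name"] for row in db.rows]
        return [f'{row["first"]} {row["last"]}' for row in db.rows]
    finally:
        db.lock.release_read()


def upgrade_schema(db):
    """Upgrade step: take the write lock, change the schema, release it."""
    db.lock.acquire_write()
    try:
        db.rows = [
            dict(zip(("first", "last"), row["name"].split(" ", 1)))
            for row in db.rows
        ]
        db.schema_version = 2
    finally:
        db.lock.release_write()
```

Because the reader code handles both versions before the schema changes, requests only pause for the short time the write lock is held.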
Adopted cloud principles, mindset
Tracing
Feature flags
Events where they want to show the new features
Without any redeployment we can enable a feature
Controlled through PowerShell or Web UI
Users can opt in to preview features via feature flags
When to use a feature flag is a team decision
Circuit breakers
Outage due to cascading failures
Database has a problem
Calls from ASP.NET
Latency and concurrency
Fail fast and degrade gracefully
Netflix
Circuit breaker
Testing - Fault injection
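A circuit breaker of the kind these notes describe could look roughly like the sketch below. It is a generic illustration, not the service's implementation: the class, its thresholds, and the `call_with_fallback` helper are all assumptions. After a few consecutive failures the breaker opens and callers fail fast instead of piling more calls onto a struggling dependency; the fallback value stands in for "degrade gracefully".

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("failing fast: circuit is open")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def call_with_fallback(breaker, func, fallback):
    """Degrade gracefully: return a fallback value when the call is refused or fails."""
    try:
        return breaker.call(func)
    except Exception:
        return fallback
```

Injecting the clock also makes simple fault-injection tests possible: a test can make the dependency raise, advance the fake time, and check that the breaker opens and later recovers.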
Resource utilization (CPU, request time)
Noisy neighbor in the multi-tenant environment
Delaying
Blocking
SQL database: give the client information about its resource utilization and about when it will be blocked
XEvents (SQL Azure): which user is consuming the resource
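The delay-then-block handling of a noisy neighbor might be sketched like this. The thresholds, the linear delay curve, and the names `Decision` and `throttle` are assumptions made for illustration; the notes do not say how the real service computes them. The usage ratio is returned so the client can be told how close it is to being blocked.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str           # "allow", "delay", or "block"
    delay_seconds: float  # how long to hold the request before serving it
    usage_ratio: float    # share of the tenant's budget used, reported to the client


def throttle(used_units, budget_units, delay_at=0.8, max_delay=2.0):
    """Decide how to treat a tenant's next request from its resource usage."""
    ratio = used_units / budget_units
    if ratio >= 1.0:
        return Decision("block", 0.0, ratio)
    if ratio >= delay_at:
        # Delay grows linearly from 0 at the threshold to max_delay at the budget.
        delay = max_delay * (ratio - delay_at) / (1.0 - delay_at)
        return Decision("delay", delay, ratio)
    return Decision("allow", 0.0, ratio)
```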
Decouple deployment and exposure
Flags provide runtime control down to individual user
Change without redeployment
Controlled via PowerShell or web UI
Supports early feedback, experimentation
Quick off switch
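The flag behaviour listed above (runtime control down to an individual user, change without redeployment, quick off switch) might look roughly like this. It is a minimal in-memory sketch: the `FeatureFlags` class, its methods, and the `new-dashboard` flag are hypothetical, and a real service would persist flag state and expose it through PowerShell or a web UI.

```python
class FeatureFlags:
    """Hypothetical in-memory flag store, for illustration only."""

    def __init__(self):
        self._everyone = set()  # flags switched on for all users
        self._users = {}        # flag name -> set of opted-in user ids

    def enable(self, flag, user=None):
        """Turn a flag on for one user, or for everyone if no user is given."""
        if user is None:
            self._everyone.add(flag)
        else:
            self._users.setdefault(flag, set()).add(user)

    def disable(self, flag):
        """Quick off switch: turn the flag off for everyone at once."""
        self._everyone.discard(flag)
        self._users.pop(flag, None)

    def is_enabled(self, flag, user):
        return flag in self._everyone or user in self._users.get(flag, set())


def render_dashboard(flags, user):
    # The new code path is already deployed; the flag decides who sees it.
    if flags.is_enabled("new-dashboard", user):
        return "new dashboard"
    return "old dashboard"
```

Deployment and exposure are separate here: shipping `render_dashboard` changes nothing for users until someone flips the flag.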
Git: lightweight topic branching instead of many branches
Use the tools you build. For example, Microsoft uses TFS to deploy TFS; or think of using the same set of scripts to deploy to dev, test, and prod
Do continuous deployment. As the saying goes: if it hurts, do it more often, so that you improve at it.
Debugging in production can be easy when we instrument everything
Lightweight topic branches
Local commits
Pull request
(process tax reduction)
(feature engineer – create short-lived topic branch off of master)
This is where we’re at today – 175 commits/day into Master… build breaks perhaps 1 / month
Short-lived release branches
Many people, large tree, flat branch structure… how?
Shift-left:
Controlling build breaks… frequent small check-ins, shift-left - PR workflow helped here
Controlling product breaks… shift-left quality journey
Move to Git in spring ’14 – helped in a few different ways…
PR workflow - first class support for build validation
First-class cherry-pick workflow – easier to cherry-pick and put it where it belongs than to merge code you didn’t write
Git allows for “powerful local experimentation” – idea that local branches are empowering
-----
Did not change overnight – several versions, including push while we were still under TFVC
Inefficiency because of testers and devs
Functional tests that are flaky
There are no more testers
The former testing team was reassigned to other work
Testing
Shift left
Test in production
Test suite – huge automated tests that take more time
Flaky tests
L0 and L1 are unit tests; L2 and L3 are functional tests
Stubs, mocks, and shims
Stubs and mocks – for greenfield code written against interfaces
that takes dependencies on other objects
Shims – provide the implementation at run time (.NET Framework)
L2 – Isolated test with fake identities
L3 test – Production UI tests
60,000 tests in 5-6 minutes
64,000 tests in 6 minutes
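The difference between a stub and a run-time replacement can be shown with two small unit tests. The notes refer to .NET stubs and shims; the sketch below uses Python instead, with `unittest.mock.patch` standing in as a rough analogue of a shim. All function and class names are made up for the example.

```python
import time
from unittest import mock


class StubClock:
    """Stub: a hand-written stand-in passed in through the interface."""

    def __init__(self, hour):
        self._hour = hour

    def current_hour(self):
        return self._hour


def greeting(clock):
    """Code written against an interface, so a stub can be injected."""
    return "Good morning" if clock.current_hour() < 12 else "Good afternoon"


def seconds_since(start):
    """Code that calls time.time() directly, leaving no seam for a stub."""
    return time.time() - start


def test_greeting_with_stub():
    assert greeting(StubClock(9)) == "Good morning"
    assert greeting(StubClock(15)) == "Good afternoon"


def test_seconds_since_with_patch():
    # Replace the implementation at run time, much as a shim would.
    with mock.patch("time.time", return_value=100.0):
        assert seconds_since(40.0) == 60.0
```

Tests like these touch no network, database, or clock, which is what lets tens of thousands of them run in minutes.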
Code reviews
Tools which will scan – basic
Public Whitepaper: Microsoft Enterprise Cloud Red Teaming (Walton, 2016) - https://gallery.technet.microsoft.com/Cloud-Red-Teaming-b837392e
Credentials available in shares
Passwords, keys, and tokens in the code
Phishing - malware
Cross site scripting
SQL injection
Protecting the secrets
Key Vault manages
Keys
Certificates
Secrets
Rotate the secrets through automation
Azure Security Center – infrastructure alerts such as malware and suspicious processes
Kalypso ??? Log Analytics query
Manual steps – OneNote
IT team
downtime
Deploy to single instance
When it’s the engineering team’s problem, they will tool it well.
They will use consistent tooling in dev, test, prod. We use RM to automate and orchestrate deployments and to enable the engineering team to do deployments
Config changes through PowerShell
No downtime
31 microservice deployment
3 week sprint deploy
Regularly
Hot fix
Green throughout the sprint
Safe deployments – Across rings
Update Binaries
Update Database
Deploy behind feature flags and make the feature available as a preview
Rings and account?
Incremental deployment and automated health checks
Difference between feature flags and incremental deployment
Deploy during peak time
Wait after each of the deployments
Watch for any problems reported by users
Watch for telemetry
Issues
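The ring-by-ring rollout described in these notes (deploy, wait, check health, then move on) could be orchestrated along these lines. This is a sketch under assumptions: the function name, the ring names, and the `deploy`, `wait`, and `is_healthy` callables are hypothetical stand-ins for real deployment tooling, telemetry checks, and wait periods.

```python
def deploy_across_rings(rings, deploy, wait, is_healthy):
    """Roll out ring by ring, halting at the first unhealthy ring.

    Returns (completed_rings, failed_ring); failed_ring is None on success.
    """
    completed = []
    for ring in rings:
        deploy(ring)
        wait(ring)  # give telemetry and user reports time to surface problems
        if not is_healthy(ring):
            return completed, ring  # stop here; later rings are never touched
        completed.append(ring)
    return completed, None
```

A caller might pass a `wait` that sleeps for hours in production and returns immediately in tests, so the same orchestration code runs in every environment.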
Monitoring everything
When an incident happens – a process for coordinating and resolving the issue
If there is an issue in production, it’s the dev’s responsibility to provide a fix that resolves it quickly
Devs are also responsible for the telemetry of the feature (more or less)
Competitive feature – customers are informed of the incident and sent details about the issue
Outside in
Inside out – all customer activity requests
Kusto or Log Analytics, where we can write SQL-like queries and make sense of the data
Activity ID - correlation ID
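An activity (correlation) ID only helps if every log line written while serving a request carries it. The sketch below shows one way to do that in Python with `contextvars`. It is illustrative only: the function names and the in-memory `LOG` list are assumptions, and a real service would send these records to its telemetry pipeline.

```python
import contextvars
import uuid

# Holds the ID of the request currently being handled (None outside a request).
activity_id = contextvars.ContextVar("activity_id", default=None)

LOG = []  # stand-in for a real telemetry sink


def log(message):
    """Every record carries the current activity ID automatically."""
    LOG.append({"activity_id": activity_id.get(), "message": message})


def query_database(name):
    # Deeper layers do not need the ID passed to them explicitly.
    log(f"query for {name}")


def handle_request(name):
    """Assign a fresh activity ID, do the work, and return the ID."""
    token = activity_id.set(str(uuid.uuid4()))
    try:
        log(f"start {name}")
        query_database(name)
        log(f"end {name}")
        return activity_id.get()
    finally:
        activity_id.reset(token)
```

Filtering the log by one activity ID then reconstructs everything that happened for a single customer request across layers.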
Application Insights
Precise alerting based on monitoring data
Low DevOps health and high customer satisfaction
Advanced – troubleshooting insights
A robot analyzes the issue and attaches the insight to the incident
The robot can monitor a metric
Snapshot of the state
Remove the node from load balancer
Add a new node
SRE – platform issues
Feature issue will be routed back to Feature team to improve the monitoring
Feature team are responsible for writing code to deploy the infrastructure
Technical debt – as a simple example
Metrics - Depends on the stage of the product
Standard – Lead time
Mean time to Detection
Mean time to recovery
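The standard metrics named above reduce to averages over timestamp pairs. The sketch below is a minimal illustration: the field names and sample incidents are made up, and measuring recovery from the start of the incident (rather than from detection) is an assumption, since definitions vary between teams.

```python
from datetime import datetime
from statistics import mean


def mean_minutes(intervals):
    """Average length, in minutes, of (start, end) datetime pairs."""
    return mean((end - start).total_seconds() / 60 for start, end in intervals)


def lead_time(changes):
    """changes: (committed_at, deployed_at) pairs."""
    return mean_minutes(changes)


def mttd(incidents):
    """Mean time to detection, measured from the start of each incident."""
    return mean_minutes((i["started"], i["detected"]) for i in incidents)


def mttr(incidents):
    """Mean time to recovery; assumed here to run from incident start."""
    return mean_minutes((i["started"], i["resolved"]) for i in incidents)


# Made-up sample data, for illustration only.
EXAMPLE_INCIDENTS = [
    {"started": datetime(2017, 1, 1, 12, 0),
     "detected": datetime(2017, 1, 1, 12, 5),
     "resolved": datetime(2017, 1, 1, 12, 35)},
    {"started": datetime(2017, 1, 2, 9, 0),
     "detected": datetime(2017, 1, 2, 9, 15),
     "resolved": datetime(2017, 1, 2, 10, 5)},
]
```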
Metrics should drive behaviour in the right direction
Service health
Operations – cost and resource utilization
Customer satisfaction
Business – engaged users
Metrics should drive behaviour in the right direction
Service health
Slow commands
Failed commands
Engineering debt
Operations
Customer satisfaction – mails