This is the story of transforming Microsoft to One Engineering System with a globally distributed 24x7x365 service on the public cloud. We’ll show you around the system that handles the load of some of the most demanding engineering teams in the world, and share some stories about how they got there.
Sam Guckenheimer - Moving to One Engineering System
2. About Me
Sam Guckenheimer
Product Owner, Visual Studio Cloud Services
14 years Microsoft
30 years software industry
@SamGuckenheimer
https://visualstudio.com/devops
3. Microsoft Org Chart ~2011
Unintended consequence: No reuse would go unpunished.
4. Purpose of One Engineering System
There cannot be a more important thing for an engineer, for a product team, than to work on the systems that drive our productivity. So I would, any day of the week, trade off features for our own productivity. I want our best engineers to work on our engineering systems, so that we can later on come back and build all of the new concepts we want.
5. An engineering north star…
…the source across the company is available to anyone
…any dev can offer improvements to anything in the company
…the IP the company has built up over the years is made of re-usable components
…anybody can find and potentially re-use components from anywhere else
…devs are rewarded for creating popular components
…there is zero lag from when a dev makes a change & when the rest of the company sees it
…build and test time is directly proportional to the change made
…devs can move to another team and already know how to work
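One common way to make build and test time track the size of a change is content-addressed caching: fingerprint each target's inputs, then rebuild only the targets whose fingerprint changed, along with everything that depends on them. The sketch below is illustrative only. The target graph, file contents, and function names are hypothetical, and it does not describe Microsoft's actual build system.

```python
import hashlib

# Hypothetical build graph: target -> (its own source files, targets it depends on)
GRAPH = {
    "core": (["core.c"], []),
    "net": (["net.c"], ["core"]),
    "ui": (["ui.c"], ["core"]),
    "app": (["main.c"], ["net", "ui"]),
}


def fingerprints(graph, sources):
    """Hash each target's own sources together with its dependencies' hashes."""
    memo = {}

    def fp(target):
        if target not in memo:
            files, deps = graph[target]
            h = hashlib.sha256()
            for name in sorted(files):
                h.update(name.encode() + b"\0" + sources[name].encode() + b"\0")
            for dep in sorted(deps):
                h.update(fp(dep).encode())  # a change upstream changes this hash too
            memo[target] = h.hexdigest()
        return memo[target]

    for target in graph:
        fp(target)
    return memo


def targets_to_rebuild(graph, cache, sources):
    """Targets whose fingerprint differs from the last successful build."""
    current = fingerprints(graph, sources)
    return sorted(t for t, h in current.items() if cache.get(t) != h)


if __name__ == "__main__":
    sources = {"core.c": "v1", "net.c": "v1", "ui.c": "v1", "main.c": "v1"}
    cache = fingerprints(GRAPH, sources)      # state after one full build
    sources["ui.c"] = "v2"                    # edit a single file
    print(targets_to_rebuild(GRAPH, cache, sources))  # ['app', 'ui']
```

Because each fingerprint folds in its dependencies' fingerprints, editing `ui.c` touches only `ui` and `app`, while editing `core.c` would rebuild all four targets.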
8. Git experience on Windows repo (with GVFS on TFS)
Operation   Git      GVFS      Improvement
clone       12hrs    2.5mins   288x
checkout    3hrs     30secs    360x
status      8mins    2.3secs   209x
commit      30mins   6.9secs   261x
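The Improvement column is simply the ratio of the two timings once both are converted to seconds. A quick check reproduces the table:

```python
# Timings from the table, converted to seconds: (plain Git, Git with GVFS)
TIMINGS = {
    "clone": (12 * 3600, 2.5 * 60),
    "checkout": (3 * 3600, 30),
    "status": (8 * 60, 2.3),
    "commit": (30 * 60, 6.9),
}


def improvement(git_s, gvfs_s):
    """Speedup factor, rounded to the nearest whole number."""
    return round(git_s / gvfs_s)


if __name__ == "__main__":
    for op, (git_s, gvfs_s) in TIMINGS.items():
        print(f"{op:8} {improvement(git_s, gvfs_s)}x")
    # clone    288x
    # checkout 360x
    # status   209x
    # commit   261x
```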
9. Live Site Culture and Engineering
Live Site Health: Time to Detect; Time to Communicate; Time to Mitigate; Customer Impact; Incident prevention items; Aging live site problems; Customer support metrics; SLA per customer account (SLA, MPI, top drivers)
Engineering: Bug cap per engineer; Aging bugs in important categories; Pass rate & coverage by test level
Velocity: Time to build; Time to self-test; Time to deploy; Time to learn (Telemetry pipe)
Usage: Acquisition; Engagement; Dedication; Churn; Feature usage
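The first three Live Site Health measures can be derived from timestamps on each incident record. This sketch assumes hypothetical field names (`started`, `detected`, `communicated`, `mitigated`) and measures every interval from the moment customer impact began. The slide does not define the exact start points, so treat those definitions as an assumption.

```python
from datetime import datetime, timedelta
from statistics import median


def incident_intervals(incident):
    """TTD, TTC and TTM in minutes, each measured from when impact started (assumption)."""
    start = incident["started"]

    def minutes_since_start(t):
        return (t - start).total_seconds() / 60

    return {
        "time_to_detect": minutes_since_start(incident["detected"]),
        "time_to_communicate": minutes_since_start(incident["communicated"]),
        "time_to_mitigate": minutes_since_start(incident["mitigated"]),
    }


def median_intervals(incidents):
    """Median of each interval across a non-empty list of incidents."""
    per_incident = [incident_intervals(i) for i in incidents]
    return {k: median(p[k] for p in per_incident) for k in per_incident[0]}


if __name__ == "__main__":
    t0 = datetime(2000, 1, 1, 9, 0)  # made-up sample data
    incidents = [
        {"started": t0, "detected": t0 + timedelta(minutes=4),
         "communicated": t0 + timedelta(minutes=15), "mitigated": t0 + timedelta(minutes=40)},
        {"started": t0, "detected": t0 + timedelta(minutes=10),
         "communicated": t0 + timedelta(minutes=25), "mitigated": t0 + timedelta(minutes=90)},
    ]
    print(median_intervals(incidents))
    # {'time_to_detect': 7.0, 'time_to_communicate': 20.0, 'time_to_mitigate': 65.0}
```

The median is used here so that a single long outage doesn't hide the typical case. A real dashboard might track percentiles as well.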
Transition to Git
Motivation: Why Git @ Microsoft
Learnings from Adopting Git
Contribute to Open Source Git
Internal issues: we have some long-lived code bases
They don’t neatly factor themselves into small repositories (like microservices)
How many of you would like to clone the Windows Git repo to your laptop?
Troubleshooting Daughter’s Internet Access (Sam)
Sequentially migrating and refactoring code bases
Our first attempt at solving this problem was Git LFS. It didn’t work for us, and we contributed to and participated in the community
Why did Git LFS not work, at least for us? (Too slow)
New: Git Virtual File System (GVFS), which loads files lazily
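Lazy loading means a clone starts with only the list of paths and their object IDs, and fetches a file's contents the first time something reads it. The toy class below illustrates only that idea. `LazyWorkingTree` and the `fetch_blob` callback are hypothetical, and GVFS itself does this at the file-system level with its own protocol rather than in application code.

```python
from typing import Callable, Dict, List


class LazyWorkingTree:
    """Toy model: paths are known up front, contents are fetched on first read."""

    def __init__(self, manifest: Dict[str, str], fetch_blob: Callable[[str], bytes]):
        self._manifest = manifest            # path -> object id
        self._fetch_blob = fetch_blob        # hypothetical: downloads one object by id
        self._cache: Dict[str, bytes] = {}   # object id -> contents already fetched

    def list_paths(self) -> List[str]:
        # Enumerating the tree needs no file contents at all.
        return sorted(self._manifest)

    def read(self, path: str) -> bytes:
        oid = self._manifest[path]
        if oid not in self._cache:
            self._cache[oid] = self._fetch_blob(oid)  # first access: fetch once
        return self._cache[oid]

    def fetched_count(self) -> int:
        return len(self._cache)


if __name__ == "__main__":
    server = {"a1": b"int main() { return 0; }", "b2": b"# readme"}
    downloads = []

    def fake_fetch(oid):
        downloads.append(oid)
        return server[oid]

    tree = LazyWorkingTree({"src/main.c": "a1", "README.md": "b2"}, fake_fetch)
    print(tree.list_paths())   # ['README.md', 'src/main.c']
    tree.read("src/main.c")
    tree.read("src/main.c")
    print(downloads)           # ['a1']
```

Listing the tree costs nothing, and only the file actually read is downloaded, once.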
Build at Scale: something similar to Electric Cloud, using the slack capacity of the datacenters
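Spreading a build across spare machines depends on knowing which steps are independent. One standard way to expose that is to group the build graph into "waves" using Kahn's topological sort. Every step in a wave depends only on earlier waves, so all of a wave's steps can be farmed out in parallel. This is a generic sketch with a hypothetical graph. It is not how Microsoft's build system or Electric Cloud actually schedule work.

```python
from typing import Dict, List


def build_waves(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Group steps into waves; steps within one wave can run in parallel.

    deps maps every step to the steps it depends on. Raises ValueError on a cycle.
    """
    remaining = {step: set(d) for step, d in deps.items()}
    waves = []
    while remaining:
        ready = sorted(s for s, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle among: " + ", ".join(sorted(remaining)))
        waves.append(ready)
        for s in ready:
            del remaining[s]
        for d in remaining.values():
            d.difference_update(ready)  # these dependencies are now satisfied
    return waves


if __name__ == "__main__":
    deps = {"core": [], "net": ["core"], "ui": ["core"], "app": ["net", "ui"]}
    print(build_waves(deps))  # [['core'], ['net', 'ui'], ['app']]
```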
Using Git allows us to take advantage of the pull request workflow, including branch policies (which support org scale)
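Branch policies gate a pull request before it can merge into a protected branch. The sketch evaluates a PR against a few checks modeled loosely on the policy types Azure DevOps offers: minimum reviewers, a linked work item, a passing validation build, and resolved comments. The `PullRequest` fields and the default threshold are hypothetical, and this is not the service's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PullRequest:  # hypothetical shape, for illustration only
    approvals: int
    linked_work_items: int
    build_succeeded: bool
    unresolved_comments: int


def policy_failures(pr: PullRequest, min_reviewers: int = 2) -> List[str]:
    """Return the reasons the PR cannot merge yet; an empty list means it can."""
    failures = []
    if pr.approvals < min_reviewers:
        failures.append(f"needs {min_reviewers - pr.approvals} more approval(s)")
    if pr.linked_work_items == 0:
        failures.append("no linked work item")
    if not pr.build_succeeded:
        failures.append("validation build has not passed")
    if pr.unresolved_comments:
        failures.append(f"{pr.unresolved_comments} unresolved comment(s)")
    return failures


if __name__ == "__main__":
    print(policy_failures(PullRequest(approvals=1, linked_work_items=1,
                                      build_succeeded=True, unresolved_comments=0)))
    # ['needs 1 more approval(s)']
```

Because the same checks run on every PR targeting the branch, a policy like this gives teams a uniform gate without requiring each team to police merges by hand.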
Segue: this takes us back to the team dashboard (an example of team autonomy and enterprise alignment)
Kanban board: expedite lanes let you handle live site issues
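An expedite lane is a swimlane whose cards are pulled ahead of normal work, which is how an urgent live site issue can jump the queue. The sketch assumes hypothetical card fields and a simple rule: expedite cards go first, and within a lane the oldest card goes first. Real boards may apply other rules, for example around WIP limits.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Card:  # hypothetical board card
    title: str
    lane: str      # "expedite" or "standard"
    created: int   # sequence number; lower means older


def next_to_pull(backlog: List[Card]) -> Optional[Card]:
    """Expedite-lane cards first; within a lane, oldest first."""
    if not backlog:
        return None
    # False sorts before True, so expedite cards win the comparison.
    return min(backlog, key=lambda c: (c.lane != "expedite", c.created))


if __name__ == "__main__":
    board = [Card("feature A", "standard", 1),
             Card("live site outage", "expedite", 5),
             Card("feature B", "standard", 2)]
    print(next_to_pull(board).title)  # live site outage
```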
Live site culture requires all of us to be on the same telemetry pipeline.
Azure and services built-on Azure
Where is the problem?
(opening the Service Insights dashboard)
Hot, warm, and cold paths
Hot - optimized for speed
Warm - optimized for troubleshooting (data doesn’t need to be kept a long time)
Cold - business analysis (long running)
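The same telemetry stream can feed all three paths, with each path keeping data only as long as its purpose requires. The retention values below are invented for illustration (the talk gives none), and `TelemetryPath` is a hypothetical class. A real pipeline would use separate stores tuned for speed, troubleshooting queries, or cheap long-term analysis.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TelemetryPath:
    name: str
    retention_days: float                  # hypothetical value, not from the talk
    events: List[Dict] = field(default_factory=list)

    def ingest(self, event: Dict) -> None:
        self.events.append(event)

    def prune(self, now_days: float) -> None:
        """Drop events older than this path's retention window."""
        self.events = [e for e in self.events
                       if now_days - e["t_days"] <= self.retention_days]


def make_paths() -> List[TelemetryPath]:
    return [
        TelemetryPath("hot", retention_days=1),     # speed: e.g. fresh data for alerting
        TelemetryPath("warm", retention_days=7),    # troubleshooting: short-lived detail
        TelemetryPath("cold", retention_days=365),  # business analysis: long history
    ]


def publish(paths: List[TelemetryPath], event: Dict) -> None:
    """Fan one event out to every path; each path applies its own retention later."""
    for path in paths:
        path.ingest(event)


if __name__ == "__main__":
    paths = make_paths()
    for t in (0, 5, 9.5):                  # event timestamps, in days
        publish(paths, {"t_days": t})
    for p in paths:
        p.prune(now_days=10)
    print({p.name: len(p.events) for p in paths})  # {'hot': 1, 'warm': 2, 'cold': 3}
```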
Availability
Root Cause Analysis from our VP
Support Site with Live Status (and updates)
It was valuable: I saw a really vibrant community, with great involvement from all our teams. It helped move the needle on perception.
Next one is Feb 2018