Why should I bother collecting metrics? How can they help me? My CFD is pretty and colourful, but what is it actually trying to tell me?
CFD, control chart, lead time distribution, percentiles... Metrics can be daunting to start with, but if you know how to interpret them they can drive continuous improvement, forecast the future, and take your Kanban system to the next level! It’s much easier than you think: no need for complex maths or expensive software.
At Sky Network Services a few teams are using Kanban and metrics. In this talk I’ll share our experience: what metrics we use, how we use each one of them, what little data we collect to get a whole lot of value, what pitfalls we encountered.
Thank you everyone for coming to my talk; the other sessions are great too. Can I ask: who here has an idea of what Kanban is? Who here already knows what a CFD is? Great, you are in the right place!
My name is Mattia, I’m from a lovely small town in Italy called Verona. You might know it as the city of Romeo and Juliet. My background is in software development, and I’m now a team leader. I don’t consider myself a real coach, but I like to help teams improve (run workshops, retros, talks, etc.). I work at Sky Network Services, the department in Sky that deals with Voice and Broadband. The way I explain it to my wife is “we make the internet work”. Oh, btw, we’re hiring.
This presentation is a slightly modified version of one that I do internally for teams that want to learn about Kanban metrics. I use it to share our experience: why we like using metrics, what metrics we use and how, and how we collect the data.
Not mandatory, but it helps if you have an idea of what Kanban is and its values (e.g. limiting WIP). If you need help on this, consider going to the London Lean Kanban Day: last year it was great, and I’m sure this year it’s going to be even better!
Why do I even need metrics? What problem are we trying to solve?
#1 - improve: metrics give you hints on where you should improve, and let you validate experiments. Kanban is all about continuous improvement: start with what you do, and use data to improve. Basically you are constantly running experiments and validating them with your data.
#2 - forecast the future: move away from estimates and use historical data to predict the future. Forecasting is a big topic that deserves to be discussed on its own; for this presentation I’m focusing on #1 (but if you want to know more we can have a chat later, or look up Troy Magennis, Kanbandan, Dimitar Bakardzhiev).
The typical reaction when you start talking about metrics is “how about no, thanks”. The argument is usually that metrics can be gamed and will cause dysfunctions (you’ll destroy the system, but the metrics will say you’re doing great). Example: velocity; if we have to complete 10 points in each iteration, just double the points on each story! So yes, we are stepping into a high-risk area when we talk about metrics.
That’s why we distinguish between good and bad metrics. Good metrics are about improving the system as a whole, rather than rewarding or punishing individuals; we recognise Deming’s rule about the performance of a system (95/5) and always work with a systemic approach. Team metrics are absolutely not used as targets: they are a feedback mechanism, only for internal use, with no exposure to management, PMs, etc. They are usually expressed as trends rather than single numbers; single numbers tend to become targets, while trends can tell you a more generic “you are improving”. They are leading, so you can act on them, rather than lagging, when you can’t do anything about it.
We use Jira, but it has been set up with a very generic process to fit all teams; we’re pretty much a physical shop. We have physical boards to represent the real workflow, and we either print cards or use post-its. This is the main board of my team, representing the main part of our process. We represent our WIP limits with placeholders (next = 3, dev = 2, test = 3); an empty placeholder is a pull signal. We’ve got the next two stories to work on in Next, then Dev, Functional Testing (with “waiting” as a buffer), and Waiting for a Cut. Our point of commitment is when a story enters Next, so that’s when our lead time starts.
For “iteration-based” apps, stories have to wait in “waiting for cut” until the end of the even iteration (2-4 weeks). Then we do a cut of the application and it goes through release testing, then to production. For the “on demand” apps we can do the cut whenever we want, so we try to cut and release as soon as possible. “Direct” stories are small things that have to be done (emails, reports, etc.) and go straight to done.
We treat these as three different work item types, based on either a different workflow or a different speed in the same workflow. This is very important, because for most metrics you want to differentiate between work item types: they will have different lead times and different demand. In particular, for “on demand” and “direct” we calculate the lead time from Next to Done, but for “iteration-based” we only count the time from Next to Cut and call it “iteration cycle time”, as the rest is fixed time.
Collecting the data is really simple. We record transitions: we stamp the card each time it moves from one state to the next. This piece of information is enough to get most of the metrics we use. Then we regularly put it in a spreadsheet, where we record the work item type and the transitions. You might decide to track some extra pieces of information; e.g. for bugs we want to record the environment they were found in. It’s up to you. Btw, there are formulas to only count working days.
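To make it concrete, here is a minimal sketch (in Python rather than our spreadsheet; the story and its dates are made up for the example) of how per-state durations and lead time fall out of nothing but those transition stamps:

```python
from datetime import date

# One record per card: the date it entered each state, in board order.
# These transition stamps are the only raw data we collect.
story = {
    "type": "On demand",
    "transitions": [
        ("Next", date(2016, 3, 1)),
        ("Dev", date(2016, 3, 3)),
        ("Test", date(2016, 3, 7)),
        ("Done", date(2016, 3, 9)),
    ],
}

# Time spent in each state = gap between consecutive stamps.
durations = {
    state: (next_date - entered).days
    for (state, entered), (_, next_date) in zip(
        story["transitions"], story["transitions"][1:]
    )
}
# Lead time runs from the point of commitment (Next) to Done.
lead_time = (story["transitions"][-1][1] - story["transitions"][0][1]).days

print(durations)   # → {'Next': 2, 'Dev': 4, 'Test': 2}
print(lead_time)   # → 8
```

The same subtraction works in a spreadsheet; this just shows how little input data most of the later metrics need.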
You’re probably wondering why we don’t use a tool. For collecting the data: make sure you are recording reality (the real workflow) and that you can change the data (e.g. if you forget to update it); Jira is quite bad here, you can’t update the dates. For displaying and analysing the data: Jira is rubbish. Other tools do something, but in my experience you still want access to the raw data so you can go crazy with a spreadsheet. You will want to reorganise the data, rearrange it, split it differently, play with it... tools don’t have enough flexibility for data mining. If you use a tool, make sure you have access to the raw data.
Our spreadsheet: the only input is some details about the story, what state it’s in, and then, for each state, when it entered that state and how long it stayed there. We collect CFD data every day; it’s inferable, but collecting it makes things easier. All the rest is calculated or inferred (organised in various sheets). Our spreadsheet has grown in complexity over time because of many experiments; eventually I will redo it and publish it.
I left this slide in but I’ll skip it; you can read it later if you want. The maths involved is really easy, and all the formulas are already in Excel.
Probably the most famous Kanban chart: for each day it shows how many stories are in each state.
can be used in retrospectives and root cause analysis to look at history (but needs good facilitator)
Can be used as a leading indicator, but it might just be easier to look at the board. Keep queueing states as thin as possible; don’t let any state grow too much; be alert when you don’t see flow (when the chart is not going steadily up).
One of the most famous (along with the CFD); usually done with dots, but it was easier for us to use columns (because of non-numeric data). Objective: retrospect: talk about stories that took longer or shorter than expected, and improve your process or policies; as you improve you should see trends of improvement. Leading: see stories approaching the limit and decide if you want to act. Tips & traps: use percentiles instead of standard deviation for the limits (std. dev. only makes sense if you have a normal distribution); it’s hard to talk about problems all the time (too busy to improve).
We calculate some stats about our cycle time (or lead time). Objective: forecasting. Lets you answer questions like “how long do stories usually take? what are the chances that the next story is going to take longer than 10 days?”. Tips and traps: distinguish between “all time” and “last X months”. I usually look at the last 5 or 6 months (the process is constantly improving, so older data is not representative anymore). It shows trends of improvement, but doesn’t really tell you why.
On the x-axis you’ve got days, on the y-axis the number of stories that took that long. You should find a skewed distribution. You can draw the curve that interpolates the data: it’s called a Weibull distribution. Get the probability of each bucket; if you sum them you can find that 50% of stories take 6 days or less, and 85% of stories take 10 days or less. So next time you ask “how long will the next story take?” you can decide how certain you want to be and pick a number: “how long will the next story take with 85% confidence?”. If I want a story to be done by a particular date, I know it needs to be in Next at least 10 days before that date to have 85% confidence.
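Picking a number at a given confidence is just a percentile over the raw lead times; a spreadsheet’s PERCENTILE does the job, and so does Python’s standard library. A sketch with made-up lead times:

```python
import statistics

# Hypothetical lead times in days, one per finished story (same work item type).
lead_times = [2, 3, 4, 5, 5, 6, 6, 6, 7, 8, 8, 9, 10, 10, 12, 15]

# 19 cut points at 5% steps; 'inclusive' matches a spreadsheet's PERCENTILE.
q = statistics.quantiles(lead_times, n=20, method="inclusive")
p50, p85 = q[9], q[16]   # 50th and 85th percentiles

# "How long will the next story take with 50% / 85% confidence?"
print(p50, p85)   # → 6.5 10.0
```

With this (invented) data you would answer “about 10 days” at 85% confidence, exactly the kind of number used on the slide.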
Objective: forecasting (at story level, but this is also the data you’d use for a Monte Carlo simulation). Tips & traps: a long tail is a symptom of high standard deviation (high variability); multiple peaks are often hiding multiple work item types; from the shape of the Weibull you can draw some conclusions.
The concept of the health of a story is based on how long the story has been in progress, compared against the lead time distribution. I know that 50% of stories take up to 6 days, so I consider that green. After 6 days it becomes yellow, and we start worrying about it; then red and black. Objective: leading: what should I work on today? It’s a way to escalate problems and raise alarms. Tips and traps: remember to do it per work item type (you can’t expect different work item types to take the same amount of time).
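The health colours are nothing more than thresholds read off the lead time distribution. A sketch in Python: 6 and 10 days are the 50th/85th percentiles from the earlier slide, while the black cut-off and the story IDs are invented for the example:

```python
def health(days_in_progress, p50=6, p85=10, black_after=15):
    """Traffic-light health of a story based on how long it has been in progress.

    p50/p85 come from the lead time distribution of the same work item type;
    black_after is an illustrative cut-off, not a figure from the talk.
    """
    if days_in_progress <= p50:
        return "green"    # still within the typical lead time
    if days_in_progress <= p85:
        return "yellow"   # start worrying
    if days_in_progress <= black_after:
        return "red"      # escalate
    return "black"        # alarm: way beyond what history predicts

# At standup: what should I work on today? (hypothetical stories)
for story, days in [("SNS-101", 3), ("SNS-102", 8), ("SNS-103", 12)]:
    print(story, health(days))   # green / yellow / red
```

Recomputing p50/p85 per work item type keeps the colours honest, per the tip above.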
This is quite specific to our context. For stories that are iteration-based, we show how long they wait in “release preparation”. You can see that release testing takes up most of the time: that’s the effect of having a big batch, with all the dysfunctions that come with it. We put this together with the low value we get from it (the number of bugs found) to conclude that it would have been crazy for new projects to follow the same approach. That’s why for new applications we moved to a flow approach. Context specific, but it’s an example of how you can use data to drive your argument.
Throughput: how many stories are done in a particular amount of time? We use the iteration as our cadence, so “how many stories are done in two weeks?”. Objective: how many stories should I plan in the next iteration? Are we going faster? Traps: if you split by work item type, you might have iterations where you did nothing of that particular type, so the average gets a bit weird. Depending on what project we’re working on (e.g. if it only involves iteration-based applications), I look at one throughput or another.
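Once you have the Done date of each story, counting throughput per cadence is trivial. A sketch with invented dates and an assumed iteration start:

```python
from datetime import date

# Hypothetical: the date each story reached Done.
done_dates = [date(2016, 1, 5), date(2016, 1, 8), date(2016, 1, 20),
              date(2016, 1, 21), date(2016, 1, 25), date(2016, 2, 3)]

iteration_start = date(2016, 1, 4)   # assumed start of iteration 1
iteration_days = 14                  # two-week cadence

# Bucket each Done date into its iteration number.
throughput = {}
for d in done_dates:
    iteration = (d - iteration_start).days // iteration_days + 1
    throughput[iteration] = throughput.get(iteration, 0) + 1

print(throughput)   # → {1: 2, 2: 3, 3: 1}  stories done per iteration
```

To dodge the trap above, you would compute one such dict per work item type and accept that some iterations legitimately contain zero.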
Shows how often a story is started; also called takt time in lean. This is how often you can change your mind about what to do next (instead of deciding every two weeks). Arrivals and departures should be balanced.
This is a controversial one; people tend to have strong feelings for or against story points. Stories of 1 point took from 2 to 10 days; stories of 2 points took 2 to 20 days; etc. There is very low correlation between story points and actual lead time. This worked as a shock factor, and we decided to stop wasting time with planning poker or fingers in the air. Now we do story breakdown and use historical data (e.g. the lead time distribution) to forecast how long stories will take.
Like at Disneyland: “how long do I wait from here?”. We use the 50th and 80th percentiles to show how long stories will take from here, and how long they’re going to spend in here. Tips and traps: remember, it’s only valid on a per-work-item-type basis (you can’t mix). Useful for taking decisions in the middle of the iteration.
Another context-specific metric. When stories are in Next we create a list of tasks for the story, to agree on the scope and the acceptance criteria. We monitored how long tasks take for a while, and now we can predict quite accurately how long a story is going to take based on the number of tasks. It’s highly accurate for dev time (plus creating tasks makes the scope clearer). Objective: forecast: how long is a story going to take, based on the number of tasks? Leading: how much is left to do on this story? Should we swarm, or should I rather start another story? Tips & traps: there is high correlation between dev time and the number of tasks, but very low correlation between test time and the number of tasks (which makes sense). It also helps with defining the scope of stories.
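You can check a claim like “high correlation between dev time and number of tasks” straight from the spreadsheet data with a Pearson correlation. A sketch on made-up numbers (the task counts and dev days below are invented, not our real data):

```python
from math import sqrt

# Hypothetical pairs: (number of tasks, days of dev) per story.
tasks = [3, 5, 2, 8, 6, 4]
dev_days = [2, 4, 1, 7, 5, 4]

def pearson(xs, ys):
    """Pearson correlation coefficient: +1 = perfect positive correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(tasks, dev_days)
print(round(r, 2))   # close to 1 here → strong correlation
```

The same calculation against story points vs lead time is what produced the “shock factor” two slides back: there the coefficient comes out close to zero.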
One simple way of keeping track of quality: count the number of bugs. We express it as “number of bugs per story” so that we can take them into account when we plan in the future.
One of the best incarnations of the lean mindset. Shows how long stories have spent in queue states, where no one was actively working on them (pure waste in lean). It’s a demonstration of Deming’s rule: just by removing wait time we could improve our performance by 50%. How do you reduce wait time? Probably reduce WIP, have truly cross-functional people, and attack sources of variability. Tip: represent queue states on your board using red labels.
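Flow efficiency itself is just touch time divided by total lead time; everything spent in queue states is waste. A sketch with invented durations, assuming we know which states are queues (the red-labelled ones on the board):

```python
# Queue states from our board (marked with red labels); everything else is touch time.
QUEUE_STATES = {"Next", "Waiting", "Waiting for cut"}

# Hypothetical: days one story spent in each state.
time_in_state = {"Next": 2, "Dev": 3, "Waiting": 2, "Test": 2, "Waiting for cut": 5}

touch = sum(d for s, d in time_in_state.items() if s not in QUEUE_STATES)
total = sum(time_in_state.values())
flow_efficiency = touch / total

print(f"{flow_efficiency:.0%}")   # → 36%: most of the lead time is waiting
```

Numbers like this make the “remove wait time, not work time” argument hard to ignore.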
Shows where stories spend most of the time; interesting to compare with which states the team perceives as taking longer. It clearly shows that development is only a small fraction of the time. Is this our intended process, or are we here just by chance? Does this reflect the importance of each state? E.g. if we think development is the main state, are we happy with it being just a small fraction? Lots of potential, but hard to use because of the risk of people feeling blamed.
GOOD: drives changes to the process: tells you what you should improve and gives you directions for what to change. Validates experiments and arguments: you can see if a change is having the effect you wanted, or find good arguments for your point. Enables forecasting. Helps answer “what should I work on today?”. Infinite learning possibilities; only your imagination is the limit. A Google spreadsheet worked very well as a tool.
WATCH OUT: resist the temptation to automate everything, or even worse to create a custom tool! You will keep changing charts and metrics, reorganising them, etc. Use intelligent automation, just to make manual steps easier. Example: you could capture the CFD data automatically, but I still prefer to have a look and copy/paste the numbers when I’m happy they’re right. Don’t obsess over precision: you’re looking for trends; the numbers won’t always be 100% precise, but it doesn’t matter. Only use the last period of time, for example the last 6 months: data older than that might not be accurate anymore, or might not reflect your current process. People still know it better: if you have a reason to think that reality is different from what a metric is saying, you’re probably right; it’s worth investigating, and you’re probably about to learn something new!
PROBLEMS: it was hard to get the team on board, and I actually never succeeded. The NIM team is particularly difficult; people tend to have strong opinions, and you have to find the right moment when they’re willing to listen. This is true in general: be prepared to do it yourself, and don’t expect people to help you until they think it’s helping them. It still works: you can look at the metrics and decide what the next change you’ll introduce is. Just make sure you never force anything on the team; that’s an instant fail.
It’s hard to make these metrics speak: what do they mean? How do I interpret a particular chart? How do I read this? It’s hard to make them user friendly enough. So again, you can still use them as a management tool to drive the changes you introduce, but you need to make them easier to use if you want people to look at them. As soon as you have work item types with different processes, the complexity explodes; I don’t know how to fix this, you need to make your metrics even more user friendly.
It’s difficult to make them visible; ideally these metrics should be printable as some kind of dashboard, so that before standup and after every iteration someone refreshes them and prints them. But I never managed to do it. As soon as you open the spreadsheet you get an information overload that often scares people. Again, the usability needs to improve: add instructions, help, etc. I should organise the metrics better, probably by their usage (e.g. daily vs iteration, lead time vs predictability). Write down when important changes or events happen, so that when you’re looking at the past you know “that’s when we changed the WIP limit in dev; have we improved?”.
It doesn’t matter what the numbers are saying: you’ll never be able to convince people with just numbers. You need to translate them into a feeling, make people feel a problem, and then they’ll listen.
Example: it doesn’t matter that all the metrics are telling you that you have too much WIP; people will be scared of WIP limits no matter what.
Kanban Metrics in practice for leading Continuous Improvement
● from Verona, Italy
● software dev & continuous improvement
● Kanban, Lean, Agile “helper”
● Sky Network Services
Why are we here?
a little knowledge of Kanban helps
(limiting WIP, lead time, value vs waste, queues, batches, etc.)
Why do we need metrics?
#1: drive continuous improvement #2: forecast the future
But I thought metrics were bad....
Good vs Bad metrics
● good: look at improving the whole system / bad: reward or punish individuals
“95% of performance is attributable to the system, 5% to the people” (W. Edwards Deming)
● good: feedback about the state of reality / bad: used as a target
● good: leading (let you change behaviour) / bad: lagging (tell you about the past)
● good: all metrics must improve / bad: local optimisations
Inputs: story details; start time and duration of each state
Public version: https://goo.gl/0A9QSN
For you to copy, reuse, and get inspired by.
All the maths you need
● Min, Max
● Average (mean): avg(1,2,2,2,3,14) = (1+2+2+2+3+14)/6 = 4
● Median: separates the high half from the low half. Less impacted by outliers. median(1,2,2,2,3,14) = 2
● Mode: the value that occurs most frequently. mode(1,2,2,2,3,14) = 2
● Standard deviation: measures the amount of dispersion from the average. When high, values are spread over a large range. stdev(1,2,2,2,3,14) = 4.5; stdev(1,2,2,2,3,5) = 1.2
● Percentile: the value below which a given percentage of elements fall. 50% perc(1,2,2,3,7,8,14) = 3; 80% perc(1,2,2,3,7,8,14) = 7.8
● Normal distribution vs skewed distribution: normal data is distributed around a central value (e.g. the height of the UK population); skewed data has a long tail on one side, positive or negative (e.g. the income of the UK population). The lead time of stories follows a skewed distribution.
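All of the above are one-liners in a spreadsheet or in Python’s standard library; here they are checked against the same example numbers (the percentile helper mimics the inclusive, linear-interpolation convention used by Excel’s PERCENTILE):

```python
import statistics

data = [1, 2, 2, 2, 3, 14]

print(statistics.mean(data))              # → 4
print(statistics.median(data))            # → 2.0
print(statistics.mode(data))              # → 2
print(round(statistics.pstdev(data), 1))  # → 4.5 (population std. deviation)

def percentile(values, p):
    """Linear interpolation between closest ranks (Excel PERCENTILE.INC)."""
    values = sorted(values)
    rank = p * (len(values) - 1)
    lo = int(rank)
    frac = rank - lo
    hi = min(lo + 1, len(values) - 1)
    return values[lo] + frac * (values[hi] - values[lo])

print(percentile([1, 2, 2, 3, 7, 8, 14], 0.50))   # → 3
print(percentile([1, 2, 2, 3, 7, 8, 14], 0.80))   # → 7.8
```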
Cumulative Flow Diagram
Description: Each day shows how many stories are in each state
Cumulative Flow Diagram
Ideal CFD: thin lines growing in parallel at a steady rate -> good flow!
Cumulative Flow Diagram
● Objective: retrospect (but needs a good facilitator)
CFD used for Retrospective
● Objective: demonstrate effectiveness of changes
changed WIP limit in DEV from 3 to 2
Cumulative Flow Diagram
● Objective: decide what you should work on today
● Objective: forecasting: rough info about lead time, wip, delivery date (although
they’re easier to use when tracked separately)
(taken from CFD article by Pawel Brodzinski)
growing lines: indicate large WIP + context switching. Action: use WIP limits.
stairs: indicate large batches and timeboxes, typical of timeboxed iterations. Action: move towards flow (lower WIP, more releases, cross-functional people).
flat lines: nothing’s moving on the board. Action: investigate blockers, focus on finishing, split into smaller stories.
single flat line: testing bottleneck. Action: investigate blockers, pair with testers.
dropping lines: items going back. Action: improve policies.
Description: For each story it shows how long it took. Displays Upper and Lower control
limits; when a story falls out of limits something went wrong and you should talk about it.
Cycle/Lead Time stats + History
Description: Stats to get to know your cycle time and lead time. They let you predict “how
long is the next story likely to take?”. Visualize trends of improvement
Lead Time distribution
Description: For each lead time bucket (#days), how many stories have taken that long.
Useful to show as a percentage to know probability.
Description: Indicates if the story is in good health or if we should worry about it. Based
on lead time distribution
0-4 days 5-7 days 8-10 days >10 days
Cycle Time vs Release Prep. Time
Description: For each story shows how long it spent in the iteration and in release
preparation (Context specific). Used to discuss cost vs value of release testing.
● Data driven coaching - Troy Magennis
● Seven Deadly Sins of Agile Measurement - Larry Maccherone
● The Impact of Lean and Agile Quantified - Larry Maccherone
● Kanban at Scale: A Siemens Success Story - Bennet Vallet
● FocusedObjectives@Github - Troy Magennis
● Visual feedback brings key Agile principles to life - Bazil
● How visualisation improves Psychological Safety - Bazil
● Cycle Time Analytics - Troy Magennis
● Top Ten Data and Forecasting Tips - Troy Magennis
● Forecasting Your Oranges - Dan Brown
● Using Maths to work out Potentially Deliverable Scope -
● Forecasting Cards - Alexei Zheglov
● Story Points and Velocity: The Good Bits - Pawel Brodzinski
● No correlation between estimated size and actual time taken - Ian Carroll
● Analyzing the Lead Time Distribution Chart - Alexei Zheglov
● Inside a Lead Time Distribution - Alexei Zheglov
● Lead Time: what we know about it, how we use it - Alexei Zheglov
● The Economic Impact of Software Development Process Choice - Troy Magennis
● Flow Efficiency - Julia Wester
● Cumulative Flow Diagram - Pawel Brodzinski
Feedback really, really appreciated! Help me improve.