2. About Erik
Work Stuff
• Healthcare, Finance, Green Energy
• Huge Conglomerates, Small Employee Owned, Fortune 500
• Agile Solutions Manager
• Scrum Coach & Trainer
• Passionate about Agile
Me Stuff
• Huge foodie and amateur cook
• Wearer of bowties
• Homebrewer and beverage imbiber
• Passionate about Agile (have multiple kanban boards up in my house)
11. We Need Tangibles
• As gauges or indicators
- For status, quality, doneness, cost, etc.
• As predictors
- What can we expect in the future?
• As decision making tools
- Can we release yet?
• A visual way to peer into a mostly non-visual world
- Because we don't completely understand what's going on in the software/project, and we need a window into it
12. History
• Tons of research, mostly from the '80s and '90s
• Based on industrial metrics
• Implementation of metrics in project management has grown
exponentially
• Hasn't really affected project success (what a metric!)
[Chart: Metrics Usage vs. Software Project Success Rate, 1980 to 2010]
Chaos Report from 1995 to 2010: project success rate goes from 16% to 30%
13. Traditional to Agile
• Long time horizon → Short sprints
• Intangible for months → Inspection every sprint
• Manual Risk Mitigation → Inherent Risk Mitigation
• May sacrifice quality by fixing schedule/scope → Builds Quality In
• Many metrics used and needed → Chief metric is working software
14. SCRUM BUILDS QUALITY IN
Definition of Done + Acceptance Criteria = Quality
Sprint Review + Stakeholder and Customer Feedback = Quality
15. The only metric that really matters is what I say about your product.
18. The Hawthorne Effect
• When you measure something, you influence it
• You can exploit this effect in a positive way
• Most traditional metrics have a negative Hawthorne effect
• Gaming = Hawthorne Effect * Deliberate Personal Gain
"Tell me how you will measure me and I will tell you how I will behave"
19. Hawthorne Effect 5 min
• Where have you seen this in software
development?
• Where have you experienced gaming?
– What have you gamed?
• What do you measure now that might have negative Hawthorne effects or easily be gamed?
20. The Hawthorne Effect
TRY
• Identify positive/negative Hawthorne effects on each metric that exists
• Measuring things you want more of
• No-questions-asked policy of reporting gaming (so you can simply stop wasting your time gathering that metric)
AVOID
• Using metrics with negative Hawthorne effects
• Easily gamed systems
• Measuring things you don't really want more of and that don't really have an effect on outcomes
22. Measure Up
• Austin Corollary: You get what you measure, and only what you measure; and you tend to lose the others you cannot measure: collaboration, creativity, happiness, dedication to customer service …
• Suggests “measuring up”
– Measure the team, not the individual
– Measure the business, not the team
• Helps keep focus on outcomes, not output
23. Measure Up 5 min
• What are some possible outcomes of the
following common metrics:
– Lines of code
– Defects/person
– Defects/week
– Velocity
24. Measure Up 5 min
• How about these?
– Accepted Features or Features/Month
– Revenue or Revenue/Feature
– Customer Retention or Churn Rate
– Net Promoter Score
– Happiness
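Of the metrics listed above, Net Promoter Score has a well-known standard calculation: the percentage of promoters (ratings 9-10 on a 0-10 scale) minus the percentage of detractors (ratings 0-6). A minimal sketch, with made-up survey data:

```python
def net_promoter_score(ratings):
    """NPS = % promoters (9-10) minus % detractors (0-6), on a 0-10 scale."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return round(100 * (promoters - detractors) / len(ratings))

# Hypothetical survey responses (invented for illustration)
scores = [10, 9, 9, 8, 7, 7, 6, 5, 10, 3]
print(net_promoter_score(scores))  # → 10 (4 promoters, 3 detractors, out of 10)
```

Note that NPS, like the other "measure up" metrics here, is a customer-side number: it tells you about outcomes, not output.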
25. Measure Up
TRY
• Customer reported defects
• Team Throughput
• Accepted Features
• Customer LTV
• Value
AVOID
• Defects during development
• Capacity/Efficiency
• Velocity (or worse: LoC)
• New Customers
• Cost
26.
27. The Measurement Paradox
"Not everything that can be counted counts, and not everything that counts can be counted"
– Albert Einstein
• Software development is a complex system
– Metrics used in isolation don't measure what you think they do
– Stakeholders are focused on the system
• Beware 'low hanging fruit'
– Value of Measurement = 1/Ease of Measuring
28. Easy to Measure. Too Isolated. 5 min
Number of Test Cases
600
500
400
300
200
100
0
December January February March
29. The Measurement Paradox
TRY
• Measuring up!
• Making measurements visible only at the appropriate level
• Measuring what really matters, and has a direct line-of-sight contribution to outcomes
AVOID
• "If we just had more data…"
• Management by metrics
• Sets of easy-to-gather metrics that purport to tell you something about the system/outcome
31. Guiding Principles
• We no longer view or use metrics as isolated gauges, predictors, or decision-making tools; rather, they indicate a need to investigate something and have a conversation, nothing more.
• We realize now that the system is more complex than could ever be modeled by a discrete set of measurements; we respect this.
• We understand there are some behavioral psychology concepts associated with measuring people and their work; we respect this.
32. No Single Prescription
• What really matters?
– Listen to the customer
– Trends over static numbers
• Will this help us be more agile?
• For each one, let's ask:
– What is this really measuring?
– Who is the metric for? Who should see it?
– What behaviors will this drive?
– What's the risk of negative Hawthorne effects or gaming?
– Are we measuring at the right level? Up?
33. Metrics for the Team
• These are primarily for the team
(can be communicated to management)
– Sprint Burndown
– Velocity
– Release Burndown
• From the management level, intense focus on or incentivizing of these metrics is not good
• Allow the team to use empirical data, and remain
transparent and honest
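The release burndown mentioned above can be treated as a range forecast driven by empirical velocity, rather than a promised date. A minimal sketch, assuming a points-based backlog (all numbers invented):

```python
import math

def forecast_sprints(remaining_points, past_velocities):
    """Forecast a sprints-to-done range from observed sprint velocities.

    Reporting worst/average/best keeps the cone of uncertainty visible
    instead of collapsing the forecast to a single promised date."""
    worst, best = min(past_velocities), max(past_velocities)
    avg = sum(past_velocities) / len(past_velocities)
    return {
        "pessimistic": math.ceil(remaining_points / worst),
        "expected": math.ceil(remaining_points / avg),
        "optimistic": math.ceil(remaining_points / best),
    }

# Hypothetical team: 120 points remaining, last five sprint velocities
print(forecast_sprints(120, [18, 22, 20, 25, 19]))
# → {'pessimistic': 7, 'expected': 6, 'optimistic': 5}
```

This stays a team tool: the value is in the conversation the range prompts, not in holding the team to any one of the three numbers.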
37. Metrics for Management
• These are for the team and for
management
– Working Software
– Throughput
– Happiness
• Higher level measurements (measure up!)
• Positive Hawthorne effects
39. Throughput
• Measures how much “stuff” is:
– Getting Done
– Adding Value
– The right “stuff”
• Need to view team AND business throughput
simultaneously
– Careful with correlation and causation
– Empirical way to gauge value/spend
• In place of direct capacity or productivity measures
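As a rough illustration of viewing team and business throughput simultaneously, the sketch below pairs hypothetical features-delivered counts with hypothetical quarterly revenue (all numbers invented; revenue/feature is an empirical gauge of value per spend, not proof of causation):

```python
# Hypothetical per-quarter data: features delivered (team throughput)
# and revenue in $k (business throughput), viewed side by side.
features = {"Q1": 8, "Q2": 11, "Q3": 9, "Q4": 14}
revenue_k = {"Q1": 410, "Q2": 450, "Q3": 445, "Q4": 520}

for q in features:
    per_feature = revenue_k[q] / features[q]  # crude value-per-spend gauge
    print(f"{q}: {features[q]} features, ${revenue_k[q]}k revenue, "
          f"${per_feature:.1f}k revenue/feature")
```

If features climb while revenue/feature falls quarter after quarter, that is a trigger for a conversation about whether the team is shipping "the right stuff", not a verdict by itself.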
40. Throughput 5 min
• What does this mean to you?
– How do you define "the right stuff"?
• How would you measure it?
– What does “value” mean in your context?
41. Throughput: Team
[Chart: Delivered Features or Value Points per quarter, 2010.Q1 through 2012.Q2]
42. Throughput: Business
• Revenue
– If we're delivering features all the time, how is that affecting revenue?
– Are our development efforts affecting revenue? Or is it something else?
• Revenue/Feature
– Revenue-data-driven decision making
• Split A/B testing
– Does variant A or B result in more revenue?
• Cohort Analysis
– How is revenue changing across cross-sections of prospects/customers?
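A split A/B revenue comparison like the one mentioned above can be sketched as follows (per-visitor revenues are invented; a real experiment would also need a statistical significance test before deciding anything):

```python
# Hypothetical split test: revenue per visitor for two page variants.
# Zeros are visitors who bought nothing.
variant_a = [0.0, 0.0, 12.0, 0.0, 30.0, 0.0, 12.0, 0.0]
variant_b = [0.0, 12.0, 0.0, 12.0, 0.0, 30.0, 12.0, 0.0]

def revenue_per_visitor(revenues):
    return sum(revenues) / len(revenues)

a = revenue_per_visitor(variant_a)
b = revenue_per_visitor(variant_b)
print(f"A: ${a:.2f}/visitor, B: ${b:.2f}/visitor")  # A: $6.75, B: $8.25
```

The point of the slide is that this ties a development decision directly to revenue, the highest-level business throughput measure available.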
48. Shifting Mindsets
What's the opposite of a fragile, defect-ridden, return-to-sender, crappy product?
1st Premise:
Zero Defects!
Meets requirements!
Better Premise:
High Value / High Revenue
High Customer Satisfaction
A quick-to-change Agile Product
50. ACSOI
Adjusted Consolidated Segment Operating Income
Income, without marketing costs or stock-based employee comp (basically)
ACSOI looks good – let's go public!
(and use ACSOI in the S-1 filing to the SEC for our upcoming $1 billion IPO)
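The ACSOI arithmetic is simple enough to show directly. Using the approximate Groupon Q1 2011 figures discussed in the notes (a $117M operating loss, with roughly $180M of online marketing and $18M of stock-based compensation added back):

```python
# Approximate Groupon Q1 2011 figures, in $ millions (rounded, for illustration).
operating_income = -117   # GAAP operating loss
online_marketing = 180    # marketing spend added back under ACSOI
stock_based_comp = 18     # stock-based employee comp added back under ACSOI

acsoi = operating_income + online_marketing + stock_based_comp
print(acsoi)  # → 81: a sizable loss becomes an ~$81M "profit" once costs are excluded
```

Which is exactly the gaming/Hawthorne lesson: pick the metric, and you pick the story it tells.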
51. (This is for Groupon employees, but I'm posting it publicly since it will leak anyway)
After four and a half intense and wonderful years as CEO of
Groupon, I've decided that I'd like to spend more time with my family.
Just kidding - I was fired today. If you're wondering why... you haven't
been paying attention. From controversial metrics in our S-1 [IPO]
to … the events of the last year and a half speak for themselves.
…
If there's one piece of wisdom that this simple pilgrim would like to
impart upon you: have the courage to start with the customer. My
biggest regrets are the moments that I let a lack of data override my
intuition on what's best for our customers. This leadership change
gives you some breathing room to break bad habits and deliver
sustainable customer happiness - don't waste the opportunity!
…
52. Final Thoughts
• Measure Up. Start with the Customer.
• Build it quick enough & often enough to make
measuring on the build side irrelevant. Focus
measurements on the Customer side.
There's no place like Prod.
• The only metric that really matters is what
your customers say about your product.
What are they saying about yours?
54. Resources
• Goldratt – The Goal http://amzn.to/NTEEQR
• Mike Griffiths – Leading Answers: "Smart Metrics" http://bit.ly/yfV643
• Elisabeth Hendrickson – Test Obsessed: "Question from the Mailbox: What Metrics Do You Use in Agile?" http://bit.ly/xtSDdg
• Ian Spence – Measurements for Agile Software Development Organizations: "Better Faster Cheaper Happier" http://bit.ly/y4UKIt
• N.E. Fenton – "Software Metrics: Successes, Failures & New Directions" http://bit.ly/ybwUzA
• Robert Austin – "Measuring and Managing Performance in Organizations" http://amzn.to/wTfgx3
• Mary Poppendieck – Lean Software Development: "Measure Up" http://bit.ly/zppVTC
• Jeff Sutherland – Scrum Log: "Happiness Metric – The Wave of the Future" http://bit.ly/xO8ETS
55. Professional Scrum Master
by Scrum.org – Improving the Profession of Software Development
www.centare.com/events
Extended Early Bird Pricing: sign up by Wednesday, March 13
Plus: 20% promo code "ALMCHICAGO"
Erik Weber
Mar 27-28 2013
v3.1
56. May 9th
drivecentare.eventbrite.com
fast. forward. thinking. 20% off Promo Code: ALMCHICAGO
Editor's Notes
Meet Bruno. Bruno is a portfolio manager at JP Morgan. He's a smart guy, highly educated, who put in decades of work before getting a senior position. He manages the Synthetic Credit Portfolio as a hedge – this was supposed to be a safe position, an insurance policy if you will, against risk in other, riskier portfolios. Well, these things get pretty complex, and in April 2012, in a matter of days, it became clear that something was very wrong. The result: a 6-9 billion dollar loss.
Matt Levine @ Dealbreaker: How should one read JPMorgan's Whale Report? One way to read it is as a depressing story about measurement. There were some people and whales, and there was a pot of stuff, and the people and whales sat around looking at the stuff and asking themselves, and each other, "what is up with that stuff?" The stuff was in some important ways unknowable: you could list what the stuff was, if you had a big enough piece of paper, but it was hard to get a handle on what it would do. But that was their job. And the way you normally get such a handle, at a bank, is with a number, or numbers, and so everyone grasped at a number. Everyone tried to understand the pool of stuff through one or two or three numbers, and everyone failed dismally through some combination of myopia and the fact that each of those numbers was sort of horrible or tampered or both, each in its own special way. When we're dealing with complex things (like a synthetic credit portfolio), it becomes harder and harder to manage them with metrics.
I’ve been there. I’ve come to believe that more metrics, more data, doesn’t necessarily mean more understanding.
When you're on a long project – 6 months, a year or longer – we need some way to gauge these things. Developing software is a complex system that is mostly intangible. So we use these measurements as a window into that world. What's going on here? When will we be done? What's our quality like? Etc. It's human nature to explain things we can't see.
What do you think about this metric? Actually it's a really bad one – there are correlation/causation errors going on, and overall "project success" is way too complicated a system to judge based on one metric. Chaos Report from 1995 to 2010: project success rate goes from 16% to 30%.
Agile takes all the worry and all that risk and packages it up into cute little time boxes. Agile inherently limits risk. Even if one of these boxes explodes, the project isn't a failure. And every few weeks we produce a valuable increment of product; we have the chance to inspect it and adapt our approach, reprioritize, replan, etc. Managers no longer need to be worried and have this anxiety over predicting project performance over months and months. We have real, tangible results every few weeks. We can inspect them and determine the ACTUAL characteristics of the product that we used to use metrics to try to get at. Agile projects inherently limit risk: time boxes, WIP limits, DoD, AC, fast feedback. (Lead-in) So that's nice, but how do you define quality on this increment and on the product as a whole?
Two ways. On any single increment we use the above mindset. These are not strict equations – I'm not doing any math here – it's just a way to think about quality in the agile world. DoD: a shared definition among the team of what "done" means. Typically you see things like coding standards, unit test coverage, tests pass, deployable, reviewed, etc. Every piece of work must adhere to the DoD. AC: the Product Owner's business-language criteria for how a specific piece of work must function. Sometimes written in the GIVEN-WHEN-THEN format, a practice associated with ATDD. So as we string increments of working software together, how do we get at the quality of the product? We use the mindset at the bottom for this. At the product level, it's no longer so much about defining quality in a quantitative sense as it is about having a development process that can easily react to change: react to negative customer feedback as well as to suggestions for new features and whatever is most important to the customer at the moment. Stakeholders that don't show up at the Sprint Review will still be nervous, and rightly so. The corollary is: every time a manager/stakeholder asks for a report, instead of giving it to them, stress the importance of showing up at the Sprint Review.
You have clear development principles that help limit risk (DoD, verification) and clear business objectives that help limit risk (Acceptance Criteria, validation). This ensures some base level of quality in your product, and then through frequent stakeholder and customer feedback, we ensure the ongoing quality and value of the product. Our chief metric in scrum is working software. That said, what other metrics do we need? Right?
Explain the Hawthorne Experiment at Western Electric. A select group of workers was told they were being studied, and their productivity changed. All the researchers did was minutely change the lighting levels. Also called demand characteristics: refers to an experimental artifact where participants form an interpretation of the experiment's purpose and unconsciously change their behavior to fit that interpretation.
For example, measuring test pass/fail status always causes pass percentage to rise. But it is an artificial rise, due to people not wanting to fail tests or splitting up tests into smaller and smaller units to drive the percentage calculation up (which is just creating waste).
If people can't seem to find good examples: Hawthorne: the JCI example of one sprint before/after we started measuring test pass/fail metrics. A test was originally failed, and then passed with a bug. Gaming: Schwaber's example of ClearChannel employees auto-generating the sprint burndown because managers were getting on their case about it.
Robert Austin, Measuring and Managing Performance in Organizations. Nucor Steel based plant managers' salaries on the productivity of ALL plants, not just theirs. The obvious example here is defect counts. W. Edwards Deming, the noted quality expert, insisted that most quality defects are not caused by individuals, but by management systems that make error-free performance all but impossible.
Easy to Measure. Zero Value.
"There are so many possible measures in a software process that a selection of metrics will not likely turn up something of value" – Watts Humphrey. Metrics used in isolation probably don't measure what you think they do. The system is more complex than this: we're probably never going to be able to measure enough to give us a simple indicator of the system. Isolated metrics entice people to draw system-wide conclusions. -> Primary/Secondary Metric. Beware low hanging fruit. Also, old literature praises low hanging fruit! -> Just because we can measure something easily doesn't actually mean it's meaningful.
Ask: Does everyone agree this is an easy-to-gather metric? What is this metric really telling us? Stakeholders: "How come we have fewer tests than a few sprints ago? That can't be right. We must not be testing enough." Stakeholders: "On my last project we had thousands of tests, why are there only a couple hundred? That can't be right, we must not be testing enough, I bet this thing is littered with bugs." This is an example of things that are easy to measure, and things measured in isolation. The system – the software development machine – is far too complex to be making broad quality statements based on such isolated measurements. But we're so used to doing that. So you can start to see that some traditional metrics might not really fit the bill. Let's go on.
In his 14 Points, Deming said “Eliminate management by numbers and numerical goals. Instead substitute with leadership.” The more we rely on metrics to tell us what happened, the more we distance ourselves from the actual work being done.
We realize that measuring a system as complex as the software development machine doesn't really provide understanding, just data. Sometimes bad data, sometimes good data. And we realize that the obvious answer isn't always right – like blaming bad developers for buggy products ("it must be the developers"). We respect that there is likely more going on in the system than any one root cause of anything. Further, if we use metrics the wrong way, we build games and systems that reward paying attention to the metric and not the success of the company. Overall, we believe that being agile is important to the goal – our goal being making really good software products that have high value and delight customers. So we will use metrics that help us be agile, that encourage us to embrace lean and XP and good development practices.
Trends over static numbers: tear the labels off the y axis. Is this setting up stakeholders to draw a system conclusion based on an isolated metric? Understand and respect the complex system. No single prescription – figure out what makes sense for you. Take these considerations into account. We'll go over a bunch of possible metrics next, but I'm not advocating a simple recipe for anyone. I'm certainly not saying you have to use all of these.
"you will see these, they are very useful for teams. They aren't really what you should be chiefly interested in, in fact the more you care about these the more they garner negative Hawthorne effects, possibly gaming. These are low level things that need to be driven from empirical data at the team level, so they can be honest and transparent with their work. This is a good thing. Too much focus here is too low level. So as managers and executives, here are some ways you can measure up..."
Indicates team progress. A way to visualize what's done, what's WIP, and what's left to do. A tool to use to see when we'll be DONE with a particular chunk of value. Don't like hours? Don't want a graph? Fine: use a task board, count tasks, stories-to-done, whatever. It's just a tool so that you as a team know how work is progressing, and can visualize and discuss that as a team. If it's not given to management, there is little risk of negative Hawthorne effect or gaming.
Forecasts what can get to DONE in a Sprint. Measures throughput, not capacity. Not individuals. No comparing across teams. Not really for management, and certainly not for incentives (risk of gaming).
Helps the business know when a larger chunk of functionality might be DONE. Not really part of scrum, but also something you usually can't get away without doing. At least this method of planning is based on empirical evidence: past sprints' velocity and what's actually on the backlog now. Also look at the cone of uncertainty there – we're not promising a date, we're just giving a forecast as accurately as we can while still being able to sleep at night. Increments are great, and this tells us when enough increments put together will satisfy some large business objective.
Our chief metric is working software. Did we get to the end of the sprint and have a potentially shippable product? How do you measure this? A simple thumbs up or thumbs down. Get everyone in a room and do it. Not good enough? Then document it. We keep a running go/no-go document. If you can gather everybody who had a hand in creating the increment and get them to give a thumbs up/thumbs down, this is more powerful than management by numbers. Humans can dissect the complexity of software development, and they will, in the right environment, process all the information from the past sprint and come to a conclusion on whether or not the increment is good to ship. Try not to focus on what didn't get done – keep the positive Hawthorne effect going by asking for and getting working software. Teams should be transparent about what doesn't get done, but keep the focus positive. Why not just do this in waterfall? Get everyone in a room after a year-long project and give it the thumbs up? Well, in some sense you do – we often ignore all those other metrics we've spent so long gathering. We rationalize sev 1's down to 2's, etc. In agile you can do this more safely because YOU HAVE CONTEXT. You have really good context and memory within a timebox. The risk is limited.
You will need to define what throughput means to you. We'll talk about revenue here. You may define value/throughput in terms of cost savings, compliance with regulations, etc.
Alternative: Average Cycle Time per Feature.
Measuring Revenue is obvious. It's the highest level we can go. But we still have correlation/causation problems. Without structuring specific experiments like variant A/B testing or cohort analysis, we can never really know if our development dollars are a wise investment. Perhaps the revenue growth is due to our recent ad campaign, or our awesome salesforce, or something else. Measuring Revenue/Feature in some way allows us to get at the specific ROI of developing specific features. Overall, measuring features delivered or value point velocity (from the previous slide) is dangerous if you don't quickly take it to the next level: cold hard cash.
Modern social science and Positive Psychology have shown that happiness is a prerequisite to success. http://happiily.com Encourages self-awareness. A leading indicator. Columns of the happiness index sheet: Name; How happy are you with Crisp? (scale 1-5); Last update of this row (timestamp); What feels best right now?; What feels worst right now?; What would increase your happiness index?; Other comments.
Isn't this what we want? If we start with the premise that what we want is a zero-defect product, we're naturally driven to measure build-side things, like defect counts and test statuses. But if we look at this a different way, and say the opposite is a high-revenue, highly satisfied customer base and a quickly changing and adapting product, we're driven to measure other, higher-level, customer-side things, and lower-level metrics seem less important.
The rationale behind the use of ACSOI is that marketing and subscriber acquisition expenses have value long into the future: they build a brand; therefore, they should be spread out over time. In the first quarter of 2011, Groupon reported a $117 million operating loss, but ACSOI was almost $82 million. That's because some $180 million of online marketing spending, plus more than $18 million of stock-based employee compensation, had been stripped out.