Estimation is dead - long live sizing, by John Coleman 24 Nov 22 to Agile Azerbaijan in person and Pozitive Technologies online
As per https://www.infoq.com/articles/sizing-forecasting-scrum/
1. Estimation is dead
- long live sizing
John Coleman
@JohnColemanIRL
https://linktr.ee/johncolemanxagility
https://www.infoq.com/articles/sizing-forecasting-scrum/
2. Estimate is no
longer in the
Scrum guide
Product Backlog
The Product Backlog is an emergent, ordered list of what is needed to
improve the product. It is the single source of work undertaken by the
Scrum Team.
Product Backlog items that can be Done by the Scrum Team within
one Sprint are deemed ready for selection in a Sprint Planning event.
They usually acquire this degree of transparency after refining
activities.
Product Backlog refinement is the act of breaking down and further
defining Product Backlog items into smaller more precise items. This
is an ongoing activity to add details, such as a description, order, and
size. Attributes often vary with the domain of work.
The Developers who will be doing the work are responsible for the
sizing. The Product Owner may influence the Developers by helping
them understand and select trade-offs.
3. Forecast in the Scrum guide
The Sprint
Various practices exist to forecast progress, like burn-downs, burn-ups, or cumulative flows. While proven useful, these do not
replace the importance of empiricism. In complex environments, what will happen is unknown. Only what has already happened
may be used for forward-looking decision making.
Sprint Planning
Selecting how much can be completed within a Sprint may be challenging. However, the more the Developers know about their past
performance, their upcoming capacity, and their Definition of Done, the more confident they will be in their Sprint forecasts.
5. Sizing caveats
People who do the work do the sizing, no one else!
Complex work is uncomparable - when dealing with complexity, know that these
techniques are almost always inaccurate
If we don’t “clean up the kitchen” as a habit, the accumulated mess will
make work take longer than before
The most popular sizing techniques are either based on data or educated
guesses
7. Flow metrics -
Kanban Guide for
Scrum Teams
Throughput: The number of product backlog items finished
per unit of time
Cycle Time: End-date minus Start-date +1
Work Item Age: The elapsed calendar time between when
a work item started and the current time; this applies only
to items still in progress
Work in Progress (WIP): The number of work items started
but not finished
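As a minimal sketch of the four definitions above, the snippet below computes them for a handful of made-up work items. The item data, the `today` value, and all function names are assumptions for illustration. Work Item Age is computed here as plain elapsed days; whether to add 1, as the slide does for Cycle Time, is a convention the slide does not state.

```python
from datetime import date

# Hypothetical items: (id, start date, finish date or None while in progress).
items = [
    ("A", date(2022, 11, 1), date(2022, 11, 3)),
    ("B", date(2022, 11, 2), date(2022, 11, 9)),
    ("C", date(2022, 11, 7), None),
    ("D", date(2022, 11, 8), date(2022, 11, 10)),
    ("E", date(2022, 11, 10), None),
]
today = date(2022, 11, 14)  # assumed "current time"


def cycle_time(start, end):
    """Cycle Time in days: end date minus start date, plus 1."""
    return (end - start).days + 1


def work_item_age(start, now):
    """Elapsed calendar days since start; only meaningful for unfinished items."""
    return (now - start).days


def throughput(work_items, period_start, period_end):
    """Number of items finished within the period (both dates inclusive)."""
    return sum(
        1 for _, _, end in work_items
        if end is not None and period_start <= end <= period_end
    )


def wip(work_items):
    """Number of items started but not finished."""
    return sum(1 for _, _, end in work_items if end is None)


cycle_times = {name: cycle_time(s, e) for name, s, e in items if e is not None}
ages = {name: work_item_age(s, today) for name, s, e in items if e is None}

print("Cycle times:", cycle_times)
print("Work item ages:", ages)
print("WIP:", wip(items))
print("Throughput, 7-13 Nov:", throughput(items, date(2022, 11, 7), date(2022, 11, 13)))
```

All four metrics need only a start date and a finish date per item, which most work-tracking tools already record.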
9. Relative
estimation
1 of 2
Time reference - Comparing current work
items to the time it took to complete historical
reference items
Assigning numeric values - Examples include
using story points based on the Fibonacci
sequence and often carried out collaboratively
with playing cards (planning poker)
T-shirt sizing - Assigning s, s/m, m, m/l, xl, xxl,
xxxl, xxxxl to Product Backlog Items instead of
numeric values
Wall estimation - Assigning numeric values by
collaboratively placing and moving cards on a
wall, also referred to as magic estimation or
silent estimation
10. Relative
estimation
2 of 2
It comes in different flavors
If you estimate, the best thing that can happen is the
estimates are correct
Estimates are prone to the "flaw of averages" (Sam
Savage). Is 50:50 an excellent way to set expectations?
The average of independent blind assessments can be
near enough to the truth (credit to Dave Snowden) -
how often are estimates blind in Scrum Teams, though?
If you don't estimate at all, you don't waste time;
hopefully, you will discover/deliver outcomes sooner
11. Rightsizing
How much time could you save caring more about whether
the team can complete an item within the Sprint and less
about making that item infinitely smaller?
Think of the reduced cognitive load on the Product Owner
resulting from fewer PBIs
Counting the number of (valuable right-sized) PBIs delivered
to Done per Sprint is valuable for Sprint Planning and
forecasting goals
If throughput is sporadic or irregular, we have more significant
problems than forecasting; we have a "plumbing problem"
Using average throughput also falls into the "flaw of averages";
Monte Carlo probabilistic forecasting is preferable
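To make the contrast between an average and a Monte Carlo forecast concrete, here is a rough sketch. The throughput history, the number of remaining items, and the function names are invented for illustration, and resampling past Sprints at random is only one simple way to run such a simulation.

```python
import math
import random

# Hypothetical history: right-sized PBIs delivered to Done in each past Sprint.
throughput_history = [3, 5, 2, 6, 4, 1, 5, 4]
remaining_items = 20  # assumed number of valuable PBIs left for the goal


def sprints_to_finish(history, remaining, rng):
    """One simulated future: resample past Sprints until the work is done.

    Assumes at least one Sprint in the history delivered something.
    """
    sprints = 0
    while remaining > 0:
        remaining -= rng.choice(history)
        sprints += 1
    return sprints


def monte_carlo(history, remaining, trials=10_000, seed=1):
    """Sorted list of simulated Sprints-to-finish outcomes."""
    rng = random.Random(seed)
    return sorted(sprints_to_finish(history, remaining, rng) for _ in range(trials))


def percentile(sorted_values, pct):
    """Smallest outcome that at least pct% of the simulated futures did not exceed."""
    index = max(0, math.ceil(len(sorted_values) * pct / 100) - 1)
    return sorted_values[index]


average_forecast = remaining_items / (sum(throughput_history) / len(throughput_history))
outcomes = monte_carlo(throughput_history, remaining_items)

print(f"Average-based answer: {average_forecast:.1f} Sprints (a single number)")
for pct in (50, 85, 95):
    print(f"{pct}% of simulated futures finish within {percentile(outcomes, pct)} Sprints")
```

The average gives a single figure that looks precise, while the simulation gives a spread of outcomes with a likelihood attached to each.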
12. One interpretation of
#NoEstimates
STRIVE FOR AN EVEN
DISTRIBUTION OF
"BALLPARK" ITEM SIZES
THROUGHOUT A
BACKLOG
COUNT RUNNING TESTED
STORIES OR RUNNING
TESTED FEATURES TO
DEMONSTRATE
PROGRESS IN OUTPUT
TERMS
FOCUS ON SIMPLIFYING
THE WHAT FOR THE WHY -
A FOCUS ON DESIRED
OUTCOMES
RIGHT-SIZING - IDENTIFY
SMALL ENOUGH ITEMS
FOR INTAKE
SLICING INTO 24-HOUR
TIMEBOXING OF ITEMS
ENCOURAGES THE
CREATION OF
EXPERIMENTS THAT
VALIDATE
ASSUMPTIONS/HYPOTHESES
TOWARDS A GOAL,
DISCOVER TO DELIVER
USE ROLLING-WAVE
FORECASTS TO
COMMUNICATE
UNCERTAINTY
13. One
interpretation
of
#NoEstimates
• Counting the number of (valuable right-
sized) PBIs delivered to Done per Sprint is
valuable for Sprint Planning and
forecasting goals
• “Rolling Wave Forecast” based on
throughput with variance limits is
preferable
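One possible reading of a rolling-wave forecast is sketched below: after every Sprint the forecast is re-run with the latest throughput and the latest remaining scope, and a range is reported instead of a date. Using the 50th and 85th percentiles as the "variance limits", the six-Sprint window, and all the numbers are assumptions, not something the slide prescribes.

```python
import math
import random


def forecast_range(history, remaining, low_pct=50, high_pct=85, trials=5_000, seed=7):
    """Return (low, high) Sprints-to-finish by resampling past throughput.

    Assumes at least one Sprint in the history delivered something.
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        left, sprints = remaining, 0
        while left > 0:
            left -= rng.choice(history)
            sprints += 1
        outcomes.append(sprints)
    outcomes.sort()

    def pick(pct):
        return outcomes[max(0, math.ceil(trials * pct / 100) - 1)]

    return pick(low_pct), pick(high_pct)


# Hypothetical starting point: three Sprints of history, 30 items to go.
history = [4, 3, 5]
remaining = 30
forecasts = []

# Each tuple is (items done this Sprint, items added to scope this Sprint).
for sprint, (done, added) in enumerate([(2, 1), (6, 0), (4, 3), (5, 0)], start=4):
    history.append(done)
    remaining = remaining - done + added
    low, high = forecast_range(history[-6:], remaining)  # last six Sprints only
    forecasts.append((sprint, remaining, low, high))
    print(f"After Sprint {sprint}: {remaining} items left, "
          f"likely {low} to {high} more Sprints")
```

The forecast is never treated as final; each new Sprint of data replaces the previous answer.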
14. Time reference
Potential downsides
Requires suitable reference items from the
past
Prone to abuse by people with a focus on
people utilization
Unsuitable for probabilistic forecasting
Potential upsides
Speaks in the customer’s language
Easy to pick reference items from the past
Waiting time is included in our memory of
how long it takes
Simple to do
15. Story points
Potential upsides
Useful to avoid bringing “elephants” into Work In
Progress
Could be used to limit work in progress
Easy to pick reference items from the past
Simple to do
Developers like the conversation it triggers
Often paired with t-shirt sizing or wall estimation
Could be combined with probabilistic forecasting,
but should it?
Potential downsides
Creator regrets story points
Only for the team
Story point inflation
BS story points
Often paired with planning poker (time consuming)
16. T-shirt sizes
Potential upsides
Useful to avoid bringing “elephants” into
Work In Progress
Could be used to limit work in progress
Easy to pick reference items from the past
Developers like the conversation it triggers
Simple to do
Requires very little detail
Potential downsides
Converted to numbers quite often, numbers
that get used to forecast when work might
be done
17. Wall / table estimation
Potential upsides
Useful to avoid bringing “elephants” into Work In
Progress
Could be used to limit work in progress
Easy to pick reference items from the past
Developers like the conversation it triggers
Simple to do
Requires very little detail
Potential value is typically guesstimated as well as effort,
priming ordering by value divided by size
Really quick
Potential downsides
Converted to numbers quite often, numbers that get
used to forecast when work might be done
Often one and done – should be revisited regularly
18. Guesstimating / counting the number/
range of items to deliver a goal
Potential upsides
Suitable for recurring probabilistic
forecasting or rolling-wave forecasting,
giving dates and uncertainty
Developers can “ballpark” the range
Useful for sizing a chunk of Product
Backlog, e.g., “elephant” sized items in the
Product Backlog
Can be used across teams
Potential downsides
People prefer relative sizing, and almost
“cannot let go”
Misunderstood that all items need to be of
equal size
For non-software work, differing product backlog
items make it like comparing apples with
oranges
Prone to the use of averages
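A rough sketch of forecasting from a guesstimated range of items follows. Each simulated future draws a scope somewhere between the minimum and maximum and then consumes it using sampled weekly throughput. The uniform draw between min and max, the weekly cadence, and all numbers and names are assumptions made for illustration.

```python
import math
import random


def forecast_with_item_range(min_items, max_items, weekly_throughput,
                             trials=10_000, seed=3):
    """Sorted weeks-to-finish outcomes when the scope itself is uncertain.

    Assumes at least one week in the history delivered something.
    """
    rng = random.Random(seed)
    weeks_needed = []
    for _ in range(trials):
        items = rng.randint(min_items, max_items)  # assumption: any count in range is equally likely
        weeks = 0
        while items > 0:
            items -= rng.choice(weekly_throughput)
            weeks += 1
        weeks_needed.append(weeks)
    return sorted(weeks_needed)


# Hypothetical inputs: Developers "ballpark" 15 to 25 items for the goal.
results = forecast_with_item_range(15, 25, weekly_throughput=[2, 3, 1, 4, 2])
for pct in (50, 85, 95):
    index = max(0, math.ceil(len(results) * pct / 100) - 1)
    print(f"{pct}% of simulated futures need {results[index]} weeks or fewer")
```

Because the scope is drawn per trial, the uncertainty in the guesstimate shows up in the width of the resulting range instead of being averaged away.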
19. Rightsizing
Potential upsides
Simple
Less “analysis paralysis”
Supports recurring probabilistic forecasting
Potential downsides
Items right-sized just in time or in product
backlog refinement
Misunderstood that all right sized items must
be of equal size
Disconnect in Kanban community about use
of item split rate to support probabilistic
forecasting
If most days a team has no throughput,
probabilistic forecasting will have low quality
20. #NoEstimates
Potential upsides
Split items as necessary, potentially into discovery
items
Small batch is the goal
Forecasting using data – “running tested stories”
Accepts uncertainty and imperfect information
Useful for recurring forecasts
Low time investment
Seeks a mixture of item sizes
Potential downsides
In the wrong hands, splitting items into nonvaluable
items
People would rather be wrong than uncertain
22. “John, that’s
about ten
minutes of work,
but things are so
crap around
here, make that
three days”
Estimated effort has little to do with how
long something takes
23. Variable quality with sizing an item
Factors for how long things take
The batch size – the level of effort actually needed
Waiting time
…
Sizing for the level of effort considers
Complexity of the work
Riskiness of the work
Whether we did something similar before
Perception of skill levels required to complete the
work and availability of those skills
Availability of tools and skills using those tools
If you’re good, dependencies
25. If your forecasts are routinely
correct, you're a freak of nature
Forecasting is rarely perfect due to the following:
•Waiting time due to dependencies is a huge factor in how long work takes and is affected by many unpredictable
events.
•Even in straightforward work environments, people overestimate how efficiently their day will go.
•Often, people doing complex work in the pursuit of speed leave work behind them that is untidy and potentially
embarrassing (accidental complication).
•Complex work involves many unknown variables.
•Lack of focus
•Changing priorities
27. Monte Carlo simulations
model a future based on
data and assumptions
Forecasting, at its essence, is
about risk management
It answers the question - How
much risk is contained in our
current plans?
Lower quality forecasts also
mean inadequate risk
management
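The risk question above can be sketched as "what share of simulated futures meets the plan?". The snippet below is an illustration under assumptions: the throughput history, the plan of 24 items, and the function name are made up, and resampling past Sprints is just one simple simulation model.

```python
import random


def chance_of_hitting_date(history, remaining, sprints_available,
                           trials=10_000, seed=11):
    """Fraction of simulated futures in which the remaining items get done in time."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        done = sum(rng.choice(history) for _ in range(sprints_available))
        if done >= remaining:
            hits += 1
    return hits / trials


# Hypothetical plan: 24 items promised, throughput per Sprint as observed so far.
history = [3, 5, 2, 6, 4, 1, 5, 4]
for sprints in (4, 6, 8, 10):
    chance = chance_of_hitting_date(history, remaining=24, sprints_available=sprints)
    print(f"{sprints} Sprints: {chance:.0%} chance of finishing, "
          f"{1 - chance:.0%} risk of missing")
```

Framed this way, the conversation shifts from "which date?" to "how much risk are we willing to carry for that date?".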
28. Estimation
• Service Level Expectation based on an educated guess, e.g., 85% of right sized items are done in 18 days or less
• Individual item sizing – useful if you only need to focus on one next unstarted item
• Guesstimate Probabilistic item forecast - 90% guesstimating a range of a number of valuable items to deliver a goal, based on guesstimate min/max range of valuable items
• Probabilistic guesstimate story point forecast - 90% guesstimating a range of a number of story points to deliver a goal, based on guesstimate min/max range
• Story point range - 90% guesstimating a range of a number of story points to deliver a goal and using probabilistic forecasting based on guesstimate min/max range
Options for managing expectations
29. Forecasting
• Service Level Expectation based on cycle time data, e.g., 85% of right sized items are done in 18 days or less
• Individual item age – useful if you only need to focus on the next started but unfinished item
• Data Probabilistic item forecast - 90% guesstimating a range of a number of valuable items to deliver a goal, based on throughput data of valuable items
• Rolling wave forecast - Throughput data range - 90% guesstimating a range of a number of items to deliver a goal and using throughput data
• Throughput data average - Best guess of number of valuable items divided by average throughput data (number of items done) per day/week/sprint/month…
• Probabilistic story point forecast - 90% guesstimating a range of a number of story points to deliver a goal and using probabilistic forecasting based on story points data
• Story point data average - Best guess of number of story points divided by average number of points really done per day/week/sprint/month…
• Counting subtasks - Best guess of number of non-valuable items divided by average throughput (number of non-valuable items done) per day/week/sprint/month…
Options for managing expectations
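A Service Level Expectation based on cycle time data can be read off as a percentile. The sketch below uses invented cycle times chosen so that the result matches the "85% in 18 days or less" example; the data, the ages of the in-progress items, and the function name are all assumptions.

```python
import math

# Hypothetical cycle times (in days) of finished, right-sized items.
cycle_times = [2, 3, 3, 4, 5, 5, 6, 7, 8, 8, 9, 11, 12, 14, 15, 16, 18, 18, 21, 30]


def service_level_expectation(times, pct=85):
    """Smallest number of days within which at least pct% of items finished."""
    ordered = sorted(times)
    rank = math.ceil(len(ordered) * pct / 100)
    return ordered[rank - 1]


sle_days = service_level_expectation(cycle_times, 85)
print(f"SLE: 85% of right-sized items are done in {sle_days} days or less")

# Individual item age: compare in-progress items against the SLE (ages are made up).
in_progress_ages = {"X": 5, "Y": 19}
at_risk = [name for name, age in in_progress_ages.items() if age > sle_days]
print("Items already older than the SLE:", at_risk)
```

Comparing the age of in-progress items with the SLE gives an early signal, before an item is actually late.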
31. Better options
Manage expectations about uncertainty not
dates
• We're using an empirical approach
operating one Sprint at a time
• The Sprint Goal is not even a guarantee
• “The real answer is we don't know, but let's
start and learn quickly”
• You might not even use Now?, Next ??,
Later ???
Being agile - don't manage expectations at all,
let people go see
• Discover and deliver capabilities
• Review outcomes with the customers and
end-users
• Learn what can be learned
• Act on what we have discovered
32. Key takeaways
• Avoid story points, counting non-valuable product backlog items, counting unDone work as Done, use of averages
• Consider historical reference items but beware of accidental complication
• Try probabilistic forecasting based on counting valuable product backlog items to Done
• Try #NoEstimates and “rolling wave forecasts” of valuable product backlog items to Done
• For complex work, promote managing expectations about uncertainty over managing expectations about dates
36. About me
agility chef, executive agility
guide, product manager
#2 Agile Thinkers 360, Top
50 Agile Leaders
Leadershum
Flight Levels Coach,
ProKanban Professional
Kanban Trainer, Scrum.org
Professional Scrum Trainer,
LeSS Friendly Scrum
Trainer
author of Kanplexity™,
underpinned by Cynefin®
creator of Xagility™
co-author of Kanban Guide
Host of Xagility™ & Agility
Island podcasts
Organizer for Meetup LeSS
Baku Meetup group which
was active during covid - an
official scrum.org
community and an official
LeSS meetup
37. Ideal time
Potential upsides
Time is what the customer wants
Simple to do
Potential downsides
When was your last ideal day?
Does not include waiting time, the 90+%
contributor of how long work takes
Doesn’t help infer when the work might be
done
Supports a people utilization mindset
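A small worked example of why ideal time says little about when work might be done: if waiting makes up a given share of the total elapsed time, the elapsed time is the ideal time divided by the non-waiting share. Taking the 90% figure literally and using two ideal days are assumptions for illustration; the function name is hypothetical.

```python
def elapsed_days(ideal_days, waiting_share):
    """Elapsed time when waiting_share of the total elapsed time is spent waiting."""
    if not 0 <= waiting_share < 1:
        raise ValueError("waiting_share must be at least 0 and below 1")
    return ideal_days / (1 - waiting_share)


for share in (0.0, 0.5, 0.9):
    print(f"waiting {share:.0%}: 2 ideal days -> "
          f"{elapsed_days(2, share):.1f} elapsed days")
```

With 90% waiting, an item sized at two ideal days takes about twenty calendar days, so the ideal-time figure explains only a tenth of what the customer experiences.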
38. Cost estimation
Potential upsides
Time is what the customer wants
Simple to do
Estimating the number of sprints could be
useful for commercial bids for example
Potential downsides
Less useful for actionable Product Backlog
Items that would go into a sprint
39. Three point (min, mid, max)
Potential upsides
Reveals some of the uncertainty
Room for optimists and pessimists
Does not use averages
Can be used for number of items, number of
story points, ideal time, reference items
Waiting time is included in our memory of how
long it takes
Simple to do
Can be converted to story points
Potential downsides
Average performance often used against
min/max sizes for forecasting afterwards
Only for the team
Prone to inflation
Can be converted to story points :)
42. Get stronger flow...
without adding more people
Better to have slack than overwhelm, so
people have time to help each other
Split items into smaller but still valuable
items when needed
Show empathy within the workflow, but
also upstream and downstream
Look after aging
• unblock, focus, finish / cancel
• do ensemble work
Don’t forget to feed the system
Lower aging =>
Lower cycle times =>
after a time-lag ...
More stable throughput… then
higher throughput
Prioritize within throughput, adjust
for noise
43. Sizing is
devalued by
•Not having caveats associated with the start date,
e.g., nine weeks from the date we start
•Not recognizing the amount of work in progress and
the progress (or not) of that work
•The severity of impediments
•Not ordering items higher up the Product Backlog
according to delivery risk
•A sub-optimal approach to handling dependencies
•Confusing outputs with outcomes; a customer/end-
user outcome is a change in customer/end-user
behavior
•Not engaging in discovery activities when the risk of
not harvesting potential value is high, compounded by
assuming that every item moves from discovery to
delivery
•Delusions of accuracy and pursuing more accuracy
44. Other sizing
sub-optimal
trends
•Size per skill - typically caused by focus on resource
efficiency over flow efficiency
•Size inflation. In extreme cases, I refer to this as
bingo
•Not taking quality seriously- typically caused by
pressure for more "velocity"
•Not taking the customer seriously
•Size normalization across teams
•Counting complete but fake product backlog items,
items that don't deliver value, as throughput
•Not focusing, not finishing
•Delusions of predictability for work that is
uncomparable with work from the past
•Lack of discovery to find the items we maybe should
not build; if we run low-cost experiments, we might fall
upon better ideas
45. Community
opinions on
Monte Carlo
simulations
Communities are not aligned on this approach.
One project is only executed once
While probabilities may help inform decisions,
the problem is that they don't make the
decision any easier
Estimation is often used as a proxy for a
decision (should we do this project or not?)
The reasons for using estimates differ from
probabilistic forecasting.
I have seen many probabilistic forecasts based
on guesstimates and a lack of history, yet they
were not far off in the end
46. Often there is
another question
behind the
question “when
will it be done”,
such as:
•How can I transfer worry to someone else?
•What progress is being made?
•What risks remain?
•When will we get some return on this investment?
•What trade-offs can we tolerate regarding which work
can discover/deliver the potential value, e.g., the 80:20
rule?
•What trade-offs can we tolerate in terms of reducing
some or all of effectiveness, efficiency, and
predictability, e.g., running some experiments?
•What progress trade-offs can we tolerate in terms of
required "dead work" to avoid execution bias, such as
laboratory setup?
•How much investment will go into acquiring skills,
e.g., education or apprenticeship?
47. Waiting time
Reduced by
Working together
Leaving slack so people help each other
A better visualization of how the work flows
Active management of work in progress
Flow review & improvement rigor
Starting when we have
• capacity to start
• alignment with our dependency partners
• alignment upstream and downstream
Increased by
High utilization of people
Pushing work into the system before capacity
allows for anyone who does work on the item,
including final review
Lower quality of in progress queue management,
e.g., re-prioritizing in-progress items based on
potential value
Lower quality of dependency management /
elimination
Management of the level of constrained-
resource or shared-resource queues
48. A meta-question
of "what does
winning the game
mean?" is well
worth considering.
Is the team being given a game it can win?
And if the team can win, what are the odds?
Probabilistic forecasts can help, e.g., Monte
Carlo simulations
Despite the hazards, people fear that
stakeholders will make up arbitrarily fixed
undoable dates in a vacuum
Sometimes teams want to attain a ballpark
date range to get ahead of stakeholder
expectations
Interestingly, most of us can accept a weather
forecast that gets updated regularly based on
the latest information