Story points considered harmful - or why the future of estimation is really in our past

Story Points Considered Harmful
Or why the future of estimation is really in our past...
#ard2012, Riga All pictures available on

Vasco Duarte
@duarte_vasco
http://bit.ly/vasco_blog
http://bit.ly/vasco_slideshare
Talk Developed with @josephpelrine

Tell me again:
why did we
move here?

The Flat Earth Society
(also known as the
International Flat Earth
Society or the
International Flat Earth
Research Society) is an
organization that seeks
to further the belief that
the Earth is flat instead
of an oblate spheroid
More at: http://theflatearthsociety.org

Expert estimation
Consensus estimation
Function Point Analysis
COCOMO

Precognition [pree-kog-nish-uhn]
pre·cog·ni·tion
[pree-kog-nish-uhn]
1. knowledge of a future event or situation,
especially through extrasensory means.

(Hindsight is always twenty-twenty)
-Anonymous (the other one!)
Life Can only be understood backwards, but
it must be lived forwards…
- Soren Kierkegaard

To be or not to be complex! That is the question!

• Q1: Is there sufficient difference between
what Story Points and ’number of items’
measure to say that they don’t measure the
same thing?
• Q2: Which one of the two metrics is more
stable? And what does that mean?
• Q3: Are both metrics close enough so that
measuring one (# of items) is equivalent to
measuring the other (Story Points)?

Data summary
• Nine (9) data sets (a few more by now...)
• I was not a stakeholder or had any role in any
of these projects
• Data came from different companies and
different sized teams

The Data
spr22 spr21 spr20 spr19 spr18 spr17 spr16 spr15
0
5
10
15
20
25
30
35
40
45
Story pts done
items done
Correlation: 0,755
Team A / Company N
spr21 spr20 spr19 spr18 spr17
0
5
10
15
20
25
30
35
40
45
Sp Normalized
Items done normalized
Correlation
(w/out normalization):
0,92
Team CB / Company N
spr F spr E spr D spr C spr B spr A
0.00
10.00
20.00
30.00
40.00
50.00
60.00
70.00
80.00
Sp Normalized
Items done normalized
Team HC / Company N
Correlation (w/out)
normalization: 0,83
spr14 spr13 spr12 spr11 spr10 spr9 spr8 spr7
0
10
20
30
40
50
60
Story pts done
items done
Team CF / Company N
Correlation: 0,51
(0,71 without the
spr14)

The Data
sprint
40
sprint
41
sprint
39
sprint
38
sprint
37
sprint
36
sprint
35
sprint
34
0.00
2.00
4.00
6.00
8.00
10.00
12.00
14.00
Sp Normalized
Items done
normalized
Team HCM / Company N
Correlation
(w/out normalization):
0,88
0.00
20.00
40.00
60.00
80.00
100.00
120.00
140.00
160.00
180.00
1 3 5 7 9 11 13 15 17 19 21
SP normalized
Items Normalized
Team AT / Company AT
Correlation: 0,75
0
50
100
150
200
250
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Story point velocity
Item velocity
Correlation:
0,70
Team 2 / Company RF
0
20
40
60
80
100
120
140
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Story point velocity
Item velocity
Correlation: 0,75
Team 1 / Company RF

The Data
• What does this mean:
– Q1: With so high
correlation it is likely that
both metrics represent a
signal of the same
underlying information.
– Q2: The normalized data
has similar value of
Standard Variation (equaly
stable). No significant
difference in stability
– Q3: They seem to measure
the same thing so...
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
0
10
20
30
40
50
60
70
80
# of stories
Sum of SP
Team AS /
Company L
Correlation
(w/out
normalization):
0,92

We should analyse the claims
that justify Story Points...

• Claim 1: allows us to change our mind whenever
we have new information about a story
• Claim 2: works for both epics and smaller stories
• Claim 3: doesn’t take a lot of time
• Claim 4: provides useful information about our
progress and the work remaining
• Claim 5: is tolerant of imprecision in the
estimates
• Claim 6: can be used to plan releases
• Source: Mike Cohn, User Stories Applied, page 87
More details at: http://bit.ly/ard2012_estimation

Claim 1: allows us to change our mind whenever
we have new information about a story
• No explanation about what this means in the
User Stories Applied book
• Measuring completed number of items allows
for immediate visibility of the impact of the
new items in the progress (project burndown)

Claim 2: works for both epics and
smaller stories
• Allowing for large estimates for items in the
backlog does help to account for the impact of
very large items by adding uncertainty.
• The same uncertainty exists in any way we may
use to measure progress. The fact is that we
don’t really know if an Epic (say 100 SPs) is really
equivalent to a similar size aggregate of User
Stories (say 100 times 1 SP story). Conclusion:
there is no significant added information by
classifying a story in a 100 SP category.

Claim 3: doesn’t take a lot of time
• Not my experience. Although some progress
has been done by people like Ken Power (at
Cisco) with the Silent Grouping technique, the
fact that we need such technique should
dispute any idea that estimating in SP’s
”doesn’t take a lot of time”
Silent Grouping technique: http://slidesha.re/AgileKonstanz_silentgrouping

Claim 4: provides useful information about our
progress and the work remaining
• This claim holds if, and only if you have estimated
all of your stories. Even the stories that will only
be developed a few months or even a year later
(for long project). This approach is not very
efficient (Claim 3).
• Basing your progress assessment on the Number
of Items completed in each Sprint is faster to
calculate (# of items in the PBL / velocity per
Sprint = number of Sprints left) and can be used
to provide critical information about project
progress. Example:

The example you are about to see is a real life
example. One where the data collected made a
big impact on an important business decision.
The names have been changed to protect the
innocent...

Sprint x
Project feature burndown
0
20
40
60
80
100
120
140
Sprints
NumberofFeaturesopen
n. items done in this sprint (Actual Velo
Projected Items still open at the end of
Actual items open at the end of the spr
(including results of removing items)
Pilot feature burndown
Pilot projection
Ideal Pilot burndown
Ideal Release Feature burndown
Evolution
of velocity
Start of
pilot/beta
Release
date
Start of
pilot/beta
Actual
progress
trend
What
progress
trend
should be
What
progress
trend
should be

Sprint x + 1
0
20
40
60
80
100
120
140
160
180
x
x+1
x+2
x+3
x+4
x+5
x+6
x+7
x+8
x+9
x+10
x+11
x+12
Sprints
n. items done in this sprint (A
Velocity)
Projected Items still open at
the sprint
Actual items open at the end
(including results of removing
Pilot projection
Ideal Release Feature burnd

The Velocity Bet
Their history stated the following velocity evolution in the last 3
sprints: 1 8 8
They were learning the product and area in the first few sprints,
which allowed for a ”getting-up-to-speed” assumption. Additionally
they had committed to 15 items in the Sprint planning meeting.
The product Owner stated that the R&D team would start doing 15
items per sprint (which would help them meet the goal of releasing
the pilot and the release on time.)
What was the result after the sprint?

Sprint x + 2
0
20
40
60
80
100
120
140
160
180
February
M
arch
A
pril
M
ay
June
JulyA
ugust
S
eptem
berO
ctober
N
ovem
ber
D
ecem
berJanuaryFebruary
M
arch
Sprints
n. items done in this spri
Velocity)
Projected Items still ope
the sprint
Actual items open at the
sprint (including results o
items)
Pilot projection
Ideal Release Feature b
They did 10 items. A 20%
increase in velocity.

We release Stories/Backlog
items, not story points...

The Number of Items technique in a
nutshell
• When doing Backlog Grooming or Sprint
Planning just ask: can this Story be completed
in a Sprint by one person? If not, break the
story down!
• For large projects use a further level of
abstraction: Stories fit into Sprints, therefore
Epics fit into meta-Sprints (for example: meta-
Sprint = 4 Sprints)

Why it works
• By continuously harmonizing the size fo the
Stories/Epics you are creating a distribution of
the sizes around the median:

• Assuming a normal distribution of the size of
the stories means that you can assume that
for the purposes of looking at the long term
estimation/progress of the project, you can
assume that all stories are the same size, and
can therefore measure progress by
measuring the number of items completed
per Sprint.

Q4: Which ”metric” is more
accurate when compared to
what actually happened in the
project?

Which metric predicted most
accurately the output of the
whole project?
a) After only the first 3 Sprints
b) After only the first 5 Sprints

Disclaimer...
This is only one project! We
need more data to verify or
contradict these findings...

After just 3 sprints
# of Stories predictive powerStory Points predictive power
The true
oputput: 349,5
SPs completed
The predicted
oputput: 418
SPs completed
+20%
The true
oputput: 228
Stories
The predicted
oputput: 220
Stories
-4%!

After just 5 sprints
# of Stories predictive powerStory Points predictive power
The true
oputput: 349,5
SPs completed
The predicted
oputput: 396
SPs completed
+13%
The true
oputput: 228
Stories
The predicted
oputput: 220
Stories
-4%!

Click here!
Sign-up and get the paper today!
Sign-up and receive this paper which
explains why we need #NoEstimates
and how to get started!
Includes:
• Why estimates should not be used,
and how they fail
• An example of how #NoEstimates
can reach a 4% accuracy to actuals
• How to apply #NoEstimates:
Vasco’s recipe!

Story points considered harmful - or why the future of estimation is really in our past

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (8)

En vedette

En vedette (8)

Similaire à Story points considered harmful - or why the future of estimation is really in our past

Similaire à Story points considered harmful - or why the future of estimation is really in our past (20)

Plus de Vasco Duarte

Plus de Vasco Duarte (19)

Dernier

Dernier (20)

Story points considered harmful - or why the future of estimation is really in our past

Notes de l'éditeur