Kelly Waters is Web Technology Director for IPC Media, one of the UK's largest publishers of consumer magazines and web sites. Kelly has been in software development for about 25 years and is a well-known narrator of agile development principles and practices, as a result of his popular blog 'Agile Software Development Made Easy!' (www.agile-software-development.com). Kelly is a DZone MVB and is not an employee of DZone.
I've written quite a bit about various aspects of estimating in agile software development. I think it's about time I joined up the dots...
The Product Backlog is a feature list. Or a list of User Stories if that's your approach. Either way, it is a simple list of things that are of value to a user - not technical tasks - and they are written in business language, so they can be prioritised by the Product Owner.
There are no details about each feature until it is ready to be
developed, just a basic description and maybe a few notes if applicable.
'POINTS MAKE SIZES'
Each item on the Product Backlog is given a points value to represent its size. Size is an intuitive mixture of effort and complexity. It's meant to represent 'how big it is'.
I like to use the Fibonacci number sequence for the points values. Fibonacci goes 1, 2, 3, 5, 8, 13 - where each number is the sum of the previous two. This builds a natural distribution curve into
the estimates. The bigger something's size, the less precise the
estimate can be, which is reflected in the widening range between the
numbers as they get bigger.
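To make the widening gaps concrete, here is a minimal sketch in Python. The function name `fibonacci_scale` is just for illustration, not part of any tool; it builds the points scale described above and prints the gap between each pair of neighbouring values.

```python
def fibonacci_scale(max_points):
    """Return the points scale 1, 2, 3, 5, 8, ... up to max_points."""
    scale = [1, 2]
    while scale[-1] + scale[-2] <= max_points:
        scale.append(scale[-1] + scale[-2])
    return [p for p in scale if p <= max_points]


scale = fibonacci_scale(13)
# Gap between each value and the next one up the scale
gaps = [b - a for a, b in zip(scale, scale[1:])]

print(scale)  # [1, 2, 3, 5, 8, 13]
print(gaps)   # [1, 1, 2, 3, 5]
```

The gaps grow with the values, so a big item can only be placed in a coarse bucket, while small items can be told apart more finely.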
Points are an abstract number. They do not convert to a unit of time. They are simply a *relative*
indication of size. In other words, a 2 is about twice the size of a 1.
A 5 is bigger than a 3, but smaller than an 8. Developers find it hard
to estimate accurately in hours or days when they don't yet know the
details of the requirements and what the solution involves. But it's
easier to compare the size of two features relative to each other.
ESTIMATE AS A TEAM
The points should be assigned to each backlog item as a team. The collective intelligence
- or wisdom of crowds - is an important way to apply multiple people's
experience to the estimate. If you have a very big team, you can split
up so it's quicker to do this, but the estimating groups should ideally
involve at least 3 people, so you don't just get two opposing opinions.
Planning Poker is a fun technique to facilitate rapid estimating
as a team. The team discusses a feature verbally to understand more
about what it entails and how it might be done. Each team member writes
what they think its size is (in points) on a card. All team members reveal their card at the same time.
Differences in opinion are used to provoke further discussion. Maybe
one person saw risks and complexity that others didn't. Maybe another
person saw a simpler solution. The team re-votes until there is a
consensus, then moves on to the next item.
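One round of Planning Poker can be sketched in a few lines of Python. This is only an illustration under some assumptions of mine: consensus is treated as everyone showing the same card, and the people holding the lowest and highest cards are the ones asked to explain their thinking before the re-vote. The function name and the team members are made up.

```python
def poker_round(votes):
    """Summarise one round of simultaneously revealed votes.

    votes maps a team member's name to the points value on their card.
    """
    values = set(votes.values())
    if len(values) == 1:
        # Assumption: consensus means every card shows the same value
        return {"consensus": True, "points": values.pop(), "discuss": []}
    low, high = min(values), max(values)
    # The lowest and highest voters explain their reasoning before a re-vote
    discuss = sorted(name for name, v in votes.items() if v in (low, high))
    return {"consensus": False, "points": None, "discuss": discuss}


first = poker_round({"Ann": 3, "Raj": 8, "Lee": 5})
second = poker_round({"Ann": 5, "Raj": 5, "Lee": 5})

print(first)   # no consensus; Ann and Raj talk through their estimates
print(second)  # consensus on 5 points
```

In practice a team may accept "close enough" rather than identical cards; the strict rule here just keeps the sketch short.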
DONE MEANS DONE
During the Sprint, or iteration, the team only counts something as Done when it is completely done,
i.e. tested and signed off by the Product Owner. At that time, and only
at that time, the team scores the points for the item.
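The all-or-nothing scoring rule is easy to express in code. The sketch below is illustrative only: the `BacklogItem` class, its fields and the sample items are my own invention, and "done" is modelled simply as tested plus signed off.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    name: str
    points: int
    tested: bool = False
    signed_off: bool = False  # signed off by the Product Owner

    @property
    def done(self):
        # Done means done: both conditions must hold
        return self.tested and self.signed_off


def points_scored(items):
    """Sum the points of items that are completely done; no partial credit."""
    return sum(item.points for item in items if item.done)


sprint = [
    BacklogItem("Search", 5, tested=True, signed_off=True),
    BacklogItem("Login", 8, tested=True),  # awaiting sign-off, scores nothing yet
    BacklogItem("Export", 3),
]

print(points_scored(sprint))  # 5
```

Note that the 8-point item contributes nothing until it is signed off, even though most of the work is finished.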
The team shows its commitment and daily progress on a graph, so it is measurable and visible at a glance. This is called a Burndown Chart.
The burndown shows the total number of points committed to,
decreasing steadily to zero at the end of the Sprint. This is the target
line. It also shows the actual line: the points still outstanding each day - i.e.
the points committed minus the sum of points for all items that are 100% done and signed off so
far. The team plots this each day before their daily stand-up meeting.
When the actual line is above the target line, the team is behind. When
it's below, they're ahead.
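Here is a rough sketch of the two burndown lines in Python. It assumes a straight target line that reaches zero on the last day, and an actual line that plots the points still remaining (committed minus scored). The function names, the 10-day Sprint and the daily scores are invented for the example.

```python
def target_line(committed, sprint_days):
    """Points that should remain at the end of each day, from day 0 to the last day."""
    return [committed - committed * day / sprint_days
            for day in range(sprint_days + 1)]


def actual_line(committed, scored_per_day):
    """Points actually remaining after each day's scoring, starting at day 0."""
    remaining = [committed]
    for scored in scored_per_day:
        remaining.append(remaining[-1] - scored)
    return remaining


committed = 40
target = target_line(committed, sprint_days=10)
actual = actual_line(committed, scored_per_day=[0, 0, 5, 3, 8])

day = len(actual) - 1  # the most recent day plotted
status = "behind" if actual[day] > target[day] else "on track or ahead"

print(actual)  # [40, 40, 40, 35, 32, 24]
print(f"day {day}: {actual[day]} remaining vs target {target[day]:.0f} -> {status}")
```

The flat start is typical: nothing scores until the first items are fully done and signed off, so the actual line often sits above the target early in a Sprint.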
At the end of the Sprint, the team's score is called their Velocity.
The team tracks its Velocity over time. This allows the team to see if
it's improving. Of course at some point it will stabilise, if the team
is stable. If not, this is an issue in itself. When Velocity is
relatively stable - in my experience that will be after 3 or 4 Sprints
- it can be reliably used to decide how much (i.e. how many points) the
team should commit to in the next Sprint.
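One simple way to turn a Velocity history into the next commitment is to average the last few Sprints. That averaging rule is an assumption of this sketch, not something prescribed above, and the function name and numbers are made up.

```python
def planned_commitment(velocities, window=3):
    """Average Velocity over the most recent `window` Sprints, rounded down.

    Assumes at least one completed Sprint in `velocities`.
    """
    recent = velocities[-window:]
    return sum(recent) // len(recent)


# Points scored in each Sprint so far; the early Sprints are still settling
history = [21, 34, 30, 32, 31]

print(planned_commitment(history))  # 31
```

Using only the most recent Sprints keeps the unstable early numbers from dragging the plan up or down.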
RELIABILITY / PREDICTABILITY
As a result, the team can measure how reliable - or how predictable
- they are. The metric for this is Velocity (points scored) as a
percentage of points planned. As Velocity stabilises, the team's Reliability will
get better, and the team will be better at predicting what they can
deliver. Ironically, the team doesn't need to get better at estimating
to get better at delivering on their commitments. Even if they are
terrible at estimating, as long as they are consistently terrible, with this method they will still get better at predicting what they can deliver.
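The Reliability metric is a one-line calculation. The sketch below uses invented numbers for a team that badly overcommits in its first Sprint, then plans each later Sprint from the Velocity it actually achieved; the function name is my own.

```python
def reliability(scored, planned):
    """Velocity (points scored) as a percentage of points planned."""
    return 100 * scored / planned


# (planned, scored) per Sprint; each plan is based on the previous Velocity
sprints = [(100, 50), (50, 48), (48, 50), (50, 49)]

for planned, scored in sprints:
    print(f"planned {planned}, scored {scored}: "
          f"{reliability(scored, planned):.0f}% reliable")
```

The estimates themselves never change in this example. Reliability climbs from 50% to the high nineties purely because the commitment is now based on the track record.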
POINTS VERSUS TIME
One of the benefits of points is that they do not relate to time.
Resist the temptation to convert them. If a team plans on 100 points and
delivers 50, can you imagine telling your stakeholders that you are
only planning future Sprints for half the team's time? If a team
commits to 100 points and delivers 150, imagine telling the team you're
planning on them doing 60 hours each per week. It just doesn't work. Points are not a measure of time.
They are abstract, relative sizes, and a measure of how much can be
delivered. That's why it works. It works because the team can adjust
its commitment based on what its track record shows it can usually deliver.
Velocity is not an absolute measure of a team's productivity.
It does tell you if a team is getting more or less productive over time.
But you can't really use Velocity to compare the productivity of two
teams, as their circumstances are different.
And you can't use it to determine whether a team's Velocity is as high
as it should be. For this, you still need to use your judgement, based
on previous experience and taking into account many subjective factors.
PLAYING THE SYSTEM
Using these two metrics - Velocity and Reliability -
it's hard to cheat the system. If a team commits low, they achieve
Reliability but Velocity goes down. If a team commits too high, their
Velocity goes up but their Reliability goes down. This is like the balanced scorecard concept. The metrics are deliberately measuring opposing things, so they can't easily be played.
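To see how the two metrics pull against each other, here is a small illustrative scorecard in Python. Everything in it is hypothetical: the function, the team's usual Velocity of 50, and the two Sprints. One Sprint is committed deliberately low, the other too high.

```python
def scorecard(planned, scored, usual_velocity):
    """Report both metrics for one Sprint and flag which one has slipped."""
    reliability = 100 * scored / planned
    return {
        "velocity": scored,
        "reliability_pct": round(reliability),
        "velocity_dropped": scored < usual_velocity,
        "reliability_dropped": reliability < 100,
    }


# Committing low: everything planned is delivered, but far less than usual
sandbagged = scorecard(planned=30, scored=30, usual_velocity=50)

# Committing too high: more is delivered, but well short of the plan
overcommitted = scorecard(planned=80, scored=55, usual_velocity=50)

print(sandbagged)
print(overcommitted)
```

Each Sprint looks good on one metric and poor on the other, so neither approach to gaming the commitment goes unnoticed when both numbers are shown side by side.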