Removing the bubbles: solving bottlenecks in software product development
A challenge with software product development is visualising the work so that you can spot where there are delays in the process of converting ideas from “concept to cash”. This post shows how a cumulative flow diagram helped identify a pattern of queues over time. Removing these queues had many benefits such as fewer errors, increased team communication and improved team capacity.
Make the work visible
The first step is making the work visible. In knowledge work, such as software development, it is difficult to see the work being done, which is why a visualisation approach such as kanban can be so useful. Here’s a view of a kanban board from a previous client team:
The kanban board is useful for a “moment in time” view, but it’s not possible to easily see patterns that might develop over time. Looking at the kanban board on a particular day doesn’t make it easy to answer questions like these:
- How long have these work items been waiting in this column (stage)?
- How long does it usually take for work items in this stage of the process to complete?
- How often do we see queues in this step? How long do they last for?
- Are these queues a special event or do they happen regularly (touching on the difference between common and special cause I’ve mentioned in a previous blog post)?
To find these answers and look more clearly for patterns over time, we built a cumulative flow diagram (CFD, also called a ‘finger chart’) by counting the number of post-it notes in each stage (column) in the team’s process after each daily stand-up. Unlike my previous post on using three forks and a hand-drawn chart to help a team improve, in this case we used an Excel spreadsheet.
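For readers who would rather script this than use a spreadsheet, here is a minimal sketch of how the daily counts turn into the lines of a cumulative flow diagram. It is not the spreadsheet we used. Only the two testing stage names come from this team's board; the other stage names and all of the numbers are made up for illustration.

```python
# Stage order, upstream to downstream. Only the two testing stages come from
# the post; "Backlog", "Development" and "Done" are assumed names.
STAGES = ["Backlog", "Development", "Functional Testing", "Acceptance Testing", "Done"]


def cfd_lines(daily_counts):
    """Return, for each day, the top edge of every stage's band in a CFD.

    The line for a stage is the number of items in that stage plus all
    stages downstream of it, so the vertical gap between two neighbouring
    lines is the number of items sitting in the upper stage that day.
    """
    lines = []
    for counts in daily_counts:
        running = 0
        tops = {}
        # Stack from the last stage upwards so "Done" forms the bottom band.
        for stage in reversed(STAGES):
            running += counts.get(stage, 0)
            tops[stage] = running
        lines.append(tops)
    return lines


# Made-up post-it counts taken after three daily stand-ups.
daily_counts = [
    {"Backlog": 10, "Development": 3, "Functional Testing": 1, "Acceptance Testing": 1, "Done": 0},
    {"Backlog": 9, "Development": 3, "Functional Testing": 3, "Acceptance Testing": 1, "Done": 1},
    {"Backlog": 8, "Development": 2, "Functional Testing": 5, "Acceptance Testing": 1, "Done": 2},
]

for day, tops in enumerate(cfd_lines(daily_counts), start=1):
    print(day, tops)
```

Plotting each stage's line against the day number (in a spreadsheet or a charting library) gives the stacked bands; a band that widens over several days is a queue building up in that stage.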
Visualise the work over time to better understand queues (‘bubbles’)
The cumulative flow diagram for this team helped make visible that there were consistent queues of work in the functional testing and acceptance testing processes over time. These queues are visible as “bubbles” that develop in the cumulative flow diagram. See the stages highlighted in orange and red below (click the image for a larger version).
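We spotted the bubbles by eye, but the same daily counts can also answer "how often do we see queues, and how long do they last?" more directly. The sketch below is one possible approach, not something we ran at the time: it treats a bubble as a run of consecutive days where a stage holds more than some number of items. The threshold and the counts are invented for illustration.

```python
def find_bubbles(counts, threshold):
    """Find runs of consecutive days where a stage's count exceeds threshold.

    Returns a list of (first_day_index, last_day_index, peak_count) tuples,
    using zero-based day indexes.
    """
    bubbles = []
    start = None
    for i, count in enumerate(counts):
        if count > threshold:
            if start is None:
                start = i  # a new bubble begins
        elif start is not None:
            bubbles.append((start, i - 1, max(counts[start:i])))
            start = None
    if start is not None:
        # The series ended while a bubble was still open.
        bubbles.append((start, len(counts) - 1, max(counts[start:])))
    return bubbles


# Made-up daily counts for the Functional Testing column.
functional_testing = [1, 2, 4, 6, 7, 3, 1, 1, 2, 5, 6, 6, 2, 1]
print(find_bubbles(functional_testing, threshold=3))
# prints [(2, 4, 7), (9, 11, 6)]
```

Bubbles that recur at a regular interval hint at a cause built into the process (common cause) rather than a one-off event.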
Do the detective work necessary to understand what causes the queues (‘bubbles’)
Around two-thirds of the way through the above chart (which covered about 36 weeks) we decided to focus on studying what was causing the queues to develop in functional and acceptance testing.
The functional testing involved someone other than the person who developed the functionality (user story) validating that it worked functionally (there were no obvious errors). Once functional testing was complete then the acceptance testing stage was performed by a business analyst or the product manager.
The team were releasing to production every second Wednesday. On the middle Wednesday the person who did the functional testing switched to integration testing: ensuring that the features packaged for the release worked individually and in combination, and running a set of manual regression test scripts to ensure that the new functionality hadn’t had any impact on the rest of the website. During the week spent on integration testing, no functional testing was done, which we believed was the cause of the queues (the orange bubbles on the chart).
Creating a new policy to reduce the queues (‘bubbles’)
We sat down with the person who performed the Functional and Integration Testing and mapped out the schedule of their work across the fortnight between releases (see the hand-drawn diagram we came up with below).
We also mapped out a new “policy” that described what the person doing the testing did during the week spent on integration testing:
While performing the Integration Testing in the week before the release, if there are any work items in the Functional Testing column, spend up to an hour each day doing them.
We experimented with the new policy for the last third of the cumulative flow chart. The cumulative flow diagram showed that the queue (bubble) in the Functional Testing (orange) step virtually disappeared, as did the queue in the Acceptance Testing (red) stage. The CFD not only highlighted the initial problem, it also validated that the experimental change we made in policy resulted in an improvement. It allowed us to answer the critical question: “did the change we made to our process result in an improvement?”
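We judged the improvement from the chart itself. If you want a number to go with the picture, one simple option is to compare the average size of the queue before and after the day the policy changed. This is a sketch with invented counts, not our team's data, and `change_index` is a hypothetical name for the day the new policy started.

```python
from statistics import mean


def compare_periods(counts, change_index):
    """Return the mean daily count before and after a process change.

    change_index is the zero-based index of the first day under the new
    policy. Both periods must contain at least one day, otherwise
    statistics.mean raises StatisticsError.
    """
    before = counts[:change_index]
    after = counts[change_index:]
    return mean(before), mean(after)


# Made-up daily counts for the Functional Testing column.
functional_testing = [2, 5, 6, 1, 4, 6, 1, 1, 2, 1, 1, 0]
before, after = compare_periods(functional_testing, change_index=6)
print(f"average queue before: {before}, after: {after}")
```

An average hides the shape of the bubbles, so it complements the chart rather than replacing it.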
It’s the system!
This example is a demonstration of how changing the way the work is structured can produce improvements without having to change the work that team members were doing. It shows that the queues were caused by the way the work was structured (i.e. the system we had designed) rather than by the work of the team members. It speaks to Deming’s ‘provocation’ that “95% of the variation [in how long the work takes] is due to the system and not the individuals”.
There were many benefits to the changes that we made above:
- Removing the queue in functional testing meant that if a problem was found then the developer got faster feedback. Getting feedback faster reduced the time it took a developer to “get their head back into the issue” and fix the problems. It also improved the communication between members of the team: the developers were more likely to speak to the person who did the testing at stand-up about the work that was coming because they knew it would be tested quickly, rather than potentially sitting in a queue for a week.
- Reducing the bottleneck in Functional Testing also reduced the bottleneck in Acceptance Testing.
- The reduced “thrashing” from having issues discovered close to the release date meant the team’s capacity to do work increased.
- As there were fewer queues, there was less pressure on team members. Feeling less rushed improved their quality of life and morale, and led to better-quality work.