I am a programmer and architect (the kind that writes code) with a focus on testing and open source; I maintain the PHPUnit_Selenium project. I believe programming is one of the hardest and most beautiful jobs in the world. Giorgio is a DZone MVB and is not an employee of DZone and has posted 635 posts at DZone. You can read more from them at their website. View Full User Profile

Battle with legacy: reducing ifs

08.14.2013
| 2915 views |
  • submit to reddit

Last week I navigated a 1500-line long function, which was revealed to be the culprite of a bug affecting a few countries which were recently switched from an older payment flow to a new, simplified one.

The function itself was littered with if() (and else and else if) statements, which we can easily smell as a problem to solve. But in this practical case, why do conditionals raise my concerns?

Some of the reasons are specifically related to legacy code.

Lots of logs

Ever-branching code is a phenomenon that causes the birth of extensive logging in a project.

Since you can never be sure what the code is doing:

if (...) {
  // 1
} else if (...) {
  // 2
  if (...) {
  // 3
  }
}

Did execution pass from 1, 2, or 3? Or in which combinations of them?

Given the difficulty of evaluating execution by hand, the only way to know in which branches an execution has entered is to log the hell out of this code:

if (...) {
  $this->log("1 is ready");
} else if (...) {
  $this->log("2 is ready");
  if (...) {
  $this->log("Entering 3");
  }
}

This logging has an overhead on performance, but over everything else on the understandability of the code. It's hard enough to follow a maze without having to strip the log calls that it contains and that separate the interesting pieces of functionality.

Difficulty to test

How many execution path the code shown previously can take?

  • passing only from 1
  • passing only from 2
  • passing from 2 and from 3
  • not passing from any of these

To these test cases that need to be exercised by 4 different test scenarios, add early returns to give the code the capability to execute just some of the code in a branch:

    if (...) {
        $this->log("Entering 3a");
            if (...) {
                return;
            }
        $this->log("Entering 3a");
    }

As the length of these procedures increases, it becomes impossible to test all the paths. But while segregating responsibilities in smaller pieces (such as objects or closures) creates a clear protocol between them, long procedures driven by conditionals have a big state space:

$answer = 42;
// 1500 lines of code
if ($answer === ..) {
  // still using that variable?
}

This enormous state space, containing all globals and local variables of the function, is the spine of the procedure and lets the execution use many combinations of the variables instead of encapsulating them into different places.

Difficult to extend

Finally, the ultimate problem with castles of conditionals is that they become impossible to extend. When you encounter a poorly tested castle, you tend to shy away from modifying it and assign responsibilities to other parts of the code. This can be a sensible strategy if you intend to absorb new requirements in different components and crush the long procedures later, but can perpetuate the problem as new legacy code is created because of the movement of responsibilities away from the right place for them to be handle (given the current requirements; there is no absolutely right way to implement a feature).

For example, consider this example with encrypted phone numbers: my current application is able to handle them in many different countries. The long, if-stained procedure would be the right place to handle the translation from an external crypted value to an internal standard database row.

However, since lions and other beasts were hidden inside the long procedure, some country-specific drivers had to become more complex to do the translation themselves (inserting the encrypted value in a database or searching it in case it was already known). This proved to be a problem, because it made the country-specfic code talk to a database it shouldn't know about: legacy code triggering the creation of other legacy code to compensare for its defects.

Conclusion

The combination of these factors caused the bug: the new payment flow was tested as a black box, but since this giant function was at the heart of legacy code it was left untouched when the first end-to-end scenarios became green. The specifications couldn't know about the complexity that has been hidden in that single place, because it grew to that size before the introduction of these tests. Discovering this bug in the test suite would have required an enormous amount of tests, to cover all the execution paths that the function could take.

Nothing new under the sun: legacy code has some good functional properties (it works for the existing use cases) but very bad non-functional ones: extensibility is forbidden due to regressions and the little understandability of the state space of the code, while testability costs too much. For every castle of conditionals a moment comes when it crumbles...

Published at DZone with permission of Giorgio Sironi, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)