2024-03-29

Artificial Code Coverage

Abstract

In this paper I put forth the proposition that, contrary to popular belief, 100% code coverage can be a very advantageous thing to have, and I discuss a technique for achieving it without excessive effort.

(Useful pre-reading: About these papers)

The problem

Conventional wisdom says that 100% code coverage is unnecessary, or even undesirable, because achieving it requires an exceedingly large amount of effort spent not on asserting correctness, but solely on achieving coverage. In other words, it is often said that 100% code coverage has no business value.

Let me tell you why this is wrong, and why 100% code coverage can indeed be a very good thing to have.

If you don't have 100% code coverage, then by definition you have some lower percentage, like 87.2%, or 94.5%. The remaining 12.8%, or 5.5%, is uncovered. I call this the worrisome percentage.

As you keep working on your code base, the worrisome percentage fluctuates:

  • one day you might add a test for some code that was previously uncovered, so the worrisome percentage decreases;
  • another day you may add some code with no tests, so the percentage increases;
  • yet another day you may add some more code along with tests, so even though the number of uncovered lines has not changed, it now represents a smaller percentage;

...and it goes on like that.

If the worrisome percentage is high, then you know for sure that you are doing a bad job, but if it is low, it does not mean that you are doing a good job, because some very important functionality may be left uncovered, and you just do not know. To make matters worse, modern programming languages offer constructs that achieve great terseness of code, meaning that a few uncovered lines may represent a considerable amount of uncovered functionality.

So, each time you look at the worrisome percentage, you have to wonder what is in there: Are all the important lines covered? Are the uncovered lines okay to be left uncovered?

In order to answer these questions, you have to go over every single line of code in the worrisome percentage, and examine it to determine whether it is okay for it to be left uncovered. What you find is, more often than not, the usual suspects:

  • Some `ToString()` function which is only used for diagnostics;
  • Some `Equals()` and `HashCode()` functions of some value type which does not currently happen to be used as a key in a hash-map;
  • Some `default` `switch` clause which can never be reached, and which, if it were ever to be reached, would throw;

...etc.

So, your curiosity is satisfied, your worries are allayed, and you go back to your usual software development tasks.

A couple of weeks later, the worrisome percentage has changed again, prompting the same question: what is being left uncovered now?

Each time you need to have this question answered, you have to re-examine every single line of code in the worrisome percentage. As you do this, you discover that in the vast majority of cases, the lines that you are examining now are the exact same lines that you were examining the previous time you went through this exercise. After a while, this starts getting tedious. Eventually, you quit looking. Sooner or later, everyone in the shop quits looking.

The worrisome percentage has now become terra incognita: literally anything could be in there; nobody knows, and nobody wants to know, because finding out is such a dreary chore.

That is not a particularly nice situation to be in.

The solution

So, here is a radical proposition: If you always keep your code coverage at 100%, then the worrisome percentage is always zero, so there is nothing to worry about!

If the worrisome percentage is never zero, then no matter how it fluctuates, it never represents an appreciable change in the situation: it always goes from some non-zero number to some other non-zero number, meaning that you used to have some code uncovered, and you still have some code uncovered. No matter what happens, there is no actionable item.

On the other hand, if the worrisome percentage is normally zero, then each time it rises above zero it represents a definite change in the situation: you used to have everything covered, and now you have something uncovered. This signifies a clear call to action: the code that is now being left uncovered needs to be examined, and dealt with.

By dealing with uncovered code as soon as it gets introduced, you bring the worrisome percentage back to zero, thus achieving two things:

  • You ensure that next time the worrisome percentage becomes non-zero, it will represent a new call to action.
  • You never find yourself in the unpleasant situation of re-examining code that has been examined before; so, the examination does not feel like a dreary chore.

The conventional understanding of how to deal with uncovered code is to write a test for it, and that is why achieving 100% code coverage is regarded as onerous; however, there exist alternatives that are much easier. For any given piece of uncovered code, you have three options:

  • Option #1: Write a test for the code.

    This is of course the highest-quality option, but it does not always represent the best value for your money, and it is not even always possible. You only need to do it if the code is important enough to warrant testing, and you can only do it if the code is in fact testable. If you write a test, you can still minimize the effort of doing so by utilizing certain techniques that I talk about in other posts, such as michael.gr - Audit Testing, michael.gr - Testing with Fakes instead of Mocks, and michael.gr - Incremental Integration Testing.

  • Option #2: Exclude the code from code coverage.

    Code that is not testable, or not important enough to warrant testing, can be moved into a separate module which does not participate in coverage analysis. Alternatively, if your code coverage analysis tool supports it, you may be able to exclude individual methods without having to move them to another module. In the DotNet world, this can be accomplished by marking a method with the `ExcludeFromCodeCoverage` attribute, found in the `System.Diagnostics.CodeAnalysis` namespace. In the Java world, IntelliJ IDEA offers a setting for specifying what annotation you want to use for marking methods to be excluded from code coverage, so you can use any annotation you like. (See michael.gr - IntelliJ IDEA can now exclude methods from code coverage.) Various code coverage analyzers support additional ways of excluding code from coverage. (A sketch of this option follows this list.)

  • Option #3: Artificially cover the code.

    With the previous two options you should be able to bring the worrisome percentage down to a very small number, like 1 or 2 percent. What remains is code which should really be excluded from coverage, but cannot be, due to limitations in available tooling: although code coverage analyzers generally allow excluding entire functions from coverage analysis, they typically offer no means of excluding individual lines of code, such as the unreachable `default` clause of some `switch` statement. You can try moving that line into a separate function, and excluding that function, but you cannot exclude the call to that function, so the problem remains.

    The solution in these cases is to cause the uncovered code to be invoked during testing, not in order to test it, but simply in order to have it covered. This might sound like cheating, but it is not, because the stated objective was not to test the code; it was to exclude it from coverage. You would have excluded that line from coverage if the tooling supported doing so, but since it does not, the next best thing, and the only option you are left with, is to artificially include it in the code coverage. (A sketch of this option also follows this list.)
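
Here is what Option #2 might look like in the Java world. This is only a minimal sketch: since IntelliJ IDEA lets you nominate any annotation for exclusion, you can define your own; the annotation and class names below are made up for illustration, and the only thing that matters is that the annotation's name matches whatever you register in the coverage settings.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// A home-grown annotation; the name is arbitrary, as long as it matches
// the name registered in the code coverage analyzer's settings.
@Retention( RetentionPolicy.CLASS )
@Target( { ElementType.METHOD, ElementType.CONSTRUCTOR, ElementType.TYPE } )
@interface ExcludeFromCodeCoverage
{ }

class Point
{
    final int x;
    final int y;

    Point( int x, int y ) { this.x = x; this.y = y; }

    // Diagnostics only; not worth a test, so keep it out of the coverage math.
    @ExcludeFromCodeCoverage
    @Override public String toString() { return "x = " + x + ", y = " + y; }
}
```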

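And here is what Option #3 might look like, again as a sketch in Java with JUnit 5, using an integer-based enumeration for simplicity (all names are made up): the unreachable `default` clause is deliberately invoked with an invalid value, purely so that it registers as covered.

```java
import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.api.Test;

class Colors
{
    static final int RED = 0, GREEN = 1, BLUE = 2;

    static String nameOf( int color )
    {
        switch( color )
        {
            case RED: return "red";
            case GREEN: return "green";
            case BLUE: return "blue";
            // Unreachable in normal operation, yet it shows up as uncovered.
            default: throw new IllegalArgumentException( "unexpected color: " + color );
        }
    }
}

class ArtificialCoverageTest
{
    @Test void artificiallyCoverDefaultClause()
    {
        // We are not really testing anything here; we invoke the `default`
        // clause purely so that it counts as covered. Asserting that it
        // throws is a free bonus, nudging this towards being an actual test.
        assertThrows( IllegalArgumentException.class, () -> Colors.nameOf( -1 ) );
    }
}
```
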
Here is a (hopefully exhaustive) list of all the different reasons why code might be left uncovered, and what to do in each case:

  1. The code should really be covered, but you forgot to write tests for it, or you have plans to write tests in the future.

    Go with Option #1: write tests for it. Not in the future, now.

  2. The code is not used and there is no plan to use it.

    This is presumably code which exists for historical reasons, or for reference, or because it took some effort to write and you do not want to admit that the effort was wasted by throwing it away.

    Go with Option #2 and exclude it from coverage.

  3. The code is only used for diagnostics.

    The prime example of this is `ToString()` methods that are not normally invoked in a production environment, but give informative descriptions of your objects while debugging.

    Go with Option #2: Exclude such methods from coverage.

  4. The code is not normally reachable, but it is there in case something unexpected happens.

    The prime example of this is `switch` statements that cover all possible cases and yet also contain a `default` clause just in case an unexpected value somehow manages to creep in.

    Go with Option #3: Artificially cover such code. This may require a bit of refactoring to make it easier to cause the problematic `switch` statement to be invoked with an invalid value. The code most likely throws, so catch the exception and swallow it. You can also assert that the expected exception was thrown, in which case it becomes more like Option #1: a test. (The artificial-coverage sketch shown earlier does exactly this.)

  5. The code is reachable but not currently being reached.

    This is code which is necessary for completeness, and it just so happens that it is not currently being used, but nothing prevents it from being used at any moment. A prime example of this is the `Equals()` and `HashCode()` functions of value types: without those functions, a value type is incomplete; however, if the value type does not currently happen to be used as a key in a hash-map, then those functions are almost certainly unused.

    In this case, you can go with any of the three options:

    • You can go with Option #1 and write a proper test.
    • You can go with Option #2 and exclude the code.
    • You can go with Option #3 and artificially cover the code. (A sketch of this follows this list.)

  6. The code is not important enough to have a test for it.

    Say you have a function which takes a tree data structure and converts it to text using box-drawing characters (W) so as to be able to print it nicely as a tree on the console. Since the function receives a data structure and emits text, it is certainly testable, but is it really worth testing? If it ever draws something wrongly, you will probably notice, and if you do not notice, then maybe it did not matter anyway.

    In this case you can go either with Option #2 and exclude such functions, or Option #3 and artificially cover them.

  7. The code is literally or practically untestable.

    For example:

    • If your application has a Graphical User Interface (GUI), you can write automated tests for all of your application logic, but the only practical way to ascertain the correctness of the GUI is to have human eyes staring at the screen. (There exist tools for testing GUIs, but I assess them as impractical.)

    • If your application controls some hardware, you may have a hardware abstraction layer with two implementations, one which emulates the hardware, and one which interacts with the actual hardware. The emulator will enable you to test all of your application logic without having the actual hardware in place; however, the implementation which interacts with the actual hardware is practically untestable by software alone.

    • If you have a piece of code that queries the endianness of the underlying hardware architecture and operates slightly differently depending on it, the only path you can reasonably hope to cover is the one for the endianness of the hardware architecture you are actually using.

    In all of the above cases, and in all similar cases, you have no option but #2: exclude the code from coverage.
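
To illustrate case #5 above with Option #3: here is a JUnit sketch (again, all names are made up) which invokes the otherwise-unused `equals()` and `hashCode()` of a value class purely so that they register as covered; the assertions conveniently double as a rudimentary sanity check.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertNotEquals;

import org.junit.jupiter.api.Test;

final class Coordinate
{
    final int x;
    final int y;

    Coordinate( int x, int y ) { this.x = x; this.y = y; }

    @Override public boolean equals( Object other )
    {
        if( !(other instanceof Coordinate) )
            return false;
        Coordinate c = (Coordinate)other;
        return x == c.x && y == c.y;
    }

    @Override public int hashCode() { return 31 * x + y; }
}

class ValueSemanticsCoverageTest
{
    @Test void artificiallyCoverEqualsAndHashCode()
    {
        // `Coordinate` is not currently used as a key in any hash-map, so
        // these methods would otherwise show up as uncovered.
        assertEquals( new Coordinate( 1, 2 ), new Coordinate( 1, 2 ) );
        assertNotEquals( new Coordinate( 1, 2 ), new Coordinate( 3, 4 ) );
        assertEquals( new Coordinate( 1, 2 ).hashCode(), new Coordinate( 1, 2 ).hashCode() );
    }
}
```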

Conclusion

If testing has business value, then 100% code coverage has business value, too.

A code coverage percentage of 100% is very useful, not for bragging, but for maintaining certainty that everything that ought to be tested is in fact being tested.

Achieving a code coverage percentage of 100% does require some effort, but with techniques such as Artificial Coverage the effort can be reduced to manageable levels.

Ideally, Artificial Coverage should never be necessary, but it is a very practical workaround for the inability of coverage tools to exclude individual lines of code from analysis.



Cover image by Patrick Robert Doyle from Unsplash
