2024-10-28

Testana: A better way of running tests

Abstract:

An open-source software testing tool is presented, which is compatible with some existing test frameworks in widespread use, and uses dependency analysis to greatly reduce the time it takes to run tests.

(Useful pre-reading: About these papers)

What is Testana?

Testana is a console application that you launch when you want to run your tests. So far, I have created two implementations of Testana:

  1. A Java implementation, supporting JUnit 4 annotations in Maven-based projects.
  2. A C# implementation, supporting MSTest attributes in MSBuild solutions.

What does Testana achieve that existing tools do not?

  • Testana always considers all test modules in your entire code base as candidates for running, so you never have to manually select a subset of the tests to run in the interest of saving time.
  • Testana runs only the subset of test modules that actually need to run, based on the last successful run time of each test module, and whether it, or any of its dependencies, have changed.
  • Testana runs test modules by order of dependency, meaning that tests of modules that have no dependencies run first, tests of modules that depend on those run next, and so on.
  • Testana runs test methods in Natural Method Order, which is the order in which the methods appear in the source file. (This is the norm in C#, but not in Java, where extra measures are necessary to accomplish this.)
  • Testana runs test methods in ascending order of inheritance, meaning that test methods in the base-most test class run first, and test methods in derived test classes run afterwards.
  • Testana discovers and reports mistakes in the formulation of test methods, instead of ignoring the mistakes, which is what most other test frameworks do. (Silent failure.) 
  • Testana does not catch any exceptions when debugging, thus allowing your debugger to stop on the source line that threw the exception. (Testana will catch and report exceptions when not debugging, as is the case when running on a continuous build server.)

How does Testana work?

Testana begins by constructing the dependency graph of your software system. Since this process is expensive, Testana caches the dependency graph in a file, and recalculates it only when the structure of the system changes. The cache is stored in a text file, which is located at the root of the source tree, and is meant to be excluded from source control.

Then:

  • Testana locates the modules that depend on nothing else within the system, and runs the tests of those modules.
  • Once these tests are done, Testana finds modules that depend only on modules that have already been tested, and runs their tests.
  • Testana keeps repeating the previous step, until all tests have been run.
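
This scheduling loop amounts to running the dependency graph in topological order, one wave at a time. The following is a minimal sketch of that loop in Java; the Module interface and its methods are hypothetical stand-ins for illustration, not Testana's actual API, and the dependency graph is assumed to be acyclic:

    import java.util.Collection;
    import java.util.HashSet;
    import java.util.Set;

    // Hypothetical stand-in for a module of the system under test; not Testana's actual API.
    interface Module
    {
        Collection<Module> dependencies();
        void runTests();
    }

    final class DependencyOrderRunner
    {
        // Runs each module's tests only after all of its dependencies have been tested.
        // The dependency graph is assumed to be acyclic, so every pass makes progress.
        static void run( Collection<Module> modules )
        {
            Set<Module> tested = new HashSet<>();
            while( tested.size() < modules.size() )
            {
                for( Module module : modules )
                    if( !tested.contains( module ) && tested.containsAll( module.dependencies() ) )
                    {
                        module.runTests();
                        tested.add( module );
                    }
            }
        }
    }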

Testana keeps a diary where it records the last successful run time of each test module. This diary is also stored in a text file, which is also located at the root of the source tree, and is also meant to be excluded from source control.

Next time Testana runs, it considers the last successful run time of each test module, versus the last modification time of that module and its dependencies. Testana then refrains from running the test module if neither it, nor any of its dependencies, have changed.
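
The decision of whether a test module needs to run thus reduces to a timestamp comparison. Here is an illustrative sketch, assuming the module and its dependencies are represented by paths to their files; the names are made up, not Testana's actual code:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.attribute.FileTime;
    import java.util.Collection;

    final class StalenessCheck
    {
        // A test module needs to run if the module itself, or any of its dependencies,
        // was modified after the module's last successful test run, as recorded in the diary.
        static boolean needsToRun( FileTime lastSuccessfulRun, Collection<Path> moduleAndDependencyFiles ) throws IOException
        {
            for( Path file : moduleAndDependencyFiles )
                if( Files.getLastModifiedTime( file ).compareTo( lastSuccessfulRun ) > 0 )
                    return true;
            return false;
        }
    }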

While running tests, Testana will warn the programmer if it discovers any method that has been declared as a test method but fails to meet the requirements for a test method. Usually, test frameworks require that a test method must be a public instance method, must accept no parameters, and must return nothing; however, when these frameworks encounter a method that is declared as a test and yet fails to meet those requirements (for example, a test method declared static), they fail to report the mistake. Testana does not fail to report such mistakes.
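
To illustrate the kind of check involved, here is a minimal reflection-based sketch of such a validation for the requirements just mentioned; it is not Testana's actual code:

    import java.lang.reflect.Method;
    import java.lang.reflect.Modifier;

    final class TestMethodValidation
    {
        // Checks the typical requirements for a test method: public, non-static,
        // no parameters, void return type. Returns a human-readable complaint,
        // or null if the method is well-formed.
        static String validate( Method method )
        {
            if( !Modifier.isPublic( method.getModifiers() ) )
                return "test method is not public: " + method;
            if( Modifier.isStatic( method.getModifiers() ) )
                return "test method is static: " + method;
            if( method.getParameterCount() != 0 )
                return "test method accepts parameters: " + method;
            if( method.getReturnType() != void.class )
                return "test method does not return void: " + method;
            return null;
        }
    }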

Why should I care about running only the tests that need to run?

The usual situation with large code bases is that tests take an unreasonably long time to run, so developers tend to take shortcuts in running them. One approach some developers take is that they simply commit code without running any tests, leaving it up to the continuous build server to run the tests and notify them of any test failures. This has multiple disadvantages:

  • It causes repeated interruptions in the workflow, due to the slow turnaround of the continuous build, which is often on the order of an hour, sometimes longer, and even in the fastest cases, always longer than a normal person's attention span. (This is so by definition; if it were not, then there would be no problem with quickly running all tests locally before committing.)
  • The failed tests require additional commits to fix, and each commit requires a meaningful commit message, which increases the overall level of bureaucracy in the development process.
  • The commit history becomes bloated with commits that were done in vain and should never be checked out because they contain bugs that are fixed in later commits.
  • Untested commits that contain bugs are regularly being made to branches in the repository; these bugs stay there while the continuous build does its thing; eventually the tests fail, the developers take notice, and commit fixes. This whole process takes time, during which other unsuspecting developers might pull from those branches, thus receiving the bugs. Kind of like Continuous Infection.

Testana solves the above problems by figuring out which tests need to run based on what has changed, and only running those tests. This cuts down the time it takes to run tests to a tiny fraction of what it is when blindly running all tests, which means that running the tests now becomes a piece of cake and can usually be done real quick before committing, as it should.

Also, running the tests real quick right after each pull from source control now becomes feasible, so a developer can avoid starting to work on source code on which the tests are failing. (How often have you found yourself in a situation where you pull from source control, change something, run the tests, the tests fail, and you are now wondering whether they fail due to the changes you just made, or due to changes you pulled from the repository?)

Why should I care about considering all test modules in my entire code base as candidates for running?

Another approach taken by some developers, in the interest of saving time, is manually choosing which tests to run, based on their knowledge of what may have been affected by the changes they just made.

  • One simple reason why this is problematic is that it requires cognitive effort to figure out which tests might need running, and manual work to launch them individually; it is not as easy as pressing a single button that stands for "run whatever tests need to run in response to the changes I just made."
  • A far bigger problem is that in manually selecting the tests to run, the developer is making assumptions about the dependencies of the code that they have modified. In complex systems, dependency graphs can be difficult to grasp, and as systems evolve, the dependencies keep changing. This often leads to situations where no single developer in the house has a complete grasp of the dependency graph of the entire system. Unfortunately, unknown or not-fully-understood dependencies are a major source of bugs, and yet by hand-selecting what to test based on our assumptions about the dependencies, it is precisely the not-fully-understood dependencies that are likely to not be tested. This is a recipe for disaster.

Testana solves the above problems by always considering all test modules as candidates for running. It does not hurt to do that, because the tests that do not actually need to run will not be run by Testana anyway.

Why should I care about running test modules in order of dependency?

Existing test frameworks do not do anything intelligent in the direction of automatically figuring out some order of test execution that has any purpose or merit. The order tends to be arbitrary, and not configurable. In the best case it is alphabetic, but this is still problematic, because our criteria for naming test modules usually have nothing to do with the order in which we would like to see them executing. 

For example, it is very common for a code base to contain a module called "Utilities", which most other modules depend on; since it is a highly dependent-upon module, it should be tested first, but since its name begins with a "U", it tends to be tested last.

Testana executes test modules in a certain order which does have some objective merit; this is the order of dependency of the modules under test. This means that modules with no dependencies are tested first, modules that depend upon them are tested next, and so on until everything has been tested. Thus, the first test failure detected while running tests with Testana is always guaranteed to point at the most fundamental problem; there is no need to look further down in case some other test failure indicates a more fundamental problem. Accordingly, Testana stops executing tests after the first failure is detected, which saves even more time.

For more information about this way of testing, see michael.gr - Incremental Integration Testing.

Why should I care about running test methods in natural order?

Test frameworks in the C# world tend to run test methods in natural order, which is great, but in the Java world, the JUnit framework will run your test methods in random order, which is at best useless, and arguably treacherous.

One reason for wanting the test methods to run in the order in which they appear in the source file is because we usually test fundamental operations of our software before we test operations that depend upon them. (Note: it is the operations of the components under test that depend upon each other, not the tests themselves that depend upon each other!) So, if a fundamental operation fails, we want that to be the very first error that will be reported.

Tests of operations that rely upon an operation whose test has failed might as well be skipped, because they can all be expected to fail. Reporting those failures before the failure of the more fundamental operation is an act of sabotage against the developer, because it sends us looking for problems in places where there are no problems to be found, and makes it more difficult for us to locate the real problem, which typically lies in the test that failed first in the source file.

To give an example, it is counter-productive to be told that my search-for-item-in-store test failed, sending me to troubleshoot the search function, and only later to be told that my insert-item-to-store test failed, which obviously means it was in fact the insert function that needed troubleshooting; if insert-item-to-store fails, it is game over; no other test of the store can possibly succeed, so there is no point in running any other tests on it, just as there is no point in beating a dead horse.

Finally, another very simple, very straightforward, and very important reason for wanting the test methods to be executed in natural order is because seeing the test methods listed in any other order is brainfuck.
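
For the curious, here is one way the "extra measures" mentioned earlier can be taken in Java: reflection does not report methods in declaration order, but the class file's debug information records each method's line numbers, so a tool can sort methods by the first line recorded for each. The sketch below uses the ASM bytecode library; this is an assumption about the general technique, not necessarily how Testana implements it, and it requires classes compiled with debug information:

    import java.lang.reflect.Method;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.objectweb.asm.ClassReader;
    import org.objectweb.asm.ClassVisitor;
    import org.objectweb.asm.Label;
    import org.objectweb.asm.MethodVisitor;
    import org.objectweb.asm.Opcodes;

    final class NaturalMethodOrder
    {
        // Sorts a class's declared methods by the first line number recorded for each
        // method in the class file's debug information, i.e. by source-file order.
        static List<Method> declaredMethodsInSourceOrder( Class<?> testClass ) throws java.io.IOException
        {
            Map<String,Integer> firstLineByMethodName = new HashMap<>();
            new ClassReader( testClass.getName() ).accept( new ClassVisitor( Opcodes.ASM9 )
            {
                @Override public MethodVisitor visitMethod( int access, String name, String descriptor, String signature, String[] exceptions )
                {
                    return new MethodVisitor( Opcodes.ASM9 )
                    {
                        @Override public void visitLineNumber( int line, Label start )
                        {
                            firstLineByMethodName.merge( name, line, Math::min );
                        }
                    };
                }
            }, ClassReader.SKIP_FRAMES );
            List<Method> methods = new ArrayList<>( Arrays.asList( testClass.getDeclaredMethods() ) );
            methods.sort( Comparator.comparingInt( (Method method) -> firstLineByMethodName.getOrDefault( method.getName(), Integer.MAX_VALUE ) ) );
            return methods;
        }
    }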

A related rant can be found here: michael.gr - On JUnit's random order of test method execution.

Why should I care about running test methods of ancestors first?

This feature of Testana might be irrelevant to you if you never use inheritance in test classes, but I do, and I consider it very important. I also consider the typical behavior of existing test frameworks on this matter very annoying, because they tend to do the exact opposite of what is useful.

Inheritance in test classes can help to achieve great code coverage while reducing the total amount of test code. Suppose you have a collection hierarchy to test: you have an ArrayList class and a HashSet class, and you also have their corresponding test classes, ArrayListTest and HashSetTest. Now, both ArrayList and HashSet inherit from Collection, which means that lots of tests are going to be identical between ArrayListTest and HashSetTest. One way to eliminate duplication is to have a CollectionTest abstract base class, which tests only Collection methods, and then have both ArrayListTest and HashSetTest inherit from CollectionTest and provide additional tests for functionality that is specific to ArrayList and HashSet respectively. Under this scenario, when ArrayListTest or HashSetTest runs, we want the methods of CollectionTest to be executed first, because they are testing the fundamental (more general) functionality.

To make the example more specific, CollectionTest is likely to add an item to the collection and then check whether the collection contains the item. If this test fails, then there is absolutely no point in proceeding with tests of ArrayListTest which will, for example, try adding multiple items to the collection and check to make sure that IndexOf() returns the right results.
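
In JUnit 4 terms, the hierarchy described above might look as follows; the class and method names are illustrative, and the JDK's ArrayList stands in for the hypothetical class under test:

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertTrue;
    import java.util.ArrayList;
    import java.util.Collection;
    import org.junit.Test;

    // Tests the behavior common to all collections; the factory method is supplied by subclasses.
    abstract class CollectionTest
    {
        protected abstract Collection<String> newCollection();

        @Test public void an_added_item_is_contained()
        {
            Collection<String> collection = newCollection();
            collection.add( "a" );
            assertTrue( collection.contains( "a" ) );
        }
    }

    // Inherits the Collection tests (which Testana runs first) and adds list-specific tests.
    public class ArrayListTest extends CollectionTest
    {
        @Override protected Collection<String> newCollection() { return new ArrayList<>(); }

        @Test public void indexOf_reports_insertion_position()
        {
            ArrayList<String> list = new ArrayList<>();
            list.add( "a" );
            list.add( "b" );
            assertEquals( 1, list.indexOf( "b" ) );
        }
    }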

Again, existing test frameworks tend to handle this in a way which is exactly the opposite of what we would want: they execute the descendant (more specialized) methods first, and the ancestor (more general) methods last.

Testana corrects this by executing ancestor methods first, descendant methods last.
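
In Java reflection terms, ancestor-first ordering amounts to sorting test methods by the inheritance depth of their declaring class. A minimal sketch follows (illustrative, not Testana's actual code):

    import java.lang.reflect.Method;
    import java.util.Comparator;

    final class AncestorFirstOrder
    {
        // Orders test methods so that methods declared in base-most classes come first;
        // natural (source) order would be applied as a secondary key within each class.
        static final Comparator<Method> ANCESTORS_FIRST =
            Comparator.comparingInt( (Method method) -> inheritanceDepth( method.getDeclaringClass() ) );

        private static int inheritanceDepth( Class<?> declaringClass )
        {
            int depth = 0;
            for( Class<?> c = declaringClass.getSuperclass(); c != null; c = c.getSuperclass() )
                depth++;
            return depth;
        }
    }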

Can Testana be fooled by Inversion of Control?

No. In a scenario where class A receives and invokes interface I without having a dependency on class B which implements I, the test of A still has to instantiate both A and B in order to pass B, as an I, to A. The test therefore depends on both A and B, which means that Testana will run the test if there is a change in either A or B.
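
A minimal sketch of this scenario, using JUnit 4; the names A, I, and B are taken from the description above, and the method names are made up:

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    interface I { int work(); }

    class B implements I
    {
        @Override public int work() { return 42; }
    }

    // A depends only on the interface I, not on B.
    class A
    {
        private final I i;
        A( I i ) { this.i = i; }
        int workTwice() { return 2 * i.work(); }
    }

    // The test has to instantiate B in order to construct A, so it depends on both A and B.
    public class ATest
    {
        @Test public void workTwice_doubles_what_I_returns()
        {
            assertEquals( 84, new A( new B() ).workTwice() );
        }
    }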

Can Testana be fooled by the use of mocks?

Yes, Testana can be fooled by mocks, because that is what mocks do: they make a mockery out of the software testing process. In a scenario where class A receives and invokes interface I without having a dependency on class B which implements interface I, and the test of A also refrains from depending on B by just mocking I, then Testana will of course not run that test when there is a change in B. This, however, should not be a problem, because you should not be using mocks anyway; for more information, see michael.gr - If you are using mock objects you are doing it wrong.

Can Testana be fooled by the use of fakes?

No, as long as you do your testing properly. A test that utilizes a fake will be run by Testana only when there is a change in the fake, not when there is a change in the real thing; however, you should have a separate test which ensures that the behavior of the fake is identical to the behavior of the real thing in all aspects that matter. These tests will be run by Testana when you modify either the fake, or the real thing, or both. Thus:

  • If you make a breaking change to the real thing, then your tests will show you that you need to make the corresponding change to the fake; the changes in the fake will in turn cause Testana to run the tests that utilize the fake.
  • If you make a non-breaking change to the real thing, then the fake will remain unchanged, and this is what gives you the luxury of not having to re-run tests utilizing the fake when you make a change that only affects the real thing.
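
The "separate test" mentioned above is often called a contract test: an abstract test class exercises the shared behavior, and one concrete subclass runs it against the real thing while another runs it against the fake. Here is a minimal JUnit 4 sketch, with hypothetical Store, RealStore, and FakeStore names; the "real" implementation is a placeholder so that the example is self-contained, and in a real code base each class would live in its own source file:

    import static org.junit.Assert.assertEquals;
    import java.util.HashMap;
    import java.util.Map;
    import org.junit.Test;

    // Hypothetical abstraction with a real and a fake implementation.
    interface Store
    {
        void put( String key, String value );
        String get( String key );
    }

    // Stand-in for the real thing. (Imagine it talks to a database.)
    class RealStore implements Store
    {
        private final Map<String,String> rows = new HashMap<>(); // placeholder for real persistence
        @Override public void put( String key, String value ) { rows.put( key, value ); }
        @Override public String get( String key ) { return rows.get( key ); }
    }

    // In-memory fake used by tests of code that depends on Store.
    class FakeStore implements Store
    {
        private final Map<String,String> map = new HashMap<>();
        @Override public void put( String key, String value ) { map.put( key, value ); }
        @Override public String get( String key ) { return map.get( key ); }
    }

    // Contract test: both implementations must behave identically in all aspects that matter.
    abstract class StoreContractTest
    {
        protected abstract Store newStore();

        @Test public void a_stored_value_can_be_retrieved()
        {
            Store store = newStore();
            store.put( "key", "value" );
            assertEquals( "value", store.get( "key" ) );
        }
    }

    public class RealStoreContractTest extends StoreContractTest
    {
        @Override protected Store newStore() { return new RealStore(); }
    }

    public class FakeStoreContractTest extends StoreContractTest
    {
        @Override protected Store newStore() { return new FakeStore(); }
    }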

For more information, see michael.gr - Software Testing with Fakes instead of Mocks.

What about undiscoverable dependencies due to weak typing, the use of REST, etc?

The following "hip" and "trendy" practices of the modern day are not supported by Testana, and there is no plan to ever support them:

  • Obscuring dependencies via duck-typing.
  • Squandering dependencies via weak typing.
  • Denaturing dependencies via stringly-typing.
  • Disavowing dependencies via configuration files.
  • Abnegating dependencies via non-programmatic interfaces such as REST.

Seriously, stop all this fuckery and use a real programming language with strong typing, encode your dependencies via the type system, and everything will be fine. For more information, see michael.gr - On scripting languages.

Also, seriously, stop it with custom-written application code directly utilizing abominations like REST. The use of REST is a system wiring concern, not a programmatic concern.

How compatible is Testana with what I already have?

  • The Java implementation of Testana:
    • Works with Maven projects (pom.xml files)
    • Supports JUnit 4.
      • Supports only the basic (minimum viable) subset of JUnit 4 functionality, namely the @Test, @Before, @After, and @Ignore annotations, without any parameters. (A minimal example follows this list.)
  • The C# implementation of Testana:
    • Works with MSBuild projects (.sln and .csproj files) 
    • Supports MSTest.
      • Supports only the basic (minimum viable) subset of MSTest functionality, namely the [TestClass], [TestMethod], [ClassInitialize], [ClassCleanup], and [Ignore] attributes, without any parameters.
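
For reference, here is what a test class confined to the Java implementation's supported subset of JUnit 4 looks like; the class and method names are made up, and the C# case is analogous with the corresponding MSTest attributes:

    import static org.junit.Assert.assertEquals;
    import org.junit.After;
    import org.junit.Before;
    import org.junit.Ignore;
    import org.junit.Test;

    // A test class using only the annotations supported by Testana's Java implementation.
    public class WidgetTest
    {
        @Before public void setUp() { /* prepare the fixture */ }
        @After public void tearDown() { /* release the fixture */ }

        @Test public void two_plus_two_makes_four()
        {
            assertEquals( 4, 2 + 2 );
        }

        @Ignore @Test public void not_ready_yet()
        {
        }
    }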

Support for more languages, more project formats, more test frameworks, and more functionality may be added in the future.

What is it like to use Testana?

You run Testana every time you want to run your tests. You launch it at the root of your source tree, without any command-line arguments, and its default behavior is to figure out everything by itself and do the right thing.

Note that the first time you run Testana, there may be a noticeable delay while information is being collected; the information is cached, so this delay will not be there next time you run Testana.

The first time you run Testana, it will run all tests.

If you immediately re-run Testana, it will not run any tests, because nothing will have changed.

If you touch one of your source files and re-run Testana, it will only run tests that either directly or indirectly depend on the changed file.

If you run Testana with --help it will give you a rundown of the command-line arguments it supports.

Where can I find Testana?

The Java implementation of Testana is here: https://github.com/mikenakis/Public/tree/master/testana

The C# implementation of Testana is coming soon. (As soon as I turn it into an independent solution, because currently it is a project within a larger solution.)


Cover image: The Testana logo, profile of a crash test dummy by michael.gr, based on original work by Wes Breazell and Alexander Skowalsky. Used under CC BY License.
