2024-04-01

Audit Testing

Abstract

An automated software testing technique is presented which spares us from having to stipulate our expectations in test code, and from having to go fixing test code each time our expectations change.

(Useful pre-reading: About these papers)

The Problem

The most common scenario in automated software testing is ensuring that given specific input, a component-under-test produces expected output. The conventional way of achieving this is by feeding the component-under-test with a set of predetermined parameters, obtaining the output of the component-under-test, comparing the output against an instance of known-good output which has been hard-coded within the test, and failing the test if the two are not equal.
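
For illustration, a conventional assertion-based test might look like the following minimal sketch (the invoice-formatting component is a hypothetical stand-in, written inline here to keep the example self-contained):

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import java.util.Locale;

    import org.junit.jupiter.api.Test;

    class InvoiceFormatterTest
    {
        // Stand-in for the component-under-test; in reality this lives in production code.
        static String formatInvoiceLine( String item, int quantity, double unitPrice )
        {
            return String.format( Locale.ROOT, "%s x%d @ %.2f = %.2f", item, quantity, unitPrice, quantity * unitPrice );
        }

        @Test
        void formatsSingleInvoiceLine()
        {
            String actual = formatInvoiceLine( "Widget", 3, 4.50 );

            // The expected output is hard-coded into the test; every deliberate
            // change to the format requires editing this assertion to match.
            assertEquals( "Widget x3 @ 4.50 = 13.50", actual );
        }
    }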

This approach works, but it is inefficient, because during the development and evolution of a software system we often make changes to the production code fully anticipating that the output of certain components will change. Unfortunately, each time we do this, the tests fail, because they are still expecting the old output. So, each change in the production code must be followed by a round of fixing tests to make them pass. This imposes a considerable burden on the software development process.

Note that under Test-Driven Development things are not any better: first we modify the tests to start expecting the new output, then we observe them fail, then we modify the components to produce the new output, then we watch the tests pass. We still have to stipulate our expectations in test code, and we still have to change test code each time our expectations change, which is inefficient.

Audit Testing is a technique for automated software testing which aims to correct this.

The Solution

Under Audit Testing, the assertions that verify the correctness of the output of the component-under-test are abolished, and replaced with code that simply saves the output to a text file. This file is known as the Audit File.

The test may still fail if the component-under-test encounters an error while producing output, in which case we follow a conventional test-fix-repeat workflow. However, if the component-under-test manages to produce output, then the output is saved in the Audit File and the test completes successfully without examining it.

The trick is that the Audit File is saved right next to the source code file of the test, which means that it is kept under Version Control. In the most common case, each test run produces the exact same audit output as the previous run, so nothing changes, meaning that all is good. If a test run produces different audit output from a previous test run, then the tooling alerts the developer to that effect, and the Version Control System additionally indicates that the Audit File has been modified and is in need of committing. Thus, the developer cannot fail to notice that the audit output has changed.
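
To make the mechanics concrete, here is a minimal sketch of what such a test might look like in JUnit, assuming the Audit File is kept under src/test/resources next to the test sources; the path, the class names, and the report-generating component are illustrative, not part of any established library:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;

    import org.junit.jupiter.api.Test;

    class ReportGeneratorAuditTest
    {
        // Illustrative location of the Audit File; it sits in the source tree,
        // so the Version Control System tracks every change to it.
        private static final Path AUDIT_FILE = Path.of( "src/test/resources/ReportGeneratorAuditTest.audit.txt" );

        @Test
        void auditReportOutput() throws IOException
        {
            String output = generateReport(); // hypothetical component-under-test

            // No assertions on the content: the output is simply written to the
            // Audit File. If the output changes, version control shows the file
            // as modified, and the developer reviews the diff to accept or reject it.
            Files.writeString( AUDIT_FILE, output, StandardCharsets.UTF_8 );
        }

        private static String generateReport()
        {
            // Stand-in for the real component-under-test.
            return "Total items: 3\nTotal price: 13.50\n";
        }
    }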

The developer can then utilize the "Compare with unmodified" feature of the Version Control System to see the differences between the audit output that was produced by the modified code, and the audit output of the last known-good test run. By visually inspecting these differences, the developer can decide whether they are as expected or not, according to the changes they made in the code.

  • If the observed differences are not as expected, then the developer needs to keep working on their code until they are.
  • If the observed differences are as expected, then the developer can simply commit the new code, along with the new Audit File, and they are done.

This way, we eliminate the following burdens:

  • Having to hard-code into the tests the output expected from the component-under-test.
  • Having to write code in each test which asserts that the output of the component-under-test matches the expected output.
  • Having to go fixing test code each time there is a (fully expected) change in the output of the component-under-test.

The eliminated burdens are traded for the following new responsibilities:

  • The output of the component-under-test must be converted to text and written to an audit file.
  • When the tooling or the version control system alerts us that an audit file has changed, the differences must be reviewed, and a decision must be made as to whether they are as expected or not.

This represents a considerable optimization of the software development process.

Note that the arrangement is also convenient for the reviewer, who can see both the changes in the code and the resulting changes in the Audit Files.

As an added safety measure, the continuous build pipeline can deliberately fail the tests if an unclean working copy is detected after running the tests, because that would mean that the tests produced different results from what was expected, or that someone failed to commit some updated audit files.
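
One way of implementing such a check, sketched below, is a final build step that asks git for a machine-readable status of the working copy and fails the build if anything is modified or untracked; in practice this is often a one-line shell step in the pipeline definition, but the Java version shows the idea:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    public class CleanWorkingCopyCheck
    {
        public static void main( String[] args ) throws IOException, InterruptedException
        {
            // Ask git for a machine-readable summary of modified and untracked files.
            Process process = new ProcessBuilder( "git", "status", "--porcelain" )
                .redirectErrorStream( true )
                .start();
            String status = new String( process.getInputStream().readAllBytes(), StandardCharsets.UTF_8 );
            process.waitFor();

            if( !status.isBlank() )
            {
                // A dirty working copy after the test run means that some audit output
                // changed unexpectedly, or that updated Audit Files were never committed.
                System.err.println( "Working copy is not clean after running tests:\n" + status );
                System.exit( 1 );
            }
        }
    }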

Non-deterministic noise reduction

For Audit Testing to work, our tests must be completely free from any sources of non-determinism, otherwise the Audit Files will be noisy, meaning that they will exhibit spurious differences from one test run to the next.

Thus, the following known best practices for testing are not just "good to know" anymore, they must be followed thoroughly and unfailingly:

  • Never use wall-clock time in tests; always fake the clock, making it start from some arbitrary fixed origin and incrementing by a fixed amount each time it is queried.
  • Never use random numbers or any other construct that utilizes them; if randomness is necessary in some scenario, then fake it using a pseudo-random number generator seeded with a known fixed value. If for some reason you have to use GUIDs/UUIDs, make sure to fake every single instance of them using a deterministic generator.
  • Never allow any multi-threading during testing; all components must be tested while running strictly single-threaded, or at the very least multi-threaded but in lock-step fashion.
  • Never allow any external factors such as file creation times, IP addresses resolved from DNS, etc. to enter into the tests. Fake your file-system; fake The Internet if necessary. For more information about faking stuff, see michael.gr - Software Testing with Fakes instead of Mocks.

In short, anything that would cause flakiness in software tests will cause noisiness in Audit Testing.
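
As an illustration of the first two points above, here is a minimal sketch of a fake clock and a seeded random source; the class name and the choice of origin and step are of course arbitrary:

    import java.time.Duration;
    import java.time.Instant;
    import java.util.Random;

    public class DeterminismExample
    {
        // Illustrative fake clock: starts at a fixed origin and advances by a fixed
        // step each time it is queried, so every test run sees identical timestamps.
        static class FakeClock
        {
            private Instant now = Instant.parse( "2024-01-01T00:00:00Z" );

            Instant now()
            {
                Instant result = now;
                now = now.plus( Duration.ofSeconds( 1 ) );
                return result;
            }
        }

        public static void main( String[] args )
        {
            FakeClock clock = new FakeClock();
            System.out.println( clock.now() ); // 2024-01-01T00:00:00Z on every run
            System.out.println( clock.now() ); // 2024-01-01T00:00:01Z on every run

            // A pseudo-random number generator seeded with a known fixed value
            // produces the exact same sequence on every run.
            Random random = new Random( 42 );
            System.out.println( random.nextInt( 100 ) );
        }
    }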

Deterministic noise reduction

Additionally, it is a good idea to eliminate even deterministic sources of noise in the tests, so that when the output changes as expected, the differences are no more numerous than the changes we intended to make. For example:

  • Replace plain hash maps with sorted or ordered hash maps. This is useful because a single key addition or removal may cause a plain hash map to start enumerating some or all of its keys in a different order.
  • When sorting data, use as many sorting keys as necessary to give every item a specific, unique position in the order, introducing additional sorting keys if needed. This is useful because the sorting order of items with identical keys is undefined, so the addition or removal of an item can cause other items with the same sorting key to be arbitrarily rearranged.

Deterministic noise reduction aims to ensure that for every unique mutation of the test data we see a corresponding unique change in the audit output, instead of a large number of irrelevant changes, which may cause the single change that matters to get lost in the noise. This makes it easier to determine that the modifications we made to the code have exactly the intended consequences and not any unintended consequences.

Note that in some cases, deterministic noise reduction can be implemented in the tests rather than in the production code.  For example, instead of replacing a plain hash map with an ordered hash map in production code, our test can obtain the contents of the plain map and sort them before writing them to the audit file.  However, this may not be possible in cases where the hash map is several transformations away from the auditing, so replacing a plain hash map with an ordered hash map is sometimes necessary in production code.
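
For instance, a test might copy the contents of a plain HashMap into a TreeMap to obtain a stable, key-sorted enumeration before writing it to the Audit File, along these lines (a minimal sketch):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.TreeMap;

    public class SortedAuditOutput
    {
        public static void main( String[] args )
        {
            // A plain HashMap may start enumerating its entries in a different
            // order after a single key is added or removed.
            Map<String,Integer> wordCounts = new HashMap<>();
            wordCounts.put( "apple", 3 );
            wordCounts.put( "banana", 1 );
            wordCounts.put( "cherry", 2 );

            // Copying into a TreeMap yields a key-sorted enumeration, so the audit
            // output only changes where the data actually changed.
            StringBuilder auditText = new StringBuilder();
            for( Map.Entry<String,Integer> entry : new TreeMap<>( wordCounts ).entrySet() )
                auditText.append( entry.getKey() ).append( " = " ).append( entry.getValue() ).append( '\n' );

            System.out.print( auditText ); // in a real test this would be written to the Audit File
        }
    }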

Deterministic noise reduction in production code can be either always enabled, or only enabled during testing. The most performant choice is to only have it enabled during testing, but the safest choice is to have it always enabled.

Failure Testing

Failure Testing is the practice of deliberately supplying the component-under-test with invalid input and ensuring that the component-under-test detects the error and throws an appropriate exception. Such scenarios can leverage Audit Testing by simply catching exceptions and serializing them to the Audit File.
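
For example, a failure-testing scenario might look like the following minimal sketch, where the parsing component is a hypothetical stand-in and each expected exception is recorded in the audit output as its type and message:

    import java.util.List;

    public class FailureAuditExample
    {
        // Stand-in for the real component-under-test.
        static int parsePositiveInt( String text )
        {
            int value = Integer.parseInt( text ); // throws NumberFormatException on malformed input
            if( value <= 0 )
                throw new IllegalArgumentException( "expected a positive integer, got " + value );
            return value;
        }

        public static void main( String[] args )
        {
            StringBuilder auditText = new StringBuilder();

            // Deliberately invalid inputs; each one is expected to make the component throw.
            for( String invalidInput : List.of( "", "-5", "not-a-number" ) )
            {
                try
                {
                    int value = parsePositiveInt( invalidInput );
                    auditText.append( "UNEXPECTED SUCCESS: " ).append( value ).append( '\n' );
                }
                catch( RuntimeException exception )
                {
                    // The exception type and message become part of the audit output,
                    // so any change in failure behavior shows up in the diff.
                    auditText.append( '"' ).append( invalidInput ).append( "\" -> " )
                        .append( exception.getClass().getSimpleName() ).append( ": " )
                        .append( exception.getMessage() ).append( '\n' );
                }
            }

            System.out.print( auditText ); // in a real test this would be written to the Audit File
        }
    }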

Applicability

Audit Testing is most readily useful when the component-under-test produces results as text, or results that are directly translatable to text. With a bit of effort, any kind of output can be converted to text, so Audit Testing is universally applicable.

Must Audit Files be committed?

It is in theory possible to refrain from storing Audit Files in the source code repository, but doing so would have the following disadvantages:

  • It would deprive the code reviewer of the convenience of being able to see not only the changes in the code, but also the differences that these changes have introduced in the audit output of the test.
  • It would require the developer to always remember to immediately run the tests each time they pull from the source code repository, so as to have the unmodified Audit Files produced locally, before proceeding to make modifications to the code which would further modify the Audit Files.
  • It would make it more difficult for the developer to take notice when the Audit Files change.
  • It would make it more difficult for the developer to see diffs between the modified Audit Files and the unmodified ones.

Of course all of this could be taken care of with some extra tooling. What remains to be seen is whether the effort of developing such tooling can be justified by the mere benefit of not having to store Audit Files in the source code repository.

Conclusion

Audit Testing is a universally applicable technique for automated software testing which can significantly reduce the effort of writing and maintaining tests by sparing us from having to stipulate our expectations in test code, and from having to go fixing test code each time our expectations change.




Cover image: "Audit Testing" by michael.gr.
