michael.gr: The confusion about the term Unit Testing

Virtually everyone claims to be doing Unit Testing, but there is a surprising amount of disagreement as to how unit testing is defined. Let us see what the authorities on the subject have to say about it. What follows is mainly quotations from reputable sources, with some minimal commentary by me.

Wikipedia

Let us begin by checking the Wikipedia entry for Unit Testing:

Unit testing, a.k.a. component or module testing, is a form of software testing by which isolated source code is tested to validate expected behavior.
Unit testing describes tests that are run at the unit-level to contrast testing at the integration or system level.

Further down, in the history section, Wikipedia lists some of the earliest known efforts at what we would today call unit testing, where the common theme is testing the smaller parts of a large software system separately before integrating them.

I am in full agreement with Wikipedia's definition, but Wikipedia is everyone's favorite source to cite when it agrees with their preconceptions, and to dismiss as untrustworthy when it does not. So, can we find any other definition that corroborates the above?

IEEE

In the Definitions section of IEEE 1008-1987 Standard for Software Unit Testing we read:

[Warning! wooden language ahead!]

test unit³: A set of one or more computer program modules together with associated control data, (for example, tables), usage procedures, and operating procedures that satisfy the following conditions:
1) All modules are from a single computer program
2) At least one of the new or changed modules in the set has not completed the unit test⁴
3) The set of modules together with its associated data and procedures are the sole object of a testing process

And the footnotes:

³ A test unit may occur at any level of the design hierarchy from a single module to a complete program. Therefore, a test unit may be a module, a few modules, or a complete computer program along with associated data and procedures.

⁴ A test unit may contain one or more modules that have already been unit tested.

As we can see, IEEE's definition says nothing about isolation; instead, it considers an entire set of modules, of which only one might need testing, as a unit.

So, we have found a source that contradicts Wikipedia. It is a tie. Now we need to find a third opinion, to form a majority.

Kent Beck

Surely Kent Beck, the inventor of Test-Driven Development and co-author of JUnit, must have defined the term, right? Well, as it turns out, no.

In his original "Simple Smalltalk Testing: With Patterns" paper the closest he gets to providing a definition is this sentence:

I recommend that developers write their own unit tests, one per class.

Can "one test per class" be regarded as a definition of the term? I do not think so. I do not think it even makes sense as a statement, with modern programming languages and tooling.

In Test Driven Development by Example (2002) the closest that Kent Beck gets to providing a definition is this sentence:

The problem with driving development with small scale tests (I call them “unit tests”, but they don’t match the accepted definition of unit tests very well) is that you run the risk [...]

So, Kent Beck seems to regard unit tests as small-scale tests, which is not really a definition, and he acknowledges that there exists some other, accepted definition, but he does not say what that definition is. Perhaps Kent Beck thinks of a unit test as a unit of testing, as in a unit of information or a unit of improvement, but we cannot be sure.

Although Kent Beck makes no other attempt to define the term, in the same book he does mention a couple of times that a unit test should be concerned with the externally visible behavior of a unit, not with its implementation.

As a result, it should come as no surprise to hear that Kent Beck does not use mocks. In the video Thoughtworks Hangouts: Is TDD dead? (youtube, text digest) at 21':10'' Kent Beck states:

My personal practice is I mock almost nothing.

Martin Fowler

One often-cited author who is known for defining terms and elucidating concepts is Martin Fowler. So, what does he have to say about unit testing?

Martin Fowler's page on Unit Test begins by acknowledging that it is an ill-defined term, and that the only characteristics of unit tests that people seem to agree on are that they are supposed to be a) small-scale, b) written by the programmers themselves, and c) fast. Then, Martin Fowler proceeds to talk about two schools of thought that understand the term differently:

The "classicist" school of thought, which favors "sociable" unit tests, places emphasis on testing the behavior of a component, allowing the component to interact with its collaborators and assuming that the collaborators are working correctly. Martin Fowler places himself in this school of thought.
The "mockist" school of thought, which favors "solitary" unit tests, insists on testing each component in isolation from its collaborators, and therefore requires that every collaborator must be replaced with a "test double" for the purpose of testing. Martin Fowler states that he respects this school of thought, but he does not belong to it.
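To make the difference between the two schools concrete, here is a minimal sketch in Python. The classes Invoice and TaxCalculator are hypothetical names invented for this illustration; they do not come from Martin Fowler's article. The first test is sociable, the second is solitary and uses the standard library's unittest.mock.

```python
from unittest.mock import Mock


class TaxCalculator:
    """Hypothetical collaborator: computes the tax for a net amount."""

    def tax_for(self, net: float) -> float:
        return round(net * 0.25, 2)


class Invoice:
    """Hypothetical unit under test: depends on a tax calculator."""

    def __init__(self, calculator):
        self._calculator = calculator

    def total(self, net: float) -> float:
        return net + self._calculator.tax_for(net)


def test_total_sociable():
    # Classicist style: the real collaborator takes part in the test
    # and is assumed to be working correctly.
    invoice = Invoice(TaxCalculator())
    assert invoice.total(100.0) == 125.0


def test_total_solitary():
    # Mockist style: the collaborator is replaced with a test double.
    calculator = Mock()
    calculator.tax_for.return_value = 20.0
    invoice = Invoice(calculator)
    assert invoice.total(100.0) == 120.0
    # The test also pins down how the collaborator was called.
    calculator.tax_for.assert_called_once_with(100.0)


test_total_sociable()
test_total_solitary()
```

Note that the sociable test would fail if TaxCalculator were broken, whereas the solitary test would not notice; conversely, the solitary test would fail if Invoice started obtaining the tax in some other, equally correct way.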

Okay, so this did not lead us to a single definition of unit testing, but at least it helped us further define two competing definitions.

It is also worth noting that Martin Fowler does not use mocks, either. In the video Thoughtworks Hangouts: Is TDD dead? (youtube, text digest) at 23':56'' Martin Fowler adds:

I'm with Kent, I hardly ever use mocks.

Robert C. Martin (Uncle Bob)

Among industry speakers, one of the most recognizable names is Robert C. Martin, a.k.a. Uncle Bob, author of the highly acclaimed book Clean Code. In his blog, under First-Class Tests he writes:

Unit Test: A test written by a programmer for the purpose of ensuring that the production code does what the programmer expects it to do.

This is not very useful. According to this definition, a unit test could be virtually anything.

Further down Uncle Bob gives a separate definition for integration tests, so maybe he regards the two as different, which would imply that he regards unit tests as testing units in isolation, but we cannot really be sure.

To confuse things, further down he mentions mocks only in the context of what he calls functional tests, so maybe he thinks of mocks as not belonging to unit tests (which then raises the question of how the unit tests can achieve isolation), but we cannot be sure about that, either.

One thing we can be sure of is that Uncle Bob is also not particularly in favor of mocks. On that same page we read:

I, for example, seldom use a mocking tool. When I need a mock (or, rather, a Test Double) I write it myself.

Note that Uncle Bob finds it important enough to state his preference for a test double rather than a mock. That is probably because what he writes himself is fakes, not mocks. (Both fakes and mocks are different kinds of test doubles, see Martin Fowler: Test Double and Martin Fowler: Mocks Aren't Stubs.)
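The distinction between a fake and a mock can be sketched as follows. This is only an illustration under my own assumptions: InMemoryUserStore and rename_user are hypothetical names, and the code is not taken from Uncle Bob's or Martin Fowler's writings. The fake is a hand-written, simplified but working implementation, and the test checks the resulting state; the mock is generated by unittest.mock, and the test checks which calls were made.

```python
from unittest.mock import Mock


class InMemoryUserStore:
    """Hand-written fake: a working but simplified user store."""

    def __init__(self):
        self._users = {}

    def save(self, user_id, name):
        self._users[user_id] = name

    def load(self, user_id):
        return self._users[user_id]


def rename_user(store, user_id, new_name):
    """Hypothetical unit under test."""
    store.load(user_id)  # with the fake, raises KeyError for unknown users
    store.save(user_id, new_name)


def test_rename_with_fake():
    store = InMemoryUserStore()
    store.save(1, "alice")
    rename_user(store, 1, "bob")
    # Verifies the outcome, not the calls that led to it.
    assert store.load(1) == "bob"


def test_rename_with_mock():
    store = Mock()
    rename_user(store, 1, "bob")
    # Verifies the exact interactions with the collaborator.
    store.load.assert_called_once_with(1)
    store.save.assert_called_once_with(1, "bob")


test_rename_with_fake()
test_rename_with_mock()
```

The test that uses the fake keeps passing if rename_user is rewritten to reach the same result through different calls; the test that uses the mock does not.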

Ian Cooper

An interestingly conflicting opinion comes from Ian Cooper, an outspoken TDD advocate.

In TDD, Where Did It All Go Wrong? (InfoQ 2017) Ian Cooper states that in TDD a unit test is defined as a test that runs in isolation from other tests, not a test that isolates the unit under test from other units. In other words, the unit of isolation is the test, not the unit under test.
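As a rough sketch of what isolation between tests (as opposed to isolation between units) might look like, consider the following Python code. The names Cart, PriceList and new_cart are hypothetical and are not taken from Ian Cooper's talk. Each test builds its own fresh objects, so the tests can run in any order, yet the unit under test is free to use its real collaborator.

```python
class PriceList:
    """Hypothetical collaborator: looks up prices by item name."""

    def __init__(self, prices):
        self._prices = dict(prices)

    def price_of(self, name):
        return self._prices[name]


class Cart:
    """Hypothetical unit under test: uses a real PriceList."""

    def __init__(self, price_list):
        self._price_list = price_list
        self._names = []

    def add(self, name):
        self._names.append(name)

    def total(self):
        return sum(self._price_list.price_of(n) for n in self._names)


def new_cart():
    # Every test gets its own fresh objects; nothing is shared.
    return Cart(PriceList({"book": 12, "pen": 3}))


def test_empty_cart():
    assert new_cart().total() == 0


def test_cart_with_items():
    cart = new_cart()
    cart.add("book")
    cart.add("pen")
    assert cart.total() == 15


# Because the tests share no state, the order does not matter.
for test in (test_cart_with_items, test_empty_cart):
    test()
for test in (test_empty_cart, test_cart_with_items):
    test()
```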

Ian Cooper obviously acknowledges that the prevailing understanding of unit tests is that they isolate the unit under test from other units, and he introduces a dissenting understanding, as if TDD were so radical that it justifies redefining long-established terms. This is at best a refreshingly different take on the subject, and at worst a completely unfounded mental acrobatic.

The notion that the term "unit" in unit testing refers to the test rather than the component-under-test is inadmissible, if for no other reason than that it is inconsistent with how we understand integration testing and end-to-end testing:

Integration testing is about running our tests on integrations of system components, not about running tests somehow integrated with each other;
End-to-end testing is about running our tests on our entire system as a whole, not about somehow stringing all of our tests together.

therefore:

Unit testing is about running our tests on individual components of our system, not about running the tests individually. (Although I grant you that having isolation between individual tests is also a good idea, when possible.)

It is worth noting that Ian Cooper also belongs to the ranks of those who do not approve of mocks. In the same talk, at 49':45'' he says:

I argue quite heavily against mocks because they are over-specified.

Glenford Myers

So far we have had only a moderate amount of luck in finding a majority opinion to define unit testing. Let us try to locate the original source of the term, shall we?

I do not know for sure that the first recorded use of the term is in the 1979 classic The Art of Software Testing by Glenford Myers, but the book is so old that it seems reasonable to suppose so.

The original 1979 edition (ISBN 9780471043287, 0471043281) is not easy to obtain, so I cannot ascertain this, but I strongly suspect that the term "unit" did not appear in it; instead, it was likely added in the 2nd edition, revised by other authors and published in 2004. Nonetheless, I think it is safe to assume that when back in 1979 Glenford Myers was writing of "module testing" what he meant was precisely that which we now call unit testing.

In chapter 5 "Module (Unit) Testing" of the 2nd edition we read:

Module testing (or unit testing) is a process of testing the individual subprograms, subroutines, or procedures in a program. That is, rather than initially testing the program as a whole, testing is first focused on the smaller building blocks of the program.

Later in the same chapter the author acknowledges this form of testing to be white-box testing:

Module testing is largely white-box oriented.

Further down, he even lays down the foundations of what later came to be known as mocks:

[...] since module B calls module E, something must be present to receive control when B calls E. A stub module, a special module given the name “E” that must be coded to simulate the function of module E, accomplishes this.
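In modern terms, the arrangement described in this quotation might look roughly like the following Python sketch. The function names are hypothetical, and passing the stub in as a parameter is my own simplification; in the book, the stub takes the place of the real module by bearing its name.

```python
def module_b(value, module_e):
    """Module under test: prepares a value and hands control to module E."""
    return module_e(value * 2) + 1


def stub_e(argument):
    """Stub standing in for module E: receives control and returns a
    canned result that simulates the function of the real module."""
    return 10


# Module B can now be tested before the real module E exists.
assert module_b(3, stub_e) == 11
```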

So, this definition is in line with Wikipedia's definition; we finally have a majority.

Conclusion

Although not unanimous, the prevailing opinion seems to be that the term unit refers to the component under test, and it is specifically called a unit because it is supposed to be tested in isolation from its collaborators, in contrast to integration testing and end-to-end testing where components are allowed to interact with their collaborators.

This prevailing opinion comes from:

Wikipedia
Glenford Myers
the mockist school of thought mentioned by Martin Fowler
hints about a popular understanding of unit testing outside of TDD, which Ian Cooper tries to redefine in the context of TDD.

A lot of the confusion seems to stem from the fact that testing a component in isolation requires mocking its collaborators, but almost all of the people cited in this research realize that the use of mocks is misguided. So, they either refrain from accurately defining the term, or try to give alternative definitions of it, or speak of different schools of thought. All of this looks like an attempt to legitimize violations of the requirement for isolation, so that they can still call what they do unit testing, even though it really is not.

Cover image: Created by michael.gr using ChatGPT, and then retouched to remove imperfections. The prompt used was: "Please give me an image of a crash test dummy in the style of The Thinker, by Auguste Rodin."

2025-04-06
