michael.gr: If you are using mock objects you are doing it wrong.

Mocking by michael.gr, based on 'mock' by 'Iconbox' from https://thenounproject.com/icon/mock-2657532/

Abstract:

The practice of using Mock Objects in automated software testing is examined from a critical point of view and found to be highly problematic. Opinions of some well known industry speakers are cited. The supposed benefits of Mock Objects are shown to be either no real benefits, or achievable via alternative means.

(Useful pre-reading: About these papers)

Introduction

The automated software testing technique which is predominant in the industry today is Unit Testing. The goal of Unit Testing is to achieve defect localization, and to this effect it requires each component to be tested in strict isolation from its collaborators.

Testing components in isolation from each other poses certain challenges:

While being tested, the component-under-test makes invocations to collaborator interfaces; since the collaborator components are not present, some kind of substitute must be there to implement the collaborator interfaces and receive those invocations.
For each invocation that the component-under-test makes to a collaborator, it expects to receive back some result; therefore, the substitute receiving the invocation must be capable of generating a result that matches the result that would be generated by the real collaborator.

The technique which is predominant in the industry today for providing the component-under-test with substitutes of its collaborators is Mock Objects, or just mocks.

How do mocks work?

Mocks are based on the premise that the real work done by collaborators in a production environment is irrelevant during testing, and all that the component-under-test really needs from them is the results that they return when invoked. A test exercises the component-under-test in a specific way, therefore the component-under-test is expected to invoke its collaborators in ways which are known in advance; thus, regardless of how the real collaborators would work, the mocks which replace them do not need to contain any functionality; all they need to do is to yield the same results that the real collaborators would have returned, which are also known in advance.

To this effect, each test dynamically creates and configures as many mocks as necessary to substitute each one of the collaborators of the component-under-test, with the help of some mocking framework. These frameworks are so popular that there exists a proliferation of them: JMock, EasyMock, Mockito, NMock, Moq, JustMock, and the list goes on.

A mock object is configured to expose the same interface as the real collaborator that it substitutes, and to expect specific methods of this interface to be invoked, with specific argument values, sometimes even in a specific order of invocation. If anything goes wrong, such as an unexpected method being invoked, or a parameter having an unexpected value, the mock fails the test. A very common practice is to also fail the test if an expected method is not invoked.

For each one of the expected methods, the mock is configured to yield a prefabricated result which is intended to match the result that the real collaborator would have produced if it was being used, and if it was working exactly according to its specification.

Or at least, that is the intention.

Drawbacks of Mocks

Complex and laborious

In each test it is not enough to invoke the component-under-test to perform a computation and check the results; we also have to configure a mock for each one of the collaborators of the component, to anticipate every single call that the component will be making to them while performing the computation, and for each call to fabricate a result which matches the result that the real collaborator would have returned from that call.
Luckily, mocking frameworks lessen the amount of code necessary to accomplish this, but no matter how terse the mocking code is, the fact still remains that it constitutes substantial additional functionality which represents considerable additional complexity.
One of the well-known caveats of software testing is that a test failure does not necessarily indicate a defect in the production code; it always indicates a defect either in the production code or in the test itself, and the only way to know is to troubleshoot. Thus, the more code we put in tests, and the more complex this code is, the more time we end up wasting in chasing and fixing bugs in the tests themselves rather than in the code that they are meant to test.

Over-specified

By anticipating every single call that the component-under-test makes to its collaborators, we are claiming to have detailed knowledge of the inner workings of the component-under-test, and we are concerned not only with what it accomplishes, but also with every little detail about how it goes on about accomplishing it. Essentially, we are implementing all of our application logic twice: once with production code expressing the logic in imperative mode, and once more with testing code expressing the same logic in expectational mode. In both cases, we write copious amounts of code describing what should happen in excruciatingly meticulous detail.
Note that over-specification might not even be a goal in and of itself in some cases, but with mocking it is unavoidable in all cases: Each request that the component-under-test sends to its collaborators could conceivably be ignored, but the component-under-test still needs to receive some meaningful result in response to that request, so as to continue functioning during the remainder of the test; unfortunately, the only way that mocks can fabricate individual responses is by anticipating individual requests, even if the intention of the test is not to verify whether the requests are being made.

Presumptuous

When using mocks we are claiming to not only have detailed knowledge of the calls that the component-under-test makes to its collaborators, but also detailed knowledge of the results that would be returned by the real collaborators in a production environment.
Furthermore, the results returned by a collaborator depend on the state that the collaborator is in, which in turn depends on previous calls made to it, but a mock is by its nature incapable of emulating state, so when using mocks we are also claiming to have knowledge of the state transitions that the real collaborators undergo in a production environment, and of the effect that these state transitions have on the results that they return.
Such exorbitant presumptuousness might be okay if we are building high-criticality software, where each collaborator is likely to have requirements and specification that are well-defined and unlikely to change; however, in all other software, which is regular, commercial, non-high-criticality software, things are a lot less strict: not only the requirements and specifications change all the time, but also, by established practice, both the requirements, and the specification, and even the documentation, tend to be the code itself, and the code changes every time a new commit is made to the source code repository. Thus, the only way to know exactly how a collaborator behaves tends to be to actually invoke it and see what it does, while the mechanism which ensures that it does what it is supposed to do is the tests of that collaborator itself, which are unrelated to the tests of components that invoke it.
As a result of all this, the practice of mocking often places us in the all too familiar situation where our Unit Tests all pass with flying colors, but our Integration Tests miserably fail because the behavior of the real collaborators turns out to be different from what the mocks assumed it would be.

Fragile

By its nature, a mock object has no option but to fail the test if the interactions between the component under test and its collaborators deviate from what it expects. However, these interactions may legitimately change as software evolves. This may happen due to the application of a bug-fix, due to refactoring, or simply because as we write new code we invariably have to also modify existing code to interact with the new code that we are adding. Thus, when using mocks, every time we change the behavior of production code, we also have to fix tests to expect the new behavior. (Not only do we have to write all of our application logic twice, we also have to perform all of its maintenance twice.)
The original promise of Automated Software Testing was to enable us to continuously evolve our software without fear of breaking it. The idea is that whenever you modify the production code, you can re-run the tests to ensure that everything still works. When using mocks this does not work, because every time you change the slightest thing in the production code, the tests break. As a result, many programmers are hesitant to make needed changes to production code because of all the changes in testing code that would be required. The understanding is growing within the software engineering community that mock objects actually hinder software development instead of facilitating it.

Non-reusable

Mocks exercise the implementation of a component rather than its interface. Thus, when using mocks, it is impossible to reuse the same testing code to validate multiple different components that implement the same public interface but employ different collaborators. For example:

It is impossible to completely rewrite the component and reuse the old tests to make sure that the new implementation works exactly as the old one did.
It is impossible to use a single test suite to exercise both a real component and its fake.

Unenlightening

Ideally, a set of tests for a certain component should act as sample code demonstrating usage scenarios of that component. A programmer who is not familiar with a particular component should be able to read the tests of that component and gain a fairly good idea of what it can do, what it cannot do, and how to write production code that interacts with it.
Unfortunately, when using mocks, the tests are full of cryptic mock-related jabber, which obscures the actual usage of the component-under-test, and so the enlightening bits are lost in the noise.

What do others say?

I am certainly not the only one to voice dissatisfaction with mocks. People have been noticing that although automated software testing is intended to facilitate refactoring by ensuring that the code still works after each change that we make, the use of mocks often hinders refactoring, because the tests are so tied to the implementation that you cannot change anything without breaking the tests.

In the video Thoughtworks - TW Hangouts: Is TDD dead? (youtube, text digest) at 21':10'' Kent Beck states "My personal practice is I mock almost nothing."
In the same video, at 23':56'' Martin Fowler adds "I'm with Kent, I hardly ever use mocks."
In the Fragile Test section of his book xUnit Test Patterns: Refactoring Test Code (xunitpatterns.com) author Gerard Meszaros admits that "extensive use of Mock Objects causes overcoupled tests."
In his presentation TDD, where did it all go wrong? (InfoQ, YouTube) at 49':32'' Ian Cooper states "I argue quite heavily against mocks because they are overspecified."

Note that in an attempt to avoid sounding too blasphemous, these people refrain from suggesting that mocks should be abolished; however, it is evident that 3 out of 4 of them are strongly against mocks, and we do not need to read much between the lines to figure out that they would probably be calling for the complete abolition of mocks if they had a viable and universally applicable alternative to propose.

So, if not mocking, then what?

Mocking has been such a great hit with the software industry because it achieves multiple different goals at once. Here is a list of the supposed benefits of mocking, and for each one of them an explanation of why it is not really a benefit, or how it can be achieved without mocking:

Mocking achieves defect localization by eliminating collaborators from the picture and allowing components to be tested in strict isolation from each other.

Defect localization is useful, but it is not an absolute necessity, and it does not have to be done to absolute perfection as mocking aims to do; we can achieve more than good enough defect localization by testing each component in integration with its collaborators, simply by arranging the order in which tests are executed to ensure that by the time a component gets tested, all of its collaborators have already passed their tests. See michael.gr - Incremental Integration Testing.

Mocking allows a component to be tested without the performance overhead of instantiating and invoking its real collaborators.

The performance overhead of instantiating and invoking the real collaborators is not always prohibitive, or even noticeable, so in many cases it is perfectly fine to test a component in integration with its real collaborators. See michael.gr - Incremental Integration Testing.
In the limited number of cases where the performance overhead is indeed prohibitive, it can be avoided with the use of Fakes instead of Mocks. See michael.gr - Software Testing with Fakes instead of Mocks.

Mocking allows us to examine invocations being made by the component-under-test to its collaborators, to ensure that they are issued exactly as expected.

In most cases, examining the invocations made by the component-under-test to its collaborators is in fact bad practice, because it constitutes white-box testing. The only reason why this is being widely practiced in the industry is because mocking does not work otherwise, so in this regard mocking contains a certain element of a self-serving paradigm.
In those rare cases where examining the invocations is in fact necessary, it is still bad practice to do so programmatically, because it results in tests that are over-specified and fragile.
What we can do instead is to record the interactions during each test run, visually compare the latest recording with that of the last known good run, and decide whether the differences match our expectations; if they do not match, then we must keep working on our code; but if they do match, then we are done without the need to go fixing any tests. See michael.gr - Audit Testing and michael.gr - Collaboration Monitoring.

Mocking allows us to fabricate the results returned from a collaborator to the component-under-test, so as to guarantee that they are free from defects that could be caused by bugs in the implementation of the real collaborator.

Fabricating the results that would have been returned by a real collaborator is in fact bad practice, because it will not magically make any bugs go away, (in this sense it can be likened to ostrich policy,) and because as I have already explained, it is highly presumptuous. The definitive authority on what results are returned by a certain collaborator is the real implementation of that collaborator, or a fake thereof, which in turn necessitates integration testing. See michael.gr - Incremental Integration Testing.

Mocking allows us to verify the correctness of components that generate their output by means of forwarding results to collaborators rather than by returning results from invocations.

Even in this case, Collaboration Monitoring can be used instead of mocking, to verify that the results are generated as expected without having to programmatically describe what the results should be and without having to go fixing tests each time we modify the component under test and deliberately change something about the results it generates. See michael.gr - Audit Testing and michael.gr - Collaboration Monitoring.

Mocking allows us to start testing a component while one or more of its collaborators are not ready yet for integration because they are still in development, and no fakes of them are available either.

This is true, but once the collaborators (or fakes thereof) become available, it is best to integrate them in the tests, and to unceremoniously throw away the mocks. See michael.gr - Incremental Integration Testing.

Mocking allows us to develop a component without depending on factors that we have no control over, such as the time of delivery of collaborators, the quality of their implementation, and the quality of their testing. With the use of Mocks we can claim that our component is complete and fully tested, based on nothing but the specification of its collaborators, and we can claim that it should work fine in integration with its collaborators when they happen to be delivered, and if they happen to work according to spec.

True, but this implies a very bureaucratic way of working, and utter lack of trust towards the developers of the collaborators; it is best if it never comes to that.
We can still avoid the use of mocks by creating fakes of the collaborators ourselves. See michael.gr - Software Testing with Fakes instead of Mocks.

To summarize, mocks can always be replaced with one or more of the following:

Fakes (see michael.gr - Software Testing with Fakes instead of Mocks)
Incremental Integration Testing (see michael.gr - Incremental Integration Testing)
Audit Testing (see michael.gr - Audit Testing) and Collaboration Monitoring (see michael.gr - Collaboration Monitoring)

Conclusion

As we have shown, the practice of using Mock Objects in automated software testing is laborious, over-specified, presumptuous, and leads to tests that are fragile and non-reusable, while each of the alleged benefits of using mocks is either not a real benefit, or can be realized by other means, which we have named.

Mandatory grumpy cat meme - "Mock objects - they are horrible"

2023-01-15

If you are using mock objects you are doing it wrong.