2022-10-19

Incremental Integration Testing


Abstract (descriptive)

A new method for Automated Software Testing is presented as an alternative to Unit Testing. The new method retains the benefit of Unit Testing, which is Defect Localization, but eliminates the need for mocking, thus greatly lessening the effort of writing and maintaining tests.

(Useful pre-reading: About these papers)

Abstract (informative)

Unit Testing aims to achieve Defect Localization by replacing the dependencies of the Component Under Test with Mocks. The use of Mocks is laborious, complicated, over-specified, presumptuous, and constitutes testing against the implementation, not against the interface, thus leading to brittle tests that hinder refactoring rather than facilitating it. To avoid these problems, a new method is proposed, called Incremental Integration Testing. The new method allows each component to be tested in integration with its dependencies, (or with Fakes thereof,) thus completely abolishing Mocks. Defect Localization is achieved by arranging the order in which tests are executed so that the dependencies of a component get tested before the component gets tested, and by stopping as soon as the first defect is encountered. Thus, when a test discovers a defect, we can be sufficiently confident that the defect lies in the component being tested, and not in any of its dependencies, because by that time all of its dependencies have already passed their tests.

The problem: Dependency-Induced Uncertainty

The goal of software testing is to exercise software components under various usage scenarios to ensure that they meet their requirements. The goal of automated software testing is to do so using software.

The simplest and most straightforward way to test a software component is to set up some input, invoke the component to perform a certain job, and then examine the output to ensure that it is what it is expected to be.
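
As a concrete (and purely illustrative) example, a JUnit test of a hypothetical StringUtils.reverse() function could look as follows; the class and method names are made up and not specific to any real codebase:

    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.assertEquals;

    public class StringUtilsTest
    {
        @Test
        public void reverse_reverses_a_string()
        {
            String input = "hello";                       // set up some input
            String output = StringUtils.reverse( input ); // invoke the component to perform a certain job
            assertEquals( "olleh", output );              // examine the output to ensure it is what we expect
        }
    }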

However, a software component rarely does its job all by itself; more often than not, it is part of a system, and it delegates parts of its job to other components in the system. When component A makes use of component B we say that A depends on B, or that B is a dependency of A. 

When testing a component that has dependencies, we are faced with a problem: the test may tell us that there is a defect somewhere, but it is unclear whether the defect lies in the component being tested, or in one or more of its dependencies. For the purposes of this paper I would like to give a name to this problem, so I will call it the problem of Dependency-Induced Uncertainty in Testing.

Ideally, we would like each software test to be conducted in such a way as to detect defects specifically in the component that is being tested, instead of extraneous defects in its dependencies; in other words, we would like to achieve Defect Localization, which is to say that we want to minimize Dependency-Induced Uncertainty in Testing.

The existing solution: Unit Testing

Unit Testing (wikipedia) was invented specifically in order to address the problem of Dependency-Induced Uncertainty in Testing. It takes an extremely drastic approach: if the use of dependencies introduces uncertainties, one way to eliminate those uncertainties is to eliminate the dependencies. Thus, Unit Testing aims to test each component in strict isolation. Hence, its name. 

To achieve this remarkably ambitious goal, Unit Testing refrains from supplying the component under test with the actual dependencies that it would normally receive in a production environment; instead, it supplies the component under test with specially crafted substitutes of its dependencies. There exist a few different kinds of substitutes, but by far the most widely used kind is Mocks.

Each Mock must be hand-written for every individual test that is performed; it exposes the same interface as the real dependency that it substitutes, and it expects specific methods of that interface to be invoked by the component-under-test, with specific argument values, sometimes even in a specific order of invocation. If anything goes wrong, such as an unexpected method being invoked, an expected method not being invoked, or a parameter having a wrong value, the Mock fails the test. When one of the expected methods is invoked, the Mock does nothing of the sort that the real dependency would do; instead, the Mock is hard-coded to yield a fabricated response which is intended to be exactly the same as the response that the real dependency would have produced if it were being used, and if it were working exactly according to its specification.
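
To make the above concrete, here is a minimal sketch of what such a test typically looks like in Java with the Mockito mocking framework; the PriceCalculator and TaxService names are made up purely for the sake of illustration:

    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.mockito.Mockito.*;

    public class PriceCalculatorTest
    {
        @Test
        public void calculates_price_including_tax()
        {
            // hand-craft a substitute for the real TaxService dependency
            TaxService taxService = mock( TaxService.class );
            // hard-code the fabricated response that the real dependency is presumed to produce
            when( taxService.rateFor( "NL" ) ).thenReturn( 0.21 );

            PriceCalculator priceCalculator = new PriceCalculator( taxService );
            assertEquals( 121.0, priceCalculator.totalFor( "NL", 100.0 ), 0.001 );

            // fail the test unless the expected method was invoked with the expected arguments
            verify( taxService ).rateFor( "NL" );
        }
    }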

Or at least, that is the intention.

Drawbacks of Unit Testing

  • Complex and laborious
    • In each test it is not enough to simply set up the input, invoke the component, and examine the output; we also have to anticipate every single call that the component will make to its dependencies, and for each call we have to set up a mock, expecting specific parameter values, and producing a specific response aiming to emulate the real dependency under the same circumstances.

    • Luckily, mocking frameworks lessen the amount of code necessary to accomplish this, but no matter how terse the mocking code is, the fact still remains that it implements a substantial amount of functionality which represents considerable complexity.

    • One of the well-known caveats of software testing at large (regardless of what kind of testing it is) is that a test failure does not necessarily indicate a defect in the production code; it always indicates a defect either in the production code, or in the test itself. The only way to know is to troubleshoot.

    • Thus, the more code we put in tests, and the more complex this code is, the more time we end up wasting in chasing and fixing bugs in the tests themselves rather than in the code that they are meant to test.

  • Over-specified

    • Unit Testing is concerned not only with what a component accomplishes, but also with every little detail about how the component goes about accomplishing it. This means that when we engage in Unit Testing we are essentially expressing all of our application logic twice: once with production code expressing the logic in imperative mode, and once more with testing code expressing the same logic in expectational mode. In both cases, we write copious amounts of code describing precisely what should happen in excruciatingly meticulous detail.
    • During testing, if the slightest thing happens that deviates from the expectations, the test fails. However, the behavior of a component may legitimately change as software evolves, but even the legitimate changes will invariably break the tests. This means that every time we change the behavior of the production code, we have to go fix all the tests to expect the new behavior.
    • The original promise of Automated Software Testing was to enable us to continuously evolve software without fear of breaking it. The idea was that whenever you make a modification to the software, you can re-run the tests to ensure that you have not broken anything. With Unit Testing, this does not work, because every time you change the production code you have to also change the tests, even if the requirements of the production code have not changed.

    • Note that over-specification might not even be a goal in and of itself in some cases of Unit Testing, but in all cases it is unavoidable due to the fact that Unit Testing eliminates the dependencies: the requests that the component under test sends to its dependencies could conceivably be routed into a black hole and ignored, but in order for the component under test to continue working, (so as to be tested,) it still needs to receive a meaningful response to each request; thus, in order to be able to produce the needed responses, the test has to expect each request, even if the intention of the test was not to know how, or even whether, the request is made.
  • Presumptuous

    • Each Unit Test claims to have detailed knowledge of not only how the component-under-test invokes its dependencies, but also how each real dependency would respond to each invocation in a production environment, which is a highly presumptuous thing to do. 

    • Such presumptuousness might be okay if we are building high-criticality software, where each dependency is likely to have requirements and a specification that are well defined in official documents and highly unlikely to change; however, in all other software, which is regular, commercial, non-high-criticality software, things are a lot less strict: not only do the requirements and specifications change all the time, but also, quite often, the requirements, the specification, and even the documentation are the code itself, and the code changes every time a new commit is made to the source code repository. This might not be ideal, but it is pragmatic, and it is established practice. Thus, the only way to know exactly how a component behaves tends to be to actually invoke the latest version of that component and see how it responds, while the mechanism which ensures that these responses are what they are supposed to be is that component's own tests, which are unrelated to the tests of the components that depend on it.

    • As a result of this, Unit Testing often places us in the all too familiar situation where our Unit Tests all pass with flying colors, but our Integration Tests miserably fail because the behavior of the real dependencies turns out to be different from what the mocks assumed it would be.
  • Fragile
    • Non-reusable

          The above disadvantages of Unit Testing are direct consequences of the fact that it is White-Box Testing by nature, because it is intentionally testing against the implementation, not against the interface. What we need to be doing instead is Black-Box testing, which means that Unit Testing should be avoided, despite the entire Software Industry's addiction to it.

          Note that I am not the only one, nor the first one, to voice dissatisfaction with Unit Testing with Mocks. People have been noticing that although tests are intended to facilitate refactoring by ensuring that the code still works after refactoring, tests often end up hindering refactoring, because they are so tied to the implementation that you can't refactor anything without breaking the tests. This problem has been identified by renowned personalities such as Martin Fowler and Ian Cooper, and even by Kent Beck, the inventor of Test-Driven Development (TDD).

          In the video Thoughtworks - TW Hangouts: Is TDD dead? (youtube) at 21':10'' Kent Beck says "My personal practice is I mock almost nothing" and at 23':56'' Martin Fowler says "I'm with Kent, I hardly ever use mocks". In the Fragile Test section of his book xUnit Test Patterns: Refactoring Test Code (xunitpatterns.com) author Gerard Meszaros states that extensive use of Mock Objects causes overcoupled tests. In his presentation TDD, where did it all go wrong? (InfoQ, YouTube) at 49':32'' Ian Cooper says "I argue quite heavily against mocks because they are overspecified."

          Note that in an attempt to avoid sounding too blasphemous, none of these people call for the complete abolition of mocks; they only warn against the excessive use of mocks. Furthermore, they seem to have little to say about any alternative means of avoiding Dependency-Induced Uncertainty (achieving Defect Localization), and yet they continue to call what they do Unit Testing despite the fact that they do not seem to be isolating the components under test.

          Ian Cooper even goes as far as to suggest that in the context of Test Driven Development (TDD) the term Unit Testing does not refer to isolating the components under test from each other, but rather to isolating the tests from each other. With this feat of mental acrobatics he achieves the best of both worlds: he can disavow mocks while continuing to call what he practices Unit Testing. (Because apparently, to tell people that Unit Testing is wrong is way too blasphemous even for a Software Engineer with a 20 cm long beard, extensive tattoos, and large hollow earrings.) I do fully agree that it is the tests that should be kept isolated, but I consider this re-definition of the term to be arbitrary and unwarranted. Unit Testing has already been defined, its definition is quite unambiguous, and according to this definition, it is problematic; so, instead of trying to change the definition, we must abandon Unit Testing and start doing something else, which requires a new name.

          A new solution: Incremental Integration Testing

          If we were to abandon Unit Testing, then one might ask what we should be doing instead. Obviously, we must somehow continue testing our software, and we would like to also continue doing so without Dependency-Induced Uncertainty.

          As it turns out, eliminating the dependencies is just one way of dealing with Dependency-Induced Uncertainty; another, more pragmatic approach is as follows:

          Allow each component to be tested in integration with its dependencies, but only after each one of the dependencies has undergone its own testing, and has successfully passed it.
          Thus, any observed malfunction can be attributed with a high level of confidence to the component being tested, and not to any of its dependencies, because the dependencies have already been tested.

          I call this Incremental Integration Testing.

          An alternative way of arriving at the idea of Incremental Integration Testing begins with the philosophical observation that strictly speaking, there is no such thing as a Unit Test; there always exist dependencies which by established practice we never mock and invariably integrate in Unit Tests without blinking an eye; these are, for example:

          • Many of the external libraries that we use.
          • Most of the functionality provided by the Runtime Environment in which our system runs. 
          • Virtually all of the functionality provided by the Runtime Library of the language we are using.

          Nobody mocks standard collections such as array-lists, linked-lists, hash-sets, and hash-maps; very few people bother with mocking filesystems; nobody would mock an advanced math library, a serialization library, and the like; even if one was so paranoid as to mock those, at the extreme end, nobody mocks the MUL and DIV instructions of the CPU; so clearly, there are always some things that we take for granted.

          We allow ourselves the luxury of taking these things for granted because we believe that they have been sufficiently tested by their respective creators and can be reasonably assumed to be free of defects. So, why not also take our own creations for granted once we have tested them? Are we testing them sufficiently or not?

          Prior Art

          An internet search for "Incremental Integration Testing" does yield some results. An examination of those results reveals that they are referring to some strategy for integration testing which is meant to be performed manually by human testers, constitutes an alternative to big-bang integration testing, and requires full Unit Testing of the traditional kind to have already taken place. I am hereby appropriating this term, so from now on it shall mean what I intend it to mean. If a context ever arises where disambiguation is needed, the terms "automated" vs. "manual" can be used.

          Implementing the solution: the poor man's approach

          As explained earlier, Incremental Integration Testing requires that when we test a component, all of its dependencies must have already been tested. Thus, Incremental Integration Testing necessitates exercising control over the order in which tests are executed.

          Most testing frameworks execute tests in alphanumeric order, so if we want to change the order of execution all we have to do is to appropriately name the tests, and the directories in which they reside.

          For example:

          Let us suppose that we have the following modules:

          com.acme.alpha_depends_on_bravo
          com.acme.bravo_depends_on_nothing
          com.acme.charlie_depends_on_alpha

          Note how the modules are listed alphanumerically, but they are not listed in order of dependency.

          Let us also suppose that we have one test suite for each module. By default, the names of the test suites follow the names of the modules that they test, so again, a listing of the test suites in alphanumeric order does not match the order of dependency of the modules that they test:

          com.acme.alpha_depends_on_bravo_tests
          com.acme.bravo_depends_on_nothing_tests
          com.acme.charlie_depends_on_alpha_tests

          To achieve Incremental Integration Testing, we add a suitably chosen prefix to the name of each test suite, as follows:

          com.acme.T02_alpha_depends_on_bravo_tests
          com.acme.T01_bravo_depends_on_nothing_tests
          com.acme.T03_charlie_depends_on_alpha_tests

          Note how the prefixes have been chosen in such a way as to establish a new alphanumerical order for the tests. Thus, an alphanumeric listing of the test suites now lists them in order of dependency of the modules that they test: 

          com.acme.T01_bravo_depends_on_nothing_tests
          com.acme.T02_alpha_depends_on_bravo_tests
          com.acme.T03_charlie_depends_on_alpha_tests

          At this point Java programmers might object that this is impossible, because in Java, the tests always go in the same module as the production code, directory names must match package names, and test package names always match production package names. Well, I have news for you: they don't have to. The practice of doing things this way is very widespread in the Java world, but there are no rules that require it: the tests do not in fact have to be in the same module, nor in the same package as the production code. The only inviolable rule is that directory names must match package names, but you can call your test packages whatever you like, and your test directories accordingly. Java developers tend to place tests in the same module as the production code simply because the tools (maven) have a built-in provision for this, without ever questioning whether there is any actual benefit in doing so. (There isn't. As a matter of fact, in the DotNet world there is no such provision, and nobody complains.) Furthermore, Java developers tend to place tests in the same package as the production code for no purpose other than to make package-private entities of their production code accessible from their tests, but this is testing against the implementation, not against the interface, and therefore misguided. So, I know that this is a very hard thing to ask from most Java programmers, but trust me, if you would only dare to take a tiny step off the beaten path, if you would for just once do something for reasons other than "everyone else does it", you can very well do the renaming necessary to achieve Incremental Integration Testing.
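
          For example, nothing prevents a maven build from having a layout along the following lines; the module, directory, and package names here are of course illustrative, not prescriptive:

              alpha/                                          (production module)
                  src/main/java/com/acme/alpha/...
              alpha-tests/                                    (separate test module, depending on alpha)
                  src/test/java/com/acme/T02_alpha_tests/...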

          Now, admittedly, renaming tests in order to achieve a certain order of execution is not an ideal solution. It is awkward, it is thought-intensive since we have to figure out the right order of execution by ourselves, and it is error-prone because there is nothing to guarantee that we will get the order right. That's why I call it "the poor man's solution".  Let us now see how all of this could be automated.

          Implementing the solution: the automated approach

          Here is an algorithm to automate Incremental Integration Testing (a code sketch follows the list):

          1. Begin by building a model of the dependency graph of the entire software system.
            • This requires system-wide static analysis to discover all components in our system, and all dependencies of each component. I did not say it was going to be easy.
            • The graph should not include external dependencies.
          2. Test each leaf node in the model.
            • A leaf node in the dependency graph is a node which has no dependencies; at this level, a Unit Test is indistinguishable from an Integration Test, because there are no dependencies to either integrate or mock.
          3. If any malfunction is discovered during step 2, then stop as soon as step 2 is complete.
            • If a certain component fails to pass its test, it is counter-productive to proceed with the tests of components that depend on it. Unit Testing seems to be completely oblivious to this little fact; Incremental Integration Testing fixes this.
          4. Remove the leaf nodes from the model of the dependency graph.
            • Thus removing the nodes that were previously tested in step 2, and obtaining a new, smaller graph, where a different set of nodes are now the leaf nodes. 
            • The dependencies of the new set of leaf nodes have already been successfully tested, so they are of no interest anymore: they are as good as external dependencies now.
          5. Repeat starting from step 2, until there are no more nodes left in the model.
            • Allowing each component to be tested in integration with its dependencies, since they have already been tested.
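
          The following is a rough sketch, in Java, of the ordering logic described above. It assumes that the dependency graph has already been obtained somehow (for example via static analysis) and is given as a map from each component to the set of components that it depends on, and that a runTestsOf() callback runs the test suite of a single component and reports whether it passed; neither of these is part of any existing framework.

              import java.util.*;
              import java.util.function.Predicate;
              import java.util.stream.Collectors;

              public class IncrementalIntegrationTestRunner
              {
                  // graph: for each component, the set of (internal) components that it depends on.
                  // Returns true if all tests passed, false if testing was stopped because of a failure.
                  public static boolean runAll( Map<String,Set<String>> graph, Predicate<String> runTestsOf )
                  {
                      // work on a private copy of the graph, because we will be mutating it.
                      Map<String,Set<String>> remaining = new HashMap<>();
                      graph.forEach( ( component, dependencies ) -> remaining.put( component, new HashSet<>( dependencies ) ) );
                      while( !remaining.isEmpty() )
                      {
                          // step 2: the leaf nodes are the components with no untested dependencies left.
                          List<String> leaves = remaining.entrySet().stream()
                              .filter( entry -> entry.getValue().isEmpty() )
                              .map( Map.Entry::getKey )
                              .sorted()
                              .collect( Collectors.toList() );
                          if( leaves.isEmpty() )
                              throw new IllegalStateException( "cyclic dependencies among: " + remaining.keySet() );
                          boolean anyMalfunction = false;
                          for( String leaf : leaves )
                              anyMalfunction |= !runTestsOf.test( leaf );
                          // step 3: if anything failed, stop as soon as this round is complete.
                          if( anyMalfunction )
                              return false;
                          // step 4: remove the tested leaves; their dependents may now become leaves themselves.
                          leaves.forEach( remaining::remove );
                          remaining.values().forEach( dependencies -> dependencies.removeAll( leaves ) );
                          // step 5: repeat until there are no more nodes left.
                      }
                      return true;
                  }
              }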

          No testing framework that I know of (JUnit, NUnit, etc.) is capable of doing any of the above; for this reason, I have developed a utility called Testana which does exactly that.

          Testana will analyze a system to discover its structure, will analyze modules to discover dependencies and tests, and will run the tests in the right order so as to achieve Incremental Integration Testing. It will also do a few other nice things, like examine timestamps and refrain from running tests whose dependencies have not changed.

          Testana currently supports Java projects under maven, with JUnit-style tests. For more information, see michael.gr - GitHub project: mikenakis-testana.

          What if my dependencies are not discoverable?

          Some very trendy practices of our modern era include:

          • Using scripting languages, where there is no notion of types, and therefore no possibility of discovering dependencies via static analysis.
          • Breaking up systems into separate source code repositories, so there is no single system on which to perform system-wide static analysis to discover dependencies.
          • Incorporating multiple different programming languages in a single system, (following the polyglot craze,) thus hindering system-wide static analysis, since it now needs to be performed across different languages.
          • Making modules interoperate not via normal programmatic interfaces, but instead via various byzantine mechanisms such as REST, whose modus operandi is binding by name, thus making dependencies undiscoverable.

          If you are following any of the above trendy practices, then you cannot programmatically discover dependencies, so you have no way of automating Incremental Integration Testing. Thus, you will have to specify by hand the order in which your tests will run, and you will have to keep maintaining this order by hand.

          Sorry, but ill-advised architectural choices do come with consequences.

          What about performance?

          One might argue that Incremental Integration Testing does not address one very important issue which is very well taken care of by Unit Testing with Mocks, and that issue is performance:

          • When dependencies are replaced with Mocks, the tests tend to be fast.
          • When actual dependencies are integrated, such as file systems, relational database management systems, messaging queues, and what not, the tests can become very slow. 

          Is there anything we can do about this?

          To address the performance issue I recommend the use of Fakes, not Mocks.

          One book that names and describes Fakes, Mocks, etc. is "xUnit Test Patterns: Refactoring Test Code" by Gerard Meszaros, (xunitpatterns.com) though I have read about them from martinfowler.com - TestDouble. In short, a Fake is a module that offers the complete functionality of the real module that it substitutes, (or at any rate the subset of that functionality that we have a use for,) but is more suitable for testing than the real thing, usually by being much more lightweight and much faster than the real thing. Fakes usually achieve this by means of some severe compromise, such as:

          • Having limited capacity.
          • Not being scalable.
          • Not being distributed.
          • Not persisting anything to storage.

          For example:

          • Various in-memory file-system libraries exist for various platforms, which can be used in place of the actual file-systems of those platforms.
          • In Java, HSQLDB and H2 are in-memory databases that can be used in place of an actual RDBMS.
          • In DotNet, EntityFramework allows the creation of an in-memory DbContext.
          • EmbeddedKafka can be used in place of an actual pair of Kafka + Zookeeper instances.
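
          Taking the in-memory database example above, here is a minimal sketch of a test that integrates a real (albeit in-memory) database instead of mocking one; it merely assumes that the H2 driver and JUnit are on the class path:

              import java.sql.*;
              import org.junit.jupiter.api.Test;
              import static org.junit.jupiter.api.Assertions.assertEquals;

              public class InMemoryDatabaseTest
              {
                  @Test
                  public void an_in_memory_database_stands_in_for_the_real_rdbms() throws SQLException
                  {
                      // "jdbc:h2:mem:" yields a throw-away, in-memory database: real SQL behavior, nothing persisted to storage.
                      try( Connection connection = DriverManager.getConnection( "jdbc:h2:mem:testdb" );
                           Statement statement = connection.createStatement() )
                      {
                          statement.execute( "CREATE TABLE customer ( id INT PRIMARY KEY, name VARCHAR(100) )" );
                          statement.execute( "INSERT INTO customer VALUES ( 1, 'ACME' )" );
                          try( ResultSet resultSet = statement.executeQuery( "SELECT name FROM customer WHERE id = 1" ) )
                          {
                              resultSet.next();
                              assertEquals( "ACME", resultSet.getString( "name" ) );
                          }
                      }
                  }
              }

          In an actual Incremental Integration Test the SQL statements would of course be issued by the component under test, which would simply be handed a connection (or a DataSource) pointing at the in-memory database.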

          Note that the terminology is a bit unfortunate: Fakes are in fact a lot less fake than Mocks; Mocks are the ultimate in fakery; Fakes actually support the functionality of the real thing, while the compromises that they make in order to achieve this tend to be irrelevant when testing.

          By supplying a component under test with a Fake instead of a Mock we benefit from great performance, while utilizing a dependency which has already been tested by its creators and can be reasonably assumed to be free of defects. In doing so, we continue to avoid White-Box Testing and we keep Dependency-Induced Uncertainty at a minimum.

          Furthermore, nothing prevents us from having our CI/CD server run the test of each component twice:

          • Once in integration with Fakes
          • Once in integration with the actual dependencies

          This will be slow, but CI/CD servers generally do not mind. The benefit of doing this is that it gives further guarantees that everything works as intended.

          Developing Fakes

          In some cases we may want to create a Fake ourselves, as a substitute of one of our own modules. Not only will this allow dependent components to start their testing as early as possible without the need for Mocks, but also, a non-negligible part of the effort invested in the creation of the Fake will be reusable in the creation of the real thing, while the process of creating the Fake is likely to yield valuable lessons which can guide the creation of the real thing. Thus, any effort that goes into creating a Fake of a certain module represents a much better investment than the effort of creating a multitude of throw-away Mocks for various isolated operations on that module. 

          One might argue that keeping a Fake side-by-side with the real thing may represent a considerable additional maintenance overhead, but in my experience the overhead of doing so is nowhere near the overhead of maintaining a proliferation of mocks for the real thing. 

          • Each time the implementation of the real thing changes without any change to its specification, (such as, for example, when applying a bug fix,) some mocks must be modified, some must even be rewritten, while the Fake usually does not have to be touched at all.
          • When the specification of the real thing changes, both the Mocks have to be rewritten, and the Fake has to be modified, but the beauty of the Fake is that it is a self-contained module which implements a known abstraction, so it is easy to maintain, whereas every single snippet of mocking code is nothing but incidental complexity, and thus hard to maintain.
          • In either case, a single change in the real thing will require at most a single corresponding change in the Fake, whereas if we are using Mocks we invariably have to change a large number of mock snippets scattered throughout the test suites.

          Furthermore, once we start making use of Incremental Integration Testing to free ourselves from the burden of White-Box Testing, new possibilities open up which greatly ease the development of Fakes: it is now possible to write a test for a certain module, and then reuse that test in order to test its Fake. The test can be reused because it is a Black-Box Test: it does not care how the module works internally, so it can test the real thing just as well as the Fake of the real thing. Once we run the test on the real thing, we run the same test on the Fake, and if both pass, then from that moment on we can continue using the Fake in place of the real thing when testing anything that depends on it.
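
          One way of setting this up, sketched below with the Directory interface and the RealDirectory and FakeDirectory classes being purely illustrative, is an abstract, black-box test class that exercises nothing but the interface, plus one trivial subclass per implementation:

              import org.junit.jupiter.api.Test;
              import static org.junit.jupiter.api.Assertions.assertEquals;

              // The black-box test: it knows only the interface, not any implementation.
              public abstract class DirectoryTest
              {
                  protected abstract Directory newDirectory(); // each subclass supplies an implementation

                  @Test
                  public void a_stored_entry_can_be_looked_up()
                  {
                      Directory directory = newDirectory();
                      directory.store( "alice", "alice@example.com" );
                      assertEquals( "alice@example.com", directory.lookup( "alice" ) );
                  }
              }

              // The very same tests run against the real thing...
              class RealDirectoryTest extends DirectoryTest
              {
                  @Override protected Directory newDirectory() { return new RealDirectory(); }
              }

              // ...and against its Fake; once both pass, the Fake can stand in for the real thing.
              class FakeDirectoryTest extends DirectoryTest
              {
                  @Override protected Directory newDirectory() { return new FakeDirectory(); }
              }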

          Finally, if we are using an external component for which no Fake is available, we may wish to create a Fake for it ourselves. First, we write a test suite which exercises the external component, not really looking for defects in it, but instead using its behavior as a reference for writing the tests. Once we have built our test suite so that it passes against the behavior of the external component, we can reuse it against the Fake, and if it also passes, then we have sufficient reason to believe that the behavior of the Fake matches the behavior of the external component. In an ideal world where everyone would be practicing Black-Box Testing, we should even be able to obtain from the creators of the external component the test suite that they have already built for testing their creation, and use it to test our Fake. (And in an even more ideal world, anyone who develops a component for others to use would be shipping it together with its Fake, so that nobody needs to get dirty with its test suite.)

          Benefits

          Incremental Integration Testing has the following benefits:

          • It greatly reduces the effort of writing and maintaining tests, by eliminating the need for mocking code in each test.
          • It allows our tests to engage in Black-Box Testing instead of White-Box Testing. For an in-depth discussion of what is wrong with White-Box Testing, please read michael.gr - White-Box vs. Black-Box Testing.
          • It makes tests more effective and accurate, by eliminating assumptions about the behavior of the real dependencies.
          • It simplifies our testing operations by eliminating the need for two separate testing phases, one for Unit Testing and one for Integration Testing.
          • It is unobtrusive, since it does not dictate how to construct the tests, it only dictates the order in which the tests should be executed.

          Disadvantages (and counter-arguments)

          • It assumes that a component which has been tested is free of defects.
            • Argument: 
              • A well-known caveat of software testing is that it cannot actually prove that software is free from defects, because it necessarily only checks for defects that we have anticipated and tested for. As Edsger W. Dijkstra famously put it, "program testing can be used to show the presence of bugs, but never to show their absence!"
            • Counter-arguments:
              • I am not claiming that once a component has been tested, it has been proven to be free from defects; all I am saying is that it can reasonably be assumed to be free from defects. Incremental Integration Testing is not meant to be a perfect solution; it is meant to be a pragmatic solution.
              • The fact that testing cannot prove the absence of bugs does not mean that everything is futile in this vain world, and that we should abandon all hope in despair: testing might be imperfect, but it is what we can do, and it is in fact what we do, and practical, real-world observations show that it is quite effective.
              • Most importantly: Any defects in an insufficiently tested component will not magically disappear if we mock that component in the tests of its dependents.
                • In this sense, the practice of mocking dependencies can arguably be likened to Ostrich policy. (Ostrich Policy on Wikipedia).
                • On the contrary, continuing to integrate that component in subsequent tests gives us incrementally more opportunities to discover defects in it.
          • It does not completely eliminate Dependency-Induced Uncertainty in testing.
            • Argument:
              • If a certain component has defects which were not detected when it was tested, then these defects will cause Dependency-Induced Uncertainty when testing components that depend on it.
            • Counter-arguments:
              • It is true that Incremental Integration Testing suffers from Dependency-Induced Uncertainty when dependencies have defects despite having already been tested. It is also true that Unit Testing with Mocks does not suffer at all from Dependency-Induced Uncertainty when dependencies have defects; but then again, neither does it detect those defects. For that, it is necessary to always follow a round of Unit Testing with a round of Integration Testing. However, when the malfunction is finally observed during Integration Testing, we are facing the exact same problem that we would have faced if we had done a single round of Incremental Integration Testing instead: a malfunction is being observed which is not due to a defect in the root component of the integration, but instead due to a defect in some unknown dependency. The difference is that Incremental Integration Testing gets us there faster.
              • Let us not forget that the primary goal of software testing is to guarantee that software works as intended, and that the elimination of Dependency-Induced Uncertainty is an important but nonetheless secondary goal. Incremental Integration Testing goes a long way towards reducing Dependency-Induced Uncertainty, but it stops short of completely eliminating it, in exchange for other conveniences, such as making tests far easier to write and maintain. So, it all boils down to whether Unit Testing represents overall more or less convenience than Incremental Integration Testing. I assert that Incremental Integration Testing is unquestionably far more convenient than Unit Testing.
          • It only tests behavior; it does not check what is going on under the hood.
            • Argument:
              • With Unit Testing, you can ensure that a certain module not only produces the right results, but also that it follows an expected sequence of steps to produce those results. With Incremental Integration Testing you cannot observe the steps, you can only check the results. Thus, the internal workings of a component might be slightly wrong, or less than ideal, and you would never know.
            • Counter-argument:
              • This is true, and this is why Incremental Integration Testing might be unsuitable for high-criticality software, where White-Box Testing is the explicit intention, since it is necessary to ensure not only that the software  produces correct results, but also that its internals are working exactly according to plan. However, Incremental Integration Testing is not being proposed as a perfect solution, it is being proposed as a pragmatic solution: the vast majority of software being developed in the whole world is regular, commercial-grade, non-high-criticality software, where Black-Box Testing is appropriate and sufficient, since all that matters is that the requirements are met. Essentially, Incremental Integration Testing represents the realization that in the general case, tests which worry not only about the behavior, but also about the inner workings of a component, constitute over-engineering. For a more in-depth discussion about this, please read michael.gr - White-Box vs. Black-Box Testing.
              • In order to make sure that everything is happening as expected under the hood, you do not have to stipulate in excruciating detail what should be happening, and you do not have to fail the tests at the slightest sign of deviation from what was expected. Another way of ensuring the same thing is to simply:
                • Gain visibility into what is happening under the hood
                • Be notified when something different starts happening
                • Examine what is now different
                • Vouch for the differences being expected.
          For more details about this, see the appendix about Interaction Visibility.
          • It only tests behavior which is exposed by the public interface of a component.
            • Argument:
              • When the component under test invokes a dependency, we regard this to be part of the behavior of that component, therefore the dependency must be mocked so as to verify that the invocation is being made as expected. Another way of expressing this is by proposing that the interface of a component is not limited to the functionality that the component publicly exposes for other components to invoke, but it also includes interactions between the component and its dependencies.
            • Counter-argument:
              • This is a very twisted view of software architecture. The right view is as follows:
                • Interfaces must represent abstraction boundaries.
                • Abstractions must be complete. (Non-partial.)
                • Abstractions must be airtight. (Non-leaky.)
                • Therefore, the public interface of a component must be the only means necessary for eliciting any desired behavior from it, and also the only means necessary for surveying its behavior.
                • Thus, all dependencies must be thought of as internal dependencies; in other words, they must in fact always be strictly private, and interactions between a component and its dependencies may never be thought of as being part of the public behavior of the component.
                • When component A must invoke component B, not because A needs B in order to function, but because the system needs B to be invoked for the system to function, then this fact must be exposed in the public interface of A. This means that we have to pass B to A as a parameter, in the public interface of A. Parameters are not dependencies, so the rule which requires dependencies to be private is upheld.
          The benefit of the above with respect to testing is that all components can at all times be tested as Black-Boxes.
          • It prevents us from picking a single test and running it.
            • Argument:
              • With Unit Testing, we can pick any individual test and run it. With Incremental Integration Testing, running an individual test of a certain component is meaningless unless we first run the tests of the dependencies of that component.
            • Counter-argument:
              • This is only true if the dependencies have changed.
                • If the dependencies have not changed, then you do not have to re-run their tests, you can simply go ahead and run only the tests of the component that has changed.
                • If the dependencies have changed, then you must in fact run their tests first, otherwise the individual test that you picked to run is meaningless.
              • If you are unsure as to exactly what has changed, or what the dependencies are, then consider using a tool like Testana, which figures all this out for you. See michael.gr - GitHub project: mikenakis-testana.
          • It requires additional tools.
            • Argument:
              • Incremental Integration Testing is not supported by any of the popular testing frameworks, which means that in order to start practicing it, new tools are necessary.
              • Since Incremental Integration Testing is brand new, obtaining such tools might be very difficult, if not impossible.
              • Furthermore, such tooling is going to be non-trivial to build, because it has to do advanced stuff like system-wide static analysis.
            • Counter-argument:
              • My intention is to show the way; if people see the way, the tools will come.
              • If you are using Java with maven and JUnit, there is already a tool that you can use, see michael.gr - GitHub project: mikenakis-testana.
              • Even in the absence of tools, it is possible to start experimenting with Incremental Integration Testing by following the poor-man's approach, which consists of simply naming the tests, and the directories in which they reside, in such a way that your existing testing framework will run them in the right order. This approach is described in detail in the corresponding section of this paper.

          Is Unit Testing with Mocks good for anything?

          Unit Testing with Mocks is useful in a few scenarios that I can think of:

          • Unit Testing with Mocks is useful if we want to start testing a component while one or more of its dependencies are not ready yet for integration because they are still in development, and no Fakes of them are available either. Having said that, I must add that once the dependencies (or Fakes thereof) become available, it is best to start using them, and to unceremoniously throw away the Mocks.
          • Unit Testing with Mocks is useful in high-criticality software, where the specifications of software components are detailed in official documents that very rarely change, every round of Unit Testing is followed by a round of Integration Testing, and the goal usually is to ensure not only that a component exhibits the correct behavior, but also that it interacts with its dependencies exactly as expected. In such cases, Mocks are useful for simulating the behavior of the dependencies strictly as described in their specification documents, regardless of the possibility that the tests of those dependencies may have failed to detect defects in them. (This is somewhat paranoid, but when testing high-criticality software, paranoia is the order of the day.) Having said that, I must add that even in the case of high-criticality software, the goal of ensuring that a component interacts with its dependencies exactly as expected does not require stipulating these interactions in testing code; it can be achieved in a much more cost-effective way by means of Interaction Visibility; see the related appendix.
          • Unit Testing with Mocks is useful when the developers of a certain component do not want the quality and thoroughness of their work to depend on things that they have no control over, such as the time of delivery of dependencies, the quality of their implementation, and the quality of their testing. (In other words, when the developers of a certain component do not trust the developers of its dependencies.) With the use of Mocks we can claim that our module is complete and fully tested, based on nothing but the specification of its dependencies, and we can claim that it should work fine in integration with its dependencies when they happen to be delivered, and if they happen to work according to spec.

          Conclusion

          Unit Testing was invented in order to eliminate Dependency-Induced Uncertainty in Testing, but as we have shown, it is laborious, complicated, over-specified, presumptuous, and constitutes White-Box Testing. Incremental Integration Testing is a pragmatic approach for non-high-criticality software which minimizes Dependency-Induced Uncertainty without using mocks, and in so doing it greatly reduces the effort of developing and maintaining tests, it avoids presumptuousness, and it allows us to remain strictly within the realm of Black-Box Testing.

          Appendix: Interaction Visibility

          This appendix is work in progress. It gives a preliminary description of a mechanism which I am still in the process of perfecting and which aims to make Incremental Integration Testing also suitable (and preferable) for the development of high-criticality software.

          In high-criticality software we usually want to ensure not only that the Component Under Test behaves as expected, but also that it interacts with its dependencies as expected. For this reason, some people regard the use of Mocks as justified in the development of high-criticality software, because it allows us to stipulate with great precision what should be happening under the hood.

          However, in order to ensure that a component interacts with its dependencies as expected it is not in fact necessary to stipulate in code what these interactions should be, nor is it necessary to fail the tests at the slightest sign of deviation from what was expected; all we need is visibility into the interactions, so that we can visually examine them and tell whether they are what we expected them to be. 

          When we revise a component, and as a result of this revision the component now interacts with its dependencies in a slightly different way, it is extremely counter-productive to have the tests fail and to have to go fix all the mocks so that they stop expecting the old interactions and start expecting the new interactions. (Besides, what people tend to achieve this way is tests that "test around" bugs, meaning that they specifically pass if the bugs are in place.) All we need is the ability to detect the fact that the interactions have changed as a result of a revision in the production code, and to be able to see precisely what has changed, so that we can determine whether the observed changes are expected according to the revision. If they are not, then we have to keep working on our revision; but if they are, then we are actually done without the need to fix any tests!

          To gain visibility into the interactions between a component and its dependencies we can use a mechanism that I call Interaction Snooping.

          • In a traditional Unit Test with Mocks, the Component Under Test is wired to Mocks of its dependencies.
          • In an Incremental Integration Test, the Component Under Test is wired to its actual dependencies or fakes thereof.
          • In an Incremental Integration Test with Interaction Snooping, the Component Under Test is also wired to its actual dependencies or fakes thereof, but on each wire we now interject an Interaction Snooper.

          An Interaction Snooper is a decorator of the interface represented by the wire, so its existence is completely transparent both to the Component Under Test and to the dependency that it is wired to. (Provided that we have the luxury of ignoring some inevitable deviation in timing.) The Interaction Snooper intercepts every call made by the Component Under Test to the dependency, and records information about the call before forwarding the call to the dependency. Note that in languages like Java and C# which support reflection and intermediate code generation, Interaction Snoopers do not have to be written by programmers; they can be generated automatically.
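
          In Java, for instance, a minimal sketch of such an automatically generated Interaction Snooper can be built on java.lang.reflect.Proxy; the snoop() helper below is illustrative and not part of any existing library:

              import java.lang.reflect.InvocationHandler;
              import java.lang.reflect.Proxy;
              import java.util.Arrays;

              public final class InteractionSnooper
              {
                  // Wraps a dependency in a decorator of the given interface, which records each call
                  // (method name, arguments, result) into snoopLog before forwarding it to the real dependency.
                  @SuppressWarnings( "unchecked" )
                  public static <T> T snoop( Class<T> interfaceType, T dependency, StringBuilder snoopLog )
                  {
                      InvocationHandler handler = ( proxy, method, arguments ) ->
                      {
                          Object result = method.invoke( dependency, arguments ); // forward the call to the real dependency
                          snoopLog.append( method.getName() )
                              .append( ' ' ).append( Arrays.toString( arguments ) )
                              .append( " -> " ).append( result ).append( '\n' );
                          return result;
                      };
                      return (T)Proxy.newProxyInstance( interfaceType.getClassLoader(), new Class<?>[] { interfaceType }, handler );
                  }
              }

          A real implementation would also record thrown exceptions and would write the recorded information to the Snoop File described below, but the principle is the same: the Component Under Test is handed snoop( Dependency.class, realDependency, log ) instead of realDependency, and neither side is any the wiser.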

          The information recorded by an Interaction Snooper includes the name of the function that was called, a serialization of each parameter that was passed, and a serialization of the result that was returned. This information gets saved in a Snoop File. A Snoop file is a text file, and it is saved in the source code tree, right next to the test class that generated it. So, for example, if we have `SuchAndSuchTest.java`, right next to it there will be a `SuchAndSuchTest.snoop`. 
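
          Purely as a hypothetical illustration (the precise format is an implementation detail), the contents of such a Snoop File might look something like this:

              rateFor( "NL" ) -> 0.21
              store( "alice", "alice@example.com" ) -> void
              lookup( "alice" ) -> "alice@example.com"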

          By storing the Snoop Files in the source tree, we leverage our Version Control System and our Integrated Development Environment to take care of the rest of the workflow:

          • When a revision in the code causes a change in some interaction, we will take notice because our Version Control System will show the corresponding Snoop File as modified and in need of committing.
          • By asking our Integrated Development Environment to show us a "diff" between the current snoop file and the original version, we can see precisely what has changed without having to pore through the entire snoop file.
          • If the observed interactions are not exactly what we expected them to be according to the revisions we just made in the production code, we keep working on our revisions.
          • When we are confident that the changes in the interactions are expected according to the revisions that we made, we commit our revisions, along with the Snoop Files.
          • If our commit undergoes code review, then the reviewer will also be able to see both the changes in the production code, and the corresponding changes in the Snoop Files, and decide whether they are as expected.

          Thus, even high-criticality systems can gain complete visibility of what is going on under the hood without the need for Mocks.

