2019-12-01

The case for software testing

What to reply to a non-programmer who thinks that testing is unnecessary or secondary

At some point during his or her career, a programmer might come across the following argument, presented by some colleague, partner, or decision maker:
Since we can always test our software by hand, we do not need to implement Automated Software Testing.
Apparently, I reached that point in my career, so now I need to debate this argument. I decided to be a good internet citizen and publish my thoughts. So, in this post I am going to deconstruct that argument and demolish it from every angle from which it can be examined. I will be doing so using language that is easy to follow for people outside of our discipline.

In the particular company where that argument was brought forth, there exist mitigating factors which are specific to the product, the customers, and the type of relationship we have with them, all of which make the argument not as unreasonable as it may sound when taken out of context. Even in light of these factors, the argument still deserves to be blown out of the water, but I will not be bothering the reader with the specific situation of this company, so as to ensure that the discussion is applicable to software development in general.



In its more complete form, the argument may go like this:

Automated Software Testing represents a big investment for the company, where all the programmers in the house are spending copious amounts of time doing nothing but writing software tests, but these tests do not yield any visible benefit to the customers. Instead, the programmers should ensure that the software works by spending only a fraction of that time doing manual testing, and then we can take all the time that we save this way and invest it in developing new functionality and fixing existing issues. 
To put it more concisely, someone might say something along these lines:
I do not see the business value in Automated Software Testing.
This is a bunch of myths rolled up into an admirably terse statement. It is so disarmingly simple that for a moment you might be at a loss for how to respond. Where to begin, really? We need to look at the myths one by one. Here goes:

Myth #1: Software testing represents a big investment.


No it doesn't. Or maybe it does, but its ROI is so high that you absolutely don't want to miss it.

If you do not have software testing in place, then it is an established fact in our industry that you will end up spending an inordinate amount of time researching unexpected application behavior, troubleshooting code to explain the observed behavior, discovering bugs, and fixing them. Often you will repeat this process a few times on each incident, because the fix for one bug frequently creates another bug, or causes pre-existing bugs to manifest, often with the embarrassment of an intervening round-trip to the customer, because the "fixed" software was released before the newly introduced bugs were discovered.

Really, it works the same way as education. To quote a famous bumper sticker:
You think education is expensive? Try ignorance!
Furthermore, your choice of Manual Software Testing vs. Automated Software Testing has a significant impact on the development effort required after the testing, to fix the issues that the testing discovers. It is a well-established fact in the industry that the sooner a bug is discovered, the less it costs to fix it.

  • The earliest possible time to fix a mistake is the moment you make it. That's why we use strongly typed programming languages, together with Integrated Development Environments that continuously compile our code as we type it: this way, any syntax error or type violation is immediately flagged by the IDE with a red underline, so we can see it and fix it before proceeding to type the next line of code. The cost of fixing that bug is near zero; a short sketch after this list shows what this looks like. (And one of the main reasons why virtually all scripting languages are absolutely horrible is that in those languages, even a typo can go undetected and become a bug.)
  • If you can't catch a bug at the moment you are introducing it, the next best time to catch it is when running automated tests, which is what you are supposed to do before committing your changes to the source code repository. If that doesn't happen, then the bug will be committed, and this already represents a considerable cost that you will have to pay later for fixing it. 
  • The next best time to catch the bug is by running automated tests as part of the Continuous Build System. This will at least tell you that the most recent commit contained a bug.  If there is no Continuous Build with Automated Software Testing in place, then you suffer another steep increase in the price that you will have to pay for eventually fixing the bug.
  • By the time a human being gets around to manually testing the software and discovering the bug, many more commits may have been made to the source code repository. This means that by the time the bug is discovered, we will not necessarily know which commits contributed to it, nor which programmers made the relevant commits, and even if we do, they will at that moment be working on something else, which they will have to temporarily drop, and make an often painful mental context switch back to the task that they were working on earlier. Naturally, the more days pass between committing a bug and starting to fix it, the worse it gets.
  • At the extreme, consider trying to fix a bug months after it was introduced, when nobody knows anything about the changes that caused it, and the programmer who made those changes is not even with the company anymore. Someone has to become intimately familiar with that module in order to troubleshoot the problem, consider dozens of different commits that may have contributed to the bug, find it, and fix it. The cost of fixing that bug may amount to more than a programmer's monthly salary.
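
To make the first bullet above concrete, here is a minimal sketch in Java; the class and variable names are hypothetical, and the typo is deliberate. The misspelled variable is flagged by the compiler, and underlined by the IDE as it is being typed, before the code can ever run; the equivalent slip in a dynamically typed scripting language would only surface when that particular line happened to execute, possibly in front of a customer.

    // InvoiceCalculator.java (hypothetical example): a one-character typo
    public class InvoiceCalculator {
        public int computeTotal(int price, int quantity) {
            int totl = price * quantity;  // typo: "totl" instead of "total"
            return total;                 // flagged as you type: "cannot find symbol: total";
                                          // the class does not even compile, so the slip cannot reach a user
        }
    }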

This is why the entire software industry today practically swears by testing: it helps to catch bugs as early as possible, and it keeps the development workflow uninterrupted, so it ends up saving huge amounts of money.

Myth #2: Software testing represents an investment.


No, it is not even that. Software testing is regarded by our industry as an integral part of software development, so it is meaningless to examine it as an investment separate from the already-recognized-as-necessary investment of developing the software in the first place.

Beware of the invalid line of reasoning which says that in order to implement a certain piece of functionality all we need is 10 lines of production code costing 100 bucks, while an additional 10 lines that merely test the first 10, at an extra 100 bucks, are optional.

Instead, the valid reasoning is that in order to implement said functionality we will need 20 lines of code, which will cost 200 bucks. It just so happens that 10 of these lines will reside in a subfolder of the source code tree called "production", while the other 10 lines will reside in a subfolder of the same tree called "testing"; however, the precise location of each group of lines is a trivial technicality, bearing no relation whatsoever to any notion of "usefulness" of one group of lines versus the other. The fact is that all 20 of those lines of code are essential in order to accomplish the desired result.
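
As a concrete illustration of such a pair of ten-line halves, here is a sketch in Java; the DiscountCalculator example and its names are hypothetical. The first half is the "production" code, the second half is the "testing" code that states, precisely and executably, what the first half must do.

    // DiscountCalculator.java, in the "production" subfolder (hypothetical example)
    public class DiscountCalculator {
        // Orders of 100.00 or more get a 10% discount; smaller orders pay full price.
        public double discountedTotal(double orderTotal) {
            return orderTotal >= 100.0 ? orderTotal * 0.9 : orderTotal;
        }
    }

    // DiscountCalculatorTest.java, in the "testing" subfolder (JUnit)
    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class DiscountCalculatorTest {
        @Test public void largeOrdersGetTenPercentOff() {
            assertEquals(90.0, new DiscountCalculator().discountedTotal(100.0), 0.0001);
        }
        @Test public void smallOrdersPayFullPrice() {
            assertEquals(99.0, new DiscountCalculator().discountedTotal(99.0), 0.0001);
        }
    }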

That's because production code without corresponding testing code cannot be said with any certainty to be implementing any functionality at all. The only thing that can be said about testless code is that it has so far succeeded in giving human observers the impression that its behavior sufficiently resembles some desired functionality. Furthermore, it can only be said to have succeeded to the extent that it has been observed so far, meaning that a new observation tomorrow might very well find it doing something different.

That's a far cry from saying that "this software does in fact implement that functionality".

Myth #3: Software testing is just sloppiness management.


This is usually not voiced, but implied.  So, why can't programmers write correct software the first time around?  And why on earth can't software just stay correct once written?

There are a number of reasons for this; the most important ones have to do with the level of maturity of the software engineering discipline, and with the complexity of the software that we are being asked to develop.

Maturity


Software development is not a hard science like physics or math. There exist some purely scientific concepts that you learn at university, but they are rarely applicable to the everyday reality of our work. When it comes to developing software, there is not as much help available to us as there is to other disciplines by means of universal laws, fundamental axioms, established common practices and rules, ubiquitous notations, books of formulas and procedures, ready-made commercially available standardized components to build with, etc. It is difficult to even find parallels to draw for basic concepts of science and technology such as experimentation, measurement, and reproducibility. That's why software engineering is sometimes characterized as being more of an art than a science, and the fact that anyone can potentially become a programmer without necessarily having studied software engineering does not help to dispel this characterization.

Automated Software Testing is one of those developments in software engineering that make it more like a science than like an art. With testing we have finally managed to introduce the concepts of experimentation, measurement, and reproducibility in software engineering. Whether testability alone is enough to turn our discipline into a science is debatable, but without testing we can be certain that we are doing nothing but art.

Complexity


The software systems that we develop today are immensely complex. A simple application which presents a user with just 4 successive yes/no choices has 2 × 2 × 2 × 2 = 16 different execution paths that must be tested. Increase the number of choices to 7, and the number of paths doubles with each extra choice, skyrocketing to 128. Take a slightly longer but entirely realistic use-case sequence of a real-world application consisting of 20 steps, and the total number of paths exceeds one million (2 multiplied by itself 20 times is 1,048,576). That's an awful lot of complexity, and so far we have only been considering yes/no choices. Now imagine each step consisting of not just a yes/no choice, but an entire screen full of clickable buttons and editable fields which interact with each other. This is not an extreme scenario, it is a rather commonplace situation, and its complexity is of truly astronomical proportions.

Interestingly enough, hardware engineers like to off-load complexity management to the software. Long gone are the times when machines consisted entirely of hardware, with levers and gears and belts and cams all carefully aligned to work in unison, so that turning a crank at one end would cause printed and folded newspapers to come out the other end. Nowadays, the components of the hardware tend to not interact with each other, because that would be too complex and too difficult to change; instead, every single sensor and every single actuator is connected to a central panel, from which software takes charge and orchestrates the whole thing.

However, software is not a magical place where complexity just vanishes; you cannot feed software complex inputs, demand complex outputs, and at the same time expect its insides to be nothing but purity and simplicity: a system cannot have less complexity than the complexity inherent in the function that it performs.

The value of moving the complexity from the hardware to the software is that the system is then easier to change, but when we say "easier" we do not mean "simpler"; all of the complexity is still there and must be dealt with. What we mean when we say "easier to change" is that in order to make a change we do not have to begin by sending new blueprints to the steel foundry. That's what you gain by moving complexity from the hardware to the software: being able to change the system without messy, time-consuming, and costly interactions with the physical world.

So, even though we have eliminated those precisely crafted and carefully arranged levers and gears and belts and cams, their counterparts now exist in the software; you just do not see them, and you have no way of seeing them unless you are a programmer. And just as the slightest modification to a physical machine of such complexity would be a strenuous ordeal, so is the slightest modification to a software system of similar complexity.

Software can only handle this complexity if it is done right. You cannot develop complex software without sophisticated automated software testing in place, and even if you somehow manage to develop it, you cannot make any assumptions whatsoever about its correctness. Furthermore, even if it appears to be working correctly, you cannot make the slightest change to it unless automated software testing is in place to determine that it still works correctly after the change. That is because you simply cannot test thousands or millions of possible execution paths in any way other than an automated one.

Myth #4: Testing has no visible benefit to the customers


Yes it does. It is called reliable, consistent, correctly working software. It is also called software which keeps improving instead of remaining stagnant for fear of breaking if someone so much as sneezes at it. It is also called receiving newly introduced features without losing old features that used to work. And it is even called receiving an update as soon as it is ready, instead of having to wait until some poor souls have clicked through the entire application over the course of several days to make sure everything still works as it used to.

Myth #5: Manual testing can ensure that the software works.


No it cannot. That's because the complexity of the software is usually far greater than what you could ever possibly hope to test by hand. An interactive application is not like a piece of fabric, which you can visually inspect and have a fair amount of certainty that it has no defects. You are going to need to interact with the software, in a mind-boggling number of different ways, to test for a mind-boggling number of possible failure modes.

When we do manual testing, in order to save time (and our sanity) we focus only on the subset of the functionality of the software which may have been affected by recent changes that have been made to the source code.  However, the choice of which subsets to test is necessarily based on our estimations and assumptions about what parts of the program may have been affected by our modifications, and also on guesses about the ways in which these parts could behave if adversely affected. Alas, these estimations, assumptions, and guesses are notoriously unreliable: it is usually the parts of the software that nobody expected to break that in fact break, and even the suspected parts sometimes break in ways quite different from what anyone had expected and planned to test for.

And this is almost by definition so, because the failure modes that we can easily foresee, based on the modifications that we make, are the ones we usually check ourselves before even calling the modifications complete and committing our code.

Furthermore, it is widely understood in our industry that persons involved in the development of a piece of software are generally unsuitable for testing it. No developer ever uses the software with as much recklessness and capriciousness as a user will. It is as if the programmer's hand has a mind of its own and avoids sending the mouse pointer into bad areas of the screen, whereas that is precisely where the user's hand is guaranteed to send it. It is as if the programmer's finger will never press that mouse button down as heavily as the user's finger will. Even dedicated testers start behaving like the programmers after a while on the job, because it is only human to employ acquired knowledge about an environment in navigating that environment, and to re-use established known-good paths. It is in our nature. You can ask people to do something which is against their nature, and they may earnestly agree, and they may even try their best, but the results are still guaranteed to suffer.

Then there is repetitive motion fatigue, of both the physical and the mental kind, which severely limits the scope that any kind of manual testing will ever have.

Finally, there is the issue of efficiency. When we do manual software testing, we are necessarily doing it in human time, which is excruciatingly slow compared to the speed at which a computer would carry out the same task. A human being testing permutations at the rate of one click per second would need a million seconds to get through one million permutations; that is roughly 278 hours of non-stop clicking, which in practice works out to about two working months, whereas a computer may do it in a matter of minutes. And the computer will do this perfectly, while the most capable human being will do it quite sloppily in comparison. That's how inefficient manual software testing is.

Myth #6: Manual testing takes less time than writing tests.


No it doesn't. If you want to say that you are actually doing some manual testing worth speaking of, and not a joke of it, then you will have to spend copious amounts of time doing nothing but that, and you will have to keep repeating it all over again every single time the software is modified.

In contrast, with automated testing you spend some time up-front building test suites, which you can then re-execute every time you need them with comparatively little additional effort. So, manually testing a certain piece of software is an effort that you have to keep repeating, while writing automated test suites for that same piece of software is something that you do once, and from that moment on it keeps paying dividends.

This is why it is a fallacy to say that we will just test the software manually and use the time that we save to implement more functionality: as soon as you add a tiny bit of new functionality, you have to repeat the testing all over again. Testing the software manually is a never-ending story.

The situation is a lot like renting vs. buying: with renting, at the end of each month you are in exactly the same situation as you were at the beginning of the month: the home still belongs, in its entirety, not to you but to the landlord, and you must now pay a new rent in full in order to stay for one more month. With buying, you pay a lot of money up front, and some maintenance costs and taxes will always be applicable, but the money that you pay goes into something tangible; it is turned into value in your hands in the form of a home that you now own.

Furthermore, the effort required to do manual testing properly is usually severely underestimated. In order to do proper manual testing, you have to come up with a meticulous test plan, explaining what the tester is supposed to do and what the result of each action should be, so that the tester can tell whether the software is behaving according to the requirements or not. However, no test plan will ever be as unambiguous as a piece of code that actually performs the same test, and the more meticulous you try to be with the test plan, the less you gain, because there comes a point where the effort of writing the test plan becomes comparable to the effort of writing the corresponding automated test instead. So, you might as well write the test plan down in code to begin with.
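
To see the ambiguity gap, compare a typical test-plan instruction with code that performs the same check. The sketch below is in Java with JUnit; the AccountService class and its methods are hypothetical, invented purely for illustration. The prose leaves "locked" open to interpretation; the code does not.

    // Test plan, in prose: "Enter a wrong password three times and verify that the account is locked."
    // The same check, as code:
    import static org.junit.Assert.assertFalse;
    import static org.junit.Assert.assertTrue;
    import org.junit.Test;

    public class AccountLockoutTest {
        @Test
        public void threeFailedLoginsLockTheAccount() {
            AccountService service = new AccountService();
            service.register("alice", "correct-password");

            for (int i = 0; i < 3; i++) {
                assertFalse(service.login("alice", "wrong-password"));
            }

            // "Locked" now means something precise: even the correct password is refused.
            assertFalse(service.login("alice", "correct-password"));
            assertTrue(service.isLocked("alice"));
        }
    }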

Of course, one round of writing automated test suites will always represent more effort than a few rounds of manually performing the same tests, so the desirability of one approach vs. the other may depend on where you imagine the break-even point to be. If you reckon that the break-even point is fairly soon, then you already see the benefit of implementing automated software testing as soon as possible. If you imagine it will be after the IPO, then you might think it is better to defer it; but actually, even in this case you might not want to go that way, more on that later.

Well, let me tell you: in the software industry the established understanding is that the break-even point is extremely soon.  Like write-the-tests-before-the-app soon.  (A practice known as Test-Driven Development.)

Myth #7: You can keep developing new functionality and fixing existing issues without software testing in place.


In theory you could, but in practice you can't. That's because every time you touch the slightest part of the software, everything about the software is now potentially broken. Without automated software testing in place, you just don't know. This is especially true of software which has been written messily, which is in turn especially common in software which has been written without any Automated Software Testing in place from the beginning. Paradoxically enough, automated software testing forces software designs to have some structure; this structure reduces failures, so the software ends up needing less testing in the first place.

To help lessen change-induced software fragility, we even have a special procedure governing how we fix bugs: when a bug is discovered, we do not always just go ahead and fix it. Instead, what we often do is first write a test which checks for the bug according to the requirements, without making any assumptions as to what might be causing it. Of course, since the bug is in the software, the test will initially be observed to fail. Then, we fix the bug according to our theory as to what is causing it, and we should see that test succeed. If it doesn't, then we fixed the wrong bug, or, more likely, we just broke something which used to be fine. Furthermore, all other tests had better also keep succeeding, otherwise in fixing this bug we broke something else. As a bonus, the new test now becomes a permanent part of the test suite, so if this particular behavior breaks again in the future, this test will catch it.
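
As a sketch of that procedure in Java with JUnit (the Order class and the reported bug are hypothetical, invented for illustration): the test below is written before the suspect code is touched, it fails today because it exercises the bug, and it must pass once the fix is in.

    // A regression test written before the fix, capturing the reported bug:
    // an order with no items should have a total of zero, but today it does not.
    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class EmptyOrderRegressionTest {
        @Test
        public void emptyOrderHasZeroTotal() {
            Order order = new Order();  // no items added
            // Fails today, demonstrating the bug; must pass once the bug is fixed,
            // and stays in the suite so this behavior can never silently break again.
            assertEquals(0.0, order.total(), 0.0001);
        }
    }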

If you go around "fixing bugs" without testing mechanisms such as this in place, you are not really fixing bugs, you are just shuffling bugs around. The same applies to features: if you go around "adding features" without the necessary testing mechanisms in place, then by definition you are not adding features, you are adding bugs.

Myth #8: Software testing has no business value


Yes it does. The arguments that I have already listed should be making it clear that it does, but let me provide one more argument, which shows how Automated Software Testing directly equates to business value.

A potentially important factor for virtually any kind of business is investment. When an investor is interested in a software business, and if they have the slightest clue as to what it is that they are doing, they are likely to want to evaluate the source code before committing to the investment. Evaluation is done by sending a copy of the software project to an independent professional software evaluator. The evaluator examines the software and responds with investment advice.

The evaluator may begin by using the software as a regular user to ensure that it appears to do what it is purported to do; then they may examine the design to make sure it makes sense; then they may examine the source code to make sure things look normal; etc. After spending not too much time on these tasks, the evaluator is likely to proceed to the tests. Software testing is so prevalent in the software industry that it is widely considered to be among the most important factors determining the quality of a piece of software.

If there are no tests, this is very bad news for the investment advice.

If the tests do not pass, this is also very bad news.

If the tests succeed, then the next question is how thorough they are.

For that, the evaluator is likely to use a tool known as a code coverage analyzer. This tool keeps track of the lines of code that are executed as the program runs, or, more likely, as the program is exercised by the tests. By running the tests while the code coverage tool is active, the evaluator obtains the code coverage metric of the software. This is just a single number, from 0 to 100: the percentage of the total number of source code lines that have been exercised by the tests. The more thorough the tests are, the higher this number will be.
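
For example (with made-up numbers): if running the test suite causes 8,200 of a system's 10,000 executable lines to run at least once, the tool reports 82% coverage; the remaining 1,800 lines have never been executed by any test and are, in effect, unverified.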

This is very useful, because in a single number it captures an objective, highly important quality metric for the entirety of the software system. It also tends to correlate strongly with the actual investment advice that the evaluator will end up giving. The exact numbers may vary depending on the product, the evaluator, the investor, the investment, and other circumstances, but a rough breakdown is as follows:
  • below 50% means "run in the opposite direction, this is as good as Ebola";
  • 50-60% means "poor";
  • 60-70% means "decent";
  • 70-80% means "good";
  • 80-90% means "excellent";
  • 90-100% means "exceptional".
Of course, the graph of programming effort required vs. code coverage achieved is highly non-linear. It is relatively easy to pass the 45% mark; it becomes more and more difficult as you go past the 65% mark; it becomes exceedingly difficult once you cross the 85% mark.

In my experience and understanding, conscientious software houses in the general commercial software business strive for the 75% mark. Shops that only achieve about 65% code coverage tend to consider it acceptable, while at the same time either knowing that they could be doing better, or having low self-respect. High-criticality software (software on which human life, or a nation's reputation, depends) may have 100% coverage, but a tremendous effort is required to achieve this. In any case, what matters is not so much what the developers think, but what the evaluator thinks; and evaluators tend to use the established practices of the industry as the standard by which they judge. The established practices call for extensive software testing, so if you do not do that, then your evaluation is not going to look good.

So, is there business value in software testing? Investment prospects alone say yes, regardless of its technical merits. Furthermore, software evaluation may well be part of the necessary preparations for an IPO, so even if you imagined the break-even point of automated vs. manual testing to be after the IPO, there is still ample reason to have your test suites in perfect working order well before the IPO.

The above applies to businesses that are exclusively in software development. I do not know to what degree parallels can be drawn with companies for which software is somewhat secondary, but I suspect they can, to no small extent.

2 comments:

  1. Yo Mike! Big kudos for the great article!

    (I am sure you already know everything I mention in the next paragraphs; I just need to vent my thoughts and feelings after reading. Hope it is thought-provoking, because thinking === GREATER GOOD)

    | ... so it ends up saving huge amounts of money.

    This is a good reason, but not the BEST one (which I will mention below). Take my old team: one day the PM said that a single calling code equals a single country (+1 CC?), and someone wrote a component relying on this "well-known fact". Half a year later we stumbled on the reason why some phone numbers were getting messed up. Ironically, the harm done to our company was zero bucks; we did not lose anything due to this bug. We were a b2b company that signed other companies on board, so we really cared more about sales and signed contracts than about a good product.

    So are tests a useless waste of time for a company that is sales-driven? Am I right? Or not so much?

    The BEST thing about writing tests is that they document expectations at the code level (not for the business, but for us devs). If you ever need to fix something done by some random dude who has since moved to the Arctic, a test is good guidance (ofc, if that person wrote a good test and not some monster mocked up from top to bottom).

    A quick thought about code reviews. There is a BELIEF that reviews prevent bad code (poorly written tests included). In fact, I have never in my career seen a team where code reviews were helpful (but I was once in a great team with no code reviews and permission to fix random places during development; we trusted each other and cared about each other's well-being). If you have a good tech lead but are unsure about the rest of the team, for God's sake let the tech lead be the only person who reviews code. Otherwise, if less than 51% of the team are competent developers, you end up with a political circus (been there, seen it: friends get LGTM for $hitcode, foes get comments like "change space, change quote, move comment to next line"; so... you end up making situational friends ;) and needless to say what happens to the code base).

    | ... every time you touch the slightest part of the software, everything about the software is now potentially broken.

    Not to mention that since you are not a solo developer, you end up with merge conflicts due to other people's work (even logical ones, e.g. the Country class starts using 2-letter codes instead of 3-letter ones while keeping the same good old String constructor parameter; well, let's hope your buddy added an invariant check to the class constructor). Tests synchronize the decision-making process; they automatically catch logical conflicts between you and your buddy.


  2. | ... in a single number it captures an objective, highly important quality metric for the entirety of the software system.

    I wished the article said more about spec tests (BDD, Gherkin). Line coverage is not always applicable, and even in the case of unit tests, where it is applicable, branch and predicate coverage is just as relevant.

    So devs end up needing to convert use cases into automated tests. I had a great Product Owner (a C++ dev in a past life) who wrote Gherkin scripts inside Jira tickets xD, and the team just needed to connect actions to words; then voila! we had automated tests for the use cases a user actually encounters.

    The excuse: it is difficult to set up a runner for specs. The solution: fire knaves, hire pros! :D

    | Software testing has no business value

    This is THE PLAGUE of modern software engineering: business decides how programmers should do their work. Moreover, there is the BELIEF that writing bug-free code is easy. At the end of the day, software rot trumps all business decisions and the team ends up with polluted, unmaintainable code. This is the single reason why I no longer even consider job offers from random teams that already have 2-3 year old software; the risk of ending up with unfixable $hitcode is just too high (been there, seen SQL scripts with 5000 LOC and zero documentation; never again).

    I wish everyone ends up, sooner or later, in a team with good practices and low stress! Stay good!
