2019-12-01

On Software Testing

At my current workplace I was recently presented with the following dumbfounding argument:
Since we can always test our software by hand, we do not need to implement software testing.
In its more complete form, the argument goes like this:
Software testing represents a big investment for the company: all the programmers in the house will be doing nothing but writing software tests for months, and these tests will not yield any visible benefit to the customers. Instead, the programmers should ensure that the software works by spending only a fraction of that time doing manual testing, and all the time saved this way can then be invested in developing new functionality and fixing existing issues.

To put it more concisely:
Software testing does not seem to have business value.
At the time I heard this argument, it sounded disarmingly self-evident and beautiful in its simplicity. Of course, anyone who has been anywhere near the software industry in the last 20 years or so knows that it is patently false from pretty much every single angle you might choose to look at it, so I now need to explain why.



This could have been a piece of private correspondence between me and the executive who put forth the proposition, but I figured that others out there might have stumbled upon the same concern, hence this public writing.

Mind you, the business of this company is not primarily software, and the executive who said this is not a software engineer, so the fact that such words have actually been uttered is not entirely as preposterous as it sounds. Furthermore, there are other mitigating factors specific to this particular company, such as the fact that we are only catering to a handful of customers whom we know by first name, so to speak, and who are very forgiving because they understand that what they have in their hands is something of a prototype. So, the notion that we could conceivably do without software testing is not as unreasonable as it may sound. But it is still unreasonable. And here is why.

Myth #1: Software testing represents a big investment.


No it doesn't. Or maybe it does, but its ROI is so high you don't want to miss it.

If you do not have software testing in place, then it is an established fact in our industry that you will end up spending an inordinate amount of time researching unexpected application behavior, troubleshooting code to explain the observed behavior, discovering bugs, and fixing them. You will often repeat this process a few times on each incident, because the fix for one bug often creates another bug, or causes another pre-existing bug to manifest, usually with the embarrassment of an intervening round-trip to the customer, because the "fixed" software was released before the newly introduced bugs were discovered.

Really, it works the same way as education. To quote a famous bumper sticker:
You think education is expensive? Try ignorance!

Myth #2: Software testing represents an investment.


No, it is not even that. Software testing is an integral part of software development, so it is meaningless to examine it as an investment separate from the already-recognized-as-necessary investment of developing the software in the first place.

Beware of the invalid line of reasoning which says that in order to implement a certain piece of functionality all we need is 10 lines of production code, which cost 100 bucks, whereas an additional 10 lines that would cost an extra 100 bucks and would only test the first 10 lines are optional.

Instead, the valid reasoning is that in order to implement said functionality we will need 20 lines of code, which will cost 200 bucks.

It just so happens that 10 of these lines will reside in a subfolder of the source code tree called "production", while the other 10 lines will reside in a subfolder of the same tree called "testing"; however, the precise location of each group of lines is a trivial technicality, bearing no relation whatsoever to any notion of "usefulness" of one group of lines versus the other. The fact is that all 20 of those lines of code are essential in order to accomplish the desired result.

That's because production code without corresponding testing code cannot be said with any certainty to be implementing any functionality at all. The only thing that can be said about testless code is that it has so far succeeded in giving human observers the impression that its behavior sufficiently resembles some desired functionality. Furthermore, it can only be said to be successful at that to the extent that it has been observed thus far, meaning that a new observation tomorrow might very well find that it is doing something completely different.

That's a far cry from saying that "this software does in fact implement that functionality".
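
To make the 20-lines argument concrete, here is a minimal sketch in Python; the function, file names, and amounts are hypothetical, chosen only for illustration. The few lines of production code and the few lines of test code together constitute the functionality of "computing a shipping cost"; remove the test half, and nothing certifies what the production half actually does.

    # production/shipping.py -- the half that lives under "production".
    def shipping_cost(weight_kg: float) -> float:
        """Return the shipping cost: a flat fee plus a per-kilogram charge."""
        if weight_kg <= 0:
            raise ValueError("weight must be positive")
        return 5.00 + 1.25 * weight_kg

    # testing/test_shipping.py -- the half that lives under "testing".
    import pytest
    from production.shipping import shipping_cost

    def test_flat_fee_plus_per_kilogram_charge():
        assert shipping_cost(4) == pytest.approx(10.00)

    def test_rejects_non_positive_weight():
        with pytest.raises(ValueError):
            shipping_cost(0)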

Myth #3: Software testing is just sloppiness management.


This is usually not voiced, but implied.  So, why can't programmers write correct software the first time around?  And why on earth can't software just stay correct once written?

There are a number of reasons for this; the most important ones have to do with the level of maturity of the software engineering discipline, and with the complexity of the software that we are asked to develop.

Maturity


Software development is not a hard science like physics or mathematics.  There are a few purely scientific concepts that you learn at university, but they are very rarely applicable to the everyday reality of our work. When it comes to developing software, there is not as much help available to us as there is to other disciplines by means of known universal laws, fundamental axioms, established common practices and rules, ubiquitous notations, books of formulas and procedures, ready-made commercially available standardized components, etc. It is difficult to even find parallels to draw for fundamental concepts of science and technology such as experimentation, measurement, and reproducibility. That's why software engineering is sometimes characterized as being more of an art than a science, and the fact that anyone can potentially become a programmer without necessarily having studied software engineering does not help to dispel this characterization.

Software testing is one of those developments in software engineering that make it more like a science than like an art. With software testing we have finally managed to introduce the concepts of experimentation, measurement, and reproducibility in software engineering. Whether testability alone is enough to turn our discipline into a science is debatable, but without testing we can be certain that we are doing nothing but art.

Complexity


The software systems that we develop today are immensely complex. A simple application which presents a user with just 4 successive yes/no choices has 16 different paths that must be tested. Increase the number of choices to 8, and the number of different paths skyrockets to 256. Take a slightly longer but entirely realistic use case sequence of an application consisting of 20 steps, and the total number of paths exceeds one million. That's an awful lot of complexity, and so far we have only been considering yes/no choices. Now imagine each step consisting of not just a yes/no choice, but an entire screen full of clickable buttons and editable fields.
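
The arithmetic behind these figures is just the number of distinct combinations of answers, which doubles with every added yes/no choice. A trivial Python sketch:

    # Each independent yes/no choice doubles the number of distinct paths.
    for choices in (4, 8, 20):
        print(choices, "choices ->", 2 ** choices, "paths")
    # Output: 4 choices -> 16 paths
    #         8 choices -> 256 paths
    #         20 choices -> 1048576 paths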

It is true that software is ideal for handling complexity. That's why hardware engineers like to off-load complexity management to software. Long gone are the times when machines consisted entirely of hardware, with levers and gears carefully designed and crafted and working in unison, so that turning a crank at one end would cause printed and folded newspapers to come out the other end. Nowadays, the components of the hardware tend not to interact with one another, because that would be too complex; instead, every single sensor and every single actuator is connected to a central panel, from which software takes charge and orchestrates the whole thing.

But software can only handle complexity if done right. You cannot develop complex software, and even if you develop it, you cannot make any assumptions whatsoever about its correctness, without sophisticated automated software testing in place. That is because you simply cannot test thousands or millions of possible execution paths in any way other than in an automated way.

Myth #4: Testing has no visible benefit to the customers.


Yes it does. It is called reliable, consistent, correctly working software. It is also called software which is continuously improving instead of remaining stagnant due to fear of it breaking if touched. It is also called receiving newly introduced features without losing old features that used to work but are now broken. And it is even called receiving an update as soon as it has been introduced instead of having to wait until some poor souls have mindlessly clicked through the entire application for several days to make sure everything still works as it used to.

Myth #5: Manual testing can ensure that the software works.


No it cannot. That's because the complexity of the software is usually far greater than what you could ever possibly hope to test by hand.  An interactive application is not like a piece of fabric, which you can visually inspect and have a fair amount of certainty that it has no defects. You are going to need to interact with the software, in a mind-boggling number of different ways, to test for a mind-boggling number of possible failure modes.

When we do manual testing, in order to save time (and our sanity) we focus only on the subset of the functionality of the software which may have been affected by any recent changes to the source code.  Even for small subsets of functionality, the number of possible failure modes is still far greater than what fits in the short-term memory of humans, so we resort to preparing test plans and meticulously following them.  However, these test plans are necessarily based on our estimations and assumptions about which parts of the program may have been affected by our modifications, and also on guesses about the ways in which these parts could behave if adversely affected. Alas, these estimations, assumptions, and guesses are notoriously unreliable: it is usually the parts of the software that nobody expected to break that in fact break, and even the suspected parts sometimes break in ways quite different from what anyone had expected and planned to test for. And this is so almost by definition, because we usually examine all the failure modes that we can foresee before even calling the modifications complete.

This should come as no surprise, because preparing a test plan based on what has changed is an art in and of itself, and consequently it is notoriously error-prone. And of course no testing will ever be any better than the test plan that it followed.

Furthermore, it is widely understood in the industry that persons involved in the development of software are generally unsuitable for testing it.  No developer ever uses the software with as much capriciousness and recklessness as a user will. It is as if the programmer's hands have a mind of their own, and avoid sending the mouse pointer where the user's hands will send it. It is as if the programmer's finger will never click the mouse button as heavily as the user's finger will. Even dedicated testers start behaving like programmers after a while on the job, because it is only human to employ acquired knowledge about the environment in navigating that environment, and to re-use established known-good paths. It is in our nature. You can ask people to do something which is against their nature, and they may agree, and they may even try their best, but the results are still guaranteed to suffer.

Then there is repetitive-motion fatigue, both of the physical and the mental kind, which severely limits the scope that any kind of manual testing will ever have.

Finally, there is the issue of efficiency. When we do manual software testing, we are necessarily doing it in human time, which is excruciatingly slow compared to the speed at which a computer would carry out such a repetitive task.  A human being testing permutations at the rate of one click per second will test one million permutations in about two working months, while the computer will do it in a few seconds. And the computer will do this perfectly, while the most capable human being will do it quite sloppily in comparison. That's how inefficient manual software testing is.
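
A quick back-of-the-envelope check on that figure, assuming an eight-hour working day:

    # One million manual clicks at one click per second, eight hours per day.
    clicks = 1_000_000
    seconds_per_working_day = 8 * 60 * 60      # 28,800 clicks per working day
    print(clicks / seconds_per_working_day)    # ~34.7 working days, i.e. roughly seven working weeks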

Myth #6: Manual testing takes less time than writing tests.


No it doesn't. If you want to say that you are actually doing some manual testing worth speaking of, and not a joke of it, then you will have to spend copious amounts of time doing nothing but that, and you will have to keep repeating it every single time the software is modified.

In contrast, with software testing you are spending some time up-front building some test suites, which you will then be able to re-execute every time you need them, with comparatively small additional effort.

It is a lot like renting vs. buying: with renting, at the end of each month you are in exactly the same situation as you were at the beginning of the month: the home still belongs in its entirety to the landlord, not to you, and you must now pay a new rent in full in order to stay for one more month. With buying, you pay a lot of money up front, and you keep paying some money every month for an additional while, but all this money goes into something tangible instead of the pockets of the landlord. Some maintenance costs and taxes are still applicable, but all the money that you have paid towards your residence is value in your hands in the form of a home that you now own.

Of course, the desirability of automated software testing vs. manual testing depends on where you imagine the break-even point to be. If you reckon it will be fairly soon, then it might be worth it. If you reckon it will be after the IPO, then it might not be worth it. (Actually, it is still more than worth it; more about that further down.)

Well, let me tell you: in the software industry the established understanding is that the break-even point is EXTREMELY soon.  Like WRITE-THE-TESTS-BEFORE-THE-SOFTWARE soon.  (A practice known as Test-Driven Development.)

Myth #7: You can keep developing new functionality and fixing existing issues without software testing in place.


In theory you could, but in practice you can't. That's because every time you touch the slightest part of the software, everything about the software can now potentially be broken. This is especially true of software which has been written messily, which is in turn especially common among software which has been written without any automated software testing in place from the beginning.

This fact is so universally accepted in the software industry that we even have a special procedure governing how we fix bugs: when a bug is discovered, we do not just go ahead and fix it. After all, what does it mean to discover a bug? It means to a) establish that a certain behavior of the application is not in accordance with the requirements, and then b) contrive a theory as to exactly what it is in the code that causes the observed behavior. But it is just a theory: it could be right, it could be wrong. Furthermore, what does it mean to fix a bug? It means to make the necessary changes in the code so that the undesired behavior is not there any more, while all other behavior which was desired remains unaffected. This has severe implications for how you can go about fixing a bug, because in the vast majority of cases, fixing a bug is not as simple as just fixing it: the trick is to not break anything else.

So, the right approach to fixing a bug is to first write a test which tests for the bug according to the requirements, without making any assumptions as to what causes the bug. Of course, since the bug is in the software, the test will initially be observed to fail. Then, you go fix the bug, according to your theory as to what is causing it, and you should see this test succeeding. If it doesn't, then you fixed the wrong bug, or more likely, you just broke something which used to be fine. Furthermore, all other tests better also keep succeeding, otherwise in fixing this bug you broke something else. As a bonus, the new test now becomes a permanent part of the suite of tests, so if this particular part of the software is ever broken in the future, this test might catch it.
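
As a minimal sketch of this procedure in Python (the invoice example, names, and figures are hypothetical, for illustration only): first comes the test that encodes the requirement and fails while the bug is present, and only then the change to the production code, after which this test and every pre-existing test must pass.

    # testing/test_invoice.py -- written FIRST; it fails while the bug exists.
    import pytest
    from production.invoice import invoice_total

    def test_discount_is_applied_to_the_total():
        # Requirement: a 10% discount on a 200.00 order yields 180.00.
        assert invoice_total(amount=200.00, discount=0.10) == pytest.approx(180.00)

    # production/invoice.py -- the fix, made only after the test above was
    # seen to fail; the rest of the test suite must keep passing as well.
    def invoice_total(amount: float, discount: float) -> float:
        return amount * (1.0 - discount)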

If you go around "fixing bugs" without testing mechanisms such as this in place, you are not really fixing bugs, you are just shuffling bugs around.  The same applies to features: if you go around "adding features" without the necessary testing mechanisms in place, then by definition you are not adding features, you are adding bugs.

Myth #8: Software testing has no business value.


Yes it does. A number of obvious reasons have already been listed, but let me mention one more.

A potentially important factor for business is investment. When an investor is interested in a software business, and if they know what they are doing, they are likely to want to evaluate the source code before committing to the investment. Evaluation is done by sending a copy of the source code to a professional software evaluator, who examines the software and responds with investment advice. This evaluation procedure is also very likely to be part of the necessary preparations for an IPO to take place.

The evaluator usually begins by using the software as a regular user to ensure that it appears to do what it is purported to do, then they examine the source code to make sure that it does not look like it was written by monkeys, then they look at design choices to ensure that in places where standard practices and procedures are applicable, they have indeed been followed, etc. After spending not too much time on these tasks, the evaluator tends to proceed to the tests.

If there are no tests, this is very bad news for the investment advice. If the tests do not pass, this is also very bad news. If the tests succeed, then the next question is how thorough they are.

For that, the evaluator is likely to use a code coverage analyzer. This tool keeps track of the lines of code that are executed as the program runs, or, more to the point, as the program is being exercised by the tests. By running the tests under the code coverage analyzer, the evaluator obtains the code coverage metric of the software. This is a single number, from 0 to 100: the percentage of the total number of source code lines that have been exercised by the tests. Obviously, the more thorough the tests are, the higher this number will be.
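
For example, on a Python codebase whose tests run under pytest, the widely used coverage.py tool produces exactly this kind of number; the commands below are a sketch assuming such a setup, not a prescription:

    # Run the test suite under the coverage analyzer, then print the report.
    coverage run -m pytest
    coverage report
    # The report ends with a TOTAL line whose last column is the coverage
    # percentage, e.g. "TOTAL    1234    296    76%".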

This is a very useful metric, because in a single number it captures an objective, highly important quality measure for the entirety of the software system. It also tends to correlate very directly with the actual investment advice that the evaluator will give.  Of course the exact numbers may vary depending on the evaluator, the investor, and other circumstances, but a rough breakdown is as follows:

  • below 50% means "run in the opposite direction, this is as good as Ebola";
  • 50-60% means "poor";
  • 60-70% means "decent";
  • 70-80% means "good";
  • 80-90% means "excellent";
  • 90-100% means "exceptional".

Of course, the graph of code coverage vs. effort is highly non-linear. It is relatively easy to pass the 45% mark; it becomes more and more difficult as you go past the 65% mark; it becomes extremely difficult once you pass the 85% mark.

In my experience and understanding, conscientious software houses in the general commercial software business strive for the 75% mark. In places where they only achieve about 65% code coverage they consider it acceptable, but at the same time they either know that they could be doing better, or they have low self-respect. High-criticality software (where human life, or a nation's reputation, depends on it) may have 100% coverage, but the effort required to achieve this is immense.  In any case, it does not matter so much what the developers think; what matters is what the evaluator thinks; and from what I know, evaluators do tend to care about this metric a lot.

So, is there business value in software testing? If you are looking for investment, or if you are planning to go for an IPO, then almost certainly.

The above applies to businesses that are exclusively in software development. I do not know to what degree parallels can be drawn with companies for which software is somewhat secondary, but I suspect they can, to no small extent.
