Perhaps the first question we encounter when trying to figure out how to test a component is:
This question can be broken down into two (only slightly) simpler questions:
This paper is an introduction to the problem of choosing the right test cases.
A Test Case is a script, program, or other mechanism that exercises a software component to ascertain that a specific correctness assertion is true. In general, it creates a specified initial state, invokes the tested component in a specified way, observes its behavior, and checks to ensure that the behavior was correct. Different assertions (or variations on a single assertion) are likely to be tested by different test cases.
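As a minimal sketch of that anatomy (the Stack component and its operations here are invented purely for illustration), a single test case establishes an initial state, invokes the component, and checks one correctness assertion:

```python
import unittest

# Hypothetical component under test: a simple LIFO stack.
class Stack:
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

class TestStackPop(unittest.TestCase):
    def test_pop_returns_most_recent_push(self):
        stack = Stack()               # create the specified initial state
        stack.push(1)
        stack.push(2)
        result = stack.pop()          # invoke the tested component
        self.assertEqual(result, 2)   # check that the observed behavior was correct

if __name__ == "__main__":
    unittest.main()
```

A different assertion (e.g. that pop on an empty stack raises an error) would normally become a separate test case.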
Test Cases are usually organized into Test Suites. A Test Suite is a collection of related Test Cases that is likely to be run as a whole. They are usually grouped together because, taken as a whole, they testify to the correctness of a particular component (or a particular aspect of its functionality). Different suites might exercise different components or different types of functionality. It is also common for all of the test-cases in a test suite to be written for and execute under a single test execution framework. These are discussed in another note on Testing Harnesses.
Test cases (and suites of test cases) can be characterized on the basis of the types of questions they try to answer. They often fall into a few broad categories:
As a matter of practice, these test cases are usually added to the functional or error handling suites. We also use the term "regression testing" to describe the regular execution of these suites (most of whose test cases were not written in response to bugs). We call this regression testing because we are re-running tests we have previously passed to ensure that we have not broken anything.
By adding randomly injected error situations to a load test, we turn it into a stress test that observes how:
There are several characteristics that a good test case should have:
The behavior tested by the test case should be a direct measure of the correctness of the program.
There is no point in writing test cases that assess behavior whose relation to overall program correctness is not clearly understood. Only test things we care about, and for which correct behavior is clearly defined.
Note that the ability to measure the correctness of a component may be greatly affected by the design of that component. These issues are introduced in another note on Software Testability.
The test case should yield a simple "yes" or "no" assessment of the program's correctness.
If it takes a human being to study complex output and determine whether or not it is (as a whole) correct:
The test case should always yield the same results when run (in the same environment) on the same product. Otherwise the result would not be dispositive.
Most programs do not contain non-deterministic elements, and variable results are evidence of a problem. There are situations where determinism may not be achievable:
Here, it is common to specify that such a suite should run, without failure, for some number of hours (or days, months, etc).
Ideally, we should strive to test each possible ordering (and if time windows are a factor, relative timings). Hopefully, our results (for any given ordering and timing) will be deterministic.
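A hedged sketch of that idea follows; the component, its operations, and the invariant checked here are all invented for illustration. Every ordering of a small set of operations is enumerated, and each ordering is verified to produce a deterministic, correct result:

```python
import itertools

# Hypothetical component: a counter that must end at the same value
# regardless of the order in which a fixed set of operations is applied.
class Counter:
    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n

def run_ordering(ops):
    """Apply one specific ordering of operations and return the final state."""
    counter = Counter()
    for op in ops:
        op(counter)
    return counter.value

operations = [
    lambda c: c.add(1),
    lambda c: c.add(2),
    lambda c: c.add(3),
]

# Exercise every possible ordering; each ordering is itself deterministic,
# and (for this invariant) every ordering must yield the same final value.
for ordering in itertools.permutations(operations):
    assert run_ordering(ordering) == 6, ordering
print("all orderings produced the expected result")
```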
Some test suites may take a relatively long time to run. There may be situations where we only want to run a few of the test cases in a suite (specifically to exercise some recently changed code). Such testing is much more convenient if it is possible to choose and run only specified test cases.
This implies that each test case must completely establish the initial conditions it requires. If test case 52 depended on conditions established by test cases 1-51:
It should be possible to run any test case, or test suite, with a single command or script.
If it takes a complex process to run test cases, or test suites:
The test case (or suite) should include all of the tools and data (that are not normally present on every system) that are necessary to exercise the tested component.
If the test cases do not include the needed tools and data:
Not all lines of code are equally likely to be buggy. Numerous studies have found that the Pareto principle also holds for the distribution of bugs over modules ... that 20% of the modules account for 80% of the bugs. In many systems the distribution is even more radical, with 5% of modules accounting for 50% of all bugs. There is no great mystery behind this distribution: Bugs are more likely to be found in subtle and complex code, while most modules and routines do relatively obvious and simple things.
If the distribution of bugs among modules is not uniform, it would seem foolish to allocate testing effort equally to all modules. Quite to the contrary, we would like to allocate our testing effort in direct proportion to the risk ... if only we had some way of assessing that risk. Every problem is unique, but there are several factors that seem to be highly correlated with bug risk:
While it is difficult to create a comprehensive or universal list, the simple fact is that designers and developers have a good sense of which modules are pretty obviously correct, and "where be the monsters". We can simply ask these people to rate each module (as High, Medium, or Low) on the basis of:
The resulting ladder will tell us where we need to invest the lion's share of our testing efforts.
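As a rough sketch of how such ratings might be turned into that ladder (the module names, factors, and weights below are purely illustrative, not part of any prescribed method):

```python
# Illustrative High/Medium/Low ratings gathered from designers and developers.
WEIGHT = {"High": 3, "Medium": 2, "Low": 1}

ratings = {
    # module:        (complexity, subtlety)   -- hypothetical risk factors
    "lock_manager":  ("High",   "High"),
    "string_utils":  ("Low",    "Low"),
    "cache_evictor": ("High",   "Medium"),
    "cli_parser":    ("Medium", "Low"),
}

def risk_score(factors):
    """Combine the per-factor ratings into a single (crude) risk score."""
    return sum(WEIGHT[f] for f in factors)

# Sort modules from highest to lowest risk: the top of the ladder is where
# the lion's share of the testing effort should go.
ladder = sorted(ratings, key=lambda m: risk_score(ratings[m]), reverse=True)
for module in ladder:
    print(module, risk_score(ratings[module]))
```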
Black Box testing is a test case identification approach that says that all test cases should be based on the component specifications. Viewed in historical perspective, this philosophy makes a great deal of sense:
Since the components were designed to meet the specifications, it makes sense to write test cases to the assertions in the specifications.
Any test case that was not based on the specifications would be testing functionality that was not required, and hence should be irrelevant to the acceptance.
If it was not possible to determine the acceptability of a component based solely on tests against the specifications, then the problem must surely be missing specifications, and that is where the problem should be addressed.
If we reflect back on the characteristics of good requirements, we may recall that measurability was among them. It is when we attempt to turn requirements into test cases that we will most appreciate the work that went into making those assertions measurable. Deriving test cases from reasonable specifications is a process not unlike formulating equations from word problems. If a requirement has the form:
There has, in recent years, been much work on automated specification based testing: techniques for taking routine interface specifications (and assertions about input/output relationships) and automatically generating (a frightening number of) test cases to exercise the specified routine.
This gets to the heart of the problem with Black Box testing. It is quite possible that the functionality described by the specifications can generate a ludicrous (e.g. 10^100) number of test cases. Automated test case generation methodology (e.g. Bounded Exhaustive Testing) hopes (by generating a large enough number of test cases) to find some of those that will fail. Unfortunately, the combinatorics are against us in this quest. Efficiency demands that we find a more targeted way of defining test cases than by throwing darts at a very large N-dimensional dart board.
Recognizing the need for a methodology to guide the selection of test cases (from among the many implied by the specifications), Black Box testers came up with a few heuristics.
If I tell you that an integer parameter has a range from 1-100, it should be obvious that the numbers (0, 1, 2, 99, 100, 101, and -1) might be of particular interest. Why?
Boundary Value Analysis is the practice of looking at the specified parameter domains and selecting values near the edges, as well as clearly outside them. It is a non-arbitrary selection process that is entirely based on the specifications, meaningfully measures compliance with a small number of test cases, and (in practice) turns up a fair number of problems.
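A minimal sketch of Boundary Value Analysis in practice, against a hypothetical percent_to_grade routine whose specification only defines behavior for integers in 1-100 (the routine and its pass/fail rule are invented for illustration):

```python
import unittest

def percent_to_grade(n):
    """Hypothetical component under test: only defined for integers 1..100."""
    if not 1 <= n <= 100:
        raise ValueError("out of range")
    return "pass" if n >= 60 else "fail"

class TestBoundaries(unittest.TestCase):
    # Values chosen by Boundary Value Analysis: just inside, on, and just
    # outside each edge of the specified 1..100 domain.
    IN_RANGE = (1, 2, 99, 100)
    OUT_OF_RANGE = (-1, 0, 101)

    def test_values_on_or_inside_the_boundaries_are_accepted(self):
        for n in self.IN_RANGE:
            self.assertIn(percent_to_grade(n), ("pass", "fail"))

    def test_values_just_outside_the_boundaries_are_rejected(self):
        for n in self.OUT_OF_RANGE:
            with self.assertRaises(ValueError):
                percent_to_grade(n)

if __name__ == "__main__":
    unittest.main()
```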
Many functions take multiple parameters, which interact in interesting ways. In such a situation the parameter domains define an N-dimensional solid, and we need a systematic way to sample points from all through the implied volume.
One such technique is Pair-wise testing. We take one pair of parameters, and explore a range of values (perhaps selected by Boundary Value Analysis) for each. Then we move on to another pair of parameters, and we do this until all pairs have been tested. This technique works well for exploring the interactions of pairs of parameters, but does nothing to exercise richer combinations.
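A hedged sketch of that pair-at-a-time exploration is below; the parameter names, candidate values, and defaults are invented, and real all-pairs tools build a more compact covering array than this simple nested enumeration:

```python
import itertools

# Hypothetical parameters with candidate values (perhaps chosen by Boundary
# Value Analysis) and a default used for parameters not currently under test.
candidates = {
    "block_size": [1, 512, 4096],
    "timeout":    [0, 30, 300],
    "retries":    [0, 1, 10],
}
defaults = {"block_size": 512, "timeout": 30, "retries": 1}

def pairwise_cases(candidates, defaults):
    """Yield parameter dicts that explore every pair of parameters in turn."""
    for a, b in itertools.combinations(candidates, 2):
        for va, vb in itertools.product(candidates[a], candidates[b]):
            case = dict(defaults)
            case[a] = va
            case[b] = vb
            yield case

for case in pairwise_cases(candidates, defaults):
    print(case)   # each dict would become the inputs for one test case
```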
Another such technique is Orthogonal Array testing. Here we sample all corners of the N-dimensional solid, and then start choosing random points in N-space. This technique yields a fairly uniform test density throughout the N-dimensional solid.
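A rough sketch of that corners-plus-random-points sampling over an N-dimensional parameter box follows; the parameter ranges and sample count are invented, and a true orthogonal array would be constructed more systematically than simple uniform sampling:

```python
import itertools
import random

# Hypothetical numeric parameter ranges defining the N-dimensional solid.
ranges = {
    "x": (0.0, 100.0),
    "y": (-1.0, 1.0),
    "z": (1.0, 1000.0),
}

def corner_points(ranges):
    """All 2^N corners of the box defined by the parameter ranges."""
    names = list(ranges)
    for combo in itertools.product(*(ranges[n] for n in names)):
        yield dict(zip(names, combo))

def random_points(ranges, count, seed=0):
    """Uniformly sampled interior points, seeded so the suite stays deterministic."""
    rng = random.Random(seed)
    for _ in range(count):
        yield {n: rng.uniform(lo, hi) for n, (lo, hi) in ranges.items()}

test_points = list(corner_points(ranges)) + list(random_points(ranges, count=20))
for point in test_points:
    print(point)   # each point would become one test case's inputs
```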
While it is hard to argue with the basic principle of specification based acceptance testing, simple heuristics like Boundary Value Analysis and Monte Carlo techniques for sampling points from a large volume are not efficient ways of gaining confidence. We need better informed test case selection techniques.
White Box testing is a test case identification approach that says we are allowed to look beyond the specifications, and into the details of how the component is designed. It has two tremendous advantages over Black Box testing:
The second may involve testing behavior that is not part of the functional specifications. Such tests might be inappropriate for acceptance tests of components delivered (to meet specifications) from an independent contractor. They do, however, have the potential to create much higher confidence than specification based tests, and there is no reason that such tests cannot be used by the component developer.
A key concept in test case definition is equivalence partitions. These are related to the mathematical notion of an equivalence class (a set of values that can all be treated as the same for the purposes of a particular relation). In software testing, an equivalence partition is all combinations of parameters that yield the same (or equivalent) computations. The assumption is that once you have validated the behavior of a routine for one set of parameters (from an equivalence partition), it is reasonable to assume that it will also work for all other parameter combinations from the same equivalence partition. This is a powerful tool for collapsing the space of possible input combinations.
The danger of this approach is that equivalence partitions may not be obvious. One might think that the add operation on a 32-bit computer treats all numbers the same ... but this assumption ignores the possibility of overflow when adding two 31-bit quantities. It may be necessary to map the flow of every piece of data through every operation in the routine to discern the equivalence partitions created by each operation, and understand how they might be transformed by subsequent operations.
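A minimal sketch of what those partitions look like for a hypothetical 32-bit signed add that is required to report overflow (the add32 routine is invented for illustration); one representative value is chosen from each partition:

```python
import unittest

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def add32(a, b):
    """Hypothetical component under test: 32-bit signed add that must report overflow."""
    result = a + b
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError("32-bit overflow")
    return result

class TestAdd32Partitions(unittest.TestCase):
    def test_ordinary_addition_partition(self):
        # Representative of the partition where no overflow is possible.
        self.assertEqual(add32(2, 3), 5)

    def test_positive_overflow_partition(self):
        # Representative of the partition where large positive values overflow.
        with self.assertRaises(OverflowError):
            add32(INT32_MAX, 1)

    def test_negative_overflow_partition(self):
        # Representative of the partition where large negative values overflow.
        with self.assertRaises(OverflowError):
            add32(INT32_MIN, -1)

if __name__ == "__main__":
    unittest.main()
```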
Much of white box testing methodology is techniques for identifying equivalence partitions.
Analytically enumerating the transitive closure of all possible data-flow paths to infer the equivalence partitions they create on the initial input values can be a non-trivial process, quite comparable to formal verification (program proving). This has driven people to seek simpler means of inferring what the equivalence partitions are. One of the most common techniques is code coverage.
The basic notion of code coverage is that every statement in the program should be executed at least once. A little reflection, however, quickly brings us to the realization that this is insufficient because multiple equivalence partitions of parameters could flow through the same statement. This leads to other coverage notions (e.g. branch coverage, path coverage). Different types of code coverage are well discussed in other reading assignments.
Most code coverage techniques work with the assistance of coverage measurement tools. Such tools typically take the output of the compiler and add instrumentation code (to count how many times each point was reached). We then run our test suites against the instrumented code, collect the data, and see what code was, and was not, executed.
Once we have identified code that is not being executed, we examine the code to understand the context and inputs that would cause that code to be exercised, and then we define a new test case to create those conditions.
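Tooling differs by language, but as a rough sketch of that workflow using Python's standard-library trace module (the component and the deliberately incomplete suite here are invented), lines that are never executed point directly at the test case we still need to write:

```python
import trace

def classify(n):
    """Hypothetical component under test."""
    if n < 0:
        return "negative"      # never reached by the suite below
    return "non-negative"

def run_suite():
    """A (deliberately incomplete) test suite that only exercises one branch."""
    assert classify(5) == "non-negative"

# Run the suite under the tracer, counting how often each line executes.
tracer = trace.Trace(count=1, trace=0)
tracer.runfunc(run_suite)

# Report the counts; the un-executed "negative" branch tells us to add a
# test case whose inputs drive execution through that code.
tracer.results().write_results(show_missing=True, summary=True, coverdir=".")
```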
The primary argument in favor of white box testing is that it can achieve a very thorough exercising of a module with a minimum number of test cases. The primary argument against white box testing is that it requires a much more skilled engineer (who understands the design of the component being tested) to develop the test cases. When is it worth the extra effort?
This brings us back to the question of risk. If the code is fairly simple, and it is believed that it can be well tested by obvious exercises of its primary functionality, there is little reason to do any more than a simple functional verification. If the code is highly complex, and contains error-prone mechanisms that are difficult to exercise or observe in simple operations, we should consider how white-box techniques can be used to enhance our confidence.
It is not a question of which technique is "right", but rather of what kinds of problems you have, and which technique is likely to be better suited to those problems.
White-box techniques have the potential to collapse huge spaces of possible inputs into a relatively manageable number of test cases. Even after this collapse, we may find that there are far more test cases than we can reasonably write. Do we have to test the correct execution of every statement?
If there is only one path through the code, and a correct result implies that each of the intermediate steps was also correct, then by testing the whole computation we have bought considerable confidence about the correctness of all of its tributary steps. If code is simple enough that:
If, on the other hand:
A Test Plan is a document that describes the way that testing will be used to gain confidence about the correctness of specified components:
After reading a Test Plan you should have a good sense of how we plan to go about gaining confidence about the correctness of our product, and how much confidence we should have as a result of that process. You should know which tools will be used, when, how, and by whom, how these results will be reported, and how they will be used to determine whether or not the product is acceptable.
Test Case Specifications are, to a test developer, what component specifications and design are to a product developer. They enumerate the test cases to be implemented, and describe the functional requirements for each.
In most cases, the supporting testing framework will have already been selected, and such details are almost never included in the individual test case specifications.
A test case specification should include:
As with other specifications and designs, there is no universally appropriate level of detail. For simple things to be done by people who are familiar with the problem, very brief descriptions may be entirely adequate. For complex things to be done by people who are new to the process, it may be necessary to spell out everything in painful detail. When in Rome, do as the Romans do.
Well designed test systems should run themselves and interpret their own output. As such, the running and interpretation of test suites should require no skill and little effort.
Deciding which test cases to run in order to get the greatest amount of confidence at an acceptable price, and enumerating the details associated with each of those test cases, may require as much skill and effort as goes into the design and construction of the software to be tested. In most cases, the best way to ensure that a system can be thoroughly tested for a reasonable price is to architect the testability into the system when it is initially designed. You cannot properly architect a system unless you know how you are going to test it.