An Introduction to Software Test Cases

Mark Kampe
$Id: testing.html 149 2007-11-14 23:35:46Z Mark $

1. Introduction to Test Cases

Perhaps the first question we encounter when trying to figure out how to test a component is: what are all of the test cases that we could possibly run?

It doesn't take much analysis to realize that the number of possible test cases may rival the number of particles in the universe. Then we move on to a much more interesting question: which test cases will give us the greatest confidence in the correctness of the component, at a price we can afford?

This question can be broken down into two (only slightly) simpler questions:

  1. Where is our confidence currently low?
  2. Which test cases will significantly improve that confidence?
It is important that these two questions be well understood, because:
  1. Our confidence function varies widely over our code, and there is little value to be gained by additional testing of code whose correctness has already been well established.
  2. Most (of the possible) test cases are redundant, and two well chosen test cases can easily deliver more confidence than a million poorly chosen test cases.

This paper is an introduction to the problem of choosing the right test cases.

1.1 Test Cases and Test Suites

A Test Case is a script, program, or other mechanism that exercises a software component to ascertain that a specific correctness assertion is true. In general, it creates a specified initial state, invokes the tested component in a specified way, observes its behavior, and checks to ensure that the behavior was correct. Different assertions (or variations on a single assertion) are likely to be tested by different test cases.
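
As a concrete sketch of this structure, the test case below uses Python's unittest framework; the fifo_queue module and its Queue interface are invented purely for illustration.

    import unittest

    from fifo_queue import Queue   # hypothetical component under test


    class TestDequeueOrder(unittest.TestCase):
        """Asserts: items are dequeued in the order they were enqueued."""

        def setUp(self):
            # create the specified initial state
            self.q = Queue()
            self.q.enqueue("first")
            self.q.enqueue("second")

        def test_dequeue_returns_oldest_item(self):
            # invoke the tested component in a specified way
            result = self.q.dequeue()
            # observe the behavior and check that it was correct
            self.assertEqual(result, "first")

    if __name__ == "__main__":
        unittest.main()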

Test Cases are usually organized into Test Suites. A Test Suite is a collection of related Test Cases that is likely to be run as a whole. They are usually grouped together because, taken as a whole, they testify to the correctness of a particular component (or a particular aspect of its functionality). Different suites might exercise different components or different types of functionality. It is also common for all of the test cases in a test suite to be written for, and executed under, a single test execution framework. These are discussed in another note on Testing Harnesses.
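
Continuing the sketch above (again using unittest, with hypothetical test module names), related test cases might be collected into a suite and run as a whole:

    import unittest

    # hypothetical modules, each holding related test cases for one aspect of the component
    import test_enqueue
    import test_dequeue
    import test_queue_errors

    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for module in (test_enqueue, test_dequeue, test_queue_errors):
        suite.addTests(loader.loadTestsFromModule(module))

    unittest.TextTestRunner(verbosity=2).run(suite)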

1.2 Types of Test Cases

Test cases (and suites of test cases) can be characterized on the basis of the types of questions they try to answer. They often fall into a few broad categories: functional tests (does the component do what it is supposed to do), error handling and robustness tests (does it behave reasonably when given bad input, or when things around it fail), and performance or stress tests (does it behave acceptably under load).

There are many other types of testing (e.g. usability testing, integration testing, interoperability testing) but this paper attempts to focus on the types of test cases that are usually run (by developers) to validate a specific component.

1.3 Characteristics of a Good Test Case

There are several characteristics that a good test case should have: it should test a single, clearly stated assertion; it should be reproducible, yielding the same result every time it is run against the same code; it should run and determine success or failure without human interpretation; and when it fails, it should provide enough information to localize the problem.

1.4 Testing and Risk

Not all lines of code are equally likely to be buggy. Numerous studies have found that the Pareto principle also holds for the distribution of bugs over modules ... that 20% of the modules account for 80% of the bugs. In many systems the distribution is even more radical, with 5% of modules accounting for 50% of all bugs. There is no great mystery behind this distribution: bugs are more likely to be found in subtle and complex code, while most modules and routines do relatively obvious and simple things.

If the distribution of bugs among modules is not uniform, it would seem foolish to allocate testing effort equally to all modules. Quite to the contrary, we would like to allocate our testing effort in direct proportion to the risk ... if only we had some way of assessing that risk. Every problem is unique, but there are several factors that seem to be highly correlated with bug risk: the complexity and subtlety of the code, the newness of the code (as opposed to well-exercised code carried over from previous projects), the amount of recent change, the amount of internal state and the degree of concurrency or asynchrony, and the developers' experience with the problem, the language, and the tools.

While it is difficult to create a comprehensive or universal list, the simple fact is that designers and developers have a good sense of which modules are pretty obviously correct, and "where be the monsters". We can simply ask these people to rate each module (as High, Medium, or Low) on the basis of:

  1. the likelihood that there will be errors in its implementation.
  2. the likelihood that errors will not be turned up in basic functional testing.
  3. the likely impact (to program functionality) of errors in this module.

The resulting ladder will tell us where we need to invest the lion's share of our testing effort.
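
A minimal sketch of such a ladder, with module names and High/Medium/Low ratings invented for illustration:

    # Each module is rated (H/M/L) on: likelihood of implementation errors,
    # likelihood that basic functional testing would miss them, and impact.
    SCORE = {"H": 3, "M": 2, "L": 1}

    ratings = {
        "scheduler":    ("H", "H", "H"),   # subtle, stateful, central to everything
        "buffer_cache": ("H", "M", "H"),
        "cli_parser":   ("L", "L", "M"),   # simple and obviously correct
    }

    ladder = sorted(ratings.items(),
                    key=lambda kv: sum(SCORE[r] for r in kv[1]),
                    reverse=True)
    for module, rating in ladder:
        print(module, rating)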

2. Black Box Testing

Black Box testing is a test case identification approach that says that all test cases should be based on the component specifications. Viewed in historical perspective, this philosophy makes a great deal of sense: the specifications are the contract that defines what the component must do; the people doing acceptance testing may not have access to (or knowledge of) the implementation; and test cases derived from the implementation risk sharing the implementer's misconceptions, and may have to be discarded whenever the implementation changes.

It is difficult to argue with any of that reasoning. Component acceptance testing should be based entirely on component specifications.

2.1 Specification-based Testing

If we reflect back on the characteristics of good requirements, we may recall that measurability was among them. It is when we attempt to turn requirements into test cases that we will most appreciate the work that went into making those assertions measurable. Deriving test cases from reasonable specifications is a process not unlike formulating equations from word problems. If a requirement has the form: "in situation X, given input Y, the component will perform specified actions and produce specified outputs"

Then we have to figure out how to create situation X, generate input Y, and capture the actions and outputs so that we can compare them with the expectations. Where requirements are written in more general terms, we must attempt to come up with a series of more direct statements (as above), which, taken together, would seem to imply the intended general capability.
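
For instance, a requirement like "if the output file already exists and the force option is given, the existing file is silently overwritten" translates fairly directly into a test case; the mytool module and its write_output function below are hypothetical stand-ins for the specified operation.

    import os
    import tempfile
    import unittest

    from mytool import write_output   # hypothetical routine under test


    class TestForceOverwrite(unittest.TestCase):
        def test_overwrites_existing_file(self):
            # create situation X: the output file already exists
            path = os.path.join(tempfile.mkdtemp(), "out.txt")
            with open(path, "w") as f:
                f.write("old contents")
            # generate input Y: invoke the operation with the force option
            write_output(path, data="new contents", force=True)
            # capture the output and compare it with the expectation
            with open(path) as f:
                self.assertEqual(f.read(), "new contents")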

There has, in recent years, been much work on automated specification-based testing: techniques for taking routine interface specifications (and assertions about input/output relationships) and automatically generating (a frightening number of) test cases to exercise the specified routine.

This gets to the heart of the problem with Black Box testing. It is quite possible that the functionality described by the specifications can generate a ludicrous (e.g. 10^100) number of test cases. Automated test case generation methodology (e.g. Bounded Exhaustive Testing) hopes (by generating a large enough number of test cases) to find some of those that will fail. Unfortunately, the combinatorics are against us in this quest. Efficiency demands that we find a more targeted way of defining test cases than by throwing darts at a very large N-dimensional dart board.
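
To put numbers on it: a routine that takes just three 32-bit integer arguments has 2^96 (roughly 8 x 10^28) possible input combinations; even at a billion test executions per second, running all of them would take on the order of 10^12 years.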

2.2 Parameter Value Selection

Recognizing the need for methodology to guide the selection of test cases (from among the many implied by the specifications), Black Box testers came up with a few heuristics.

2.2.1 Boundary Value Analysis

If I tell you that an integer parameter has a range from 1-100, it should be obvious that the numbers (0, 1, 2, 99, 100, 101, and -1) might be of particular interest. Why? Because range checks are notoriously prone to off-by-one errors, and because values just outside the specified range tell us whether illegal input is rejected at all.

These should be obvious, even to someone who has never thought about how to implement the specified function.

Boundary Value Analysis is the practice of looking at specified parameter domains and selecting values near the edges, as well as values clearly outside. It is a non-arbitrary selection process that is based entirely on the specifications, meaningfully measures compliance with a small number of test cases, and (in practice) turns up a fair number of problems.
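
A sketch of what those boundary value test cases might look like for the 1-100 example above; the validate function is just a stand-in for whatever specified routine accepts that parameter.

    import unittest

    def validate(n):
        """Stand-in for a specified routine accepting integers from 1 to 100."""
        return 1 <= n <= 100

    class TestBoundaryValues(unittest.TestCase):
        def test_just_below_lower_bound(self):
            self.assertFalse(validate(0))

        def test_lower_bound(self):
            self.assertTrue(validate(1))

        def test_just_above_lower_bound(self):
            self.assertTrue(validate(2))

        def test_just_below_upper_bound(self):
            self.assertTrue(validate(99))

        def test_upper_bound(self):
            self.assertTrue(validate(100))

        def test_just_above_upper_bound(self):
            self.assertFalse(validate(101))

        def test_clearly_out_of_range(self):
            self.assertFalse(validate(-1))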

2.2.2 Large Volume Sampling

Many functions take multiple parameters, which interact in interesting ways. In such a situation the parameter domains define an N-dimensional solid, and we need a systematic way to sample points from all through the implied volume.

One such technique is Pair-wise testing. We take one pair of parameters, and explore a range of values (perhaps selected by Boundary Value Analysis) for each. Then we move on to another pair of parameters, and we do this until all pairs have been tested. This technique works well for exploring the interactions of pairs of parameters, but does nothing to exercise richer combinations.
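
A minimal sketch of the bookkeeping behind pair-wise testing (the parameter names and candidate values are made up; a real pair-wise tool would also pack many pairs into each generated test case):

    from itertools import combinations, product

    # candidate values per parameter, e.g. chosen by Boundary Value Analysis
    candidates = {
        "size":    [0, 1, 4096, 4097],
        "offset":  [-1, 0, 1],
        "mode":    ["r", "w", "rw"],
        "timeout": [0, 30],
    }

    for a, b in combinations(candidates, 2):            # every pair of parameters
        for va, vb in product(candidates[a], candidates[b]):
            # remaining parameters would be held at default values
            print(f"test case: {a}={va}, {b}={vb}")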

Another such technique is Orthogonal Array testing. Here we sample all corners of the N-dimensional solid, and then start choosing random points in N-space. This technique yields a fairly uniform test density throughout the N-dimensional solid.
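
A sketch of that sampling strategy for numeric parameters, assuming each parameter's specified range is known: first every corner of the N-dimensional solid, then some uniformly random interior points.

    import random
    from itertools import product

    ranges = {"x": (1, 100), "y": (0, 255), "z": (-50, 50)}   # specified domains

    # every corner of the N-dimensional solid
    corners = list(product(*[(lo, hi) for lo, hi in ranges.values()]))

    # plus a handful of uniformly random points from the interior
    interior = [tuple(random.randint(lo, hi) for lo, hi in ranges.values())
                for _ in range(20)]

    for point in corners + interior:
        print(dict(zip(ranges, point)))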

2.3 The Limits of Black Box Testing

While it is hard to argue with the basic principle of specification based acceptance testing, simple heuristics like Boundary Value Analysis and Monte Carlo techniques for sampling points from a large volume are not efficient ways of gaining confidence. We need better informed test case selection techniques.

3. White Box Testing

White Box testing is a test case identification approach that says we are allowed to look beyond the specifications, and into the details of how the component is designed. It has two tremendous advantages over Black Box testing:

  1. It makes it relatively easy to identify a small number of test cases that will more thoroughly exercise the component.
  2. It is capable of exercising, and assessing the correctness of, mechanisms that have the potential to affect component correctness, but are difficult to exercise or observe based solely on the component specifications.
The first of these is primarily a means of improving the efficacy of Black Box testing, and is often referred to as Grey Box testing.

The second may involve testing behavior that is not part of the functional specifications. Such tests might be inappropriate for acceptance tests of components delivered (to meet specifications) from an independent contractor. They do, however, have the potential to create much higher confidence than specification-based tests, and there is no reason that such tests cannot be used by the component developer.

3.1 Equivalence Partitioning

A key concept in test case definition is the equivalence partition. This is related to the mathematical notion of an equivalence class (a set of values that can all be treated as the same for the purposes of a particular relation). In software testing, an equivalence partition is a set of parameter combinations that all yield the same (or equivalent) computations. The assumption is that once you have validated the behavior of a routine for one set of parameters (from an equivalence partition), it is reasonable to assume that it will also work for all other parameter combinations from the same equivalence partition. This is a powerful tool for collapsing the space of possible input combinations.

The danger of this approach is that equivalence partitions may not be obvious. One might think that the add operation on a 32-bit computer treats all numbers the same ... but this assumption ignores the possibility of overflow when adding two 31-bit quantities. It may be necessary to map the flow of every piece of data through every operation in the routine to discern the equivalence partitions created by each operation, and understand how they might be transformed by subsequent operations.
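
The 32-bit addition example can be made concrete: one well-chosen value from each partition tells us more than thousands of values drawn from the same partition. The add32 function below is a stand-in that mimics 32-bit two's-complement wrap-around.

    def add32(a, b):
        """Stand-in: 32-bit two's-complement addition, wrapping on overflow."""
        total = (a + b) & 0xFFFFFFFF
        return total - 0x100000000 if total & 0x80000000 else total

    # one representative from each equivalence partition
    assert add32(2, 3) == 5                   # ordinary, non-overflowing sums
    assert add32(2**30, 2**30) == -2**31      # positive overflow wraps negative
    assert add32(-2**31, -1) == 2**31 - 1     # negative overflow wraps positive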

Much of white box testing methodology is techniques for identifying equivalence partitions.

3.2 Code Coverage

Analytically enumerating the transitive closure of all possible data-flow paths to infer the equivalence partitions they create on the initial input values can be a non-trivial process, quite comparable to formal verification (program proving). This has driven people to seek simpler means of inferring what the equivalence partitions are. One of the most common techniques is code coverage.

The basic notion of code coverage is that every statement in the program should be executed at least once. A little reflection, however, quickly brings us to the realization that this is insufficient because multiple equivalence partitions of parameters could flow through the same statement. This leads to other coverage notions (e.g. branch coverage, path coverage). Different types of code coverage are well discussed in other reading assignments.
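
A small example of why statement coverage alone is weak: the single call below executes every statement of the (hypothetical) function, yet a different combination of the same branch outcomes still crashes it.

    def adjust(a, b):
        result = 0
        if a != 0:
            b = b + 1
        if b == 0:
            result = 100 // a     # divides by zero when a == 0 and b == 0
        return result

    # 100% statement coverage from one test case ...
    assert adjust(1, -1) == 100
    # ... but adjust(0, 0) raises ZeroDivisionError: a path that statement
    # coverage never forced us to try.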

Most code coverage techniques work with the assistance of coverage measurement tools. Such tools typically take the output of the compiler, and add additional instrumentation code (to count how many times each point was reached). We then run our test suites against the instrumented code, collect the data, and see what code was, and was not, executed.

Once we have identified code that is not being executed, we examine the code to understand the context and inputs that would cause that code to be exercised, and then we define a new test case to create those conditions.
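
As an illustration of the mechanics (assuming the coverage.py package and a hypothetical test_dequeue module), coverage can be measured from within a small driver script:

    import coverage
    import unittest

    cov = coverage.Coverage()
    cov.start()

    # run the existing test suite against the code being measured
    unittest.main(module="test_dequeue", exit=False)

    cov.stop()
    cov.save()
    cov.report(show_missing=True)   # lists the lines that were never executed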

3.3 Black-Box vs White-Box

The primary argument in favor of white box testing is that it can achieve a very thorough exercising of a module with a minimum number of test cases. The primary argument against white box testing is that it requires a much more skilled engineer (who understands the design of the component being tested) to develop the test cases. When is it worth the extra effort?

This brings us back to the question of risk. If the code is fairly simple, and it is believed that it can be well tested by obvious exercises of its primary functionality, there is little reason to do any more than a simple functional verification. If the code is highly complex, and contains error-prone mechanisms that are difficult to exercise or observe in simple operations, we should consider how white-box techniques can be used to enhance our confidence.

It is not a question of which technique is "right", but rather of what kinds of problems you have, and which technique is likely to be better suited to those problems.

3.4 When to be How Exhaustive

White-box techniques have the potential to collapse huge spaces of possible inputs into a relatively manageable number of test cases. Even after this collapse, we may find that there are far more test cases than we can reasonably write. Do we have to test the correct execution of every statement?

If there is only one path through the code, and a correct result implies that each of the intermediate steps was also correct, then by testing the whole computation we have bought considerable confidence about the correctness of all of its tributary steps. If code is simple enough that:

  1. the likelihood of error is low.
  2. any error would cause the output of the function to be incorrect.
  3. if there were such an error, it would not be terribly difficult to track down.
Then there is no point in trying to define test cases to exercise the individual sub-elements of the computation so as to isolate the source of (the unlikely and easily diagnosed) error. If writing additional test cases is unlikely to find any additional problems, then don't write them. Don't write new test cases unless they are likely to give you new information. In situations like this, give a little thought to the range of (black box) specification and error tests you want to run, and leave it at that.

If, on the other hand:

  1. the code is complex enough that the likelihood of error is high.
  2. the computations are sufficiently stateful and complex that internal errors might not produce externally visible symptoms for a while.
  3. the persistence of the data and complexity of the interactions is such that finding the source of such an error would be very difficult.
Then there is good reason to create more detailed instrumentation and carefully crafted test cases. You should invest your test development effort in proportion to your perceived risk.

4. Test Plans

A Test Plan is a document that describes the way that testing will be used to gain confidence about the correctness of specified components: what is to be tested, how it will be tested, who will perform the testing and when, and how the results will be reported and evaluated.

After reading a Test Plan you should have a good sense of how we plan to go about gaining confidence about the correctness of our product, and how much confidence we should have as a result of that process. You should know which tools will be used, when, how, and by whom, how these results will be reported, and how they will be used to determine whether or not the product is acceptable.

4.1 Test Case Specifications

Test Case Specifications are, to a test developer, what component specifications and design are to a product developer. They enumerate the test cases to be implemented, and describe the functional requirements for each.

In most cases, the supporting testing framework will have already been selected, and such details are almost never included in the individual test case specifications.

A test case specification should include:

  1. the name of the test case
    So that we have a way of referring to this test case.
  2. the component and functional area it tests
    So the reader knows what we are talking about.
  3. a simple statement of the assertion it tests
    Perhaps including a reference to the relevant specifications and requirements.
  4. what pre-conditions must be established
    This establishes the context in which the test should be run (e.g. what data files should be where). These should only be the pre-conditions for this particular test case. More general pre-conditions (e.g. what software should be installed on what kind of system, and configured how) should be described at a higher level.
  5. what operations will be invoked (and if not obvious, how)
    This may be as simple as "run command x with arguments y", but it could possibly entail the construction of a new driver framework.
  6. what results will be captured (and if not obvious, how)
    This may be as simple as recording the returned value and saving the contents of a few output files, but it could involve special stub modules and diagnostic instrumentation.
  7. how we will determine whether or not the results are correct
    What our general expectations are, and how we will process the collected output to ascertain that those expectations have been fulfilled.
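
For example, a minimal specification for a hypothetical test case of a queue component might read:

    Name:             dequeue-empty-01
    Component/area:   fifo_queue, error handling
    Assertion:        dequeue() on an empty queue returns the documented "empty"
                      error indication rather than blocking or crashing.
    Pre-conditions:   a newly created, empty queue.
    Operations:       call dequeue() once.
    Results captured: the returned value and any raised exception.
    Correctness:      the call returns promptly with the "empty" error, and the
                      queue remains usable afterwards.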

As with other specifications and designs, there is no universally appropriate level of detail. For simple things to be done by people who are familiar with the problem, very brief descriptions may be entirely adequate. For complex things to be done by people who are new to the process, it may be necessary to spell out everything in painful detail. When in Rome, do as the Romans do.

5. Summary

Well designed test systems should run themselves and interpret their own output. As such, the running and interpretation of test suites should require no skill and little effort.

Deciding which test cases to run in order to get the greatest amount of confidence at an acceptable price, and enumerating the details associated with each of those test cases, may require as much skill and effort as goes into the design and construction of the software to be tested. In most cases, the best way to ensure that a system can be thoroughly tested for a reasonable price is to architect the testability into the system when it is initially designed. You cannot properly architect a system unless you know how you are going to test it.