SoftEng Board (Digest Area)
From: don (逍遥物外·造极登峰), Board: SoftEng
Title: software testability
Posted: HIT Lilac BBS (Tuesday, October 31, 2000, 21:27:53), local post
Hi, all:
To anyone who may be interested: from IEEE Software, 1995.
Enjoy!
davew
31/10
Testability
Software verification is often the last defense against disasters caused
by faulty software development. When lives and fortunes depend on
software, software quality and its verification demand increased
attention. As computer software begins to replace human decision makers,
a fundamental concern is whether a machine will be able to perform
the tasks with the same level of precision as a skilled person. If not,
a catastrophe may be caused by an automated system that is less
reliable than a manual system. Therefore we must have a means of
assessing whether critical automated systems are acceptably safe and
reliable. In this paper, we concentrate on a verification technique
for assessing reliability.
The IEEE Standard Glossary of Software Engineering Terminology (1990)
defines software verification to be the
"process of evaluating a system or component to determine whether the
products of a given development phase satisfy the conditions imposed
at the start of that phase."
Restated, software verification is the process that assesses the
degree of "acceptability" of the software, where acceptability is judged
according to the specification. Software verification is broadly
divided into two classes: dynamic software testing and formal
verification (which typically involves some level of static theorem
proving). Dynamic software testing is the process of executing the
software repeatedly until confidence is gained that either (1) the
software is correct and has no more defects, which is commonly
referred to as probable correctness, or (2) the software has a high
enough level of acceptability. Testing can alternatively be subdivided
into two main classes: white-box and black-box. White-box testing
bases its selection of test cases on the code itself; black-box
testing bases its selection on some description of the legal input
domain.
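The distinction can be sketched concretely. Below is a minimal, hypothetical illustration of black-box random testing: inputs are drawn from an assumed input distribution and the oracle checks outputs against the specification alone, never inspecting the code. The program under test (`isqrt`) and the uniform input range are illustrative assumptions, not from the original article.

```python
import random

# Hypothetical program under test: integer square root via Newton's method.
def isqrt(n: int) -> int:
    if n < 0:
        raise ValueError("n must be non-negative")
    x = n
    while x * x > n:
        x = (x + n // x) // 2
    return x

# Black-box random testing: the oracle checks the specification
# r*r <= n < (r+1)*(r+1) without looking at the implementation.
def random_black_box_test(trials: int = 1000, seed: int = 42) -> int:
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        n = rng.randrange(0, 10**6)   # assumed input distribution
        r = isqrt(n)
        if not (r * r <= n < (r + 1) * (r + 1)):
            failures += 1
    return failures
```

A white-box approach would instead choose inputs to exercise the loop body zero, one, and many times, guided by the code itself.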
Static theorem proving is the mathematical process of showing that the
function computed by a program matches the function that is specified.
No program executions occur in this process, and the end result is a
binary value: either the function computed by the program matches the
specification or it does not. Problems arise in this rigorous process,
because of questions concerning program termination and the
correctness of the rigorous process itself (Who will prove the proof?).
Furthermore, the process of completing such a proof can be more
difficult than writing the program itself.
We describe a different type of verification that can complement both
dynamic testing and static theorem proving. This new type of
verification, which we will call "software testability," focuses on
the probability that a fault in a program will be revealed by testing.
We define software testability as the probability that a piece of
software will fail on its next execution during testing (with a
particular assumed input distribution) if the software includes a
fault.
Verification, by the standard IEEE definition, is a way of assessing
whether the input/output pairs are correct. Testability examines a
different behavioral characteristic: the likelihood that the code can
fail if something in the code is incorrect. Computer science researchers
have spent years developing software reliability models to answer the
question: "What is the probability that this code will fail?" Our
testability asks a different question: "What is the probability that this
code will fail if it is faulty?" Musa labels a similar measurement as
the fault exposure ratio, K, in his reliability formulae. The
empirical methods for estimating testability are distinct from Musa's
techniques, however.
Our research has emphasized random testing, because of its attractive
statistical properties. However, in full generality, software
testability could be defined for different types of testing (e.g. data
flow testing, mutation testing, etc.). The IEEE Standard Glossary of
Software Engineering Terminology (1990) defines testability as:
"(1) the degree to which a system or component facilitates the
establishment of test criteria and the performance of tests to determine
whether those criteria have been met, and (2) the degree to which a
requirement is stated in terms that permit establishment of test
criteria and performance of tests to determine whether those criteria
have been met."
Note here that in order to determine "the degree" you must have test
criteria; hence testability is simply a measure of how hard
it is to satisfy a particular testing goal. Examples of testing goals
include coverage and complete fault eradication. Testability requires an
input distribution (commonly called a user profile), but this
requirement is not unique to testability; any statistical prediction
of semantic behavior during software operation must include an
assumption about the distributions of inputs during operation.
The reader should note that our definition of software testability
differs from earlier definitions of testability such as the IEEE
definition above. In the past, software testability has been used
informally to discuss the ease with which some input selection
criteria can be satisfied during testing. For example, if a tester
desired full branch coverage during testing and found it difficult to
select inputs that cover more than 50% of the branches, then the
software would be classified as having poor testability. Our
definition differs significantly, because we are not just trying to find
sets of inputs that satisfy coverage goals; we are quantifying the
probability that a particular type of testing will cause failure. We
focus our definition of testability on the semantics of the software:
how it will behave when it contains a fault. This is different from
asking whether it facilitates coverage or is correct.
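The gap between the two notions is easy to exhibit. In the hypothetical sketch below, full branch coverage is trivial to achieve (high testability in the older, coverage-based sense), yet the seeded fault has a one-point failure domain, so the code has very low testability in our failure-probability sense; the `sign` functions are invented for illustration.

```python
def sign_spec(x: int) -> int:
    # Specification: 1 for positive, -1 for negative, 0 for zero.
    return (x > 0) - (x < 0)

def sign_faulty(x: int) -> int:
    if x >= 0:        # seeded fault: should be x > 0; wrong only at x == 0
        return 1
    return -1

# The two-test set {5, -5} covers both branches yet reveals nothing:
coverage_tests = [5, -5]
all_pass = all(sign_faulty(x) == sign_spec(x) for x in coverage_tests)

# Only the single input x == 0 exposes the fault, so under almost any
# input distribution the probability of failure during testing is tiny.
```

Coverage asks "did we exercise this branch?"; our testability asks "if this branch is wrong, how likely is a test to notice?"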
Software testability analysis is related to but distinct from both
software testing and formal verification. Like software testing,
testability analysis requires empirical work to create estimates. Unlike
testing, testability analysis does not require an oracle. Thus
testing and testability are complementary: testing can reveal faults
(testability cannot) but testability can suggest locations where
faults can hide from testing (something testing cannot do). The next
section explores how testability analysis can be used in conjunction
with testing and formal methods to give a clearer view of developing
software.
Three Pieces of a Puzzle
Software testability, software testing, and formal verification are
three pieces in a puzzle: the puzzle is whether the software that we
have has a high enough true reliability. Every system has a true (or
fixed) reliability which is generally unknown; hence we try to
estimate that value through reliability modeling. If we are lucky enough
to have a piece of software that (1) has undergone enormous amounts
of successful testing, (2) has undergone formal verification, and (3)
has high testability, then we have three pieces that fit together to
suggest that the puzzle is solved---high reliability is achieved.
Software testing, formal verification, and software testability all
offer information about the quality of a piece of software. Each
technique supplies a unique perspective, different evidence that the
analyst must take into account. Even with these three "clues," the
analyst is still making guesses; but having all three clues is better
than having only two.
As a hypothetical example of how the three analyses can work together,
consider a system that has 50 modules. Each of the modules is tested
with 100 random tests, and (in their current versions) all modules
pass these tests. In addition, the system passes 100 random tests. Ten
of the modules, judged the most intricate and critical, are subjected to
formal verification at various points in their development. Testability
analysis reveals that 5 of the modules are highly insensitive to
testing; i.e., testing is unlikely to find faults in these modules if
faults exist. Only one of these 5 modules has been formally verified. At
this point, verification resources should concentrate on the 4
modules that have low testability and have not been formally verified;
they are the most vulnerable to hidden faults.
As another example, consider a system built entirely of formally
verified modules. Using a development approach inspired by cleanroom,
the analysts wait until after system integration to do random system
testing. During this testing, some faults are discovered and the code is
repaired. Regression testing and new random tests reveal no more
failures, but testability analysis identifies several places in the code
where testing is highly unlikely to reveal faults. These pieces of code
are subjected to further formal analysis and non-random tests are
devised to exercise these sections more extensively.
These examples illustrate that testability information cannot replace
testing and formal verification; but they also suggest that testing
and formal verification should not be relied upon exclusively either.
Our view of software development and verification is inclusive. The most
effective software practitioners will take advantage of all available
information in order to build and assess quality software. In the rest
of this paper, we focus on software testability; but this discussion
is always in the context of a technique that is complementary to testing
and formal verification.
Software Testability: The Big Picture
To provide a better understanding of what we mean by software
testability, consider two simple analogies.
If software faults were gold, then software testing would be gold
mining. Software testability would be a geologist's survey done before
mining takes place. It is not the geologist's job to dig for gold.
Instead, the geologist establishes the likelihood that digging at a
particular spot would be rewarding. A geologist might say, "This
valley may or may not have gold, but if there is gold, it will be in the
top 50 feet, and it will be all over this valley." At another location,
the geologist might say, "Unless you find gold in the first 10 feet
on this plateau, there is no gold. However, on the next plateau you will
have to dig 100 feet before you can be sure there is no gold."
When software testing begins, such an initial survey has obvious
advantages over testing blind. Testability suggests the required testing
intensity, just as the geologist specifies the digging depth: it
indicates how difficult it will be for testing at a particular
location to detect a fault. If, after testing to the degree that
testability specifies, we observe no failures, then we
can be reasonably confident that our program is correct.
In a second analogy, we illustrate how fewer tests can yield an
equivalent confidence in correctness if we are sure the software will
not hide faults: imagine that you are writing a program for scanning
black and white satellite photos, looking for evidence of a large barge.
If you are sure that the barge will appear as a black rectangle, and
that any barge will cover at least a 10 by 20 pixel area in an image,
then the program can use techniques that could not be used if the
barge size were not established beforehand. For example, assume that the
original image has been subsampled so that each pixel in the new
image is the average of a five by five square of pixels in the
original image. This subsampled image could be scanned 25 times more
quickly than the original; with the barge size guaranteed to be large
enough, any barge would still be detectable in the lower resolution
image. (The shape of a suspected barge could be determined by more
detailed examination of the original image at higher resolution.) But if
a barge might exist in the image as a smaller rectangle, then the low
resolution image might hide the barge inside of one of its averaged
pixels. The lower bound on barge size makes the lower resolution image
sufficient for locating barges. There is a direct relationship between
the minimum barge size and the amount of speed-up that can be
accomplished by subsampling.
Looking for a barge in the image is analogous to looking for faults in a
program; instead of examining groups of pixels, we examine the output
from test case executions. If a fault will always cause a larger
proportion of inputs to fail during testing (this is analogous to a
bigger barge), then fewer random tests will be required to reveal the
fault (a coarser grid can be used to locate the barge). If we can
guarantee that any fault in a program will cause the program to fail for
a sufficiently large proportion of tests, then we can reduce the number
of tests necessary to be confident that no faults exist. These two
analogies indicate why testability information is a useful strategy
complementary to testing. In the next section we discuss how to design
testability into your system.
----------The end. ;-) Did you really read this far? Great!
--
One post road, one atmosphere.
One pear blossom, one kind of thinking.
Hoping to travel the Linux road together with you!
※ Source: HIT Lilac BBS bbs.hit.edu.cn [FROM: PR-AI.hit.edu.cn]