| RAC is a DoD Information Analysis Center Sponsored by the Defense Technical Information Center and Operated by IIT Research Institute
The Journal of the Reliability Analysis Center
First Quarter - 2000
INSIDE
Tutorial: Testing for MTBF
PRISM - A New Tool from RAC
Calls for Papers
Special Insert - START 2000-1: Sustained Maintenance Planning
Industry News
From the Editor
Calendar
In Progress at RAC
A Discussion of Software Reliability Modeling Problems
By: Jorge Romeu, Reliability Analysis Center
Introduction
A quarter of a century has passed since the first software reliability model appeared. Many dozens more, of various types, have been developed since. Yet many practitioners still disagree on the practical uses of models in software management, staffing, costing, and release activities. This article examines this situation, discusses some of its causes, and suggests some approaches to improve it.
This author believes that the current user dissatisfaction stems from the manner in which reliability, as a concept, is applied to the software environment and from how the related models have evolved. This paper, therefore, begins with an overview of the characteristics of software reliability models and of their development. This is followed by a discussion of the assumptions underlying software reliability models and of other related problems. Finally, some suggestions for improving the situation are provided.
Software Reliability
Broadly speaking, reliability is the probability of satisfactory operation of a system or device, under specified conditions, for a specified time. In software systems, the concept of reliability is complicated by several factors. An operator and a hardware subsystem are always associated with the software. Hence, documentation, training, and interface problems can (directly or indirectly) induce software failures and thus become part of the reliability assessment process.
It is also important to understand the origins of the concept of software reliability. It evolved as a result of the increasing use of embedded software in existing hardware systems. Hardware reliability had been successfully developed and understood since the early 1950s. Hence, it was only natural that hardware professionals (e.g., systems and electrical engineers) would develop the first software models by extending and adapting their previously successful hardware models. But, as we will see later, hardware modeling techniques did not always work well in the software environment.
This lack of model portability arose from some basic differences between the two environments. It is true that, in general, both hardware and software reliability models can be broadly divided into three categories: Structural (theoretical), Part Count (component), and Black Box (empirical). Examples of theoretical hardware/software models are few and usually involve systems of minor complexity (independent series or parallel components). Of the second type, we find the MIL-HDBK-217 models for hardware, and models based on software science or cyclomatic complexity for software. Black Box models, usually time-driven, include the Army Materiel Systems Analysis Activity (AMSAA) model for hardware, and the Jelinski-Moranda, Musa, Goel-Okumoto, etc. models for software. Also included are other types of empirical models based on time series, input domain, fault seeding, etc.
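To make the Black Box (time-driven) class concrete, the sketch below evaluates the Goel-Okumoto model in its commonly cited nonhomogeneous Poisson process (NHPP) form, m(t) = a(1 - exp(-bt)). The parameters a (expected total number of failures) and b (per-fault detection rate) are hypothetical values, not fitted to any data. The sketch also shows how the general definition of reliability given earlier becomes a formula once a Poisson failure process is assumed, and where several assumptions questioned later in this article enter: a fixed, finite fault content a, one detection rate b shared by all faults, and Poisson counts in disjoint intervals.

```python
import math


def go_expected_failures(t: float, a: float, b: float) -> float:
    """Expected cumulative failures by time t: m(t) = a * (1 - exp(-b * t)).

    a: expected total number of failures (a fixed, finite fault content)
    b: detection rate, assumed identical for every fault
    """
    return a * (1.0 - math.exp(-b * t))


def go_failure_intensity(t: float, a: float, b: float) -> float:
    """Failure intensity, the derivative of m(t): lambda(t) = a * b * exp(-b * t)."""
    return a * b * math.exp(-b * t)


def go_reliability(x: float, t: float, a: float, b: float) -> float:
    """Probability of no failure in (t, t + x], given testing has reached time t.

    Under an NHPP the failure count in (t, t + x] is Poisson with mean
    m(t + x) - m(t), so P(zero failures) = exp(-(m(t + x) - m(t))).
    """
    expected_in_window = go_expected_failures(t + x, a, b) - go_expected_failures(t, a, b)
    return math.exp(-expected_in_window)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration (not fitted to any data).
    a, b = 100.0, 0.05
    for t in (0.0, 20.0, 50.0, 100.0):
        print(f"t={t:6.1f}  m(t)={go_expected_failures(t, a, b):6.2f}  "
              f"lambda(t)={go_failure_intensity(t, a, b):6.3f}  "
              f"R(10|t)={go_reliability(10.0, t, a, b):.4f}")
```

In this sketch the reliability over the next 10 time units rises as testing time t grows; that reliability growth is built into the model's form, and whether actual test data behave this way is exactly what the assumptions discussed below call into question.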
However, this author finds one substantial difference between the two modeling activities.
After its development stage, hardware is mass-produced and the resulting industrial product is used in relatively similar environments. For example, after the prototype has been developed and tested, a military helicopter is mass-produced and flown by similarly trained pilots in attack and rescue missions.
Software, on the other hand, always remains a prototype in its own right, of which exact copies are made and used by a wide variety of people with very different interests and applications. For example, a matrix-inversion program may be used by a high school student to solve a 2x2 system of linear equations or by a Ph.D. candidate to invert thousands of multivariate correlation matrices in a simulation study.
The modeling stage is, therefore, permanent for the (prototype) software product. Hence, its characteristics and problems have a more significant impact in this environment than in the hardware one, as we will see next.
Modeling Problems
A major problem encountered in the software reliability modeling activity arises from the involvement of two different groups of individuals, modelers and practitioners, each having a different product and process in mind and seeking a different result. The fundamental differences between these two groups make the existing software reliability models a professional success for many modelers but unsatisfactory working tools for many practitioners.
Modelers are usually researchers or academicians, while most practitioners are software developers. Academicians and researchers rely on publishing papers, which are peer reviewed and assessed for their theoretical value by other academicians and researchers, to obtain their tenure, promotion, or doctorates. To publish their work, modelers use sophisticated statistical theories that require strict (and sometimes unrealistic or unjustified) underlying assumptions.
Practitioners (managers, developers), on the other hand, need to staff, cost, and release software products. Practitioners work with programmers under time constraints and must rely on insufficient and sometimes deficient information.
Software practitioners need models and approaches that are feasible (can be implemented without incurring exorbitant costs or excessive burden) and practical (can be used to staff, cost, and release the software, etc.).
Theoretical models are based on many mathematically driven assumptions about software that, in practice, do not hold or are weak. In addition, many models of the Black Box (e.g., time-based) class fail to capture several other important factors that affect software reliability.
In general, modelers achieve their goals, but practitioners (whose needs are not met) remain unsatisfied. Even when the software reliability models developed have indeed helped users in their work, they have not solved their practical problems in a completely satisfactory manner.
Such a dichotomy of interests is, in the opinion of this author, the source of most of the problems encountered in software reliability modeling. Models built on theoretical assumptions that are often far removed from reality cannot produce accurate results. This is not to say that other problems, such as defining software reliability and agreeing on software metrics, do not complicate the matter even further. In the following pages, we will discuss some of the major discrepancies that arise from this dichotomy between software theory and practice.
Validity of Software Reliability Model Assumptions
Some software reliability model assumptions do not hold or are weak because they have a purely theoretical (mathematical) origin. Note that not all model assumptions are invalid all the time or in all the models. Some (Black Box) model assumptions and related topics, and the reasons for their possible lack of validity, are:
Definition and Criticality of Failures: In many cases, failures are user dependent and poorly defined. This also makes their identification in the field difficult.
Definition of Time Units: Time units include calendar time, execution time, etc., which may differ substantially or may not always be accurately recorded. Some models (e.g., Musa's) have found ways to deal with this by converting units from one time domain to another. The assumption also implies that testing intensity is homogeneous in time.
Fixed Number of Faults: This assumes that no additional faults are introduced and that every debugging attempt is successful. Some models (e.g., Goel's imperfect debugging model) attempt to address these issues.
All Faults Have the Same Failure Rate: This implies that all faults have the same probability of occurrence. But failure probability is in fact associated with the input domain and the user profile. Hence, not all failures are equally likely. For example, suppose a software failure occurs only when a specific input is given. Some users may provide such input very frequently; for them, the program will have a high failure rate. Another consequence of failures not being equally likely is that the reliability estimate will be affected by the order in which faults are discovered. Say two different testing teams have uncovered two different sequences of n faults. They may obtain two different reliability estimates (and this is further complicated by the specific user profile).
All Software Faults are Always Exposed: Faults are encountered only if the part of the software where they reside is exercised. If a fault prevents the execution of some part of the software until it is removed, the faults that exist downstream, in that part of the code, are not exposed until the initial one is uncovered and removed.
Faults are Immediately Removed: Testing will not usually be stopped once a fault is uncovered. Adaptive procedures (removing or patching part of the code, restricting the input) will be used to continue the testing while the fault is diagnosed and fixed.
Only One Failure Occurs at a Time: This is a required Poisson process assumption, and not all software necessarily complies with it. Multiple failures may occur simultaneously (and not all faults will be corrected before testing is restarted).
Testing is Homogeneous: Testing effort is not always the same; personnel may vary, as may the time dedicated to testing and to other concurrent functions. In addition, if a critical fault is uncovered or a deadline approaches, testing may become more intensive.
Failure Rate is (only) Proportional to Error Content: The failure rate also depends on many other factors, such as the user profile, the complexity of the problem, the language, programming experience, etc.
Number of Failures in Disjoint Intervals is Independent: This is another key Poisson process assumption. There is a finite number of faults in the software, which are sequentially removed. If we encounter and remove a large number of faults in one interval, then in the next time interval there will be fewer faults to find, and vice versa. Hence, the number of faults uncovered in two disjoint, adjacent time intervals is affected by the number of faults previously uncovered.
Times Between Failures are Independent: Since the numbers of failures encountered in disjoint intervals are not independent, this associated assumption does not hold either.
Testing Proceeds Only After a Fault is Removed (corrected): This is an ideal situation that does not occur in practice. Adaptive procedures are used to proceed with testing.
All the Code is Tested, All the Time: Some testing may occur before all the code is completed. Then, if a fault is encountered and located in a given module, this fault may be removed or patched (adaptive procedure) to proceed with the testing.
Run Time versus Think Time: Run (test) time models penalize development strategies that spend more desk (think) time analyzing the program than testing it. Calendar time captures both of these activities (think and run times), but it is a weak measure.
Specific Prior Distribution: Some modelers have attempted to deal with the reality of different failure rates for different faults by assuming a prior distribution and then using a Bayesian model for the reliability. The form of such a prior is selected for mathematical reasons, in order to obtain a closed-form solution for the corresponding posterior.
Reliability Growth Continues with Additional Testing Time: It is implicitly assumed that, as test time proceeds, new faults will be uncovered and removed, and hence software reliability will increase. This precludes the introduction of new faults, as well as the increase in program complexity caused by maintenance.
Seeded Faults Have the Same Failure Rate as Indigenous Ones: In this approach to software reliability, the developer intentionally introduces a number of faults into the program (i.e., fault seeding). The testing team then uncovers some of them during testing, along with other indigenous faults (not seeded, but actual programming faults). Based on the numbers of indigenous and seeded faults uncovered, an estimate of the total number of program faults is obtained. This estimation approach assumes that the complexity and location of seeded faults are the same as the complexity and program location of the indigenous faults. This is not necessarily so.
Software Input and User Profiles are Known and Representative: Some software reliability models use input domain profiles, which are difficult to establish. Moreover, some users exercise some parts of the code more than others do, establishing a particular user profile. Estimating such profiles constitutes a complex statistical problem, involving multivariate parameter estimation and goodness-of-fit tests. In addition, these profiles are user dependent, requiring one for each different user.
Failure Data Collection is Accurate: In software development, the basic activity is to develop good code. Data collection is usually a peripheral activity that programmers are assigned in addition to their work. Data collection forms (problem reports, etc.) are complicated, and data elements such as exact times may not be recorded accurately.
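The fault seeding estimate described above under Seeded Faults Have the Same Failure Rate as Indigenous Ones can be sketched with the simple ratio estimator commonly used with seeding (often attributed to H. D. Mills). This is a minimal sketch, assuming that seeded and indigenous faults are equally likely to be found; the function name and the counts in the usage line are hypothetical, chosen only for illustration.

```python
def seeding_estimate(seeded_total: int, seeded_found: int,
                     indigenous_found: int) -> tuple[float, float]:
    """Ratio estimate of indigenous fault content from a fault-seeding experiment.

    Assumes seeded and indigenous faults are equally likely to be detected, so
    seeded_found / seeded_total == indigenous_found / indigenous_total.
    Returns (estimated total indigenous faults, estimated faults still remaining).
    """
    if seeded_found <= 0:
        raise ValueError("no seeded faults found; the ratio estimate is undefined")
    if seeded_found > seeded_total:
        raise ValueError("cannot find more seeded faults than were planted")
    total = indigenous_found * seeded_total / seeded_found
    return total, total - indigenous_found


if __name__ == "__main__":
    # Hypothetical counts: 50 faults planted, 40 of them found, 30 indigenous faults found.
    total, remaining = seeding_estimate(50, 40, 30)
    print(total, remaining)  # 37.5 7.5
```

The estimate inherits exactly the assumption criticized above. If, hypothetically, the program actually held 75 indigenous faults that were detected at half the rate of the seeded ones (40% versus 80%), testing would produce the same counts (30 indigenous, 40 seeded), and the estimate of 37.5 would understate the true fault content by a factor of two.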
In addition, other issues associated with software development and model use affect software reliability model assessment. Some of them are:
Software Doesn't Wear Out with Time: This has always been a key difference between software and hardware reliability. Hardware devices age. Software also ages, just in a different way! As time proceeds and software maintenance occurs, new functions, modules, hardware, and capabilities are added or modified. The software's complexity increases to the point that it becomes more economical to retire it than to continue maintaining it. This process is, conceptually, akin to hardware aging processes.
Development Phases and Fault Exposure: In different software development phases, different types of faults are uncovered. Hence, combining failure data from different phases as model input may be detrimental to the overall software reliability estimation.
Experimentation: Many software development experiments undertaken to assess software reliability models and methods have been implemented using very specific problems and subjects. This situation restricts the extrapolation of results. The experimental subjects are usually students or pre-selected professional teams. The problems are usually theoretical, or replications of available real-life ones. Neither has been randomly selected, and they may be far from representative. Experimental results are still useful and provide valuable insight, but care should be exercised in their interpretation and especially in extrapolation.
Additional Factors Not Accounted For: Software reliability is affected by project requirements, software environment and documentation, user profile, programmer and management experience, and other factors that most models do not account for. Their contribution to the unreliability of the software program is therefore not reflected in the model results.
Initial Reliability Estimations: In some software models, initial estimates are a function of (1) the processor speed, (2) the programming language used (via code expansion ratios), and (3) the fault exposure ratio. Program size cancels out and does not constitute a factor at this early time. Other previously mentioned factors that also affect software complexity and reliability are not included in this initial estimation.
Fault Exposure Ratios: These ratios, used for initial estimation, were obtained several years ago. In a rapidly advancing area such as software programming, where new languages, new environments (e.g., visual), and new technologies appear every day, such fault exposure ratios may no longer be representative. In addition, they were obtained from specific environments and projects of the past and may not represent the projects and new application areas of today.
Language Expansion Ratios: These ratios are subject to the same criticism as fault exposure ratios. In addition, new programming languages (e.g., Java) have appeared recently for which accurate language expansion ratios may not yet be available.
Having a Large Pool of Software Reliability Models from Which to Choose: This constitutes an additional and serious problem. Since no single model has been completely established, software developers must choose one. For example, the developer may try fitting several models and then choose the most accurate among them, based on past behavior. However, can one be sure that past behavior always guarantees a model's correct future behavior? Model selection is not an easy task.
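The claim under Initial Reliability Estimations that program size cancels out of the early estimate can be checked with a short sketch. It assumes the form of Musa's basic execution time model as it is commonly presented, lambda_0 = f * K * omega_0, where f = r / (Q_x * I_s) is the linear execution frequency (r the processor's instruction rate, Q_x the code expansion ratio, I_s the number of source lines) and omega_0 = omega_D * I_s is the initial fault count. The fault density omega_D is an assumption added here for the sketch; the function name and every parameter value are hypothetical placeholders, not published ratios.

```python
def initial_failure_intensity(r: float, q_x: float, source_lines: float,
                              k: float, fault_density: float) -> float:
    """Initial failure intensity lambda_0 = f * K * omega_0 (failures per CPU second).

    r:             object instructions executed per CPU second (processor speed)
    q_x:           code expansion ratio, object instructions per source line (language)
    source_lines:  program size in source lines
    k:             fault exposure ratio
    fault_density: assumed initial faults per source line (hypothetical input)
    """
    object_instructions = q_x * source_lines
    f = r / object_instructions               # linear execution frequency
    omega_0 = fault_density * source_lines    # expected initial number of faults
    # source_lines appears in both f and omega_0, so it cancels:
    # lambda_0 = r * k * fault_density / q_x
    return f * k * omega_0


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    params = dict(r=1e8, q_x=2.5, k=4.2e-7, fault_density=0.006)
    for size in (10_000, 1_000_000):
        print(size, initial_failure_intensity(source_lines=size, **params))
```

Both program sizes yield the same initial intensity (up to floating-point rounding), while a larger expansion ratio lowers it. This is why such an early estimate reflects only processor speed, language, and exposure ratio, and none of the other complexity factors listed above.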
Some Suggestions to Improve Software Reliability Modeling
Software reliability models are used to assess the end result of the software development process. This final result (the program) is a function of at least three broad factors: people, project, and environment. The people include programmers, management, testers, etc. The project is represented by its characteristics: size, complexity, requirements, functions, interfaces, etc. The environment includes all the characteristics of the software development shop: management style, software tools, methods, etc.
This author believes that, in addition to using and improving software reliability models, resources should also be dedicated to obtaining a better understanding of one's own software organization (its strengths and weaknesses) and to improving it by training its people and readjusting its methods.
In-depth forensic analysis of an organization's past work will reveal its strong and weak points. It will also make it possible to assess key components of the three factors mentioned in the previous paragraph. This author also proposes that error prevention, rather than correction, be emphasized. Organizational improvements based on the results of forensic analysis may provide substantial mid- and long-range software reliability gains.
The development by the Software Engineering Institute of the Capability Maturity Model (SEI/CMM), with its five-level classification of software shops (from basic practices to a continuous, quantitative improvement process), is a tacit recognition of the urgent need for organizational improvements.
It is known that about 70% of software faults result from problems introduced during the requirements definition and design phases (including software reuse problems). Dedicating more time and staff to better understanding, stating, and conveying to programmers the special requirements and design issues of each project will pay off in the end. Better training, software tools, and programmer and resource time management may also contribute to reducing programming stress and thus software errors.
Finally, forensic analysis may also show that improvements are needed in important areas such as fault avoidance and prevention techniques (tools, training), fault tolerance (recovery blocks, n-version programming), and fault detection/correction (walkthroughs).
Conclusions
This author recognizes the excellent work of many software reliability model developers. He has worked with some of them in industry and academia and respects their many talents and achievements. There are ample examples in the software engineering literature that show how software reliability models, directly and indirectly, have:
Improved the software estimation, assessment, prediction, etc. processes
Improved the understanding of the software development process
Contributed to the development of new software engineering tools and techniques
However, it is also unquestionable that the problems discussed in this article affect model efficiency and accuracy and are at the heart of the dissatisfaction of some model users. In general, software reliability models are based upon:
Poor data (few data points, weak measurement scales)
Weak assumptions (invalid, incomplete, or unrealistic)
Incomplete factors (many important factors are missing)
Therefore, current software reliability models can only provide an approximation of the results that we want and need. All things considered, however, and under the circumstances discussed in this article, this is the best they can do at this time. Accordingly, this author also proposes that the models be improved. This can be accomplished by seriously revisiting the modeling problems discussed in this paper. By approaching the stated model-developer versus model-practitioner dichotomy in a constructive manner, all parties may agree to:
Not ask for or expect unrealistic results from software reliability models
Have practitioners work more closely with software model developers
Provide an incentive for adapting existing models (as opposed to developing new ones)
Assess each software organization's strengths and weaknesses
Improve each organization (its programming and processes) correspondingly
Strive for error prevention rather than error correction
Then, use a software reliability model, judiciously, to assess the improvement.
Bibliography
1. Handbook of Software Reliability Engineering. Michael R. Lyu, Editor. IEEE Computer Society Press/McGraw-Hill. 1995.
2. Software Reliability: Measurement, Prediction, Application. Musa, J., A. Iannino and K. Okumoto. McGraw-Hill. 1987.
3. Software Reliability Models: Assumptions, Limitations, and Applicability. Goel, A. IEEE Transactions on Software Engineering. Vol. 11. December 1985.
4. Discussion of Statistical Measures to Evaluate and Compare Predictive Quality of Software Reliability Estimation Methods. Romeu, J. L. Proceedings of the 1997 Biennial Meeting of the International Statistical Institute (ISI). Istanbul, Turkey.
5. Classifying Combined Hardware/Software Reliability Models. Romeu, J. L. and K. Dey. Proceedings of the 1984 RAMS Conference. San Francisco, CA.
6. Some Measurement Problems Detected in the Analysis of Software Productivity Data and their Statistical Consequences. Romeu, J. L. and S. Gloss-Soler. Proceedings of the 1983 COMPSAC Conference. Chicago, IL.
RAC Points of Contact
Points of contact have been assigned for many of the services and products available from RAC. For the convenience of our customers and Journal readers, a list of all of our contacts is provided below. Please contact the indicated individuals for information and for answers to your questions on the topics noted.
Topic: RAC point of contact (phone, e-mail)
PRISM and data collection and sharing: Dave Dylis (315-339-7055, ddylis@iitri.org)
RAC Journal: Ned Criscimagna (301-918-1526, ncriscimagna@iitri.org)
RAC Web Site: Mary Priore (315-339-7135, mpriore@iitri.org)
Technical Inquiries: Bruce Dudley (315-339-7045, bdudley@iitri.org or rac@iitri.org)
Training: Nan Pfrimmer (800-526-4803, npfrimmer@iitri.org)
Product Catalog and orders: Gina Nash (800-526-4802, gnash@iitri.org)