RAC is a DoD Information Analysis Center Sponsored by the Defense Technical Information Center and Operated by IIT Research Institute
The Journal of the Reliability Analysis Center, First Quarter - 2000
Inside: Tutorial: Testing for MTBF; PRISM - A New Tool from RAC; Calls for Papers; Special Insert START 2000-1: Sustained Maintenance Planning; Industry News; From the Editor; Calendar; In Progress at RAC
A Discussion of Software Reliability Modeling Problems
By: Jorge Romeu, Reliability Analysis Center
Introduction
A quarter of a century has passed since the first software reliability model appeared. Many dozens more, of various types, have been developed since. Yet many practitioners still disagree on the practical uses of models in software management, staffing, costing and release activities. The present article examines this situation, discusses some of its causes and suggests some approaches to improve it.
This author believes that the current user dissatisfaction stems from the manner in which reliability, as a concept, is applied to the software environment and from how the related models have evolved. This paper, therefore, begins by providing an overview of the characteristics of software reliability models and of their development efforts. This is followed by a discussion of the assumptions underlying software reliability models and other related problems. Finally, some suggestions on how to improve the situation are provided.
Software Reliability
Broadly speaking, reliability is the probability of satisfactory operation of a system or device, under specific conditions, for a specific time. In software systems, the concept of reliability is complicated by several factors. An operator and a hardware subsystem are always associated with the software. Hence, documentation, training, and interface problems can (directly or indirectly) induce software failures, thus becoming part of the reliability assessment process.
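The general definition at the start of this section can be restated in symbols. This is the standard textbook formulation rather than anything taken from the article itself; the symbols T (time to first failure under the stated operating conditions), t (the specified operating time) and \lambda(u) (the failure rate, or hazard, at time u) are introduced here only for illustration.

```latex
R(t) = \Pr(T > t) = \exp\left( -\int_0^t \lambda(u)\, du \right), \qquad t \ge 0
```

Under the additional assumption of a constant failure rate \lambda, this reduces to R(t) = e^{-\lambda t}. For software, the "specific conditions" in the definition include the operator and the hardware subsystem mentioned above, so the same program can have a different R(t) for different users.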
It is also important to understand the origins of the concept of software reliability. It evolved as a result of the increasing use of embedded software in already existing hardware systems. Hardware reliability had been successfully developed and understood since the early 1950s. Hence, it was only natural that the same type of hardware professionals (e.g., systems and electrical engineers) would develop the first software models by extending and adapting their previously successful hardware ones. But the hardware modeling techniques, as we will later see, did not always work well in the software environment.
This lack of model portability occurred due to some basic differences between the two environments. It is true that, in general, both hardware and software reliability models can be broadly divided into three categories: Structural (theoretical), Part Count (component) and Black Box (empirical). Examples of theoretical hardware/software models are few and usually involve systems of minor complexity (independent series or parallel components). Of the second type, we find MIL-HDBK-217 models for hardware, and models based on software science or cyclomatic complexity for software. Black Box models, usually time-driven, include the Army Materiel and Systems Analysis Activity (AMSAA) model for hardware, and Jelinski-Moranda, Musa, Goel-Okumoto, etc. for software. Also included are other types of empirical models based on time series, input domain, seeding, etc.
However, this author finds one substantial difference between the two modeling activities. After its development stage, hardware is mass-produced and the resulting industrial product is used in relatively similar environments.
For example, after the prototype has been developed and tested, a military helicopter is mass-produced and flown by similarly trained pilots in attack and rescue missions. Software, on the other hand, always remains a prototype in its own right, of which exact copies are made and used by a wide variety of people with very different interests and applications. For instance, a matrix-inversion program may be used by a high school student to solve a 2x2 system of linear equations or by a Ph.D. candidate to invert thousands of multivariate correlation matrices in a simulation study.
The modeling stage is, therefore, permanent for the (prototype) software product. Hence, its characteristics and problems have a more significant impact in this environment than in the hardware one, as we will see next.
Modeling Problems
A major problem encountered in the software reliability modeling activity arises from the involvement of two different groups of individuals, modelers and practitioners, each having a different product and process in mind and seeking a different result. The fundamental differences between these two groups make the existing software reliability models a professional success for many modelers but unsatisfactory working tools for many practitioners.
Modelers are usually researchers or academicians, while most practitioners are software developers. Academicians and researchers rely on publishing papers, which are peer reviewed and assessed for their theoretical value by other academicians and researchers, to obtain their tenure, promotion or doctorates. To publish their work, modelers use sophisticated statistical theories that require strict (and sometimes unrealistic or unjustified) underlying assumptions.
Practitioners (managers, developers), on the other hand, need to staff, cost, and release software products. Practitioners work with programmers under time constraints and must rely on insufficient and sometimes deficient information.
Software practitioners need models and approaches that are feasible (can be implemented without incurring exorbitant costs or excessive burden) and practical (can be used to staff, cost, release the software, etc.).
Theoretical models are based on many mathematically driven software assumptions that, in practice, do not hold or are weak. In addition, many models of the Black Box (e.g., time-based) class fail to capture several other important factors that affect software reliability.
In general, modelers achieve their goals but practitioners (whose needs are not met) remain unsatisfied. For, even when the software reliability models developed have indeed helped users in their work, they have not completely solved their practical problems in a satisfactory manner.
Such a dichotomy of interests is, in the opinion of this author, the source of most of the problems encountered in software reliability modeling. For models that have been based on theoretical assumptions, many times far removed from reality, cannot produce accurate results. This is not to say that other problems, such as defining software reliability, agreeing on software metrics, etc., do not complicate the matter even further.
We will, in the following pages, discuss some of the major discrepancies that arise from the mentioned dichotomy between software theory and practice.
Validity of Software Reliability Model Assumptions
Some software reliability model assumptions do not hold or are weak because they have a purely theoretical (mathematical) origin. Note that not all model assumptions are invalid all the time or in all the models. Some (Black Box) model assumptions and related topics, and the reasons for their possible lack of validity, are:
- Definition and Criticality of Failures: In many cases, failures are user dependent and poorly defined. This also makes their identification in the field difficult.
- Definition of Time Units: These include calendar time, execution time, etc., which may differ substantially or may not always be accurately recorded. Some models (Musa) have found ways to deal with this by converting units from one time domain to another. The assumption also implies that testing intensity is time homogeneous.
- Fixed Number of Faults: Assumes that no additional faults are introduced and that every debugging attempt is successful. Some models (e.g., imperfect debugging of Goel) attempt to address these issues.
- All Faults Have the Same Failure Rate: This implies that all faults have the same probability of occurrence. But failure probability is in fact associated with input domain and user profile. Hence, not all failures are equally likely. For example, suppose a software failure occurs only when a specific input is given. Some users may provide such input very frequently; for these users, the program will have a high failure rate. Another consequence of failures not being equally likely is that reliability will be affected by the order in which faults are discovered. Say two different testing teams have uncovered two different sequences of n faults. They may obtain two different reliability estimates (and this is complicated by the specific user profile).
- All Software Faults are Always Exposed: Faults are encountered only if that part of the software where they reside is exercised. If there is a fault that prevents the execution of some part of the software until it is removed, the faults that exist downstream, in that part of the code, are not exposed until the initial one is uncovered and removed.
- Faults are Immediately Removed: Testing will not usually be stopped once a fault is uncovered.
Adaptive procedures (removing/patching part of the code; restricting the input) will be used to continue the testing while the fault is being located and fixed.
- Only One Failure at a Time Occurs: This is a required Poisson Process assumption and not all software necessarily complies with it. Multiple failures may occur simultaneously (and not all faults will be corrected before restarting the testing).
- Testing is Homogeneous: Testing effort is not always the same; personnel may vary, as may the time dedicated to testing and to other concurrent functions. In addition, if a critical fault is uncovered or a deadline approaches, testing may become more intensive.
- Failure Rate is (only) Proportional to Error Content: Many other factors also affect the failure rate, such as user profile, complexity of the problem, language, programming experience, etc.
- Number of Failures in Disjoint Intervals is Independent: This is another key Poisson Process assumption. There is a finite number of faults in the software, which are sequentially removed. If we encounter and remove a large number of faults in one interval, then in the next time interval there will be fewer faults to find, and vice versa. Hence, the number of faults uncovered in two disjoint, adjacent time intervals is affected by the number of faults previously uncovered.
- Times Between Failures are Independent: Since the number of failures encountered in disjoint intervals is not independent, this associated assumption is not true either.
- Testing Proceeds Only After a Fault is Removed (corrected): This is an ideal situation that does not occur in practice. Adaptive procedures are used to proceed with testing.
- All the Code is Tested, All the Time: Some testing may occur before all the code is completed. Then, if a fault is encountered and located in a given module, this fault may be removed or patched (adaptive procedure) to proceed with the testing.
- Run Time versus Think Time: Run (test) time models penalize development strategies that spend more desk (think) time analyzing the program than testing it. Calendar time captures both of these activities (think and run times) but this time measurement is weak.
- Specific Prior Distribution: Some modelers have attempted to deal with the reality of different failure rates for different faults by assuming a prior distribution and then using a Bayesian model for the reliability. The form of such a prior is selected for mathematical reasons, in order to obtain a closed form solution for the corresponding posterior.
- Reliability Growth Continues with Additional Testing Time: It is implicitly assumed that, as test time proceeds, new faults will be uncovered and removed. Hence, the software reliability will increase. This rules out the introduction of new faults, as well as the increase in program complexity caused by the maintenance operation.
- Seeded Faults Have the Same Failure Rate as Indigenous: In this approach to software reliability, the developer intentionally introduces a number of faults in the program (i.e., fault seeding). Then the testing team uncovers some of them during testing, along with other indigenous faults (not seeded, but actual programming ones). Based on the number of indigenous and seeded faults uncovered, an estimate of the total number of program faults is obtained. This estimation approach assumes that the complexity and location of seeded faults are the same as the complexity and program location of the indigenous faults. This is not necessarily so.
- Software Input and User Profiles are Known and Representative: Some software reliability models use input domain profiles, which are difficult to establish. Moreover, some users exercise some parts of the code more than others do, establishing a particular user profile.
Estimating such profiles constitutes a complex statistical problem, involving multivariate parameter estimation and goodness of fit tests. In addition, these profiles are user dependent, requiring one for each different user.
- Failure Data Collection is Accurate: In software development, the basic activity is to develop good code. Data collection is usually a peripheral activity that programmers are assigned in addition to their work. Data collection forms (problem reports, etc.) are complicated, and data elements such as exact times may not be recorded accurately.
In addition, other issues associated with software development and model use affect software reliability model assessment. Some of them are:
- Software Doesn't Wear Out with Time: This has always been a key difference between software and hardware reliability. Hardware devices age. Software also ages, just in a different way! As time proceeds and software maintenance occurs, new functions, modules, hardware and capabilities are added or modified. Its complexity increases to the point that it becomes more economical to retire it than to continue maintaining it. This process is, conceptually, akin to hardware aging processes.
- Development Phases and Fault Exposure: In different software development phases, different types of faults are uncovered. Hence, combining failure data from different phases for model input may be detrimental to the overall software reliability estimation.
- Experimentation: Many software development experiments undertaken to assess software reliability models and methods have been implemented using very specific problems and subjects. This situation poses restrictions on the extrapolation of results. The experimental subjects are usually students or pre-selected professional teams.
The problems are usually theoretical, or replications of available real life ones. Neither has been randomly selected, and both may be far from representative. Experimental results are still useful and provide valuable insight, but care should be exercised in their interpretation and especially in extrapolation.
- Additional Factors Not Accounted For: Most software models are affected by project requirements, software environment and documentation, user profile, programmer and management experience, and other factors not accounted for in the model. Their contribution to the unreliability of the software program is therefore not reflected in the model results.
- Initial Reliability Estimations: In some software models, initial estimates are a function of (1) the processor speed, (2) the programming language used (via expansion rates) and (3) the error exposure rate. Program size cancels out and does not constitute a factor at this early time. Other previously mentioned factors that also affect software complexity and reliability are not included in this initial estimation.
- Fault Exposure Ratios: These ratios, used for initial estimation, were obtained several years ago. In a rapidly advancing area such as software programming, where new languages, new environments (e.g., visual) and technologies are coming out every day, such fault exposure ratios may no longer be representative. In addition, they were obtained from specific environments and projects of the past and may not represent the projects and new application areas of today.
- Language Exposure Ratios: These ratios are subject to the same criticism made of fault exposure ratios. In addition, new programming languages (e.g., Java) have appeared recently for which accurate language exposure ratios may not yet be available.
- Having a Large Pool of Software Reliability Models from Which to Choose: This constitutes an additional and serious problem.
Since no single model has been completely established, software developers must choose one. For example, the developer may try fitting several models and then choose the most accurate among them, based on past behavior. However, can one be sure that past behavior always guarantees a model's correct future behavior? Model selection is not an easy task.
Some Suggestions to Improve Software Reliability Modeling
Software reliability models are used to assess the end result of the software development process. This final result (program) is a function of at least three broad factors: people, project and environment. The people include programmers, management, testers, etc. The project is represented by its characteristics: size, complexity, requirements, functions, interfaces, etc. The environment includes all the characteristics of the software development shop: management style, software tools, methods, etc.
This author believes that, in addition to using and improving the software reliability models, resources should also be dedicated to obtaining a better understanding of one's own software organization (strengths and weaknesses) and to improving it by training its people and readjusting its methods.
In-depth forensic analysis of an organization's past work will reveal its strong and weak points. It will also allow key components of the three factors mentioned in the previous paragraph to be assessed. This author also proposes that error prevention, rather than correction, be emphasized. Organizational improvements based on the results of forensic analysis may provide substantial mid and long-range software reliability gains.
The development by the Software Engineering Institute of the Capability Maturity Model (SEI/CMM), with its five-level classification of software shops (from basic practices to a continuous, quantitative improvement process), is a tacit recognition of the urgent need for organizational improvements.
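As an aside on the fault-seeding assumption listed among the modeling problems, the estimate it refers to can be sketched in a few lines. The article does not give a formula, so the estimator below is an assumption: the classical capture-recapture ratio commonly associated with fault seeding, in which the fraction of seeded faults found is taken to equal the fraction of indigenous faults found. The function name and the example counts are hypothetical.

```python
def estimate_indigenous_faults(seeded_total, seeded_found, indigenous_found):
    """Estimate the total number of indigenous faults from a seeding experiment.

    Assumption (the one questioned in the text): seeded and indigenous faults
    are equally likely to be uncovered, so that
        seeded_found / seeded_total ~= indigenous_found / indigenous_total.
    """
    if seeded_total <= 0:
        raise ValueError("seeded_total must be positive")
    if not 0 < seeded_found <= seeded_total:
        raise ValueError("seeded_found must be between 1 and seeded_total")
    if indigenous_found < 0:
        raise ValueError("indigenous_found must be non-negative")
    # Scale the indigenous count by the inverse of the seeded detection rate.
    return indigenous_found * seeded_total / seeded_found


# Hypothetical counts: 20 faults seeded, 15 of them found, 30 indigenous found.
estimate = estimate_indigenous_faults(20, 15, 30)
print(f"estimated total indigenous faults: {estimate:.1f}")       # 40.0
print(f"estimated indigenous faults remaining: {estimate - 30:.1f}")  # 10.0
```

If seeded faults are easier to find than indigenous ones, the seeded detection rate overstates how thoroughly the indigenous faults have been uncovered, and the estimate is biased low. That is the practical consequence of the assumption not holding.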
It is known that about 70% of software faults result from problems introduced during the requirements definition and design phases (including software reuse problems). Dedicating more time and staff to better understanding, stating and conveying to programmers the special requirements and design issues of each project will pay off in the end. Better training, software tools, and programmer and resource time management may also contribute to reducing programming stress and thus software errors.
Finally, forensic analysis may also show that improvements are needed in important areas such as fault avoidance and prevention techniques (tools, training), fault tolerance (recovery blocks, n-version programming), and fault detection/correction (walkthroughs).
Conclusions
This author recognizes the excellent work of many software reliability model developers. He has worked with some of them in industry and academia and respects their many talents and achievements. There are ample examples in the software engineering literature that show how software reliability models, directly and indirectly, have:
- Improved the software estimation, assessment, prediction, etc. processes
- Improved the understanding of the software development process
- Contributed to the development of new software engineering tools and techniques
However, it is also unquestionable that the problems discussed in this article affect model efficiency and accuracy and are at the heart of the dissatisfaction of some model users. For, in general, software reliability models are based upon:
- Poor (few, weak measurement scale) data
- Weak (invalid, incomplete, unrealistic) assumptions
- Incomplete (lacking many important) factors
Therefore, current software reliability models can only provide an approximation of the results that we want and need.
All things considered, however, and under the circumstances discussed in this article, this is the best they can do at this time. Therefore, this author also proposes that the models be improved. This can be accomplished by seriously revisiting the modeling problems discussed in this paper. By approaching the stated model-developer versus model-practitioner dichotomy in a constructive manner, all parties may agree to:
- Not ask or expect unrealistic results from software reliability models
- Have practitioners work more closely with software model developers
- Provide an incentive for adapting (as opposed to developing new) models
- Assess each software organization's strengths and weaknesses
- Improve, correspondingly, each organization (programming/processes)
- Strive for error prevention rather than error correction
- Then, use a software reliability model, judiciously, to assess improvement.
Bibliography
1. Handbook of Software Reliability Engineering. Michael R. Lyu, Editor. IEEE/Computer Society Press-McGraw Hill. 1995.
2. Software Reliability: Measurement, Prediction, Application. Musa, J., A. Iannino and K. Okumoto. McGraw Hill. 1987.
3. Software Reliability: Models, Assumptions, Limitations and Applicability. Goel, A. IEEE-TR Software Engineering. Vol. 11. December 1985.
4. Discussion of Statistical Measures to Evaluate and Compare Predictive Quality of Software Reliability Estimation Methods. Romeu, J. L. Proceedings of the 1997 Biennial Meeting of the International Statistical Institute (ISI). Istanbul, Turkey.
5. Classifying Combined Hardware/Software Reliability Models. Romeu, J. L. and K. Dey. Proceedings of the 1984 RAMS Conference. San Francisco, CA.
6. Some Measurement Problems Detected in the Analysis of Software Productivity Data and their Statistical Consequences. Romeu, J. L. and S. Gloss-Soler. Proceedings of the 1983 COMPSAC Conference. Chicago, IL.
RAC Points of Contact
Points of contact have been assigned for many of the services and products available from RAC. For the convenience of our customers and Journal readers, a list of all of our contacts is provided below. Please contact the indicated individuals for information and for answers to your questions on the topics as noted.
RAC Point of Contact (Topic)
- Dave Dylis, 315-339-7055, ddylis@iitri.org (PRISM and Data collection and sharing)
- Ned Criscimagna, 301-918-1526, ncriscimagna@iitri.org (RAC Journal)
- Mary Priore, 315-339-7135, mpriore@iitri.org (RAC Web Site)
- Bruce Dudley, 315-339-7045, bdudley@iitri.org or rac@iitri.org (Technical Inquiries)
- Nan Pfrimmer, 800-526-4803, npfrimmer@iitri.org (Training)
- Gina Nash, 800-526-4802, gnash@iitri.org (Product Catalog and orders)