The Journal of the Reliability Analysis Center
Third Quarter - 2001

RAC is a DoD Information Analysis Center Sponsored by the Defense Technical Information Center and Operated by IIT Research Institute

Inside
- Statistical Analysis of Reliability Data, Part 2: On Estimation and Testing
- Industry News
- Product Liability in the New Acquisition Environment: A Topic Requiring a Partnered Solution from the Military and Its Contractors
- Future Events
- From the Editor

Calculating Probability of Failure of Electronic and Electrical Systems (Markov vs. FTA)
By: Vito Faraci Jr., Lockheed Martin Fairchild Systems

Introduction

The author of this paper has given seminars to the FAA and commercial airlines on the subjects of Probability, Reliability, and Markov Analysis. This experience revealed some misunderstandings among the engineering community concerning Fault Tree Analysis (FTA) and Markov Analysis (MA). The confusion is not due to the community's lack of talent or interest, but primarily to a lack of good publications on these subjects written in clear, common language.

FTA and MA are two major methods used for calculating the probability of failure (Pf) of electrical and electronic systems. In an attempt to eliminate some of the confusion, this paper compares the two methods and discusses why and when one should be used rather than the other.

Calculating Probability of Failure

Four basic tasks for calculating system Pf are:

1. Clearly identify the undesired event.
2. Perform a qualitative analysis by constructing a model of the sequence of events leading to the undesired event. Does the undesired event involve a component failure? Do two or more components need to fail in some sequence? Do certain components need to fail during a certain phase of the mission? In short, a qualitative model describes in detail the logic flow of the entire process leading to an undesired event.
3. Perform a Reliability Prediction for the component piece parts.
4. Perform a quantitative analysis by constructing a mathematical model (a set of equations based on the logic derived from the qualitative model), and calculating the probability of the undesired event over a specified time interval.

FTA Limitations

Traditionally, tasks 2 and 4 have been performed using FTA, the most commonly known and utilized method. However, what is not commonly known is that FTA has two major limitations. (Note: In order to understand one of these limitations, the engineer must understand the concept of combinatorial vs. non-combinatorial problems. One of the objectives of this paper is to enhance the reader's understanding of this concept with the help of example problems.)

The two major FTA limitations are:

1. With respect to electrical or electronic systems, FTAs do an excellent job with tasks 2 and 4 on combinatorial problems. However, FTAs have difficulty with both tasks when dealing with non-combinatorial problems.
2. With respect to systems utilizing mechanical devices, while FTAs can be used effectively for task 2, they have much difficulty with task 4. Calculating Pf of systems with mechanical devices requires other methodologies. It is a subject unto itself and is not addressed in this paper.

Pertinent Excerpts

The following excerpts pertain to calculating Pf of electrical and electronic systems.

Excerpt from SAE ARP4761, Issue 1996-12:

a. It is difficult if not impossible to allow for various types of failure modes and dependencies such as transient and intermittent faults and standby systems with spares.
b. A fault tree is constructed to assess cause and probability of a single top event.
In some situations it may be difficult for a fault tree to represent a system completely, e.g., repairable systems and systems where failure/repair rates are state dependent.

Markov Analyses (MA) do not possess the indicated limitations. The sequence dependent events are included and handled naturally, and therefore cover a wider range of system behaviors.

The complexity and size of systems are rapidly increasing with new advances in technology. Aircraft systems are relying more and more on fault tolerant systems. Such systems hardly ever fail completely because of continuous monitoring of their condition and instantaneous reconfiguration of the systems. Given this scenario of fault tolerance, the safety assessment process and evaluation of such a system may be more appropriately achieved by the application of the Markov technique.

Excerpt from NASA Reference Publication 1348

Traditionally, the reliability analysis of a complex system has been accomplished with combinatorial mathematics. The standard fault-tree method of reliability analysis is based on such mathematics. Unfortunately, the fault-tree approach is somewhat limited and incapable of analyzing systems in which reconfiguration is possible. Basically, a fault tree can be used to model a system with:

a. Only permanent faults (no transient or intermittent faults)
b. No reconfiguration
c. No time or sequence failure dependencies
d. No state-dependent behavior (e.g., state-dependent failure rates)

Because fault trees are easier to solve than Markov models, fault trees should be used wherever these fundamental assumptions are not violated.

Summary of Excerpts (Why Markov?)

Basically what the preceding excerpts are saying is that the FTA approach has difficulty handling problems that involve:

a. Transient or intermittent faults,
b. Reconfiguration,
c. Time or sequence failure dependencies,
d. State-dependent behavior (e.g., state-dependent failure rates).
From a mathematical point of view, a system employing any one of the above items a. through d. is considered a non-combinatorial type system. In other words, the excerpts claim that the FTA approach has difficulty handling non-combinatorial type problems, and they suggest the use of Markov when analyzing these types.

Note: A pure combinatorial system (or circuit) is a system whose outputs are functions of its inputs only, with none of the characteristics a. through d.

Introduction to Markov Analysis

If a system or component can be in one of two states (i.e., failed, non-failed), and if we can define the probabilities associated with these states on a discrete or continuous basis, the probability of being in one or the other at a future time can be evaluated using a state-time analysis. In reliability and availability analysis, failure probability and the probability of being returned to an available state are the variables of interest. The best known state-space technique is Markov Analysis. The Markov method can be applied under the following constraints:

a. The probabilities of changing from one state to another must remain constant. Thus the method can only be used when a constant failure rate is assumed.
b. Future states of the system are independent of all past states except the immediately preceding one. This is an important constraint in the analysis of repairable systems, since it implies that repair returns the system to an as-new condition.

Typical Markov Model

In the typical Markov model (see Figure 1):

- The model represents various system states
- The transition rate is a function of failure or repair rate
- The states are mutually exclusive
- The sum of the probabilities must equal 1

[State diagram with five states: PFU State (1), PST State (2), PLT State (3), PDLT State (4), and PLOTC State (5), linked by failure and repair transitions.]

Figure 1. Example Markov State Diagram

Markov vs. FTA

Markov and FTA differ in obvious ways. For example, Markov calculates probabilities of states, while FTA calculates probabilities of top level events. A less obvious difference is the ability to solve non-combinatorial type problems. Forgoing a rigorous mathematical discussion, it is sufficient to say that Markov can yield precise quantitative solutions to non-combinatorial problems, whereas FTA must resort to various approximation techniques. Both methods can be used for combinatorial problems and yield identical solutions.

The examples that follow serve to illustrate the solution of combinatorial and non-combinatorial problems. They each involve calculating the Pf of electrical devices, and therefore assume constant failure rates. Markov solutions to these problems were derived using techniques for solving simultaneous differential equations.

Example 1: Two Components in Series (Combinatorial)

Two black boxes, A and B, with failure rates a and b, respectively, start operating at the same time. System operation requires both boxes be functional. Find Pf = Probability of System Failure.

Note: Full Up State = all devices operating, (n) = State Number, P(n) = Probability of State (n).

Markov Model
$P_f = P(2) = 1 - e^{-(a+b)t}$

FTA Approach
$x = 1 - e^{-at}$
$y = 1 - e^{-bt}$
$P_f = x + y - xy = 1 - e^{-(a+b)t}$

Note that the solutions (Pf) are identical in both methods.

Example 2: Three Components in Series (Combinatorial)

Three black boxes, A, B, and C, start operating at the same time. The failure rates are a, b, and c, respectively. Successful system operation requires all boxes be functional. Find Pf.
Markov Model
$P_f = P(2) = 1 - e^{-(a+b+c)t}$

FTA Approach
$x = 1 - e^{-at}$
$y = 1 - e^{-bt}$
$z = 1 - e^{-ct}$
$P_f = x + y + z - xy - xz - yz + xyz = 1 - e^{-(a+b+c)t}$

Note again the identical solutions.

Example 3: Two Components in Parallel (Combinatorial)

Two black boxes start operating at the same time. They have failure rates a and b, respectively. Successful system operation requires that either Box A or Box B be functional. Find Pf.

Markov Model
$P_f = P(4) = (1 - e^{-at})(1 - e^{-bt})$

FTA Approach
$x = 1 - e^{-at}$
$y = 1 - e^{-bt}$
$P_f = xy = (1 - e^{-at})(1 - e^{-bt})$

Note again the identical solutions.

[Diagrams for Examples 1 through 3. Example 1 Markov model: state (1) Full Up moves to state (2) System Fail (Box A or B failed) at rate a + b. Example 2 Markov model: state (1) Full Up moves to state (2) System Fail at rate a + b + c; the fault tree combines x, y, and z into Pf. Example 3 Markov model: state (1) Full Up moves to state (2) A Failed at rate a or to state (3) B Failed at rate b, and each of these moves to state (4) System Fail (Box A and B failed) at rate b or a, respectively; the fault tree combines x and y into Pf.]

Example 4: Two Components in Parallel with Required Order Factor (ROF) (Non-combinatorial)

Two black boxes start operating at the same time. Box A has failure rate a and Box B has failure rate b.

a. What is the probability that both boxes fail and that A fails before B?
b. What is the probability that both boxes fail and that B fails before A?

Markov Model
a. $P(4) = \frac{a}{a+b} + \frac{b}{a+b}\, e^{-(a+b)t} - e^{-bt}$
b. $P(5) = \frac{b}{a+b} + \frac{a}{a+b}\, e^{-(a+b)t} - e^{-at}$

FTA Approach
$x = 1 - e^{-at}$
$y = 1 - e^{-bt}$
a. $P_f = \frac{1}{2}xy = \frac{1}{2}(1 - e^{-at})(1 - e^{-bt})$
b. $P_f = \frac{1}{2}xy = \frac{1}{2}(1 - e^{-at})(1 - e^{-bt})$

This ROF problem has a sequence failure dependency, and is consequently a non-combinatorial type problem. As can be seen, the above results are not the same: the FTA approximation splits the probability that both boxes fail equally between the two orders, which matches the Markov result only when a = b. FTA has difficulty handling such problems.

Summary

Fault Tree Analysis is a very effective tool used for qualitative and quantitative analyses of combinatorial type problems. It uses approximation techniques when solving non-combinatorial types, and therefore should be used with caution and with full understanding of this limitation.
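The caution about non-combinatorial problems can be made concrete with a short numerical check of Example 4. The following is a minimal Python sketch, not part of the original article: the function names, the two failure rates, and the mission time are illustrative assumptions. It evaluates the Markov results P(4) and P(5) and compares them with the FTA value of one half of xy.

```python
import math


def markov_ordered(first, second, t):
    """Probability that both boxes have failed by time t and that the box
    with rate `first` failed before the box with rate `second`.
    This is the Example 4 Markov result, P(4) or P(5) depending on argument order."""
    s = first + second
    return first / s + (second / s) * math.exp(-s * t) - math.exp(-second * t)


def fta_half_xy(a, b, t):
    """Example 4 FTA approximation: half the probability that both boxes fail."""
    x = 1.0 - math.exp(-a * t)
    y = 1.0 - math.exp(-b * t)
    return 0.5 * x * y


# Hypothetical failure rates (per hour) and mission time (hours), for illustration only.
a, b, t = 1e-3, 1e-4, 1000.0

p_ab = markov_ordered(a, b, t)  # P(4): A fails before B
p_ba = markov_ordered(b, a, t)  # P(5): B fails before A
half = fta_half_xy(a, b, t)     # FTA value used for both orders

print(f"Markov P(4), A before B: {p_ab:.6f}")
print(f"Markov P(5), B before A: {p_ba:.6f}")
print(f"FTA (1/2)xy, either order: {half:.6f}")
print(f"P(4) + P(5) vs xy: {p_ab + p_ba:.6f} vs {2.0 * half:.6f}")
```

With unequal rates, the two Markov probabilities differ from each other and from the FTA value, although their sum still equals xy, the probability that both boxes fail in any order. Setting a equal to b makes all three values coincide.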
Markov Analysis is a very effective tool used for qualitative and quantitative analyses of combinatorial and non-combinatorial type problems. However, Markov Analysis computer programs tend to have a limitation on the number of states they can handle.

Remember that, with respect to quantitative analyses, both FTA and MA methods must be limited to constant failure rate items and therefore are not applicable to items characterized by a time-varying hazard function, e.g., mechanical components that wear out over time (increasing failure rate).

About the Author

Vito Faraci is a mathematician by education, and an electrical engineer by trade. He has 12 years of experience with qualitative and quantitative analyses of Reliability, Built-In-Test, and safety-related events. He has also served as a Reliability and Markov Analysis consultant for the Federal Aviation Administration and commercial airlines. Mr. Faraci is also an adjunct math professor at New York Institute of Technology.

[Diagrams for Example 4. Markov model: state (1) FU moves to state (2) A fail at rate a or to state (3) B fail at rate b; state (2) moves to state (4) A,B fail at rate b, and state (3) moves to state (5) B,A fail at rate a. The fault tree combines x and y, with a factor of 1/2, into Pf.]

Statistical Analysis of Reliability Data, Part 2: On Estimation and Testing
By: Jorge Luis Romeu, IIT Research Institute

Introduction

In the first article of this series, random variables (RV), distributions, and parameters were discussed, and an overview of the problems of data and outliers was presented. In this article, the problems associated with sampling, estimation, and testing are discussed.

We have seen that every random process (or RV) has two or more outputs that follow a distinctive pattern (its distribution). And we have seen how such a distribution can be uniquely specified by a set of fixed values or parameters. Once these two elements are known, we can answer all pertinent questions regarding the random process and thus take the necessary actions to control, forecast, or affect its course.

Unfortunately, in almost every practical case, the underlying distribution and its associated parameters are unknown.
In such cases, the best that we can do is to observe the process (i.e., sample) and then use these sample observations to:

- Reconstruct both the distribution and the parameters that generated them (estimation) or, alternatively,
- Confirm or reject some educated guesses previously formed about the distribution and its parameters (hypothesis testing).
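As a minimal sketch of these two uses of a sample (not taken from the article), the Python fragment below assumes exponentially distributed times to failure, which corresponds to a constant failure rate. The sample is simulated rather than observed, and the seed, the value of true_rate, the hypothesized mean life of 1000 hours, and the function names are all hypothetical choices. The test shown is a large-sample normal approximation, not an exact test.

```python
import math
import random
from statistics import NormalDist, fmean


def estimate_rate(times):
    """Estimation: maximum-likelihood estimate of a constant failure rate,
    computed as number of failures divided by total observed time."""
    return len(times) / sum(times)


def z_test_mean(times, mean0):
    """Hypothesis testing: two-sided large-sample z-test of H0 'mean life = mean0'.

    Under H0 an exponential life has standard deviation mean0, so the sample
    mean has standard error mean0 / sqrt(n). Returns (z, p_value)."""
    n = len(times)
    z = (fmean(times) - mean0) / (mean0 / math.sqrt(n))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


rng = random.Random(2001)   # fixed seed so the sketch is repeatable
true_rate = 0.002           # hypothetical: 0.002 failures per hour (mean life 500 h)
sample = [rng.expovariate(true_rate) for _ in range(200)]

rate_hat = estimate_rate(sample)
z, p = z_test_mean(sample, mean0=1000.0)

print(f"estimated failure rate: {rate_hat:.5f} per hour")
print(f"H0: mean life = 1000 h  ->  z = {z:.2f}, p-value = {p:.3g}")
```

Because the simulated mean life (500 hours) is far from the hypothesized 1000 hours, the p-value should be very small and the educated guess would be rejected; with a real sample, neither the distribution nor the true rate would be known in advance.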