The Journal of the Reliability Analysis Center, Second Quarter - 2003

ments in reliability can result in savings in the hundreds of millions to billions of dollars. The five areas addressed in this article can provide a significant step toward achieving high reliability. Many people perceive that high levels of system reliability have to be very costly to achieve. This perception can be based on the notion that only expensive or militarized components provide high levels of reliability and that higher reliability equates to significant increases in testing and delays in schedule. We must change this perception. When engineering-based reliability improvement techniques are performed as part of the design and development process, high reliability can be cost-effectively achieved.

About the Authors

Dr. David E. Mortin is Chief of the Reliability Branch at the U.S. Army Materiel Systems Analysis Activity, Aberdeen Proving Ground, MD. He has a B.S. in aerospace engineering from the State University of New York at Buffalo, an M.S. in statistics from the University of Delaware, and a Ph.D. in reliability engineering from the University of Maryland.

Stephen P. Yuhas is the Reliability and Maintainability Director at the U.S. Army Evaluation Center. He holds a B.S. in mathematics from Pennsylvania State University. He has also completed extensive graduate studies in operations research/industrial engineering at Penn State and in statistics at the University of Delaware.

Dr. Michael J. Cushing is a technical advisor for the U.S. Army Materiel Systems Analysis Activity. He has a B.S. in Electrical Engineering from Johns Hopkins University and an M.S. and Ph.D. in reliability engineering from the University of Maryland.

Methods for Reducing the Cost to Maintain a Fleet of Repairable Systems
By: Larry H. Crow, Alion Science and Technology

Introduction

When a fleet is first deployed, the economic life and useful life parameters are often not known. However, as the fleet ages, spares usage, repair frequency, reliability, and cost information become available that may be used to estimate these parameters. Specific problems receiving increased attention as systems age are:

1. Cost to maintain a fleet due to repair and overhaul.
2. Maintaining the mission reliability requirements.
3. Determining the optimum repair and overhaul strategy to minimize life cycle cost.
4. Determining the wearout profile for a fielded system.
5. Determining corrective actions for fielded systems to upgrade reliability and reduce cost.

In this article we present two methodologies designed to provide data-based information that will help make decisions on these issues. One methodology is concerned with minimizing total life cycle costs due to repair and overhaul. The other methodology is concerned with corrective actions and in-service reliability growth to increase reliability and therefore reduce the cost of failures and overhauls. Specifically, the minimum life cycle cost methodology addresses issues 1, 2, 3, and 4, and the in-service reliability growth methodology addresses issues 1 and 5.

In many cases, the approach to sustaining a given system fleet may differ from the approach for another fleet of the same system. For example, the sustainment policy for one fleet of helicopters may require periodic general overhaul for the entire helicopter, whereas the sustainment policy for another helicopter fleet may only have overhaul at the subsystem and line replaceable unit (LRU) levels. Consequently, to address repair and overhaul criteria appropriate for a system in a fleet, a methodology must be applicable to all levels of potential repair and overhaul options. Therefore, the methods discussed in this article apply at the complex repairable system, subsystem, and LRU levels.
The terminology "system" is used to reflect any of these applications, and the only assumptions are that a system is complex, repairable, and satisfies the Power Law reliability model assumption discussed in the next section.

Notation

$\lambda$         Scale parameter, Power Law model
$\beta$           Shape parameter, Power Law model
$\lambda_S$       System failure intensity
$\lambda_A$       Type A mode failure intensity
$\lambda_B$       Type B mode failure intensity
$\lambda_{GP}$    Growth Potential failure intensity
$\lambda_P$       Projected failure intensity
$u(t)$            Intensity function
$t$               System age
$N(t)$            Number of failures at system age t
$T_j$             Total operating time for jth system
$T_{UL}$          System useful life
$X_{i,j}$         Age at ith failure for jth system
$K$               Number of systems in sample
$D$               Number of intervals
$MI_q$            Distinct Type B modes in qth interval
$M$               Total number of distinct Type B modes
$C_1$             Average cost of repair
$C_2$             Cost of overhaul
$T_o$             Optimum overhaul time to minimize cost

System Reliability Model

Both the minimum life cycle cost methodology and the in-service reliability growth methodology assume that systems under consideration are complex, with many failure modes, and are repaired upon failure. If a repair just restores the system to operation it is called "minimum repair". Under minimum repair the system reliability after the repair is the same as the system reliability before the failure. Based on these assumptions, the underlying system failures follow the non-homogeneous Poisson process with intensity u(t). Also, the reliability analysis of repairable systems under customer use will involve data generated by multiple systems. Crow, References 2 and 4, developed the Power Law Non-homogeneous Poisson Process (NHPP) as a model for complex repairable systems and presented procedures for analyzing data from multiple systems. This model is widely used and is the standard model for repairable systems in International Electrotechnical Commission standards.
Under the Power Law NHPP the intensity u(t) is

$$u(t) = \lambda \beta t^{\beta - 1} \tag{1}$$

where $t > 0$ is the system's age and $\lambda, \beta > 0$ are parameters. Also, for the Power Law NHPP model the mean value function

$$E[N(t)] = \lambda t^{\beta}, \quad t > 0 \tag{2}$$

is the expected number of failures for a system during its operating time (0, t).

To perform the analyses discussed herein, we need failure data for K systems chosen at random from the fleet population. Each of the K data sets starts at system age 0 and represents a sequence of failures and repairs. If the systems are overhauled, then each cycle starts at time 0, initialized after an overhaul, and each failure time is the total accumulated operating time at failure during the overhaul cycle. System age t is the accumulated operating time since overhaul. If the systems are not overhauled, then age 0 begins when the system is deployed into the fleet and age is the accumulated operating time since deployment. In both cases, the data for the jth system consist of the failure times $X_{i,j}$, $i = 1, \ldots, N(T_j)$, where $N(T_j)$ is the total number of failures for the jth system, and $T_j$ is the total accumulated time, $j = 1, \ldots, K$. The failure times $X_{i,j}$ are the accumulated ages at failure, so that $X_{1,j} < X_{2,j} < \ldots < X_{N(T_j),j}$. Note also that the total accumulated time $T_j$ may or may not correspond to a failure time. If $X_{N(T_j),j} = T_j$ the data are failure truncated, and if $X_{N(T_j),j} < T_j$ the data are time truncated.

Note that for $\beta = 1$ we have the homogeneous Poisson process and a constant intensity of failure. For $\beta > 1$, u(t) is increasing and the successive intervals between failures, $X_{i,j} - X_{i-1,j}$, tend to decrease, which is characteristic of wearout. For $\beta < 1$, u(t) is decreasing and the successive intervals between failures, $X_{i,j} - X_{i-1,j}$, tend to increase, which is characteristic of quality and manufacturing issues.

Also, for a system of age t we are often interested in the probability that the system will go to age t + b without failure.
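Before turning to mission reliability, the intensity and mean value function in Equations (1) and (2) can be sketched numerically. The snippet below is only an illustration: the function names and the parameter values (a scale of 0.001 and three shape values) are hypothetical and are not taken from the article's data. It evaluates u(t) at two ages to show the decreasing, constant, and increasing regimes of the shape parameter.

```python
def power_law_intensity(lam: float, beta: float, t: float) -> float:
    """Failure intensity u(t) = lam * beta * t**(beta - 1), Equation (1)."""
    return lam * beta * t ** (beta - 1.0)


def expected_failures(lam: float, beta: float, t: float) -> float:
    """Mean value function E[N(t)] = lam * t**beta, Equation (2)."""
    return lam * t ** beta


# Hypothetical scale parameter, chosen only to illustrate the three regimes.
LAM = 0.001

for beta in (0.8, 1.0, 1.5):
    early = power_law_intensity(LAM, beta, 100.0)
    late = power_law_intensity(LAM, beta, 1000.0)
    if late > early:
        trend = "increasing (wearout)"
    elif late < early:
        trend = "decreasing"
    else:
        trend = "constant"
    print(f"beta={beta}: u(100)={early:.6f}, u(1000)={late:.6f} -> {trend}")
```

With a shape value above 1 the intensity grows with age, which is the wearout case that the minimum life cycle cost methodology addresses.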
The probability that a system of age t goes to age t + b without failure is the mission reliability for a system of age t and mission length b. For many systems, maintaining a minimum level of mission reliability is an important consideration in costs, maintenance, and overhaul strategies. For the Power Law NHPP the mission reliability is given by

$$R(t) = e^{-[\lambda (t+b)^{\beta} - \lambda t^{\beta}]} \tag{3}$$

A special application of the Power Law NHPP is for reliability growth. The Power Law NHPP is the basis for the Crow (AMSAA) model developed in Reference 2. The Crow (AMSAA) model will be applied as part of the in-service reliability growth analysis. In this article we will also apply estimation, goodness of fit tests, confidence intervals, and other procedures given in Reference 4 for this model.

For a repairable system with $\beta > 1$, we discuss two options for reducing life cycle costs:

· By an optimum choice of overhaul schedule
· By reliability growth

For $\beta = 1$, overhaul may not be necessary, and the option to reduce costs may be reliability growth. An example will be given illustrating this case.

Minimum Life Cycle Cost Model

One consideration in reducing the cost to maintain a fleet of systems ($\beta > 1$) is to establish an overhaul policy that will minimize the total life cycle cost of the system. That is, there is a point at which it is cheaper to overhaul a system and return it to the fleet than to continue repairs. What is the overhaul time that will minimize the total life cycle cost, considering repair cost and the cost of overhaul? This solution for a general NHPP is given in Reference 1. Applying this solution to the Power Law NHPP model with parameters $\lambda$, $\beta$, average repair cost $C_1$, and overhaul cost $C_2$, the optimum overhaul time $T_o$ that will minimize the life cycle cost of the system is given by:

$$T_o = \left[\frac{C_2}{\lambda(\beta - 1)C_1}\right]^{1/\beta} \tag{4}$$

The value $T_o$ is called the economical life of the unit and is the operating time when the average cost of operation per unit time is minimum.
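Equations (3) and (4) can be evaluated directly, as in the minimal sketch below. The function names are hypothetical, and the cost-rate function assumes that each overhaul cycle of length t costs one overhaul plus the expected number of repairs, which is the accounting used in the article's worked example. The parameter values are the upper-confidence-limit values quoted in that example, so the results should land close to the figures reported there rather than match them exactly.

```python
import math


def expected_failures(lam, beta, t):
    """Expected number of failures in (0, t), Equation (2)."""
    return lam * t ** beta


def mission_reliability(lam, beta, t, b):
    """Probability that a system of age t completes a mission of length b, Equation (3)."""
    return math.exp(-(expected_failures(lam, beta, t + b) - expected_failures(lam, beta, t)))


def optimum_overhaul_time(lam, beta, c1, c2):
    """Economical life T_o from Equation (4); only meaningful for beta > 1."""
    if beta <= 1.0:
        raise ValueError("Equation (4) requires beta > 1 (wearout).")
    return (c2 / (lam * (beta - 1.0) * c1)) ** (1.0 / beta)


def cost_per_hour(lam, beta, c1, c2, t):
    """Assumed cost rate: one overhaul plus expected repairs, per operating hour."""
    return (c2 + c1 * expected_failures(lam, beta, t)) / t


# Parameter values quoted in the article's minimum life cycle cost example.
LAM, BETA = 0.000002558, 1.774
C1, C2 = 29_860.0, 100_000.0
MISSION_LENGTH = 3.0

T_O = optimum_overhaul_time(LAM, BETA, C1, C2)
print(f"T_o = {T_O:.0f} hours")
print(f"cost rate at T_o   = {cost_per_hour(LAM, BETA, C1, C2, T_O):.2f} $/hour")
print(f"cost rate at 1,500 = {cost_per_hour(LAM, BETA, C1, C2, 1500.0):.2f} $/hour")
print(f"R(1,500) = {mission_reliability(LAM, BETA, 1500.0, MISSION_LENGTH):.4f}")
```

Because the cost rate is flat near its minimum, an overhaul time somewhat shorter than T_o, chosen to satisfy a mission reliability requirement, gives up relatively little of the available saving.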
The mission reliability and economical life $T_o$ are of particular interest when $\beta > 1$, which is the main focus of this article. In particular, for $\beta > 1$, R(t) is decreasing as t increases. If the mission reliability R(t) must be greater than a certain level, say $R_0$, then the time $T_M$ when $R(T_M) = R_0$ is the mission life of the unit. Useful life is the minimum of $T_o$ and $T_M$.

To apply this useful life model, the parameters $\lambda$ and $\beta$ must be estimated or assumed known. The estimation of these parameters is addressed next.

Estimation for Minimum Life Cycle Cost Model

We estimate the model parameters $\lambda$ and $\beta$ based on the failure data from K systems. The maximum likelihood (ML) estimates of $\lambda$ and $\beta$ (References 2, 4) are the values $\hat{\lambda}$ and $\hat{\beta}$ given by

$$\hat{\lambda} = \frac{\sum_{j=1}^{K} N(T_j)}{\sum_{j=1}^{K} T_j^{\hat{\beta}}} \tag{5}$$

$$\hat{\beta} = \frac{\sum_{j=1}^{K} N(T_j)}{\hat{\lambda}\sum_{j=1}^{K} T_j^{\hat{\beta}} \ln T_j - \sum_{j=1}^{K}\sum_{i=1}^{N(T_j)} \ln X_{i,j}} \tag{6}$$

In general, these equations cannot be solved explicitly for $\hat{\lambda}$ and $\hat{\beta}$, but must be solved by iterative procedures. Once we have the estimates $\hat{\lambda}$ and $\hat{\beta}$, the ML estimate of the intensity function is

$$\hat{u}(t) = \hat{\lambda}\hat{\beta} t^{\hat{\beta} - 1}, \tag{7}$$

the ML estimate of the mean value function is

$$\hat{E}[N(t)] = \hat{\lambda} t^{\hat{\beta}}, \tag{8}$$

and the ML estimate of the mission reliability function is

$$\hat{R}(t) = e^{-[\hat{\lambda}(t+b)^{\hat{\beta}} - \hat{\lambda} t^{\hat{\beta}}]}. \tag{9}$$

The estimate for the optimum overhaul time to minimize life cycle cost is given by

$$\hat{T}_o = \left[\frac{C_2}{\hat{\lambda}(\hat{\beta} - 1)C_1}\right]^{1/\hat{\beta}}. \tag{10}$$

This is the estimated economic life of the system, and is the point where the average cost per operating hour is minimized. The desired estimated useful life $\hat{T}_{UL}$ is the minimum of $\hat{T}_o$ and $\hat{T}_M$, where for mission length b

$$\hat{R}(\hat{T}_M) = R_0, \tag{11}$$

and $R_0$ is the required minimum mission reliability.

Example for Minimum Life Cycle Cost Model

Suppose we consider 11 systems selected at random from a fleet. (A small number of systems is used for the example in order to illustrate the methods. In practical applications a much larger data set would be analyzed. This is hypothetical data and does not represent any actual system.)

The nominal overhaul cycle is 1,500 hours, but the actual overhaul time will often vary. The history of these systems for the last complete overhaul cycle is given in Table 1.

Table 1. System 1 Data

System   Failures (hours)    OH Time (hours)
1        68, 1137, 1167      1,268
2        682, 744, 831       1,300
3        845                 1,593
4        263, 399            1,421
5        -                   1,574
6        -                   1,415
7        598                 1,290
8        -                   1,556
9        -                   1,426
10       730                 1,124
11       -                   1,568

Applying Equations (5) and (6), the ML estimates of $\lambda$ and $\beta$ are $\hat{\lambda}$ = 0.000444, $\hat{\beta}$ = 1.064. Because this is a small sample size we use an upper confidence limit (CL) on $\beta$ given in References 2 and 4. A 95% upper CL on $\beta$ is $\beta^*$ = 1.774, and using this in (5) we estimate $\lambda$ by $\lambda^*$ = 0.000002558. For an average repair cost of $C_1$ = $29,860 and an overhaul cost of $C_2$ = $100,000, the optimum overhaul time to minimize life cycle cost is estimated by Equation (10) as $\hat{T}_o$ = 3,237 hours. This is the economic life overhaul time that will minimize the system total life cycle costs due to repairs and overhauls.

The mission time is b = 3 hours, and the minimum mission reliability requirement is 0.995. At 3,237 hours the mission reliability is estimated to be 0.993. This is less than the requirement. At 1,500 hours the mission reliability is estimated to be 0.996. This means the useful life for overhauls is between 1,500 and 3,237 hours. At 2,060 hours the mission reliability is 0.995. Because we are using an upper bound on $\beta$, this preliminary analysis indicates that the useful life overhaul time is greater than the current 1,500 hours, but probably less than 3,237 hours.

For a preliminary annual cost savings analysis, the average number of hours per year on the fleet population of systems is approximately 79,000 hours. For the current nominal overhaul time of 1,500 hours the expected number of failures between overhauls, given by (8), is 1.10. The average cost of these failures is (1.10)($29,860) = $32,846. There is one overhaul per cycle at a cost of $100,000. Therefore, the average total cost per 1,500 hour cycle is $132,846, and the average cost per hour is ($132,846/1,500) = $88.56.

For an overhaul time of 2,060 hours the corresponding average cost per hour is $76.66, or a savings per hour of $11.90. Assuming 79,000 hours annual usage, this translates to a yearly savings of $940,100. The overhaul time of 3,237 hours gives an average cost per hour of $70.65. The annual cost savings for this overhaul time is approximately $1.4 million. This is the maximum savings for any choice of overhaul time and gives the minimum overall system life cycle cost. The estimated overhaul time of 2,060 hours gives the maximum cost savings subject to the constraint of maintaining a minimum mission reliability of 0.995. Of course, a thorough analysis would require applying these methods to a larger sample size.

Approach for In-Service Reliability Growth

In this article we adapt a reliability growth projection model developed by Crow in Reference 3 for application to in-service reliability growth for a fleet of systems. During reliability growth testing there is a test phase, of length T, say, and a corresponding chronological order of failures and identification of candidate problem modes for corrective action. We assume the corrective actions are delayed until time T. The projection model assumes that the system failure intensity is constant, say $\lambda_S$, during this testing over time T, and then jumps to a lower value due to the incorporation of corrective actions.
The projection model estimates this lower failure intensity value based on information from the test. A projection allows the various corrective action options to be assessed. The scope of this article is on illustrating the adaptation and application of the projection model to in-service fleet reliability growth. For specific information on the derivation and background of the projection model the reader is referred to Reference 3.

As described in Reference 3, the reliability growth projection model utilizes the concepts of Type A modes, Type B modes, and effectiveness factors. Assume that any corrective actions are the result of reviewing the T hours of data and observing failure modes. Management can make one of two possible decisions regarding each observed failure mode: either not fix or fix the failure mode. Therefore, the management strategy places failure modes into two categories called Type A and Type B modes. Type A modes are all failure modes such that if seen in the data no corrective action will be taken. This accounts for all modes for which management determines that it is not economically or otherwise justified to take corrective action. Type B modes are all failure modes such that when seen in the data a corrective action will be taken. Thus, the management strategy is to partition the system into A parts and B parts. During reliability growth testing the model assumes that the total system and the A and B parts have corresponding failure intensities $\lambda_S$, $\lambda_A$, and $\lambda_B$, respectively. That is, $\lambda_S = \lambda_A + \lambda_B$.

The average intensity $\lambda_A$ for Type A modes will not change. With this management strategy, reliability growth can only be achieved by decreasing the Type B failure intensity $\lambda_B$. It is also clear that we can only decrease that part of the Type B mode average failure intensity $\lambda_B$ that we have seen. In addition, once a Type B mode is in the system it is rarely totally eliminated by a corrective action.
In particular, after a Type B mode is found and fixed, a certain fraction of the failure mode average failure intensity will be removed, but a certain fraction will generally remain.

A fix effectiveness factor (EF) is the fractional decrease in a problem mode's average failure intensity after a corrective action. Studies indicate that an average EF of about 0.70 is typical for a reliability growth program during development. However, individual EFs for the failure modes may be larger or smaller than the average.

The baseline failure intensity $\lambda_S$ is the current intensity, and we wish to project the decrease in $\lambda_S$ due to correcting the Type B modes. The management strategy (the assignment of A modes and B modes, and the EFs) may be changed in order to reach a desired reliability objective.

For discussions in this article, we assume an average effectiveness factor d for the corrective actions so that the projection model takes the form

$$\lambda_P = \lambda_A + (1 - d)\lambda_B + d\,h(T) \tag{12}$$

where $\lambda_P$ is the lower failure intensity due to the corrective actions, T is the total test time, and h(T) is the rate at which new Type B modes are being uncovered. Under the reliability growth projection model, $h(T) = \lambda\beta T^{\beta - 1}$ is the intensity of the Crow (AMSAA) model applied to only the first occurrences of Type B modes. (This application to the first occurrences is a very important point and will be addressed later in this article.)

Following Reference 3, the following term is the Growth Potential failure intensity.

$$\lambda_{GP} = \lambda_A + (1 - d)\lambda_B \tag{13}$$

The Growth Potential Average Time Between Failure, $1/\lambda_{GP}$, is the maximum Average Time Between Failure that can be attained with the current management strategy. This maximum is attained when all Type B failure modes in the fleet have been seen in the data set and corrected. The function h(T) is also the failure intensity for all Type B failure modes in the fleet that did not appear in the current data set. The growth potential is attained when h(T) is zero.
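The projection in Equation (12) is a short calculation once its three terms are available. The sketch below uses hypothetical function names and plugs in the counts and first-occurrence Crow (AMSAA) parameters quoted in the article's System 2 example (4 Type A failures, 33 Type B failures, T = 52,110 hours, d = 0.4, and first-occurrence scale and shape estimates of 0.0033 and 0.762). Estimating the Type A and Type B intensities as failure counts divided by total time follows that example.

```python
def power_law_intensity(lam, beta, t):
    """Crow (AMSAA) intensity lam * beta * t**(beta - 1), used here for h(T)."""
    return lam * beta * t ** (beta - 1.0)


def projected_failure_intensity(n_a, n_b, total_time, d, h_t):
    """Equation (12): lambda_P = lambda_A + (1 - d) * lambda_B + d * h(T)."""
    lam_a = n_a / total_time  # Type A intensity, unchanged by corrective actions
    lam_b = n_b / total_time  # Type B intensity observed in the data
    return lam_a + (1.0 - d) * lam_b + d * h_t


# Values quoted in the article's System 2 example.
T = 52_110.0
N_A, N_B = 4, 33
D_EF = 0.4                           # assumed average effectiveness factor
LAM_FIRST, BETA_FIRST = 0.0033, 0.762  # fit to first occurrences of Type B modes

H_T = power_law_intensity(LAM_FIRST, BETA_FIRST, T)
LAM_P = projected_failure_intensity(N_A, N_B, T, D_EF, H_T)

print(f"h(T)      = {H_T:.6f}")
print(f"lambda_P  = {LAM_P:.6f}")
print(f"projected average time between failures = {1.0 / LAM_P:.0f} hours")
```

Keeping h(T) as a separate input makes it easy to see how the projection moves toward its limiting value as fewer new Type B modes remain to be discovered.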
Application of the In-Service Reliability Growth Model: Wearout Case

In the interest of space, the application of the in-service reliability growth projection model will be illustrated by example. Steps are given which can be followed as a template for general application. The example illustrates the application to a system that wears out and is overhauled. A statistical goodness of fit test strongly indicates that these systems follow the Power Law process as discussed earlier.

The system in this example (System 2) is overhauled but does not have a fixed overhaul interval. The prime cost savings strategy under consideration is to increase reliability and reduce failures. (This is hypothetical data and does not represent any actual system.)

STEP 1. We first obtain a sample of size K from the fleet. A cycle is a complete history from overhaul to overhaul. For our data set, K = 27 systems are chosen at random. For these systems the failure history for the last completed cycle is recorded. This is the random sample of data from the fleet. See Table 2. These systems are in the order they were selected.

Table 2. System 2 Data

System   Cycle Time Tj   Nj   Failure Times Xi,j
1        1396            1    1396
2        4497            1    4497
3        525             1    525
4        1232            1    1232
5        227             1    227
6        135             1    135
7        19              1    19
8        812             1    812
9        2024            1    2024
10       943             2    316, 943
11       60              1    60
12       4234            2    4233, 4234
13       2527            2    1877, 2527
14       2105            2    2074, 2105
15       5079            1    5079
16       577             2    546, 577
17       4085            2    453, 4085
18       1023            1    1023
19       161             1    161
20       4767            2    36, 4767
21       6228            3    3795, 4375, 6228
22       68              1    68
23       1830            1    1830
24       1241            1    1241
25       2573            2    871, 2573
26       3556            1    3556
27       186             1    186
Totals   52110           37   -

The failure intensity parameter $\lambda_S$ of interest is the average number of failures per cycle operating hour. The total accumulated operating time is 52,110 hours, with 37 failures. The baseline parameter $\lambda_S$ is estimated by $\hat{\lambda}_S$ = 37/52,110 or 0.00071. This estimates that for each system hour of operating time in the fleet there are 0.00071 failures, or a failure every 1,408 hours, on average.

STEP 2. In order to apply the projection model we put the N = 37 failure times on an accumulated time scale over (0, T), where T = 52,110. In the example each $T_j$ corresponds to a failure time $X_{i,j}$. This is often not the situation. However, in all cases the accumulated operating time $Y_q$ at a failure time $X_{i,r}$ is

$$Y_q = X_{i,r} + \sum_{j=1}^{r-1} T_j, \quad q = 1, 2, \ldots, N, \quad \text{where } N = \sum_{j=1}^{K} N_j,$$

and q indexes the successive order of the failures. In the example, N = 37, $Y_1$ = 1,396, $Y_2$ = 5,893, $Y_3$ = 6,418, ..., $Y_{37}$ = 52,110. See Table 3.

Table 3. Ordered System 2 Data and Failure Mode Classification

q    Yq      Mode     q    Yq      Mode
1    1396    B1       20   26361   B1
2    5893    B2       21   26392   A
3    6418    A        22   26845   B8
4    7650    B3       23   30477   B1
5    7877    B4       24   31500   A
6    8012    B2       25   31661   B3
7    8031    B2       26   31697   B2
8    8843    B1       27   36428   B1
9    10867   B1       28   40223   B1
10   11183   B5       29   40803   B9
11   11810   A        30   42656   B1
12   11870   B1       31   42724   B10
13   16139   B2       32   44554   B1
14   16104   B6       33   45795   B11
15   18178   B7       34   46666   B12
16   18677   B2       35   48368   B1
17   20751   B4       36   51924   B13
18   20772   B2       37   52110   B2
19   25815   B1

Suppose our objective is to lower $\lambda_S$ by selective corrective actions to the systems based on information in the sample. The projection model estimates this lower value, $\lambda_P$.

Each system failure time in Table 2 corresponds to a problem and a cause, that is, a failure mode. The management strategy can either not fix the failure mode (Type A) or fix the failure mode (Type B). To apply the projection methodology, each accumulated operating time in Table 3 is designated as being caused by either a Type A mode or a distinct Type B mode. In this example, there are 13 distinct corrective actions corresponding to 13 distinct Type B failure modes. There are $N_A$ = 4 failures due to failure modes that will not receive a corrective action. There are $N_B$ = 33 total failures due to M = 13 distinct Type B failure modes that will be corrected. Some of the distinct Type B modes had repeats of the same problem. For example, mode B1 had 12 occurrences of the same problem. Type B mode B13 had only one occurrence of that particular problem. The objective of the projection model is to estimate the impact of the M = 13 distinct corrective actions.

STEP 3. Choose an average effectiveness factor based on the proposed corrective actions and historical experience. Historical industry and government data support a typical average effectiveness factor EF = 0.70 for many systems.

In the System 2 application an average EF = 0.4 was assumed in order to be very conservative regarding the impact of the proposed corrective actions.

The first term of the projection model (12) is estimated by

$$\hat{\lambda}_A = N_A/T, \quad \text{or} \quad \hat{\lambda}_A = 0.000077.$$

Also,

$$\hat{\lambda}_B = N_B/T, \quad \text{or} \quad \hat{\lambda}_B = 0.000633.$$

For an average EF, the second term of the projection model is estimated by $(1 - d)\hat{\lambda}_B$. For d = 0.4, the estimated second term is

$$(1 - d)\hat{\lambda}_B = 0.000380.$$

This estimates the Growth Potential failure intensity.

STEP 4. To estimate the last term d h(T) of the projection model (12) we partition the data in Table 3 into intervals. This partition consists of D successive intervals. The length of the qth interval is $L_q$, q = 1, ..., D. It is not required that the intervals be of the same length, but there should be several cycles per interval on average, say at least 5. Also, let $S_1 = L_1$, $S_2 = L_1 + L_2$, ..., etc., be the accumulated time through the qth interval. For the qth interval we note the number of distinct Type B modes, $MI_q$, appearing for the first time, q = 1, ..., D. See Table 4.

Table 4. Grouped Data for Distinct Type B Modes

Interval   No. of Distinct Type B Mode Failures   Length   Accum. Time
1          MI1                                    L1       S1
2          MI2                                    L2       S2
q          MIq                                    Lq       Sq
D          MID                                    LD       SD

The third term is estimated by $d\,\hat{h}(T)$, where

$$\hat{h}(T) = \hat{\lambda}\hat{\beta} T^{\hat{\beta} - 1} \tag{14}$$

and the values $\hat{\lambda}$ and $\hat{\beta}$ satisfy

$$\sum_{q=1}^{D} MI_q \left[\frac{S_q^{\hat{\beta}} \ln S_q - S_{q-1}^{\hat{\beta}} \ln S_{q-1}}{S_q^{\hat{\beta}} - S_{q-1}^{\hat{\beta}}} - \ln T\right] = 0 \tag{15}$$

$$\hat{\lambda} = \frac{M}{T^{\hat{\beta}}} \tag{16}$$

with $S_0 = 0$ and $S_0^{\hat{\beta}} \ln S_0$ taken as 0. This is the grouped data version of the Crow (AMSAA) model applied only to the first occurrence of distinct Type B modes.

For the data in Table 3 we chose the first 4 intervals of length 10,000, and the last interval of length 12,110. That is, D = 5. This choice gives an average of about 5 overhaul cycles per interval. See Table 5.

Table 5. Example of Grouped Data for Distinct Type B Modes

Interval   No. of Distinct Type B Mode Failures MIq   Length Lq   Accum. Time Sq
1          4                                          10000       10000
2          3                                          10000       20000
3          1                                          10000       30000
4          0                                          10000       40000
5          5                                          12110       52110
Total      13                                         -           -

Now, $\hat{\lambda}$ = 0.0033 and $\hat{\beta}$ = 0.762, which gives

$$\hat{h}(T) = \hat{\lambda}\hat{\beta} T^{\hat{\beta} - 1} = 0.00019.$$

Consequently, for d = 0.4 the last term of the projection model is:

$$d\,\hat{h}(T) = 0.000076$$

STEP 5. The projected failure intensity is

$$\hat{\lambda}_P = \hat{\lambda}_A + (1 - d)\hat{\lambda}_B + d\,\hat{h}(T)$$

or

$$\hat{\lambda}_P = 0.000077 + 0.6(0.00063) + 0.4(0.00019).$$

The projected failure intensity is

$$\hat{\lambda}_P = 0.000533.$$

This estimates that the 13 proposed corrective actions will reduce the number of failures per cycle operating hour from the current $\hat{\lambda}_S$ = 0.00071 to $\hat{\lambda}_P$ = 0.000533. The average time between failures is estimated to increase from the current 1,408 hours to 1,877 hours. The growth potential failure intensity is

$$(1 - d)\hat{\lambda}_B = 0.00038$$

and the estimated maximum average time between failures that can be attained with this management strategy is 2,631 hours, i.e., when h(T) = 0.

STEP 6. The cost reduction associated with incorporating these 13 Type B mode corrective actions can be calculated by considering the reduction in fleet failures. The System 2 population has an average of 440,000 fleet hours per year. Based on data it is estimated that 74% of the failures result in an overhaul. Currently, there is an estimated average of 231 overhauls per year. In this example, we assume each overhaul costs $60,000, for an average annual overhaul cost of $13,885,157. By increasing the average time between failures from 1,408 to 1,877 hours, under the same assumptions, we estimate 173 overhauls per year at a cost of $10,412,969. Thus, the estimated projected annual cost savings is $3,470,000 if the 13 corrective actions are implemented.

Application of the In-Service Reliability Growth Model - No Wearout Case

We next give an example of the application of the in-service reliability growth model to medical equipment. This system is not overhauled, operates continuously, and has a constant intensity of failure. The current interest is to reduce maintenance costs by increasing reliability.

Note: For this example, the author modified all the actual real-life data by a randomly selected number. The important points are reflected in the relative comparisons of numbers in this example, and these relative comparisons are in proportion to the real-life numbers and results. However, because of the data modification, the example failure times and all the example reliability numbers have no relationship to the actual real-life numbers.

The project plan for Project X identified design changes (based on field experience data) that are corrective actions for 11 distinct failure modes. The project plan scope was limited to one specific assembly. These modes were labeled as Bi modes, where i = 1 to 11. Failure modes in that assembly that were not addressed and all failure modes in other assemblies were labeled A modes. These 11 corrective actions were implemented.
For this example, service records of unscheduled maintenance events were used to estimate the growth statistics based on sys- tem data before any of these 11 corrective actions were imple- mented. Field data was pulled for a sample of 60 units. See Table 6 for an example of the data. The data was "cleaned" before the analysis to remove scheduled maintenance events and non-chargeable failures. For this analysis, there was a total of 241,000 hours of data, 450 total failures, 42 B-modes, and 408 A-modes. Statistical methods were applied to the data to verify constant failure intensity. Based on this data, the system, before any corrective actions, had an MTBF of 536 hours. The reliability growth projection model was then applied to this set of data for the 11 corrective actions (450 total failures, 42 total B-modes, and 408 A-modes failures), with an assumed effectiveness factor of 0.70. Based on this growth model, the estimate for the improved MTBF is 567 hours. To test the validity of the model, another set of data was analyzed after the 11 corrective actions were implemented. The original data analysis the project team used to determine the design changes was not available. Table 6. Example of Medical Device Data Data was pulled for a second sample of 60 units that had the cor- rective actions implemented. This data set had 134,000 operating hours and 237 failures. The calculated MTBF is 565 hours com- pared to the projection of 567 hours. Given the actual MTBF, the true effectiveness factor is calculated to be 0.73. When looking at the B-modes specifically, the MTBF of the B-modes improved from 5,700 hours to 11,600 hours which was considered a success even though the system reliability did not change significantly. After realizing the impacts that can be made, management want- ed to see improvements across all assemblies. A new manage- ment strategy was devised that put all modes on the list for improvement. 
The A-modes have been converted into an addi- tional 11 B-modes giving the system a total of 22 B-modes. With the demonstrated effectiveness factor of 0.73, we should expect to see the MTBF grow from 565 hours to 2,571 hours. A Life Cycle Cost (LCC) analysis shows that this improvement in reliability has the potential to yield huge savings in service costs and easily justifies the effort. The example on the application of these methods to a medical sys- tem was developed in collaboration with Siemens Medical Systems. Siemens noted that: "The increased effectiveness fac- tor was achieved with the help of RAC. This included audits of our processes and the establishment of a Reliability Program that included derating, reliability prediction with PRISM, and HALT testing. We are continuing to work on our processes to sustain our effectiveness factor and hopefully improve upon it. The Process Grade Factors in PRISM is a tool we are using to point out improvement areas. We are also spending more time setting tar- gets since the targets are very aggressive. Reliability growth will be used more extensively throughout the TAAF phase." Acknowledgements I would like to thank Jim Harriett and Lonnie Scott, with the reli- ability team at the Cherry Point Naval Depot for implementation Cum Hours Mode Cum Hours Mode 7661 A 10556 A 7979 A 12498 A 8614 B1 12886 A 9568 A 12886 A 9568 A 12886 A 9885 A 13663 A 9885 A 14934 B2 10556 A T h e J o u r n a l o f t h e R e l i a b i l i t y A n a l y s i s C e n t e r S e c o n d Q u a r t e r - 2 0 0 3 12 of the methodologies to Navy systems. I thank Paul Oldham, IMMC, Redstone Arsenal, for implementation of the methodolo- gies to Army systems. I would also like to acknowledge Kevin O'Shaughnessy at the Reliability Analysis Center for his excel- lent support, and for being instrumental in the application of these methods at Naval Depots. 
I acknowledge Pete Hurley of Siemens Medical Systems for the example on the application of these methods to a medical system. I would also like to thank Adamantios Mettas of the ReliaSoft Corporation for the excellent application of ReliaSoft software to perform the growth analyses on several large data sets used in the examples.

References

1. R.E. Barlow and F. Proschan, Mathematical Theory of Reliability, John Wiley & Sons, Inc., 1967.
2. L.H. Crow, Reliability Analysis for Complex, Repairable Systems, in Reliability and Biometry, ed. by F. Proschan and R.J. Serfling, SIAM, Philadelphia, 1974, pp. 379-410.
3. L.H. Crow, Reliability Growth Projection from Delayed Fixes, Proceedings 1983 Annual Reliability and Maintainability Symposium, pp. 84-89.
4. L.H. Crow, Evaluating the Reliability of Repairable Systems, Proceedings 1990 Annual Reliability and Maintainability Symposium, pp. 275-279.

About the Author

Larry H. Crow is VP, Reliability & Sustainment Programs, at Alion Science and Technology, Huntsville, AL. Previously, Dr. Crow was Director, Reliability, at General Dynamics ATS (formerly Bell Labs ATS). From 1971 to 1985, Dr. Crow was chief of the Reliability Methodology Office at the US Army Materiel Systems Analysis Activity (AMSAA). He developed the Crow (AMSAA) reliability growth model, which has been incorporated into US DoD handbooks, and national & international standards. He chaired the committee to develop MIL-HDBK-189, Reliability Growth Management, and is the principal author of that document. He is the principal author of the IEC International Standard 1164, Reliability Growth-Statistical Tests and Estimation Methods. Dr. Crow is a Fellow of the American Statistical Association and of the Institute of Environmental Sciences and Technology, and the recipient of The Florida State University "Grad Made Good" Award for the Year 2000, the highest honor given to a graduate by Florida State University.

Larry H. Crow, Ph.D.
Alion Science and Technology
215 Wynn Drive, Suite 101
Huntsville, AL 35805
Internet (E-mail):

PRISM Column

At no charge to RAC PRISM users, the RAC provides an open training course to teach users the most efficient and effective uses of the software. This July, RAC will present its eighth open training course on PRISM. The course has evolved over the past three years into a comprehensive program providing a brief introduction to reliability, in-depth instruction on PRISM analysis and techniques, and the theory behind the PRISM models. This evolution paralleled the evolution of the software. Users will leave this hands-on training course with the knowledge and expertise to effectively utilize PRISM.

This training course is specifically designed to assist users in making the transition from novice to journeyman PRISM User. This two-day training course begins with a basic overview of reliability and its evolution over the past fifty years. Building on this foundation, users learn to develop a basic parts count analysis, expand into the development of a detailed stress analysis, and move on to include experience data for performing a Bayesian analysis. Other techniques for building comprehensive analyses include the application of the Process Grading methodology introduced with PRISM and the importing techniques used to make prediction development easier. Using this training technique, the PRISM staff has trained users from over 100 companies and organizations around the globe.

Users who have participated in the course have had the following to say about it.

· "This was an excellent overview."
· "I came in knowing little or nothing about PRISM or how to use it. Now I can generate reliability predictions and have a basis for understanding the theory behind the PRISM models."
· "I feel much more able to do a credible analysis now."
· "I have a much better understanding of PRISM, both how to use (it) and what it 'means'.
I also now know where to find detailed background info."
· "Thanks - This is my beginning education on prediction. (I) have always resisted before due to lack of confidence in the end result."

Registration for any open RAC PRISM training course is free of charge to licensed PRISM users. Individuals who are not currently licensed users can purchase the PRISM software for $1,995 ($2,195 International) and attend the course at no charge. To register for a PRISM Training Course go to . Course seating is limited to 20 individuals.

The RAC can also provide on-site training. On-site training is provided at no charge when 6 or more copies of the PRISM software are purchased. If you have any questions, feel free to contact the PRISM training coordinator, Gina Nash, at (315) 339-7047.

For more information on PRISM feel free to contact the PRISM team by phone (315-337-0900), by E-mail () or through the PRISM Forum ().