Reliability Prediction Methods - An Overview
The Journal of the Reliability Analysis Center, Third Quarter - 1999

Introduction
The intent of this article is to provide a basic overview of existing reliability prediction methods and some of the issues surrounding their use. The applicability of various methods is discussed, and a new Reliability Analysis Center methodology called "PRISM" is introduced and compared to existing methods. Mr. William Denson presents a comprehensive overview of PRISM elsewhere in this Journal.

Purpose of Predictions
Reliability predictions have several purposes, the primary ones being:
· Evaluating feasibility, which seeks to determine whether design reliability goals can be met
· Comparing competing designs
· Identifying potential reliability problems, such as overstressed or misapplied parts
· Providing input to other reliability and maintainability tasks about the relative failure frequency of subsystems and components

Empirically-based tools have commonly been used for electronic system-level reliability predictions; however, some critics argue that physics-of-failure methods offer a better alternative.

MIL-HDBK-217 Current Status
The empirically based MIL-HDBK-217, Reliability Prediction of Electronic Equipment, has been widely used over the last three decades to predict the reliability of most military and some commercial electronic systems. Until Department of Defense acquisition reform in the mid-1990s, it was contractually required on many U.S. Government contracts for developing new electronic systems. Although it remains an active Department of Defense handbook, no effort has been made in recent years to update or maintain the methodology, owing to a major Air Force reorganization that merged the former Rome Laboratory into the newly formed Air Force Research Laboratory.
With this realignment came the total elimination of the reliability research that had taken place at the former Rome Laboratory. Since the stand-up of the new Air Force Research Laboratory, preparing activity responsibility for MIL-HDBK-217 (and several other reliability specifications and standards) has been in a state of limbo, with no other Government organization willing to take on the unfunded task of maintaining the handbook.

Other Empirical Methods
In addition to MIL-HDBK-217, there are a number of other empirically-based methods with varying degrees of similarity to the MIL-HDBK-217 models. Some of the models are unique to a given industry, such as telecommunications or automotive. These methods include (Ref. 3-6):
· Bellcore
· British Telecom HRD-5
· Nippon Telegraph and Telephone
· CNET (French)

Physics-of-Failure versus Empirical Methods
During the late 1980s and throughout the 1990s, empirically-based reliability prediction methods in general, and MIL-HDBK-217 in particular, came under heavy criticism (Ref. 1) as being inadequate. The criticisms have centered on the following issues:
· The method is not a good indicator of field reliability
· Temperature cycling is not accounted for
· The method does not reflect new manufacturing trends
· The method does not differentiate good quality and design practices
· System-level factors that influence reliability (e.g., transient protection circuits) are penalized
· The method is not "science-based"

To replace existing methods, the critics have called for a more science-based physics-of-failure method. Historically, this approach has been used for mechanical stress analysis to ensure that a design is durable enough to exceed the required product life. However, empirically-based methods have been widely favored for overall electronic system reliability predictions.
This is because of the complexity of electronic systems, which tend to fail for causes and at times that are impossible to foresee. Most components selected during a product design effort have already been designed, tested, fully characterized, and are being manufactured on established lines by vendors such as Texas Instruments, Motorola, Intel, etc.; their fundamental device design is "perfect" for the useful life of most electronic systems. This implies that the time to failure for every life-limiting failure mechanism far exceeds the useful operating life of a typical electronic system, leaving only latent manufacturing defects, component variability and misapplication to cause field failures.

Figure 1: Bathtub Curve

By Seymour Morris, Reliability Analysis Center

Where one stands on the "perfect component" premise greatly affects how one views the application of physics-of-failure as a tool for quantifying system reliability. The differences between the two approaches are depicted on the classic bathtub curve shown in Figure 1.

Empirical methods predict a failure rate, λ, caused by randomly occurring failures during any period of a system's useful life. These failures occur at random times during useful life and are caused by manufacturing defects, assembly errors, component variability, and customer use variations.

Physics-of-failure methods predict when a single specific failure mechanism will occur for an individual component due to wear-out (end-of-life). The approach assumes that the "floor" of the bathtub curve approaches zero and that system reliability (life) is defined by the "weakest link" among a multitude of competing failure mechanisms.
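The contrast between the two views can be sketched numerically. The short Python example below is only an illustration under stated assumptions: every part name, failure rate and wear-out time in it is hypothetical, the system is assumed to be a simple series arrangement in which every part is needed, and the "weakest link" is reduced to taking the shortest of a few assumed wear-out times, which is far simpler than a real physics-of-failure model. The function names are invented for this sketch and do not come from any prediction handbook or tool.

```python
import math

# All part names and numbers below are hypothetical, for illustration only.

# Empirical view: constant failure rates, in failures per million hours.
PART_FAILURE_RATES_FPMH = {
    "microcircuit": 0.050,
    "capacitor": 0.002,
    "connector": 0.010,
}

# Physics-of-failure view: assumed wear-out time per mechanism, in hours.
WEAROUT_HOURS = {
    "electromigration": 400_000.0,
    "solder joint cracking": 150_000.0,
    "bond wire corrosion": 600_000.0,
}


def system_failure_rate(rates_fpmh):
    """Sum constant part failure rates (series system, every part needed)."""
    return sum(rates_fpmh.values())


def mtbf_hours(rate_fpmh):
    """MTBF is the reciprocal of a constant failure rate."""
    return 1e6 / rate_fpmh


def reliability(rate_fpmh, hours):
    """Probability of surviving `hours` under a constant failure rate."""
    return math.exp(-rate_fpmh * hours / 1e6)


def weakest_link(wearout_hours):
    """Return the mechanism with the shortest assumed life, and that life."""
    mechanism = min(wearout_hours, key=wearout_hours.get)
    return mechanism, wearout_hours[mechanism]


if __name__ == "__main__":
    rate = system_failure_rate(PART_FAILURE_RATES_FPMH)
    print(f"system failure rate: {rate:.3f} failures per million hours")
    print(f"MTBF: {mtbf_hours(rate):.0f} hours")
    print(f"reliability at 8760 hours: {reliability(rate, 8760):.4f}")
    mechanism, life = weakest_link(WEAROUT_HOURS)
    print(f"life-limiting mechanism: {mechanism} at {life:.0f} hours")
```

The first half answers "how often will a population of these systems fail at random during useful life," while the second answers "which mechanism ends the life of a single device first." The two results are not interchangeable, which is the distinction the bathtub curve is meant to convey.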
For system-level reliability predictions, physics-of-failure is usually not a practical approach, since there are numerous unique failure mechanisms per electronic device (e.g., electromigration, solder joint cracking, die bond adhesion strength, die cracking, bond wire corrosion), each requiring a fairly complex modeling approach. This process requires very detailed knowledge of all device material characteristics, geometry, and applications, which may be unavailable to system designers or may be proprietary. A physics-of-failure analysis is most beneficial at the device level, where the device design can be influenced, such as with newly developed hybrid circuits, or in investigating an established problem. It should be considered a complement to empirical approaches, which evaluate product reliability based on part selection, applied stress, historical part defect rates and system application. Table 1 summarizes the two approaches and when each may be appropriate for use.

PRISM - A New Methodology
PRISM is the name of a new empirically-based Reliability Analysis Center software suite that ties together several tools in a comprehensive reliability prediction methodology that addresses many of the shortcomings of traditional methods. The PRISM concept is to account for the myriad factors that can influence system-level reliability. The name PRISM comes from the concept of "focusing" all these factors into a single system reliability assessment. The premise of traditional methods is that the failure rate is primarily determined by the components comprising the system. As critics of MIL-HDBK-217 and similar methods have often pointed out, increased system complexity and component quality have resulted in a shift of system failure causes away from components to more "system-level" factors, including manufacturing processes, design, system requirements, interface, software and mechanical problems.
Historically, these factors have not been explicitly addressed in traditional prediction methods. PRISM provides a framework to address many of these factors, including:
· Electronic part failure rate models ("RACRates") for many component types. These models include factors for operating temperature, thermal cycling, power stress, manufacturing maturity, dormancy, duty cycle, humidity, and other part application characteristics.
· Electronic parts reliability data (EPRD) and non-electronic parts reliability data (NPRD). These modules represent 26.1 and 2.5 trillion part hours of RAC data for electronic and mechanical parts, respectively. They provide a means to include devices that are not otherwise addressed by most existing prediction methods.
· An assessment of the processes used in the design and manufacture of the system, including factors contributing to the following failure causes: parts, design, manufacturing, system management, induced, wearout, no defect found and software. These processes are graded with a metric that corresponds to the degree to which actions have been taken to mitigate the occurrence of system failure due to these failure causes.
· A software reliability prediction methodology and a means to factor software into the overall reliability prediction estimate.
· A methodology to refine the reliability estimate as additional relevant data becomes available. The approach uses Bayesian data combination to dynamically adjust the reliability estimate over time, as shown in Figure 2.

[Figure 2: Prediction Refinement. Plot of MTBF versus time, showing upper and lower confidence levels around the "true MTBF" across four stages: paper analysis, factory testing, initial field deployment data, and field data.]

Table 1: Reliability Prediction Application

Use Empirical Methods:
· When reliability estimates are performed for large, complex products
· When reliability estimates are developed on a quick-turnaround basis
· When there is a need to estimate the relative merits of competing designs
· When there is no way to change the fundamental design of the components
· When the only design flexibility is to select different components or limit applied component stresses

Use Physics-of-Failure Methods:
· When a detailed understanding of life-limiting failure mechanisms is needed
· When new component technologies need to be assessed and no historical data exists
· For detailed component design prior to life testing and qualification
· When design flexibility exists at the component level
· To investigate the root cause of a failure

Summary
Table 2 provides a high-level summary of commonly used empirically-based reliability prediction tools, along with the new PRISM methodology. Also shown for comparison purposes is the physics-of-failure approach for component life determination.

Table 2: Reliability Prediction Method Characteristics

| Characteristic | PRISM | Physics-of-Failure | MIL-HDBK-217 | Bellcore | British Telecom | Nippon Telegraph and Telephone | French (CNET) |
|---|---|---|---|---|---|---|---|
| Predicts operating part failure rates for a population of devices | Yes | No | Yes | Yes | Yes | Yes | Yes |
| Predicts non-operating part failure rates for a population of devices | Yes | No | No | No | No | No | No |
| Failure rate models account for industry improvement trends (growth factor) | Yes | No | No | No | No | No | Yes |
| Predicts end-of-life | No | Yes | No | No | No | No | No |
| Constant failure rate assumption (part failure rates can be summed) | Yes | No | Yes | Yes | Yes | Yes | Yes |
| Number of environmental categories addressed | Infinite. Can specify temperature extremes, cycling rate and absolute temperatures | Infinite | 14 | 3 | 3 | 3 | 19 |
| Electronic part categories covered | Most with RACRate models. Very extensive with EPRD and NPRD data interface | Not specific. Concept applies to all. | Extensive | Most | Most | Most | Extensive |
| Operating temperature considered in models | Yes | Some | Yes | Yes | Yes, >70°C, microcircuits only | Yes | Yes |
| Temperature cycling explicitly considered in models | Yes | Some | No | No | No | No | No |
| Electrical stress considered in models | Yes | Some | Yes | Yes | No | Yes | Yes |
| Manufacturing control/screening level considered | Yes, through system level process grading factors | No | Yes | Yes | Yes, microcircuits and discrete semiconductors only | Yes | Yes |
| Parts count method available | Yes | No | Yes | Yes | No | No | Yes |
| Parts stress method available | Yes | Models address stress | Yes | Yes | Yes | Yes | Yes |
| Addresses system level design and manufacturing factors | Yes | Some | No | No | No | No | No |
| Provides approach to analyze field data | No | Yes | No | Yes | No | No | No |
| Software addressed | Yes | No | No | No | No | No | No |
| Mechanical parts addressed | Yes | Not specific. Concept applies to all. | No | No | No | No | No |

References:
1. Morris, S.F. and J.F. Reilly, "MIL-HDBK-217 - A Favorite Target," 1993 IEEE Reliability and Maintainability Symposium Proceedings, pp. 503-509, 1993.
2. MIL-HDBK-217, "Reliability Prediction of Electronic Equipment," 1995.
3. TR-TSY-000332, "Reliability Prediction Procedure for Electronic Equipment," Issue 2, 1988.
4. "Handbook of Reliability Data for Components Used in Telecommunications Systems," British Telecom, Issue 4, 1987.
5. Recueil de Données de Fiabilité du CNET (Collection of Reliability Data from CNET), Centre National d'Études des Télécommunications (National Center for Telecommunications Studies), 1983.
6. Standard Reliability Table for Semiconductor Devices, Nippon Telegraph and Telephone Corporation, 1986.
7. Bowles, J. B., "A Survey of Reliability-Prediction Procedures for Microelectronic Devices," IEEE Transactions on Reliability, Vol. 41, No. 1, 1992.

In Memoriam
David F. Barber
Sept 12, 1922 - June 28, 1999

The reliability community mourns the loss of David F. Barber, who died of a heart attack on June 28, 1999. From 1962 to his retirement in 1979, Dave was Chief of the Reliability Branch for the Rome Air Development Center (now the Information Directorate of the Air Force Research Laboratory), where he directed the leading DoD research and development activity in reliability and maintainability. He served on various high-level government committees, such as the Weapons Systems Effectiveness Industry Advisory Committee, and was long involved in reliability conferences such as the Reliability Physics Symposium and the Annual Reliability and Maintainability Symposium. Over the years, he held virtually every conference management position, including General Chairman of both symposiums.

Born in Utica, NY, in 1922, Dave earned a BA in mathematics from Hamilton College and an MS in meteorology from MIT. He served as an Air Force Weather Officer from 1942 to 1947. He was an Atmospheric Research Scientist at the Air Force Cambridge Research Laboratories from 1948 to 1951, a Research and Development Meteorologist at RADC from 1951 to 1953, then director of various classified programs until 1962, when he was selected as Chief of the Reliability Branch.

Always a strong supporter of his community, Dave served 12 years on the Adirondack Central School Board. He was a past President of the Oneida-Madison-Herkimer Counties School Board Association, and a past Director of the New York State School Boards Association. He was a member of the Rome, NY, Rotary Club for 15 years.

On his retirement, Dave used his experience in organizing and managing conferences to create his own consulting company, Scien-Tech Associates, which has continued to serve the reliability community, helping stage both the Reliability Physics Symposium and the Annual Reliability and Maintainability Symposium, among others.

Dave was an accomplished golfer. Besides playing, he sometimes reported on major area tournaments for local radio and, once, served as a substitute announcer for CBS Sports. He was playing golf when he was stricken.

Dave was gregarious and friendly, always willing to help. His contributions will be greatly missed by all. Those who knew him will miss his camaraderie even more.