Raytheon Assessment of PRISM As A Field Failure Prediction Tool
By: Christopher L. Smith and Jerry B. Womack, Jr; Raytheon Company, McKinney
Summary & Conclusions
For any company interested in predicting field reli-
ability performance, finding a prediction technique
that provides a high degree of fidelity to observed
field data is essential. With the discontinuance of
military handbook Mil-Hdbk-217, Reliability
Prediction of Electronic Equipment, and the limited
environmental applications of Telcordia SR-332,
Reliability Prediction for Electronic
Equipment, this paper evaluates the Reliability Information
Analysis Center's (RIAC) PRISM software tool as
a potentially improved methodology for predicting the
field reliability of military systems. This evalua-
tion compares the PRISM predicted failure rate to
the actual observed field failure rate for three mili-
tary electronics units. While initial results showed
the predicted failure rate to be approximately one-
half of the observed field failure rate, the ratio of
predicted failure rate to observed field failure rate
was consistent across three independent systems.
Furthermore, the PRISM methodology has features
such as process grade factors and field failure data
incorporation through Bayesian analysis which
show promise in allowing a more accurate field
reliability prediction to be generated. As a point of
comparison, Raytheon's initial failure rate predictions err in the
opposite direction from an earlier assessment performed by TRW
Automotive, in which field data was not factored in through
PRISM's Bayesian analysis option (Reference 1). TRW Automotive
found a predicted failure rate that was twice the observed field
failure rate.
This paper discusses Raytheon's assessment of the
PRISM software tool including the reason for
choosing PRISM, application of the PRISM pre-
diction methodology to three military electronic
units, and analysis of the prediction results. This
paper also discusses future plans for refinements in
the use of PRISM's features to produce a more
accurate reliability prediction of field performance.
1. Introduction
While Mil-Hdbk-217 was never intended as a field
reliability predictor, it remained a reliability pre-
diction mainstay in the defense industry through
the 1980s until its discontinuance in 1995. With
the increased use of commercial electronics in mil-
itary applications and the lack of periodic update,
Mil-Hdbk-217 has become notorious for generat-
ing overly pessimistic field reliability predictions.
Over the past 10 years, the defense industry has
faced a challenge in finding a field reliability pre-
diction methodology that consistently provides a
high degree of correlation with observed field data.
Techniques such as physics-of-failure, while help-
ful in examining specific failure mechanisms, tend
to be very cumbersome for complex systems, and
Telcordia SR-332 has limitations given it was
developed for commercial systems. In the past few
years, defense contractors have tended to either
extrapolate Telcordia environmental factors to
address military environments or extrapolate Mil-
Hdbk-217 part complexity and quality factors to
address advances in device technology and increas-
es in commercial electronics quality. Furthermore,
these methodologies typically only address part
operational failures while other system failure con-
tributors such as inadequate design and manufac-
turing are not considered in the reliability predic-
tion. While these practices have provided a stop-
gap measure for producing field reliability predic-
tions, a comprehensive reliability prediction tool to
accurately predict field performance is desired.
Raytheon has conducted a recent assessment of
the PRISM software tool to determine its ability
to accurately predict field reliability. This assessment consists of a comparison of the PRISM predicted failure
rate to the actual observed field failure rate for three military elec-
tronics units used in an Air Force fighter aircraft, a Navy heli-
copter, and a Navy surveillance aircraft. The assessment includes
understanding the details of the PRISM methodology, conducting
a PRISM prediction of the electronics units, comparing the pre-
dicted failure rates with those observed in the field, analyzing
how the various PRISM input parameters affect the predicted
failure rates, and determining areas where further refinement
could produce a higher fidelity field reliability prediction.
2. Background
The Reliability Information Analysis Center (RIAC) has developed a methodology and asso-
ciated engineering software tool, PRISM, to assess the reliabili-
ty of electronic systems. This methodology includes component-
level reliability prediction models as well as a process for assess-
ing the reliability of systems due to non-component variables.
The PRISM system reliability assessment program comprises
component-level failure rate calculations taken from
RIACRates models, RIAC data, or user-defined data and a
system-level model that applies process grading factors.
The building blocks of the PRISM prediction methodology are component-level fail-
ure rates. These failure rates are determined from RIACRates
models, RIAC data, or user-defined data.
RIACRates are component reliability
prediction models that use a combination of additive and multi-
plicative factors to generate a separate failure rate for each
generic class of failure mechanisms for a component. Each of
these failure rate terms is then accelerated by the appropriate
stress. RIACRates models have the following general form:
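$$\lambda_p = \lambda_o \pi_o + \lambda_e \pi_e + \lambda_c \pi_c + \lambda_i + \lambda_{sj} \pi_{sj} \qquad (1)$$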
where
λp = predicted failure rate,
λo = failure rate from operational stresses,
πo = product of failure rate multipliers for operational stresses,
λe = failure rate from environmental stresses,
πe = product of failure rate multipliers for environmental stresses,
λc = failure rate from power or temperature cycling stresses,
πc = product of failure rate multipliers for cycling stresses,
λi = failure rate from induced stresses including electrical overstress,
λsj = failure rate from solder joints,
πsj = product of failure rate multipliers for solder joint stresses.
By modeling the failure rate in this manner, factors that account
for the application and component-specific variables that affect
reliability (π factors) can be applied to the appropriate additive
failure rate term. RIACRates models are currently available only
for capacitors, resistors, diodes, transistors, thyristors, integrated
circuits, and software.
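The general form above can be sketched in a few lines of Python. This is only an illustration of the additive and multiplicative structure: the `ComponentTerms` container, its field names, and the numeric values are hypothetical and are not taken from any RIACRates model table or PRISM interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentTerms:
    """Hypothetical container for one component's model inputs.

    Field names mirror the symbols defined in the text.
    """
    lam_o: float   # failure rate from operational stresses
    pi_o: float    # product of multipliers for operational stresses
    lam_e: float   # failure rate from environmental stresses
    pi_e: float    # product of multipliers for environmental stresses
    lam_c: float   # failure rate from power or temperature cycling
    pi_c: float    # product of multipliers for cycling stresses
    lam_i: float   # failure rate from induced stresses (no multiplier)
    lam_sj: float  # failure rate from solder joints
    pi_sj: float   # product of multipliers for solder joint stresses


def predicted_failure_rate(t: ComponentTerms) -> float:
    """Add the terms, each base rate scaled by its own pi-factor product."""
    return (t.lam_o * t.pi_o
            + t.lam_e * t.pi_e
            + t.lam_c * t.pi_c
            + t.lam_i
            + t.lam_sj * t.pi_sj)


# Made-up inputs, in failures per million calendar hours.
example = ComponentTerms(lam_o=0.010, pi_o=2.0, lam_e=0.005, pi_e=1.5,
                         lam_c=0.002, pi_c=3.0, lam_i=0.001,
                         lam_sj=0.0005, pi_sj=2.0)
example_rate = predicted_failure_rate(example)
```

Keeping each failure mechanism as its own term means a change in, say, the cycling profile only rescales the cycling contribution and leaves the other terms untouched.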
PRISM also contains data from the RIAC
Electronics Parts Reliability Data (EPRD) and Nonelectronic
Parts Reliability Data (NPRD) publications. This data has been
refined and scaled to fit into the calendar hour structure of
PRISM. RIAC data is available for a variety of components
including transformers, inductors, switches, relays, and connec-
tors. This data is helpful when a RIACRates model or user-
defined data does not exist for a particular component.
In the event that empirical data is
available, the PRISM software tool allows for the input of user-
defined failure rate data when RIACRates model or RIAC data
does not exist.
The PRISM system failure rate model is defined as the sum of the component failure
rates times a process grade factor. This system model is given by:
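$$\lambda_P = \lambda_{IA} (\pi_P \pi_{IM} \pi_E + \pi_D \pi_G + \pi_M \pi_{IM} \pi_E \pi_G + \pi_S \pi_G + \pi_I + \pi_N + \pi_W) \qquad (2)$$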
where the parameters are defined in Table 1.
Table 1. PRISM System Failure Rate Model Parameters
Once a unit is designed, the failure rate value that is calculated by any model is an inherent
or "seed" failure rate because it represents only the physical
attributes of the components that comprise the unit, subject to the
environmental conditions and operating profile characteristics
associated with its application. The failure rate that the unit will
actually experience in the field may be better or
worse than the inherent failure rate. The difference between the
observed field failure rate and the inherent predicted failure rate
depends on the design, requirement definition, and testing activ-
ities undertaken by the manufacturer to ensure that:
- Designs are reliable and robust
- Manufacturing practices do not degrade reliability performance
- Parts of acceptable quality are selected and controlled
- Management processes encourage good requirements definition and design practices
- The number of "cannot duplicate" (CND) incidents is minimized
- Maintenance activities do not induce failures
- Wearout and infant mortality issues are understood and addressed
- Reliability growth is emphasized throughout the design and development phases
The effect of process-related variability around the inherent (or
seed) failure rate is accounted for within PRISM by applying
process grade factors. By answering a series of questions with-
in a specific process grade type, a scoring profile is generated
and translated into a quantitative pi-factor multiplier. This score
then accounts for the process-related variability by impacting the
predicted failure rate positively or negatively. The process grade
types within PRISM and the pi-factor multipliers associated with
them are:
- Design process grade (πD)
- Manufacturing process grade (πM)
- Parts process grade (πP)
- System management process grade (πS)
- No-defect process grade (πN)
- Induced process grade (πI)
- Wearout process grade (πW)
- Reliability growth factor (πG)
- Infant mortality grade (πIM)
Each of these processes is scored, and the process scores are
combined into a module-level process grade set. For "industry
average" processes, the process grade expression in the system-level
model (i.e., πPπIMπE + πDπG + πMπIMπEπG + πSπG + πI + πN + πW) is
equal to unity. The process grade factor will increase if "less than
average" processes are in place, while it will decrease if "better
than average" processes are in place.
Most failure rate prediction methods allow only for an inherent reliability to be predicted, that
is, the reliability of the components given correct manufacturing,
requirement specifications, and handling. However, PRISM
allows for two failure rate prediction types: inherent and logistics.
Inherent: The inherent failure rate calculation does not take
into account induced failures or "cannot duplicate" (CND) issues.
The induced process grade (πI) and the no-defect process grade
(πN) are not included in the system-level failure rate calculation.
$$\lambda_P = \lambda_{IA} (\pi_P \pi_{IM} \pi_E + \pi_D \pi_G + \pi_M \pi_{IM} \pi_E \pi_G + \pi_S \pi_G + \pi_W) \qquad (3)$$
Logistics: The logistics failure rate calculation takes into
account induced failures and "cannot duplicate" (CND) issues.
The induced process grade (πI) and the no-defect process grade
(πN) are included in the system-level failure rate calculation.
$$\lambda_P = \lambda_{IA} (\pi_P \pi_{IM} \pi_E + \pi_D \pi_G + \pi_M \pi_{IM} \pi_E \pi_G + \pi_S \pi_G + \pi_I + \pi_N + \pi_W) \qquad (4)$$
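The difference between the two prediction types can be illustrated with a short Python sketch. It assumes the system-level model described in the text (sum of component failure rates times the process grade expression) and reads the first term of that expression as πP·πIM·πE. The function names and every pi-factor value below are hypothetical; the values are chosen only so the arithmetic is easy to follow and are not PRISM defaults.

```python
def process_grade_multiplier(pi: dict, logistics: bool) -> float:
    """Evaluate the process grade expression from a dict of pi factors.

    The inherent form omits the induced (I) and no-defect (N) terms;
    the logistics form includes them.
    """
    multiplier = (pi["P"] * pi["IM"] * pi["E"]
                  + pi["D"] * pi["G"]
                  + pi["M"] * pi["IM"] * pi["E"] * pi["G"]
                  + pi["S"] * pi["G"]
                  + pi["W"])
    if logistics:
        multiplier += pi["I"] + pi["N"]
    return multiplier


def system_failure_rate(component_rates, pi: dict, logistics: bool = False) -> float:
    """Sum of component failure rates times the process grade multiplier."""
    return sum(component_rates) * process_grade_multiplier(pi, logistics)


# Made-up pi factors and component rates (failures per million calendar hours).
EXAMPLE_PI = {"P": 0.2, "IM": 1.0, "E": 1.0, "D": 0.1, "G": 1.0,
              "M": 0.15, "S": 0.05, "W": 0.1, "I": 0.2, "N": 0.2}
EXAMPLE_RATES = [1.0, 2.5, 0.5]

inherent_rate = system_failure_rate(EXAMPLE_RATES, EXAMPLE_PI, logistics=False)
logistics_rate = system_failure_rate(EXAMPLE_RATES, EXAMPLE_PI, logistics=True)
```

With these illustrative inputs the logistics multiplier sums to one and the inherent multiplier to 0.6, so the logistics rate is always the larger of the two whenever the induced and no-defect terms are positive.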
3. Evaluation Methodology
For the purpose of the PRISM evaluation, three airborne electronic units were chosen that had well-
documented field failure data and sufficient cumulative field-oper-
ating time. Their basic makeup involves multiple circuit card
assemblies mounted in an enclosed chassis. All three units had at
least 12 months of continuous field failure data that was detailed
enough for categorizing by induced, could not duplicate (synony-
mous with no defect found), design, or part failure modes. This
same field data was also used to baseline the observed performance
of each electronics unit. For comparing the field and predicted
data, the reliability metrics Mean Time Between Failures (MTBF)
and Mean Time Between Unscheduled Removal (MTBUR) were
used. The MTBF metric includes only inherent-type failures and
excludes cannot duplicate (CND) and induced-failure returns.
MTBUR includes both induced and CND returns along with inher-
ent failures. These field MTBF and MTBUR values were directly
compared with the PRISM inherent and logistics models, respec-
tively. Using these relationships, common failure modes are kept
consistent in both the field data and the methodology of PRISM,
thus ensuring accurate correlation between the data.
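The distinction between the two field metrics can be sketched as follows. The category labels follow the failure mode categories named above (induced, CND, design, part); the function name, the return counts, and the operating hours are hypothetical.

```python
from collections import Counter

# Categories counted as inherent failures (MTBF) versus categories that
# count only as unscheduled removals (MTBUR).
INHERENT_CATEGORIES = ("design", "part")
REMOVAL_ONLY_CATEGORIES = ("induced", "cnd")


def mtbf_and_mtbur(return_categories, operating_hours: float):
    """Return (MTBF, MTBUR) in hours from a list of categorized returns."""
    counts = Counter(return_categories)
    inherent = sum(counts[c] for c in INHERENT_CATEGORIES)
    removals = inherent + sum(counts[c] for c in REMOVAL_ONLY_CATEGORIES)
    mtbf = operating_hours / inherent if inherent else float("inf")
    mtbur = operating_hours / removals if removals else float("inf")
    return mtbf, mtbur


# Made-up example: 12 returns over 12,000 cumulative operating hours.
returns = ["part"] * 6 + ["design"] * 2 + ["induced"] * 1 + ["cnd"] * 3
mtbf, mtbur = mtbf_and_mtbur(returns, operating_hours=12_000.0)
```

Because MTBUR counts every removal, it can never exceed MTBF for the same data set, which mirrors the relationship between the logistics and inherent predictions.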
The field data used in the PRISM evaluation is given in Table 2.
The baseline used for comparison includes actual failure data
taken over 12 months of continuous performance monitoring.
The observed MTBF and MTBUR calculations were normalized
over the steady 12-month period. Field returns were analyzed, sort-
ed, and combined so that the observed metric represents random
equipment failures. Returns that were repetitive, systematically
induced, or non-performance related were removed from the total
failure count. The number of software-related failures was insignif-
icant and therefore left out of the evaluation altogether. The PRISM
methodology generates failure rate predictions in terms of failures
per million calendar hours, instead of the more common failures per
million operating hours. Therefore, a translation from operating
hours to calendar hours was accomplished by dividing the cumula-
tive operating hours by the units' respective duty cycle.
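A minimal sketch of this translation is shown below. The function names and numbers are hypothetical, and the duty cycle is assumed to be the fraction of calendar time during which the unit operates. The rate conversion additionally assumes that failures accrue only during operation, which is a simplification made for the example.

```python
def calendar_hours(operating_hours: float, duty_cycle: float) -> float:
    """Translate cumulative operating hours into calendar hours."""
    if not 0.0 < duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in the interval (0, 1]")
    return operating_hours / duty_cycle


def observed_rate_per_million_calendar_hours(failures: int,
                                             operating_hours: float,
                                             duty_cycle: float) -> float:
    """Observed field failure rate in failures per million calendar hours."""
    return failures / calendar_hours(operating_hours, duty_cycle) * 1e6


def rate_per_million_calendar_hours(rate_per_million_operating_hours: float,
                                    duty_cycle: float) -> float:
    """Rescale a rate quoted per operating hour to calendar hours."""
    if not 0.0 < duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in the interval (0, 1]")
    return rate_per_million_operating_hours * duty_cycle


# Made-up example: 10 failures in 5,000 operating hours at a 25% duty cycle.
hours = calendar_hours(5_000.0, 0.25)
field_rate = observed_rate_per_million_calendar_hours(10, 5_000.0, 0.25)
rescaled = rate_per_million_calendar_hours(2_000.0, 0.25)
```

In this example both routes give the same calendar-hour rate, since 10 failures in 5,000 operating hours is 2,000 failures per million operating hours.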
A four-step process was used to evaluate PRISM failure rate
predictions against observed field data. The process was repeated
for each of the three units:
- Inputting component/system data into the PRISM tool,
- Calculating predicted failure rates using the inherent and logistics RIAC models with both industry average and
program-specific process grade factors, and
- Comparing the PRISM prediction results with observed field failure rates.
Existing system models (also known as component tree struc-
tures) in Raytheon's Advanced Specialty Engineering Networked
Toolkit (ASENT) reliability analysis software tool were used from
previous engineering efforts on each of the three units. All assem-
bly models and component parameters such as part type, electrical
stress, and temperature were exported from ASENT into PRISM.
After importing parts data into PRISM, component parameters
were checked to verify that no data had been misplaced or corrupted
during the transfer. Components that did not have RIACRates
models were assigned failure rates from the RIAC data
library or assigned a user-defined failure rate. Subassemblies and
components having user-defined failure rates were converted from
failures per million operating hours into failures per million cal-
endar hours using the respective duty cycle of each unit.
PRISM's default environment settings and operating profiles
were not used in our evaluation. Instead, environment and pro-
file information was obtained from actual field measurements
and/or contract specifications for each program.
The composition of component failure rate sources for each of the
three units is shown in Figure 1. As can be seen, the failure rate
sources varied greatly among the three units. Electronics Unit 1
had approximately an equal contribution of failure rates from
RIACRates models and RIAC data. However, Electronics Unit 2
and Electronics Unit 3 had a predominant contribution from
RIAC data and user-defined data, respectively.
Because each of the three electronic units was designed, devel-
oped, and manufactured by different programs, each unit had its
own process grade factor (PGF) set. PGF sets were created by
surveying the program's engineering personnel responsible for
the respective process. Once obtained, these factors were applied
only to assemblies that were designed and manufactured by the
respective Raytheon program. Parts and assemblies that were
out-sourced to subcontractors were assigned PRISM's default
PGFs for this evaluation.
!!!FIGURE
Figure 1. Comparison of Component Failure Rate Sources
4. Results
The ratios of the PRISM predicted failure rates to the observed
field failure rates were compared for each unit and the relative
differences evaluated. Percentage differences were also calcu-
lated so as to quantify the accuracy of the individual predictions
as well as the evaluation average.
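The comparison metrics can be sketched as below, with the observed field failure rate normalized to 100%. The function names and the example rates are hypothetical and are not values from this evaluation.

```python
from statistics import mean


def normalized_prediction(predicted_rate: float, observed_rate: float) -> float:
    """Predicted rate as a percentage of the observed rate (observed = 100%)."""
    return 100.0 * predicted_rate / observed_rate


def percent_difference(predicted_rate: float, observed_rate: float) -> float:
    """Signed difference from the observed rate; negative means optimistic."""
    return normalized_prediction(predicted_rate, observed_rate) - 100.0


# Made-up (predicted, observed) pairs in failures per million calendar hours.
pairs = [(90.0, 100.0), (210.0, 200.0), (40.0, 50.0)]
differences = [percent_difference(p, o) for p, o in pairs]
average_abs_difference = mean(abs(d) for d in differences)
```

A prediction below 100% understates the field failure rate and is therefore optimistic; a prediction above 100% is pessimistic.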
4.1 Inherent Reliability Comparison. First, the predicted and observed
inherent failure rates of each unit were compared using both the
PRISM default and program-specific process grade factor sets.
Figure 2 illustrates the percent differences between the predicted
inherent failure rates and the observed inherent field failure rates.
!!!FIGURE
Figure 2. Comparison of PRISM Inherent Failure Rate Predictions to Observed Inherent Field Failure Rates
The primary observation is the accuracy of the predictions using
the default PGFs compared to the program-specific PGFs. On
average, predictions made using the default PGFs were 57%
closer to the observed values than those made with the
program-specific PGFs. The program-specific process grade factor
sets adjusted the overall failure rate to generate an optimistic
prediction. The standard deviations for both categories were
approximately equal, which suggests relative agreement among
the overall effects of the program-specific grade factor sets.
[Figure 1 chart: "Comparison of PRISM Failure Rate Sources". Series: RIACRates Models, RIAC Data, User Defined; categories: Electronics Units 1, 2, and 3; axis 0% to 80%. Data labels in extracted order: 23%, 9%, 48%, 12%, 19%, 44%, 63%, 19%, 66%.]
[Figure 2 chart: "PRISM Predicted Failure Rate vs. Random Field Failure Rate, Inherent Reliability: No CNDs or Induced Failures". Series: PRISM Failure Rate with Default PGF, PRISM Failure Rate with Prgm-Specific PGF; categories: Electronics Units 1, 2, and 3; field failure rate normalized to 100%; axis 0% to 125%. Data labels in extracted order: 103%, 55%, 103%, 70%, 81%, 59%.]
The Journal of the Reliability Analysis Center, First Quarter - 2004
The inherent default PGF reliability predictions for Electronics
Units 1 and 2 were very close to the observed failure rates,
exhibiting no more than a 3% deviation. Electronics Unit 3, by
contrast, showed an 18% difference between
predicted and observed field failure rates. To understand the cause
for the variance in prediction accuracies, the failure rate sources
for each assembly of each unit were evaluated. Figure 1 illustrates
the failure data percent makeup of each electronics unit. One distinct
difference between Electronics Units 1 and 2 and Electronics
Unit 3 is the percentage of user-defined values associated with the
total predicted failure rate. The predicted failure rate of
Electronics Unit 3 is 66% user-defined, whereas the user-defined
contribution to Electronics Units 1 and 2 is less than 20%.
This large user-defined source in Electronics Unit 3 may contribute
to the prediction variance, and it will be assessed further
in future PRISM evaluations.
4.2 Logistics Reliability Comparison. Next, the predicted and
observed logistics failure rates of each unit were compared using
both the PRISM default and program-specific process grade fac-
tor sets. For this comparison, CND and induced failure returns
were incorporated into the field failure rate. Likewise, the
PRISM model factored CND and induced process grade factors
into the overall failure rate predictions. Figure 3 shows the per-
cent differences between the predicted logistics failure rates and
the observed logistics field failure rates.
Figure 3. Comparison of PRISM Logistics Failure Rate
Predictions to Observed Logistics Field Failure Rates
The average percent difference of the default PGF logistics model
predictions was similar, both in value and error, to that of the
inherent model predictions. It appears that the effect of introducing
CND and induced failures into the field data was consistent with the
effect of including the corresponding process grade factors in the
prediction. This correlation lends validity to the CND and induced
process grade factors.
Again, the choice of PGFs greatly affects the overall accuracy of
the predictions. In Figure 3, the average effective difference
between using the default PGFs and program-specific PGFs is
54%. The outcome of a logistics model prediction using the pro-
gram-specific PGFs is still optimistic.
To better understand this, the variations between the default and
program-specific process grade factors were analyzed. Table 3
illustrates the percent differences between the RIAC default and
Raytheon-surveyed process grade factors. This table raises the
possibility that the optimistic predicted failure rates may stem
from the results of the program-specific PGF surveys. In six of
the nine total process gradings, the three programs averaged at
least a 35% lower value than the RIAC default values. These dif-
ferences signify lower process grade scores which, by the RIAC
model, lead to lower predicted failure rates.
Table 3. Comparison of Process Grade Factor Scores
4.3 Field Data Incorporation. Using the two prediction models
already calculated, field data was incorporated into the PRISM soft-
ware tool as "observed data" to evaluate the accuracy of the adjust-
ed predictions. When applying the field data into PRISM, CND and
induced field failures were removed from the PRISM model entry
when adjusting the inherent model. Likewise, these failures were
included when adjusting the logistics model predictions. The results
of using PRISM's Bayesian analysis are shown in Figures 4 and 5.
Figure 4. Inherent Reliability Comparison Using Bayesian
Analysis
Figure 5. Logistics Reliability Comparison Using Bayesian
Analysis
[Figure 3 chart: "PRISM Predicted Failure Rate vs. Random Field Failure Rate, Logistics Reliability: Includes CNDs and Induced Failures". Series: PRISM Failure Rate with Default PGF, PRISM Failure Rate with Prgm-Specific PGF; categories: Electronics Units 1, 2, and 3; field failure rate normalized to 100%; axis 0% to 125%. Data labels in extracted order: 94%, 59%, 82%, 58%, 101%, 82%.]
Process Grade       Electr.   Electr.   Electr.   Avg
Factor Type         Unit 1    Unit 2    Unit 3    Diff
Part Quality        -32%      -47%      -28%      -36%
Infant Mortality    -48%      -42%      -55%      -48%
Design              -56%      -62%      -41%      -53%
Growth              +4%       -7%       -6%       -3%
Manufacturing       -44%      -50%      -24%      -39%
System Mgmt.        -62%      -68%      -36%      -55%
Induced             -42%      -81%      -27%      -50%
No Defect           -9%       -22%      +4%       -9%
Wear Out            +4%       0%        -18%      -5%
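As a cross-check of the Table 3 values, the short sketch below recomputes the average difference for each process grade factor type and counts how many averages are at least 35% below the RIAC default. The per-unit and average values are transcribed from Table 3; the variable and function names are illustrative.

```python
# factor type: (Unit 1, Unit 2, Unit 3, published average), all in percent
TABLE_3 = {
    "Part Quality":     (-32, -47, -28, -36),
    "Infant Mortality": (-48, -42, -55, -48),
    "Design":           (-56, -62, -41, -53),
    "Growth":           (4, -7, -6, -3),
    "Manufacturing":    (-44, -50, -24, -39),
    "System Mgmt.":     (-62, -68, -36, -55),
    "Induced":          (-42, -81, -27, -50),
    "No Defect":        (-9, -22, 4, -9),
    "Wear Out":         (4, 0, -18, -5),
}


def rounded_average(values) -> int:
    """Arithmetic mean rounded to the nearest whole percent."""
    return round(sum(values) / len(values))


recomputed = {name: rounded_average(row[:3]) for name, row in TABLE_3.items()}
large_reductions = [name for name, avg in recomputed.items() if avg <= -35]
```

The recomputed averages agree with the published column, and six of the nine factor types fall at or below the -35% threshold.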
[Figure 4 chart: "PRISM Predicted Failure Rate vs. Actual Field Failure Rate, Inherent Reliability: No CNDs or Induced Failures, With PRISM Bayesian Analysis". Series: PRISM Failure Rate with Default PGF, PRISM Failure Rate with Prgm-Specific PGF; categories: Electronics Units 1, 2, and 3; field failure rate normalized to 100%; axis 98% to 101%. Data labels in extracted order: 100.0%, 99.2%, 99.4%, 98.6%, 100.1%, 98.2%.]
[Figure 5 chart: "PRISM Predicted Failure Rate vs. Actual Field Failure Rate, Logistics Reliability: Includes CNDs or Induced Failures, With PRISM Bayesian Analysis". Series: PRISM Failure Rate with Default PGF, PRISM Failure Rate with Prgm-Specific PGF; categories: Electronics Units 1, 2, and 3; field failure rate normalized to 100%; axis 98% to 101%. Data labels in extracted order: 100.0%, 100.0%, 99.6%, 99.6%, 98.7%, 99.5%.]
After observed field data was incorporated into PRISM, the failure
rate predictions fell within 2% of the observed field failure rate
values. This improvement in prediction accuracy is an example of how
important historical field data can be for predicting the field
reliability of a derivative electronics system.
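Why the adjusted predictions land so close to the observed values can be illustrated with a generic Bayesian update. The sketch below is a standard gamma-Poisson conjugate update and is offered only as an assumption about the kind of calculation involved; this paper does not describe PRISM's internal algorithm. The function name, the `prior_weight` default, and all numbers are hypothetical.

```python
def bayesian_failure_rate(predicted_rate: float,
                          observed_failures: int,
                          observed_hours: float,
                          prior_weight: float = 0.5) -> float:
    """Posterior mean failure rate from a gamma prior and Poisson field data.

    predicted_rate   -- prior failure rate in failures per calendar hour
    observed_hours   -- cumulative calendar hours of field exposure
    prior_weight     -- equivalent number of prior failures (assumed value)
    """
    if predicted_rate <= 0.0:
        raise ValueError("predicted_rate must be positive")
    prior_hours = prior_weight / predicted_rate  # equivalent prior exposure
    return (prior_weight + observed_failures) / (prior_hours + observed_hours)


# Made-up example: the prediction is half of the observed field rate.
predicted = 50e-6                       # failures per calendar hour
failures, exposure = 50, 500_000.0      # observed rate is 100e-6
adjusted = bayesian_failure_rate(predicted, failures, exposure)
ratio_to_observed = adjusted / (failures / exposure)
```

With no field data the function returns the prediction unchanged. As field exposure grows, the field data dominates the prior and the adjusted rate converges on the observed rate, even when the starting prediction was off by a factor of two.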
5. Conclusions
This paper compares the predicted field reliability of electronics
units using the PRISM methodology to the observed field failure
rate. The initial results showed that:
- The PRISM inherent and logistics failure rate predictions both agreed well with observed field failure rates when using the PRISM default process grade factors.
- The PRISM failure rate predictions for both inherent and logistics reliability were optimistic by approximately 30-40% using program-specific process grade factors. It is interesting to note that the differences in the predicted values versus actual field values are opposite to those found in an earlier TRW Automotive PRISM evaluation (Reference 1). While TRW Automotive's predicted failure rates were approximately twice the actual field values (where field data was not factored in through the use of PRISM's Bayesian analysis option), Raytheon's predicted failure rates were approximately one-half the actual field values.
The goal of this evaluation was to determine if PRISM would
provide a methodology to accurately predict field failure rates.
Based on these initial results, it can be concluded that PRISM
does indeed have the potential to accurately predict field failure
rates. It is encouraging that, given the variations in use environ-
ments, failure data, and failure rate sources (RIACRates models,
RIAC data, and user-defined data), the predicted failure rates of
the three electronics units track fairly well with each other for
both the inherent and logistics reliability predictions.
6. Future Plans
Raytheon plans to continue its PRISM evaluation. While this ini-
tial evaluation was conducted independently with minimal con-
sultation with the Reliability Analysis Center, future plans
include working more closely with the RIAC group to develop a
more refined PRISM use methodology to increase the accuracy of
the failure rate predictions. The ultimate goal is to develop a
PRISM prediction process that accurately predicts field perform-
ance using program-specific process grade factors without the
need for adding observed field data via PRISM's Bayesian analy-
sis methodology. The main areas of future emphasis will include:
- Focusing on the proper development and use of program-specific process grade factors.
- Evaluating the PRISM predicted failure modes/categories versus the observed field failure modes/categories.
- Evaluating the PRISM reliability assessment of more complex electrical and mechanical systems to determine if the observed data patterns remain consistent.
- Conducting independent field failure rate analyses and PRISM failure rate predictions for comparison (i.e., using two independent personnel to conduct the field failure rate analysis and the PRISM prediction to eliminate any bias that would tend to converge the two analyses).
References
1. M.G. Priore, P.S. Goel, R. Itabashi-Campbell, "TRW Automotive
Assesses PRISM® Methodology for Internal Use", The Journal
of the Reliability Analysis Center, 2002 First Quarter, pp. 14-19.
Biographies
Christopher L. Smith is a Reliability Engineer with Raytheon
Company's Space and Airborne Systems Division located in
McKinney, Texas. Chris earned a BS degree in Physics from
Southwest Texas State University. He has worked with
Raytheon Systems Specialty Engineering for 2 years.
Christopher L. Smith
Raytheon Company
2501 W. University Drive, M/S 8052
McKinney, TX 75071
Internet (E-mail):
Jerry B. Womack, Jr. is a Senior Reliability Engineer in
Raytheon Company's Space and Airborne Systems Division
located in McKinney, Texas. Jerry has 16 years of experience in
reliability and system safety engineering in radar and electro-
optic programs. Jerry received his BS degree in Physics from the
University of Mississippi in 1987 and his MS degree in Physics
from the University of Texas at Dallas in 1992. Jerry has been
an American Society for Quality (ASQ) Certified Reliability
Engineer (CRE) since 1993.
Jerry B. Womack, Jr.
Raytheon Company
2501 W. University Drive, M/S 8094
McKinney, TX 75071
Internet (E-mail):