RAC is a DoD Information Analysis Center Sponsored by the Defense Technical Information Center
The Journal of the Reliability Analysis Center
First Quarter - 2005

INSIDE
Reliability Theory Explains Human Aging and Longevity
Form, Fit, Function, and Interface - An Element of an Open System Strategy
System Level Clues for Detailed Part Issues
RMSQ Headlines
Future Events
From the Editor
PRISM Column
Upcoming June Training
Abstract
The reliability of avionics using commercial-off-
the-shelf (COTS) items and products is a concern
for the aerospace industry. The results of collect-
ing and analyzing field return records of avionics
are documented in this article.
Our analysis shows that the exponential distribution is still
appropriate for describing the life of most avionics
manufactured over the past 20 years. Results also show that
failure rates decrease in the years following product
introduction. An increasing trend in failure rate can be noted
for systems made after 1994, suggesting the need for further
investigation.
Introduction
Microelectronic systems built with COTS are
now widely used in the aerospace industry and
are becoming increasingly important. After the
Department of Defense (DoD) changed the
acquisition process (one formerly based on mili-
tary standards and specifications) in 1994, mili-
tary-specified avionics have become rare. The
aerospace industry's use of microelectronics is
shrinking as a percentage of the entire market, so
it must face the reality of a commercially-driven
market. Commercial integrated circuit (IC) prod-
ucts' life cycles are decreasing to 2-4 years
[Reference 6]. In contrast, the aerospace industry
assumes the life of a Line Replaceable Unit
(LRU) is more than 10 years. This discrepancy
will worsen given the continuing advancement in
functionality and speed in the microelectronic
industry. To understand the impact of technolo-
gy advancement on avionics, we needed to find
out what had happened in field operation. Field
records of return-for-service of avionics in the
past 20 years were collected and analyzed, and
the results are documented herein.
Data Collection
Return-for-service records were collected from
two major suppliers of avionics. Several types of
systems were included, such as a flight control
system, autopilot, flight director system, and
symbol generator.
Records from company A
include eight systems dating from 1982 to 2002.
Company B's records are dated from 1997 to
2002 and include one system. Most of these
records include the unit serial number, date sold,
return for service date, replaced IC types, and
quantities. Some of the original data were found
to be insufficient for analysis. We compiled the
original records to weed out and discard the use-
less ones; the remaining records had sufficient
data to support statistically significant conclu-
sions. We also made some assumptions to facil-
itate the statistical analysis. Our assumptions
were as follows.
1. Systems were grouped by type and the
year of "date sold" assuming they were
manufactured and used in the same year.
2. For units with multiple returns, only the
first return was calculated and analyzed.
3. It is assumed that all ICs replaced in serv-
ice have experienced failure.
This
assumption may have caused us to over-
estimate the number of failures.
4. The censor time, the date on which the status
of each system is checked, is set to April 30, 2002.
Based on these assumptions, a C language program
was used to select the useful records, check
the end status of the systems, and calculate the
service hours. The method used to calculate the
service hours follows Figure 1, which shows the
different periods between the sold date and the
return-to-supplier date.
Reliability Analysis of Avionics in the Commercial Aerospace Industry
By: Jin Qin, Bing Huang, Joerg Walter, Joseph B. Bernstein, Michael Talmor, Reliability Engineering,
University of Maryland, College Park
SD: Sold Date; BISD: Begin In Service Date; FD: Failure Date; RTSD: Return To Supplier Date
Figure 1. Time Line of Field Records
The P1 interval between SD and BISD includes delivery time and
installation time. The unit service period (days), P2, is:
P2 = (RTSD - SD) - P1 - P3    (1)
where P3 is the return time from the customer to the supplier. If the
unit had not failed by the censor time, the service period runs from
BISD to the censor time. Generally, the raw records contain only SD
and RTSD, so P1 and P3 are estimated from information given by the
suppliers; different suppliers have different P1 and P3. Once P2 is
found, the unit service hours are calculated as
ServiceHours = Hon * P2, where Hon is the system's power-on hours
per day. Different companies give different values of Hon.
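
The service-hour calculation above can be sketched as follows. This is a minimal illustration in Python, not the authors' C program; the function name and the example values of P1, P3, and Hon are hypothetical, and the clamp at zero is an added assumption for records where the estimated P1 and P3 exceed the recorded span.

```python
from datetime import date

CENSOR_DATE = date(2002, 4, 30)  # censor time stated in the article


def service_hours(sold, returned, p1_days, p3_days, hon):
    """Return (service_hours, failed) for one unit.

    sold     -- SD, the sold date
    returned -- RTSD, the return-to-supplier date, or None if never returned
    p1_days  -- P1, delivery plus installation time in days (supplier estimate)
    p3_days  -- P3, customer-to-supplier return time in days (supplier estimate)
    hon      -- Hon, power-on hours per day
    """
    if returned is not None and returned <= CENSOR_DATE:
        # Equation (1): P2 = (RTSD - SD) - P1 - P3
        p2 = (returned - sold).days - p1_days - p3_days
        failed = True
    else:
        # Unit still in service: count from BISD = SD + P1 to the censor time.
        p2 = (CENSOR_DATE - sold).days - p1_days
        failed = False
    # Assumption: clamp at zero if the estimates exceed the recorded span.
    return hon * max(p2, 0), failed


# Hypothetical unit: P1 = 30 days, P3 = 15 days, Hon = 8 hours per day.
hours, failed = service_hours(date(2000, 1, 1), date(2000, 12, 31), 30, 15, 8)
```

A unit with no return record is treated as censored, so it contributes service hours but no failure.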
Data Analysis
Analysis of System Records from Company A. There are
records for about 21,535 systems sold between August 17, 1982
and December 30, 2001 from company A. Categorized by sys-
tem type and year of "date sold," there are 87 groups of data,
which include 9 groups with zero failures and 6 groups with one
failure. The statistical analysis process and results follow.
Probability plotting. The Weibull distribution, the generally accepted
lifetime distribution in the microelectronic industry, is used to
analyze the service hours. To verify its use, we made probability
plots and calculated the correlation coefficient (CC) for the Weibull
and lognormal distributions (groups with 0 or 1 failure are omitted).
Results show that for 42 groups the CC of the Weibull distribution was
greater than the CC of the lognormal distribution. The CCs of the
Weibull distribution were also compared with the 90% critical CC
[Reference 1] to determine whether the distribution is appropriate.
Results show that for 62 of 72 groups the CC was greater than the
critical CC.
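
A probability-plot correlation coefficient of this kind can be sketched as below. This is only an illustration under simplifying assumptions: it uses Benard's median rank approximation and treats the sample as complete, whereas the field data contain censored units that would require adjusted ranks. The function names are hypothetical, and the article does not state which plotting positions the authors used.

```python
import math
from statistics import NormalDist


def _corr(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def plot_ccs(failure_hours):
    """Return (weibull_cc, lognormal_cc) for a complete sample of failure times."""
    times = sorted(failure_hours)
    n = len(times)
    # Benard's approximation to the median rank of the i-th ordered failure.
    ranks = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    log_t = [math.log(t) for t in times]
    # Weibull plot: ln(-ln(1 - F)) is linear in ln(t).
    weibull_y = [math.log(-math.log(1.0 - f)) for f in ranks]
    # Lognormal plot: the standard normal quantile of F is linear in ln(t).
    lognormal_y = [NormalDist().inv_cdf(f) for f in ranks]
    return _corr(log_t, weibull_y), _corr(log_t, lognormal_y)
```

The distribution with the larger CC gives the straighter plot; comparing the Weibull CC with a tabulated critical value then decides whether the fit is acceptable.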
Parameter estimation. The parameters of Weibull distribution
are estimated by using the maximum likelihood estimation
(MLE) method. The histogram of the estimated shape parame-
ters is shown in Figure 2. It shows the values of most of the
shape parameters are distributed between 0.6 and 1.1.
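
As a rough sketch of the estimation step, the Weibull shape and scale can be obtained from failure times and right-censored times by solving the profile likelihood equation for the shape with bisection. The function name and the bisection scheme are assumptions made for illustration; the article does not describe the authors' numerical method.

```python
import math


def weibull_mle(failures, censored=()):
    """MLE of Weibull (shape, scale) from failure and right-censored times."""
    r = len(failures)
    if len(set(failures)) < 2:
        raise ValueError("need at least two distinct failure times")
    t_max = max(list(failures) + list(censored))
    fail = [t / t_max for t in failures]            # rescale to avoid overflow
    units = fail + [t / t_max for t in censored]
    mean_log_fail = sum(math.log(t) for t in fail) / r

    def score(beta):
        # Profile likelihood equation for the shape; it is zero at the MLE
        # and increases monotonically with beta.
        s0 = sum(t ** beta for t in units)
        s1 = sum(t ** beta * math.log(t) for t in units)
        return s1 / s0 - 1.0 / beta - mean_log_fail

    lo, hi = 1e-3, 1.0
    while score(hi) < 0.0:          # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):            # bisection on the bracketed root
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    # Scale follows in closed form once the shape is known.
    eta = t_max * (sum(t ** beta for t in units) / r) ** (1.0 / beta)
    return beta, eta
```

Units that had not failed by the censor time enter only through the sums over all units, which is how the censored service hours influence the estimates.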
Exponential distribution verification. Although the wide use
of exponential distribution has been questioned for a long time,
it is unwise to blindly accept or reject it. The exponential distri-
bution was theoretically shown to be the appropriate failure dis-
tribution for complex systems by R.F. Drenick [Reference 5].
He stated that "Under some reasonably general conditions, the
distribution of the time between equipment failures tends to the
exponential as the complexity and the time of operation increas-
es; and somewhat less generally, so does the time up to the first
failure of the equipment."
Figure 2. Weibull Shape Parameter Histogram (x-axis: Beta; y-axis: Frequency)
In the microelectronic industry, due to the advance of technology,
chips are becoming more and more complex following Moore's
law. Additionally, avionics have complex structures. A flight
director system may consist of 460 digital ICs, 97 linear ICs, 34
memories, 25 ASICs, and 7 processors. The number of compo-
nents in such a system is huge. For these components, external
failure mechanisms caused by random factors such as electrical
overstress, electrostatic discharge, and other environmental and
human interaction, and intrinsic failure mechanisms, which
include dielectric breakdown, electromigration, and hot carrier
injection, can cause the components to fail. These failure modes
combine to form a constant failure rate process: Abernethy
[Reference 2] states that as the number of failure modes mixed
together increases to five or more, the Weibull shape parameter
tends toward one unless all the modes have the same shape
parameter and similar scale parameters. Some recent research that
focuses on intrinsic wearout failure mechanisms lends support to
the exponential distribution.
Degraeve [Reference 4], Stathis
[Reference 7], and Alam [Reference 3] pointed out that the Weibull
shape parameter of oxide breakdown is thickness dependent and
goes to unity for ultra-thin oxides. As the Weibull shape parame-
ter approaches 1, the intrinsic wearout becomes more random and
the device times to failure become statistically indistinguishable
from a random pattern of times to failure.
We use the likelihood ratio test to verify the hypothesis of the
exponential distribution, the special case of the Weibull distribution
with shape parameter equal to 1. Setting the significance level to
0.05, the likelihood ratio test is done for systems grouped in
different years using the following steps.
a. $H_0: \beta = 1$; $H_1: \beta \neq 1$
b. Calculate the statistic $T = 2(\hat{L} - \hat{L}_0)$, where
$\hat{L}$ is the global maximum log likelihood and
$\hat{L}_0$ is the constrained maximum log likelihood at $\beta = 1$.
c. If $T < \chi^2(0.95, 1)$, accept $H_0$; otherwise, reject $H_0$.
The hypothesis test results show that exponential distribution is
acceptable for 56 groups.
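
The three test steps can be sketched for one group of data as follows. This is an illustrative implementation, not the authors' code: it maximizes the Weibull profile log likelihood over the shape with a ternary search, which is an assumed numerical choice, and the search bounds of 0.05 to 50 and the function names are hypothetical. The critical value 3.841 is the 0.95 quantile of the chi-square distribution with 1 degree of freedom.

```python
import math

CHI2_95_1DF = 3.841  # 0.95 quantile of chi-square with 1 degree of freedom


def _profile_loglik(beta, fail, units):
    """Weibull log likelihood at shape beta, with the scale set to its MLE."""
    r = len(fail)
    eta_pow = sum(t ** beta for t in units) / r      # equals eta**beta at the MLE
    return (r * math.log(beta) - r * math.log(eta_pow)
            + (beta - 1.0) * sum(math.log(t) for t in fail) - r)


def exponential_lrt(failures, censored=()):
    """Return (beta_hat, T, accept_H0) for H0: beta = 1 against H1: beta != 1."""
    t_max = max(list(failures) + list(censored))
    fail = [t / t_max for t in failures]             # rescaling leaves T unchanged
    units = fail + [t / t_max for t in censored]
    lo, hi = 0.05, 50.0                              # assumed search bounds
    for _ in range(200):                             # ternary search; the profile is concave
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if _profile_loglik(m1, fail, units) < _profile_loglik(m2, fail, units):
            lo = m1
        else:
            hi = m2
    beta_hat = 0.5 * (lo + hi)
    l_hat = _profile_loglik(beta_hat, fail, units)   # global maximum
    l_zero = _profile_loglik(1.0, fail, units)       # constrained maximum at beta = 1
    t_stat = max(0.0, 2.0 * (l_hat - l_zero))
    return beta_hat, t_stat, t_stat < CHI2_95_1DF
```

Setting the shape to 1 in the same profile function gives the exponential log likelihood, so both maxima come from one expression.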
System failure rate results. Since the exponential distribution is
appropriate for most of those systems, we use MLE to calculate
the failure rate. The systems' failure rates vs. year are shown in
Figure 3. The data show that, with the exception of systems 2 and
8, the systems' failure rates decrease at the beginning of use. For
systems 4, 5, 6, and 7, whose use spanned the 1980s and 1990s,
an increasing trend in failure rate can be noted from around 1994
onward. System 1 shows the same trend around 1997.
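
For reference, the exponential MLE of the failure rate and one common form of its two-sided 90% confidence bounds are written out below. The article does not state which interval formula the authors used; the chi-square bounds shown are the standard ones for time-censored data and are given here as an assumption. The symbols $r$ (number of failures) and $H_{\mathrm{tot}}$ (total service hours of all units, failed and censored) are introduced for this illustration.

```latex
\hat{\lambda} = \frac{r}{H_{\mathrm{tot}}}, \qquad
H_{\mathrm{tot}} = \sum_{i=1}^{n} t_i

\lambda_{L} = \frac{\chi^2_{0.05,\,2r}}{2 H_{\mathrm{tot}}}, \qquad
\lambda_{U} = \frac{\chi^2_{0.95,\,2r+2}}{2 H_{\mathrm{tot}}}
```

As a hypothetical example, $r = 4$ failures in $H_{\mathrm{tot}} = 2 \times 10^{6}$ hours gives $\hat{\lambda} = 2.0 \times 10^{-6}$ per hour, with bounds $2.733 / (4 \times 10^{6}) \approx 0.68 \times 10^{-6}$ and $18.307 / (4 \times 10^{6}) \approx 4.58 \times 10^{-6}$ per hour.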
Analysis of system records from company B. Records from
company B are dated between January 14, 1988 and October 27,
2001. Since the population size and the number of failures in each
year are small, we analyze the records in moving five-year windows
using the exponential distribution to get better results.
We also analyze all records of company A in the same way to
compare the change in reliability. Figure 4 shows the overall
failure rates of systems from companies A and B (the year on the
X-axis is the midpoint of the moving five-year period). From
this result, we determined that there is an increasing trend in
failure rate after 1994 for systems from both companies.
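
The moving five-year pooling can be sketched as below. The input layout (a mapping from year sold to a pair of failure count and accumulated service hours) and the function name are hypothetical; the sketch pools failures and hours inside each window and reports the exponential failure rate at the window's middle year.

```python
def moving_window_rates(yearly, width=5):
    """Pooled exponential failure rates over moving windows of sold years.

    yearly -- dict mapping year sold to (failures, service_hours)
    Returns a dict mapping each window's middle year to failures per hour.
    """
    years = sorted(yearly)
    rates = {}
    for start in range(years[0], years[-1] - width + 2):
        window = range(start, start + width)
        # Years with no records contribute nothing to either sum.
        fails = sum(yearly.get(y, (0, 0.0))[0] for y in window)
        hours = sum(yearly.get(y, (0, 0.0))[1] for y in window)
        if hours > 0:
            rates[start + width // 2] = fails / hours   # MLE: failures / hours
    return rates
```

Multiplying a rate by 1e6 gives the E-6 scale used for system failure rates, and multiplying by 1e9 gives FIT, the unit used for IC failure rates.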
Figure 3. Failure Rates of Systems with 90% Confidence Intervals (eight panels, System 1 through System 8; x-axis: Year; y-axis: Failure rate (E-6))
IC failure analysis. We can get the type and number of replaced
ICs from company A's records but only the number of failed ICs
from company B's records. Since no information was available for
tracing down the failure mechanism, we simply calculated the over-
all failure rate of all ICs from company A and from Company B.
For company B's records, we used the exponential distribution to
analyze the moving five-year IC failure data because of the small
number of failures in each year. Company A's IC failure records
were analyzed in the same way. The results are shown in Figure 5.
Summary
Field data of microelectronic systems in the aerospace industry
was collected and analyzed. Based on our statistical analysis
results, we found that:
1. The exponential distribution is appropriate for most
avionics' lifetime analyses because the IC chips and sys-
tem structure are becoming more complex.
2. System reliability generally improves in the first several
years after introduction and drops off later. This agrees
well with the known phenomena of "infant mortality" and the
"learning curve."
3. According to the analysis, the failure rate of several
systems increases almost continuously after 1994-1996. The
increase is not large and is not statistically significant.
No single specific reason for this trend could be postulated
due to the lack of information. It could be due to design
problems in replacing military-grade components with
commercial ones, total redesign when introducing new
technologies, the inherent reliability of commercial
components, manufacturing problems in introducing new
packaging standards for avionic systems, etc.
This work presents some practical observations. Future
investigation, including tracking of failure data and failure
analysis, is suggested.
For Further Reading
1. Abernethy, R.B., The New Weibull Handbook, Third Edition,
page 3-3. North Palm Beach, Florida: R.B. Abernethy, 1998.
2. Abernethy, R.B., Ibid, page 3-14.
3. Alam, M., B. Weir, and P. Silverman, "A future of function
or failure? (CMOS gate oxide scaling)," IEEE Circuits and
Devices, Vol. 18, pages 42-48, March 2002.
4. Degraeve, R., "New insights in the relation between electron
trap generation and the statistical properties of oxide
breakdown," IEEE Transactions on Electron Devices, Vol. 45,
pages 904-911, April 1998.
5. Drenick, R.F., "The failure law of complex equipment,"
Journal of the Society for Industrial and Applied
Mathematics, Vol. 8, pages 680-690, December 1960.
Figure 4. Overall System Failure Rates from Company A and B (90% Confidence Interval) (x-axis: Year; y-axis: Failure Rate (E-6); series: Company A, Company B)
Figure 5. Overall IC Failure Rate (90% Confidence Interval) (x-axis: Year (sold); y-axis: Failure Rate (FIT); series: Company A, Company B)
6. Hnatek, E.R., Integrated Circuit Quality and Reliability,
Marcel Dekker, Inc, 2nd Edition, 1995.
7. Stathis, J.H., "Percolation models for gate oxide break-
down," Journal of Applied Physics, Vol. 86, pages 5757-
5766, November 1999.
About the Authors
Jin Qin is a PhD candidate in Reliability Engineering at the
University of Maryland, College Park. His research topics
include reliability testing, reliability data analysis, and
microelectronic system reliability estimation. He holds a Master of
Science in Reliability Engineering from the University of Maryland
and a Master of Engineering in Management Science and Engineering
from the University of Science and Technology of China.
Bing Huang is currently a PhD candidate in the Reliability
Engineering program at the University of Maryland. He
received a B.S. in Mining Engineering from the University of
Science and Technology of Beijing, and an M.S. in Nuclear
Engineering from Tsinghua University.
Dr. Joerg D. Walter is an Assistant Professor of
Aerospace Engineering at the Air Force Institute of Technology
(AFIT). He holds a PhD in Reliability Engineering from the
University of Maryland (2003) and a Master of Science in
Systems Engineering from AFIT (1997).
Dr. Joseph B. Bernstein is an Associate Professor of
Reliability Engineering at the University of Maryland, College
Park. Professor Bernstein's interests lie in several areas of
microelectronics reliability and physics of failure research,
including system reliability modeling, gate oxide integrity,
radiation effects, MEMS, and laser programmable metal interconnect.
Research areas include thermal, mechanical, and electrical
interactions of failure mechanisms of ultra-thin gate dielectrics,
next generation metallization, and power devices. Dr. Bernstein is
currently a Fulbright Senior Scientist at Tel Aviv University in the
Department of Electrical Engineering, Physical Electronics, where he
started a Maryland/Israel Joint Center for Reliable Electronic Systems.
Michael Talmor is a Certified ASQC Quality (CQE) and
Reliability (CRE) Engineer.
He holds Master's degrees in
Reliability and Quality Assurance from the Technion - Israel
Institute of Technology in Haifa and in Electrical Engineering,
Automatics and Telemechanics from the Electrotechnical
Institute, Saint Petersburg, Russia.
Michael is currently a
Visiting Researcher at UMD during his sabbatical leave from
RAFAEL Ltd, Israel.
Our bodies' backup systems don't prevent aging; they make it
more certain. This is one offshoot of a new "reliability theory of
aging and longevity" by two researchers at the Center on Aging,
National Opinion Research Center (NORC) at the University of
Chicago.
The authors presented their new theory at the National Institutes
of Health (NIH) conference "The Dynamic and Energetic Bases
of Health and Aging" (held in Bethesda, NIH). Their theory of
aging has been published by the "Science" magazine department
on aging research, Science's SAGE KE ("Science of Aging
Knowledge Environment").
The authors say, "Reliability theory is a general theory about sys-
tems failure. It allows researchers to predict the age-related fail-
ure kinetics for a system of given architecture (reliability struc-
ture) and given reliability of its components."
"Reliability theory predicts that even those systems that are entire-
ly composed of non-aging elements (with a constant failure rate)
will nevertheless deteriorate (fail more often) with age, if these
systems are REDUNDANT in irreplaceable elements. Aging,
therefore, is a direct consequence of systems redundancy."
In their paper, "The quest for a general theory of aging and
longevity" (Science's SAGE KE [Science of Aging Knowledge
Environment] for 16 July 2003; Vol. 2003, No. 28, 1-10),
Leonid Gavrilov and Natalia Gavrilova offer an explanation of why
people (and other biological species as well) deteriorate and die
more often with age.
Interestingly, the relative differences in mortality rates across
nations and gender decrease with age: Although people living in
the U.S. have longer life spans on average than people living in
countries with poor health and high mortality, those who achieve
the oldest-old age in those countries die at rates roughly similar
to the oldest-old in the U.S.
The authors explain that humans are built from the ground up,
starting off with a few cells that differentiate and multiply to form
the systems that keep us operating. But even at birth, the cells
that make up our systems are full of faults that would kill primi-
tive organisms lacking the redundancies that we have built in.
"It's as if we were born with our bodies already full of garbage,"
said Gavrilov. "Then, during our life span, we are assaulted by
random destructive hits that accumulate further damage. Thus
we age."
"At some point, one of those hits causes a critical system with-
out a back-up redundancy to fail, and we die."
As the authors put it, "Reliability theory also predicts the
late-life mortality deceleration with subsequent leveling-off, as well
as the late-life mortality plateaus, as inevitable consequences of
redundancy exhaustion at extreme old ages."
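
Both predictions can be illustrated with the simplest redundant architecture: n non-aging components in parallel, each with constant failure rate k, where the system fails only when all n have failed. This is a minimal sketch under that assumption, with a hypothetical function name; the authors' full model is more elaborate.

```python
import math


def parallel_hazard(t, n, k):
    """Failure rate at age t of n parallel components, each with constant rate k."""
    q = math.exp(-k * t)                 # probability one component still works
    p = -math.expm1(-k * t)              # probability one component has failed
    # System survival 1 - p**n, computed stably when p is close to 1.
    survival = -math.expm1(n * math.log1p(-q)) if q < 1.0 else 1.0
    density = n * k * q * p ** (n - 1)   # the last surviving component fails at t
    return density / survival


# Hypothetical example: 5 redundant components with k = 0.1 per year.
ages = [1, 10, 50, 200]
rates = [parallel_hazard(a, 5, 0.1) for a in ages]
```

With n = 1 the failure rate stays at k for all ages. With n = 5 it starts near zero, rises with age as redundancy is used up, and levels off just below k once only one working component is likely to remain.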
Reliability Theory Explains Human Aging and Longevity
Reprinted with permission of Dr. Leonid A. Gavrilov, Center on Aging, NORC/University of Chicago