The Journal of the Reliability Analysis Center
Second Quarter - 2003
ments in reliability can result in savings in the hundreds of mil-
lions to billions of dollars.
The five areas addressed in this article can provide a significant
step toward achieving high reliability. Many people perceive
that high levels of system reliability have to be very costly to
achieve. This perception can be based on the notion that only
expensive or militarized components provide high levels of reli-
ability and that higher reliability equates to significant increases
in testing and delays in schedule. We must change this percep-
tion. When engineering-based reliability improvement tech-
niques are performed as part of the design and development
process, high reliability can be cost-effectively achieved.
About the Authors
Dr. David E. Mortin is Chief of the Reliability Branch at the U.S.
Army Materiel Systems Analysis Activity, Aberdeen Proving
Ground, MD. He has a B.S. in aerospace engineering from the
State University of New York at Buffalo, an M.S. in statistics
from the University of Delaware, and a Ph.D. in reliability engi-
neering from the University of Maryland.
Stephen P. Yuhas is the Reliability and Maintainability Director at
the U.S. Army Evaluation Center. He holds a B.S. in mathematics
from Pennsylvania State University. He has also completed exten-
sive graduate studies in operations research/industrial engineering
at Penn State and in statistics at the University of Delaware.
Dr. Michael J. Cushing is a technical advisor for the U.S. Army
Materiel Systems Analysis Activity. He has a B.S. in Electrical
Engineering from Johns Hopkins University and an M.S. and
Ph.D. in reliability engineering from the University of Maryland.
Methods for Reducing the Cost to Maintain a Fleet of Repairable
Systems
By: Larry H. Crow, Alion Science and Technology
Introduction
When a fleet is first deployed, the economic life and useful life
parameters are often not known. However, as the fleet ages,
spares usage, repair frequency, reliability, and cost information
become available that may be used to estimate these parameters.
Specific problems receiving increased attention as systems age
are:
1. Cost to maintain a fleet due to repair and overhaul.
2. Maintaining the mission reliability requirements.
3. Determining the optimum repair and overhaul strategy to
minimize life cycle cost.
4. Determining the wearout profile for a fielded system.
5. Determining corrective actions for fielded systems to
upgrade reliability and reduce cost.
In this article we present two methodologies designed to provide
information based on data that will help make decisions on these
issues. One methodology is concerned with minimizing total life
cycle costs due to repair and overhaul. The other methodology
is concerned with corrective actions and in-service reliability
growth to increase reliability and therefore reduce the cost of
failures and overhauls. Specifically, the minimum life cycle cost
methodology addresses issues 1, 2, 3, and 4, and the in-service
reliability growth methodology addresses issues 1 and 5.
In many cases, the approach to sustaining a given system fleet may
differ from the approach for another fleet of the same system. For
example, the sustainment policy for one fleet of helicopters may
require periodic general overhaul for the entire helicopter, whereas
the sustainment policy for another helicopter fleet may only have
overhaul at the subsystem and LRU levels.
Consequently, to
address repair and overhaul criteria appropriate for a system in a
fleet, a methodology must be applicable to all levels of potential
repair and overhaul options. Therefore, the methods discussed in
this article apply at the complex repairable system, subsystem, and
LRU levels. The terminology "system" is used to reflect any of
these applications, and the only assumptions are that a system is
complex, repairable, and satisfies the Power Law reliability model
assumption discussed in the next section.
Notation
\(\lambda\)        Scale parameter, Power Law model
\(\beta\)          Shape parameter, Power Law model
\(\lambda_S\)      System failure intensity
\(\lambda_A\)      Type A mode failure intensity
\(\lambda_B\)      Type B mode failure intensity
\(\lambda_{GP}\)   Growth Potential failure intensity
\(\lambda_P\)      Projected failure intensity
\(u(t)\)           Intensity function
\(t\)              System age
\(N(t)\)           Number of failures by system age t
\(T_j\)            Total operating time for jth system
\(T_{UL}\)         System useful life
\(X_{i,j}\)        Age at ith failure for jth system
\(K\)              Number of systems in sample
\(D\)              Number of intervals
\(MI_q\)           Distinct Type B modes in qth interval
\(M\)              Total number of distinct Type B modes
\(C_1\)            Average cost of repair
\(C_2\)            Cost of overhaul
\(T_o\)            Optimum overhaul time to minimize cost
System Reliability Model
Both the minimum life cycle cost methodology and the in-serv-
ice reliability growth methodology assume that systems under
consideration are complex, with many failure modes, and are
repaired upon failure. If a repair just restores the system to oper-
ation it is called "minimum repair". Under minimum repair the
system reliability after the repair is the same as the system relia-
bility before the failure. Based on these assumptions, the under-
lying system failures follow the non-homogeneous Poisson
process with intensity u(t). Also, the reliability analysis of
repairable systems under customer use will involve data generat-
ed by multiple systems. Crow, References 2 and 4, developed the
Power Law Non-homogeneous Poisson Process (NHPP) as a
model for complex repairable systems and presented procedures
for analyzing data from multiple systems. This model is widely
used and is the standard model for repairable systems in
International Electrotechnical Commission standards.
Under the Power Law NHPP the intensity u(t) is
\[ u(t) = \lambda \beta t^{\beta - 1} \qquad (1) \]
where t > 0 is the system's age and \(\lambda > 0\), \(\beta > 0\) are
parameters. Also, for the Power Law NHPP model the mean value function
\[ E[N(t)] = \lambda t^{\beta}, \quad t > 0 \qquad (2) \]
is the expected number of failures for a system during its operating
time (0, t).
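As a quick illustration of Equations (1) and (2), the sketch below
evaluates the Power Law intensity and mean value function. The function
names and the parameter values in the usage lines are illustrative
assumptions, not values from this article.

```python
def intensity(t, lam, beta):
    """Power Law NHPP intensity u(t) = lam * beta * t**(beta - 1), Equation (1)."""
    if t <= 0:
        raise ValueError("system age t must be positive")
    return lam * beta * t ** (beta - 1)


def expected_failures(t, lam, beta):
    """Mean value function E[N(t)] = lam * t**beta, Equation (2)."""
    if t <= 0:
        raise ValueError("system age t must be positive")
    return lam * t ** beta


# Hypothetical parameters: beta > 1, so the intensity grows with age.
lam, beta = 0.001, 1.5
early = intensity(100.0, lam, beta)
late = intensity(400.0, lam, beta)
failures_by_100 = expected_failures(100.0, lam, beta)
```

With these assumed values the intensity at age 400 is larger than at
age 100, which is the wearout behavior associated with a shape
parameter above 1.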
To perform the analyses discussed herein, we need failure data for
K systems chosen at random from the fleet population. Each of the
K data sets starts at system age 0 and represents a sequence of fail-
ures and repairs. If the systems are overhauled, then each cycle
starts at time 0, initialized after an overhaul, and each failure time
is the total accumulative operating time at failure during the over-
haul cycle. System age t is the accumulated operating time since
overhaul. If the systems are not overhauled, then the age 0 begins
when the system is deployed into the fleet and age is the accumu-
lated operating time since deployment. In both cases, the data for
the jth system consists of the failure times Xi,j, i = 1,...,N(Tj),
where N(Tj), is the total number of failures for the j-th system, and
Tj is the total accumulated time, j = 1,...,K. The failure times Xi,j
are the accumulated ages at failure, so that
\( X_{1,j} < X_{2,j} < \dots < X_{N(T_j),j} \).
Note also that the total accumulated time Tj may or may not correspond
to a failure time. If \( X_{N(T_j),j} = T_j \) the data are failure
truncated, and if \( X_{N(T_j),j} < T_j \) the data are time truncated.
Note that for \(\beta = 1\) we have the homogeneous Poisson process
and a constant intensity of failure. For \(\beta > 1\), u(t) is
increasing and the successive intervals between failures,
\( X_{i,j} - X_{i-1,j} \), tend to decrease, which is characteristic of
wearout. For \(\beta < 1\), u(t) is decreasing and the successive
intervals between failures tend to increase, which is characteristic
of quality and manufacturing issues.
Also, for a system of age t we are often interested in the probability
that the system will go to age t+b without failure. This is the
mission reliability for a system of age t and mission length b. For
many systems, maintaining a minimum level of mission reliability is an
important consideration in cost, maintenance, and overhaul strategies.
For the Power Law NHPP the mission reliability is given by
\[ R(t) = e^{-[\lambda (t+b)^{\beta} - \lambda t^{\beta}]} \qquad (3) \]
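The mission reliability of Equation (3) can be sketched in a few lines.
The function name and the numeric inputs below are assumptions chosen
for illustration; for a shape parameter of 1 the result depends only on
the mission length, while for a shape parameter above 1 it falls as the
system ages.

```python
import math


def mission_reliability(t, b, lam, beta):
    """Equation (3): probability that a system of age t survives a further b hours."""
    return math.exp(-(lam * (t + b) ** beta - lam * t ** beta))


# Hypothetical parameters (not from the article).
young = mission_reliability(100.0, 3.0, 1e-5, 1.8)
old = mission_reliability(2000.0, 3.0, 1e-5, 1.8)
constant_case = mission_reliability(500.0, 3.0, 0.001, 1.0)
```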
A special application of the Power Law NHPP is for reliability
growth.
The Power Law NHPP is the basis for the Crow
(AMSAA) model developed in Reference 2.
The Crow
(AMSAA) model will be applied as part of the analysis for the
in-service reliability growth analysis. We will also apply esti-
mation, goodness of fit tests, confidence intervals and other pro-
cedures given in Reference 4 for the application of this model in
this article.
For a repairable system with \(\beta > 1\), we discuss two options for
reducing life cycle costs:
· By an optimum choice of overhaul schedule
· By reliability growth
For \(\beta = 1\), overhaul may not be necessary, and the option to
reduce costs may be reliability growth. An example will be
given illustrating this case.
Minimum Life Cycle Cost Model
One consideration in reducing the cost to maintain a fleet of systems
with \(\beta > 1\) is to establish an overhaul policy that will
minimize the total life cycle cost of the system. That is, there is a
point at which it is cheaper to overhaul a system and return it to the
fleet than to continue repairs. What is the overhaul time that will
minimize the total life cycle cost, considering repair cost and the
cost of overhaul? The solution for a general NHPP is given in
Reference 1. Applying this solution to the Power Law NHPP model with
parameters \(\lambda\) and \(\beta\), average repair cost \(C_1\), and
overhaul cost \(C_2\), the optimum overhaul time \(T_o\) that will
minimize the life cycle cost of the system is given by:
\[ T_o = \left[ \frac{C_2}{\lambda (\beta - 1) C_1} \right]^{1/\beta} \qquad (4) \]
The value \(T_o\) is called the economical life of the unit and is the
operating time at which the average cost of operation per unit time is
a minimum. The mission reliability and economical life \(T_o\) are of
particular interest when \(\beta > 1\), which is the main focus of this
article. In particular, for \(\beta > 1\), R(t) is decreasing as t
increases. If the mission reliability R(t) must be greater than a
certain level, say \(R_0\), then the time \(T_M\) at which
\(R(T_M) = R_0\) is the mission life of the unit. Useful life is the
minimum of \(T_o\) and \(T_M\).
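One way to see where Equation (4) comes from is to assume the average
cost per operating hour of an overhaul cycle of length T is the
expected repair cost over the cycle plus one overhaul, divided by T,
and then minimize over T. The sketch below codes that assumed cost rate
alongside Equation (4). The function names and the input values are
hypothetical, picked so the optimum is a round number.

```python
def optimum_overhaul_time(lam, beta, c1, c2):
    """Equation (4); only meaningful for beta > 1 (wearout)."""
    if beta <= 1:
        raise ValueError("Equation (4) requires beta > 1")
    return (c2 / (lam * (beta - 1) * c1)) ** (1.0 / beta)


def cost_per_hour(t, lam, beta, c1, c2):
    """Assumed cost rate: expected repairs over (0, t) plus one overhaul, per hour."""
    return (c1 * lam * t ** beta + c2) / t


# Hypothetical inputs (not from the article).
lam, beta, c1, c2 = 1e-5, 2.0, 1_000.0, 10_000.0
t_opt = optimum_overhaul_time(lam, beta, c1, c2)
rate_at_opt = cost_per_hour(t_opt, lam, beta, c1, c2)
```

Under these assumptions the cost rate at `t_opt` is lower than at
overhaul times a little shorter or a little longer, which is the sense
in which the economical life minimizes cost.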
To apply this useful life model, the parameters \(\lambda\) and
\(\beta\) must be estimated or assumed known. The estimation of these
parameters is addressed next.
Estimation for Minimum Life Cycle Cost Model
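We estimate the model parameters \(\lambda\) and \(\beta\) based on the
failure data from K systems. The maximum likelihood (ML) estimates of
\(\lambda\) and \(\beta\) (References 2 and 4) are the values
\(\hat{\lambda}\) and \(\hat{\beta}\) given by
\[ \hat{\lambda} = \frac{\sum_{j=1}^{K} N(T_j)}{\sum_{j=1}^{K} T_j^{\hat{\beta}}} \qquad (5) \]
\[ \hat{\beta} = \frac{\sum_{j=1}^{K} N(T_j)}{\hat{\lambda} \sum_{j=1}^{K} T_j^{\hat{\beta}} \ln T_j - \sum_{j=1}^{K} \sum_{i=1}^{N(T_j)} \ln X_{i,j}} \qquad (6) \]
In general, these equations cannot be solved explicitly for
\(\hat{\lambda}\) and \(\hat{\beta}\), but must be solved by iterative
procedures. Once we have the estimates \(\hat{\lambda}\) and
\(\hat{\beta}\), the ML estimate of the intensity function is
\[ \hat{u}(t) = \hat{\lambda} \hat{\beta} t^{\hat{\beta} - 1}, \qquad (7) \]
the ML estimate of the mean value function is
\[ \hat{E}[N(t)] = \hat{\lambda} t^{\hat{\beta}}, \qquad (8) \]
and the ML estimate of the mission reliability function is
\[ \hat{R}(t) = e^{-[\hat{\lambda} (t+b)^{\hat{\beta}} - \hat{\lambda} t^{\hat{\beta}}]}. \qquad (9) \]
The estimate for the optimum overhaul time to minimize life cycle cost
is given by
\[ \hat{T}_o = \left[ \frac{C_2}{\hat{\lambda} (\hat{\beta} - 1) C_1} \right]^{1/\hat{\beta}}. \qquad (10) \]
This is the estimated economic life of the system, and is the point
where the average cost of operation per unit time is minimized. The
desired estimated useful life \(\hat{T}_{UL}\) is the minimum of
\(\hat{T}_o\) and \(\hat{T}_M\), where (for mission length b)
\[ \hat{R}(\hat{T}_M) = R_0, \qquad (11) \]
and R0 is the required minimum mission reliability.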
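Equations (5) and (6) need an iterative solution. The sketch below is
one possible approach, not the procedure of References 2 and 4: it
substitutes Equation (5) into Equation (6) and finds the shape
parameter by bisection. The function name, the search bracket, and the
iteration count are assumptions. The data are the failure histories of
Table 1 in the example that follows, and the result should land in the
neighborhood of the estimates quoted there rather than match them
digit for digit.

```python
import math


def fit_power_law(failure_ages, end_times):
    """Solve Equations (5) and (6) for (lam_hat, beta_hat) by bisection on beta.

    failure_ages: one list of failure ages X_ij per system (may be empty).
    end_times: the total operating time T_j of each system.
    """
    n = sum(len(ages) for ages in failure_ages)
    sum_ln_x = sum(math.log(x) for ages in failure_ages for x in ages)

    def score(beta):
        # Equation (6) with lam_hat from Equation (5) substituted in,
        # rearranged so that its root in beta is the ML estimate.
        weights = [t ** beta for t in end_times]
        weighted_ln_t = sum(w * math.log(t) for w, t in zip(weights, end_times))
        return n / beta + sum_ln_x - n * weighted_ln_t / sum(weights)

    lo, hi = 0.01, 20.0  # assumed bracket with score(lo) > 0 > score(hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    beta_hat = 0.5 * (lo + hi)
    lam_hat = n / sum(t ** beta_hat for t in end_times)
    return lam_hat, beta_hat


# Failure ages and overhaul times (hours) transcribed from Table 1.
ages = [[68, 1137, 1167], [682, 744, 831], [845], [263, 399], [], [],
        [598], [], [], [730], []]
ends = [1268, 1300, 1593, 1421, 1574, 1415, 1290, 1556, 1426, 1124, 1568]
lam_hat, beta_hat = fit_power_law(ages, ends)
```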
Example for Minimum Life Cycle Cost Model
Suppose we consider 11 systems selected at random from a fleet.
(A small number of systems are used for the example in order to
illustrate the methods. In practical applications a much larger
data set would be analyzed. This is hypothetical data and does
not represent any actual system.)
The nominal overhaul cycle is 1,500 hours, but the actual overhaul
time will often vary. The history of these systems for the last
complete overhaul cycle is given in Table 1.
Table 1. System 1 Data
System   Failure Ages (hours)   Overhaul (OH) Time (hours)
1        68, 1137, 1167         1,268
2        682, 744, 831          1,300
3        845                    1,593
4        263, 399               1,421
5        -                      1,574
6        -                      1,415
7        598                    1,290
8        -                      1,556
9        -                      1,426
10       730                    1,124
11       -                      1,568
Applying Equations (5) and (6), the ML estimates of \(\lambda\) and
\(\beta\) are \(\hat{\lambda} = 0.000444\) and \(\hat{\beta} = 1.064\).
Because this is a small sample size we use an upper confidence limit
(CL) on \(\beta\) given in References 2 and 4. A 95% upper CL on
\(\beta\) is \(\beta^* = 1.774\), and using this in (5) we estimate
\(\lambda\) by \(\lambda^* = 0.000002558\). For an average repair cost
of C1 = $29,860 and an overhaul cost of C2 = $100,000, the optimum
overhaul time to minimize life cycle cost is estimated by Equation
(10) as \(\hat{T}_o\) = 3,237 hours. This is the economic life overhaul
time that will minimize the system total life cycle costs due to
repairs and overhaul.
The mission time is b = 3 hours, and the minimum mission reliability
requirement is 0.995. At 3,237 hours the mission reliability is
estimated to be 0.993. This is less than the requirement. At 1,500
hours the mission reliability is estimated to be 0.996. This means the
useful life for overhauls is between 1,500 and 3,237 hours. At 2,060
hours the mission reliability is 0.995. Because we are using an upper
bound on \(\beta\), this preliminary analysis indicates that the
useful life overhaul time is greater than the current 1,500 hours, but
probably less than 3,237 hours.
For a preliminary annual cost savings analysis, the average number of
hours per year on the fleet population of systems is approximately
79,000 hours. For the current nominal overhaul time of 1,500 hours the
expected number of failures between overhauls, given by (8), is 1.10.
The average cost of these failures is (1.10)($29,860) = $32,914. There
is one overhaul per cycle at a cost of $100,000. Therefore, the
average total cost per 1,500 hour cycle is $132,846, and the average
cost per hour is ($132,846/1,500) = $88.56. For an overhaul time of
2,060 hours the corresponding average cost per hour is $76.66, or a
savings per hour of $11.90. Assuming 79,000 hours annual usage, this
translates to a yearly savings of $940,100. The overhaul time of 3,237
hours gives an average cost per hour of $70.65. The annual cost
savings for this overhaul time is approximately $1.4 million. This is
the maximum savings for any choice of overhaul times and gives the
minimum overall system life cycle cost.
The estimated overhaul time of 2,060 hours gives the maximum cost
savings considering the constraint of maintaining a minimum mission
reliability of 0.995. Of course, a thorough analysis would require
applying these methods to a larger sample size.
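The arithmetic of this example can be retraced with a short script.
The sketch below takes the upper-bound parameter values, costs, mission
length, and fleet hours quoted above, and assumes the cost per hour of
a cycle is the expected repair cost plus one overhaul divided by the
cycle length. The function names and the bisection bracket are
assumptions. Because the quoted parameters are rounded, the results
should come out close to the article's figures rather than identical
to them.

```python
import math

LAM, BETA = 0.000002558, 1.774    # lambda*, beta* quoted in the example
C1, C2 = 29_860.0, 100_000.0      # average repair cost, overhaul cost ($)
B, R0 = 3.0, 0.995                # mission length (hours), required reliability
FLEET_HOURS = 79_000.0            # fleet operating hours per year


def mission_reliability(t):
    """Equation (9) evaluated with the example's parameter values."""
    return math.exp(-(LAM * (t + B) ** BETA - LAM * t ** BETA))


def cost_per_hour(t):
    """Repairs expected in one cycle plus one overhaul, spread over the cycle."""
    return (C1 * LAM * t ** BETA + C2) / t


def mission_life(lo=1.0, hi=100_000.0):
    """Bisection for the age T_M at which mission reliability equals R0."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mission_reliability(mid) > R0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


t_o = (C2 / (LAM * (BETA - 1) * C1)) ** (1 / BETA)   # Equation (10)
t_m = mission_life()                                  # Equation (11)
useful_life = min(t_o, t_m)
annual_saving = (cost_per_hour(1_500) - cost_per_hour(useful_life)) * FLEET_HOURS
```

Here `useful_life` is governed by the mission reliability constraint
rather than by the economic life, which matches the conclusion drawn
in the text.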
Approach for In-Service Reliability Growth
In this article we adapt a reliability growth projection model
developed by Crow in Reference 3 for application to in-service
reliability growth for a fleet of systems.
During reliability
growth testing there is a test phase, of length T, say, and a corre-
sponding chronological order of failures and identification of
candidate problem modes for corrective action. We assume the
corrective actions are delayed until time T.
The projection model assumes that the system failure intensity is
constant, say \(\lambda_S\), during this testing over time T, and then
jumps to a lower value due to the incorporation of corrective actions.
The projection model estimates this lower failure intensity value
based on information from the test. A projection allows the various
corrective action options to be assessed. The scope of this article is
on illustrating the adaptation and application of the projection model
to in-service fleet reliability growth. For specific information on
the derivation and background of the projection model the reader is
referred to Reference 3.
As described in Reference 3, the reliability growth projection
model utilizes the concepts of Type A modes, Type B modes, and
effectiveness factors. Assume that any corrective actions are the
result of reviewing the T hours of data and observing failure
modes. Management can make one of two possible decisions
regarding each observed failure mode, either not fix or fix the
failure mode. Therefore, the management strategy places failure
modes into two categories called Type A and Type B modes.
Type A modes are all failure modes such that if seen in the data
no corrective action will be taken. This accounts for all modes
for which management determines that it is not economically or
otherwise justified to take corrective action. Type B modes are
all failure modes such that when seen in the data a corrective
action will be taken. Thus, the management strategy is to parti-
tion the system into A parts and B parts.
During reliability growth testing the model assumes that the total
system and the A and B parts have corresponding failure intensities
\(\lambda_S\), \(\lambda_A\), and \(\lambda_B\), respectively. That is,
\(\lambda_S = \lambda_A + \lambda_B\). The average intensity
\(\lambda_A\) for Type A modes will not change. With the management
strategy, reliability growth can only be achieved by decreasing the
Type B failure intensity \(\lambda_B\). It is also clear that we can
only decrease that part of the Type B mode average failure intensity
\(\lambda_B\) that we have seen. In addition, once a Type B mode
is in the system it is rarely totally eliminated by a corrective
action. In particular, after a Type B mode is found and fixed, a
certain fraction of the failure mode average failure intensity will
be removed, but a certain fraction will generally remain.
A fix effectiveness factor (EF) is the fraction decrease in a prob-
lem mode Average Failure Intensity after a corrective action.
Studies indicate that an average EF of about 0.70 is typical for a
reliability growth program during development. However, indi-
vidual EF's for the failure modes may be larger or smaller than
the average.
The baseline failure intensity \(\lambda_S\) is the current intensity,
and we wish to project the decrease in \(\lambda_S\) due to correcting
the Type B modes. The management strategy (the assignment of A modes
and B modes, and the EFs) may be changed in order to reach a desired
reliability objective.
For discussions in this article, we assume an average effectiveness
factor d for the corrective actions so that the projection model takes
the form
\[ \lambda_P = \lambda_A + (1-d) \lambda_B + d\,h(T) \qquad (12) \]
where \(\lambda_P\) is the lower failure intensity due to the
corrective actions, T = total test time, and h(T) is the rate at which
new Type B modes are being uncovered.
Under the reliability growth projection model,
\( h(T) = \lambda \beta T^{\beta - 1} \) is the intensity of the Crow
(AMSAA) model applied to only the first occurrences of Type B modes.
(This application to the first occurrences is a very important point
and will be addressed later in this article.)
Following Reference 3, the following term is the Growth Potential
failure intensity.
\[ \lambda_{GP} = \lambda_A + (1-d) \lambda_B \qquad (13) \]
The Growth Potential Average Time Between Failure,
\(1/\lambda_{GP}\), is the maximum Average Time Between Failure that
can be attained with the current management strategy. This maximum is
attained when all Type B failure modes in the fleet have been seen in
the data set and corrected. The function h(T) is also the failure
intensity for all Type B failure modes in the fleet that did not
appear in the current data set. The growth potential is attained when
h(T) is zero.
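Equations (12) and (13) translate directly into code. In the sketch
below the function names are assumptions, and the inputs are the
rounded intensity estimates that appear in the System 2 example later
in this article.

```python
def projected_intensity(lam_a, lam_b, d, h_t):
    """Equation (12): lambda_P = lambda_A + (1 - d) * lambda_B + d * h(T)."""
    return lam_a + (1.0 - d) * lam_b + d * h_t


def growth_potential_intensity(lam_a, lam_b, d):
    """Equation (13): Equation (12) with h(T) equal to zero."""
    return lam_a + (1.0 - d) * lam_b


# Rounded estimates from the System 2 example: lambda_A, lambda_B, d, h(T).
lam_p = projected_intensity(0.000077, 0.000633, 0.4, 0.00019)
projected_mtbf = 1.0 / lam_p
```

Written this way, the only difference between the projected intensity
and the growth potential is the term for Type B modes that have not
yet been seen.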
Application of the In-Service Reliability Growth
Model: Wearout Case
In the interest of space, the application of the in-service reliabil-
ity growth projection model will be illustrated by example.
Steps are given which can be followed as a template for general
application. The example illustrates the application to a system
that wears out and is overhauled. A statistical goodness of fit test
strongly indicates that these systems follow the Power Law
process as discussed earlier.
The system in this example (System 2) is overhauled but does not
have a fixed overhaul interval. The prime cost savings strategy
under consideration is to increase reliability and reduce failures.
(This is hypothetical data and does not represent any actual system.)
STEP 1. We first obtain a sample of size K from the fleet. A
cycle is a complete history from overhaul to overhaul. For our
data set, K = 27 systems are chosen at random. For these sys-
tems the failure history for the last completed cycle is recorded.
This is the random sample of data from the fleet. See Table 2.
These systems are in the order they were selected.
Table 2. System 2 Data
System   Cycle Time Tj   Nj   Failure Times Xi,j
1        1396            1    1396
2        4497            1    4497
3        525             1    525
4        1232            1    1232
5        227             1    227
6        135             1    135
7        19              1    19
8        812             1    812
9        2024            1    2024
10       943             2    316, 943
11       60              1    60
12       4234            2    4233, 4234
13       2527            2    1877, 2527
14       2105            2    2074, 2105
15       5079            1    5079
16       577             2    546, 577
17       4085            2    453, 4085
18       1023            1    1023
19       161             1    161
20       4767            2    36, 4767
21       6228            3    3795, 4375, 6228
22       68              1    68
23       1830            1    1830
24       1241            1    1241
25       2573            2    871, 2573
26       3556            1    3556
27       186             1    186
Totals   52110           37   -
The failure intensity parameter \(\lambda_S\) of interest is the
average number of failures per cycle operating hour. The total
accumulated operating time is 52,110 hours, with 37 failures. The
baseline parameter \(\lambda_S\) is estimated by
\(\hat{\lambda}_S\) = 37/52,110 or 0.00071. This estimates that for
each system hour of operating time in the fleet there are 0.00071
failures, or a failure every 1,408 hours, on average.
STEP 2. In order to apply the projection model we put the N = 37
failure times on an accumulative time scale over (0, T), where
T = 52,110. In the example each Tj corresponds to a failure time Xi,j.
This is often not the situation. However, in all cases the accumulated
operating time Yq at a failure time Xi,r is
\[ Y_q = X_{i,r} + \sum_{j=1}^{r-1} T_j, \quad \text{where } q = 1, 2, \dots, N \text{ and } N = \sum_{j=1}^{K} N_j, \]
and q indexes the successive order of the failures. In the example,
N = 37, Y1 = 1,396, Y2 = 5,893, Y3 = 6,418, ..., Y37 = 52,110. See
Table 3.
Table 3. Ordered System 2 Data and Failure Mode Classification
q    Yq      Mode     q    Yq      Mode
1    1396    B1       20   26361   B1
2    5893    B2       21   26392   A
3    6418    A        22   26845   B8
4    7650    B3       23   30477   B1
5    7877    B4       24   31500   A
6    8012    B2       25   31661   B3
7    8031    B2       26   31697   B2
8    8843    B1       27   36428   B1
9    10867   B1       28   40223   B1
10   11183   B5       29   40803   B9
11   11810   A        30   42656   B1
12   11870   B1       31   42724   B10
13   16139   B2       32   44554   B1
14   16104   B6       33   45795   B11
15   18178   B7       34   46666   B12
16   18677   B2       35   48368   B1
17   20751   B4       36   51924   B13
18   20772   B2       37   52110   B2
19   25815   B1
Suppose our objective is to lower \(\lambda_S\) by selective corrective
actions to the systems based on information in the sample. The
projection model estimates this lower value, \(\lambda_P\).
Each system failure time in Table 2 corresponds to a problem and a
cause, that is, a failure mode. The management strategy can either not
fix the failure mode (Type A) or fix the failure mode (Type B). To
apply the projection methodology, each accumulated operating time in
Table 3 is designated as being caused by either a Type A mode or a
distinct Type B mode. In this example, there are 13 distinct
corrective actions corresponding to 13 distinct Type B failure modes.
There are NA = 4 failures due to failure modes that will not receive a
corrective action. There are NB = 33 total failures due to M = 13
distinct Type B failure modes that will be corrected. Some of the
distinct Type B modes had repeats of the same problem. For example,
mode B1 had 12 occurrences of the same problem. Type B mode B13 had
only one occurrence of that particular problem.
The objective of the projection model is to estimate the impact of the
M = 13 distinct corrective actions.
STEP 3. Choose an average effectiveness factor based on the proposed
corrective actions and historical experience. Historical industry and
government data support a typical average effectiveness factor
EF = 0.70 for many systems.
In the System 2 application an average EF = 0.4 was assumed in order
to be very conservative regarding the impact of the proposed
corrective actions.
The first term of the projection model (12) is estimated by
\[ \hat{\lambda}_A = N_A / T \quad \text{or} \quad \hat{\lambda}_A = 0.000077. \]
Also,
\[ \hat{\lambda}_B = N_B / T \quad \text{or} \quad \hat{\lambda}_B = 0.000633. \]
For an average EF, the second term of the projection model is
estimated by
\[ (1-d) \hat{\lambda}_B. \]
For d = 0.4, the estimated second term is
\[ (1-d) \hat{\lambda}_B = 0.000380. \]
This estimates the Growth Potential failure intensity.
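Steps 1 through 3 can be retraced as follows. This is a sketch with
assumed names: `TABLE_2` transcribes the cycle times and failure ages
of Table 2, and `MODES` transcribes the Mode column of Table 3 in
order of q. The accumulated times it builds reproduce the values
quoted in the text (Y1, Y2, Y3, and Y37), and the counts give the
first two terms of the projection model.

```python
# (cycle time T_j, failure ages X_ij) for the 27 systems of Table 2
TABLE_2 = [
    (1396, [1396]), (4497, [4497]), (525, [525]), (1232, [1232]), (227, [227]),
    (135, [135]), (19, [19]), (812, [812]), (2024, [2024]), (943, [316, 943]),
    (60, [60]), (4234, [4233, 4234]), (2527, [1877, 2527]), (2105, [2074, 2105]),
    (5079, [5079]), (577, [546, 577]), (4085, [453, 4085]), (1023, [1023]),
    (161, [161]), (4767, [36, 4767]), (6228, [3795, 4375, 6228]), (68, [68]),
    (1830, [1830]), (1241, [1241]), (2573, [871, 2573]), (3556, [3556]), (186, [186]),
]

# Mode column of Table 3, q = 1, ..., 37
MODES = ("B1 B2 A B3 B4 B2 B2 B1 B1 B5 A B1 B2 B6 B7 B2 B4 B2 B1 "
         "B1 A B8 B1 A B3 B2 B1 B1 B9 B1 B10 B1 B11 B12 B1 B13 B2").split()


def cumulative_failure_times(systems):
    """Y_q = X_ir + (sum of T_j over the systems before r), in sample order."""
    offset, times = 0, []
    for cycle_time, ages in systems:
        times.extend(offset + x for x in ages)
        offset += cycle_time
    return times, offset


times, total_time = cumulative_failure_times(TABLE_2)
n_a = MODES.count("A")                    # failures that will not be fixed
n_b = len(MODES) - n_a                    # failures from Type B modes
distinct_b = len(set(MODES) - {"A"})      # M, number of distinct Type B modes
lam_s = len(times) / total_time           # baseline intensity (Step 1)
lam_a = n_a / total_time                  # first term of Equation (12)
second_term = (1 - 0.4) * n_b / total_time  # (1 - d) * lambda_B with d = 0.4
```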
STEP 4. To estimate the last term d h(T) of the projection model (12)
we partition the data in Table 3 into intervals. This partition
consists of D successive intervals. The length of the qth interval is
Lq, q = 1, ..., D. It is not required that the intervals be of the
same length, but there should be several, say at least 5, cycles per
interval on average. Also, let S1 = L1, S2 = L1 + L2, ..., etc., be the
accumulated time through the qth interval. For the qth interval we
note the number of distinct Type B modes, MIq, appearing for the first
time, q = 1, ..., D. See Table 4.
Table 4. Grouped Data for Distinct Type B Modes
Interval   No. of Distinct Type B Mode Failures   Length   Accum. Time
1          MI1                                    L1       S1
2          MI2                                    L2       S2
q          MIq                                    Lq       Sq
D          MID                                    LD       SD
The third term is estimated by \(d\,\hat{h}(T)\), where
\[ \hat{h}(T) = \hat{\lambda} \hat{\beta} T^{\hat{\beta} - 1} \qquad (14) \]
and the values \(\hat{\lambda}\) and \(\hat{\beta}\) satisfy (with
\(S_0 = 0\))
\[ \sum_{q=1}^{D} MI_q \left[ \frac{S_q^{\hat{\beta}} \ln S_q - S_{q-1}^{\hat{\beta}} \ln S_{q-1}}{S_q^{\hat{\beta}} - S_{q-1}^{\hat{\beta}}} - \ln S_D \right] = 0 \qquad (15) \]
\[ \hat{\lambda} = \frac{M}{T^{\hat{\beta}}}. \qquad (16) \]
This is the grouped data version of the Crow (AMSAA) model applied
only to the first occurrence of distinct Type B modes.
For the data in Table 3 we chose the first 4 intervals of length
10,000, and the last interval of length 12,110. That is, D = 5. This
choice gives an average of about 5 overhaul cycles per interval. See
Table 5.
Table 5. Example of Grouped Data for Distinct Type B Modes
Interval   No. of Distinct Type B Mode Failures MIq   Length Lq   Accum. Time Sq
1          4                                          10000       10000
2          3                                          10000       20000
3          1                                          10000       30000
4          0                                          10000       40000
5          5                                          12110       52110
Total      13                                         -           -
Now, \(\hat{\lambda}\) = 0.0033 and \(\hat{\beta}\) = 0.762, which gives
\[ \hat{h}(T) = \hat{\lambda} \hat{\beta} T^{\hat{\beta} - 1} = 0.00019. \]
Consequently, for d = 0.4 the last term of the projection model is:
\[ d\,\hat{h}(T) = 0.000076. \]
STEP 5. The projected failure intensity is
\[ \hat{\lambda}_P = \hat{\lambda}_A + (1-d) \hat{\lambda}_B + d\,\hat{h}(T) \]
or
\[ \hat{\lambda}_P = 0.000077 + 0.6(0.000633) + 0.4(0.00019) = 0.000533. \]
This estimates that the 13 proposed corrective actions will reduce the
number of failures per cycle operating hour from the current
\(\hat{\lambda}_S\) = 0.00071 to \(\hat{\lambda}_P\) = 0.000533. The
average time between failures is estimated to increase from the
current 1,408 hours to 1,877 hours.
The growth potential failure intensity is
\[ (1-d) \hat{\lambda}_B = 0.00038 \]
and the estimated maximum average time between failure that can be
attained with this management strategy is 2,631 hours, i.e., when
h(T) = 0.
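Steps 4 and 5 can be checked numerically. The sketch below assumes the
usual grouped-data likelihood equation for the Crow (AMSAA) model as
the form of Equation (15), solves it by bisection (the bracket and the
iteration count are assumptions), and then applies Equations (16),
(14), and (12) to the Table 5 data. It should give values close to the
estimates quoted in Steps 4 and 5.

```python
import math

COUNTS = [4, 3, 1, 0, 5]                         # MI_q from Table 5
ENDS = [10_000, 20_000, 30_000, 40_000, 52_110]  # S_q from Table 5


def score(beta):
    """Left-hand side of Equation (15), taking S_0 = 0."""
    ln_total = math.log(ENDS[-1])
    value, prev = 0.0, 0.0
    for count, end in zip(COUNTS, ENDS):
        upper = end ** beta * math.log(end)
        lower = prev ** beta * math.log(prev) if prev > 0 else 0.0
        value += count * ((upper - lower) / (end ** beta - prev ** beta) - ln_total)
        prev = end
    return value


lo, hi = 0.01, 10.0   # assumed bracket: score(lo) > 0 > score(hi)
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if score(mid) > 0:
        lo = mid
    else:
        hi = mid
beta_hat = 0.5 * (lo + hi)

total_time = ENDS[-1]
lam_hat = sum(COUNTS) / total_time ** beta_hat              # Equation (16)
h_hat = lam_hat * beta_hat * total_time ** (beta_hat - 1)   # Equation (14)

# Equation (12) with N_A = 4, N_B = 33 and d = 0.4 from Steps 2 and 3
lam_p = 4 / total_time + (1 - 0.4) * 33 / total_time + 0.4 * h_hat
projected_mtbf = 1 / lam_p
```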
STEP 6. The cost reduction associated with incorporating these 13
Type B-mode corrective actions can be calculated by considering
the reduction in fleet failures. The System 2 population has an
average of 440,000 fleet hours per year. Based on data it is esti-
mated that 74% of the failures result in an overhaul. Currently,
there is an estimated average of 231 overhauls per year. In this
example, we assume each overhaul costs $60,000, for an average
annual overhaul cost of $13,885,157. By increasing the average
time between failures from 1,408 to 1,877 hours, under the same
assumptions, we estimate 173 overhauls per year at a cost of $10,412,969.
Thus, the estimated projected annual cost savings is $3,470,000 if
the 13 corrective actions are implemented.
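The Step 6 estimate follows from multiplying fleet hours by the failure
intensity, the fraction of failures that lead to an overhaul, and the
cost per overhaul. The sketch below uses the figures quoted above; the
function and variable names are assumptions, and small differences
from the article's dollar amounts are expected because the intensities
are rounded.

```python
def annual_overhaul_cost(fleet_hours, failure_intensity, overhaul_fraction, overhaul_cost):
    """Expected yearly overhaul spend: failures per year times the share needing overhaul."""
    failures_per_year = fleet_hours * failure_intensity
    return failures_per_year * overhaul_fraction * overhaul_cost


FLEET_HOURS = 440_000.0      # System 2 fleet hours per year
OVERHAUL_FRACTION = 0.74     # share of failures that result in an overhaul
OVERHAUL_COST = 60_000.0     # dollars per overhaul

before = annual_overhaul_cost(FLEET_HOURS, 37 / 52_110, OVERHAUL_FRACTION, OVERHAUL_COST)
after = annual_overhaul_cost(FLEET_HOURS, 0.000533, OVERHAUL_FRACTION, OVERHAUL_COST)
saving = before - after
overhauls_before = before / OVERHAUL_COST
overhauls_after = after / OVERHAUL_COST
```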
Application of the In-Service Reliability Growth
Model: No Wearout Case
We next give an example of the in-service reliability growth model
applied to medical equipment. This system is not overhauled, operates
continuously, and has a constant intensity of failure. The current
interest is to reduce maintenance costs by increasing reliability.
Note: For this example, the author modified all the actual real-
life data by a randomly selected number. The important points
are reflected in the relative comparisons of numbers in this exam-
ple, and these relative comparisons are in proportion to the real-
life numbers and results. However, because of the data modifi-
cation, the example failure times and all the example reliability
numbers have no relationship to the actual real-life numbers.
The project plan for Project X identified design changes (based
on field experience data) that are corrective actions for 11 dis-
tinct failure modes. The project plan scope was limited to one
specific assembly. These modes were labeled as Bi modes,
where i = 1 to 11. Failure modes in that assembly that were not
addressed and all failure modes in other assemblies were labeled
A modes. These 11 corrective actions were implemented.
For this example, service records of unscheduled maintenance
events were used to estimate the growth statistics based on sys-
tem data before any of these 11 corrective actions were imple-
mented. Field data was pulled for a sample of 60 units. See
Table 6 for an example of the data. The data was "cleaned"
before the analysis to remove scheduled maintenance events and
non-chargeable failures. For this analysis, there was a total of
241,000 hours of data and 450 total failures, of which 42 were B-mode
failures and 408 were A-mode failures. Statistical methods were
applied to the data to verify constant failure intensity. Based on
this data, the system, before any corrective actions, had an MTBF of
536 hours.
The reliability growth projection model was then applied to this set
of data for the 11 corrective actions (450 total failures, 42 B-mode
failures, and 408 A-mode failures), with an assumed effectiveness
factor of 0.70. Based on this growth model, the estimate for the
improved MTBF is 567 hours.
To test the validity of the model, another set of data was analyzed
after the 11 corrective actions were implemented. The original data
analysis the project team used to determine the design changes was
not available.
Table 6. Example of Medical Device Data
Cum Hours   Mode     Cum Hours   Mode
7661        A        10556       A
7979        A        12498       A
8614        B1       12886       A
9568        A        12886       A
9568        A        12886       A
9885        A        13663       A
9885        A        14934       B2
10556       A
Data was pulled for a second sample of 60 units that had the
corrective actions implemented. This data set had 134,000 operating
hours and 237 failures. The calculated MTBF is 565 hours compared to
the projection of 567 hours. Given the actual MTBF, the true
effectiveness factor is calculated to be 0.73. When looking at the
B-modes specifically, the MTBF of the B-modes improved from 5,700
hours to 11,600 hours which was considered a success even though the
system reliability did not change significantly.
After realizing the impacts that can be made, management wanted to see
improvements across all assemblies. A new management strategy was
devised that put all modes on the list for improvement. The A-modes
have been converted into an additional 11 B-modes giving the system a
total of 22 B-modes. With the demonstrated effectiveness factor of
0.73, we should expect to see the MTBF grow from 565 hours to 2,571
hours. A Life Cycle Cost (LCC) analysis shows that this improvement in
reliability has the potential to yield huge savings in service costs
and easily justifies the effort.
The example on the application of these methods to a medical system
was developed in collaboration with Siemens Medical Systems. Siemens
noted that: "The increased effectiveness factor was achieved with the
help of RAC. This included audits of our processes and the
establishment of a Reliability Program that included derating,
reliability prediction with PRISM, and HALT testing. We are continuing
to work on our processes to sustain our effectiveness factor and
hopefully improve upon it. The Process Grade Factors in PRISM is a
tool we are using to point out improvement areas. We are also spending
more time setting targets since the targets are very aggressive.
Reliability growth will be used more extensively throughout the TAAF
phase."
Acknowledgements
I would like to thank Jim Harriett and Lonnie Scott, with the
reliability team at the Cherry Point Naval Depot for implementation
of the methodologies to Navy systems. I thank Paul Oldham,
IMMC, Redstone Arsenal, for implementation of the methodolo-
gies to Army systems. I would also like to acknowledge Kevin
O'Shaughnessy at the Reliability Analysis Center for his excel-
lent support, and for being instrumental in the application of
these methods at Naval Depots. I acknowledge Pete Hurley of
Siemens Medical Systems for the example on the application of
these methods to a medical system. I would also like to thank
Adamantios Mettas of the ReliaSoft Corporation for the excel-
lent application of ReliaSoft software to perform the growth
analyses on several large data sets used in the examples.
References
1. R.E. Barlow and F. Proschan, Mathematical Theory of
Reliability, John Wiley & Sons, Inc., 1967.
2. L.H. Crow, Reliability Analysis for Complex, Repairable
Systems, in Reliability and Biometry, ed. by F. Proschan
and R.J. Serfling, pp. 379-410, 1974, Philadelphia, SIAM.
3. L.H. Crow, Reliability Growth Projection from Delayed
Fixes, Proceedings 1983 Annual Reliability and
Maintainability Symposium, pp. 84-89.
4. L.H. Crow, Evaluating the Reliability of Repairable
Systems, Proceedings 1990 Annual Reliability and
Maintainability Symposium, pp. 275-279.
About the Author
Larry H. Crow is VP, Reliability & Sustainment Programs, at
Alion Science and Technology, Huntsville, AL. Previously, Dr.
Crow was Director, Reliability, at General Dynamics ATS
(formerly Bell Labs ATS). From 1971 to 1985, Dr. Crow was chief of
the Reliability Methodology Office at the US Army Materiel
Systems Analysis Activity (AMSAA). He developed the Crow
(AMSAA) reliability growth model, which has been incorporated
into US DoD handbooks and national and international standards.
He chaired the committee to develop MIL-HDBK-189,
Reliability Growth Management, and is the principal author of
that document. He is also the principal author of the IEC
International Standard 1164, Reliability Growth-Statistical Tests
and Estimation Methods. Dr. Crow is a Fellow of the American
Statistical Association and the Institute of Environmental Sciences
and Technology, and the recipient of The Florida State
University "Grad Made Good" Award for the Year 2000, the
highest honor given to a graduate by Florida State University.
Larry H. Crow, Ph.D.
Alion Science and Technology
215 Wynn Drive, Suite 101
Huntsville, AL 35805
Internet (E-mail):
PRISM Column
At no charge to RAC PRISM users, the RAC provides an open
training course to teach users the most efficient and effective
uses of the software. This July, RAC will present its eighth open
training course on PRISM. The course has evolved over the past
three years into a comprehensive program providing a brief
introduction to reliability, in-depth instruction on PRISM analy-
sis and techniques, and the theory behind the PRISM models.
This evolution paralleled the evolution of the software.
Users will leave this hands-on training course with the knowledge
and expertise to effectively utilize PRISM. This training course is
specifically designed to assist users in making the transition from
novice to journeyman PRISM user. This two-day training course
begins with a basic overview of reliability and its evolution over
the past fifty years. Building on this foundation, users learn to
develop a basic parts count analysis, expand into the development
of a detailed stress analysis, and move on to include experience
data for performing a Bayesian analysis.
Other techniques for building comprehensive analyses include
the application of the Process Grading methodology introduced
with PRISM and the importing techniques used to make predic-
tion development easier.
Using this training technique, the PRISM staff has trained users
from over 100 companies and organizations around the globe.
Users who have participated in the course have had the follow-
ing to say about it.
· "This was an excellent overview."
· "I came in knowing little or nothing about PRISM or how
  to use it. Now I can generate reliability predictions and
  have a basis for understanding the theory behind the
  PRISM models."
· "I feel much more able to do a credible analysis now."
· "I have a much better understanding of PRISM, both how
  to use (it) and what it 'means'. I also now know where to
  find detailed background info."
· "Thanks This is my beginning education on prediction.
  (I) have always resisted before due to lack of confidence
  in the end result."
Registration for any open RAC PRISM training course is free of
charge to licensed PRISM users. Individuals who are not cur-
rently licensed users can purchase the PRISM software for
$1,995 ($2,195 International) and attend the course at no charge.
To register for a PRISM Training Course go to . Course
seating is limited to 20 individuals. The RAC can also provide
on-site training. On-site training is provided at no charge when
6 or more copies of the PRISM software are purchased. If you
have any questions, feel free to contact the PRISM training coor-
dinator, Gina Nash at (315) 339-7047.
For more information on PRISM feel free to contact the PRISM
team by phone (315-337-0900), by E-mail () or through the PRISM Forum ().