RAC is a DoD Information Analysis Center sponsored by the Defense Technical Information Center
The Journal of the Reliability Analysis Center
Third Quarter - 2003
INSIDE
· Military Systems Sustainability: A Lean Model
· Markov Analysis
· PRISM Column
· Future Events
· From the Editor
· RMSQ Headlines
· RAC Product News
A Beginner's Guide to HALT
By: Chris Hanse, C. Hanse Industries Inc.
Introduction
Any field has its beginners, sometimes not through their personal
choice. Economic conditions over recent years have resulted in
massive layoffs, with many opting to take early retirement if
offered. Others are being asked to do work that they have never
done before. Mergers may mean that you suddenly own equipment
that you don't know how to use. And there will always be people
who choose a new field; perhaps they are fresh out of college or
transferring from another department.
Working with Highly Accelerated Life Testing (HALT) is for many a
new experience, one that can be an exciting challenge. The author
has been involved with the process for over a decade. She has met
many engineers who have a very good understanding of the process
but has become aware that many others are new to the subject.
In discussing HALT, the first thing to remember is that there is
no one right way to run HALT. Table 1 is a brief review of what
HALT is and what HALT isn't.
Table 1. HALT: What It Is and Is Not
HALT is ...
· An excellent tool for learning about the types of failures you
can expect to see during the lifetime of a product.
· A way of finding the absolute limits of a product.
· Similar to classical ESS tests, but taken through to the failure
stage (suggested as the first test with ESS, though many choose
to skip that).
· A way of saving money on warranty costs by catching possible
failures ahead of time.
· A process that allows you to understand your product more fully.
· A technique for developing a robust design.
HALT is not ...
· The only tool in the toolbox.
· A pass/fail test. You are expecting the product to fail; that is
the reason for the test.
· As slow as ESS or performed at the same levels.
· A guarantee of any given dollar amount saved. That amount
depends on the nature of the product.
· Magic that fixes everything.
· Equal to Accelerated Life Testing (i.e., no measurement of
actual reliability is possible).
Usefulness of Accelerated Testing
No matter how it's viewed, business is tough right now. Customers,
whether personal consumers, government agencies, or departments
within a business, expect more for less. People want high
reliability, low cost, the latest technology, and something that
will last.
It is difficult to please everyone. You need to beat your
competitor to market but must ensure that your product will last.
How do you do this? One of the easiest ways of addressing this
issue is by accelerating the testing process. You can't afford to
test a light bulb for 20 years to verify it will last 20 years.
You need to speed things up.
You can make the light bulb fail simply by pushing it off a table.
It doesn't teach you much, though, other than that gravity still
works. What is needed is a logical approach that allows you to
"accelerate" the time. For some forms of accelerated testing,
there are software programs and different scientific formulas that
can help you to correlate the life at the accelerated conditions
to the life at normal conditions. But the main point is that you
will see what failures are most likely to happen out in the field.
You should be able to find out within a matter of days what you
might not have been able to find out for years.
Why is this important? One major company expense is warranty
issues. Every time an automobile manufacturer finds a failure and
fixes the weakness before the part can get into a car out on the
road, their accounting department comes back and tells them how
much Engineering just saved their company. You can literally save
millions by catching something early, and increase customer
loyalty and confidence. You save money and also increase the
probability of repeat business.
Starting Temperature
A good starting temperature for all of the tests is laboratory
ambient, typically around 25°C. Why do we use that term? The
author is a member of several working groups for the
International Electrotechnical Commission. These working groups
have found that "ambient" means different things to different
people. The older definition of ambient is the chamber
temperature at any given time. However, most people use the
term to mean the room temperature at which they are doing their
testing. By using "laboratory ambient," we ensure that people
understand that their testing starts at about room temperature.
Remember to constantly monitor the product. Ford Electronics,
before becoming Visteon, reported publicly that 50% of inter-
mittent failures would not have been caught without constant
monitoring. It is not good enough just to take a reading at the
beginning and a reading at the end.
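To illustrate why a reading at the start and a reading at the end are not enough, here is a minimal Python sketch. The logged values, the one-reading-per-minute interval, and the pass limits are all invented for illustration; they are not taken from the Ford Electronics data.

```python
def out_of_limits(readings, low, high):
    """Return the indices of readings that fall outside [low, high]."""
    return [i for i, value in enumerate(readings) if not (low <= value <= high)]

# Hypothetical output voltage, logged once per minute during a dwell.
log = [5.0, 5.0, 4.9, 0.0, 5.0, 5.1, 5.0]

# Checking only the first and last readings misses the dropout.
end_points_only = out_of_limits([log[0], log[-1]], low=4.5, high=5.5)

# Checking every logged reading catches the intermittent failure.
continuous = out_of_limits(log, low=4.5, high=5.5)

print(end_points_only)  # []
print(continuous)       # [3]
```

The product recovers by the next reading, so only the continuous log records that anything went wrong.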
A Series of Tests
Keeping in mind that there is no one perfect way to run a HALT,
here are a few basic principles. HALT is actually a series of
tests. You should have more than one unit available for the test-
ing, preferably one for each test of the series.
From the purist standpoint, the best way to start is by testing
using single environments, then run with combined environ-
ments for comparison. These are the six standard tests that we
recommend, although there are others:
· Cold only
· Heat only
· Vibration only
· Temperature swings
· Heat with vibration
· Temperature swings with vibration
Of course, if you are concerned about lower temperatures, you
can add cold with vibration. If worried about humidity, do a
humidity-only test and then combine it with other environments.
Power cycling can also be very beneficial.
Step by Step
Once the environment is selected, how do you start? The key is
in knowing your product. Is it as small as a PCB or as big as a
tank? How long will it take to stabilize at a temperature? Are
some components more likely to fail than others?
Along these lines, how do you know if your product is stabilized
at temperature? Consider the following example. We are testing a
console for a vehicle that is roughly the same width as the ceiling
of a minivan. Seldom could you choose one area of the console
and be confident that its temperature is representative of the
temperature of the entire unit. The simple rule of thumb is: the
larger the product, or the more diverse its components, the more
thermocouples need to be used. You will still control from one
main thermocouple. This master control thermocouple should be placed in
a spot where you are comfortable with it representing the entire
unit, if that is possible, or on the most sensitive component.
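One way to decide whether a large product has stabilized is to require every thermocouple, not just the master, to be close to the setpoint. The sketch below assumes a tolerance of 2°C and uses invented readings; both are illustrative choices, not values recommended by the author.

```python
def is_stabilized(thermocouples, setpoint, tolerance=2.0):
    """True when every thermocouple reading is within tolerance of the setpoint."""
    return all(abs(t - setpoint) <= tolerance for t in thermocouples)

def worst_thermocouple(thermocouples, setpoint):
    """Return (index, reading) of the thermocouple farthest from the setpoint."""
    index = max(range(len(thermocouples)),
                key=lambda i: abs(thermocouples[i] - setpoint))
    return index, thermocouples[index]

# Hypothetical readings (°C) from four thermocouples on a large console.
early = [-19.5, -18.9, -12.0, -20.1]
later = [-19.5, -19.2, -18.4, -20.1]

print(is_stabilized(early, setpoint=-20))       # False
print(worst_thermocouple(early, setpoint=-20))  # (2, -12.0)
print(is_stabilized(later, setpoint=-20))       # True
```

Reporting the slowest thermocouple shows which area of the product is setting the dwell time.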
Fixture your product, if necessary, and hook up any wiring that
may be needed. Keep in mind that your wiring or cabling will
be seeing the same extremes as your product!
The order of the tests is up to you, but keep in mind that starting
with single environments yields a good baseline for what you can
expect when you move to combinations. For example, if your
first test is heat combined with vibration, and you find a failure,
can you tell by looking at the product whether temperature or
vibration or a combination caused the failure? If you first run a
Heat Step Test and a Vibration Only Test, you can more readily see
if one of the environments or the combination caused the failure.
What is a Failure?
Different companies will judge failures differently. One compa-
ny visited by the author set product aside (i.e., did not ship)
because of pinpoint scratches in the paint job. To them, that was
considered enough of a "failure" that they refused to ship it out
to a customer.
To some, finding the first intermittent failure is sufficient.
Others will want to find all "hard" failures. What you consider
as a failure is something that you should keep in mind when you
are planning your test.
The Tests
Having reviewed the main points that you need to consider
before the test, we now consider the tests themselves. We will
take a look at the different considerations for each of the tests
and the reasons for running them.
Single Stress Tests
Cold Step Test. Using cold temperatures tends to be the least
destructive of the single stress tests, making it a good starting
point for doing testing. You can choose the size of the steps
based upon your knowledge of the product. If, for instance, you
are concerned about the effects of cold on it, you should make
the steps smaller. If you are looking for baseline data, you may
want to start with larger steps and then make them smaller if you
have a premature failure. With cold, many engineers are com-
fortable starting with steps of 10°C per minute change rate.
Once you have decided on your ramp rate, you need to decide
your dwell time. How long should your product stay at temper-
ature before ramping again? The prevailing opinion is that you
keep the dwell time to the minimum needed to stabilize the prod-
uct. For something like a PCB, the dwell time may be only five
minutes. If testing an assembly, it may need to be longer. Again,
use your own expertise to decide.
Before the tests even start, you should be thoroughly familiar
with the product to be tested. You should know as much as pos-
sible of the actual operating environment, then test accordingly.
Remember: End users are always harder on a product than you
will expect them to be, and they will expect it to continue work-
ing anyway.
Consider Figure 1. The figure shows an example of a Cold Step
Test. Starting at laboratory ambient and holding until stabiliza-
tion, the test temperature is lowered in 10°C steps as quickly as
possible, and held at each temperature for ten minutes. The steps
are continued until there is a failure.
Figure 1. Example of a Cold Step Test
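The step test in Figure 1 can be sketched as a simple loop. This is a minimal Python sketch, not a chamber controller: the function name `step_test`, the `passes_at` callback, and the `limit` parameter are hypothetical, and ramp time is ignored on the assumption that the ramp is as fast as possible.

```python
def step_test(start, step, dwell_min, passes_at, limit):
    """Step the setpoint from `start` by `step` until the product fails.

    passes_at(setpoint) is a caller-supplied check that returns True while
    the product still works at that setpoint. `limit` is the equipment
    limit, so the loop always ends.
    Returns (log, failure_setpoint); failure_setpoint is None if no failure.
    """
    if step == 0:
        raise ValueError("step must be non-zero")

    def within_limit(value):
        return value >= limit if step < 0 else value <= limit

    log = []
    elapsed = 0
    setpoint = start
    while within_limit(setpoint):
        ok = passes_at(setpoint)
        log.append((elapsed, setpoint, ok))
        if not ok:
            return log, setpoint
        elapsed += dwell_min
        setpoint += step
    return log, None

# Hypothetical product that stops working at -30°C and below.
log, failed_at = step_test(start=25, step=-10, dwell_min=10,
                           passes_at=lambda t: t > -30, limit=-100)
print(failed_at)  # -35
print(log[-1])    # (60, -35, False)
```

In a real test the `passes_at` check would be replaced by the constant monitoring described earlier, and a smaller step would be used where the product is expected to be sensitive.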
Heat Step Test. The heat test is very similar to the Cold Step
Test, except with temperature increasing, and is based on the
same principles as the Cold Step Test. First, allow the product
temperature to stabilize at laboratory ambient, and then ramp the
temperature up, dwelling at each temperature step. Heat tends to
be more destructive than cold, so you may choose to raise your
temperature only 5°C per minute. Figure 2 depicts a Heat Step
Test (ramp shown at 10°C per minute).
Figure 2. Example of a Heat Step Test
The author once had the opportunity to work with a company
making a product used in a hospital environment. Their first hard
failure came at only 2°C above laboratory ambient. At first they
were not concerned, reasoning that hospitals are air-conditioned
and the product would never be exposed to surrounding air that
was "too warm." The author pointed out that she had an extend-
ed stay at a hospital where the temperature was controlled based
on the date (i.e., calendar intervals denoted winter, summer, etc.).
The heating automatically turned on, triggered by the date, not the
temperature. The temperature in patient rooms approached 30°C.
The customer reworked the board, and their unit is now number
one on the market. The moral is that you must ensure that you
have some margin between expected use circumstances and what
your product can actually survive. The author recommends that
you establish a worst-case scenario and then add a percentage.
Vibration Only Test. You've gotten through the easy tests, heat
and cold. You have monitored the test and logged the data. You
have made any changes to the product that you feel are neces-
sary. Now you are ready for vibration testing.
Once again you start with the temperature at laboratory ambient
to make sure that temperature is not going to affect the test. It
may not seem like a real world situation, but right now we are
concentrating purely on the vibration. The standard way of
measuring a vibration test is with the g level. But where do we
measure the vibration?
If you measure at the bottom of the vibration table, you are really
measuring the vibration of the table, not your product. The best
placement for the accelerometer is on or near your product. Some
prefer to attach the accelerometer to the fixture holding the prod-
uct in place, a perfectly acceptable approach. If there is no easy
way to affix it to the product or fixture, then you mount it near the
fixture on the tabletop. This position will give at least a close
approximation of the vibration that the product is experiencing.
Vibration, as opposed to temperature, must be handled in very
small increments. It can be difficult to control very tightly, so we
suggest starting at 2 g and moving up 2 g at a time. Choose the
dwell time based on your knowledge of the product. There
should be an ample settling time (ten minutes is often used).
Continue stepping up until you see what you consider to be a
failure, again monitoring and data logging as you go. Figure 3
depicts a typical Vibration Only Test.
Figure 3. A Typical Vibration Only Test
[Figure 1 plot: temperature (°C, 30 to -50) versus time in minutes, starting at laboratory ambient. Annotation: lower temperature 10°C every 10 minutes until failure; ramp rate as fast as possible.]
[Figure 2 plot: temperature (°C, 0 to 90) versus time in minutes, starting at laboratory ambient. Annotation: raise temperature 10°C every 10 minutes until failure; ramp rate as fast as possible.]
[Figure 3 plot: vibration level (g, 0 to 16) versus time in minutes. Annotation: add 2 g every 10 minutes until failure.]
Combinations of Stresses
Thermal Swing Test. The next logical test after performing
cold, heat, and vibration testing is the Thermal Swing Test, in
which the heat test is combined with the cold test. Figure 4 is an
example of a Thermal Swing Test.
Figure 4. An Example of a Thermal Swing Test
Set the chamber to begin at laboratory ambient and allow the prod-
uct temperature to stabilize. Decide whether you would rather first
go hotter or colder. Humidity can affect this test. Adding nitrogen
to the chamber air will help to get rid of any latent humidity in the
product. Decide on how to ramp the temperature, typically either
5 or 10°C at a time, and select a dwell time.
As shown in Table 2, each step should get wider. Assume that
you choose to ramp with a 5°C increment and your starting tem-
perature is 25°C. You've chosen 5-minute dwell times, and
decide to go hot first. Each ramp in the opposite direction is
increased by another 5°C.
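The widening pattern described above can be sketched in a few lines of Python. The function name and its parameters are hypothetical; with a 25°C start, a 5°C increment, and hot first, it reproduces the five steps listed in Table 2.

```python
def thermal_swing_steps(start=25, increment=5, n_steps=5, hot_first=True):
    """Return (step_no, temperature, difference) rows for a thermal swing test.

    Each move is one increment wider than the previous one and reverses
    direction, so the swings grow around the starting temperature.
    """
    rows = [(1, start, None)]
    temperature = start
    direction = 1 if hot_first else -1
    for k in range(1, n_steps):
        difference = k * increment
        temperature += direction * difference
        rows.append((k + 1, temperature, difference))
        direction = -direction
    return rows

for row in thermal_swing_steps():
    print(row)
# (1, 25, None)
# (2, 30, 5)
# (3, 20, 10)
# (4, 35, 15)
# (5, 15, 20)
```

Raising `n_steps` continues the same pattern until a failure is found.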
Table 2. Thermal Swing Test Temperature Increments
It is not unusual for a product to fail at extremely low tempera-
tures and then begin working once it warms up. If you find a
failure during the cold portion of this test, proceed to the next hot
step and determine if the product again works.
There are valid reasons for doing the Thermal Swing Test. The
difference between thermal coefficients can cause parts to pull
away from each other, sometimes causing cracks. By applying
temperature changes as fast as possible, we stress the product
beyond what it would normally see. Some people claim that a
temperature change of over 30°C per minute never occurs in
most circumstances, but consider the following example.
The author lives in Michigan, which, although not the coldest
winter state, typically gets a wind chill of -40°C at least once
every winter. Assume my car gets stuck in the snow within a
mile of a customer's office. I decide to walk. My cell phone
temperature drops from 25°C to near -40°C as I make a call to
tell the customer of the circumstances. I pull out my personal
digital assistant (PDA) to get the customer's number.
I arrive at the customer's office, shivering, ask if we can turn the
heat up, and try to get my mind on business. The cell phone and
PDA warm up to 25°C. By this time, the electronics and I have
been thoroughly stressed! Undoubtedly, you can think of other
severe circumstances to which we subject electronics products.
Heat with Vibration Test. It is time to test your product com-
bining heat and vibration. What should be the vibration level?
Assume that your product had a failure at 10 g's in the previous
Vibration Only Test. Since you already know that this is the
breaking point, you need not begin with that high a level. Most
of the engineers with whom the author has worked choose a level
that is about 80% of the breaking point. In this case, that would
mean 8 g's. You are trying to learn what will happen when you
combine thermal factors with the vibration.
What is the best way to do a combined heat and vibration test?
Again, you must rely on your expertise and knowledge of the
product. It is best to follow closely the temperatures used in the
original Heat Step Test, again starting at laboratory ambient.
That will give you a guide as to where you would expect a heat-
related failure. If the temperature was extremely high, you may
want to skip some of the lower temperatures. If your product
didn't show a failure until 110°C, you may want to start at 60°C
or so and begin your steps from there.
The example in Figure 5 shows 10-minute dwells, with vibration
being run (at lower than failure level) for two minutes out of
every five. Pulsing or constant running can cause failures. You
must decide the best way to run the test.
Figure 5. Example of Heat With Vibration Test
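The combined profile can be laid out minute by minute, as in the following Python sketch. It assumes vibration at 80% of the level that caused failure in the Vibration Only Test, 10-minute dwells, and vibration during the first 2 minutes of every 5; the function name, the parameter names, and the choice of which 2 minutes are vibrated are illustrative assumptions, and ramp time is ignored.

```python
def heat_with_vibration(start_temp, step, n_steps, vib_failure_g,
                        vib_fraction=0.8, dwell_min=10,
                        vib_on_min=2, vib_period_min=5):
    """Return one (minute, temperature, vibration_g) row per minute of test."""
    vib_level = vib_fraction * vib_failure_g
    rows = []
    for k in range(n_steps):
        temperature = start_temp + k * step
        for m in range(dwell_min):
            minute = k * dwell_min + m
            # Vibrate during the first vib_on_min minutes of each period.
            vibrating = (minute % vib_period_min) < vib_on_min
            rows.append((minute, temperature, vib_level if vibrating else 0.0))
    return rows

# Product failed at 10 g in the Vibration Only Test; start heat steps at 60°C.
profile = heat_with_vibration(start_temp=60, step=10, n_steps=3, vib_failure_g=10)
for row in profile[:6]:
    print(row)
```

Setting `vib_on_min` equal to `vib_period_min` gives constant vibration instead of pulsing, which makes it easy to compare the two approaches.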
At this point, comparison gets interesting. The typical result will
be that the product will fail at a slightly lower temperature once
vibration is added than it will using heat alone. There are always
exceptions to every rule. This particular test should tell you a lot
about how your product will survive a combination, which is
what it will no doubt see in real life usage.
[Figure 4 plot: temperature (°C, -60 to 100) versus time in minutes, starting at laboratory ambient. Annotation: temperature swings gradually growing larger until failure is found; ramp rate as fast as possible.]
Table 2 (values)
Step No.    Temperature (°C)    Difference (°C)
1           25                  None
2           30                  5
3           20                  10
4           35                  15
5           15                  20
Continue testing until failure.
[Figure 5 plot: temperature (°C, 0 to 90) versus time in minutes, with thermal and vibration traces. Annotation: step up heat 5 or 10°C every 10 minutes while applying vibration 2 minutes out of every 5.]
Thermal Swings with Vibration Test. Using the data that you
learned from your heat plus vibration test, apply the same princi-
ples to the next test, in which we combine thermal swings with
vibration. Again, you may want to skip ahead a number of steps if
your failure occurred at a temperature far from laboratory ambient.
Figure 6 shows a typical Thermal Swings with Vibration Test.
Figure 6. Example of Thermal Swings With Vibration Test
The value of the Thermal Swings with Vibration Test was shown
in a case described in a machine design magazine article. The
article discussed how some companies are using extremely cold
temperatures to "de-stress" equipment that will go into a high
vibration area. Applying the principles found in this article, the
author ran a few experiments with totally different types of prod-
ucts. In each case, by combining vibration with cold tempera-
tures, the products could actually withstand more vibration than
they could at ambient temperatures. You may see this same
result when performing the Thermal Swings with Vibration Test.
You may want to add a cold test with vibration if your product is
expected to survive low temperatures.
Now What?
You've got all of these numbers and failure times. Now what do
you do? If your product has lived up to all the conditions it will
experience in real life, plus a reasonable margin, you may not
want to make any design changes.
If testing does not cause failure, you have not necessarily failed
in designing the tests. It most likely means that you already have
an extremely tough product ready for market. If you do find a
failure, that is a good thing. HALT is a learning tool. You can
learn the product's weaknesses quickly, characteristics that might
otherwise have taken months, and a lot of warranty replace-
ments, to discover.
No Magic Bullet
If you use HALT testing, will you somehow be able to tell exact-
ly how long the life of the product will be when put out on the
market? Probably not. Unless you already have field failures on
the same product, or one almost identical, it can be very hard to
make a true time correlation. If you have field failures and can
reproduce them in the lab under certain stresses and a given
amount of time, the answer is yes. In that case, you can make a
good correlation. But HALT is not intended to provide a meas-
ure of reliability, only a lower bound. Conventional Accelerated
Life Testing is better for testing at accelerated (but not highly
accelerated) levels and then correlating the reliability measured
at the accelerated levels to "normal" operating levels.
Is a rapid thermal cycling chamber with tri-axial vibration the only
test equipment you will ever need again? No. Mechanics have
hammers, screwdrivers, and wrenches in their toolbox. They can't
use one tool to fix everything. The same goes for test equipment.
HALT is an extremely valuable tool but should not be considered
your only one. Other testing, and test equipment and facilities, are
needed as part of a comprehensive test program.
One of the best industry changes that the author has seen is the
willingness of engineers to share more non-proprietary informa-
tion. If you want to learn more about HALT, look into groups
like the IEEE AST (Accelerated Stress Test) group, take a semi-
nar, and look into user groups. When people share more infor-
mation, we don't all have to start at square 1. Look to people you
know you can trust.
Final Note
The most important thing to remember in doing HALT testing is
to rely upon your own knowledge of the product. Preplan, know-
ing the end use environment. For materials, know your melting
points. Keep an open mind as you test, remembering that a prod-
uct failure is not the same as a personal failure. Work with
design engineers, project engineers, production engineers, man-
agement, and anyone else who can give good input.
The author believes that each of us (engineers, manufacturers,
scientists, and technicians) can make a difference and help the
world to be a little bit better place to live in. HALT can help you
do your job better than ever before, turning out a safer, less
expensive, and more reliable product for your customers.
For More Information
1. Chan, H. Anthony and Paul J. Englert, Editors, Accelerated
Stress Testing Handbook: Guide for Achieving Quality
Products, John Wiley & Sons, 2001.
2. Hobbs, Gregg K., Accelerated Reliability Engineering:
HALT and HASS, John Wiley & Sons, 2000.
3. IEEE AST Proceedings, 2002 (available through author).
4. McLean, Harry W., HALT, HASS, and HASA Explained:
Accelerated Reliability Techniques, ASQ Quality Press, 2000.
[Figure 6 plot: temperature (°C, -60 to 100) versus time in minutes, starting at laboratory ambient, with thermal and vibration traces. Annotation: thermal swing test with vibration added during the test.]
About the Author
Chris Hanse is a partner with her husband in C. Hanse Industries
Inc. (CHI), which provides training and consulting in manufacturing.
The company is dedicated to the testing community. They help
identify reliable equipment and provide the training to use it, and
are an accessible source of test consulting. CHI makes HALT/HASS
style chambers and chambers for thermal, humidity, dust, and water
testing. CHI's web site is at .
Chris is the winner of the 2002 IEST Climatics Award. She is a
Visiting Professor at BUAA (Beijing University of Aeronautics and
Astronautics) and a worldwide teacher of HALT methods. She is
a member of IEST, IEEE (AST chair), the RMS Partnership
(Reliability, Maintainability, and Sustainability), and IEC
Technical Committees (TC) 56 and 104, and liaison to ISO TC
108. She can be contacted at:
C. Hanse Industries
235 Hubbard Street
Allegan, MI 49010
Tel: (616) 673-8638
Fax: (616) 673-8632
E-mail:
Military Systems Sustainability: A Lean Model
By: Mario F. Agripino, Tim P. Cathcart, and Dennis F.X. Mathaisel
Introduction
Since 1990, the Department of Defense (DoD) has reduced its
budget by 29%, a reduction that has greatly impacted weapon
system acquisition and in-service support [Cordesman 2000].
Reduced budgets have forced the military branches to extend the
life of legacy systems with significant reductions in acquisition
of replacement systems. In addition, current weapon systems are
faced with rising operations and maintenance ("sustainment")
costs due to:
· Increased operational tempo.
· Increased mean time between maintenance cycles due to
increased operational requirements.
· Increased life extension of existing weapon systems due
to delays in new system acquisition.
· Unforeseen support problems associated with aging
weapons systems.
· Material shortages due to diminishing manufacturing
resources and technological obsolescence.
As sustainment costs increase, less funding is available to
procure replacement systems. An analysis conducted by DoD
[Gansler 1999] concluded that, unless mission requirements and
the operational tempo are reduced, or the budget significantly
increases, the operational maintenance cost portions of the
budget will equal the total current (net present value) budgets by
the year 2024. This chain of events has been characterized as the
DoD Death Spiral and is illustrated in Figure 1.
Figure 1. DoD Death Spiral
(Source: Dr. Gansler, USD(A&T), Acquisition Reform Update, January 1999)
[Figure 1 diagram, elements of the cycle: Deferred Modernization; Aging Weapon Systems; Increased Operations Tempo; Reduced Readiness; Increased Maintenance; Increased O&S Costs; Funding Migration from Procurement to O&S.]
To wave off this death spiral, DoD must find innovative solutions
to support legacy systems, solutions that are both cost effective and
flexible. The DoD must economically manage these system life
cycles to address obsolescence and modernization issues without
degrading readiness, cost, and performance objectives.
A Lean Sustainment Enterprise Model for
Military Systems
To achieve a truly lean approach, some organizational structures
within the current military system must be integrated. The authors
propose a new Lean Sustainment Enterprise Model (LSEM) that
calls for consolidating and integrating the following sustainment
functions: In-Service Engineering, Integrated Logistic Support,
Intermediate/Depot Maintenance, Operational Support, and
Supply Support. This realignment of the military sustainment
system mirrors a commercial Maintenance, Repair, and Overhaul
(MRO) operation. The goal is to achieve significant customer
service levels while reducing total ownership costs. The new
organizational framework allows close coordination between the
operational community and the supporting sustainment network
required to meet evolving life cycle support requirements.
The proposed enterprise model is illustrated in Figure 2. The
key attribute of this framework is that it is organized around
three primary sustainment structures: Operational Sustainment,
Sustainment Engineering, and MRO operations. These three
structures are consolidated into one Life Cycle Support
Facility, shown in the center. The Supply Chain that feeds this
new Facility is illustrated in Figure 2 to the right of the Facility,
and the Operational (O) Level and Intermediate (I) Level
Maintenance activities, which benefit from the Facility, are
illustrated on the left (as the Operational Support function).
Within the Life Cycle Support Facility, there exist the
traditional integrated logistic support (ILS) functions, such as
training, packaging, handling, shipping and transportation, and the
computer resources (CR), among others. These functions are now
part of what the authors call the first structure, the Operational