RAC is a DoD Information Analysis Center Sponsored by the Defense Technical Information Center

The Journal of the Reliability Analysis Center
Third Quarter - 2003

INSIDE
· Military Systems Sustainability: A Lean Model
· Markov Analysis
· PRISM Column
· Future Events
· From the Editor
· RMSQ Headlines
· RAC Product News

A Beginners Guide to HALT
By: Chris Hanse, C. Hanse Industries Inc.

Introduction

Any field has its beginners, sometimes not through their personal choice. Economic conditions over recent years have resulted in massive layoffs, with many opting to take early retirement if offered. Others are being asked to do work that they have never done before. Mergers may mean that you suddenly own equipment that you don't know how to use. And there will always be people who choose a new field; perhaps they are fresh out of college or transferring from another department.

Working with Highly Accelerated Life Testing (HALT) is for many a new experience, one that can be an exciting challenge. The author has been involved with the process for over a decade. She has met many engineers who have a very good understanding of the process but has become aware that many others are new to the subject.

In discussing HALT, the first thing to remember is that there is no one right way to run HALT. Table 1 is a brief review of what HALT is and what HALT isn't.

Table 1. HALT: What It Is and Is Not

HALT is ...
· An excellent tool for learning about the types of failures you can expect to see during the lifetime of a product.
· A way of finding the absolute limits of a product.
· Similar to classical ESS tests, taking it through to the failure stage (suggested as the first test with ESS, though many choose to skip that).
· A way of saving money through warranty costs by catching possible failures ahead of time.
· A process that allows you to understand your product more fully.
· A technique for developing a robust design.

HALT is not ...
· The only tool in the toolbox.
· A pass/fail test. You are expecting it to fail the product; that is the reason for the test.
· As slow as ESS or performed at the same levels.
· A guarantee of any given dollar amount saved. That amount depends on the nature of the product.
· Magic that fixes everything.
· Equal to Accelerated Life Testing (i.e., no measurement of actual reliability is possible).

Usefulness of Accelerated Testing

No matter how it's viewed, business is tough right now. Customers, whether personal consumers, government agencies, or departments within a business, expect more for less. People want high reliability, low cost, the latest technology, and something that will last. It is difficult to please everyone. You need to beat your competitor to market but must ensure that your product will last. How do you do this?

One of the easiest ways of addressing this issue is by accelerating the testing process. You can't afford to test a light bulb for 20 years to verify it will last 20 years. You need to speed things up.

You can make the light bulb fail simply by pushing it off a table. It doesn't teach you much, though, other than that gravity still works. What is needed is a logical approach that allows you to "accelerate" the time. For some forms of accelerated testing, there are software programs and different scientific formulas that can help you to correlate the life at the accelerated conditions to the life at normal conditions. But the main point is that you will see what failures are most likely to happen out in the field. You should be able to find out within a matter of days what you might not have been able to find out for years.

Why is this important? One major company expense is warranty issues. Every time an automobile manufacturer finds a failure and fixes the weakness before the part can get into a car out on the road, their accounting department comes back and tells them how much Engineering just saved their company. You can literally save millions by catching something early, and increase customer loyalty and confidence. You save money and also increase the probability of repeat business.

Starting Temperature

A good starting temperature for all of the tests is laboratory ambient, typically around 25°C. Why do we use that term? The author is a member of several working groups for the International Electrotechnical Commission. These working groups have found that "ambient" means different things to different people. The older definition of ambient is the chamber temperature at any given time. However, most people use the term to mean the room temperature at which they are doing their testing. By using "laboratory ambient," we ensure that people understand that their testing starts at about room temperature.

Remember to constantly monitor the product. Ford Electronics, before becoming Visteon, reported publicly that 50% of intermittent failures would not have been caught without constant monitoring. It is not good enough just to take a reading at the beginning and a reading at the end.

A Series of Tests

Keeping in mind that there is no one perfect way to run a HALT, here are a few basic principles. HALT is actually a series of tests. You should have more than one unit available for the testing, preferably one for each test of the series.

From the purist standpoint, the best way to start is by testing using single environments, then run with combined environments for comparison. These are the six standard tests that we recommend, although there are others:

· Cold only
· Vibration only
· Heat only
· Temperature swings
· Heat with vibration
· Temperature swings with vibration

Of course, if you are concerned about lower temperatures, you can add cold with vibration.
If worried about humidity, do a humidity-only test and then combine it with other environments. Power cycling can also be very beneficial.

Step by Step

Once the environment is selected, how do you start? The key is in knowing your product. Is it as small as a PCB or as big as a tank? How long will it take to stabilize at a temperature? Are some components more likely to fail than others?

Along these lines, how do you know if your product is stabilized at temperature? Consider the following example. We are testing a console for a vehicle that is roughly the same width as the ceiling of a minivan. Seldom could you choose one area of the console and be confident that its temperature is representative of the temperature of the entire unit. The simple rule of thumb is: the larger the product, or the more diverse its components, the more thermocouples need to be used. You will still control from one main thermocouple. This master control thermocouple should be placed in a spot where you are comfortable with it representing the entire unit, if that is possible, or on the most sensitive component.

Fixture your product, if necessary, and hook up any wiring that may be needed. Keep in mind that your wiring or cabling will be seeing the same extremes as your product!

The order of the tests is up to you, but keep in mind that starting with single environments yields a good baseline for what you can expect when you move to combinations. For example, if your first test is heat combined with vibration, and you find a failure, can you tell by looking at the product whether temperature or vibration or a combination caused the failure? If you first run a Heat Step Test and a Vibration Only Test, you can more readily see if one of the environments or the combination caused the failure.

What is a Failure?

Different companies will judge failures differently. One company visited by the author set product aside (i.e., did not ship) because of pinpoint scratches in the paint job.
To them, that was considered enough of a "failure" that they refused to ship it out to a customer. To some, finding the first intermittent failure is sufficient. Others will want to find all "hard" failures. What you consider as a failure is something that you should keep in mind when you are planning your test.

The Tests

Having reviewed the main points that you need to consider before the test, we now consider the tests themselves. We will take a look at the different considerations for each of the tests and the reasons for running them.

Single Stress Tests

Cold Step Test. Using cold temperatures tends to be the least destructive of the single stress tests, making it a good starting point for testing. You can choose the size of the steps based upon your knowledge of the product. If, for instance, you are concerned about the effects of cold on it, you should make the steps smaller. If you are looking for baseline data, you may want to start with larger steps and then make them smaller if you have a premature failure. With cold, many engineers are comfortable starting with a change rate of 10°C per minute.

Once you have decided on your ramp rate, you need to decide your dwell time. How long should your product stay at temperature before ramping again? The prevailing opinion is that you keep the dwell time to the minimum needed to stabilize the product. For something like a PCB, the dwell time may be only five minutes. If testing an assembly, it may need to be longer. Again, use your own expertise to decide.

Before the tests even start, you should be thoroughly familiar with the product to be tested. You should know as much as possible of the actual operating environment, then test accordingly. Remember: End users are always harder on a product than you will expect them to be, and they will expect it to continue working anyway.
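The dwell-time guidance above (hold only as long as the product needs to stabilize, judged from several thermocouples on a large product) can be expressed as a simple check on logged readings. The following is a minimal sketch, not part of the author's procedure: the function name, the 2°C tolerance, and the 0.5°C drift limit are assumptions chosen purely for illustration.

```python
from typing import Sequence


def is_stabilized(readings: Sequence[Sequence[float]],
                  setpoint_c: float,
                  tolerance_c: float = 2.0,   # assumed value, not from the article
                  max_drift_c: float = 0.5    # assumed value, not from the article
                  ) -> bool:
    """Return True when every thermocouple has settled near the setpoint.

    readings holds recent samples, oldest first; each sample contains one
    temperature in deg C per thermocouple.
    """
    if len(readings) < 2:
        return False  # a single sample cannot show that the drift has stopped
    # Regroup the samples so that each tuple is one thermocouple's history.
    for channel in zip(*readings):
        if abs(channel[-1] - setpoint_c) > tolerance_c:
            return False  # this location is still too far from the setpoint
        if max(channel) - min(channel) > max_drift_c:
            return False  # this location is still changing temperature
    return True


# Two thermocouples sampled three times during a -20 deg C cold step.
settled = [[-19.8, -19.2], [-19.9, -19.4], [-20.0, -19.5]]
still_cooling = [[-10.0, -5.0], [-15.0, -9.0], [-19.0, -12.0]]
print(is_stabilized(settled, -20.0), is_stabilized(still_cooling, -20.0))
```

In practice the thresholds would come from your own knowledge of the product, in keeping with the article's advice to rely on your own expertise.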
Consider Figure 1. The figure shows an example of a Cold Step Test. Starting at laboratory ambient and holding until stabilization, the test temperature is lowered in 10°C steps as quickly as possible, and held at each temperature for ten minutes. The steps are continued until there is a failure.

Figure 1. Example of a Cold Step Test
[Plot of temperature (°C, 30 down to -50) versus time in minutes. Starting at laboratory ambient, lower the temperature 10°C every 10 minutes until failure. Ramp rate as fast as possible.]

Heat Step Test. The heat test is very similar to the Cold Step Test, except with temperature increasing, and is based on the same principles. First, allow the product temperature to stabilize at laboratory ambient, and then ramp the temperature up, dwelling at each temperature step. Heat tends to be more destructive than cold, so you may choose to raise your temperature only 5°C per minute. Figure 2 depicts a Heat Step Test (ramp shown at 10°C per minute).

Figure 2. Example of a Heat Step Test
[Plot of temperature (°C, 0 to 90) versus time in minutes. Starting at laboratory ambient, raise the temperature 10°C every 10 minutes until failure. Ramp rate as fast as possible.]

The author once had the opportunity to work with a company making a product used in a hospital environment. Their first hard failure came at only 2°C above laboratory ambient. At first they were not concerned, reasoning that hospitals are air-conditioned and the product would never be exposed to surrounding air that was "too warm." The author pointed out that she had an extended stay at a hospital where the temperature was controlled based on the date (i.e., calendar intervals denoted winter, summer, etc.). The heating automatically turned on, triggered by the date, not the temperature. The temperature in patient rooms approached 30°C. The customer reworked the board, and their unit is now number one on the market.

The moral is that you must ensure that you have some margin between expected use circumstances and what your product can actually survive. The author recommends that you establish a worst-case scenario and then add a percentage.

Vibration Only Test. You've gotten through the easy tests, heat and cold. You have monitored the test and logged the data. You have made any changes to the product that you feel are necessary. Now you are ready for vibration testing. Once again you start with the temperature at laboratory ambient to make sure that temperature is not going to affect the test. It may not seem like a real world situation, but right now we are concentrating purely on the vibration.

The standard way of measuring a vibration test is with the g level. But where do we measure the vibration? If you measure at the bottom of the vibration table, you are really measuring the vibration of the table, not your product. The best placement for the accelerometer is on or near your product. Some prefer to attach the accelerometer to the fixture holding the product in place, a perfectly acceptable approach. If there is no easy way to affix it to the product or fixture, then mount it near the fixture on the tabletop. This position will give at least a close approximation of the vibration that the product is experiencing.

Vibration, as opposed to temperature, must be handled in very small increments. It can be difficult to control very tightly, so we suggest starting at 2 g and moving up 2 g at a time. Choose the dwell time based on your knowledge of the product. There should be an ample settling time (ten minutes is often used). Continue stepping up until you see what you consider to be a failure, again monitoring and data logging as you go. Figure 3 depicts a typical Vibration Only Test.

Figure 3. A Typical Vibration Only Test
[Plot of vibration level (g, 0 to 16) versus time in minutes. Add 2 g every 10 minutes until failure.]
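The three single-stress tests share one pattern: start at a baseline, change the stress by a fixed step, dwell, and repeat until failure. As a rough illustration of that pattern, the sketch below generates the setpoints for the cold, heat, and vibration examples in Figures 1 to 3. The function name and the stopping limits (-45°C, 85°C, 16 g) are assumptions made so the loop terminates; a real test ends at the first failure, and ramp time is ignored here because the ramps are as fast as possible.

```python
from typing import List, Tuple


def step_schedule(start: float, step: float, dwell_min: float,
                  limit: float) -> List[Tuple[float, float]]:
    """Return (start_time_in_minutes, setpoint) pairs for a step-stress test.

    step is negative for a cold test and positive for heat or vibration.
    limit is a hypothetical chamber or safety limit where planning stops.
    """
    if step == 0 or dwell_min <= 0:
        raise ValueError("step must be non-zero and dwell_min must be positive")
    schedule = []
    level, elapsed = start, 0.0
    # Keep adding steps while the setpoint has not passed the limit.
    while (step > 0 and level <= limit) or (step < 0 and level >= limit):
        schedule.append((elapsed, level))
        level += step
        elapsed += dwell_min
    return schedule


# 10 deg C steps from laboratory ambient, 10-minute dwells (Figures 1 and 2).
cold = step_schedule(start=25, step=-10, dwell_min=10, limit=-45)
heat = step_schedule(start=25, step=10, dwell_min=10, limit=85)
# Start at 2 g and add 2 g every 10 minutes (Figure 3).
vibration = step_schedule(start=2, step=2, dwell_min=10, limit=16)

for minutes, setpoint in cold:
    print(f"t = {minutes:5.1f} min  setpoint = {setpoint:4d} deg C")
```

Smaller steps or longer dwells, as the text suggests for sensitive products or large assemblies, only change the `step` and `dwell_min` arguments.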
Combinations of Stresses

Thermal Swing Test. The next logical test after performing cold, heat, and vibration testing is the Thermal Swing Test, in which the heat test is combined with the cold test. Figure 4 is an example of a Thermal Swing Test.

Figure 4. An Example of a Thermal Swing Test
[Plot of temperature (°C, -60 to 100) versus time in minutes. Starting at laboratory ambient, the temperature swings grow gradually larger until a failure is found. Ramp rate as fast as possible.]

Set the chamber to begin at laboratory ambient and allow the product temperature to stabilize. Decide whether you would rather first go hotter or colder. Humidity can affect this test. Adding nitrogen to the chamber air will help to get rid of any latent humidity in the product. Decide on how to ramp the temperature, typically either 5 or 10°C at a time, and select a dwell time.

As shown in Table 2, each step should get wider. Assume that you choose to ramp with a 5°C increment and your starting temperature is 25°C. You've chosen 5-minute dwell times, and decide to go hot first. Each swing in the opposite direction is 5°C wider than the one before it.

Table 2. Thermal Swing Test Temperature Increments

Step No.   Temperature (°C)   Difference (°C)
1          25                 None
2          30                 5
3          20                 10
4          35                 15
5          15                 20
Continue testing until failure

It is not unusual for a product to fail at extremely low temperatures and then begin working once it warms up. If you find a failure during the cold portion of this test, proceed to the next hot step and determine if the product again works.

There are valid reasons for doing the Thermal Swing Test. The difference between thermal coefficients can cause parts to pull away from each other, sometimes causing cracks. By applying temperature changes as fast as possible, we stress the product beyond what it would normally see. Some people claim that a temperature change of over 30°C per minute never occurs in most circumstances, but consider the following example.

The author lives in Michigan, which, although not the coldest winter state, typically gets a wind chill of -40°C at least once every winter. Assume my car gets stuck in the snow within a mile of a customer's office. I decide to walk. My cell phone temperature drops from 25°C to near -40°C as I make a call to tell the customer of the circumstances. I pull out my personal digital assistant (PDA) to get the customer's number. I arrive at the customer's office, shivering, ask if we can turn the heat up, and try to get my mind on business. The cell phone and PDA warm up to 25°C. By this time, the electronics and I have been thoroughly stressed! Undoubtedly, you can think of other severe circumstances to which we subject electronics products.

Heat with Vibration Test. It is time to test your product combining heat and vibration. What should be the vibration level? Assume that your product had a failure at 10 g in the previous Vibration Only Test. Since you already know that this is the breaking point, you need not begin with that high a level. Most of the engineers with whom the author has worked choose a level that is about 80% of the breaking point. In this case, that would mean 8 g. You are trying to learn what will happen when you combine thermal factors with the vibration.

What is the best way to do a combined heat and vibration test? Again, you must rely on your expertise and knowledge of the product. It is best to follow closely the temperatures used in the original Heat Step Test, again starting at laboratory ambient. That will give you a guide as to where you would expect a heat-related failure. If the temperature was extremely high, you may want to skip some of the lower temperatures. Assume that your product didn't show a failure until 110°C; in that case you may want to start at 60°C or so and start your steps from there. The example in Figure 5 shows 10-minute dwells, with vibration being run (at lower than failure level) for two minutes out of every five. Either pulsed or constant vibration can cause failures. You must decide the best way to run the test.

Figure 5. Example of Heat With Vibration Test
[Plot of temperature (°C, 0 to 90) versus time in minutes, with thermal and vibration traces. Step up heat 5 or 10°C every 10 minutes while applying vibration 2 minutes out of every 5.]

At this point, comparison gets interesting. The typical result will be that the product will fail at a slightly lower temperature once vibration is added than it will using heat alone. There are always exceptions to every rule. This particular test should tell you a lot about how your product will survive a combination, which is what it will no doubt see in real life usage.

Thermal Swings with Vibration Test. Using the data that you learned from your heat plus vibration test, apply the same principles to the next test, in which we combine thermal swings with vibration. Again, you may want to skip ahead a number of steps if your failure occurred at a temperature far from laboratory ambient. Figure 6 shows a typical Thermal Swings with Vibration Test.

Figure 6. Example of Thermal Swings With Vibration Test
[Plot of temperature (°C, -60 to 100) versus time in minutes, with thermal and vibration traces: thermal swings around laboratory ambient with vibration applied for part of each interval.]

The value of the Thermal Swings with Vibration Test was shown in a case described in a machine design magazine article. The article discussed how some companies are using extremely cold temperatures to "de-stress" equipment that will go into a high vibration area. Applying the principles found in this article, the author ran a few experiments with totally different types of products. In each case, by combining vibration with cold temperatures, the products could actually withstand more vibration than they could at ambient temperatures. You may see this same result when performing the Thermal Swings with Vibration Test.
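The widening steps of Table 2 and the roughly 80% starting level suggested for the combined tests both reduce to simple arithmetic. The sketch below reproduces the Table 2 rows for a 25°C start, 5°C increment, and hot-first order, and computes the 8 g starting level from a 10 g failure. The function names are hypothetical, and the rule used to extend the pattern beyond the five tabulated steps (the swing amplitude grows by one increment every two steps) is an assumption inferred from the table.

```python
from typing import List, Optional, Tuple


def thermal_swing_steps(ambient_c: float = 25.0, increment_c: float = 5.0,
                        n_steps: int = 5, hot_first: bool = True
                        ) -> List[Tuple[int, float, Optional[float]]]:
    """Return (step number, temperature, difference from previous step)."""
    steps: List[Tuple[int, float, Optional[float]]] = [(1, ambient_c, None)]
    previous = ambient_c
    first_direction = 1 if hot_first else -1
    for n in range(2, n_steps + 1):
        amplitude = increment_c * (n // 2)   # 5, 5, 10, 10, 15, ...
        # Even-numbered steps go in the first direction, odd ones reverse.
        direction = first_direction if n % 2 == 0 else -first_direction
        target = ambient_c + direction * amplitude
        steps.append((n, target, abs(target - previous)))
        previous = target
    return steps


def starting_vibration_level(failure_g: float, fraction: float = 0.8) -> float:
    """Vibration level for a combined test, as a fraction of the failure level."""
    return fraction * failure_g


for number, temperature, difference in thermal_swing_steps():
    print(number, temperature, difference)
print(starting_vibration_level(10))
```

Passing `hot_first=False` mirrors the sequence for a cold-first test, and a larger `n_steps` continues the swings in the same way until a failure ends the test.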
You may want to add a cold test with vibration if your product is expected to survive low temperatures.

Now What?

You've got all of these numbers and failure times. Now what do you do? If your product has lived up to all the conditions it will experience in real life, plus a reasonable margin, you may not want to make any design changes. If testing does not cause failure, you have not necessarily failed in designing the tests. It most likely means that you already have an extremely tough product ready for market.

If you do find a failure, that is a good thing. HALT is a learning tool. You can learn the product's weaknesses quickly, characteristics that might otherwise have taken months, and a lot of warranty replacements, to discover.

No Magic Bullet

If you use HALT testing, will you somehow be able to tell exactly how long the life of the product will be when put out on the market? Probably not. Unless you already have field failures on the same product, or one almost identical, it can be very hard to make a true time correlation. If you have field failures and can reproduce them in the lab under certain stresses and a given amount of time, the answer is yes. In that case, you can make a good correlation. But HALT is not intended to provide a measure of reliability, only a lower bound. Conventional Accelerated Life Testing is better for testing at accelerated (but not highly accelerated) levels and then correlating the reliability measured at the accelerated levels to "normal" operating levels.

Is a rapid thermal cycling chamber with tri-axial vibration the only test equipment you will ever need again? No. Mechanics have hammers, screwdrivers, and wrenches in their toolbox. They can't use one tool to fix everything. The same goes for test equipment. HALT is an extremely valuable tool but should not be considered your only one. Other testing, and test equipment and facilities, are needed as part of a comprehensive test program.
One of the best industry changes that the author has seen is the willingness of engineers to share more non-proprietary information. If you want to learn more about HALT, look into groups like the IEEE AST (Accelerated Stress Test) group, take a seminar, and look into user groups. When people share more information, we don't all have to start at square one. Look to people you know you can trust.

Final Note

The most important thing to remember in doing HALT is to rely upon your own knowledge of the product. Preplan, knowing the end use environment. For materials, know your melting points. Keep an open mind as you test, remembering that a product failure is not the same as a personal failure. Work with design engineers, project engineers, production engineers, management, and anyone else who can give good input.

The author believes that each of us (engineers, manufacturers, scientists, and technicians) can make a difference and help the world to be a little bit better place to live in. HALT can help you do your job better than ever before, turning out a safer, less expensive, and more reliable product for your customers.

For More Information

1. Chan, H. Anthony and Paul J. Englert, Editors, Accelerated Stress Testing Handbook: Guide for Achieving Quality Products, John Wiley & Sons, 2001.
2. Hobbs, Gregg K., Accelerated Reliability Engineering: HALT and HASS, John Wiley & Sons, 2000.
3. IEEE AST Proceedings, 2002 (available through author).
4. McLean, Harry W., HALT, HASS, and HASA Explained: Accelerated Reliability Techniques, ASQ Quality Press, 2000.

About the Author

Chris Hanse is a partner with her husband in C. Hanse Industries Inc.
(CHI), which provides training and consulting in manufacturing. The company is dedicated to the testing community. They help identify reliable equipment and provide the training to use it, and are an accessible source of test consulting. CHI makes HALT/HASS style chambers and chambers for thermal, humidity, dust, and water testing. CHI's web site is at .

Chris is the winner of the 2002 IEST Climatics Award. She is a Visiting Professor at BUAA (Beijing University of Aeronautics and Astronautics) and a worldwide teacher of HALT methods. She is a member of IEST, IEEE (AST chair), the RMS Partnership (Reliability, Maintainability, and Sustainability), and IEC Technical Committees (TC) 56 and 104, and is liaison to ISO TC 108. She can be contacted at:

C. Hanse Industries
235 Hubbard Street
Allegan, MI 49010
Tel: (616) 673-8638
Fax: (616) 673-8632
E-mail:

Military Systems Sustainability: A Lean Model
By: Mario F. Agripino, Tim P. Cathcart, and Dennis F.X. Mathaisel

Introduction

Since 1990, the Department of Defense (DoD) has reduced its budget by 29%, a reduction that has greatly impacted weapon system acquisition and in-service support [Cordesman 2000]. Reduced budgets have forced the military branches to extend the life of legacy systems with significant reductions in acquisition of replacement systems. In addition, current weapon systems are faced with rising operations and maintenance ("sustainment") costs due to:

· Increased operational tempo.
· Increased mean time between maintenance cycles due to increased operational requirements.
· Increased life extension of existing weapon systems due to delays in new system acquisition.
· Unforeseen support problems associated with aging weapons systems.
· Material shortages due to diminishing manufacturing resources and technological obsolescence.

As sustainment costs increase, less funding is available to procure replacement systems. An analysis conducted by DoD [Gansler 1999] concluded that, unless mission requirements and the operational tempo are reduced, or the budget significantly increases, the operational maintenance cost portions of the budget will equal the total current (net present value) budgets by the year 2024. This chain of events has been characterized as the DoD Death Spiral and is illustrated in Figure 1.

Figure 1. DoD Death Spiral (Source: Dr. Gansler, USD(A&T), Acquisition Reform Update, January 1999)
[Diagram elements: Deferred Modernization; Aging Weapon Systems; Increased Operations Tempo; Reduced Readiness; Increased Maintenance; Increased O&S Costs; Funding Migration from Procurement to O&S.]

To wave off this death spiral, DoD must find innovative solutions to support legacy systems, solutions that are both cost effective and flexible. The DoD must economically manage these system life cycles to address obsolescence and modernization issues without degrading readiness, cost, and performance objectives.

A Lean Sustainment Enterprise Model for Military Systems

To achieve a truly lean approach, some organizational structures within the current military system must be integrated. The authors propose a new Lean Sustainment Enterprise Model (LSEM) that calls for consolidating and integrating the following sustainment functions: In-Service Engineering, Integrated Logistic Support, Intermediate/Depot Maintenance, Operational Support, and Supply Support. This realignment of the military sustainment system mirrors a commercial Maintenance Repair and Overhaul (MRO) operation. The goal is to achieve significant customer service levels while reducing total ownership costs. The new organizational framework allows close coordination between the operational community and the supporting sustainment network required to meet evolving life cycle support requirements.

The proposed enterprise model is illustrated in Figure 2. The key attribute of this framework is that it is organized around three primary sustainment structures: Operational Sustainment, Sustainment Engineering, and MRO operations. These three structures are consolidated into one Life Cycle Support Facility, shown in the center. The Supply Chain that feeds this new Facility is illustrated in Figure 2 to the right of the Facility, and the Operational (O) Level and Intermediate (I) Level Maintenance activities, which benefit from the Facility, are illustrated on the left (as the Operational Support function).

Within the Life Cycle Support Facility, there exist the traditional integrated logistic support (ILS) functions, such as training, packaging, handling, shipping and transportation, and the computer resources (CR), among others. These functions are now part of what the authors call the first structure, the Operational