
Designing for Reliability
About the RIAC Blueprints
The RIAC "Blueprints for Product Reliability" are a series of documents published by the Reliability Information Analysis Center (RIAC) to provide insight into, and guidance in applying, sound reliability practices. The RIAC is the Information Analysis Center chartered to be a centralized source of data, information and expertise in the subjects of reliability, maintainability and quality. While sponsored by the US Department of Defense (DoD), RIAC's charter addresses both military and commercial communities with the requirement to disseminate guidance information in these subjects. The Blueprints serve to provide information on those approaches to planning and implementing effective reliability programs based on experience, lessons learned, and state-of-the-art techniques. To make the Blueprints as useful as possible, the approaches and procedures are based on the best practices used by commercial industry and on the concepts documented in many of the now-rescinded military standards. The tree shown in Figure 1 depicts the Blueprints that make up the series (the shaded second tier box indicates this Blueprint).
Figure 1. RIAC Blueprints for Product Reliability
In the government sector, and in particular the DoD, significant changes have been made regarding the acquisition of new products. Previously, by imposing standards and specifications, a DoD customer would require contractors to use certain analytical tools and methods, perform specific tests in a prescribed manner, use components from an approved list, and so forth. Current policy emphasizes the use of commercial technology and the specification of "performance-based" requirements only, leaving suppliers to determine how best to achieve them.
Users of the RIAC Blueprints
The Blueprints are designed for use in both the government and private sectors. They address products ranging from completely new commercial consumer products to highly specialized military systems. The documents are written in a style that is easy to understand and implement whether the reader is a manager, design engineer or reliability specialist. In keeping with the new philosophy of the DoD, which is now similar to that of the private sector, the Blueprints do not provide a cookbook of reliability tasks that should be applied in every situation. Instead, some general principles are cited as the underpinnings of a sound reliability program. Then, many of the tasks and activities that support each principle are highlighted in detail sufficient for the user to determine if a task or activity is appropriate to his or her situation.
SECTION ONE - INTRODUCTION
Proven design approaches are an important factor in the reliability success of any product. In the absence of customer-imposed "how-to" requirements and documentation, it is in the best interests of suppliers to understand and implement sound reliability design processes in order to satisfy customers in a competitive marketplace. A well-thought-out approach will result in a product that delivers high reliability performance to the customer and should help the supplier retain a reliability-based competitive advantage. The topics within this section emphasize the premise that inherent reliability performance begins at the initial product design stages, with activities such as the selection of parts and control of the environment, and continues through the subsequent build and test phases.
SECTION TWO - BASIC RELIABILITY DESIGN CONSIDERATIONS
This section addresses the key issues that should be considered for implementing good reliability design practices.
2.1 Product Development Phase Terminology
The choice of reliability tasks to be considered for a particular product design is a function of the challenge to the state of the technology, the purpose of the overall effort, environmental characteristics, repair or service needs, safety considerations, and funding and schedule constraints. Table 1 summarizes basic guidelines for the scope of reliability design tasks based on broad, generally accepted program phase categories.
Table 1. Program Phase and Scope of Reliability Tasks
| Program Phase | Purpose | Scope of Task |
| Concept and Planning | Study product feasibility; consider alternate solutions; understand design & operating environment | Trade-off analysis for critical items; customer needs refined; part selection alternatives evaluated; environmental aspects determined |
| Design and Development | Define approaches & solutions for producing a product; develop models or prototypes; validate through test, analysis or simulation | Integration of design & application guides; evaluation of design progress through analyses and/or tests; construction of product evaluation processes |
| Production and Manufacturing | Maintain inherent product reliability | Implement process control and quality assurance procedures; operating & maintenance manuals refined |
As can be seen in Figure 2, sixty-six percent of a product's total life cycle cost may be locked in by decisions made in the Concept/Planning phase, emphasizing the need for effective, up-front reliability design.
Figure 2. Life Cycle Cost Impact
2.2 Reliability Design Oriented Tasks
Section Three of this Blueprint provides insight into the reliability design tasks that may be appropriate for different product development situations, at different phases in the product life cycle. Table 2 represents those tasks (historically classified as design, analysis and test) that have been proven to have a positive influence on inherent product design reliability when properly tailored to add value for the customer.
Table 2. Reliability Tasks
| Type of Activity | Tasks and Description | Section |
| Design | Critical Item Identification. Cataloging items that have relatively high impact in determining product reliability. Can include hardware and software. | 3.2 |
| Design | Derating. Limiting the maximum allowable stresses on a part to a designated value below its rated maximum stress in order to improve its reliability. | 3.3 |
| Design | Design Reviews. Formal or informal independent evaluation and critique of a design to identify and correct hardware or software deficiencies. | 3.4 |
| Design | Environmental Characterization. Determination of the operational and environmental stresses the product can be expected to experience. | 3.5 |
| Design | Fault Tolerance. Designing alternate means to continue operation when components of a product fail. | 3.6 |
| Design | Parts Application. Using parts under design rules intended to assure that they will operate reliably under the expected operational and environmental stresses. | 3.7 |
| Design | Parts Selection. Choosing parts that will be effective and reliable in the planned application and that should remain available at reasonable cost during the product’s life. | 3.8 |
| Design | Thermal Design. Consideration of heat generation and dissipation in the product in order to prevent reliability problems caused by the effects of temperature. | 3.9 |
| Analysis | Allocations. Translation of product reliability goals into reliability goals for the components making up the product. | 3.10 |
| Analysis | Design of Experiments (DOE). Systematically determining the impact of process and environmental factors on a desired product characteristic, in order to optimize and control the design. | 3.11 |
| Analysis | Dormancy Analysis. Determination of the effects of expected periods of storage or other nonoperating conditions on the reliability of the product. | 3.12 |
| Analysis | Durability Assessment. Determination of whether or not the mechanical strength of a product will remain adequate for its expected life. | 3.13 |
| Analysis | Failure Modes, Effects & Criticality Analysis (FMECA). Systematically determining the effects of part or software failures on the product’s ability to perform its function. This task includes FMEA. | 3.14 |
| Analysis | Failure Reporting, Analysis & Corrective Action System (FRACAS). A closed-loop system of data collection, analysis and dissemination to identify and correct failures of a product or process. | 3.15 |
| Analysis | Fault Tree Analysis (FTA). Using deductive (top-down) logic to determine the possible causes of a defined undesired operational result. | 3.16 |
| Analysis | Finite Element Analysis (FEA). Determining the mechanical stresses present in products through simulation by decomposing the product into simple elements. | 3.17 |
| Analysis | Life Cycle Planning. Determining reliability (and other) requirements by considering the impact over the expected useful life of the product. | 3.1 |
| Analysis | Modeling & Simulation. Creation of a representation, usually graphical or mathematical, for the expected reliability of a product, and validating the selected model through simulation. | 3.18 |
| Analysis | Parts Obsolescence. Analysis of the likelihood that changes in technology and market demand will make the use of a currently available part undesirable. | 3.19 |
| Analysis | Predictions. Estimation of reliability from available design, analysis or test data, or data from similar products. | 3.20 |
| Analysis | Repair Strategies. Determination of the most appropriate or cost-effective procedures for restoring operation after a product fails. | 3.21 |
| Analysis | Sneak Circuit Analysis (SCA). Investigation to discover the existence of unintended signal paths in a product. | 3.22 |
| Analysis | Worst Case Circuit Analysis (WCCA). Analysis of the effects of variability in the components of a product on the product’s performance. | 3.23 |
| Test | Accelerated Life Test. Testing at high stress levels over compressed time periods to draw conclusions about the reliability of a product under expected operating conditions, based on formulated correlation factors. | 3.25 |
| Test | Test Strategy. Determination of the most cost-effective mix of tests for a product. | 3.24 |
2.3 Priority Versus Phase of the Program
Table 3 lists the reliability design tasks that should be considered during product design, together with a priority for each task in the Concept/Planning and Design/Development phases, to indicate the relative importance of planning each task early. Tasks listed as essential should be treated as high priority for most new or redesigned products. Recommended or suggested activities should be considered if critical performance is required or stringent safety regulations are imposed on the product. The remaining activities (those with no priority listed) are performed on an as-needed basis, depending on the nature of the product. For example, a product intended for manned space flight might need an extra level of assurance that sneak circuits and marginal electrical parameters have been eliminated, which calls for sneak circuit analysis and worst case circuit analysis.
Table 3. Priority Versus Phase
| Design Tasks | Concept/Planning Phase | Design/Development Phase |
| Accelerated Testing | Suggested | Suggested |
| Allocations | Suggested | Recommended |
| Critical Items | Essential | Recommended |
| Derating | Recommended | Essential |
| Design of Experiments | Suggested | Suggested |
| Design Reviews | Recommended | Essential |
| Dormancy Analysis | | Suggested |
| Durability Assessment | | |
| Environmental Characterization | Essential | Recommended |
| Failure Mode Analysis | | Recommended |
| Failure Reporting | | Essential |
| Fault Tolerance | Suggested | Suggested |
| Fault Tree Analysis | | Suggested |
| Finite Element Analysis | | |
| Life Cycle Planning | | |
| Modeling & Simulation | Essential | Suggested |
| Part Application | | Essential |
| Part Obsolescence | Suggested | Suggested |
| Part Selection | Essential | Suggested |
| Predictions | Suggested | Recommended |
| Repair Strategy | Recommended | Suggested |
| Sneak Circuit Analysis | | |
| Test Strategy | Suggested | Recommended |
| Thermal Design | Recommended | Essential |
| Worst Case Circuit Analysis | | |
2.4 Advantages Versus Disadvantages
To help in the selection of tasks, Table 4 lists the major advantages and disadvantages of each. Product-specific factors can add to or detract from these advantages and disadvantages. For example, a very complex computer system that is expected to run for long periods without interruption will probably need fault tolerance; in that case, the advantage fault tolerance offers for overall product reliability carries more weight.
Table 4. Advantages Versus Disadvantages
| Tasks | Advantages | Disadvantages |
| Accelerated Testing | Reveals component deficiencies in less test time | Restricted by time and component availability |
| Allocations | Establishes design parameters and goals for components | May restrict use of some components |
| Critical Items | Reduces risk of new technology | New technologies may be excluded |
| Derating | Safety design margins and low cost to apply | Size and weight may be compromised |
| Design of Experiments | Statistical study of response variables to select best design approach | Results may not be compatible with manufacturing |
| Design Reviews | Determines progress toward meeting needs | Takes time and money to perform |
| Dormancy Analysis | Identifies storage and non-operating failure modes/conditions | Possible high cost involved |
| Durability Assessment | Measures life before wearout, identifies problems | High cost to perform for many items |
| Environmental Characterization | Establishes an operating envelope for design | May add extra features to achieve needs (potential overdesign) |
| Failure Mode and Effects Analysis | Identifies system critical paths and failure modes | Moderate to high cost to perform |
| Failure Reporting and Corrective Action | Identification of design & manufacturing problems | Takes a dedicated program to get results |
| Fault Tolerance | Eliminates or reduces operational failures | Each extra item adds cost, size & weight |
| Fault Tree Analysis | Identifies safety-related events for possible redesign | Moderate to high cost to perform |
| Finite Element Analysis | Stress response for new or unique components | Very high cost to perform for many items |
| Life Cycle Planning | Total program cost may be significantly reduced | Non-value-added tasks may be required |
| Modeling & Simulation | Benchmarks design alternatives | May add unneeded extra features (overdesign) |
| Part Application | Standard process procedures reduce variability | Limits manufacturing flexibility |
| Part Obsolescence | Parts from multiple sources can be chosen | Limits the technologies available |
| Part Selection | Known quality standards available | Limits technologies, process difficulties |
| Predictions | Provides feedback in limiting high-stress areas | Takes time and money to perform |
| Repair Strategy | Establishes design parameters for test and replacement | May add extra features or operating constraints |
| Sneak Circuit Analysis | Detects hidden failures or unwanted conditions in electronic circuits | High cost to perform manually, lower if automated |
| Test Strategy | Avoids duplication & schedule problems | May not be necessary for many programs |
| Thermal Design | Limits high-stress, failure-prone conditions | May add costly heat removal techniques |
| Worst Case Circuit Analysis | Accounts for circuit or component variability in product performance | High cost to perform for many items |
2.5 Tailoring Guidelines
Each product has its own unique needs, so tailoring the reliability tasks to fit those needs is important for keeping cost and schedule within product development constraints. The guidance provided in Figure 3 is not program specific, since the data were obtained from a survey of a number of commercial companies and Air Force contracts. As the figure shows, a failure reporting and corrective action system (FRACAS) and design reviews are the two most popular tasks. Commercial and military practices differ for failure modes, effects and criticality analysis (FMECA) and for thermal analysis, probably because of differences in product complexity and operating environment.
Figure 3. Most Important Reliability Tasks According to Survey
Some general task benefit effectiveness information has been formulated as a guide and is shown in Table 5 to aid in selecting tasks for a given program. The table rates each task for four conditions, each at two extremes (e.g., simple versus complex), on a benefit scale of 0 to 5, with 5 indicating the most benefit. To determine the more beneficial tasks, choose the extreme that applies for each condition and add the corresponding ratings for each task. For example, if a product is relatively simple, is used in a non-critical, benign environment and is manufactured in a limited production run, the tasks that should be considered are: Part Selection (rating-6), Environmental Characterization (rating-5), Part Application (rating-4), and Failure Reporting (rating-4). The first two tasks are considered most beneficial for these conditions. If a severe environment is expected for the same product, then Part Selection (rating-10), Environmental Characterization (rating-9), Part Application (rating-8), Failure Reporting (rating-8) and Thermal Design (rating-5) would be the most beneficial. Other tasks that could be considered include Derating (rating-4) and Critical Items (rating-3). Each product development effort should take these template results as a starting point, to be modified according to the operating modes, customer needs and business environment.
Table 5. Reliability Task Benefits
| Design Tasks | Complexity (Simple/Complex) | Criticality (Non-Critical/Critical) | Production Quantity (Few/Many) | End-Use Environment (Benign/Severe) |
| Accelerated Testing | 0/0 | 0/1 | 0/1 | 0/1 |
| Allocations | 0/3 | 0/3 | 0/1 | 1/2 |
| Critical Items | 0/3 | 0/5 | 0/2 | 0/3 |
| Derating | 1/3 | 0/3 | 0/0 | 0/3 |
| Design of Experiments | 0/0 | 0/1 | 0/0 | 0/1 |
| Design Reviews | 1/3 | 1/4 | 1/2 | 1/2 |
| Dormancy Analysis | 0/0 | 0/1 | 0/0 | 0/2 |
| Durability Assessment | 0/0 | 0/1 | 0/0 | 0/1 |
| Environmental Characterization | 2/4 | 2/5 | 0/0 | 0/1 |
| Failure Mode Analysis | 0/4 | 0/5 | 0/2 | 0/2 |
| Failure Reporting | 1/3 | 1/5 | 1/5 | 1/5 |
| Fault Tolerance | 0/1 | 0/3 | 0/0 | 0/1 |
| Fault Tree Analysis | 0/1 | 0/2 | 0/0 | 0/1 |
| Finite Element Analysis | 0/0 | 0/1 | 0/0 | 0/1 |
| Life Cycle Planning | 0/1 | 0/1 | 0/1 | 0/1 |
| Modeling & Simulation | 1/3 | 0/3 | 0/1 | 1/2 |
| Part Application | 1/4 | 1/5 | 1/4 | 1/5 |
| Part Obsolescence | 0/1 | 0/1 | 0/0 | 0/0 |
| Part Selection | 2/5 | 1/5 | 2/5 | 1/5 |
| Predictions | 0/3 | 0/3 | 0/1 | 0/2 |
| Repair Strategy | 0/1 | 0/1 | 0/2 | 0/2 |
| Sneak Circuit Analysis | 0/1 | 0/2 | 0/0 | 0/2 |
| Test Strategy | 0/1 | 0/2 | 0/1 | 0/2 |
| Thermal Design | 0/4 | 0/4 | 0/0 | 0/5 |
| Worst Case Circuit Analysis | 0/0 | 0/2 | 0/0 | 0/2 |
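The summing procedure described above for Table 5 can be sketched in Python. This is an illustrative sketch, not part of the Blueprint: the `RATINGS` dictionary transcribes only a handful of rows from Table 5, and the names `RATINGS`, `score` and the condition keys are assumptions. Each condition is set to 0 for its low extreme (simple, non-critical, few, benign) or 1 for its high extreme.

```python
# Benefit ratings for a subset of rows in Table 5.
# Each condition maps to (rating at low extreme, rating at high extreme):
#   complexity: (simple, complex)    criticality: (non-critical, critical)
#   quantity:   (few, many)          environment: (benign, severe)
RATINGS = {
    "Critical Items":    {"complexity": (0, 3), "criticality": (0, 5), "quantity": (0, 2), "environment": (0, 3)},
    "Derating":          {"complexity": (1, 3), "criticality": (0, 3), "quantity": (0, 0), "environment": (0, 3)},
    "Failure Reporting": {"complexity": (1, 3), "criticality": (1, 5), "quantity": (1, 5), "environment": (1, 5)},
    "Part Application":  {"complexity": (1, 4), "criticality": (1, 5), "quantity": (1, 4), "environment": (1, 5)},
    "Part Selection":    {"complexity": (2, 5), "criticality": (1, 5), "quantity": (2, 5), "environment": (1, 5)},
    "Thermal Design":    {"complexity": (0, 4), "criticality": (0, 4), "quantity": (0, 0), "environment": (0, 5)},
}


def score(profile):
    """Rank tasks by total benefit for one product profile.

    profile maps each condition name to 0 (low extreme) or 1 (high extreme).
    Returns (task, total) pairs, highest total first.
    """
    totals = {
        task: sum(ratings[condition][extreme] for condition, extreme in profile.items())
        for task, ratings in RATINGS.items()
    }
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    simple_benign = {"complexity": 0, "criticality": 0, "quantity": 0, "environment": 0}
    simple_severe = dict(simple_benign, environment=1)
    print(score(simple_benign))
    print(score(simple_severe))
```

For the simple, non-critical, low-volume product in the example, Part Selection totals 6; switching `environment` to 1 raises it to 10 and brings Thermal Design to 5. Further rows can be added to `RATINGS` in the same form.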
SECTION THREE - RELIABILITY DESIGN TASKS
3.1 Life Cycle Planning
3.1.1 Purpose. Basic constraints on design practices are design life and operational and environmental profiles. Life cycle planning considers what design approaches should be used to ensure that a product will perform reliably throughout its useful life, given the expected characteristics of its end-use environment (operation, storage, shipping, handling, etc.).
3.1.2 Benefits. Life cycle planning provides an assessment of the "big picture" in determining how to most effectively (reliable performance over the life of the product) and most efficiently (minimize product cost) meet the long-term needs of the customer. Thorough life cycle planning means that product designers are aware of the imposed constraints (performance, reliability, cost and schedule) and will use only those value-added design approaches which will meet those constraints.
3.1.3 Timing. Product life cycle characteristics need to be defined early in the Concept/Planning phase. Preferred design approaches cannot be selected, however, until the customer needs (or requirements) are thoroughly understood, through techniques such as market surveys or Quality Function Deployment (QFD), and the expected end-use operational and environmental profiles have been estimated analytically or measured.
3.1.4 Application Guidelines. In designing for reliability, life cycle planning activities should include the selection and analysis of materials, parts, components and software (and their respective suppliers) that will meet product life requirements. Tasks that can directly affect this aspect of product hardware design, either through direct selection or through trade studies, include:
- Environmental characterization
- Durability assessment
- Thermal design (high level)
- Design of experiments (DOE)
- Dormancy analysis
- Accelerated testing
Effective and appropriate application of these tasks will result in (1) a realistic assessment of the conditions under which the product is expected to operate and (2) a means of evaluating materials, parts and components as being suitable to withstand the rigors of the end-use environment. Establishing derating policies, fault tolerance approaches, and critical item definitions will also help define which materials, parts, components and software should be selected to ensure satisfactory design reliability. Once the design approach has been selected, life cycle planning can be extended to include those tasks which will assess progress towards meeting the design reliability requirements, measure the level of achieved inherent reliability and ensure that the inherent reliability of the product is not degraded through subsequent production/manufacturing processes and customer use.
3.2 Critical Item Identification and Control
3.2.1 Purpose. The purpose of a critical item identification and control procedure is to record, analyze, plan and limit the negative reliability impact of using highly complex, advanced state-of-the-art parts and techniques in new or modified product designs.
3.2.2 Benefits. Identifying and controlling critical items is imperative since these parts are often the parts that drive unreliability. The benefits of implementing a critical item control process include:
- Reduction of limited life items (components that wear out before normal end of life) in the product design
- Development of application guides to reduce electronic circuit sensitivity
- Development of competing sources for parts and components
- Use of safety risk prevention design guides
- Development of special tests to assess ability of critical parts to meet design constraints
3.2.3 Timing. A critical item identification process should be implemented in the Concept/Planning phase, because this is when trade-offs in component technologies, part sources and process techniques can be made with minimum impact on design and production costs and schedule. Waiting until the Design/Development or Production/Manufacturing phases to act on critical items will likely result in excessive costs.
3.2.4 Application Guidelines. Reliability critical items are those items that have a significant impact on product reliability, performance, safety, availability or life cycle cost. Critical items often include high cost components, new technology, limited life items, reliability sensitive parts, single source or custom components and single failure points (failures that cause total loss of product operation).
The general approach to performing a critical item identification and control process involves three steps. Step one is the identification process. Critical items can be identified by a number of techniques, such as those described in Table 6.
Table 6. Critical Item Identification Techniques
| Technique | Advantage | Disadvantage |
| Reliability Prediction | Early analysis, considers all components, quick | Not specific, may overlook some parts |
| Failure Mode and Effects Analysis | Complete, considers safety and single point failures | Significant cost, difficult to perform in early phases |
| Fault Tree Analysis | Alternative to FMEA, considers safety | May overlook some parts, significant cost |
| Packaging, Handling Assessment | Covers unusual failures, e.g., electrostatic discharge, overstress | Limited scope, not all items covered |
| Historical Data | Identifies new and old problem technologies | Limited in ability to address new technology |
In step two, analysis or testing may be needed to confirm the problem area or to control the application of the item. Reliability critical items are prime candidates for detailed analyses such as worst case analysis and part stress derating analysis. These analyses can identify misapplication of components or design flaws, and testing can reveal problematic stress levels under specific operating conditions over time.
Control of the critical item is the third step. This is usually done through design reviews, documentation of design decisions and tracking the performance of critical items through the product development cycle. A systematic and comprehensive method is needed from start to finish if critical items are to be effectively managed.
A typical critical item output report is shown in Figure 4. To ensure that reliability critical components were identified and controlled, reliability predictions and an FMEA were performed. Critical items were identified via the failure rates noted in the prediction and by the single failure point in the FMEA. A potential problem with a single-source vendor was identified through an evaluation of historical experience.
| System/Subsystem: Radio | Nomenclature/Identification #: Hi-Power/XYZ | Date: 1/1/99 | Prepared By: J. Doe |
| Indenture Level: Amplifier | Reference Drawing #: XYZ-1 | Operating Condition: Automotive |

| Identification # | Critical Item | Identification Technique | Control/Corrective Actions | Initiate Date | Remarks |
| 1 - XYZ-1 | Power transistor | Failure mode: single failure item; Prediction: high failure rate | Redundant devices; heat sink transistors | xx/xx; xx/xx | Multi-path reliability; lower temp, fewer failures |
| 2 - XYZ-1 | Speaker | Historical data | Evaluate single source vendor quality; develop second source | xx/xx; TBD | Reduce initial defects; competition |
Figure 4. Critical Item Form
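Step one of the process above (identification) can be partly automated once prediction, FMEA and sourcing results are available for each part. The sketch below is a minimal illustration under that assumption: the `Item` type, the `critical_items` function, the 25% share-of-total failure rate threshold and the example parts and values are all hypothetical and are not taken from Figure 4 or from any RIAC procedure.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    failure_rate: float         # predicted failures per million hours
    single_failure_point: bool  # flagged by the FMEA
    single_source: bool         # flagged by historical/vendor data


def critical_items(items, share_threshold=0.25):
    """Screen a parts list for candidate critical items.

    An item is flagged if its predicted failure rate is at least
    share_threshold of the total (a hypothetical cut-off), if the FMEA
    marks it as a single failure point, or if it is single-sourced.
    Returns (name, reasons) pairs for flagged items only.
    """
    total = sum(item.failure_rate for item in items)
    flagged = []
    for item in items:
        reasons = []
        if total > 0 and item.failure_rate / total >= share_threshold:
            reasons.append("prediction: high failure rate")
        if item.single_failure_point:
            reasons.append("FMEA: single failure point")
        if item.single_source:
            reasons.append("historical data: single source")
        if reasons:
            flagged.append((item.name, reasons))
    return flagged


# Hypothetical amplifier parts list (values invented for illustration).
EXAMPLE_PARTS = [
    Item("Power transistor", 2.0, True, False),
    Item("Speaker", 0.3, False, True),
    Item("Resistor", 0.1, False, False),
    Item("Capacitor", 0.6, False, False),
]

if __name__ == "__main__":
    for name, reasons in critical_items(EXAMPLE_PARTS):
        print(name, "->", "; ".join(reasons))
```

With these invented values, the power transistor is flagged twice (about 67% of the predicted failure rate, plus a single failure point) and the speaker once (single source), the kind of entries recorded on a critical item form. Steps two and three, confirmation and control, still require engineering judgment.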
3.3 Derating
3.3.1 Purpose. The purpose of derating is to enhance the inherent design reliability of an item, increase safety margins and reduce repair and replacement costs. Derating achieves this by compensating for many of the variables inherent in any design, including:
- Manufacturing tolerances
- Performance anomalies
- Component variation
- Parameter drift
- Material differences
3.3.2 Benefits. For electronic components, the benefits include lower failure rates through reduced stresses, less impact from material and manufacturing variability, proper circuit operation despite changes in part parameters, and fewer end-of-life failures. For mechanical and structural components, a reduction in stress or an increase in strength provides a greater factor of safety against catastrophic failure.
3.3.3 Timing. Derating principles should be defined in the Concept/Planning phase of the product life cycle and be implemented during the product Design/Development process. This is the time when item strengths or stresses can be anticipated and compensated for. Attempting to derate after designs have been "formalized" will be less effective because there will be less flexibility to change the design without significant added size, weight and cost.
3.3.4 Application Guideline. Achieving high product reliability requires the proper selection and application of electronic, mechanical and structural components. Designing the product so that these components are capable of withstanding all of the stresses that are expected requires the application of sound derating guidelines and practices.
Method for Electronic Parts. Electronic part derating is accomplished by reducing the applied stress below the absolute maximum ratings defined by the part manufacturer in the part specification or data sheet. The absolute maximum ratings usually include, at a minimum, operating and storage temperature limits and maximum ratings for voltage, current, power and other stresses. For most parts, reliability is a function of both electrical and thermal stresses, and lower stress conditions improve reliability. Cost effectiveness must be considered when selecting derating levels, as excessive derating may result in higher part costs or additional equipment, e.g., cooling fans. Table 7 lists the stress factors that should be limited in any electronic derating strategy for the various component types.
Table 7. Electronic Part Reliability Derating Factors
| Part Type | Derating Parameters |
| Capacitor | Voltage, Temperature |
| Circuit Breakers | Current, Load, Temperature |
| Connectors | Voltage, Current, Insert Temperature |
| Diodes | Power, Temperature |
| Filters | Voltage, Current, Temperature |
| Fiber Optics | Bend Radius, Tension |
| Fuses | Current, Voltage |
| Inductors | Voltage, Current, Temperature |
| Lamps | Voltage |
| Microcircuits | Voltage, Current, Temperature |
| Relays | Current, Load, Temperature |
| Resistors | Voltage, Power, Temperature |
| Switches | Current, Load, Temperature |
| Transistors | Voltage, Power, Temperature |
| Tubes | Power, Duty Cycle |
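For the ratio-type parameters in Table 7 (voltage, current, power), a derating rule can be expressed as a maximum allowed ratio of applied stress to the manufacturer's rated value. The sketch below checks one parameter against such a rule. The limits in `DERATING_LIMITS` and the function name `check_derating` are placeholders chosen for illustration, not recommended derating levels; real limits come from the organization's derating policy or a published derating guideline. Temperature derating is usually expressed differently and is not covered here.

```python
# Placeholder derating limits: maximum allowed ratio of applied stress to the
# manufacturer's absolute maximum rating. Illustrative values only.
DERATING_LIMITS = {
    ("Capacitor", "Voltage"): 0.60,
    ("Resistor", "Power"): 0.50,
    ("Transistor", "Power"): 0.50,
}


def check_derating(part_type, parameter, applied, rated):
    """Return (stress_ratio, within_limit) for one ratio-type parameter."""
    if rated <= 0:
        raise ValueError("rated value must be positive")
    limit = DERATING_LIMITS[(part_type, parameter)]  # KeyError if no rule defined
    ratio = applied / rated
    return ratio, ratio <= limit


if __name__ == "__main__":
    print(check_derating("Capacitor", "Voltage", applied=25.0, rated=50.0))  # (0.5, True)
    print(check_derating("Resistor", "Power", applied=0.4, rated=0.5))
```

Raising a `KeyError` for an undefined part/parameter pair is a deliberate choice in this sketch: a missing rule is treated as an error rather than silently passing the part.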
Method for Mechanical Parts and Structures. For these components, failure versus stress data may be available from the material or component supplier and can be used as the design guide. When the failure distribution is time dependent (wearout), the stress and strength distributions should be related to the number of operating cycles or the operating time. Since failure is not always related to time, the designer also needs techniques for comparing stress and strength directly. The classical approach has been to select every part with enough strength to handle the worst-case stress conditions represented in Figure 5 (A); this shifts the strength curve to the right (Figure 5 (B)), thereby reducing the potential failure region where the two distributions overlap. More recent approaches take into account the probability of "interference" between the stress and strength distributions. Procedures and data for these analyses can be obtained from strength of materials reference sources. The purpose of these stress analyses is to improve the reliability of the design by achieving the optimum balance between stress and strength.
Figure 5. Stress Versus Strength Distributions
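The interference approach mentioned above has a closed form when stress and strength can be approximated as independent normal distributions: reliability R = P(strength > stress) = Φ((μ_strength - μ_stress) / sqrt(σ_strength² + σ_stress²)), where Φ is the standard normal cumulative distribution function. Normality and independence are assumptions of this sketch, not statements from the text, and the function name `interference_reliability` and the numbers in the example are hypothetical (arbitrary stress units).

```python
from math import sqrt
from statistics import NormalDist


def interference_reliability(mu_strength, sd_strength, mu_stress, sd_stress):
    """Probability that strength exceeds stress.

    Assumes stress and strength are independent and normally distributed;
    the difference (strength - stress) is then normal with mean
    mu_strength - mu_stress and variance sd_strength**2 + sd_stress**2.
    """
    spread = sqrt(sd_strength ** 2 + sd_stress ** 2)
    if spread == 0:
        raise ValueError("at least one distribution must have nonzero spread")
    z = (mu_strength - mu_stress) / spread  # safety margin in standard deviations
    return NormalDist().cdf(z)


if __name__ == "__main__":
    print(interference_reliability(60, 4, 45, 3))  # about 0.99865 (z = 3)
    print(interference_reliability(70, 4, 45, 3))  # strength shifted right (z = 5)
```

With a mean strength of 60 (standard deviation 4) against a mean stress of 45 (standard deviation 3), z = 15/5 = 3 and R is about 0.9987; shifting the strength mean to 70, as in Figure 5 (B), gives z = 5 and R above 0.999999. Real stress and strength data are often better described by other distributions (e.g., Weibull), in which case the interference integral is usually evaluated numerically.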
Example of a Stress Failure Analysis. An electronic component (a trimmer ceramic capacitor) is used to illustrate the difference between stress and strength derating. Figure 6 shows failure rate versus temperature curves for four voltage stress conditions, where stress equals operating voltage divided by rated voltage. At an operating temperature of 60°C and a 90% applied voltage stress, the failure rate of the trimmer capacitor is 0.10 failures per million hours. Suppose the designer judges this failure rate to be unacceptable. One way to improve reliability is to increase the strength of the capacitor, e.g., by choosing a capacitor with a higher voltage rating, so that the stress level is reduced to 70%, resulting in a lower failure rate. This may impose some size and weight penalty on the product. Another way is to reduce the stress, that is, to keep the same capacitor but increase the cooling so that the temperature is reduced to 20°C. At that point the capacitor failure rate would be the same as with strength derating, while the size and weight of the product remain unchanged.
Figure 6. Trimmer Ceramic Capacitor Failure Rate/Stress Plot
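The voltage stress ratio used in the capacitor example is simple arithmetic, and the strength-derating option amounts to choosing a rating high enough to bring the ratio down to the target. In the sketch below, the operating voltage of 45 V and the 50 V rating are hypothetical values chosen to reproduce the 90% stress case, and the function names are illustrative; the failure rates themselves must still be read from curves such as those in Figure 6.

```python
def voltage_stress(v_operating, v_rated):
    """Voltage stress ratio as used in Figure 6: operating / rated voltage."""
    if v_rated <= 0:
        raise ValueError("rated voltage must be positive")
    return v_operating / v_rated


def min_rating_for_stress(v_operating, target_stress):
    """Smallest rated voltage that keeps the stress ratio at or below target."""
    if not 0 < target_stress <= 1:
        raise ValueError("target_stress must be in (0, 1]")
    return v_operating / target_stress


if __name__ == "__main__":
    v_op = 45.0  # hypothetical operating voltage
    print(voltage_stress(v_op, 50.0))         # 0.9 -> the 90% stress case
    print(min_rating_for_stress(v_op, 0.70))  # about 64.3 V needed for 70% stress
```

A 45 V operating voltage on a 50 V part gives a stress of 0.9; holding stress to 70% requires a rating of at least about 64.3 V, and in practice the next available rating at or above that value would be chosen.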
3.4 Design Reviews
3.4.1 Purpose. Depending on the stage of development, design reviews may be conducted for different reasons. Some of the purposes that should be considered for most reviews are to:
- Ensure that the product design is reliable.
- Assess the product safety margins.
- Evaluate the ease of maintenance and inspection.
- Determine if the product is manufacturable.
- Review the allocation of design requirements and analyze the product for compliance.
- Discuss product transition concerns, i.e., design to production, production to customer use, design to customer use.
- Challenge the design from various viewpoints, i.e., safety, environment, operation, human interface, etc.
- Determine the shortfalls of the product and issues to be resolved.
- Evaluate systems engineering and manufacturing processes and procedures.
3.4.2 Benefits. The primary benefit of an organized design review is a detailed evaluation of the product to ensure that the design or production process is technically adequate to meet the requirements for performance, cost and quality. When properly performed, the design review ensures that no unique area of concern has been overlooked and that lessons learned from previous efforts have been leveraged, so that fewer deficiencies reach the next phase of product development. Finding and solving concerns, errors and design faults through product reviews results in fewer redesigns, lower production costs and increased product life.
3.4.3 Timing. To be effective, design reviews should be an ongoing process, not a one-time occurrence. A review should be conducted at each stage of product design, development and production before progressing to the next phase. Some of the milestones that should be considered as potential design review points are:
- Completion of the customer requirements assessment (actual or derived).
- Completion of the specification and requirement allocation process.
- Completion of the initial design phase.
- Completion of the final design phase.
- Completion of prototype testing.
- Completion of the initial manufacturing phase.
3.4.4 Application Guidelines. Design reviews can be conducted at almost any point within the design process to assess the product design maturity. If concurrent engineering techniques are used, the reviews can become part of an ongoing, day-by-day process. For products with scheduled reviews, typical review milestones and some key characteristics are presented in Figure 7.
Figure 7. Potential Stages for Design Reviews
Formal Reviews. The formal review of product design concepts and documentation for hardware and software can be an important event in any development program. If standard procedures are not explicitly stated by the customer or dictated by internal policy, the approach outlined in Figure 8 should be considered:
Figure 8. Approach to Formal Design Reviews
Table 8 recommends review team participants (which should include actual designers and independent evaluators) and their responsibilities.
Table 8. Design Review Membership
| Member | Responsibilities |
| --- | --- |
| Product Design Engineer | Chairs the meeting, issues reports, assigns problems and is responsible for closing the loop. Substantiates design decisions, capabilities, tests, costs and schedules. |
| Electrical Engineer | Confirms the electrical capabilities and limitations of the design, such as overstress, operating restrictions, etc. |
| Mechanical Engineer | Evaluates the design in terms of packaging, environment, handling, strength of materials, etc. |
| Software Engineering | Ensures operational compatibility; evaluates hardware-to-software interfaces. |
| Manufacturing | Evaluates the design in terms of manufacturing limitations, cost and schedule. |
| Quality Engineering | Substantiates the quality methods employed and implemented. |
| Reliability Engineering | Evaluates the design for capability versus the customer need. |
| Human Factors Engineering | Identifies man-machine interface capabilities and limitations. |
| Customer Representative | Requests investigations, challenges the design and determines the acceptability of the design. |
Informal Reviews. These reviews are generally conducted to help the product designer achieve the appropriate degree of design maturity early in the design process. A review of stresses, component failure rates, fault tolerant operation and modeling is performed to evaluate and guide the designer in specific areas of product reliability. These reviews are usually unscheduled and are conducted during the Concept/Planning phase, or very early in the Design/Development phase, when major product design changes may still be considered.
From a reliability perspective, the review should accomplish at least the following:
- Detect conditions that degrade reliability.
- Provide assurance of meeting the customer's reliability needs.
- Assure use of preferred parts and components.
- Assure safety margins are included.
- Assure quality management is integrated into the process.
- Provide stress analysis of components where needed.
- Use fault tolerance or fail soft designs for critical applications.
- Evaluate critical items and control procedures.
Examples of Design Review Checklists. Design review checklists include specific questions that should be considered when a product is being reviewed. Typical checklists for a reliability review during the product Concept/Planning and Design/Development phases are provided as examples in Tables 9 and 10.
Table 9. Concept/Planning Phase Reliability Review Checklist
| Questions | Remarks |
| --- | --- |
| Product design concept meets minimum customer reliability expectations? | Reliability modeling, fault tolerance, component selection should be examined. |
| Safety margins are sufficient for operation? | Standard criteria for safety, fault tolerance, strength of materials should be reviewed. |
| Numerical reliability estimates meet allocated needs? | Cooling, quality, redundancy, parts count reduction and lower stress levels should be considered. |
| Product can operate in the expected environment? | Cooling, vibration, shock, packaging, components and stress are all examined. |
| Stress derating strategy for components is defined? | Derating criteria should be documented. |
| Critical components are identified? | Define, examine, analyze, and test components for criticality. |
| Limited life items are identified? | Inspection, handling, testing, and replacement techniques should be considered. |
| Test or operational data is available to ascertain product performance? | Evaluation technique, failure trends, operating environment should be examined. |
| Trade-off studies have been performed? | Includes reliability performance, better parts, cooling, power, speed, complexity and others. |
Table 10. Design/Development Phase Reliability Review Checklist
| Questions | Remarks |
| --- | --- |
| Reliability design goals/objectives at each level achieved? | Allocations, models, predictions and tests are evaluated. |
| Performance indicators are included in the design? | Fault flags, software testing, built-in-test parameters need to be estimated. |
| Critical parts are identified? | Spares, maintenance, operating procedures need to be assessed. |
| Preferred parts and components selected? | Known capabilities and quality levels are needed. |
| Safety margins are sufficient for each component and subassembly? | Allocation of standard criteria is performed. |
| Derating of component stress is implemented? | Standard design levels for better performance considered. |
| Fault tolerance included in product design? | Fail soft conditions need to be evaluated. |
| Early failure and wearout problems identified? | Limit conditions, testing, and inspection criteria are defined. |
| Environmental conditions match the component profiles? | Extra cooling, stress reduction or better components are evaluated. |
| Failure modes for components are identified? | Failure mode analysis, test and historical data evaluated. |
| Single failure points and their impact on the product have been identified? | Failure mode criticality analysis needed; identifies areas for redundancy. |
| Software reliability impact has been assessed? | Code failures, design flaws, specification errors accounted for. |
| Adequate corrosion protection? | Environment and protection need to be evaluated. |
| Protection devices are included? | Fuses, circuit breakers, sprinklers need to be considered. |
3.5 Environmental Characterization
3.5.1 Purpose. The purpose of this task is to identify the possible conditions that may be encountered, determine the environmental influence factors and evaluate possible design solutions. Solving the design problem requires knowledge about the environmental conditions, evaluation of material properties and the failure effects caused by the environment.
3.5.2 Benefit. The benefits of environmental characterization include fewer failures and better product operation. When the characteristics of the end-use environmental profile are known and understood, the appropriate components can be selected and protective design measures can be used, both of which help prevent overstressing of individual components.
3.5.3 Timing. This task should be defined during the Concept/Planning phase of the product life cycle, and implemented into the product design at the very first stages of product Design/Development concurrently with power, size, performance, weight and cost constraints. Delays in environmental characterization and compensation for environmental impacts on the design are not recommended, as untimely implementation of corrective action will typically be inefficient in terms of performance, schedule and cost.
3.5.4 Application Guidelines. Designing for reliability requires identifying and controlling the possible causes of component or product failure. In general, the causes of component failure can be classified as follows:
- Inadequate electrical or mechanical design tolerances
- Incorrect choice of materials
- Insufficient quality procedures or controls
- Deterioration of components due to environmental effects
- Manufacturing or component defects
The choice of materials and the understanding and control of environmental effects are two design attributes that can be improved by a well-structured reliability program.
Many approaches to environmental characterization can be used. The following steps provide some general guidelines:
- Identify the natural and induced environmental characteristics that the product may experience in its life cycle. Figures 9 and 10 list these characteristics for the basic conditions of shipping, storage and operation, noting the natural and induced environmental stress factors for a number of possible conditions.
Excerpt from "Figure 9. Shipping and Storage Environmental Stresses"
Excerpt from "Figure 10. Operating Environmental Stresses"
- Quantify the high and low extremes and the rate of change for each of the environmental stresses. These factors can be determined from historical experience data, measurement data or engineering estimates. An example of the use of engineering estimates is the application of world climatic region temperature data; the data in Table 11 can be used to estimate product temperature extremes.
Table 11. World Climatic Region Temperatures
| Region | Operational Temperature | Induced Temperature |
| --- | --- | --- |
| Hot & Dry | +32°C to +49°C | +33°C to +71°C |
| Hot | +30°C to +43°C | +30°C to +63°C |
| Mild Cold | -6°C to -19°C | -10°C to -21°C |
| Cold | -21°C to -27°C | -25°C to -33°C |
| Severe Cold | -37°C to -51°C | -37°C to -51°C |
- Evaluate the expected environmental conditions from the previous steps and compare these conditions with the component field failures to determine the conditions that have the most impact. Figure 11 describes an environmentally caused field failure distribution for an aircraft. From this distribution, the three most important areas for reliability design concern are temperature, vibration and moisture.
Figure 11. Component Field Failure Cause Distribution, Aircraft Environment
- The final step, after determining which stress factors are important, is to implement a design process to eliminate or reduce the impact. An example for reducing high temperature impact could be:
| Stress Factor | Source | Expected Effects | Improvement Techniques |
| --- | --- | --- | --- |
| High Temperature | Ambient | Fatigue or tolerance | Reduce ambient, add cooling systems |
| | Friction | Wear-out or structural failure | Better lubricant or reduce heat sources |
| | Electronic | Parameter change | Derate components |
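The second step above, using Table 11 as an engineering estimate, can be sketched as a lookup that merges every region a product may see in shipping, storage or use into one temperature envelope and checks it against a component's rated range. The function names and the -40°C to +85°C rating are hypothetical, and the sketch ignores component self-heating, which adds to these temperatures in an operating product.

```python
# Table 11 ranges in deg C, stored as listed: region -> (operational, induced).
TABLE_11 = {
    "Hot & Dry": ((32, 49), (33, 71)),
    "Hot": ((30, 43), (30, 63)),
    "Mild Cold": ((-6, -19), (-10, -21)),
    "Cold": ((-21, -27), (-25, -33)),
    "Severe Cold": ((-37, -51), (-37, -51)),
}


def temperature_envelope(regions, induced=True):
    """Coldest and hottest temperatures across the selected regions.
    min/max are used because some ranges are listed high-to-low."""
    column = 1 if induced else 0
    temps = [t for region in regions for t in TABLE_11[region][column]]
    return min(temps), max(temps)


def rating_covers(rated_min, rated_max, envelope):
    """True if a component's rated range spans the whole envelope."""
    low, high = envelope
    return rated_min <= low and high <= rated_max


envelope = temperature_envelope(["Hot & Dry", "Cold"])
print(envelope)                          # (-33, 71)
print(rating_covers(-40, 85, envelope))  # True
print(rating_covers(-20, 70, envelope))  # False
```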
Example of a Design Evaluation for Environmental Conditions. Consider the environmental impact on an automotive computer system mounted in the engine compartment. A typical reliability design improvement evaluation table is shown in Table 12.
Table 12. Environmental Stresses and Design Improvement Techniques
| Stress Factors | Expected Effects | Failures Induced | Improvement Techniques |
| --- | --- | --- | --- |
| High Temperature (32 to 49°C) | Thermal aging; physical expansion; electrical parameter change | Fatigue or change in material properties; structural failure or increased wear; tolerances exceeded | Reduce heat dissipation; cool systems or better thermal properties; design safety margins; derate components |
| Low Temperature (-21 to -24°C) | Flexibility reduced; ice formation; physical contraction | Cracking or fracturing; change in electrical or mechanical functions; structural failure or increased wear | Thermal insulation; protective materials; design safety margins |
| Thermal Shock (30°C/30 min) | Mechanical stress | Structural failure; cracks or fatigue; delamination; ruptured seals | Strengthen materials; reduce thermal inertia; match thermal coefficients of expansion |
| Humidity (90 to 95% RH) | Moisture absorption; corrosion; electrolysis; electrical leakage | Structural weakening; loss of electrical or mechanical properties; conductivity of insulators; performance parameters | Hermetic sealing; protective coatings; material properties; dehumidifiers; larger tolerances |
| Salt Spray (once per week) | Corrosion; electrolysis | Loss of electrical properties; conduction of insulators | Protective coatings; material properties |
| Sand & Dust (45 mph; 0.001 to 0.01 in. diameter) | Surface abrasion; friction increased; clogging orifices | Wear-out; reduced functions; overstressing | Air filtering; protective finishes; seals, lubricants; screens, filters |
| Vibration (0.001 g²/Hz) | Mechanical stress; fatigue; electrical parameter change | Loss of structural strength; cracks, displacement of materials; interference, loss of signals | Increase safety margins; stiffening materials; design margins, vibration absorption |
| Rain (1 in. per hour) | Water absorption; corrosion | Structure or component weakening; loss of electrical or mechanical properties | Protective coatings; sealing |
| Electromagnetic Radiation | Spurious electrical signals; interference | Cause other components to perform erratically; loss of signals, disruption of operation | Part type selection; shielding material |
3.6 Fault Tolerance
3.6.1 Purpose. A fault tolerant product provides higher product functional reliability, an extra safety margin, non-stop operation, reduced downtime, and greater personnel safety in critical (e.g., life-saving) applications. One drawback of fault tolerance is that the extra hardware (added weight and size) may degrade performance, and the total life cycle cost of failures will be higher than for a less complex unit, since there is more hardware that can fail.
3.6.2 Benefits. The general benefits from fault tolerance are:
- Continued, uninterrupted operation in spite of lower level failures.
- Reduced propagation of errors.
- Minimized downtime through continued, though possibly degraded, operation.
- Correction of failures while the product is still operating.
3.6.3 Timing. Fault tolerant design approaches should be used during the early Design/Development phase as part of the product engineering process. Up-front design decisions will reduce the impact of the negative aspects of having extra hardware and functions. Add-on features incorporated at later phases of product development (i.e., Production/Manufacturing) are not cost or schedule efficient.
3.6.4 Application Guidelines. In general, all fault tolerant design techniques fall into two broad categories: fault masking and fault reaction. In the fault masking category, the design process adds extra features, circuits or equipment as in-line functions, and operation continues until all alternate paths are used up. Because this may not be an ideal situation, another approach is to include active fault detection that initiates functional reconfiguration of the product. Switching to a standby spare unit or to an alternate mode of operation is an example of such dynamic fault reaction.
Figure 12 indicates several basic forms of redundancy techniques available to product designers.
Figure 12. Redundancy Techniques
Examples of Redundancy Techniques. Table 13 illustrates the characteristics and configuration of three basic types of redundancy techniques.
Excerpt from "Table 13. Three Basic Redundancy Techniques"
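Standard textbook reliability models for three common forms of redundancy (active parallel, k-out-of-n voting and standby sparing) make these trade-offs concrete. The sketch assumes identical units and independent failures, a perfect voter for k-out-of-n, and, for standby, a constant failure rate with perfect fault detection and switching and no failures while the spare is dormant. These are simplifying assumptions, and the formulas are general results rather than the contents of Table 13, which is only excerpted here.

```python
from math import comb, exp


def active_parallel(r: float, n: int) -> float:
    """n identical units operating together; works if at least one unit works."""
    return 1.0 - (1.0 - r) ** n


def k_of_n(r: float, k: int, n: int) -> float:
    """Works if at least k of n identical units work (e.g., 2-of-3 majority voting)."""
    return sum(comb(n, i) * r ** i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


def standby_pair(failure_rate: float, hours: float) -> float:
    """One operating unit plus one unpowered spare: R(t) = exp(-lt) * (1 + lt)."""
    lam_t = failure_rate * hours
    return exp(-lam_t) * (1.0 + lam_t)


print(round(active_parallel(0.9, 2), 4))     # 0.99
print(round(k_of_n(0.9, 2, 3), 4))           # 0.972
print(round(standby_pair(1e-4, 1000.0), 4))  # 0.9953 (a single unit: 0.9048)
```

Each form raises reliability above the single-unit value, but only at the cost of the extra hardware and switching functions discussed in 3.6.1.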
3.7 Part Application
3.7.1 Purpose. The purpose of this task is to assure that suitable parts and materials are designed into the product so that it will operate reliably in the customer use environment.
3.7.2 Benefits. Sound part application criteria will result in fewer rejects during product fabrication, less rework and downtime during manufacturing/production, and fewer failures during customer use of the product. Specifically, this task assures that part capabilities match design needs, that the design reflects producibility requirements, and that the parts are compatible with the assembly/manufacturing processes.
3.7.3 Timing. Design philosophy procedures and requirements, including part application criteria, should be planned during the Concept/Planning product life cycle phase and implemented as an integral part of the product Design/Development phase. Misapplication of parts and materials can be costly, resulting in schedule delays and financial loss.
A study of the dominant environmental effects on part operating parameters is essential to robust circuit design. Additionally, a basic knowledge of semiconductor and component materials is invaluable to the analyst and designer in assessing these environmental effects. Applying this knowledge early in the Design/Development phase minimizes the later occurrence of reliability related problems.
3.7.4 Application Guidelines. Part application criteria are typically included in well defined reliability design processes that are intended to ensure high reliability under worst case actual use conditions. This requires a structured approach during the part selection and product design process. This process should include:
- Definition of operating environments
- Establishment of lifetime requirements
- Use of reliability models to estimate lifetime under use conditions
- Estimates of reliability during the useful life
- Stress derating
- Analysis and design modifications to ensure robustness
Critical product stresses encountered during use, transportation and storage dictate part application and design criteria and philosophy. Methodologies to ensure that products are designed and parts are applied in a robust manner are available and should be implemented, depending on the nature of the product and its application. These include reliability assessment/lifetime analysis, stress analysis, failure mode and effects analysis (FMEA) (with or without criticality analysis), worst case circuit analysis (WCCA), fault tree analysis (FTA), and finite element analysis (FEA). The effort should include participants having experience in the design, component, manufacturing and reliability disciplines.
A concurrent engineering approach should be used to identify external and internal influences that could affect part (and hence, product) performance, manufacturability or reliability. These influences include:
- Temperature Extremes
- Voltage
- Temperature Rate of Change
- Vibration
- Part Manufacturing Variation
- Shock
- Part Aging Characteristics
- Radiation
- Local Changes
- Electromagnetic Interference
Table 14 illustrates the most important and common environmental effects on parts.
Table 14. Part Types vs. Principal Sources of Variation
| Source of Variation | Transistor | Diode | Zener Diode | Digital IC | Linear IC | Resistor | Capacitor | Inductor | Relay |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Temperature | X | X | X | X | X | X | X | X | X |
| Aging | X | | | | | X | X | | X |
| Radiation | X | X | X | X | X | | | | |
| Vibration / Shock | | | | | | X | X | X | X |
| Humidity | | | | | | X | X | | |
| Life | | | | | | X | X | | |
| Altitude | | | | | | X | X | | |
| Electrical Stress | X | X | X | | | | X | | |
X: Significantly affected by the environmental factor
Part Application Derating Guidelines. Part application problems can result from severe product environmental and manufacturing stresses. One solution is to limit the electrical, thermal and mechanical stresses on parts through derating. If a product is expected to operate reliably, one of the contributing factors must be a conservative design approach incorporating realistic derating of parts. Table 15 presents guidelines for derating parameters of some electronic part types. All of the values provided are a percentage of the part manufacturer's rated value, unless otherwise labeled. The term "Not Applicable (NA)" refers to a scenario where maximum levels will not be reached under benign conditions. Two levels of derating are indicated. The benign level is for products operating in conditions such as an office environment. The severe level is for products operating in harsh conditions such as under the hood of an automobile or in high performance aircraft.
Excerpt from "Table 15. Part Derating Levels"
| Part Type | Derating Parameter | Severe Environment | Benign Environment |
| --- | --- | --- | --- |
| Capacitors | DC Voltage | 60% | 90% |
| | Temp from Max Limit | 10°C | NA |
| Circuit Breakers | Current | 80% | 80% |
| Connectors | Voltage | 70% | 90% |
| | Current | 70% | 90% |
| | Insert Temp from Max Limit | 25°C | NA |
| Diodes | Power Dissipation | 70% | 90% |
| | Max Junction Temp | 125°C | NA |
| Fiber Optics | Bend Radius | 200% | 200% |
| | Cable Tension | 50% | 50% |
| Fuses | Current (Maximum Capability) | 50% | 70% |
| Inductors | Operating Current | 60% | 90% |
| | Dielectric Voltage | 50% | 90% |
| | Temp from Hot Spot | 15°C | NA |
| Injection Laser | Power Output | 70% | 90% |
| Lamps | Voltage | 94% | 94% |
| Memories | Supply Voltage | ±5% | ±5% |
| | Output Current | 80% | 90% |
| | Max Junction Temp | 125°C | NA |
| Microcircuits | Supply Voltage | ±5% | ±5% |
| | Fan Out | 80% | 90% |
| | Max Junction Temp | 125°C | NA |
| Microcircuits, GaAs | Max Junction Temp | 135°C | NA |
| Microprocessors | Supply Voltage | ±5% | ±5% |
| | Fan Out | 80% | 90% |
| | Max Junction Temp | 125°C | NA |
| Photodiode | Reverse Voltage | 70% | 70% |
| | Max Junction Temp | 125°C | NA |
| Phototransistor | Max Junction Temp | 125°C | NA |
| Relays | Resistive Load Current | 75% | 90% |
| | Capacitive Load Current | 75% | 90% |
| | Inductive Load Current | 40% | 50% |
| | Contact Power | 50% | 60% |
| Resistors | Power Dissipation | 50% | 80% |
| | Temp from Max Limit | 30°C | NA |
| Transistor, Silicon | Power Dissipation | 70% | 90% |
| | Breakdown Voltage | 75% | 90% |
| | Max Junction Temp | 125°C | NA |
| Transistor, GaAs | Power Dissipation | 70% | 90% |
| | Breakdown Voltage | 70% | 90% |
| | Max Junction Temp | 135°C | NA |
| Thyristors | On-State Current | 70% | 90% |
| | Off-State Voltage | 70% | 90% |
| | Max Junction Temp | 125°C | NA |
| Tubes | Power Output | 80% | 90% |
| | Power Reflected | 50% | 60% |
| | Duty Cycle | 75% | 75% |
| Rotating Devices | Bearing Load | 90% | 90% |
| | Temp from Max Limit | 15°C | NA |
| Switches | Resistive Load Current | 75% | 90% |
| | Capacitive Load Current | 75% | 90% |
| | Inductive Load Current | 40% | 50% |
| | Contact Power | 50% | 60% |
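A derating review against Table 15 can be partly automated. The sketch below stores a few of the percentage-of-rating entries and checks an applied stress against the severe or benign limit. The dictionary layout and function names are hypothetical, only a subset of the table is encoded, and the temperature entries (absolute limits or offsets from a maximum) are deliberately left out because they are not percentages of a rating.

```python
# A few percentage-of-rating entries from Table 15, stored as (severe, benign) fractions.
DERATING = {
    ("Capacitors", "DC Voltage"): (0.60, 0.90),
    ("Diodes", "Power Dissipation"): (0.70, 0.90),
    ("Resistors", "Power Dissipation"): (0.50, 0.80),
    ("Transistor, Silicon", "Power Dissipation"): (0.70, 0.90),
    ("Transistor, Silicon", "Breakdown Voltage"): (0.75, 0.90),
    ("Relays", "Inductive Load Current"): (0.40, 0.50),
}


def allowed_stress(part: str, parameter: str, rated: float, environment: str) -> float:
    """Maximum applied value permitted by the derating guideline."""
    severe, benign = DERATING[(part, parameter)]
    if environment == "severe":
        return severe * rated
    if environment == "benign":
        return benign * rated
    raise ValueError("environment must be 'severe' or 'benign'")


def meets_derating(part: str, parameter: str, applied: float,
                   rated: float, environment: str) -> bool:
    """True if the applied stress is within the derated limit."""
    return applied <= allowed_stress(part, parameter, rated, environment)


# Hypothetical 0.25 W resistor dissipating 0.15 W.
print(meets_derating("Resistors", "Power Dissipation", 0.15, 0.25, "benign"))  # True (limit 0.20 W)
print(meets_derating("Resistors", "Power Dissipation", 0.15, 0.25, "severe"))  # False (limit 0.125 W)
```

The same resistor that is acceptable in an office environment would need a higher power rating for under-the-hood use.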
Example of a Process Control Application. Of equal importance in the selection and application of parts is the manufacturing process. Application procedures are needed not only to procure reliable parts and materials, but also to assure that the process steps from shipping to assembly do not harm good components. It is not sufficient to qualify components to a standard qualification procedure, because some assembly processes in use today impose higher stresses than those used historically. A classic example is surface mount technology, which uses soldering processes (e.g., vapor phase, infrared heating) that provide a very fast temperature transition to 220°C, creating a thermal shock greater than that used for component testing (see Figure 13).
Figure 13. Vapor Phase Soldering Internal Package Temperature Above 150°C
In order to determine if components will perform reliably after exposure to handling and assembly stresses, a preconditioning procedure emulating the manufacturing processes should be developed and applied. It can be used as a guide to define each test/procedure/operation/material that is used in component handling and fabrication/assembly for establishing appropriate processing requirements. This procedure should emulate all steps from receipt of material through manufacturing. Additional or different preconditioning may be necessary for a specific process. After exposure, these devices should be subjected to appropriate testing to determine if degradation has occurred. For example, the appropriate test for a molded plastic package would be either 85°C/85% RH or Highly Accelerated Stress Testing (HAST). For a hermetic device, package seal testing should be part of the test procedure.
3.8 Part Selection
3.8.1 Purpose. The purpose of parts selection is to select parts for a product that will enable a design to meet customer reliability needs, and to control parts to ensure that the inherent reliability of the product design is not compromised.
3.8.2 Benefits. An important step in meeting product needs is the preparation of a preferred parts list which identifies the types of components recommended for use. A successful parts selection and control program achieves a level of standardization that minimizes the number of new parts introduced into the product and yet is still flexible enough to effectively utilize the advantages offered by new technology. Some consequences of designing a product without adequate parts selection and control processes are listed in Table 16.
Table 16. Adverse Effects of an Inadequate Part Selection/Control Process
- Selection of obsolete (or soon-to-be-obsolete) and sole-sourced parts and materials
- Possibility of diminishing sources
- Use of unproven or exotic technology
- Incompatibility with the manufacturing process
- Inventory volume expansion and increase in cost
- Supplier quality may be difficult to monitor due to the added number of suppliers
- Loss of "ship-to-stock" or "just-in-time" purchase opportunities
- Limited ability to benefit from volume buys
- Increased cost and schedule delays
- Additional tooling and assembly methods may be required to account for the added variation in part characteristics
- Part reliability can decrease due to the uncertainty and lack of experience with new parts
- Automation efforts may be impeded due to the number of additional part types
3.8.3 Timing. The identification of parts and materials that meet product performance, reliability, delivery and cost requirements should take place during the Concept/Planning phase of the product life cycle. Early knowledge and use of preferred components and component packaging allows designers to adhere to sound design rules and utilize existing assembly/manufacturing procedures. This approach will minimize the risk of using immature technologies and unqualified suppliers.
3.8.4 Application Guidelines. It is normal for parts engineers to select, and for designers to use, components from suppliers in which they have confidence. This confidence can be attained either empirically through adequate past performance of the part manufacturer or from verification that the manufacturer is indeed producing high quality parts. The latter can be achieved by test and subsequent data analysis. The selection of parts should be based on a knowledge of both the application environment in which they are to operate and the conditions they are exposed to during manufacture. It is equally important to understand how the failure rate during the part's useful life and its wearout characteristics (lifetime) are impacted by the specific application conditions. Only with this understanding are robust designs possible.
To assure the supply of adequate parts, suppliers must be effectively managed. Each supplier/technology should be qualified in a cost effective manner. Existing data from technology families can be used for qualification by similarity. On-going testing programs for representative products can also be used to demonstrate acceptability. For example, outgoing supplier quality and user incoming inspection and board level testing can be monitored to determine device quality and compatibility between the product design and manufacturing process. A parts list should be structured to not only provide attributes of good parts and suppliers, but also to provide a listing of critical devices, technologies and suppliers. This listing can include:
- Performance Limitations (stringent environmental conditions or non-robust design practice)
- Reliability Limitations (component/materials with life limitations or unrealistic derating requirements)
- Suppliers (history of late delivery, performance problems or reliability problems)
- Old Technology (part obsolescence or performance problems)
- New Technology (parts fabricated using immature manufacturing technology)
Suppliers can be evaluated by analyzing their design, manufacturing, quality, and reliability practices. An audit/validation will assess whether a documented system exists and is being used. Additionally, demonstration of generic product manufacturability, verified by reliability testing, is recommended. Representative questions such as those in Table 17 should be asked.
Table 17. Representative Questions for Part Suppliers
- Is a quality program defined and implemented?
- Have potential failure mechanisms been identified?
- What corrective actions have been put in place?
- Are the manufacturing materials and processes documented?
- Are there process controls in place?
- Are parts manufactured continuously or is there intermittent production?
- What defect levels are present?
- Is there a goal in place for continuous improvement?
- Have life limiting failure mechanisms been designed out?
- Do lifetimes of failure mechanisms exceed the expected useful life of the product?
- Are efforts being taken to identify the cause of part failure and to improve the manufacturing process to alleviate their occurrence?
- Is the part screening process effective?
- Are design rules used and adhered to that result in high quality and reliability?
- Are design changes made only after analyzing and quantifying possible reliability and quality impact?
- Is the customer notified of major changes?
- Does the supplier track and demonstrate on-time delivery?
3.9 Thermal Design
3.9.1 Purpose. The purpose of a good thermal management program is to reduce the temperature-related stresses in and on the product to a level that ensures proper performance over the product lifetime. As a general guideline, the objective is to reduce the resistance to heat flow so that the overall operating temperature can be reduced and controlled in the product design.
3.9.2 Benefits. For most electronic products, reducing thermal stresses means better reliability. Historical failure rate data has shown that higher temperatures and temperature cycling cause progressive deterioration in many components. The failure mechanisms causing the deterioration are chemical and physical processes. These mechanisms attack initial physical flaws or defects in the components and are generally accelerated at higher temperatures.
3.9.3 Timing. The application of thermal design techniques should start with the product design during the Concept/Planning phase. At this time, gross thermal conditions derived from environmental characterizations can be used to define cooling approaches based on expected heat generation. As the design progresses during product development, refined techniques at the individual component level can be introduced.
3.9.4 Application Guidelines. All cooling techniques represent methods to minimize or eliminate thermal resistances. Four commonly used cooling methods, in order of increasing cooling efficiency, are radiation and free convection, forced air, forced liquid and phase change. These methods are summarized in Table 18. Packaging current electronics technology at high density has limited the suitability of air as a cooling medium. As a result, more and more cooling applications are using forced liquid and phase change cooling.
Table 18. Commonly Used Cooling Methods
Comparison of Various Cooling Methods. The cooling capability ranges of various cooling methods are shown in Figure 14. Table 19 shows the relative merits of the various cooling methods. It is evident that the more efficient the cooling method, the more complex and expensive it tends to be. Selection of the best cooling method for a given application will depend on the product reliability, performance, weight and volume, power, and cost constraints.
Figure 14. Cooling Range for Each Technique
Table 19. Relative Merits of Various Cooling Methods
| Parameter | Radiation and Natural Convection | Forced Air | Forced Liquid | Phase Change |
|---|---|---|---|---|
| Typical heat flux capacity, W/m² (W/in²) | 500 (0.3) | 1.6 × 10⁴ (10) | 7.8 × 10⁴ (50) | 1.2 × 10⁶ (800) |
| Implementation | Simplest | Simple | Most complex | Simple to complex |
| Weight and volume | High | Medium | Low | Low to high |
| Noise and vibration | None | High | Low | None to low |
| Power consumption | None | High | Low | None |
| Fluid leakage problem | None | Usually none | Possible | Possible |
| Cost | Low | Medium | High | High |
| Maintainability | Simplest | Simple | Complex | Complex |
Example of an Air Velocity Calculation. For a non-circuit-board-mounted part using forced air cooling, determine the air velocity needed to maintain a junction temperature of 90°C or less for a power transistor, given that the transistor dissipates 10 watts, the cooling air available is 60°C, the junction-to-case thermal resistance is 0.2°C/watt and the total exposed area is 0.0043 m².

Step 1: Compute the heat flux density (i.e., power/area):

$$\text{Flux Density} = \frac{\text{Power}}{\text{Area}} = \frac{10\ \text{W}}{0.0043\ \text{m}^2} \approx 2326\ \text{W/m}^2$$

Step 2: Compute the case-to-ambient temperature difference using the following equation for part junction temperature:

$$T_J = T_A + \Delta T_{CA} + \theta_{JC}\, Q$$

where,
$T_J$ = junction temperature (°C)
$T_A$ = ambient temperature (°C)
$\Delta T_{CA}$ = case-to-ambient temperature difference (°C)
$\theta_{JC}$ = junction-to-case thermal resistance (°C per watt)
$Q$ = power dissipation (watts)

$$90^\circ\text{C} = 60^\circ\text{C} + \Delta T_{CA} + 0.2(10) \quad\Rightarrow\quad \Delta T_{CA} = 90^\circ\text{C} - 62^\circ\text{C} = 28^\circ\text{C}$$

Step 3: From Figure 15, locate the $\Delta T_{CA}$ point and draw a horizontal line to the heat flux density curve of 2326 W/m². Using 25°C, which is slightly more conservative than the calculated 28°C, the required air velocity read at this point is 23 meters per second.
Figure 15. Temperature Vs. Air Velocity Cooling
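Steps 1 and 2 are plain arithmetic and can be checked with a short script. This is a minimal sketch; the function names are illustrative, and Step 3 is left as a chart lookup because the curves of Figure 15 are not available numerically.

```python
def heat_flux_density(power_w, area_m2):
    """Step 1: heat flux density in W/m^2 (power divided by exposed area)."""
    return power_w / area_m2


def case_to_ambient_delta(t_junction_c, t_ambient_c, theta_jc_c_per_w, power_w):
    """Step 2: solve T_J = T_A + dT_CA + theta_JC * Q for dT_CA, in deg C."""
    return t_junction_c - t_ambient_c - theta_jc_c_per_w * power_w


if __name__ == "__main__":
    flux = heat_flux_density(10.0, 0.0043)
    delta_t_ca = case_to_ambient_delta(90.0, 60.0, 0.2, 10.0)
    print(f"heat flux density = {flux:.0f} W/m^2")                 # 2326 W/m^2
    print(f"allowable case-to-ambient rise = {delta_t_ca:.1f} C")  # 28.0 C
    # Step 3 (reading the required air velocity from Figure 15) is a chart
    # lookup and is not modeled here.
```

Because the allowable rise is computed rather than read, a larger power or thermal resistance immediately shows how much less temperature margin is left for the case-to-air path.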
3.10 Allocations
3.10.1 Purpose. The purpose of allocations is to logically apportion the product design reliability into lower level design criteria, such that the cumulative reliability still meets the customer needs. These lower level reliability values are first used as guidelines to determine feasibility, then later as design objectives.
3.10.2 Benefits. A number of benefits result from a well-organized allocation process, including:
- Specific objectives for lower level reliability design are established.
- Specific design oriented goals are defined.
- Specific goals should result in improved design, manufacturing and testing processes.
- Attention is focused on the product relationship to its components, giving more importance to each level of assembly.
- Initial design, technology or material limitations are revealed.
3.10.3 Timing. An allocation process should be performed as early as possible in the Concept/Planning phase. This is when the selection of components, types of technology and other critical decisions are made, so reliability design goals or needs must be established as one of the criteria for trade-offs. Performing allocations after these critical decisions are made will likely result in wasted resources and schedule delays.
3.10.4 Application Guidelines. One of the first steps in any design process is to translate overall product needs, including reliability, into goals or requirements that can be used at the lower design levels. Allocation of reliability parameters involves the distribution of the product reliability to its lower level assembly reliabilities. The basic allocation process involves the solution of an inequality as indicated in the following equation:

$$F(\hat{R}_1, \hat{R}_2, \ldots, \hat{R}_n) \geq R_{\text{product}}$$

where,
$R_{\text{product}}$ = reliability requirement/goal for the product
$\hat{R}_1, \ldots, \hat{R}_n$ = reliability allocation for each component
$F$ = the functional relationship between the subassemblies and the product

This inequality has an infinite number of solutions, so the problem is to establish procedures that yield reasonable approximations.
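For a concrete instance of the inequality, assume the subassemblies are functionally in series, so that F is simply the product of their reliabilities. The sketch below uses equal apportionment, the simplest possible allocation rule, chosen only to illustrate the inequality; it is not necessarily one of the Table 20 techniques, and the function names are illustrative.

```python
import math


def series_reliability(allocations):
    """F for a series system: the product fails if any subassembly fails."""
    return math.prod(allocations)


def equal_apportionment(r_product, n):
    """Give each of n series subassemblies the same allocated reliability."""
    if not 0.0 < r_product <= 1.0 or n < 1:
        raise ValueError("r_product must be in (0, 1] and n must be at least 1")
    return [r_product ** (1.0 / n)] * n


def meets_requirement(allocations, r_product, tol=1e-12):
    """Check F(R1, ..., Rn) >= R_product, with a small floating-point tolerance."""
    return series_reliability(allocations) >= r_product - tol


if __name__ == "__main__":
    allocation = equal_apportionment(0.95, 4)
    print([round(r, 4) for r in allocation])
    print(meets_requirement(allocation, 0.95))
```

Equal apportionment ignores differences in complexity or technology between subassemblies, which is why weighted methods such as the one in the following example are generally preferred.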
Table 20 provides an overview of three common techniques which can be used to allocate product design reliability goals/objectives.
Table 20. Three Common Allocation Techniques
Example of the Feasibility of Objectives Technique. A system consisting of ten elements, as shown in Table 21, has a reliability design objective of 1,800 hours mean-time-between-failure (MTBF). Engineering evaluations of the elements are made for the intricacy, state-of-the-art, operating time (or duty cycle) and environment categories, each rated on a scale of 1 to 10. The four ratings are multiplied to determine the weighting factor for each element; for example, for the antenna, Wfk = 2 × 3 × 10 × 5 = 300. Each element's weighting factor is then normalized by dividing it by the sum of the weighting factors for all elements. Starting with a product reliability design objective of 1,800 hours MTBF (a product failure rate of 556 failures per million hours), a 10% reserve factor is applied, leaving 500 failures per million hours (556 minus 56) to be allocated. Multiplying each element's normalized weight by this value gives the weighted failure rate allocation shown in the Element Failure Rate column. Element allocated MTBFs are shown in the last column.
Table 21. Feasibility of Objective Allocation Technique
| Element | Intricacy Factor (1-10) | State of the Art (1-10) | Operating Time (1-10) | Environment (1-10) | Weighting Factor (Wfk)* | Ck = Wfk/ΣWfk | Element Failure Rate (Ck × 500 × 10⁻⁶/hr) | Element MTBF** (hr) |
|---|---|---|---|---|---|---|---|---|
| Antenna | 2 | 3 | 10 | 5 | 300 | .06 | 30 × 10⁻⁶ | 33,333 |
| Transmitter | 5 | 5 | 8 | 5 | 1000 | .21 | 105 × 10⁻⁶ | 9,525 |
| Receiver | 5 | 5 | 8 | 5 | 1000 | .21 | 105 × 10⁻⁶ | 9,525 |
| Modem | 5 | 3 | 5 | 5 | 375 | .08 | 40 × 10⁻⁶ | 25,000 |
| Processor | 1 | 4 | 5 | 5 | 100 | .02 | 10 × 10⁻⁶ | 100,000 |
| Input/Output | 6 | 5 | 10 | 5 | 1500 | .30 | 150 × 10⁻⁶ | 6,667 |
| Switch Matrix | 5 | 3 | 5 | 5 | 375 | .08 | 40 × 10⁻⁶ | 25,000 |
| Patch Panel | 2 | 2 | 5 | 5 | 100 | .02 | 10 × 10⁻⁶ | 100,000 |
| Lan/Beacon | 2 | 2 | 5 | 5 | 100 | .02 | 10 × 10⁻⁶ | 100,000 |
| Misc. (Cable, Conn.) | 1 | 1 | 5 | 5 | 25 | .005 | 3 × 10⁻⁶ | 333,333 |
| TOTALS | | | | | 4875 | 1.005 | 503 × 10⁻⁶ | 1,988 |

Note:
* Wfk = Intricacy × State-of-the-Art × Operating Time × Environment
** MTBF = 1/element failure rate
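The arithmetic behind Table 21 can be reproduced with a short script. This is a minimal sketch with illustrative names. Unlike the table, it does not round Ck to two decimal places, so it allocates exactly 500 × 10⁻⁶ failures per hour (an allocated system MTBF of 2,000 hours rather than 1,988) and gives slightly different element values, for example 32,500 hours for the antenna instead of 33,333.

```python
import math

# (intricacy, state of the art, operating time, environment) ratings from Table 21
RATINGS = {
    "Antenna": (2, 3, 10, 5),
    "Transmitter": (5, 5, 8, 5),
    "Receiver": (5, 5, 8, 5),
    "Modem": (5, 3, 5, 5),
    "Processor": (1, 4, 5, 5),
    "Input/Output": (6, 5, 10, 5),
    "Switch Matrix": (5, 3, 5, 5),
    "Patch Panel": (2, 2, 5, 5),
    "Lan/Beacon": (2, 2, 5, 5),
    "Misc. (Cable, Conn.)": (1, 1, 5, 5),
}


def allocate(ratings, mtbf_objective_hr, reserve_fraction=0.10):
    """Feasibility of objectives allocation.

    Returns {element: (weighting factor, Ck, failure rate per hour, MTBF in hours)}.
    """
    product_rate = 1.0 / mtbf_objective_hr               # failures per hour
    rate_to_allocate = product_rate * (1.0 - reserve_fraction)
    weights = {name: math.prod(r) for name, r in ratings.items()}
    total_weight = sum(weights.values())
    allocation = {}
    for name, weight in weights.items():
        ck = weight / total_weight                        # normalized weight
        rate = ck * rate_to_allocate
        allocation[name] = (weight, ck, rate, 1.0 / rate)
    return allocation


if __name__ == "__main__":
    result = allocate(RATINGS, mtbf_objective_hr=1800)
    for name, (w, ck, rate, mtbf) in result.items():
        print(f"{name:<22}{w:>6}{ck:>8.3f}{rate * 1e6:>8.1f}e-6{mtbf:>12,.0f}")
    system_rate = sum(rate for _, _, rate, _ in result.values())
    print(f"allocated system MTBF = {1.0 / system_rate:,.0f} hours")  # 2,000 hours
```

The reserve fraction is a parameter so that the effect of a larger or smaller design margin on every element's allocation can be seen at once.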
3.11 Design of Experiments (DOE)
3.11.1 Purpose. Experimental designs consist of a series of specific changes to the input variables of a process or a product in order to observe or measure the corresponding change in the output. By applying Design of Experiments (DOE), the individual effects of multiple factors in a complex system can be studied simultaneously, avoiding inefficient testing of one factor at a time. This scientific approach gives the engineer a better understanding of the product and of how multiple design options affect the response.
3.11.2 Benefits. Experimental design, when performed correctly, can result in the following product or process impacts:
- Improved performance
- Selection of less costly materials
- Reduced production costs
- Control of critical factors
- Shortened development time
- Significantly reduced test time
- Relaxed tolerances
- Higher levels of reliability
3.11.3 Timing. Design of Experiments can be performed to influence product design at any time from Concept/Planning through Production/Manufacturing. The techniques can be applied to product design, process design, test, and production evaluation.
3.11.4 Application Guidelines. Because there are numerous DOE strategies, including full factorial, fractional factorial, Plackett-Burman, Box-Behnken and Taguchi orthogonal arrays, a detailed list of references is included in Section 4 to aid in proper technique selection. Each method has its own strengths and weaknesses, which need to be considered based on the application. For the purposes of this Blueprint, a general process for an orthogonal array will be discussed.

Step 1 of the process is to select the factors to be tested. This requires the development of a "short list" of significant factors, often determined through a team effort by "brainstorming" ideas.

In Step 2, the controlling and non-controlling factors (along with their test settings) are identified. Usually a high and a low setting is used for each factor, coded +1 and -1. More than two settings could be appropriate if the distribution of the factor resembles the data in Figure 16, which required five settings. Even for a two-setting factor, the range between high and low must be chosen carefully.

Step 3 is to set up an orthogonal array that permits separation of effects. Table 22 shows a typical two-factor array with two settings, along with the analysis equations used to determine the average and expected outputs. The variables y1 through y4 are the test measurements at the corresponding factor settings. For example, y1 is the result of the test run with a low setting for factor A and a low setting for factor B, which corresponds to a high (+) setting for the AB interaction. Each test is performed at least once and is repeated only if uncontrolled conditions change. Test results can also be biased by one or more factors that were not tested, so a confirmation test should be performed to verify or disprove the calculated optimum solution.
Figure 16. Selecting Test Settings
Table 22. Orthogonal Array
| Run | Factor A (test setting) | Factor B (test setting) | Interaction A*B (by-product) | Result (measured) |
|---|---|---|---|---|
| 1 | - | - | + | y1 |
| 2 | + | - | - | y2 |
| 3 | - | + | - | y3 |
| 4 | + | + | + | y4 |
| AVG- | (y1 + y3)/2 | (y1 + y2)/2 | (y2 + y3)/2 | |
| AVG+ | (y2 + y4)/2 | (y3 + y4)/2 | (y1 + y4)/2 | ȳ = (y1 + y2 + y3 + y4)/4 |
| Δ | (AVG+) - (AVG-) | (AVG+) - (AVG-) | (AVG+) - (AVG-) | |

$$\hat{y} = \bar{y} + \frac{\Delta A}{2}A + \frac{\Delta B}{2}B + \frac{\Delta AB}{2}AB$$

where,
$\hat{y}$ = expected output
$\bar{y}$ = average output
$\Delta A$ = (AVG+) - (AVG-) value from column A of the array ($\Delta B$ and $\Delta AB$ are taken the same way from columns B and A*B)
$A$ = coded value of A (high setting = +1, low setting = -1); $B$ is coded the same way, and $AB = A \times B$
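The sign pattern in Table 22 can be generated and checked with a few lines of code. This minimal sketch (function names are illustrative) builds the coded runs in the table's run order, forms the A*B column as the product of the coded A and B values, and confirms that every column is balanced and every pair of columns is orthogonal, which is the property that lets each Δ be estimated independently of the others.

```python
from itertools import product


def two_factor_array():
    """Coded (A, B) settings for runs 1-4, in the run order of Table 22."""
    # product() varies the last position fastest; binding that position to A
    # makes A alternate -1, +1, -1, +1 while B changes every two runs
    return [(a, b) for b, a in product((-1, 1), repeat=2)]


def array_columns(runs):
    """Columns A, B and the A*B interaction (product of coded A and B)."""
    a = [run[0] for run in runs]
    b = [run[1] for run in runs]
    return {"A": a, "B": b, "A*B": [x * y for x, y in zip(a, b)]}


def is_orthogonal(cols):
    """True if each column sums to zero and every pair has a zero dot product."""
    names = list(cols)
    balanced = all(sum(col) == 0 for col in cols.values())
    independent = all(
        sum(x * y for x, y in zip(cols[p], cols[q])) == 0
        for i, p in enumerate(names)
        for q in names[i + 1:]
    )
    return balanced and independent


if __name__ == "__main__":
    cols = array_columns(two_factor_array())
    for name, signs in cols.items():
        print(name, ["+" if s > 0 else "-" for s in signs])
    print("orthogonal:", is_orthogonal(cols))
```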
Example of an Automotive Mileage Array. A design of experiments was conducted to determine the best way to improve the gas mileage of an automobile. Brainstorming identified gasoline grade, start-up conditions and their interaction as the significant factors. Test settings were developed with low octane as -1 and high octane as +1, and with slow starts as -1 and fast starts as +1. From these data, an array was constructed and tests were conducted, with the results shown in Table 23.
Table 23. Automotive Mileage Array
| Test Run | Factor A | Factor B | Interaction A&B | Results y |
|---|---|---|---|---|
| 1 | - | - | + | y1 = 20 |
| 2 | + | - | - | y2 = 12 |
| 3 | - | + | - | y3 = 16 |
| 4 | + | + | + | y4 = 8 |
| | | | | ȳ = 56/4 = 14 |
From the results of the testing, the average test result ȳ is 14, determined by summing the mileage for the four tests and dividing by 4. The + and - averages and the delta difference for the two factors and their interaction were determined as illustrated in Table 24.
Table 24. Determination of Orthogonal Array Averages, Delta Difference, and Optimum Solution
| | Factor A | Factor B | Interaction A & B |
|---|---|---|---|
| Ave.(-) | (y1 + y3)/2 = (20 + 16)/2 = 18 | (y1 + y2)/2 = (20 + 12)/2 = 16 | (y2 + y3)/2 = (12 + 16)/2 = 14 |
| Ave.(+) | (y2 + y4)/2 = (12 + 8)/2 = 10 | (y3 + y4)/2 = (16 + 8)/2 = 12 | (y1 + y4)/2 = (20 + 8)/2 = 14 |
| Δ = Ave.(+) - Ave.(-) | ΔA = 10 - 18 = -8 | ΔB = 12 - 16 = -4 | ΔAB = 14 - 14 = 0 |

Optimum Solution. Based on the above information, the optimum solution is determined to be:

$$\hat{y} = \bar{y} + \frac{\Delta A}{2}A + \frac{\Delta B}{2}B + \frac{\Delta AB}{2}AB = 14 + \frac{-8}{2}A + \frac{-4}{2}B + \frac{0}{2}AB = 14 - 4A - 2B$$
To maximize mileage, both input factors should be set low: low octane (A = -1) and slow starts (B = -1). Substituting these coded values gives ŷ = 14 - 4(-1) - 2(-1) = 20 miles per gallon.
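The averages, deltas and optimum settings in Tables 23 and 24 can be reproduced with the sketch below, which applies the Table 22 equations to the mileage data. Function names are illustrative; the search over settings simply evaluates the fitted expression at the four coded combinations, so it cannot recommend settings outside the tested range.

```python
# Coded columns and measured mileage (mpg) from Table 23
COLS = {"A": [-1, 1, -1, 1], "B": [-1, -1, 1, 1], "A*B": [1, -1, -1, 1]}
Y = [20, 12, 16, 8]


def effects(y, cols):
    """Return {column: (Ave(-), Ave(+), delta)} for a two-level orthogonal array."""
    out = {}
    for name, signs in cols.items():
        minus = [yi for yi, s in zip(y, signs) if s < 0]
        plus = [yi for yi, s in zip(y, signs) if s > 0]
        avg_minus = sum(minus) / len(minus)
        avg_plus = sum(plus) / len(plus)
        out[name] = (avg_minus, avg_plus, avg_plus - avg_minus)
    return out


def predict(y, cols, a, b):
    """Expected output y_hat = y_bar + sum(delta / 2 * coded value) over columns."""
    y_bar = sum(y) / len(y)
    eff = effects(y, cols)
    coded = {"A": a, "B": b, "A*B": a * b}
    return y_bar + sum(eff[name][2] / 2 * coded[name] for name in cols)


if __name__ == "__main__":
    for name, (lo, hi, delta) in effects(Y, COLS).items():
        print(f"{name}: Ave(-)={lo}, Ave(+)={hi}, delta={delta}")
    settings = [(a, b) for a in (-1, 1) for b in (-1, 1)]
    best = max(settings, key=lambda s: predict(Y, COLS, *s))
    print("best (A, B):", best, "-> predicted", predict(Y, COLS, *best))
    # best (A, B): (-1, -1) -> predicted 20.0
```

A confirmation run at the recommended settings remains necessary, since the fitted expression only reflects the factors included in the array.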
3.12 Dormancy Analysis
3.12.1 Purpose. The purpose of having and using dormancy design guidelines is to control the environmental stresses the product may be exposed to during nonoperating conditions. These control measures include the selection of proper components during product design, the use of corrosion protection design techniques, the reduction of moisture, and testing to limit manufacturing defects.
3.12.2 Benefits. Applying design factors that address nonoperating conditions will result in lower product life cycle costs, higher product reliability, fewer failure mechanisms and greater customer satisfaction. When long storage periods are expected, a dormancy analysis can determine whether periodic testing is necessary to ensure proper operation.
3.12.3 Timing. Historically, storage analyses resulted from the experience of taking a product off the shelf, attempting to operate it and finding that it had failed. Crash fix-it programs usually corrected the immediate problem after the fact, at high cost. The best time to consider nonoperating conditions is during the initial product Design/Development phase, where part and material selection procedures can be applied, protective measures can be included, and stresses can be counteracted. Planning for potential dormancy situations should be initiated during the Concept/Planning phase of the product, when the product end use environment is beginning to be characterized.
3.12.4 Application Guidelines. Designing for dormant, storage or intermittent operation of a product is important because numerous failures can occur during this portion of the product life cycle. In addition, the nonoperating phase may be the predominant part of the product's life cycle. Special engineering judgment and design guidelines are necessary if the number of failures and failure conditions are to be minimized. Every design has unique components, so a standard guideline for control of components will require modification to fit the given situation. Developing and following a guide during design is the best way to achieve high levels of reliability for dormant or storage situations.
The method proposed in Table 25 is provided as a guide that acknowledges the unique considerations needed for design, use and storage of products.
Table 25. Guidance for Addressing Dormancy in Product Design
Design Related:
- Use parts and components that do not change tolerance levels with aging.
- Avoid semiconductors and microcircuits that use nichrome deposited resistors.
- Use parts and components that have mono-metallization to avoid galvanic corrosion.
- Avoid use of variable actuated components; tolerance and corrosion are problems for items like potentiometers.
- Use solid state relays and switches to avoid corrosion and contact problems.
- Use moisture resistant finishes for materials and nonabsorbent materials for gaskets.
- Lubricated surfaces and assemblies require sealing and drains for excess moisture.
- Impregnate electromechanical windings with varnish, encapsulation or hermetic sealing.
- Use nonporous insulating materials.
- Impregnate cut edges on plastic with moisture resistant resin.

Stress Related:
- Mechanical stress can be reduced through the use of vibration and shock isolators.
- Thermal stress can be reduced by either lowering the ambient temperature or controlling the cycles and temperature change.
- Chemical corrosion can be controlled through packaging, sealing and reducing moisture.
- Manufacturing defects can be reduced by stress screen testing or rigid inspection of materials.
- Materials sensitive to cold flow and creep should be avoided.
- Hygroscopic materials should be avoided or protected against accumulation of moisture by sealing or refinishing.
- Contact resistance can be reduced by use (wiping) or controlling moisture.

Long Term Storage:
- Avoid use of lubricants. Use dry graphite if necessary.
- Avoid Teflon or rubber gaskets. Use silicone based gaskets.
- Disconnect all power sources and ground the components.
- Maintain a constant temperature of 50°F ± 5°F.
- Package to reduce shock and vibration through the use of isolators.
- Control the relative humidity to 50% ± 5% to reduce corrosion and electrostatic discharge failures.
- Recharge batteries every 60 days to maintain capability, or remove.
- Protect against rodent damage using screens, poison, traps, etc.
Example of Dormant Failure Considerations. To provide some insight into the failure mechanisms, modes and accelerating factors associated with dormancy, Table 26 shows some typical characteristics and conditions for a number of discrete electronic parts. Failure rates for dormant and active conditions are included.
Table 26. Dormant Part Failure Mechanisms and Failure Rates
| Type | Failure Mechanisms | % Failure Mode | Accelerating Factor | Dormant/Active Failure Rate (per hr) |
|---|---|---|---|---|
| Microcircuit | Surface Anomalies; Wire Bond; Seal Defects | 35-70 Degradation; 10-20 Open; 10-30 Degradation | Moisture, Temperature; Vibration; Shock, Vibration | 0.006 × 10⁻⁶ / 0.035 × 10⁻⁶ |
| Transistor | Header Defects; Contamination; Corrosion | 10-30 Drift; 10-50 Degradation; 15-25 Drift | Shock, Vibration; Moisture, Temperature; Moisture, Temperature | 0.0005 × 10⁻⁶ / 0.001 × 10⁻⁶ |
| Diode | Corrosion; Lead/Die Contact; Header Bond | 20-40 Intermittent; 15-35 Open; 15-35 Drift | Moisture, Temperature; Shock, Vibration; Shock, Vibration | 0.0005 × 10⁻⁶ / 0.008 × 10⁻⁶ |
| Resistor | Corrosion; Film Defects; Lead Defects | 30-50 Drift; 15-25 Drift; 10-20 Open | Moisture, Temperature; Moisture, Temperature; Shock, Vibration | 0.0002 × 10⁻⁶ / 0.001 × 10⁻⁶ |
| Capacitor | Connection; Corrosion; Mechanical | 10-30 Open; 25-45 Drift; 20-40 Short | Temperature, Vibration; Moisture, Temperature; Shock, Vibration | 0.0008 × 10⁻⁶ / 0.026 × 10⁻⁶ |
| RF Coil | Lead Stress; Insulation | 20-40 Open; 40-65 Drift | Shock, Vibration; Moisture, Temperature | 0.003 × 10⁻⁶ / 0.25 × 10⁻⁶ |
| Relay | Contact Resistance; Contact Corrosion | 30-40 Open; 40-65 Drift | Moisture, Temperature; Moisture | 0.025 × 10⁻⁶ / 0.12 × 10⁻⁶ |
| Battery | Corrosion; Leakage | 50-70 Degraded; 20-30 No Output | Moisture, Temperature; Moisture | 0.016 × 10⁻⁶ / 4.7 × 10⁻⁶ |
| Printed Circuit Board | Mechanical | 30-35 Open; 15-20 Short | Shock, Vibration; Shock, Vibration | 0.83 × 10⁻⁶ / 2.1 × 10⁻⁶ |
| Fan | Mechanical | 90-95 Jammed; 1-5 Broken | Vibration; Shock | 0.13 × 10⁻⁶ / 1.6 × 10⁻⁶ |
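As noted under Benefits, a dormancy analysis can indicate whether periodic testing is needed during long storage. A minimal sketch of that calculation, assuming constant failure rates (exponential life), all parts in series, and a subset of the dormant rates from Table 26, is shown below; the parts list is hypothetical, and the function names are illustrative.

```python
import math

# Dormant failure rates (failures per hour) for a subset of Table 26 part types
DORMANT_RATE_PER_HR = {
    "microcircuit": 0.006e-6,
    "transistor": 0.0005e-6,
    "diode": 0.0005e-6,
    "resistor": 0.0002e-6,
    "relay": 0.025e-6,
    "battery": 0.016e-6,
}


def storage_failure_rate(parts):
    """Total dormant failure rate, treating all parts as a series system."""
    return sum(DORMANT_RATE_PER_HR[kind] * qty for kind, qty in parts.items())


def storage_reliability(parts, hours):
    """Probability of no failure during storage, assuming constant failure rates."""
    return math.exp(-storage_failure_rate(parts) * hours)


def max_storage_hours(parts, min_reliability):
    """Longest storage period before reliability falls below the target value."""
    return -math.log(min_reliability) / storage_failure_rate(parts)


if __name__ == "__main__":
    # Hypothetical parts list for a stored assembly
    parts = {"microcircuit": 40, "transistor": 30, "diode": 60,
             "resistor": 300, "relay": 4, "battery": 1}
    five_years = 5 * 8760
    print(f"R(5 years in storage) = {storage_reliability(parts, five_years):.4f}")
    print(f"hours until R < 0.99: {max_storage_hours(parts, 0.99):,.0f}")
```

If the storage period that keeps reliability above the target is shorter than the planned storage time, periodic testing (or a design change that removes the dominant contributors) is indicated.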
3.13 Durability Analysis
3.13.1 Purpose. The purposes of a durability analysis are to identify component and process designs that exhibit "early" wearout failure, isolate the root cause and determine potential corrective actions. Finding and solving these design problems will result in a more acceptable product in the marketplace.
3.13.2 Benefits. The benefits of durability analysis are fewer failures during the useful life and more customer satisfaction with the product. For the design team, the durability analysis provides detailed analytical models that identify physical relationships between the product application and the operating environment.
3.13.3 Timing. Durability analysis can be performed any time after components or processes have been identified. It requires knowledge of the material characteristics, the environmental stress levels, the operating parameters and the use factors. Early application in the design phase is desirable for "critical components" or known problem areas. If potential problem areas are not known, the analysis may not be cost-effective.
3.13.4 Application Guidelines. Durability analysis should be an up-front analysis that focuses on identifying and solving design problems related to early wearout. This analysis process is performed by evaluating life-cycle loads and stresses, product architecture, material properties, and failure mechanisms. Figure 17 illustrates the relationship between reliability (failure rate), as measured on the vertical axis, and time, as measured on the horizontal axis. The appropriate interpretation of each of the two concepts, reliability and durability, is indicated in this figure.
Figure 17. Failure Rate Vs. Time
The basic approach to durability analysis, which is applicable to both new and old technology, is outlined in Table 27.
Table 27. Basic Approach to Durability Analysis
| Step | Discussion |
|---|---|
| 1. Define the operating and nonoperating life requirements | Length of time or number of cycles expected or needed for both operating and nonoperating periods should be determined. |
| 2. Define the life environment | Temperature, humidity, vibration and other parameters should be determined so that the load environment can be quantified and the cycle rates determined. For example, a business computer might expect a temperature cycle once each day from 60°F to 75°F ambient. This would quantify the maximum and minimum temperatures and a rate of one cycle per day. |
| 3. Identify the material properties | Usually this involves determining material characteristics from a published handbook. If unique materials are being considered, then special test programs will be necessary. |
| 4. Identify potential failure sites | Failure areas are usually assumed to fall into categories of new materials, products or technologies. Considerations should include high deflection regions, high temperature cycling regions, high thermal expansion materials, corrosion sensitive items, and test failures. |
| 5. Determine if a failure will occur within the time or number of cycles expected | A detailed stress analysis using either a closed form or a finite element simulation method should be performed. Either analysis will result in a quantifiable mechanical stress for each potential failure site. |
| 6. Calculate the component or process life | Using fatigue cycle curves from material handbooks, estimate the number of cycles to failure. The following figure shows a typical fatigue curve for stress versus cycles to failure. Specific material fatigue data can be obtained from databases maintained by the Center for Information and Numerical Data Analysis and Synthesis (see reference section). |
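Steps 5 and 6 compare a predicted life with the required life. As a sketch, assuming the handbook stress versus cycles-to-failure curve can be represented by Basquin's relation S_a = S_f'(2N_f)^b (a common closed-form fit to the high-cycle portion of such curves), the cycles to failure can be solved for directly. The fatigue constants and stress amplitude below are hypothetical placeholders, not handbook data, and the required life reuses the one-cycle-per-day business computer from step 2 over an assumed ten-year life.

```python
def cycles_to_failure(stress_amplitude, fatigue_strength_coeff, basquin_exponent):
    """Invert Basquin's relation S_a = S_f' * (2 * N_f) ** b for N_f (b < 0)."""
    if basquin_exponent >= 0:
        raise ValueError("the Basquin exponent b must be negative")
    ratio = stress_amplitude / fatigue_strength_coeff
    return 0.5 * ratio ** (1.0 / basquin_exponent)


def meets_life(required_cycles, stress_amplitude, fatigue_strength_coeff,
               basquin_exponent, margin=1.0):
    """True if the predicted life is at least margin times the required cycles."""
    life = cycles_to_failure(stress_amplitude, fatigue_strength_coeff, basquin_exponent)
    return life >= margin * required_cycles


if __name__ == "__main__":
    # Hypothetical values for illustration only (stresses in consistent units, e.g. MPa)
    S_F_PRIME = 900.0    # fatigue strength coefficient (assumed)
    B = -0.1             # Basquin exponent (assumed)
    STRESS = 300.0       # stress amplitude at the failure site from step 5 (assumed)
    required = 365 * 10  # one temperature cycle per day for an assumed 10 years
    life = cycles_to_failure(STRESS, S_F_PRIME, B)
    print(f"predicted life: {life:,.0f} cycles; required: {required:,} cycles")
    print("meets requirement with 2x margin:",
          meets_life(required, STRESS, S_F_PRIME, B, margin=2.0))
```

Where the curve has been read point by point from a handbook rather than fitted, the same comparison can be made by interpolating the tabulated points on log-log axes instead.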
Example of a Durability Analysis. Determine the reliability of a bearing that is subjected to a radial load of 1,000 lbs. and a speed of 60,000 revolutions per hour. Dynamic capacity is 11,700 lbs. Compute the reliability for 50,000 hours of constant operation. Table 28 illustrates the steps involved.
Table 28. Example Durability Analysis
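One common way to carry out this kind of bearing calculation, which may differ from the method in Table 28, is sketched below. It assumes a ball bearing, so the ISO 281 basic rating life is L10 = (C/P)^3 million revolutions, with C the dynamic capacity and P the equivalent load (here purely radial), and it assumes the bearing life follows a Weibull distribution whose 10th percentile is L10. The Weibull shape parameter of 1.5 is an illustrative assumption, not a value from the source, so the resulting reliability may not match Table 28.

```python
import math


def l10_life_hours(dynamic_capacity, radial_load, rev_per_hour, life_exponent=3.0):
    """Basic rating life L10 in hours: (C/P)**p million revolutions / speed."""
    revolutions = (dynamic_capacity / radial_load) ** life_exponent * 1e6
    return revolutions / rev_per_hour


def weibull_reliability(hours, l10_hours, shape):
    """Weibull reliability anchored so that R(L10) = 0.90."""
    return math.exp(math.log(0.90) * (hours / l10_hours) ** shape)


if __name__ == "__main__":
    l10 = l10_life_hours(11_700, 1_000, 60_000)  # ball bearing, p = 3
    shape = 1.5                                  # assumed Weibull shape parameter
    print(f"L10 life = {l10:,.0f} hours")        # about 26,694 hours
    print(f"R(50,000 h) = {weibull_reliability(50_000, l10, shape):.3f}")
```

Because the requested 50,000 hours is well beyond L10, the computed reliability is below 0.90 for any positive shape parameter; the exact value depends strongly on the assumed shape.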
3.14 Failure Modes and Effects Analysis
3.14.1 Purpose. The purpose of a functional failure modes and effects analysis (FMEA) is to examine potential failure modes and determine the impact of failure on product operation during customer use. The output of an FMEA, or of a failure modes, effects and criticality analysis (FMECA), is useful for enhancing design reliability through the implementation of corrective action.
3.14.2 Benefits. The systematic nature of a functional FMEA ensures that every product-level failure effect above the function under evaluation is considered. The benefits of a systematic analysis include early identification of potential operational problems, reduced criticality of functional failures, elimination of cascading failures and identification of critical items for control. This type of design analysis leads to more reliable products.
3.14.3 Timing. The time to perform a functional analysis is late in the Concept/Planning product phase, when general hardware functions and configurations are identified. This analysis is applied as soon as functional block diagrams are developed and should continue through the early Design/Development stages as the configuration changes. A part-level FMEA/FMECA can be initiated as soon as information at that level becomes available.
3.14.4 Application Guidelines. A functional FMEA or FMECA process is outlined in Figure 18. In addition to the block diagram, the product theory of operation, basic failure mode assumptions and ground rules are needed to perform the analysis.
Figure 18. Process Diagram for a Functional FMEA
Two basic methods are typical: the FMEA procedure and the Criticality Analysis. The information required to perform a functional FMEA includes the identification of each system function (and its associated failure mode) for each functional output. A generic worksheet for an FMEA is illustrated in Figure 19. The worksheet columns are used as follows:
- Identify the product
- List the product functions
- Define the functional failure for each function
- Determine the failure modes for each functional failure
- Determine the function and product effects for each failure mode
- Estimate the severity of the failure mode (typically defined as catastrophic, critical, marginal or minor)
- Determine the cause that resulted in the failure
- Evaluate and recommend corrective actions
Product: ______________    Analyst: ____________    Date: ____________

| Function | Failure Modes | Local Effect | End Effect | Severity | Cause | Action |
|---|---|---|---|---|---|---|
| | | | | | | |
Figure 19. FMEA Worksheet
A criticality analysis, when desired, adds a relative measure of the consequence of each failure mode. Criticality analyses are difficult to perform for a functional FMEA, however, because detailed failure data are usually lacking at this level. If failure data are available, criticality numbers are developed as follows:
$$\text{Failure Mode Criticality Number} = \alpha \times (\text{frequency}) \times (\text{hours or cycles}) \times \beta$$

where,
α = the percentage of the item's failures that occur in each failure mode
frequency = the rate of occurrence (failure rate)
hours or cycles = the operating time or number of operating cycles
β = the best estimate of the percentage of occurrence of the effects (probability that the effect will occur)
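A minimal sketch of the criticality number, with α and β entered as fractions rather than percentages and the failure rate expressed per hour so that it can be multiplied by operating hours. The item failure rate, operating time and failure modes below are hypothetical; ranking the modes by criticality number is one simple way to prioritize corrective action.

```python
def failure_mode_criticality(alpha, frequency, hours_or_cycles, beta):
    """Criticality number = alpha x frequency x (hours or cycles) x beta."""
    for name, value in (("alpha", alpha), ("beta", beta)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a fraction between 0 and 1")
    return alpha * frequency * hours_or_cycles * beta


if __name__ == "__main__":
    item_failure_rate = 20e-6   # failures per hour (assumed)
    operating_hours = 1000.0    # assumed operating time
    # Hypothetical failure modes of one item: (name, alpha, beta)
    modes = [("open", 0.3, 0.5), ("short", 0.2, 1.0), ("drift", 0.5, 0.1)]
    ranked = sorted(
        ((name, failure_mode_criticality(a, item_failure_rate, operating_hours, b))
         for name, a, b in modes),
        key=lambda pair: pair[1],
        reverse=True,
    )
    for name, cm in ranked:
        print(f"{name:>6}: {cm:.4f}")
```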
Example of a Functional FMEA. A partial FMEA for an automotive power function is illustrated in Table 29.
Table 29. Example of an Automotive FMEA Worksheet
Product: Automobile, Model XYZ    Analyst: Carrie Awne    Date: 1 July 1995

| Function | Failure Mode | Local Effect | End Effect | Severity | Cause | Action |
|---|---|---|---|---|---|---|
| Deliver 200 ±10 Horsepower | Slippage (Clutch) | Slow Acceleration | Poor Gas Mileage | Marginal (Degraded performance) | Glazing of clutch; Fatigue of springs; Clutch plate wear | New face plate material; Storage springs |
| | Low Torque | etc. | etc. | etc. | etc. | etc. |
| | Friction | etc. | etc. | etc. | etc. | etc. |
| Acceleration 0-60 mph in 8.5 ±1.0 seconds | Fuel | etc. | etc. | etc. | etc. | etc. |
| | etc. | etc. | etc. | etc. | etc. | etc. |
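The worksheet columns map naturally onto a record type. This sketch (class and field names are assumptions, not a prescribed format) stores each worksheet line with a severity ranked from minor to catastrophic, as in the worksheet description, and populates it with the clutch slippage entry from Table 29. Sorting by severity is one simple way to decide which corrective actions to pursue first.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    """Severity classes named in the worksheet description; higher is worse."""
    MINOR = 1
    MARGINAL = 2
    CRITICAL = 3
    CATASTROPHIC = 4


@dataclass
class FmeaRow:
    """One worksheet line: function, failure mode, effects, severity, causes, actions."""
    function: str
    failure_mode: str
    local_effect: str
    end_effect: str
    severity: Severity
    causes: list[str] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)


@dataclass
class FmeaWorksheet:
    product: str
    analyst: str
    date: str
    rows: list[FmeaRow] = field(default_factory=list)

    def by_severity(self) -> list[FmeaRow]:
        """Rows ordered most severe first, to help prioritize corrective action."""
        return sorted(self.rows, key=lambda row: row.severity, reverse=True)


WORKSHEET = FmeaWorksheet(
    product="Automobile, Model XYZ",
    analyst="Carrie Awne",
    date="1 July 1995",
    rows=[
        FmeaRow(
            function="Deliver 200 ±10 horsepower",
            failure_mode="Slippage (clutch)",
            local_effect="Slow acceleration",
            end_effect="Poor gas mileage",
            severity=Severity.MARGINAL,
            causes=["Glazing of clutch", "Fatigue of springs", "Clutch plate wear"],
            actions=["New face plate material", "Storage springs"],
        ),
    ],
)

if __name__ == "__main__":
    for row in WORKSHEET.by_severity():
        print(row.function, "|", row.failure_mode, "|", row.severity.name)
```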
3.15 Failure Reporting and Corrective Action System
3.15.1 Purpose. A Failure Reporting, Analysis and Corrective Action System (FRACAS) is the backbone of a reliability design improvement program. It provides the detailed data needed, such as product operating time and failure characteristics, to identify design or process deficiencies for correction.
3.15.2 Benefits. FRACAS provides the information needed for the timely identification and correction of design errors, part problems, workmanship defects and/or manufacturing and administrative process errors. Consequences of not having an effective FRACAS can include significant direct costs in factory rework, scrap, or warranty service, and even greater indirect costs in dissatisfied customers. Finding and fixing product design defects in a timely fashion is critical to retaining satisfied customers. Producers also have an ethical obligation to eliminate reliability problems in their products.
3.15.3 Timing. FRACAS requires a source of data before it can be implemented. A working FRACAS should be in place as soon as hardware and software begin to become available and the definition and implementation of processes have begun. Failure data should be collected by the manufacturer from any tests and operational use (Design/Development through Production/Manufacturing). The FRACAS should continue as long as the product is supported by the manufacturer (i.e., through the Operation/Repair phases of the product). Customers may, and should, have their own FRACAS to identify operational reliability problems for correction during their use of the product.
Figure 20. Generic Closed-Loop FRACAS

The generic closed-loop FRACAS consists of the following steps:

1. Observation of the failure
2. Complete documentation of the failure, including all significant conditions which existed at the time of the failure
3. Failure verification, i.e., confirmation of the validity of the initial failure observation
4. Failure isolation, localization to the lowest replaceable defective item within the product
5. Replacement of the suspect defective item
6. Confirmation that the suspect item is defective
7. Failure analysis of the defective item
8. Data search to uncover other similar failure occurrences and to determine the previous history of the defective item and similar related items
9. Establishment of the root cause of the failure
10. Determination, by an interdisciplinary design team, of the necessary corrective action, especially any applicable redesign
11. Incorporation of the recommended corrective action into development equipment
12. Continuation of development tests
13. Establishment of the effectiveness of the proposed corrective action
14. Incorporation of effective corrective action into production equipment
The key to a successful FRACAS is its database. This is particularly important in establishing the significance of a failure. For example, the failure of a capacitor in a reliability growth test becomes more significant if the database shows similar failures of the part during incoming inspection or any environmental tests performed. For this reason, all available sources of data should be integrated into the FRACAS. Initial failure reports should document, as a minimum:
- Location of failure
- Description of test being performed
- Date and time of failure
- Part number and serial number of product/assembly and failed items
- Model number of product
- Description of failure symptom
- Individual who observed failure
- Circumstances of interest (e.g., occurred immediately after power outage)
The failure documentation should be augmented with verification of the failure at the product level (step 3 in Figure 20) and verification that the suspect part did indeed fail (step 6). The format of the failure reporting form should be determined by the supplier to best meet its needs for improving the product design.
Once the failure is isolated, the FRACAS database and failure analysis can be used to determine its root cause. Given the root cause, appropriate corrective action can be formulated.
Failure analysis can be performed to varying degrees, and may require coordination with the part supplier. The most critical failures (i.e., those that occur most often, are most expensive to repair, or threaten the user's safety) should receive in-depth analysis, perhaps including X-rays, scanning electron beam probing, etc. Where the manufacturer does not have a comprehensive failure analysis laboratory, outside sources are available for use.
A sample failure reporting form that includes the minimum essential information to make corrective action decisions is shown in Figure 21.
| FAILURE REPORT FORM | XYZ COMPANY |
|---|---|
| Model #: | Computer #6161 |
| Date of Occurrence: | 10 April 96 |
| Time of Event: | 0846 AM |
| Description of Event: | Computer failed to perform correct computation |
| Event Observed by: | P.C. Borde |
| Description of Repair: | Replaced Accumulator Board #2 |
| Product Repaired by: | Mike R. Sawft |
| Description of Failure Analysis: | Replaced part no. IC-8086 (Intel). Part was submitted for failure analysis, where it was determined that the failure cause was electrical overstress (root cause: electrostatic discharge). |
| Part Analyzed by: | J. Bush |
| Recommended Action: | Use electrostatic grounding clips during all maintenance actions. |
| Report Prepared by: | P. Tree |
| Report Date: | 14 April 1996 |
Figure 21. Sample Failure Reporting Form
3.16 Fault Tree Analysis
3.16.1 Purpose.
Fault Tree Analysis (FTA) is a top-down failure consequence assessment technique that is useful in identifying product design safety concerns. When used in the design stage, the analysis identifies the cause(s) of product failures, which may then be eliminated through good design practice.
3.16.2 Benefits. When FTA is applied in the design stage, the benefits that can be derived include:
- Identification of single failure points
- Identification of safety concerns
- Evaluation of software and man-machine interfaces
- Evaluation of design change impacts
- Simplification of maintenance and troubleshooting procedures
3.16.3 Timing. An FTA can be performed as early as the product Concept/Planning phase; however, application in the early stages of Design/Development is the most productive. The FTA results are useful for driving preliminary design approaches and possible reconfiguration of the product.
3.16.4 Application Guidelines. Similar to a Failure Mode and Effects Analysis, an FTA will identify major failure modes of the product based on lower level failures. The product design reliability can then be improved by eliminating the causes of those failures. Some implementation guidelines are provided in Table 30.
Table 30. Implementation of Fault Tree Analysis vs. Failure Mode and Effects Analysis
| Condition | FTA Preferred | FMEA Preferred |
|---|---|---|
| Primary concern is safety of public or operating and maintenance personnel | X | |
| A small number of clearly differentiated "top events" can be explicitly identified | X | |
| "Top events" cannot be explicitly defined or are not limited to a small number | | X |
| Completion of a functional profile is of critical importance | X | |
| Multiple potentially successful functional profiles are feasible | | X |
| Primary concern is the identification of "all possible" failure modes | | X |
| High potential for failure due to "human error" contributions | X | |
| High potential for failure due to "software error" contributions | X | |
| Primary concern is a quantified "risk evaluation" | X | |
| Product functionality is highly complex and/or it contains highly interconnected functional paths | X | |
| Product functionality is basically linear, with little human or software intervention | | X |
| Product is not repairable once its function has been initiated (e.g., space systems) | X | |
A basic Fault Tree Analysis relates an undesired event to possible causes through a tree-like network branching at "AND gates" and "OR gates." For example, Figure 22 shows a partial fault tree for the event that an automobile will not start. It shows that the problem may be due to electrical or fuel factors, and that one electrical factor could be the combination of a weak battery and an unheated garage on a cold day. Table 31 explains the symbology used.
Figure 22. Example FTA: Car Won't Start
Table 31. FTA Symbology
Cut Set Analysis. A cut set is a combination of basic events (the circles in Table 31) that results in the undesired event. When one basic event alone can cause the end event (a cut set of one element), it is referred to as a single point of failure. A minimum cut set is a cut set from which no event can be removed without losing the ability to cause the end event; that is, it contains no smaller cut set. For example, the cut sets of Figure 23 are (events 1 and 3), (2 and 4), (3), and (4). Since event 3 alone is a single point of failure, the cut set (1 and 3) is redundant. Since event 4 is also a single point of failure, the cut set (2 and 4) is also redundant. Hence, the minimum cut sets for Figure 23 are (3) and (4), two single points of failure. In a qualitative analysis of a fault tree, the smallest cut sets are given the most attention, with single points of failure considered first.
Figure 23. Fault Tree Analysis Problem
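The redundancy check described above can be sketched in a few lines of Python: a cut set that contains a smaller cut set adds nothing. The function name is illustrative. The input is the list of Figure 23 cut sets given in the text, not the tree itself. Deriving cut sets from the gate logic (for example, with a top-down expansion such as MOCUS) would be a separate step.

```python
def minimal_cut_sets(cut_sets):
    """Keep only cut sets that contain no smaller cut set (drop redundant supersets)."""
    unique = {frozenset(cs) for cs in cut_sets}
    minimal = [s for s in unique if not any(other < s for other in unique)]
    # Smallest first, since the smallest cut sets get the most attention.
    return sorted(minimal, key=lambda s: (len(s), sorted(s)))


figure_23 = [{1, 3}, {2, 4}, {3}, {4}]
mcs = minimal_cut_sets(figure_23)
single_points = [min(s) for s in mcs if len(s) == 1]
print([sorted(s) for s in mcs], single_points)  # [[3], [4]] [3, 4]
```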
As a more detailed example, Figure 24 presents a fault tree for passenger injury in an elevator. Each failure mode and its possible causes are indicated.
Figure 24. Example of a Fault Tree for Electromechanical Passenger Elevator
Quantitative Fault Tree Analysis. Quantitative analysis should be considered in later product design stages. When the probability of each basic event can be estimated, it is possible to compute a number, called the criticality, from which the relative importance of the event can be determined. The criticality number is computed by multiplying the probability of the basic event by the conditional probability that the end event will happen, given that the basic event has occurred. For example, consider the fault tree presented in Figure 25.
Figure 25. Quantitative Fault Tree
The number under each basic event is the probability that it will occur. The conditional probability that the end event will occur is determined from probability theory. For example, to determine the criticality of event 1, multiply its probability of occurrence (.01) by the probability that the end event will occur, given that event 1 has happened. From Figure 25, the end event (H) will occur when both events A and B occur. Hence, its probability is the product of the probability that A will occur and the probability that B will occur.
Since event A is connected to its causes (events 1 and 2) by an AND gate, the AND gate probability equation applies. When calculating the criticality of event 1, however, the event is assumed to have occurred and its probability will be set to 1.0 so the probability of event A, given event 1 has occurred, is simply the probability that event 2 will occur (.03).
Event B is connected by an OR gate to its causing events, so either event 3 or event 4 will cause event B. To calculate its probability, note that the probability of B occurring is one minus the probability that it will not occur, and that the probability of B not occurring is the product of the probability that event 3 will not occur times the probability that event 4 will not occur. Further, the probability that event 3 (or event 4) will not occur is one minus the probability that it will occur.
Note that when calculating the criticality of either event 3 or event 4, the probability of event B happening will be 1.0, since either event will cause event B, and the event whose criticality is being computed is assumed to have happened (i.e., has a probability of occurrence of 1.0).
Using the AND and OR gate equations, the criticality of each of the four basic events of Figure 25 can be computed. The results are given in Table 32, which shows that events 1 and 2 are the most critical, and event 3 is the least critical.
Table 32. FTA Criticality Results
| Basic Event | P(Xi) | P(A/Xi) | P(B/Xi) | P(H/Xi) | Criticality P(Xi) [P(H/Xi)] |
|---|---|---|---|---|---|
| 1 | .01 | .03 | .09 | .0027 | .000027 |
| 2 | .03 | .01 | .09 | .0009 | .000027 |
| 3 | .04 | .0003 | 1.0 | .0003 | .000012 |
| 4 | .05 | .0003 | 1.0 | .0003 | .000015 |
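The criticality calculation can be reproduced with a short Python sketch, assuming independent basic events as the text does. The gate structure follows the description of Figure 25: H = A AND B, A = events 1 AND 2, and B = events 3 OR 4. The function names are illustrative. The sketch keeps the exact OR-gate value P(B) = 0.088 rather than rounding it to .09, so events 1 and 2 come out at about 2.6 × 10^-5 instead of .000027. The ranking in Table 32 is unchanged.

```python
# Basic-event probabilities read from Figure 25 (Table 32, column P(Xi)).
BASIC = {1: 0.01, 2: 0.03, 3: 0.04, 4: 0.05}


def p_and(*probs):
    """AND gate with independent inputs: product of the input probabilities."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def p_or(*probs):
    """OR gate with independent inputs: 1 minus the probability that none occur."""
    none_occur = 1.0
    for p in probs:
        none_occur *= 1.0 - p
    return 1.0 - none_occur


def top_event(p):
    """Figure 25 structure: H = A AND B, with A = 1 AND 2 and B = 3 OR 4."""
    a = p_and(p[1], p[2])
    b = p_or(p[3], p[4])
    return p_and(a, b)


def criticality(event, p=BASIC):
    """P(Xi) * P(H/Xi): set the event's probability to 1.0, then re-evaluate H."""
    conditioned = dict(p)
    conditioned[event] = 1.0
    return p[event] * top_event(conditioned)


for event in BASIC:
    print(event, f"{criticality(event):.2e}")
```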
3.17 Finite Element Analysis
3.17.1 Purpose.
Simulation techniques provide very effective assessments of mechanical and thermal robustness of product designs prior to production. Finite Element Analysis (FEA) is a simulation technique, usually computer implemented, that estimates material response to loads or environmental exposure. The analysis can be used as a design tool to assess the potential for thermal or mechanical failure in reaction to the expected loads before manufacturing or testing of the product takes place.
3.17.2 Benefits.
The benefits of a Finite Element Analysis are the early discovery of life-limiting material deficiencies and the uncovering of excess loading conditions. Once a deficiency is identified, either more robust components or isolation techniques can be introduced to reduce the load's impact on product design reliability before production decisions are finalized. Because the analysis is performed before product manufacturing, it can uncover problems and evaluate solutions without requiring materials testing.
3.17.3 Timing.
The most effective FEA occurs when the product or item is developed to the point where the material and design properties can be clearly defined. This suggests a time frame after the preliminary (Concept/Planning) design phase and before completion of the product Design/Development phase. Since Finite Element Analyses are time-consuming and costly, the items to be analyzed should be selected very carefully. Performed during the design phase, this analysis can identify problems that could cause catastrophic failure events.
3.17.4 Application Guidelines.
A Finite Element Analysis is the breakdown of a product into one or more elements that can be represented by mathematical models of an idealized structure. Each structure is represented by a grid of node points with interconnections. Without computers to solve these models, the technique is restricted to the simplest or most idealized problems. With high-speed digital computers, the scope of this analysis has been expanded to complex items such as very high speed integrated circuits (VHSIC), which can be analyzed for mechanical displacement resulting from a mismatch between their thermal coefficients of expansion and that of the circuit board. With a computer, a solution can be obtained by combining individual elements into an idealized structure for which conditions of equilibrium and compatibility are satisfied.
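The Python sketch below illustrates the assemble-and-solve idea on the simplest possible structure. It divides a uniform axial bar into two-node elements and adds each element's stiffness into a global matrix where elements share nodes (compatibility). It then applies a fixed support and solves for the nodal displacements (equilibrium). The bar, its dimensions, material and load, and all function names are hypothetical. A real analysis of a circuit package would use a 2-D or 3-D mesh and a dedicated FEA code, not this hand-rolled solver.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def bar_displacements(length, area, modulus, force, n_elements):
    """Axial bar fixed at node 0 with a tip force at the last node.

    Each two-node element has stiffness k = E*A/h and local matrix k*[[1,-1],[-1,1]].
    """
    h = length / n_elements
    k = modulus * area / h
    n_nodes = n_elements + 1
    K = [[0.0] * n_nodes for _ in range(n_nodes)]
    for e in range(n_elements):            # assembly: shared nodes enforce compatibility
        i, j = e, e + 1
        K[i][i] += k
        K[i][j] -= k
        K[j][i] -= k
        K[j][j] += k
    f = [0.0] * n_nodes
    f[-1] = force                           # external load at the free end
    # Boundary condition u0 = 0: drop row and column 0, then solve K u = f (equilibrium).
    K_red = [row[1:] for row in K[1:]]
    u = solve_linear(K_red, f[1:])
    return [0.0] + u


# Hypothetical steel bar: 1 m long, 1 cm^2 section, E = 200 GPa, 1 kN tip load.
u = bar_displacements(length=1.0, area=1e-4, modulus=200e9, force=1_000.0, n_elements=4)
print(u[-1])
```

For this load case the tip displacement should match the closed-form result F·L/(E·A) = 5 × 10^-5 m, which is a convenient sanity check on the assembly.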
The most difficult and time-consuming part of any Finite Element Analysis is establishing the detailed mathematical models and conditions. Therefore, selection of items to be analyzed should be performed very carefully. Selection criteria should include:
- New materials or technologies
- Severe environmental load conditions
- Critical thermal or mechanical constraints
The general steps to be followed in performing the FEA are presented in Table 33.
Table 33. Steps for Performing a Finite Element Analysis

3.18 Modeling and Simulation
3.18.1 Purpose. There are a number of purposes for the development of a reliability model to support the design of a product:
- Present a clear picture of the functions and interfaces of the product.
- Establish the basis for design and/or reliability trade-off analysis.
- Allow evaluation of the product before the design is finalized.
- Identify problem areas and possible solutions.
- Present the basis for selection of reliable parts, materials and processes.
3.18.2 Benefits. The use of a reliability model will help guide design decisions for selection of reliable parts, materials and processes. The model should provide a means for evaluating quantitative reliability, assessing the design goals/objectives versus the customer's need, incorporating fault tolerance into the design, or incorporating other design reliability improvement techniques.
3.18.3 Timing. A reliability model should be developed as early as possible in the design process, including the product Concept/Planning phase, even if numerical data is not yet ready. The early models can reveal conditions or functions where the product may fail to meet customer reliability needs. At this time, major influences on design decisions can be incorporated with a minimum of impact.
3.18.4 Application Guidelines. All reliability models can be categorized into two fundamental types, series and parallel, as described in Table 34.
Table 34. Characteristics of Series and Parallel Reliability Models
Table 35 outlines three additional types of models which can be used to support reliable product designs.
Table 35. Three Reliability Modeling Types
Monte Carlo Simulation. Monte Carlo simulation synthesizes product reliability from a reliability block diagram by means of random sampling. This method is used when the reliability model is too complex to derive a general equation for its solution. The technique yields a numerical estimate of the probability of success rather than a general expression relating it to the component probabilities. The process is usually performed by computer because of the many repetitive trials, and it is based on the law of large numbers: the larger the sample, the more certain it is that the sample mean is a good estimate of the true (population) mean. Monte Carlo simulation is applicable to single- and multi-function products.
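Here is a minimal Monte Carlo sketch in Python. It assumes independent components whose pass/fail states are sampled from their reliabilities. The block diagram is the redundant configuration from the example that follows: two copies of component "a" in parallel, in series with "b", "c" and "d". The labels `a1` and `a2` and the function names are hypothetical. This diagram also has a simple closed-form answer (about 0.849), so the estimate can be checked against it. In practice Monte Carlo is most useful on diagrams where no convenient expression exists.

```python
import random


def product_works(up):
    """Structure function: (a1 OR a2) AND b AND c AND d."""
    return (up["a1"] or up["a2"]) and up["b"] and up["c"] and up["d"]


def monte_carlo_reliability(component_r, works, trials, seed=1):
    """Estimate P(product works) by sampling each component's state independently."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        # A component is "up" on this trial with probability equal to its reliability.
        up = {name: rng.random() < r for name, r in component_r.items()}
        if works(up):
            successes += 1
    return successes / trials


R = {"a1": 0.90, "a2": 0.90, "b": 0.95, "c": 0.95, "d": 0.95}
estimate = monte_carlo_reliability(R, product_works, trials=100_000)
print(f"{estimate:.3f}")
```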
Example of a Conventional Probability Model. A product has four components, with a reliability of 0.90 for component "a" and 0.95 for each of components "b", "c" and "d". The customer desires a reliability of 0.80. What design configuration may be considered appropriate for this product?
Configuring a series model for the product, the results are:
$$R_{series} = R_a R_b R_c R_d = (0.90)(0.95)(0.95)(0.95) \approx 0.77$$
Since this configuration fails to achieve the reliability need (0.80), a parallel configuration for the weakest link is proposed as follows:
$$R_{parallel} = \left[1 - (1 - R_a)^2\right] R_b R_c R_d = (0.99)(0.95)(0.95)(0.95) \approx 0.85$$
As can be seen, the 0.85 parallel reliability model achieves the stated need.
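The series and parallel calculations in this example can be checked with a small Python sketch. It assumes independent failures and active redundancy (both units operating). The helper names are illustrative.

```python
from math import prod


def series(*reliabilities):
    """Series blocks: every block must work, so reliabilities multiply."""
    return prod(reliabilities)


def parallel(*reliabilities):
    """Active-parallel blocks: the group fails only if every block fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


r_a, r_b, r_c, r_d = 0.90, 0.95, 0.95, 0.95
need = 0.80

baseline = series(r_a, r_b, r_c, r_d)
redundant = series(parallel(r_a, r_a), r_b, r_c, r_d)  # duplicate the weakest link
print(f"series: {baseline:.2f}, meets need: {baseline >= need}")
print(f"redundant 'a': {redundant:.2f}, meets need: {redundant >= need}")
```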
3.19 Part Obsolescence
3.19.1 Purpose. This task has a twofold purpose:
- Minimize the use of obsolete, or soon to be obsolete, parts and materials (and their sources or suppliers) in the design process.
- Plan for alternate parts or suppliers to replace potentially obsolete parts and diminishing sources.
Successful management of this area requires close attention in order to maintain parts and materials availability to support the product throughout its useful life.
3.19.2 Benefits. The benefits of a parts obsolescence management program include timely part availability and the use of preferred manufacturing processes. This process will prevent added costs and schedule delays that may result from needed part substitutions or design changes to resolve obsolescence or diminishing source issues.
3.19.3 Timing. Part and supplier obsolescence management should be a basic component of the supplier's operating/design/manufacturing procedures (i.e., best commercial practices) and should be essentially product-independent. It should revolve around needed components, operating environmental requirements and package styles. Implementation prior to the start of design will ensure reliable product operation and adequate repair support.
3.19.4 Application Guidelines. To ensure part and material availability during product design, manufacturing and field service, two areas of concern should be addressed. The first is obsolescence, which occurs when parts are still required for product manufacture or support but are no longer manufactured because there is insufficient market demand. It is common for a product's lifetime to extend beyond the life cycles of its constituent parts.
The second area that must be considered is the potential for diminishing sources, causing parts that are not yet obsolete to become difficult to obtain. This can be the result of the manufacturer experiencing limited orders, a downsizing market, industry instability, or a business decision to exit the market for a particular technology or device. Regardless of the reason, the part is unavailable, and the effect is essentially the same as if the part had become obsolete. Even with proactive management of parts and suppliers to alleviate or minimize obsolescence and diminishing-source problems, short- and long-term solutions are necessary once end-of-life parts are identified. The short-term solution begins when a device becomes unavailable and usually results in a part substitution. The long-term solution ensures future product producibility (i.e., redesign of the product at the printed wiring assembly (PWA) or box level).
When implementing a fix for a specific obsolescence problem, the long- and short-term solutions may differ. A "band-aid" or short-term approach should not be emphasized, since the long-term approach addresses producibility, thereby avoiding the costs associated with a disruptive design change every 2-3 years.
Early notification of part/supplier end-of-life status provides time to select an acceptable solution that will minimize the impact on manufacturing. External sources such as the Defense Logistics Agency/Defense Electronic Supply Center (DLA/DESC), the Government-Industry Data Exchange Program (GIDEP), and suppliers themselves can be used as sources of early notification. Figure 26 illustrates a process flow for developing short- and long-term solutions when this notification is received. The key point is that the effort does not stop once a part or vendor is found and a temporary solution exists; a trade study using the stated factors should be performed to ensure a long-term solution.
Figure 26. Part Obsolescence Solution Flowchart
3.20 Predictions
3.20.1 Purpose. The primary purpose of a reliability prediction is to provide guidance relative to the expected reliability for a product as compared to the customer's need, expressed or implied, for the product. The use of a prediction is a means of developing information for design analysis without actually testing and measuring the product capabilities.
3.20.2 Benefits. Predictions provide an array of benefits to a product development, including:
- Determining the feasibility of a proposed product's design reliability
- Comparison of predicted reliability to the product reliability goals/objectives
- A means of ranking or identifying potential reliability design problem areas
- Evaluation of alternate designs, parts, materials and processes
- A quantitative basis for design trade studies without resorting to testing
3.20.3 Timing. Early predictions are strongly encouraged in the product Concept/Planning phase. This is when most decisions are made regarding parts, materials and processes. The first analysis should be considered as soon as the initial design data is available. Predictions should be continued throughout the design process, being updated as more detailed design information becomes available. The later predictions evaluate stress conditions and life limiting constraints, as well as identify design problem areas.
Table 36 illustrates suggested prediction methods, along with the level of product design at which the prediction may be performed.
Table 36. Reliability Hierarchy Prediction Listing
| Level | Example | Phase | Technique |
|---|---|---|---|
| System or Product | Computer Product | Conceptual Design | Similar Item; Part Count |
| Assembly or Component | Processor Assembly | Early Design | Similar Item; Part Count; Reliability Physics |
| Circuit or Part | Microprocessor Part | Detailed Design | Stress Analysis; Reliability Physics; Test Data |
3.20.4 Application Guidelines. Since there are numerous ways and techniques to make reliability predictions, this guideline will be limited to the two general models (empirical and deterministic) and four methods (similar item or circuit, part count, stress analysis and physics-of-failure). Table 37 contains information on when each model type should be considered for a specific application.
Table 37. Reliability Prediction Application
| Empirical Models | Deterministic Models |
|---|---|
| • Use for complex products • Use for quick high-level analysis • Use to compare relative merits of competing designs • Use if no changes to the basic design are allowed • Use to select components or evaluate stresses | • Use for life-limiting failure mechanisms • Use when no historical data are available • Use for detailed component or package design • Use if design flexibility exists • Use to determine the root cause of failure |
Similar Item. This method starts with the collection of past experience data on similar products. The data is evaluated for form, fit and function (FFF) compatibility with the new product. If the product is an item that is undergoing a minor enhancement, the collected data will provide a good comparison to the new product. Small differences in operating environment or conditions can be isolated and evaluated. If the product does not have a direct similar item, then lower level similar circuits can be analyzed. In this case, data for components or circuits are collected and a product reliability value is calculated. The general expression for product reliability calculated from its constituent components using the similar item method is:
Rp = R1 * R2 * ... * Rn
where,
Rp = Product reliability
R1, R2 ... Rn = Component reliability
Example of a Similar Item Analysis. A new computer product is composed of a processor, a display, a modem and a keyboard. The new product is expected to operate in a +40°C environment. Data on similar components was located and is shown in the second column in Table 38. The similar item data is for a unit operating in a 20°C environment. What mean-time-between-failure can be expected for the new system if a 30% technology improvement is expected?
Table 38. Reliability Analysis Similar Item
| Items | Similar Data MTBF (Hrs.) | Temperature* Factor | Improvement Factor | New Product MTBF (Hrs.) |
|---|---|---|---|---|
| Processor | 5,000 | 0.8 | 1.30 | 5,200 |
| Display | 15,000 | 0.8 | 1.30 | 15,600 |
| Modem | 30,000 | 0.8 | 1.30 | 31,200 |
| Keyboard | 60,000 | 0.8 | 1.30 | 62,400 |
| System | 3,158 | | | 3,284 |

* Temperature conversion factor source: "Reliability Toolkit: Commercial Practices Edition", page 176
Each component MTBF is corrected for the change in temperature from 20°C to 40°C. Technology improvements were also included, and the product mean-time-between-failure (MTBF) was calculated using the expression:
$$MTBF_p = \frac{1}{\sum_{i=1}^{n} \lambda_i}, \qquad \lambda_i = \frac{1}{MTBF_i}$$
where,
MTBFp = mean-time-between-failure of the product
λi = failure rate of the ith component
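The Table 38 arithmetic can be reproduced with the Python sketch below. It applies the table's temperature and improvement factors to each component MTBF, then combines the components by summing their failure rates. The dictionary and function names are illustrative.

```python
# Table 38 inputs: similar-item MTBF (hours) for each component.
SIMILAR_MTBF = {"Processor": 5_000, "Display": 15_000, "Modem": 30_000, "Keyboard": 60_000}
TEMPERATURE_FACTOR = 0.8   # 20°C -> 40°C correction used in Table 38
IMPROVEMENT_FACTOR = 1.30  # expected 30% technology improvement


def system_mtbf(component_mtbfs):
    """Series system: failure rates (1/MTBF) add, and system MTBF is 1 / sum."""
    total_failure_rate = sum(1.0 / m for m in component_mtbfs)
    return 1.0 / total_failure_rate


new_mtbf = {
    name: mtbf * TEMPERATURE_FACTOR * IMPROVEMENT_FACTOR
    for name, mtbf in SIMILAR_MTBF.items()
}
print(round(system_mtbf(SIMILAR_MTBF.values())))  # 3158 (similar system)
print(round(system_mtbf(new_mtbf.values())))      # 3284 (new system)
```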
Part Count. The part count method is generally used to analyze electronic circuits in the early design phase, when the number and type of parts in each class (such as capacitor, resistor, transistor, microcircuit, etc.) are known and the overall design complexity is likely to change appreciably during later phases of design/development. The method starts with a listing of the part types and expected quantities. Reliability data are then taken from source books such as MIL-HDBK-217 "Reliability Prediction of Electronic Equipment." The failure rate of each part type is multiplied by its quantity, and the results for all part types are summed to determine the product failure rate. This method assumes that the time-to-failure of the parts is exponentially distributed. The general expression for a product failure rate using this method is:
$$\lambda_{product} = \sum_{i=1}^{n} N_i \, \lambda_{G_i} \, \pi_{A_i}$$
where,
λproduct = Total failure rate (failures per unit time)
λGi = Generic failure rate for the ith generic part
πAi = Adjustment factor for the ith generic part (quality factor, temperature factor, environmental factor)
Ni = Quantity of ith generic part
n = Number of different generic part categories
Example of a Part Count Analysis. An electronic receiver is analyzed using the part count method. The part types and quantities are indicated in Table 39. The part failure rate data was obtained from MIL-HDBK-217 for a ground mobile (GM) environmental condition. An adjustment to an airborne inhabited cargo (AIC) environment is needed. What is the estimated reliability of the receiver in terms of mean-time-between-failure (MTBF)?
Table 39. Electronic Receiver Reliability Part Count Analysis
| Device | Quantity | GM Failure Rate (Failures/10^6 Hrs.) | Adjustment* Factor GM to AIC | Component Type Failure Rate (Failures/10^6 Hrs.) |
|---|---|---|---|---|
| Microcircuit | 25 | 0.06 | (1/1.4) = 0.71 | 1.07 |
| Diode | 50 | 0.001 | (1/1.4) = 0.71 | 0.04 |
| Transistor | 25 | 0.002 | (1/1.4) = 0.71 | 0.04 |
| Resistor | 100 | 0.002 | (1/1.4) = 0.71 | 0.14 |
| Capacitor | 100 | 0.008 | (1/1.4) = 0.71 | 0.57 |
| Switch | 25 | 0.02 | (1/1.4) = 0.71 | 0.36 |
| Relay | 10 | 0.40 | (1/1.4) = 0.71 | 2.84 |
| Transformer | 2 | 0.05 | (1/1.4) = 0.71 | 0.07 |
| Connector | 3 | 1.00 | (1/1.4) = 0.71 | 2.13 |
| Circuit Board | 1 | 0.70 | (1/1.4) = 0.71 | 0.50 |
| Totals (λT) | | | | 7.76 |
$$MTBF_{Total} = \frac{1}{\lambda_T} = \frac{1}{7.76 \times 10^{-6}} \approx 128{,}866 \text{ hours}$$
The product reliability is determined by multiplying the quantity of each part type by its failure rate, then adjusting the failure rate from GM to AIC environmental conditions. The failure rate results of the parts are then summed to determine the product failure rate.
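Here is a Python sketch of the same part count arithmetic, using the quantities, GM failure rates and GM-to-AIC factor from Table 39. The names are illustrative. The sketch keeps the unrounded factor 1/1.4 and does not round each row. It therefore gives a total of about 7.79 failures per 10^6 hours and an MTBF of roughly 128,400 hours. Table 39's 7.76 and 128,866 hours reflect using 0.71 and rounding each row before summing.

```python
# Table 39: part type -> (quantity, GM generic failure rate in failures per 10^6 hours).
PARTS = {
    "Microcircuit": (25, 0.06),
    "Diode": (50, 0.001),
    "Transistor": (25, 0.002),
    "Resistor": (100, 0.002),
    "Capacitor": (100, 0.008),
    "Switch": (25, 0.02),
    "Relay": (10, 0.40),
    "Transformer": (2, 0.05),
    "Connector": (3, 1.00),
    "Circuit Board": (1, 0.70),
}
GM_TO_AIC = 1 / 1.4  # environmental adjustment factor from Table 39


def part_count_failure_rate(parts, adjustment):
    """Sum of N_i * lambda_Gi * pi_Ai over all part types (failures per 10^6 hours)."""
    return sum(qty * rate * adjustment for qty, rate in parts.values())


lam_total = part_count_failure_rate(PARTS, GM_TO_AIC)
mtbf_hours = 1e6 / lam_total  # rates are per 10^6 hours
print(f"lambda_T = {lam_total:.2f} failures/10^6 h, MTBF = {mtbf_hours:,.0f} h")
```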
Part Stress Analysis. The part stress analysis method is used in the detailed Design/Development phase when individual part level information and design stress data is available. The method requires the use of defined models that include electrical and mechanical stress factors, environmental factors, duty cycles, etc. Each of these factors must be known, or be capable of being determined, so that the effects of those stresses on the part's failure rate can be evaluated. Table 40 shows several major factors which influence device reliability.
Table 40. Major Influence Factors for Device Reliability
| Device Type | Influence Factors |
|---|---|
| Integrated Circuits | • Temperature • Complexity • Supply Voltage |
| Semiconductors | • Temperature • Power Dissipation • Breakdown Voltage • Material |
| Resistors | • Temperature • Power Dissipation • Type |
| Capacitors | • Temperature • Voltage • Type |
| Inductive Devices | • Temperature • Current • Voltage • Insulation |
| Switches and Relays | • Current • Contact Power • Type |
A typical empirical mathematical model is illustrated as follows (using a ceramic trimmer capacitor as an example):
$$\lambda_p = \lambda_b \, \pi_T \, \pi_C \, \pi_V \, \pi_Q \, \pi_E$$
where,
λp = Trimmer capacitor failure rate
λb = Base failure rate (laboratory failure rate in the absence of dynamic stresses)
πT = Temperature factor
πC = Capacitance factor
πV = Voltage stress factor
πQ = Quality factor
πE = Environmental factor (accounts for dynamic stresses in the end-user environment)
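Because the model is a simple product of factors, it is easy to evaluate in code once each factor has been looked up. In the Python sketch below, every numeric value is hypothetical and chosen only for illustration. Real base failure rates and π factors come from the handbook model for the specific part. The sketch also shows a consequence of the multiplicative form: changing one factor changes λp by exactly that factor's ratio.

```python
from math import prod


def part_stress_failure_rate(base_rate, pi_factors):
    """Multiplicative part stress model: lambda_p = lambda_b * product of pi factors."""
    return base_rate * prod(pi_factors.values())


# Hypothetical inputs (NOT handbook values); failure rates in failures per 10^6 hours.
lambda_b = 0.0020
pi = {"T": 1.5, "C": 1.1, "V": 1.3, "Q": 3.0, "E": 4.0}
baseline = part_stress_failure_rate(lambda_b, pi)

# Hypothetical higher voltage stress: only the pi_V factor changes.
stressed = part_stress_failure_rate(lambda_b, {**pi, "V": 2.6})
print(f"baseline = {baseline:.5f}, stressed = {stressed:.5f}, ratio = {stressed / baseline:.2f}")
```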
A stress-temperature failure rate plot for this example is shown in Figure 27. As can be seen from the plot, the failure rate increases as the temperature goes up, or as the applied stress (voltage) increases.
Figure 27. Trimmer Ceramic Capacitor Failure Rates/Stress Plot
Physics-of-Failure.
A physics-of-failure analysis looks at individual failure mechanisms such as electromigration, solder joint cracking, die bond adhesion, etc. to estimate the probability of device wearout within the useful life of the product. This analysis requires detailed knowledge of all material characteristics, geometries, and environmental conditions. Specific models for each failure mechanism are available from a variety of reference books. A typical model for bond pad/die shear fatigue is illustrated below, where the dependent coefficients are determined through the use of published manuals on material characteristics.
$$t_{50} = A_2 \, (K_2 \, \Delta T)^{n_2} \tag{1.2}$$
where,
t50 = Mean-time-to-failure (hrs.)
A2 = Pad material dependent coefficient
K2 = Die material dependent coefficient
n2 = Wire material dependent coefficient
ΔT = Temperature change at bond pad and die (°C)
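When two temperature swings are compared, A2 and K2 cancel. The ratio of lives is then (ΔT_new / ΔT_ref)^n2, so relative life can be estimated before those two coefficients are known. The Python sketch below illustrates this. The exponent value is hypothetical, chosen negative so that life shortens as the temperature swing grows, as expected for a fatigue mechanism. The function names are illustrative.

```python
def t50(a2, k2, delta_t, n2):
    """Bond pad/die shear fatigue model from the text: t50 = A2 * (K2 * dT) ** n2."""
    return a2 * (k2 * delta_t) ** n2


def life_ratio(delta_t_ref, delta_t_new, n2):
    """t50 at delta_t_new divided by t50 at delta_t_ref; A2 and K2 cancel out."""
    return (delta_t_new / delta_t_ref) ** n2


N2 = -2.0  # hypothetical exponent, NOT a published material value
# Halving the temperature swing at the bond pad from 60 °C to 30 °C:
print(life_ratio(60.0, 30.0, N2))
```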
3.21 Repair Strategy
3.21.1 Purpose. The purpose of having a repair strategy is to define how repairs will be made, who will perform them, and when or where they will be made. From this information, design decisions affecting fault detection, remove-and-replace techniques, accessibility to internal components, spares, training and test equipment can be made.
3.21.2 Benefits. A well designed product that includes ease of access, visible failure indicators, repair manuals, spare parts and other features to improve the restoration of the product to operation will lower repair costs and increase customer satisfaction.
3.21.3 Timing. Repair strategy starts in the Concept/Planning phase, since this is when decisions are made on whether to repair the product or throw it away, on appropriate levels of testability, and on support criteria. Adding repair functions in the later stages of design or in production is not cost effective.
3.21.4 Application Guidelines. Each product has to be evaluated considering who the customer is, who will repair the item, what the conditions for repair are, etc. A general checklist is provided in Table 41.
Table 41. Repair Strategy Characteristics
| Criteria | Characteristics |
|---|---|
| Support Features | • Product self test • Test equipment • Accessibility • Safety • Repair personnel skill levels • Maintenance manuals • Spare parts/products • Product test • Circuit test • Component test |
| Product Failure Fixed By: Manufacturer | High cost, slow turnaround, spares required, sophisticated test equipment, high repair personnel skill levels |
| Product Failure Fixed By: Repair Shop | Moderate cost, moderate turnaround, repair manuals needed, commercial test equipment, moderate repair personnel skill levels |
| Product Failure Fixed By: Customer | Low cost, fast turnaround, repair manuals needed, built-in test, easy accessibility, low skill levels |
| Product Repair Process: Throw Away | Decision based on product replacement cost, ease of product repair, availability of spare products; cost to the customer is the cost of the product |
| Product Repair Process: Fix by Component Replacement | Decision based on overall cost of product, accessibility for repair, availability of spare components, appropriate test equipment, degree of built-in test; cost to customer should be less than cost of the product |
| Product Repair Process: Fix by Manufacturer | Decision based on product self-test capabilities, appropriate test equipment, skill levels of repair personnel, product safety features; cost to customer is generally 30-70% of original product cost for each failure |
The next step following determination of the needed repair features is to design the appropriate features into the product. For example, if a product is to be repaired by a repair shop and the action is to fix a specific failure condition, then test equipment, maintenance manuals, spare parts and at least circuit level testability should be established as the design repair strategy.
Example of an Automotive Computer. Given that a computer is going to be used as the controller of the main fuel-air mixer, what repair strategies should be considered by the manufacturer? Because of the complexity of signals and operating procedures of the on-board computer, the repair should be performed at a qualified repair shop or by the manufacturer's representative. This means that maintenance manuals, test equipment compatibility, and circuit testability are all necessary. The product itself can either be a throw-away or a factory repair item, depending on the final cost of the product. Some of the internal component and circuit test features that could be considered are shown in Figure 28.

Figure 28. Hierarchical Testability
3.22 Sneak Circuit Analysis
3.22.1 Purpose. The purpose of sneak circuit analysis is to find and fix each sneak failure cause in order to improve the product design. Since sneak circuits are hidden in the design, common failure prevention methods such as stress analysis, derating, redundancy, or environmental screening will have little impact. The only preventive measure for identifying sneak circuits is in-depth circuit analysis, performed manually or with a computer-aided tool.
3.22.2 Benefits. Finding and correcting design flaws before selling or using a product will enhance customer satisfaction. In the past, because the cost of doing in-depth sneak circuit analysis manually was prohibitive, only critical circuits were analyzed. With the development of automated tools, all computer-aided designs can be checked almost as easily as a text document can be spell-checked. These tools increase the scope of application significantly. Specific benefits include:
- Detection of hidden failures
- Prevention of costly redesigns
- Verification of circuit interface integrity
- Ensuring high reliability
3.22.3 Timing. To maximize the benefit of a sneak circuit analysis on a product design, an automated design analysis should be performed as the computer aided design progresses through the product Design/Development phase. This procedure will allow the designer to correct flaws "on the fly" without significant schedule or cost impact. If manual analysis is proposed, performance of the sneak circuit analysis should be delayed until the product design is firm, so that only one analysis is performed (driven by the high cost of analyzing interim solutions).
3.22.4 Application Guidelines.
A preliminary step to performing sneak circuit analysis is the understanding of the definitions and the causes of sneak circuits. A brief summary of these follows:
- Sneak Circuit: A condition which causes the occurrence of an unwanted function, or inhibits a desired function, even though all components function properly.
- Sneak Timing: Unexpected interruption or enabling of a function due to a switching fault. Usually occurs within a single function timing plan, with little influence from unrelated functions.
- Sneak Paths: Unintended control or power paths connecting product functions that enable functions to influence each other. Usually occurs between unrelated functions that are tied to common power, ground, or control mechanisms.
- Sneak Indications: Incorrect or ambiguous specification of sensors that do not clearly define their purpose or methods of operation. Usually impacts a product through inaccurate measurements during product operation.
- Sneak Labels: Incorrect or ambiguous documentation of designs or production drawings which leads to conflicting interpretations of their purpose. Usually impacts a product through production flaws.
- Sneak Clues: Design rules, guidelines, and insights applied to topographical patterns by sneak circuit analysis specialists to identify potential sneak conditions. A sneak clue is often proprietary information that is constantly updated to account for new technologies and design methods.
- Topographical Patterns: Forms used to model system networks that enable analysts to apply sneak clues used in performing a sneak circuit analysis.
Causes of Sneak Circuits
- Complexity of design
- Interfaces between distinct functions
- Inadequate understanding of the product design
- Integration of multiple units
- Design constraints (e.g., volume, weight, or power)
The first step is the selection of an appropriate analysis technique. Table 42 illustrates three types of common sneak circuit analyses.
Table 42. Sneak Circuit Analysis Techniques
| Type of Analysis | Characteristics |
|---|---|
| Sneak Path: A methodical investigation of all possible circuit paths in an electrical/electronic product. | Used primarily for detecting sneak circuits in hardware products and systems, such as power distribution, control, switching networks, and analog circuits. The analysis is based on known topological similarities of sneak circuits in these types of products. |
| Digital Sneak Circuit: An analysis of digital hardware networks for sneak conditions, operating modes, timing races, logical errors, and inconsistencies. | Depending on product complexity, digital sneak analysis may involve the use of sneak path analysis techniques, manual or graphical analysis, computerized logic simulators, or computer-aided design circuit analysis. |
| Software Sneak Path: An adaptation of sneak path analysis to the logical flows of computer program code. | Analyzes software logical flows by comparing their topologies to those known to contain sneak path conditions. |
After selecting a technique, the second step in a sneak circuit analysis is to understand the topological patterns. These patterns are the keystones of hardware and software analysis. Typical topographical patterns are illustrated in Figures 29 and 30.
Figure 29. Software Topographs
The third step in the sneak circuit analysis is to transform the product schematic diagrams into network tree diagrams. Finally, the analyst attempts to identify the basic topological patterns shown in Figures 29 and 30 within the network trees. If a pattern is identified, design corrections can then be determined.
Figure 30. Hardware Topographs
Example of an SCA Solution. Given a circuit that uses multiple power sources as a redundancy feature, a sneak circuit can arise when the power is switched. The problem is shown in Figure 31(A): if both power switches are engaged simultaneously, the supplies are tied together and excess power will be experienced at the load. Two solutions are provided. For direct current (DC) power supplies, isolation diodes can be added to prevent a power-to-power tie at the common load, as shown in (B). For alternating current (AC) supplies, a single-pole, double-throw relay with break-before-make contacts will solve the problem, as shown in (C).
Figure 31. Sneak Circuit Power Distribution Problem
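The path search at the heart of sneak path analysis can be illustrated with a toy directed-graph model of the Figure 31 circuit. This is a minimal sketch under stated assumptions, not a real SCA tool: the node names (PS1, PS2, S1, S2, LOAD) are hypothetical, a closed switch or plain conductor is modeled as a pair of directed edges, and a diode as a single edge from anode to cathode. A "sneak path" here simply means any conducting path from one supply to the other, which a breadth-first search will find if it exists.

```python
from collections import deque

def add_wire(graph, a, b):
    """A closed switch or plain conductor conducts in both directions."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def add_diode(graph, anode, cathode):
    """A diode (idealized) conducts only from anode to cathode."""
    graph.setdefault(anode, set()).add(cathode)
    graph.setdefault(cathode, set())

def has_path(graph, start, goal):
    """Breadth-first search for any conducting path from start to goal."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def circuit_a():
    # (A): both supply switches closed, supplies wired straight to the load bus
    g = {}
    add_wire(g, "PS1", "S1")
    add_wire(g, "S1", "LOAD")
    add_wire(g, "PS2", "S2")
    add_wire(g, "S2", "LOAD")
    return g

def circuit_b():
    # (B): isolation diodes between each closed switch and the load bus
    g = {}
    add_wire(g, "PS1", "S1")
    add_diode(g, "S1", "LOAD")   # hypothetical D1
    add_wire(g, "PS2", "S2")
    add_diode(g, "S2", "LOAD")   # hypothetical D2
    return g

print("A, supply-to-supply path:", has_path(circuit_a(), "PS1", "PS2"))
print("B, supply-to-supply path:", has_path(circuit_b(), "PS1", "PS2"))
```

In this model the diodes remove the supply-to-supply path while each supply can still reach the load. The relay solution (C) would correspond to a switch model that never allows both supply edges to be present at once.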
3.23 Worst Case Circuit Analysis (WCCA)
3.23.1 Purpose. Worst Case Circuit Analysis (WCCA) is used during the design process to identify design problems, safety risks, and end-of-life shortfalls, and to confirm satisfactory performance levels, so that the potential design impact can be assessed and corrective action taken.
3.23.2 Benefits. The benefits associated with conducting a worst case circuit analysis are that it:
- Identifies parts exceeding derating limits
- Analyzes circuits for design faults
- Identifies components that may be overstressed
- Provides a realistic estimate of true worst case performance
- Provides information on possible life limiting conditions and components
- Exposes failures that may be safety risks
3.23.3 Timing. Due to the need for detailed information on the design, materials, parts and processes, this analysis technique is not recommended for application during the product Concept/Planning phase. The best times would be after the initial design review (early Design/Development) or just before the final review (late Design/Development). The farther along in the design and development phase that WCCA is performed, the more expensive it will be to introduce changes in the design.
3.23.4 Application Guidelines. There are several techniques for performing a WCCA, each with its own advantages and disadvantages. To perform an analysis quickly and accurately, a computer circuit analysis program that is compatible with the computer-aided design tools has a decided advantage. Three techniques are described: extreme value analysis, root-sum-squared analysis, and Monte Carlo analysis.
Extreme Value Analysis. Extreme value analysis (EVA) evaluates a circuit output with all variables set to their worst possible values. For example, the output frequency of a filter will vary as the parameters of its components vary away from nominal. To do an extreme value analysis, the worst expected values of each component, in both directions from nominal, must be obtained. The output frequency is then calculated with all the components at their extreme values in the direction that would increase the output frequency, and again with all components at their extreme values in the direction that would decrease it. The calculated values are then compared to the specified limits to evaluate the robustness of the circuit. If the frequency is within specified limits when the components are at extreme values, part variation should be no problem in normal operation.
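The filter example can be sketched numerically. This is an illustration under assumptions not in the original: the filter is taken to be a first-order RC low-pass with cutoff frequency 1/(2πRC), and the part values (R = 10 kΩ ±5%, C = 10 nF ±10%) and specification limits (1300 to 1900 Hz) are hypothetical. Rather than reasoning out each component's worst direction, the sketch evaluates every combination of extremes (a "corner" search) and keeps the minimum and maximum; for an output that is monotonic in each parameter, as this one is, that gives the same result as the directional procedure described above.

```python
import itertools
import math

def cutoff_hz(r_ohms, c_farads):
    """Cutoff frequency of a first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

def extreme_value_analysis(func, nominals, tolerances):
    """Evaluate func at every combination of parameter extremes.

    nominals and tolerances are parallel lists; each tolerance is a
    fraction (0.05 means +/-5%). Returns (minimum, maximum) output.
    """
    corners = []
    for signs in itertools.product((-1, 1), repeat=len(nominals)):
        values = [n * (1 + s * t) for n, s, t in zip(nominals, signs, tolerances)]
        corners.append(func(*values))
    return min(corners), max(corners)

# Hypothetical design: R = 10 kOhm +/-5%, C = 10 nF +/-10%
f_nom = cutoff_hz(10e3, 10e-9)
f_min, f_max = extreme_value_analysis(cutoff_hz, [10e3, 10e-9], [0.05, 0.10])
spec_low, spec_high = 1300.0, 1900.0   # hypothetical specification limits (Hz)

print(f"nominal {f_nom:.0f} Hz, worst case {f_min:.0f} to {f_max:.0f} Hz")
print("within spec:", spec_low <= f_min and f_max <= spec_high)
```

The corner search costs 2^n evaluations for n parameters, which is trivial for a few parts but grows quickly; for larger circuits the directional reasoning in the text (or sensitivity analysis) is the practical route.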
Root-Sum-Squared. Root-Sum-Squared (RSS) analysis recognizes that it is rare for all parameters of a part to drift to extreme values. While some variation is biased in a single direction, other changes vary randomly in direction, sometimes helping to compensate for bias variations and sometimes adding to the bias. For example, the initial value of a capacitor will likely vary in a manner described by a normal curve whose mean is the nominal value. The extreme values of this distribution are ordinarily taken as the values at plus and minus three standard deviations from the mean value (the points between which 99.7% of the values will lie). In RSS analysis, the extreme value of each random variation is squared, these values added, and the square root taken of the total. The resulting value is the maximum variation expected due to random factors. This is added to the bias variations to calculate the maximum and minimum worst cases. The process is illustrated in Table 43.
Table 43. Root Sum Squared Analysis of a Capacitor
| Parameter: Capacitance | Bias, Neg. (%) | Bias, Pos. (%) | Random (%) |
|---|---|---|---|
| Initial Tolerance at 25°C | -- | -- | 20 |
| Low Temperature (-20°C) | 28 | -- | -- |
| High Temperature (+80°C) | -- | 17 | -- |
| Other Environments (Hard Vacuum) | 20 | -- | -- |
| Radiation (10 KR, 10¹³ N/cm²) | -- | 12 | -- |
| Aging | -- | -- | 10 |
| TOTAL VARIATION | 48 | 29 | √(20² + 10²) = 22.4 |
The worst case minimum value of capacitance would be the nominal value minus the negative bias variations, minus the random variation, or:
$$\text{Worst case minimum} = \text{Nominal}\,(1 - \text{bias variation} - \text{random variation}) = \text{Nominal}\,(1 - 0.48 - 0.224) = \text{Nominal}\,(1 - 0.704)$$
The worst case maximum would be the nominal value plus the positive bias variation, plus the random variation, or:
$$\text{Worst case maximum} = \text{Nominal}\,(1 + \text{bias variation} + \text{random variation}) = \text{Nominal}\,(1 + 0.29 + 0.224) = \text{Nominal}\,(1 + 0.514)$$
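The arithmetic of Table 43 can be checked with a few lines of Python. This sketch simply follows the procedure described above: bias terms are summed separately for each direction, random terms are combined by root-sum-square, and both are applied to a nominal value of 1. The variable names are illustrative.

```python
import math

def rss(values):
    """Root-sum-square combination of independent random variations."""
    return math.sqrt(sum(v * v for v in values))

# Variations from Table 43, in percent
negative_bias = [28, 20]   # low temperature, hard vacuum
positive_bias = [17, 12]   # high temperature, radiation
random_vars = [20, 10]     # initial tolerance, aging

random_total = rss(random_vars)   # about 22.4 %
low_factor = 1 - (sum(negative_bias) + random_total) / 100
high_factor = 1 + (sum(positive_bias) + random_total) / 100

print(f"random (RSS) variation: {random_total:.1f} %")
print(f"worst case minimum = Nominal x {low_factor:.3f}")
print(f"worst case maximum = Nominal x {high_factor:.3f}")
```

Adding the two random terms directly (20 + 10 = 30%) instead of by RSS would widen both limits by about 7.6 percentage points, which is the pessimism RSS analysis is meant to remove.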
Monte Carlo. Monte Carlo analysis requires a probability density function for all variations in parameters. Through random selection, values are assigned to each part in the circuit and the output parameter computed. This is repeated many times and the distribution of the results represents the expected distribution of circuits in the field.
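A Monte Carlo analysis can be sketched in a few lines. The example is illustrative only and rests on assumptions not in the original: a first-order RC low-pass filter (cutoff frequency 1/(2πRC)) with hypothetical parts R = 10 kΩ ±5% and C = 10 nF ±10%, each normally distributed with its tolerance taken as three standard deviations, following the ±3σ convention described for RSS analysis. The 1300 to 1900 Hz limits are likewise hypothetical.

```python
import math
import random
import statistics

def cutoff_hz(r_ohms, c_farads):
    """Cutoff frequency of a first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

def monte_carlo(trials, seed=1):
    """Simulate `trials` circuits with randomly drawn part values."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        # Assumption: tolerance = 3 standard deviations of a normal distribution
        r = rng.gauss(10e3, 10e3 * 0.05 / 3)
        c = rng.gauss(10e-9, 10e-9 * 0.10 / 3)
        results.append(cutoff_hz(r, c))
    return results

samples = monte_carlo(20_000)
cuts = statistics.quantiles(samples, n=1000)   # 999 cut points in 0.1% steps
print(f"mean {statistics.mean(samples):.0f} Hz, "
      f"stdev {statistics.stdev(samples):.0f} Hz")
print(f"0.1st to 99.9th percentile: {cuts[0]:.0f} to {cuts[-1]:.0f} Hz")
print("fraction outside 1300-1900 Hz:",
      sum(not 1300 <= f <= 1900 for f in samples) / len(samples))
```

The fixed seed makes runs repeatable. The tails of the simulated distribution are the least certain part of the result, so the number of trials should be increased when small out-of-specification fractions matter.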
Factors to be Evaluated. In performing a WCCA, note that each component type has parameters that are sensitive to stress conditions and can contribute to component overstress. Table 44 shows some common component parameters that should be evaluated as part of a thorough WCCA.
Table 44. Typical Component Factors to be Evaluated
| Component Type | Factors to be Evaluated |
|---|---|
| Integrated Circuits (Linear/Digital) | Power Dissipation; Applied Voltage (VCC); Common Mode Voltage; Loading; Fan-In/Fan-Out; Differential Input Voltage; Min/Max Input Voltage |
| Transistors | Applied Voltage (Vce, Vbe); Base/Collector Current; Power Dissipation; Forward/Reverse Bias |
| Magnetic Components | Max Induction Levels (Saturation)/Losses; Reset Conditions/Drive Imbalance; Winding-to-Winding Voltages; "Hot Spot" Temperature |
Example of a Worst Case Circuit Analysis. For this example a simple voltage divider circuit as illustrated in Figure 32 is analyzed to determine if any overstress conditions can be expected.
Figure 32. Voltage Divider Circuit
From the figure:
Vin (voltage applied) = +330 volts DC (direct current), with up to +10% variation
R1 = 100,000 ohms, 5%, RCR (carbon composition resistor), 1.5 watts
R2 = 1,000 ohms, 5%, RCR (carbon composition resistor), 0.25 watts
Table 45 shows the calculations for the worst case power dissipation for the resistors in the divider circuit.
Table 45. Calculation for Example Resistors Worst Case Power Dissipation
| Condition | Equation | Calculation |
|---|---|---|
| Maximum power dissipation for R1 occurs with the maximum circuit current | $P_{R1} = \left[ V_{max} / (R1_{min} + R2_{min}) \right]^2 \cdot R1_{min}$, where $P_{R1}$ = R1 maximum power dissipation, $V_{max}$ = maximum input voltage, $R1_{min}$ = R1 lower resistance rating, $R2_{min}$ = R2 lower resistance rating | $R1_{min} = 100{,}000 - (0.05)(100{,}000) = 95{,}000$ ohms; $R2_{min} = 1{,}000 - (0.05)(1{,}000) = 950$ ohms; $V_{max} = 330 + (0.10)(330) = 363$ volts DC; $P_{R1} = \left[ 363 / (95{,}000 + 950) \right]^2 \cdot 95{,}000 = 1.36$ watts |
| Maximum power dissipation for R2 occurs with the maximum circuit voltage | $P_{R2} = \left[ V_{max} / (R1_{min} + R2_{max}) \right]^2 \cdot R2_{max}$, where $P_{R2}$ = R2 maximum power dissipation, $V_{max}$ = maximum input voltage, $R1_{min}$ = R1 lower resistance rating, $R2_{max}$ = R2 upper resistance rating | $R1_{min} = 100{,}000 - (0.05)(100{,}000) = 95{,}000$ ohms; $R2_{max} = 1{,}000 + (0.05)(1{,}000) = 1{,}050$ ohms; $V_{max} = 330 + (0.10)(330) = 363$ volts DC; $P_{R2} = \left[ 363 / (95{,}000 + 1{,}050) \right]^2 \cdot 1{,}050 = 0.015$ watts |
Conclusions. Given an R1 nominal power rating of 1.5 watts and a derating factor of 0.8 (reference Reliability Toolkit: Commercial Practices Edition), the worst case power (1.36 watts) exceeds the derated limit of 1.2 watts. Given an R2 nominal power rating of 0.25 watts and a derating factor of 0.8, the worst case power (0.015 watts) is well below the derated limit of 0.2 watts. Therefore, resistor R1 is considered overstressed, and resistor R2 is acceptable for operation under worst case conditions.
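The Table 45 calculations and the derating check can be reproduced with a short script. The function name is illustrative; the worst-case corners follow the table: maximum supply voltage, R1 at its minimum, and R2 at its minimum for R1's worst case and at its maximum for R2's worst case. Those corners hold because R1 is larger than R2 in this divider.

```python
def divider_worst_case(v_nom, v_tol, r1_nom, r2_nom, r_tol):
    """Worst-case resistor power in a two-resistor divider (assumes R1 > R2).

    Returns (P_R1, P_R2) in watts using the corners of Table 45.
    """
    v_max = v_nom * (1 + v_tol)
    r1_min = r1_nom * (1 - r_tol)
    r2_min = r2_nom * (1 - r_tol)
    r2_max = r2_nom * (1 + r_tol)
    p_r1 = (v_max / (r1_min + r2_min)) ** 2 * r1_min   # maximum current
    p_r2 = (v_max / (r1_min + r2_max)) ** 2 * r2_max   # R2 at its upper value
    return p_r1, p_r2

DERATING = 0.8   # derating factor used in the example
p_r1, p_r2 = divider_worst_case(330.0, 0.10, 100_000.0, 1_000.0, 0.05)

for name, power, rating in (("R1", p_r1, 1.5), ("R2", p_r2, 0.25)):
    limit = DERATING * rating
    status = "OVERSTRESSED" if power > limit else "OK"
    print(f"{name}: {power:.3f} W vs derated limit {limit:.2f} W -> {status}")
```

Keeping the calculation in a function makes it easy to rerun the check for a candidate fix, such as a higher-rated R1.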
3.24 Test Strategy
3.24.1 Purpose. A test strategy is the plan for performing measurements that add value to a particular program. It may call for analysis in lieu of some testing, the performance of specific tests, and combinations of analysis and test measurements. Test strategy includes all testing done on a product, but this discussion will be limited to test strategy as it impacts inherent product design reliability.
3.24.2 Benefits. A test strategy is intended to verify the achievement of product goals, determine shortcomings needing corrective action, and identify opportunities for improvement. A product-specific test strategy is needed to assure performance adequacy and to avoid unnecessary expenses. The test strategy should include tests that aid the design reliability of the product, for example by supporting proper part selection or clarifying design trade-off opportunities. Testing for design reliability is not a trivial expense. On the other hand, should corrective action be needed, a timely design of experiments (DOE) or accelerated life test can make the difference between an economical fix and one that is expensive or impossible. Program budgets and schedules cannot ignore test costs. Hence, a test strategy must be an integral part of program planning and management.
3.24.3 Timing. Initial program planning during the product Concept/Planning phase must include a test strategy, particularly for those elements of test that will support the product design. As the program progresses, changes in the program (e.g. a decision to develop an item rather than buy it off-the-shelf) should be echoed by changes in the test strategy. In turn, changes in the test strategy should be echoed by changes in the program budget and schedule. A test strategy, then, is needed at the start of a project, and may be subject to change as the product evolves through Design/Development. Every program review should include a conscious decision to keep or revise the test strategy.
3.24.4 Application Guidelines. The matrix of Table 46 relates program and product circumstances to their expected impact on the test techniques applied to the product during the early design stages. A "plus" sign (+) indicates that the test offers value to the program under that circumstance. A "minus" sign (-) means that the test is probably not cost effective for that circumstance. A "question mark" (?) indicates that the test may or may not add value for that circumstance, depending on the product. The circumstances considered are New Development (i.e., a product to be designed and built for the first time), COTS (an item available as a Commercial Off the Shelf product), Safety Critical (e.g., a nuclear plant control system), Dormancy (i.e., an item to be subjected to long periods in storage or otherwise unpowered), Long Life (i.e., an item likely to be in service for a relatively long time, like a commercial passenger aircraft), Harsh Environment (e.g., high shock or rapid thermal cycling), and S/W (Software) development. The user should first decide whether he or she agrees with the relationships and what weights should be put on them for the product under consideration; this requires familiarity with the methods. Other considerations not in the matrix should also be identified and considered. These might include suppliers' reputations, the leverage the producer has with suppliers, the relative importance of reliability, program cost, and competitive factors, the customer's expectations, etc.
Table 46. Test Techniques for Design Reliability
| Reliability Test Technique | New Dev. | COTS | Safety Critical | Dormancy | Long Life | Harsh Env. | S/W Dev. |
|---|---|---|---|---|---|---|---|
| Accelerated Life Tests | ? | ? | + | ? | + | - | - |
| Design of Experiment | + | - | + | - | - | + | - |
| Growth Test | + | - | + | ? | ? | + | + |
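Because Table 46 is a lookup matrix, it can be kept as data and queried for the circumstances that apply to a program. This sketch only retrieves the ratings; as the text notes, weighing them (and adding considerations outside the matrix) is left to the user. The program profile in the usage line is hypothetical.

```python
# Ratings transcribed from Table 46: "+" adds value, "-" probably not
# cost effective, "?" depends on the product.
CIRCUMSTANCES = ["New Dev.", "COTS", "Safety Critical", "Dormancy",
                 "Long Life", "Harsh Env.", "S/W Dev."]
RATINGS = {
    "Accelerated Life Tests": "??+?+--",
    "Design of Experiment":   "+-+--+-",
    "Growth Test":            "+-+??++",
}

def ratings_for(selected):
    """Return each technique's ratings for the selected circumstances."""
    cols = [CIRCUMSTANCES.index(name) for name in selected]
    return {tech: [row[i] for i in cols] for tech, row in RATINGS.items()}

# Hypothetical program: new development using COTS parts in a harsh environment
for tech, marks in ratings_for(["New Dev.", "COTS", "Harsh Env."]).items():
    print(f"{tech:24s} {' '.join(marks)}")
```

Storing each row as a string keeps the transcription easy to compare against the printed table.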
Example of Test Strategy for Design. A new communication project for an unsheltered operating environment is under development and is utilizing off-the-shelf components with some new technology. What test strategy is appropriate during the design phase?
One possible test strategy for design reliability could include preparation of a plan for accelerated testing to determine the reliability impact of using "new" technology parts. For possible integration problems of new and old technologies, a design of experiments test program could be developed to shorten the final testing times.
3.25 Accelerated Life Testing
3.25.1 Purpose. The purpose of accelerated testing is to determine or verify product performance, and in particular the expected life span of a part or product, in a shortened test time by applying high environmental or electrical stress levels, singly or in combination.
3.25.2 Benefits. The major reason for performing accelerated life tests is to reduce product test time, resulting in schedule and cost benefits. This type of testing often identifies design and manufacturing deficiencies, exposes dominant failure mechanisms, and quantifies the relationship between stress and performance. Note, however, that high stress levels may damage the product, that precipitated failures may not represent normal use failures, and that translating the stress data to normal conditions may be difficult when multiple failure modes overlap.
3.25.3 Timing. Accelerated testing can be performed in any phase of product development, provided the hardware is available. The Concept/Planning phase is the best time for accelerated testing, because alternative design concepts, part types, and material technologies can be considered before the design or manufacturing processes are solidified. Testing in the Design/Development phase should be limited to resolving component deficiency problems.
3.25.4 Application Guidelines. There are many accelerated test concepts, some targeted to very specific technologies, others developed for broader applications. Constant stress testing is commonly defined by one or more stress factors, such as temperature, vibration, voltage, or humidity, at specific stress levels. The stress levels are predetermined and are usually well above the normal operating conditions for the product. The test items are divided into groups, one group for each stress level. For example, if temperature were the stress for an integrated circuit test, two or more groups could be tested at temperatures such as 150°C and 200°C, given a maximum normal operating temperature of 50°C. The test groups are operated under the defined stress condition for a predetermined time, usually governed by the program budget.
Step stress testing differs in that the test items are exposed to progressively higher stress levels in sequence. The test program starts near the upper limit of the normal operating environment, with all units tested together at the same stress level. After a planned interval of time, the stress is increased to the next level. The stepping procedure is continued until all test units have failed or the planned number of steps has been performed.
A typical accelerated test program would include:
Planning: The planning aspect of the accelerated testing is very important in determining what values to measure, what stress conditions to apply and what stress levels to use. Some factors that should be kept in mind during the test planning are:
- Test units should be identical to those considered for the final product
- Only the accelerating stress should be applied; other factors should be held constant
- Stress levels should be defined such that the precipitated failure modes are the same as those under normal operating conditions
- Accelerated stress levels should not exceed the manufacturer's rated design limits
Designing: In order to develop accurate and legitimate accelerated test models, the stress levels must be near or overlap the normal operating range. By having overlapping envelopes, extrapolations of test reliability results can be performed using empirical stress models as opposed to theoretical models. An example of overlapping is shown in Figure 33.
Modeling: A number of models can be considered in evaluating accelerated testing results. Some of the more widely used are:
- Arrhenius Model - Used for electronics, this model predicts that the rate of a given reaction increases exponentially with temperature (specifically, as an exponential function of the reciprocal of absolute temperature).
- Eyring Model - This model also describes the relationship between life and temperature as the accelerating parameter, for an exponential life distribution.
- Inverse Power Law Model - Used for non-thermal accelerating stresses where the underlying life distribution is Weibull.
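For the Arrhenius and inverse power law models, the relationship is often expressed as an acceleration factor (AF) between test and use conditions: AF = exp[(Ea/k)(1/T_use - 1/T_stress)] for Arrhenius, with Ea the activation energy, k Boltzmann's constant, and temperatures in kelvin, and AF = (S_test/S_use)^n for the inverse power law. The sketch below is illustrative: Ea = 0.7 eV, the exponent n = 3, and the 2x voltage ratio are placeholder values, not recommendations. In practice these parameters are estimated from test or failure-mechanism data, and an acceleration factor is meaningful only if the failure mechanism is the same at test and use conditions.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5   # Boltzmann constant in eV/K

def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    """Arrhenius acceleration factor between a use and a stress temperature.

    AF = exp[(Ea / k) * (1/T_use - 1/T_stress)], temperatures in kelvin.
    """
    t_use = t_use_c + 273.15
    t_stress = t_stress_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use - 1.0 / t_stress))

def inverse_power_af(stress_test, stress_use, exponent):
    """Inverse power law acceleration factor: (S_test / S_use) ** n."""
    return (stress_test / stress_use) ** exponent

# Placeholder values: Ea = 0.7 eV, 50 C use, 150 C and 200 C test groups
for t_test in (150, 200):
    af = arrhenius_af(0.7, 50, t_test)
    print(f"{t_test} C: AF = {af:.0f}; 1000 test hours ~ {1000 * af:,.0f} use hours")

# Placeholder voltage stress: 2x rated voltage, exponent n = 3
print("voltage AF:", inverse_power_af(2.0, 1.0, 3))
```

The large factors this produces show why the choice of Ea matters: a modest error in the activation energy changes the extrapolated use life many times over.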
Figure 33. Overlapping Stress Environments
Table 47 illustrates two different methods for analyzing the results of accelerated tests.
Table 47. Two Methods for Analyzing Accelerated Test Data
| Method | Characteristics | Steps |
|---|---|---|
| Probability Plot | The operational performance (e.g., time to failure) of nearly all electronic and electromechanical products can be described by either the lognormal or the Weibull probability density function (pdf). The pdf describes how the percentage of failures is distributed as a function of operating time. | (1) Rank the failure times from first to last for each level of test stress (times for non-failed test units go at the end of the list). (2) For each failure time of rank $i$, calculate its plotting position $P = 100\,(i - 0.5)/n$ percent, where $n$ is the total number of items on test at that level. (3) Plot $P$ versus the failure time for each failure at each stress level on appropriate probability paper (e.g., lognormal or Weibull). (4) Visually fit a line through the set of points for each stress level. The lines should be parallel, with the most weight given to the data set with the most failures. (5) If the lines are not reasonably parallel, investigate the failure modes. |
| Relationship Plot | Constructed on axes that describe unit performance as a function of stress. Two of the most commonly assumed relationships are the inverse power law and Arrhenius. | (1) On suitably scaled graph paper (e.g., Arrhenius paper), plot the 50% points determined from the probability plot for each test stress. (2) Draw a single line through the 50% points, extending it beyond the upper and lower points. (3) Locate the intersection of the plotted line and the normal stress value. This point, read from the time axis, represents the time at which 50% of the units will fail while operating under normal conditions. (4) Plot the time determined in Step 3 on the probability plot, and draw a line through this point parallel to those previously drawn. (5) The resulting line represents the distribution of failures as they occur at normal levels of stress. |
Example of Probability and Relationship Plots. Consider an electronic device that follows an Arrhenius performance/stress relationship and fails lognormally at any given level of stress. Engineers wish to determine the device reliability at 90°C (the maximum operating temperature) as a design goal. There are 20 products available for test.
After reviewing the design and considering the potential failure modes, the engineers concluded that the products could survive temperatures in excess of 230°C without damage. They estimated, however, that atypical failure modes would be precipitated above this temperature. Therefore, 230°C was established as the maximum test level, with 150°C and 180°C as interim constant stress levels. The test units were allocated among the three test levels and run for 1000 hours. The resulting failure times are shown in Table 48.
Table 48. Test Results
| 150°C (9 Units): Time to Failure (Hrs.) | Rank | Cum. Failure % | 180°C (7 Units): Time to Failure (Hrs.) | Rank | Cum. Failure % | 230°C (4 Units): Time to Failure (Hrs.) | Rank | Cum. Failure % |
|---|---|---|---|---|---|---|---|---|
| 567 | 1 | 5.5 | 417 | 1 | 7.1 | 230 | 1 | 12.5 |
| 688 | 2 | 16.6 | 498 | 2 | 21.4 | 290 | 2 | 37.5 |
| 750 | 3 | 27.7 | 568 | 3 | 35.7 | 350 | 3 | 62.5 |
| 840 | 4 | 38.8 | 620 | 4 | 50.0 | 410 | 4 | 87.5 |
| 910 | 5 | 50.0 | 700 | 5 | 64.3 | | | |
| 999 | 6 | 61.1 | 770 | 6 | 78.6 | | | |
| --- | 7 | --- | 863 | 7 | 92.9 | | | |
| --- | 8 | --- | | | | | | |
| *--- | 9 | --- | | | | | | |

*Unit still operating at 1000 hours
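The cumulative failure percentages in Table 48 follow the plotting position P = 100(i - 0.5)/n from Table 47 (sometimes called Hazen's plotting position). The sketch below recomputes them from the failure times; the computed values agree with the table to within 0.1 percentage point, since some 150°C entries in the table are truncated rather than rounded. Units still running at 1000 hours are counted in n but receive no plotting position.

```python
def plotting_positions(failure_times, n_on_test):
    """Plotting positions P = 100 * (i - 0.5) / n for ranked failures.

    failure_times: failure times at one stress level (survivors excluded).
    n_on_test: total units on test at that level, including survivors.
    Returns a list of (time, rank, percent) tuples.
    """
    ranked = sorted(failure_times)
    return [(t, i, 100.0 * (i - 0.5) / n_on_test)
            for i, t in enumerate(ranked, start=1)]

# Failure times from Table 48: {temperature C: (failure times, units on test)}
TEST_DATA = {
    150: ([567, 688, 750, 840, 910, 999], 9),   # 3 units survived 1000 h
    180: ([417, 498, 568, 620, 700, 770, 863], 7),
    230: ([230, 290, 350, 410], 4),
}

for temp_c, (times, n) in TEST_DATA.items():
    print(f"{temp_c} C")
    for t, rank, p in plotting_positions(times, n):
        print(f"  {t:5d} h  rank {rank}  {p:5.1f} %")
```

The (time, percent) pairs are the points plotted on lognormal probability paper in Figure 34.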
From Table 48, the time to failure and cumulative failure percent values are plotted, and lines are drawn through the points as shown in Figure 34. The median (50% cumulative failure) values for the three test temperature lines are read from the plot and plotted on Figure 35, and a line, A--A, is drawn through these median points. To determine the median life at 90°C, a horizontal line is drawn from the 90°C temperature point to its intersection with line A--A. At this intersection, indicated by an arrow, the median life is found to be 3500 hours. This value (3500 hours at 90°C) is then replotted on Figure 34, as indicated by another arrow, and a line (B--B) parallel to the other data lines is drawn through this median point. From this line, 10% of the units operating at 90°C are expected to fail by 2200 hours and 90% by 5000 hours, as indicated by the Xs on the graph.
Figure 34. Probability Plot (Lognormal)
Figure 35. Relationship Plot (Arrhenius)
SECTION FOUR - REFERENCES
The references in Table 49 provide additional information on the subjects discussed in this Blueprint. For each source, the table indicates the sections of the Blueprint to which it relates.
Table 49. References for Designing for Reliability (excerpt)
- Journal Article V9, N4: "Statistical Analysis of Reliability Data, Part 3: On Statistical Modeling of Reliability Data"
- Journal Article V9, N3: "Statistical Analysis of Reliability Data, Part 2: On Estimation and Testing"
- Journal Article V9, N2: "Non-Normal Distributions in the Real World"
- START 2002-1: "Application of the Poisson Distribution"
- Journal Article V12, N1: "Multivariable Testing (MVT)"
- START 2004-3: "Censored Data"
- START 2004-1: "Combining Data"
- START 2002-6: "Empirical Assessment of Normal and Lognormal Distribution Assumptions"
- START 2003-3: "Empirical Assessment of Weibull Distribution"
- START 2002-5: "Graphical Comparisons of Two Populations"
- START 2003-6: "Kolmogorov-Smirnov: A Goodness of Fit Test for Small Samples"
- START 2003-7: "Reliability Estimations for the Exponential Life"
- START 2002-2: "Statistical Assumptions of an Exponential Distribution"
- START 2002-4: "Statistical Confidence"
- START 2003-4: "The Chi-Square: a Large-Sample Goodness of Fit Test"
- Journal Article V8, N4: "Tutorial: Test Risks, Confidence and OC Curves"
- Journal Article V10, N2: "Statistics - A Reliability Engineer's Tool, Not Reliability Engineering"
- Journal Article V14, N1: "Information Management for Systems Design for RMQSI"
- START 2004-2: "The RMQSI Case - A Reasoned, Auditable Argument Supporting the Contention that a System Satisfies..."
- Journal Article V11, N2: "Methods for Reducing the Cost to Maintain a Fleet of Repairable System"
- Journal Article V7, N4: "Engineering Information Assurance into Information Systems"
- START 00-1: "Sustained Maintenance Planning"
- Journal Article V13, N1: "Form, Fit, Function, and Interface - An Element of an Open System Strategy"
- START 01-1: "MicroElectroMechanical Systems (MEMS)"
- START 95-1: "Plastic Encapsulated Microcircuits (PEM's)"