| Event | Actions | Items to be Addressed |
|---|---|---|
| Failure Observation | • Identify that a failure incident has occurred • Notify all cognizant personnel of the incident | • Operational conditions which resulted in the incident should be maintained until all required personnel have observed the failure |
| Failure Documentation | • Record all pertinent data relating to conditions surrounding the failure incident | • Pertinent data include a clear description of the failure incident, supporting data and equipment operating hours |
| Failure Verification | • If failure is permanent, verify incident by repeating tests which identified failure • If failure is recoverable, verify incident by recreating conditions under which it occurred • If failure cannot be verified, monitor product closely for reoccurrence of failures | • Repeating tests to verify failures helps to differentiate between hard failures and those caused by operator or procedural errors • Permanent test failures can be caused by failure of the product under test, or by the failure of associated test equipment • Unverified failures may result from human or procedural errors, but may also reflect a true intermittent failure |
| Failure Isolation | • For verified failures, perform testing and troubleshooting to isolate the cause of the incident | • Failure isolation can pertain to a defective part or assembly within the product being tested, or can relate the incident to external factors (operator error, test equipment malfunction, improper procedures, etc.) |
| Suspect Item Replacement | • For verified product failures, replace the suspect part or assembly with a known good item • Recreate the conditions causing failure and tests detecting the failure to confirm suspect item replacement • If failure repeats, repeat failure isolation activity to determine correct cause of incident | • The end product, once proven to be functional after suspect item replacement, may proceed through its manufacturing process • The replaced part or assembly should be "tagged" for repair. The "tag" should include documentation of all information relevant to the incident. It should also allow for documentation of subsequent failure analysis and corrective action on the suspect part or assembly. |
| Suspect Item Verification | • Verify the failure of the part or assembly independent of the product • If a failure cannot be verified, review previous failure verification and isolation actions to ensure that the proper part or assembly has been replaced | • Isolation of the failure to increasingly lower levels of hardware, software or a process is critical in determination of a root failure cause • Inability to verify the failure may occur due to (1) inconsistent test parameters between test stations, or between product and part/assembly design requirements, (2) functional ambiguities which cause incorrect fault isolation, or (3) defective or intermittent connections at the part/assembly or assembly/product interface |
| Data Search | • In parallel with failure analysis activities, search the databases for failure history on identical or similar parts/assemblies and products • Evaluate product failure trends for patterns | • Failure trends of parts, assemblies and/or products may relate to bad lots of parts, operator-induced assembly defects, etc. • Searches outside the databases (other databases, technical literature and reports) may identify identical part problems experienced by others |
| Failure Analysis | • Determine from data search results and suspect item replacement, how extensive the failure analysis should be (destructive vs. non-destructive) • Perform the required analysis to a level low enough to determine the root failure cause | • Different products and situations will require different levels of failure analysis. Determining factors should include: • Failure analysis results should document specific failure causes • As data become available, failure reports should document all information relevant to the failure: |
| Establish Root Cause | • Determine the initial event which was the direct cause of the failure (the cause of the overstress condition; manufacturing defect; adverse environmental condition; operator or procedural error; induced failure; or part/assembly failure mode) | • Root cause analysis places greater emphasis on failure prevention • Root cause analysis relies on an understanding of the physics of failure (for hardware) or the initial incident which precipitates the product/process failure |
| Determine Corrective Action | • Based on the failure analysis and root cause, develop a corrective action which will prevent the failure from reoccurring • Document the corrective action and communicate it through the organization | • Corrective actions must emphasize long-term solutions rather than band-aids in order to be effective. They must directly address and correct the root cause. • Corrective actions can include product redesign; improvements in processes or procedures; selection of different parts or suppliers; or retraining of personnel |
| Incorporate Corrective Action | • Incorporate the identified corrective action in the failed product (or process) as a minimum, pending verification of its effectiveness | • Delays in the incorporation of a corrective action mean additional defective products or process outputs will be generated • Large-scale incorporation of a corrective action should not occur until after the corrective action has been verified • Timing of the corrective action implementation depends on the degree of confidence that it will eliminate failure reoccurrence |
| Operational Performance Test | • Perform baseline tests (following incorporation of the corrective action) to verify proper performance under static conditions • Perform operational tests (including conditions under which the original failure occurred) to verify proper performance under dynamic conditions • Document results from all operational performance testing and compare to pre-failure test results for potential shifts in baseline data | • Sufficient testing should be performed under normal or accelerated stress conditions to provide a high degree of confidence that the original failure incident has been addressed and the reoccurring failure mode has been eliminated • Subsequent failure of parts and assemblies not related to the implemented corrective actions should be considered new failure incidents |
| Determine Effectiveness of Corrective Action | • Verify that the corrective action has (1) corrected the original failure incident and (2) not introduced other failures or degraded performance below acceptable levels • If the original failure incident reoccurs, repeat the analysis process to determine the correct root cause | • A corrective action is not effective if it introduces other failures or degrades product/process performance to unacceptable levels • A corrective action is not effective if operational testing has not been applied to ensure a reasonable level of confidence that the failure has been eliminated • Effectiveness should be tracked through future production and fielded product history |
| Incorporate Corrective Action Into All Products | • Expand the implementation of the proven corrective action into the general product or process flow • Track, document and report future failure incidents that could indicate degradation or failure of the corrective action effectiveness • If the original failure incident reoccurs, repeat the analysis process to determine the correct root cause | • Where corrective actions involve changes to procedures, training of personnel, or modifications to a process, they should be tracked to ensure that "old habits" don't eventually degrade their effectiveness • Design-related corrective actions should be tracked to ensure that corrective actions for different future failure incidents do not degrade the effectiveness of the initial corrective action |
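
The decision points in the Failure Verification and Determine Effectiveness of Corrective Action rows can be sketched in code. The following is only an illustrative sketch in Python, not part of the process definition: the names `FailureKind`, `verification_method`, `next_step_after_verification`, `CorrectiveActionResult` and `is_effective` are all hypothetical, and the boolean fields are an assumed simplification of judgments that in practice rest on test data and engineering review.

```python
from dataclasses import dataclass
from enum import Enum


class FailureKind(Enum):
    """Hypothetical labels for the two failure types named in the table."""
    PERMANENT = "permanent"
    RECOVERABLE = "recoverable"


def verification_method(kind: FailureKind) -> str:
    """Pick the verification approach from the Failure Verification row."""
    if kind is FailureKind.PERMANENT:
        return "repeat the tests which identified the failure"
    return "recreate the conditions under which the failure occurred"


def next_step_after_verification(verified: bool) -> str:
    """Verified failures go on to isolation; unverified ones are monitored,
    since they may still reflect a true intermittent failure."""
    return "failure isolation" if verified else "monitor for reoccurrence"


@dataclass
class CorrectiveActionResult:
    """Assumed summary of operational performance testing (illustrative)."""
    operational_tests_run: bool
    original_failure_recurred: bool
    new_failures_introduced: bool
    performance_acceptable: bool


def is_effective(result: CorrectiveActionResult) -> bool:
    """Apply the effectiveness criteria from the table: testing was applied,
    the original failure is gone, no other failures were introduced, and
    performance has not degraded below acceptable levels."""
    return (
        result.operational_tests_run
        and not result.original_failure_recurred
        and not result.new_failures_introduced
        and result.performance_acceptable
    )


if __name__ == "__main__":
    print(verification_method(FailureKind.PERMANENT))
    print(next_step_after_verification(verified=False))
    untested = CorrectiveActionResult(False, False, False, True)
    print(is_effective(untested))
```

In this sketch a corrective action with no operational testing is reported as not effective even when no failure has recurred, which mirrors the table's statement that effectiveness cannot be claimed without such testing.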