Developing Reliability Goals/Requirements
About the RIAC Blueprints
The RIAC "Blueprints for Product Reliability" are a series of documents published by the Reliability Information Analysis Center (RIAC) to provide insight into, and guidance in applying, sound reliability practices. The RIAC is the Information Analysis Center chartered to be a centralized source of data, information and expertise in the subjects of reliability, maintainability and quality. While sponsored by the US Department of Defense (DoD), RIAC's charter addresses both military and commercial communities with the requirement to disseminate guidance information in these subjects. The Blueprints serve to provide information on those approaches to planning and implementing effective reliability programs based on experience, lessons learned, and state-of-the-art techniques. To make the Blueprints as useful as possible, the approaches and procedures are based on the best practices used by commercial industry and on the concepts documented in many of the now-rescinded military standards. The tree shown in Figure 1 depicts the Blueprints that make up the series (the shaded second tier box indicates this Blueprint).
Figure 1. RIAC Blueprints for Product Reliability
In the government sector, and in particular the DoD, significant changes have been made regarding the acquisition of new products. Previously, by imposing standards and specifications, a DoD customer would require contractors to use certain analytical tools and methods, perform specific tests in a prescribed manner, use components from an approved list, and so forth. Current policy emphasizes the use of commercial technology as well as specifying "performance-based" requirements only, with suppliers left to determine how to best achieve them.
Users of the RIAC Blueprints
The Blueprints are designed for use in both the government and private sectors. They address products ranging from completely new commercial consumer products to highly specialized military systems. The documents are written in a style that is easy to understand and implement whether the reader is a manager, design engineer or reliability specialist. In keeping with the new philosophy of the DoD, which is now similar to that of the private sector, the Blueprints do not provide a cookbook of reliability tasks that should be applied in every situation. Instead, some general principles are cited as the underpinnings of a sound reliability program. Then, many of the tasks and activities that support each principle are highlighted in detail sufficient for the user to determine if a task or activity is appropriate to his or her situation.
SECTION ONE - INTRODUCTION
1.1 Purpose and Scope
This Blueprint provides insight into and describes the process of developing reliability goals and requirements for products and systems.
Reliability is traditionally considered to be a performance attribute that is concerned with the probability of success and frequency of failures, and is defined as:
The probability that an item will perform its intended function under stated conditions, for either a specified interval or over its useful life.
Reliability is a measure of a product's performance that affects both product function and operating and repair costs. Too often performance is thought of only in terms of speed, capacity, range, and other "normal" measures. If, however, a product has such poor reliability that it is seldom available for use, these other performance measures become meaningless. Reliability is also critical to safety and liability issues.
1.2 Need to Establish and Allocate Reliability Requirements
Establishing reliability requirements and goals is an essential reliability task. Without understanding what is required, undesirable results may occur: the product will not satisfy the customer or resources will be wasted (i.e., overdesign). While defining a reliability program gives direction to subsequent efforts, developing the reliability goals and requirements scopes the magnitude of those efforts.
1.3 Source of Requirements/Goals
Sometimes, customers may explicitly define the reliability requirements to the supplier. In these cases, the supplier must first determine if the requirements are realistic and achievable and then usually translate them to design specifications. Often, especially in the case of consumer products, customers do not explicitly specify the reliability requirements. Instead, suppliers may themselves need to establish the requirements using a variety of techniques that will be discussed.
1.4 Timing of Reliability Requirement Development Activities
Each product, from the simplest to the most complex, passes through a sequence of phases during its life cycle. The definitions of the phases vary among commercial companies, and within the military. Table 1 describes the sequence of general phases that will be used in this document to describe a product's life.
Table 1. Product Life Cycle Phases
- Formulate ideas, estimate resources and financial needs
- Identify risks & requirements
- Program objective
- Identify and allocate needs and requirements
- Propose alternate approaches
- Design and test the product
- Develop manufacturing, operating, and repair/maintenance tasks
- Refine and implement manufacturing procedures
- Finalize production equipment
- Establish quality processes
- Build & distribute the product
- Implement operating, installation and training procedures
- Provide repair and maintenance service
- Repair warranty items
- Provide for performance feedback
- Implement refurbishment and disposal tasks
- Resolve potential wearout issues
What sometimes distinguishes one phase from the next is a decision milestone, sometimes referred to as a "gate." It represents a point in time where the program can go forward or stop. For many products, the phases may be abbreviated or combined. For example, the Concept/Planning and Design/Development phases may be combined under a compressed schedule for a new product that is simply an update or slightly modified version of an older, proven product. Reliability tasks for this type of program would concentrate only on the differences between the old and the modified product. As a result, the number of engineering tasks would be reduced. It is important to understand that tasks performed in one phase are often the result of the analysis, trade-offs and planning performed in an earlier phase. For example, trade-offs addressing approaches to manufacturing printed circuit boards would be performed during Design/Development, with the implementation of the process decision to follow during the Production/Manufacturing phase.
The customer or the supplier must define the product-level requirements before the product Design/Development phase begins. Requirements may be stated initially as "goals" in the Concept/Planning phase of the product development. Based on the results of early research and development, the initial goals will evolve into product-level design requirements before product development begins. Allocation of product-level requirements to a level meaningful to the design and manufacturing process engineers should be done after a product reliability model has been developed and before design efforts at that level begin. This may be at the subsystem, equipment, or assembly level. Figure 2 illustrates the relative timing of the four major activities related to requirements development.
Figure 2. Timing of Requirements Development Activities
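As a rough illustration of what allocating a product-level requirement can look like, the sketch below assumes the product reliability model is a simple series model (every subsystem must work) and that the requirement is split equally among the subsystems. Neither assumption comes from this Blueprint; the function names and the numbers are hypothetical, and real allocations usually weight subsystems by complexity, criticality, or experience.

```python
import math


def allocate_equal(product_reliability: float, n_subsystems: int) -> float:
    """Equal apportionment for a series model: each subsystem gets R ** (1/n)."""
    if not 0.0 < product_reliability <= 1.0:
        raise ValueError("product_reliability must be in (0, 1]")
    if n_subsystems < 1:
        raise ValueError("n_subsystems must be at least 1")
    return product_reliability ** (1.0 / n_subsystems)


def series_reliability(subsystem_reliabilities) -> float:
    """Product-level reliability of a series model (all subsystems must work)."""
    return math.prod(subsystem_reliabilities)


if __name__ == "__main__":
    target = 0.90   # hypothetical product-level requirement
    n = 4           # hypothetical number of subsystems
    each = allocate_equal(target, n)
    print(f"Each subsystem must achieve R >= {each:.4f}")
    # Rolling the allocations back up should recover the product-level target.
    print(f"Roll-up check: {series_reliability([each] * n):.4f}")
```

The roll-up check is the useful part of the sketch: whatever allocation rule is chosen, the allocated values should combine through the reliability model to meet the product-level requirement.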
SECTION TWO - BASIC CONCEPTS OF RELIABILITY REQUIREMENTS
2.1 Requirements Versus Goals
A requirement is the minimum level of performance acceptable to, or expected by, the customer. If the level of reliability represented by the requirement is less than the minimum acceptable, customers either will not buy the product or will buy it because they are unaware that the reliability is not "good". Customers in the latter category may not buy products again from that supplier.
In contrast to a requirement, a goal is usually some level of reliability greater than that required, which may result in one or more of the following benefits:
- a greater market share (reliability becomes a discriminator among products)
- lower life cycle costs for the customer
- lower supplier costs (fewer returns and lower warranty cost)
- less risk of liability and litigation (safety issues)
Early in the development of a new product, goals may be used as a starting point to challenge the development team to make use of innovative and new design approaches (such goals are sometimes called stretch targets). As development progresses, these goals evolve into firm requirements. Firm requirements should meet or exceed the customer's needs and expectations, ensure the supplier of a healthy market share, and result in a fair profit.
2.2 Requirements Versus Specifications
The terms specification and requirement are often used interchangeably. Requirements can include many non-design issues, such as reporting requirements, cost, and schedule. Within this Blueprint, only requirements suitable for inclusion in a specification are addressed. Specifications should only address the performance characteristics of a product. To emphasize this philosophy, specifications are sometimes called performance-based requirements, or performance specifications.
2.3 Performance Specifications
A performance specification is one that describes to a supplier the product that is desired by the customer. It tells what level of performance is needed but does not tell the supplier how to meet the required level of performance (i.e., what tasks should be done, not how the tasks are to be accomplished). Table 2, extracted from the "Defense Standardization Program Performance Specification Guide (SD-15)", provides examples of performance requirements.
Table 2. Examples of Performance Requirements
|Examples of Performance Requirements
|The circuit breaker shall not trip when subjected to the class 1, type A, shock test specified in MIL-S-901.
||States required results.
|The binocular eyepiece shall operate at altitudes up to 10,000 feet above sea level.
||Defines the operational environment.
|The detector shall not contain foreign matter such as dust, dirt, fingerprints, or moisture that can be detected by visual examination.
||Provides criteria for verifying compliance (assuming that foreign matter affects detector performance).
|The tank shall traverse the Aberdeen Proving Ground Terrain Profile Course at all speeds up to 30 MPH.
||Provides criteria for verifying compliance.
|Fluid seals and bearings shall provide no less than 5 years use without replacement.
||States required results.
|The molybdenum disulfide content shall not be greater than 5 percent.
||States required result.
|The shoes shall be of the following standard men's sizes: 9, 9-1/2, 10, 10-1/2, 11, 11-1/2, 12, 12-1/2, 13.
||Provides interface requirement.
|The equipment shall withstand, without damage, temperatures ranging from -46°C to +71°C.
||Defines operational environment.
|During the accuracy check conducted 18 hours into the second cycle of the humidity test, the accuracy of the indicator shall be within plus or minus 2 percent of each pre-cycle reference measurement point.
||Provides criteria for verifying compliance.
|The tractor shall be capable of utilizing contractor supplied attachments for standard category 1, 3 points mounting, front and rear.
||Provides interface requirement.
|All parts shall be capable of passing the solderability tests in accordance with MIL-STD-883, test method 2003, on delivery.
||Provides criteria for verifying compliance.
|The Standard Evaluation Circuit (SEC) shall demonstrate the operating temperature range (case, ambient, or junction) capability of the technology being offered.
||Defines the operational environment.
|Parts shall be marked with the following information: manufacturer's name, source control number, and inspection lot identification. The marking shall remain legible when subjected to the resistance to solvents testing of MIL-STD-883, method 2015.
||States required results.
|Packaging shall prevent mechanical damage of the devices during shipping and handling and shall not be detrimental to the devices.
||States required results.
A performance specification for a new audio amplifier based on an assessment of the marketplace might include:
- Continuous power output into 8 ohms, 50 W per channel
- Dynamic headroom: 2.5 dB
- Total distortion: < 0.1% at maximum rated power
- Frequency response: 20 to 20,000 Hz, ±3 dB
- Failure-free warranty period: 1 year (all shipping costs, labor, and parts for failures occurring within one year of purchase are free)
- Maximum weight: 20 lb.
- FCC Class B certified
This specification does not dictate the types of parts to be used, the types of reliability analyses required, or how soldering of circuit boards should be accomplished. This type of specification is common in the commercial world. Of course, the reliability requirements for a turbine engine or commercial aircraft are not as simple as those for an audio amplifier. Nevertheless, the objective is to allow the designer maximum flexibility and innovation in developing a design solution to meet the customer's needs. Note, however, that a failure-free period is not a meaningful reliability design requirement. The difference between reliability performance requirements and design requirements is addressed in Section 3.3.
2.4 Verifiable Requirements
All requirements should be verifiable. Analysis, test, and inspection are the three methods used for verification of requirements. The first two are most applicable to reliability. Ideally, all reliability requirements should be verified by test. Consider the example of the audio amplifier. The supplier needs a way of determining the reliability of the amplifier in order to ensure that the customer will be satisfied and to set a selling price that will cover any warranty costs. The supplier might assume that the customer will not operate the amplifier more than 20 hours a week, or a total of 1,040 hours in a year. Various test techniques can be used in which the product is subjected to simulated environmental and use conditions while failures are detected and logged. Pass-fail criteria are pre-established based on a desired level of confidence in the test results.
For other products, testing may not be a feasible method of verifying the reliability requirement. For example, the required life of a commercial satellite might be 10 years. It would not be practical to test one satellite for years or to build several test articles to test for some shorter period of time. Instead, it might be necessary to conduct testing at assembly or component levels and to use analysis to determine if the satellite can meet the 10-year life requirement.
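For the amplifier example in this section, the arithmetic behind a warranty estimate and a simple pass-fail test criterion can be sketched as follows. The sketch assumes a constant failure rate (exponentially distributed times to failure), which this section does not state, and the 5% return target and 90% confidence level are hypothetical numbers chosen only for illustration.

```python
import math

# 20 hours a week for 52 weeks, as assumed in the text (1,040 hours).
HOURS_PER_YEAR = 20 * 52


def first_year_failure_fraction(mtbf_hours: float, hours: float = HOURS_PER_YEAR) -> float:
    """Expected fraction of units failing within `hours` (exponential model)."""
    return 1.0 - math.exp(-hours / mtbf_hours)


def required_mtbf(max_failure_fraction: float, hours: float = HOURS_PER_YEAR) -> float:
    """MTBF needed so that at most `max_failure_fraction` of units fail within `hours`."""
    return -hours / math.log(1.0 - max_failure_fraction)


def zero_failure_test_hours(mtbf_hours: float, confidence: float) -> float:
    """Total failure-free unit-hours needed to demonstrate `mtbf_hours`
    as a lower bound at the given confidence (exponential model)."""
    return -mtbf_hours * math.log(1.0 - confidence)


if __name__ == "__main__":
    target_returns = 0.05   # hypothetical: at most 5% warranty failures in year one
    confidence = 0.90       # hypothetical confidence level for the test
    mtbf = required_mtbf(target_returns)
    print(f"Required MTBF: {mtbf:,.0f} hours")
    print(f"Failure-free test time at {confidence:.0%} confidence: "
          f"{zero_failure_test_hours(mtbf, confidence):,.0f} unit-hours")
```

The test time is expressed in unit-hours, so it can be accumulated across several amplifiers tested in parallel. That is only valid under the constant failure rate assumption stated above.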
2.5 Definition of Failure
Suppliers and customers must have a common understanding of what constitutes a failure. Lacking such a definition, discussions on reliability become meaningless. For example, if an electronic scoreboard loses one segment of a digit but the score is still discernible, has the scoreboard "failed"? Unless this occurrence has been addressed in the specification, the question may not be satisfactorily or efficiently resolved.
Hardware and firmware are usually considered to have two basic types of failures: hard failures and soft failures. A hard failure is one in which the product becomes completely inoperative or experiences a gross change in functional characteristics. Examples include a shorted resistor, a stuck relay, or a broken switch. Soft failures are of two kinds: failures which are intermittent and failures that degrade but do not prevent product operation. Examples of the former include switch bouncing and poor connections. The latter type result from out-of-tolerance conditions caused by deterioration, drift, and wearout. Examples include drifting of resistor values, corrosion of electrical contacts, and wearout of seals and valves. If the cause of a soft failure is not removed, a hard failure ultimately results.
Software is also usually considered to have two basic types of failure: transient (comparable to a soft failure of hardware) and catastrophic (comparable to a hard failure of hardware). A transient failure is one that results in a restart or reload for microprocessor-based systems, subsystems, or individual units and may or may not require further correction. A catastrophic failure results in a manual or remote unit restart or a software program patch.
2.6 Performance (Operational) Reliability
Performance (or operational) reliability requirements are those that customers explicitly or implicitly use to judge the reliability of products they buy. Table 3 lists some examples of the way in which customers "measure" reliability.
Table 3. Example Measures of Performance Reliability
||Measures of Performance
||Frequency of repair
||Availability and safety
Series and Product Reliability. Generally, a customer measures the reliability of a product from two perspectives because reliability affects both function and operating and repair/replacement costs. Its effect on operating and repair/replacement costs is referred to as series (or basic) reliability; its effect on function is referred to as product (or mission) reliability. Table 4 explains the differences between these two aspects of reliability.
Table 4. Series (Basic) and Product (Mission) Reliability Characteristics
|Series (Basic) Reliability
||Product (Mission) Reliability
|Measure of product's ability to operate without repair or adjustment
||Measure of product's ability to perform and complete its function
|Recognizes effects of all occurrences that demand support without regard to effect on function or mission
||Considers only failures that cause loss of function
|Degraded by redundancy
||Improved by redundancy
|Can be equal to, but is usually lower than, functional reliability
||Usually higher than basic reliability
Performance reliability requirements can be expressed in a variety of forms that include combinations of product and series reliability, or they may combine reliability with maintainability in the form of availability.
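The "degraded by redundancy" and "improved by redundancy" rows of Table 4 can be checked with a few lines of arithmetic. The sketch below assumes independent units and an active redundant pair in which either unit alone can perform the function. It also includes inherent availability computed as MTBF / (MTBF + MTTR), a common textbook form, where MTTR (mean time to repair) is a term this Blueprint has not introduced. All function names and numbers are hypothetical.

```python
def basic_reliability(unit_reliabilities) -> float:
    """Probability that no unit fails (every failure demands support)."""
    result = 1.0
    for r in unit_reliabilities:
        result *= r
    return result


def mission_reliability_redundant(unit_reliabilities) -> float:
    """Probability that at least one redundant unit survives (function is kept)."""
    all_fail = 1.0
    for r in unit_reliabilities:
        all_fail *= (1.0 - r)
    return 1.0 - all_fail


def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state fraction of time the product is operable."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    single = [0.95]           # hypothetical unit reliability
    pair = [0.95, 0.95]       # the same unit, duplicated for redundancy
    print("single unit  basic:", round(basic_reliability(single), 4),
          " mission:", round(mission_reliability_redundant(single), 4))
    print("redundant    basic:", round(basic_reliability(pair), 4),
          " mission:", round(mission_reliability_redundant(pair), 4))
    print("availability:", round(inherent_availability(500.0, 5.0), 4))
```

Adding the second unit raises mission reliability because either unit can carry the function, but lowers basic reliability because there are now two units that can fail and demand repair.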
Quantitative Performance Reliability Requirements. Many measures of performance reliability are used by various industries, so it is impractical to list even a representative number. For the manufacturer, it is important to note that customers will use measures that best suit their intended use of the product. Some parameters used to quantify operational reliability are:
- Mean Time Between Maintenance (MTBM). MTBM is defined as the mean time between maintenance events. When not otherwise indicated, all maintenance events, whether preventive (scheduled) or corrective (unscheduled), are counted regardless of the cause of the event (i.e., actual failure, induced failure, incorrect indication of failure, scheduled inspection, etc.). Subsets of MTBM include the mean time between corrective maintenance, mean time between preventive maintenance, etc. Time can be measured in hours, days, operating hours, or any other convenient increment. Some industries may consider mean miles between maintenance (trucking companies) or mean cycles between maintenance (hydraulic lifts) to be a more meaningful performance requirement.
- Mean Time Between Service Calls (MTBSC). In this measure of performance reliability, the events being tracked are service calls. MTBSC is frequently used by suppliers who make copiers, major home appliances, and communications systems, and who typically provide repair/replacement support for their products. Time is usually measured in days, months, or some increment of calendar time, but can also be measured in operating hours, cycles, copies made, etc. MTBSC can include events not related to reliability. For example, assume the product is a copy machine and the customer has complained that it will not operate at all. A service call will be recorded, even if it is discovered that a service fuse has blown, turning off power at the electrical outlet (i.e., no failure of the copier).
- Schedule Reliability. Schedule reliability is used by airlines as a measure of availability. The event tracked is an aircraft departing (moving back from) the gate. It is usually defined as the percent of scheduled flights departing within a pre-defined number of minutes after the scheduled time. Schedule reliability does not give an accurate estimate of the aircraft inherent design reliability because failures that can be repaired quickly enough to allow the aircraft to depart within the allotted time are ignored.
- Warranty Returns. Related to MTBSC, the number of warranty returns is a measure used by some suppliers as an indication of product reliability. The event being tracked is a return of the product under warranty. Usually the number of returns in a given period of time is used, but it could also be based on the number of returns per total number of products sold, or some other criterion. Since a product may have failed for a variety of reasons, including misuse or abuse, the number of warranty returns may not give a true picture of the product's inherent reliability.
When stating a required performance reliability, the conditions in which the product will be operated, stored, shipped, and maintained should be stipulated. For example, if a product is designed to operate between 45°F and 110°F, it will most likely have lower reliability if operated at 150°F. This is not a valid indication of the product reliability, however, because the product was operated in a thermal environment for which it was not designed. Major conditions that affect performance reliability are the way the customer uses the product, its end use environment and the product repair/replacement strategy and implementation.
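The measures described above all reduce to counting events against an exposure (time, flights, or units sold). The following is a minimal sketch, assuming maintenance events are available as a list of labels and departure delays as minutes past schedule; the record layout, function names, and sample data are hypothetical.

```python
def mean_time_between(event_kinds, total_hours, kind=None):
    """Total operating time divided by the number of counted events.

    With kind=None every maintenance event counts (MTBM); passing
    "corrective" or "preventive" gives the corresponding subset.
    """
    counted = [k for k in event_kinds if kind is None or k == kind]
    if not counted:
        return float("inf")   # no events observed in the period
    return total_hours / len(counted)


def schedule_reliability(delays_minutes, allowed_delay_minutes):
    """Percent of departures leaving within the allowed delay."""
    on_time = sum(1 for d in delays_minutes if d <= allowed_delay_minutes)
    return 100.0 * on_time / len(delays_minutes)


def warranty_return_rate(returns, units_sold):
    """Warranty returns per unit sold."""
    return returns / units_sold


if __name__ == "__main__":
    events = ["corrective", "preventive", "corrective", "corrective"]
    print("MTBM (hours):", mean_time_between(events, 1000.0))
    print("Mean time between corrective maintenance (hours):",
          round(mean_time_between(events, 1000.0, "corrective"), 1))
    print("Schedule reliability (%):",
          schedule_reliability([0, 3, 12, 45, 7], allowed_delay_minutes=15))
    print("Warranty return rate:", warranty_return_rate(12, 4000))
```

Because each measure counts every event of its type regardless of cause, the results describe the customer's experience rather than the inherent reliability of the design.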
How and where the product is operated and used must be considered in establishing realistic requirements. A power supply for a home sound system is used differently than one for a professional recording studio. The latter most likely will be used more often and operated for longer periods of time. Neither of these power supplies will be subjected to the shock, vibration, and extremes of temperature encountered by a power supply installed in a fighter aircraft. Depending on the specific technologies used, product reliability is affected by temperature (absolute and rate of change), vibration, shock, humidity, and other factors characteristic of the environment in which it is used.
The opportunity for induced failures, from exposure to the elements and due to maintenance error, varies depending on the support concept and environment. The support concept describes how the product will be maintained and supported during use. For some products, no repair or replacement is performed by the customer. Instead, whenever the product requires repair, it will be sent back to the supplier. Certain repairs may be made on-site by the customer with only failed components sent back to the supplier. To design the product to be reliable, the support concept and environment must be taken into account. Specifically this means that the temperature, humidity, shock and vibration, and so forth associated with the environment in which maintenance is performed, the chance of human error given the anticipated skill levels, etc., must be considered when designing the product.
The environment where repairs are made depends on the type of product and the support concept. At one extreme, the environment can be a factory floor, where the temperature ranges from 65°F to 80°F, the air is free of dust and other pollutants, and the labor force is very experienced and stable. At the other extreme, the environment may be a flight line, where temperatures range from -40°F to 120°F; maintenance is performed with the product exposed to rain, snow, wind, dust, and other natural factors; and the labor force is less experienced with a relatively high turnover rate.
Until the rapid growth of computers and the resulting reliance on software, reliability was a discipline traditionally focused on hardware. When customers stipulate a requirement for, or have some expectation regarding product reliability, they will not tolerate failures that are caused by a software bug. To customers, a failure is a failure regardless of its source or cause.
Over the years, many authorities in the fields of reliability, quality, and software have defined software quality as dealing with the process of developing software, and software reliability as dealing with the product of the process. It is generally agreed that a product's reliability is the result of addressing all possible failures, no matter what the cause, during design. Failures can result from a hardware fault, a software fault, or a firmware fault. All sources of failure must be addressed. This total view of product reliability is sometimes called "X-ware" reliability.
Failures can also result from human error. Human engineering, also called human factors engineering or man-machine interface engineering, is a design approach to minimize the probability of human error. It recognizes that the performance of a product, including reliability, is affected by those who operate and maintain the product. Some studies show that humans account for between 30 and 70 percent of field failures. By using certain proven design approaches, the possibility of a human operator or maintainer inducing a failure (by initiating an action incorrectly or responding incorrectly) can be minimized.
2.7 Design Reliability
Each customer, whether commercial or military, measures the performance of products in their own ways, to suit their own needs. A car owner may be most concerned with low cost of operation and few visits to the repair shop. An airline may be most concerned with schedule reliability. These measures may or may not include factors within the control of the designer. The way in which a customer measures the reliability of a product in use may not be meaningful in a design specification, and a translation from the customer's measures to measures more appropriate to a design specification may be needed. Table 5 shows how performance (the customer's) reliability and design reliability differ.
Table 5. Performance and Design Reliability Differences
||Define, measure and evaluate supplier's product
||Describes performance when operated in planned environment (not for design requirements)
||Experience, benchmarking, etc.
|Selected such that:
||Achieving them allows projected satisfaction of performance reliability
||Needed level of operational reliability is described
||Inherent or design values
||Failure events subject to supplier's control
||All failures, regardless of cause
||Only effects of design and manufacturing
||Combined effects of design, quality, installation environment, maintenance policy, repair, delays, etc.
||MTBF (mean-time-between-failures), MTBCF (mean-time-between-critical-failure)
||MTBM (mean-time-between-maintenance), MTBD (mean-time-between-demand), MTBR (mean-time-between-removal), MTBCF (mean-time-between-critical-failure)
Mean time between failure (MTBF) is among the requirements most often used to specify design reliability. An immediate implication of the definition is that the product in question is repairable. MTBF is therefore unsuitable for light bulbs, fuses, missiles, flash bulbs, and many other products that only fail once (i.e., one-shot devices). Even for products such as bridges and aircraft structures (i.e., airframes) that may be repairable in the strictest sense of the word, MTBF intuitively seems "wrong". Further complicating the issue is that the measure most appropriate for a product is seldom the same measure that can be used for every subsystem, component, or assembly within the product.
MTBF is just one measure of reliability. Service life, probability of success (or success ratio), and mean time to failure (MTTF) are other potential customer requirements. In choosing the most appropriate requirement for a product, one must consider:
- what measure of "life" should be used (e.g., hours for a radio, cycles for a landing gear, rounds fired for artillery, years for a bridge, etc.)
- how the customer or supplier measures performance reliability (e.g., schedule reliability for an airline, mission reliability for a weapon system, warranty returns for a consumer product, etc.)
- whether performance reliability goals/requirements are meaningful to a design engineer or if they must be "translated" into design requirements
- whether the product is repairable or not
- how failure is defined by the user
In choosing the specific requirements to use, the primary objective must be defined. For example, the primary objective may be to ensure functional success, reduce repair costs, maximize life, or reduce the chance of a catastrophic failure. A simple product may have only one reliability requirement. More complex products can have several, and they may not all be quantitative in nature.
As is true for performance reliability, it is necessary when stating a required design reliability to stipulate certain conditions. The major conditions that affect design reliability are the duty cycle, environment, and functionality.
- MTTF. For non-repairable products, MTTF, or mean time to failure, is a common measure of design reliability. It is the mean value of the life of a product, defined mathematically by the following equation:
$$\mathrm{MTTF} = \int_0^{\infty} R(t)\,dt$$
where R(t) is the probability of failure-free operation, or the reliability function.
MTTF is often considered a constant, at least during the useful life of a product (i.e., after design and manufacturing problems have been resolved and before the product begins to wear out). Implicit in this use of MTTF is the assumption that the times to failure are exponentially distributed. In this case, the failure rate is constant and is the inverse of the MTTF. The reliability function, when failures are assumed to be exponentially distributed, is given by:
$$R(t) = e^{-\lambda t}$$
where λ is the failure rate in failures per hour.
Due to the nature of the exponential distribution, a product that is operated for a time equal to the MTTF (i.e., t = 1/λ) has a probability of surviving without failure of e^{-1}, or just under 37%.
- MTBF. The mean time between failure (MTBF) is similar to MTTF, but is used for repairable products. Repairs are assumed to restore the product to "like-new" condition. For many situations MTBF is considered constant, requiring that the distribution of failures be assumed to be exponential.
- Probability of Success. The reliability function represents the probability of failure-free operation, or the probability of success. The latter is commonly used for one-shot devices, such as fuses, automobile supplemental restraint systems (air bags), and missiles.
- Failure-Free Period of Operation. Sometimes it is not the probability of success that is of concern, but instead the period over which no failure will occur (or, more accurately, for which the probability of failure is small). Specifying a failure-free period of operation requires that the probability of failure also be stipulated. For example, the requirement might be that the probability of failure in a certain operating period or over a certain number of miles be less than 1%.
- Service Life. Related to a failure-free period of operation is service life. Service life (also referred to as safe life) is often used to describe the useful life of structures and similar products in which fatigue and wearout are the primary causes of failure. Bridges, aircraft structures, and mechanical components are typical of these types of products. Service life is usually stated in terms of time or cycles, depending on the nature of the product. Failure of a single structural element must not cause failure of the structure during its service life.
- Damage Tolerance. Damage tolerance is another reliability-related parameter associated with structures. A damage tolerant structure is one in which:
- the failure of one or more structural elements will not cause the entire structure to fail (i.e., fail-safe concept), or
- the rate at which a fatigue crack in a structural element grows is slow enough to give ample time for detection before a critical crack length is reached
- No Single-point Failures. Related to fail-operational is the requirement of no single-point failures. The requirement is that no single failure will cause the loss of a product function. The design solution is usually to use redundancy. When redundant components are added, so are additional paths of successful operation. A redundant architecture has multiple successful paths of operation.
- Fail-Safe. In recognition of the human injury or property damage that can result from the failure of certain products, it may be required that if a product fails, it fails safely. In other words, the failure will affect the function of the product or even prevent the product from performing its function, but no injury or additional damage will occur. This type of reliability requirement is qualitative. An elevator is an example of a product that must fail safely. For example, if a failure occurs in the lift mechanism, the elevator might not be able to function but it must not free fall. Fail-safe is sometimes considered a safety requirement, rather than a reliability requirement. Indeed, the design solution is often in the form of some safety device that prevents the failure from causing injury or further damage. In the case of the elevator, brakes lock the elevator in place should the lift mechanism fail.
- Fail-Operational. In many cases, it is not acceptable for one, or even more than one, failure to cause the product to cease functioning. Unlike fail-safe, in which safety is the issue and function may cease, fail-operational is concerned with preserving the function. Fail-operational is a subset of fail-safe. A requirement is usually specified as a time interval in which a product that has experienced faults can fulfill an operational requirement.
- Fault Tolerance. Fault tolerance is the requirement for a product to be insensitive to one or more failures. When the requirement involves tolerance to a single failure, fault tolerance is equivalent to no single-point failures.
- Graceful Degradation. Related to the requirement of no single-point failures is graceful degradation. Graceful degradation is different from both the requirement for no single-point failures and the requirement for fault tolerance. Whereas the latter two requirements, as usually imposed, allow for no impact on function due to a failure, graceful degradation does. Suppose a product has one function that must be performed. Without graceful degradation, failure of the product is defined as the failure to perform the function. The product has only two states: operational and failed. With graceful degradation, a failure does not result in total loss of a function. Instead, the function will continue to be performed at a level that is less than normal. The product now has three states: operational at normal level, operational at some lower (degraded) level, and failed. Graceful degradation provides some level of performance between normal and failed.
- Specifying Software Reliability. No standard means for specifying software reliability requirements are universally accepted and validated. One term used to specify software reliability is defect rate per 1000 lines of code. Other ways of specifying software reliability parallel those used for hardware. For example, the specification can require that no software failures result in loss of life or damage to property (a fail-safe requirement). Or, fault tolerance may be a requirement. For example, a divide-by-zero event should not result in an abort. Finally, it may be a requirement that the software "fail gracefully." That is, events such as an invalid command, out-of-range data, or bad data may cause a failure, but the software should provide an opportunity for the function to be recovered (automatically or manually) without a hard system crash. Anyone who has used a commercial software product is familiar with this latter case. For example, hitting a wrong key may require that the operator redo several steps, or close a window and restart an application. Such a "graceful failure" is very different from hard failures in which the computer "locks up" and the operator has no choice but to shut down and reboot the entire system.
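Two of the parameters in the list above lend themselves to a short calculation. The Python sketch below estimates a failure-free period of operation for a stipulated probability of failure, and shows how redundant paths raise the probability that a function survives. Both functions rest on assumptions that the text does not impose: exponentially distributed failures for the first, and independent, identical paths with perfect switching for the second. The numerical inputs are hypothetical.

```python
import math


def failure_free_period(mtbf_hours: float, max_failure_prob: float) -> float:
    """Longest operating period whose probability of failure stays at or
    below max_failure_prob, assuming exponentially distributed failures."""
    return -mtbf_hours * math.log(1.0 - max_failure_prob)


def parallel_reliability(path_reliability: float, n_paths: int) -> float:
    """Reliability of n independent, identical redundant paths when any one
    surviving path is enough to preserve the function."""
    return 1.0 - (1.0 - path_reliability) ** n_paths


# Hypothetical product: 10,000 hour MTBF, failure probability limited to 1%.
period = failure_free_period(mtbf_hours=10_000.0, max_failure_prob=0.01)
print(f"Period with at most 1% failure probability: {period:.1f} h")

# Hypothetical path reliability of 0.90 with one, two, and three paths.
for n in (1, 2, 3):
    print(f"{n} path(s): R = {parallel_reliability(0.90, n):.4f}")
```

The first result shows why a failure-free period is typically a small fraction of the MTBF; the second shows the diminishing return of each added path.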
- Duty Cycle. When products are operated for a given time t, not all of the lower-indenture items that make up the product necessarily operate for the same time t. For example, the landing gear mechanism of an aircraft is only activated once after takeoff and once before landing. During a flight, a pump may only operate a few times for a few minutes each time to maintain a charge on an accumulator. In such cases, the percent of time during which the item operates is called the duty cycle. An item having a 50% duty cycle only operates for 50% of the total product operating time.
- Environmental Conditions. Characterizing the environmental factors to which a product may be exposed is essential to the design process. The factors of importance include those experienced by a product during shipping, handling, and storage, as well as during maintenance and operation. In some cases, the shock, vibration, and temperature extremes experienced during shipping are more severe than those experienced in operation. Typical environmental factors include:
- maximum and minimum temperature
- maximum rate of change in temperature
- vibration (magnitude and type)
- humidity and moisture
- electromagnetic fields
- static and dynamic loading
- Functionality. To define failure, and to fully understand and respond to the reliability requirement, the designer must understand the functions to be performed by the product and the criteria by which an acceptable level of functionality is determined. In the previous example of the radio transmitter, the function is to transmit signals at a certain frequency and to do so at power levels of 750 watts or higher.
2.8 Activities Related to Developing Reliability Requirements
Some of the most common tasks related to the development of reliability requirements and goals that will be discussed are summarized in Table 6.
Table 6. Activities Related to the Development of Goals and Requirements
| Task and Description | Relevance to Purpose |
| --- | --- |
| Environmental Characterization. Determination of the operational stresses the product can be expected to experience. | A process to identify the scope and magnitude of the end-use environments to which the product will be exposed throughout its useful life. Used to establish performance-based reliability requirements. |
| Fault Tolerance. Designing alternate means to continue operation when components of a product fail. | Consideration of this failure masking technique allows the establishment of higher product-level reliability goals, but lowers series reliability potential. |
| Allocations. Translation of product level reliability goals and requirements into reliability goals and requirements for the components making up the product. | Used to establish lower assembly level requirements from product level requirements based on complexity, parts counts, etc. Provides an effective means to check reliability requirements for realism. |
| Dormancy Analysis. Determination of the effects of expected periods of storage or other non-operating conditions on the reliability of the product. | Accounts for periods of non-operating or stand-by conditions which the product will experience throughout its useful life, in order to establish and understand special design needs and their impacts on the product. |
| Durability Assessment. Determination of whether or not the mechanical strength of a product will remain adequate for its expected life. | Used to define life limiting aspects of the product. An effective means to develop repair policy requirements and plan for upgrading products. |
| Life Cycle Planning. Determining reliability (and other) requirements by considering the impact over the expected useful life of the product. | A process to set goals for all portions of the product's life cycle. Considers levels of reliability at each stage and plans for end-of-life. Impacted significantly by repair policies and product durability. |
| Modeling and Simulation. Creation of a representation (model), usually graphical or mathematical, for estimating the expected reliability of a product and validating the selected model through simulation. | An approach to establish meaningful reliability requirements that can be allocated to lower assembly levels. Provides a means to determine the appropriate degree of fault tolerance. Provides an understanding of the impact of unit failure on the product. |
| Predictions. Estimation of reliability from available design, analysis, or test data, or data from similar products. | A means to estimate realism of potential hardware and software reliability goals and requirements. Can indicate scope of fault tolerance appropriate for challenging requirement levels. |
| Thermal Analysis. Analysis of the heat dissipations, transfer paths, and cooling sources to determine if part/product temperatures are consistent with reliability needs. | An analysis to determine the relationship between the intended design reliability and the thermal use environment to establish performance-based reliability requirements. |
| Translations. Determination of product design goals (i.e., product reliability) from the user’s operational requirements for the product. | Models that will translate customer or user performance-based requirements into product design reliability goals or requirements. |
| Benchmarking. Comparison of a supplier's product and process performance attributes with those of competitors or with the best level of performance achieved by any supplier in a comparable activity. | Can be used to establish competitive position with respect to reliability. Identifies goals necessary to develop a product that is differentiated on the basis of reliability. |
| Quality Function Deployment (QFD). Capturing the desires of the customer and translating these desires to design requirements and then to tasks needed in the product development effort. | A technique for understanding customer needs that provides a means of defining quantitative reliability goals and tasks to effectively satisfy them. |
| Market Survey. Determining the needs and wants of potential customers, their probable reaction to potential products, and their level of satisfaction with existing products. | A basic means to identify customer needs and expectations as an input to developing supplier product goals. |
SECTION THREE - A PROCESS FOR DEVELOPING RELIABILITY REQUIREMENTS
Customer reliability performance needs and expectations may be expressed specifically, implicitly, or not at all. They may have to be hypothesized by the supplier based on other desired customer requirements. Table 7 summarizes typical scenarios. No matter what methods are used, the primary goal is to "thoroughly understand what the customer needs and expects."
Table 7. Customer Reliability Requirement Scenarios
||Description of Scenario
||Customer specification includes requirements for reliability performance as a quantitative measure (i.e., percent reliability, mean time between failure, etc.)
||The specified value may need to be adjusted to account for factors that may cause failure that are beyond supplier control, i.e., translation of user requirements to design goals.
||Specification includes product characteristics that necessitate certain levels of reliability in order to satisfy them (i.e., life cycle cost, support cost, maintenance manpower, warranty provisions).
||The level of reliability necessary to meet other explicitly stated product characteristics should be derived. This process is based on known or hypothesized relationships among characteristics and often involves trade-offs.
||Often the case for commercial products. Supplier must "anticipate" needs and/or position his product in the marketplace. Supplier may or may not have data on similar or competitive products.
||Market surveys, Quality Function Deployment, benchmarking, etc., are among approaches that can be used. Data on existing products can help the supplier "position" his reliability objective. Often, this can be a competitive advantage.
Developing reliability requirements for products and systems is a multi-step process as shown in Figure 3. Each step is important in choosing the level of reliability that drives the scope of design oriented tasks necessary to meet customers' needs and expectations. Each of the following subsections will discuss how typical reliability tasks relate to effectively accomplishing these steps.
Figure 3. Reliability Requirements Development Process
3.1 Determine Customer's Product Needs
Some customers, especially the general public, do not explicitly "specify" what they need in a product, especially reliability. For example, the average buyer of an automobile does not go into the new car showroom looking for a car with an MTBR (mean time between repair) of 450 operating hours. Instead, the buyer may read a consumer guide that identifies those cars having a lower than average frequency of repair, when compared to other cars. Customers for other products may be very specific and state a numerical reliability requirement. For example, an airline usually has a very definite departure reliability requirement for a new aircraft. When customers do not explicitly specify their product reliability needs, suppliers have to determine them using one or more methods.
Determining customer needs is the basis for deriving performance (operational) reliability requirements and subsequent design requirements. Without designing to those needs, products will not succeed in the marketplace. Customer needs should be determined early in the Concept/Planning phase of a product development program, before large investments of time and resources are made. Customer needs are a prerequisite to deriving performance reliability requirements and performance reliability requirements, in turn, are the basis of design requirements, which should be defined before starting any design and development. The following sections highlight several approaches to determining customer needs.
3.1.1 Market Surveys
- Purpose. Market surveys are used to determine what customers want and need in terms of new or improved products. They address desired features and attributes that range from basic functionality to general appearance.
- Benefits. There is no better way to understand the needs of customers than to ask them. Surveys can, however, be subject to bias and sampling error. Well planned efforts can minimize these effects.
- Timing. Usually surveys are performed in the Concept/Planning phase but alternative forms of surveys can be used when a prototype product is available. If this is the case, the customers being surveyed can provide real feedback on likes and dislikes in time to introduce product changes prior to full scale development.
- Application Guidance. Developing market surveys is not a trivial matter. Great care must be exercised in writing questions so as to not introduce bias in the results. The survey must also be concise enough to encourage participation unless the participant is being compensated in some way. Most importantly, the sample of the customer population must be carefully chosen so it can be interpreted as representative. It is usually appropriate to use experts in market research to develop effective market surveys.
There are blind surveys and targeted surveys, as well as alternate types of surveys such as discussion and focus groups. Discussion and focus groups have an advantage over ordinary surveys in that they allow two-way communication. Suppliers may invite customers to participate in discussion groups. These groups may discuss general categories of products (for example, family automobiles) or a specific new product (a short-range, electric commuter car). The discussion is guided by a supplier representative who asks specific questions designed to uncover the customers' basic needs and expectations. Often, suppliers will offer customers free service, reduced prices for a purchase made within so many days of the session, or some small token of appreciation for participating in discussion groups.
3.1.2 Benchmarking
- Purpose. Benchmarking is a proactive process for:
- comparing a process as implemented by an organization with the "same" process as implemented by one or more other organizations
- modifying an organization's process to incorporate the best practices learned through the comparison exercise
- Benefits. Benchmarking enables a supplier to better understand the characteristics of other products which, when combined with their level of market acceptance, provides a better understanding of customer needs.
- Timing. Benchmarking, ideally done prior to product planning, should be an on-going process to continually assess how products compare to those of competitors. The process should typically be initiated in the Concept/Planning phase of a new product development.
- Application Guidance. Ideally, the process is implemented by using one organization recognized as a world-class leader as the benchmark. Benchmarking can be applied to products and services, in addition to processes. It goes beyond traditional competitive analysis to not only reveal industry's "best practices" but also to clearly understand how best practices are used. In product benchmarking, a competitor's product that is known to be the "best in class" is usually selected as the benchmark. This benchmark establishes the minimum requirements, including reliability, for the new product. Benchmarking is useful not only in developing product-level requirements but also requirements at lower levels of indenture.
In a sense, similarity analysis is a benchmarking process, where instead of comparing to the products of competitors, you compare to similar products of your own company. If the product being developed is an upgrade of or is similar to a well-established product, then the reliability and other characteristics of the older product can be used as the starting point of the requirements for the new product. For example, we intuitively assume that a "next generation" product will be at least as reliable as its predecessor. In each specific case, however, the technology being incorporated in the new product, changes in functionality, and other performance requirements should be evaluated to determine if the assumption is realistic. This method, sometimes called comparative analysis, is also applicable to deriving reliability requirements at lower levels of indenture.
3.1.3 Environmental Characterization
- Purpose. Environmental characterization is a process used to define the operational and environmental stresses that the product will experience when put into use by the customer.
- Benefits. Without an understanding of the stresses to be experienced by a product, the statement of reliability objectives, explicitly or implicitly, is meaningless. For example, a product that could have an MTBF of 500 hours in a consumer household may only experience a 200 hour MTBF in an automobile, due to a more stressful environment.
- Timing. The environment should be defined in gross terms very early in the Concept/Planning phase.
- Application Guidance. At a high level, the environment can be characterized in terms such as ground, mobile, airborne, space, etc. This provides a rough indication of the severity ranges of stresses to be experienced. As the design process starts, more detailed information may be necessary. For more severe environments, it may be appropriate to actually use instrumentation to measure the expected stress levels. For example, the Air Force Rome Laboratory has developed a family of Time Stress Measurement Devices (TSMDs) that can be used to measure and record such stresses as temperature, humidity, shock, vibration and power. Less sophisticated commercial products are also available.
Exposure to stress is also affected by how the product is used operationally. For example, is it turned on and left powered 24 hours per day, or is it subjected to electrical or mechanical stress only when used? Is it part of a system (like a computer system) in which it is used only 10% of the system operating time (like a printer)? Another factor is who will use the product. Will mature adults use the product, or will children, who are more apt to mishandle and abuse it?
3.1.4 Life Cycle Planning
- Purpose. Life cycle planning addresses the product reliability at every phase of the life cycle. It addresses the various phases in developing a mature product all the way to end of life considerations, often including product disposal. It also addresses various time periods during the product's useful life including shipping, storage (shelf life) and operation.
- Benefits. Products need to be planned and designed to preclude failure due to any period of exposure to stress. Life cycle planning addresses all phases of the life cycle and both catastrophic and wearout types of failure.
- Timing. The Concept/Planning phase is the time to establish reliability goals and requirements that address the different stress exposures and characteristics during the product's life cycle.
- Application Guidance. Any time (shipping, storage and operation) a product is subjected to stress, there is the potential for failure. Sometimes the stresses of nonoperating periods can be more detrimental than those during operation. For example, from a mechanical standpoint, a television operates in a benign environment but needs extensive protection for shipping. Also, many products during storage are susceptible to moisture and, therefore, need to be packed with desiccants. All of these factors should be considered in planning and developing the appropriate reliability goals and requirements for the product.
3.2 Derive Customer's Performance Reliability Requirements
It is necessary to identify or derive the customer's performance reliability requirements from the customer's needs for the product. Needs may be very qualitative (e.g., "good reliability"). Reliability requirements may be "hidden" in other needs that are expressed. For example, a need may be stated as availability (which is a function of both reliability and maintainability), or as a safety concern (no safety critical failures).
Unless software is the end item in question, its reliability is usually not separately addressed by customer needs. Likewise, human reliability usually is not separately addressed. Instead, the customer's performance reliability requirements are stated in terms of overall product reliability, with no regard given to potential sources of failure. Performance reliability requirements are the basis for deriving the design reliability requirements. Customer performance reliability requirements should be derived from customer needs as soon as the latter are known.
Depending on what customer needs are stated, performance reliability requirements can be derived in one of two ways. If the need is already stated as a recognized reliability requirement (e.g., mean time between maintenance, or MTBM), no action is required because the need and the requirement are synonymous. However, when the performance reliability requirement is "hidden", the basic definition of the need must be analyzed to derive any reliability requirements. Ideally, the definition can be described by a relationship in which reliability is one factor.
There are a number of reliability oriented tasks that are useful in deriving performance reliability requirements from the needs of customers.
3.2.1 Modeling and Simulation
Figure 4. Different Combinations of MTBF and MTTR Yield the Same Inherent Availability
- Purpose. Modeling and simulation is an effective technique to determine a level of reliability, or range of reliability, necessary to meet a more general customer need or requirement.
- Benefits. Modeling and simulation enables the trade-off of various product characteristics to achieve a more general requirement. Simulation specifically makes use of computer automation to "try" various solutions until an optimized solution is achieved.
- Timing. Modeling and simulation is most beneficial if used early in the Concept/Planning phase. It provides a means of identifying solutions without the costly design, build and test process. As the product evolves during the Design/Development phase, the models should be updated to reflect the current product design configuration.
- Application Guidance. In developing reliability requirements using mission and support models, the relationships among reliability and the various mission and support figures of merit (FOMs) are defined using mathematical relationships. These FOMs may be the number of spares required for a given scenario, the number of flights that can be generated in a given time period, the average number of products down for service at any one time, and so forth. By varying the operational measures of reliability, the effect on the FOMs can be determined. Those values of operational reliability which give the "best" results, all other factors being held constant, are then selected as the reliability requirements.
Closely related to mission and support modeling is life cycle cost (LCC) modeling. In an LCC model, the FOM of interest is the overall cost of the product during its entire life cycle (i.e., from concept to obsolescence). Since reliability affects the number of spares required for repairable products, the frequency with which products must be taken out of service, and other cost-related factors, an LCC model can be used to find the value of reliability which minimizes the LCC.
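The idea of searching for the reliability value that minimizes LCC can be illustrated with a deliberately simple Python sketch. The cost model here is entirely hypothetical: acquisition cost is assumed to grow linearly with MTBF, support cost is assumed to be proportional to the expected number of failures over the product life, and every coefficient is an invented placeholder. A real LCC model would use program-specific cost relationships.

```python
def life_cycle_cost(mtbf_hours: float,
                    life_hours: float = 50_000.0,       # assumed operating life
                    base_cost: float = 10_000.0,        # assumed fixed acquisition cost
                    cost_per_mtbf_hour: float = 2.0,    # assumed cost of added reliability
                    cost_per_failure: float = 1_500.0   # assumed repair and spares cost
                    ) -> float:
    """Toy LCC model: acquisition cost rises with MTBF, support cost falls."""
    acquisition = base_cost + cost_per_mtbf_hour * mtbf_hours
    expected_failures = life_hours / mtbf_hours
    support = expected_failures * cost_per_failure
    return acquisition + support


# Vary the reliability measure over a grid, holding all other factors constant,
# and keep the candidate with the lowest total cost.
candidates = range(500, 20_001, 100)
best_mtbf = min(candidates, key=life_cycle_cost)

print(f"Lowest-cost MTBF on the grid: {best_mtbf} h")
print(f"LCC at that MTBF: {life_cycle_cost(best_mtbf):,.0f}")
```

The same pattern of varying reliability while holding other inputs fixed applies to the mission and support FOMs described earlier; only the function being evaluated changes.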
Reliability is just one of many product performance requirements. Depending on the functions performed by the product, many other performance requirements may be imposed. These other requirements can include: range, payload, speed, weight, gas mileage, power output, rate of fire and maintainability. In addition to functional performance requirements, products may have form and fit requirements. These requirements describe how the product must look, and its dimensions and shape. These three types of requirements are often referred to as form, fit, and function.
Ideally, each product requirement would be optimized. But seldom, if ever, is it possible to optimize every requirement. Instead, the overall product must be optimized. That is, the product's requirements must be addressed holistically, as a set. For example, maximizing the number of people (payload) to be carried by a commercial airliner makes it difficult to also maximize the range. These requirements are conflicting. So a "best mix" of range and payload is chosen as a result of trade-off studies. Sometimes, requirements are complementary. For example, consider an automobile. Maximizing gas mileage is consistent with maximizing range. Whether requirements are conflicting or complementary, they must be viewed as a whole, and not as independent requirements.
Reliability requirements should be developed within the context of the overall requirements for the product and program constraints. It must be recognized that reliability is but one of many, often competing, requirements that are not necessarily equal in importance. The focus must be on the overall product performance. So other performance requirements, and even form and fit requirements, can complement or conflict with the reliability requirements. To reach a "best" compromise, trade-offs are conducted in which an increase in one requirement is traded for a decrease in another. When conflicts arise, reliability might be traded off (i.e., the requirement is reduced) to achieve better performance in another area.
Sometimes, by trading one requirement for another, a third, related requirement can be met. Consider a product having an inherent availability requirement. Inherent availability is a function of the designed-in reliability and maintainability (R&M). If the reliability requirement cannot be met, the required availability can still be achieved by tightening the maintainability requirement (i.e., requiring a shorter repair time). Figure 4 illustrates the complementary nature of R&M and shows how a given inherent availability can be met with different combinations of R&M.
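The R&M trade-off described above can be made concrete with a short Python sketch. It uses the standard definition of inherent availability, MTBF / (MTBF + MTTR), which is consistent with the Figure 4 caption but is not written out in the text. The 0.99 target and the MTBF values are hypothetical.

```python
def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Inherent availability from mean time between failures and
    mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def mttr_needed(mtbf_hours: float, target_availability: float) -> float:
    """Largest MTTR that still meets the target availability for a given MTBF."""
    return mtbf_hours * (1.0 - target_availability) / target_availability


target = 0.99  # hypothetical inherent availability requirement

# As the achievable MTBF drops, the allowable MTTR must shrink in proportion.
for mtbf in (1000.0, 500.0, 250.0):
    mttr = mttr_needed(mtbf, target)
    achieved = inherent_availability(mtbf, mttr)
    print(f"MTBF {mtbf:6.0f} h, MTTR {mttr:5.2f} h -> availability {achieved:.3f}")
```

Each MTBF and MTTR pair printed meets the same availability target, which is the relationship Figure 4 depicts.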
3.3 Derive Product Level Design Reliability Requirements
In order for designers to have targets to design to, performance reliability requirements must be "translated" into a set of design reliability requirements. Commercial and military customers measure the reliability performance of products in their own ways, to suit their own needs. These measures may or may not include factors outside the control of the product or system supplier. The way in which a customer measures the reliability of a product in use may not be directly meaningful as a design reliability goal or requirement. While some factors are not under the supplier's control, they should be accounted for in establishing the level of design reliability necessary to meet the customer's needs or expectations. Usually, the supplier can anticipate that while many failures affecting reliability performance will be caused by the design, other classes of induced failures will occur during use (including repair and manufacturing). This hierarchy of failure causes can be visualized as segments of a pyramid as shown in Figure 5.
Figure 5. Product Performance Failures
The shape of the "pyramid" can vary with time because each segment is a function of time. It is critical to account for all known failure causes in establishing product design reliability goals. The process of establishing design requirements/goals from needed performance measures is sometimes referred to as "translating" customer (performance) reliability to supplier (design) reliability.
Design reliability requirements, at the product level, should be derived before the start of the Design/Development phase of a program. Even if the requirements are initially stated as goals, design requirements should be available before designers attempt to implement the product concept developed in the early phases of the program. A number of reliability oriented tasks are helpful in deriving design reliability requirements from performance reliability requirements.
3.3.1 Quality Function Deployment (QFD)
Figure 6. QFD House of Quality
- Purpose. Quality function deployment, or QFD, is a tool for translating defined customer requirements into appropriate design requirements at each stage of design and development. In a sense it addresses several steps in the requirement development process shown in Figure 3. The method uses a matrix known as the House of Quality, as depicted in Figure 6.
The right-hand side of the completed House of Quality is used to project the relative level of effort, cost, required manufacturing capability, and the supplier's competitive position regarding each WHAT. Projections are usually stated as Greatest, Average, and Least.
- Benefits. The benefits of the QFD approach are that it represents a means of developing a structured set of customer requirements and ensuring that the design features address those attributes. It also provides a means of weighting, or prioritizing the needs.
- Timing. The QFD approach should be applied in the Concept/Planning phase.
- Application Guidance. Briefly, the following steps are used in the QFD approach.
- Enter the WHATs already determined. If necessary, further define the WHATs as Primary, Secondary, and Tertiary requirements.
- Determine the HOWs (the design requirements) based on technical experience and knowledge.
- Develop HOW-WHAT relationships, assigning a numerical value to each (for example, a Very Strong relationship might be assigned a 5, a Strong relationship a 3, and a Weak relationship a 1). The determination of relationships is based on experience and technical knowledge. To provide an easily understood graphical display, the symbols shown in Figure 7 are used.
- Define and assign a customer importance factor for each of the lowest level requirements (Primary, Secondary, or Tertiary) and the degree of technical and cost risk associated with each HOW. Assign numerical values to the factors and degrees of risk (e.g., Greatest = 5, Average = 3, Least = 1).
Figure 7. Typical Excerpt of House of Quality
- Develop relationships between the HOWs. Use the same definitions for the strength of the relationship and the corresponding numerical value that was used for the HOW-WHAT relationships.
- Calculate the relative and absolute weights for the HOWs. For each HOW (DR1, DR2, and DR3), the relationship values in that column are summed. The results are 39, 38, and 30, respectively. Rank ordered by these sums, the HOWs are given absolute weights of 1, 2, and 3. Now multiply the sum of each column by its risk value, yielding the following products: 39, 190, and 90, respectively. Rank ordering these products gives relative weights of 3, 1, and 2, respectively, for DR1, DR2, and DR3.
- Based on the values of the absolute and relative weights, select the key HOWs (i.e., those requiring the most attention). DR2 rates the most attention and DR3 the least.
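The weighting arithmetic in the last two steps can be sketched in a few lines of Python. The column sums (39, 38, 30) and the products (39, 190, 90) come from the text; the risk values of 1, 5, and 3 are not stated there and are inferred here by dividing each product by its column sum, so treat them as an assumption. The helper name `rank_descending` is illustrative only.

```python
# Column sums of the HOW-WHAT relationship values, as quoted in the text.
column_sums = {"DR1": 39, "DR2": 38, "DR3": 30}

# ASSUMPTION: risk values are inferred from the quoted products
# (39 / 39 = 1, 190 / 38 = 5, 90 / 30 = 3); the text does not list them.
risk = {"DR1": 1, "DR2": 5, "DR3": 3}


def rank_descending(scores):
    """Return {name: rank}, where rank 1 goes to the largest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Absolute weight: rank of each HOW by its raw column sum.
absolute_weights = rank_descending(column_sums)

# Relative weight: rank of each HOW after scaling the sum by its risk.
risk_products = {name: column_sums[name] * risk[name] for name in column_sums}
relative_weights = rank_descending(risk_products)

for name in column_sums:
    print(name, column_sums[name], absolute_weights[name],
          risk_products[name], relative_weights[name])
```

With these inputs the absolute weights come out as 1, 2, 3 and the relative weights as 3, 1, 2 for DR1, DR2, and DR3, matching the values quoted in the step above.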
Finally, if a supplier's product is a component of the customer's product, the performance and design reliability requirements may be identical. For example, a customer's performance reliability requirement for a component may be an MTBF of 500 hours. In this case, the supplier of the component need not do any translation.
- Purpose. The purpose in translating a performance reliability measure to a design reliability measure is to provide designers a target by focusing on those reliability characteristics that can be met. By satisfying this target, the performance reliability objective will "automatically" be achieved.
- Benefits. The benefit is that each designer will have a realistic target to work towards for the portion of the design under his/her direct control.
- Timing. The design target should be established before the design begins (i.e., during the Concept/Planning phase) in order to work towards achieving the reliability goal in the most efficient manner.
- Application Guidance. The first step in deriving design reliability requirements is to translate operational performance parameters into design parameters. For example, MTBM is a measure that takes into account factors that may be beyond the control of the supplier who designs the product, such as induced failures during customer use or non-verifiable failures. So MTBM must be translated into a contract specification term, such as MTBF.
Two methods of making the translation from operational reliability to design reliability have been developed by Rome Laboratory (RL). While they were developed for the military, similar relationships can be used for commercial products, or simple "k factors" can be applied to relate performance reliability to design reliability based on supplier experience.
- Operational Parameter Translation (OPT) Models. Rome Laboratory, under contract, developed a set of models to translate operational (performance) reliability and maintainability measures into contractual, specifiable, and measurable values (i.e., design reliability). The models were developed by first identifying the variables that influence the measures being translated. Then, operational and design data were collected, and statistical analysis techniques were used to develop and verify the models.
- Automated Requirements Translation Tool. In another contracted project, Rome Laboratory developed an automated translation tool called ART (for Automated Requirements Translation). ART was developed to translate operational reliability, maintainability, and diagnostic (RM&D) parameters into design requirements. A model was developed and then incorporated in a PC-based software tool. The modeling methodology used was as follows:
- identify a set of operational RM&D requirements for the type of product
- identify definitions for each operational requirement
- derive an algorithm for each definition
- simplify each algorithm to its simplest terms
- use the terms resulting from simplification as the design requirements
- use data for baseline products to set acceptable ranges of design values
- simultaneously solve the algorithms
The model provides ranges of design RM&D values derived from the operational RM&D requirements. Users of the model can then make trade-offs to find the best mix of RM&D design requirements. Although the methodology is applicable to any type of product, the model was fully developed for 10 types of fixed-wing aircraft and 30 avionic subsystems. The ART tool is available from RL or IIT Research Institute.
3.3.3 Analyses Related to Deriving Reliability Design Requirements
- Purpose. Many forms of reliability-related analyses aid in the process of deriving design reliability requirements. They provide a means of assuring that the requirements are compatible with environmental and operational use conditions and with the state of the art.
- Benefits. Reliability requirements should not be blindly derived by applying a set of algorithms to performance reliability requirements. By supplementing the translation process with a tailored set of analyses, compatibility with other requirements and constraints can be cost effectively assured.
- Timing. These analyses should be applied when the requirements are first being derived during the Concept/Planning phase, but should be iterated as appropriate during the Design/Development process.
- Application Guidance. The process cannot end with translation to a design specification. The translated requirements must be evaluated for realism. Questions that should be answered include (1) are the requirements compatible with the available technology, and (2) do the requirements unnecessarily drive the design (conflict with other product constraints such as weight and power). Answering these questions usually involves a review of previous studies and data for similar or comparative products (if any exist). The design requirements, operational requirements, or both may need to be adjusted to account for the improvement in technology, different operating environments, different duty cycles, and so forth.
Figure 8. Generalized Approach to Durability Assessment
- Thermal Analysis. One of the most important influences on reliability is temperature. Although temperature effects are usually associated with electronics, the reliability of mechanical components is also affected by temperature. By conducting a thermal analysis, designers can determine heat transfer paths and modes, temperature extremes experienced by individual components and parts, and the impact of thermal shock caused by rapid changes in temperature. In performing the analysis, the designer may find that, even with reasonable cooling provisions and optimum placement of components and parts, the temperatures encountered by a product and its constituent parts make the reliability requirement technically or economically infeasible. In such cases, the reliability requirement should be adjusted.
- Durability Assessment. For mechanical and structural elements of a product, a durability assessment can be used to determine if any associated service life requirements can be met. A generalized approach to conducting such an assessment is based on a Damage Tolerance Assessment methodology developed by the US Air Force for aircraft. Figure 8 depicts the generalized approach.
Note that the approach is a process that begins during design and continues throughout testing of the product. The service life estimates that result from the assessment process can be used to evaluate the realism of the requirement. In addition to assessing the durability of the product, the approach provides information needed to establish maintenance inspection requirements (what is to be inspected and with what frequency).
In developing reliability requirements, safety factors or derating factors may be applied. Sometimes called uncertainty or "fudge" factors, safety factors (SFs) are used to account for the uncertainty in our understanding of physical phenomena, the inaccuracies of our models, and the limitations of our tools. For example, assume the probability of failure for a cable under a tensile load of 1000 pounds must be less than 5%. We could apply a SF of 25% resulting in a requirement that the probability of failure be less than 5% when the tensile load is 1250 pounds.
- Predictions. As soon as initial predictions can be made, they should be compared with the requirement. Depending on whether the prediction is lower than, equal to, or higher than the requirement, some action may be warranted.
If the predicted reliability is less than the requirement, the assumptions and methodology used in making the prediction should be reviewed and, if necessary, adjusted. If the prediction is reasonable, then the requirement must be re-examined.
If the predicted reliability is approximately equal to the requirement, some concern is warranted. Predicting reliability is not a precise science, and some error is always associated with a prediction. Usually, a conservative position is taken in which the prediction should be higher than the requirement.
If the prediction is much higher than the requirement, consideration should be given to raising the requirement to:
- gain a competitive advantage
- reduce future costs (e.g., warranty and maintenance)
- enhance safety
- Fault Tolerance. Fault tolerance is an approach that allows the product to continue to perform its functions while experiencing a failure. It is normally accomplished by redundancy, in which parallel paths are added to keep the product operational if one path fails. Redundancy is usually active (all units are operational) but is sometimes standby (additional units become operational only at the time of failure). In the context of deriving requirements, analysis of the degree of fault tolerance is a means of assuring that a needed reliability design level is practical to achieve.
- Dormancy Analysis. Like all time periods within the product life cycle, dormant periods have the potential to cause failure. An analysis of dormant periods identifies those unique attributes that affect the reliability requirements for nonoperating periods.
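As an illustration of how a fault tolerance analysis can test whether a needed reliability level is practical, the sketch below estimates how many active redundant units would be required to reach a target. It assumes identical, statistically independent units in active parallel, for which the standard result is R = 1 - (1 - r)^n; that model, the function names, and the numbers are assumptions for illustration and are not taken from the text.

```python
def parallel_reliability(unit_reliability, n_units):
    """Reliability of n identical, independent units in active parallel.

    ASSUMPTION: the product works if at least one unit works.
    """
    return 1.0 - (1.0 - unit_reliability) ** n_units


def units_needed(unit_reliability, target, max_units=10):
    """Smallest number of parallel units meeting the target, or None."""
    for n in range(1, max_units + 1):
        if parallel_reliability(unit_reliability, n) >= target:
            return n
    return None  # target not practical within max_units


# Hypothetical example: units with reliability 0.90, target of 0.995.
print(units_needed(0.90, 0.995))
```

If the answer is None, or the unit count is impractical for weight, power, or cost reasons, that is a signal that the requirement itself may need to be revisited.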
Derating is essentially the complement of a safety factor. Derating is a process of limiting electrical, thermal, and mechanical stresses on parts to levels below their specified ratings. Consider again the example of a cable. Assume the tensile strength of the cable is rated at 1250 pounds. We can apply a derating factor of 80%, which means we will not use the cable in applications where it will be exposed to tensile loads greater than 1000 pounds (i.e., 25% safety factor).
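The cable arithmetic for the safety factor and derating examples can be checked with a short sketch. The function names are hypothetical; the numbers (1000 pounds, 1250 pounds, 25%, 80%) are the ones used in the text.

```python
def design_load(service_load, safety_factor):
    """Load at which the requirement is imposed: service load plus margin."""
    return service_load * (1.0 + safety_factor)


def max_allowed_load(rated_strength, derating_factor):
    """Largest load permitted in use once the rating has been derated."""
    return rated_strength * derating_factor


def equivalent_safety_factor(derating_factor):
    """Safety factor implied by a derating factor (rating / allowed load - 1)."""
    return 1.0 / derating_factor - 1.0


# Safety factor example: 1000 lb service load with a 25% safety factor.
print(design_load(1000.0, 0.25))

# Derating example: cable rated at 1250 lb, derated to 80%.
print(max_allowed_load(1250.0, 0.80))
print(equivalent_safety_factor(0.80))
```

The last function makes the "complement" relationship explicit: derating to 80% of the rating leaves the same margin as a 25% safety factor on the applied load.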
3.4 Allocation of Requirements
Other more sophisticated allocation methods are described in the literature, such as the minimization-of-effort method, the feasibility-of-objectives method and the similarity method. The Minimization-of-Effort Method attempts to allocate requirements in a way that minimizes the effort needed to achieve the allocated requirements. Effort is a function of the number of tests, amount of analysis and number of trades made, and so forth.
- Purpose. Allocations provide a means to assign reliability requirements for complex products to lower levels. Product-level requirements are often insufficient to scope the design effort. For example, a requirement that a truck have an MTBF of 1000 hours doesn't help the designers of the transmission, engine, and other components. How reliable must these components be? Allocation addresses these questions. The allocation process is often iterative, requiring several attempts to satisfy all requirements. In other cases, the requirements can't be satisfied (in order to meet the product-level requirement, components are needed with unachievable levels of reliability) and dialogue with the customer and trade-offs are required to resolve the problem.
- Benefits. Allocation of product-level reliability requirements to lower levels of indenture makes it easier to manage and track requirements. It enables the tracking of progress toward meeting the product requirements, provides a means of making a sanity check of product-level requirements, and facilitates trade-off studies.
- Timing. Allocation of product reliability to lower levels of indenture begins as soon as the product-level design requirements have been derived from performance reliability requirements during the Concept/Planning phase. An initial allocation should be complete before design begins at each level of indenture, usually before a functional design review during the Design/Development phase. Updates are normally made before any major design reviews.
- Application Guidance. For large, complex products, it is difficult for a supplier to depend only on product-level reliability requirements. Product-level requirements should not be imposed on the designers of each of the different components, subsystems, etc. When outside suppliers are involved, the problem is even more difficult. Some way of assigning a portion of the product-level requirement to designers and to outside suppliers is necessary. Allocation is the method of apportioning requirements.
If only product-level requirements were used, analysis would be the sole means of tracking progress until the entire product was built and tested. Testing of lower indenture items, however, can begin very early in a product development program. By tracking the progress made on each of these items, and then analytically "combining" the results, a good idea of the progress being made toward the product-level requirements can be gained. Problems, and solutions to problems, can be identified earlier than would otherwise be possible.
Even carefully developed product-level requirements may be unachievable. One way to check the realism of product-level requirements is through the allocation process. There are many ways of allocating reliability. Six of the most common methods are: (1) Equal Distribution, (2) Complexity, (3) With Reserve, (4) Feasibility of Objectives, (5) Minimization of Effort, and (6) Similarity.
- Equal Distribution Method. The Equal Distribution method allocates the same value of reliability to each lower indenture item. The method is most useful when the components are similar. It is defined as follows:
λ = Lambda, the failure rate
i = Subscript for each lower indenture "item"
a = Subscript for "allocated"
N = Number of lower indenture items
p = Subscript for "product"
r = Subscript for "requirement"
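Using the symbols listed above, the Equal Distribution allocation can be written in failure-rate terms as shown below. This form is a reconstruction from the symbol definitions and assumes a series configuration in which the item failure rates add to the product failure rate; the reliability form in the second line additionally assumes constant failure rates.

```latex
\lambda_{i,a} = \frac{\lambda_{p,r}}{N}, \qquad i = 1, \dots, N
\qquad\Longleftrightarrow\qquad
R_{i,a} = \left( R_{p,r} \right)^{1/N}
```

For example, a product requirement of 0.001 failures per hour spread over N = 4 items gives each item an allocated failure rate of 0.00025 failures per hour.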
It is often appropriate to allocate reliability "with reserve" where a certain fraction of the product requirement is held in "reserve" (is unallocated). By its very nature, this method provides conservative design goals for the lower indenture items. This modification can be used with any of the allocation methods described.
- Complexity Methods. These methods are similar to the Equal Distribution method but attempt to account for the complexity of the product by weighting the allocations. Three common complexity methods of allocation are the ARINC, AGREE, and parts count methods. The ARINC and parts count methods are described below:
- ARINC Method. Complexity is assumed to be measured by the relative failure rates of the items (i.e., the higher the failure rate, the more complex the item). The method is described by:
λ = Lambda, the failure rate
i = Subscript for each lower indenture "item"
a = Subscript for "allocated"
p = Subscript for "product"
r = Subscript for "requirement"
e = Estimated (or predicted)
The component with the highest predicted reliability is allocated the highest requirement, the component with the lowest prediction is allocated the lowest requirement, and so forth. For the example, no component has an estimated reliability equal to, or greater than, the allocated value. The designers can either improve the reliability of the components, use redundancy, or select different components to meet the product reliability requirement.
- Parts Count Method. This method implicitly uses parts count as a measure of complexity. Allocations of reliability, in terms of failure rate, are made in proportion to the number of parts.
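The ARINC and parts count methods share the same arithmetic: the product failure-rate requirement is split in proportion to a per-item weight. For ARINC the weight is the item's estimated failure rate, which in the symbols above reads λ(i,a) = λ(p,r) × λ(i,e) / Σ λ(j,e); for parts count the weight is the number of parts. The sketch below is one way to express that, with equal weights reproducing the Equal Distribution method and an optional reserve fraction for allocation with reserve. The function name, item names, and all numeric inputs are hypothetical.

```python
def allocate_failure_rate(product_requirement, weights, reserve_fraction=0.0):
    """Split a product failure-rate requirement in proportion to weights.

    weights: {item: weight}. Use estimated failure rates for ARINC,
    part counts for the parts count method, or all 1s for equal
    distribution. reserve_fraction is the share held back unallocated.
    """
    if not 0.0 <= reserve_fraction < 1.0:
        raise ValueError("reserve_fraction must be in [0, 1)")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    allocatable = product_requirement * (1.0 - reserve_fraction)
    return {item: allocatable * w / total for item, w in weights.items()}


# Hypothetical product requirement: MTBF of 1000 h, i.e. 0.001 failures/h.
requirement = 1.0 / 1000.0

# ARINC: weights are estimated (predicted) item failure rates.
estimated = {"engine": 0.0008, "transmission": 0.0004, "other": 0.0004}
arinc = allocate_failure_rate(requirement, estimated)

# Parts count: weights are the number of parts in each item.
parts = {"engine": 300, "transmission": 150, "other": 50}
by_parts = allocate_failure_rate(requirement, parts, reserve_fraction=0.10)

print(arinc)
print(by_parts)
```

In this hypothetical ARINC case the estimates sum to 0.0016 failures per hour against a requirement of 0.001, so every item is allocated a lower failure rate than its estimate, which is the situation in which designers must improve components, add redundancy, or select different components.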
The Similarity Method relies on precedent. Just as the product-level requirement can be based on the achieved reliability of a similar product, allocations can be made based on how previous allocations were made for products with similar architectures.
The Feasibility-of-Objectives Method was originally developed for repairable electromechanical products. Allocations to lower indenture items are based on numerical ratings of the design maturity (state of the art), intricacy, mission operating time, and environmental conditions for each item to which the product reliability will be allocated. Ratings are assigned by a lead design engineer based on experience and judgment, or by a group of engineers using the Delphi technique. Ratings for each factor range from 1 to 10. Definitions of the factors and the meanings of the ratings follow:
- Design Maturity -- An indication of the level of technology and degree of proven design approaches used in an item. Items with the most highly developed, mature design are assigned a rating of 1, those with the least mature design a 10.
- Intricacy -- An indication of the number of parts and sophistication of the architecture. The least intricate items are assigned a rating of 1, the most intricate a 10.
- Mission Operating Time -- An indication of the percentage of mission time during which an item operates. Items that operate for a small percentage of the mission time are assigned a rating of 1; those that operate continuously are assigned a 10.
- Environment -- An indication of the severity of the environment experienced by the item during product operation. Items that operate in the least severe environment are given a rating of 1; those operating in the most severe are assigned a 10.
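The text does not give the arithmetic that turns the four Feasibility-of-Objectives ratings into an allocation. A minimal sketch follows, assuming the commonly published form of the method (for example in MIL-HDBK-338B): each item's composite rating is the product of its four ratings, and the product failure rate is shared in proportion to the composite ratings. The item names, ratings, and product failure rate are hypothetical.

```python
from math import prod


def feasibility_allocation(product_failure_rate, ratings):
    """Allocate a product failure rate from 1-10 ratings per item.

    ratings: {item: (design_maturity, intricacy, operating_time, environment)}
    ASSUMPTION: composite rating = product of the four ratings, and
    allocations are proportional to the composite ratings.
    """
    for item, factors in ratings.items():
        if len(factors) != 4 or not all(1 <= r <= 10 for r in factors):
            raise ValueError(f"{item}: need four ratings between 1 and 10")
    composite = {item: prod(factors) for item, factors in ratings.items()}
    total = sum(composite.values())
    return {item: product_failure_rate * c / total
            for item, c in composite.items()}


# Hypothetical ratings: (maturity, intricacy, operating time, environment).
example_ratings = {
    "power supply": (2, 3, 10, 5),
    "controller": (6, 8, 10, 5),
    "actuator": (4, 5, 3, 8),
}
allocation = feasibility_allocation(0.001, example_ratings)
for item, rate in allocation.items():
    print(f"{item}: {rate:.6f} failures/h")
```

Items with higher ratings (less mature, more intricate, longer operating, harsher environment) receive a larger share of the allowed failure rate, reflecting that high reliability is harder to achieve for them.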
SECTION FOUR - REFERENCES
4.1 Specification Development Tool
Under the Standardized Hardware Acquisition and Reliability Program of the Naval Surface Warfare Center, Crane, Indiana, a software tool has been developed to assist in the development of performance specifications. Called SpecRite, the tool uses a top-down, incremental approach to automatically build a specification as it guides a user through logic trees. SpecRite is available on diskette and CD-ROM, or it can be downloaded from the World Wide Web Home Page of the Best Manufacturing Practices (BMP) Center.
- To order the CD-ROM, write to:
Computer Sciences Corporation
Attn: PMWS CD
6565 Arlington Blvd.
Falls Church, VA 22042
- To order the diskette, write to:
BMP Center of Excellence
4321 Hartwick Road
College Park, MD 20740
- The software can be downloaded from "www.bmpcoe.org".
4.2 Other References
4.2.1 Texts and Articles:
- ARINC Research Corporation, "Reliability Engineering", edited by William H. Von Alven, 1964, Prentice Hall, Englewood Cliffs, NJ.
- ARINC Research Corporation, "Product Reliability, Maintainability, and Supportability Handbook", edited by Michael Pecht, 1995, CRC Press, Boca Raton, FL.
- Boyd, James A., 1992, "Allocation of Reliability Requirements: A New Approach," 1992 Proceedings, Annual Reliability and Maintainability Symposium, Las Vegas, NV, January 21-23, pages 5-6.
- Gertman, David I. and Harold S. Blackman, "Human Reliability and Safety Analysis Data Handbook", 1994, John Wiley & Sons, New York, NY.
- Hadel, John J. and Peter B. Lakey, "A Customer-Oriented Approach to Optimizing Reliability Allocation Within a Set of Weapon-System Requirements," 1995 Proceedings, Annual Reliability and Maintainability Symposium, Washington, DC, January 16-19, pages 96-101.
- Lloyd, David K. and Myron Lipow, "Reliability: Management Methods, and Mathematics", 1962, Prentice Hall, Englewood Cliffs, NJ.
- Musa, J.D., et al, "Software Reliability, Measurement, Prediction, Application", 1987, McGraw-Hill, New York, NY.
- Park, K.S., "Human Reliability: Analysis, Prediction, and Prevention of Human Errors", 1987, Elsevier Science Publications, Amsterdam, Holland.
- Peace, G. S., "Taguchi Methods: A Hands-On Approach", 1993, Addison-Wesley Publishing Company, Reading, MA.
- Phadke, M. S., "Quality Engineering Using Robust Design", 1989, Prentice Hall, Englewood Cliffs, NJ.
- Rome Laboratory and the Reliability Analysis Center, "Reliability Toolkit: Commercial Practices Edition", 1995, Rome, NY.
- Office of the Assistant Secretary of Defense for Economic Security, "Defense Standardization Program: Performance Specification Guide (SD-15)", June 1995.
4.2.2 Commercial and International Standards.
The commercial and international standards listed in this section may be obtained as follows:
- Copies of Institute of Electrical and Electronics Engineers (IEEE) documents are available from:
445 Hoes Lane
P.O. Box 1331
Piscataway, NJ 08855-1331
Telephone, (800) 678-IEEE, FAX, (908) 981-9667
- Copies of BELL (Bellcore) documents are available from:
Bellcore - Bell Communications Research
Information Exchange Management
445 South Street, Room 2J-125
PO Box 1910
Morristown, NJ 07962-1910
- Copies of International Electrotechnical Commission (IEC) documents are available from:
American National Standards Institute (ANSI)
New York, NY 10018
Telephone, (212) 642-4900, FAX, (212) 302-1286
- Copies of International Organization for Standardization (ISO) documents are available from ANSI or from:
International Organization for Standardization
1, Rue de Varembe
CH-1211 Geneva 20, Switzerland
Telephone: +(41) 22 749-0111
Fax: +(41) 22 733-3430
- BELL-FR-NWT-0000796, "Reliability and Quality Generic Requirements"
- BELL-TR-TSY-000282, "Software Reliability and Quality Acceptance Criteria"
- IEEE-577, "Standard Requirements for Reliability Analysis in the Design and Operation of Safety Systems"
- IEC 300-3-8, "Dependability Management - Application Guide - Human Reliability"
- IEC-409, "Guide for the Inclusion of Reliability Clauses into Specifications for Components (or Parts)"
- ISO-2394, "General Principles on Reliability for Structures"
4.2.3 Technical Reports.
The following Rome Laboratory (formerly Rome Air Development Center) reports may be requested from:
The National Technical Information Service
Department of Commerce
5285 Port Royal Road
Springfield, VA 22151
- Brown, T.P. et al, "Specification of Software Quality Attributes", Rome Air Development Center Report, RADC-TR-85-37 (3 volumes), 1985.
- Clark, David, Ned H. Criscimagna, William Denson, and David Nicholls, "User Requirements to System Specifications: Translation Tools for Diagnostics, Reliability, and Maintainability -- Final Report", Rome Laboratory Contract F30602-92-C-0179, July 1995.
- Coit, David W., David L. Russell, and Russ O. Wrisley, "Reliability / Maintainability Operational Parameter Translation II: Final Report", Rome Air Development Center Contract F30602-86-C-0175, August 1989.
- Friedman, M.A., P.Y. Tran, and P.L. Goddard, "Reliability Techniques for Combined Hardware and Software Systems: Final Technical Report", Rome Laboratory Contract No. F30602-C-0111, April 1991.