Tutorial: Test Risks, Confidence and OC Curves
By: Anthony Coppola
Test risks, confidence levels and operating characteristic (OC) curves are related statistical concepts designed to show analysts how likely they are to get into trouble by accepting the results of a statistical test.
For example, let us assume we are willing to accept a lot of some product only if the lot has no more than a given proportion of defective units. We decide to test a sample of units for defects. If the sample does not have an excessive number of defects, we will accept the lot from which the sample came. Our problem is to determine how many defects will be acceptable in the sample. No test short of 100% inspection will reject all lots with unacceptable defect rates or accept all lots with acceptable defect rates (unacceptable and acceptable being arbitrary values defined by the analyst). There is always some risk, called the consumer's risk, that a sample from a bad lot will give good results (i.e., will pass our test criteria). The test we devise must reduce this risk to a satisfactory level.
The risk of error is the probability that a sample of bad product will pass the test (i.e., a sample will have no more defects than we agree are acceptable). This probability is a function of the size of the sample, the number of defects we will allow in the sample, and a quantitative definition of bad (i.e., the lowest defect rate that the consumer would consider unacceptable). Since defects follow a binomial distribution (a unit is defective or it is not), the probability of a product passing a sample test is:
$$P = \sum_{k=0}^{a} \binom{n}{k} P_d^{\,k} (1 - P_d)^{n-k}$$
where:
- P is the probability of finding a or fewer defects in the sample
- n is the number of units in the sample (sample size must be much smaller than lot size for the expression to hold)
- a is the number of defects we will accept in the sample
- Pd is the proportion of the lot that is defective (which equals the probability of any given unit being defective)
- (1-Pd) is the proportion of the lot that is not defective
To calculate the consumer's risk, Pd is set to the lowest value that the analyst would call unacceptable. For example, suppose we decide that we want to reject any lot with a defect rate of 5% or more, and plan to do it by testing 100 units and allowing no more than 2 defects. Then:
$$P = \sum_{k=0}^{2} \binom{100}{k} (0.05)^{k} (0.95)^{100-k} \approx 0.006 + 0.031 + 0.081 \approx 0.12$$
Hence, there is a 12% probability that we will accept a bad lot, which is defined as one with a defect rate of 5%. This is, by definition, a consumer's risk of 12%. It can also be converted to a measure of confidence, defined as (100 - risk)%. For this case, we have (100 - 12)% = 88% confidence that the test will reject products with the bad defect rate. Note that the risk (and, hence, confidence) is based on a specified defect rate of .05. The risk of accepting a worse rate than .05 will be lower than 12%, and lots with defect rates lower than .05 will be accepted more often than 12% of the times they are tested to the same criteria.
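The worked example above can be checked numerically. The following is a minimal Python sketch of the binomial acceptance probability; the function name `accept_probability` and its argument names are illustrative choices, not part of the original tutorial.

```python
from math import comb

def accept_probability(n, a, pd):
    """Probability of finding a or fewer defects in a sample of n units
    when each unit is defective with probability pd (binomial model)."""
    return sum(comb(n, k) * pd**k * (1 - pd)**(n - k) for k in range(a + 1))

# Example from the text: 100 units tested, at most 2 defects allowed,
# "bad" defect rate of 5%.
consumer_risk = accept_probability(100, 2, 0.05)
confidence = 1.0 - consumer_risk
print(f"consumer's risk: {consumer_risk:.3f}")  # 0.118, i.e. about 12%
print(f"confidence:      {confidence:.3f}")     # 0.882, i.e. about 88%
```

As in the text, the model assumes the sample is much smaller than the lot, so that each unit can be treated as an independent trial.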
OC Curves
If we plot the probability of acceptance against the possible values of the defect rate, we would have something that would look like Figure 1.
Figure 1: OC Curve
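The points of such a curve can be tabulated directly from the binomial expression. The sketch below assumes the sampling plan used in the earlier example (100 units, at most 2 defects allowed); the helper names `accept_probability` and `oc_curve`, and the choice of defect rates from 0% to 10%, are illustrative assumptions rather than values taken from Figure 1.

```python
from math import comb

def accept_probability(n, a, pd):
    """Probability of a or fewer defects in n units at defect rate pd."""
    return sum(comb(n, k) * pd**k * (1 - pd)**(n - k) for k in range(a + 1))

def oc_curve(n, a, defect_rates):
    """Return (defect rate, probability of acceptance) pairs for one plan."""
    return [(pd, accept_probability(n, a, pd)) for pd in defect_rates]

# Defect rates 0.00, 0.01, ..., 0.10 (an arbitrary illustrative grid).
rates = [i / 100 for i in range(11)]
for pd, p_accept in oc_curve(100, 2, rates):
    print(f"defect rate {pd:.2f}  P(accept) {p_accept:.3f}")
```

The probability of acceptance starts at 1 for a defect-free lot and falls as the defect rate rises, which gives the OC curve its characteristic downward slope.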
Figure 1 is an OC curve, and it is useful because the consumer's risk is only part of the story. In addition to the risk of accepting a bad lot, there is also a risk of rejecting a good lot. This is called the producer's risk, for obvious reasons, but it also does the consumer little good to reject acceptable products. If one considers only the consumer's risk, it becomes easy to design a test that not only rejects bad lots with high confidence but also seldom accepts any lots, including those with the lowest defect rate that the state of the art can produce.
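The producer's risk can be computed from the same binomial expression as one minus the probability of acceptance at a good defect rate. The sketch below reuses the 100-unit, 2-defect plan and assumes, purely for illustration, that a 1% defect rate counts as good; that value and the function names are hypothetical and do not come from the original text.

```python
from math import comb

def accept_probability(n, a, pd):
    """Probability of a or fewer defects in n units at defect rate pd."""
    return sum(comb(n, k) * pd**k * (1 - pd)**(n - k) for k in range(a + 1))

def producer_risk(n, a, good_rate):
    """Probability that a lot with the good defect rate fails the test."""
    return 1.0 - accept_probability(n, a, good_rate)

GOOD_RATE = 0.01  # hypothetical "good" defect rate, assumed for this sketch

risk_a2 = producer_risk(100, 2, GOOD_RATE)
risk_a0 = producer_risk(100, 0, GOOD_RATE)
print(f"producer's risk, 2 defects allowed: {risk_a2:.3f}")
print(f"producer's risk, 0 defects allowed: {risk_a0:.3f}")
```

Tightening the plan to allow no defects lowers the consumer's risk but rejects most good lots under this assumption, which is the untenable situation the text describes.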
To avoid this untenable situation, one must at least calculate the probability of a lot with some good defect rate being rejected. However, both bad and good defect rates are arbitrary values. The OC curve provides a picture of the probability of acceptance over the whole range of possible defect rates. Using it, the producer and the consumer can adjust the sample size and the number of acceptable defects until both are satisfied that their interests are protected. The bad and good values need not even be defined if both parties are satisfied with the OC curve describing a proposed test.
However, defining acceptable risks for both good values (those we want to accept most of the time) and bad values (those we want to reject most of the time) limits the tests we can use. When good and bad values are close together, it is statistically difficult to distinguish one from the other. The practical result is that a larger sample is needed to provide both high confidence in rejecting lots with bad defect rates and high confidence in accepting lots with good defect rates than would be needed if the good and bad values were farther apart.
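The effect of the gap between good and bad values on sample size can be illustrated with a brute-force search for the smallest plan that meets both risks. This is only a sketch under stated assumptions: the function `smallest_plan`, the 10% limits on both risks, and the good rates of 1% and 2% against a bad rate of 5% are all hypothetical choices made for the illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k defects in n units at defect rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def smallest_plan(good_rate, bad_rate, producer_risk, consumer_risk, max_n=2000):
    """Return (n, a) for the smallest sample size n meeting both risk limits,
    or None if no plan is found up to max_n."""
    for n in range(1, max_n + 1):
        accept_bad = 0.0   # P(accept) at the bad rate  = consumer's risk
        accept_good = 0.0  # P(accept) at the good rate = 1 - producer's risk
        for a in range(n + 1):
            accept_bad += binom_pmf(a, n, bad_rate)
            accept_good += binom_pmf(a, n, good_rate)
            if accept_bad > consumer_risk:
                break  # allowing more defects only raises the consumer's risk
            if 1.0 - accept_good <= producer_risk:
                return n, a
    return None

# Same bad rate (5%) and risk limits (10% each); only the good rate changes.
plan_far = smallest_plan(0.01, 0.05, 0.10, 0.10)   # good and bad far apart
plan_near = smallest_plan(0.02, 0.05, 0.10, 0.10)  # good and bad closer
print("good = 1%:", plan_far)
print("good = 2%:", plan_near)
```

Under these assumptions, the plan for the closer pair of values requires a noticeably larger sample, in line with the point made above.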