Intuitive Background & Concept

A quantitative random process may reflect, for example, the quality (or reliability) of a product batch. We can measure this quality by counting the number of defects per thousand parts produced (or by the number of failures per million hours). But, as we know, product characteristics vary from batch to batch, because the production process is random.

Hence, we must observe the process over many hours to obtain a point estimate, e.g., the average number of defective items per batch or the average number of failures per unit of time. We must also obtain a measure of the variation of these outcomes. Together, these two values allow us to state a range that contains the true average number of defectives per batch with a given probability (i.e., a confidence interval).
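
As a minimal sketch of how a point estimate and a measure of variation combine into an interval, the following Python function computes a normal-approximation confidence interval for the mean. The function name, the z-value of 1.96 (roughly 95% confidence), and the ten defect counts are illustrative assumptions, not part of the original analysis.

```python
import math
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Return (mean, lower, upper) for a normal-approximation interval.

    z=1.96 corresponds to roughly 95% confidence; this is an assumed default.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)          # sample standard deviation
    half_width = z * sd / math.sqrt(n)     # z times the standard error
    return mean, mean - half_width, mean + half_width

# Hypothetical defect counts from ten 1,000-unit batches (made-up data).
defects = [78, 85, 91, 72, 80, 69, 88, 77, 83, 79]
mean, low, high = mean_confidence_interval(defects)
print(f"mean = {mean:.1f}, approx. interval = ({low:.1f}, {high:.1f})")
```

For a sample this small, a t-distribution quantile would usually be preferred over 1.96; the normal value is used here only to keep the sketch short.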

As an example, a graphical representation of the number of defective items found in 5,000 samples of 1,000-unit batches is presented in Figure 1. Results reflect a defect rate of 8%.

In the graph, the height (value on the y axis) represents the frequency with which a given number of defective items (value on the x axis) occurs among the 5,000 sampled batches of 1,000 items. For example, a count of 70 defects occurred 300 times among the 5,000 batches of 1,000 items each.

Figure 1. Example 1: Number of defective items in 5,000 1,000-unit batches (Defects-1)
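
A frequency plot of this kind can be approximated by simulation. The sketch below assumes that each item is defective independently with probability 0.08 (a binomial model); how the data behind Figure 1 were actually generated is not stated, so the function name, the seed, and the model itself are assumptions made for illustration.

```python
import random
from collections import Counter

def simulate_defect_counts(n_batches=5000, batch_size=1000, defect_rate=0.08, seed=1):
    """Simulate the number of defective items in each batch.

    Each item is assumed to be defective independently with probability
    defect_rate (a binomial model, which is an assumption of this sketch).
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() < defect_rate for _ in range(batch_size))
        for _ in range(n_batches)
    ]

counts = simulate_defect_counts()
frequency = Counter(counts)   # maps defects per batch -> number of batches

# Print a coarse text histogram: one '#' per 10 batches.
for defects in range(min(counts), max(counts) + 1):
    print(f"{defects:4d} {'#' * (frequency[defects] // 10)}")
```

The tallest bars should cluster around 80 defects per batch, with the tails thinning out on both sides, which is the general shape described for Figure 1.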

We can see that observations of, say, 70 to 90 defects per 1,000 items are much more frequent than observations of fewer than 60 or more than 100 defects per 1,000 items. However, such extreme observations can and do occur. Furthermore, if by chance they fall in our sample, they can bias our point and interval estimates, especially if our sample is small. That is why we need to draw large, random samples or to observe (or sample) the process for a long time - in the same way we need to monitor the stock market for a long time - to acquire high confidence in our estimate.

The second aspect of this problem is process variation, measured through standard deviations, ranges, or other measures of dispersion. Two random production processes may have the same batch averages and still have totally different variation patterns. We present in Figure 2 another process that also averages 8% defectives (about 80 per 1,000 items). But this process has a much larger variation than the first.

Figure 2. Example 2: Number of defective items in 5,000 1,000-unit batches (Defects-2)

We now compare the processes in Figures 1 and 2 using descriptive statistics based on 5,000 batches each. The results of the comparison are summarized in Table 1.

Table 1. Comparison of Processes from Figures 1 and 2

Measure of Comparison    Defects-1    Defects-2
N                            5,000        5,000
Mean                        79.918       80.134
Median                       80.00        80.55
Standard Deviation           8.727       25.055
Minimum                       51.0         0.00
Maximum                      114.0        199.7
Q1                            74.0        63.07
Q3                            86.0         96.6

Notice how the means and medians (measures of central tendency) of the two processes are practically the same. But the standard deviation and inter-quartile range (Q3-Q1) (measures of variation) of the second process (Defects-2) are nearly three times as large as those of the first.
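
The dispersion measures mentioned here follow directly from the values in Table 1. The short sketch below recomputes the inter-quartile range and the full range for each process; the dictionary layout and the function name are illustrative choices.

```python
# Summary statistics copied from Table 1.
table = {
    "Defects-1": {"sd": 8.727, "min": 51.0, "max": 114.0, "q1": 74.0, "q3": 86.0},
    "Defects-2": {"sd": 25.055, "min": 0.0, "max": 199.7, "q1": 63.07, "q3": 96.6},
}

def dispersion(stats):
    """Return the inter-quartile range and the full range for one process."""
    return {"iqr": stats["q3"] - stats["q1"], "range": stats["max"] - stats["min"]}

for name, stats in table.items():
    d = dispersion(stats)
    print(f"{name}: sd = {stats['sd']}, IQR = {d['iqr']:.2f}, range = {d['range']:.1f}")
```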

This characteristic is also reflected in the corresponding graphs. Observe how values below 50 or above 120 were practically non-existent in the first process (Defects-1). But such values are very plausible in the second. These values affect the Maximum and Minimum statistics. The degree of uncertainty of our statement - or equivalently the confidence it yields - is a direct result of the sample size (or observation time) as well as of the natural variation of the phenomenon under study, expressed in terms of its Standard Deviation (σ).
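
To make the joint effect of sample size and standard deviation concrete, the sketch below computes the half-width of a normal-approximation interval for the mean, z·σ/√n, using the standard deviations from Table 1. The z-value of 1.96 (about 95% confidence) and the smaller sample sizes of 50 and 500 are assumptions chosen for illustration.

```python
import math

def half_width(sd, n, z=1.96):
    """Half-width of a normal-approximation interval for the mean.

    z=1.96 (about 95% confidence) is an assumed choice for illustration.
    """
    return z * sd / math.sqrt(n)

# Standard deviations from Table 1.
sds = {"Defects-1": 8.727, "Defects-2": 25.055}

for n in (50, 500, 5000):
    for name, sd in sds.items():
        print(f"n = {n:5d}  {name}: mean +/- {half_width(sd, n):.2f}")
```

At any given sample size the interval for Defects-2 is wider than that for Defects-1, and both intervals narrow as n grows, which is the relationship the paragraph above describes.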

We can say the same about the stock market. If a stock goes up and down continuously or we have followed it for a very short time, our confidence that this stock will go up cannot be as high as that which we place in a stock that has consistently increased its value during the past 50 weeks of observation.