Some Statistical Background
Establishing the underlying distribution of a data set or random variable is crucial for the correct implementation of some statistical procedures. For example, deriving the test and CI for the population MTBF requires knowledge about the distribution of the lives of the device. If the lives are Exponential, things will be done one way; if they are Weibull, they will be done differently. Therefore, we first need to establish the life distribution from the data, before we can correctly implement the test procedures.
GoF tests are the statistical procedures that allow us to assess whether an assumed distribution is correct. They are essentially based on either of two distribution basics: the cumulative distribution function (CDF) and the probability density function (PDF). Procedures based on the CDF are called "distance tests", while those based on the PDF are called "area tests" [3, 4]. The Chi-Square GoF test, which is the topic of this paper, is an area test.
To assess data, we implement a well-defined scheme. First, we assume that the data follow a pre-specified distribution (e.g., Normal). Then, we either estimate the distribution parameters (e.g., mean and variance) from the data or obtain them from prior experience. This process yields a "composite" distribution hypothesis (one with more than one element, all of which must jointly be true), called the null hypothesis (or H0). The negation of the assumed distribution is called the alternative hypothesis (or H1). We then test the assumed (hypothesized) distribution using the data set. Finally, H0 is rejected whenever any one (or more) of its elements is not supported by the data.
The Chi-Square test is conceptually based on the probability density function (PDF) of the assumed distribution. If this distribution is correct, its PDF (yielding an area of unity) should closely encompass the data range (of X). We thus select convenient values in this data range (Figure 1) that divide it into several subintervals. Then, we count the number of data points in each subinterval. These are called the "observed" values. Next, we compute the number of data points that should have fallen in these same subintervals according to the PDF of the assumed distribution. These are called the "expected" values, and the Chi-Square test requires an expected value of at least five in every subinterval. Finally, we compare these two results. If they agree (probabilistically), then the data support the assumed distribution. If they do not, the assumption is rejected. The formula (statistic) that uses the differences between the "expected" and "observed" values to test the GoF follows a Chi-Square distribution. Hence the name Chi-Square test.
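As an illustration of how the "expected" values can be obtained, the area under the PDF over a subinterval equals the difference of the CDF at its two endpoints, so the expected count is the sample size times that difference. The following is a minimal sketch using only the Python standard library. The Normal parameters, the cell boundaries, and the sample size are hypothetical values chosen for illustration; they are not taken from the data sets discussed in this paper.

```python
from statistics import NormalDist


def expected_counts(dist, edges, n):
    """Expected number of data points per cell for a sample of size n.

    The cells are (-inf, edges[0]], (edges[0], edges[1]], ...,
    (edges[-1], +inf), so the cell probabilities add up to one.
    """
    cdf_values = [0.0] + [dist.cdf(x) for x in edges] + [1.0]
    # Area under the PDF over a cell = CDF(upper end) - CDF(lower end).
    return [n * (hi - lo) for lo, hi in zip(cdf_values, cdf_values[1:])]


# Hypothetical assumed distribution, cell boundaries, and sample size.
assumed = NormalDist(mu=20.0, sigma=5.0)
edges = [15.0, 18.0, 20.0, 22.0, 25.0]  # 5 boundaries give 6 cells
n = 45

expected = expected_counts(assumed, edges, n)

# The test requires an expected value of at least five in every cell.
meets_rule = all(e >= 5 for e in expected)

for i, e in enumerate(expected, start=1):
    print(f"cell {i}: expected = {e:.2f}")
print("every expected value >= 5:", meets_rule)
```

If any cell falls below five, a common remedy is to merge it with a neighboring cell and recompute the expected values before running the test.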
In what follows we proceed as in Figure 1, using several data sets to fit a Normal, an Exponential, and a Weibull distribution. We will work with the same data sets used in the START sheets that discussed these empirical GoF procedures [7, 8, and 9]. In this way, the reader can compare the results for these two approaches and verify that they agree.
Figure 1. Area Goodness of Fit Test Conceptual Approach
The procedure is as follows:
Divide the data range of X into k subintervals.
Count the number of data points in each subinterval (histogram).
Superimpose the PDF of the assumed (theoretical) distribution.
Compare the empirical (histogram) with theoretical (PDF).
If they agree (probabilistically) the distribution assumption is supported by the data.
If they do not, the assumption is most likely incorrect.
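The first two steps of this procedure can be sketched as follows. This sketch assumes one particular way of choosing the subintervals, namely cells of equal probability under the assumed distribution, so that every expected value equals n/k. That choice is an assumption made here for simplicity, not necessarily the one used in the examples of this paper. The function names, the Normal parameters, and the data are all made up for illustration.

```python
from bisect import bisect_left
from statistics import NormalDist


def equal_probability_edges(dist, k):
    """Interior boundaries that give k cells of probability 1/k each."""
    return [dist.inv_cdf(i / k) for i in range(1, k)]


def observed_counts(data, edges):
    """Count the data points per cell; cell i holds edges[i-1] < x <= edges[i]."""
    counts = [0] * (len(edges) + 1)
    for x in data:
        counts[bisect_left(edges, x)] += 1
    return counts


# Hypothetical assumed distribution and made-up sample (n = 30).
assumed = NormalDist(mu=50.0, sigma=10.0)
data = [
    31.2, 36.5, 38.9, 40.1, 41.0, 42.7, 44.3, 45.0, 46.1, 46.8,
    47.2, 48.0, 48.9, 49.5, 50.2, 50.8, 51.4, 52.0, 52.9, 53.6,
    54.4, 55.1, 56.3, 57.7, 58.9, 59.6, 61.0, 62.8, 65.4, 69.9,
]

k = 5  # number of cells; n = 30 satisfies n >= 5 * k
edges = equal_probability_edges(assumed, k)
observed = observed_counts(data, edges)
expected = [len(data) / k] * k  # equal-probability cells: n / k in each

for o, e in zip(observed, expected):
    print(f"observed = {o}, expected = {e:.1f}")
```

With equal-probability cells, the requirement of at least five expected points per cell reduces to n ≥ 5k, which is easy to check before any counting is done.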
The formula for the Chi-Square statistic is:

$$\chi^2 = \sum_{i=1}^{k} \frac{(e_i - o_i)^2}{e_i} \sim \chi^2_{\gamma}$$

where
$e_i$ = expected number of data points in cell i ($e_i \geq 5$);
$o_i$ = actual (observed) number of data points in cell i;
$k$ = total number of cells or subintervals in the range;
$n$ = sample size for implementing the Chi-Square test ($n \geq 5k$);
$\gamma$ = $k - 1 - nep$, where nep is the number of estimated parameters; this is the Chi-Square degrees of freedom (DF > 0);
$\chi^2_{\gamma}$ = the Chi-Square distribution (table) with DF = $\gamma$.
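To make the statistic and its degrees of freedom concrete, the sketch below sums the squared differences between expected and observed values, each divided by the expected value, and compares the result with a tabulated Chi-Square value. The observed counts are made up for illustration, the 0.05 significance level is an assumption, and the small dictionary of critical values simply copies standard Chi-Square table entries for that level.

```python
def chi_square_statistic(observed, expected):
    """Sum over all cells of (e_i - o_i)^2 / e_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((e - o) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(k, nep):
    """DF = k - 1 - nep, where nep is the number of estimated parameters."""
    df = k - 1 - nep
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return df


# Upper 5% points of the Chi-Square distribution (standard table values).
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

# Made-up counts for k = 5 cells and n = 40 data points.
observed = [6, 11, 9, 8, 6]
expected = [8.0, 8.0, 8.0, 8.0, 8.0]
nep = 2  # e.g., mean and variance both estimated from the data

statistic = chi_square_statistic(observed, expected)
df = degrees_of_freedom(len(observed), nep)
reject = statistic > CRITICAL_5_PERCENT[df]

print(f"Chi-Square = {statistic:.2f}, DF = {df}, reject H0: {reject}")
```

For these made-up counts the statistic is 18/8 = 2.25 with DF = 2, which is below the tabulated 5.991, so the assumed distribution would not be rejected at the 0.05 level.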