This START sheet discusses some empirical and practical methods for checking and verifying the statistical assumptions of the Weibull distribution. It presents several numerical and graphical examples and provides references for further reading.
It is important to assess statistical distributions correctly, because when the hypothesized distribution does not hold, the derived statistical results are invalid (6). For example, the actual confidence levels of the confidence intervals (or hypothesis tests) we implement may be completely off. To avoid such problems, we need to check all distribution assumptions.
Two approaches are used to assess distribution assumptions. One is to implement computationally involved, theoretical Goodness of Fit (GoF) tests such as the Chi-Square, Anderson-Darling, or Kolmogorov-Smirnov. Their lengthy calculations often require specialized software that is not always readily available. On the other hand, there are practical procedures, based on intuition and on graphical properties of the distribution, that are easy to understand and implement. These procedures can also be used to assess distribution assumptions (5, 7, 8).
This START sheet discusses such practical assessment procedures, for the important case of the Weibull distribution, widely used in reliability, maintainability, and safety (RMS) work (1, 2, 3, 4). We begin with a numerical example that illustrates the importance of this problem. Then, we develop additional numerical and graphical examples that illustrate the implementation and interpretation of such distribution checks.
Putting the Problem in Perspective
Assume that we need to estimate the reliability R(T) of a device for a mission time T, based on some life data (X1, ..., Xn). First, suppose that the device life (time to failure) is Weibull (Figure 1); then suppose that it is Exponential (Figure 2), with approximately the same mean of 10. Figures 1 and 2 were obtained from 5000 data points drawn from each of these two distributions. The Weibull has shape parameter β = 1.23 and scale parameter α = 11.
The descriptive statistics for these 5000 data points are shown in Table 1. Notice that both sample means are close to 10. The two distributions differ mainly in that the Weibull clusters more tightly about the mean and is therefore less variable than the Exponential (compare the StDev values, 8.338 vs. 10.174).
Table 1. Descriptive Statistics for the Data Sets
| Variable | N | Mean | Median | StDev | Min | Max | Q1 | Q3 |
|---|---|---|---|---|---|---|---|---|
| W(11,1.23) | 5000 | 10.106 | 7.936 | 8.338 | 0.010 | 77.834 | 3.875 | 14.010 |
| Expon(10) | 5000 | 9.996 | 6.868 | 10.174 | 0.001 | 92.951 | 2.736 | 13.933 |
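The Table 1 comparison can be reproduced approximately with a short simulation. This is a sketch using Python's standard `random` module with an arbitrary seed, so its numbers will differ somewhat from Table 1, which came from a different random sample. It also prints the theoretical Weibull(11, 1.23) mean, α·Γ(1 + 1/β) ≈ 10.3, which shows that the two distributions share the mean of 10 only approximately.

```python
import math
import random
import statistics

def summarize(data):
    """Return the descriptive statistics reported in Table 1."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "N": len(data),
        "Mean": statistics.mean(data),
        "Median": statistics.median(data),
        "StDev": statistics.stdev(data),
        "Min": min(data),
        "Max": max(data),
        "Q1": q1,
        "Q3": q3,
    }

ALPHA, BETA = 11.0, 1.23   # Weibull scale and shape from the text
THETA = 10.0               # Exponential mean from the text
N = 5000

rng = random.Random(2024)  # arbitrary seed, chosen only for reproducibility
# random.weibullvariate(alpha, beta): alpha is the scale, beta the shape.
weibull = [rng.weibullvariate(ALPHA, BETA) for _ in range(N)]
# random.expovariate takes the rate 1/theta, not the mean.
expon = [rng.expovariate(1.0 / THETA) for _ in range(N)]

w_stats = summarize(weibull)
e_stats = summarize(expon)

# Theoretical Weibull mean: alpha * Gamma(1 + 1/beta).
w_theory_mean = ALPHA * math.gamma(1.0 + 1.0 / BETA)

for name, s in (("W(11,1.23)", w_stats), ("Expon(10)", e_stats)):
    print(name, {k: round(v, 3) for k, v in s.items()})
print("Theoretical Weibull mean:", round(w_theory_mean, 3))
```

With 5000 points per sample, the Weibull standard deviation should come out clearly smaller than the Exponential one, as in Table 1.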
Belonging to one of these two distributions has practical implications. A Weibull distribution with shape parameter greater than one (β > 1) describes a life that deteriorates with time, i.e., a device whose failure rate increases with time (reliability decay). When the shape parameter equals one (β = 1), the Weibull reduces to the Exponential distribution: the device failure rate is constant, and there is neither reliability growth nor decay. Finally, if the shape parameter is less than one (β < 1), the failure rate of the device decreases with time, and there is reliability growth.
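These three regimes can be checked numerically with the standard Weibull failure rate (hazard) function h(t) = (β/α)(t/α)^(β-1), which is not written out in the text. The sketch below evaluates it for α = 11 and for shape values below, at, and above one (0.8 is an arbitrary illustrative choice).

```python
def weibull_hazard(t, alpha, beta):
    """Weibull failure rate h(t) = (beta/alpha) * (t/alpha)**(beta - 1), for t > 0."""
    return (beta / alpha) * (t / alpha) ** (beta - 1.0)

ALPHA = 11.0                    # scale parameter from the text
times = [1.0, 5.0, 10.0, 20.0]  # illustrative time points

# beta < 1: decreasing rate; beta = 1: constant rate 1/alpha; beta > 1: increasing rate.
for beta in (0.8, 1.0, 1.23):
    rates = [weibull_hazard(t, ALPHA, beta) for t in times]
    print(f"beta={beta}: " + ", ".join(f"{r:.4f}" for r in rates))
```

For β = 1 every value equals 1/11, the constant failure rate of an Exponential with mean 11.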
A point estimate of the reliability is obtained from the life data by evaluating some "formula." Since reliability is defined as the probability that the device life X outlasts the device mission time T (formally, R(T) = P{X > T}), the distribution assumed for the device life determines which "formula" we use, as well as which parameters it includes.
For example, assume the data are distributed as a Weibull, with shape parameter β and scale parameter α. Then, the "formula" of the Weibull reliability point estimator is:
$$R(T) = P\{X > T\} = \exp\{-(T/\alpha)^{\beta}\}$$
However, if the data are assumed Exponential with mean θ, the Exponential reliability estimator becomes:
$$R(T) = P\{X > T\} = \exp\{-T/\theta\}$$
Because the two distributions differ, the two reliability estimates will also differ (they have different formulas and parameters), except when the shape parameter β = 1 and the Weibull reduces to the Exponential.
For example, if the required Mission Time is T = 3 and the parameters are known and equal to α = 11, β = 1.23 and θ = 10, the two respective reliabilities are as follows:
If the true distribution of lives were Weibull (11,1.23):
$$R(T) = \exp\{-(T/\alpha)^{\beta}\} = \exp\{-(3/11)^{1.23}\} \approx 0.817$$
If the true distribution of lives were Exponential (10):
$$R(T) = \exp\{-T/\theta\} = \exp\{-3/10\} \approx 0.741$$
The two reliabilities differ by about 0.08, or roughly 10%! Thus, it is very important to assess, using the sample data, whether our distribution assumption is correct.
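The two reliability "formulas" above are easy to evaluate directly. The sketch below implements them with the standard `math` module, using the parameters given in the text (α = 11, β = 1.23, θ = 10, T = 3); the function names are illustrative only.

```python
import math

def weibull_reliability(t, alpha, beta):
    """R(T) = exp(-(T/alpha)**beta) for a Weibull with scale alpha and shape beta."""
    return math.exp(-((t / alpha) ** beta))

def exponential_reliability(t, theta):
    """R(T) = exp(-T/theta) for an Exponential with mean theta."""
    return math.exp(-t / theta)

T = 3.0  # mission time from the text
r_weibull = weibull_reliability(T, alpha=11.0, beta=1.23)
r_expon = exponential_reliability(T, theta=10.0)

print(f"Weibull(11, 1.23): R({T}) = {r_weibull:.3f}")
print(f"Exponential(10):   R({T}) = {r_expon:.3f}")
print(f"Relative difference: {(r_weibull - r_expon) / r_expon:.1%}")
```

The two values come out at about 0.817 and 0.741, a relative difference of roughly 10%, which is the gap discussed above.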
Finally, the problem becomes even more complex when the distribution parameters are unknown. We must then also estimate these parameters from the sample, which further increases the uncertainty.
Statistical Assumptions and their Implications
Fortunately, distribution model assumptions have very practical and useful implications, and the Weibull is no exception. In practice, assuming that the Weibull is the true distribution of the lives of a device has several important consequences: some physical and theoretical, others algebraic and graphical.
The physical interpretations follow from the Weibull's relationship to Extreme Value Theory (3, 4). For example, consider a metallic chain whose "n" links all have the same size and strength. Such a chain can be regarded as a series system of "n" components, each with the same life distribution and failure rate. The system fails at the first component failure (when the weakest link breaks), so the chain life is the minimum of its n link lives. Extreme Value Theory shows that, under fairly general conditions, such minima tend to follow a Weibull distribution. Therefore, the lives of a population of these systems (chains) would be approximately Weibull.
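The weakest-link argument can be illustrated with a small simulation. As a simplifying assumption (not taken from the text), each link life is itself Weibull(α, β); in that case the minimum of n independent links is exactly Weibull with the same shape β and scale α·n^(-1/β), which is the "min-stability" property behind the Extreme Value result. The link count and number of chains below are hypothetical choices.

```python
import math
import random
import statistics

ALPHA, BETA = 11.0, 1.23   # hypothetical link-life Weibull parameters
N_LINKS = 10               # links per chain (hypothetical)
N_CHAINS = 20000           # number of simulated chains

rng = random.Random(7)     # arbitrary seed for reproducibility
# Each chain fails when its weakest link fails: chain life = min of link lives.
chain_lives = [
    min(rng.weibullvariate(ALPHA, BETA) for _ in range(N_LINKS))
    for _ in range(N_CHAINS)
]

# Min-stability: min of n iid Weibull(alpha, beta) lives is Weibull(alpha * n**(-1/beta), beta).
chain_scale = ALPHA * N_LINKS ** (-1.0 / BETA)
theory_mean = chain_scale * math.gamma(1.0 + 1.0 / BETA)
# For any Weibull, P(life > scale) = exp(-1), about 0.368.
frac_beyond_scale = sum(life > chain_scale for life in chain_lives) / N_CHAINS

print(f"simulated mean chain life:   {statistics.mean(chain_lives):.3f}")
print(f"theoretical mean chain life: {theory_mean:.3f}")
print(f"fraction beyond chain scale: {frac_beyond_scale:.3f} (theory {math.exp(-1):.3f})")
```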
In addition, the Weibull failure rate increases, decreases, or remains constant according to whether the shape parameter β is greater than, less than, or equal to one. These characteristics help us assess whether the life of a device is Weibull by analyzing its physical operating conditions.
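The algebraic consequences stem from another important characteristic of the Weibull: its closed-form functions, which are easy to manipulate mathematically. The Weibull density and distribution functions are, respectively:
$$f(x) = \frac{\beta}{\alpha}\left(\frac{x}{\alpha}\right)^{\beta-1}\exp\left\{-\left(\frac{x}{\alpha}\right)^{\beta}\right\}, \quad x > 0$$
$$F(x) = 1 - \exp\left\{-\left(\frac{x}{\alpha}\right)^{\beta}\right\}, \quad x > 0$$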
The graphical consequences stem from this ease of algebraic manipulation. Taking logarithms of 1 - F(x) twice and rearranging, we obtain:
$$\ln\left(\ln\frac{1}{1 - F(x)}\right) = \beta \ln x - \beta \ln \alpha$$
When the lives are truly Weibull, this equation is linear in ln(x), with slope β and intercept -β ln(α). Now assume that we can obtain an estimate of F(x), denoted px. We can then substitute px for F(x) in the equation and plot the left-hand side against ln(x).
We estimate px for each data point x by its "median rank," using Benard's approximation:
F(x) = px = (Rank(x) - 0.3) / (n + 0.4)
where Rank(x) is the rank of life x in the sorted sample of all n device lives.
Using these px values, we plot the pairs (px, x) on "Weibull paper." Alternatively, we can plot the log-transformed sorted data directly from the equation above, as shown in the next section. In either case, we use these plots to assess whether the true distribution is Weibull and to estimate its parameters.
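The median rank and the log-log transformation can be sketched as two small helper functions (the names are illustrative, not from the original). Applied to the smallest of n = 45 lives, they give px ≈ 0.0154 and ln(ln(1/(1 - px))) ≈ -4.1644, the first row of the transformed data in the next section.

```python
import math

def median_rank(rank, n):
    """Benard's approximation to the median rank: (rank - 0.3) / (n + 0.4)."""
    return (rank - 0.3) / (n + 0.4)

def weibull_plot_points(lives):
    """Return (ln x, ln(ln(1/(1 - p_x)))) pairs for the sorted lives.

    If the lives are Weibull(alpha, beta), these points fall close to the line
    y = beta * ln x - beta * ln alpha.
    """
    xs = sorted(lives)
    n = len(xs)
    points = []
    for rank, x in enumerate(xs, start=1):
        p = median_rank(rank, n)
        points.append((math.log(x), math.log(math.log(1.0 / (1.0 - p)))))
    return points

# Smallest life in a sample of n = 45.
p1 = median_rank(1, 45)
print(f"p = {p1:.4f}, ln(ln(1/(1-p))) = {math.log(math.log(1 / (1 - p1))):.4f}")
```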
Practical Methods to Verify Weibull Distribution
We now apply several empirical, practical procedures to the life test data in Table 2 to determine whether the sample (n = 45) could have come from a Weibull distribution.
Table 2. Large Sample Life Data Set (sorted)
0.8997
1.2838
1.5766
1.8627
2.4193
2.4353
3.1520
3.3367
3.4850
3.9605
3.9921
3.9934
4.1013
4.8306
5.3545
5.6094
7.7829
7.8240
8.3431
9.0248
9.2627
9.2766
9.7943
11.4391
12.2847
12.4112
13.1651
13.4990
13.5532
14.1542
14.4694
14.5857
15.1603
15.6962
15.7833
17.4998
18.1497
18.6342
19.4354
19.7557
19.9496
22.5383
23.8066
29.9006
34.0658
In this life data set, two distribution assumptions need to be verified: (1) that the data are independent and (2) that they are identically distributed as a Weibull.
The assumption of independence implies that randomization (sampling) of the population of devices (and other influencing factors) must be performed before placing them on test. For example, device operators, times of operations, weather conditions, location of the devices in warehouses, etc. should be randomly selected. Only then will the sample be representative of the population.
To assess the second assumption, we use informal methods based on the properties of the Weibull distribution. They are appropriate for the practical engineer, since they are largely intuitive and easy to implement.
To assess the sample, we first tabulate and plot the raw data in several ways. The descriptive statistics are shown in Table 3 and the histogram in Figure 3. Next, we check, empirically but efficiently, whether the Weibull assumption holds.
Table 3. Descriptive Statistics of Data in Table 2
| N | Mean | Median | StDev | Min | Max | Q1 | Q3 |
|---|---|---|---|---|---|---|---|
| 45 | 11.19 | 9.79 | 7.85 | 0.9 | 34.07 | 3.99 | 15.74 |
A number of useful, easy-to-implement procedures, based on well-known statistical properties of the Weibull distribution, help us informally assess this assumption. These properties are summarized in Table 4.
Figure 3. Histogram of the Sample from Table 2
Table 4. Some Properties of the Weibull Distribution
| Property | Description |
|---|---|
| 1 | The characteristic life α lies at approximately the 63rd percentile of the population, since F(α) = 1 - e^(-1) ≈ 0.632. A Weibull sample should replicate this, so the sample 63rd percentile is an alternative (rough) estimator of α. |
| 2 | For the sorted lives {X1, ..., Xn}, the plot of ln(ln(1/(1 - px))) against ln(x) should be linear if the true distribution is Weibull. |
| 3 | The slope of the linear trend in Property 2 is an alternative (rough) estimator of the shape β. |
| 4 | Regressing the pairs defined in Property 2 yields better estimates of (α, β), which should be close to the rough estimates from Properties 1 and 3. |
| 5 | The transformation Y = X^β should yield an Exponential distribution with mean μ = α^β. |
| 6 | The Weibull probability and score plots of the device lives {X1, ..., Xn} should be linear. |
| 7 | The regressions corresponding to the plots in Property 6 should have a slope of unity. |
To verify Property 1, notice that device lives 13.49 and 13.55 (in Table 2) have ranks 28 and 29. Since the 63rd percentile is located at position 0.63 × n = 0.63 × 45 = 28.35, we need to interpolate. The average of these two lives (13.53) gives a rough estimate of the Weibull characteristic life α, which we will compare with the regression estimate from Property 4.
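The rough percentile estimate of α can be sketched as follows, using the Table 2 data and the averaging rule described above (average of the two order statistics that bracket position 0.63·n). The function name is illustrative; other percentile interpolation rules would give slightly different values.

```python
import math

# Table 2 life data (sorted), n = 45.
LIVES = [
    0.8997, 1.2838, 1.5766, 1.8627, 2.4193, 2.4353, 3.1520, 3.3367, 3.4850,
    3.9605, 3.9921, 3.9934, 4.1013, 4.8306, 5.3545, 5.6094, 7.7829, 7.8240,
    8.3431, 9.0248, 9.2627, 9.2766, 9.7943, 11.4391, 12.2847, 12.4112, 13.1651,
    13.4990, 13.5532, 14.1542, 14.4694, 14.5857, 15.1603, 15.6962, 15.7833, 17.4998,
    18.1497, 18.6342, 19.4354, 19.7557, 19.9496, 22.5383, 23.8066, 29.9006, 34.0658,
]

def rough_characteristic_life(lives, fraction=0.63):
    """Average of the two order statistics bracketing position fraction * n (1-based ranks)."""
    xs = sorted(lives)
    pos = fraction * len(xs)      # 0.63 * 45 = 28.35
    lower = math.floor(pos)       # rank 28
    upper = math.ceil(pos)        # rank 29
    return (xs[lower - 1] + xs[upper - 1]) / 2.0

print(f"F(alpha) = 1 - exp(-1) = {1.0 - math.exp(-1.0):.3f}")
print(f"rough alpha estimate   = {rough_characteristic_life(LIVES):.2f}")
```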
To verify Property 2, we transform the data (Table 5). Besides the row number, Table 5 lists the original data, its median rank px, the transformation ln(ln(1/(1 - px))), and the transformation ln(x).
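For example, for the first (smallest) value, 0.8997, with rank 1 and n = 45, px is:
$$p_x = \frac{1 - 0.3}{45 + 0.4} = \frac{0.7}{45.4} \approx 0.0154$$
Substituting px for F(x) in ln(ln(1/(1 - F(x)))), we obtain the corresponding value:
$$\ln\left(\ln\frac{1}{1 - 0.7/45.4}\right) \approx -4.1644$$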
Table 5. Transformed Data
| Row | Sample x | px | ln(ln(1/(1 - px))) | ln(x) |
|---|---|---|---|---|
| 1 | 0.8997 | 0.0154 | -4.1644 | -0.10566 |
| 2 | 1.2838 | 0.0374 | -3.2659 | 0.24980 |
| 3 | 1.5766 | 0.0595 | -2.7918 | 0.45529 |
| 4 | 1.8627 | 0.0815 | -2.4650 | 0.62200 |
| 5 | 2.4193 | 0.1035 | -2.2138 | 0.88347 |
| 6 | 2.4353 | 0.1256 | -2.0087 | 0.89009 |
| 7 | 3.1520 | 0.1476 | -1.8346 | 1.14804 |
| 8 | 3.3367 | 0.1696 | -1.6828 | 1.20498 |
| 9 | 3.4850 | 0.1916 | -1.5477 | 1.24846 |
| 10 | 3.9605 | 0.2137 | -1.4256 | 1.37636 |
| 11 | 3.9921 | 0.2357 | -1.3139 | 1.38432 |
| 12 | 3.9934 | 0.2577 | -1.2106 | 1.38465 |
| 13 | 4.1013 | 0.2797 | -1.1143 | 1.41131 |
| 14 | 4.8306 | 0.3018 | -1.0239 | 1.57496 |
| 15 | 5.3545 | 0.3238 | -0.9384 | 1.67794 |
| 16 | 5.6094 | 0.3458 | -0.8572 | 1.72444 |
| 17 | 7.7829 | 0.3678 | -0.7795 | 2.05193 |
| 18 | 7.8240 | 0.3899 | -0.7051 | 2.05720 |
| 19 | 8.3431 | 0.4119 | -0.6333 | 2.12143 |
| 20 | 9.0248 | 0.4339 | -0.5638 | 2.19998 |
| 21 | 9.2627 | 0.4559 | -0.4964 | 2.22599 |
| 22 | 9.2766 | 0.4780 | -0.4307 | 2.22750 |
| 23 | 9.7943 | 0.5000 | -0.3665 | 2.28180 |
| 24 | 11.4391 | 0.5220 | -0.3035 | 2.43704 |
| 25 | 12.2847 | 0.5441 | -0.2416 | 2.50836 |
| 26 | 12.4112 | 0.5661 | -0.1805 | 2.51860 |
| 27 | 13.1651 | 0.5881 | -0.1199 | 2.57757 |
| 28 | 13.4990 | 0.6101 | -0.0598 | 2.60262 |
| 29 | 13.5532 | 0.6322 | 0.0001 | 2.60663 |
| 30 | 14.1542 | 0.6542 | 0.0600 | 2.65001 |
| 31 | 14.4694 | 0.6762 | 0.1201 | 2.67204 |
| 32 | 14.5857 | 0.6982 | 0.1808 | 2.68004 |
| 33 | 15.1603 | 0.7203 | 0.2421 | 2.71868 |
| 34 | 15.6962 | 0.7423 | 0.3045 | 2.75342 |
| 35 | 15.7833 | 0.7643 | 0.3683 | 2.75895 |
| 36 | 17.4998 | 0.7863 | 0.4340 | 2.86219 |
| 37 | 18.1497 | 0.8084 | 0.5021 | 2.89866 |
| 38 | 18.6342 | 0.8304 | 0.5734 | 2.92500 |
| 39 | 19.4354 | 0.8524 | 0.6489 | 2.96710 |
| 40 | 19.7557 | 0.8744 | 0.7300 | 2.98344 |
| 41 | 19.9496 | 0.8965 | 0.8189 | 2.99321 |
| 42 | 22.5383 | 0.9185 | 0.9192 | 3.11522 |
| 43 | 23.8066 | 0.9405 | 1.0375 | 3.16996 |
| 44 | 29.9006 | 0.9626 | 1.1893 | 3.39788 |
| 45 | 34.0658 | 0.9846 | 1.4284 | 3.52830 |
We plot the pairs [ln(x), ln(ln(1/(1 - px)))] in Figure 4. They show a linear trend, as expected from Property 2 when the device lives are distributed as a Weibull.
From Figure 4 and the data in Table 5, we obtain a rough estimate of the slope of the linear trend, 1.3478.
Figure 4. Scatter Plot of the Transformed Data in Table 5 (ln(ln(1/(1 - px))) vs. ln(x))
This slope (1.3478) is a rough estimate of the Weibull shape parameter β (Property 3 of Table 4). To obtain a more formal estimate (Property 4 of Table 4), we regress C2 = ln(ln(1/(1 - px))) on C1 = ln(x):
The regression equation is C2 = -3.41 + 1.35 C1
| Predictor | Coef | StDev | T | P |
|---|---|---|---|---|
| Constant | -3.40715 | 0.06856 | -49.69 | 0.000 |
| C1 | 1.35424 | 0.03008 | 45.02 | 0.000 |

S = 0.1774   R-Sq = 97.9%   R-Sq(adj) = 97.9%
Intercept = -3.4071, Slope = 1.3542
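The regression fit is high (R-Sq = 97.9%). Its slope (1.3542 ≈ 1.35) estimates the Weibull shape parameter β. Since the intercept equals -β ln(α), the Weibull characteristic life (scale parameter) α, denoted CharLf, is obtained as:
$$\text{CharLf} = \hat{\alpha} = \exp\left(-\frac{\text{Intercept}}{\text{Slope}}\right) = \exp\left(\frac{3.4071}{1.3542}\right) \approx 12.38$$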
Notice that the rough estimates of the characteristic life and shape parameter (13.53 and 1.348) are reasonably close to the more formal regression estimates above (12.38 and 1.354).
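The regression of Property 4 can be reproduced from the raw Table 2 data with plain least squares. This is a sketch, not the software used for the output above; it recomputes the median ranks and log transforms rather than reading the rounded Table 5 values, so the last digits may differ slightly. The slope estimates β, and α is recovered as exp(-intercept/slope).

```python
import math

# Table 2 life data (sorted), n = 45.
LIVES = [
    0.8997, 1.2838, 1.5766, 1.8627, 2.4193, 2.4353, 3.1520, 3.3367, 3.4850,
    3.9605, 3.9921, 3.9934, 4.1013, 4.8306, 5.3545, 5.6094, 7.7829, 7.8240,
    8.3431, 9.0248, 9.2627, 9.2766, 9.7943, 11.4391, 12.2847, 12.4112, 13.1651,
    13.4990, 13.5532, 14.1542, 14.4694, 14.5857, 15.1603, 15.6962, 15.7833, 17.4998,
    18.1497, 18.6342, 19.4354, 19.7557, 19.9496, 22.5383, 23.8066, 29.9006, 34.0658,
]

def median_rank(rank, n):
    """Benard's approximation to the median rank, as used in Table 5."""
    return (rank - 0.3) / (n + 0.4)

def fit_weibull_regression(lives):
    """Least-squares fit of y = ln(ln(1/(1 - p))) on x = ln(life).

    Returns (slope, intercept, r_squared, alpha_hat): the slope estimates beta,
    and alpha_hat = exp(-intercept / slope) estimates the characteristic life.
    """
    xs = sorted(lives)
    n = len(xs)
    u = [math.log(x) for x in xs]
    v = [math.log(math.log(1.0 / (1.0 - median_rank(i, n)))) for i in range(1, n + 1)]
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    slope = s_uv / s_uu
    intercept = v_bar - slope * u_bar
    r_squared = s_uv ** 2 / (s_uu * s_vv)
    alpha_hat = math.exp(-intercept / slope)
    return slope, intercept, r_squared, alpha_hat

slope, intercept, r2, alpha_hat = fit_weibull_regression(LIVES)
print(f"beta  (slope)  = {slope:.4f}")
print(f"intercept      = {intercept:.4f}")
print(f"R-Sq           = {r2:.1%}")
print(f"alpha (CharLf) = {alpha_hat:.2f}")
```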
We now perform the transformation Y = X^β, with β = 1.35 (Table 6). If X is Weibull, then by Property 5 in Table 4, Y is Exponential with mean α^β ≈ 12.38^1.35 ≈ 29.86.
The Exponentiality of Y can be assessed with any or all of the procedures in Reference 5. For example, we can compare the descriptive statistics of Y = X^β and examine its probability plot.
Table 6. Transformation Y = X^1.35, Expected to Yield an Exponential (μ = α^β ≈ 29.860)
0.867
1.401
1.849
2.316
3.296
3.325
4.711
5.087
5.395
6.411
6.481
6.484
6.721
8.383
9.633
10.257
15.960
16.074
17.530
19.491
20.188
20.229
21.768
26.843
29.556
29.967
32.450
33.567
33.749
35.785
36.865
37.265
39.261
41.146
41.454
47.654
50.058
51.870
54.904
56.129
56.874
67.057
72.201
98.213
117.120
Notice in Table 7 that the mean (28.97) and standard deviation (26.11) of Y = X^β are relatively close, as expected for an Exponential distribution, whose mean and standard deviation are equal. The probability plot of Y, presented in Figure 5, also shows a clear linear trend.
Table 7. Descriptive Statistics
| Variable | N | Mean | Median | StDev |
|---|---|---|---|---|
| Transf (Y = X^1.35) | 45 | 28.97 | 21.77 | 26.11 |
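The Property 5 check can be sketched in a few lines: raise each Table 2 life to the power β = 1.35 (the rounded regression slope used for Table 6) and compare the mean with the standard deviation. For an Exponential, the two should be roughly equal, and both should be near α^β. The value α = 12.378 is an assumption taken from exp(3.4071/1.3542).

```python
import statistics

# Table 2 life data (sorted), n = 45.
LIVES = [
    0.8997, 1.2838, 1.5766, 1.8627, 2.4193, 2.4353, 3.1520, 3.3367, 3.4850,
    3.9605, 3.9921, 3.9934, 4.1013, 4.8306, 5.3545, 5.6094, 7.7829, 7.8240,
    8.3431, 9.0248, 9.2627, 9.2766, 9.7943, 11.4391, 12.2847, 12.4112, 13.1651,
    13.4990, 13.5532, 14.1542, 14.4694, 14.5857, 15.1603, 15.6962, 15.7833, 17.4998,
    18.1497, 18.6342, 19.4354, 19.7557, 19.9496, 22.5383, 23.8066, 29.9006, 34.0658,
]
BETA_HAT = 1.35     # regression estimate of beta, rounded as in Table 6
ALPHA_HAT = 12.378  # assumed: exp(3.4071 / 1.3542)

y = [x ** BETA_HAT for x in LIVES]   # Y = X**beta should be roughly Exponential
mean_y = statistics.mean(y)
sd_y = statistics.stdev(y)           # sample (n - 1) standard deviation

print(f"mean(Y) = {mean_y:.2f}, stdev(Y) = {sd_y:.2f}, stdev/mean = {sd_y / mean_y:.2f}")
print(f"alpha^beta = {ALPHA_HAT ** BETA_HAT:.2f}")
```

The output should match Table 7 (mean 28.97, StDev 26.11), with a ratio of about 0.9, and α^β ≈ 29.86 as in the title of Table 6.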
Figure 5. Probability Plot for the Transformed Variable Y = X^β (Linear Trend, as Expected for an Exponential)
To assess Property 6 in Table 4, we construct Weibull probability and score plots of the original lives {X1, ..., Xn}. As expected, these plots (Figures 6 and 7) are linear.
Figure 6. Probability Plot for the Weibull Data; it Follows an Upward Linear Trend, as Expected if X is Weibull
Figure 7. Weibull Scores Plot Displays a Linear Trend, as Expected from Property 6
If the Weibull assumption is correct, the linear regression of the data in Figure 6 should also reflect the one-to-one relation, yielding a slope of unity (Property 7).
The regression equation is WeibProb = -0.0113 + 1.03 Irank
| Predictor | Coef | StDev | T | P |
|---|---|---|---|---|
| Constant | -0.01131 | 0.01177 | -0.96 | 0.342 |
| Irank | 1.03051 | 0.02049 | 50.29 | 0.000 |

S = 0.03881   R-Sq = 98.3%   R-Sq(adj) = 98.3%
The regression index of fit is very high (R-Sq = 98.3%). The regression slope (1.03) yields an approximate 99% CI of (0.97, 1.09), which covers unity, supporting the Weibull assumption by Property 7.
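The Weibull scores (xi) are the percentiles corresponding to the median ranks px in Table 5. To obtain these percentiles, we substitute px for F(x) in the Weibull distribution function
$$p_x = 1 - \exp\left\{-\left(\frac{x_i}{\alpha}\right)^{\beta}\right\}$$
and solve for xi, obtaining
$$x_i = \alpha\left[-\ln(1 - p_x)\right]^{1/\beta}$$
For example, for the smallest data point (0.8997), using px = 0.0154 and the regression estimates α ≈ 12.38 and β ≈ 1.35, the first Weibull score is:
$$x_1 = 12.38 \times \left[-\ln(1 - 0.0154)\right]^{1/1.35} \approx 0.566$$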
Weibull scores are then plotted vs. their corresponding sorted data (e.g., 0.566 vs. 0.899). The Weibull scores plot is presented in Figure 7.
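The Weibull scores and both Property 7 regressions can be sketched as below. Two assumptions are labeled here: the scores use α = 12.378 and β = 1.35 (these reproduce the first score, 0.566), and the Figure 6 regression is taken to compare fitted Weibull probabilities F(xi) with the median ranks, since the variables WeibProb and Irank are not defined in the text. The helper names are hypothetical.

```python
import math

# Table 2 life data (sorted), n = 45.
LIVES = [
    0.8997, 1.2838, 1.5766, 1.8627, 2.4193, 2.4353, 3.1520, 3.3367, 3.4850,
    3.9605, 3.9921, 3.9934, 4.1013, 4.8306, 5.3545, 5.6094, 7.7829, 7.8240,
    8.3431, 9.0248, 9.2627, 9.2766, 9.7943, 11.4391, 12.2847, 12.4112, 13.1651,
    13.4990, 13.5532, 14.1542, 14.4694, 14.5857, 15.1603, 15.6962, 15.7833, 17.4998,
    18.1497, 18.6342, 19.4354, 19.7557, 19.9496, 22.5383, 23.8066, 29.9006, 34.0658,
]
ALPHA_HAT, BETA_HAT = 12.378, 1.35   # assumed parameter estimates (see lead-in)

def median_rank(rank, n):
    """Benard's approximation to the median rank."""
    return (rank - 0.3) / (n + 0.4)

def ols(xs, ys):
    """Ordinary least squares of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

xs = sorted(LIVES)
n = len(xs)
p = [median_rank(i, n) for i in range(1, n + 1)]

# Weibull scores: percentiles x_i = alpha * (-ln(1 - p_i))**(1/beta).
scores = [ALPHA_HAT * (-math.log(1.0 - pi)) ** (1.0 / BETA_HAT) for pi in p]
# Fitted probabilities F(x_i) = 1 - exp(-(x_i/alpha)**beta), to compare with median ranks.
fitted_p = [1.0 - math.exp(-((x / ALPHA_HAT) ** BETA_HAT)) for x in xs]

score_fit = ols(xs, scores)   # scores vs. sorted sample (Figure 7 analogue)
prob_fit = ols(p, fitted_p)   # fitted probability vs. median rank (Figure 6 analogue)

print(f"first Weibull score: {scores[0]:.3f}")
print(f"scores on sample:    intercept {score_fit[0]:.3f}, slope {score_fit[1]:.3f}")
print(f"prob on median rank: intercept {prob_fit[0]:.4f}, slope {prob_fit[1]:.3f}")
```

Both slopes should land near unity, in line with the reported outputs, although exact agreement depends on which parameter estimates and plotting conventions the original software used.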
The regression of the Weibull scores on the ordered sample, according to Property 7 in Table 4, should also yield a slope of unity. The regression equation is:
WeibScr = 0.041 + 0.989 WeibSamp
| Predictor | Coef | StDev | T | P |
|---|---|---|---|---|
| Constant | 0.04080 | 0.26870 | 0.15 | 0.880 |
| WeibSamp | 0.98886 | 0.01974 | 50.11 | 0.000 |

S = 1.027   R-Sq = 98.3%   R-Sq(adj) = 98.3%
As with the probability plot, the index of fit (R-Sq = 98.3%) is very high. The approximate 99% CI for the slope, (0.94, 1.04), also covers unity, as expected when the data are distributed Weibull.
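The approximate confidence intervals for the two slopes can be checked from the reported coefficients and their standard errors. This sketch uses a large-sample normal approximation, coefficient ± z·SE with z from `statistics.NormalDist`; it is one reasonable choice, not necessarily the one used for the values quoted above.

```python
from statistics import NormalDist

def approx_ci(coef, se, level=0.99):
    """Normal-approximation confidence interval: coef +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 2.576 for 99%
    return coef - z * se, coef + z * se

# Slope coefficients and standard errors from the two regression outputs above.
for name, coef, se in (("Probability plot (Irank)", 1.03051, 0.02049),
                       ("Scores plot (WeibSamp)", 0.98886, 0.01974)):
    lo, hi = approx_ci(coef, se)
    print(f"{name}: slope {coef:.3f}, 99% CI ({lo:.3f}, {hi:.3f}), covers 1: {lo <= 1.0 <= hi}")
```

Using a t quantile with 43 degrees of freedom (about 2.70) instead of z gives slightly wider intervals, close to the (0.97, 1.09) and (0.94, 1.04) quoted in the text; either way, both intervals cover unity.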
All of the preceding empirical results support the plausibility of the Weibull assumption for our life data set. If a stronger case for the validity of the Weibull distribution is required at this point, several theoretical GoF tests can be carried out. GoF tests will be the topic of a forthcoming paper.
Summary
In this START sheet we have discussed the important problem of (empirically) assessing the Weibull distribution assumptions of a data set. We have provided several numerical and graphical examples. We have discussed some related theoretical and practical issues, giving references to background information and further readings. In doing so, we mentioned other, very important, reliability analysis topics. Due to their complexity, these will be treated in more detail in forthcoming papers.
Bibliography
Practical Statistical Tools for Reliability Engineers, Coppola, A., RIAC, 1999.
Mechanical Applications in Reliability Engineering, Sadlon, R.J., RIAC, 1993.
Reliability and Life Testing Handbook, Kececioglu, D., Editor, Vols. 1 and 2, Prentice Hall, NJ, 1993.
* Note: The following information about the author(s) is the same as that in the original document and may no longer be correct.
Dr. Jorge Luis Romeu has over thirty years of statistical and operations research experience in consulting, research, and teaching. He was a consultant for the petrochemical, construction, and agricultural industries. Dr. Romeu has also worked in statistical and simulation modeling and in data analysis of software and hardware reliability, software engineering and ecological problems.
Dr. Romeu has taught undergraduate and graduate statistics, operations research, and computer science in several American and foreign universities. He teaches short, intensive professional training courses. He is currently an Adjunct Professor of Statistics and Operations Research for Syracuse University and a Practicing Faculty of that school's Institute for Manufacturing Enterprises. Dr. Romeu is a Chartered Statistician Fellow of the Royal Statistical Society, Full Member of the Operations Research Society of America, and Fellow of the Institute of Statisticians.
Romeu is a senior technical advisor for reliability and advanced information technology research with Quanterion. Since joining Quanterion in 2007, Romeu has provided consulting for several statistical and operations research projects. He has written a State of the Art Report on Statistical Analysis of Materials Data, designed and taught a three-day intensive statistics course for practicing engineers, and written a series of articles on statistics and data analysis for the AMPTIAC Newsletter and RIAC Journal.
Other START Sheets Available
Many Selected Topics in Assurance Related Technologies (START) sheets have been published on subjects of interest in reliability, maintainability, quality, and supportability. START sheets are available online in their entirety at