RAC is a DoD Information Analysis Center Sponsored by the Defense Technical Information Center and Operated by IIT Research Institute

The Journal of the Reliability Analysis Center, Fourth Quarter - 2001

INSIDE: Real-Time Prognostic Condition-Based Maintenance for High Value Systems; Opinion: The Military Services Still Rely on Reliability; Industry News; What's New From the RAC?; Future Events; Tribute

Introduction

The third in a series, this article overviews several statistical procedures frequently used in reliability modeling and data analysis [1, 2, and 3] and illustrates the underlying philosophy. Although statistical "how-tos" are well explained in many excellent sources [4, 5, 6, and 7], the "whys" and the basis for them usually are not [8]. The first article discussed random variables (RV) and their distributions and parameters. The second article discussed some problems dealing with the estimation and testing of unknown distribution parameters based on a random sample. In this third article, we apply some of these concepts to modeling and data analysis.

Statistical models are used in reliability because of the inherent variation in empirical data and the definition of reliability. Engineers work with data obtained from, hopefully, random samples. They need to understand and take advantage of inherent and unavoidable variability - statistics is the science that studies variability. In addition, the concept of reliability is wholly probabilistic. Statistics and reliability are inextricably interwoven.

In this article, we discuss several statistical modeling procedures. We first discuss data analysis principles and their practical implementation. For example, the distribution of the data must first be established, whether the data come from a single sample or from two or more batches. Then, the data are tested for potential outliers. We will see how outliers may be removed from the sample, if necessary.
If there are two or more batches, we assess whether these can be pooled together (i.e., whether they come from the same population), or whether the data analysis results must be obtained separately for each individual batch. Finally, having satisfactorily determined the underlying distribution, we apply the appropriate statistical models (regression, ANOVA) to analyze our data, according to our needs and objectives.

Establishing the Underlying Distribution and Parameters

Any (parametric) statistical result obtained from a data set depends on the specific distribution assumed, as well as on the parameters estimated for the data set in question. Hence the importance of establishing both the underlying population distribution and its corresponding parameters. If there is any serious estimation error in this initial step, everything else that we do will be wrong, since it will be based on this initial result.

In the first article of this series, we saw how F(x), the Cumulative Distribution Function (CDF), and f(x), the probability density function (pdf), are related by the equation:

$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$

For the exponential case, for example:

$$F_\theta(x) = \int_{0}^{x} f(t)\,dt = \int_{0}^{x} \frac{1}{\theta}\exp(-t/\theta)\,dt = 1 - \exp(-x/\theta)$$

Two types of Goodness of Fit (GoF) tests can be used to assess the underlying statistical distribution of the data [8]. Both of these GoF methods test that a completely specified distribution $F_\theta(x)$ (i.e., one whose parameters are all known) fits a data set. Such a composite GoF hypothesis (H0) has several parts: it specifies the hypothesized distribution function as well as its parameters.

One of the two types compares the actual (observed) number of sample points with the corresponding expected number, obtained under the hypothesized pdf, for several data subintervals. An example of this type is the Chi Square GoF test [6].
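The observed-versus-expected comparison can be sketched in a few lines. This is only an illustrative sketch, not a procedure taken from the article: the function names, the failure times, the hypothesized mean life of 100 hours, and the choice of four equal-probability subintervals are all assumptions made for the example.

```python
import math


def exponential_cdf(x, theta):
    """CDF of the exponential distribution with mean theta: 1 - exp(-x/theta)."""
    return 1.0 - math.exp(-x / theta)


def chi_square_gof(sample, theta, edges):
    """Chi Square GoF statistic for H0: the sample is Exponential with mean theta.

    `edges` are the interior cut points; the first subinterval starts at 0
    and the last one extends to infinity.
    """
    n = len(sample)
    bounds = [0.0] + list(edges) + [math.inf]
    statistic = 0.0
    cells = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        observed = sum(1 for x in sample if lo <= x < hi)
        # Expected count: n times the area of the hypothesized pdf over the cell.
        expected = n * (exponential_cdf(hi, theta) - exponential_cdf(lo, theta))
        statistic += (observed - expected) ** 2 / expected
        cells.append((lo, hi, observed, expected))
    return statistic, cells


# Hypothetical failure times (hours); hypothesized mean life of 100 hours.
failure_times = [5, 12, 18, 22, 27, 31, 40, 52, 60, 66,
                 75, 90, 104, 120, 133, 150, 180, 220, 290, 410]
# Cut points chosen so that each of the four cells has probability 0.25 under H0.
cut_points = [-100.0 * math.log(p) for p in (0.75, 0.50, 0.25)]
statistic, cells = chi_square_gof(failure_times, 100.0, cut_points)
# With a fully specified H0, the reference distribution has (cells - 1) degrees of freedom.
print(f"Chi Square statistic = {statistic:.3f} on {len(cells) - 1} degrees of freedom")
```

The statistic is then compared with a tabulated Chi Square critical value for the chosen significance level; a small statistic means the observed and expected counts are close.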
The basis for the Chi Square test can be better understood by superimposing by eye the hypothesized pdf over the histogram of the data and assessing how closely they agree. In practice, the Chi Square procedure reapportions the n data points to data subintervals, according to the proportional area that the pdf exhibits over these subintervals. Then, these expected values are compared with the actual or observed data points in such subintervals. If the results are close, the fit is acceptable.

Statistical Analysis of Reliability Data, Part 3: On Statistical Modeling of Reliability Data
By: Jorge Luis Romeu, IIT Research Institute

The second type of GoF test compares vertical distances between the empirical ($F_n$) and theoretical ($F_0$) CDFs, evaluated at the ordered sample points. Examples of this type of GoF test are the Kolmogorov-Smirnov (K-S) and the Anderson-Darling (A-D) tests [6]. The basis for K-S or A-D can be better understood by superimposing by eye the hypothesized CDF over the empirical cumulative function and then assessing the height differences between them at the sample points. As before, if the two CDFs are close, the fit is good. Otherwise, it is a poor fit.

Both of these GoF approaches assume that the data come from a completely specified and continuous distribution ($F_0$) with known parameter $\theta$. However, both approaches have been extended or approximated for the case when the parameters are unknown and need to be estimated from the sample, which is the usual case in practice.

When a composite GoF hypothesis H0 is rejected, however, more than one alternative or possibility may occur. For example, assume the hypothesis H0: a data set was drawn from the Normal distribution, with mean $\mu$ and variance $\sigma^2$. Now assume H0 is rejected.
This may occur because the distribution is not Normal, even though the mean and variance may be the ones stated. It may also occur because the distribution is indeed Normal, but the mean, or the variance, or both, are not the ones stated in H0. Finally, it may occur because none of the assumptions stated in H0 is true, i.e., neither the distribution nor its parameters are as assumed in H0.

On the other hand, it is also important to remember that, when H0 is not rejected, it just means that we have not found enough grounds to question its validity (i.e., the assumptions made). On that basis, we proceed as if H0 were correct.

The A-D GoF test for Normality, for one [1] or several [4] samples, is highly regarded among univariate GoF tests. Its asymptotic distribution (i.e., for large sample sizes) has been thoroughly studied. Many statistical software packages have implemented A-D in analytical and/or graphical form. The K-S and Chi Square GoF tests are also excellent, when applicable [4, 6, and 8].

After the distribution of the data set is established, the data set is screened for potential outliers. This can be done by using the MNR (Maximum Normed Residual) test [8], which singles out unusually high or low observations in a Normally distributed data set. The outliers uncovered must then be thoroughly checked for accuracy (clerical errors) and proper implementation (testing errors). If errors are detected, they must be corrected or the data point discarded. If no errors are ascertained, the data point should remain in the sample.

Three statistical distributions are widely used in reliability and tested for GoF [3, 7]. The Weibull is one of them and is often justified, for theoretical reasons, in the derivation of extreme values and by long practice. The Lognormal is also widely used in reliability studies. If a RV X is distributed Lognormally, then Log (X) is distributed Normally. We can therefore test GoF for Lognormality by testing Log (X) for Normality (e.g., via A-D or K-S).
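The log-transform idea can be sketched as follows. This is a minimal sketch under stated assumptions, not the article's own procedure: the function names and data are hypothetical, the Normal parameters are estimated from the sample, and the small-sample adjustment and the 5% critical value of 0.752 are the commonly tabulated values attributed to Stephens, which are not given in the article.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(values):
    """A-D statistic for Normality, with mean and std. deviation estimated from the data."""
    y = sorted(values)
    n = len(y)
    fitted = NormalDist(mean(y), stdev(y))
    cdf = [fitted.cdf(v) for v in y]
    # Weighted sum of log CDF heights at the ordered sample points.
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    # Adjustment for estimated parameters (assumed here, not from the article).
    return a2 * (1.0 + 0.75 / n + 2.25 / n**2)


def looks_lognormal(times, critical_value=0.752):
    """Test Log(X) for Normality; True means Lognormality is not rejected."""
    logs = [math.log(t) for t in times]
    return anderson_darling_normal(logs) < critical_value


# Hypothetical times to failure (hours).
times_to_failure = [23, 41, 55, 68, 80, 97, 115, 140, 182, 260, 395]
print("Lognormal fit not rejected:", looks_lognormal(times_to_failure))
```

A result of `True` only means that no grounds were found to reject Lognormality, in line with the earlier remark about not rejecting H0.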
Finally, the Exponential is also widely used in reliability, especially when initial screening and efficient replacement policies remove the infant mortality and aging elements from the well-known "bathtub curve" (hazard rate function), leaving only the quasi-constant useful life element. The Exponential is a special case of the Weibull, with a shape parameter of unity. Shape and scale parameters are estimated from the data via analytical or graphical methods [3, 7] and then tested for GoF.

If none of the previously mentioned distributions fits the data set, then a nonparametric method may be used [8]. However, this method is less accurate and also requires larger sample sizes.

When working with a single sample, the procedures described above are implemented. If working with more than one, the k-sample A-D GoF test can be implemented to assess the hypothesis (H0) that all samples (batches) come from the same population [8]. In the affirmative case, we can pool all the batches into a single, combined sample from which we obtain the desired results. If the test rejects H0, then an individual analysis must be carried out for each batch. Alternatively, we may implement ANOVA [4, 8] methods.

Finally, if the variable of interest is associated with other measurements, then regression methods [5, 8] can be employed. We first verify that the regression model assumptions (i.e., independent and identically distributed observations, linearity, normality) are met. If so, we can obtain the model parameter estimates. We must also check the appropriateness of the model. If the general linear model is applicable [5], then either the ANOVA or the regression procedure will provide the desired allowables [6]. These methods are discussed next.

Regression Models

Assume that we have two quantitative measurements: $X_i$ (e.g., height) and $Y_i$ (e.g., weight). Assume that the pairs $P_i = (X_i, Y_i)$, $1 \le i \le n$, constitute the data in our data set.
Assume that the variables X and Y are associated (i.e., functionally related). Perhaps variable $X_i$ may be easier, faster, cheaper or more accurately obtained than $Y_i$. Or we may be able to exploit some (X-Y) functional relationship to estimate the parameters of some model of interest (e.g., AMSAA-Crow). If such is the case, then we can use this (X-Y) functional relation to our advantage and obtain an improved estimation of Y, through X. This is the idea behind the use of regression analysis.

A suggested first step is to plot $Y_i$ vs. $X_i$, for each $P_i = (X_i, Y_i)$, $1 \le i \le n$ (e.g., each person's weight vs. height). If the variables X and Y are not associated (e.g., if there were no association between a person's height and weight), then the resulting cloud of points $P_i = (X_i, Y_i)$ will be uniformly and randomly scattered all over the plane [8].

Draw two lines over the plane: one vertical, through the average of the projections over the X-axis, and one horizontal, through the average of the projections over the Y-axis. They divide the plane into four quadrants. Under H0 (i.e., variables X and Y are not associated), the set of points $P_i$ should be equally and randomly distributed among the four quadrants. If X and Y are associated, H0 is rejected and the number of points in each quadrant differs. If there is a positive association (i.e., when X increases/decreases, so does Y), then the points will tend to cluster in the upper right and lower left quadrants. If there is a negative association between X and Y, the points will cluster in the upper left and lower right quadrants.

The covariance between X and Y, an indicator that characterizes such a relationship, is defined as:

$$\mathrm{Cov}(X, Y) = S_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$$

where $\bar{x}$ and $\bar{y}$ are the two corresponding sample averages.
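The quadrant count and the covariance formula can be sketched together. The function names and the height/weight values below are hypothetical, chosen only to illustrate a positive association.

```python
def sample_covariance(xs, ys):
    """Sample covariance S_xy = sum((x - x_bar) * (y - y_bar)) / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def quadrant_counts(xs, ys):
    """Count the points in the four quadrants defined by the lines x = x_bar, y = y_bar."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    counts = {"upper_right": 0, "upper_left": 0, "lower_left": 0, "lower_right": 0}
    for x, y in zip(xs, ys):
        horizontal = "right" if x >= x_bar else "left"
        vertical = "upper" if y >= y_bar else "lower"
        counts[f"{vertical}_{horizontal}"] += 1
    return counts


# Hypothetical heights (inches) and weights (pounds).
heights = [62, 64, 66, 68, 70, 72, 74]
weights = [120, 136, 140, 155, 165, 180, 190]
print(quadrant_counts(heights, weights))
print(f"S_xy = {sample_covariance(heights, weights):.3f}")
```

With these values most points fall in the upper right and lower left quadrants, and the covariance is positive, which is the pattern described for a positive association.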
The covariance indicator is positive when a positive association between X and Y exists, negative when a negative association exists, and near zero if no association between the two variables exists (e.g., their joint variation is not coordinated).

As a measure of association between two variables, covariance is difficult to interpret because it depends heavily on the units in which variables X and Y are measured. For example, the reader can obtain the sample covariance between height and weight, first measured in inches/pounds and then in meters/kilograms, to verify that they differ. The correlation coefficient [4, 5], defined as $r_{xy} = S_{xy}/(S_x S_y)$ (where $S_x$ and $S_y$ are the sample standard deviations of X and Y, respectively), is a normalized covariance.

Correlation $r_{xy}$ measures the association between X and Y just as the covariance does. However, the correlation is easier to interpret, since $r_{xy}$ always lies between -1 and 1. In addition, $r_{xy}$ is dimensionless. The reader may recalculate the sample correlation $r_{xy}$ between height and weight, first in inches/pounds and then in meters/kilograms, and verify that the two values now agree.

Correlation is also a measurement of linear association between X and Y. That is, if $r_{xy} > 0$ and close to unity, there is a linear trend, with positive slope, that models the association between X and Y. If $r_{xy} < 0$ and close to -1, this linear trend has a negative slope. If $r_{xy} \approx 0$, there is no linear trend that models the relationship. It is therefore very useful to obtain an estimate of the slope of such a linear trend (called the linear regression) and to use it to obtain a better estimate of Y (the dependent variable) given a value of X (the predictor). In mathematical terms:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i; \quad 1 \le i \le n$$

is the equation for a simple linear regression model. The multiple regression model is just an extension of this equation to the case of two or more predictor variables, $X_1$, $X_2$, etc.:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik} + \varepsilon_i; \quad 1 \le i \le n$$

The model error terms $\varepsilon_i$ are independent and identically distributed Normal, with mean 0 and variance $\sigma^2$. The $\beta_j$ ($0 \le j \le k$) are called regression coefficients. The parameters ($\beta_j$, $0 \le j \le k$; $\sigma^2$) are estimated from the data during the regression analysis.

Note that, given an adequate sample, it is always possible to obtain an estimate of its distribution and parameters. However, if a RV, Y, is associated with another RV, X, we can use correlation information to obtain an improved estimation (i.e., one with a smaller variance) of Y, given X. This is a clear advantage that statistical modeling of the data introduces.

Linear regression analysis requires three or more ($k \ge 3$) levels of measurements for the predictor variable X. If there are fewer levels, we must wait until more data (levels) are gathered.

A regression model exists only if it is statistically significant, i.e., if its corresponding F-test rejects the null hypothesis H0: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$. For if H0 holds, variable Y ($= \beta_0 + \varepsilon$) does not depend on the predictor (X).

Not all of the model's independent variables need to be statistically significant (i.e., $\beta_j \ne 0$ for all j). For example, some predictor variables ($X_j$) may be highly significant (i.e., have a coefficient $\beta_j \ne 0$) while others may be redundant (i.e., not significant, or $\beta_j = 0$). The overall regression model (as a whole), however, may still remain statistically significant. These situations suggest correlation among predictors and require variable selection methods. Through such selection methods, redundant predictors are weeded out and the resulting regression model improves significantly [5].
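A simple linear regression fit, together with the F statistic used to judge its significance, can be sketched as follows. The function name and the height/weight data are hypothetical; the sketch covers only the single-predictor case and leaves the comparison against a tabulated F critical value to the reader.

```python
import math


def fit_simple_regression(xs, ys):
    """Least-squares fit of Y = b0 + b1 * X, with correlation r and the model F statistic."""
    if len(set(xs)) < 3:
        raise ValueError("need three or more distinct levels of the predictor X")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx                      # slope estimate
    b0 = y_bar - b1 * x_bar             # intercept estimate
    ss_regression = b1 * sxy            # variation explained by the line
    ss_error = syy - ss_regression      # residual variation
    f_stat = ss_regression / (ss_error / (n - 2))
    r = sxy / math.sqrt(sxx * syy)      # correlation coefficient
    return {"b0": b0, "b1": b1, "r": r, "F": f_stat, "df": (1, n - 2)}


# Hypothetical heights (inches) and weights (pounds).
heights = [62, 64, 66, 68, 70, 72, 74]
weights = [120, 136, 140, 155, 165, 180, 190]
fit = fit_simple_regression(heights, weights)
print(f"weight = {fit['b0']:.2f} + {fit['b1']:.3f} * height, "
      f"r = {fit['r']:.4f}, F = {fit['F']:.1f} on {fit['df']} df")
```

A large F value relative to the tabulated critical value leads to rejecting H0: $\beta_1 = 0$. Rescaling the data to meters and kilograms changes the coefficients but leaves r and F unchanged.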
If there exist four or more levels of measurements for the independent variable (X), then it is possible to fit a quadratic regression model [8] to the data:

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i; \quad 1 \le i \le n$$

If two or more regression models are statistically significant, we must compare them to select the most efficient [5, 8]. The best regression provides the maximum of information with the minimum of terms. The best model is then retained and used.

Regression models are sensitive to violations of their three statistical assumptions. These assumptions must be checked before the model can be correctly used. The analysis of regression residuals ($\varepsilon_i$) allows us to verify the regression assumptions [5, 8]. Normality of the data is checked via the K-S or the A-D GoF test or via graphical methods. If the normality assumption is rejected, the data should be transformed [4] and the regression model recalculated.

The assumption of equality of variance $\sigma^2$ can also be checked using statistical tests such as Bartlett's [6] or through graphical analysis of residuals. If any of these procedures indicates that variances are not equal, the data should be transformed and the regression model should be recalculated.

Regression models are based on two procedures. First, an optimization process is used to select a function that minimizes the sum of squares of the errors to each data point ($\sum \varepsilon_i^2$). Then, distribution assumptions (e.g., normality) are imposed on the errors $\varepsilon_i$. If an invalid regression model is used (where the assumptions of independence, normality, or equality of variance of $\varepsilon_i$ are not met), then the test confidence levels and the confidence intervals derived are no longer those claimed from the model [8].

For example, the point estimator for $Y_i$ (weight) given $X_i$ (height) is valid, because of the optimization part of the regression procedure.
However, if the distributional assumptions of the model residuals are violated, then the confidence (interval) estimation for $Y_i$ given $X_i$ and its probabilistic statements (e.g., that the mean weight for a person six feet tall is, with 90% confidence, between 180 and 220 lbs.) are only approximate.

ANOVA Models

We have seen that we can have a single data sample or several data samples (batches). If the data come from the same population, then we can pool the samples and work with the larger combined data set. However, if not all samples come from the same population, mixing them would be a mistake. Instead, these separate samples become bivariate data, because each data point now provides two pieces of information: one is its property measurement and the other is its batch or sample number.

ANOVA [4, 5 and 8] is the procedure used to establish whether k Normal batches, of n elements each, have the same mean. Otherwise, the group means differ. Two different estimates of the common variance are compared. One estimate is obtained by combining the variance estimators within groups, the other from the variance estimator between groups. If all k group means are equal, then these two variance estimators are close (for both estimate the same variance parameter), and their ratio is close to unity. If sample means differ, the ratio of these two variance estimators is different from unity. The ANOVA model is:

$$y_{ij} = \mu + \alpha_j + \varepsilon_{ij}; \quad 1 \le i \le n; \; 1 \le j \le k$$

where $\alpha_j$ is the contribution of the jth sample (group) to the general mean $\mu$, and $\varepsilon_{ij}$ is the error term, which is distributed normally, with mean 0 and variance $\sigma^2$. Under H0, all group means are equal, hence all $\alpha_j = 0$; $1 \le j \le k$. Under H1, at least one $\alpha_j \ne 0$. Hence, at least one group has a different mean: $\mu_j = \mu + \alpha_j \ne \mu$.

One crucial ANOVA assumption is that all group or batch variances $\sigma^2$ are equal. This assumption must be tested before implementing the ANOVA results.
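The comparison of the between-group and within-group variance estimates can be sketched as follows. The function name and the batch data are hypothetical, and the sketch assumes the equal-variance check has already been carried out separately.

```python
def one_way_anova_f(groups):
    """Ratio of the between-group to the within-group variance estimate."""
    k = len(groups)
    total_n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / total_n
    means = [sum(g) / len(g) for g in groups]
    # Between groups: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: spread of the observations around their own group mean.
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (total_n - k)
    return ms_between / ms_within


# Hypothetical property measurements from three batches.
batches = [[10, 12, 14], [11, 13, 15], [30, 32, 34]]
print(f"F ratio = {one_way_anova_f(batches):.2f}")
```

A ratio far above unity, judged against a tabulated F critical value with (k-1, N-k) degrees of freedom, indicates that at least one group mean differs.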
If the test fails (i.e., there is reason to believe that not all groups have the same variance $\sigma^2$), then data transformation or other procedures must be implemented before, or in lieu of, ANOVA [4].

Another important ANOVA consideration is the number of data points ($n_j$; $1 \le j \le k$) per group. ANOVA works better under balanced designs (i.e., $n_j = n$), where all k groups are of equal size n. For example, think of the sample size n as the total information received. Think of the k groups as k informants, and of the ANOVA test as an assessment procedure that is based on the information provided by k different informants. Optimally, we would like to assign equal weight to each informant's contribution, and not have to rely too heavily on excess information from some (potentially biased) informants, at the expense of lacking information from the others.

In practice, however, samples are often of different sizes [4, 8]. To correct for this problem, we can use an effective sample size ($n'$) obtained via the formula:

$$n' = \frac{N - n^*}{k - 1}; \quad \text{where } n^* = \frac{\sum_{j=1}^{k} n_j^2}{N} \text{ and } N = \sum_{j=1}^{k} n_j$$

When $n_j = n$ (i.e., all groups have the same size), then $n^* = n' = n$ (i.e., the effective size is the group size). Statistical analysis strives to obtain the most efficient and unbiased assessment (test) from the data (information).

ANOVA's bivariate data include a categorical (qualitative) component. Each data point $P_{ij} = (Y_{ij}, j)$ includes the property measurement $Y_{ij}$ and its corresponding sample group j. The group is not quantitative. When both measurements are quantitative, i.e., when $P_i = (X_i, Y_i)$, the association between the two variables can be established in a better way: via regression.

Summary

In three short review articles, we have discussed some of the main ideas and concepts behind several statistical procedures used in reliability applications in particular, and in industrial applications in general.
By stressing statistical thinking over the mechanics of statistical applications, the practitioner gains a better understanding of statistics. Understanding will encourage a more frequent and better use of statistical methods among practitioners and engineers in the field.

Bibliography

1. Anderson, T.W. and D.A. Darling, A Test of Goodness of Fit, JASA, Vol. 49 (1954), Pages 765-769.
2. Box, G.E.P., W.G. Hunter, and J.S. Hunter, Statistics for Experimenters, John Wiley and Sons, Inc., New York, NY, 1978.
3. Coppola, Anthony, Practical Statistical Tools for the Reliability Engineer, Reliability Analysis Center, Rome, NY, 1999.
4. Criscimagna, N., Maintainability Toolkit, Reliability Analysis Center, Rome, NY, 2000.
5. Dixon, W.J. and F.J. Massey, Introduction to Statistical Analysis, McGraw-Hill, New York, NY, 1983.
6. Draper, N. and H. Smith, Applied Regression Analysis, John Wiley and Sons, Inc., New York, NY, 1980.
7. Reliability Analysis Center, Reliability Toolkit: Commercial Practices Edition, Rome, NY, 1995.

Abstract

Many industries operate high value equipment, often remotely, that must perform reliably in severe environments. The U.S. Navy (USN) operates one such system, the submarine Towed Array System (TAS), comprised of integrated hydraulic, mechanical, electronic and acoustic subsystems. To maintain this system's capability, the Navy has stressed conventional approaches to operation and maintenance. The USN invested in a prognostic Condition Based Maintenance (CBM) proof of concept for an individual ship TAS by developing the Thinline Health Monitoring System (THMS). THMS collects real-time and discrete reliability data, synchronizing these with other historical information, and the TAS's current condition assessment.
As a predictive intelligent code, it uses Bayesian Belief Networks (BBNs) to extract the full value of real-time data and provide a complete range of system performance evaluations, from diagnosis to prediction. Drawing upon the success of THMS, the USN supported expanding this capability fleet-wide to assess the health of the entire submarine TAS population. Plans have been developed to build a relational database, accessible to a geographically separated towed systems community via the Internet, for interactive analysis and diagnostics. The methodology described in this paper is directly translatable to other government and commercial critical systems that cannot afford either unscheduled or unnecessary maintenance.

Introduction

Conventional approaches to operation and maintenance have been used for Submarine Towed Array Systems (TAS) to maintain the system level capability necessary for ships' operations. TASs are mission essential for obtaining acoustic information in support of a high percentage of critical submarine deployments. By itself, a TAS is a complex configuration that requires integrated remote operation of hydraulic, mechanical, electrical, electronic and acoustic subsystems in a severe ocean environment. It is nearly impossible to observe the full functioning of each TAS component during operation. Furthermore, if malfunctions occur at sea, repairs most often need to be deferred until return to port. Repairs are frequently costly and, if performed afloat, have the potential to result in repair-induced failures owing to poor accessibility and an adverse repair environment. Adopting a prognostic Condition Based Maintenance (CBM) capability completely alters the TAS maintenance landscape by monitoring current system conditions and predicting degradations, so that necessary repairs can be completed in advance under favorable conditions.

In 1999, the U.S.
Navy (USN) funded a team effort by Areté Associates and Life Cycle Engineering (LCE) to use existing control and signal parameters from an in-service towed system (both array and handler) in constructing a device for evaluating and displaying operating system health. At the individual ship system level, the USN invested in a proof-of-concept Thinline Health Monitoring System (THMS) for an OA-9070/TB-23 thinline towed array baseline configured SSN 688 Class submarine [1]. The Areté/LCE team produced THMS as a real-time method for assessing the current condition of the TAS and demonstrated the ability to dynamically predict future system health. The principal elements that support this capability are real-time sensor inputs, a mature Preventive Maintenance program, and an embedded Bayesian Belief Network (BBN) intelligent code.

Having demonstrated a prognostic, next generation maintenance capability for individual systems, the USN then funded CBM concept development for the fleet population of towed arrays under the Small Business Innovative Research (SBIR) program. Integral to this capability is a comprehensive discrete historical and real-time web-based relational database and a powerful software toolbox to permit diagnostic and prognostic information mining simultaneously to geographically separated users (including operators, design engineers, vendors and logisticians). The inherent object oriented BBN tree framework permits the extension of the THMS health assessment output to serve as an input to the overarching BBN population model. A similar approach is directly applicable to other systems and industries.

Background

The Navy's core maintenance efforts reside in a Reliability Centered Maintenance (RCM) program practiced at the level of Preventive Scheduled Maintenance (PSM). In this paper, we compare the introduction of a high-level CBM capability based on intelligent software with in-use preventive maintenance (PM) programs.
In this regard, PSM is implemented as a process of developing a critical and functional list of failures and using default statistical data to develop a scheduled maintenance program to prevent expected failures. Some naval programs have made the additional investment to construct age-related failure distributions and reexamine the PM plan to bring it in line with operational performance. References are expanding to include CBM either as a reliability subject of its own or covered under the umbrella of RCM [2]. Here we use the RCM restricted domain of PSM to distinguish it from a program incorporating a higher-level form of maintenance using real time probabilities of health and prediction. PSM uses statistical information derived from historical data to

8. Romeu, J.L. and C. Grethlein, A Practical Guide to Statistical Analysis of Material Property Data, Advanced Materials and Processes Technology Information Analysis Center, Rome, NY, 1999.
9. Sadlon, R., Mechanical Applications in Reliability Engineering, Reliability Analysis Center, Rome, NY, 1993.
10. Scholz, F.W. and M.A. Stephens, K-Sample Anderson-Darling Tests, JASA, Vol. 82 (1987), Pages 918-924.

Real-Time Prognostic Condition-Based Maintenance for High Value Systems
By: Harry Bishop and William Matzelevich, Areté Associates; Edward Rossi, Life Cycle Engineering; Ron Thomas and Meeiyun Hsu