A Strategy for Simultaneous Evaluation of Multiple Objectives
Proper measurement and evaluation of performance is the key to comparing products and processes. When there is only one objective, a carefully defined quantitative evaluation most often serves the purpose. However, when the product or process under study must satisfy multiple objectives, the performances of the subject samples can be scientifically compared only when the individual criteria of evaluation are combined into a single number. This report describes a method in which multiple objectives are evaluated by combining them into an Overall Evaluation Criteria (OEC).
In engineering and scientific applications, measurement and evaluation of performance are everyday affairs. Although there are situations where measured performances are expressed in terms of attributes such as Good, Poor, Acceptable, Deficient, etc., most evaluations can be expressed in terms of numerical quantities (instead of Good and Bad, use 10 and 0). When performance evaluations are expressed as numbers, they can be conveniently compared to select the preferred candidate. The task of selecting the best product, a better machine, a taller building, a champion athlete, etc. is much simpler when there is only one objective (performance) measured in terms of a single number. Consider a product such as a 9-volt transistor battery whose functional life, expressed in hours, is the only characteristic of concern. Given two batteries, Brand A (20 hours) and Brand B (22.5 hours), it is easy to determine which one is preferable. Now suppose that you are concerned not only about the functional life but also about the unit costs, which are $1.25 for Brand A and $1.45 for Brand B. The decision about which brand of battery is better is no longer straightforward.
Multiple performance objectives (or goals) are quite frequent in the industrial arena. A rational means of combining various performances evaluated by different units of measurement is essential for comparing one product performance or process output with another. In experimental studies like the Design of Experiments (DOE) technique, performances of a set of planned experiments are compared to determine the influence of the factors and the combination of the factor levels that produce the most desirable performance. In this case the presence of multiple objectives poses a challenge for analysis of results. Inability to treat multiple criteria of evaluations (measure of multiple performance objectives) often renders some planned experiments ineffective.
Combining multiple criteria of evaluation into a single number is quite common practice in academic institutions and sporting events. Consider the method of expressing a Grade Point Average (GPA, a single number) as an indicator of a student's academic performance. The GPA is simply determined by averaging the grades of all courses (the individual criteria of evaluation, such as scores in Math, Physics, or Chemistry) which the student completes. Another example is a sporting event like a figure skating competition, where all performers are rated on a scale of 0 to 6. The performer who receives 5.92 wins over another whose score is 5.89. How do the judges come up with these scores? People judging such events follow and evaluate each performer on an agreed-upon list of items (criteria of evaluation) such as style, music, height of jump, stability of landing, etc. Perhaps each item is scored on a scale of 0 - 6; the scores of all judges are then averaged to come up with the final score.
If academic performances and athletic abilities can be evaluated by multiple criteria and expressed in terms of a single number, then why isn't it commonly done in engineering and science? There is no good reason why it should not be. With a little extra effort in data reduction, multiple criteria can easily be incorporated into most experimental data analysis schemes.
To understand the extra work necessary, let us examine how scientific evaluations differ from those of student achievement or of an athletic event. In academic as well as athletic performances, all individual evaluations are compiled in the same way, say on a scale of 0 - 4 (in the case of a student's grades, there are no units). They also carry the same Quality Characteristic (QC), or sense of desirability (the higher the score, the better), and the same Relative Weight (level of importance). Individual evaluations (like the grades in individual courses) can simply be added as long as their (a) units of measurement, (b) sense of desirability, and (c) relative weight (importance) are the same for all courses (criteria). Unfortunately, in most engineering and scientific evaluations, the individual criteria are likely to have different units of measurement, Quality Characteristics, and relative weights. Therefore, a method specific to the application, one that overcomes the difficulties posed by the differences among the criteria of evaluation, must be devised.
Units of Measurement
Unlike GPA or figure skating scores, the criteria of evaluation in engineering and science generally have different units of measurement. For example, in an effort to select a better automobile, the selection criteria may consist of: fuel efficiency measured in Miles/Gallon, engine output measured in Horsepower, reliability measured in Defects/1000, etc. When the units of measurement for the criteria are different, the criteria cannot be combined easily. To better understand these difficulties, consider a situation where we are to evaluate two industrial pumps of comparable performance (see Table 1). Based on 60% priority on higher discharge pressure and 40% on lower operating noise, which pump would we select?
Table 1. Performance of Two Brands of Industrial Pumps
Pump A delivers more pressure but is noisier. Pump B has slightly lower pressure but is quieter. What can we do with the evaluation numbers? Could we add them? If we were to add them, what units would the resulting number have? Would the totals be of any use? Is Pump A, with a total of 250, better than Pump B?
Obviously, the addition of numbers (evaluations) with different units of measurement is not permissible. If such numbers are added, the total serves no useful purpose, as we have no units to assign to it, nor do we know whether a bigger or smaller value is better. If the evaluations are to be added, they must first be made dimensionless (normalized). This can easily be done by dividing all evaluations of a criterion (such as 160 psi and 140 psi) by a fixed number (such as 200 psi), so that the resulting numbers are unitless fractions.
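As an illustrative sketch of this normalization step (the pump names and the 200 psi divisor come from the text; the code structure itself is hypothetical):

```python
# Normalize pressure evaluations (psi) into dimensionless fractions
# by dividing each by a fixed reference value (here 200 psi).
pressures_psi = {"Pump A": 160, "Pump B": 140}
reference_psi = 200
normalized = {pump: p / reference_psi for pump, p in pressures_psi.items()}
print(normalized)  # {'Pump A': 0.8, 'Pump B': 0.7}
```

Once normalized this way, evaluations that originally carried different units can participate in a common total.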
Quality Characteristic (QC)
Just because two numbers have the same units, or no units, does not necessarily mean they can be meaningfully added. Consider the following two players' scores (see Table 2) and attempt to determine which player is better.
Table 2. Golf and Basketball Scores of Two Players

Criterion         Player 1   Player 2   Quality Characteristic (QC)
Golf (9 holes)       45         55      Smaller is better
Basketball           30         20      Bigger is better

Observe that the total of the scores for Player 1 (45 + 30) is 75 and for Player 2 (55 + 20) is also 75. Are these two players of equal caliber? Is the addition of the scores meaningful and logical?

Unfortunately, the totals do not reflect the degree by which Player 1 is superior to Player 2 (a score of 45 is better than 55 in Golf, and a score of 30 is better than 20 in Basketball). The totals are meaningful only when the QC's of both criteria are made the same before the scores are added together.
One way to combine the two scores is to first change the QC of the Golf score by subtracting it from a fixed number, say 100, and then add the result to the Basketball score, each weighted equally (0.50). The new total scores then are:
Overall score for Player 1 = 30 x 0.50 + (100 - 45) x 0.50 = 42.5
Overall score for Player 2 = 20 x 0.50 + (100 - 55) x 0.50 = 32.5
The overall scores indicate the relative merit of the players. Player 1, with a score of 42.5, is the better sportsman compared to Player 2, who has a score of 32.5.
In formulating the GPA, the grades of all courses are weighted the same. This approach is generally not valid in scientific studies. For the two players in the earlier example, skills in Golf and Basketball were weighted equally, so the relative weights did not influence the judgment about their skills in the games. If the relative weights are not the same for all objectives, the contribution from each individual criterion of evaluation must be multiplied by its respective relative weight. For example, if Golf had a relative weight of 60% and Basketball 40%, the computation of the overall scores must reflect the influence of the relative weights as follows:
Overall score for Player 1 = 30 x 0.40 + (100-45) x 0.60 = 45
Overall score for Player 2 = 20 x 0.40 + (100-55) x 0.60 = 35
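The alignment-and-weighting recipe above can be sketched in Python (the function name and parameter defaults are my own; the 100-point flip and the 60/40 weights are from the example):

```python
def overall_score(golf, basketball, w_golf=0.60, w_basketball=0.40):
    # Golf is "smaller is better": flip its QC by subtracting the score
    # from a fixed number (100) so that bigger is better for both criteria.
    golf_aligned = 100 - golf
    # Multiply each aligned score by its relative weight (weights sum to 1).
    return basketball * w_basketball + golf_aligned * w_golf

print(overall_score(golf=45, basketball=30))  # Player 1
print(overall_score(golf=55, basketball=20))  # Player 2
```

These calls reproduce the overall scores of 45 and 35 computed above.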
The Relative Weight is a subjective number assigned to each individual criterion of evaluation. Generally, it is determined by the team during the experiment planning session and is assigned such that the total of all weights is 100 (a convention set arbitrarily).
Thus, once the preceding general concerns are addressed, the criteria of evaluation for any product or process performance can be combined into a single number, as demonstrated in the following application example.
An Example Application
A group of process engineers and researchers involved in manufacturing baked food products planned an experiment to determine the "best" recipe for one of their current brands of cake. Surveys showed that the "best" cake is judged on taste, moistness, and smoothness as rated by customers. The traditional approach has been to decide the recipe based on one criterion (say taste) at a time. Experience, however, has shown that when the recipe is optimized based on one criterion, subsequent analyses using the other criteria do not necessarily produce the same recipe. When the recommended ingredients differ, optimizing the final recipe becomes a difficult task. Arbitrary or subjectively optimized recipes had not brought the desired customer satisfaction. The group therefore decided to follow a path of consensus decision and to carefully devise a scientific scheme that incorporates all criteria of evaluation simultaneously into the analysis process.
In the planning session convened for the cake baking experiment, and from subsequent reviews of experimental data, the applicable evaluation criteria and their characteristics, shown in Table 3, were identified. Taste, being a subjective criterion, was to be evaluated using a number between 0 and 12, with 12 assigned to the best-tasting cake. Moistness was to be measured by weighing a standard-size cake and noting its weight in grams. It was the consensus that a weight of about 40 grams represents the most desirable moistness, which makes its Quality Characteristic "nominal is best." In this evaluation, results above and below the nominal are considered equally undesirable. Smoothness was measured by counting the number of voids in the cake, which makes this a "smaller is better" (QC) evaluation. The relative weights were assigned such that the total was 100. The notations X1, X2, and X3, as shown in Table 3, are used to represent the evaluations of any arbitrary sample cake.
Table 3. Evaluation Criteria for Cake Baking Experiments

Criterion (Notation)   Worst Reading          Best Reading        Quality Characteristic (QC)   Relative Weight
Taste (X1)             0                      12                  Bigger is better              55
Moistness (X2)         15 grams off nominal   40 grams (nominal)  Nominal is best               20
Smoothness (X3)        8 voids                2 voids             Smaller is better             25
Two sample cakes were baked following the two separate recipes under study. The performance evaluations for the two samples are shown in Table 4. Note that each sample is evaluated by all three criteria of evaluation (taste, moistness, and smoothness). The OEC for each sample is created by combining the individual evaluations into a single number (OEC = 66 for sample 1), which represents the overall performance of the sample cake and can be compared for relative merit. In this case, cake sample #1, with an OEC of 66, is slightly better than sample #2, with an OEC of 64.
Table 4. Trial #1 Evaluations
To examine how the OEC of the cake samples is formulated, note that the individual sample evaluations were combined by "appropriate normalization." The term normalization refers to the act of reducing the individual evaluations to dimensionless quantities, aligning their quality characteristics in a common direction (commonly bigger is better), and allowing each criterion to contribute in proportion to its relative weight. The OEC equation appropriate for the cake baking project is:

OEC = (X1/12) x 55 + (1 - |X2 - 40|/15) x 20 + (1 - (X3 - 2)/6) x 25
The contribution of each criterion is first turned into a fraction (a dimensionless quantity) by dividing the evaluation by a fixed number, such as the difference between the best and the worst among the respective sample evaluations (12 - 0 for Taste; see Table 3). The numerator is the evaluation reduced by the smaller in magnitude of the Worst and Best evaluations in the case of Bigger and Smaller QC's, and by the Nominal value in the case of a Nominal QC. The contributions of the individual criteria are then multiplied by their respective Relative Weights (55, 20, etc.). Because the Relative Weights are applied as fractions of 100, the OEC values are assured to fall within 0 - 100.
Since Criterion 1 (Taste) has the highest Relative Weight, all other criteria are aligned to have a Bigger QC. In the case of a Nominal QC, as is the case for Moistness (the second term in the equation above), the evaluation is first reduced to its deviation from the nominal value (X2 - nominal value). An evaluation reduced to a deviation naturally becomes a Smaller QC. The contributions from Smoothness and Moistness, both of which now have a Smaller QC, are aligned with the Bigger QC by subtracting the normalized fraction from 1. An example calculation of the OEC using the evaluations of cake sample #1 (see Table 4) follows.
Trial 1, Sample 1 (X1 = 9, X2 = 34.19, X3 = 5):

OEC = (9/12) x 55 + (1 - (40 - 34.19)/15) x 20 + (1 - (5 - 2)/6) x 25
    = 41.25 + 12.25 + 12.5 = 66 (shown in Table 4)
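The nominal-is-best alignment used in the Moistness term can be sketched as follows (the variable names are mine; the nominal value of 40 grams and the maximum deviation of 15 come from the example):

```python
# Align a nominal-is-best evaluation with a bigger-is-better scale:
# 1. reduce the reading to its deviation from the nominal value
#    (a deviation is naturally smaller-is-better),
# 2. normalize the deviation by the maximum expected deviation,
# 3. subtract from 1 so that bigger becomes better.
nominal, max_deviation = 40, 15   # grams
x2 = 34.19                        # sample 1 moistness reading
aligned = 1 - abs(x2 - nominal) / max_deviation
print(round(aligned, 3))  # 0.613
```

Multiplying this fraction by the relative weight of 20 gives the 12.25 contribution seen in the calculation above.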
Similarly, the OEC for the second sample is calculated to be 64. The OEC values are considered as the "Results" for the purposes of the analysis of the results of designed experiments.
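Putting the pieces together, a minimal Python sketch of the full OEC computation might look as follows (the function name and structure are my own; the ranges, nominal value, and weights are those of Table 3):

```python
def oec(x1, x2, x3):
    # Taste: 0 (worst) to 12 (best), bigger is better, weight 55.
    taste = (x1 / 12) * 55
    # Moistness: nominal 40 grams, maximum deviation 15, weight 20.
    moistness = (1 - abs(x2 - 40) / 15) * 20
    # Smoothness: 2 voids (best) to 8 voids (worst), smaller is better, weight 25.
    smoothness = (1 - (x3 - 2) / 6) * 25
    return taste + moistness + smoothness

print(round(oec(9, 34.19, 5)))   # cake sample 1 -> 66
```

Because each weighted term lies between 0 and its relative weight, and the weights total 100, the OEC always falls between 0 and 100.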
The OEC concept was first published by the author in the reference text in 1989. Since then it has been successfully utilized in numerous industrial experiments, particularly those following the Taguchi approach of experimental design. The OEC scheme has been found to work well for all kinds of experimental studies, regardless of whether they utilize formally designed experiments.
1. Roy, Ranjit K., A Primer on The Taguchi Method, Society of Manufacturing Engineers, P.O. Box 6028, Dearborn, Michigan, USA 48121, ISBN: 087263468X.
2. Roy, Ranjit K., Design of Experiments Using the Taguchi Approach: 16 Steps to Product and Process Improvement, John Wiley & Sons, January 2001, ISBN: 0471361011.
3. Qualitek-4 Software for Automatic Design and Analysis of Taguchi Experiments, http://rkroy.com/wp-q4w.html