Subjective Reliability Assessment

In 1974 the abandonment of construction on a section of Interstate Highway 95 provided an extraordinary opportunity to test the reliability of expert opinions quantitatively.

Engineers must work with uncertainty every day. Engineering analyses usually require assumptions or estimates that may not themselves be reliable. Typical tools for dealing with uncertainty are probability, statistics and decision analysis. Decision analysis can take subjective judgements into account, but these are often compromised by subjective biases. Typical sources of error are overconfidence, hedging (the opposite of overconfidence), improper updating of judgements when new information becomes available, anchoring, and the influence of consequences (liability, for example).

Research in geotechnical engineering at the Massachusetts Institute of Technology included a field test in which additional fill was placed on an existing embankment to cause large-scale deformations, to the point of failure, in the clay foundation. During construction, as the height of the test section increased, pore pressures and deformation of the clay foundation were carefully monitored. In addition, tests were conducted on the bank and foundation materials.

The results of these tests were made available to several internationally known geotechnical engineers (the predictors), who were then challenged to predict the fill height at failure, as well as some other values, such as the settlement after six feet of fill had been added. In addition to a predicted value, they were asked to provide quantitative confidence estimates. Where a predictor gave this information, it was used to calculate 25% and 75% quartile values. Given the quantity and quality of the information available, the prior experience of the predictors, and the variety of analyses used, this exercise presented a fairly realistic picture of the state of the art in prediction at the time.
The predictions and the true values were presented at a symposium held at MIT in November 1974 and are contained in an MIT research report. At the symposium, members of the audience were invited to make their own predictions. Ten predictors responded. Their "best estimates" of added height to failure differ by as much as a factor of 3 and have a coefficient of variation of 0.39. The average is 16.2 feet. The following figure shows the "best estimate" and the interquartile range for each predictor (where a range could be calculated). That is, the predictor expects a 25% probability that the true value lies below the bottom of the bar and a 25% probability that it lies above the top of the bar. If the predictors were well calibrated, about half of the bars would contain the true value, but none of them does.
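The two statistics used above, the coefficient of variation of the best estimates and the share of interquartile ranges that contain the true value, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are invented here, the input numbers are hypothetical and are not the values from the MIT report, and the last function treats the predictors as independent, which is doubtful since they all worked from the same test data.

```python
from statistics import mean, stdev


def coefficient_of_variation(estimates):
    """Sample standard deviation divided by the mean."""
    return stdev(estimates) / mean(estimates)


def interquartile_hit_rate(intervals, true_value):
    """Fraction of (q25, q75) intervals that contain true_value."""
    hits = sum(1 for low, high in intervals if low <= true_value <= high)
    return hits / len(intervals)


def prob_no_hits(n_intervals, coverage=0.5):
    """Chance that none of n independent intervals contains the true value,
    if each interval really has the stated coverage (0.5 for an
    interquartile range)."""
    return (1.0 - coverage) ** n_intervals


# Hypothetical values for illustration only; NOT data from the report.
best_estimates = [10.0, 12.0, 15.0, 20.0, 24.0]  # feet of added fill
intervals = [(9.0, 11.0), (11.0, 13.5), (13.0, 17.0), (18.0, 22.0), (22.0, 26.0)]
true_value = 17.5  # hypothetical "actual" height at failure

print("coefficient of variation:", round(coefficient_of_variation(best_estimates), 2))
print("interquartile hit rate:", interquartile_hit_rate(intervals, true_value))
print("chance of zero hits if calibrated:", prob_no_hits(len(intervals)))
```

A hit rate near 0.5 would indicate well-calibrated interquartile ranges. A hit rate of zero, as reported for the predictors, becomes less and less likely under good calibration as the number of ranges grows, which is why it points to overconfidence rather than bad luck.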
The audience had the benefit of hearing the presentations of the predictors. Their interquartile ranges captured the true value 62% of the time. Perhaps they were impressed by the wide range of the predictors' values and broadened their own ranges too much.

The predictors were also asked to give minimum and maximum values. Seven responded, but the actual value was contained in the ranges given by only three. Twenty-six members of the audience also responded, but the real value was outside the ranges given by 9 of them, over a third. Oddly enough, there seems to have been some confusion about the terms "minimum" and "maximum": some predictors specified values very close to their interquartile ranges.

The predictors displayed an unjustified confidence in their estimates. Each expected the true value to be much closer to the prediction than it really was. The exercise therefore points up the need to express predictions not as a single value, but as a probabilistically scored set of values.

This came from Massachusetts Institute of Technology, Dept. of Civil Engineering, Research Report R75-42, Subjective Reliability Assessment in Geotechnical Engineering: I-95 Embankment Case Study, by Mary Ellen Hynes and Erik H. Vanmarcke, October 1975. This was sponsored in part by the National Science Foundation under Grant No. GK-41775.