The MUCM project concerns uncertainties in the predictions made by models. A model is a description of a real process, using mathematical equations. Usually, a computer is used to compute or solve these equations to produce the model predictions. We think of these as the outputs of the model. The model also has inputs of various kinds, which are numbers to put into the equations. For example, a model to forecast the weather is based on very complex equations describing the movement of the air at various altitudes, the formation of clouds, and so on. The numbers to be put into the model include the current state of the atmosphere, the temperature of the air at different locations and altitudes, physical constants used in the equations, and so on.
Any model is an imperfect representation of reality, and its predictions are imperfect. The predictions can be wrong because the equations are wrong, they have the wrong numbers in them, or the computer program is solving them inaccurately. In practice, all of these imperfections are present to some degree. As a result, we may expect the true real-world value corresponding to the model output to be close to the model prediction, but there is uncertainty about its precise value.
Even if we can quantify all uncertainties in model inputs and structure, deriving appropriate measures of output uncertainty is a complex task. One well-established approach to this problem of uncertainty analysis is to propagate input uncertainty through the model by Monte Carlo simulation. However, this typically requires tens of thousands of model runs, each with different randomly sampled inputs, which is impractical for complex models. For any model that takes more than a few seconds of computer time per run, a thorough Monte Carlo sensitivity analysis becomes infeasible.
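As a rough illustration of Monte Carlo propagation, the following Python sketch samples uncertain inputs, runs a model once per sample, and summarises the spread of the outputs. Everything specific here is an assumption made for illustration: `toy_simulator` is a hypothetical, cheap stand-in for a real model, and the independent normal input distributions and the run count are arbitrary choices.

```python
import numpy as np


def toy_simulator(x):
    """Hypothetical stand-in for an expensive model: two inputs, one output."""
    return np.sin(x[..., 0]) + 0.5 * x[..., 1] ** 2


def monte_carlo_propagation(simulator, n_runs, seed=0):
    """Propagate input uncertainty by sampling inputs and running the model."""
    rng = np.random.default_rng(seed)
    # Assumed input uncertainty: two independent normal distributions.
    inputs = rng.normal(loc=[1.0, 0.5], scale=[0.2, 0.1], size=(n_runs, 2))
    # One model run per sampled input (vectorised here because the toy is cheap).
    outputs = simulator(inputs)
    # Summarise output uncertainty by the sample mean and standard deviation.
    return outputs.mean(), outputs.std(ddof=1)


mc_mean, mc_std = monte_carlo_propagation(toy_simulator, n_runs=20_000)
print(f"output mean ~ {mc_mean:.3f}, output std ~ {mc_std:.3f}")
```

With a toy function the 20,000 runs finish almost immediately. If each run instead took even a few seconds, the same loop would need many hours of computing, which is the practical obstacle described above.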
This project focuses on new methods that are orders of magnitude more efficient than Monte Carlo, typically requiring just a few hundred model runs, and so provide very significant productivity gains for the researchers or analysis teams involved. Furthermore, these methods can address several related but more demanding tasks that are important to modellers and model users, usually without requiring additional model runs.
The underlying science draws on Bayesian statistics and Gaussian process modelling. A statistical representation of the simulator (the computer implementation of the model), known as a meta-model or emulator, is built using a sample of training runs. The Gaussian process method is analogous to regression modelling or neural networks, but is more flexible, more accurate and more efficient than these methods in challenging problems where there is limited information about the simulator. The emulator is not just a very accurate approximation to the simulator itself, but also incorporates a statistically validated description of its own accuracy.
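To make the idea of an emulator concrete, here is a minimal Python sketch of Gaussian process prediction from a handful of training runs. It is a sketch under strong simplifying assumptions, not the project's own method: `toy_simulator` is a hypothetical one-input stand-in, the prior mean is taken to be zero, and the squared-exponential covariance uses fixed, hand-picked hyperparameters that in practice would be estimated from the training runs.

```python
import numpy as np


def toy_simulator(x):
    """Hypothetical stand-in for an expensive simulator with one input."""
    return np.sin(3.0 * x) + x


def sq_exp_kernel(a, b, variance=1.0, length_scale=0.3):
    """Squared-exponential covariance; hyperparameters are assumed, not fitted."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def fit_emulator(x_train, y_train, jitter=1e-8):
    """Factorise the training covariance once; jitter aids numerical stability."""
    K = sq_exp_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return L, alpha


def predict(x_new, x_train, L, alpha):
    """Posterior mean and variance of the zero-mean Gaussian process at x_new."""
    K_s = sq_exp_kernel(x_new, x_train)
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = sq_exp_kernel(x_new, x_new).diagonal() - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative rounding errors


# Ten training runs of the simulator are all the emulator sees.
x_train = np.linspace(0.0, 2.0, 10)
y_train = toy_simulator(x_train)
L, alpha = fit_emulator(x_train, y_train)

# Predict at new inputs, together with the emulator's own uncertainty.
x_new = np.linspace(0.0, 2.0, 50)
em_mean, em_var = predict(x_new, x_train, L, alpha)
```

The predictive variance returned alongside the mean is what distinguishes an emulator from a plain curve fit: it shrinks towards zero at the training inputs and grows between them, giving a statement of how far the prediction can be trusted.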
The emulator runs essentially instantaneously, making intensive exploration of the model and the consequences of uncertainty in inputs and model structure feasible for even highly complex models. Its mathematical form is also simple, so that in many cases the results of complex analyses of simulator output, such as sensitivity analysis, can be predicted analytically without needing to ‘run’ the emulator. In other situations, the analyses can be performed very much more quickly by running the emulator as a surrogate for the simulator, which may make feasible analyses that would otherwise be impossible because of computational intensity.