## UC Berkeley / Lawrence Berkeley Laboratory

#### Cost-efficient surrogate modeling of expensive simulators for scientific discovery

**Simon Mak, Duke University**

Scientific modeling is at a defining crossroads. With breakthroughs in computational technology, complex phenomena (*e.g.*, the expansion of the universe, space flight) can now be reliably simulated at high fidelity. However, generating such data often entails large computing costs, which limits the data available for scientific investigation. Surrogate models have emerged as a powerful tool for facilitating timely scientific progress. Such models are trained on a carefully designed set of simulation runs, and provide an efficient predictor (or “emulator”) for the costly scientific simulator. As simulators become more complex, however, training data becomes highly expensive to generate and is thus limited; in such a setting, existing surrogate models can yield poor predictions with poorly calibrated uncertainties.

We propose two novel surrogate models for tackling this critical challenge. The first model, called the Additive Multi-Index Gaussian process (AdMIn-GP), leverages a flexible additive structure on low-dimensional embeddings of the parameter space. This is guided by prior knowledge that the simulator is dominated by multiple distinct physical phenomena (*i.e.*, multi-physics), each involving a small number of latent parameters. The AdMIn-GP models such embedded structures within a flexible Bayesian nonparametric framework, which facilitates efficient model fitting via a carefully constructed variational inference approach with inducing points. The second, called the CONglomerate multi-FIdelity Gaussian process (CONFIG) model, makes use of data simulated at multiple fidelities (or accuracies) for cost-efficient emulator training. CONFIG embeds the multi-fidelity form of this training data within a novel non-stationary covariance function, which captures prior information on the numerical convergence rates of the simulator. We then demonstrate the improvement of our models over the state of the art in a suite of numerical experiments and in our motivating application on emulating the evolution of the quark-gluon plasma, which is theorized to have filled the Universe shortly after the Big Bang.
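
To make the additive multi-index idea concrete, the following is a minimal sketch of a covariance function built as a sum of kernels, each acting on a low-dimensional linear projection of the inputs. It is an illustration under stated assumptions, not the speaker's implementation: the squared-exponential kernel, the fixed projection matrices, and all names (`rbf`, `project`, `additive_multi_index_kernel`) are hypothetical choices. The abstract describes the embeddings as part of the fitted model, so they would not be fixed by hand as they are here.

```python
import math

def rbf(u, v, lengthscale=1.0):
    """Squared-exponential kernel between two low-dimensional points (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-0.5 * sq_dist / lengthscale ** 2)

def project(A, x):
    """Apply a projection matrix A (list of rows) to an input vector x."""
    return [sum(a_i * x_i for a_i, x_i in zip(row, x)) for row in A]

def additive_multi_index_kernel(x, y, projections, lengthscales):
    """Sum of kernels, one per embedding: k(x, y) = sum_j k_j(A_j x, A_j y)."""
    return sum(
        rbf(project(A, x), project(A, y), ls)
        for A, ls in zip(projections, lengthscales)
    )

# Hypothetical example: a 4-D parameter space with two embedded components,
# one depending on (x1, x2) and one depending only on the sum x3 + x4.
projections = [
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],  # 2-D embedding
    [[0.0, 0.0, 1.0, 1.0]],                        # 1-D embedding
]
lengthscales = [1.0, 0.5]

x = [0.1, 0.2, 0.3, 0.4]
y = [0.1, 0.2, 0.8, -0.1]   # differs from x only along a direction both embeddings ignore
z = [0.9, 0.2, 0.3, 0.4]    # differs from x in the first embedded coordinate

k_xx = additive_multi_index_kernel(x, x, projections, lengthscales)
k_xy = additive_multi_index_kernel(x, y, projections, lengthscales)
k_xz = additive_multi_index_kernel(x, z, projections, lengthscales)
```

In this sketch, `x` and `y` differ only along a direction that neither projection sees, so the kernel treats them as essentially identical, whereas `z` moves along an embedded direction and its covariance with `x` drops. This is the sense in which a small number of latent parameters per component can reduce the effective dimension of the emulation problem.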