-
Revisiting Penalized Likelihood Estimation for Gaussian Processes
Authors:
Ayumi Mutoh,
Annie S. Booth,
Jonathan W. Stallrich
Abstract:
Gaussian processes (GPs) are popular as nonlinear regression models for expensive computer simulations, yet GP performance relies heavily on estimation of unknown covariance parameters. Maximum likelihood estimation (MLE) is common, but it can be plagued by numerical issues in small-data settings. The addition of a nugget helps but is not a cure-all. Penalized likelihood methods may improve upon traditional MLE, but their success depends on tuning-parameter selection. We introduce a new cross-validation (CV) metric, "decorrelated prediction error" (DPE), within the penalized likelihood framework for GPs. Inspired by the Mahalanobis distance, DPE provides more consistent and reliable tuning-parameter selection than traditional metrics such as prediction error, particularly for $K$-fold CV. Our proposed metric performs comparably to standard MLE when penalization is unnecessary and outperforms traditional tuning-parameter selection metrics in scenarios where regularization is beneficial, especially under the one-standard-error rule.
Submitted 22 November, 2025;
originally announced November 2025.
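The abstract does not give a formula for DPE, but its Mahalanobis inspiration suggests whitening held-out residuals by the predictive covariance before averaging. A minimal sketch under that assumption (the function name, jitter, and toy data are illustrative choices, not the authors' code):

```python
import numpy as np

def decorrelated_prediction_error(y, mu, cov, jitter=1e-10):
    # Whiten held-out residuals by the Cholesky factor of the GP's
    # predictive covariance, then average the squared decorrelated
    # residuals (a Mahalanobis-style prediction error).
    L = np.linalg.cholesky(cov + jitter * np.eye(len(y)))  # jitter aids stability
    r = np.linalg.solve(L, y - mu)                         # decorrelated residuals
    return float(np.mean(r ** 2))

# Sanity check: with an identity predictive covariance, DPE reduces
# to the ordinary mean squared prediction error.
y = np.array([1.0, 2.0, 3.0])
mu = np.array([1.1, 1.8, 3.2])
dpe = decorrelated_prediction_error(y, mu, np.eye(3))
print(dpe)  # ≈ 0.03, the MSE of these residuals
```

When the predictive covariance is far from diagonal, the whitening step is what distinguishes this metric from plain prediction error in a $K$-fold split.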
-
Influence of Prior Distributions on Gaussian Process Hyperparameter Inference
Authors:
Ayumi Mutoh,
Junoh Heo
Abstract:
Gaussian processes (GPs) are widely used metamodels for approximating expensive computer simulations, particularly in engineering design and spatial prediction. However, their performance can deteriorate significantly when covariance parameters are poorly estimated, highlighting the importance of accurate inference. The most common approach involves maximizing the marginal likelihood, yielding point estimates of these parameters. This approach, however, is highly sensitive to initialization and optimization settings. An alternative is to adopt a fully Bayesian hierarchical framework, in which the posterior distribution over the covariance parameters is inferred. This approach provides more robust uncertainty quantification and reduces sensitivity to parameter selection. Yet a key challenge lies in the careful specification of prior distributions for these parameters. While many available software packages provide default priors, their influence on model behavior is often underexplored. The choice of proposal distributions can also influence sampling efficiency and convergence. In this paper, we examine how different prior and proposal distributions over the lengthscale parameters $\theta$ affect predictive performance in a hierarchical GP model, using both simulated and real data experiments. By evaluating various types of priors and proposals, we aim to better understand their influence on predictive accuracy and uncertainty quantification.
Submitted 13 November, 2025;
originally announced November 2025.
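The core object of study here is the posterior over the lengthscale $\theta$ under some chosen prior. As a minimal illustration (not the authors' setup), the sketch below grid-evaluates an unnormalized log posterior for a squared-exponential kernel under a Gamma prior; the kernel, prior shape, noise level, and toy data are all illustrative assumptions:

```python
import numpy as np

def sq_exp_kernel(X, lengthscale, variance=1.0):
    # Squared-exponential covariance on 1-D inputs
    d2 = (X[:, None] - X[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def log_marginal_likelihood(y, X, lengthscale, noise=1e-4):
    # Standard GP log marginal likelihood via a Cholesky factorization
    K = sq_exp_kernel(X, lengthscale) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * len(y) * np.log(2 * np.pi))

def log_gamma_prior(theta, shape=2.0, rate=1.0):
    # Unnormalized log density of a Gamma(shape, rate) prior on theta
    return (shape - 1) * np.log(theta) - rate * theta

# Grid evaluation of the (unnormalized) log posterior over lengthscales
rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * X) + 0.05 * rng.standard_normal(10)
grid = np.linspace(0.05, 1.0, 50)
log_post = [log_marginal_likelihood(y, X, t) + log_gamma_prior(t) for t in grid]
best = float(grid[int(np.argmax(log_post))])
```

Swapping `log_gamma_prior` for another density (or feeding `log_post` to an MCMC proposal) is exactly the kind of comparison the paper's experiments carry out at scale.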
-
Comprehensive cluster validity Index based on structural simplicity
Authors:
Anri Mutoh,
Masamichi Wada,
Kou Amano
Abstract:
Nonhierarchical clustering based on unsupervised algorithms may not retrieve the optimal partition of a dataset. Whether clusters fit "natural partitions" can be assessed using cluster validity indices (CVIs). Most existing CVIs consider criteria such as cohesion, separation, and their equivalents. However, these binary relations may provide neither the optimal measure of partition suitability nor reference values corresponding to the worst partition. Moreover, previous CVI studies have focused mostly on fitting correct partitions according to researchers' a priori assumptions. In contrast, we investigated desirable properties of CVIs, namely invariance to scale and shift transformations, optimal clustering, and unbiased clustering together with representation of the worst partition. We then conducted experiments to evaluate whether existing CVIs fulfill these properties. As none of them did, we propose the simplicity index, which measures the simplicity of tree structures within clusters. The simplicity index is the unique index invariant to the "correct rate" and provides both a reference value indicating the most complex partition and a best value indicating the simplest one.
Submitted 2 June, 2019;
originally announced June 2019.
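The simplicity index itself is not specified in the abstract, so no sketch of it is attempted here. As a concrete instance of the cohesion/separation-style CVIs the paper critiques, here is the standard Calinski-Harabasz index implemented from its usual definition (the toy data and labelings are illustrative):

```python
import numpy as np

def calinski_harabasz(X, labels):
    # Classic cohesion/separation CVI: ratio of between-cluster to
    # within-cluster dispersion; larger values indicate a better partition.
    labels = np.asarray(labels)
    n, k = len(X), len(np.unique(labels))
    overall = X.mean(axis=0)
    between = within = 0.0
    for c in np.unique(labels):
        Xc = X[labels == c]
        centroid = Xc.mean(axis=0)
        between += len(Xc) * np.sum((centroid - overall) ** 2)
        within += np.sum((Xc - centroid) ** 2)
    return (between / (k - 1)) / (within / (n - k))

# Two well-separated blobs: the natural 2-way split scores far higher
# than an arbitrary mixed labeling.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (5, 2)),
               rng.normal(10.0, 0.1, (5, 2))])
good = calinski_harabasz(X, [0] * 5 + [1] * 5)
bad = calinski_harabasz(X, [0, 1] * 5)
```

Note that this index has no fixed reference value for the worst partition, which is precisely the shortcoming of such binary-relation criteria that motivates the proposed simplicity index.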