TY - JOUR

T1 - Optimal projection of observations in a Bayesian setting

AU - Giraldi, Loic

AU - Le Maître, Olivier P.

AU - Hoteit, Ibrahim

AU - Knio, Omar

N1 - KAUST Repository Item: Exported on 2021-02-23
Acknowledged KAUST grant number(s): CRG3-2156, OSR-2016-RPP-3268
Acknowledgements: This work is supported by King Abdullah University of Science and Technology Awards CRG3-2156 and OSR-2016-RPP-3268.

PY - 2018/3/18

Y1 - 2018/3/18

N2 - Optimal dimensionality reduction methods are proposed for the Bayesian inference of a Gaussian linear model with additive noise in the presence of overabundant data. Three different optimal projections of the observations are proposed based on information theory: the projection that minimizes the Kullback–Leibler divergence between the posterior distributions of the original and the projected models, the one that minimizes the expected Kullback–Leibler divergence between the same distributions, and the one that maximizes the mutual information between the parameter of interest and the projected observations. The first two optimization problems are formulated as the determination of an optimal subspace, and the solution is therefore computed using Riemannian optimization algorithms on the Grassmann manifold. Regarding the maximization of the mutual information, it is shown that there exists an optimal subspace that minimizes the entropy of the posterior distribution of the reduced model; that a basis of the subspace can be computed as the solution to a generalized eigenvalue problem; that an a priori error estimate on the mutual information is available for this particular solution; and that the dimensionality of the subspace required to exactly conserve the mutual information between the input and the output of the models is less than the number of parameters to be inferred. Numerical applications to linear and nonlinear models are used to assess the efficiency of the proposed approaches and to highlight their advantages compared to standard approaches based on the principal component analysis of the observations.

UR - http://hdl.handle.net/10754/627354

UR - http://www.sciencedirect.com/science/article/pii/S0167947318300501

UR - http://www.scopus.com/inward/record.url?scp=85044954882&partnerID=8YFLogxK

DO - 10.1016/j.csda.2018.03.002

M3 - Article

AN - SCOPUS:85044954882

VL - 124

SP - 252

EP - 276

JO - Computational Statistics & Data Analysis

JF - Computational Statistics & Data Analysis

SN - 0167-9473

ER -