Dataset Information

Prior-Preconditioned Conjugate Gradient Method for Accelerated Gibbs Sampling in "Large n, Large p" Bayesian Sparse Regression.


ABSTRACT: In a modern observational study based on healthcare databases, the numbers of observations and of predictors are typically on the order of 10^5-10^6 and 10^4-10^5, respectively. Despite the large sample size, the data rarely provide sufficient information to reliably estimate such a large number of parameters. Sparse regression techniques provide potential solutions, one notable approach being the Bayesian method based on shrinkage priors. In the "large n and large p" setting, however, the required posterior computation encounters a bottleneck at repeated sampling from a high-dimensional Gaussian distribution, whose precision matrix Φ is expensive to compute and factorize. In this article, we present a novel algorithm to speed up this bottleneck step based on the following observation: we can cheaply generate a random vector b such that the solution to the linear system Φβ = b has the desired Gaussian distribution. We can then solve the linear system by the conjugate gradient (CG) algorithm through matrix-vector multiplications by Φ; this involves no explicit factorization or calculation of Φ itself. Rapid convergence of CG in this context is guaranteed by the theory of prior-preconditioning we develop. We apply our algorithm to a clinically relevant large-scale observational study with n = 72,489 patients and p = 22,175 clinical covariates, designed to assess the relative risk of adverse events from two alternative blood anti-coagulants. Our algorithm demonstrates an order of magnitude speed-up in posterior inference, in our case cutting the computation time from two weeks to less than a day. Supplementary materials for this article are available online.
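The sampling idea in the abstract can be illustrated with a small sketch. The abstract does not give the form of Φ or b, so everything specific here is an assumption: the sketch supposes the common Gaussian conditional posterior of a weighted linear model, with Φ = XᵀΩX + D, where Ω is a diagonal matrix of observation weights and D is a diagonal prior precision, and it takes "prior-preconditioning" to mean using D as the CG preconditioner. The names `sample_b`, `phi_matvec`, and `prior_preconditioned_cg` are hypothetical and do not come from the authors' software.

```python
import numpy as np


def sample_b(X, omega, y, prior_prec, rng):
    """Draw b with mean X^T Omega y and covariance Phi = X^T Omega X + D.

    Assumed model (not stated in the abstract): omega holds the diagonal of
    Omega and prior_prec holds the diagonal of the prior precision D.
    If Phi beta = b, then beta ~ N(Phi^{-1} X^T Omega y, Phi^{-1}).
    """
    n, p = X.shape
    eta = rng.standard_normal(n)    # noise contributing covariance X^T Omega X
    delta = rng.standard_normal(p)  # noise contributing covariance D
    return (X.T @ (omega * y)
            + X.T @ (np.sqrt(omega) * eta)
            + np.sqrt(prior_prec) * delta)


def phi_matvec(X, omega, prior_prec, v):
    """Compute Phi @ v using only products with X and X^T (Phi is never formed)."""
    return X.T @ (omega * (X @ v)) + prior_prec * v


def prior_preconditioned_cg(X, omega, prior_prec, b, tol=1e-10, max_iter=1000):
    """Solve Phi beta = b by CG, preconditioned with the diagonal prior precision."""
    beta = np.zeros_like(b)
    r = b.copy()            # residual b - Phi @ beta for the initial beta = 0
    z = r / prior_prec      # preconditioning step: apply D^{-1}
    d = z.copy()            # first search direction
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * b_norm:
            break
        Phi_d = phi_matvec(X, omega, prior_prec, d)
        alpha = rz / (d @ Phi_d)
        beta += alpha * d
        r -= alpha * Phi_d
        z = r / prior_prec
        rz_new = r @ z
        d = z + (rz_new / rz) * d
        rz = rz_new
    return beta


# Small synthetic usage example (sizes are illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = rng.standard_normal(200)
omega = np.ones(200)
prior_prec = rng.uniform(0.5, 5.0, size=50)
b = sample_b(X, omega, y, prior_prec, rng)
beta_draw = prior_preconditioned_cg(X, omega, prior_prec, b)
```

Under these assumptions, each call produces one draw of β using only matrix-vector products with X and its transpose, which is the property the abstract highlights. The stopping tolerance and iteration cap are arbitrary choices for the sketch.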

SUBMITTER: Nishimura A 

PROVIDER: S-EPMC10977663 | biostudies-literature | 2023

REPOSITORIES: biostudies-literature

Publications

Prior-Preconditioned Conjugate Gradient Method for Accelerated Gibbs Sampling in "Large n, Large p" Bayesian Sparse Regression.

Nishimura Akihiko, Suchard Marc A

Journal of the American Statistical Association, 2022-05-09, 544


Similar Datasets

| S-EPMC6215606 | biostudies-literature
| S-EPMC3779266 | biostudies-other
| S-EPMC5487008 | biostudies-literature
| S-EPMC11631066 | biostudies-literature
| S-EPMC1234247 | biostudies-literature
| S-EPMC9454085 | biostudies-literature
| S-EPMC7222437 | biostudies-literature
| S-EPMC2841907 | biostudies-literature
| S-EPMC4560121 | biostudies-literature
| S-EPMC1309704 | biostudies-literature