A split-and-merge Bayesian variable selection approach for ultrahigh dimensional regression

Qifan Song; Faming Liang

Journal of the Royal Statistical Society. Series B: Statistical Methodology (2015) 77(5) 947-972

DOI: 10.1111/rssb.12095

Abstract

We propose a Bayesian variable selection approach for ultrahigh dimensional linear regression based on a split-and-merge strategy. The proposed approach consists of two stages: first, split the ultrahigh dimensional data set into a number of lower dimensional subsets and select relevant variables from each subset; second, aggregate the variables selected from the subsets and select relevant variables from the aggregated data set. Because the proposed approach has an embarrassingly parallel structure, it can easily be implemented on a parallel architecture and applied to big data problems with millions of explanatory variables or more. Under mild conditions, we show that the proposed approach is consistent, i.e. the true explanatory variables can be correctly identified as the sample size becomes large. The proposed approach has been compared extensively with penalized likelihood approaches, such as the lasso, elastic net, sure independence screening and iterative sure independence screening. The numerical results show that the proposed approach generally outperforms penalized likelihood approaches: the models it selects tend to be sparser and closer to the true model.
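The abstract describes the two-stage control flow but not the per-subset selector beyond calling it Bayesian. The sketch below illustrates only that split-and-merge structure, and it deliberately swaps in a different, simpler selector: greedy forward stepwise selection with a BIC stopping rule (not the paper's Bayesian procedure). The function names `forward_bic` and `split_and_merge`, the contiguous-block split, the absence of an intercept (data assumed centred) and the synthetic data are all hypothetical choices made for illustration.

```python
import math
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def forward_bic(columns, y, candidates):
    """Greedy forward stepwise selection with a BIC stopping rule.

    Hypothetical stand-in for a per-subset selector (NOT the paper's Bayesian
    method). columns[j] is predictor j as a list of floats; only indices in
    `candidates` may be chosen. No intercept: data are assumed centred.
    """
    n = len(y)
    resid = list(y)      # residual of y after projecting on the selected columns
    basis = []           # orthonormal basis of selected columns (Gram-Schmidt)
    selected = []
    rss = _dot(resid, resid)
    bic = n * math.log(rss / n)
    remaining = list(candidates)
    while remaining:
        best = None
        for j in remaining:
            # Orthogonalize column j against the current basis.
            v = list(columns[j])
            for q in basis:
                c = _dot(v, q)
                v = [vi - c * qi for vi, qi in zip(v, q)]
            norm = math.sqrt(_dot(v, v))
            if norm < 1e-10:  # column is (numerically) in the span already chosen
                continue
            q_new = [vi / norm for vi in v]
            gain = _dot(resid, q_new) ** 2  # exact RSS reduction from adding j
            if best is None or gain > best[0]:
                best = (gain, j, q_new)
        if best is None:
            break
        gain, j, q_new = best
        new_rss = max(rss - gain, 1e-12)
        new_bic = n * math.log(new_rss / n) + (len(selected) + 1) * math.log(n)
        if new_bic >= bic:
            break  # the best remaining column no longer lowers BIC
        c = _dot(resid, q_new)
        resid = [ri - c * qi for ri, qi in zip(resid, q_new)]
        basis.append(q_new)
        selected.append(j)
        remaining.remove(j)
        rss, bic = new_rss, new_bic
    return selected


def split_and_merge(columns, y, n_subsets):
    p = len(columns)
    size = math.ceil(p / n_subsets)
    # Split: partition predictor indices into contiguous, roughly equal blocks.
    blocks = [list(range(s, min(s + size, p))) for s in range(0, p, size)]
    # Stage 1: each block is handled independently, so these calls could be
    # farmed out to separate workers (the embarrassingly parallel part).
    stage1 = [forward_bic(columns, y, block) for block in blocks]
    # Stage 2 (merge): aggregate the survivors and select again among them.
    merged = sorted(set().union(*stage1))
    return stage1, forward_bic(columns, y, merged)


# Hypothetical synthetic example: 60 predictors, 3 of them truly relevant.
rng = random.Random(0)
n, p = 200, 60
true_coef = {3: 3.0, 17: -2.0, 42: 2.5}
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [sum(b * columns[j][i] for j, b in true_coef.items()) + rng.gauss(0, 0.5)
     for i in range(n)]
stage1, final = split_and_merge(columns, y, n_subsets=4)
print("stage 1:", stage1)
print("final:", sorted(final))
```

Stage 1 may let a few spurious variables through, because the signal carried by other subsets acts as extra noise within each block; the merge stage re-examines the aggregated candidates jointly, which is where such false positives tend to be filtered out.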

Cite (APA)

Song, Q., & Liang, F. (2015). A split-and-merge Bayesian variable selection approach for ultrahigh dimensional regression. Journal of the Royal Statistical Society. Series B: Statistical Methodology, 77(5), 947–972. https://doi.org/10.1111/rssb.12095