Score-matching representative approach for big data analysis with generalized linear models

Keren Li; Jie Yang

Journal Article, Open Access

Electronic Journal of Statistics (2022) 16(1) 592-635

DOI: 10.1214/21-EJS1965

5 Citations

8 Readers

Abstract

We propose a fast and efficient strategy, called the representative approach, for big data analysis with generalized linear models, especially for distributed data with localization requirements or limited network bandwidth. Given a partition of a massive dataset, this approach constructs a representative data point for each data block and fits the target model using the representative dataset. In terms of time complexity, it is as fast as the subsampling approaches in the literature. As for efficiency, its accuracy in estimating parameters given a homogeneous partition is comparable with that of the divide-and-conquer method. Supported by comprehensive simulation studies and theoretical justifications, we conclude that mean representatives (MR) work fine for linear models or generalized linear models with a flat inverse link function and moderate coefficients of continuous predictors. For general cases, we recommend the proposed score-matching representatives (SMR), which may improve the accuracy of estimators significantly by matching the score function values. As an illustrative application to the Airline on-time performance data, we show that the MR and SMR estimates are as good as the full data estimate when the latter is available.
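The mean-representative idea for a linear model can be illustrated with a minimal sketch. It is not the authors' implementation: the simulated data, the quantile-grid partition, the function names, and the choice to weight each representative by its block size are all assumptions made here for illustration. The score-matching variant (SMR) is not shown, since the abstract does not specify its construction.

```python
import numpy as np

def mean_representatives(X, y, block_ids):
    """Collapse each block to one point: (mean of x, mean of y, block size)."""
    blocks = np.unique(block_ids)
    X_rep = np.array([X[block_ids == b].mean(axis=0) for b in blocks])
    y_rep = np.array([y[block_ids == b].mean() for b in blocks])
    sizes = np.array([(block_ids == b).sum() for b in blocks])
    return X_rep, y_rep, sizes

def weighted_least_squares(X, y, w):
    """Fit intercept + slopes by least squares with observation weights w."""
    A = np.column_stack([np.ones(len(X)), X])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return beta

# Simulated data (assumption): linear model with two continuous predictors.
rng = np.random.default_rng(0)
n = 20000
X = rng.normal(size=(n, 2))
true_beta = np.array([1.0, 2.0, -0.5])  # intercept, slope 1, slope 2
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=n)

# Hypothetical partition: a 10 x 10 grid on predictor deciles, so that
# points within a block have similar predictor values.
inner = np.linspace(0, 1, 11)[1:-1]
cell0 = np.digitize(X[:, 0], np.quantile(X[:, 0], inner))
cell1 = np.digitize(X[:, 1], np.quantile(X[:, 1], inner))
block_ids = cell0 * 10 + cell1

# Fit on the representative dataset (at most 100 points) and on the full data.
X_rep, y_rep, sizes = mean_representatives(X, y, block_ids)
beta_rep = weighted_least_squares(X_rep, y_rep, sizes)
beta_full = weighted_least_squares(X, y, np.ones(n))

print("representative fit:", beta_rep)
print("full-data fit:     ", beta_full)
```

In this toy setting the model is fit on at most 100 representative points instead of 20,000 observations. Because the response is linear in the predictors, averaging within blocks preserves the regression relationship, which is consistent with the abstract's remark that mean representatives work for linear models.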

Cite

Li, K., & Yang, J. (2022). Score-matching representative approach for big data analysis with generalized linear models. Electronic Journal of Statistics, 16(1), 592–635. https://doi.org/10.1214/21-EJS1965