Fitting very flexible models: Linear regression with large numbers of parameters

Abstract

There are many uses for linear fitting; we consider here the interpolation and denoising of data, as when the goal is to fit a smooth, flexible function to a set of noisy data points. Investigators often choose a polynomial basis, or a Fourier basis, or wavelets, or something equally general. They also choose an order, or number of basis functions to fit, and (often) some kind of regularization. We discuss how this basis-function fitting is done, with ordinary least squares and extensions thereof. We emphasize that it can be valuable to choose far more parameters than data points, despite folk rules to the contrary: Suitably regularized models with enormous numbers of parameters generalize well and make good predictions for held-out data; over-fitting is not (mainly) a problem of having too many parameters. It is even possible to take the limit of infinite parameters, at which, if the basis and regularization are chosen correctly, the least-squares fit becomes the mean of a Gaussian process, or a kernel regression. We recommend cross-validation as a good empirical method for model selection (for example, setting the number of parameters and the form of the regularization), and jackknife resampling as a good empirical method for estimating the uncertainties of the predictions made by the model. We also give advice for building stable computational implementations.
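To make the abstract's recipe concrete, here is a minimal sketch in Python with NumPy. It is not the authors' code. It fits a Fourier basis with more parameters than data points by regularized least squares, chooses the regularization strength by cross-validation, and estimates prediction uncertainties by jackknife. The function names, the toy data, the basis period, and the frequency-dependent penalty `w = 1 + omega**2` are all assumptions made for illustration.

```python
import numpy as np


def fourier_design(x, n_params, period):
    """Return the design matrix and the angular frequency of each column."""
    cols, freqs = [np.ones_like(x)], [0.0]
    j = 1
    while len(cols) < n_params:
        omega = 2.0 * np.pi * j / period
        cols.append(np.sin(omega * x))
        freqs.append(omega)
        if len(cols) < n_params:
            cols.append(np.cos(omega * x))
            freqs.append(omega)
        j += 1
    return np.stack(cols, axis=1), np.array(freqs)


def ridge_fit(A, y, lam, w):
    """Minimize |y - A b|^2 + lam * sum(w * b**2) over b."""
    # Solve the augmented least-squares problem rather than forming A^T A,
    # which tends to be better conditioned numerically.
    A_aug = np.vstack([A, np.diag(np.sqrt(lam * w))])
    y_aug = np.concatenate([y, np.zeros(A.shape[1])])
    beta, *_ = np.linalg.lstsq(A_aug, y_aug, rcond=None)
    return beta


def cross_validate(A, y, w, lam_grid, n_folds=5):
    """Mean squared held-out error for each regularization strength."""
    folds = np.arange(len(y)) % n_folds
    scores = []
    for lam in lam_grid:
        err = 0.0
        for k in range(n_folds):
            held_out = folds == k
            beta = ridge_fit(A[~held_out], y[~held_out], lam, w)
            err += np.sum((y[held_out] - A[held_out] @ beta) ** 2)
        scores.append(err / len(y))
    return np.array(scores)


def jackknife_std(A, y, w, lam, A_new):
    """Leave-one-out jackknife standard error of the predictions A_new @ b."""
    n = len(y)
    preds = np.array([
        A_new @ ridge_fit(np.delete(A, i, axis=0), np.delete(y, i), lam, w)
        for i in range(n)
    ])
    return np.sqrt((n - 1) / n * np.sum((preds - preds.mean(axis=0)) ** 2, axis=0))


# Toy data (an assumption for illustration): n = 30 noisy points, p = 101 parameters.
rng = np.random.default_rng(17)
n, p = 30, 101
x = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sin(6.0 * x) + 0.1 * rng.normal(size=n)

A, omega = fourier_design(x, p, period=2.0)
w = 1.0 + omega ** 2              # penalize high frequencies more (assumed form)
lam_grid = np.logspace(-6, 1, 8)
scores = cross_validate(A, y, w, lam_grid)
best_lam = lam_grid[np.argmin(scores)]

beta = ridge_fit(A, y, best_lam, w)
x_new = np.linspace(0.0, 1.0, 50)
A_new, _ = fourier_design(x_new, p, period=2.0)
y_new = A_new @ beta                              # predictions on a fine grid
y_err = jackknife_std(A, y, w, best_lam, A_new)   # jackknife uncertainties
```

In this sketch the penalty, not the parameter count, controls the flexibility of the fit: with `p > n` the unregularized problem has infinitely many exact solutions, and the weighted penalty selects a smooth one. The interleaved fold assignment and the particular penalty weights are conveniences of the example, not recommendations taken from the paper.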

Hogg, D. W., & Villar, S. (2021). Fitting very flexible models: Linear regression with large numbers of parameters. Publications of the Astronomical Society of the Pacific, 133(1027). https://doi.org/10.1088/1538-3873/ac20ac
