Coping with high dimensionality in massive datasets

Abstract

A massive dataset is characterized by its size and complexity. In its most basic form, such a dataset can be represented as a collection of n observations on p variables. Aggravation or even impasse can result if either number is huge. The more difficult challenge is usually associated with the case of very high dimensionality or 'big p'. There is a fast growing literature on how to handle such challenges, but most of it is in a supervised learning context involving a specific objective function, as in regression or classification. Much less is known about effective strategies for more exploratory data analytic activities. The purpose of this article is to put into historical perspective much of the recent research on dimensionality reduction and variable selection in such problems. Examples of applications that have stimulated this research are discussed along with a sampling of the latest methodologies to illustrate the onslaught of creative ideas that have surfaced. From a practitioner's perspective, the most effective strategy may be to emphasize the role of interdisciplinary teamwork with decisions on how best to grapple with high dimensionality emerging from a mixture of statistical thinking and consideration of the circumstances of the application. © 2011 John Wiley & Sons, Inc.
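To make the "n observations on p variables" framing concrete, the sketch below illustrates one classical dimensionality-reduction technique of the kind surveyed here (principal component analysis via the SVD). The data and all parameter choices are synthetic assumptions for illustration only, not examples from the article.

```python
import numpy as np

# Synthetic data: n observations on p variables (illustrative values).
rng = np.random.default_rng(0)
n, p, k = 100, 50, 2           # reduce p = 50 variables to k = 2 components

X = rng.standard_normal((n, p))
Xc = X - X.mean(axis=0)        # center each variable

# Principal components come from the SVD of the centered data matrix;
# numpy returns singular values in descending order of variance explained.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T              # project onto the first k components

print(Z.shape)                 # → (100, 2)
```

In an exploratory setting such a low-dimensional projection is typically inspected visually rather than fed to a fixed objective function, which is the distinction the abstract draws between supervised learning and exploratory analysis.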

Citation (APA)
Kettenring, J. R. (2011, March). Coping with high dimensionality in massive datasets. Wiley Interdisciplinary Reviews: Computational Statistics. https://doi.org/10.1002/wics.141
