Determining best complete subsets of specimens and characters for multivariate morphometric studies in the presence of large amounts of missing data

Richard E. Strauss; Momchil N. Atanassov

Journal Article (Open Access)

Biological Journal of the Linnean Society (2006) 88(2) 309-328

DOI: 10.1111/j.1095-8312.2006.00671.x

Abstract

Missing data are frequent in morphometric studies of both fossil and recent material. A common method of addressing the problem of missing data is to omit combinations of characters and specimens from subsequent analyses; however, omitting different subsets of characters and specimens can affect both the statistical robustness of the analyses and the resulting biological interpretations. We describe a method of examining all possible subsets of complete data and of scoring each subset by the 'condition' (ratio of first eigenvalue to second, or of second to first, depending on context) of the corresponding covariance or correlation matrix, and subsequently choosing the submatrix that either optimizes one of these criteria or matches the estimated condition of the original data matrix. We then describe an extension of this method that can be used to choose the 'best' characters and specimens for which some specified proportion of missing data can be estimated using standard imputation techniques such as the expectation-maximization algorithm or multiple imputation. The methods are illustrated with published and unpublished data sets on fossil and extant vertebrates. Although these problems and methods are discussed in the context of conventional morphometric data, they are applicable to many other kinds of data matrices. © 2006 The Linnean Society of London.
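
The subset-scoring idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`condition_ratio`, `score_complete_subsets`), the brute-force enumeration over character subsets, the `min_specimens` threshold, and the choice of the first-to-second eigenvalue ratio are all hypothetical choices made here. Missing values are assumed to be coded as NaN. Because the number of character subsets grows as 2^p for p characters, exhaustive enumeration like this is practical only for small p.

```python
from itertools import combinations

import numpy as np


def condition_ratio(X, use_correlation=True):
    """Ratio of the largest to the second-largest eigenvalue of the
    correlation (or covariance) matrix of a complete data block X."""
    if use_correlation:
        M = np.corrcoef(X, rowvar=False)
    else:
        M = np.cov(X, rowvar=False)
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    eig = np.linalg.eigvalsh(M)[::-1]
    return eig[0] / eig[1]


def score_complete_subsets(data, min_chars=2, min_specimens=3):
    """Enumerate character (column) subsets, keep the specimens (rows)
    that are complete for each subset, and score the resulting block.

    Returns a list of (columns, rows, condition) tuples.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    data = np.asarray(data, dtype=float)
    n_chars = data.shape[1]
    results = []
    for k in range(min_chars, n_chars + 1):
        for cols in combinations(range(n_chars), k):
            block = data[:, cols]
            # Rows with no missing value in the chosen characters.
            rows = np.flatnonzero(~np.isnan(block).any(axis=1))
            # Require more specimens than characters so the matrix
            # is not trivially rank-deficient.
            if rows.size < max(min_specimens, k + 1):
                continue
            score = condition_ratio(block[rows])
            results.append((cols, tuple(int(r) for r in rows), score))
    return results
```

Given the returned list, one could pick the subset with the largest or smallest score, or the one closest to a target condition, for example `min(results, key=lambda r: abs(r[2] - target))` where `target` is an estimate supplied by the user.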

Citation (APA)

Strauss, R. E., & Atanassov, M. N. (2006). Determining best complete subsets of specimens and characters for multivariate morphometric studies in the presence of large amounts of missing data. Biological Journal of the Linnean Society, 88(2), 309–328. https://doi.org/10.1111/j.1095-8312.2006.00671.x
