Missing values in datasets are a relevant and often overlooked problem in many fields. Most algorithms cannot handle missing values when training a predictive model or analyzing a dataset. For this reason, records with missing values are either rejected or repaired. However, both repairing and rejecting affect the dataset and the final results, introducing bias and uncertainty. Therefore, knowledge about the nature of missing values and the mechanisms behind them is of vital importance. To gain deeper insight into the underlying structures and patterns of missing values, the concept of Monotone Mixture Patterns is introduced and used to analyze the patterns of missing values in datasets. Several visualization methods are proposed to present the “patterns of missingness” in an informative way. Finally, an algorithm to generate missing values in datasets is provided as the basis of a benchmarking tool. This algorithm can generate a large variety of missing value patterns for testing and comparing different algorithms that handle missing values.
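To make the notion of a "pattern of missingness" concrete, the following minimal sketch tabulates which combinations of attributes are missing together and checks whether those patterns are monotone in the standard textbook sense (the sets of missing columns are nested). It is an illustration only and not the authors' Monotone Mixture Patterns method, whose definition is not given here. The function names `missing_patterns` and `is_monotone`, the use of `None` as the missing marker, and the sample data are all assumptions made for this example.

```python
from collections import Counter


def missing_patterns(rows):
    """Count how often each missingness pattern occurs.

    A pattern is a tuple of booleans, True where the value is None
    (None as the missing marker is an assumption of this sketch).
    """
    return Counter(tuple(value is None for value in row) for row in rows)


def is_monotone(patterns):
    """Return True if the sets of missing columns are nested.

    This is the standard definition of a monotone missingness pattern,
    not the paper's Monotone Mixture Patterns.
    """
    missing_sets = sorted(
        ({i for i, is_missing in enumerate(p) if is_missing} for p in patterns),
        key=len,
    )
    # After sorting by size, nesting holds iff each set contains its predecessor.
    return all(a <= b for a, b in zip(missing_sets, missing_sets[1:]))


# Hypothetical records with three attributes each.
rows = [
    (1.0, 2.0, 3.0),
    (1.5, 2.5, None),
    (0.3, None, None),
    (0.9, 1.1, None),
]
counts = missing_patterns(rows)
print(counts)
print(is_monotone(counts))
```

Adding a record such as `(None, 1.0, 2.0)` introduces a pattern that is not nested with the others, so the same check would then report the dataset as non-monotone.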
van Stein, B., Kowalczyk, W., & Bäck, T. (2016). Analysis and visualization of missing value patterns. In Communications in Computer and Information Science (Vol. 611, pp. 187–198). Springer Verlag. https://doi.org/10.1007/978-3-319-40581-0_16