In this paper, missing attribute values in incomplete data sets are given two possible interpretations: lost values and “do not care” conditions. For rule induction we use characteristic sets and generalized maximal consistent blocks. Combining the two interpretations with the two constructs yields four approaches to data mining. Our previous experiments, which used the error rate evaluated by ten-fold cross validation as the main criterion of quality, showed that no approach is universally the best. We therefore compare the four approaches using the complexity of the rule sets induced from incomplete data sets. We show that the cardinality of rule sets is always smaller for incomplete data sets with “do not care” conditions. Thus the choice between interpretations of missing attribute values is more important than the choice between characteristic sets and generalized maximal consistent blocks.
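The abstract does not define its constructs, so the following is only a minimal sketch of how the two interpretations of missing values can change a characteristic set. It follows the definition commonly used in the rough-set literature on incomplete data, not necessarily the authors' implementation. The symbols "?" (lost value) and "*" ("do not care" condition), the function names, and the toy data set are all assumptions made for illustration.

```python
# Assumed markers for the two interpretations of a missing value.
LOST, DO_NOT_CARE = "?", "*"


def attribute_value_blocks(cases, attributes):
    """Build the block [(a, v)] for every attribute a and specified value v.

    A case with a lost value for a joins no block of a; a case with a
    "do not care" condition for a joins every block of a.
    """
    blocks = {}
    for a in attributes:
        specified = {row[a] for row in cases.values()} - {LOST, DO_NOT_CARE}
        for v in specified:
            blocks[(a, v)] = {
                x for x, row in cases.items()
                if row[a] == v or row[a] == DO_NOT_CARE
            }
    return blocks


def characteristic_set(x, cases, attributes, blocks):
    """Intersect the blocks of the specified values of case x.

    An attribute whose value is missing for x (either interpretation)
    contributes the whole universe, so it does not restrict the result.
    """
    result = set(cases)
    for a in attributes:
        v = cases[x][a]
        if v not in (LOST, DO_NOT_CARE):
            result &= blocks[(a, v)]
    return result


# Hypothetical toy data: case 2 has a lost value, case 3 a "do not care".
cases = {
    1: {"Temperature": "high", "Headache": "yes"},
    2: {"Temperature": "?", "Headache": "yes"},
    3: {"Temperature": "*", "Headache": "no"},
    4: {"Temperature": "normal", "Headache": "no"},
}
attributes = ["Temperature", "Headache"]

blocks = attribute_value_blocks(cases, attributes)
k = {x: characteristic_set(x, cases, attributes, blocks) for x in cases}
```

In this toy set, case 3 (a "do not care" condition) belongs to both Temperature blocks, while case 2 (a lost value) belongs to neither. This is the kind of difference between interpretations that the abstract reports as dominating rule-set cardinality.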
Clark, P. G., Gao, C., Grzymala-Busse, J. W., Mroczek, T., & Niemiec, R. (2018). Complexity of rule sets in mining incomplete data using characteristic sets and generalized maximal consistent blocks. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10870 LNAI, pp. 84–94). Springer Verlag. https://doi.org/10.1007/978-3-319-92639-1_8