Incompleteness in data mining

Hosagrahar Visvesvaraya Jagadish

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2001), Vol. 2035, p. 1

DOI: 10.1007/3-540-45357-1_1

Abstract

Database technology, as well as the bulk of data mining technology, is founded upon logic, with absolute notions of truth and falsehood, at least with respect to the data set. Patterns are discovered exhaustively, with carefully engineered algorithms devised to determine all patterns in a data set that belong to a certain class. For large data sets, many such data mining techniques are extremely expensive, leading to considerable research towards solving these problems more cheaply. We argue that the central goal of data mining is to find SOME interesting patterns, and not necessarily ALL of them. As such, techniques that can find most of the answers cheaply are clearly more valuable than computationally much more expensive techniques that can guarantee completeness. In fact, it is probably the case that patterns that can be found cheaply are indeed the most important ones. Furthermore, knowledge discovery can be the most effective with the human analyst heavily involved in the endeavor. To engage a human analyst, it is important that data mining techniques be interactive, hopefully delivering (close to) real time responses and feedback. Clearly then, extreme accuracy and completeness (i.e., finding all patterns satisfying some specified criteria) would almost always be a luxury. Instead, incompleteness (i.e., finding only some patterns) and approximation would be essential. We exemplify this discussion through the notion of fascicles. Often many records in a database share similar values for several attributes. If one is able to identify and group together records that share similar values for some - even if not all - attributes, one can both obtain a more parsimonious representation of the data, and gain useful insight into the data from a mining perspective. Such groupings are called fascicles.
We explore the relationship of fascicle-finding to association rule mining, and experimentally demonstrate the benefit of incomplete but inexpensive algorithms. We also present analytical results demonstrating both the limits and the benefits of such incomplete algorithms.
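
As a rough illustration of the fascicle idea described in the abstract, the Python sketch below grows a single group of records from a seed in one pass, keeping a record only if the group still agrees (within a tolerance) on enough attributes. This is not the algorithm from the paper. The function names, the per-attribute `tolerance` tuple, and the `min_compact` threshold are all assumptions introduced here for illustration. The heuristic is deliberately incomplete: it may miss groups that an exhaustive search would find, which is the trade-off the abstract argues for.

```python
from typing import List, Sequence, Tuple

Record = Sequence[float]


def compact_attributes(records: List[Record], tolerance: Sequence[float]) -> List[int]:
    """Return the indices of attributes whose value range across
    `records` is no wider than the given per-attribute tolerance."""
    n_attrs = len(records[0])
    return [
        j
        for j in range(n_attrs)
        if max(r[j] for r in records) - min(r[j] for r in records) <= tolerance[j]
    ]


def greedy_fascicle(
    records: List[Record],
    seed: int,
    tolerance: Sequence[float],
    min_compact: int,
) -> Tuple[List[int], List[int]]:
    """Grow one group from `seed` in a single pass (hypothetical heuristic).

    A record joins the group only if the enlarged group still has at
    least `min_compact` attributes within tolerance. The result depends
    on scan order, so it is cheap but not guaranteed to be maximal.
    """
    members = [seed]
    for i, rec in enumerate(records):
        if i == seed:
            continue
        candidate = [records[m] for m in members] + [rec]
        if len(compact_attributes(candidate, tolerance)) >= min_compact:
            members.append(i)
    group = [records[m] for m in members]
    return members, compact_attributes(group, tolerance)


# Made-up toy data: three attributes per record.
toy_records = [
    (10, 5, 100),
    (10, 5, 250),
    (11, 5, 900),
    (40, 9, 120),
]
members, shared = greedy_fascicle(toy_records, seed=0, tolerance=(1, 0, 50), min_compact=2)
print(members, shared)
```

In the toy data, the first three records agree on the first two attributes but not the third, so they can be stored compactly by recording the shared values once. That mirrors the "more parsimonious representation" the abstract mentions.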

Cite

Jagadish, H. V. (2001). Incompleteness in data mining. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2035, p. 1). Springer Verlag. https://doi.org/10.1007/3-540-45357-1_1
