Statistical principles suggest minimization of the total within-group distance (TWGD) as a robust criterion for clustering point data associated with a Geographical Information System [17]. Although the problem admits a linear programming formulation, it is NP-hard and in practice must be solved with heuristic methods. The heuristics proposed so far require quadratic time, which is prohibitively expensive for data mining applications. This paper introduces data structures for managing large two-dimensional point data sets and for fast clustering via interchange heuristics. These structures avoid the need for quadratic time by approximating proximity information. Our scheme is illustrated with two-dimensional quadtrees, but it can be extended to other structures suited to three-dimensional data or to spatial data with timestamps. The result is a fast and robust clustering method.
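To make the cost issue concrete, the following is a minimal sketch of a brute-force interchange heuristic. It is not the authors' method. It assumes a medoid-style reading of TWGD, namely the sum of Euclidean distances from each point to the nearest of k representatives chosen from the data; the abstract itself does not spell out this definition. The names `twgd` and `interchange` are hypothetical. Each evaluation scans every point against every representative and each pass tries every possible swap, which is the kind of cost that approximate proximity structures such as quadtrees are meant to avoid.

```python
from math import dist  # Euclidean distance, Python 3.8+


def twgd(points, medoids):
    """Assumed TWGD: sum over all points of the distance to the nearest medoid.

    `medoids` holds indices into `points`.
    """
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def interchange(points, medoids):
    """Brute-force interchange: swap a medoid for a non-medoid while TWGD drops.

    Every candidate swap triggers a full re-evaluation of the criterion,
    so a single pass is at least quadratic in the number of points.
    """
    medoids = list(medoids)
    best = twgd(points, medoids)
    improved = True
    while improved:
        improved = False
        for slot in range(len(medoids)):
            for cand in range(len(points)):
                if cand in medoids:
                    continue  # already a representative
                trial = medoids[:slot] + [cand] + medoids[slot + 1:]
                cost = twgd(points, trial)
                if cost < best:  # accept only strict improvements
                    medoids, best, improved = trial, cost, True
    return medoids, best


# Two well-separated groups of three points, with a deliberately poor start.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, cost = interchange(points, [0, 1])
```

Because only strict improvements are accepted and the number of representative sets is finite, the loop terminates, though only at a local optimum in general.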
CITATION STYLE
Estivill-Castro, V., & Houle, M. E. (2001). Data structures for minimization of total within-group distance for spatio-temporal clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2168, pp. 91–102). Springer Verlag. https://doi.org/10.1007/3-540-44794-6_8