Algebraic techniques for analysis of large discrete-valued datasets


Abstract

With the availability of large-scale computing platforms and instrumentation for data gathering, increased emphasis is being placed on efficient techniques for analyzing large and extremely high-dimensional datasets. In this paper, we present a novel algebraic technique based on a variant of semi-discrete matrix decomposition (SDD), which is capable of compressing large discrete-valued datasets in an error-bounded fashion. We show that this process of compression can be thought of as identifying dominant patterns in the underlying data. We derive efficient algorithms for computing dominant patterns, quantify their performance analytically as well as experimentally, and identify applications of these algorithms in problems ranging from clustering to vector quantization. We demonstrate the superior characteristics of our algorithm in terms of (i) scalability to extremely high dimensions; (ii) bounded error; and (iii) hierarchical nature, which enables multiresolution analysis. Detailed experimental results are provided to support these claims. © 2002 Springer-Verlag Berlin Heidelberg.
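
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to extract a single dominant pattern from a binary matrix: an alternating heuristic that looks for binary vectors x (which rows carry the pattern) and y (the pattern itself) such that the outer product x yᵀ is close to A. The function names `rank_one_binary` and `approximation_error`, the densest-row initialization, and the iteration cap are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def rank_one_binary(A, max_iter=50):
    """Alternating heuristic for binary x, y that locally reduce the
    number of entries where A and outer(x, y) disagree.

    Illustrative sketch only; not the authors' published procedure.
    """
    A = np.asarray(A, dtype=np.int64)
    # Assumed initialization: start the pattern from the densest row.
    y = A[np.argmax(A.sum(axis=1))].copy()
    x = np.zeros(A.shape[0], dtype=np.int64)
    for _ in range(max_iter):
        # Fix y: row i is assigned to the pattern when it shares more
        # than half of the pattern's nonzeros (this lowers the mismatch
        # count for that row compared with approximating it by zeros).
        x_new = (2 * (A @ y) > y.sum()).astype(np.int64)
        # Fix x: column j is kept in the pattern when more than half of
        # the assigned rows have a 1 in that column.
        y_new = (2 * (A.T @ x_new) > x_new.sum()).astype(np.int64)
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break
        x, y = x_new, y_new
    return x, y


def approximation_error(A, x, y):
    """Number of entries where A differs from the rank-one pattern."""
    return int(np.sum(np.asarray(A) != np.outer(x, y)))


# Toy data: three rows share one pattern, one row differs.
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
])
x, y = rank_one_binary(A)
err = approximation_error(A, x, y)
```

On this toy matrix the heuristic assigns the first three rows to the pattern `[1, 1, 0, 0]` and leaves the fourth row unassigned. One plausible way to obtain the hierarchical behavior the abstract mentions would be to split the rows by `x` and repeat the procedure on each part until the mismatch count falls below a chosen bound, though that recursion is likewise an assumption here.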


Cite

Koyutürk, M., Grama, A., & Ramakrishnan, N. (2002). Algebraic techniques for analysis of large discrete-valued datasets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2431 LNAI, pp. 311–324). Springer Verlag. https://doi.org/10.1007/3-540-45681-3_26
