Algebraic techniques for analysis of large discrete-valued datasets

Mehmet Koyutürk; Ananth Grama; Naren Ramakrishnan

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2002) 2431 LNAI 311-324

DOI: 10.1007/3-540-45681-3_26

Abstract

With the availability of large-scale computing platforms and instrumentation for data gathering, increased emphasis is being placed on efficient techniques for analyzing large and extremely high-dimensional datasets. In this paper, we present a novel algebraic technique based on a variant of semi-discrete matrix decomposition (SDD), which is capable of compressing large discrete-valued datasets in an error-bounded fashion. We show that this process of compression can be thought of as identifying dominant patterns in the underlying data. We derive efficient algorithms for computing dominant patterns, quantify their performance analytically as well as experimentally, and identify applications of these algorithms in problems ranging from clustering to vector quantization. We demonstrate the superior characteristics of our algorithm in terms of (i) scalability to extremely high dimensions; (ii) bounded error; and (iii) hierarchical nature, which enables multiresolution analysis. Detailed experimental results are provided to support these claims. © 2002 Springer-Verlag Berlin Heidelberg.
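
To make the idea of a "dominant pattern" concrete, here is a minimal sketch of a rank-one approximation of a binary matrix by two 0/1 vectors. The abstract does not spell out the algorithm, so everything below is an assumption made for illustration: the alternating update rule, the choice of the densest row as the starting pattern, and the names `rank_one_binary` and `hamming_error` are hypothetical and are not taken from the paper.

```python
def rank_one_binary(rows, max_iter=20):
    """Alternating heuristic for a 0/1 rank-one approximation rows ~ x * y^T.

    Illustrative only; not the authors' published procedure.
    x marks which rows contain the pattern, y is the pattern itself.
    """
    n_cols = len(rows[0])
    # Assumed initialization: use the densest row as the first pattern guess.
    y = list(max(rows, key=sum))
    x = [0] * len(rows)
    for _ in range(max_iter):
        # Fix y: selecting row i lowers the error only if the row shares
        # more than half of the pattern's ones.
        y_size = sum(y)
        new_x = [
            1 if 2 * sum(a * b for a, b in zip(row, y)) > y_size else 0
            for row in rows
        ]
        # Fix x: column j belongs to the pattern only if more than half
        # of the selected rows have a one in that column.
        x_size = sum(new_x)
        new_y = [
            1 if 2 * sum(row[j] for row, xi in zip(rows, new_x) if xi) > x_size else 0
            for j in range(n_cols)
        ]
        if new_x == x and new_y == y:
            break  # neither vector changed, so the iteration has converged
        x, y = new_x, new_y
    return x, y


def hamming_error(rows, x, y):
    """Number of entries where rows differs from the outer product x * y^T."""
    return sum(
        abs(a - xi * yj)
        for row, xi in zip(rows, x)
        for a, yj in zip(row, y)
    )


if __name__ == "__main__":
    data = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 1],
    ]
    x, y = rank_one_binary(data)
    print(x, y, hamming_error(data, x, y))
```

In this toy matrix the first three rows share the pattern in the first two columns, and the error counts the entries the pattern fails to reproduce. A hierarchical scheme could then split the rows by `x` and repeat on each part, which is one plausible reading of the multiresolution claim rather than a confirmed detail of the paper.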

Citation (APA)

Koyutürk, M., Grama, A., & Ramakrishnan, N. (2002). Algebraic techniques for analysis of large discrete-valued datasets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2431 LNAI, pp. 311–324). Springer Verlag. https://doi.org/10.1007/3-540-45681-3_26
