Discovering a term taxonomy from term similarities using principal component analysis

Holger Bast; Georges Dupret; Debapriyo Majumdar; Benjamin Piwowarski

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2006) 4289 LNAI 103-120

DOI: 10.1007/11908678_7

Abstract

We show that eigenvector decomposition can be used to extract a term taxonomy from a given collection of text documents. So far, methods based on eigenvector decomposition, such as latent semantic indexing (LSI) or principal component analysis (PCA), were only known to be useful for extracting symmetric relations between terms. We give a precise mathematical criterion for distinguishing between four kinds of relations of a given pair of terms of a given collection: unrelated (car - fruit), symmetrically related (car - automobile), asymmetrically related with the first term being more specific than the second (banana - fruit), and asymmetrically related in the other direction (fruit - banana). We give theoretical evidence for the soundness of our criterion, by showing that in a simplified mathematical model the criterion does the apparently right thing. We applied our scheme to the reconstruction of a selected part of the open directory project (ODP) hierarchy, with promising results. © 2006 Springer-Verlag.

Citation (APA)

Bast, H., Dupret, G., Majumdar, D., & Piwowarski, B. (2006). Discovering a term taxonomy from term similarities using principal component analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4289 LNAI, pp. 103–120). https://doi.org/10.1007/11908678_7
