We show that the singular value decomposition of a term similarity matrix induces a term hierarchy. This decomposition, used in Latent Semantic Analysis and Principal Component Analysis for text, aims at identifying "concepts" that can be used in place of the terms appearing in the documents. Unlike terms, concepts are by construction uncorrelated and hence are less sensitive to the particular vocabulary used in documents. In this work, we explore the relation between terms and concepts and show that for each term there exists a latent subspace dimension for which the term coincides with a concept. By varying the number of dimensions, terms that are similar to the concept but more specific can be identified, leading to a term hierarchy. © Springer-Verlag Berlin Heidelberg 2006.
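As a rough sketch of the decomposition the abstract refers to, the truncated singular value decomposition can be written as follows. The notation is an assumption made here for illustration and is not taken from the paper: S is the N × N term similarity matrix, N is the vocabulary size, and k is the number of latent dimensions retained.

```latex
% Assumed notation (not from the paper): S is the N x N term similarity
% matrix, k is the number of retained latent dimensions.
\begin{align}
  S   &= U \Sigma V^{\top},
      \qquad \Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_N),
      \quad \sigma_1 \ge \dots \ge \sigma_N \ge 0, \\
  S_k &= U_k \Sigma_k V_k^{\top}
       = \sum_{j=1}^{k} \sigma_j \, u_j v_j^{\top},
      \qquad 1 \le k \le N .
\end{align}
% If S is symmetric positive semidefinite (for example a correlation
% matrix), then U = V and the SVD coincides with the eigendecomposition.
```

In this reading, each pair of singular vectors (u_j, v_j) plays the role of a "concept", and entry (i, l) of S_k gives the similarity of terms i and l as seen through the first k concepts. Varying k changes which terms appear close to a given term. The exact criterion under which a term coincides with a concept is defined in the paper and is not reproduced here.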
CITATION STYLE
Dupret, G., & Piwowarski, B. (2006). Principal components for automatic term hierarchy building. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4209 LNCS, pp. 37–48). Springer Verlag. https://doi.org/10.1007/11880561_4