Document representation has a large impact on the performance of document retrieval and clustering algorithms. We propose a hybrid document indexing scheme that combines the traditional bag-of-words representation with spectral embedding. This method accounts for the specifics of the document collection and also uses semantic similarity information based on a large-scale statistical analysis. Clustering experiments showed improvements over the traditional tf-idf representation and over the spectral methods based solely on the document collection.
CITATION
Matveeva, I., & Levow, G. A. (2007). Hybrid document indexing with spectral embedding. In NAACL-HLT 2007 - Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, Companion Volume: Short Papers (pp. 113–116). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1614108.1614137