This paper reports on a statistical stemming algorithm based on link analysis. Treating each word as a prefix (the stem) followed by a suffix, the key idea is that prefixes and suffixes that combine to form words are interlinked and form communities of sub-strings. Discovering these communities therefore amounts to searching for the word splits that give the best word stems. We used the algorithm in our participation in the CLEF 2002 Italian monolingual task. The experimental results show that stemming improves text retrieval effectiveness. They also show that the effectiveness of our algorithm is comparable to that of an algorithm based on a priori linguistic knowledge. © Springer-Verlag Berlin Heidelberg 2003.
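The link-analysis idea can be illustrated with a small sketch. The abstract does not specify the scoring procedure, so the code below assumes a HITS-style mutual reinforcement: a prefix scores highly when it links to high-scoring suffixes, and vice versa. The function names (`all_splits`, `score_substrings`, `best_split`), the iteration count, and the toy Italian vocabulary are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def all_splits(word):
    """Every way to cut a word into a non-empty prefix and a non-empty suffix."""
    return [(word[:i], word[i:]) for i in range(1, len(word))]


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def score_substrings(words, iterations=50):
    """HITS-style mutual reinforcement over the prefix/suffix link graph.

    This is an assumed stand-in for the paper's link analysis, not its
    exact method. A link joins prefix p and suffix s when p + s is a word.
    """
    links = [split for w in sorted(set(words)) for split in all_splits(w)]
    prefix_score = {p: 1.0 for p, _ in links}
    suffix_score = {s: 1.0 for _, s in links}
    for _ in range(iterations):
        # A prefix accumulates the scores of the suffixes it links to.
        new_prefix = defaultdict(float)
        for p, s in links:
            new_prefix[p] += suffix_score[s]
        # A suffix accumulates the updated scores of its prefixes.
        new_suffix = defaultdict(float)
        for p, s in links:
            new_suffix[s] += new_prefix[p]
        prefix_score = normalize(new_prefix)
        suffix_score = normalize(new_suffix)
    return prefix_score, suffix_score


def best_split(word, prefix_score, suffix_score):
    """Pick the split whose prefix and suffix scores have the largest product."""
    splits = all_splits(word)
    if not splits:
        return word, ""  # one-letter words cannot be split
    return max(
        splits,
        key=lambda ps: prefix_score.get(ps[0], 0.0) * suffix_score.get(ps[1], 0.0),
    )


# Toy vocabulary (hypothetical): forms of "parlare" and "cantare".
vocabulary = ["parlare", "parlato", "parlava", "parlo",
              "cantare", "cantato", "cantava", "canto"]
prefix_score, suffix_score = score_substrings(vocabulary)
for word in vocabulary:
    stem, suffix = best_split(word, prefix_score, suffix_score)
    print(f"{word}: stem={stem} suffix={suffix}")
```

In this sketch the iteration concentrates score on the most densely interlinked group of prefixes and suffixes, which plays the role of the community described above; a real system would need a large vocabulary and a more careful treatment of ties and rare sub-strings.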
Agosti, M., Bacchin, M., Ferro, N., & Melucci, M. (2003). Improving the automatic retrieval of text documents. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2785, 279–290. https://doi.org/10.1007/978-3-540-45237-9_23