Assessing the efficiency of suffix stripping approaches for Portuguese stemming

Wadson Gomes Ferreira; Willian Antônio dos Santos; Breno Macena Pereira de Souza; Tiago Matta Machado Zaidan; Wladmir Cardoso Brandão

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9309 210-221

DOI: 10.1007/978-3-319-23826-5_21

Citations: 1

Readers: 3

Abstract

Stemming is the process of reducing inflected words to their root form, the stem. Search engines use stemming algorithms to conflate words that share a stem, which reduces index size and improves recall. Suffix stripping is a strategy used by stemming algorithms to reduce words to stems by applying suffix rules tailored to the constraints of each language. For Portuguese stemming, RSLP was the first suffix stripping algorithm proposed in the literature, and it is still widely used in commercial and open source search engines. Typically, the RSLP algorithm uses a list-based approach to process its suffix stripping rules. In this article, we introduce two suffix stripping approaches for Portuguese stemming, a hash-based approach and an automata-based approach, and we assess their efficiency by comparing them with the state-of-the-art list-based approach. Complexity analysis shows that the automata-based approach is more time-efficient. In addition, experiments on two datasets attest to the efficiency of our approaches: the hash-based and the automata-based approaches outperform the list-based approach, reducing stemming time by up to 65.28% and 86.48%, respectively.
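
To make the three rule-processing strategies named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy rule set of (suffix, minimum stem length, replacement) triples invented for illustration; the real RSLP rules are organized in several steps and carry exception lists, which are omitted here. The names `RULES`, `apply_rule`, `strip_list`, `strip_hash`, and `strip_trie` are hypothetical.

```python
# Hypothetical toy rules: (suffix, minimum stem length, replacement).
# Illustrative only; this is NOT the actual RSLP rule set.
# Rules are ordered longest suffix first so the list scan prefers long suffixes.
RULES = [
    ("mente", 4, ""),
    ("ções", 3, "ção"),
    ("ais", 2, "al"),
    ("es", 3, ""),
    ("s", 2, ""),
]


def apply_rule(word, suffix, min_stem, replacement):
    """Return the rewritten word, or None if the remaining stem is too short."""
    stem = word[: len(word) - len(suffix)]
    return stem + replacement if len(stem) >= min_stem else None


def strip_list(word):
    """List-based: test every rule in order until one applies."""
    for suffix, min_stem, replacement in RULES:
        if word.endswith(suffix):
            result = apply_rule(word, suffix, min_stem, replacement)
            if result is not None:
                return result
    return word


# Hash-based: index the rules by suffix, then probe candidate suffixes of the
# word from longest to shortest.
BY_SUFFIX = {suffix: (min_stem, replacement) for suffix, min_stem, replacement in RULES}
MAX_SUFFIX_LEN = max(len(suffix) for suffix in BY_SUFFIX)


def strip_hash(word):
    for k in range(min(MAX_SUFFIX_LEN, len(word)), 0, -1):
        candidate = word[-k:]
        rule = BY_SUFFIX.get(candidate)
        if rule is not None:
            result = apply_rule(word, candidate, *rule)
            if result is not None:
                return result
    return word


def build_trie(rules):
    """Automaton as a trie over reversed suffixes; key "" marks an accepting state."""
    root = {}
    for suffix, min_stem, replacement in rules:
        node = root
        for ch in reversed(suffix):
            node = node.setdefault(ch, {})
        node[""] = (suffix, min_stem, replacement)
    return root


TRIE = build_trie(RULES)


def strip_trie(word):
    """Automata-based: read the word backwards, following trie transitions."""
    node, best = TRIE, None
    for ch in reversed(word):
        node = node.get(ch)
        if node is None:
            break  # no rule suffix continues with this character
        rule = node.get("")
        if rule is not None:
            result = apply_rule(word, *rule)
            if result is not None:
                best = result  # later matches correspond to longer suffixes
    return word if best is None else best


if __name__ == "__main__":
    for w in ["casas", "rapidamente", "animais", "canções", "mar"]:
        print(w, strip_list(w), strip_hash(w), strip_trie(w))
```

In this sketch the list-based scan does work proportional to the number of rules, the hash-based lookup probes at most one candidate per possible suffix length, and the trie walk reads at most as many characters as the longest suffix. This is only meant to convey the intuition behind the comparison; it does not reproduce the paper's complexity analysis or its measurements.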

Citation (APA)

Ferreira, W. G., dos Santos, W. A., de Souza, B. M. P., Zaidan, T. M. M., & Brandão, W. C. (2015). Assessing the efficiency of suffix stripping approaches for Portuguese stemming. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9309, pp. 210–221). Springer Verlag. https://doi.org/10.1007/978-3-319-23826-5_21
