Influence of stop-words removal on sequence patterns identification within comparable corpora

Daša Munková; Michal Munk; Martin Vozár

Conference Proceedings

Advances in Intelligent Systems and Computing (2014) 231 67-76

DOI: 10.1007/978-3-319-01466-1_6

23 Citations

48 Readers

Abstract

Short texts such as advertisements are characterised by a number of slogans, phrases, words, symbols, etc. To improve the quality of textual data, it is necessary to separate noisy textual data from important data. The aim of this work is to determine to what extent time-consuming data pre-processing is necessary when discovering sequential patterns in English and Slovak advertisement corpora. For this purpose, an experiment was conducted focusing on data pre-processing in these two comparable corpora. We examine to what extent removing stop words influences the quantity and quality of extracted rules. Stop-word removal has no impact on the quantity or quality of extracted rules in either the English or the Slovak advertisement corpus. Only language has a significant impact on the quantity and quality of extracted rules.
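The kind of comparison the abstract describes can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not say which rule-extraction method or stop-word list was used, so the stop-word set, the sample advertisements, the adjacent-pair definition of a pattern, and the `min_support` threshold below are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "for", "and", "to", "in", "is"}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(".,!?;:\"'()").lower() for w in text.split())
    return [t for t in tokens if t]


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


def frequent_pairs(sequences, min_support):
    """Return ordered pairs of adjacent tokens found in at least
    `min_support` sequences; a pair counts once per sequence."""
    counts = Counter()
    for seq in sequences:
        counts.update(set(zip(seq, seq[1:])))
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Made-up advertisement texts (assumption, not data from the paper).
ads = [
    "Buy the best coffee in town",
    "The best coffee for the best price",
    "Best coffee, best price!",
]

raw = [tokenize(ad) for ad in ads]
filtered = [remove_stop_words(seq) for seq in raw]

# Extract patterns with and without stop-word removal, then compare.
rules_raw = frequent_pairs(raw, min_support=2)
rules_filtered = frequent_pairs(filtered, min_support=2)

print(len(rules_raw), sorted(rules_raw))
print(len(rules_filtered), sorted(rules_filtered))
```

Comparing `rules_raw` with `rules_filtered` shows which patterns depend on stop words. A study like the one summarised above would make this comparison on full corpora and with statistical tests rather than on three sentences.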

Cite

Munková, D., Munk, M., & Vozár, M. (2014). Influence of stop-words removal on sequence patterns identification within comparable corpora. In Advances in Intelligent Systems and Computing (Vol. 231, pp. 67–76). Springer Verlag. https://doi.org/10.1007/978-3-319-01466-1_6
