Influence of stop-words removal on sequence patterns identification within comparable corpora

Abstract

Short texts such as advertisements are characterised by numerous slogans, phrases, words, symbols, etc. To improve the quality of textual data, noise must be filtered out from the important data. The aim of this work is to determine to what extent time-consuming data pre-processing is necessary when discovering sequential patterns in English and Slovak advertisement corpora. For this purpose, an experiment was conducted focusing on data pre-processing in these two comparable corpora. We examine to what extent removing stop words influences the quantity and quality of extracted rules. Stop-word removal has no impact on the quantity or quality of extracted rules in either the English or the Slovak advertisement corpus. Only language has a significant impact on the quantity and quality of extracted rules.

Cite

Munková, D., Munk, M., & Vozár, M. (2014). Influence of stop-words removal on sequence patterns identification within comparable corpora. In Advances in Intelligent Systems and Computing (Vol. 231, pp. 67–76). Springer Verlag. https://doi.org/10.1007/978-3-319-01466-1_6
