Support Vector Machines (SVM) can classify objects described by very high-dimensional feature vectors. This allows them to use the counts of the different words in a document, which may amount to more than 100,000 distinct words, directly for classification. In this paper we describe the results of a large number of experiments with different preprocessing strategies for generating effective input features. It turns out that n-grams of syllables and phonemes are especially effective for classification.
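To make the idea of n-gram input features concrete, the following is a minimal sketch of how counts of contiguous n-grams over syllables (or phonemes, or words) could be turned into a sparse feature mapping. It is not the preprocessing used in the paper, which the abstract does not specify. The function name `ngram_counts` and the hand-segmented syllable list are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def ngram_counts(units: Sequence[str], n: int) -> Dict[Tuple[str, ...], int]:
    """Count contiguous n-grams over a sequence of units.

    The units may be words, syllables, or phonemes; segmentation into
    units is assumed to have been done beforehand.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of length n over the sequence; an input shorter
    # than n yields no n-grams at all.
    grams = (tuple(units[i:i + n]) for i in range(len(units) - n + 1))
    return dict(Counter(grams))


# Hypothetical, hand-segmented syllables (assumption: a real system
# would need a syllabification or phoneme transcription step).
syllables = ["sup", "port", "vec", "tor", "sup", "port"]
features = ngram_counts(syllables, 2)
```

Each distinct n-gram acts as one dimension of the feature vector. Only n-grams that actually occur are stored, which is why a sparse mapping is a natural fit for vocabularies of this size.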
Paaß, G., Kindermann, J., & Leopold, E. (2004). Text Classification of News Articles with Support Vector Machines (pp. 53–64). https://doi.org/10.1007/978-3-540-45219-5_5