Automatic text classification is an important task that consists of assigning labels (categories, groups, classes) to a given text based on a set of previously labeled texts, called the training set. This paper addresses the problem of automatic topical text categorization. The task is supervised because it works with a predefined set of classes, and topical because the classes are the topics or subjects of the texts. We use a new approach based on the k-NN algorithm, together with a new set of pseudo-distances (distance metrics) known in the field of language identification. We also propose a simple and effective method to improve the quality of the resulting categorization.
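To make the idea concrete, here is a minimal sketch of k-NN text categorization driven by a pseudo-distance. The abstract does not name the pseudo-distances used in the paper, so this sketch assumes a stand-in: the out-of-place rank measure over character n-gram profiles, known from Cavnar and Trenkle's work on language identification. The function names (`ngram_profile`, `out_of_place`, `knn_classify`) and the parameter defaults are hypothetical and chosen only for illustration.

```python
from collections import Counter


def ngram_profile(text, n=3, top=300):
    """Map the `top` most frequent character n-grams to their frequency rank."""
    text = " ".join(text.lower().split())  # normalize case and whitespace
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))[:top]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(p, q):
    """Out-of-place pseudo-distance from profile p to profile q (assumed stand-in).

    It is asymmetric as written, so it is not a true metric.
    """
    max_penalty = max(len(p), len(q))  # cost of an n-gram missing from q
    total = 0
    for gram, rank in p.items():
        total += abs(rank - q[gram]) if gram in q else max_penalty
    return total


def knn_classify(text, training_set, k=3):
    """Label `text` by majority vote among its k nearest training documents."""
    query = ngram_profile(text)
    scored = sorted(
        (out_of_place(query, ngram_profile(doc)), label)
        for doc, label in training_set
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [
        ("the team won the football match", "sport"),
        ("the bank raised interest rates", "finance"),
    ]
    print(knn_classify("the football team lost the match", train, k=1))
```

Any other pseudo-distance can be substituted for `out_of_place` without changing `knn_classify`, which is the property that makes k-NN a convenient host for comparing distance measures.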
CITATION STYLE
Gadri, S., & Moussaoui, A. (2017). Application of a new set of pseudo-distances in documents categorization. Neural Network World, 27(2), 231–245. https://doi.org/10.14311/NNW.2017.27.011