Maintaining accessibility of biomedical literature databases has led to development of text classification systems to assist human indexers by recommending thematic categories to biomedical articles. These systems rely on using machine learning methods to learn the association between the document terms and predefined categories. The accuracy of a text classification method depends on the metric used in order to assign a weight to each term. Weighting metrics can be classified as supervised or unsupervised according to whether they use prior information on the number of documents belonging to each category. In this paper, we propose two supervised weighting metrics (One-way Klosgen and Loevinger) which both improve the quality of biomedical document classification. We also show that by using moment generating function centroids, an alternative to the traditional arithmetical average centroids, a nearest centroid classifier with Loevinger metric performs significantly better than SVM on a biomedical text classification task.
CITATION STYLE
Haddoud, M., Mokhtari, A., Lecroq, T., & Abdeddaïm, S. (2016). Supervised term weights for biomedical text classification: Improvements in nearest centroid computation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9874 LNCS, pp. 98–113). Springer Verlag. https://doi.org/10.1007/978-3-319-44332-4_8
Mendeley helps you to discover research relevant for your work.