Abstract
Multi-label classification (MLC) has drawn much attention thanks to its usefulness and ubiquity in real-world applications, in which an object may be characterized by more than one label rather than by a single label as in the traditional approach. Obtaining labelled multi-label examples is costly and time-consuming; a semi-supervised learning approach should therefore be considered to take advantage of both labelled and unlabelled data. In this work, we propose a semi-supervised MLC algorithm that exploits the specific features of the prominent class label(s) chosen by a greedy approach, as an extension of the LIFT algorithm, and adopts the unlabelled-data consumption mechanism of the TESC algorithm. We also build a semi-supervised MLC application framework for Vietnamese texts with several feature enrichment and reduction steps, including (a) a stage that enriches features by adding hidden topic features and (b) a dimensionality reduction stage that removes irrelevant features. Experimental results on a dataset of hotel reviews (for tourism) indicate that a reasonable amount of unlabelled data helps to increase the F1 score. Interestingly, with a small amount of labelled data, our algorithm can reach performance comparable to that obtained with a larger amount of labelled data.
Cite
Pham, T. N., Nguyen, V. Q., Tran, V. H., Nguyen, T. T., & Ha, Q. T. (2017). A semi-supervised multi-label classification framework with feature reduction and enrichment. Journal of Information and Telecommunication, 1(4), 305–318. https://doi.org/10.1080/24751839.2017.1364925