The performance of classification models relies heavily on the quality of training data. However, label imperfection is an inherent flaw of training data, and in a big data environment it is impossible to handle manually. Various methods have been proposed to remove label noise in order to improve classification quality, with the side effect of reducing the volume of data. In this paper, we propose a knowledge-based approach for tackling mislabeled multi-class big data, in which a knowledge graph is combined with other data correction methods to detect and correct erroneous labels in big data. The knowledge graph is built from medical concepts extracted from online health consultations and medical guidance. Experimental results show that our knowledge-graph-based approach can effectively improve data quality and classification accuracy. Furthermore, this approach can be applied to other data mining tasks that require deep understanding. © 2014 Springer International Publishing.
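The abstract does not describe the correction algorithm itself, so the following is only a minimal sketch of the general idea of using a knowledge graph to check labels, not the authors' method. It assumes a toy knowledge graph that maps each medical concept to the categories it is linked to, extracts concepts by naive substring matching, and relabels a sample only when the graph favours a different category by at least `min_margin` votes. The names `KnowledgeGraph`, `extract_concepts`, `category_votes`, `correct_labels` and `min_margin` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical toy knowledge graph: each medical concept is linked to the
# categories (e.g. hospital departments) it is associated with.
KnowledgeGraph = Dict[str, Set[str]]


def extract_concepts(text: str, kg: KnowledgeGraph) -> List[str]:
    """Naive concept extraction: any KG concept appearing as a substring."""
    lowered = text.lower()
    return [concept for concept in kg if concept in lowered]


def category_votes(text: str, kg: KnowledgeGraph) -> Counter:
    """Each extracted concept casts one vote for every category it links to."""
    votes: Counter = Counter()
    for concept in extract_concepts(text, kg):
        votes.update(kg[concept])
    return votes


def correct_labels(
    samples: List[Tuple[str, str]],
    kg: KnowledgeGraph,
    min_margin: int = 2,
) -> Tuple[List[str], List[int]]:
    """Relabel a sample when the KG favours another category by >= min_margin votes.

    Returns the (possibly corrected) labels and the indices that were changed.
    """
    corrected: List[str] = []
    changed: List[int] = []
    for i, (text, label) in enumerate(samples):
        votes = category_votes(text, kg)
        if votes:
            best, best_count = votes.most_common(1)[0]
            # Counter returns 0 for a category that received no votes.
            if best != label and best_count - votes[label] >= min_margin:
                corrected.append(best)
                changed.append(i)
                continue
        # Keep the original label when the graph gives no strong evidence.
        corrected.append(label)
    return corrected, changed


kg: KnowledgeGraph = {
    "chest pain": {"cardiology"},
    "palpitation": {"cardiology"},
    "arrhythmia": {"cardiology"},
    "rash": {"dermatology"},
    "eczema": {"dermatology"},
}
samples = [
    ("Chest pain and palpitation after exercise, possible arrhythmia?", "dermatology"),
    ("Itchy rash on both arms, is it eczema?", "dermatology"),
    ("Occasional palpitation at night", "dermatology"),
]
labels, changed = correct_labels(samples, kg)
print(labels, changed)  # ['cardiology', 'dermatology', 'dermatology'] [0]
```

Here the first sample is relabelled because three cardiology concepts outweigh zero dermatology concepts, while the third keeps its label because a single concept falls below the margin. Correcting labels in place, rather than discarding suspicious samples, reflects the abstract's concern that noise removal shrinks the data volume.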
Guo, M., Liu, Y., Li, J., Li, H., & Xu, B. (2014). A knowledge based approach for tackling mislabeled multi-class big social data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8465 LNCS, pp. 349–363). Springer Verlag. https://doi.org/10.1007/978-3-319-07443-6_24