AL4LA: Active Learning for Text Labeling Based on Paragraph Vectors

Abstract

Despite the huge amount of digitized information available today, the biggest obstacle to using machine learning in text mining is the lack of tagged data sets, mainly because tagging requires a great deal of user effort that is not always viable. In this paper, with the aim of reducing the heavy workload required to manually process the contents of large volumes of documents, we present a methodology based on probabilistic inference and active learning to label documents in Spanish using a semi-supervised approach. First, a vector representation of the documents is generated; then, an interactive learning process that applies both automatic and manual labeling is proposed. To evaluate the accuracy of the predictions and the efficiency of the methodology, different configurations of the automatic and manual labeling processes have been studied. The proposed methodology reduces the need for a large corpus of manually labeled texts by introducing a self-labeling process during training. We have shown that both tagging approaches can be combined while maintaining accuracy and reducing user intervention.
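The interplay of automatic and manual labeling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the classifier, the confidence rule, or the query strategy, so everything below is an assumption made for illustration. Toy 2-D vectors stand in for paragraph vectors, a nearest-centroid model with a softmax over distances stands in for the probabilistic inference step, and the names `label_loop`, `auto_threshold`, and `oracle` are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def class_probabilities(vec, centroids):
    """Softmax over negative Euclidean distances to each class centroid."""
    scores = {label: -math.dist(vec, c) for label, c in centroids.items()}
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def label_loop(vectors, seed_labels, oracle, auto_threshold=0.8, max_rounds=20):
    """Combine self-labeling with manual queries (illustrative assumption).

    vectors:     dict mapping document id -> vector
    seed_labels: dict mapping document id -> label for the initial labeled set
    oracle:      callable standing in for the human annotator
    Returns the final labels and the number of manual queries issued.
    """
    labels = dict(seed_labels)
    manual_queries = 0
    for _ in range(max_rounds):
        unlabeled = [d for d in vectors if d not in labels]
        if not unlabeled:
            break
        # Refit the model: one centroid per class seen so far.
        by_class = {}
        for doc, label in labels.items():
            by_class.setdefault(label, []).append(vectors[doc])
        centroids = {label: centroid(vs) for label, vs in by_class.items()}
        # Predict a label and a confidence for every unlabeled document.
        prediction, confidence = {}, {}
        for doc in unlabeled:
            probs = class_probabilities(vectors[doc], centroids)
            best = max(probs, key=probs.get)
            prediction[doc], confidence[doc] = best, probs[best]
        # Automatic labeling: accept predictions at or above the threshold.
        for doc in unlabeled:
            if confidence[doc] >= auto_threshold:
                labels[doc] = prediction[doc]
        # Manual labeling: ask the annotator about the least confident document.
        remaining = [d for d in unlabeled if d not in labels]
        if remaining:
            query = min(remaining, key=confidence.get)
            labels[query] = oracle(query)
            manual_queries += 1
    return labels, manual_queries


# Toy data: two well-separated clusters plus one ambiguous document "m".
vectors = {
    "a1": [0.0, 0.1], "a2": [0.2, 0.0], "a3": [0.1, 0.3],
    "b1": [5.0, 5.1], "b2": [5.2, 4.9], "b3": [4.8, 5.0],
    "m": [2.4, 2.6],
}
truth = {
    "a1": "sports", "a2": "sports", "a3": "sports",
    "b1": "politics", "b2": "politics", "b3": "politics",
    "m": "sports",
}
seed = {"a1": "sports", "b1": "politics"}

labels, manual_queries = label_loop(vectors, seed, oracle=truth.get)
print(labels, manual_queries)
```

In this toy run, the documents close to a seed example are labeled automatically and only the ambiguous document `m` is sent to the annotator. Raising `auto_threshold` shifts work toward manual labeling, while lowering it relies more on self-labeling, which is the kind of trade-off between accuracy and user intervention that the abstract says was studied.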

Nimo-Járquez, D., Narvaez-Rios, M., Rivas, M., Yáñez, A., Bárcena-González, G., Guerrero-Lebrero, M. P., … Galindo, P. L. (2019). AL4LA: Active Learning for Text Labeling Based on Paragraph Vectors. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11506 LNCS, pp. 679–687). Springer Verlag. https://doi.org/10.1007/978-3-030-20521-8_56
