AL4LA: Active Learning for Text Labeling Based on Paragraph Vectors

Damián Nimo-Járquez; Margarita Narvaez-Rios; Mario Rivas; Andrés Yáñez; Guillermo Bárcena-González; M. Paz Guerrero-Lebrero; Elisa Guerrero; Pedro L. Galindo

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019) 11506 LNCS 679-687

DOI: 10.1007/978-3-030-20521-8_56

Abstract

Despite the huge amount of digitized information now available, the biggest obstacle to using machine learning in text mining is the lack of tagged data, mainly because tagging requires a great deal of user effort that is not always viable. In this paper, with the aim of reducing the workload required to manually process the contents of large volumes of documents, we present a methodology based on probabilistic inference and active learning to label documents in Spanish using a semi-supervised approach. First, a vector representation of the documents is generated; then an interactive learning process that applies both automatic and manual labeling is proposed. To evaluate the accuracy of the predictions and the efficiency of the methodology, different configurations of the automatic and manual labeling processes have been studied. The proposed methodology reduces the need for a large corpus of manually labeled texts by introducing a self-labeling process during training. We show that both tagging approaches can be combined while maintaining accuracy and reducing user intervention.
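The loop the abstract describes, combining automatic (self-) labeling with manual labeling over document vectors, might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the nearest-centroid classifier with a softmax confidence is a stand-in for the paper's probabilistic model, the small 2-D vectors stand in for paragraph vectors, and every name here (`label_corpus`, `auto_threshold`, `manual_per_round`, `oracle`) is hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def predict_proba(x, centroids):
    """Softmax over negative Euclidean distances to each class centroid."""
    classes = sorted(centroids)
    scores = [-math.dist(x, centroids[c]) for c in classes]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {c: e / total for c, e in zip(classes, exps)}

def label_corpus(vectors, seed_labels, oracle, auto_threshold=0.9, manual_per_round=2):
    """Label every document by mixing self-labeling and manual queries.

    vectors:          document vectors (assumed stand-ins for paragraph vectors)
    seed_labels:      dict index -> label, the initial manual labels
    oracle:           callable index -> label, standing in for the human annotator
    auto_threshold:   minimum confidence for accepting a prediction automatically
    manual_per_round: documents sent to the annotator per round (must be >= 1)
    """
    labels = dict(seed_labels)
    manual_queries = 0
    while len(labels) < len(vectors):
        # Refit: one centroid per class from everything labeled so far.
        by_class = {}
        for i, y in labels.items():
            by_class.setdefault(y, []).append(vectors[i])
        centroids = {y: centroid(vs) for y, vs in by_class.items()}

        # Score each unlabeled document as (confidence, index, predicted label).
        scored = []
        for i, x in enumerate(vectors):
            if i not in labels:
                proba = predict_proba(x, centroids)
                best = max(proba, key=proba.get)
                scored.append((proba[best], i, best))
        scored.sort()  # least confident first

        # Automatic labeling: accept predictions at or above the threshold.
        for p, i, y in scored:
            if p >= auto_threshold:
                labels[i] = y

        # Manual labeling: ask the annotator about the least confident documents.
        uncertain = [i for p, i, _ in scored if p < auto_threshold]
        for i in uncertain[:manual_per_round]:
            labels[i] = oracle(i)
            manual_queries += 1
    return labels, manual_queries

# Toy data: two clusters plus two ambiguous points between them.
vectors = [[0.0, 0.0], [0.5, 0.2], [0.2, 0.6], [2.2, 2.4],
           [5.0, 5.0], [4.6, 5.2], [5.3, 4.7], [2.8, 2.6]]
true_labels = [0, 0, 0, 0, 1, 1, 1, 1]

labels, manual_queries = label_corpus(
    vectors,
    seed_labels={0: 0, 4: 1},
    oracle=lambda i: true_labels[i],
)
print(len(labels), "documents labeled,", manual_queries, "asked of the annotator")
```

In this sketch, raising `auto_threshold` shifts work toward the annotator, while lowering it increases the share of self-labeled documents at the risk of propagating wrong labels; this is one way to picture the kind of configuration trade-off the abstract refers to.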

Cite

Nimo-Járquez, D., Narvaez-Rios, M., Rivas, M., Yáñez, A., Bárcena-González, G., Guerrero-Lebrero, M. P., … Galindo, P. L. (2019). AL4LA: Active Learning for Text Labeling Based on Paragraph Vectors. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11506 LNCS, pp. 679–687). Springer Verlag. https://doi.org/10.1007/978-3-030-20521-8_56
