Abstract
Relation extraction is a subtask of natural language processing that has improved considerably in recent years with the advent of complex pre-trained architectures. Many state-of-the-art approaches are evaluated on benchmarks of labelled sentences containing tagged entities, and require substantial pre-training and fine-tuning on task-specific data. However, in a real use case, such as a newspaper company focused mainly on local news, relations are of varied, highly specific types, virtually no annotated data exists for them, and many entities co-occur in a sentence without being related. We question the use of supervised state-of-the-art models in such a context, where resources such as time, computing power and human annotators are limited. To adapt to these constraints, we experiment with an active-learning-based relation extraction pipeline, consisting of a lightweight LSTM-based binary model that detects whether a relation exists, and a state-of-the-art model for relation classification. We compare several choices of classification model in this scenario, from basic word embedding averaging to graph neural networks and BERT-based models, as well as several active learning acquisition strategies, in order to find the most cost-efficient yet accurate approach for our use case at the largest French daily newspaper company.
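To make the notion of an active learning acquisition strategy concrete, the following is a minimal sketch of one common choice, entropy-based uncertainty sampling. The abstract does not say which strategies were compared, so this is an illustration rather than the authors' method; the function names, the three-class probability vectors and the annotation budget are all hypothetical.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_annotation(pool_probs: List[Sequence[float]], budget: int) -> List[int]:
    """Return the indices of the `budget` pool examples with the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical model outputs: one distribution over three relation types
# for each of four unlabelled sentences (illustrative values only).
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # near-uniform, highest entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]

# Indices of the two sentences a human annotator would be asked to label next.
selected = select_for_annotation(pool_probs, budget=2)
print(selected)
```

In an interactive loop, the selected sentences would be labelled, added to the training set, and the model retrained before the next round of selection.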
Cite
Mallart, C., Le Nouy, M., Gravier, G., & Sébillot, P. (2021). Active Learning for Interactive Relation Extraction in a French Newspaper’s Articles. In International Conference Recent Advances in Natural Language Processing, RANLP (pp. 886–894). Incoma Ltd. https://doi.org/10.26615/978-954-452-072-4_101