Machine reading comprehension is a task related to Question-Answering in which questions are not generic in scope but relate to a particular document. Very large corpora (SQuAD, MS MARCO) containing (document, question, answer) triplets were recently made available to the scientific community, allowing supervised methods based on deep neural networks to be developed with promising results. These methods need very large training corpora to be effective; however, such data currently exist only for English and Chinese. The aim of this study is to develop such resources for other languages by generating questions semi-automatically from the semantic Frame analysis of large corpora. The collection of natural questions is thereby limited to a validation/test set. We applied this method to the French CALOR-FRAME corpus to develop the CALOR-QUEST resource presented in this paper.
CITATION STYLE
Béchet, F., Aloui, C., Charlet, D., Damnati, G., Heinecke, J., Nasr, A., & Herlédan, F. (2019). CALOR-QUEST: Generating a training corpus for machine reading comprehension models from shallow semantic annotations. In MRQA@EMNLP 2019 - Proceedings of the 2nd Workshop on Machine Reading for Question Answering (pp. 19–26). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d19-5803