Synthetic Data Generation and Multi-Task Learning for Extracting Temporal Information from Health-Related Narrative Text

Abstract

Extracting temporal information is critical for processing health-related text. Temporal information extraction is a challenging task for language models because it requires handling both text and numbers. A further fundamental challenge is obtaining a large-scale training dataset. To address this, we propose a synthetic data generation algorithm. We also propose a novel multi-task temporal information extraction model and investigate whether multi-task learning can improve performance by exploiting additional training signals alongside the existing training data. For our experiments, we collected a custom dataset of unstructured texts containing temporal information about sleep-related activities. Experimental results show that utilising synthetic data can improve performance when the augmentation factor is 3. The results also show that, when multi-task learning is combined with an appropriate amount of synthetic data, performance improves significantly from 82. to 88.6 and from 83.9 to 91.9 in micro- and macro-average exact match scores of normalised time prediction, respectively.
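
The abstract does not include the evaluation code, but as a rough, hypothetical illustration of the reported metrics, the Python sketch below computes micro- and macro-average exact match over normalised time predictions. The per-record field names (sleep_onset, wake_time) and the dictionary-based data layout are assumptions made for this example, not the authors' actual schema.

from collections import defaultdict

def exact_match_scores(gold, pred):
    """Micro- and macro-average exact match over normalised time fields.

    `gold` and `pred` are lists of dicts mapping a (hypothetical) field name,
    e.g. "sleep_onset" or "wake_time", to a normalised time string such as
    "22:30". A prediction counts as a match only if it equals the gold
    value exactly.
    """
    per_field = defaultdict(lambda: [0, 0])  # field -> [matches, total]
    for g, p in zip(gold, pred):
        for field, g_value in g.items():
            per_field[field][1] += 1
            if p.get(field) == g_value:
                per_field[field][0] += 1

    # Micro-average: pool all fields before dividing.
    total_match = sum(m for m, _ in per_field.values())
    total_count = sum(t for _, t in per_field.values())
    micro = total_match / total_count

    # Macro-average: average the per-field exact match rates.
    macro = sum(m / t for m, t in per_field.values()) / len(per_field)
    return micro, macro

if __name__ == "__main__":
    gold = [{"sleep_onset": "22:30", "wake_time": "06:45"},
            {"sleep_onset": "23:00", "wake_time": "07:10"}]
    pred = [{"sleep_onset": "22:30", "wake_time": "06:50"},
            {"sleep_onset": "23:00", "wake_time": "07:10"}]
    micro, macro = exact_match_scores(gold, pred)
    print(f"micro EM: {micro:.3f}, macro EM: {macro:.3f}")

Micro-averaging weights every prediction equally, while macro-averaging weights every field (activity type) equally; the paper reports both, which is why the two scores in the abstract differ.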

Citation (APA)

Shim, H., Lowet, D., Luca, S., & Vanrumste, B. (2021). Synthetic Data Generation and Multi-Task Learning for Extracting Temporal Information from Health-Related Narrative Text. In W-NUT 2021 - 7th Workshop on Noisy User-Generated Text, Proceedings of the Conference (pp. 260–273). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.wnut-1.29
