ECG-QALM: Entity-Controlled Synthetic Text Generation using Contextual Q&A for NER


Abstract

State-of-the-art Named Entity Recognition (NER) methods require high-quality labeled datasets. Issues such as scarcity of labeled data, under-representation of entities, and privacy concerns with using sensitive data for training can be significant barriers. Generating synthetic data to train models is a promising solution to mitigate these problems. We propose ECG-QALM, a contextual question-answering approach that uses pre-trained language models to synthetically generate entity-controlled text. The generated text is then used to augment small labeled datasets for downstream NER tasks. We evaluate our method on two publicly available datasets. We find that ECG-QALM is capable of producing full text samples in which the desired entities appear in a controllable way, while retaining sentence coherence closest to the real-world data. Evaluations on NER tasks show significant improvements (75% to 140%) in low labeled-data regimes.
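
The abstract does not spell out how generated text becomes NER training data. As an illustration only, the sketch below assumes each synthetic sentence arrives paired with the entity surface strings and types it was asked to contain. It projects those entities onto BIO tags by exact token matching, discards sentences in which a requested entity never appears, and appends the rest to the small labeled set. The names `find_spans`, `bio_tag`, and `augment`, the whitespace tokenization, and the filtering rule are all hypothetical; this is not the authors' implementation.

```python
from typing import Dict, List, Tuple

# A labeled NER example: parallel lists of tokens and BIO tags.
Example = Tuple[List[str], List[str]]


def find_spans(tokens: List[str], span: List[str]) -> List[int]:
    """Return every start index where `span` occurs verbatim in `tokens`."""
    n = len(span)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == span]


def bio_tag(tokens: List[str], entities: Dict[str, str]) -> List[str]:
    """Tag `tokens` with BIO labels for each requested entity.

    `entities` maps a surface string (e.g. "Paris") to its type (e.g. "LOC").
    Assumption: exact, whitespace-tokenized matching; overlapping matches
    are skipped so a token is never tagged twice.
    """
    tags = ["O"] * len(tokens)
    for surface, etype in entities.items():
        span = surface.split()
        for i in find_spans(tokens, span):
            if all(t == "O" for t in tags[i:i + len(span)]):
                tags[i] = f"B-{etype}"
                for j in range(i + 1, i + len(span)):
                    tags[j] = f"I-{etype}"
    return tags


def augment(real: List[Example],
            generated: List[Tuple[str, Dict[str, str]]]) -> List[Example]:
    """Append BIO-tagged synthetic sentences to a small real dataset.

    Hypothetical quality filter: keep a synthetic sentence only if every
    requested entity actually appears in the generated text.
    """
    out = list(real)
    for text, entities in generated:
        tokens = text.split()
        if all(find_spans(tokens, s.split()) for s in entities):
            out.append((tokens, bio_tag(tokens, entities)))
    return out


if __name__ == "__main__":
    real = [(["Bob", "works", "at", "Acme"], ["B-PER", "O", "O", "B-ORG"])]
    generated = [
        ("Alice Smith visited Paris last May", {"Alice Smith": "PER", "Paris": "LOC"}),
        ("The meeting was postponed", {"Berlin": "LOC"}),  # entity missing: dropped
    ]
    for tokens, tags in augment(real, generated):
        print(list(zip(tokens, tags)))
```

Exact matching keeps the sketch short; a practical pipeline would likely need tokenizer-aware alignment and handling of punctuation attached to entity strings.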

Citation (APA)

Aggarwal, K., Jin, H., & Ahmad, A. (2023). ECG-QALM: Entity-Controlled Synthetic Text Generation using Contextual Q&A for NER. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 5649–5660). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-acl.349
