We demonstrate a simple yet effective approach to augmenting training data for multilingual named entity recognition (NER) using machine translation. Named entity spans in the original sentences are transferred to their translations via word alignment and then filtered with the baseline recognizer so that only high-quality annotations are retained. The proposed data augmentation approach improves on the XLM-RoBERTa baseline on the multilingual dataset of SemEval-2023 Task 2.
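The abstract does not specify the alignment tool, the span-projection rule, or the exact filtering criterion. The sketch below illustrates the projection-and-filter step under our own assumptions. Entity spans are token-index ranges. The word alignment has already been computed and is given as "i-j" index pairs. A projected span is the smallest contiguous target range covering all aligned target tokens. A translated sentence is kept only when the baseline recognizer, passed in as a callable, predicts exactly the projected spans. All function names are hypothetical. The translation and alignment models themselves are out of scope.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

# A span is (start, end, label) over token indices, with `end` exclusive.
Span = Tuple[int, int, str]
# Alignment as (source_index, target_index) pairs; the "i-j" text format is an
# assumption about the aligner's output, not the paper's stated tool.
Alignment = Set[Tuple[int, int]]


def parse_alignment(text: str) -> Alignment:
    """Parse an alignment string such as "0-0 1-2 2-1" into index pairs."""
    pairs = set()
    for item in text.split():
        src, tgt = item.split("-")
        pairs.add((int(src), int(tgt)))
    return pairs


def project_spans(spans: Sequence[Span], alignment: Alignment) -> Optional[List[Span]]:
    """Map each source span onto the translation via the word alignment.

    Returns None if any entity has no aligned target token or if two projected
    spans overlap, so the sentence is dropped rather than partially annotated.
    """
    projected = []
    for start, end, label in spans:
        targets = [t for s, t in alignment if start <= s < end]
        if not targets:
            return None
        # Contiguous hull of all aligned target positions (assumed rule).
        projected.append((min(targets), max(targets) + 1, label))
    projected.sort()
    for (_, prev_end, _), (next_start, _, _) in zip(projected, projected[1:]):
        if next_start < prev_end:
            return None
    return projected


def to_bio(num_tokens: int, spans: Sequence[Span]) -> List[str]:
    """Convert non-overlapping spans into BIO tags for training."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


def augment(
    src_spans: Sequence[Span],
    tgt_tokens: Sequence[str],
    alignment: Alignment,
    baseline: Callable[[Sequence[str]], List[Span]],
) -> Optional[List[str]]:
    """Project spans onto one translation and keep it only if the baseline
    recognizer predicts exactly the same spans (an assumed filter)."""
    projected = project_spans(src_spans, alignment)
    if projected is None:
        return None
    if sorted(baseline(tgt_tokens)) != projected:
        return None
    return to_bio(len(tgt_tokens), projected)


if __name__ == "__main__":
    # "Alice visited New York" -> "Alice hat New York besucht"
    spans = [(0, 1, "PER"), (2, 4, "LOC")]
    target = ["Alice", "hat", "New", "York", "besucht"]
    links = parse_alignment("0-0 1-1 1-4 2-2 3-3")
    # Stand-in for the baseline NER model's predictions on the translation.
    stub_baseline = lambda tokens: [(2, 4, "LOC"), (0, 1, "PER")]
    print(augment(spans, target, links, stub_baseline))
    # prints ['B-PER', 'O', 'B-LOC', 'I-LOC', 'O']
```

Requiring an exact match between projected and predicted spans favors precision over recall. A looser variant could keep individual entities that the baseline agrees with, but the abstract does not say which criterion the authors used.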
Poncelas, A., Tkachenko, M., & Htun, O. (2023). Sakura at SemEval-2023 Task 2: Data Augmentation via Translation. In 17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop (pp. 1718–1722). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.semeval-1.239