LexiClean: An annotation tool for rapid multi-task lexical normalisation

Abstract

NLP systems are often challenged by difficulties arising from noisy, non-standard, and domain-specific corpora. The task of lexical normalisation aims to standardise such corpora, but currently lacks suitable tools to acquire high-quality annotated data to support deep-learning-based approaches. In this paper, we present LexiClean, the first open-source web-based annotation tool for multi-task lexical normalisation. LexiClean's main contribution is support for simultaneous in situ token-level modification and annotation that can be rapidly applied corpus-wide. We demonstrate the usefulness of our tool through a case study on two sets of noisy corpora derived from the specialised domain of industrial mining. We show that LexiClean allows for the rapid and efficient development of high-quality parallel corpora. A demo of our system is available at: https://youtu.be/P7_ooKrQPDU.

Cite

Bikaun, T., French, T., Hodkiewicz, M., Stewart, M., & Liu, W. (2021). LexiClean: An annotation tool for rapid multi-task lexical normalisation. In EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (pp. 212–219). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.emnlp-demo.25
