NLP systems are often challenged by difficulties arising from noisy, non-standard, and domain-specific corpora. The task of lexical normalisation aims to standardise such corpora, but currently lacks suitable tools to acquire high-quality annotated data to support deep-learning-based approaches. In this paper, we present LexiClean, the first open-source web-based annotation tool for multi-task lexical normalisation. LexiClean's main contribution is support for simultaneous in situ token-level modification and annotation that can be rapidly applied corpus-wide. We demonstrate the usefulness of our tool through a case study on two sets of noisy corpora derived from the specialised domain of industrial mining. We show that LexiClean allows for the rapid and efficient development of high-quality parallel corpora. A demo of our system is available at: https://youtu.be/P7_ooKrQPDU.
Bikaun, T., French, T., Hodkiewicz, M., Stewart, M., & Liu, W. (2021). LexiClean: An annotation tool for rapid multi-task lexical normalisation. In EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (pp. 212–219). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.emnlp-demo.25