We describe the work carried out by the DCU-ADAPT team on the Lexical Normalisation shared task at W-NUT 2015. We train a generalised perceptron to annotate noisy text with edit operations that, when executed, normalise the text. Features include character n-grams, hidden-layer activations of a recurrent neural network language model, character class, and eligibility for editing under the task rules. We combine the predictions of 25 models trained on subsets of the training data by selecting the most likely normalisation according to a character language model. We compare the generalised perceptron to conditional random fields, which we restrict to smaller amounts of training data due to memory constraints. Furthermore, we make a first attempt to verify Chrupała's (2014) hypothesis that a noisy channel model would not be useful here because of the limited amount of training data available for the source language model, i.e. the language model over normalised text.
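To make the edit-operation framing and the character-language-model selection concrete, here is a minimal sketch, not the authors' implementation: a hypothetical inventory of per-character operations (COPY, SUB, DEL, APPEND) is executed to normalise a noisy token, and a simple add-one-smoothed character bigram model stands in for the character language model used to choose among candidate normalisations from the ensemble. All names (apply_edits, CharBigramLM, select_normalisation) and the operation labels are assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical edit-operation labels, one per input character:
#   ("COPY",)      keep the character as-is
#   ("SUB", s)     replace the character with the string s
#   ("DEL",)       delete the character
#   ("APPEND", s)  keep the character, then insert the string s after it
def apply_edits(token, ops):
    """Execute a sequence of per-character edit operations on a token."""
    out = []
    for ch, op in zip(token, ops):
        if op[0] == "COPY":
            out.append(ch)
        elif op[0] == "SUB":
            out.append(op[1])
        elif op[0] == "DEL":
            pass
        elif op[0] == "APPEND":
            out.append(ch + op[1])
    return "".join(out)

# e.g. a single SUB operation expands "u" into "you"
assert apply_edits("u", [("SUB", "you")]) == "you"

class CharBigramLM:
    """Tiny character bigram LM with add-one smoothing (illustrative only)."""
    def __init__(self, corpus):
        self.bigrams = Counter()
        self.unigrams = Counter()
        for text in corpus:
            padded = "^" + text + "$"
            for a, b in zip(padded, padded[1:]):
                self.bigrams[(a, b)] += 1
                self.unigrams[a] += 1
        self.vocab = len(set(self.unigrams)) + 1

    def logprob(self, text):
        padded = "^" + text + "$"
        return sum(
            math.log((self.bigrams[(a, b)] + 1)
                     / (self.unigrams[a] + self.vocab))
            for a, b in zip(padded, padded[1:])
        )

def select_normalisation(candidates, lm):
    """Pick the candidate normalisation the character LM scores highest."""
    return max(candidates, key=lm.logprob)

lm = CharBigramLM(["you are here", "see you tomorrow"])
print(select_normalisation(["you", "yu", "uu"], lm))  # prints "you"
```

The selection step mirrors the ensembling described above: each of the 25 models proposes a normalisation, and the character language model acts as the tie-breaker across those candidates.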
Wagner, J., & Foster, J. (2015). DCU-ADAPT: Learning Edit Operations for Microblog Normalisation with the Generalised Perceptron. In Proceedings of the Workshop on Noisy User-generated Text (WNUT 2015) (pp. 93–98). Association for Computational Linguistics. https://doi.org/10.18653/v1/W15-4314