Twitter messages are written in an informal style, which hinders many information retrieval and natural language processing applications. Existing normalization systems have two major drawbacks. First, they rely heavily on large-scale annotated training data. Second, they assume that each nonstandard token is recovered to exactly one standard word. However, many nonstandard tokens should be recovered to two or more standard words, so the problem remains highly challenging. To address these issues, we propose an unsupervised normalization system based on context similarity. The proposed system does not require any annotated data, and it can recover a nonstandard token to one or more standard words. Results show that the proposed approach achieves state-of-the-art performance.
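The abstract does not say how context similarity is measured or how candidates are generated, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes bag-of-words context windows, cosine similarity, a small unlabeled corpus, and a hand-supplied candidate list in which each candidate is a tuple of one or more standard words (the 1-to-N case). All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, start, length, window=2):
    """Bag of words within `window` tokens on each side of tokens[start:start+length]."""
    left = tokens[max(0, start - window):start]
    right = tokens[start + length:start + length + window]
    return Counter(left + right)


def phrase_contexts(corpus, phrase, window=2):
    """Sum the context vectors of every occurrence of `phrase` (a tuple of words)."""
    total = Counter()
    n = len(phrase)
    for sent in corpus:
        for i in range(len(sent) - n + 1):
            if tuple(sent[i:i + n]) == phrase:
                total.update(context_vector(sent, i, n, window))
    return total


def cosine(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def normalize(tweet, index, candidates, corpus, window=2):
    """Replace tweet[index] with the candidate (1..N words) whose corpus
    contexts are most similar to the token's context in the tweet.
    Hypothetical illustration: candidate generation is assumed, not modeled."""
    ctx = context_vector(tweet, index, 1, window)
    scored = [(cosine(ctx, phrase_contexts(corpus, c, window)), c) for c in candidates]
    _, best = max(scored)
    return tweet[:index] + list(best) + tweet[index + 1:]


corpus = [
    "i do not know what to say".split(),
    "honestly i do not know why".split(),
    "see you tomorrow at noon".split(),
]
candidates = [("i", "do", "not", "know"), ("tomorrow",), ("i", "know")]
print(normalize("lol idk what to say".split(), 1, candidates, corpus))
```

Here "idk" shares the context words "what" and "to" with the four-word phrase in the toy corpus, so it is expanded to four tokens, while "tmrw" in "see u tmrw at noon" would map to the single word "tomorrow". Scoring multi-word phrases by their own corpus occurrences keeps one- and many-word candidates directly comparable.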
Ren, Y., Deng, J., & Ji, D. (2016). Twitter normalization via 1-to-N recovering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10041 LNCS, pp. 19–34). Springer Verlag. https://doi.org/10.1007/978-3-319-48740-3_2