Twitter normalization via 1-to-N recovering


Abstract

Twitter messages are written in an informal style, which hinders many information retrieval and natural language processing applications. Existing normalization systems have two major drawbacks. The first is that these methods largely require large-scale annotated training data. The second is that these systems assume that a nonstandard token is recovered to one standard word. However, many nonstandard tokens should be recovered to two or more standard words, so the problem remains highly challenging. To address these issues, we propose an unsupervised normalization system based on context similarity. The proposed system does not require any annotated data, and it can recover a nonstandard token to one or more standard words. Results show that the proposed approach achieves state-of-the-art performance.


Ren, Y., Deng, J., & Ji, D. (2016). Twitter normalization via 1-to-N recovering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10041 LNCS, pp. 19–34). Springer Verlag. https://doi.org/10.1007/978-3-319-48740-3_2
