Twitter normalization via 1-to-N recovering

Yafeng Ren; Jiayuan Deng; Donghong Ji

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016) 10041 LNCS 19-34

DOI: 10.1007/978-3-319-48740-3_2

Abstract

Twitter messages are written in an informal style, which hinders many information retrieval and natural language processing applications. Existing normalization systems have two major drawbacks. The first is that most of these methods require large-scale annotated training data. The second is that these systems assume that a nonstandard token is recovered to exactly one standard word. However, many nonstandard tokens should be recovered to two or more standard words, so the problem remains highly challenging. To address these issues, we propose an unsupervised normalization system based on context similarity. The proposed system does not require any annotated data, and it can recover a nonstandard token to one or more standard words. Results show that the proposed approach achieves state-of-the-art performance.
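The abstract gives no algorithmic detail, so the following is only a toy sketch of the general idea of 1-to-N recovery by context similarity, not the authors' system. The corpus, the candidate table, the window size, and all function names are hypothetical choices made for illustration: each candidate expansion (one or more standard words) is scored by the cosine similarity between the words around the nonstandard token in the tweet and the words around that candidate in a small clean corpus.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data for illustration only; none of it comes from the paper.
CLEAN_CORPUS = [
    "i do not know what to do",
    "i do not know why she left",
    "they know what to do now",
    "the dunk was great in the game",
]
# Each nonstandard token maps to candidate expansions of one OR MORE words.
CANDIDATES = {"dunno": [("dunk",), ("do", "not", "know")]}


def context_of(tokens, start, length, window=2):
    """Bag of words within `window` tokens of the span [start, start + length)."""
    left = tokens[max(0, start - window):start]
    right = tokens[start + length:start + length + window]
    return Counter(left + right)


def corpus_context(phrase):
    """Sum the contexts of every occurrence of `phrase` in CLEAN_CORPUS."""
    total = Counter()
    n = len(phrase)
    for sentence in CLEAN_CORPUS:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == phrase:
                total += context_of(tokens, i, n)
    return total


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def normalize(tweet):
    """Replace each known nonstandard token with its best-scoring expansion."""
    tokens = tweet.split()
    out = []
    for i, tok in enumerate(tokens):
        if tok in CANDIDATES:
            ctx = context_of(tokens, i, 1)
            best = max(CANDIDATES[tok], key=lambda c: cosine(ctx, corpus_context(c)))
            out.extend(best)  # may append several standard words (1-to-N)
        else:
            out.append(tok)
    return " ".join(out)


print(normalize("i dunno what to do"))  # i do not know what to do
```

No annotated tweet/normalization pairs are used here; the only resources are clean text and a candidate list, which is the sense in which such a sketch is unsupervised. How a real system would generate and rank candidates is left open by the abstract.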

Cite

Ren, Y., Deng, J., & Ji, D. (2016). Twitter normalization via 1-to-N recovering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10041 LNCS, pp. 19–34). Springer Verlag. https://doi.org/10.1007/978-3-319-48740-3_2
