Toward Tweets Normalization Using Maximum Entropy

Abstract

The use of social network services and microblogs, such as Twitter, has created valuable text resources, which contain extremely noisy text. Twitter messages contain so much noise that it is difficult to use them in natural language processing tasks. This paper presents a new approach using the maximum entropy model for normalizing Tweets. The proposed approach addresses words that are unseen in the training phase: although the maximum entropy model needs a training dataset to adjust its parameters, the approach can normalize data not seen during training. The principle of maximum entropy emphasizes incorporating the available features into a uniform model. First, we generate a set of normalization candidates for each out-of-vocabulary word based on lexical, phonemic, and morphophonemic similarities. Then, three different probability scores are calculated for each candidate using positional indexing, a dependency-based frequency feature, and a language model. After the optimal values of the model parameters are obtained in the training phase, the model calculates the final probability for each candidate. The approach achieved an 83.12 BLEU score on a test set of 2,000 Tweets. Our experimental results show that the maximum entropy approach significantly outperforms previous well-known normalization approaches.
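The candidate-ranking step described in the abstract can be sketched as a log-linear (maximum entropy) combination of the three feature scores. The sketch below is illustrative only: the candidate words, feature values, and equal weights are assumptions for the example, not the paper's actual data or learned parameters.

```python
import math

def maxent_score(candidates, weights):
    """Rank normalization candidates with a log-linear (maximum entropy) model.

    `candidates` maps each candidate word to a tuple of three feature scores
    (e.g., positional-indexing, dependency-based frequency, and language-model
    probabilities); `weights` are the model parameters learned in training.
    """
    # Unnormalized score for each candidate: exp(sum_i lambda_i * f_i)
    raw = {
        cand: math.exp(sum(w * f for w, f in zip(weights, feats)))
        for cand, feats in candidates.items()
    }
    z = sum(raw.values())  # partition function normalizes scores into a distribution
    return {cand: s / z for cand, s in raw.items()}

# Hypothetical feature scores for candidates of the OOV token "2moro"
candidates = {
    "tomorrow": (0.8, 0.7, 0.6),
    "tumor":    (0.1, 0.2, 0.3),
}
probs = maxent_score(candidates, weights=(1.0, 1.0, 1.0))
best = max(probs, key=probs.get)  # highest-probability normalization
```

In a maximum entropy model the weights would be fit on annotated training pairs (e.g., by maximizing conditional log-likelihood); here they are fixed at 1.0 purely for demonstration.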

Citation (APA)

Saloot, M. A., Idris, N., Shuib, L., Raj, R. G., & Aw, A. (2015). Toward Tweets Normalization Using Maximum Entropy. In ACL-IJCNLP 2015 - Workshop on Noisy User-Generated Text, WNUT 2015 - Proceedings of the Workshop (pp. 19–27). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w15-4303
