Language processing tools suffer significant performance drops in the social media domain because its language evolves continuously. Transforming non-standard words into their standard forms has been studied as a step towards proper processing of ill-formed texts. This work describes a normalization system that considers contextual and lexical similarities between standard and non-standard words to remove noise from texts. A bipartite graph that represents the contexts shared by words in a large unlabeled text corpus is used to explore normalization candidates via random walks. The input context of a non-standard word in a given sentence is tailored in cases where a direct match to the shared contexts is not possible. The performance of the system was evaluated on Turkish social media texts.
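The graph-based candidate search described above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that a context is the pair of immediately neighbouring tokens, that the random walk is a single exact word-to-context-to-word step rather than a sampled multi-step walk, and that lexical similarity is approximated with `difflib.SequenceMatcher`. The function names (`build_bipartite`, `two_step_walk`, `rank_candidates`), the English toy corpus, and the product scoring rule are all hypothetical, and context tailoring is not modelled here.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def build_bipartite(sentences):
    """Build word<->context edge counts; a context is (left token, right token)."""
    word_to_ctx = defaultdict(Counter)
    ctx_to_word = defaultdict(Counter)
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            ctx = (tokens[i - 1], tokens[i + 1])
            word_to_ctx[tokens[i]][ctx] += 1
            ctx_to_word[ctx][tokens[i]] += 1
    return word_to_ctx, ctx_to_word


def two_step_walk(word, word_to_ctx, ctx_to_word):
    """Exact distribution of one word -> context -> word step from `word`."""
    probs = Counter()
    ctxs = word_to_ctx.get(word, Counter())
    total = sum(ctxs.values())
    for ctx, count in ctxs.items():
        ctx_total = sum(ctx_to_word[ctx].values())
        for other, other_count in ctx_to_word[ctx].items():
            probs[other] += (count / total) * (other_count / ctx_total)
    return probs


def rank_candidates(noisy, vocab, word_to_ctx, ctx_to_word):
    """Rank standard words by walk probability times lexical similarity (assumed rule)."""
    walk = two_step_walk(noisy, word_to_ctx, ctx_to_word)
    scored = []
    for cand, prob in walk.items():
        if cand in vocab and cand != noisy:
            lexical = SequenceMatcher(None, noisy, cand).ratio()
            scored.append((prob * lexical, cand))
    return [cand for _, cand in sorted(scored, reverse=True)]


# Toy unlabeled corpus (illustrative only; the paper uses Turkish social media text).
corpus = [
    "see you tomorrow morning",
    "see u tomorrow morning",
    "see you tomorrow night",
    "see them tomorrow morning",
    "thank you very much",
    "thank u very much",
]
w2c, c2w = build_bipartite(corpus)
vocab = {"see", "you", "them", "tomorrow", "morning", "night", "thank", "very", "much"}
print(rank_candidates("u", vocab, w2c, c2w))
```

In this toy setting the non-standard token "u" shares both of its contexts with "you", so "you" ranks first. A word that shares a context but has no characters in common, such as "them", receives a zero lexical score under the assumed scoring rule.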
Demir, S. (2016). Context tailoring for text normalization. In Proceedings of TextGraphs@NAACL-HLT 2016: The 10th Workshop on Graph-Based Methods for Natural Language Processing (pp. 6–14). The Association for Computer Linguistics. https://doi.org/10.18653/v1/w16-1402