#Turki$hTweets: A benchmark dataset for Turkish text correction

Abstract

#Turki$hTweets is a benchmark dataset for correcting user misspellings, introduced as the first public Turkish dataset for this task. It provides correct/incorrect word annotations together with a detailed misspelling category formulation based on real user data. We evaluated four state-of-the-art approaches on the dataset to provide a preliminary analysis and support reproducibility. The annotated dataset is publicly available at https://github.com/atubakoksal/annotated_tweets.

Cite

Koksal, A. T., Bozal, O., Yurekli, E., & Gezici, G. (2020). #Turki$hTweets: A benchmark dataset for Turkish text correction. In Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 4190–4198). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.findings-emnlp.374
