Dialect features typically do not make it into formal writing, but flourish in social media. This enables large-scale variational studies. We focus on three phonological features of African American Vernacular English and their manifestation as spelling variations on Twitter. We discuss to what extent our data can be used to falsify eight sociolinguistic hypotheses. To go beyond the spelling level, we require automatic analysis such as POS tagging, but social media language still challenges language technologies. We show how both newswire- and Twitter-adapted state-of-the-art POS taggers perform significantly worse on AAVE tweets, suggesting that large-scale dialect studies of language variation beyond the surface level are not feasible with out-of-the-box NLP tools.
CITATION
Jørgensen, A. K., Hovy, D., & Søgaard, A. (2015). Challenges of studying and processing dialects in social media. In ACL-IJCNLP 2015 - Workshop on Noisy User-Generated Text, WNUT 2015 - Proceedings of the Workshop (pp. 9–18). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w15-4302