Unsupervised text segmentation based on native language characteristics

Abstract

Most work on segmenting text does so on the basis of topic changes, but it can be of interest to segment by other, stylistically expressed characteristics such as change of authorship or native language. We propose a Bayesian unsupervised text segmentation approach to the latter. While baseline models achieve essentially random segmentation on our task, indicating its difficulty, a Bayesian model that incorporates appropriately compact language models and alternating asymmetric priors can achieve scores on the standard metrics around halfway to perfect segmentation.

Cite

Malmasi, S., Dras, M., Johnson, M., Du, L., & Wolska, M. (2017). Unsupervised text segmentation based on native language characteristics. In ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) (Vol. 1, pp. 1457–1469). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/P17-1134
