An analysis of the ability of statistical language models to capture the structural properties of language

Citations: 4
Readers: 71 (Mendeley users who have this article in their library)

Abstract

We investigate the characteristics and quantifiable predispositions of both n-gram and recurrent neural language models in the framework of language generation. In modern applications, neural models have been widely adopted, as they have empirically provided better results. However, there is a lack of deep analysis of these models and how they relate to real language and its structural properties. We attempt such an investigation by analyzing corpora generated by sampling from the models. The results are compared to each other and to the results of the same analysis applied to the training corpus. We carry out these experiments on variants of Kneser-Ney smoothed n-gram models and basic recurrent neural language models. Our results reveal a number of distinctive characteristics of each model, and offer insights into their behavior. Our general approach also provides a framework in which to perform further analysis of language models.
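The paper's general approach — train a model, sample a corpus from it, and compare a structural statistic of the samples against the same statistic on the training corpus — can be illustrated with a minimal sketch. The sketch below is not the authors' code: it substitutes a plain unsmoothed bigram model for the Kneser-Ney and recurrent models they study, and uses sentence-length distribution as one example structural property; the toy corpus and all function names are assumptions for illustration.

```python
# Minimal sketch of the sample-and-compare evaluation framework.
# Assumption: an unsmoothed bigram model stands in for the Kneser-Ney
# smoothed n-gram and RNN models analyzed in the paper.
import random
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count bigram successors, padding with <s>/</s> boundary markers."""
    successors = defaultdict(Counter)
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            successors[prev][cur] += 1
    return successors

def sample_sentence(successors, max_len=50, rng=random):
    """Sample one sentence by walking the bigram chain from <s>."""
    out, prev = [], "<s>"
    while len(out) < max_len:
        choices = successors[prev]
        words = list(choices)
        weights = [choices[w] for w in words]
        cur = rng.choices(words, weights)[0]
        if cur == "</s>":
            break
        out.append(cur)
        prev = cur
    return out

def length_distribution(sentences):
    """One structural property: the distribution of sentence lengths."""
    counts = Counter(len(s) for s in sentences)
    total = sum(counts.values())
    return {k: v / total for k, v in sorted(counts.items())}

if __name__ == "__main__":
    # Toy training corpus; the paper uses large natural-language corpora.
    train = [
        "the cat sat on the mat".split(),
        "the dog sat on the rug".split(),
        "a cat saw the dog".split(),
    ]
    model = train_bigram(train)
    generated = [sample_sentence(model) for _ in range(1000)]
    print("training lengths: ", length_distribution(train))
    print("generated lengths:", length_distribution(generated))
```

Any corpus-level statistic (word-frequency rank curves, parse-depth distributions, and so on) can be dropped in for length_distribution; the divergence between the two printed distributions is what the paper's analysis quantifies for each model family.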

Citation (APA)

Ghodsi, A., & DeNero, J. (2016). An analysis of the ability of statistical language models to capture the structural properties of language. In INLG 2016 - 9th International Natural Language Generation Conference, Proceedings of the Conference (pp. 227–231). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-6637
