An analysis of the ability of statistical language models to capture the structural properties of language

Citations: 4
Readers: 71 (Mendeley users who have this article in their library)

Abstract

We investigate the characteristics and quantifiable predispositions of both n-gram and recurrent neural language models in the framework of language generation. In modern applications, neural models have been widely adopted, as they have empirically provided better results. However, there is a lack of deep analysis of these models and how they relate to real language and its structural properties. We attempt such an investigation by analyzing corpora generated by sampling from the models. The results are compared to each other and to the results of the same analysis applied to the training corpus. We carry out these experiments on variants of Kneser-Ney smoothed n-gram models and basic recurrent neural language models. Our results reveal a number of distinctive characteristics of each model, and offer insights into their behavior. Our general approach also provides a framework in which to perform further analysis of language models.
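The paper's general approach — train a model, sample a corpus from it, and compare a structural statistic of the samples against the same statistic on the training corpus — can be illustrated with a minimal sketch. The sketch below is not the authors' code: it substitutes a plain unsmoothed bigram model for the Kneser-Ney and recurrent models they study, and uses sentence-length distribution as one example structural property; the toy corpus and all function names are assumptions for illustration.

```python
# Minimal sketch of the sample-and-compare evaluation framework.
# Assumption: an unsmoothed bigram model stands in for the Kneser-Ney
# smoothed n-gram and RNN models analyzed in the paper.
import random
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count bigram successors, padding with <s>/</s> boundary markers."""
    successors = defaultdict(Counter)
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            successors[prev][cur] += 1
    return successors

def sample_sentence(successors, max_len=50, rng=random):
    """Sample one sentence by walking the bigram chain from <s>."""
    out, prev = [], "<s>"
    while len(out) < max_len:
        choices = successors[prev]
        words = list(choices)
        weights = [choices[w] for w in words]
        cur = rng.choices(words, weights)[0]
        if cur == "</s>":
            break
        out.append(cur)
        prev = cur
    return out

def length_distribution(sentences):
    """One structural property: the distribution of sentence lengths."""
    counts = Counter(len(s) for s in sentences)
    total = sum(counts.values())
    return {k: v / total for k, v in sorted(counts.items())}

if __name__ == "__main__":
    # Toy training corpus; the paper uses large natural-language corpora.
    train = [
        "the cat sat on the mat".split(),
        "the dog sat on the rug".split(),
        "a cat saw the dog".split(),
    ]
    model = train_bigram(train)
    generated = [sample_sentence(model) for _ in range(1000)]
    print("training lengths: ", length_distribution(train))
    print("generated lengths:", length_distribution(generated))
```

Any corpus-level statistic (word-frequency rank curves, parse-depth distributions, and so on) can be dropped in for length_distribution; the divergence between the two printed distributions is what the paper's analysis quantifies for each model family.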

Citation (APA)

Ghodsi, A., & DeNero, J. (2016). An analysis of the ability of statistical language models to capture the structural properties of language. In INLG 2016 - 9th International Natural Language Generation Conference, Proceedings of the Conference (pp. 227–231). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-6637
