Evaluating machine-generated narrative text has traditionally been challenging, particularly for subjective qualities such as interest or believability. Recent improvements in narrative text generation have been driven largely by the emergence of transformer-based language models trained on massive quantities of data, resulting in higher-quality output. In this study, a corpus of stories is generated using the pre-trained GPT-Neo transformer model, with human-written prompts as inputs on which to base the narrative text. The generated stories are then evaluated through both human judgement and two automated metrics, BERTScore and BERT Next Sentence Prediction, with the aim of determining whether the automatic scores correlate with the human judgements. The results show that the human evaluations diverge from the modern automated metrics, suggesting further work is required before automated metrics can reliably identify text that humans find interesting.
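The abstract does not specify the exact models or decoding settings used, but the general pipeline it describes can be sketched as follows. This is a minimal, illustrative sketch only: the GPT-Neo checkpoint (`EleutherAI/gpt-neo-125M`), the BERT checkpoint (`bert-base-uncased`), the prompt, the reference placeholder, and the sentence-pair NSP coherence scheme are all assumptions, not the authors' reported configuration.

```python
# Sketch (not the authors' exact pipeline): generate a story with GPT-Neo from a
# human-written prompt, then score it with BERTScore and BERT Next Sentence
# Prediction. Model sizes, prompt, and decoding settings are illustrative.
import torch
from transformers import pipeline, BertTokenizer, BertForNextSentencePrediction
from bert_score import score as bertscore

prompt = "The lighthouse keeper had not seen a ship in forty days."  # hypothetical prompt

# 1) Generate a story continuation with a pre-trained GPT-Neo model.
generator = pipeline("text-generation", model="EleutherAI/gpt-neo-125M")
story = generator(prompt, max_new_tokens=200, do_sample=True, top_p=0.9)[0]["generated_text"]

# 2) BERTScore: compare the generated story against a reference text
#    (e.g., a human-written continuation of the same prompt).
reference = "..."  # placeholder for the human-written reference story
P, R, F1 = bertscore([story], [reference], lang="en")
print(f"BERTScore F1: {F1.item():.3f}")

# 3) BERT Next Sentence Prediction: probability that sentence B follows
#    sentence A, used here as a rough coherence signal between adjacent
#    sentences of the generated story.
nsp_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
nsp_model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

sentences = [s.strip() for s in story.split(".") if s.strip()]
probs = []
for sent_a, sent_b in zip(sentences, sentences[1:]):
    inputs = nsp_tokenizer(sent_a, sent_b, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = nsp_model(**inputs).logits
    # Index 0 of the NSP head corresponds to "sentence B follows sentence A".
    probs.append(torch.softmax(logits, dim=1)[0, 0].item())
print(f"Mean NSP coherence: {sum(probs) / len(probs):.3f}")
```

How such per-story scores are aggregated and compared against human ratings (e.g., via correlation coefficients) is described in the full paper rather than in this abstract.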
CITATION STYLE
Callan, D., & Foster, J. (2023). How interesting and coherent are the stories generated by a large-scale neural language model? Comparing human and automatic evaluations of machine-generated text. Expert Systems, 40(6). https://doi.org/10.1111/exsy.13292