That Looks Hard: Characterizing Linguistic Complexity in Humans and Language Models


Abstract

This paper investigates the relationship between two complementary perspectives in the human assessment of sentence complexity and how they are modeled in a neural language model (NLM). The first perspective takes into account multiple online behavioral metrics obtained from eye-tracking recordings. The second one concerns the offline perception of complexity measured by explicit human judgments. Using a broad spectrum of linguistic features modeling lexical, morpho-syntactic, and syntactic properties of sentences, we perform a comprehensive analysis of linguistic phenomena associated with the two complexity viewpoints and report similarities and differences. We then show the effectiveness of linguistic features when explicitly leveraged by a regression model for predicting sentence complexity and compare its results with the ones obtained by a fine-tuned neural language model. We finally probe the NLM's linguistic competence before and after fine-tuning, highlighting how linguistic information encoded in representations changes when the model learns to predict complexity.
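As a rough illustration of the feature-based regression approach the abstract mentions, the sketch below fits a linear regression on explicit sentence-level features to predict complexity judgments and evaluates it with cross-validation. Everything here is assumed for illustration: the synthetic features, the targets, and the choice of scikit-learn's LinearRegression stand in for the paper's actual feature set and models.

    # Illustrative sketch only; the paper's actual pipeline differs.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)

    # Hypothetical sentence-level linguistic features (stand-ins for the
    # lexical, morpho-syntactic, and syntactic properties the paper uses),
    # e.g. sentence length, mean word length, parse-tree depth.
    n_sentences = 200
    features = rng.normal(size=(n_sentences, 3))

    # Hypothetical complexity targets: offline judgments on a numeric scale,
    # generated here as a noisy linear function of the features.
    judgments = 4 + features @ np.array([0.8, 0.3, 0.5])
    judgments += rng.normal(scale=0.5, size=n_sentences)

    # Linear regression over explicit features, mirroring the feature-based
    # baseline the abstract describes, scored by cross-validated R^2.
    model = LinearRegression()
    scores = cross_val_score(model, features, judgments, cv=5, scoring="r2")
    print(f"mean cross-validated R^2: {scores.mean():.3f}")

Cross-validated R^2 is one plausible way to compare such an explicit-feature model against a fine-tuned neural language model on the same prediction task.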

Citation (APA)

Sarti, G., Brunato, D., & Dell’Orletta, F. (2021). That Looks Hard: Characterizing Linguistic Complexity in Humans and Language Models. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2021) (pp. 48–60). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.cmcl-1.5
