Linguistic Features in German BERT: The Role of Morphology, Syntax, and Semantics in Multi-Class Text Classification

Abstract

Most studies on the linguistic information encoded by BERT focus on English. Our study examines a monolingual German BERT model using a semantic classification task on newspaper articles, analysing the linguistic features that influence classification decisions through SHAP values. We use the TüBa-D/Z corpus, a resource with gold-standard annotations for a set of linguistic features, including POS, inflectional morphology, and phrasal, clausal, and dependency structures. Semantic features of nouns are evaluated via the GermaNet ontology using shared hypernyms. Our results indicate that the features identified in English also affect classification in German, but they also suggest important language- and task-specific features.

Cite

Beyer, H., & Frassinelli, D. (2025). Linguistic Features in German BERT: The Role of Morphology, Syntax, and Semantics in Multi-Class Text Classification. In Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies: Long Papers, NAACL-HLT 2025 (Vol. 4, pp. 28–39). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2025.naacl-srw.3
