Many tasks in text-based computational social science (CSS) involve the classification of political statements into categories based on a domain-specific codebook. To be useful for CSS analysis, these categories must be fine-grained. The typically skewed distribution of fine-grained categories, however, results in a challenging classification problem on the NLP side. This paper proposes to make use of the hierarchical relations among categories typically present in such codebooks: e.g., markets and taxation are both subcategories of economy, while borders is a subcategory of security. We use these ontological relations as prior knowledge to establish additional constraints on the learned model, improving performance overall and, in particular, for infrequent categories. We evaluate several lightweight variants of this intuition by extending state-of-the-art transformer-based text classifiers on two datasets and multiple languages. We find the most consistent improvement for an approach based on regularization.
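The abstract does not spell out the regularizer itself. As an illustration only, and not the paper's formulation, one common way to turn a category hierarchy into a training constraint is to add a penalty that pulls the output-layer weight vectors of sibling categories (e.g., markets and taxation under economy) toward their shared centroid, so that a rare category can borrow statistical strength from its frequent siblings. The function names `hierarchy_penalty` and `regularized_loss`, the centroid choice, and the weight `lam` are all hypothetical; a real classifier would compute this on its trainable weights inside the training loop.

```python
from collections import defaultdict
from typing import Dict, List

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hierarchy_penalty(weights: Dict[str, Vector], parent: Dict[str, str]) -> float:
    """Hypothetical hierarchy regularizer: for every coarse category, sum the
    squared distances between each child's weight vector and the children's mean."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for label, coarse in parent.items():
        groups[coarse].append(label)

    penalty = 0.0
    for children in groups.values():
        dim = len(weights[children[0]])
        # Centroid of all fine-grained classes sharing this parent.
        centroid = [sum(weights[c][i] for c in children) / len(children) for i in range(dim)]
        penalty += sum(squared_distance(weights[c], centroid) for c in children)
    return penalty


def regularized_loss(task_loss: float, weights: Dict[str, Vector],
                     parent: Dict[str, str], lam: float = 0.1) -> float:
    """Task loss (e.g., cross-entropy) plus the weighted hierarchy penalty (assumed form)."""
    return task_loss + lam * hierarchy_penalty(weights, parent)


# Toy hierarchy taken from the abstract's example; the vectors are made up.
parent = {"markets": "economy", "taxation": "economy", "borders": "security"}
weights = {"markets": [1.0, 0.0], "taxation": [0.0, 1.0], "borders": [2.0, 2.0]}

print(hierarchy_penalty(weights, parent))                # 1.0
print(regularized_loss(2.0, weights, parent, lam=0.5))   # 2.5
```

Here the two economy subcategories each sit at squared distance 0.5 from their centroid, while borders, the only security subcategory, contributes nothing. Increasing `lam` trades fit to the labels for closer agreement among siblings.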
Dayanik, E., Blessing, A., Blokker, N., Haunss, S., Kuhn, J., Lapesa, G., & Padó, S. (2022). Improving Neural Political Statement Classification with Class Hierarchical Information. In Findings of the Association for Computational Linguistics: ACL 2022 (pp. 2367–2382). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.findings-acl.186