Introducing the notion of ‘contrast’ features for language technology

Marina Santini; Benjamin Danielsson; Arne Jönsson

Conference Proceedings

Communications in Computer and Information Science (2019) 1062 189-198

DOI: 10.1007/978-3-030-27684-3_24

Abstract

In this paper, we explore whether there exist ‘contrast’ features that help recognize whether a text variety is a genre or a domain. We carry out our experiments on the text varieties included in the Swedish national corpus, the Stockholm-Umeå Corpus (SUC), and build several text classification models based on text complexity features, grammatical features, bag-of-words features and word embeddings. Results show that text complexity features and grammatical features systematically perform better on genres than on domains. This indicates that these features can be used as ‘contrast’ features: when the nature of a text category is in doubt, they help bring it to light.

Citation (APA)

Santini, M., Danielsson, B., & Jönsson, A. (2019). Introducing the notion of ‘contrast’ features for language technology. In Communications in Computer and Information Science (Vol. 1062, pp. 189–198). Springer Verlag. https://doi.org/10.1007/978-3-030-27684-3_24
