Slovene multi-word units: Identification, categorization, and representation

Polona Gantar; Jaka Čibej; Mija Bon

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11755 LNAI, pp. 99–112 (2019)

DOI: 10.1007/978-3-030-30135-4_8

Abstract

In this paper, we present the results of a manual annotation of a Slovene training corpus with multi-word units (MWUs) relevant for inclusion in a lexicon of Slovene MWUs. We analyze the annotations in terms of (a) the frequency with which a string has been identified as an MWU, (b) the degree to which the annotators agree on the category of the identified MWU, and (c) the degree to which the annotators agree on the range of the MWU in terms of its lexicalized elements. The results of the analysis will be useful at different stages of the compilation of a Slovene MWU lexicon. The list of dictionary-relevant MWUs obtained in the annotation task will be used to enrich the lexicon and to train models for the automatic identification of MWUs in running text. The findings will also help revise the criteria for identifying and categorizing dictionary-relevant MWUs as opposed to free phrases. In addition, they will help define more clearly the distinction between the lexicalized elements of MWUs and the more or less stable elements of their textual environment. This distinction will be useful when determining the canonical forms of MWUs in the lexicon on the one hand, and their relation to their variable elements and syntactic conversions on the other.
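The three measures named in the abstract, (a) identification frequency, (b) category agreement, and (c) agreement on range, can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not say how the measures were computed, so the formulas below (share of annotators who marked the string, share of the most frequent category label, and token overlap as intersection over union) are assumptions, and the annotator names, category labels, and token indices are made-up toy data.

```python
from collections import Counter

# Hypothetical toy data for ONE candidate string in the corpus.
# Each annotator either marks an MWU (token indices + category) or not (None).
# The category labels here are illustrative, not the paper's typology.
annotations = {
    "ann1": {"tokens": {3, 4, 5}, "category": "idiom"},
    "ann2": {"tokens": {4, 5}, "category": "idiom"},
    "ann3": {"tokens": {3, 4, 5}, "category": "collocation"},
    "ann4": None,  # this annotator did not mark an MWU here
}


def identification_rate(anns):
    """(a) Share of annotators who identified the string as an MWU."""
    marked = [a for a in anns.values() if a is not None]
    return len(marked) / len(anns)


def category_agreement(anns):
    """(b) Share of marking annotators who chose the most frequent category."""
    labels = [a["category"] for a in anns.values() if a is not None]
    if not labels:
        return 0.0
    top_count = Counter(labels).most_common(1)[0][1]
    return top_count / len(labels)


def span_agreement(anns):
    """(c) Overlap of marked token sets: |intersection| / |union|."""
    spans = [a["tokens"] for a in anns.values() if a is not None]
    if not spans:
        return 0.0
    shared = set.intersection(*spans)
    covered = set.union(*spans)
    return len(shared) / len(covered)


print(identification_rate(annotations))  # 3 of 4 annotators marked it
print(category_agreement(annotations))   # 2 of 3 chose the same category
print(span_agreement(annotations))       # 2 shared tokens out of 3 covered
```

In this toy case the annotators mostly agree that an MWU is present but differ on whether token 3 belongs to it, which is the kind of disagreement about lexicalized elements versus textual environment that the abstract describes. A chance-corrected coefficient would be a reasonable alternative to the raw proportions used here.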

Cite

Gantar, P., Čibej, J., & Bon, M. (2019). Slovene multi-word units: Identification, categorization, and representation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11755 LNAI, pp. 99–112). Springer. https://doi.org/10.1007/978-3-030-30135-4_8
