Automatic genre identification: a survey

Taja Kuzman; Nikola Ljubešić

Journal Article, Open Access

Language Resources and Evaluation (2023)

DOI: 10.1007/s10579-023-09695-8

2 Citations

7 Readers

Abstract

Automatic genre identification (AGI) is a text classification task focused on genres, i.e., text categories defined by the author’s purpose, the common function of the text, and the text’s conventional form. Genre information has been shown to benefit a wide range of disciplines, including linguistics, corpus linguistics, computational linguistics, natural language processing, information retrieval and information security. Consequently, over the past 20 years, numerous researchers have collected genre datasets with the aim of developing an efficient genre classifier. However, their approaches to defining genre schemata, collecting data and annotating it manually vary substantially, resulting in significantly different datasets. As most AGI experiments are dataset-dependent, a sufficient understanding of the differences between the available genre datasets is of great importance for researchers venturing into this area. In this paper, we present a detailed overview of different approaches to each step of the AGI task, from the definition of the genre concept and the genre schema, to dataset collection and annotation methods, and, finally, to machine learning strategies. Special focus is given to describing the most relevant genre schemata and datasets, and details on the availability of all the datasets are provided. In addition, the paper presents recent advances in machine learning approaches to automatic genre identification and concludes by proposing directions for developing a stable multilingual genre classifier.

Cite

Kuzman, T., & Ljubešić, N. (2023). Automatic genre identification: a survey. Language Resources and Evaluation. https://doi.org/10.1007/s10579-023-09695-8
