Parts-of-Speech (PoS) Analysis and Classification of Various Text Genres

Akshay Mendhakar; Darshan H S

Journal Article (Open Access)

Corpus-based Studies across Humanities (2024) 1(1) 99-131

DOI: 10.1515/csh-2023-0002

Abstract

Natural language processing (NLP) has made significant leaps over the past two decades owing to advances in machine learning algorithms. Text classification is pivotal today because of the wide range of digital documents now available, and researchers have proposed many feature classes for it. Genre classification forms the basis for more advanced tasks such as native language identification, readability assessment, and author identification, all of which depend on the linguistic composition and complexity of the text. Rather than extracting hundreds of variables, this study presents a simple premise: text classification using parts-of-speech (PoS) as the only text feature. A new dataset gathered from Project Gutenberg is introduced, and a PoS analysis was carried out on each text in it. The texts were then grouped into fictional and non-fictional texts, and their classification accuracy was measured using an artificial neural network (ANN) classifier. The results indicate an overall classification accuracy of 98 % for genre classification and 35 % for sub-genre classification. These results highlight the importance of PoS not only as a feature for text processing but also as the sole feature for text classification.
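
As a rough illustration of the premise above, the sketch below turns one PoS-tagged text into a vector of relative tag frequencies, the kind of input a classifier such as an ANN could be trained on. It is a minimal sketch under stated assumptions, not the authors' pipeline: the tag inventory `TAGSET`, the function name `pos_profile`, and the hand-tagged sample are all hypothetical, and the abstract does not say which tagger, tagset, or network architecture was used.

```python
from collections import Counter

# Hypothetical tag inventory (Universal PoS-style labels).
# Assumption: the tagset used in the study may differ.
TAGSET = ["NOUN", "VERB", "ADJ", "ADV", "PRON", "DET", "ADP", "OTHER"]


def pos_profile(tagged_tokens):
    """Return the relative frequency of each tag in TAGSET for one text.

    tagged_tokens is a list of (word, tag) pairs, as produced by any
    PoS tagger. Tags outside TAGSET are counted under "OTHER".
    """
    counts = Counter(
        tag if tag in TAGSET else "OTHER" for _, tag in tagged_tokens
    )
    total = sum(counts.values())
    if total == 0:
        # Empty text: return an all-zero vector instead of dividing by zero.
        return [0.0] * len(TAGSET)
    return [counts[tag] / total for tag in TAGSET]


# Toy, hand-tagged sentence standing in for real tagger output.
sample = [
    ("The", "DET"),
    ("old", "ADJ"),
    ("captain", "NOUN"),
    ("sailed", "VERB"),
    ("home", "ADV"),
]
profile = pos_profile(sample)
```

Each text yields one fixed-length vector regardless of its length, so texts of very different sizes can be compared and passed to the same classifier.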

Cite

Mendhakar, A., & H S, D. (2024). Parts-of-Speech (PoS) Analysis and Classification of Various Text Genres. Corpus-Based Studies across Humanities, 1(1), 99–131. https://doi.org/10.1515/csh-2023-0002
