Parts-of-Speech (PoS) Analysis and Classification of Various Text Genres

  • Mendhakar A
  • H S D

Abstract

Natural language processing (NLP) has made significant leaps over the past two decades owing to advances in machine learning algorithms. Text classification has become pivotal given the wide range of digital documents now available. Researchers have proposed multiple feature classes for classification. Genre classification tasks form the basis for advanced techniques such as native language identification, readability assessment, and author identification. These tasks depend on the linguistic composition and complexity of the text. Rather than extracting hundreds of variables, this study presents a simple premise: text classification using parts-of-speech (PoS) as the only text feature. A new dataset gathered from Project Gutenberg is introduced. PoS analysis was carried out on each text in the dataset. The texts were then grouped into fictional and non-fictional texts, and classification accuracy was measured using an artificial neural network (ANN) classifier. The results indicate an overall classification accuracy of 98 % for genre classification and 35 % for sub-genre classification. These results highlight the importance of PoS not only as a feature for text processing but also as the sole feature for text classification.
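
As a rough illustration of what a PoS-only feature might look like, the sketch below turns one already-tagged text into a vector of relative tag frequencies. It is a minimal example under stated assumptions, not the authors' pipeline: the abstract does not name the tagger, the tagset, or the exact feature definition, so the coarse `TAGSET`, the `pos_profile` helper, and the sample sentence are all hypothetical.

```python
from collections import Counter

# Hypothetical coarse tagset; the abstract does not say which tagset was used.
TAGSET = ["NOUN", "VERB", "ADJ", "ADV", "PRON", "DET", "ADP", "CONJ", "NUM", "OTHER"]


def pos_profile(tagged_tokens, tagset=TAGSET):
    """Return the relative frequency of each tag in `tagset` for one text.

    `tagged_tokens` is a list of (word, tag) pairs produced by any PoS tagger.
    Tags outside `tagset` are folded into the "OTHER" bucket.
    """
    counts = Counter(tag if tag in tagset else "OTHER" for _, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        # An empty text yields an all-zero profile instead of dividing by zero.
        return [0.0] * len(tagset)
    return [counts[tag] / total for tag in tagset]


# Hand-tagged toy sentence (assumed tags, for illustration only).
sample = [("The", "DET"), ("ship", "NOUN"), ("sailed", "VERB"),
          ("slowly", "ADV"), (".", "PUNCT")]
profile = pos_profile(sample)
```

One such vector per text could then be passed to any classifier, for example a small neural network, with the fiction or non-fiction label as the target.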

Cite

Mendhakar, A., & H S, D. (2024). Parts-of-Speech (PoS) Analysis and Classification of Various Text Genres. Corpus-Based Studies across Humanities, 1(1), 99–131. https://doi.org/10.1515/csh-2023-0002
