POS word class based categorization of Gurmukhi language stemmed stop words

Abstract

Literature in Indian languages must be classified so that it can be retrieved easily. For a Punjabi literature classifier, five categories (nature, romantic, religious, patriotic and philosophical) are manually populated with 250 poems. These poems are pre-processed through data cleaning, tokenization, bag-of-words, stop word identification and stemming phases. Because no Punjabi stop word list is available in the public domain, 256 stop words were collected manually from poetry and articles. After stemming, 184 unique stemmed words are identified. Based on part-of-speech tagging, these 184 stop words are categorized into 98 adverbs, 7 conjunctions, 43 verbs, 24 pronouns and 12 miscellaneous words. The 184 unique stemmed words are being released for use by other language processing algorithms for Punjabi. This paper concentrates on providing a better and deeper understanding of Punjabi stop words in light of Punjabi grammar and part-of-speech based word class categorization.
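The pre-processing phases named in the abstract (tokenization, stop word identification, stemming, bag-of-words) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stop words are a handful of common Punjabi function words chosen for the example (not the released 184-word list), the suffix list is a hypothetical toy stemmer, and the function names are assumptions. Only the category counts are taken from the abstract.

```python
import re
from collections import Counter

# Illustrative Punjabi function words only; NOT the 184-word list from the paper.
STOP_WORDS = {"ਦਾ", "ਦੇ", "ਦੀ", "ਹੈ", "ਅਤੇ", "ਨੂੰ"}

# Hypothetical plural suffixes for a toy stemmer, longest first (ੀਆਂ, ਾਂ).
# The paper's actual stemming rules are not given in the abstract.
SUFFIXES = ("\u0A40\u0A06\u0A02", "\u0A3E\u0A02")

# Runs of characters from the Unicode Gurmukhi block (U+0A00 to U+0A7F).
# Punctuation such as the danda (U+0964) falls outside it and is dropped.
GURMUKHI_TOKEN = re.compile(r"[\u0A00-\u0A7F]+")


def tokenize(text):
    """Data cleaning and tokenization: keep only Gurmukhi-script tokens."""
    return GURMUKHI_TOKEN.findall(text)


def stem(word):
    """Strip the first matching suffix, keeping at least two code points."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, drop stop words, stem, and return a bag-of-words Counter."""
    content = [t for t in tokenize(text) if t not in STOP_WORDS]
    return Counter(stem(t) for t in content)


# Part-of-speech category sizes as reported in the abstract.
CATEGORY_COUNTS = {
    "adverb": 98,
    "conjunction": 7,
    "verb": 43,
    "pronoun": 24,
    "miscellaneous": 12,
}

# The five categories account for all 184 unique stemmed stop words.
assert sum(CATEGORY_COUNTS.values()) == 184
```

Calling `preprocess` on the text of a poem returns stem frequencies that a classifier could consume as features; in practice the stop word set would be replaced with the released list and the stemmer with a proper Punjabi one.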

Jasleen, K., & Jatinderkumar, R. S. (2016). Pos word class based categorization of gurmukhi language stemmed stop words. In Smart Innovation, Systems and Technologies (Vol. 51, pp. 3–10). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-30927-9_1
