Literature in Indian languages must be classified so that it can be retrieved easily. In the Punjabi literature classifier, five categories (nature, romantic, religious, patriotic and philosophical) are manually populated with 250 poems. These poems are pre-processed through data cleaning, tokenization, bag-of-words construction, stop word identification and stemming phases. Because no Punjabi stop word list is available in the public domain, 256 stop words were collected manually from poetry and articles. After stemming, 184 unique stemmed words are identified. Based on part-of-speech tagging, these 184 stemmed stop words are categorized into 98 adverbs, 7 conjunctions, 43 verbs, 24 pronouns and 12 miscellaneous words. The 184 unique stemmed words are being released for use by other Punjabi language processing algorithms. This paper concentrates on providing a better and deeper understanding of Punjabi stop words in light of Punjabi grammar and part-of-speech based word class categorization.
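The pre-processing phases named in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the three sample stop words, the single plural suffix, and the function names `tokenize`, `stem`, `split_stop_words` and `unique_stems` are all hypothetical and are not taken from the paper's released list or stemmer rules.

```python
import re

# Illustrative assumptions, NOT the paper's resources:
# a tiny stop-word sample and a single plural suffix.
SAMPLE_STOP_WORDS = {"\u0A26\u0A3E", "\u0A26\u0A47", "\u0A28\u0A42\u0A70"}  # ਦਾ, ਦੇ, ਨੂੰ
SAMPLE_SUFFIXES = ("\u0A3E\u0A02",)  # ਾਂ (a common plural ending)


def tokenize(text):
    """Split text into runs of Gurmukhi characters (U+0A00 to U+0A7F)."""
    return re.findall(r"[\u0A00-\u0A7F]+", text)


def stem(token, suffixes=SAMPLE_SUFFIXES):
    """Strip the longest matching suffix, keeping at least two characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if token.endswith(suffix) and len(token) - len(suffix) >= 2:
            return token[: -len(suffix)]
    return token


def split_stop_words(tokens, stop_words=SAMPLE_STOP_WORDS):
    """Separate tokens into (content words, stop words)."""
    content = [t for t in tokens if t not in stop_words]
    stops = [t for t in tokens if t in stop_words]
    return content, stops


def unique_stems(words):
    """Stem each word and drop duplicates, preserving first-seen order."""
    return list(dict.fromkeys(stem(w) for w in words))
```

Under these assumptions, `unique_stems` mirrors the step in which the collected stop words shrink to a smaller set of unique stems: inflected variants that reduce to the same stem are counted once.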
Jasleen, K., & Jatinderkumar, R. S. (2016). POS word class based categorization of Gurmukhi language stemmed stop words. In Smart Innovation, Systems and Technologies (Vol. 51, pp. 3–10). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-30927-9_1