Text Recognition with k-means Clustering

Mohammad Iman Jamnejad; Ali Heidarzadegan; Mohsen Meshki

Journal Article

Research in Computing Science (2014) 84(1) 29-40

DOI: 10.13053/rcs-84-1-3

Abstract

A thesaurus is a reference work that groups words by similarity of meaning (listing synonyms and sometimes antonyms), in contrast to a dictionary, which gives definitions and pronunciations. This paper proposes an approach that uses a very large thesaurus to improve the classification performance of Persian texts. The method is flexible: it recognizes and categorizes Persian texts using the thesaurus as auxiliary knowledge. With the thesaurus enabled, the method obtains a more representative set of word frequencies from the corpus than it does with the thesaurus disabled. The thesaurus we use encodes two types of word relationships. This is the first attempt to use a Persian thesaurus in Persian information retrieval. The k-nearest neighbor classifier, the decision tree classifier, and the k-means clustering algorithm are applied to the frequency-based features. Experimental results indicate that enabling the thesaurus significantly improves the method's performance in both text classification and clustering.
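
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: fold synonyms into one canonical term before counting word frequencies, then cluster the resulting vectors with k-means. The toy `THESAURUS` (English words standing in for Persian ones), the function names, and the first-k-points seeding are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy thesaurus: each word maps to a canonical representative.
# The thesaurus described in the abstract is Persian and far larger.
THESAURUS = {
    "car": "car", "automobile": "car", "vehicle": "car",
    "doctor": "doctor", "physician": "doctor",
}

def word_frequencies(text, use_thesaurus=True):
    """Count words, optionally folding synonyms into one canonical term."""
    words = text.lower().split()
    if use_thesaurus:
        words = [THESAURUS.get(w, w) for w in words]
    return Counter(words)

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=10):
    """Plain Lloyd's k-means, seeded with the first k points (an assumption
    made here so the result is deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_documents(docs, k, use_thesaurus=True):
    """Build frequency vectors over a shared vocabulary and cluster them."""
    freqs = [word_frequencies(d, use_thesaurus) for d in docs]
    vocabulary = sorted(set().union(*freqs))
    vectors = [[f.get(term, 0) for term in vocabulary] for f in freqs]
    return kmeans(vectors, k)

docs = [
    "car repair car",
    "doctor visit doctor",
    "automobile repair automobile",
    "physician visit physician",
]
print(word_frequencies(docs[2]))   # synonyms counted under "car"
print(cluster_documents(docs, k=2))
```

With the thesaurus enabled, "automobile" and "car" contribute to the same feature, so documents about the same topic share more of their frequency vector. A real system would also need Persian tokenization and normalization, which this sketch omits.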

Cite

APA

Iman Jamnejad, M., Heidarzadegan, A., & Meshki, M. (2014). Text Recognition with k-means Clustering. Research in Computing Science, 84(1), 29–40. https://doi.org/10.13053/rcs-84-1-3
