Text Recognition with k-means Clustering

  • Iman Jamnejad M
  • Heidarzadegan A
  • Meshki M

Abstract

A thesaurus is a reference work that groups words by similarity of meaning (synonyms and sometimes antonyms), in contrast to a dictionary, which gives definitions and pronunciations. This paper proposes an approach that improves the classification performance of Persian texts by drawing on a very large thesaurus. The method is flexible: it recognizes and categorizes Persian texts using the thesaurus as an auxiliary knowledge source. With the thesaurus enabled, the method obtains a more representative set of word frequencies from the corpus than with the thesaurus disabled. The thesaurus encodes two types of word relationships. This is the first attempt to use a Persian thesaurus in the field of Persian information retrieval. The k-nearest neighbor classifier, the decision tree classifier, and the k-means clustering algorithm are applied to the frequency-based features. Experimental results indicate that enabling the thesaurus significantly improves the method's performance in both text classification and clustering.
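The abstract does not spell out how the thesaurus changes the word-frequency features, so the following is only a minimal sketch of one plausible reading: synonyms are folded onto a single canonical term before counting. The toy English thesaurus, the `word_frequencies` helper, and the sample sentence are all hypothetical and stand in for the Persian resources used in the paper.

```python
from collections import Counter

# Hypothetical toy thesaurus (assumption, not from the paper):
# each word maps to the canonical term of its synonym group.
TOY_THESAURUS = {
    "car": "car",
    "automobile": "car",
    "doctor": "doctor",
    "physician": "doctor",
}


def word_frequencies(tokens, thesaurus=None):
    """Count tokens, optionally folding synonyms onto one canonical term."""
    if thesaurus is None:
        # Thesaurus disabled: every surface form is counted separately.
        return Counter(tokens)
    # Thesaurus enabled: words missing from the thesaurus are kept unchanged.
    return Counter(thesaurus.get(tok, tok) for tok in tokens)


tokens = "the automobile and the car need a doctor not a physician".split()
plain = word_frequencies(tokens)
merged = word_frequencies(tokens, TOY_THESAURUS)

print("without thesaurus:", plain["car"], plain["automobile"])
print("with thesaurus:   ", merged["car"], merged["automobile"])
```

With the thesaurus disabled, the counts for a concept are spread across its surface forms; with it enabled, they accumulate on one feature. Frequency vectors built this way could then be passed to a k-nearest neighbor classifier, a decision tree, or k-means, as the abstract describes.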

Cite

Iman Jamnejad, M., Heidarzadegan, A., & Meshki, M. (2014). Text Recognition with k-means Clustering. Research in Computing Science, 84(1), 29–40. https://doi.org/10.13053/rcs-84-1-3
