An automatic text document classification using modified weight and semantic method

K. Meena; R. Lawrance

Journal Article

International Journal of Innovative Technology and Exploring Engineering (2019) 8(12) 2608-2622

DOI: 10.35940/ijitee.K2123.1081219

5 Citations

14 Readers

Abstract

Text mining is the process of extracting useful information from structured or unstructured sources, and feature extraction is one of its vital parts. This paper analyses several feature extraction methods and proposes an enhanced one. The Term Frequency-Inverse Document Frequency (TF-IDF) method assigns a weight to a term based only on the term's occurrence. Here it is extended to increase the weight of the most important words and decrease the weight of less important words; the extended method is called M-TF-IDF. Because M-TF-IDF does not consider the semantic similarity between terms, Latent Semantic Analysis (LSA) is used for feature extraction and dimensionality reduction. To analyze the performance of the proposed feature extraction methods, two benchmark datasets (Reuters-21578 R8 and 20 Newsgroups) and two real-time datasets (a descriptive type answer dataset and a crime news dataset) are used. The proposed method is applied to descriptive type answer evaluation. Manual evaluation of descriptive type papers may lead to discrepancies in marks, which this automated evaluation eliminates. The method has been tested with answers written by learners of our department. It allows more accurate assessment and more effective evaluation of the learning process, and it offers benefits such as reduced time and effort, efficient use of resources, reduced burden on the faculty, and increased reliability of results. The proposed method is also used to analyze documents containing details about events in and around Madurai city, a sensitive place in the southern part of Tamil Nadu, India. The documents were collected from The Hindu archives. Each news document is classified as crime or not crime, and the analysis also identifies the month in which the crime rate is highest. This analysis can be used to reduce the crime rate in the future.
The Support Vector Machine (SVM) classification algorithm is used to classify the datasets. The experimental analysis and results show that the proposed feature extraction methods outperform the existing feature extraction methods.
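
The abstract does not give the M-TF-IDF formula, so the following is only a minimal sketch of the standard TF-IDF baseline that the paper modifies, assuming the common weighting tf(t, d) × log(N / df(t)). The function name `tf_idf` and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes the textbook weighting tf(t, d) * log(N / df(t)).
    This is the baseline scheme, not the paper's M-TF-IDF.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Relative term frequency scaled by inverse document frequency.
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already tokenized and lower-cased.
docs = [
    "police report theft in city".split(),
    "city council opens new park".split(),
    "theft suspect arrested by police".split(),
]
weights = tf_idf(docs)
```

Under this formula a term that occurs in every document gets weight zero, since log(N / N) = 0, and a term confined to one document gets the largest inverse document frequency. The weights depend only on occurrence counts, which is the limitation the abstract attributes to TF-IDF.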

Cite

Meena, K., & Lawrance, R. (2019). An automatic text document classification using modified weight and semantic method. International Journal of Innovative Technology and Exploring Engineering, 8(12), 2608–2622. https://doi.org/10.35940/ijitee.K2123.1081219
