A neural network document classifier with linguistic feature selection

Abstract

This article presents a neural network document classifier with linguistic feature selection and multi-category output. It consists of a feature selection unit and a hierarchical neural network classification unit. In the feature selection unit, we extract terms from the original documents by text processing, and then analyze the conformity and uniformity of each term using an entropy function that measures the term's significance. Terms with high significance are selected as input features for the neural network document classifiers. To reduce the input dimension, we merge synonyms: from the uniformity analysis, we obtain a term similarity matrix by a fuzzy relation operation, and use it to construct a synonym thesaurus. In the hierarchical neural network classification unit, we adopt the well-known back-propagation learning model to build the hierarchical classification units. Our experiments use a product description database from an electronic commerce company. The experimental results show that the classifier is accurate enough to assist human classification, and that it can save considerable manpower and working time when classifying a large database.
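
The abstract does not give the entropy function itself, so the following Python sketch is only an illustration of the general idea of entropy-based term selection, not the authors' formulation. It assumes that a term is significant when its occurrences are concentrated in few categories (low normalized entropy over categories). The function names, the threshold value, and the toy product descriptions are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def term_category_entropy(docs):
    """Return {term: normalized entropy over categories, in [0, 1]}.

    docs is a list of (category, list_of_terms) pairs. A value near 0 means
    the term occurs almost only in one category; a value near 1 means it is
    spread evenly over all categories. This is an assumed significance
    measure, not the one defined in the paper.
    """
    counts = defaultdict(Counter)  # term -> category -> occurrence count
    categories = set()
    for category, terms in docs:
        categories.add(category)
        for term in terms:
            counts[term][category] += 1

    # Normalize by the maximum possible entropy, log(number of categories).
    max_entropy = math.log(len(categories)) if len(categories) > 1 else 1.0

    entropy = {}
    for term, per_category in counts.items():
        total = sum(per_category.values())
        h = -sum((c / total) * math.log(c / total) for c in per_category.values())
        entropy[term] = h / max_entropy
    return entropy


def select_terms(docs, threshold=0.5):
    """Keep terms whose normalized entropy is at most the (assumed) threshold."""
    scores = term_category_entropy(docs)
    return sorted(term for term, h in scores.items() if h <= threshold)


# Toy product descriptions (hypothetical data, for illustration only).
docs = [
    ("printer", ["laser", "toner", "usb"]),
    ("printer", ["inkjet", "toner", "usb"]),
    ("monitor", ["lcd", "panel", "usb"]),
    ("monitor", ["lcd", "hdmi", "usb"]),
]

selected = select_terms(docs)
```

In this toy data, "usb" appears equally often in both categories, so it carries no class information and is dropped, while category-specific terms such as "toner" and "lcd" are kept as candidate input features.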

Cite

Lee, H. M., Chen, C. M., & Hwang, C. W. (2000). A neural network document classifier with linguistic feature selection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1821, pp. 555–560). Springer Verlag. https://doi.org/10.1007/3-540-45049-1_66