Effectiveness of syntactic information for document classification

Abstract

This paper compares the effectiveness of document classification algorithms for a highly inflectional and derivational language that forms monolithic compound noun terms, such as Korean. The system has three phases: (1) morphological analysis with a Korean morphological analyser called HAM [10], (2) compound noun phrase analysis and extraction of terms whose syntactic categories are noun, proper noun, verb, and adjective, and (3) document classification with various algorithms based on preferred class score heuristics. We focus on comparing document classification methods, including a simple voting method and preferred class score heuristics that employ two factors, ICF (inverse class frequency) and IDF (inverse document frequency), with and without term frequency weighting. In addition, the paper compares algorithms that use different class feature sets filtered by the four syntactic categories. Compared with algorithms that do not use syntactic information for class feature sets, the algorithms that do use it differ in performance by between -3.3% and +4.7%. Of the 20 algorithms tested, PCSIDF FV (i.e. Filtering Verb Terms) and Weighted PCSIDF FV perform best (an F-measure of 74.2%). For the Weighted PCSICF algorithm, using syntactic information to select class feature sets decreased document classification performance by 1.3% to 3.3%.
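The abstract does not give the scoring formulas, so the following is only a minimal sketch of what an ICF-weighted class score with optional term frequency weighting and syntactic-category filtering might look like. It assumes ICF is computed as log(number of classes / number of classes whose feature set contains the term) and that "filtering" a category means dropping terms with that tag. The function names (`filter_by_category`, `icf_weights`, `classify`) and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def filter_by_category(tagged_terms, drop):
    """Keep only the terms whose syntactic category is not in `drop`.

    tagged_terms: list of (term, category) pairs, e.g. ("win", "verb").
    Assumption: 'filtering verb terms' means removing them.
    """
    return [term for term, category in tagged_terms if category not in drop]


def icf_weights(class_terms):
    """Assumed ICF: log(total classes / classes whose feature set has the term)."""
    n_classes = len(class_terms)
    class_freq = Counter(t for terms in class_terms.values() for t in terms)
    return {t: math.log(n_classes / cf) for t, cf in class_freq.items()}


def classify(doc_terms, class_terms, weights, use_tf=False):
    """Sum the weights of document terms found in each class feature set
    and return the highest-scoring class (None if nothing matches)."""
    tf = Counter(doc_terms)
    scores = defaultdict(float)
    for label, features in class_terms.items():
        for term, count in tf.items():
            if term in features:
                # With use_tf, repeated terms contribute proportionally more.
                scores[label] += weights.get(term, 0.0) * (count if use_tf else 1)
    return max(scores, key=scores.get) if scores else None


# Toy class feature sets (illustrative only).
class_terms = {
    "sports": {"ball", "team", "win"},
    "politics": {"vote", "party", "win"},
    "tech": {"chip", "code"},
}
weights = icf_weights(class_terms)
doc = ["ball", "team", "vote", "vote", "vote"]
unweighted = classify(doc, class_terms, weights)
weighted = classify(doc, class_terms, weights, use_tf=True)
```

In this toy example the unweighted score favours the class that matches more distinct terms, while the term-frequency-weighted score favours the class whose term is repeated most, which is the kind of difference the weighted and unweighted variants in the abstract would be expected to expose.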

Cite

Min, K., & Wilson, W. H. (2003). Effectiveness of syntactic information for document classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2903, pp. 992–1002). Springer Verlag. https://doi.org/10.1007/978-3-540-24581-0_85
