Related factors of document classification performance in a highly inflectional language

Abstract

This paper describes the relationships between document classification performance and the factors that affect it, for a highly inflectional language that forms monolithic compound noun terms. The factors are the number of class feature sets, the size of the training or testing documents, the ratio of overlapping class features among 8 classes, and the ratio of non-overlapping class feature sets. The system has three phases: morphological analysis with a Korean morphological analyser called HAM [11]; compound noun phrase analysis and extraction of terms whose syntactic categories are noun, name, verb, and adjective; and document classification with an algorithm based on preferred class score heuristics. The best algorithm in this paper, Weighted PCSICF, which is based on inverse class frequency, shows an inversely proportional relationship between its performance and both the number of class feature sets and the ratio of non-overlapping class feature sets. © Springer-Verlag 2003.
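
To make the idea of inverse class frequency concrete, the following is a minimal sketch, not the paper's Weighted PCSICF algorithm, whose exact definition is not given in this abstract. It assumes a common form of the weight, log(N / cf), where N is the number of classes and cf is the number of classes whose feature set contains the term, and it assumes that a class score is the sum of the weights of the document terms found in that class's feature set. The function names and the toy feature sets are hypothetical.

```python
import math
from collections import Counter


def inverse_class_frequency(class_features):
    """Weight each term by log(N / cf) (assumed form, not from the paper).

    N is the number of classes; cf is the number of classes whose
    feature set contains the term. Terms shared by every class get 0.
    """
    n_classes = len(class_features)
    cf = Counter(term for feats in class_features.values() for term in set(feats))
    return {term: math.log(n_classes / count) for term, count in cf.items()}


def classify(doc_terms, class_features, icf):
    """Score each class by summing the ICF weights of matching document terms."""
    scores = {label: 0.0 for label in class_features}
    for term in doc_terms:
        for label, feats in class_features.items():
            if term in feats:
                scores[label] += icf[term]
    # The preferred class is the one with the highest accumulated score.
    return max(scores, key=scores.get), scores


# Toy data (hypothetical): "team" overlaps both classes, so its weight is 0.
class_features = {
    "sports": {"goal", "team", "match"},
    "economy": {"market", "team", "stock"},
}
icf = inverse_class_frequency(class_features)
label, scores = classify(["team", "goal", "match", "market"], class_features, icf)
print(label, scores)
```

Under these assumptions, a term that appears in several class feature sets contributes less to every class score, which is one way in which the amount of overlap between class features can affect classification.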

Cite

Min, K. (2004). Related factors of document classification performance in a highly inflectional language. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2690, 645–652. https://doi.org/10.1007/978-3-540-45080-1_87
