This paper describes the relationships between document classification performance and its related factors for a highly inflectional language that forms monolithic compound noun terms. The factors are the number of class feature sets, the size of the training or testing documents, the ratio of overlapping class features among 8 classes, and the ratio of non-overlapping class feature sets. The system is composed of three phases: a Korean morphological analyser called HAM [11]; compound noun phrase analysis and extraction of terms whose syntactic categories are noun, name, verb, and adjective; and an effective document classification algorithm based on preferred class score heuristics. The best algorithm in this paper, Weighted PCSICF, which is based on inverse class frequency, shows an inversely proportional relationship between its performance and both the number of class feature sets and the ratio of non-overlapping class feature sets. © Springer-Verlag 2003.
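To make "inverse class frequency" and "ratio of overlapping class features" concrete, here is a minimal Python sketch under stated assumptions. It is not the paper's Weighted PCSICF formula, which the abstract does not give. It assumes each class is represented by a set of feature terms (as HAM-based extraction might produce), weights a term by ICF(t) = log(N / cf(t)), where N is the number of classes and cf(t) is the number of class feature sets containing t, and scores a document against each class by the ICF-weighted frequency of its terms found in that class's feature set. The function names, the toy classes, and the English terms are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def inverse_class_frequency(class_features):
    """ICF(t) = log(N / cf(t)); N = number of classes,
    cf(t) = number of class feature sets that contain term t."""
    n = len(class_features)
    cf = Counter()
    for feats in class_features.values():
        cf.update(set(feats))
    return {t: math.log(n / c) for t, c in cf.items()}

def classify(doc_terms, class_features, icf):
    """Hypothetical scoring: sum term frequency * ICF over the document
    terms present in each class feature set; return the top class."""
    tf = Counter(doc_terms)
    scores = {
        label: sum(freq * icf[t] for t, freq in tf.items() if t in feats)
        for label, feats in class_features.items()
    }
    return max(scores, key=scores.get), scores

def overlap_ratio(class_features):
    """Fraction of distinct features that occur in more than one class
    feature set; the non-overlapping ratio is 1 minus this value."""
    cf = Counter()
    for feats in class_features.values():
        cf.update(set(feats))
    shared = sum(1 for c in cf.values() if c > 1)
    return shared / len(cf)

# Toy feature sets (hypothetical; the paper uses 8 classes of Korean terms).
class_features = {
    "sports": {"game", "team", "score"},
    "economy": {"market", "stock", "score"},
    "politics": {"election", "party", "market"},
}
icf = inverse_class_frequency(class_features)
best, scores = classify(["team", "game", "score", "market"], class_features, icf)
print(best)                            # sports
print(overlap_ratio(class_features))   # 2 of 7 distinct features are shared
```

A term shared by every class gets ICF = log(1) = 0 and so cannot discriminate, which is one intuition for why more overlap between class feature sets can make classification harder.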
Min, K. (2004). Related factors of document classification performance in a highly inflectional language. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2690, 645–652. https://doi.org/10.1007/978-3-540-45080-1_87