Preferred document classification for a highly inflectional/derivational language

Kyongho Min; William H. Wilson; Yoo Jin Moon

Conference Proceedings

Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (2002) 2557 12-23

DOI: 10.1007/3-540-36187-1_2

Abstract

This paper describes methods of document classification for a highly inflectional/derivational language that forms monolithic compound noun terms, such as Dutch and Korean. The system consists of three phases: (1) a Korean morphological analyzer called HAM (Kang, 1993); (2) compound noun phrase analysis applied to the output of HAM, followed by extraction of terms whose syntactic categories are noun, name (proper noun), verb, and adjective; and (3) an effective document classification algorithm based on preferred class score heuristics. The paper focuses on comparing document classification methods, including a simple heuristic method and preferred class score heuristics that employ two factors, ICF (inverse class frequency) and IDF (inverse document frequency), with and without term frequency weighting. In addition, the paper describes a simple classification approach that needs no learning algorithm, in contrast to a vector space model with complex training and classification procedures such as cosine similarity measurement. The experimental results show 95.7% correct classification of 720 training documents and 63.8% to 71.3% correct classification of 80 randomly chosen test documents, depending on the method.
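As a rough illustration of how ICF and IDF can be combined into a class score without any learning step, the Python sketch below scores each class by summing ICF × IDF weights of the test document's terms that also occur in that class's training documents, optionally multiplied by the term's frequency in the test document. The abstract does not give the exact preferred class score formula, so the weighting `log(N_classes / cf) * log(N_docs / df)`, the function names `train_stats` and `classify`, and the toy training data are all hypothetical. Terms are assumed to have already been extracted by a morphological analyzer such as HAM.

```python
import math
from collections import Counter, defaultdict


def train_stats(docs):
    """Collect counting statistics only (no learning step).

    docs: list of (class_label, list_of_terms) pairs.
    Returns per-class term counts, class frequency cf(t),
    document frequency df(t), number of classes, number of documents.
    """
    class_tf = defaultdict(Counter)  # class -> term -> occurrences in that class
    df = Counter()                   # term -> number of training docs containing it
    for label, terms in docs:
        class_tf[label].update(terms)
        df.update(set(terms))
    cf = Counter()                   # term -> number of classes whose docs contain it
    for counts in class_tf.values():
        cf.update(counts.keys())
    return class_tf, cf, df, len(class_tf), len(docs)


def classify(terms, stats, use_tf=True):
    """Hypothetical ICF x IDF class score; returns (best_class, all_scores)."""
    class_tf, cf, df, n_classes, n_docs = stats
    query_tf = Counter(terms)
    scores = {}
    for label, counts in class_tf.items():
        score = 0.0
        for term, qtf in query_tf.items():
            if term not in counts:   # term never seen in this class's training docs
                continue
            icf = math.log(n_classes / cf[term])  # 0 if the term occurs in every class
            idf = math.log(n_docs / df[term])
            weight = icf * idf
            if use_tf:               # optional term frequency weighting
                weight *= qtf
            score += weight
        scores[label] = score
    # The class with the highest score is the preferred class (first wins on ties).
    return max(scores, key=scores.get), scores


# Toy usage with hypothetical, already-extracted terms.
train = [
    ("sports", ["soccer", "match", "player", "goal"]),
    ("sports", ["baseball", "match", "player"]),
    ("economy", ["stock", "market", "price"]),
    ("economy", ["bank", "interest", "price", "market"]),
    ("politics", ["election", "vote", "party"]),
]
stats = train_stats(train)
label, scores = classify(["player", "goal", "match"], stats)  # label == "sports"
```

Because a term that occurs in every class gets ICF = log(1) = 0, such terms contribute nothing to any class score; only class-discriminating terms move the decision, which is what allows this kind of scheme to skip a trained model entirely.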

Citation (APA)

Min, K., Wilson, W. H., & Moon, Y. J. (2002). Preferred document classification for a highly inflectional/derivational language. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 2557, pp. 12–23). Springer Verlag. https://doi.org/10.1007/3-540-36187-1_2