Design and prototype of a large-scale and fully sense-tagged corpus

Abstract

Sense-tagged corpora play a crucial role in Natural Language Processing, especially in research on word sense disambiguation and natural language understanding. A large-scale Chinese sense-tagged corpus is essential to this research, yet the lack of such a corpus is a critical deficiency at the current stage. This paper aims to design a large-scale, fully sense-tagged Chinese corpus containing over 110,000 words. The Academia Sinica Balanced Corpus of Modern Chinese (also known as the Sinica Corpus) serves as the tagging target, and 56 full texts are extracted from it. The preparation for automatic sense tagging draws on N-gram statistics and collocation information, combining machine learning techniques with a probability model. To achieve high precision, the output of automatic sense tagging is then revised manually. © 2008 Springer-Verlag Berlin Heidelberg.
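
As a rough illustration of how collocation counts might drive automatic sense tagging, the sketch below votes on a word's sense using its immediate neighbours and falls back to the most frequent sense when no collocation has been seen. It is not the authors' method: the function names `train` and `tag`, the English toy data, and the sense labels are all hypothetical, and the abstract does not specify the actual model.

```python
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Collect sense counts per word and per (word, neighbour) collocation.

    Each sentence is a list of (word, sense) pairs; sense is None for
    words that carry no sense tag in the (hypothetical) seed data.
    """
    colloc = defaultdict(Counter)  # (word, neighbour) -> sense counts
    prior = defaultdict(Counter)   # word -> sense counts
    for sent in tagged_sentences:
        for i, (word, sense) in enumerate(sent):
            if sense is None:
                continue
            prior[word][sense] += 1
            for j in (i - 1, i + 1):  # immediate left and right neighbours
                if 0 <= j < len(sent):
                    colloc[(word, sent[j][0])][sense] += 1
    return colloc, prior


def tag(words, colloc, prior):
    """Assign each known word the sense its neighbours vote for.

    Falls back to the word's most frequent sense when no collocation
    evidence exists; unknown words are tagged None.
    """
    tagged = []
    for i, word in enumerate(words):
        if word not in prior:
            tagged.append((word, None))
            continue
        votes = Counter()
        for j in (i - 1, i + 1):
            if 0 <= j < len(words):
                votes.update(colloc.get((word, words[j]), {}))
        source = votes if votes else prior[word]
        tagged.append((word, source.most_common(1)[0][0]))
    return tagged


# Hypothetical toy seed data, for illustration only.
seed = [
    [("river", None), ("bank", "bank_river"), ("erosion", None)],
    [("bank", "bank_finance"), ("loan", None)],
    [("bank", "bank_finance"), ("account", None)],
]
colloc, prior = train(seed)
print(tag(["river", "bank"], colloc, prior))
print(tag(["bank", "closed"], colloc, prior))
```

In a workflow like the one the abstract describes, tags produced by the fallback branch would be natural candidates for manual revision, since they rest on frequency alone.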

Cite

Ker, S. J., Huang, C. R., Hong, J. F., Liu, S. Y., Jian, H. L., Su, I. L., & Hsieh, S. K. (2008). Design and prototype of a large-scale and fully sense-tagged corpus. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4938 LNAI, pp. 186–193). https://doi.org/10.1007/978-3-540-78159-2_18