Bigram language models are popular in many language-processing applications, for both Indo-European and Asian languages. However, when a Chinese language model is applied in a novel domain, its accuracy drops significantly, from 96% to 78% in our evaluation. We apply pattern recognition techniques (i.e., Bayesian, decision tree and neural network classifiers) to detect language model errors. We examined two general types of features: model-based and language-specific features. In our evaluation, the Bayesian classifiers produced the best recall (80%), but their precision was low (60%). The neural network classifier produced good recall (75%) and precision (80%), but both the Bayesian and neural network classifiers had a low skip ratio (65%). The decision tree classifier produced the best precision (81%) and skip ratio (76%), but its recall was the lowest (73%).
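The abstract refers to bigram language models whose low-probability regions can flag candidate errors. As a rough illustration only (not the authors' implementation; the corpus, smoothing choice and function names below are assumptions), an add-one-smoothed bigram model could be sketched as:

```python
from collections import defaultdict

def train_bigram_lm(corpus):
    """Estimate bigram probabilities with add-one smoothing from a list of
    tokenised sentences (each sentence is a list of tokens)."""
    unigram_counts = defaultdict(int)
    bigram_counts = defaultdict(int)
    vocab = set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        vocab.update(tokens)
        for i in range(len(tokens) - 1):
            unigram_counts[tokens[i]] += 1
            bigram_counts[(tokens[i], tokens[i + 1])] += 1
    vocab_size = len(vocab)

    def prob(prev, word):
        # Laplace-smoothed conditional probability P(word | prev).
        return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + vocab_size)

    return prob

# Usage: score adjacent token pairs; unusually low probabilities mark
# positions a downstream classifier might inspect as possible errors.
corpus = [["we", "apply", "pattern", "recognition"],
          ["we", "apply", "bigram", "models"]]
prob = train_bigram_lm(corpus)
print(prob("we", "apply"))      # seen bigram, relatively high probability
print(prob("apply", "errors"))  # unseen bigram, low probability
```

In the paper's setting, such low-probability positions would then be passed, together with model-based and language-specific features, to the Bayesian, decision tree, or neural network classifiers described above.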
CITATION STYLE
Hung, K. Y., Luk, R. W. P., Yeung, D., Chung, K. F. L., & Shu, W. (2000). Detection of Language (Model) Errors. In Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, SIGDAT-EMNLP 2000 - Held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics, ACL 2000 (pp. 87–94). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1117794.1117805