Active learning with sampling by uncertainty and density for word sense disambiguation and text classification

Jingbo Zhu; Huizhen Wang; Tianshun Yao; Benjamin K. Tsou

Conference Proceedings

Coling 2008 - 22nd International Conference on Computational Linguistics, Proceedings of the Conference (2008) 1 1137-1144

DOI: 10.3115/1599081.1599224

Abstract

This paper addresses two issues in active learning. First, uncertainty sampling often fails because it selects outliers; to address this, the paper presents a new selective sampling technique, sampling by uncertainty and density (SUD), in which a k-nearest-neighbor-based density measure is used to determine whether an unlabeled example is an outlier. Second, a technique of sampling by clustering (SBC) is applied to build a representative initial training set for active learning. Finally, we implement a new active learning algorithm that combines the SUD and SBC techniques. Experimental results on three real-world data sets show that our method outperforms competing methods, particularly in the early stages of active learning. © 2008. Licensed under the Creative Commons.
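
The idea behind SUD can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the choices below are assumptions made for illustration only: uncertainty is taken as the Shannon entropy of the classifier's predicted class distribution, density as the mean cosine similarity of an example to its k most similar unlabeled examples, and the two are combined by multiplication. The function names (`entropy`, `knn_density`, `sud_select`) are hypothetical and not taken from the paper.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy of each row of predicted class probabilities (assumed uncertainty measure)."""
    p = np.clip(probs, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=1)

def knn_density(X, k):
    """Mean cosine similarity of each example to its k most similar other examples.

    Assumed density measure: a low value suggests the example is an outlier.
    Requires k <= number of examples - 1.
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)          # exclude each example's similarity to itself
    top_k = np.sort(sim, axis=1)[:, -k:]    # k largest similarities per row
    return top_k.mean(axis=1)

def sud_select(probs, X, k=2, batch_size=1):
    """Return indices of the unlabeled examples with the highest uncertainty * density score."""
    scores = entropy(probs) * knn_density(X, k)
    return np.argsort(-scores)[:batch_size]

# Toy pool: three similar examples and one outlier (index 3).
X = np.array([[1.0, 0.0], [1.0, 0.1], [1.0, -0.1], [0.0, 1.0]])
# Hypothetical classifier outputs: the outlier is the most uncertain example.
probs = np.array([[0.6, 0.4], [0.9, 0.1], [0.9, 0.1], [0.5, 0.5]])

by_uncertainty = int(np.argmax(entropy(probs)))   # plain uncertainty sampling
by_sud = int(sud_select(probs, X, k=2)[0])        # uncertainty weighted by density
print(by_uncertainty, by_sud)
```

In this toy pool, plain uncertainty sampling picks the outlier, while the density-weighted score prefers an uncertain example that lies in a dense region, which is the failure mode the abstract says SUD is meant to avoid.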

Citation (APA)

Zhu, J., Wang, H., Yao, T., & Tsou, B. K. (2008). Active learning with sampling by uncertainty and density for word sense disambiguation and text classification. In Coling 2008 - 22nd International Conference on Computational Linguistics, Proceedings of the Conference (Vol. 1, pp. 1137–1144). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1599081.1599224
