Sequence labeling models such as conditional random fields have been successfully applied to a variety of NLP tasks. However, as the size of the label set and the dataset grows, the learning speed of batch algorithms like L-BFGS quickly becomes computationally unacceptable. Several online learning methods have been proposed for large-scale settings, yet little effort has been made to compare their performance: comparisons are often carried out on a few datasets with parameters fine-tuned for a specific algorithm. In this paper, we investigate and compare several online learning algorithms for sequence labeling on datasets varying in scale, feature design, and label set. We find that Dual Coordinate Ascent (DCA) is robust across datasets even without careful parameter tuning. Furthermore, a recently proposed variant of Stochastic Gradient Descent (SGD), Adaptive online gradient Descent based on feature Frequency information (ADF), trains much faster than plain SGD but fails to converge under certain conditions. Finally, we propose a simple modification of ADF that has convergence speed comparable to ADF and is consistently better than plain SGD. © 2012 The COLING.
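The abstract contrasts plain SGD, which applies one global learning rate to all weights, with ADF, which adapts each feature's rate using its update frequency. The sketch below illustrates that general idea only; the function name, the decay schedule `eta0 * alpha ** freq[i]`, and all constants are invented for illustration and are not the paper's actual ADF schedule.

```python
def frequency_adaptive_update(w, sparse_grad, freq, eta0=0.1, alpha=0.9):
    """One sparse update step in the spirit of ADF (illustrative sketch).

    w           -- weight vector as {feature_index: weight}
    sparse_grad -- gradient of the current example, {feature_index: value}
    freq        -- running count of updates per feature, {feature_index: count}

    Plain SGD would use a single rate eta0 for every feature; here each
    feature's rate shrinks with how often that feature has been updated,
    so frequent features take smaller steps and rare features larger ones.
    """
    for i, g in sparse_grad.items():
        freq[i] = freq.get(i, 0) + 1
        eta_i = eta0 * (alpha ** freq[i])  # per-feature, frequency-decayed rate
        w[i] = w.get(i, 0.0) - eta_i * g
    return w
```

For example, with `eta0=0.1, alpha=0.5`, the first update to a feature uses rate 0.05, the second 0.025, and so on, while a feature seen for the first time still gets the larger step.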
He, Z., & Wang, H. (2012). A comparison and improvement of online learning algorithms for sequence labeling. In 24th International Conference on Computational Linguistics - Proceedings of COLING 2012: Technical Papers (pp. 1147–1162).