A comparison and improvement of online learning algorithms for sequence labeling

Abstract

Sequence labeling models such as conditional random fields have been successfully applied to a variety of NLP tasks. However, as the size of the label set and the dataset grows, the learning speed of batch algorithms such as L-BFGS quickly becomes computationally unacceptable. Several online learning methods have been proposed for the large-scale setting, yet little effort has been made to compare their performance: comparisons are often carried out on a few datasets, with parameters fine-tuned for a specific algorithm. In this paper, we investigate and compare several online learning algorithms for sequence labeling on datasets that vary in scale, feature design, and label set. We find that Dual Coordinate Ascent (DCA) is robust across datasets even without careful parameter tuning. Furthermore, a recently proposed variant of Stochastic Gradient Descent (SGD), Adaptive online gradient Descent based on feature Frequency information (ADF), trains much faster than plain SGD but fails to converge under certain conditions. Finally, we propose a simple modification of ADF that converges about as fast as ADF and is consistently better than plain SGD. © 2012 The COLING.
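To make the frequency-adaptive idea concrete, here is a minimal sketch of a per-feature learning-rate update in the spirit of ADF. It is not the paper's exact update rule: the function name, the parameters (eta0, decay, floor), and the exponential decay schedule are illustrative assumptions. The core idea is that each feature keeps a counter of how often it has been updated, and frequently seen features receive a faster-decaying step size, while rare features keep a rate close to the initial one.

```python
import numpy as np

def adf_style_update(w, grad, freq, eta0=0.1, decay=0.9, floor=0.01):
    """One frequency-adaptive SGD step (illustrative sketch only).

    w    : weight vector (modified in place)
    grad : gradient of the loss on the current example
    freq : per-feature update counters
    """
    active = grad != 0                # features that fire on this example
    freq[active] += 1                 # update their frequency counters
    # Per-feature learning rate: decays with frequency, bounded below
    # so no feature's rate vanishes entirely (assumed schedule).
    eta = np.maximum(eta0 * decay ** freq, floor * eta0)
    w -= eta * grad                   # per-feature step sizes
    return w, freq

# Example usage on a toy 5-dimensional model:
w = np.zeros(5)
freq = np.zeros(5, dtype=int)
grad = np.array([0.0, 0.5, 0.0, -0.3, 0.0])
w, freq = adf_style_update(w, grad, freq)
```

Under this kind of schedule, features that fire on nearly every example quickly settle to small step sizes, which is what gives ADF its speed; the floor term sketches one way a modification could guard against the rate decaying so aggressively that learning stalls.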

Cite

APA

He, Z., & Wang, H. (2012). A comparison and improvement of online learning algorithms for sequence labeling. In 24th International Conference on Computational Linguistics - Proceedings of COLING 2012: Technical Papers (pp. 1147–1162).
