An AdaBoost using a weak-learner generating several weak-hypotheses for large training data of natural language processing

Tomoya Iwakura; Seishi Okamoto; Kazuo Asakawa

Journal Article (Open Access)

IEEJ Transactions on Electronics, Information and Systems (2010) 130(1) 83-91

DOI: 10.1541/ieejeiss.130.83

Abstract

AdaBoost is a method that creates a final hypothesis by repeatedly generating a weak hypothesis in each training iteration with a given weak learner. AdaBoost-based algorithms have been successfully applied to tasks such as natural language processing (NLP) and OCR. However, learning on training data consisting of a large number of samples and features requires a long training time. We propose a fast AdaBoost-based algorithm for learning rules represented by combinations of features. Our algorithm constructs a final hypothesis by learning several weak hypotheses at each iteration. We assign a confidence-rated value to each weak hypothesis while ensuring a reduction in the theoretical upper bound of the training error of AdaBoost. We evaluate our method on English POS tagging and text chunking. The experimental results show that, on average, our algorithm trains about 25 times faster than an AdaBoost-based learner and about 50 times faster than Support Vector Machines with a polynomial kernel, while maintaining state-of-the-art accuracy.
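The idea of learning several confidence-rated weak hypotheses per iteration can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes single-feature rules that abstain when the feature is absent, the standard confidence-rated AdaBoost formula c = 0.5 * ln(W+ / W-), and a hypothetical selection criterion |sqrt(W+) - sqrt(W-)|. The names `train`, `predict`, `rule_weights`, `rounds`, `k`, and `eps` are illustrative only. Each confidence is recomputed from the current weights just before the rule is added, so no step should increase the bound on the training error (up to the small smoothing term `eps`).

```python
import math

def rule_weights(w, samples, labels, f):
    """Weight of positive and negative samples that contain feature f."""
    wp = sum(wi for wi, x, y in zip(w, samples, labels) if f in x and y > 0)
    wn = sum(wi for wi, x, y in zip(w, samples, labels) if f in x and y < 0)
    return wp, wn

def train(samples, labels, rounds=5, k=2, eps=1e-9):
    """samples: list of feature sets; labels: +1 or -1.
    Returns a list of (feature, confidence) rules, k per round."""
    n = len(samples)
    w = [1.0 / n] * n                      # uniform initial weights
    features = sorted(set().union(*samples))
    rules = []
    for _ in range(rounds):
        # Assumed criterion: rank candidate rules by |sqrt(W+) - sqrt(W-)|
        # under the weights at the start of the round, and keep the top k.
        def gain(f):
            wp, wn = rule_weights(w, samples, labels, f)
            return abs(math.sqrt(wp) - math.sqrt(wn))
        top = sorted(features, key=gain, reverse=True)[:k]
        for f in top:
            # Recompute the confidence from the *current* weights.
            wp, wn = rule_weights(w, samples, labels, f)
            c = 0.5 * math.log((wp + eps) / (wn + eps))
            rules.append((f, c))
            # Reweight only the samples the rule fires on, then normalize.
            w = [wi * math.exp(-y * c) if f in x else wi
                 for wi, x, y in zip(w, samples, labels)]
            z = sum(w)
            w = [wi / z for wi in w]
    return rules

def predict(rules, x):
    """Sign of the summed confidences of all rules that fire on x."""
    score = sum(c for f, c in rules if f in x)
    return 1 if score >= 0 else -1

# Toy data (made up for illustration): "a" marks positives, "c" negatives.
samples = [{"a", "b"}, {"a"}, {"b", "c"}, {"c"}]
labels = [1, 1, -1, -1]
rules = train(samples, labels)
predictions = [predict(rules, x) for x in samples]
```

Adding k rules per pass over the candidate set, rather than one, is what would reduce the number of expensive candidate-scoring passes in a setting like this; the actual speedups reported in the abstract depend on the authors' own weak learner.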

Cite

Iwakura, T., Okamoto, S., & Asakawa, K. (2010). An AdaBoost using a weak-learner generating several weak-hypotheses for large training data of natural language processing. IEEJ Transactions on Electronics, Information and Systems, 130(1), 83–91. https://doi.org/10.1541/ieejeiss.130.83
