Combining co-training with ensemble learning for application on single-view natural language datasets

Cited by 4 · In the libraries of 9 Mendeley readers

Abstract

In this paper we propose a novel semi-supervised learning algorithm, the Random Split Statistic algorithm (RSSalg), designed to exploit the advantages of the co-training algorithm while being exempt from co-training's requirement for an adequate feature split in the dataset. In our method, co-training is run a predefined number of times, with a different random split of the features in each run. Each run of co-training produces a different enlarged training set, consisting of the initial labeled data and the data labeled during the co-training process. Examples from the enlarged training sets are combined into a final training set and pruned so that only the most confidently labeled examples are kept. The final classifier in RSSalg is obtained by training the base learner on the set created this way. Pruning is performed by a genetic algorithm that retains only the most reliable and informative cases. Our experiments on 17 datasets with various characteristics show that RSSalg outperforms all considered alternative methods on the more redundant natural language datasets and is comparable to the considered alternative settings on datasets with less redundancy.
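The procedure described in the abstract — repeated co-training over random feature splits, pooling the resulting pseudo-labeled examples, and keeping only the most confidently labeled ones — can be illustrated with a minimal sketch. This is not the authors' implementation: the base learner here is a toy nearest-centroid classifier, the co-training step is reduced to a single labeling round per view, and a simple cross-run agreement threshold stands in for the paper's genetic-algorithm pruning. All names (`rssalg_sketch`, `agreement`, etc.) are illustrative assumptions.

```python
import random
from collections import Counter

def train_centroids(data, view):
    """Toy base learner: per-class mean over the selected feature view."""
    sums, counts = {}, {}
    for x, y in data:
        proj = [x[f] for f in view]
        if y not in sums:
            sums[y] = [0.0] * len(view)
            counts[y] = 0
        sums[y] = [a + b for a, b in zip(sums[y], proj)]
        counts[y] += 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict(model, x, view):
    """Assign the class whose centroid is nearest in the feature view."""
    proj = [x[f] for f in view]
    dist = lambda y: sum((a - b) ** 2 for a, b in zip(proj, model[y]))
    return min(model, key=dist)

def rssalg_sketch(labeled, unlabeled, n_features, n_runs=5,
                  agreement=0.8, seed=0):
    """Illustrative sketch of the RSSalg idea (a simplification, not the
    published algorithm). Each run splits the features into two random
    views and labels the unlabeled pool from each view; examples whose
    label is consistent across at least `agreement` of all votes are kept
    -- a consensus stand-in for the paper's genetic-algorithm pruning."""
    rng = random.Random(seed)
    votes = {i: Counter() for i in range(len(unlabeled))}
    for _ in range(n_runs):
        feats = list(range(n_features))
        rng.shuffle(feats)                      # a different random split per run
        views = (feats[:n_features // 2], feats[n_features // 2:])
        for view in views:
            model = train_centroids(labeled, view)
            for i, x in enumerate(unlabeled):
                votes[i][predict(model, x, view)] += 1
    total = 2 * n_runs                          # two views vote in each run
    kept = []
    for i, x in enumerate(unlabeled):
        label, count = votes[i].most_common(1)[0]
        if count / total >= agreement:          # keep only confident labels
            kept.append((x, label))
    # The final training set: initial labeled data plus confident pseudo-labels.
    return labeled + kept
```

A quick usage example: with two labeled points near the all-zeros and all-ones corners and two nearby unlabeled points, both unlabeled points are labeled unanimously across runs and survive the consensus filter, so the returned training set contains all four examples.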

Citation (APA)

Slivka, J., Kovačević, A., & Konjović, Z. (2013). Combining co-training with ensemble learning for application on single-view natural language datasets. Acta Polytechnica Hungarica, 10(2), 133–152. https://doi.org/10.12700/aph.10.02.2013.2.10
