Empirical study of domain adaptation algorithms on the task of splice site prediction

Abstract

Many biological problems that rely on machine learning lack enough labeled data to train a classic supervised classifier. To address this, we propose two domain adaptation algorithms, derived from the multinomial naïve Bayes classifier, that leverage the large corpus of labeled data from a similar, well-studied organism (the source domain) in conjunction with the unlabeled data and some labeled data from an organism of interest (the target domain). When evaluated on splice site prediction, a difficult and essential step in gene prediction, they correctly classified instances, with highest average area under the precision-recall curve (auPRC) values between 18.46% and 78.01%. We show that the algorithms learned meaningful patterns by evaluating them on shuffled instances and labels. We then used one of the algorithms in an ensemble setting, which produced even better results when little labeled data was available or the domains were distantly related.
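
The abstract does not spell out the two algorithms, so the following is only a minimal sketch of the general idea of adapting a multinomial naïve Bayes classifier across domains, not the authors' method. It assumes that sequences are represented by overlapping k-mer counts and that source and target counts are blended with a single mixing factor; the names `kmer_counts`, `train_weighted_mnb`, `predict`, and the `weight` parameter are all hypothetical. It also ignores the unlabeled target data that the proposed algorithms use.

```python
import math
from collections import Counter


def kmer_counts(seq, k=2):
    """Count overlapping k-mers in a DNA sequence (assumed featurization)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def train_weighted_mnb(source, target, weight=0.5, k=2, alpha=1.0):
    """Fit multinomial naive Bayes on blended source/target k-mer counts.

    source, target: lists of (sequence, label) pairs.
    weight: hypothetical mixing factor in (0, 1); target counts are scaled
            by `weight`, source counts by `1 - weight`.
    alpha: Laplace smoothing constant.
    """
    if not 0.0 < weight < 1.0:
        raise ValueError("weight must lie strictly between 0 and 1")
    feature_counts = {}  # label -> Counter of weighted k-mer counts
    prior_counts = Counter()
    vocab = set()
    for data, w in ((source, 1.0 - weight), (target, weight)):
        for seq, label in data:
            prior_counts[label] += w
            counts = feature_counts.setdefault(label, Counter())
            for kmer, n in kmer_counts(seq, k).items():
                counts[kmer] += w * n
                vocab.add(kmer)
    total_prior = sum(prior_counts.values())
    model = {}
    for label, counts in feature_counts.items():
        denom = sum(counts.values()) + alpha * len(vocab)
        log_lik = {km: math.log((counts[km] + alpha) / denom) for km in vocab}
        model[label] = (math.log(prior_counts[label] / total_prior), log_lik)
    return model


def predict(model, seq, k=2):
    """Return the label with the highest log posterior score."""
    scores = {}
    for label, (log_prior, log_lik) in model.items():
        score = log_prior
        for kmer, n in kmer_counts(seq, k).items():
            if kmer in log_lik:  # k-mers unseen in training are skipped
                score += n * log_lik[kmer]
        scores[label] = score
    return max(scores, key=scores.get)


# Toy, made-up sequences: label 1 = splice site, label 0 = not a splice site.
source = [("AGGTAAGT", 1), ("CAGGTGAG", 1), ("TTTTCCCC", 0), ("CCCCTTTT", 0)]
target = [("AGGTAAGA", 1), ("TTCCTTCC", 0)]
model = train_weighted_mnb(source, target, weight=0.5)
```

With a `weight` near 0 the model relies mostly on the source organism, and with a `weight` near 1 it relies mostly on the few labeled target instances. In practice such a factor would be tuned on held-out target data and the result scored with auPRC, as in the evaluation described above.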

Cite

Herndon, N., & Caragea, D. (2015). Empirical study of domain adaptation algorithms on the task of splice site prediction. In Communications in Computer and Information Science (Vol. 511, pp. 195–211). Springer Verlag. https://doi.org/10.1007/978-3-319-26129-4_13
