Discriminative motif discovery via simulated evolution and random under-sampling

Abstract

Conserved motifs in biological sequences are closely related to the structure and function of those sequences. Discriminative motif discovery methods have recently attracted increasing attention. However, little attention has been paid to data imbalance, which is one of the main factors degrading the performance of discriminative models. In this article, a simulated evolution method is applied at the data preprocessing stage to address the multi-class imbalance problem. A random under-sampling method is then introduced at the Hidden Markov Model (HMM) training stage to address the imbalance between the positive and negative datasets. In the task of discovering targeting motifs for nine subcellular compartments, the motifs found by our method are more conserved than those found by methods that do not account for data imbalance, and they recover most of the known targeting motifs from Minimotif Miner and InterPro. We also use the discovered motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes. © 2014 Song, Gu.
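To illustrate the general idea of random under-sampling, the following sketch randomly discards negative sequences before training so that the negative set is no larger than a chosen multiple of the positive set. It is not the authors' exact procedure, which the abstract does not specify. The function name `random_undersample` and its `ratio` and `seed` parameters are hypothetical, introduced only for this example.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_undersample(
    positives: Sequence[T],
    negatives: Sequence[T],
    ratio: float = 1.0,
    seed: Optional[int] = None,
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: keep all positives and a random subset of negatives.

    The number of negatives kept is int(ratio * len(positives)) (truncated),
    capped at the number of negatives available. Sampling is without
    replacement, so no negative sequence is duplicated.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    rng = random.Random(seed)  # seeded generator for reproducible subsets
    target = min(len(negatives), int(ratio * len(positives)))
    kept = rng.sample(list(negatives), target)
    return list(positives), kept


if __name__ == "__main__":
    # Toy data: 3 positive sequences and 10 negative sequences.
    pos = ["MLRTSS", "MARRLL", "MSALLR"]
    neg = [f"SEQ{i}" for i in range(10)]
    p, n = random_undersample(pos, neg, seed=0)
    print(len(p), len(n))  # 3 3
```

With `ratio=1.0`, each HMM would be trained on a balanced positive/negative pair. Calling the helper repeatedly with different seeds is one common way to reduce the information lost by discarding negatives.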

Citation (APA)

Song, T., & Gu, H. (2014). Discriminative motif discovery via simulated evolution and random under-sampling. PLoS ONE, 9(2). https://doi.org/10.1371/journal.pone.0087670