Identifying plant pentatricopeptide repeat coding gene/protein using mixed feature extraction methods

Abstract

Motivation: The pentatricopeptide repeat (PPR) is a degenerate repeat motif of about 35 amino acids that plays a vital role in plant growth. In this study, we seek to identify PPR coding genes and proteins using a mixture of feature extraction methods. We use four single feature extraction methods that focus on sequence, physicochemical properties, and amino acid composition, and then combine their features. The Max-Relevance-Max-Distance (MRMD) technique is applied to reduce the feature dimension. Classification uses random forest, J48, and naïve Bayes classifiers with 10-fold cross-validation.

Results: Combining two of the feature extraction methods with the random forest classifier produces the highest area under the curve (AUC), 0.9848. Reducing the dimension with MRMD improves the AUC for J48 and naïve Bayes but has little effect on the random forest results.
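As a rough illustration of the kind of pipeline the abstract describes, the following sketch computes an amino acid composition vector, concatenates ("mixes") feature vectors, and scores predictions by AUC. It is not the authors' implementation: the function names are hypothetical, the 20-dimensional composition is one common definition assumed here, and the abstract does not specify which four extraction methods were used or how they were parameterized.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector is aligned.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the relative frequency of each standard amino acid (20 values).

    Assumption: non-standard residue codes (e.g. X, B, Z) are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def mix_features(*vectors):
    """Concatenate feature vectors from several extraction methods."""
    mixed = []
    for vector in vectors:
        mixed.extend(vector)
    return mixed


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative one (ties count as one half).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

In practice, the mixed vectors would be passed through a feature selection step (MRMD in the paper) and then to a classifier evaluated with 10-fold cross-validation; those steps are omitted here because the abstract gives no details on their configuration.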

Citation

Qu, K., Wei, L., Yu, J., & Wang, C. (2019). Identifying plant pentatricopeptide repeat coding gene/protein using mixed feature extraction methods. Frontiers in Plant Science, 9. https://doi.org/10.3389/fpls.2018.01961