Abstract
G-protein-coupled receptors (GPCRs) are key cell-signaling proteins found in a wide range of organisms. They take part in many physiological processes and are important drug targets for many diseases, so accurate machine-learning prediction of GPCRs is useful for drug design. In this paper, we propose a method for identifying GPCRs based on mixed-feature vectors. We combine three individual features, 400D, N-gram, and parallel correlation pseudo amino acid composition (PC-PseAAC), using mixed-feature representation methods, and evaluate the resulting vectors with Random Forest, Naïve Bayes, and J48 classifiers. Classifier performance is measured by ten-fold cross-validation. Two dimensionality reduction methods, max-relevance-max-distance (MRMD) and t-distributed stochastic neighbor embedding (t-SNE), are applied to reduce the feature dimension. Combining the 400D and PC-PseAAC features with a random forest classifier yields an area under the curve (AUC) of up to 0.9413. Among the methods tested, this combined feature vector performs best, and the mixed features outperform the single features.
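As a rough illustration of the mixed-feature idea, the sketch below assumes that "400D" denotes the 20 × 20 dipeptide composition of a protein sequence and that features are mixed by simple concatenation. The abstract does not spell out either detail, and the names `dipeptide_composition` and `mix_features` are hypothetical, not taken from the paper.

```python
from itertools import product

# The 20 standard amino acids; 20 x 20 ordered pairs give 400 dipeptides.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
INDEX = {dipeptide: i for i, dipeptide in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of dipeptide frequencies.

    Assumption: this is what the abstract calls the "400D" feature.
    Dipeptides containing non-standard residues (e.g. 'X') are skipped.
    """
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(seq) - 1):
        idx = INDEX.get(seq[i:i + 2])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        # Sequence too short or with no valid dipeptides.
        return [0.0] * len(DIPEPTIDES)
    return [c / total for c in counts]


def mix_features(*vectors):
    """Concatenate per-sequence feature vectors into one mixed vector.

    Assumption: mixing is plain concatenation of the individual features.
    """
    mixed = []
    for vector in vectors:
        mixed.extend(vector)
    return mixed
```

A mixed vector for one sequence would then be `mix_features(dipeptide_composition(seq), other_features)`, where `other_features` stands in for a PC-PseAAC or N-gram vector computed elsewhere. The resulting vectors could be passed to any classifier, such as a random forest evaluated with ten-fold cross-validation.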
Cite
Ao, C., Gao, L., & Yu, L. (2025). Identifying G-Protein Coupled Receptors Using Mixed-Feature Extraction Methods and Machine Learning Methods. IEEE Access, 13, 129911–129919. https://doi.org/10.1109/ACCESS.2020.2983105