N-GlycoGo: Predicting protein N-glycosylation sites on imbalanced data sets by using heterogeneous and comprehensive strategy

22Citations
Citations of this article
18Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Glycosylation is the most complex post-modification effect of proteins. It participates in many biological processes in the human body and is closely related to many disease states. Among them, N-linked glycosylation is the most contained glycosylation data. However, the current N-linked glycosylation prediction tool does not take into account the serious imbalance between positive and negative data. In this study, we used protein sequence and amino acid characteristics to construct an N-linked glycosylation prediction model called N-GlycoGo. Based on sequence, structure, and function, 11 heterogeneous features were encoded. Further, XGBoost was selected for modeling. Finally, independent testing of human and mouse prediction models showed that N-GlycoGo is superior to other tools with Matthews correlation coefficient (MCC) values of 0.397 and 0.719, respectively, which is higher than other glycosylation site prediction tools. We have developed a fast and accurate prediction tool, N-GlycoGo, for N-linked glycosylation. N-GlycoGo is available at http://ncblab.nchu.edu.tw/n-glycogo/.

Cite

CITATION STYLE

APA

Chien, C. H., Chang, C. C., Lin, S. H., Chen, C. W., Chang, Z. H., & Chu, Y. W. (2020). N-GlycoGo: Predicting protein N-glycosylation sites on imbalanced data sets by using heterogeneous and comprehensive strategy. IEEE Access, 8, 165944–165950. https://doi.org/10.1109/ACCESS.2020.3022629

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free