Motivation: Allergenicity, like antigenicity and immunogenicity, is a property encoded both linearly and non-linearly, so alignment-based approaches cannot identify it unambiguously. A novel alignment-free, descriptor-based fingerprint approach is presented here and applied to distinguish allergens from non-allergens. The approach is implemented as a four-step algorithm. First, the protein sequences are described by amino acid principal properties such as hydrophobicity, size, relative abundance, and helix- and β-strand-forming propensities. Then, the resulting strings of different lengths are converted into vectors of equal length by auto- and cross-covariance (ACC). The vectors are then transformed into binary fingerprints and compared in terms of the Tanimoto coefficient. Results: The approach was applied to a set of 2427 known allergens and 2427 non-allergens and correctly identified 88% of them, with a Matthews correlation coefficient of 0.759. The descriptor fingerprint approach presented here is universal and could be applied to any classification problem in computational biology. The set of E-descriptors captures the main structural and physicochemical properties of the amino acids that make up the proteins. The ACC transformation overcomes the main problem of alignment-based comparative studies, which arises from the different lengths of the aligned protein sequences. Converting protein ACC values into binary descriptor fingerprints allows similarity search and classification. © The Author 2013. Published by Oxford University Press. All rights reserved.
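The pipeline described in the abstract (descriptor encoding, ACC transformation, binarization, Tanimoto comparison) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things are assumptions: the descriptor table `TOY_DESCRIPTORS` is filled with random placeholder numbers rather than the published E-descriptor values, the ACC formula is the commonly used lag-averaged product, the maximum lag of 2 is arbitrary, and the binarization rule (a bit is set when the ACC value is above a threshold of 0) is a stand-in because the abstract does not state the rule actually used. All function names are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Placeholder table: 5 random numbers per residue. These are NOT the
# published E-descriptors; substitute the real values before any real use.
_rng = np.random.default_rng(0)
TOY_DESCRIPTORS = dict(zip(AMINO_ACIDS, _rng.standard_normal((20, 5))))


def encode(sequence, table=TOY_DESCRIPTORS):
    """Map a protein sequence to an (n, d) array of per-residue descriptors."""
    return np.array([table[aa] for aa in sequence])


def acc_transform(desc, max_lag):
    """Auto- and cross-covariance of an (n, d) descriptor array.

    Assumed formula: ACC[j, k, lag] = sum_i desc[i, j] * desc[i + lag, k] / (n - lag).
    The output length is d * d * max_lag, independent of sequence length n.
    """
    n, d = desc.shape
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    blocks = []
    for lag in range(1, max_lag + 1):
        # (d, d) matrix; j == k gives auto-covariance, j != k cross-covariance.
        block = desc[:-lag].T @ desc[lag:] / (n - lag)
        blocks.append(block.ravel())
    return np.concatenate(blocks)


def to_fingerprint(acc, threshold=0.0):
    """Binarize ACC values (assumed rule: bit is set when value > threshold)."""
    return acc > threshold


def tanimoto(a, b):
    """Tanimoto coefficient of two boolean fingerprints of equal length."""
    both = np.count_nonzero(a & b)
    either = np.count_nonzero(a | b)
    # Convention chosen here: two all-zero fingerprints count as identical.
    return both / either if either else 1.0


def fingerprint(sequence, max_lag=2):
    """Full sketch pipeline: sequence -> descriptors -> ACC -> binary bits."""
    return to_fingerprint(acc_transform(encode(sequence), max_lag))


fp_short = fingerprint("ACDEFGHIK")
fp_long = fingerprint("MKTAYIAKQRQISFVKSHFSRQ")
similarity = tanimoto(fp_short, fp_long)
```

Both fingerprints have the same length (5 × 5 × 2 = 50 bits) even though the input sequences differ in length, which is the property the ACC step is meant to provide. A classifier could then label a query by the class of its most similar known fingerprint, although the abstract does not spell out the decision rule.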
CITATION STYLE
Dimitrov, I., Naneva, L., Doytchinova, I., & Bangov, I. (2014). AllergenFP: Allergenicity prediction by descriptor fingerprints. Bioinformatics, 30(6), 846–851. https://doi.org/10.1093/bioinformatics/btt619