Most algorithms for extracting information from, and processing, the amino acid chains that make up proteins treat them as symbolic chains. Fewer algorithms exploit signal processing techniques, which require a numerical representation of amino acid chains. However, these techniques are very powerful for extracting regularities that cannot be detected when working with a symbolic chain, and such regularities may be important for understanding the biological meaning of a sequence or in classification tasks. In this study, a new mathematical representation of amino acid chains is proposed. It is derived using a similarity measure based on the PAM250 amino acid substitution matrix and generates 20 signals for each protein sequence. Using this representation, 20 consensus spectra for a protein family are determined and the relevance of the frequency peaks is established, yielding a group of significant frequency peaks that reveal common periodicities of the amino acid sequences belonging to a protein family. We also show that the proposed 20-signal representation can be integrated into Chou's pseudo amino acid composition (PseAAC), where it constitutes a useful alternative to amino acid physicochemical properties.
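As a rough illustration of the idea of turning one symbolic sequence into 20 numerical signals and inspecting their spectra, the following minimal sketch may help. It is not the authors' method: the abstract does not specify how the PAM250-based similarity measure is defined, so the hypothetical `toy_similarity` function below is only a placeholder (1 for identical residues, 0 otherwise), and the names `similarity_signals` and `power_spectrum` are assumptions introduced for this example. A real PAM250-derived score could be passed in place of the placeholder.

```python
import cmath

# The 20 standard amino acids, one reference residue per signal.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_similarity(a, b):
    """Placeholder score (NOT PAM250): 1 for identical residues, 0 otherwise."""
    return 1.0 if a == b else 0.0


def similarity_signals(sequence, similarity=toy_similarity):
    """Map a sequence to 20 signals, one per reference amino acid.

    Sample n of the signal for reference residue `ref` is the similarity
    between `ref` and the residue at position n of the sequence.
    """
    return {ref: [similarity(ref, res) for res in sequence] for ref in AMINO_ACIDS}


def power_spectrum(signal):
    """Power spectrum of the mean-removed signal (naive DFT, bins 0..N//2)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


# Toy sequence with an obvious period of 2 residues.
sequence = "AG" * 8
signals = similarity_signals(sequence)
spec_a = power_spectrum(signals["A"])

# Index of the strongest frequency bin in the signal for alanine.
peak_bin = max(range(len(spec_a)), key=spec_a.__getitem__)
print(peak_bin, len(sequence) / peak_bin)  # bin index and the period it implies
```

In this toy case the strongest bin of the alanine signal corresponds to a period of two residues, which matches how the sequence was built. Combining such spectra across the members of a protein family into consensus spectra, and testing which peaks are significant, is the part described in the paper and is not attempted here.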
CITATION STYLE
Sanchez, V., Peinado, A. M., Pérez-Córdoba, J. L., & Gómez, A. M. (2015). A new signal characterization and signal-based Chou’s PseAAC representation of protein sequences. In Journal of Bioinformatics and Computational Biology (Vol. 13). World Scientific. https://doi.org/10.1142/S0219720015500249