CONTRAlign: Discriminative training for protein sequence alignment

Chuong B. Do; Samuel S. Gross; Serafim Batzoglou

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 3909 LNBI, pp. 160-174 (2006)

DOI: 10.1007/11732990_15

Abstract

In this paper, we present CONTRAlign, an extensible and fully automatic framework for parameter learning and protein pairwise sequence alignment using pair conditional random fields. When learning a substitution matrix and gap penalties from as few as 20 example alignments, CONTRAlign achieves alignment accuracies competitive with available modern tools. As confirmed by rigorous cross-validated testing, CONTRAlign effectively leverages weak biological signals in sequence alignment: using CONTRAlign, we find that hydropathy-based features result in improvements of 5-6% in aligner accuracy for sequences with less than 20% identity, a signal that state-of-the-art hand-tuned aligners are unable to exploit effectively. Furthermore, when known secondary structure and solvent accessibility are available, such external information is naturally incorporated as additional features within the CONTRAlign framework, yielding additional improvements of up to 15-16% in alignment accuracy for low-identity sequences. © Springer-Verlag Berlin Heidelberg 2006.
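
The abstract describes alignment scoring as a set of features whose weights are learned, with hydropathy added as one more feature. The sketch below illustrates that idea only; it is not the authors' implementation. It assumes a single alignment state with a linear gap penalty, a coarse hydrophobic residue set, and made-up weights (the names `WEIGHTS`, `HYDROPHOBIC`, `pair_score`, and `align_scores` are all hypothetical). It skips training entirely, whereas CONTRAlign learns its parameters from example alignments.

```python
import math

# Hypothetical weights for illustration only; not values from the paper.
WEIGHTS = {"match": 2.0, "mismatch": -1.0, "same_hydropathy": 0.5, "gap": -2.0}

# Coarse hydrophobic residue set (an assumption made for this sketch).
HYDROPHOBIC = set("AVILMFWC")


def pair_score(a, b, w):
    """Log-linear score of aligning residues a and b: sum of active feature weights."""
    s = w["match"] if a == b else w["mismatch"]
    # Extra feature: both residues fall in the same hydropathy class.
    if (a in HYDROPHOBIC) == (b in HYDROPHOBIC):
        s += w["same_hydropathy"]
    return s


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def align_scores(x, y, w):
    """Return (best alignment score, log partition function) over global alignments."""
    n, m = len(x), len(y)
    best = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    total = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    best[0][0] = total[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            moves = []  # (previous cell, score of the step taken from it)
            if i > 0 and j > 0:
                moves.append(((i - 1, j - 1), pair_score(x[i - 1], y[j - 1], w)))
            if i > 0:
                moves.append(((i - 1, j), w["gap"]))  # residue of x against a gap
            if j > 0:
                moves.append(((i, j - 1), w["gap"]))  # residue of y against a gap
            best[i][j] = max(best[p][q] + s for (p, q), s in moves)
            total[i][j] = logsumexp([total[p][q] + s for (p, q), s in moves])
    return best[n][m], total[n][m]


best, log_z = align_scores("AVLK", "AILK", WEIGHTS)
# Conditional log-probability of the best-scoring alignment under this toy model.
best_log_prob = best - log_z
```

Under these assumptions, adding a new signal such as secondary structure would amount to adding another term in `pair_score` with its own weight, which is the sense in which the abstract calls the framework extensible.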

Citation (APA)

Do, C. B., Gross, S. S., & Batzoglou, S. (2006). CONTRAlign: Discriminative training for protein sequence alignment. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3909 LNBI, pp. 160–174). https://doi.org/10.1007/11732990_15
