MOTIVATION: Multiple sequence alignments are generally reconstructed using a progressive approach that follows a guide-tree. During this process, gaps are introduced at a cost in order to maximize residue pairing, but it is unclear whether inferred gaps reflect actual past insertion or deletion events. Patterns of inferred gaps in alignments have been found to contain information about the true phylogeny, but it is not yet known whether gaps simply reflect information that was already present in the guide-tree. RESULTS: Here we develop a framework to disentangle the phylogenetic signal carried by gaps from that which is already present in the guide-tree. Our results indicate that most gaps are incorrectly inserted in patterns that, nevertheless, follow the guide-tree. Thus, most gap patterns in current alignments are not informative per se. This affects different programs to varying degrees, with PRANK being the most sensitive to the guide-tree.
Capella-Gutiérrez, S., & Gabaldón, T. (2013). Measuring guide-tree dependency of inferred gaps in progressive aligners. Bioinformatics, 29(8), 1011–1017. https://doi.org/10.1093/bioinformatics/btt095