Biological sequence comparison algorithms that compute optimal local and global alignments fill a dynamic programming (DP) matrix in quadratic time. The DP matrix H is computed with a recurrence relation in which the value of each cell H_{i,j} is the maximum over the values of cells H_{i-1,j-1}, H_{i-1,j} and H_{i,j-1}, each increased or decreased by a constant. Consequently, the difference between the value of the cell H_{i,j} being computed and the values of its previously computed direct neighbors respects well-defined upper and lower bounds. Using these bounds, we show that, given a reference cell, it is possible to determine the maximum and minimum values that every cell in H can take. We use this result to define a generic pruning method that determines which cells can be pruned (i.e. need not be computed, since they will not contribute to the final solution), accelerating the computation while still guaranteeing that the optimal result is produced. The goal of this paper is thus to investigate and formalize properties of the DP matrix in order to estimate and increase the efficiency of the pruning method. We also show that the pruning efficiency depends mainly on three characteristics: (a) the order in which the cells of H are calculated, (b) the values of the parameters used in the recurrence relation and (c) the contents of the sequences compared.
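The pruning idea can be illustrated with a minimal sketch. It is not the paper's block-level method. It is a simplified, cell-level variant for local alignment (Smith-Waterman) that assumes a linear gap penalty and row-by-row processing. The function name `sw_score` and the parameters `match`, `mismatch` and `gap` are hypothetical, chosen only for this illustration. A cell is skipped when an upper bound on its value, plus the largest score still obtainable from the remaining rows and columns, cannot exceed the best score found so far.

```python
def sw_score(a, b, match=1, mismatch=-1, gap=2, prune=True):
    """Return (best local alignment score, number of cells skipped).

    Illustrative assumptions: match > 0, mismatch <= match, gap >= 0,
    linear gap penalty, cells visited row by row.
    """
    m, n = len(a), len(b)
    prev = [0] * (n + 1)  # row i-1 of H
    best = 0
    skipped = 0
    for i in range(1, m + 1):
        cur = [0] * (n + 1)  # row i of H; skipped cells stay 0
        for j in range(1, n + 1):
            # Largest gain any alignment can still collect after cell (i, j):
            # at most one match per remaining row/column pair.
            remaining = match * min(m - i, n - j)
            # Upper bound on H[i][j] using only the three neighbours,
            # since a single step adds at most `match`.
            bound = max(prev[j - 1] + match, prev[j] - gap, cur[j - 1] - gap, 0)
            if prune and bound + remaining <= best:
                skipped += 1  # cannot lead to a score above `best`
                continue
            s = match if a[i - 1] == b[j - 1] else mismatch
            h = max(0, prev[j - 1] + s, prev[j] - gap, cur[j - 1] - gap)
            cur[j] = h
            best = max(best, h)
        prev = cur
    return best, skipped


if __name__ == "__main__":
    print(sw_score("ACGTACGT", "ACGTACGT"))
    print(sw_score("ACGTACGT", "ACGTACGT", prune=False))
```

In this sketch the bound check costs about as much as computing the cell itself, so it shows only the correctness argument, not a speed-up. Applying the same test to whole blocks of cells, as the abstract describes, amortizes the check. The number of skipped cells also changes with the scoring parameters and with the input sequences, in line with characteristics (b) and (c) above.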
Citation
Sandes, E. F. O., Teodoro, G. L. M., Walter, M. E. M. T., Martorell, X., Ayguade, E., & Melo, A. C. M. A. (2018). Formalization of Block Pruning: Reducing the Number of Cells Computed in Exact Biological Sequence Comparison Algorithms. Computer Journal, 61(5), 687–713. https://doi.org/10.1093/comjnl/bxx090