k-link EST clustering: Evaluating error introduced by chimeric sequences under different degrees of linkage

Lauren M. Bragg; Glenn Stone

Journal article (open access)

Bioinformatics (2009) 25(18) 2302-2308

DOI: 10.1093/bioinformatics/btp410

Abstract

Motivation: The clustering of expressed sequence tags (ESTs) is a crucial step in many sequence analysis studies that require a high level of redundancy. Chimeric sequences, while uncommon, can make it difficult to achieve an optimal EST clustering. Single-linkage algorithms are particularly vulnerable to the effects of chimeras. To avoid chimera-facilitated erroneous merges, researchers using single-linkage algorithms must use stringent sequence-similarity thresholds, and such thresholds reduce the sensitivity of the clustering algorithm.

Results: We introduce the concept of k-link clustering for EST data and evaluate how clustering error rates vary over a range of linkage thresholds. Using k-link, we show that Type II error decreases as the number of shared ESTs (i.e., links) required increases. We observe a baseline level of Type II error, likely caused by the presence of unmasked low-complexity or repetitive sequence. We find that Type I error increases gradually with increased linkage. To minimize the Type I error introduced by stricter linkage requirements, we propose an extension to k-link that adjusts the required number of links according to the size of the clusters being compared. © 2009 The Author(s).
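
The abstract does not spell out the exact merge rule, so the Python sketch below is one hypothetical reading of k-link clustering, not the authors' implementation. It assumes that a "link" is a pair of ESTs whose alignment has already passed a similarity threshold, that links are processed in a fixed order (for example, sorted by alignment score), and that two clusters A and B merge once at least min(k, |A|, |B|) links join them. The min() cap is our own assumption so that clusters smaller than k can still grow; it is not the size-dependent extension proposed in the paper. The function name k_link_cluster is hypothetical.

```python
from collections import defaultdict


def k_link_cluster(n_ests, links, k=1):
    """Incrementally cluster ESTs 0..n_ests-1 from an ordered list of links.

    Hypothetical k-link rule (an assumption, not the paper's code): clusters
    A and B merge once they are joined by at least min(k, |A|, |B|) links.
    With k=1 this reduces to ordinary single linkage.
    """
    cluster_of = list(range(n_ests))             # EST index -> cluster id
    members = {c: {c} for c in range(n_ests)}    # cluster id -> EST indices
    between = defaultdict(int)                   # {a, b} -> links seen so far

    for i, j in links:
        a, b = cluster_of[i], cluster_of[j]
        if a == b:
            continue  # link inside an existing cluster: nothing to decide
        key = frozenset((a, b))
        between[key] += 1
        if between[key] < min(k, len(members[a]), len(members[b])):
            continue  # not enough evidence yet; keep the count and move on
        # Merge the smaller cluster (b) into the larger one (a).
        if len(members[a]) < len(members[b]):
            a, b = b, a
        for est in members[b]:
            cluster_of[est] = a
        members[a] |= members.pop(b)
        del between[key]
        # Pending link counts that involved b now belong to a.
        for old_key in [kk for kk in between if b in kk]:
            (other,) = old_key - {b}
            between[frozenset((a, other))] += between.pop(old_key)

    return sorted(sorted(m) for m in members.values())


# ESTs 0-2 come from gene X, 3-5 from gene Y; EST 6 is a chimera of both.
links = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 6), (6, 3)]
print(k_link_cluster(7, links, k=1))  # [[0, 1, 2, 3, 4, 5, 6]]
print(k_link_cluster(7, links, k=2))  # [[0, 1, 2, 6], [3, 4, 5]]
```

Under single linkage (k=1), the single chimeric bridge fuses the two genes into one cluster, which is the erroneous merge the abstract describes. With k=2, the chimera still joins one cluster, but its lone link to the other gene is not enough to merge them. A second, independent link between the two clusters would trigger the merge. Because the sketch processes links incrementally, its result can depend on link order.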

Citation (APA):

Bragg, L. M., & Stone, G. (2009). k-link EST clustering: Evaluating error introduced by chimeric sequences under different degrees of linkage. Bioinformatics, 25(18), 2302–2308. https://doi.org/10.1093/bioinformatics/btp410