A gene pattern mining algorithm using interchangeable gene sets for prokaryotes


Abstract

Background: Mining gene patterns that are common to multiple genomes is an important biological problem that can lead to novel biological insights. When a family classification of genes is available, the problem is similar to the pattern mining problem studied in the data mining community. When no family classification is available, however, mining gene patterns is challenging. Several well-developed algorithms, such as FISH and DAGchainer, predict gene patterns in a pair of genomes. These algorithms formulate the task as an optimization problem and solve it with dynamic programming. Unfortunately, extending them to multiple genomes is not trivial because time and space complexity increase rapidly.

Results: In this paper, we propose a novel algorithm for mining gene patterns in more than two prokaryote genomes using interchangeable sets. The basic idea is to extend the pattern mining technique from the data mining community, using interchangeable sets to handle the situation where family classification information is not available. In an experiment with four newly sequenced genomes (where gene annotation is unavailable), we show that the gene patterns can capture important biological information. To examine the effectiveness of gene patterns further, we propose an ortholog prediction method based on our gene pattern mining algorithm and compare it to the bi-directional best hit (BBH) technique, using COG orthologous gene classification information as the reference. The experiments show that our algorithm achieves a 3% increase in recall compared to BBH without sacrificing the precision of ortholog detection.

Conclusion: The discovered gene patterns can be used to detect orthologs and genes that collaborate in a common biological function. © 2008 Hu et al; licensee BioMed Central Ltd.
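For readers unfamiliar with the BBH baseline mentioned in the abstract, the following is a minimal sketch of bi-directional best hit ortholog prediction and of the precision and recall measures used for the comparison. It is not the authors' pattern mining algorithm. The gene names, the similarity scores (standing in for something like BLAST bit scores), and the reference ortholog pairs are all hypothetical values chosen for illustration.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]

def best_hits(scores: Dict[Pair, float], reverse: bool = False) -> Dict[str, str]:
    """Map each query gene to its highest-scoring gene in the other genome.

    With reverse=False the queries are genome A genes; with reverse=True
    they are genome B genes. Ties keep the first pair encountered.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for (a, b), score in scores.items():
        query, target = (b, a) if reverse else (a, b)
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}

def bidirectional_best_hits(scores: Dict[Pair, float]) -> Set[Pair]:
    """Keep (a, b) only if b is a's best hit and a is b's best hit."""
    a_to_b = best_hits(scores)
    b_to_a = best_hits(scores, reverse=True)
    return {(a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a}

def precision_recall(predicted: Set[Pair], reference: Set[Pair]) -> Tuple[float, float]:
    """Precision and recall of predicted pairs against a reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical similarity scores between genes of genome A and genome B.
scores = {
    ("a1", "b1"): 200.0,
    ("a1", "b2"): 50.0,
    ("a2", "b1"): 180.0,
    ("a2", "b2"): 90.0,
    ("a3", "b3"): 40.0,
}

# Hypothetical reference ortholog pairs (a stand-in for a COG-style classification).
reference = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}

predicted = bidirectional_best_hits(scores)
precision, recall = precision_recall(predicted, reference)
```

In this toy data, a2 scores highest against b1, but b1 scores highest against a1, so the pair (a2, b2) is missed even though both predicted pairs are correct. This is the kind of recall gap that a method using additional evidence, such as shared gene patterns, could aim to narrow.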


Cite

Hu, M., Choi, K., Su, W., Kim, S., & Yang, J. (2008). A gene pattern mining algorithm using interchangeable gene sets for prokaryotes. BMC Bioinformatics, 9. https://doi.org/10.1186/1471-2105-9-124
