Comprehensively identifying and characterizing the missing gene sequences in human reference genome with integrated analytic approaches

Abstract

The human reference genome is still incomplete, and a number of gene sequences are missing from it. The approaches for uncovering these sequences, the reasons for their absence, and their functions remain underexplored. Here, we comprehensively identified and characterized the missing genes of the human reference genome using RNA-Seq data from 16 different human tissues. Using genome-guided transcriptome reconstruction coupled with genome-wide comparison, we uncovered 3.78 and 2.37 Mb of transcribed regions in the Celera and HuRef human genome assemblies, respectively, that are either missing from their homologous chromosomes in NCBI human reference genome build 37.2 or partially or entirely absent from the reference. We further identified a significant number of novel transcript contigs in each tissue from de novo transcriptome assembly that cannot be aligned to NCBI build 37.2 but can be aligned to at least one of the Celera, HuRef, chimpanzee, macaque, or mouse genomes. Our analyses indicate that the missing genes could result from genome misassembly, transposition, copy number variation, translocation, and other structural variations. Moreover, our results suggest that a large portion of these missing genes are conserved between human and other mammals, implying important biological functions. In total, 1,233 functional protein domains were detected in these missing genes. Collectively, our study not only provides approaches for uncovering the missing genes of a genome, but also proposes potential reasons for their absence and highlights the importance of uncovering the missing genes of incomplete genomes. © 2013 Springer-Verlag Berlin Heidelberg.

Cite

Chen, G., Wang, C., Shi, L., Tong, W., Qu, X., Chen, J., … Shi, T. (2013). Comprehensively identifying and characterizing the missing gene sequences in human reference genome with integrated analytic approaches. Human Genetics, 132(8), 899–911. https://doi.org/10.1007/s00439-013-1300-9
