Mb-level CpG and TFBS islands visualized by AI and their roles in the nuclear organization of the human genome


Abstract

Unsupervised machine learning that can discover novel knowledge from big sequence data without prior knowledge or particular models is highly desirable for current genome study. We previously established a batch-learning self-organizing map (BLSOM) for oligonucleotide compositions, which can reveal various novel genome characteristics from big sequence data, and found that transcription factor binding sequences (TFBSs) and CpG-containing oligonucleotides are enriched in human centromeric and pericentromeric regions, which support centromere clustering and form the condensed heterochromatin “chromocenter” in interphase nuclei. The number and size of chromocenters, as well as the type of centromeres gathered in individual chromocenters, vary depending on cell type. To study molecular mechanisms of cell type-dependent chromocenter formation, we analyzed distribution patterns of occurrence per Mb of hexa- and heptanucleotide TFBSs, which have been compiled by the SwissRegulon Portal, and of CpG-containing oligonucleotides. We found Mb-level islands enriched for TFBSs and CpG-containing oligonucleotides in centromeric and pericentromeric regions on all human chromosomes except chrY. Considering molecular mechanisms for cell type-dependent centromere clustering, the chromosome-dependent enrichment of a set of TFBSs and CpG-containing oligonucleotides is of particular interest, since the cellular content of TFs and methyl-CpG-binding proteins exhibits cell type-dependent regulation. A newly introduced BLSOM, which analyzed occurrences of a total of 3,946 octanucleotide TFBSs compiled by the SwissRegulon Portal, has self-organized (separated) the sequences that are characteristically enriched in TFBSs and shown that these sequences are derived primarily from centromeric and pericentromeric constitutive heterochromatin regions. Furthermore, the BLSOM identified and visualized characteristic TFBSs that are enriched in these regions.
By analyzing Hi-C data for interchromosomal interactions, the present study showed that the chromatin segments supporting the interchromosomal interactions are located primarily in Mb-level TFBS and CpG islands and are thus enriched for a wide variety of TFBSs and CG-containing oligonucleotides.
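
The "occurrence per Mb" profiling described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `count_motifs_per_window`, the single-strand scan, and the fixed non-overlapping windows are assumptions made for illustration, and the motif `ACGCGT` in the usage note is an arbitrary placeholder rather than an entry taken from the SwissRegulon Portal.

```python
from collections import Counter


def count_motifs_per_window(sequence, motifs, window_size=1_000_000):
    """Count overlapping occurrences of each motif in consecutive windows.

    Assumptions (not taken from the paper): only the given strand is
    scanned, windows do not overlap, and motifs that straddle a window
    boundary are not counted.
    """
    sequence = sequence.upper()
    motifs = [m.upper() for m in motifs]
    counts = []
    for start in range(0, len(sequence), window_size):
        window = sequence[start:start + window_size]
        tally = Counter()
        for motif in motifs:
            # str.find with a moving start index also catches
            # overlapping matches.
            pos = window.find(motif)
            while pos != -1:
                tally[motif] += 1
                pos = window.find(motif, pos + 1)
        counts.append({m: tally[m] for m in motifs})
    return counts
```

On a toy sequence, `count_motifs_per_window("ACGCGTTTACGCGT" + "A" * 14, ["ACGCGT", "CG"], window_size=14)` returns one dictionary per window, so a window whose counts stand well above the rest would be a candidate "island" in the sense used above. How enrichment is actually thresholded is not stated in the abstract and is left open here.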

Wada, K., Wada, Y., & Ikemura, T. (2020). Mb-level CpG and TFBS islands visualized by AI and their roles in the nuclear organization of the human genome. Genes and Genetic Systems, 95(1), 29–41. https://doi.org/10.1266/ggs.19-00027
