CMIC: predicting DNA methylation inheritance of CpG islands with embedding vectors of variable-length k-mers

Abstract

Background: Epigenetic modifications established in mammalian gametes are largely reprogrammed during early development, but are partly inherited by the embryo to support its development. In this study, we examine CpG island (CGI) sequences to predict whether a mouse blastocyst CGI inherits oocyte-derived DNA methylation from the maternal genome. Recurrent neural networks (RNNs), including those based on gated recurrent units (GRUs), have recently been employed for variable-length inputs in classification and regression analyses. One advantage of this strategy is that RNNs automatically learn latent features embedded in the inputs as their model parameters are trained. However, the available CGI datasets for predicting oocyte-derived DNA methylation inheritance are not large enough to train such neural networks.

Results: We propose a GRU-based model called CMIC (CGI Methylation Inheritance Classifier) that augments each CGI sequence by converting it, N times, into a sequence of variable-length k-mers, where each length k is randomly selected from the range kmin to kmax. The resulting N sequences are then used as neural network input. N is set to 1000 by default. In addition, we propose a new embedding vector generator for k-mers called splitDNA2vec. The randomness of this procedure is higher than that of the previous work, dna2vec.

Conclusions: We found that CMIC can predict the inheritance of oocyte-derived DNA methylation at CGIs in the maternal genome of blastocysts with a high F-measure (0.93). We also show that the F-measure can be improved by increasing the parameter N, that is, the number of sequences of variable-length k-mers derived from a single CGI sequence. This implies that augmenting input data by converting a DNA sequence into N sequences of variable-length k-mers is effective. This approach can be applied to other DNA sequence classification and regression analyses, particularly those involving a small amount of data.
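The augmentation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a sequence is cut into consecutive, non-overlapping k-mers, that each k is drawn uniformly from kmin to kmax, and that a shorter leftover piece at the end is kept. The function names, the example sequence, and the values k_min=3 and k_max=8 are hypothetical; the abstract does not state them.

```python
import random


def random_kmer_split(seq, k_min, k_max, rng):
    """Cut seq into consecutive, non-overlapping k-mers of random length.

    Assumption: each k is drawn uniformly from [k_min, k_max]; the final
    piece may be shorter than k_min if the sequence runs out.
    """
    kmers = []
    pos = 0
    while pos < len(seq):
        k = rng.randint(k_min, k_max)  # randint is inclusive on both ends
        kmers.append(seq[pos:pos + k])
        pos += k
    return kmers


def augment(seq, n, k_min=3, k_max=8, seed=0):
    """Return n random k-mer decompositions of the same sequence."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [random_kmer_split(seq, k_min, k_max, rng) for _ in range(n)]


# Hypothetical toy sequence; a real CGI is typically much longer.
example_seq = "ACGCGTTACGCGGCGCATCGCGTACGCG"
views = augment(example_seq, n=5)
for view in views:
    print(view)
```

Each of the N decompositions spells out the same underlying sequence, so a single CGI yields N distinct token sequences for the network. In the default setting reported above, N would be 1000.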

Cite

Maruyama, O., Li, Y., Narita, H., Toh, H., Au Yeung, W. K., & Sasaki, H. (2022). CMIC: predicting DNA methylation inheritance of CpG islands with embedding vectors of variable-length k-mers. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04916-3
