Deep learning models for RNA secondary structure prediction (probably) do not generalize across families

Marcell Szikszai; Michael Wise; Amitava Datta; Max Ward; David H. Mathews

Journal Article (Open Access)

Bioinformatics (2022) 38(16) 3892-3899

DOI: 10.1093/bioinformatics/btac415

Abstract

Motivation: The secondary structure of RNA is important to its function. Over the last few years, several papers have attempted to use machine learning to improve de novo RNA secondary structure prediction. Many of these papers report impressive results for intra-family predictions but seldom address the much more difficult (and practical) inter-family problem.

Results: We demonstrate that it is nearly trivial with convolutional neural networks to generate pseudo-free energy changes, modelled after structure mapping data, that improve the accuracy of structure prediction for intra-family cases. We propose a more rigorous method for inter-family cross-validation that can be used to assess the performance of learning-based models. Using this method, we further demonstrate that intra-family performance is insufficient proof of generalization, despite the widespread assumption in the literature, and provide strong evidence that many existing learning-based models have not generalized inter-family.
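To make the intra-family versus inter-family distinction concrete, the following is a minimal sketch of a leave-one-family-out split, in which every sequence from the held-out family is kept out of training. It is an illustrative assumption about what an inter-family evaluation can look like, not the authors' published protocol; the function name `leave_one_family_out`, the record layout, and the toy family labels are all hypothetical.

```python
from collections import defaultdict


def leave_one_family_out(records):
    """Yield (held_out_family, train_ids, test_ids) for each family.

    `records` is an iterable of (family, sequence_id) pairs. All sequences
    of the held-out family go to the test set, so no family ever appears
    on both sides of a split.
    """
    by_family = defaultdict(list)
    for family, seq_id in records:
        by_family[family].append(seq_id)

    families = sorted(by_family)
    for held_out in families:
        test_ids = list(by_family[held_out])
        train_ids = [
            seq_id
            for family in families
            if family != held_out
            for seq_id in by_family[family]
        ]
        yield held_out, train_ids, test_ids


# Hypothetical toy data: (family, sequence_id) pairs.
records = [
    ("tRNA", "seq1"),
    ("tRNA", "seq2"),
    ("5S rRNA", "seq3"),
    ("SRP", "seq4"),
]
splits = list(leave_one_family_out(records))
```

An intra-family evaluation, by contrast, would split sequences at random, so members of the same family could land in both the training and test sets.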

Citation (APA)

Szikszai, M., Wise, M., Datta, A., Ward, M., & Mathews, D. H. (2022). Deep learning models for RNA secondary structure prediction (probably) do not generalize across families. Bioinformatics, 38(16), 3892–3899. https://doi.org/10.1093/bioinformatics/btac415
