Identification of microRNA precursors based on random forest with network-level representation method of stem-loop structure

Abstract

Background: MicroRNAs (miRNAs) play a key role in regulating various biological processes, for example by participating in the post-transcriptional pathway and affecting the stability and/or translation of mRNA. Current methods have extracted feature information at different levels, among which the characteristic stem-loop structure makes the greatest contribution to the prediction of putative miRNA precursors (pre-miRNAs). We find that none of these features alone is capable of identifying new pre-miRNAs accurately.

Results: In the present work, a pre-miRNA stem-loop secondary structure is translated into a network, which provides a novel perspective for its structural analysis. Network parameters are used to construct a prediction model, which achieves an area under the receiver operating characteristic curve (AUC) of 0.956. Moreover, applying the same method to two independent datasets yields accuracies of 0.976 and 0.913, respectively.

Conclusions: Network parameters effectively characterize pre-miRNA secondary structure, which improves our prediction model in both predictive ability and computational efficiency. Additionally, as a complement to the feature extraction methods of previous studies, these multifaceted features can reflect natural properties of miRNAs and be used for comprehensive and systematic analysis of miRNAs.

© 2011 Xiao et al; licensee BioMed Central Ltd.
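To make the idea of translating a stem-loop into a network concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the secondary structure is given in dot-bracket notation, that each nucleotide is a vertex, and that edges are backbone links between neighbouring nucleotides plus base pairs. The function names `dot_bracket_to_graph` and `network_parameters` are hypothetical, and the parameters computed here are generic graph statistics, not the definitions given in the article's Table 2.

```python
from collections import deque


def dot_bracket_to_graph(structure):
    """Build an undirected graph (dict of adjacency sets) from dot-bracket text.

    Assumption: vertices are nucleotide positions; edges are backbone links
    (i, i+1) plus one edge per base pair given by matching parentheses.
    """
    n = len(structure)
    adj = {i: set() for i in range(n)}
    for i in range(n - 1):  # backbone edges
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            j = stack.pop()  # base-pair edge
            adj[i].add(j)
            adj[j].add(i)
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return adj


def network_parameters(adj):
    """Compute a few illustrative graph statistics for use as features."""
    n = len(adj)
    degrees = [len(neighbours) for neighbours in adj.values()]
    mean_degree = sum(degrees) / n
    var_degree = sum((d - mean_degree) ** 2 for d in degrees) / n
    # Average shortest path length via breadth-first search from every vertex.
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return {
        "n_vertices": n,
        "n_edges": sum(degrees) // 2,
        "mean_degree": mean_degree,
        "var_degree": var_degree,
        "avg_path_length": total / pairs if pairs else 0.0,
    }


params = network_parameters(dot_bracket_to_graph("((...))"))
```

A feature vector built this way could then be passed to any classifier; the article uses a random forest, which is not reproduced here.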

Figures

  • Figure 1 Three representations of RNA secondary structure for human precursor miRNA hsa-mir-33a.
  • Figure 2 ROC curves estimate the random resampling models. The ROC curves are overlaid by the vertical average curve and box plots showing the vertical spread around the average.
  • Table 1 Comparison with existing methods
  • Figure 3 Bar charts of individual parameter contributions. The contribution of each parameter is determined by calculating its importance score, with larger scores indicating more relevant properties. The two strategies are distinguished by different greyscales, the bar height is the score of the individual feature, and a confidence interval is calculated for each parameter. E: Edge; V: Vertex; N: Number; A: Average; Var: Variance; M: Mean.
  • Figure 4 Results of deleting features one by one. Models are constructed on the remaining variables after deleting the lowest-scoring feature each time. This process is repeated 23 times, until only one feature is left. Sensitivity and specificity are used to measure model performance.
  • Table 2 Definition of network parameter
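The one-by-one feature deletion described in the Figure 4 caption can be sketched as a simple loop. This is a hedged illustration only: `backward_elimination` and `score_fn` are hypothetical names, and the sketch assumes importance scores are recomputed on the remaining features at every round, which the caption does not state explicitly. Model fitting and the sensitivity/specificity evaluation are left to the caller.

```python
def backward_elimination(features, score_fn):
    """Repeatedly drop the lowest-scoring feature until one is left.

    features: list of feature names.
    score_fn: callable taking the list of remaining features and returning
              a dict {feature: importance score} (hypothetical; in practice
              this could wrap a random forest importance measure).
    Returns (removal_order, remaining) where remaining has one element.
    """
    remaining = list(features)
    removal_order = []
    while len(remaining) > 1:
        scores = score_fn(remaining)  # assumption: rescored each round
        worst = min(remaining, key=lambda f: scores[f])
        remaining.remove(worst)
        removal_order.append(worst)
    return removal_order, remaining


# Toy usage with fixed scores standing in for real importance values.
toy_scores = {"a": 0.4, "b": 0.1, "c": 0.3, "d": 0.2}
order, last = backward_elimination(
    list(toy_scores), lambda feats: {f: toy_scores[f] for f in feats}
)
```

With 24 starting features, this loop performs 23 deletions, which matches the count given in the caption.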

Cite

Xiao, J., Tang, X., Li, Y., Fang, Z., Ma, D., He, Y., & Li, M. (2011). Identification of microRNA precursors based on random forest with network-level representation method of stem-loop structure. BMC Bioinformatics, 12. https://doi.org/10.1186/1471-2105-12-165
