Abstract
Genomic strings are not of fixed length, but provide one-dimensional spatial data that do not divide for conquering by machine learning into manageable fixed-size chunks obeying Dietterich’s independent and identically distributed assumption. We nonetheless need to divide genomic strings for conquering by machine learning, in this case for genomic prediction. Orthologs are genomic strings derived from a common ancestor and having the same biological function. Ortholog detection is biologically interesting since it informs us about protein divergence through evolution and, in the present context, also has important agricultural applications. The present paper indicates means to obtain an associated (fixed-size) attribute vector for genomic string data and to divide and conquer the machine learning problem of ortholog detection, herein seen as an analogy problem. The attributes are based on both the typical string similarity measures of bioinformatics and on a large number of differential metrics, many new to bioinformatics. Many of the differential metrics are based on evolutionary considerations, both theoretical and empirically observed, in some cases observed by the authors. C5.0 with AdaBoosting activated was employed, and the preliminary results reported herein regarding complete cDNA strings are very encouraging for eventually and usefully employing the techniques described for ortholog detection on the more readily available EST (incomplete) genomic data.
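To make the idea of a fixed-size attribute vector for variable-length strings concrete, the following minimal sketch maps a pair of DNA sequences to a vector whose length does not depend on the sequence lengths. It is an illustration only, not the attribute set used in the paper: the function names (`kmer_counts`, `gc_content`, `pair_attributes`) and the chosen attributes (length ratio, GC-content difference, and per-k-mer frequency differences) are assumptions introduced here.

```python
from itertools import product


def kmer_counts(seq, k=2):
    """Count overlapping k-mers over A/C/G/T; the result always has 4**k entries."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
    return [counts[m] for m in kmers]


def gc_content(seq):
    """Fraction of G and C bases; 0.0 for an empty sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def pair_attributes(a, b, k=2):
    """Hypothetical attributes: map two variable-length sequences to 2 + 4**k numbers."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    na, nb = max(sum(ca), 1), max(sum(cb), 1)  # avoid division by zero
    # Differential attributes: absolute difference in relative k-mer frequency.
    freq_diffs = [abs(x / na - y / nb) for x, y in zip(ca, cb)]
    length_ratio = min(len(a), len(b)) / max(len(a), len(b), 1)
    gc_diff = abs(gc_content(a) - gc_content(b))
    return [length_ratio, gc_diff] + freq_diffs


# Pairs of very different lengths yield vectors of the same size.
v1 = pair_attributes("ATGGCGTACGTTAG", "ATGGCGTACG")
v2 = pair_attributes("ATGCCC", "ATGGCGTACGTTAGGCTAAACG")
print(len(v1), len(v2))
```

Vectors of this kind, one per candidate sequence pair and labelled as ortholog or non-ortholog, could then be passed to any classifier that expects fixed-width input, such as a boosted decision-tree learner.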
Cite
Ouyang, M., Case, J., & Burnside, J. (2001). Divide and conquer machine learning for a genomics analogy problem (Progress report). In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2226, pp. 290–303). Springer Verlag. https://doi.org/10.1007/3-540-45650-3_26