Learning layouts of biological datasets semi-automatically

Abstract

A key challenge associated with existing approaches to data integration and workflow creation in bioinformatics is the effort required to integrate a new data source. As new data sources emerge, and as the data formats and contents of existing sources evolve, wrapper programs must be written or modified. This can be extremely time-consuming, tedious, and error-prone. This paper describes our semi-automatic approach for learning the layout of a flat-file bioinformatics dataset. Our approach involves three key steps. The first step uses a number of heuristics to infer the delimiters used in the dataset. Specifically, we have developed a metric that uses information on the frequency and starting position of sequences. Based on this metric, we find a superset of the delimiters and then seek user input to eliminate the incorrect ones. The second step generates a layout descriptor based on the relative order in which the delimiters occur. The final step generates a parser from the layout descriptor. Our heuristics for finding the delimiters have been evaluated on three popular flat-file biological datasets. © Springer-Verlag Berlin Heidelberg 2005.
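The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scoring rule (the fraction of records in which a token begins a line at column 0, combining frequency with starting position), the `min_support` threshold, the function names `candidate_delimiters`, `layout_descriptor` and `parse_record`, and the toy Swiss-Prot-style records are all hypothetical assumptions chosen for illustration.

```python
from collections import Counter
from typing import AbstractSet, Dict, List, Optional, Sequence


def _leading_token(line: str) -> Optional[str]:
    """Return the first token of a line if it starts at column 0, else None."""
    if line and not line[0].isspace():
        return line.split()[0]
    return None


def candidate_delimiters(records: Sequence[str], min_support: float = 0.9) -> Dict[str, float]:
    """Step 1 (hypothetical metric): score each token by the fraction of records
    in which it begins a line, keeping tokens whose score is >= min_support.
    The result is a superset that a user is expected to prune."""
    if not records:
        return {}
    support: Counter = Counter()
    for record in records:
        # Count each line-initial token at most once per record.
        starts = {tok for tok in map(_leading_token, record.splitlines()) if tok}
        support.update(starts)
    n = len(records)
    return {tok: cnt / n for tok, cnt in support.items() if cnt / n >= min_support}


def layout_descriptor(record: str, delimiters: AbstractSet[str]) -> List[str]:
    """Step 2: relative order of delimiters in a record, collapsing consecutive
    repeats so that a multi-line field counts once."""
    order: List[str] = []
    for line in record.splitlines():
        tok = _leading_token(line)
        if tok in delimiters and (not order or order[-1] != tok):
            order.append(tok)
    return order


def parse_record(record: str, delimiters: AbstractSet[str]) -> Dict[str, List[str]]:
    """Step 3: a generic parser that groups each line's text under the most
    recent delimiter; lines before the first delimiter are ignored."""
    fields: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in record.splitlines():
        tok = _leading_token(line)
        if tok in delimiters:
            current = tok
            value = line[len(tok):].strip()  # text after the delimiter
        else:
            value = line.strip()  # continuation line belongs to current field
        if current is not None and value:
            fields.setdefault(current, []).append(value)
    return fields


# Toy records in the style of a line-coded flat file (hypothetical data).
records = [
    "ID   P1\nAC   A1;\nDE   Protein one\nSQ   SEQUENCE\n     MKT",
    "ID   P2\nAC   A2;\nDE   Protein two\nDE   continued\nSQ   SEQUENCE\n     MAV",
    "ID   P3\nDE   Protein three\nSQ   SEQUENCE\n     MQL",
]

cands = candidate_delimiters(records, min_support=0.6)
print(sorted(cands))  # ['AC', 'DE', 'ID', 'SQ']
delims = set(cands)  # a user would remove false positives at this point
print(layout_descriptor(records[1], delims))  # ['ID', 'AC', 'DE', 'SQ']
print(parse_record(records[1], delims)["DE"])  # ['Protein two', 'continued']
```

Collapsing consecutive repeats in the descriptor treats multi-line fields such as the repeated `DE` lines as a single field. A real layout learner would also have to handle optional and repeating groups of fields, which this sketch does not attempt.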

Citation (APA)

Sinha, K., Zhang, X., Jin, R., & Agrawal, G. (2005). Learning layouts of biological datasets semi-automatically. In Lecture Notes in Bioinformatics (Subseries of Lecture Notes in Computer Science) (Vol. 3615, pp. 31–45). Springer Verlag. https://doi.org/10.1007/11530084_5