Solving data mismatches in bioinformatics workflows by generating data converters

0Citations
Citations of this article
1Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Heterogeneity of data and data formats in bioinformatics entail mismatches between inputs and outputs of different services, making it difficult to compose them into workflows. To reduce those mismatches, bioinformatics platforms propose ad’hoc converters, called shims. When shims are written by hand, they are time-consuming to develop, and cannot anticipate all needs. When shims are automatically generated, they miss transformations, for example data composition from multiple parts, or parallel conversion of list elements. This article proposes to systematically detect convertibility from output types to input types. Convertibility detection relies on a rule system based on abstract types, close to XML Schema. Types allow to abstract data while precisely accounting for their composite structure. Detection is accompanied by an automatic generation of converters between input and output XML data. We show the applicability of our approach by abstracting concrete bioinformatics types (e.g., complex biosequences) for a number of bioinformatics services (e.g., blast). We illustrate how our automatically generated converters help to resolve data mismatches when composing workflows. We conducted an experiment on bioinformatics services and datatypes, using an implementation of our approach, as well as a survey with domain experts. The detected convertibilities and produced converters were validated as relevant from a biological point of view. Furthermore the automatically produced graph of potentially compatible services exhibited a connectivity higher than with the ad’hoc approaches. Indeed, the experts discovered unknown possible connexions.

Cite

CITATION STYLE

APA

Ba, M., Ferré, S., & Ducassé, M. (2016). Solving data mismatches in bioinformatics workflows by generating data converters. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9510, pp. 88–115). Springer Verlag. https://doi.org/10.1007/978-3-662-49214-7_3

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free