An enormous effort is usually devoted to data wrangling, the tedious process of cleaning, transforming and combining data, such that it is ready for modelling, visualisation or aggregation. Data transformation and formatting is one common task in data wrangling, which is performed by humans in two steps: (1) they recognise the specific domain of data (dates, phones, addresses, etc.) and (2) they apply conversions that are specific to that domain. However, the mechanisms to manipulate one specific domain can be unique and highly different from other domains. In this paper we present BK-ADAPT, a system that uses inductive programming (IP) with a dynamic background knowledge (BK) generated by a machine learning meta-model that selects the domain and/or the primitives from several descriptive features of the data wrangling problem. To show the performance of our method, we have created a web-based tool that allows users to provide a set of inputs and one or more examples of outputs, in such a way that the rest of examples are automatically transformed by the tool.
CITATION STYLE
Contreras-Ochando, L., Ferri, C., Hernández-Orallo, J., Martínez-Plumed, F., Ramírez-Quintana, M. J., & Katayama, S. (2020). BK-ADAPT: Dynamic Background Knowledge for Automating Data Transformation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11908 LNAI, pp. 755–759). Springer. https://doi.org/10.1007/978-3-030-46133-1_45
Mendeley helps you to discover research relevant for your work.