AutoML has emerged in recent years as the ultimate challenge in machine learning and data science. However, despite advances in hyper-parameter optimization, the data preparation step still faces great difficulties, mainly because of the inability to incorporate human expertise in the re-engineering of variables. In this paper, we present an algorithm that automates the trivial preparation tasks and generates features using domain knowledge represented through entity-relationship (ER) diagrams. Along with the algorithm, we define a set of operators that can be applied to distinct kinds of data with little human intervention. The algorithm is evaluated on a small set of public datasets, for which we designed basic ER models. The new method achieves results comparable to those of other automation tools, such as AutoSklearn [4], but with much lower processing times.
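To make the idea of type-specific operators driven by an ER model more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a hypothetical two-entity model (one Customer has many Orders); the entity names, the columns, and the functions `date_operator` and `aggregate_operator` are all illustrative assumptions that do not come from the paper.

```python
from datetime import date
from statistics import mean

# Hypothetical ER model: one Customer has many Orders (all names are illustrative).
customers = [
    {"customer_id": 1, "signup": date(2021, 3, 15)},
    {"customer_id": 2, "signup": date(2022, 11, 2)},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 1, "amount": 40.0},
    {"order_id": 12, "customer_id": 2, "amount": 15.0},
]


def date_operator(rows, column):
    """Assumed operator for date variables: split into year, month and weekday."""
    for row in rows:
        value = row[column]
        row[f"{column}_year"] = value.year
        row[f"{column}_month"] = value.month
        row[f"{column}_weekday"] = value.weekday()  # Monday is 0
    return rows


def aggregate_operator(parents, children, key, column):
    """Assumed operator for a one-to-many relationship: summarise a numeric
    child variable (count, sum, mean) for each parent record."""
    grouped = {}
    for child in children:
        grouped.setdefault(child[key], []).append(child[column])
    for parent in parents:
        values = grouped.get(parent[key], [])
        parent[f"{column}_count"] = len(values)
        parent[f"{column}_sum"] = sum(values)
        parent[f"{column}_mean"] = mean(values) if values else 0.0
    return parents


# Apply the date operator to the parent entity, then aggregate across the relationship.
features = aggregate_operator(
    date_operator(customers, "signup"), orders, "customer_id", "amount"
)
```

In this sketch the relationship in the ER model decides which aggregations are meaningful, and the variable type decides which operator applies; how the paper's own operators are defined is not stated in the abstract.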
Afonso, T., & Antunes, C. (2022). The Automation of Feature Generation with Domain Knowledge. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13566 LNAI, pp. 575–586). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-16474-3_47