Leva: Boosting Machine Learning Performance with Relational Embedding Data Augmentation

Abstract

In this paper, we present Leva, an end-to-end system that boosts the performance of machine learning (ML) tasks over relational data. Leva builds a relational embedding by representing relational data as a graph and then using embedding methods to represent the graph as vectors. The embedding captures information from the entire database, including information that is useful for the downstream ML task. At the same time, some information in the graph will be erroneous, for example, information corresponding to incorrect inclusion dependencies. However, we show that the supervision signal from the downstream task filters out information that is not useful, and the result is a boost in ML performance. Consequently, analysts can avoid the time-consuming effort of collecting features across multiple relations (which requires solving a data discovery and integration problem) and instead rely on these techniques to train better-performing models. We demonstrate Leva's performance on several classification and regression datasets and compare it against multiple baselines.
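The relational-data-to-graph-to-vectors pipeline can be illustrated with a small, self-contained toy. This is a minimal sketch under stated assumptions, not Leva's actual construction or embedding method. It assumes a bipartite graph with one node per row and one node per distinct cell value, where equal values in different tables collapse into a single node. It also replaces a learned embedding with truncated random-walk visit probabilities. The functions `build_graph`, `walk_embedding`, and `cosine` and the toy tables are hypothetical.

```python
from collections import defaultdict
import math


def build_graph(tables):
    """Hypothetical graph construction (not Leva's): one node per row and one
    node per distinct cell value; each row links to the values it contains.
    Equal values in different tables collapse into one node, so potential
    joins (e.g. a shared customer id) show up as shared neighbors."""
    adj = defaultdict(set)
    for name, rows in tables.items():
        for i, row in enumerate(rows):
            row_node = f"{name}:{i}"
            for value in row.values():
                value_node = f"val:{value}"
                adj[row_node].add(value_node)
                adj[value_node].add(row_node)
    return adj


def walk_embedding(adj, steps=3):
    """Embed each node as the summed visit probabilities of a uniform random
    walk over `steps` hops. This is a crude, deterministic stand-in for a
    learned embedding. Returns the ordered node list and a node -> vector map."""
    nodes = sorted(adj)
    emb = {}
    for start in nodes:
        prob = {start: 1.0}
        visits = defaultdict(float)
        for _ in range(steps):
            nxt = defaultdict(float)
            for node, p in prob.items():
                neighbors = adj[node]
                for nb in neighbors:
                    nxt[nb] += p / len(neighbors)  # spread mass uniformly
            prob = nxt
            for node, p in prob.items():
                visits[node] += p
        emb[start] = [visits[n] for n in nodes]
    return nodes, emb


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy database: the ML task is over `orders`; `customers` holds extra context.
tables = {
    "orders": [{"order_id": "o1", "cust": "c1"}, {"order_id": "o2", "cust": "c2"}],
    "customers": [{"cust": "c1", "city": "Chicago"}, {"cust": "c2", "city": "Boston"}],
}

nodes, emb = walk_embedding(build_graph(tables))
# orders:0 reaches its customer's city value without an explicit join.
print(emb["orders:0"][nodes.index("val:Chicago")] > 0)  # True
same = cosine(emb["orders:0"], emb["customers:0"])
other = cosine(emb["orders:0"], emb["customers:1"])
print(same > other)  # True
```

In this sketch, the vector for `orders:0` carries weight on `val:Chicago`, a value stored only in `customers`. No join was written by hand. Appending such vectors to the training rows lets the downstream model's supervision decide which dimensions help, which mirrors the abstract's argument at toy scale.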

Citation

Zhao, Z., & Castro Fernandez, R. (2022). Leva: Boosting Machine Learning Performance with Relational Embedding Data Augmentation. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 1504–1517). Association for Computing Machinery. https://doi.org/10.1145/3514221.3517891
