Leva: Boosting Machine Learning Performance with Relational Embedding Data Augmentation

Zixuan Zhao; Raul Castro Fernandez

Conference Proceedings, Open Access

Proceedings of the ACM SIGMOD International Conference on Management of Data (2022) 1504-1517

DOI: 10.1145/3514221.3517891

Abstract

In this paper, we present Leva, an end-to-end system that boosts the performance of machine learning tasks over relational data. Leva builds a relational embedding by representing relational data as a graph and then using embedding methods to represent the graph as vectors. The embedding captures information from the entire database, including information that is useful for the downstream machine learning task. Some information in the graph will be erroneous, for example, information corresponding to incorrect inclusion dependencies. However, we show that the supervision signal from the downstream task filters out information that is not useful, and the result is a boost in ML performance. This means that analysts can avoid the time-consuming effort of collecting features across multiple relations, which requires solving a data discovery and integration problem, and instead rely on these techniques to train better-performing models. We demonstrate Leva's performance on different classification and regression datasets and compare it with multiple baselines.
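The graph-then-embed idea in the abstract can be illustrated with a small sketch. This is not Leva's implementation. It assumes one plausible encoding in which every row and every distinct cell value becomes a node, a row is linked to the values it contains, and random walks over the graph produce the "sentences" that a skip-gram style embedding model would consume. The table contents and the function names (`build_graph`, `random_walks`) are hypothetical.

```python
import random
from collections import defaultdict


def build_graph(tables):
    """Build an undirected row/value graph from relational tables.

    `tables` maps a table name to a list of row dicts. Rows from different
    tables become connected whenever they share a cell value. Assumption:
    values are matched by string form only, ignoring the column they came from.
    """
    graph = defaultdict(set)
    for name, rows in tables.items():
        for i, row in enumerate(rows):
            row_node = f"row:{name}:{i}"
            for value in row.values():
                if value is None:
                    continue  # missing cells contribute no edge
                value_node = f"val:{value}"
                graph[row_node].add(value_node)
                graph[value_node].add(row_node)
    return graph


def random_walks(graph, walk_length=6, walks_per_node=2, seed=0):
    """Sample fixed-length uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                # Every node has at least one neighbor because edges are
                # always added in both directions.
                neighbors = sorted(graph[walk[-1]])
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical two-table database: orders reference customers by id.
tables = {
    "customers": [
        {"customer_id": "c1", "city": "Chicago"},
        {"customer_id": "c2", "city": "Boston"},
    ],
    "orders": [
        {"order_id": "o1", "customer_id": "c1", "amount": "30"},
        {"order_id": "o2", "customer_id": "c2", "amount": "45"},
    ],
}

graph = build_graph(tables)
walks = random_walks(graph)
```

Because values are matched without regard to their column, two unrelated columns that happen to share a value would also be linked. That is the kind of spurious connection the abstract attributes to incorrect inclusion dependencies and expects the downstream supervision signal to discount. In a fuller pipeline, the walks would be passed to a skip-gram model (for example, gensim's `Word2Vec`), and the resulting row vectors would be appended to the training table as extra features.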

Citation (APA)

Zhao, Z., & Castro Fernandez, R. (2022). Leva: Boosting Machine Learning Performance with Relational Embedding Data Augmentation. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 1504–1517). Association for Computing Machinery. https://doi.org/10.1145/3514221.3517891
