Enabling and optimizing non-linear feature interactions in factorized linear algebra

Abstract

Accelerating machine learning (ML) over relational data is a key focus of the database community. While many real-world datasets are multi-table, most ML tools expect single-table inputs, forcing users to materialize joins before ML, which leads to data redundancy and wasted runtime. Recent work on “factorized ML” addresses these issues by pushing ML computations through joins. However, such work has so far been restricted to ML models that are linear in the feature space, rendering it less effective when users construct non-linear feature interactions, such as pairwise products, to boost ML accuracy. In this work, we take a first step towards closing this gap by introducing a new abstraction that enables pairwise feature interactions in multi-table data, and we present an extensive framework of algebraic rewrite rules for factorized linear algebra (LA) operators over feature interactions. Our rewrite rules carefully exploit the interplay of the redundancy caused by both joins and interactions. We prototype our framework in Python to build a tool we call MorpheusFI. An extensive empirical evaluation with both synthetic and real datasets shows that MorpheusFI yields up to 5x speedups over materialized execution for a popular second-order gradient method and even order-of-magnitude speedups for a popular stochastic gradient method.
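
To make the idea of pushing computation through a join concrete, the following is a minimal NumPy sketch, not MorpheusFI's actual API or its rewrite rules. It assumes a hypothetical two-table schema: an entity table S with a foreign-key column fk into an attribute table R. All function and variable names here are illustrative assumptions. The sketch checks two things: a matrix-vector product over the joined table can be computed from the base tables, and pairwise products of R's features can be computed once per R row and then gathered, instead of once per joined row.

```python
import numpy as np


def pairwise_products(X):
    """Return the products X[:, i] * X[:, j] for all i <= j, as columns."""
    i, j = np.triu_indices(X.shape[1])
    return X[:, i] * X[:, j]


def materialized_lmm(S, R, fk, w):
    """Join first (T = [S, R[fk]]), then compute T @ w."""
    T = np.hstack([S, R[fk]])
    return T @ w


def factorized_lmm(S, R, fk, w):
    """Compute the same product from the base tables, without building T."""
    d_s = S.shape[1]
    # R @ w_R is computed once per R row, then gathered through the foreign key.
    return S @ w[:d_s] + (R @ w[d_s:])[fk]


# Hypothetical sizes: many S rows referencing few R rows, so the join is redundant.
rng = np.random.default_rng(0)
n_s, d_s, n_r, d_r = 1000, 3, 20, 4
S = rng.standard_normal((n_s, d_s))
R = rng.standard_normal((n_r, d_r))
fk = rng.integers(0, n_r, size=n_s)
w = rng.standard_normal(d_s + d_r)

# The linear case: both execution strategies agree.
assert np.allclose(materialized_lmm(S, R, fk, w), factorized_lmm(S, R, fk, w))

# The interaction case, restricted to R's own features: interacting the
# n_r base rows and gathering equals interacting the n_s joined rows.
assert np.allclose(pairwise_products(R[fk]), pairwise_products(R)[fk])
```

Interactions that cross the two tables (a feature of S times a feature of R) are not covered by this sketch; handling those alongside the join redundancy is what the abstract attributes to the paper's rewrite rules.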

Citation (APA)

Li, S., Chen, L., & Kumar, A. (2019). Enabling and optimizing non-linear feature interactions in factorized linear algebra. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 1571–1588). Association for Computing Machinery. https://doi.org/10.1145/3299869.3319878