Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and beyond

37Citations
Citations of this article
32Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Deep Learning revolutionizes almost all fields of computer science including data management. However, the demand for high-quality training data is slowing down deep neural nets' wider adoption. To this end, data augmentation (DA), which generates more labeled examples from existing ones, becomes a common technique. Meanwhile, the risk of creating noisy examples and the large space of hyper-parameters make DA less attractive in practice. We introduce Rotom, a multi-purpose data augmentation framework for a range of data management and mining tasks including entity matching, data cleaning, and text classification. Rotom features InvDA, a new DA operator that generates natural yet diverse augmented examples by formulating DA as a seq2seq task. The key technical novelty of Rotom is a meta-learning framework that automatically learns a policy for combining examples from different DA operators, whereby combinatorially reduces the hyper-parameters space. Our experimental results show that Rotom effectively improves a model's performance by combining multiple DA operators, even when applying them individually does not yield performance improvement. With this strength, Rotom outperforms the state-of-the-art entity matching and data cleaning systems in the low-resource settings as well as two recently proposed DA techniques for text classification.

Cite

CITATION STYLE

APA

Miao, Z., Li, Y., & Wang, X. (2021). Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and beyond. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 1303–1316). Association for Computing Machinery. https://doi.org/10.1145/3448016.3457258

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free