Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and beyond

Zhengjie Miao; Yuliang Li; Xiaolan Wang

Conference Proceedings (Open Access)

Proceedings of the ACM SIGMOD International Conference on Management of Data (2021) 1303-1316

DOI: 10.1145/3448016.3457258

Abstract

Deep learning has revolutionized almost all fields of computer science, including data management. However, the demand for high-quality training data is slowing the wider adoption of deep neural networks. To address this, data augmentation (DA), which generates more labeled examples from existing ones, has become a common technique. At the same time, the risk of creating noisy examples and the large space of hyper-parameters make DA less attractive in practice. We introduce Rotom, a multi-purpose data augmentation framework for a range of data management and mining tasks, including entity matching, data cleaning, and text classification. Rotom features InvDA, a new DA operator that generates natural yet diverse augmented examples by formulating DA as a seq2seq task. The key technical novelty of Rotom is a meta-learning framework that automatically learns a policy for combining examples from different DA operators, thereby combinatorially reducing the hyper-parameter space. Our experimental results show that Rotom effectively improves a model's performance by combining multiple DA operators, even when applying them individually does not yield a performance improvement. With this strength, Rotom outperforms state-of-the-art entity matching and data cleaning systems in low-resource settings, as well as two recently proposed DA techniques for text classification.
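To make the general idea of data augmentation concrete, the following is a minimal sketch of generating extra labeled examples by applying simple token-level operators to existing ones. It is not Rotom's implementation: it covers neither InvDA nor the meta-learned combination policy. The operator names (`delete_token`, `swap_tokens`), the `augment` helper, and the sample records are all hypothetical and chosen only for illustration.

```python
import random

def delete_token(tokens, rng):
    """Drop one randomly chosen token (no-op for fewer than 2 tokens)."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]

def swap_tokens(tokens, rng):
    """Swap one randomly chosen pair of adjacent tokens."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens) - 1)
    out = list(tokens)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out

# Hypothetical operator registry; a real system would offer many more.
OPERATORS = {"delete": delete_token, "swap": swap_tokens}

def augment(examples, op_names, seed=0):
    """Apply each named operator once to every (text, label) pair.

    The label is copied unchanged, which is the usual DA assumption and
    also the source of noise when an operator alters the meaning.
    Returns (augmented_text, label, operator_name) triples.
    """
    rng = random.Random(seed)
    augmented = []
    for text, label in examples:
        tokens = text.split()
        for name in op_names:
            new_tokens = OPERATORS[name](tokens, rng)
            augmented.append((" ".join(new_tokens), label, name))
    return augmented

# Illustrative records only; not taken from the paper's datasets.
train = [
    ("apple iphone 12 64gb black", "match"),
    ("dell xps 13 laptop", "no_match"),
]
augmented = augment(train, ["delete", "swap"])
for text, label, op in augmented:
    print(f"{op:>6}: {text!r} -> {label}")
```

Each operator choice here is a hyper-parameter that would normally be tuned by hand; the abstract's point is that Rotom instead learns how to combine the outputs of such operators.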

Cite

Miao, Z., Li, Y., & Wang, X. (2021). Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and beyond. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 1303–1316). Association for Computing Machinery. https://doi.org/10.1145/3448016.3457258
