Multimodal AutoML via Representation Evolution

Abstract

With the increasing amounts of available data, learning simultaneously from different types of inputs is becoming necessary to obtain robust and well-performing models. With the advent of representation learning in recent years, lower-dimensional vector-based representations have become available for both images and texts, yet automating simultaneous learning from multiple modalities remains a challenging problem. This paper presents an AutoML (automated machine learning) approach to identifying machine learning model configurations for data composed of two modalities: texts and images. The approach is based on the idea of representation evolution, the process of automatically amplifying heterogeneous representations across several modalities and optimizing them jointly with a collection of fast, well-regularized linear models. The proposed approach is benchmarked against 11 unimodal and multimodal (texts and images) approaches on four real-life benchmark datasets from different domains. It achieves competitive performance with minimal human effort and low computing requirements, enabling automated learning from multiple modalities for a wider community of researchers.
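
The core idea described above is to combine heterogeneous text and image representations and evaluate each combination with a fast, well-regularized linear model. The snippet below is a minimal illustrative sketch of that idea under stated assumptions, not the authors' implementation: the toy data, the TF-IDF text features, the random image embeddings, and the random search standing in for the evolutionary optimization are all assumptions made for illustration.

```python
# Sketch: weight two modality representations, concatenate them, and score the
# candidate with a regularized linear model; search over the modality weights.
# (Random search here is a stand-in for the evolutionary optimization; the
# dataset and embeddings are hypothetical.)
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Toy bimodal dataset: short texts plus precomputed image embeddings (assumed).
texts = ["a cat on a sofa", "stock market report", "a dog in the park",
         "quarterly earnings call", "kitten playing with yarn", "bond yields rise"]
image_embeddings = rng.normal(size=(len(texts), 32))   # stand-in for CNN features
labels = np.array([0, 1, 0, 1, 0, 1])

# One possible text representation (the paper considers several heterogeneous ones).
text_features = TfidfVectorizer().fit_transform(texts).toarray()

def evaluate(weights):
    """Score one candidate: weight each modality, concatenate the features,
    fit a regularized linear model, and return cross-validated accuracy."""
    w_text, w_image = weights
    joint = np.hstack([w_text * text_features, w_image * image_embeddings])
    model = LogisticRegression(C=1.0, max_iter=1000)  # L2-regularized linear model
    return cross_val_score(model, joint, labels, cv=3).mean()

# Random search as a placeholder for evolving the representation weights.
best_weights, best_score = None, -np.inf
for _ in range(25):
    candidate = rng.uniform(0.0, 1.0, size=2)
    score = evaluate(candidate)
    if score > best_score:
        best_weights, best_score = candidate, score

print("best modality weights:", best_weights, "cv accuracy:", round(best_score, 3))
```

In this toy setting the search simply keeps the weight pair with the best cross-validated score; the paper's method additionally evolves which representations to include and tunes the linear models, but the joint "weighted representations plus fast linear model" loop is the same basic pattern.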

Cite (APA)

Škrlj, B., Bevec, M., & Lavrač, N. (2023). Multimodal AutoML via Representation Evolution. Machine Learning and Knowledge Extraction, 5(1), 1–13. https://doi.org/10.3390/make5010001
