Structure-inducing pre-training

Abstract

Language model pre-training and the derived general-purpose methods have reshaped machine learning research. However, there remains considerable uncertainty regarding why pre-training improves the performance of downstream tasks. This challenge is pronounced when language model pre-training is used in domains outside of natural language. Here we investigate this problem by analysing how pre-training methods impose relational structure in induced per-sample latent spaces; that is, what constraints pre-training methods place on the distances or geometry among the pre-trained embeddings of samples. A comprehensive review of pre-training methods reveals that this question remains open, despite theoretical analyses showing the importance of understanding this form of induced structure. Based on this review, we introduce a pre-training framework that enables a granular and comprehensive understanding of how relational structure can be induced. We present a theoretical analysis of the framework from first principles and establish a connection between the relational inductive bias of pre-training and fine-tuning performance. Empirical studies spanning three data modalities and ten fine-tuning tasks confirm the theoretical analyses, inform the design of novel pre-training methods and establish consistent improvements over a compelling suite of methods.
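To make the idea of "constraints on the distances between pre-trained embeddings" concrete, the sketch below shows one generic way a pre-training objective can impose relational structure: a margin term over a relational graph of samples is added to the usual language-model loss. This is an illustrative sketch only, not the authors' framework; the names `structure_inducing_loss`, `pretraining_loss`, `edges` (anchor/positive/negative index triplets drawn from a hypothetical sample graph), `margin` and `lambda_struct` are all assumed for illustration, and PyTorch is assumed as the framework.

```python
import torch
import torch.nn.functional as F

def structure_inducing_loss(embeddings: torch.Tensor,
                            edges: torch.Tensor,
                            margin: float = 1.0) -> torch.Tensor:
    """Pull graph-connected samples together, push unconnected samples apart.

    embeddings: (num_samples, dim) per-sample latent vectors.
    edges: (num_triplets, 3) long tensor of (anchor, positive, negative) indices,
           where 'positive' is connected to the anchor in a hypothetical relational
           graph and 'negative' is not.
    """
    anchors = embeddings[edges[:, 0]]
    positives = embeddings[edges[:, 1]]
    negatives = embeddings[edges[:, 2]]
    d_pos = (anchors - positives).norm(dim=-1)
    d_neg = (anchors - negatives).norm(dim=-1)
    # Triplet margin constraint: connected samples should be closer than
    # unconnected ones by at least `margin`.
    return F.relu(d_pos - d_neg + margin).mean()

def pretraining_loss(lm_loss: torch.Tensor,
                     embeddings: torch.Tensor,
                     edges: torch.Tensor,
                     lambda_struct: float = 0.1) -> torch.Tensor:
    """Combine the ordinary per-token LM loss with the relational constraint."""
    return lm_loss + lambda_struct * structure_inducing_loss(embeddings, edges)
```

In a sketch like this, the weight `lambda_struct` controls how strongly the relational inductive bias shapes the latent space relative to the token-level objective; setting it to zero recovers plain language-model pre-training.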

Citation (APA)
McDermott, M. B. A., Yap, B., Szolovits, P., & Zitnik, M. (2023). Structure-inducing pre-training. Nature Machine Intelligence, 5(6), 612–621. https://doi.org/10.1038/s42256-023-00647-z
