Multi-Source Test-Time Adaptation as Dueling Bandits for Extractive Question Answering

Hai Ye; Qizhe Xie; Hwee Tou Ng

Conference Proceedings

Proceedings of the Annual Meeting of the Association for Computational Linguistics (2023), Vol. 1, pp. 9647–9660

DOI: 10.18653/v1/2023.acl-long.537

Abstract

In this work, we study multi-source test-time model adaptation from user feedback, where K distinct models are established for adaptation. To allow efficient adaptation, we cast the problem as a stochastic decision-making process, aiming to determine the best adapted model after adaptation. We discuss two frameworks: multi-armed bandit learning and multi-armed dueling bandits. Compared to multi-armed bandit learning, the dueling framework allows pairwise collaboration among K models, which is solved by a novel method named Co-UCB proposed in this work. Experiments on six datasets of extractive question answering (QA) show that the dueling framework using Co-UCB is more effective than other strong baselines for our studied problem.
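To make the multi-armed bandit framing concrete, the following is a minimal sketch of the standard UCB1 rule applied to choosing among K models from binary user feedback. It is not the Co-UCB method proposed in the paper, whose details are not given in this abstract. The function names `ucb1_select` and `run_ucb1`, the exploration constant `c`, and the simulated feedback probabilities are all assumptions made for illustration.

```python
import math
import random


def ucb1_select(counts, rewards, t, c=2.0):
    """Return the index of the model (arm) to query at round t.

    counts[k]  : how many times model k has been selected so far
    rewards[k] : sum of feedback (0 or 1) received for model k
    c          : exploration constant (hypothetical default; UCB1 uses 2)
    """
    # Play every model once before comparing confidence bounds.
    for k, n in enumerate(counts):
        if n == 0:
            return k
    # Mean feedback plus an exploration bonus that shrinks with more pulls.
    scores = [
        rewards[k] / counts[k] + math.sqrt(c * math.log(t) / counts[k])
        for k in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_ucb1(success_probs, rounds, seed=0):
    """Simulate user feedback: model k is judged correct with
    probability success_probs[k] (an illustrative assumption)."""
    rng = random.Random(seed)
    num_models = len(success_probs)
    counts = [0] * num_models
    rewards = [0.0] * num_models
    for t in range(1, rounds + 1):
        k = ucb1_select(counts, rewards, t)
        feedback = 1.0 if rng.random() < success_probs[k] else 0.0
        counts[k] += 1
        rewards[k] += feedback
    return counts, rewards


if __name__ == "__main__":
    counts, rewards = run_ucb1([0.3, 0.5, 0.7], rounds=1000)
    print("selections per model:", counts)
```

In this sketch only one model is queried per round. The dueling framework described in the abstract instead involves pairs of models, so a faithful implementation would select two models per round and update them jointly.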

Citation (APA)

Ye, H., Xie, Q., & Ng, H. T. (2023). Multi-Source Test-Time Adaptation as Dueling Bandits for Extractive Question Answering. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 9647–9660). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.537
