Structured prediction via learning to search under bandit feedback


Abstract

We present an algorithm, BLS, for structured prediction under online bandit feedback. The learner repeatedly predicts a sequence of actions, generating a structured output, and then observes feedback for that output and no others. We consider two cases: a pure bandit setting, in which the learner observes only a single loss for the whole output, and a more fine-grained setting, in which it observes a loss for every action. We find that fine-grained feedback is necessary for strong empirical performance because it enables a robust variance-reduction strategy. We empirically compare several algorithms and exploration methods and show the efficacy of BLS on sequence labeling and dependency parsing tasks.
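The fine-grained feedback regime can be illustrated with a minimal sketch of online sequence labeling. At each position the learner sees a loss only for the tag it actually chose. This is not the authors' BLS algorithm. Several pieces here are illustrative assumptions: epsilon-greedy exploration, the inverse propensity scoring (IPS) loss estimate, the per-(token, tag) weight table, and the helper names `epsilon_greedy`, `ips_loss_estimate`, and `run_episode`. The paper's own variance-reduction strategy is not reproduced.

```python
import random

def epsilon_greedy(scores, epsilon, rng):
    """Pick an action epsilon-greedily; return (action, probability it was chosen)."""
    k = len(scores)
    greedy = max(range(k), key=lambda a: scores[a])  # first index wins ties
    action = rng.randrange(k) if rng.random() < epsilon else greedy
    prob = epsilon / k + ((1.0 - epsilon) if action == greedy else 0.0)
    return action, prob

def ips_loss_estimate(num_actions, chosen, observed_loss, prob_chosen):
    """Inverse-propensity-scored estimate of the full per-action loss vector.

    Only the chosen action's loss is observed; unobserved entries are 0.
    When every action has nonzero probability, each entry equals the true
    loss in expectation over the exploration policy.
    """
    est = [0.0] * num_actions
    est[chosen] = observed_loss / prob_chosen
    return est

def run_episode(tokens, gold, weights, tags, epsilon, lr, rng):
    """One online round of sequence labeling with per-action bandit feedback.

    Hypothetical setup: `gold` only simulates the loss the environment reveals
    for each chosen tag; the learner never reads it directly.
    Returns the episode's total loss.
    """
    total_loss = 0.0
    for tok, g in zip(tokens, gold):
        scores = [weights.get((tok, t), 0.0) for t in tags]
        a, p = epsilon_greedy(scores, epsilon, rng)
        loss = 0.0 if tags[a] == g else 1.0  # revealed for the chosen tag only
        total_loss += loss
        est = ips_loss_estimate(len(tags), a, loss, p)
        # Cost-sensitive step: push scores down in proportion to estimated loss.
        for j, t in enumerate(tags):
            weights[(tok, t)] = weights.get((tok, t), 0.0) - lr * est[j]
    return total_loss

if __name__ == "__main__":
    rng = random.Random(0)
    tags = ["DET", "NOUN", "VERB"]
    weights = {}
    for _ in range(200):
        run_episode(["the", "dog"], ["DET", "NOUN"], weights, tags, 0.2, 0.1, rng)
    # Greedy evaluation: no exploration (epsilon=0) and no updates (lr=0).
    print(run_episode(["the", "dog"], ["DET", "NOUN"], weights, tags, 0.0, 0.0, rng))
```

In this sketch, the per-position loss is what lets each tag decision be updated separately. In the pure bandit setting the learner would receive one loss for the whole sequence and could not attribute it to individual positions.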

Sharaf, A., & Daumé, H. (2017). Structured prediction via learning to search under bandit feedback. In EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the 2nd Workshop on Structured Prediction (pp. 17–26). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-4304