Structured prediction via learning to search under bandit feedback

Amr Sharaf; Hal Daumé

Conference Proceedings, Open Access

EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the 2nd Workshop on Structured Prediction (2017) 17-26

DOI: 10.18653/v1/w17-4304

Abstract

We present an algorithm for structured prediction under online bandit feedback. The learner repeatedly predicts a sequence of actions, generating a structured output, and then observes feedback for that output and no others. We consider two cases: a pure bandit setting, in which the learner observes only a single loss for the whole output, and a more fine-grained setting, in which it observes a loss for every action. We find that the fine-grained feedback is necessary for strong empirical performance, because it allows for a robust variance-reduction strategy. We empirically compare a number of different algorithms and exploration methods, and show the efficacy of our algorithm, BLS (bandit learning to search), on sequence labeling and dependency parsing tasks.
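The two feedback settings described in the abstract can be illustrated with a small sketch. This is not the authors' BLS implementation; it only simulates what the learner is allowed to observe in each setting. It assumes a sequence labeling task with per-position 0/1 (Hamming) loss, and the names `observe_feedback` and `explore`, as well as the epsilon-greedy exploration rule, are hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence, Union


def observe_feedback(
    predicted: Sequence[str], gold: Sequence[str], fine_grained: bool
) -> Union[float, List[float]]:
    """Return the feedback the learner sees for ONE predicted output.

    Assumption: 0/1 loss per position. The gold labels are used only to
    simulate the environment; the learner itself never sees them.
    """
    per_action = [0.0 if p == g else 1.0 for p, g in zip(predicted, gold)]
    if fine_grained:
        # Fine-grained setting: one loss value per action.
        return per_action
    # Pure bandit setting: a single scalar loss for the whole output.
    return sum(per_action)


def explore(
    greedy_actions: Sequence[str],
    label_set: Sequence[str],
    epsilon: float,
    rng: random.Random,
) -> List[str]:
    """Epsilon-greedy exploration (an illustrative choice, not the paper's).

    Each greedy action is replaced by a uniformly random label with
    probability epsilon, so the learner can gather feedback on outputs
    it would not otherwise have predicted.
    """
    return [
        rng.choice(label_set) if rng.random() < epsilon else action
        for action in greedy_actions
    ]


if __name__ == "__main__":
    labels = ["DET", "NOUN", "VERB"]
    gold = ["DET", "NOUN", "VERB"]
    greedy = ["DET", "VERB", "VERB"]

    rng = random.Random(0)
    output = explore(greedy, labels, epsilon=0.1, rng=rng)
    print("predicted output:    ", output)
    print("pure bandit feedback:", observe_feedback(output, gold, False))
    print("per-action feedback: ", observe_feedback(output, gold, True))
```

In the pure bandit case a single number has to be attributed to every action in the sequence, whereas per-action losses indicate which decisions were wrong. That difference is one intuition for why the abstract ties fine-grained feedback to variance reduction.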

Citation (APA)

Sharaf, A., & Daumé, H. (2017). Structured prediction via learning to search under bandit feedback. In EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the 2nd Workshop on Structured Prediction (pp. 17–26). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-4304
