We present an algorithm for structured prediction under online bandit feedback. The learner repeatedly predicts a sequence of actions, generating a structured output. It then observes feedback for that output and no others. We consider two cases: a pure bandit setting, in which the learner observes only a single loss for the output, and a finer-grained setting, in which it observes a loss for every action. We find that fine-grained feedback is necessary for strong empirical performance because it enables a robust variance-reduction strategy. We empirically compare a number of algorithms and exploration methods and show the efficacy of our algorithm, BLS, on sequence labeling and dependency parsing tasks.
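The difference between the two feedback settings can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `per_action_losses` and `bandit_loss` are hypothetical, and the use of a 0/1 (Hamming) loss on a toy part-of-speech tagging example is an assumption made purely for illustration.

```python
from typing import List, Sequence


def per_action_losses(predicted: Sequence[str], gold: Sequence[str]) -> List[float]:
    """Fine-grained feedback: one 0/1 loss per predicted action.

    Assumption: a Hamming-style loss. The gold labels are known only to the
    environment that produces the feedback, never to the learner.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    return [0.0 if p == g else 1.0 for p, g in zip(predicted, gold)]


def bandit_loss(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Pure bandit feedback: a single scalar loss for the whole output."""
    return sum(per_action_losses(predicted, gold))


# Toy sequence-labeling example (hypothetical data).
predicted = ["DET", "NOUN", "NOUN"]
gold = ["DET", "NOUN", "VERB"]

fine_grained = per_action_losses(predicted, gold)  # shows which action was wrong
scalar = bandit_loss(predicted, gold)              # shows only how many were wrong
```

In the pure bandit case the learner sees only `scalar` and cannot tell which action incurred the loss. In the finer-grained case it sees `fine_grained`, which attributes the loss to individual actions.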
Sharaf, A., & Daumé, H. (2017). Structured prediction via learning to search under bandit feedback. In EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the 2nd Workshop on Structured Prediction (pp. 17–26). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-4304