Online learning with constraints


Abstract

We study online learning where the objective of the decision maker is to maximize her average long-term reward given that some average constraints are satisfied along the sample path. We define the reward-in-hindsight as the highest reward the decision maker could have achieved, while satisfying the constraints, had she known Nature's choices in advance. We show that in general the reward-in-hindsight is not attainable. The convex hull of the reward-in-hindsight function is, however, attainable. For the important case of a single constraint the convex hull turns out to be the highest attainable function. We further provide an explicit strategy that attains this convex hull using a calibrated forecasting rule. © Springer-Verlag Berlin Heidelberg 2006.
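
The reward-in-hindsight defined above can be read as a small optimization problem. The sketch below is one plausible formalization for the single-constraint case; the symbols $A$, $B$, $r$, $c$, $c_0$, $q$, and $p$ are notation assumed here for illustration and do not appear in the abstract.

```latex
% Assumed notation (not from the abstract):
%   A   = the decision maker's finite action set
%   B   = Nature's finite action set
%   r(a,b), c(a,b) = reward and cost when actions a and b are played
%   c_0 = threshold that the average cost must not exceed
%   q   = empirical distribution of Nature's choices along the sample path
%   p   = a mixed action of the decision maker, p \in \Delta(A)
r^*(q) \;=\; \max_{p \in \Delta(A)} \; \sum_{a \in A} \sum_{b \in B} p(a)\, q(b)\, r(a,b)
\quad \text{subject to} \quad
\sum_{a \in A} \sum_{b \in B} p(a)\, q(b)\, c(a,b) \;\le\; c_0 .
```

Under this reading, $r^*(q)$ is the best average reward that a fixed mixed action could earn against $q$ while keeping the average cost at or below $c_0$. The attainable benchmark named in the abstract is then the convex hull of $r^*$ viewed as a function of $q$.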

Citation

Mannor, S., & Tsitsiklis, J. N. (2006). Online learning with constraints. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4005 LNAI, pp. 529–543). Springer Verlag. https://doi.org/10.1007/11776420_39
