We study online learning where the objective of the decision maker is to maximize her average long-term reward given that some average constraints are satisfied along the sample path. We define the reward-in-hindsight as the highest reward the decision maker could have achieved, while satisfying the constraints, had she known Nature's choices in advance. We show that in general the reward-in-hindsight is not attainable. The convex hull of the reward-in-hindsight function is, however, attainable. For the important case of a single constraint the convex hull turns out to be the highest attainable function. We further provide an explicit strategy that attains this convex hull using a calibrated forecasting rule. © Springer-Verlag Berlin Heidelberg 2006.
CITATION STYLE
Mannor, S., & Tsitsiklis, J. N. (2006). Online learning with constraints. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4005 LNAI, pp. 529–543). Springer Verlag. https://doi.org/10.1007/11776420_39
Mendeley helps you to discover research relevant for your work.