Online learning with constraints

Shie Mannor; John N. Tsitsiklis

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 4005 LNAI, pp. 529–543 (2006)

DOI: 10.1007/11776420_39

10 Citations

9 Readers

Abstract

We study online learning where the objective of the decision maker is to maximize her long-term average reward, subject to some average constraints being satisfied along the sample path. We define the reward-in-hindsight as the highest reward the decision maker could have achieved, while satisfying the constraints, had she known Nature's choices in advance. We show that, in general, the reward-in-hindsight is not attainable. The convex hull of the reward-in-hindsight function is, however, attainable. For the important case of a single constraint, the convex hull turns out to be the highest attainable function. We further provide an explicit strategy that attains this convex hull using a calibrated forecasting rule. © Springer-Verlag Berlin Heidelberg 2006.
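As a rough illustration of the reward-in-hindsight benchmark described in the abstract, the sketch below works under several assumptions that the abstract does not state: a finite matrix game, a single average-cost constraint, and a benchmark taken to be the best fixed mixed action against the empirical distribution of Nature's actions. The names `reward_in_hindsight`, `R`, `C`, `q`, and `c0` are hypothetical, and the vertex enumeration is a generic way to solve this small linear program, not the strategy proposed in the paper.

```python
def reward_in_hindsight(R, C, q, c0):
    """Best expected reward of a fixed mixed action against Nature's
    empirical distribution q, subject to expected cost <= c0.

    R[i][j], C[i][j]: reward and cost when the decision maker plays i
    and Nature plays j (illustrative assumption, not from the abstract).
    Returns None when no mixed action satisfies the constraint.
    """
    n, m = len(R), len(q)
    # Expected reward and cost of each pure action against q.
    r = [sum(R[i][j] * q[j] for j in range(m)) for i in range(n)]
    c = [sum(C[i][j] * q[j] for j in range(m)) for i in range(n)]

    best = None
    # Candidate 1: pure actions that already satisfy the constraint.
    for i in range(n):
        if c[i] <= c0 and (best is None or r[i] > best):
            best = r[i]
    # Candidate 2: mix a feasible action i with an infeasible action j
    # so that the cost constraint holds with equality. With a single
    # constraint, an optimum of the linear program lies at such a vertex.
    for i in range(n):
        for j in range(n):
            if c[i] <= c0 < c[j]:
                lam = (c[j] - c0) / (c[j] - c[i])  # weight on action i
                val = lam * r[i] + (1 - lam) * r[j]
                if best is None or val > best:
                    best = val
    return best


# Toy game: action 0 always earns reward 1 but incurs cost 1 when
# Nature plays 0; action 1 earns nothing and costs nothing.
R = [[1, 1], [0, 0]]
C = [[1, 0], [0, 0]]
value = reward_in_hindsight(R, C, q=[0.5, 0.5], c0=0.25)
```

In this toy game, action 0 alone has expected cost 0.5 against `q = [0.5, 0.5]`, so the constraint `c0 = 0.25` forces an even mix of the two actions.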

Cite

APA

Mannor, S., & Tsitsiklis, J. N. (2006). Online learning with constraints. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4005 LNAI, pp. 529–543). Springer Verlag. https://doi.org/10.1007/11776420_39
