Optimal adaptive policies for sequential allocation problems

114Citations
Citations of this article
76Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Consider the problem of sequential sampling from m statistical populations to maximize the expected sum of outcomes in the long run. Under suitable assumptions on the unknown parameters θ= ∈ Θ, it is shown that there exists a class CR of adaptive policies with the following properties: (i) The expected n horizon reward Vπ0n (θ=) under any policy π0 in CR is equal to nμ,*(η=) - M(θ=)log n + o(log n), as n → ∞, where μ*(θ=) is the largest population mean and M(θ=) is a constant. (ii) Policies in CR are asymptotically optimal within a larger class CUF of "uniformly fast convergent" policies in the sense that limn → ∞(nμ,*(θ=) - Vπ0n (θ=))/ (nμ*(θ=) - Vπn(θ=)) ≤ 1, for any π ∈ CUF and any θ= ∈ Θ such that M(θ=) > 0. Policies in CR are specified via easily computable indices, defined as unique solutions to dual problems that arise naturally from the functional form of M(θ=). In addition, the assumptions are verified for populations specified by nonparametric discrete univariate distributions with finite support. In the case of normal populations with unknown means and variances, we leave as an open problem the verification of one assumption. © 1996 Academic Press, Inc.

Cite

CITATION STYLE

APA

Burnetas, A. N., & Katehakis, M. N. (1996). Optimal adaptive policies for sequential allocation problems. Advances in Applied Mathematics, 17(2), 122–142. https://doi.org/10.1006/aama.1996.0007

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free