Optimal adaptive policies for sequential allocation problems

Apostolos N. Burnetas; Michael N. Katehakis

Journal ArticleOPEN ACCESS

Optimal adaptive policies for sequential allocation problems

Advances in Applied Mathematics (1996) 17(2) 122-142

DOI: 10.1006/aama.1996.0007

114Citations

76Readers

Abstract

Consider the problem of sequential sampling from m statistical populations to maximize the expected sum of outcomes in the long run. Under suitable assumptions on the unknown parameters θ= ∈ Θ, it is shown that there exists a class CR of adaptive policies with the following properties: (i) The expected n horizon reward Vπ0n (θ=) under any policy π0 in CR is equal to nμ,*(η=) - M(θ=)log n + o(log n), as n → ∞, where μ*(θ=) is the largest population mean and M(θ=) is a constant. (ii) Policies in CR are asymptotically optimal within a larger class CUF of "uniformly fast convergent" policies in the sense that limn → ∞(nμ,*(θ=) - Vπ0n (θ=))/ (nμ*(θ=) - Vπn(θ=)) ≤ 1, for any π ∈ CUF and any θ= ∈ Θ such that M(θ=) > 0. Policies in CR are specified via easily computable indices, defined as unique solutions to dual problems that arise naturally from the functional form of M(θ=). In addition, the assumptions are verified for populations specified by nonparametric discrete univariate distributions with finite support. In the case of normal populations with unknown means and variances, we leave as an open problem the verification of one assumption. © 1996 Academic Press, Inc.

Cite

CITATION STYLE

APA

Burnetas, A. N., & Katehakis, M. N. (1996). Optimal adaptive policies for sequential allocation problems. Advances in Applied Mathematics, 17(2), 122–142. https://doi.org/10.1006/aama.1996.0007

Optimal adaptive policies for sequential allocation problems

Abstract

Cite

Register to see more suggestions