Scaling model-based average-reward reinforcement learning for product delivery

Abstract

Reinforcement learning in real-world domains suffers from three curses of dimensionality: explosions in state and action spaces, and high stochasticity. We present approaches that mitigate each of these curses. To handle the state-space explosion, we introduce "tabular linear functions" that generalize tile coding and linear value functions. Action-space complexity is reduced by replacing exhaustive search of the joint action space with a form of hill climbing. To deal with high stochasticity, we introduce a new algorithm called ASH-learning, an afterstate version of H-learning. Our extensions make it practical to apply reinforcement learning to a domain of product delivery, an optimization problem that combines inventory control and vehicle routing.
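
The "tabular linear functions" named in the abstract suggest a value function that is a table over some discrete state features, where each table cell holds a linear function over the remaining numeric features. A pure table (no numeric features) and a plain linear value function (a single cell) then fall out as special cases. The sketch below illustrates that idea; the class name, the feature split, and the gradient-step update rule are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of a tabular linear function: a table keyed by discrete
# features whose entries are linear models over numeric features.
from collections import defaultdict
from typing import Dict, Sequence, Tuple

class TabularLinearFunction:
    def __init__(self, num_numeric_features: int, alpha: float = 0.1):
        self.alpha = alpha
        # One weight vector (bias first) per discrete cell, created on demand.
        self.weights: Dict[Tuple, list] = defaultdict(
            lambda: [0.0] * (num_numeric_features + 1))

    def value(self, cell: Tuple, x: Sequence[float]) -> float:
        # Value = bias + dot product of the cell's weights with x.
        w = self.weights[cell]
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

    def update(self, cell: Tuple, x: Sequence[float], target: float) -> None:
        # One gradient step toward the target on this cell's linear model.
        w = self.weights[cell]
        error = target - self.value(cell, x)
        w[0] += self.alpha * error
        for i, xi in enumerate(x):
            w[i + 1] += self.alpha * error * xi

# Hypothetical usage: index by (truck location, inventory level) and be
# linear in normalized demand and stock quantities.
vf = TabularLinearFunction(num_numeric_features=2)
vf.update(cell=("depot", "low"), x=[0.3, 0.7], target=1.5)
print(vf.value(("depot", "low"), [0.3, 0.7]))
```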
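The joint action space in product delivery grows exponentially with the number of trucks, so the abstract replaces exhaustive search with a form of hill climbing. One plausible reading is a coordinate-wise climb: improve one agent's action at a time, holding the others fixed, until no single change helps. The sketch below shows that scheme; `evaluate` is a hypothetical stand-in for the learned value of a joint action, not the paper's evaluation code.

```python
# Coordinate-wise hill climbing over a joint action space, in place of
# exhaustive search over all action combinations.
from typing import Callable, List, Sequence

def hill_climb(actions_per_agent: List[Sequence[str]],
               evaluate: Callable[[List[str]], float]) -> List[str]:
    # Start from an arbitrary joint action: each agent's first choice.
    joint = [choices[0] for choices in actions_per_agent]
    best = evaluate(joint)
    improved = True
    while improved:  # stop at a local optimum of the evaluation
        improved = False
        for i, choices in enumerate(actions_per_agent):
            for a in choices:  # vary only agent i's action
                candidate = joint[:i] + [a] + joint[i + 1:]
                score = evaluate(candidate)
                if score > best:
                    joint, best, improved = candidate, score, True
    return joint

# Toy usage: an evaluation that simply prefers more "deliver" actions.
acts = [["wait", "deliver"]] * 3
print(hill_climb(acts, lambda j: j.count("deliver")))
```

Each pass costs only the sum, rather than the product, of the agents' action-set sizes, which is what makes the joint space tractable.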
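The afterstate idea behind ASH-learning is to value the deterministic result of an action (trucks moved, goods loaded) before stochastic outcomes such as customer demand arrive, so the stochastic branching never enters the maximization. ASH-learning itself is a model-based average-reward method; the sketch below conveys only the afterstate structure using a generic TD-style average-reward update with assumed names, not the paper's exact rule.

```python
# Hedged sketch: greedy selection and learning over afterstates with a
# running average-reward estimate (rho), in the spirit of average-reward
# RL; not the model-based ASH-learning update itself.
from collections import defaultdict

class AfterstateLearner:
    def __init__(self, alpha: float = 0.1, beta: float = 0.01):
        self.h = defaultdict(float)  # afterstate value estimates
        self.rho = 0.0               # running average-reward estimate
        self.alpha, self.beta = alpha, beta

    def choose(self, afterstates):
        # Greedy over afterstates: no expectation over stochastic next
        # states is needed inside the max.
        return max(afterstates, key=lambda u: self.h[u])

    def update(self, u, reward, u_next):
        # TD-style average-reward update between successive afterstates.
        td = reward - self.rho + self.h[u_next] - self.h[u]
        self.h[u] += self.alpha * td
        self.rho += self.beta * td  # drift rho toward the long-run rate

# Hypothetical usage with string-labeled afterstates.
learner = AfterstateLearner()
u = learner.choose(["load@depot", "move@shop1"])
learner.update(u, reward=2.0, u_next="move@shop1")
```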

Citation (APA)

Proper, S., & Tadepalli, P. (2006). Scaling model-based average-reward reinforcement learning for product delivery. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4212 LNAI, pp. 735–742). Springer-Verlag. https://doi.org/10.1007/11871842_74
