Abstract
To succeed when interacting with many different and unknown types of opponents, an agent should excel at quickly learning a model of the opponent and at adapting online to non-stationary (changing) strategies. Recent works have tackled this problem by continuously learning models of the opponent while checking for switches in the opponent's strategy. However, these approaches fail to use a priori information, which can speed up detection of the opponent model. Moreover, if an opponent uses only a finite set of strategies, then maintaining a list of those strategies also benefits future interactions with opponents who return to previous strategies (such as periodic opponents). Our contribution is twofold. First, we propose an algorithm that can use a priori information, in the form of a set of models, to promote faster detection of the opponent model. Second, we propose an algorithm that, while learning new models, keeps a record of them in case the opponent reuses one of them. Our approach outperforms the state-of-the-art algorithms in the field (in terms of model quality and cumulative rewards) in the domain of the iterated prisoner’s dilemma against a non-stationary opponent that switches among different strategies.
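To make the idea of detecting an opponent from a set of a priori models concrete, the following is a minimal illustrative sketch, not the algorithm proposed in the paper. It assumes a simple sliding-window accuracy score; the candidate strategies, the function names (`model_accuracy`, `detect_model`), and the `window` and `threshold` parameters are hypothetical choices made for this example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A move is "C" (cooperate) or "D" (defect).
# A history is a list of (agent_move, opponent_move) pairs, oldest first.
History = List[Tuple[str, str]]
Model = Callable[[History], str]  # predicts the opponent's next move


def tit_for_tat(history: History) -> str:
    # Opponent copies the agent's previous move; cooperates first.
    return "C" if not history else history[-1][0]


def pavlov(history: History) -> str:
    # Win-stay, lose-shift: cooperate iff both players matched last round.
    if not history:
        return "C"
    agent_move, opp_move = history[-1]
    return "C" if agent_move == opp_move else "D"


def always_defect(history: History) -> str:
    return "D"


def model_accuracy(model: Model, history: History, window: int) -> float:
    # Fraction of the last `window` opponent moves the model predicts correctly.
    start = max(0, len(history) - window)
    n = len(history) - start
    if n == 0:
        return 0.0
    hits = sum(model(history[:t]) == history[t][1]
               for t in range(start, len(history)))
    return hits / n


def detect_model(history: History, models: Dict[str, Model],
                 window: int = 5, threshold: float = 0.9) -> Optional[str]:
    # Assumed rule: pick the best-scoring prior model, or None if no model
    # explains the recent window well enough (a new model would be needed).
    scores = {name: model_accuracy(m, history, window)
              for name, m in models.items()}
    best = max(scores, key=lambda name: scores[name])
    return best if scores[best] >= threshold else None


PRIOR_MODELS: Dict[str, Model] = {
    "TFT": tit_for_tat,
    "Pavlov": pavlov,
    "AllD": always_defect,
}

# Opponent plays tit-for-tat for five rounds, then switches to always-defect.
tft_phase: History = [("C", "C"), ("D", "C"), ("D", "D"), ("C", "D"), ("C", "C")]
after_switch: History = tft_phase + [("C", "D")] * 5

print(detect_model(tft_phase, PRIOR_MODELS))     # TFT
print(detect_model(after_switch, PRIOR_MODELS))  # AllD
```

Because the score only looks at a recent window, the same check that identifies the initial model also picks up a switch once enough post-switch rounds have been observed. Returning `None` marks the case where no stored model fits, which is where learning and recording a new model would come in.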
Cite
Hernandez-Leal, P., De Cote, E. M., & Sucar, L. E. (2014). Using a priori information for fast learning against non-stationary opponents. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8864, 536–547. https://doi.org/10.1007/978-3-319-12027-0_43