Preference-Based Monte Carlo Tree Search

Tobias Joppen; Christian Wirth; Johannes Fürnkranz

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018), Vol. 11117 LNAI, pp. 327–340

DOI: 10.1007/978-3-030-00111-7_28

Abstract

Monte Carlo tree search (MCTS) is a popular choice for solving sequential anytime problems. However, it depends on a numeric feedback signal, which can be difficult to define. Real-time MCTS is a variant that may only rarely encounter states with an explicit, extrinsic reward. To deal with such cases, the experimenter has to supply an additional numeric feedback signal in the form of a heuristic, which intrinsically guides the agent. Recent work has shown evidence that in several domains the underlying structure is ordinal rather than numerical. Hence, erroneous and biased heuristics are inevitable, especially in such domains. In this paper, we propose an MCTS variant that depends only on qualitative feedback and therefore opens up new applications for MCTS. We also find indications that translating absolute into ordinal feedback may be beneficial. Using a puzzle domain, we show that our preference-based MCTS variant, which receives only qualitative feedback, is able to reach a performance level comparable to a regular MCTS baseline that obtains quantitative feedback.
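
The abstract mentions two ideas that a small example can make concrete: translating absolute (numeric) feedback into ordinal feedback, and choosing among a node's children using only pairwise comparisons. The sketch below is a hypothetical illustration, not the authors' PB-MCTS algorithm, whose selection rule the abstract does not specify. The names `to_preferences` and `PreferenceNode` are invented here. The optimistic bound follows the relative upper confidence bound used in dueling-bandit methods such as RUCB (with an unseen pair treated as 1.0). The selection rule, which picks the child that optimistically beats the most siblings (a Copeland-style count), is an assumption chosen for simplicity.

```python
import math
from itertools import combinations


def to_preferences(scores):
    """Translate absolute scores into ordinal (winner, loser) pairs.

    Ties yield no preference. Hypothetical helper for illustration only.
    """
    prefs = []
    for a, b in combinations(range(len(scores)), 2):
        if scores[a] > scores[b]:
            prefs.append((a, b))
        elif scores[b] > scores[a]:
            prefs.append((b, a))
    return prefs


class PreferenceNode:
    """Hypothetical tree node that stores only pairwise win counts among k children."""

    def __init__(self, k):
        self.k = k
        # wins[i][j] counts how often child i was preferred over child j.
        self.wins = [[0] * k for _ in range(k)]

    def update(self, winner, loser):
        self.wins[winner][loser] += 1

    def upper_bound(self, i, j, t, c=1.0):
        """Optimistic estimate of P(child i preferred over child j), RUCB-style."""
        n = self.wins[i][j] + self.wins[j][i]
        if n == 0:
            return 1.0  # no comparisons yet: be fully optimistic
        return self.wins[i][j] / n + math.sqrt(c * math.log(max(t, 1)) / n)

    def select(self, t, c=1.0):
        """Pick the child that optimistically beats the most siblings (assumed rule)."""
        def copeland(i):
            return sum(
                1 for j in range(self.k)
                if j != i and self.upper_bound(i, j, t, c) >= 0.5
            )
        return max(range(self.k), key=copeland)
```

For example, the scores `[3.0, 7.5, 7.5, 1.0]` become the preferences `[(1, 0), (2, 0), (0, 3), (1, 3), (2, 3)]`; the tied children 1 and 2 produce no pair. Only the ordering of the scores survives this translation, which matches the abstract's point that an agent can be guided by ordinal feedback alone. The parameter `c` trades exploration against exploitation as in UCT. With `c=0.0`, the node above selects child 1, the first of the two children that beat or were never compared with every sibling.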

Citation (APA)

Joppen, T., Wirth, C., & Fürnkranz, J. (2018). Preference-based Monte Carlo tree search. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11117 LNAI, pp. 327–340). Springer Verlag. https://doi.org/10.1007/978-3-030-00111-7_28