We modify the Q-MDP value method and observe the behavior of a robot that uses the modified method in an environment where the robot's state information is essentially indefinite. In the Q-MDP value method, the action at each time step is chosen by computing expectation values over a probability distribution, which is the output of a probabilistic state estimator. The modified method applies a weighting function to this distribution in the calculation so as to give precedence to states near the goal of the task. We applied our method to a simple robot navigation problem in an incomplete sensor environment. As a result, the method makes the robot take a kind of searching behavior without any explicit implementation of search.
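The idea in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Q-table, belief, and the particular weighting values are hypothetical, and the paper's actual weighting function is not specified here.

```python
import numpy as np

def qmdp_action(belief, q_table):
    """Standard Q-MDP: pick the action maximizing the belief-weighted Q-value."""
    # belief: shape (n_states,); q_table: shape (n_states, n_actions)
    return int(np.argmax(belief @ q_table))

def weighted_qmdp_action(belief, q_table, weights):
    """Modified Q-MDP (as described in the abstract): weight the belief
    so that states near the goal take precedence, then renormalize."""
    w = belief * weights
    w = w / w.sum()
    return int(np.argmax(w @ q_table))

# Toy example: 3 states, 2 actions; state 2 is closest to the goal.
q = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 2.0]])
b = np.array([0.5, 0.3, 0.2])        # uncertain state estimate
w_goal = np.array([0.2, 0.5, 1.0])   # hypothetical precedence for goal-near states

a_plain = qmdp_action(b, q)              # -> 0
a_weighted = weighted_qmdp_action(b, q, w_goal)  # -> 1
```

With the plain expectation the most probable state dominates; the weighted belief instead favors the action that pays off near the goal, which is the mechanism the abstract credits for the emergent searching behavior.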
CITATION STYLE
Ueda, R. (2016). Generation of search behavior by a modification of Q-MDP value method. In Advances in Intelligent Systems and Computing (Vol. 302, pp. 3–15). Springer Verlag. https://doi.org/10.1007/978-3-319-08338-4_1