Learning to deal with objects
2009 IEEE 8TH INTERNATIONAL CONFERENCE ON DEVELOPMENT AND LEARNING

María Malfaz and Miguel A. Salichs

Abstract: This paper presents Object Q-learning (OQ-learning), a modification of the standard Q-learning algorithm. An autonomous agent should be able to decide its own goals and the behaviours needed to fulfil them. When the agent has no previous knowledge, it must learn what to do in every state (its behaviour policy). If the agent uses Q-learning, this means learning the utility value Q of each state-action pair. Typically, an autonomous agent living in a complex environment has to interact with different objects present in that world. In this case, the number of states of the agent in relation to those objects may increase as the number of objects increases, making the learning process difficult to manage. The proposed modification is intended to cope with this problem. The experimental results show the usefulness of OQ-learning in this situation, in comparison with the standard Q-learning algorithm.

Index Terms: Q-learning, objects, decision making, autonomous agents.

I. INTRODUCTION

An autonomous agent is a natural or artificial system in constant interaction with dynamic environments that must satisfy a set of possible goals in order to survive [1]. Moreover, according to Bellman [2], autonomy implies decision making, which in turn implies some knowledge about the current state of the agent and its environment, including its goals. This means that the agent must have enough knowledge of itself to reason about how to move and act in its environment, using all its properties and skills. In addition, some authors state that an autonomous agent has goals and motivations, and some way to evaluate its behaviours in terms of the environment and its own motivations. Its motivations are desires or preferences that can lead to the generation and adoption of objectives.
The final goals of the agent, or its motivations, must be oriented towards maintaining the internal equilibrium of the agent [1][3].

Learning has been described as one of the distinctive marks of intelligence, and introducing adaptation and learning skills in artificial systems is one of the greatest challenges of artificial intelligence [4]. Gadanho [3] states that learning is an important skill for an autonomous agent, since it gives the agent the plasticity needed to be independent.

An autonomous agent must know what action to execute in every situation in order to fulfil its goals. If the agent does not have this knowledge, it must learn this relation between situations and actions. The agent learns this relation by interacting with its environment, where several objects can exist. As will be shown later, learning to deal with those objects can become quite tedious. In this paper, a solution to this disadvantage is proposed.

M. Malfaz and M.A. Salichs are with the RoboticsLab at the Carlos III University of Madrid, 28911, Leganés, Madrid, Spain. E-mail: mmalfaz@ing.uc3m.es and salichs@ing.uc3m.es

The rest of the paper is organized as follows. In section II, a brief introduction to reinforcement learning is given. Next, in section III, one of the most commonly used reinforcement learning algorithms is presented: Q-learning. In section IV, the state is introduced as a combination of the inner and the external state of the agent and, in section V, a reduced version of the state is presented. In that section, the Q-learning algorithm is adapted to this approach and, as will be shown, the adaptation implies some shortcomings that must be solved. The solution to this problem is proposed in section VI by considering an algorithm based on Q-learning: Object Q-learning (OQ-learning).
Next, in section VII, the experimental platform is described and, in section VIII, the results obtained using both algorithms in the same environment are presented. Finally, the main conclusions of this paper and future applications are summarized in section IX.

June 4, 2009

II. REINFORCEMENT LEARNING

In a decision-making process, the agent, in a certain state s, executes an action a that leads it to a new state s' and generates a reinforcement r. From that new state the agent executes another action, and so on. The value is defined as the discounted sum of all the expected reinforcements:

$$\text{value} = r_1 + \gamma \cdot r_2 + \gamma^2 \cdot r_3 + \gamma^3 \cdot r_4 + \dots \qquad (1)$$

The parameter γ, which lies between 0 and 1, is known as the discount factor and defines how much expected future rewards affect a decision now. The goal of reinforcement learning is to maximize the total expected reward [5].

The agent that uses reinforcement learning tries to learn, through interaction with the environment, how to behave in order to fulfil a certain objective. The agent and the environment interact continuously: the agent selects actions, and the environment responds to those actions and presents new situations to the agent. The environment and the agent itself also give rise to rewards that the agent tries to maximize over time. This type of learning allows the agent to adapt to the environment through the development of a policy. This policy determines the most suitable action in each state in order to maximize the reinforcement. The goal of the algorithm is to maximize the total amount of reward it receives over the long run [6].

Reinforcement learning has been successfully implemented in several virtual agents and robots [7], [8], [9], [10], [11],

978-1-4244-4118-1/09/$25.00 © 2009 IEEE
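As a rough illustration of the discounted sum in Eq. (1), the following minimal sketch computes the value of a finite sequence of reinforcements. It rests on assumptions that are not in the paper: the sequence is truncated to a finite list (Eq. (1) sums over all expected future reinforcements), and the function name `discounted_value` and the sample rewards are hypothetical.

```python
def discounted_value(rewards, gamma):
    """Return r1 + gamma*r2 + gamma^2*r3 + ... for a finite reward list.

    Assumption: the reward sequence is finite; Eq. (1) is an open-ended sum.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie between 0 and 1")
    value = 0.0
    # Work backwards through the rewards: value = r + gamma * value.
    # This yields the same sum as Eq. (1) without computing explicit powers.
    for r in reversed(rewards):
        value = r + gamma * value
    return value


# Hypothetical reinforcements r1..r4, chosen only for illustration.
rewards = [1.0, 0.0, 2.0, 4.0]

print(discounted_value(rewards, 0.5))  # 1 + 0.5*0 + 0.25*2 + 0.125*4 = 2.0
print(discounted_value(rewards, 0.0))  # only the immediate reward counts: 1.0
print(discounted_value(rewards, 1.0))  # future rewards count fully: 7.0
```

With a discount factor of 0 the agent values only the immediate reinforcement, while a discount factor of 1 weights all future reinforcements equally. This is the sense in which the discount factor defines how much expected future rewards affect a decision now.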