During communication, humans express their emotional states through various modalities (e.g., facial expressions and gestures) and estimate the emotional states of others by attending to multimodal signals. For a communication robot with limited resources to attend to such signals, the main challenge is selecting the most effective modalities among those expressed. In this study, we propose an active perception method that selects the most informative modalities using a criterion based on energy minimization. An energy-based model learns the probability of a network state through energy values, where a lower energy value corresponds to a higher probability of that state. A multimodal deep belief network, which is an energy-based model, was employed to represent the relationships between emotional states and multimodal sensory signals. Compared with other active perception methods, the proposed approach demonstrated improved accuracy using limited information in several contexts associated with affective human–robot interaction. We present the differences and advantages of our method relative to other methods through mathematical formulations that use, for example, information gain as a criterion. Further, we evaluate the performance of our method in relation to active inference, which is based on the free energy principle. Consequently, we establish that our method demonstrates superior performance in tasks involving mutually correlated multimodal information.
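The relation between energy and probability described above can be illustrated with a minimal sketch. It is not the authors' multimodal deep belief network: the function names `boltzmann_probs` and `select_modality`, the modality names, and the energy values are all hypothetical, and the selection rule (pick the modality with the lowest expected energy) is a simplified stand-in for the paper's criterion.

```python
import numpy as np

def boltzmann_probs(energies):
    """Map energies to probabilities with p_i proportional to exp(-E_i)."""
    e = np.asarray(energies, dtype=float)
    # Subtracting the minimum energy leaves the normalized result unchanged
    # and avoids overflow in exp for large-magnitude energies.
    weights = np.exp(-(e - e.min()))
    return weights / weights.sum()

def select_modality(expected_energy):
    """Return the name of the modality with the lowest expected energy."""
    return min(expected_energy, key=expected_energy.get)

# Hypothetical expected energies per candidate modality (illustrative only).
expected_energy = {"face": 1.2, "gesture": 0.4, "voice": 2.0}

probs = boltzmann_probs(list(expected_energy.values()))
chosen = select_modality(expected_energy)

for name, p in zip(expected_energy, probs):
    print(f"{name}: energy={expected_energy[name]:.1f}, prob={p:.3f}")
print("selected modality:", chosen)
```

In this toy setting the lowest-energy entry also receives the highest probability, which is the property the energy-minimization criterion relies on.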
CITATION STYLE
Horii, T., & Nagai, Y. (2021). Active Inference Through Energy Minimization in Multimodal Affective Human–Robot Interaction. Frontiers in Robotics and AI, 8. https://doi.org/10.3389/frobt.2021.684401