Human-object interaction (HOI) detection is important for understanding human-centric scenes, and it is challenging because of subtle differences between fine-grained actions and multiple co-occurring interactions. Most approaches tackle these problems by combining multi-stream information and even introducing extra knowledge, yet they suffer from a huge combination space and from domination by non-interactive pairs. In this paper, we propose an Action-Guided attention mining and Relation Reasoning (AGRR) network to address both problems. Relation reasoning on human-object pairs exploits contextual compatibility consistency among pairs to filter out non-interactive combinations. To better discriminate subtle differences between fine-grained actions, an action-aware attention mechanism based on class activation maps is proposed to mine the features most relevant for recognizing HOIs. Extensive experiments on the V-COCO and HICO-DET datasets demonstrate the effectiveness of the proposed model compared with state-of-the-art approaches.
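To make the idea of attention derived from a class activation map more concrete, the following is a minimal sketch of the generic CAM formulation, in which the map for a class is the classifier-weighted sum of the feature channels. It is an illustration under stated assumptions and not the AGRR implementation: the function names, the tensor shapes, and the choice of a spatial softmax to turn the map into attention weights are all hypothetical.

```python
import numpy as np


def class_activation_map(features, fc_weights, class_idx):
    """Generic CAM: weight each feature channel by the classifier weight
    of the chosen action class and sum over channels.

    features:   (K, H, W) feature maps (assumed shape)
    fc_weights: (C, K) weights of a linear action classifier (assumed)
    returns:    (H, W) activation map for class_idx
    """
    return np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))


def cam_attention_pool(features, fc_weights, class_idx):
    """Turn the CAM into spatial attention and pool the features with it.

    The spatial softmax is an assumption made for this sketch; the paper
    may normalize or apply the map differently.
    """
    cam = class_activation_map(features, fc_weights, class_idx)
    flat = cam.ravel()
    flat = flat - flat.max()            # subtract max for numerical stability
    attn = np.exp(flat) / np.exp(flat).sum()
    attn = attn.reshape(cam.shape)      # (H, W), sums to 1
    pooled = (features * attn[None, :, :]).sum(axis=(1, 2))  # (K,)
    return attn, pooled


# Toy usage with random data: 4 channels, a 3x3 grid, 2 action classes.
rng = np.random.default_rng(0)
demo_features = rng.normal(size=(4, 3, 3))
demo_weights = rng.normal(size=(2, 4))
demo_attn, demo_pooled = cam_attention_pool(demo_features, demo_weights, class_idx=1)
```

Here the attention weights concentrate on the spatial locations that most support the selected action class, so the pooled vector emphasizes features relevant to that action.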
CITATION STYLE
Lin, X., Zou, Q., & Xu, X. (2020). Action-guided attention mining and relation reasoning network for human-object interaction detection. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2021-January, pp. 1104–1110). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2020/154