Compositional learning for human object interaction

19 citations · 191 Mendeley readers

This article is free to access.

Abstract

The world of human-object interactions is rich. While we generally sit on chairs and sofas, if need be we can even sit on TVs or on top of shelves. In recent years, there has been progress in modeling actions and human-object interactions. However, most of these approaches require large amounts of data, and it is not clear whether the learned representations of actions generalize to new categories. In this paper, we explore the problem of zero-shot learning of human-object interactions. Given limited verb-noun interactions in training data, we want to learn a model that works even on unseen combinations. To address this problem, we propose a novel method that uses an external knowledge graph and graph convolutional networks to learn how to compose classifiers for verb-noun pairs. We also provide benchmarks on several datasets for zero-shot learning, covering both images and videos. We hope our method, datasets and baselines will facilitate future research in this direction.
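The core idea sketched in the abstract can be illustrated in a few lines: propagate verb and noun embeddings over an external knowledge graph with a graph convolution, then fuse the resulting node features into a classifier for a (verb, noun) pair. The graph, dimensions, and concatenation-based fusion below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy knowledge graph over 4 nodes (e.g. 2 verbs, 2 nouns),
# adjacency with self-loops (assumed structure, for illustration).
A = np.array([[1, 0, 1, 1],
              [0, 1, 1, 0],
              [1, 1, 1, 0],
              [1, 0, 0, 1]], dtype=float)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_hat = D_inv_sqrt @ A @ D_inv_sqrt        # symmetric normalization

X = rng.normal(size=(4, 8))                # initial node embeddings
W = rng.normal(size=(8, 8))                # GCN layer weights

H = np.maximum(A_hat @ X @ W, 0)           # one GCN layer with ReLU

# Compose a classifier for the pair (verb node 0, noun node 2) by
# concatenating their propagated features (an assumed fusion scheme).
w_pair = np.concatenate([H[0], H[2]])      # linear classifier weights

feat = rng.normal(size=16)                 # a visual feature vector
score = float(w_pair @ feat)               # interaction score
```

Because the classifier weights are composed from graph-propagated embeddings rather than learned per category, a (verb, noun) pair never seen in training can still be scored at test time, which is the point of the zero-shot setting.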

Citation (APA)

Kato, K., Li, Y., & Gupta, A. (2018). Compositional learning for human object interaction. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11218 LNCS, pp. 247–264). Springer Verlag. https://doi.org/10.1007/978-3-030-01264-9_15
