Adding object detection skills to visual dialogue agents

Gabriele Bani; Davide Belli; Gautier Dagan; Alexander Geenen; Andrii Skliar; Aashish Venkatesh; Tim Baumgärtner; Elia Bruni; Raquel Fernández

Conference Proceedings (Open Access)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019), vol. 11132 LNCS, pp. 180–187

DOI: 10.1007/978-3-030-11018-5_17


Abstract

Our goal is to equip a dialogue agent that asks questions about a visual scene with object detection skills. We take the first steps in this direction within the GuessWhat?! game. We use Mask R-CNN object features as a replacement for ground-truth annotations in the Guesser module, achieving an accuracy of 57.92%. This shows that our system is a viable alternative to the original Guesser, which achieves an accuracy of 62.77% using ground-truth annotations and should therefore be considered an upper bound for our automated system. Crucially, we show that our system exploits the Mask R-CNN object features, in contrast to the original Guesser augmented with global VGG features. Furthermore, by automating object detection in GuessWhat?!, we open up a range of opportunities, such as playing the game with new, non-annotated images and using the more granular visual features to condition the other modules of the game architecture.
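The abstract does not spell out the Guesser's mechanics, so the following is only a minimal sketch, not the authors' implementation, of how a GuessWhat?! Guesser might pick a target when its candidate objects come from a detector instead of ground-truth annotations. It assumes the dot-product-plus-softmax scoring of the original Guesser as we understand that design (a dialogue encoding is compared with one embedding per candidate object). Everything else is an illustrative assumption: the helper names `spatial_features`, `guess`, `iou` and `is_correct`, the 8-d spatial encoding, the toy 2-d "detector features", the omission of any learned MLP, and the rule that a guess counts as correct when the chosen detection overlaps the annotated target with IoU ≥ 0.5.

```python
import math
from typing import List, Sequence, Tuple

# (x_min, y_min, x_max, y_max) in pixels, as an object detector would return
BBox = Tuple[float, float, float, float]


def spatial_features(box: BBox, img_w: float, img_h: float) -> List[float]:
    """Assumed 8-d spatial encoding: corners and centre rescaled to [-1, 1],
    width and height as fractions of the image size."""
    x0, y0, x1, y1 = box

    def nx(x: float) -> float:
        return 2.0 * x / img_w - 1.0

    def ny(y: float) -> float:
        return 2.0 * y / img_h - 1.0

    return [nx(x0), ny(y0), nx(x1), ny(y1),
            nx((x0 + x1) / 2), ny((y0 + y1) / 2),
            (x1 - x0) / img_w, (y1 - y0) / img_h]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over candidate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def guess(dialogue_state: Sequence[float],
          object_embeddings: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Score each candidate by a dot product with the dialogue encoding,
    normalise with softmax, and return the index of the most probable object."""
    scores = [sum(d * o for d, o in zip(dialogue_state, emb))
              for emb in object_embeddings]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


def iou(a: BBox, b: BBox) -> float:
    """Intersection over union of two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def is_correct(guessed: BBox, target: BBox, threshold: float = 0.5) -> bool:
    """Assumed success criterion: the guessed detection overlaps the
    annotated target with IoU >= threshold."""
    return iou(guessed, target) >= threshold


# Toy game: two detections, each with a hypothetical 2-d detector feature.
img_w, img_h = 640.0, 480.0
detections = [
    ((10.0, 10.0, 110.0, 110.0), [0.9, 0.1]),
    ((300.0, 200.0, 400.0, 300.0), [0.1, 0.8]),
]
# Object embedding = detector feature ++ spatial features (no learned MLP here).
embeddings = [feat + spatial_features(box, img_w, img_h) for box, feat in detections]
dialogue_state = [0.0, 1.0] + [0.0] * 8  # hypothetical 10-d dialogue encoding
idx, probs = guess(dialogue_state, embeddings)
target_box = (305.0, 195.0, 405.0, 305.0)  # annotated target, used only for scoring
print(idx, is_correct(detections[idx][0], target_box))
```

Running it prints `1 True`: the hypothetical dialogue encoding aligns with the second detection's feature, and that detection overlaps the annotated target well above the assumed threshold. Some overlap-based matching of this kind is needed whenever candidates are detected rather than annotated, because detected boxes rarely coincide exactly with ground-truth ones; the abstract does not state which criterion the authors used.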


Citation (APA style)

Bani, G., Belli, D., Dagan, G., Geenen, A., Skliar, A., Venkatesh, A., … Fernández, R. (2019). Adding object detection skills to visual dialogue agents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11132 LNCS, pp. 180–187). Springer Verlag. https://doi.org/10.1007/978-3-030-11018-5_17
