Scenario referring expression comprehension via attributes of vision and language

Shaonan Wei; Jianming Wang; Yukuan Sun; Guanghao Jin; Jiayu Liang; Kunliang Liu

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019) 11859 LNCS 430-441

DOI: 10.1007/978-3-030-31726-3_37


Abstract

Referring Expression Comprehension (REC) is the task of locating particular objects within an image from natural language expressions. Previous studies on this task have assumed a one-to-one correspondence between the language expression and the image: the region the expression refers to must exist in the current image, so the region with the highest score is returned whether or not it actually matches. In practical applications, however, REC must locate the target region across sequences of scene images that are matched, semi-matched, or mismatched with the expression. This paper refers to this 3D version of the challenge as Scenario Referring Expression Comprehension (SREC). To accomplish such a task, we built a test set by enhancing an existing real-scenario dataset, constructed, for the first time, a Dual Attributes Recursive Retrieve Reasoning Model (DA3R) using the attributes of both images and expressions, and finally verified the feasibility of the method on the test set by assessing it with three different types of enhanced expressions.
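The contrast the abstract draws, between always returning the highest-scoring region and searching a sequence of images that may not contain the target at all, can be illustrated with a minimal sketch. This is not the DA3R model: the per-region scores, the `threshold` parameter, and both function names are hypothetical and assumed here only for illustration.

```python
from typing import Optional, Sequence, Tuple


def rec_argmax(region_scores: Sequence[float]) -> int:
    """One-to-one assumption: return the best region in a single image,
    even if none of the regions actually matches the expression."""
    return max(range(len(region_scores)), key=lambda i: region_scores[i])


def srec_locate(
    sequence_scores: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Optional[Tuple[int, int]]:
    """Scenario setting: search every image in the sequence and return
    (image_index, region_index) of the best region whose score is strictly
    above the threshold, or None when no image contains a match."""
    best: Optional[Tuple[int, int]] = None
    best_score = threshold
    for image_index, region_scores in enumerate(sequence_scores):
        for region_index, score in enumerate(region_scores):
            if score > best_score:
                best, best_score = (image_index, region_index), score
    return best


# Hypothetical match scores for three images in one scene sequence.
scores = [[0.2, 0.3], [0.1, 0.9], [0.4]]
first_image_choice = rec_argmax(scores[0])   # picks a region regardless of fit
sequence_choice = srec_locate(scores)        # picks the best region overall
no_match_choice = srec_locate([[0.2, 0.3]])  # rejects a mismatched sequence
```

In this sketch the single-image selector still returns a region for the first image, although every score there is low, while the sequence-level search either finds the well-matched region in the second image or reports that nothing matches.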


Cite


Wei, S., Wang, J., Sun, Y., Jin, G., Liang, J., & Liu, K. (2019). Scenario referring expression comprehension via attributes of vision and language. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11859 LNCS, pp. 430–441). Springer. https://doi.org/10.1007/978-3-030-31726-3_37
