Text-Guided Graph Neural Networks for Referring 3D Instance Segmentation

Abstract

This paper addresses a new task called referring 3D instance segmentation, which aims to segment the target instance in a 3D scene given a query sentence. Previous work on scene understanding has explored visual grounding with natural language guidance, yet the emphasis has mostly been confined to images and videos. We propose a Text-guided Graph Neural Network (TGNN) for referring 3D instance segmentation on point clouds. Given a query sentence and the point cloud of a 3D scene, our method learns to extract per-point features and predicts an offset to shift each point toward its object center. Based on the point features and the offsets, we cluster the points to produce fused features and coordinates for the candidate objects. The resulting clusters are modeled as nodes in a Graph Neural Network to learn representations that encompass the relation structure for each candidate object. The GNN layers leverage each object's features and its relations with neighbors to generate an attention heatmap for the input sentence expression. Finally, the attention heatmap is used to “guide” the aggregation of information from neighborhood nodes. Our method achieves state-of-the-art performance on referring 3D instance segmentation and 3D localization on the ScanRefer, Nr3D, and Sr3D benchmarks.
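
The pipeline outlined in the abstract (shift points by predicted offsets, cluster them into candidate objects, then let a sentence-conditioned attention heatmap guide message passing between the candidates) can be illustrated with a small NumPy sketch. This is a toy sketch under stated assumptions, not the authors' implementation: the greedy radius clustering, the mean pooling, the dot-product attention over words, and the names `cluster_points`, `pool_clusters`, `text_guided_layer`, and `w_rel` are all hypothetical stand-ins chosen for illustration, and the per-point features, offsets, and word features are assumed to be given rather than learned.

```python
import numpy as np

def cluster_points(shifted, radius):
    """Greedy radius clustering of offset-shifted points (assumed stand-in)."""
    labels = -np.ones(len(shifted), dtype=int)
    next_label = 0
    for i in range(len(shifted)):
        if labels[i] != -1:
            continue  # point already belongs to a cluster
        dist = np.linalg.norm(shifted - shifted[i], axis=1)
        labels[(dist < radius) & (labels == -1)] = next_label
        next_label += 1
    return labels

def pool_clusters(xyz, feats, labels):
    """Mean-pool coordinates and features per cluster (assumed fusion rule)."""
    ids = range(labels.max() + 1)
    centers = np.stack([xyz[labels == c].mean(axis=0) for c in ids])
    node_feats = np.stack([feats[labels == c].mean(axis=0) for c in ids])
    return centers, node_feats

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def text_guided_layer(node_feats, centers, word_feats, w_rel):
    """One toy message-passing step guided by attention over the query words."""
    k, d = node_feats.shape
    # rel[i, j] is the position of candidate j relative to candidate i.
    rel = centers[None, :, :] - centers[:, None, :]
    neigh = np.broadcast_to(node_feats[None, :, :], (k, k, d))
    # Relation feature for each (i, j) pair: neighbor feature plus offset.
    msg = np.concatenate([neigh, rel], axis=-1) @ w_rel      # (k, k, d)
    # Attention heatmap over the words of the sentence for each pair.
    heat = softmax(msg @ word_feats.T, axis=-1)               # (k, k, T)
    ctx = heat @ word_feats                                   # (k, k, d)
    guided = msg * ctx                                        # text-gated message
    mask = 1.0 - np.eye(k)                                    # drop self-messages
    agg = (guided * mask[..., None]).sum(axis=1) / max(k - 1, 1)
    return node_feats + agg, heat

# Tiny example: two objects, two points each, offsets pointing at the centers.
rng = np.random.default_rng(0)
xyz = np.array([[0.5, 0, 0], [-0.5, 0, 0], [5.5, 5, 5], [4.5, 5, 5]])
offsets = np.array([[-0.5, 0, 0], [0.5, 0, 0], [-0.5, 0, 0], [0.5, 0, 0]])
feats = rng.normal(size=(4, 8))        # placeholder per-point features
words = rng.normal(size=(5, 8))        # placeholder word features
w_rel = rng.normal(size=(11, 8)) * 0.1 # placeholder projection (d + 3 -> d)

labels = cluster_points(xyz + offsets, radius=0.1)
centers, nodes = pool_clusters(xyz, feats, labels)
updated, heat = text_guided_layer(nodes, centers, words, w_rel)
```

In this sketch each candidate object receives messages from every other candidate, and the weight each message carries depends on how strongly it matches the words of the query, which is one simple way to read the "guided" aggregation the abstract describes.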

Cite

Huang, P. H., Lee, H. H., Chen, H. T., & Liu, T. L. (2021). Text-Guided Graph Neural Networks for Referring 3D Instance Segmentation. In 35th AAAI Conference on Artificial Intelligence, AAAI 2021 (Vol. 2B, pp. 1610–1618). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v35i2.16253
