Semantic scene completion (SSC) requires estimating the 3D geometric occupancy of the objects in a scene, along with their categories. Many current methods use RGB-D images to capture the geometric and semantic information of objects, fusing the RGB and depth data with simple but popular spatial- and channel-wise operations. However, they ignore the large discrepancy between RGB and depth data and the measurement uncertainty of depth data. To solve this problem, we propose the Frequency Fusion Network (FFNet), a novel method for boosting semantic scene completion by better utilizing RGB-D data. Unlike approaches that rely on features extracted directly by convolution, FFNet explicitly correlates the RGB and depth data in the frequency domain. The network then uses the correlated information to guide feature learning from the RGB and depth images, respectively. Moreover, FFNet accounts for the properties of the different frequency components of RGB-D features: it uses a learnable elliptical mask to decompose the features learned from the RGB and depth images, attending to different frequencies to facilitate the correlation of RGB-D data. We evaluate FFNet extensively on the public SSC benchmarks, where it surpasses the state-of-the-art methods. The code package of FFNet is available at https://github.com/alanWXZ/FFNet.
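To make the two frequency-domain ideas above concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes single-channel 2D feature maps and a fixed (not learned) elliptical mask whose semi-axes `a` and `b` are illustrative parameters. The function names `elliptical_mask`, `split_frequencies`, and `spectral_correlation` are hypothetical, and the normalised cross-power spectrum is a standard stand-in for the correlation step, whose exact form the abstract does not specify.

```python
import numpy as np


def elliptical_mask(h, w, a, b):
    """Binary mask that is 1 inside an ellipse centred on the zero frequency.

    a, b are the semi-axes along the vertical and horizontal frequency axes.
    In FFNet the mask is learnable; here it is fixed (an assumption).
    """
    ys = np.arange(h) - h // 2
    xs = np.arange(w) - w // 2
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return ((yy / a) ** 2 + (xx / b) ** 2 <= 1.0).astype(np.float64)


def split_frequencies(feat, a, b):
    """Decompose a 2D feature map into low- and high-frequency parts."""
    # Shift the spectrum so the zero frequency sits at the array centre.
    spec = np.fft.fftshift(np.fft.fft2(feat))
    mask = elliptical_mask(feat.shape[0], feat.shape[1], a, b)
    low = np.fft.ifft2(np.fft.ifftshift(spec * mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spec * (1.0 - mask))).real
    return low, high


def spectral_correlation(f_rgb, f_depth, eps=1e-8):
    """Normalised cross-power spectrum of two feature maps (a stand-in)."""
    cross = np.fft.fft2(f_rgb) * np.conj(np.fft.fft2(f_depth))
    return cross / (np.abs(cross) + eps)


# Usage on random stand-in features (hypothetical data, not real RGB-D input).
rng = np.random.default_rng(0)
rgb_feat = rng.normal(size=(8, 8))
depth_feat = rng.normal(size=(8, 8))
rgb_low, rgb_high = split_frequencies(rgb_feat, a=2.0, b=3.0)
corr = spectral_correlation(rgb_feat, depth_feat)
```

Because the two masks sum to one everywhere, the low- and high-frequency parts add back up to the original feature map. A learnable variant would replace the hard threshold with a differentiable function of `a` and `b` so that gradients can reach the mask parameters.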
CITATION STYLE
Wang, X., Lin, D., & Wan, L. (2022). FFNet: Frequency Fusion Network for Semantic Scene Completion. In Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022 (Vol. 36, pp. 2550–2557). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v36i3.20156