Deep modular network architecture for depth estimation from single indoor images


This article is free to access.

Abstract

We propose a novel deep modular network architecture for indoor scene depth estimation from single RGB images. The proposed architecture consists of a main depth estimation network and two auxiliary semantic segmentation networks. Our insight is that the semantic and geometric structures of a scene are strongly correlated; we therefore utilize global (i.e., room layout) and mid-level (i.e., objects in a room) semantic structures to enhance depth estimation. The first auxiliary network, the layout network, estimates the room layout to infer the positions of the walls, floor, and ceiling. The second auxiliary network, the object network, estimates per-pixel class labels of the objects in a scene, such as furniture, to provide mid-level semantic cues. The estimated semantic structures are fed into the depth estimation network through newly proposed discriminator networks, which assess the reliability of the estimated structures. Evaluation results show that our architecture achieves significant performance improvements over previous approaches on the standard NYU Depth v2 indoor scene dataset.
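The abstract describes a fusion scheme in which auxiliary semantic features are blended into the depth features, weighted by reliability scores from the discriminator networks. The sketch below illustrates that gating idea in minimal form; the function and variable names are illustrative assumptions, not taken from the paper, and real features would be multi-channel tensors rather than flat lists.

```python
# Hypothetical sketch of confidence-gated feature fusion: each auxiliary
# branch's features are scaled by a reliability score in [0, 1] (as a
# discriminator network might produce) before being added to the main
# depth features. Names are illustrative, not from the paper.

def fuse(depth_feat, layout_feat, object_feat, layout_conf, object_conf):
    """Blend auxiliary features into the depth features, each branch
    scaled by its estimated reliability."""
    return [
        d + layout_conf * l + object_conf * o
        for d, l, o in zip(depth_feat, layout_feat, object_feat)
    ]

# Toy example: the layout branch is judged reliable (0.9) and contributes
# strongly; the object branch is judged unreliable (0.1) and is suppressed.
fused = fuse(
    depth_feat=[1.0, 2.0, 3.0, 4.0],
    layout_feat=[0.5, 0.5, 0.5, 0.5],
    object_feat=[2.0, 2.0, 2.0, 2.0],
    layout_conf=0.9,
    object_conf=0.1,
)
```

In this toy setting, an unreliable estimate (e.g., a poor layout prediction for a cluttered room) is effectively down-weighted rather than corrupting the depth features, which is the motivation the abstract gives for the discriminator networks.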

Cite

CITATION STYLE

APA

Ito, S., Kaneko, N., Shinohara, Y., & Sumi, K. (2019). Deep modular network architecture for depth estimation from single indoor images. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11129 LNCS, pp. 324–336). Springer Verlag. https://doi.org/10.1007/978-3-030-11009-3_19
