Abstract
The demand for stereo images increases as manufacturers launch more extended reality (XR) devices. To meet this demand, we introduce StereoDiffusion, a method that, unlike traditional inpainting pipelines, is training-free, straightforward to use, and integrates seamlessly into the original Stable Diffusion model. Our method modifies the latent variable to provide an end-to-end, lightweight approach for fast generation of stereo image pairs, without fine-tuning model weights or any post-processing of images. We use the original input to generate the left image and estimate a disparity map for it; we then generate the latent vector for the right image through Stereo Pixel Shift operations, complemented by Symmetric Pixel Shift Masking Denoise and Self-Attention Layer Modifications to align the right image with the left. Moreover, our proposed method maintains a high standard of image quality throughout the stereo generation process, achieving state-of-the-art scores in various quantitative evaluations.
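The core idea of shifting latent features by a disparity map can be illustrated with a minimal sketch. The function below is a hypothetical simplification, not the paper's implementation: it forward-warps a latent tensor horizontally by integer per-pixel disparities and returns a hole mask for occluded regions (which the paper handles via its masked denoising step). All names and the (C, H, W) layout are assumptions for illustration.

```python
import numpy as np

def stereo_pixel_shift(latent, disparity):
    """Warp a latent tensor horizontally by a per-pixel disparity.

    latent: (C, H, W) float array (e.g. a diffusion latent).
    disparity: (H, W) integer array of horizontal offsets.
    Returns the shifted latent and a boolean mask of unfilled
    (occluded) positions. Simplified sketch: real pipelines must
    also resolve overlaps by depth ordering.
    """
    C, H, W = latent.shape
    shifted = np.zeros_like(latent)
    filled = np.zeros((H, W), dtype=bool)
    for y in range(H):
        for x in range(W):
            xt = x - int(disparity[y, x])  # target column in right view
            if 0 <= xt < W:
                shifted[:, y, xt] = latent[:, y, x]
                filled[y, xt] = True
    return shifted, ~filled  # ~filled marks holes left by occlusion
```

With a uniform disparity of 1, every column moves one step left and the rightmost column becomes a hole; in the actual method, such holes in the right-view latent are filled during denoising rather than by image-space inpainting.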
Citation
Wang, L., Frisvad, J. R., Jensen, M. B., & Bigdeli, S. A. (2024). StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (pp. 7416–7425). IEEE Computer Society. https://doi.org/10.1109/CVPRW63382.2024.00737