StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models


Abstract

Demand for stereo images is increasing as manufacturers launch more extended reality (XR) devices. To meet this demand, we introduce StereoDiffusion, a method that, unlike traditional inpainting pipelines, is training-free, straightforward to use, and integrates seamlessly into the original Stable Diffusion model. Our method modifies the latent variable to provide an end-to-end, lightweight approach for fast generation of stereo image pairs, without fine-tuning model weights or any post-processing of images. Using the original input to generate a left image and estimating a disparity map for it, we obtain the latent vector for the right image through a Stereo Pixel Shift operation, complemented by Symmetric Pixel Shift Masking Denoise and Self-Attention Layer Modifications that align the right-side image with the left-side image. Moreover, our proposed method maintains a high standard of image quality throughout the stereo generation process, achieving state-of-the-art scores in various quantitative evaluations.
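The core idea of the abstract's Stereo Pixel Shift is to warp the left image's latent tensor horizontally according to the estimated disparity map. The sketch below is a minimal NumPy illustration of that idea, not the paper's implementation: the function name, the assumption of an integer disparity map already downsampled to latent resolution, and the hole-handling strategy (keeping original values where nothing lands) are all illustrative choices; in the paper, occluded regions are handled during denoising via masking and self-attention modifications.

```python
import numpy as np

def stereo_pixel_shift(latent, disparity):
    """Warp a latent feature map horizontally by a per-pixel disparity.

    latent:    (C, H, W) array, e.g. a Stable Diffusion latent.
    disparity: (H, W) integer array of non-negative shifts, assumed to be
               downsampled to the latent's spatial resolution.

    Each latent "pixel" of the left view is moved disparity[y, x] columns
    to the left to form the right view. Positions where no pixel lands
    simply keep the left-view values here (a crude hole fill).
    """
    C, H, W = latent.shape
    right = latent.copy()  # start from the left latent; shifted values overwrite it
    for y in range(H):
        for x in range(W):
            xr = x - disparity[y, x]  # destination column in the right view
            if 0 <= xr < W:
                right[:, y, xr] = latent[:, y, x]
    return right
```

Because the shift is purely a rearrangement of existing latent values, it is cheap relative to the diffusion sampling itself, which is consistent with the abstract's claim of a lightweight, training-free pipeline.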

Citation (APA)

Wang, L., Frisvad, J. R., Jensen, M. B., & Bigdeli, S. A. (2024). StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (pp. 7416–7425). IEEE Computer Society. https://doi.org/10.1109/CVPRW63382.2024.00737
