Abstract
The demand for stereo images increases as manufacturers launch more extended reality (XR) devices. To meet this demand, we introduce StereoDiffusion, a method that, unlike traditional inpainting pipelines, is training-free, straightforward to use, and integrates seamlessly into the original Stable Diffusion model. Our method modifies the latent variable to provide an end-to-end, lightweight approach for fast generation of stereo image pairs, without fine-tuning model weights or any post-processing of images. We use the original input to generate the left image and estimate a disparity map for it; we then generate the latent vector for the right image through Stereo Pixel Shift operations, complemented by Symmetric Pixel Shift Masking Denoise and Self-Attention Layer Modifications to align the right image with the left. Moreover, our proposed method maintains a high standard of image quality throughout the stereo generation process, achieving state-of-the-art scores in various quantitative evaluations.
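The core idea of shifting latent features by a disparity map can be illustrated with a minimal sketch. The function below is a hypothetical simplification, not the paper's implementation: it forward-warps a latent tensor horizontally by integer per-pixel disparities and returns a hole mask for occluded regions (which the paper handles via its masked denoising step). All names and the (C, H, W) layout are assumptions for illustration.

```python
import numpy as np

def stereo_pixel_shift(latent, disparity):
    """Warp a latent tensor horizontally by a per-pixel disparity.

    latent: (C, H, W) float array (e.g. a diffusion latent).
    disparity: (H, W) integer array of horizontal offsets.
    Returns the shifted latent and a boolean mask of unfilled
    (occluded) positions. Simplified sketch: real pipelines must
    also resolve overlaps by depth ordering.
    """
    C, H, W = latent.shape
    shifted = np.zeros_like(latent)
    filled = np.zeros((H, W), dtype=bool)
    for y in range(H):
        for x in range(W):
            xt = x - int(disparity[y, x])  # target column in right view
            if 0 <= xt < W:
                shifted[:, y, xt] = latent[:, y, x]
                filled[y, xt] = True
    return shifted, ~filled  # ~filled marks holes left by occlusion
```

With a uniform disparity of 1, every column moves one step left and the rightmost column becomes a hole; in the actual method, such holes in the right-view latent are filled during denoising rather than by image-space inpainting.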
Citation
Wang, L., Frisvad, J. R., Jensen, M. B., & Bigdeli, S. A. (2024). StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (pp. 7416–7425). IEEE Computer Society. https://doi.org/10.1109/CVPRW63382.2024.00737