We propose a new recurrent generative model that generates images from text captions while attending to specific parts of the caption. Our model creates images by incrementally adding patches to a “canvas,” attending to words from the text caption at each time step. Finally, the canvas is passed through an upscaling network to produce the final image. We also introduce a new method for generating visual-semantic sentence embeddings based on self-attention over the text. We compare our model’s generated images with those generated by Reed et al.’s model and show that our model is a stronger baseline for text-to-image generation tasks.
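To make the canvas-patching loop concrete, here is a minimal PyTorch sketch of the idea described above: at each time step a recurrent cell attends over the caption's word embeddings, emits a patch that is added onto a running canvas, and the final canvas is passed through an upscaling network. All layer choices (the GRU cell, the linear patch head, the nearest-neighbor upscaler) and sizes are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CanvasGenerator(nn.Module):
    """Sketch of a recurrent canvas-patching generator (assumed layers/sizes)."""

    def __init__(self, word_dim=300, hidden_dim=256, canvas_size=16, steps=8):
        super().__init__()
        self.steps = steps
        self.rnn = nn.GRUCell(word_dim, hidden_dim)
        self.attn = nn.Linear(hidden_dim, word_dim)  # query projection for attention
        self.patch = nn.Linear(hidden_dim, 3 * canvas_size * canvas_size)
        self.canvas_size = canvas_size
        # Stand-in for the paper's upscaling network: 16x16 canvas -> 64x64 image.
        self.upscale = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="nearest"),
            nn.Conv2d(3, 3, kernel_size=3, padding=1),
            nn.Tanh(),
        )

    def forward(self, words):
        # words: (batch, seq_len, word_dim) caption word embeddings
        b = words.size(0)
        h = words.new_zeros(b, self.rnn.hidden_size)
        canvas = words.new_zeros(b, 3, self.canvas_size, self.canvas_size)
        for _ in range(self.steps):
            # Attend to caption words, using the current hidden state as the query.
            query = self.attn(h)                           # (b, word_dim)
            scores = torch.bmm(words, query.unsqueeze(2))  # (b, seq_len, 1)
            weights = F.softmax(scores, dim=1)
            context = (weights * words).sum(dim=1)         # (b, word_dim)
            h = self.rnn(context, h)
            # Incrementally add a new patch onto the canvas.
            canvas = canvas + self.patch(h).view_as(canvas)
        return self.upscale(torch.tanh(canvas))


# Usage: a batch of 2 captions, 12 words each, gives 64x64 images.
words = torch.randn(2, 12, 300)
images = CanvasGenerator()(words)  # shape: (2, 3, 64, 64)
```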
Singh, A., & Agrawal, S. (2020). CanvasGAN: A Simple Baseline for Text to Image Generation by Incrementally Patching a Canvas. In Advances in Intelligent Systems and Computing (Vol. 943, pp. 86–98). Springer. https://doi.org/10.1007/978-3-030-17795-9_7