ARTIST: A Transformer-based Chinese Text-to-Image Synthesizer Digesting Linguistic and World Knowledge

Abstract

Text-to-Image Synthesis (TIS) is a popular task that converts natural language text into realistic images. Recently, transformer-based TIS models (such as DALL-E) have been proposed using encoder-decoder architectures. Yet these billion-scale TIS models are difficult to tune and deploy in resource-constrained environments. In addition, there is a lack of language-specific TIS benchmarks for Chinese, as well as of high-performing models of moderate size. In this work, we present ARTIST, A tRansformer-based Chinese Text-to-Image SynThesizer for high-quality image generation. In ARTIST, rich linguistic and relational knowledge facts are injected into the model to improve performance without resorting to ultra-large models. We further establish a large-scale Chinese TIS benchmark with reproduced results of state-of-the-art transformer-based TIS models. Results show that ARTIST outperforms previous approaches.

Cite

Liu, T., Wang, C., Zhu, X., Li, L., Qiu, M., Huang, J., … Xiao, Y. (2022). ARTIST: A Transformer-based Chinese Text-to-Image Synthesizer Digesting Linguistic and World Knowledge. In Findings of the Association for Computational Linguistics: EMNLP 2022 (pp. 881–888). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.findings-emnlp.62
