Resolving Ambiguities in Text-to-Image Generative Models

Abstract

Natural language often contains ambiguities that can lead to misinterpretation and miscommunication. While humans can handle ambiguities effectively by asking clarifying questions or by relying on contextual cues and commonsense knowledge, resolving them is notoriously hard for machines. In this work, we study ambiguities that arise in text-to-image generative models. We curate the Text-to-image Ambiguity Benchmark (TAB) dataset to study different types of ambiguities in text-to-image generative models. We then propose the Text-to-ImagE Disambiguation (TIED) framework, which disambiguates the prompts given to text-to-image generative models by soliciting clarifications from the end user. Through automatic and human evaluations, we show that our framework is effective at generating images that are more faithful to end-user intent in the presence of ambiguity.
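
The clarification loop described in the abstract can be illustrated with a short sketch. The Python snippet below is a minimal, hypothetical illustration of prompt disambiguation via user clarification only; the candidate-reading table, function names, and the example prompt are placeholders and do not reproduce the authors' TIED implementation or the TAB data.

    # Toy table mapping an ambiguous prompt to candidate readings.
    # The real TAB dataset and TIED framework are far richer; this is illustrative only.
    CANDIDATE_READINGS = {
        "an elephant and a bird flying": [
            "an elephant flying and a bird flying",    # both subjects share the verb
            "an elephant standing and a bird flying",  # only the bird flies
        ],
    }

    def clarify(prompt: str) -> str:
        """Resolve an ambiguous prompt by asking the end user which reading they meant."""
        readings = CANDIDATE_READINGS.get(prompt)
        if not readings:
            return prompt  # nothing to disambiguate
        print(f'The prompt "{prompt}" is ambiguous. Did you mean:')
        for i, option in enumerate(readings, 1):
            print(f"  {i}. {option}")
        choice = int(input("Choose an option: ")) - 1
        return readings[choice]

    def generate_image(prompt: str) -> None:
        """Placeholder for any text-to-image backend (e.g. a diffusion model)."""
        print(f"[generating image for] {prompt!r}")

    if __name__ == "__main__":
        user_prompt = "an elephant and a bird flying"
        generate_image(clarify(user_prompt))

In this sketch the clarified prompt, rather than the original ambiguous one, is what reaches the generative model, which is the intuition behind soliciting clarifications before generation.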

Citation (APA)

Mehrabi, N., Goyal, P., Verma, A., Dhamala, J., Kumar, V., Hu, Q., … Gupta, R. (2023). Resolving Ambiguities in Text-to-Image Generative Models. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 14367–14388). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.804
