Abstract
Visual question answering (VQA) is the task of enabling a computer to generate accurate textual answers to questions about given images. It integrates computer vision and natural language processing and requires a model that understands not only the image content but also the question in order to produce an appropriate linguistic answer. However, current limitations in cross-modal understanding mean that models often fail to capture the complex relationships between images and questions, leading to inaccurate or ambiguous answers. This research addresses this challenge through a multifaceted approach that combines the strengths of vision and language processing. The proposed LIUS framework builds a specialized vision module that processes image information and fuses features at multiple scales; the insights gained from this module are integrated with a “reasoning module” (an LLM) to generate answers.
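As a rough illustration of the pipeline the abstract describes, the following minimal PyTorch sketch shows a vision module that fuses image features extracted at several scales into a single representation that could then condition a language model. All names (MultiScaleFusion), the choice of strides, and the fusion-by-concatenation design are assumptions for illustration, not the paper's actual LIUS architecture.

```python
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    """Hypothetical vision module that fuses multi-scale image features,
    in the spirit of the LIUS vision module described in the abstract
    (the paper's actual architecture may differ)."""

    def __init__(self, in_channels: int = 3, dim: int = 256):
        super().__init__()
        # One convolutional branch per scale; strides 1/2/4 stand in for
        # coarse-to-fine feature maps (assumed scales, not from the paper).
        self.branches = nn.ModuleList([
            nn.Conv2d(in_channels, dim, kernel_size=3, stride=s, padding=1)
            for s in (1, 2, 4)
        ])
        self.fuse = nn.Linear(dim * 3, dim)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # Pool each scale's feature map to a vector, then fuse by
        # concatenation and a linear projection.
        pooled = [branch(image).mean(dim=(2, 3)) for branch in self.branches]
        return self.fuse(torch.cat(pooled, dim=-1))


# The fused visual features would then be passed, together with the
# question, to a "reasoning module" (an LLM) that generates the answer.
vision = MultiScaleFusion()
features = vision(torch.randn(1, 3, 224, 224))
print(features.shape)  # -> torch.Size([1, 256])
```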
Citation
Song, C. (2024). Enhancing Multimodal Understanding With LIUS: A Novel Framework for Visual Question Answering in Digital Marketing. Journal of Organizational and End User Computing, 36(1). https://doi.org/10.4018/JOEUC.336276