Abstract
Detecting cheapfakes, particularly out-of-context images, is crucial for maintaining the integrity of information and preserving trust in multimedia content. This study proposes a new cross-modal approach to cheapfakes detection that blends Natural Language Processing and Computer Vision techniques, including Image Captioning, Named Entity Recognition, and Natural Language Inference. Our approach enhances the robustness of cross-modal methods by considering both the meaning and the context of image captions, as well as additional information such as named entities. In our experiments, the method achieved an accuracy of , improving on previous approaches. This paper highlights the potential of combining Natural Language Processing and Computer Vision techniques to tackle real-world problems, making it a significant advancement in cheapfakes detection.
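To make the entity-based consistency idea concrete, the following is a minimal illustrative sketch, not the authors' implementation: it compares the named entities mentioned in two captions attached to the same image and flags low overlap as a possible out-of-context signal. The `extract_entities` regex is a toy stand-in for a real NER model, and the `threshold` value is a hypothetical choice for illustration only.

```python
import re

def extract_entities(caption: str) -> set[str]:
    # Toy stand-in for Named Entity Recognition: treat runs of
    # capitalized words as candidate entities. A real pipeline would
    # use an NER model (e.g. spaCy or a transformer-based tagger).
    return set(re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", caption))

def entity_consistency(caption_a: str, caption_b: str) -> float:
    # Jaccard overlap of the two entity sets: captions of the same
    # image that share few entities are one signal of misuse.
    ents_a = extract_entities(caption_a)
    ents_b = extract_entities(caption_b)
    if not ents_a and not ents_b:
        return 1.0  # no entities on either side: nothing to contradict
    return len(ents_a & ents_b) / len(ents_a | ents_b)

def is_out_of_context(caption_a: str, caption_b: str,
                      threshold: float = 0.2) -> bool:
    # Hypothetical decision rule: flag the caption pair when entity
    # overlap falls below the threshold. The paper combines this kind
    # of signal with captioning and NLI; this sketch shows only the
    # entity-overlap component.
    return entity_consistency(caption_a, caption_b) < threshold
```

A full system would fuse this signal with an image-captioning model (to ground the image itself in text) and a Natural Language Inference model scoring entailment or contradiction between the two captions.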
Nguyen, T. S., Dang, V., Tran, M. T., & Dang-Nguyen, D. T. (2023). Leveraging Cross-Modals for Cheapfakes Detection. In Proceedings of the 4th ACM Workshop on Intelligent Cross-Data Analysis and Retrieval, ICDAR 2023 Joint with ACM International Conference on Multimedia Retrieval, ICMR 2023 (pp. 51–59). Association for Computing Machinery, Inc. https://doi.org/10.1145/3592571.3592975