MUSTIE: Multimodal Structural Transformer for Web Information Extraction

Abstract

The task of web information extraction is to extract target fields of an object from web pages, such as extracting the name, genre and actor from a movie page. Recent sequential modeling approaches have achieved state-of-the-art results on web information extraction. However, most of these methods only focus on extracting information from textual sources while ignoring the rich information from other modalities such as image and web layout. In this work, we propose a novel MUltimodal Structural Transformer (MUST) that incorporates multiple modalities for web information extraction. Concretely, we develop a structural encoder that jointly encodes the multimodal information based on the HTML structure of the web layout, where high-level DOM nodes, low-level text, and image tokens are introduced to represent the entire page. Structural attention patterns are designed to learn effective cross-modal embeddings for all DOM nodes and low-level tokens. An extensive set of experiments has been conducted on WebSRC and Common Crawl benchmarks. Experimental results demonstrate the superior performance of MUST over several state-of-the-art baselines.
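To make the abstract's idea of structural attention more concrete, below is a minimal, hypothetical sketch of how an attention mask over high-level DOM nodes and low-level text/image tokens might be constructed. The node layout, token assignments, and masking rules here are illustrative assumptions, not the paper's actual implementation.

# Hypothetical sketch of a structural attention mask; names and rules are assumptions.
import numpy as np

# Toy page: each DOM node lists its parent node and the low-level tokens (text/image) it owns.
dom_nodes = {
    0: {"parent": None, "tokens": []},       # e.g. <html>
    1: {"parent": 0,    "tokens": [0, 1]},   # e.g. <h1> with two text tokens
    2: {"parent": 0,    "tokens": [2, 3]},   # e.g. <img> with two image tokens
}
num_nodes = len(dom_nodes)
num_tokens = 4
size = num_nodes + num_tokens                # joint sequence: [DOM nodes | low-level tokens]

mask = np.zeros((size, size), dtype=bool)

# 1) Node-to-node attention follows the DOM tree (parent <-> child), plus self-attention.
for nid, info in dom_nodes.items():
    mask[nid, nid] = True
    if info["parent"] is not None:
        mask[nid, info["parent"]] = True
        mask[info["parent"], nid] = True

# 2) Tokens attend to the other tokens of the same node and to that node;
#    the node attends back to its tokens (cross-modal aggregation).
for nid, info in dom_nodes.items():
    for t in info["tokens"]:
        ti = num_nodes + t
        mask[ti, nid] = mask[nid, ti] = True
        for u in info["tokens"]:
            mask[ti, num_nodes + u] = True

print(mask.astype(int))

Such a mask would then be applied inside a standard Transformer self-attention layer so that embeddings for each DOM node are learned jointly from its text and image children and from neighboring nodes in the web layout.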

Cite

APA

Wang, Q., Wang, J., Quan, X., Feng, F., Xu, Z., Nie, S., … Liu, D. (2023). MUSTIE: Multimodal Structural Transformer for Web Information Extraction. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 2405–2420). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.135
