Retrieving evidence from both tabular and textual resources is essential for open-domain question answering (OpenQA), because combining the two provides more comprehensive information. However, training an effective dense table-text retriever is difficult because of two challenges: table-text discrepancy and data sparsity. To address them, we introduce an optimized OpenQA Table-TExt Retriever (OTTER) that jointly retrieves tabular and textual evidence. First, we propose to enhance mixed-modality representation learning via two mechanisms: modality-enhanced representation and a mixed-modality negative sampling strategy. Second, to alleviate the data sparsity problem and enhance general retrieval ability, we conduct retrieval-centric mixed-modality synthetic pre-training. Experimental results demonstrate that OTTER substantially improves table-and-text retrieval performance on the OTTQA dataset. Comprehensive analyses examine the effectiveness of each proposed mechanism. In addition, equipped with OTTER, our OpenQA system achieves a state-of-the-art result on the downstream QA task, with an absolute gain of 10.1% in exact match (EM) over the previous best system.
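The abstract does not spell out OTTER's training objective. As a rough illustration only, the following is a minimal sketch of a generic dual-encoder contrastive loss of the kind commonly used to train dense retrievers, with a candidate pool that mixes table and text negatives. The `Candidate` class, the `mixed_modality_pool` helper, the per-modality cap `k`, and dot-product scoring are hypothetical assumptions, not details taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    vector: List[float]  # dense embedding of a table segment or a text passage
    modality: str        # hypothetical label: "table" or "text"
    is_positive: bool    # True only for the gold evidence of the query


def dot(a: List[float], b: List[float]) -> float:
    """Relevance score as an inner product (an assumed, common choice)."""
    return sum(x * y for x, y in zip(a, b))


def contrastive_loss(query: List[float], candidates: List[Candidate]) -> float:
    """Negative log-likelihood of the single positive under a softmax
    over all candidate scores (computed with the log-sum-exp trick)."""
    scores = [dot(query, c.vector) for c in candidates]
    positives = [s for s, c in zip(scores, candidates) if c.is_positive]
    if len(positives) != 1:
        raise ValueError("expected exactly one positive candidate")
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - positives[0]


def mixed_modality_pool(positive: Candidate,
                        table_negatives: List[Candidate],
                        text_negatives: List[Candidate],
                        k: int) -> List[Candidate]:
    """Hypothetical helper: keep up to k hard negatives from EACH modality,
    so the encoder must rank the positive above both tables and passages."""
    return [positive] + table_negatives[:k] + text_negatives[:k]
```

For example, with `query = [1.0, 0.0]`, a positive table at `[1.0, 0.0]`, and one table negative and one text negative both at `[0.0, 1.0]`, `mixed_modality_pool(..., k=1)` yields three candidates and the loss is `log(1 + 2/e)`. Drawing negatives from both modalities pushes the encoder to separate relevant tables from irrelevant passages (and vice versa) instead of only comparing items of the same type; how OTTER actually selects its negatives is not described in this abstract.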
Huang, J., Zhong, W., Liu, Q., Gong, M., Jiang, D., & Duan, N. (2022). Mixed-modality Representation Learning and Pre-training for Joint Table-and-Text Retrieval in OpenQA. In Findings of the Association for Computational Linguistics: EMNLP 2022 (pp. 4146–4158). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.findings-emnlp.303