A Test Collection of Synthetic Documents for Training Rankers: ChatGPT vs. Human Experts

Arian Askari; Mohammad Aliannejadi; Evangelos Kanoulas; Suzan Verberne

Conference Proceedings, Open Access

International Conference on Information and Knowledge Management, Proceedings (2023) 5311-5315

DOI: 10.1145/3583780.3615111

Abstract

We investigate the usefulness of generative large language models (LLMs) in generating training data for cross-encoder re-rankers in a novel direction: generating synthetic documents instead of synthetic queries. We introduce a new dataset, ChatGPT-RetrievalQA, and compare the effectiveness of strong models fine-tuned on both LLM-generated and human-generated data. We build ChatGPT-RetrievalQA on an existing dataset, the Human ChatGPT Comparison Corpus (HC3), which consists of multiple public question collections featuring both human- and ChatGPT-generated responses. We fine-tune a range of cross-encoder re-rankers on either human-generated or ChatGPT-generated data. Our evaluation on MS MARCO DEV, TREC DL'19, and TREC DL'20 demonstrates that cross-encoder re-ranking models trained on LLM-generated responses are significantly more effective for out-of-domain re-ranking than those trained on human responses. For in-domain re-ranking, however, the human-trained re-rankers outperform the LLM-trained re-rankers. Our findings suggest that generative LLMs have high potential in generating training data for neural retrieval models and can be used to augment training data, especially in domains with less labeled data. ChatGPT-RetrievalQA presents various opportunities for analyzing and improving rankers with both human- and LLM-generated data. Our data, code, and model checkpoints are publicly available.
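To make the experimental setup more concrete, the following is a minimal sketch of how two parallel training sets (one built from human responses, one from ChatGPT responses) might be assembled for a cross-encoder re-ranker. It is illustrative only: the record field names, the `build_pairs` helper, and the random negative sampling are assumptions and are not taken from the paper or from the actual ChatGPT-RetrievalQA schema.

```python
import random

# Hypothetical records: the field names ("question", "human", "chatgpt") are
# illustrative assumptions, not the actual schema of HC3 or ChatGPT-RetrievalQA.
RECORDS = [
    {"question": "what causes tides",
     "human": "Tides come from the moon's gravity.",
     "chatgpt": "Tides are mainly caused by the gravitational pull of the moon."},
    {"question": "why is the sky blue",
     "human": "Air scatters blue light more than red.",
     "chatgpt": "The sky appears blue because of Rayleigh scattering."},
    {"question": "how do vaccines work",
     "human": "They train your immune system.",
     "chatgpt": "Vaccines expose the immune system to a harmless antigen."},
]


def build_pairs(records, source, seed=0):
    """Build (query, document, label) tuples from one response source.

    For each question, the response from `source` is the positive (label 1)
    and a response to a different, randomly chosen question is the negative
    (label 0). Random negatives are an assumption made for this sketch.
    """
    rng = random.Random(seed)
    pairs = []
    for i, rec in enumerate(records):
        pairs.append((rec["question"], rec[source], 1))
        other_indices = [j for j in range(len(records)) if j != i]
        if other_indices:
            negative = records[rng.choice(other_indices)]
            pairs.append((rec["question"], negative[source], 0))
    return pairs


# Same questions, different document sources: one set per re-ranker.
human_pairs = build_pairs(RECORDS, "human")
chatgpt_pairs = build_pairs(RECORDS, "chatgpt")
```

Under these assumptions, each list would be used to fine-tune a separate re-ranker, and the two resulting models would then be compared on the same evaluation queries, which mirrors the human-trained versus LLM-trained comparison described in the abstract.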

Citation (APA)

Askari, A., Aliannejadi, M., Kanoulas, E., & Verberne, S. (2023). A Test Collection of Synthetic Documents for Training Rankers: ChatGPT vs. Human Experts. In International Conference on Information and Knowledge Management, Proceedings (pp. 5311–5315). Association for Computing Machinery. https://doi.org/10.1145/3583780.3615111
