Abstract
Legal case retrieval is vital to ensuring justice across different legal systems and has recently received increasing attention in information retrieval (IR) research. However, the relevance judgment criteria of previous retrieval datasets are either not applicable to cases without citation relationships or not instructive enough for future datasets to follow. Moreover, most existing benchmark datasets pay little attention to query selection. In this paper, we construct the Chinese Legal Case Retrieval Dataset (LeCaRD), which contains 107 query cases and over 43,000 candidate cases. Queries and results are drawn from criminal cases published by the Supreme People's Court of China. In particular, to address the difficulty of defining relevance, we propose a series of relevance judgment criteria designed by our legal team, and the corresponding candidate case annotations are conducted by legal experts. We also develop a novel query sampling strategy that takes both query difficulty and diversity into consideration. To evaluate the dataset, we implement several existing retrieval models on LeCaRD as baselines. The dataset is now publicly available together with the complete data processing details.
Citation
Ma, Y., Shao, Y., Wu, Y., Liu, Y., Zhang, R., Zhang, M., & Ma, S. (2021). LeCaRD: A Legal Case Retrieval Dataset for Chinese Law System. In SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 2342–2348). Association for Computing Machinery, Inc. https://doi.org/10.1145/3404835.3463250