Systematic review is a crucial method widely used by scholars across research domains. However, screening candidate papers for relevant scientific literature remains extremely time-consuming, so the task of screening prioritization has been established to reduce the human workload. Various human-in-the-loop methods have been proposed to solve this task using lexical features. Although these methods outperform more sophisticated feature-based models such as BERT, they omit rich and essential semantic information and therefore suffer from feature bias. In this study, we propose a novel framework, SciMine, to accelerate the screening process by capturing semantic feature representations from both the background and the corpus. In particular, based on contextual representations learned from pre-trained language models, our approach uses an autoencoder-based classifier and a feature-dependent classification module to extract general document-level and phrase-level information. A ranking ensemble strategy then combines these two complementary pieces of information. Experiments on five real-world datasets demonstrate that SciMine achieves state-of-the-art performance, and comprehensive analysis further shows the efficacy of SciMine in addressing feature bias.
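The ranking ensemble step can be illustrated with a minimal sketch. The abstract does not specify how the two rankings are combined, so the mean-rank fusion below is an assumption chosen for simplicity, not the method used in SciMine. The function names `ranks_from_scores` and `ensemble_ranking`, and the idea that each module emits one relevance score per document, are likewise hypothetical.

```python
from typing import Dict, List


def ranks_from_scores(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each document id to its 1-based rank (1 = highest score).

    Ties are broken by document id so the result is deterministic.
    """
    ordered = sorted(scores, key=lambda doc: (-scores[doc], doc))
    return {doc: position + 1 for position, doc in enumerate(ordered)}


def ensemble_ranking(
    doc_level_scores: Dict[str, float],
    phrase_level_scores: Dict[str, float],
) -> List[str]:
    """Combine two score lists by averaging per-document ranks.

    Assumption: mean-rank fusion stands in for the unspecified
    ensemble strategy; both inputs must score the same documents.
    """
    if doc_level_scores.keys() != phrase_level_scores.keys():
        raise ValueError("both modules must score the same documents")

    doc_rank = ranks_from_scores(doc_level_scores)
    phrase_rank = ranks_from_scores(phrase_level_scores)

    mean_rank = {
        doc: (doc_rank[doc] + phrase_rank[doc]) / 2 for doc in doc_rank
    }
    # Lower mean rank means the document should be screened earlier.
    return sorted(mean_rank, key=lambda doc: (mean_rank[doc], doc))


if __name__ == "__main__":
    # Hypothetical scores from the two modules for three candidate papers.
    doc_level = {"paper_a": 0.9, "paper_b": 0.5, "paper_c": 0.1}
    phrase_level = {"paper_a": 0.2, "paper_b": 0.8, "paper_c": 0.6}
    print(ensemble_ranking(doc_level, phrase_level))
```

Fusing ranks rather than raw scores avoids having to calibrate the two modules' outputs against each other, which is one reason rank-based ensembles are common when the component models score on different scales.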
Guo, F., Luo, Y., Yang, L., & Zhang, Y. (2023). SciMine: An Efficient Systematic Prioritization Model Based on Richer Semantic Information. In SIGIR 2023 - Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 205–215). Association for Computing Machinery, Inc. https://doi.org/10.1145/3539618.3591764