The paper addresses information extraction from scientific literature with machine learning methods. In particular, it considers the extraction of definitions and results from scientific publications in Russian. We note that annotating scientific texts to create a training dataset is a very labor-intensive and expensive process. To tackle this problem, we propose methods and tools based on active learning. We describe and evaluate a novel adaptive density-weighted sampling (ADWeS) meta-strategy for active learning. The experiments demonstrate that active learning can be a very efficient technique for scientific text mining, and that the proposed meta-strategy can be beneficial for annotating corpora with strongly skewed class distributions. We also investigate informative task-independent features for information extraction from scientific texts and present an openly available corpus annotation tool, which is equipped with ADWeS and is compatible with well-known sampling strategies.
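The abstract does not spell out how ADWeS works, so the following is only a minimal sketch of the classical (non-adaptive) density-weighted idea that such strategies build on: an example's uncertainty is multiplied by its average similarity to the unlabeled pool, so that uncertain examples in dense regions are preferred over uncertain outliers. The function names, the least-confidence uncertainty measure, the cosine similarity, and the `beta` exponent are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def density_weighted_scores(probs, features, beta=1.0):
    """Illustrative scores: uncertainty times average pool similarity.

    probs    -- (n, k) predicted class probabilities for the unlabeled pool
    features -- (n, d) feature vectors for the same examples
    beta     -- hypothetical exponent controlling the weight of density
    """
    probs = np.asarray(probs, dtype=float)
    features = np.asarray(features, dtype=float)
    # Least-confidence uncertainty: 1 minus the top class probability.
    uncertainty = 1.0 - probs.max(axis=1)
    # Pairwise cosine similarity between pool examples.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    similarity = unit @ unit.T
    # Density: mean similarity of each example to the whole pool.
    density = np.clip(similarity.mean(axis=1), 0.0, None)
    return uncertainty * density ** beta


def select_batch(probs, features, batch_size=1, beta=1.0):
    """Return indices of the highest-scoring examples to annotate next."""
    scores = density_weighted_scores(probs, features, beta)
    return np.argsort(-scores)[:batch_size]


# Toy pool: examples 0 and 1 are equally uncertain, but example 1 is an outlier.
toy_probs = [[0.5, 0.5], [0.5, 0.5], [0.9, 0.1], [0.95, 0.05]]
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [1.0, -0.1]]
chosen = select_batch(toy_probs, toy_features, batch_size=1)
```

In this toy pool, the density term breaks the tie between the two equally uncertain examples in favor of the one that lies close to other pool members. Setting `beta=0` reduces the score to plain uncertainty sampling.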
CITATION STYLE
Suvorov, R., Shelmanov, A., & Smirnov, I. (2018). Active learning with adaptive density weighted sampling for information extraction from scientific papers. In Communications in Computer and Information Science (Vol. 789, pp. 77–90). Springer Verlag. https://doi.org/10.1007/978-3-319-71746-3_7