Selecting documents relevant for chemistry as a classification problem

Zhemin Zhu; Saber A. Akhondi; Umesh Nandal; Marius Doornenbal; Michelle Gregory

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 10180 LNAI, pp. 198–201 (2017)

DOI: 10.1007/978-3-319-58694-6_31

Abstract

We present a first version of a system that selects chemical publications for inclusion in a chemistry information database. This database, Reaxys (https://www.elsevier.com/solutions/reaxys), is a portal for retrieving structured chemistry information from published journals and patents. The task poses three challenges: (i) the training and input data are highly imbalanced; (ii) high recall (≥95%) is desired; and (iii) the data offered for selection are numerically massive yet incomplete. Our system handles the imbalance by undersampling and achieves relatively high recall by using chemical named entities as features. Experiments on a real-world data set of 15,822 documents show that chemical named entity features boost recall by 8% over the n-gram features widely used in general document classification. To foster research on this challenging topic, part of the data set compiled for this paper is available on request.
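The abstract names undersampling and recall but does not say which undersampling variant the system uses. The sketch below is a minimal illustration that assumes plain random undersampling of the larger class, together with the standard definition of recall. The function names `undersample` and `recall` and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def undersample(docs, labels, seed=0):
    """Randomly drop examples from larger classes until every class
    has as many examples as the smallest class (assumed variant)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for doc, label in zip(docs, labels):
        by_label[label].append(doc)
    # Size of the smallest class sets the per-class sample size.
    target = min(len(group) for group in by_label.values())
    balanced = []
    for label, group in by_label.items():
        for doc in rng.sample(group, target):
            balanced.append((doc, label))
    rng.shuffle(balanced)
    return balanced


def recall(true_labels, predicted_labels, positive=1):
    """Fraction of truly positive documents that were predicted positive."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy data: 3 relevant documents (label 1), 9 irrelevant (label 0).
toy_docs = [f"doc{i}" for i in range(12)]
toy_labels = [1] * 3 + [0] * 9
toy_balanced = undersample(toy_docs, toy_labels, seed=1)
```

Here `toy_balanced` holds an equal number of relevant and irrelevant documents, which a classifier could then be trained on. Recall is the metric to watch on the untouched, imbalanced evaluation data, since the stated goal is to miss as few relevant documents as possible.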

Cite

Zhu, Z., Akhondi, S. A., Nandal, U., Doornenbal, M., & Gregory, M. (2017). Selecting documents relevant for chemistry as a classification problem. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10180 LNAI, pp. 198–201). Springer Verlag. https://doi.org/10.1007/978-3-319-58694-6_31
