We develop an enhanced version of the CORD-19 dataset released by the Allen Institute for AI. Tools from the SeerSuite project are used to extract information from the original articles that is not directly provided in the CORD-19 dataset. We add 728 new abstracts, 70,102 figures, and 31,446 tables with captions that are not provided in the current data release. We also built COVIDSeer, a vertical search engine based on the new dataset. COVIDSeer has a relatively simple architecture with features such as keyword filtering and similar-paper recommendation. The goal is to provide a system and dataset that help scientists better navigate the literature concerning COVID-19. The enriched dataset can serve as a supplement to the existing dataset. The search engine, which offers keyphrase-enhanced search, will hopefully help biomedical and life science researchers, medical students, and the general public explore coronavirus-related literature more effectively. The entire dataset and the system will be made open source.
CITATION STYLE
Rohatgi, S., Karishma, Z., Chhay, J., Keesara, S. R. R., Wu, J., Caragea, C., & Giles, C. L. (2020). COVIDSeer: Extending the CORD-19 Dataset. In Proceedings of the ACM Symposium on Document Engineering, DocEng 2020. Association for Computing Machinery, Inc. https://doi.org/10.1145/3395027.3419597