A document-level information extraction pipeline for layered cathode materials for sodium-ion batteries

Abstract

Natural language processing techniques make it possible to extract valuable information from large volumes of published literature and to apply data-driven methods such as machine learning in materials science. Nevertheless, automated extraction of data from full-text documents remains a complex task. We propose a document-level natural language processing pipeline that extracts comprehensive information on layered cathode materials for sodium-ion batteries from the literature. The pipeline enhances entity recognition with supplementary contextual information while capturing the article structure. A heuristic multi-level relation extraction algorithm then extracts experimental parameters and complex performance relationships. Using this pipeline, we extracted a dataset of 5265 records from 1747 documents, covering essential information such as chemical composition, synthesis parameters, and electrochemical properties. The pipeline helps address data scarcity in battery informatics, and the extracted dataset provides a valuable resource for further research and development on layered cathode materials.
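The abstract does not spell out how the heuristic multi-level relation extraction works, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes two levels: a property value is first linked to the nearest material mention in the same sentence, and otherwise falls back to the most recent material mentioned earlier in the paragraph. The regular expressions, the `Record` type, the `extract_records` function, and the sample text with its made-up values are all hypothetical and stand in for the pipeline's real entity recognizers.

```python
import re
from dataclasses import dataclass

# Hypothetical toy patterns, for illustration only (not the authors' recognizers).
MATERIAL = re.compile(r"\bNa[0-9.]*(?:[A-Z][a-z]?[0-9.]*)+O2\b")
CAPACITY = re.compile(r"(\d+(?:\.\d+)?)\s*mAh\s*g-1")


@dataclass
class Record:
    material: str
    capacity_mah_g: float
    level: str  # "sentence" or "paragraph": the level at which the link was made


def extract_records(paragraph: str) -> list[Record]:
    """Link each capacity value to a material using a two-level heuristic."""
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    records = []
    last_material = None  # most recent material seen in earlier sentences
    for sentence in sentences:
        materials = [(m.start(), m.group()) for m in MATERIAL.finditer(sentence)]
        for cap in CAPACITY.finditer(sentence):
            if materials:
                # Level 1: nearest material mention within the same sentence.
                _, name = min(materials, key=lambda m: abs(m[0] - cap.start()))
                level = "sentence"
            elif last_material is not None:
                # Level 2: fall back to the last material in the paragraph so far.
                name, level = last_material, "paragraph"
            else:
                continue  # no material in scope; leave the value unlinked
            records.append(Record(name, float(cap.group(1)), level))
        if materials:
            last_material = materials[-1][1]
    return records


# Made-up sample text; the numbers are not taken from the dataset.
sample = (
    "Na0.67MnO2 was synthesized by a solid-state method. "
    "It delivers 175 mAh g-1 at 0.1C. "
    "In contrast, NaNi0.5Mn0.5O2 shows a capacity of 141 mAh g-1."
)
for record in extract_records(sample):
    print(record)
```

Recording the level at which each link was made keeps the lower-confidence paragraph-level fallbacks distinguishable from same-sentence matches, which is useful when the extracted records are later filtered or manually checked.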

Cite

Gou, Y., Zhang, Y., Zhu, J., & Shu, Y. (2024). A document-level information extraction pipeline for layered cathode materials for sodium-ion batteries. Scientific Data, 11(1). https://doi.org/10.1038/s41597-024-03196-1
