MS-MENTIONS: Consistently Annotating Entity Mentions in Materials Science Procedural Text

Abstract

Materials science synthesis procedures are a promising domain for scientific NLP, as proper modeling of these recipes could provide insight into new ways of creating materials. However, a fundamental challenge in building information extraction models for these procedures is obtaining accurate labels for their materials, operations, and other entities. We present a new corpus of entity mention annotations over 595 materials science synthesis procedural texts (157,488 tokens), which greatly expands the training data available for the named entity recognition task. We outline a new label inventory designed to provide consistent annotations and a new annotation approach intended to maximize the consistency and annotation speed of domain experts. Inter-annotator agreement studies and baseline models trained on the data suggest that the corpus provides high-quality annotations of these mention types. This corpus helps lay a foundation for future high-quality modeling of synthesis procedures.

Cite

O’Gorman, T., Jensen, Z., Mysore, S., Mahbub, R., Huang, K., Olivetti, E., & McCallum, A. (2021). MS-MENTIONS: Consistently Annotating Entity Mentions in Materials Science Procedural Text. In EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 1337–1352). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.emnlp-main.101
