S2AMP: A high-coverage dataset of scholarly mentorship inferred from publications

Abstract

Mentorship is a critical component of academia, but it is less visible than publications, citations, grants, and awards. Despite the importance of studying the quality and impact of mentorship, few large, representative mentorship datasets are available. We contribute two datasets to the study of mentorship. The first contains over 300,000 ground-truth academic mentor-mentee pairs obtained from multiple diverse, manually curated sources and linked to the Semantic Scholar (S2) knowledge graph. We use this dataset to train an accurate classifier that predicts mentorship relations from bibliographic features, achieving an area under the ROC curve of 0.96 on held-out data. Our second dataset is formed by applying the classifier to the complete co-authorship graph of S2. The result is an inferred graph with 137 million weighted mentorship edges among 24 million nodes. We release this first-of-its-kind dataset to the community to help accelerate the study of scholarly mentorship: https://github.com/allenai/S2AMP-data.
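The abstract describes a two-stage pipeline: score candidate mentor-mentee pairs drawn from co-authorship, then use each score as an edge weight. The reported AUC of 0.96 means that a randomly chosen true mentorship pair outscores a randomly chosen non-mentorship pair about 96% of the time. The sketch below is not the S2AMP authors' method. The toy corpus, the two features (`seniority_gap`, `shared_papers`), and the logistic weights are all hypothetical. It only illustrates the general shape: ordered co-author pairs as candidate edges, a per-pair probability as the edge weight, and AUC computed as the pairwise ranking probability.

```python
import itertools
import math
from collections import defaultdict

# Hypothetical toy corpus: each paper has a year and an author list.
papers = [
    {"year": 2010, "authors": ["A", "B"]},
    {"year": 2012, "authors": ["B", "A", "C"]},
    {"year": 2015, "authors": ["C", "D"]},
    {"year": 2001, "authors": ["A"]},
]


def coauthor_features(papers):
    """Compute hypothetical bibliographic features for each ordered (mentor, mentee) co-author pair."""
    first_year = {}
    shared = defaultdict(int)
    for p in papers:
        for a in p["authors"]:
            # Earliest publication year approximates career start (an assumption).
            first_year[a] = min(first_year.get(a, p["year"]), p["year"])
        # Every ordered pair of distinct co-authors is a candidate directed edge.
        for a, b in itertools.permutations(set(p["authors"]), 2):
            shared[(a, b)] += 1
    return {
        (mentor, mentee): {
            "seniority_gap": first_year[mentee] - first_year[mentor],
            "shared_papers": n,
        }
        for (mentor, mentee), n in shared.items()
    }


def score(f, w_gap=0.5, w_shared=0.3, bias=-2.0):
    """Toy logistic score; the weights are made up for illustration, not learned."""
    z = bias + w_gap * f["seniority_gap"] + w_shared * f["shared_papers"]
    return 1.0 / (1.0 + math.exp(-z))


def roc_auc(pos_scores, neg_scores):
    """AUC as the probability a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Weighted, directed mentorship edges inferred over the toy co-authorship graph.
edges = {pair: score(f) for pair, f in coauthor_features(papers).items()}
```

Because each ordered pair is scored separately, the graph is directed: here `edges[("A", "B")]` (A started publishing earlier) is much larger than `edges[("B", "A")]`. A real system would learn the weights from labeled pairs and evaluate `roc_auc` on a held-out split.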

Citation (APA)

Rohatgi, S., Downey, D., King, D., & Feldman, S. (2022). S2AMP: A high-coverage dataset of scholarly mentorship inferred from publications. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3529372.3533283