Using supervised learning to classify metadata of research data by field of study

Abstract

Many interesting use cases of research data classifiers presuppose that a research data item can be mapped to more than one field of study, but for such classification mechanisms, reproducible evaluations are lacking. This paper closes this gap: It describes the creation of a training and evaluation set comprised of labeled metadata, evaluates several supervised classification approaches, and comments on their application in scientometric research. The metadata were retrieved from the DataCite index of research data, pre-processed, and compiled into a set of 613,585 records. According to our experiments with 20 general fields of study, multi-layer perceptron models perform best, followed by long short-term memory models. The models can be used in scientometric research, for example to analyze interdisciplinary trends of digital scholarly output or to characterize growth patterns of research data, stratified by field of study. Our findings allow us to estimate errors in applying the models. The best performing models and the data used for their training are available for re-use.
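
As a rough illustration of the multi-label setting the abstract describes, the sketch below encodes a record's fields of study as a multi-hot vector and scores predictions with micro-averaged F1. The three field names, the helper names `to_multi_hot` and `micro_f1`, and the choice of micro-F1 are all assumptions made for illustration; the abstract does not name the 20 fields or the evaluation metric the authors used.

```python
from typing import List, Set

# Hypothetical label set for illustration only; the paper works with
# 20 general fields of study that are not listed in the abstract.
FIELDS: List[str] = ["Physics", "Chemistry", "Biology"]


def to_multi_hot(labels: Set[str], fields: List[str] = FIELDS) -> List[int]:
    """Encode a set of field-of-study labels as a 0/1 vector over `fields`."""
    return [1 if field in labels else 0 for field in fields]


def micro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Micro-averaged F1: pool all (record, field) decisions, then score.

    Assumed metric; one common choice for multi-label evaluation.
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1  # field correctly assigned
            elif p and not t:
                fp += 1  # field assigned but not in the ground truth
            elif t and not p:
                fn += 1  # field missed by the classifier
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# A record labeled with two fields at once, as the multi-label setting allows.
truth = [to_multi_hot({"Physics", "Biology"}), to_multi_hot({"Chemistry"})]
preds = [to_multi_hot({"Physics"}), to_multi_hot({"Chemistry", "Biology"})]
score = micro_f1(truth, preds)
```

Pooling decisions across all fields before computing F1 weights frequent fields more heavily than rare ones; a per-field (macro) average would be the alternative if each field should count equally.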

Cite

Weber, T., Kranzlmüller, D., Fromm, M., & Sousa, N. T. de. (2020). Using supervised learning to classify metadata of research data by field of study. Quantitative Science Studies, 1(2), 525–550. https://doi.org/10.1162/qss_a_00049
