Ontology-Driven Scientific Literature Classification Using Clustering and Self-supervised Learning

Zhengtong Pan; Patrick Soong; Setareh Rafatirad

Book Chapter

Springer Science and Business Media Deutschland GmbH, (2023), 133-155

DOI: 10.1007/978-981-19-2600-6_10

2 Citations

11 Readers

Abstract

The rapid growth of scientific literature in computer engineering (CE) and computer science (CS) makes it difficult for researchers to explore publication records by standard scientific category. This creates a need for context-aware, automatic classification of text documents into standard scientific categories. Document classification is a significant application of supervised learning, which requires a labeled dataset for training the classifier. However, the research publication records available on Google Scholar and dblp are not labeled. Manual annotation of a large body of scientific work using standard scientific terminology requires domain expertise and is extremely time-consuming, while hierarchical labeling of records facilitates more effective, context-aware retrieval of documents. In this paper, we propose an ontology-driven classification technique based on zero-shot learning in conjunction with agglomerative clustering to automatically label a scientific literature dataset related to CE and CS. We further study and compare the effectiveness of multiple text classifiers: logistic regression (LR), support vector machines (SVM), and gradient boosting with Word2vec and bag-of-words (BOW) embeddings; recurrent neural networks (RNN) with GloVe embeddings; and feed-forward neural networks with BOW embeddings. Our study showed that the RNN with GloVe embeddings outperforms the other models, with an F1 score above 0.85 at all granularity levels. The proposed technique will help junior and experienced researchers identify emerging technologies and domains for their research.
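The labeling idea in the abstract (cluster unlabeled records, then assign each cluster a category drawn from an ontology) can be illustrated with a small sketch. This is not the chapter's implementation: the toy titles, the two-category `ontology`, and the function names are hypothetical, and the keyword-overlap step is a deliberately simplified stand-in for the zero-shot learning the authors describe. Only the average-linkage agglomerative clustering corresponds directly to a technique named in the abstract.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words term counts for one piece of text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def agglomerative(vectors, k):
    """Average-linkage agglomerative clustering, merging until k clusters remain."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the most similar pair of clusters and merge them.
        a, b = max(
            ((x, y) for x in range(len(clusters))
             for y in range(x + 1, len(clusters))),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def label_cluster(cluster, vectors, ontology):
    """Assumption: pick the ontology category whose keywords best match the
    cluster's pooled vocabulary (a stand-in for zero-shot labeling)."""
    pooled = Counter()
    for i in cluster:
        pooled.update(vectors[i])
    return max(ontology, key=lambda cat: cosine(pooled, bow(ontology[cat])))

# Hypothetical toy data, invented for illustration only.
titles = [
    "deep neural network image recognition",
    "convolutional neural network image classification",
    "cache memory processor architecture",
    "multicore processor cache coherence",
]
ontology = {
    "Machine learning": "neural network learning classification recognition",
    "Computer architecture": "processor cache memory multicore",
}

vectors = [bow(t) for t in titles]
clusters = agglomerative(vectors, k=2)
labels = {}
for cluster in clusters:
    category = label_cluster(cluster, vectors, ontology)
    for i in cluster:
        labels[i] = category

for i, title in enumerate(titles):
    print(f"{labels[i]:<22} {title}")
```

The automatically labeled records could then serve as training data for a supervised classifier of the kind the abstract compares; a hierarchical ontology would repeat the labeling step at each granularity level.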

Cite

Pan, Z., Soong, P., & Rafatirad, S. (2023). Ontology-Driven Scientific Literature Classification Using Clustering and Self-supervised Learning. In Lecture Notes on Data Engineering and Communications Technologies (Vol. 137, pp. 133–155). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-19-2600-6_10