Ontology-Driven Scientific Literature Classification Using Clustering and Self-supervised Learning

Abstract

The rapid growth of scientific literature in computer engineering (CE) and computer science (CS) makes it difficult for researchers to explore publication records by standard scientific category. This creates a need for context-aware, automatic classification of text documents into standard scientific categories. Document classification is a significant application of supervised learning, which requires a labeled dataset to train the classifier. However, the research publication records available on Google Scholar and dblp are not labeled, and manually annotating a large body of scientific work with standard scientific terminology requires domain expertise and is extremely time-consuming. Labeling records hierarchically is also desirable, because it enables more effective and context-aware document retrieval. In this paper, we propose an ontology-driven classification technique that combines zero-shot learning with agglomerative clustering to automatically label a scientific literature dataset related to CE and CS. We further compare the effectiveness of multiple text classifiers: logistic regression (LR), support vector machines (SVM), and gradient boosting with Word2vec and bag-of-words (BOW) embeddings; recurrent neural networks (RNN) with GloVe embeddings; and feed-forward neural networks with BOW embeddings. Our study shows that the RNN with GloVe embeddings outperforms the other models, achieving an F1 score above 0.85 at all granularity levels. Our proposed technique can help both junior and experienced researchers identify emerging technologies and domains for their research.
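The abstract does not spell out how clustering and zero-shot learning are combined, so the following is only a minimal sketch under stated assumptions, not the authors' method. It groups documents with average-linkage agglomerative clustering on bag-of-words cosine distance, then assigns each cluster the category from a small ontology whose text description is most similar to the cluster. That description-matching step is a simple stand-in for a zero-shot classifier, which would normally be a pretrained model; the function names (`auto_label`, `agglomerative`), the two-category ontology, and the toy abstracts are all hypothetical.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words vector: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering on cosine distance:
    start from singletons and repeatedly merge the closest pair.
    Assumes n_clusters >= 1."""
    clusters = [[i] for i in range(len(vectors))]

    def distance(c1, c2):
        total = sum(1.0 - cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return clusters


def auto_label(docs, ontology, n_clusters):
    """Cluster the documents, then give every member of a cluster the
    ontology category whose description best matches the cluster.
    Assumption: BOW cosine similarity stands in for a zero-shot model."""
    vectors = [bow(d) for d in docs]
    category_vectors = {name: bow(desc) for name, desc in ontology.items()}
    labels = [None] * len(docs)
    for cluster in agglomerative(vectors, n_clusters):
        centroid = Counter()
        for i in cluster:
            centroid.update(vectors[i])  # summed counts; cosine ignores scale
        best = max(category_vectors,
                   key=lambda name: cosine(centroid, category_vectors[name]))
        for i in cluster:
            labels[i] = best
    return labels


if __name__ == "__main__":
    # Hypothetical two-category ontology and toy abstracts.
    ontology = {
        "Machine learning": "machine learning neural network model training classification",
        "Computer networks": "computer network routing protocol packet traffic",
    }
    docs = [
        "neural network training improves image classification",
        "learning model training text classification",
        "wireless routing protocol reduces packet loss",
        "packet routing traffic congestion network",
    ]
    print(auto_label(docs, ontology, n_clusters=2))
    # → ['Machine learning', 'Machine learning', 'Computer networks', 'Computer networks']
```

Labeling whole clusters rather than individual documents lets one category decision cover many similar records, which is the appeal of pairing clustering with a label-free matcher. Running the clustering with several values of `n_clusters` is one way to obtain coarser and finer granularity levels; the abstract does not say whether the paper does exactly this. The pairwise merge loop is fine for a toy example but far too slow for a real corpus, where a library implementation would be the practical choice.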

Pan, Z., Soong, P., & Rafatirad, S. (2023). Ontology-Driven Scientific Literature Classification Using Clustering and Self-supervised Learning. In Lecture Notes on Data Engineering and Communications Technologies (Vol. 137, pp. 133–155). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-19-2600-6_10