An AI-Based Methodology for the Automatic Classification of a Multiclass Ebook Collection Using Information from the Tables of Contents

Abstract

Book recommendation to support professors and students in identifying relevant sources is of significant importance for both universities and digital libraries and, hence, motivates the development of a recommendation system. This paper aims to automatically classify a multiclass corpus created from ebooks in the Springer collection, which is available through the Hellenic Academic Libraries' subscription. It does so by utilizing an unsupervised neural network (NN), namely self-organizing maps (SOM), and two deep neural network (DNN) architectures under various configuration scenarios: a long short-term memory (LSTM) network, and a convolutional neural network (CNN) combined with an LSTM (CNN+LSTM). The vector construction leverages information extracted from the table of contents (ToC) of each book, using the TF-IDF weighting scheme for the SOM and the Keras tokenizer for the DNNs. Extensive experiments were conducted using various configurations of preprocessing steps, NN setup, and vector and vocabulary sizes to assess their impact on the classifier's performance. Furthermore, we show that majority voting is more suitable for selecting the dominant label for a given SOM node. The experimental analysis showed the feasibility of developing a recommendation system that supports professors and students in identifying related sources based on a detailed thematic description (e.g., the abstract or table of contents of a book) rather than a few keywords. In the conducted experiments, the subsystem that utilized the DNN (LSTM) performed best, with F1-scores of 67% for the 26 categories and 80% for the 5 general categories, whereas the SOM achieved F1-scores of less than 5% in both cases.
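The abstract names two ingredients of the SOM pipeline: TF-IDF vectors built from each book's ToC, and majority voting to pick a node's dominant label. The Python sketch below illustrates both under simplifying assumptions: whitespace tokenization, an unsmoothed idf of log(N/df), and node assignments supplied as input (a trained SOM would provide them as each document's best-matching unit). The function names `tfidf_vectors` and `label_nodes_by_majority` and the toy data are hypothetical. This is not the authors' implementation, and the DNN path using the Keras tokenizer is not shown.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF weight list per document.

    Assumption: raw term frequency times log(N / df), with no smoothing
    or normalisation; the paper's exact weighting variant is not stated here.
    """
    tokenised = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenised for term in tokens})
    n_docs = len(tokenised)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocabulary])
    return vocabulary, vectors


def label_nodes_by_majority(node_of_doc, doc_labels):
    """Give each map node the most frequent label among the documents mapped to it.

    node_of_doc[i] is the (hypothetical) map node that document i was assigned to.
    Ties go to the label encountered first, because Counter.most_common
    keeps first-insertion order among equal counts.
    """
    votes = defaultdict(Counter)
    for node, label in zip(node_of_doc, doc_labels):
        votes[node][label] += 1
    return {node: counts.most_common(1)[0][0] for node, counts in votes.items()}


if __name__ == "__main__":
    # Toy ToC strings, invented for illustration only.
    tocs = [
        "introduction neural networks training",
        "introduction relational databases",
        "neural networks deep learning",
    ]
    vocab, vecs = tfidf_vectors(tocs)
    print(vocab)
    # Hypothetical node assignments and category labels for four books.
    print(label_nodes_by_majority([(0, 0), (1, 2), (0, 0), (0, 0)],
                                  ["CS", "Math", "Eng", "CS"]))
```

In practice, a library implementation such as scikit-learn's `TfidfVectorizer`, which applies a smoothed idf by default, would normally replace the hand-rolled weighting function.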

Giannopoulou, E., & Mitrou, N. (2020). An AI-Based Methodology for the Automatic Classification of a Multiclass Ebook Collection Using Information from the Tables of Contents. IEEE Access, 8, 218658–218675. https://doi.org/10.1109/ACCESS.2020.3041651