Development of Text Data Processing Pipeline for Scientific Systems

Abstract

The aim of this work was to develop a pipeline for processing scientific texts, including articles and abstracts, in order to categorize them, identify patterns, and build recommendations for users of scientific systems. The authors propose several text pre-processing methods and a method for cluster and classification analysis of texts, and have developed a software recommendation system for users of scientific publications. For data preprocessing, a parametric approach is proposed to extract a new semantic feature from textual publications: the type of scientific result. Extraction of the scientific result type is driven by the user’s need for content with a specific property. For clustering user profiles, an ensemble method with a change of distance metric is proposed. For classification, an entropy-based ensemble method is used. The efficiency of the proposed methods and algorithms was evaluated on the search module of the information system of the “Technologies in Education” International Congress of Conferences. The authors acknowledge support from the MEPhI Academic Excellence Project (Contract No. 02.a03.21.0005).

Cite

Guseva, A. I., Kuznetsov, I. A., Bochkaryov, P. V., Filippov, S. A., & Kireev, V. S. (2020). Development of Text Data Processing Pipeline for Scientific Systems. In Advances in Intelligent Systems and Computing (Vol. 948, pp. 124–136). Springer Verlag. https://doi.org/10.1007/978-3-030-25719-4_17
