Development of Text Data Processing Pipeline for Scientific Systems

Anna I. Guseva; Igor A. Kuznetsov; Pyotr V. Bochkaryov; Stanislav A. Filippov; Vasiliy S. Kireev

Conference Proceedings

Advances in Intelligent Systems and Computing (2020) 948 124-136

DOI: 10.1007/978-3-030-25719-4_17

Citations: 2

Readers: 7

Abstract

The aim of this work was to develop a processing pipeline for scientific texts, including articles and abstracts, in order to categorize them, identify patterns, and build recommendations for users of scientific systems. The authors propose several text preprocessing methods and methods for cluster and classification analysis of texts, and they developed a software recommendation system for users of scientific publications. For data preprocessing, a parametric approach is proposed for extracting a new semantic feature from textual publications: the type of scientific result. Extraction of the scientific result type is based solely on the user's need for content that has a specific property. For clustering user profiles, an ensemble method with a change of distance metric is proposed. For classification, an entropy-based ensemble method is used. The efficiency of the proposed methods and algorithms was evaluated on the search module of the information system of the “Technologies in Education” International Congress of Conferences. The authors acknowledge support from the MEPhI Academic Excellence Project (Contract No. 02.a03.21.0005).

Cite

APA

Guseva, A. I., Kuznetsov, I. A., Bochkaryov, P. V., Filippov, S. A., & Kireev, V. S. (2020). Development of Text Data Processing Pipeline for Scientific Systems. In Advances in Intelligent Systems and Computing (Vol. 948, pp. 124–136). Springer Verlag. https://doi.org/10.1007/978-3-030-25719-4_17
