Subtopic segmentation of scientific texts: parameter optimisation

Abstract

Searching for information within a scientific text requires automatically partitioning the document into subtopics, taking both the specifics of the text and the user's purposes into account. This task is important for selecting primary sources, for working with texts in foreign languages, and for getting acquainted with research problems. This paper focuses on the application of subtopic segmentation algorithms to real-life scientific texts. To study this, we use monographs on the same subject written in three languages. The corpus includes several original and professionally translated fragments. The research is based on the TextTiling algorithm, which analyses how tightly adjacent parts of the text cohere. We examine how several parameters (the cutoff rate, the size of the moving window, and the shift from one block to the next) influence segmentation quality, and we determine the optimal combinations of these parameters for several languages. The studies on Russian suggest that external lexical resources notably improve segmentation quality.
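To make the three parameters concrete, the following is a minimal sketch of a TextTiling-style segmenter, not the authors' implementation. It assumes that the moving window and the shift are measured in tokens, and that the cutoff rate is a multiplier on the standard deviation of the depth scores; the abstract does not define any of these, so the names `window`, `shift` and `cutoff` and their defaults are hypothetical. Smoothing, stop-word removal and the external lexical resources mentioned for Russian are left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; \\w is Unicode-aware, so Cyrillic text works too."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def gap_similarities(tokens, window, shift):
    """Compare the `window` tokens on each side of every candidate gap."""
    gaps, sims = [], []
    for gap in range(window, len(tokens) - window + 1, shift):
        left = Counter(tokens[gap - window:gap])
        right = Counter(tokens[gap:gap + window])
        gaps.append(gap)
        sims.append(cosine(left, right))
    return gaps, sims


def depth_scores(sims):
    """Depth of each valley: climb to the nearest peak on both sides."""
    depths = []
    for i, s in enumerate(sims):
        left = i
        while left > 0 and sims[left - 1] >= sims[left]:
            left -= 1
        right = i
        while right < len(sims) - 1 and sims[right + 1] >= sims[right]:
            right += 1
        depths.append((sims[left] - s) + (sims[right] - s))
    return depths


def segment(tokens, window=20, shift=10, cutoff=0.5):
    """Return token offsets of hypothesised subtopic boundaries.

    Assumption: a gap is a boundary when its depth exceeds
    mean(depth) - cutoff * std(depth), and is strictly positive.
    """
    gaps, sims = gap_similarities(tokens, window, shift)
    if not sims:
        return []
    depths = depth_scores(sims)
    mean = sum(depths) / len(depths)
    std = math.sqrt(sum((d - mean) ** 2 for d in depths) / len(depths))
    threshold = max(mean - cutoff * std, 0.0)
    return [g for g, d in zip(gaps, depths) if d > threshold]


if __name__ == "__main__":
    # Toy input: the vocabulary changes completely at token 40.
    toy = ["cat"] * 40 + ["dog"] * 40
    print(segment(toy, window=10, shift=10, cutoff=0.5))
```

In this sketch a larger `window` makes each comparison less noisy but blurs short subtopics, a smaller `shift` gives more candidate gaps at higher cost, and a larger `cutoff` lowers the threshold and so yields more boundaries. Finding the best combination would amount to a grid search over the three values against reference segmentations, which are not part of this example.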

Cite

Avdeeva, N., Artemova, G., Boyarsky, K., Gusarova, N., Dobrenko, N., & Kanevsky, E. (2015). Subtopic segmentation of scientific texts: parameter optimisation. In Communications in Computer and Information Science (Vol. 518, pp. 3–15). Springer Verlag. https://doi.org/10.1007/978-3-319-24543-0_1
