Tagging Scientific Publications Using Wikipedia and Natural Language Processing Tools

Michał Łopuszyński; Łukasz Bolikowski

Conference Proceedings

Communications in Computer and Information Science (2014) 416 CCIS 16-27

DOI: 10.1007/978-3-319-08425-1_3

Abstract

In this work, we compare two simple methods of tagging scientific publications with labels reflecting their content. The first source of labels is Wikipedia; the second label set is constructed from the noun phrases occurring in the analyzed corpus. We examine the statistical properties and the effectiveness of both approaches on a dataset consisting of abstracts from 0.7 million scientific documents deposited in the ArXiv preprint collection. We believe that the obtained tags can later be used as document features in various machine learning tasks (document similarity, clustering, topic modelling, etc.). © Springer International Publishing Switzerland 2014.
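
The abstract does not describe how labels are matched against the text, so the following is only a minimal sketch of one plausible reading of dictionary-based tagging: a fixed list of labels (standing in here for Wikipedia article titles) is matched against word n-grams of an abstract. The function names, the `max_len` parameter, and the example labels are hypothetical and are not taken from the paper. The noun-phrase variant would additionally require a part-of-speech tagger and is not shown.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tag_with_dictionary(text, labels, max_len=3):
    """Return the labels that occur in the text as n-grams of up to max_len words.

    Assumption: matching is exact after lowercasing and tokenization;
    the paper may use a different normalization.
    """
    # Normalize labels the same way as the text so that comparison is consistent.
    label_set = {" ".join(tokenize(label)) for label in labels}
    tokens = tokenize(text)
    found = set()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            ngram = " ".join(tokens[i:i + n])
            if ngram in label_set:
                found.add(ngram)
    return sorted(found)


# Hypothetical label list standing in for Wikipedia article titles.
example_labels = ["Topic model", "Machine learning", "Black hole"]
example_abstract = "We apply machine learning and a topic model to preprints."
tags = tag_with_dictionary(example_abstract, example_labels)
print(tags)
```

The resulting tag list could then serve as a sparse feature vector for a document, in the spirit of the downstream uses the abstract mentions.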

Łopuszyński, M., & Bolikowski, Ł. (2014). Tagging Scientific Publications Using Wikipedia and Natural Language Processing Tools. In Communications in Computer and Information Science (Vol. 416 CCIS, pp. 16–27). Springer Verlag. https://doi.org/10.1007/978-3-319-08425-1_3
