CiteSeerX data: Semanticizing scholarly papers

Jian Wu; Chen Liang; Huaiyu Yang; C. Lee Giles

Conference Proceedings (Open Access)

Proceedings of the ACM SIGMOD International Conference on Management of Data (2016)

DOI: 10.1145/2928294.2928306

6 Citations

23 Readers

Abstract

Scholarly big data is, for many, an important instance of Big Data. Digital library search engines have been built to acquire, extract, and ingest large volumes of scholarly papers. This paper provides an overview of the scholarly big data released by CiteSeerX, as of the end of 2015, and discusses various aspects such as how the data is acquired, its size, general quality, data management, and accessibility. Preliminary results on extracting semantic entities from the body text of scholarly papers with Wikifier show biases towards general terms appearing in Wikipedia and against domain-specific terms. We argue that the latter will play a more important role in extracting important facts from scholarly papers.

Cite

Wu, J., Liang, C., Yang, H., & Giles, C. L. (2016). CiteSeerX data: Semanticizing scholarly papers. In Proceedings of the ACM SIGMOD International Conference on Management of Data. Association for Computing Machinery. https://doi.org/10.1145/2928294.2928306
