Abstract
Scholarly big data is, for many, an important instance of Big Data. Digital library search engines have been built to acquire, extract, and ingest large volumes of scholarly papers. This paper provides an overview of the scholarly big data released by CiteSeerX as of the end of 2015 and discusses how the data is acquired, its size and general quality, and how it is managed and made accessible. Preliminary results on extracting semantic entities from the body text of scholarly papers with Wikifier show a bias toward general terms that appear in Wikipedia and against domain-specific terms. We argue that domain-specific terms will play a more important role in extracting key facts from scholarly papers.
Wu, J., Liang, C., Yang, H., & Giles, C. L. (2016). CiteSeerX data: Semanticizing scholarly papers. In Proceedings of the ACM SIGMOD International Conference on Management of Data. Association for Computing Machinery. https://doi.org/10.1145/2928294.2928306