A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications

Abstract

Purpose: Our study proposes a bootstrapping-based method to automatically extract data-usage statements from academic texts.

Design/methodology/approach: The method for data-usage statements extraction starts with seed entities and iteratively learns patterns and data-usage statements from unlabeled text. In each iteration, new patterns are constructed and added to the pattern list based on their calculated score. Three seed-selection strategies are also proposed in this paper.

Findings: The performance of the method is verified by means of experiments on real data collected from computer science journals. The results show that the method can achieve satisfactory performance regarding precision of extraction and extensibility of obtained patterns.

Research limitations: While the triple representation of sentences is effective and efficient for extracting data-usage statements, it is unable to handle complex sentences. Additional features that can address complex sentences should thus be explored in the future.

Practical implications: Data-usage statements extraction is beneficial for data-repository construction and facilitates research on data-usage tracking, dataset-based scholar search, and dataset evaluation.

Originality/value: To the best of our knowledge, this paper is among the first to address the important task of automatically extracting data-usage statements from real data.
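The iterative loop described under Design/methodology/approach can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring function, the pattern format, or the seed-selection strategies, so everything below is an assumption made for illustration. Sentences are assumed to be already reduced to (subject, predicate, object) triples, a pattern is assumed to be simply the predicate string, and the score is an RlogF-style measure swapped in as a stand-in. The names `score_pattern`, `bootstrap`, `threshold`, and the toy triples are all hypothetical.

```python
import math
from collections import defaultdict


def score_pattern(extracted, known):
    """Assumed RlogF-style score: share of known entities among the
    extractions, weighted by the log of the number of known hits."""
    hits = len(extracted & known)
    if hits == 0:
        return 0.0
    return (hits / len(extracted)) * math.log2(hits + 1)


def bootstrap(triples, seeds, threshold=0.3, max_iterations=10):
    """Grow a pattern list and an entity set from seed entities."""
    # Index the objects that each predicate (candidate pattern) extracts.
    objects_by_predicate = defaultdict(set)
    for _subject, predicate, obj in triples:
        objects_by_predicate[predicate].add(obj)

    entities = set(seeds)
    patterns = set()
    for _ in range(max_iterations):
        # Score every unused candidate against the entities known so far.
        accepted = {
            predicate
            for predicate, extracted in objects_by_predicate.items()
            if predicate not in patterns
            and score_pattern(extracted, entities) >= threshold
        }
        if not accepted:
            break  # no new pattern passed the threshold, so stop
        patterns |= accepted
        # Entities extracted by the new patterns feed the next iteration.
        for predicate in accepted:
            entities |= objects_by_predicate[predicate]

    # Triples matched by an accepted pattern count as data-usage statements.
    statements = [t for t in triples if t[1] in patterns]
    return patterns, entities, statements


# Toy triples, invented for this example only.
TRIPLES = [
    ("we", "evaluate on", "MNIST"),
    ("we", "evaluate on", "CIFAR-10"),
    ("experiments", "conducted on", "CIFAR-10"),
    ("experiments", "conducted on", "Reuters"),
    ("we", "thank", "reviewers"),
]

patterns, entities, statements = bootstrap(TRIPLES, seeds={"MNIST"})
print(sorted(patterns))
print(sorted(entities))
```

In this toy run a single seed is enough to reach a second pattern through an entity that both patterns share, which is the mechanism the abstract describes: patterns found in one iteration supply the entities used to score candidates in the next. The predicate-only pattern also reflects the stated limitation, since a complex sentence does not reduce cleanly to one triple.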

Citation

Zhang, Q., Cheng, Q., Huang, Y., & Lu, W. (2016). A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications. Journal of Data and Information Science, 1(1), 69–85. https://doi.org/10.20309/jdis.201606
