Definitions of dataset in the scientific and technical literature

  • Renear A
  • Sacchi S
  • Wickett K


The integration of heterogeneous data in varying formats
and from diverse communities requires an improved
understanding of the concept of a dataset, and of key
related concepts, such as format, encoding, and version.
Ultimately, a normative formal framework of such concepts
will be needed to support the effective curation, integration,
and use of shared multi-disciplinary scientific data. To
prepare for the development of this framework, we reviewed
the definitions of dataset found in technical documentation
and the scientific literature. Four basic features can be
identified as common to most definitions: grouping,
content, relatedness, and purpose. In this summary of our
results, we describe each of these features, indicating the
directions a more formal analysis might take.

Author-supplied keywords

  • Data Curation
  • Dataset
  • Information Organization
  • Data Conservancy


  • Allen H. Renear

  • Simone Sacchi

  • Karen M. Wickett
