Reusable Templates and Guides For Documenting Datasets and Models for Natural Language Processing and Generation: A Case Study of the HuggingFace and GEM Data and Model Cards

Abstract

Developing documentation guidelines and easy-to-use templates for datasets and models is a challenging task, especially given the variety of backgrounds, skills, and incentives of the people involved in the building of natural language processing (NLP) tools. Nevertheless, the adoption of standard documentation practices across the field of NLP promotes more accessible and detailed descriptions of NLP datasets and models, while supporting researchers and developers in reflecting on their work. To help with the standardization of documentation, we present two case studies of efforts that aim to develop reusable documentation templates – the HuggingFace data card, a general purpose card for datasets in NLP, and the GEM benchmark data and model cards with a focus on natural language generation. We describe our process for developing these templates, including the identification of relevant stakeholder groups, the definition of a set of guiding principles, the use of existing templates as our foundation, and iterative revisions based on feedback.

Cite

McMillan-Major, A., Osei, S., Rodriguez, J. D., Ammanamanchi, P. S., Gehrmann, S., & Jernite, Y. (2021). Reusable Templates and Guides For Documenting Datasets and Models for Natural Language Processing and Generation: A Case Study of the HuggingFace and GEM Data and Model Cards. In GEM 2021 - 1st Workshop on Natural Language Generation, Evaluation, and Metrics, Proceedings (pp. 121–135). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.gem-1.11
