PYTHON SCRIPTS FOR WEB SCRAPING METADATA FROM DESCRIPTIONS ABOUT THE DATASETS OF THE INTERNATIONAL SCENARIO OF RESEARCH DATA REPOSITORIES

Abstract

Objective: Research data repositories are an evolution of document repositories that aim to provide access to and preserve all materials used before, during, and after scientific research. In this context, this study conducts an exploratory and descriptive investigation of the international scenario of data repositories by monitoring the descriptive metadata recorded for them in the Registry of Research Data Repositories (re3data.org).
Methods: The process requires knowledge of the techniques and technologies used for descriptive data analysis, information retrieval, data manipulation, and data visualization. Three scripts written in Python 3.11 are provided: they collect metadata from re3data and convert it for visualization in software such as VOSviewer. They are accompanied by a dataset containing the metadata descriptions of the repositories and the conversions used for network visualization. In a collection carried out on 05/05/2023, 3108 links to repository descriptions were retrieved. The data and scripts were created for this methodological experiment and are shared in the Zenodo data repository (https://doi.org/10.5281/zenodo.7903109). The dataset has a root directory with three subdirectories: (scripts), with the Python code files (.py); (data), with text files of tab-separated values (.TSV) and a file in RIS (Research Information Systems) format; and (env), with the Python libraries required to run the scripts.
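The collection and conversion steps described in the Methods could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' scripts: it assumes the repository listing is available as XML in which each `<repository>` element has `<id>`, `<name>`, and `<link href="...">` children. The function names `parse_repository_list` and `to_tsv`, the identifiers, and the example.org URLs are hypothetical. Network access is left out so the sketch runs offline.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a repository listing (assumption:
# the real listing may use different element names or attributes).
SAMPLE_XML = """<list>
  <repository>
    <id>r3d000000001</id>
    <name>Example Repository A</name>
    <link href="https://example.org/api/repository/r3d000000001" rel="self"/>
  </repository>
  <repository>
    <id>r3d000000002</id>
    <name>Example Repository B</name>
    <link href="https://example.org/api/repository/r3d000000002" rel="self"/>
  </repository>
</list>"""


def parse_repository_list(xml_text):
    """Return one dict (id, name, link) per <repository> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for repo in root.iter("repository"):
        link = repo.find("link")
        rows.append({
            "id": (repo.findtext("id") or "").strip(),
            "name": (repo.findtext("name") or "").strip(),
            # A missing <link> element yields an empty string.
            "link": link.get("href", "") if link is not None else "",
        })
    return rows


def to_tsv(rows, fieldnames=("id", "name", "link")):
    """Serialize the rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_tsv(parse_repository_list(SAMPLE_XML)), end="")
```

In a real run, the XML text would come from an HTTP request to the registry, and the TSV text would be written to a file in the (data) directory.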
Potential for reuse: The research method applied to this dataset is based on automated extraction of re3data metadata and network visualization. Once the data have been collected and analyzed, researchers can use the descriptions extracted from the Registry of Research Data Repositories (re3data) to visualize the international scenario of research data repositories verified by re3data. This allows ethical monitoring of the number of research data repositories registered in re3data, along with their areas, institutions, countries, the languages of the research data, the typology of repositories and deposited data, their themes, areas of knowledge, types of access, licenses, and software used. Other questions can also be raised while interpreting the data. The community of Librarianship and Information Science professionals needs to share both these research data and the techniques used to extract them. Finally, the data make it possible to assess whether research data repositories are heterogeneous data sources that enable access to and preservation of a wide range of research data types.
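The kind of monitoring described above (counting repositories by country, subject, and so on) could be approximated with a simple tally over a TSV file. The sketch below is illustrative only: the column names `countries` and `subjects`, the `;` separator for multi-valued cells, the sample rows, and the helper `count_values` are all assumptions, not the layout of the published dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical TSV content; the real files may use other columns
# and another convention for multi-valued cells.
SAMPLE_TSV = (
    "id\tcountries\tsubjects\n"
    "repo-1\tDEU;USA\tLife Sciences\n"
    "repo-2\tUSA\tLife Sciences;Humanities\n"
    "repo-3\tBRA\tHumanities\n"
)


def count_values(tsv_text, column, separator=";"):
    """Count how many repositories mention each value in `column`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        # Split multi-valued cells and ignore empty fragments.
        for value in (row.get(column) or "").split(separator):
            value = value.strip()
            if value:
                counts[value] += 1
    return counts


if __name__ == "__main__":
    for country, n in count_values(SAMPLE_TSV, "countries").most_common():
        print(country, n)
```

The same function can be pointed at any other column, which is one way to compare the distribution of licenses, access types, or software across the registry.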

Cite

Semeler, A. R., Oliveira, A. L., Matiquite, P. C. S., & Pereira, F. A. (2023). PYTHON SCRIPTS FOR WEB SCRAPING METADATA FROM DESCRIPTIONS ABOUT THE DATASETS OF THE INTERNATIONAL SCENARIO OF RESEARCH DATA REPOSITORIES. Encontros Bibli, 28. https://doi.org/10.5007/1518-2924.2023.e94877
