Datasets, Corpora and other Language Resources

Victoria Arranz; Khalid Choukri; Valérie Mapelli; Mickaël Rigault; Penny Labropoulou; Miltos Deligiannis; Leon Voukoutis; Stelios Piperidis

Book Chapter, Open Access

Springer Science and Business Media Deutschland GmbH (2023), pp. 151–169

DOI: 10.1007/978-3-031-17258-8_8

Abstract

This chapter provides an overview of what is available in ELG in terms of datasets, corpora and other language resources (LRs) and how this has been achieved. We look at the procedures and steps that have been followed to complete the full resource ingestion cycle, which goes from repository and LR identification to metadata description and ingestion. We explain the approaches, priorities and methodology. The chapter also outlines the repositories that have been integrated into ELG, discussing the different procedures followed (metadata conversion, extraction, and completion, as well as harvesting) and the reasons behind these choices. Furthermore, the ELG catalogue content is described, with details on key elements and features as well as accomplishments. The last two sections are devoted to the crucial legal issues behind such a complex platform and its data management plan, respectively.

Citation (APA)

Arranz, V., Choukri, K., Mapelli, V., Rigault, M., Labropoulou, P., Deligiannis, M., … Piperidis, S. (2023). Datasets, Corpora and other Language Resources. In Cognitive Technologies (pp. 151–169). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-17258-8_8
