Google dataset search: Building a search engine for datasets in an open web ecosystem

Abstract

There are thousands of data repositories on the Web, providing access to millions of datasets. National and regional governments, scientific publishers and consortia, commercial data providers, and others publish data for fields ranging from social science to life science to high-energy physics to climate science and more. Access to this data is critical to facilitating reproducibility of research results, enabling scientists to build on others' work, and providing data journalists easier access to information and its provenance. In this paper, we discuss Google Dataset Search, a dataset-discovery tool that provides search capabilities over potentially all datasets published on the Web. The approach relies on an open ecosystem, where dataset owners and providers publish semantically enhanced metadata on their own sites. We then aggregate, normalize, and reconcile this metadata, providing a search engine that lets users find datasets in the “long tail” of the Web. In this paper, we discuss both social and technical challenges in building this type of tool, and the lessons that we learned from this experience.

Cite

Noy, N., Burgess, M., & Brickley, D. (2019). Google dataset search: Building a search engine for datasets in an open web ecosystem. In The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019 (pp. 1365–1375). Association for Computing Machinery, Inc. https://doi.org/10.1145/3308558.3313685
