Hyphe, a curation-oriented approach to web crawling for the social sciences

Abstract

The web is a field of investigation for the social sciences, and platform-based studies have long proven their relevance. However, the generic web is rarely studied in itself, though it contains crucial aspects of the embodiment of social actors: personal blogs, institutional websites, hobby-specific media… We realized that some sociologists see existing web crawlers as "black boxes" unsuitable for research, even though they are willing to study the broad web. In this paper we present Hyphe, a crawler developed with and for social scientists, with an innovative "curation-oriented" approach. We expose the problems of using web-mining techniques in social science research and show how to overcome them through specific features such as step-by-step corpus building and a memory structure that allows researchers to dynamically redefine the granularity of their "web entities".

Jacomy, M., Girard, P., Ooghe-Tabanou, B., & Venturini, T. (2016). Hyphe, a curation-oriented approach to web crawling for the social sciences. In Proceedings of the 10th International Conference on Web and Social Media, ICWSM 2016 (pp. 595–598). AAAI Press. https://doi.org/10.1609/icwsm.v10i1.14777
