Four datasets derived from an archive of personal homepages (1995–2009)

Sean C. Rife

Journal Article, Open Access

Data (2017) 2(2)

DOI: 10.3390/data2020019

Abstract

While data from social media are easily accessible, understanding how individuals expressed themselves on the Internet in its initial years of public availability (the mid-to-late 1990s) has proved difficult. In this data deposit, I describe how archival data from Geocities homepages were retrieved, processed to remove non-text data, and then further refined into separate datasets, each of which provides unique insights into modes of personal expression on the early Internet. The present paper describes four datasets, all derived from a larger collection of personal websites: (1) a large corpus of raw text data from Geocities personal homepages, (2) a linguistic analysis of basic psychological properties of the same Geocities pages, using an open-source implementation of the Linguistic Inquiry and Word Count (LIWC), (3) a dataset of links between homepages (suitable for network analysis), and (4) a manifest dataset summarizing the size and last-updated date of each file in the dataset. Data from over 378,000 Geocities pages are included. In addition to providing a detailed description of how these datasets were created, I describe how they might be utilized in future research.
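
As a rough illustration of the kind of processing the abstract describes (stripping non-text content from a page and recording its outgoing links), the following is a minimal sketch using only the Python standard library. It is not the author's pipeline: the class name `PageExtractor`, the helper `extract`, and the sample page and URL are all hypothetical, and the actual datasets may have been produced quite differently.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Hypothetical helper: collects visible text and link targets from one HTML page."""

    SKIP_TAGS = {"script", "style"}  # element content that is not human-readable text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_parts = []   # visible text fragments, in document order
        self.links = []        # href values of <a> tags, in document order
        self._skip_depth = 0   # > 0 while inside a <script> or <style> element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract(html):
    """Return (plain_text, links) for a single page given as a string."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


if __name__ == "__main__":
    # Made-up sample page; the URL is illustrative only.
    sample = (
        "<html><head><style>b{}</style></head><body><h1>Hi</h1>"
        '<p>My <a href="http://www.geocities.com/Area51/1234/">friend</a></p>'
        "<script>var x=1;</script></body></html>"
    )
    text, links = extract(sample)
    print(text)
    print(links)
```

Applied to every page in a collection, the text output would correspond loosely to a raw text corpus and the (page, link) pairs to an edge list for network analysis; a real archive would additionally require handling of character encodings and malformed markup, which this sketch ignores.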

Cite (APA)

Rife, S. C. (2017). Four datasets derived from an archive of personal homepages (1995–2009). Data, 2(2). https://doi.org/10.3390/data2020019
