ProppLearner: Deeply annotating a corpus of Russian folktales to enable the machine learning of a Russian formalist theory

This article is free to access.

Abstract

I describe the collection and deep annotation of the semantics of a corpus of Russian folktales. This corpus, which I call the 'ProppLearner' corpus, was assembled to provide data for an algorithm designed to learn Vladimir Propp's morphology of Russian hero tales. The corpus is the most deeply annotated narrative corpus available at this time. The algorithm and learning results are described elsewhere; here, I provide detail on the layers of annotation and how they were chosen, novel layers of annotation required for successful learning, the selection of the texts for annotation, the annotation process itself, and the resulting inter-annotator agreement measures. In particular, the corpus comprised fifteen texts totaling 18,862 words. There were eighteen layers of annotation, five of which were developed specifically to support learning Propp's morphology: referent attributes, context relationships, event valences, Propp's 'dramatis personae', and Propp's functions. All annotations were created by trained annotators with the Story Workbench annotation tool, following a double-annotation paradigm. I discuss lessons learned from this effort and what they mean for future digital humanities efforts when working with the semantics of natural language text.

Cite

Finlayson, M. A. (2017). ProppLearner: Deeply annotating a corpus of Russian folktales to enable the machine learning of a Russian formalist theory. Digital Scholarship in the Humanities, 32(2), 284–300. https://doi.org/10.1093/llc/fqv067
