Abstract
I describe the collection and deep semantic annotation of a corpus of Russian folktales. This corpus, which I call the 'ProppLearner' corpus, was assembled to provide data for an algorithm designed to learn Vladimir Propp's morphology of Russian hero tales. At the time of writing, it is the most deeply annotated narrative corpus available. The algorithm and learning results are described elsewhere. Here, I detail the layers of annotation and how they were chosen, the novel layers of annotation required for successful learning, the selection of the texts for annotation, the annotation process itself, and the resulting inter-annotator agreement measures. The corpus comprises fifteen texts totaling 18,862 words and carries eighteen layers of annotation. Five of these layers were developed specifically to support learning Propp's morphology: referent attributes, context relationships, event valences, Propp's 'dramatis personae', and Propp's functions. All annotations were created by trained annotators using the Story Workbench annotation tool, following a double-annotation paradigm. I discuss lessons learned from this effort and their implications for future digital humanities work on the semantics of natural language text.
Citation
Finlayson, M. A. (2017). ProppLearner: Deeply annotating a corpus of Russian folktales to enable the machine learning of a Russian formalist theory. Digital Scholarship in the Humanities, 32(2), 284–300. https://doi.org/10.1093/llc/fqv067