Genre classification for a corpus of academic webpages

Erika Dalan; Serge Sharoff

Conference Proceedings, Open Access

Proceedings of the Annual Meeting of the Association for Computational Linguistics (2016) 90-98

DOI: 10.18653/v1/w16-2611

Abstract

In this paper we report our analysis of the similarities between webpages that are crawled from European academic websites, and comparison of their distribution in terms of the English language variety (native English vs English as a lingua franca) and their language family (based on the country’s official language). After building a corpus of university webpages, we selected a set of relevant descriptors that can represent their text types using the framework of the Functional Text Dimensions. Manual annotation of a random sample of academic pages provides the basis for classifying the remaining texts on each dimension. Reliable thresholds are then determined in order to evaluate precision and assess the distribution of text types by each dimension, with the ultimate goal of analysing language features over English varieties and language families.

Cite

Dalan, E., & Sharoff, S. (2016). Genre classification for a corpus of academic webpages. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 90–98). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-2611
