Genre classification for a corpus of academic webpages

Abstract

In this paper we report our analysis of the similarities between webpages crawled from European academic websites, and compare their distribution in terms of English language variety (native English vs. English as a lingua franca) and language family (based on each country’s official language). After building a corpus of university webpages, we selected a set of relevant descriptors that can represent their text types using the framework of Functional Text Dimensions. Manual annotation of a random sample of academic pages provides the basis for classifying the remaining texts on each dimension. Reliable thresholds are then determined in order to evaluate precision and assess the distribution of text types on each dimension, with the ultimate goal of analysing language features across English varieties and language families.
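
The abstract does not say how the thresholds are determined, so the following is only a minimal sketch of one plausible reading. It assumes that a classifier assigns each page a score on a single dimension, and that the manually annotated sample supplies binary gold labels. The function name `pick_threshold`, the `target_precision` parameter, and the toy data are hypothetical and do not come from the paper.

```python
def pick_threshold(scores, labels, target_precision=0.9):
    """Return the lowest score threshold whose precision on the annotated
    sample meets target_precision, or None if no threshold qualifies.

    scores: classifier scores for one dimension (hypothetical input)
    labels: 1 if the annotator assigned the dimension to the page, else 0
    """
    best = None
    # Try every observed score as a candidate threshold, highest first.
    for t in sorted(set(scores), reverse=True):
        # Gold labels of the pages that would be accepted at threshold t.
        accepted = [y for s, y in zip(scores, labels) if s >= t]
        precision = sum(accepted) / len(accepted)
        if precision >= target_precision:
            best = t  # a lower qualifying threshold keeps more pages
    return best


# Toy annotated sample (invented values, for illustration only).
sample_scores = [0.9, 0.8, 0.7, 0.6, 0.4]
sample_labels = [1, 1, 0, 1, 0]
threshold = pick_threshold(sample_scores, sample_labels, target_precision=0.75)
```

Choosing the lowest qualifying threshold keeps as many pages as possible while holding precision at the target level. A real study might also want to report recall on the annotated sample.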

Cite

Dalan, E., & Sharoff, S. (2016). Genre classification for a corpus of academic webpages. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 90–98). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-2611
