Abstract
In this paper we report our analysis of the similarities between webpages crawled from European academic websites, and we compare their distribution in terms of English language variety (native English vs. English as a lingua franca) and language family (based on the country's official language). After building a corpus of university webpages, we selected a set of relevant descriptors that can represent their text types using the framework of Functional Text Dimensions. Manual annotation of a random sample of academic pages provides the basis for classifying the remaining texts on each dimension. Reliable thresholds are then determined in order to evaluate precision and assess the distribution of text types along each dimension, with the ultimate goal of analysing language features across English varieties and language families.
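The abstract does not say how the thresholds are chosen. As an illustration only, the following minimal sketch assumes that each page receives a numeric classifier score on one dimension, and that the manually annotated sample supplies binary gold labels for that dimension. The function names, the toy data, and the 0.9 precision target are hypothetical and are not taken from the paper.

```python
from typing import List, Optional


def precision_at_threshold(
    scores: List[float], labels: List[int], threshold: float
) -> Optional[float]:
    """Precision when every page scoring >= threshold is predicted positive.

    Returns None if no page reaches the threshold.
    """
    predicted = [label for score, label in zip(scores, labels) if score >= threshold]
    if not predicted:
        return None
    return sum(predicted) / len(predicted)


def lowest_reliable_threshold(
    scores: List[float], labels: List[int], target_precision: float = 0.9
) -> Optional[float]:
    """Smallest observed score whose precision meets the (assumed) target.

    Candidates are scanned in ascending order, so the first hit keeps
    as many pages as possible while still satisfying the target.
    """
    for candidate in sorted(set(scores)):
        precision = precision_at_threshold(scores, labels, candidate)
        if precision is not None and precision >= target_precision:
            return candidate
    return None


# Toy annotated sample for a single dimension (hypothetical values).
sample_scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
sample_labels = [0, 0, 1, 1, 1, 1]
threshold = lowest_reliable_threshold(sample_scores, sample_labels, target_precision=0.9)
```

Precision does not always rise monotonically with the threshold, so the sketch checks every observed score rather than using a binary search. Taking the lowest qualifying threshold is one possible design choice; it favours coverage of the unannotated pages once the precision target is met.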
Cite
Dalan, E., & Sharoff, S. (2016). Genre classification for a corpus of academic webpages. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 90–98). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-2611