The CELI corpus: Design and linguistic annotation of a new online learner corpus

This article is free to access.

Abstract

This article introduces the CELI corpus, a new learner corpus of written Italian consisting of ca. 600,000 tokens, evenly distributed among the CEFR (Common European Framework of Reference for Languages) proficiency levels B1, B2, C1 and C2. The collected texts derive from the language certification exams administered worldwide by the University for Foreigners of Perugia. The corpus contains rich metadata on text-related and learner-related variables. It expands the domain of learner corpora by, among other things, being freely available online to the research community and focusing on a target language other than English. The article also presents and evaluates the POS-tagging procedure, thus contributing to best practices in learner corpus annotation.

Cite

Spina, S., Fioravanti, I., Forti, L., & Zanda, F. (2024, April 1). The CELI corpus: Design and linguistic annotation of a new online learner corpus. Second Language Research. SAGE Publications Ltd. https://doi.org/10.1177/02676583231176370
