CANELC: Constructing an e-language Corpus

  • Knight D
  • Adolphs S
  • Carter R

Abstract

This paper reports on the construction of the Cambridge and Nottingham e-language Corpus (CANELC). CANELC is a one-million-word corpus of digital communication in English, taken from online discussion boards, blogs, tweets, e-mails and Short Message Services (SMS). The paper outlines the approaches used when planning the corpus: obtaining consent, collecting the data and compi...

Note: This corpus has been built as part of a collaborative project between the University of Nottingham and Cambridge University Press, with whom sole copyright of the annotated corpus resides. CANELC comprises one million words of digital English taken from SMS messages, blogs, Tweets, discussion board content and private/business e-mails. Plans to extend the corpus are under discussion. The legal dimension to corpus ‘ownership’ of some forms of unannotated data is a complex one and is under constant review. At present, the annotated corpus is only available to authors and researchers working for CUP and is not more generally available.

Author-supplied keywords

  • Blogs
  • Corpus linguistics
  • Discussion boards
  • E-language
  • SMS
  • Tweets
