Introducing and evaluating ukWaC, a very large web-derived corpus of English

  • Ferraresi A
  • Zanchetta E
  • Baroni M
  • et al.

Abstract

In this paper we introduce ukWaC, a large corpus of English constructed by crawling the .uk Internet domain. The corpus contains more than 2 billion tokens and is one of the largest freely available linguistic resources for English. The paper describes the tools and methodology used in the construction of the corpus and provides a qualitative evaluation of its contents, carried out through a vocabulary-based comparison with the BNC. We conclude by giving practical information about availability and format of the corpus.

Cite

Ferraresi, A., Zanchetta, E., Baroni, M., & Bernardini, S. (2008). Introducing and evaluating ukWaC, a very large web-derived corpus of English. In Proceedings of the 4th Web as Corpus Workshop (WAC-4). Can we beat Google? (pp. 47–54).
