Using the Web to Obtain Frequencies for Unseen Bigrams


Abstract

This article shows that the Web can be employed to obtain frequencies for bigrams that are unseen in a given corpus. We describe a method for retrieving counts for adjective-noun, noun-noun, and verb-object bigrams from the Web by querying a search engine. We evaluate this method by demonstrating (a) a high correlation between Web frequencies and corpus frequencies; (b) a reliable correlation between Web frequencies and plausibility judgments; (c) a reliable correlation between Web frequencies and frequencies recreated using class-based smoothing; and (d) good performance of Web frequencies in a pseudodisambiguation task.
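To make evaluation steps (a) and (d) concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the Web counts have already been retrieved and stored in a dictionary, since the abstract names no search engine or query format. The log transform, the function names, and all counts are illustrative assumptions, and the pseudodisambiguation function is a simplified stand-in for the task.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def log_count_correlation(web_counts, corpus_counts):
    """Correlate log10 counts over bigrams that both sources have seen.

    Assumption: counts are log-transformed and zero counts are skipped.
    """
    shared = sorted(
        b for b in web_counts
        if corpus_counts.get(b, 0) > 0 and web_counts[b] > 0
    )
    xs = [math.log10(web_counts[b]) for b in shared]
    ys = [math.log10(corpus_counts[b]) for b in shared]
    return pearson(xs, ys)

def pick_by_web_count(web_counts, head, candidates):
    """Simplified pseudodisambiguation: choose the candidate whose bigram
    with `head` has the larger Web count (unseen bigrams count as 0)."""
    return max(candidates, key=lambda c: web_counts.get((head, c), 0))

# Invented toy counts, for illustration only.
web = {("eat", "apple"): 100000, ("eat", "soup"): 10000, ("eat", "cloud"): 1000}
corpus = {("eat", "apple"): 100, ("eat", "soup"): 10, ("eat", "cloud"): 1}

r = log_count_correlation(web, corpus)
choice = pick_by_web_count(web, "eat", ["cloud", "apple"])
```

In this toy data the Web counts are a constant multiple of the corpus counts, so the log counts are perfectly linearly related. Real counts would be noisier.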

Citation (APA)

Keller, F., & Lapata, M. (2003). Using the Web to Obtain Frequencies for Unseen Bigrams. Computational Linguistics. MIT Press Journals. https://doi.org/10.1162/089120103322711604
