This paper shows that the web can be employed to obtain frequencies for bigrams that are unseen in a given corpus. We describe a method for retrieving counts for adjective-noun, noun-noun, and verb-object bigrams from the web by querying a search engine. We evaluate this method by demonstrating that web frequencies correlate with frequencies obtained from a carefully edited, balanced corpus. We also perform a task-based evaluation, showing that web frequencies can reliably predict human plausibility judgments.
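The two steps summarised above, counting a bigram on the web and checking those counts against corpus counts, can be illustrated with a minimal sketch. The abstract specifies neither the search engine nor the exact query syntax, so everything here is an assumption. `hit_count` is a hypothetical stand-in for a search-engine lookup. The caller supplies the inflected word forms, and counts are summed over one exact-phrase query per form combination. Correlation is computed as Pearson's r on log(1 + count), which is a common choice for skewed frequency data but is not confirmed here as the paper's procedure.

```python
import math
from itertools import product
from typing import Callable, Iterable, Sequence


def phrase_queries(first_forms: Iterable[str], second_forms: Iterable[str]) -> list[str]:
    """Build one exact-phrase query per combination of word forms.

    The inflected forms are supplied by the caller (hypothetical input);
    no morphological analyser is assumed.
    """
    return [f'"{a} {b}"' for a, b in product(first_forms, second_forms)]


def web_frequency(queries: Sequence[str], hit_count: Callable[[str], int]) -> int:
    """Sum the hit counts reported for each query.

    `hit_count` is a hypothetical stand-in for a search-engine lookup.
    """
    return sum(hit_count(q) for q in queries)


def log_pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of log(1 + count); the +1 avoids log(0) for unseen bigrams."""
    lx = [math.log(1 + x) for x in xs]
    ly = [math.log(1 + y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    vx = sum((a - mx) ** 2 for a in lx)
    vy = sum((b - my) ** 2 for b in ly)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Fake hit counts stand in for real search-engine results.
    fake_hits = {'"fast car"': 900, '"fast cars"': 600}
    queries = phrase_queries(["fast"], ["car", "cars"])
    print(queries)  # ['"fast car"', '"fast cars"']
    print(web_frequency(queries, lambda q: fake_hits.get(q, 0)))  # 1500
```

Summing over inflected variants pools evidence for a single lemma pair. The log transform keeps a handful of very frequent bigrams from dominating the correlation.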
Keller, F., Lapata, M., & Ourioupina, O. (2002). Using the Web to Overcome Data Sparseness. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, EMNLP 2002 (pp. 230–237). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1118693.1118723