Using the Web to Overcome Data Sparseness

Abstract

This paper shows that the web can be employed to obtain frequencies for bigrams that are unseen in a given corpus. We describe a method for retrieving counts for adjective-noun, noun-noun, and verb-object bigrams from the web by querying a search engine. We evaluate this method by demonstrating that web frequencies correlate with frequencies obtained from a carefully edited, balanced corpus. We also perform a task-based evaluation, showing that web frequencies can reliably predict human plausibility judgments.
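The evaluation idea in the abstract, comparing web counts against corpus counts, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the bigrams, the counts in `TOY_COUNTS`, the helper names, the log-transform, and the choice of Pearson's coefficient. No search engine is queried; the web counts are hard-coded placeholders.

```python
import math

# Hypothetical (corpus_count, web_count) pairs, invented for illustration only.
TOY_COUNTS = {
    "hungry animal": (3, 400),
    "heavy rain": (10, 1500),
    "process text": (50, 6000),
    "fulfil obligation": (200, 30000),
}


def log_counts(counts):
    """Log-transform raw counts; assumes every count is strictly positive."""
    if any(c <= 0 for c in counts):
        raise ValueError("log transform needs positive counts")
    return [math.log10(c) for c in counts]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlate(table):
    """Correlate log corpus counts with log web counts for a bigram table."""
    corpus = log_counts([c for c, _ in table.values()])
    web = log_counts([w for _, w in table.values()])
    return pearson(corpus, web)


if __name__ == "__main__":
    print(f"r = {correlate(TOY_COUNTS):.3f}")
```

A bigram that is unseen in the corpus has a corpus count of zero, so it cannot enter this comparison directly; the sketch only covers bigrams with positive counts in both sources.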

Cite

Keller, F., Lapata, M., & Ourioupina, O. (2002). Using the Web to Overcome Data Sparseness. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, EMNLP 2002 (pp. 230–237). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1118693.1118723
