Corpus-based Semantic Lexicon Induction with Web-based Corroboration

Abstract

Various techniques have been developed to automatically induce semantic dictionaries from text corpora and from the Web. Our research combines corpus-based semantic lexicon induction with statistics acquired from the Web to improve the accuracy of automatically acquired domain-specific dictionaries. We use a weakly supervised bootstrapping algorithm to induce a semantic lexicon from a text corpus, and then issue Web queries to generate co-occurrence statistics between each lexicon entry and semantically related terms. The Web statistics provide a source of independent evidence to confirm, or disconfirm, that a word belongs to the intended semantic category. We evaluate this approach on 7 semantic categories representing two domains. Our results show that the Web statistics dramatically improve the ranking of lexicon entries, and can also be used to filter incorrect entries.
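
The corroboration step described above can be illustrated with a minimal sketch. The abstract does not say which co-occurrence statistic is computed, so this example assumes pointwise mutual information (PMI) over page hit counts. The function names, the `hits` lookup, and all counts are hypothetical stand-ins for real Web queries, not the authors' implementation.

```python
import math
from typing import Callable, List

# Made-up hit counts standing in for Web query results (illustrative only).
TOY_COUNTS = {
    frozenset(["ebola"]): 1_000,
    frozenset(["table"]): 50_000,
    frozenset(["disease"]): 20_000,
    frozenset(["ebola", "disease"]): 400,
    frozenset(["table", "disease"]): 100,
}
TOTAL_PAGES = 1_000_000  # assumed size of the indexed collection


def toy_hits(*terms: str) -> int:
    """Hypothetical lookup: number of pages containing all the given terms."""
    return TOY_COUNTS.get(frozenset(terms), 0)


def pmi(joint: int, x: int, y: int, total: int) -> float:
    """Pointwise mutual information (log base 2) from raw hit counts."""
    if joint == 0 or x == 0 or y == 0:
        return float("-inf")  # no co-occurrence evidence at all
    return math.log2(joint * total / (x * y))


def web_score(word: str, related: List[str],
              hits: Callable[..., int], total: int) -> float:
    """Average PMI between a lexicon entry and the related terms."""
    vals = [pmi(hits(word, t), hits(word), hits(t), total) for t in related]
    return sum(vals) / len(vals)


def rerank(lexicon: List[str], related: List[str],
           hits: Callable[..., int], total: int) -> List[str]:
    """Order lexicon entries by Web score, strongest evidence first."""
    return sorted(lexicon, key=lambda w: web_score(w, related, hits, total),
                  reverse=True)


def keep_confirmed(lexicon: List[str], related: List[str],
                   hits: Callable[..., int], total: int,
                   threshold: float = 0.0) -> List[str]:
    """Drop entries whose Web score does not exceed the threshold."""
    return [w for w in lexicon
            if web_score(w, related, hits, total) > threshold]


ranked = rerank(["table", "ebola"], ["disease"], toy_hits, TOTAL_PAGES)
kept = keep_confirmed(["table", "ebola"], ["disease"], toy_hits, TOTAL_PAGES)
```

In this toy setup the candidate that co-occurs with the related term more often than chance would predict moves to the top of the ranking, and the threshold removes the candidate with a negative score. The threshold value of 0.0 is an arbitrary choice for illustration.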

Cite

Igo, S. P., & Riloff, E. (2009). Corpus-based Semantic Lexicon Induction with Web-based Corroboration. In NAACL HLT 2009 - Unsupervised and Minimally Supervised Learning of Lexical Semantics, Proceedings of the Workshop (pp. 18–26). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1641968.1641971
