Corpus-based Semantic Lexicon Induction with Web-based Corroboration

Sean P. Igo; Ellen Riloff

Conference Proceedings

NAACL HLT 2009 - Unsupervised and Minimally Supervised Learning of Lexical Semantics, Proceedings of the Workshop (2009) 18-26

DOI: 10.3115/1641968.1641971

22 Citations

93 Readers

Abstract

Various techniques have been developed to automatically induce semantic dictionaries from text corpora and from the Web. Our research combines corpus-based semantic lexicon induction with statistics acquired from the Web to improve the accuracy of automatically acquired domain-specific dictionaries. We use a weakly supervised bootstrapping algorithm to induce a semantic lexicon from a text corpus, and then issue Web queries to generate co-occurrence statistics between each lexicon entry and semantically related terms. The Web statistics provide a source of independent evidence to confirm, or disconfirm, that a word belongs to the intended semantic category. We evaluate this approach on 7 semantic categories representing two domains. Our results show that the Web statistics dramatically improve the ranking of lexicon entries, and can also be used to filter incorrect entries.

Citation (APA)

Igo, S. P., & Riloff, E. (2009). Corpus-based Semantic Lexicon Induction with Web-based Corroboration. In NAACL HLT 2009 - Unsupervised and Minimally Supervised Learning of Lexical Semantics, Proceedings of the Workshop (pp. 18–26). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1641968.1641971
