Unsupervised Acquisition of Comprehensive Multiword Lexicons using Competition in an n-gram Lattice

  • Brooke J
  • Šnajder J
  • Baldwin T

Abstract

We present a new model for acquiring comprehensive multiword lexicons from large corpora based on competition among n-gram candidates. In contrast to the standard approach of simple ranking by association measure, in our model n-grams are arranged in a lattice structure based on subsumption and overlap relationships, with nodes inhibiting other nodes in their vicinity when they are selected as a lexical item. We show how the configuration of such a lattice can be optimized tractably, and demonstrate using annotations of sampled n-grams that our method consistently outperforms alternatives by at least 0.05 F-score across several corpora and languages.
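The competition idea in the abstract can be illustrated with a toy sketch. This is not the authors' lattice optimization: the scoring function (count times length), the greedy selection order, and the helper names `subsumes`, `overlaps`, and `select_lexicon` are all assumptions made for illustration. It only shows how selecting one n-gram can inhibit candidates related to it by subsumption or overlap.

```python
def subsumes(longer, shorter):
    """True if `shorter` is a contiguous sub-sequence of the strictly longer n-gram."""
    k = len(shorter)
    return len(longer) > k and any(
        longer[i:i + k] == shorter for i in range(len(longer) - k + 1)
    )


def overlaps(a, b):
    """True if a proper suffix of one n-gram equals a proper prefix of the other."""
    def suffix_prefix(x, y):
        return any(x[-k:] == y[:k] for k in range(1, min(len(x), len(y))))
    return suffix_prefix(a, b) or suffix_prefix(b, a)


def related(a, b):
    """Two candidates compete if one subsumes the other or they overlap."""
    return subsumes(a, b) or subsumes(b, a) or overlaps(a, b)


def select_lexicon(counts, min_count=2):
    """Greedy toy selection: pick high-scoring n-grams, inhibiting their neighbours.

    `counts` maps n-gram tuples to corpus frequencies. The score
    (frequency * length) is a hypothetical stand-in, not the paper's objective.
    """
    candidates = [g for g, c in counts.items() if c >= min_count]
    # Highest score first; ties broken by the n-gram itself for determinism.
    candidates.sort(key=lambda g: (-counts[g] * len(g), g))
    selected = []
    for gram in candidates:
        # A candidate is inhibited if any already-selected node is related to it.
        if not any(related(gram, chosen) for chosen in selected):
            selected.append(gram)
    return selected


counts = {
    ("new", "york"): 5,
    ("york", "city"): 5,
    ("new", "york", "city"): 5,
    ("city", "is"): 2,
    ("hot", "dog"): 3,
}
lexicon = select_lexicon(counts)
```

With these made-up counts, selecting "new york city" inhibits its sub-n-grams "new york" and "york city" as well as the overlapping "city is", while the unrelated "hot dog" is still selected.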

Cite

Brooke, J., Šnajder, J., & Baldwin, T. (2017). Unsupervised Acquisition of Comprehensive Multiword Lexicons using Competition in an n-gram Lattice. Transactions of the Association for Computational Linguistics, 5, 455–470. https://doi.org/10.1162/tacl_a_00073
