Automatically identifying the source words of lexical blends in English

Abstract

Newly coined words pose problems for natural language processing systems because they are not in a system's lexicon, and therefore no lexical information is available for such words. A common way to form new words is lexical blending, as in cosmeceutical, a blend of cosmetic and pharmaceutical. We propose a statistical model for inferring a blend's source words drawing on observed linguistic properties of blends; these properties are largely based on the recognizability of the source words in a blend. We annotate a set of 1,186 recently coined expressions which includes 515 blends, and evaluate our methods on a 324-item subset. In this first study of novel blends we achieve an accuracy of 40% on the task of inferring a blend's source words, which corresponds to a reduction in error rate of 39% over an informed baseline. We also give preliminary results showing that our features for source word identification can be used to distinguish blends from other kinds of novel words. © 2010 Association for Computational Linguistics.
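The source-word task can be illustrated with a small sketch. This is not the authors' statistical model. It only enumerates candidate source-word pairs for a blend from a toy lexicon and ranks them with a crude recognizability proxy: the smaller of the two fractions of each source word that survives in the blend. The function names `candidate_pairs` and `rank_candidates`, the `min_part` parameter, the scoring rule, and the toy lexicon are all hypothetical, chosen for illustration.

```python
from itertools import product


def candidate_pairs(blend, lexicon, min_part=2):
    """Return {(w1, w2): score} for lexicon pairs that could have formed `blend`.

    A pair qualifies when, for some split point, w1 begins with the blend's
    prefix and w2 ends with the blend's suffix (e.g. cosme|ceutical).
    The score is a hypothetical recognizability proxy: the smaller of the
    fractions of w1 and w2 retained in the blend (higher = more recognizable).
    """
    blend = blend.lower()
    words = {w.lower() for w in lexicon} - {blend}
    scores = {}
    # Try every split that leaves at least `min_part` letters on each side.
    for i in range(min_part, len(blend) - min_part + 1):
        prefix, suffix = blend[:i], blend[i:]
        starts = [w for w in words if w.startswith(prefix)]
        ends = [w for w in words if w.endswith(suffix)]
        for w1, w2 in product(starts, ends):
            if w1 == w2:
                continue
            score = min(len(prefix) / len(w1), len(suffix) / len(w2))
            # Keep the best split if a pair matches at several split points.
            scores[(w1, w2)] = max(score, scores.get((w1, w2), 0.0))
    return scores


def rank_candidates(blend, lexicon):
    """Sort candidate source-word pairs from most to least plausible."""
    scores = candidate_pairs(blend, lexicon)
    return sorted(scores, key=lambda pair: (-scores[pair], pair))


if __name__ == "__main__":
    toy_lexicon = ["cosmetic", "cosmetics", "cosmos",
                   "pharmaceutical", "critical", "cosmonaut"]
    print(rank_candidates("cosmeceutical", toy_lexicon)[0])
    # → ('cosmetic', 'pharmaceutical')
```

Here "cosmetic" outranks "cosmetics" because a larger share of it survives in the blend. A practical system would need a full lexicon and richer evidence than this single length-based score.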

Citation (APA)

Cook, P., & Stevenson, S. (2010). Automatically identifying the source words of lexical blends in English. Computational Linguistics, 36(1), 129–150. https://doi.org/10.1162/coli.2010.36.1.36104