Newly coined words pose problems for natural language processing systems because they are not in a system's lexicon, and therefore no lexical information is available for such words. A common way to form new words is lexical blending, as in cosmeceutical, a blend of cosmetic and pharmaceutical. We propose a statistical model for inferring a blend's source words drawing on observed linguistic properties of blends; these properties are largely based on the recognizability of the source words in a blend. We annotate a set of 1,186 recently coined expressions which includes 515 blends, and evaluate our methods on a 324-item subset. In this first study of novel blends we achieve an accuracy of 40% on the task of inferring a blend's source words, which corresponds to a reduction in error rate of 39% over an informed baseline. We also give preliminary results showing that our features for source word identification can be used to distinguish blends from other kinds of novel words. © 2010 Association for Computational Linguistics.
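To make the task concrete, here is a minimal sketch of one way candidate source-word pairs could be enumerated for a blend such as cosmeceutical. It assumes that a blend is a prefix of the first source word followed by a suffix of the second; the abstract does not state this. The function name `candidate_source_pairs`, the `min_affix` parameter, and the toy lexicon are all hypothetical, and this is not presented as the authors' implementation. Choosing among the resulting candidates is where a statistical model would come in.

```python
def candidate_source_pairs(blend, lexicon, min_affix=2):
    """Enumerate (first, second) word pairs consistent with some split of `blend`.

    Assumption (illustrative only): the blend is a prefix of the first
    source word concatenated with a suffix of the second source word.
    """
    pairs = set()
    # Try every split point that leaves at least `min_affix` characters per side.
    for i in range(min_affix, len(blend) - min_affix + 1):
        prefix, suffix = blend[:i], blend[i:]
        firsts = [w for w in lexicon if w.startswith(prefix) and w != blend]
        seconds = [w for w in lexicon if w.endswith(suffix) and w != blend]
        for first in firsts:
            for second in seconds:
                if first != second:
                    pairs.add((first, second))
    return sorted(pairs)


# Hypothetical toy lexicon; a real system would use a large word list.
TOY_LEXICON = ["cosmetic", "cosmos", "pharmaceutical", "nutraceutical", "medical"]

pairs = candidate_source_pairs("cosmeceutical", TOY_LEXICON)
for first, second in pairs:
    print(first, "+", second)
```

Even with this tiny lexicon, more than one pair matches the split cosme + ceutical, which illustrates why candidates need to be ranked rather than simply enumerated.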
CITATION STYLE
Cook, P., & Stevenson, S. (2010). Automatically identifying the source words of lexical blends in English. Computational Linguistics, 36(1), 129–150. https://doi.org/10.1162/coli.2010.36.1.36104