Automatically Identifying the Source Words of Lexical Blends in English

  • Cook P
  • Stevenson S
  • 49


    Mendeley users who have this article in their library.
  • 6


    Citations of this article.


Newly coined words pose problems for natural language processing systems because they are not in a system’s lexicon, and therefore no lexical information is available for such words. A common way to form new words is lexical blending, as in cosmeceutical, a blend of cosmetic and pharmaceutical.We propose a statistical model for inferring a blend’s source words drawing on observed linguistic properties of blends; these properties are largely based on the recognizability of the source words in a blend. We annotate a set of 1,186 recently coined expressions which includes 515 blends, and evaluate our methods on a 324-item subset. In this first study of novel blends we achieve an accuracy of 40% on the task of inferring a blend’s source words, which corresponds to a reduction in error rate of 39% over an informed baseline. We also give preliminary results showing that our features for source word identification can be used to distinguish blends from other kinds of novel words.

Get free article suggestions today

Mendeley saves you time finding and organizing research

Sign up here
Already have an account ?Sign in

Find this document

Get full text


  • Paul Cook

  • Suzanne Stevenson

Cite this document

Choose a citation style from the tabs below

Save time finding and organizing research with Mendeley

Sign up for free