Unsupervised classification of verb noun multi-word expression tokens

Mona T. Diab; Madhav Krishna

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2009), Vol. 5449 LNCS, pp. 98–110

DOI: 10.1007/978-3-642-00382-0_8


Abstract

We address the problem of classifying multiword expression tokens in running text. We focus our study on Verb-Noun Constructions (VNC) that vary in their idiomaticity depending on context. VNC tokens are classified as either idiomatic or literal. Our approach hinges upon the assumption that a literal VNC will have more in common with its component words than an idiomatic one. Commonality is measured by contextual overlap. To this end, we set out to explore different contextual variations and different similarity measures. We also identify a new data set, OPAQUE, that comprises only non-decomposable VNC expressions. Our approach yields state-of-the-art performance, with an overall accuracy of 77.56% on a TEST data set and 81.66% on the newly characterized data set OPAQUE. © Springer-Verlag Berlin Heidelberg 2009.
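The core assumption (a literal VNC token shares more context with its component verb and noun than an idiomatic token does) can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words context, cosine similarity, averaging of the verb and noun scores, the threshold of 0.3, the function names `cosine` and `classify_vnc`, and the toy "hit roof" contexts are all hypothetical choices made for illustration. The paper itself explores several context variations and similarity measures.

```python
from collections import Counter
import math


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def classify_vnc(token_context, verb_profile, noun_profile, threshold=0.3):
    """Hypothetical classifier: label a VNC token 'literal' if its context
    overlaps enough with the contexts of its component words.

    token_context: words around this VNC occurrence.
    verb_profile / noun_profile: Counters of words seen around the verb and
    the noun when they occur on their own (assumed precomputed).
    threshold: assumed cut-off on the averaged similarity.
    """
    ctx = Counter(token_context)
    overlap = (cosine(ctx, verb_profile) + cosine(ctx, noun_profile)) / 2
    return "literal" if overlap >= threshold else "idiomatic"


# Toy, made-up profiles for the VNC "hit roof".
verb_profile = Counter(["ball", "hard", "ground", "bat", "hard"])
noun_profile = Counter(["house", "tile", "rain", "ladder", "house"])

literal_ctx = ["hail", "hard", "house", "rain"]    # "the hail hit the roof hard..."
idiomatic_ctx = ["dad", "angry", "bill", "saw"]    # "dad hit the roof when he saw the bill"

print(classify_vnc(literal_ctx, verb_profile, noun_profile))
print(classify_vnc(idiomatic_ctx, verb_profile, noun_profile))
```

In this toy setup the literal context shares "hard" with the verb profile and "house" and "rain" with the noun profile, so its averaged similarity (about 0.47) clears the threshold. The idiomatic context shares nothing and scores 0. A real system would build profiles from a large corpus and tune or replace the threshold.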

Citation (APA)

Diab, M. T., & Krishna, M. (2009). Unsupervised classification of verb noun multi-word expression tokens. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5449 LNCS, pp. 98–110). https://doi.org/10.1007/978-3-642-00382-0_8
