We present an unsupervised method for part-of-speech discovery that induces a system of word classes from the distributional properties of words in raw text. Our assumption is that the word pair formed by the left and right neighbors of a token is characteristic of the part of speech occurring at that position. Based on this assumption, we cluster all such word pairs according to the distribution of the middle words observed between them. The resulting centroid vectors are useful both for inducing a system of word classes and for correctly classifying ambiguous words.
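The idea of clustering neighbor pairs by their middle words can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not specify the similarity measure, the clustering algorithm, or any thresholds, so the cosine similarity, the greedy single-pass clustering, and the names `context_vectors`, `cosine`, `cluster_pairs`, and `threshold` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(tokens):
    """Map each (left, right) neighbor pair to counts of the middle words seen between them."""
    vectors = defaultdict(Counter)
    for left, middle, right in zip(tokens, tokens[1:], tokens[2:]):
        vectors[(left, right)][middle] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_pairs(vectors, threshold=0.5):
    """Greedy single-pass clustering (assumed algorithm, hypothetical threshold).

    Each pair joins the most similar existing cluster if the similarity to its
    centroid reaches the threshold; otherwise it opens a new cluster.
    """
    clusters = []  # each cluster: {"centroid": Counter, "pairs": [(left, right), ...]}
    for pair, vec in vectors.items():
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "pairs": [pair]})
        else:
            # Summed counts serve as the centroid; cosine ignores the scale.
            best["centroid"].update(vec)
            best["pairs"].append(pair)
    return clusters


tokens = "the cat sat on the mat the dog sat on the rug".split()
for cluster in cluster_pairs(context_vectors(tokens)):
    print(cluster["pairs"], dict(cluster["centroid"]))
```

In this sketch, the middle words that dominate a centroid suggest the word class that the clustered contexts select for; a realistic run would need a large corpus and frequency cutoffs, neither of which is described in the abstract.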
Rapp, R. (2007). Part-of-speech discovery by clustering contextual features. In Studies in Classification, Data Analysis, and Knowledge Organization (pp. 627–634). Kluwer Academic Publishers. https://doi.org/10.1007/978-3-540-70981-7_72