Modelling the lexicon in unsupervised part of speech induction

Greg Dubbin; Phil Blunsom

Conference Proceedings, Open Access

14th Conference of the European Chapter of the Association for Computational Linguistics 2014, EACL 2014 (2014) 116-125

DOI: 10.3115/v1/e14-1013

Abstract

Automatically inducing the syntactic part-of-speech categories for words in text is a fundamental task in Computational Linguistics. While the performance of unsupervised tagging models has been slowly improving, current state-of-the-art systems make the obviously incorrect assumption that all tokens of a given word type must share a single part-of-speech tag. This one-tag-per-type heuristic counters the tendency of Hidden Markov Model based taggers to overgenerate tags for a given word type. However, it is clearly incompatible with basic syntactic theory. In this paper we extend a state-of-the-art Pitman-Yor Hidden Markov Model tagger with an explicit model of the lexicon. In doing so we are able to incorporate a soft bias towards inducing few tags per type. We develop a particle filter for drawing samples from the posterior of our model and present empirical results that show that our model is competitive with and faster than the state-of-the-art without making any unrealistic restrictions. © 2014 Association for Computational Linguistics.

Cite

Dubbin, G., & Blunsom, P. (2014). Modelling the lexicon in unsupervised part of speech induction. In 14th Conference of the European Chapter of the Association for Computational Linguistics 2014, EACL 2014 (pp. 116–125). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/e14-1013
