A hierarchical Bayesian language model based on Pitman-Yor processes

374 citations
495 readers (Mendeley users who have this article in their library)

Abstract

We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called Pitman-Yor processes, which produce power-law distributions more closely resembling those in natural languages. We show that an approximation to the hierarchical Pitman-Yor language model recovers the exact formulation of interpolated Kneser-Ney, one of the best smoothing methods for n-gram language models. Experiments verify that our model gives cross-entropy results superior to interpolated Kneser-Ney and comparable to modified Kneser-Ney. © 2006 Association for Computational Linguistics.
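The abstract's central technical claim is that an approximation to the hierarchical Pitman-Yor model recovers interpolated Kneser-Ney. The Python sketch below illustrates that correspondence. It is a minimal sketch and relies on several assumptions:

- The model is read as a Chinese-restaurant hierarchy. Each context u is a restaurant, and the parent of u drops its earliest word.
- The base distribution below the empty context is uniform over a vocabulary of size V.
- The strength parameter theta is fixed at 0, and each word type occupies exactly one table per context, that is, t_uw = min(1, c_uw).

Under these assumptions the predictive rule P(w | u) = (c_uw - d * t_uw) / c_u + (d * t_u / c_u) * P(w | parent(u)) reduces to interpolated Kneser-Ney. In that rule, d is a per-context-length discount in [0, 1). The function names `kn_counts` and `predictive` are hypothetical, and so are the discount values in the usage example. This is not the paper's implementation. It also omits the posterior inference over seating arrangements that the full hierarchical model requires.

```python
from collections import defaultdict


def kn_counts(tokens, n):
    """Customer counts c[k][u][w] for contexts u of length k = 0..n-1.

    Assumes the one-table-per-word-type approximation (t_uw = min(1, c_uw)):
    only the longest contexts see raw n-gram counts; every shorter context
    receives one customer per distinct (child context, word) pair.
    """
    counts = [defaultdict(lambda: defaultdict(int)) for _ in range(n)]
    # Longest contexts (length n-1): raw n-gram counts from the data.
    for i in range(n - 1, len(tokens)):
        u = tuple(tokens[i - n + 1:i])
        counts[n - 1][u][tokens[i]] += 1
    # Shorter contexts: each occupied table in child context u sends one
    # customer to the parent context u[1:] (drop the earliest word). These
    # are exactly the Kneser-Ney continuation counts.
    for k in range(n - 1, 0, -1):
        for u, row in counts[k].items():
            for w in row:
                counts[k - 1][u[1:]][w] += 1
    return counts


def predictive(w, u, counts, discounts, vocab_size):
    """P(w | u) with strength theta = 0 and t_uw = min(1, c_uw)."""
    k = len(u)
    # Parent distribution: shorten the context; below the empty context the
    # base distribution is assumed uniform over the vocabulary.
    if k == 0:
        parent = 1.0 / vocab_size
    else:
        parent = predictive(w, u[1:], counts, discounts, vocab_size)
    row = counts[k].get(u)  # .get does not create missing entries
    if not row:  # context never seen: all mass goes to the parent
        return parent
    d = discounts[k]
    c_u = sum(row.values())  # total customers c_u in this restaurant
    t_u = len(row)           # total tables t_u (one per word type seen)
    c_uw = row.get(w, 0)
    t_uw = 1 if c_uw > 0 else 0
    return (c_uw - d * t_uw) / c_u + (d * t_u / c_u) * parent


if __name__ == "__main__":
    tokens = "a b a b a c b".split()
    counts = kn_counts(tokens, n=2)
    vocab = ["a", "b", "c", "d"]  # "d" never occurs in the toy data
    for w in vocab:
        print(w, predictive(w, ("a",), counts, discounts=[0.5, 0.75],
                            vocab_size=len(vocab)))
```

Fixing theta at 0 and seating one table per word type makes the lower-level customer counts collapse to continuation counts. That collapse is what turns the hierarchical predictive rule into interpolated Kneser-Ney. Keeping theta > 0 and tracking actual table counts would instead give the general hierarchical Pitman-Yor rule. For the probabilities to sum to 1, every word that occurs in the data must belong to the vocabulary of size V.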

Cited by

Bayesian nonparametrics (418 citations)

Bayesian unsupervised word segmentation with nested Pitman-Yor language modeling (198 citations)

The Handbook of Computational Linguistics and Natural Language Processing (167 citations)

Cite (APA style)

Teh, Y. W. (2006). A hierarchical Bayesian language model based on Pitman-Yor processes. In COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Vol. 1, pp. 985–992). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1220175.1220299

Readers' Seniority

PhD / Post grad / Masters / Doc: 260 (67%)
Researcher: 84 (22%)
Professor / Associate Prof.: 34 (9%)
Lecturer / Post doc: 9 (2%)

Readers' Discipline

Computer Science: 299 (80%)
Engineering: 31 (8%)
Mathematics: 25 (7%)
Linguistics: 21 (6%)

Article Metrics

Mentions
References: 1