Monolingual and cross-lingual probabilistic topic models and their applications in information retrieval


Abstract

Probabilistic topic models are a group of unsupervised generative machine learning models that can be effectively trained on large text collections. They model document content as a two-step generation process: documents are modeled as mixtures of latent topics, while topics are probability distributions over vocabulary words. Recently, significant research effort has been invested in transferring the probabilistic topic modeling concept from monolingual to multilingual settings. Novel topic models have been designed to work with parallel and comparable multilingual data (e.g., Wikipedia or news data discussing the same events). Probabilistic topic models offer an elegant way to represent content across different languages. Their probabilistic framework allows for easy integration into a language modeling framework for monolingual and cross-lingual information retrieval. Moreover, we present how to use the knowledge from the topic models in the tasks of cross-lingual event clustering, cross-lingual document classification, and the detection of cross-lingual semantic similarity of words. The tutorial also demonstrates how semantically similar words across languages are integrated as useful additional evidence in cross-lingual information retrieval models. © 2013 Springer-Verlag.
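The two-step generation process and its use for retrieval can be illustrated with a minimal sketch. It is not the authors' implementation: the two topics, the four-word vocabulary, the probabilities, and the function names (`word_prob`, `generate_document`, `query_log_likelihood`, `rank`) are all hypothetical, and the topic-based query likelihood shown is one common way to plug a topic model into a language modeling retrieval framework. In a cross-lingual setting, the per-topic word distributions would presumably be defined over the query-language vocabulary while each document's topic mixture is inferred in the document's own language; the toy example below stays monolingual.

```python
import math
import random

# Hypothetical toy model: 2 latent topics over a 4-word vocabulary.
# topic_word[k][w] = P(w | topic k); each distribution sums to 1.
topic_word = [
    {"goal": 0.5, "match": 0.4, "vote": 0.05, "party": 0.05},
    {"goal": 0.05, "match": 0.05, "vote": 0.5, "party": 0.4},
]

# doc_topic[d][k] = P(topic k | document d); each mixture sums to 1.
doc_topic = {
    "sports_doc": [0.9, 0.1],
    "politics_doc": [0.2, 0.8],
}


def word_prob(word, doc):
    """P(word | doc) = sum over topics k of P(word | k) * P(k | doc)."""
    return sum(
        p_topic * topic_word[k].get(word, 0.0)
        for k, p_topic in enumerate(doc_topic[doc])
    )


def generate_document(doc, length, rng):
    """Two-step generation: draw a topic from the document's mixture,
    then draw a word from that topic's word distribution."""
    words = []
    for _ in range(length):
        k = rng.choices(range(len(topic_word)), weights=doc_topic[doc])[0]
        vocab = list(topic_word[k])
        weights = [topic_word[k][w] for w in vocab]
        words.append(rng.choices(vocab, weights=weights)[0])
    return words


def query_log_likelihood(query, doc):
    """Log-probability that the document's topic mixture generates the query.
    Unsmoothed: assumes every query word is in the toy vocabulary."""
    return sum(math.log(word_prob(w, doc)) for w in query)


def rank(query):
    """Return document ids sorted by decreasing query log-likelihood."""
    return sorted(
        doc_topic,
        key=lambda d: query_log_likelihood(query, d),
        reverse=True,
    )
```

Calling `rank(["vote", "party"])` on this toy data places `politics_doc` first, because its topic mixture assigns more probability mass to the topic that favors those words. A practical system would also smooth the topic-based estimate with a standard document language model rather than rely on it alone.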

Moens, M. F., & Vulić, I. (2013). Monolingual and cross-lingual probabilistic topic models and their applications in information retrieval. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7814 LNCS, pp. 874–877). https://doi.org/10.1007/978-3-642-36973-5_106
