Efficient probabilistic latent semantic analysis through parallelization

Raymond Wan; Vo Ngoc Anh; Hiroshi Mamitsuka

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2009), Vol. 5839 LNCS, pp. 432–443

DOI: 10.1007/978-3-642-04769-5_38

11 Citations

14 Readers

Abstract

Probabilistic latent semantic analysis (PLSA) is considered an effective technique for information retrieval, but has one notable drawback: its dramatic consumption of computing resources, in terms of both execution time and internal memory. This drawback limits the practical application of the technique to document collections of modest size. In this paper, we look into the practice of implementing PLSA with the aim of improving its efficiency without changing its output. Recently, Hong et al. [2008] have shown how the execution time of PLSA can be improved by employing OpenMP for shared memory parallelization. We extend their work by also studying the effect of using it in combination with the Message Passing Interface (MPI) for distributed memory parallelization. We show how a more careful implementation of PLSA reduces execution time and memory costs by applying our method to several text collections commonly used in the literature. © 2009 Springer Berlin Heidelberg.
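
For readers unfamiliar with PLSA, the following is a minimal serial sketch of the standard EM iteration for the model, written in plain Python. It is not the authors' implementation and does not reproduce their OpenMP or MPI code. The function names (`plsa_em_step`, `fit_plsa`), the dense count-matrix layout, and the random initialization are assumptions made for illustration. The sketch does show why the method is costly: every iteration visits each nonzero document-word count once per topic. The loop over documents is marked in a comment as one plausible unit of work to divide among threads or processes. That placement is also an assumption, not a detail taken from the paper.

```python
import math
import random


def normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    if total == 0.0:
        # Assumption for this sketch: fall back to a uniform distribution.
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def plsa_em_step(counts, p_w_z, p_z_d):
    """One EM iteration of PLSA.

    counts[d][w] is the number of times word w occurs in document d.
    p_w_z[z][w] approximates P(w | z); p_z_d[d][z] approximates P(z | d).
    Returns the updated parameters and the log-likelihood of the data
    under the parameters that were passed in.
    """
    n_docs, n_words, n_topics = len(counts), len(counts[0]), len(p_w_z)
    new_w_z = [[0.0] * n_words for _ in range(n_topics)]
    new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
    log_lik = 0.0
    # Each document's posteriors are computed independently of the others;
    # only the accumulation into new_w_z is shared across documents.
    for d in range(n_docs):
        for w in range(n_words):
            n = counts[d][w]
            if n == 0:
                continue
            # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
            joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
            total = sum(joint)
            log_lik += n * math.log(total)
            # M-step accumulation: expected counts weighted by n(d, w).
            for z in range(n_topics):
                expected = n * joint[z] / total
                new_w_z[z][w] += expected
                new_z_d[d][z] += expected
    new_w_z = [normalize(row) for row in new_w_z]
    new_z_d = [normalize(row) for row in new_z_d]
    return new_w_z, new_z_d, log_lik


def fit_plsa(counts, n_topics, n_iters=20, seed=0):
    """Run a fixed number of EM iterations from a random starting point."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    history = []
    for _ in range(n_iters):
        p_w_z, p_z_d, log_lik = plsa_em_step(counts, p_w_z, p_z_d)
        history.append(log_lik)
    return p_w_z, p_z_d, history


if __name__ == "__main__":
    # Toy collection: 4 documents over a 4-word vocabulary.
    toy_counts = [[4, 3, 0, 0], [5, 2, 0, 1], [0, 0, 6, 3], [0, 1, 4, 5]]
    _, doc_topics, history = fit_plsa(toy_counts, n_topics=2, n_iters=30)
    print("final log-likelihood:", history[-1])
    for d, row in enumerate(doc_topics):
        print("document", d, [round(p, 3) for p in row])
```

The log-likelihood recorded at each iteration should never decrease, which is a convenient check that any reorganized or parallel version still produces the same output as the serial one.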

Cite

Wan, R., Anh, V. N., & Mamitsuka, H. (2009). Efficient probabilistic latent semantic analysis through parallelization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5839 LNCS, pp. 432–443). https://doi.org/10.1007/978-3-642-04769-5_38
