Latent factor models overcome two major problems of most collaborative filtering approaches: scalability and the sparsity of the user profile matrix. The most successful realizations of latent factor models are based on matrix factorization. Among the algorithms for matrix factorization, alternating least squares (ALS) stands out because its computations are easily parallelized. In this work we propose a methodology for comparing the performance of two parallel implementations of the ALS algorithm, one executed with MapReduce in the Apache Hadoop framework and the other executed in the Apache Spark framework. We performed experiments to evaluate the accuracy of the generated recommendations and the execution time of both implementations, using publicly available datasets of different sizes and from different recommendation domains. Experimental results show that running the recommendation algorithm on the Spark framework is in fact more efficient, since Spark provides in-memory processing, in contrast to Hadoop's two-stage disk-based MapReduce paradigm.
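To make the parallelism claim concrete, the following is a minimal single-machine sketch of regularized ALS in Python with NumPy. It is not the Hadoop or Spark implementation evaluated in the paper; the function names `als` and `rmse`, the rank `k`, the regularization weight `lam`, and the boolean `mask` of observed ratings are all assumptions made for illustration. The point it shows is that, with one factor matrix held fixed, each row of the other is the solution of its own small least squares problem, so the per-user and per-item loops could in principle be distributed across workers.

```python
import numpy as np


def als(R, mask, k=2, lam=0.1, iters=20, seed=0):
    """Approximate R by X @ Y.T using only entries where mask is True.

    R    : (n_users, n_items) rating matrix (values where mask is False are ignored)
    mask : boolean array of the same shape, True for observed ratings
    k    : number of latent factors (assumed value, for illustration)
    lam  : L2 regularization weight (assumed value, for illustration)
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = rng.normal(scale=0.1, size=(n_users, k))  # user factors
    Y = rng.normal(scale=0.1, size=(n_items, k))  # item factors
    reg = lam * np.eye(k)

    for _ in range(iters):
        # Fix Y and solve one k-by-k ridge system per user.
        # Each iteration of this loop is independent of the others.
        for u in range(n_users):
            obs = mask[u]
            Yo = Y[obs]
            X[u] = np.linalg.solve(Yo.T @ Yo + reg, Yo.T @ R[u, obs])

        # Fix X and solve one k-by-k ridge system per item.
        for i in range(n_items):
            obs = mask[:, i]
            Xo = X[obs]
            Y[i] = np.linalg.solve(Xo.T @ Xo + reg, Xo.T @ R[obs, i])

    return X, Y


def rmse(R, mask, X, Y):
    """Root mean squared error over the observed entries only."""
    err = (R - X @ Y.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))
```

In a distributed setting the two inner loops are the natural units of work: each user update needs only that user's observed ratings and the current item factors, and vice versa. How those factors are shipped between workers, on disk between MapReduce stages or held in memory, is the kind of difference the paper's comparison is concerned with.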
CITATION STYLE
Meira, D., Viterbo, J., & Bernardini, F. (2018). An experimental analysis on scalable implementations of the alternating least squares algorithm. In Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, FedCSIS 2018 (pp. 351–359). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.15439/2018F166