Coupling a database with a parallel-programming framework reduces the I/O overhead between them. However, such coupling raises serious issues, including memory bandwidth limitations, load imbalance, and race conditions. Existing frameworks such as MapReduce do not resolve these problems because they adopt flat parallelization, i.e., they partition a task without regard to its structure. In this paper, we propose a recursive divide-and-conquer method for spatial databases that supports high-throughput machine learning. Our approach uses a tree-based task structure, which improves locality of reference, and it balances load by setting the grain size of tasks dynamically. Race conditions are also avoided. We applied our method to the task of learning a hierarchical Poisson mixture model. The results show that our approach achieves strong scalability and is robust to load-imbalanced datasets.
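To illustrate the general idea of a recursive divide-and-conquer task with a grain size, the following is a minimal sequential sketch in Python, not the authors' implementation. The function name `dc_stats`, the kd-tree-style median split, the fixed `grain` parameter, and the choice of statistics (count and coordinate sums) are all assumptions made for illustration; the paper sets the grain size dynamically and runs subtasks in parallel.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Stats = Tuple[float, float, float]  # (count, sum of x, sum of y)


def dc_stats(points: Sequence[Point], grain: int, axis: int = 0) -> Stats:
    """Hypothetical helper: aggregate statistics over 2-D points by
    recursively halving the data along alternating axes."""
    if grain < 1:
        raise ValueError("grain must be at least 1")

    # Leaf task: the subproblem is small enough to process directly.
    if len(points) <= grain:
        return (
            len(points),
            sum(p[0] for p in points),
            sum(p[1] for p in points),
        )

    # Divide: split at the median along the current axis, so each half
    # covers a contiguous spatial region.
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2

    # Conquer: the two halves share no data and write to no shared state,
    # so a parallel runtime could execute them as independent tasks.
    left = dc_stats(ordered[:mid], grain, 1 - axis)
    right = dc_stats(ordered[mid:], grain, 1 - axis)

    # Combine: merge the partial results in the parent task only.
    return (left[0] + right[0], left[1] + right[1], left[2] + right[2])
```

In this sketch, a larger `grain` yields fewer, coarser leaf tasks, while a smaller one yields more, finer tasks; a framework that tunes this value at run time can trade scheduling overhead against load balance. Because each subtask returns its own partial result and merging happens only in the parent, no two tasks update the same variable.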
Kawakatsu, T., Kinoshita, A., Takasu, A., & Adachi, J. (2015). Highly efficient parallel framework: A divide-and-conquer approach. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9262, pp. 162–176). Springer Verlag. https://doi.org/10.1007/978-3-319-22852-5_15