Distributed balanced partitioning via linear embedding

Abstract

Balanced partitioning is often a crucial first step in solving large-scale graph optimization problems. In some cases, a big graph is chopped into pieces that fit on one machine and are processed independently before the results are stitched together, which introduces some suboptimality because of interactions among the pieces. In other cases, links between different parts show up in the running time and/or network communication cost, hence the desire for a small cut size. We study a distributed balanced partitioning problem where the goal is to partition the vertices of a given graph into k pieces while minimizing the total cut size. Our algorithm is composed of a few steps that are easily implementable in distributed computation frameworks such as MapReduce. The algorithm first embeds the nodes of the graph onto a line and then processes the nodes in a distributed manner guided by the order of this linear embedding. We examine various ways to find the initial embedding, e.g., via hierarchical clustering or Hilbert curves. Then we apply four different techniques: local swaps, minimum cuts on partition boundaries, contraction, and dynamic programming. Our empirical study compares these techniques with each other and with previous distributed algorithms, e.g., a label propagation method [34], FENNEL [32], and Spinner [23]. We report results both on a private map graph and on several public social networks, and show that our results beat previous distributed algorithms: for example, we observe a 15-25% reduction in cut size compared with [34]. We also observe that our algorithms allow for a scalable distributed implementation for any number of partitions. Finally, we apply our techniques to Google Maps Driving Directions to minimize the number of multi-shard queries, with the goal of reducing CPU usage. In live experiments, we observe an ≈40% drop in the number of multi-shard queries compared with a standard geography-based method.
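The pipeline above (embed nodes on a line, cut the line into k balanced pieces, then refine the boundaries) can be illustrated with a minimal single-machine sketch. It assumes each node carries integer grid coordinates, as a map graph might, so that a Hilbert curve (one of the embedding options named above) can order them. All function names are hypothetical; the refinement is a plain pairwise local-swap pass rather than the authors' exact procedure, and boundary minimum cuts, contraction, dynamic programming, and the distributed MapReduce execution are not shown.

```python
# Hypothetical sketch: assumes integer grid coordinates per node; not the paper's implementation.
from typing import Dict, Iterable, List, Tuple

Coords = Dict[int, Tuple[int, int]]   # node id -> integer grid cell (x, y)
Adjacency = Dict[int, List[int]]      # node id -> neighbour ids (undirected)
Partition = Dict[int, int]            # node id -> piece index in [0, k)


def hilbert_index(order: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve over a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)  # skip the quadrants visited before this one
        # Rotate/reflect so the remaining sub-square has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def linear_embedding(coords: Coords, order: int) -> List[int]:
    """Step 1: place nodes on a line, sorted by the Hilbert index of their cell."""
    return sorted(coords, key=lambda v: hilbert_index(order, *coords[v]))


def split_into_k(line: List[int], k: int) -> Partition:
    """Step 2: cut the line into k contiguous pieces whose sizes differ by at most one."""
    n = len(line)
    return {v: i * k // n for i, v in enumerate(line)}


def cut_size(adj: Adjacency, part: Partition) -> int:
    """Number of edges whose endpoints lie in different pieces (each edge counted once)."""
    return sum(1 for u in adj for w in adj[u] if u < w and part[u] != part[w])


def local_cut(adj: Adjacency, part: Partition, nodes: Iterable[int]) -> int:
    """Cut edges touching `nodes`; only these can change when `nodes` are swapped."""
    seen, cut = set(), 0
    for u in nodes:
        for w in adj[u]:
            edge = (min(u, w), max(u, w))
            if edge not in seen:
                seen.add(edge)
                cut += int(part[u] != part[w])
    return cut


def local_swaps(adj: Adjacency, part: Partition) -> Partition:
    """Step 3 (simplified): swap cross-piece node pairs while that strictly lowers the cut.

    Swaps keep piece sizes fixed; each accepted swap lowers the cut, so the loop ends."""
    nodes = sorted(adj)
    improved = True
    while improved:
        improved = False
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                if part[u] == part[v]:
                    continue
                before = local_cut(adj, part, (u, v))
                part[u], part[v] = part[v], part[u]
                if local_cut(adj, part, (u, v)) < before:
                    improved = True
                else:
                    part[u], part[v] = part[v], part[u]  # no gain: undo the swap
    return part


def grid_graph(side: int) -> Tuple[Coords, Adjacency]:
    """Toy stand-in for a map graph: side x side 4-neighbour grid, node id = y * side + x."""
    coords: Coords = {y * side + x: (x, y) for y in range(side) for x in range(side)}
    adj: Adjacency = {v: [] for v in coords}
    for v, (x, y) in coords.items():
        if x + 1 < side:
            adj[v].append(v + 1)
            adj[v + 1].append(v)
        if y + 1 < side:
            adj[v].append(v + side)
            adj[v + side].append(v)
    return coords, adj


if __name__ == "__main__":
    coords, adj = grid_graph(4)
    hilbert = split_into_k(linear_embedding(coords, order=2), k=4)
    row_major = split_into_k(sorted(coords), k=4)  # node ids are already row-major
    print("Hilbert cut:", cut_size(adj, hilbert))      # 8: four 2x2 blocks
    print("row-major cut:", cut_size(adj, row_major))  # 12: four horizontal strips
    refined = local_swaps(adj, dict(row_major))
    print("row-major after swaps:", cut_size(adj, refined))
```

On this 4 x 4 toy grid, the Hilbert order splits into four 2 x 2 blocks with 8 cut edges, versus 12 for row-major strips, which shows why a locality-preserving embedding is a reasonable starting point. The pairwise swap pass is quadratic in the number of nodes and is meant only for tiny illustrative graphs.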

Aydin, K., Bateni, M. H., & Mirrokni, V. (2016). Distributed balanced partitioning via linear embedding. In WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining (pp. 387–396). Association for Computing Machinery, Inc. https://doi.org/10.1145/2835776.2835829
