Approximation algorithms for data placement problems

  • Baev I
  • Rajaraman R

We develop approximation algorithms for the problem of placing replicated data in arbitrary networks, where nodes may both issue requests for data objects and have capacity to store them, so as to minimize the average data-access cost. We introduce the data placement problem to model this setting. We have a set of caches $F$, a set of clients $D$, and a set of data objects $O$. Each cache $i$ can store at most $u_i$ data objects. Each client $j \in D$ has demand $d_j$ for a specific data object $o(j) \in O$ and must be assigned to a cache that stores that object. Storing an object $o$ in cache $i$ incurs a storage cost of $f_i^o$, and assigning client $j$ to cache $i$ incurs an access cost of $d_j c_{ij}$. The goal is to find a placement of the data objects in caches that respects the capacity constraints, together with an assignment of clients to caches, so as to minimize the total storage and client access costs. We present a 10-approximation algorithm for this problem. Our algorithm is based on rounding an optimal solution to a natural LP relaxation of the problem. One of the main technical challenges in the rounding is to preserve the cache capacities while incurring only a constant-factor increase in the solution cost. We also introduce the connected data placement problem to capture settings where write requests are also issued for data objects, so that a mechanism is required to maintain data consistency. We model this by requiring that all caches containing a given object be connected by a Steiner tree to a root for that object, which issues a multicast message upon a write to (any copy of) that object. The total cost now also includes the cost of these Steiner trees. We devise a 14-approximation algorithm for this problem.
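To make the objective concrete, the sketch below evaluates a given solution rather than finding one: it checks the capacity constraints $|\{o \text{ stored at } i\}| \le u_i$ and that every client $j$ is served by a cache holding $o(j)$, then returns $\sum_{i \in F} \sum_{o \text{ stored at } i} f_i^o + \sum_{j \in D} d_j c_{i(j)\,j}$, where $i(j)$ is the cache assigned to $j$. It is not the paper's LP-rounding algorithm. For the connected variant it adds the cost of each object's tree, under the assumption (ours, not stated in the abstract) that a tree's cost is the plain sum of its edge lengths; the paper may weight it differently. All function and parameter names (`placement_cost`, `steiner_cost`, `tree_edges`, and so on) and the tiny instance are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple


def placement_cost(
    capacity: Dict[str, int],                     # u_i: max number of objects at cache i
    storage_cost: Dict[Tuple[str, str], float],   # f_i^o keyed by (cache, object)
    demand: Dict[str, float],                     # d_j
    wants: Dict[str, str],                        # o(j)
    dist: Dict[Tuple[str, str], float],           # c_ij keyed by (cache, client)
    placement: Dict[str, Set[str]],               # objects held by each cache
    assignment: Dict[str, str],                   # cache serving each client
) -> float:
    """Storage plus access cost of a feasible solution; raises ValueError otherwise."""
    for i, objs in placement.items():
        if len(objs) > capacity[i]:
            raise ValueError(f"cache {i} holds {len(objs)} objects, capacity {capacity[i]}")
    for j in demand:
        i = assignment.get(j)
        if i is None or wants[j] not in placement.get(i, set()):
            raise ValueError(f"client {j} is not served by a cache storing {wants[j]}")
    storage = sum(storage_cost[(i, o)] for i, objs in placement.items() for o in objs)
    access = sum(demand[j] * dist[(assignment[j], j)] for j in demand)
    return storage + access


def steiner_cost(
    placement: Dict[str, Set[str]],
    root: Dict[str, str],                          # hypothetical: root node per object
    tree_edges: Dict[str, List[Tuple[str, str]]],  # edges of each object's tree
    edge_cost: Dict[FrozenSet[str], float],        # undirected edge lengths
) -> float:
    """Sum of tree edge lengths (assumed cost model); checks each tree reaches
    every cache holding its object. Does not verify the edges form a tree."""
    total = 0.0
    for o, r in root.items():
        adj: Dict[str, Set[str]] = defaultdict(set)
        for u, v in tree_edges.get(o, []):
            adj[u].add(v)
            adj[v].add(u)
        # Depth-first search from the root collects the nodes the edges connect.
        seen, stack = {r}, [r]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        holders = {i for i, objs in placement.items() if o in objs}
        if not holders <= seen:
            raise ValueError(f"tree for {o} misses caches {holders - seen}")
        total += sum(edge_cost[frozenset(e)] for e in tree_edges.get(o, []))
    return total


# Tiny hypothetical instance: caches A, B; clients x, y; objects p, q; root node r.
capacity = {"A": 1, "B": 1}
storage_cost = {("A", "p"): 2.0, ("A", "q"): 3.0, ("B", "p"): 1.0, ("B", "q"): 4.0}
demand = {"x": 1, "y": 2}
wants = {"x": "p", "y": "q"}
dist = {("A", "x"): 5.0, ("B", "x"): 1.0, ("A", "y"): 1.0, ("B", "y"): 6.0}
placement = {"A": {"q"}, "B": {"p"}}
assignment = {"x": "B", "y": "A"}

base = placement_cost(capacity, storage_cost, demand, wants, dist, placement, assignment)
tree = steiner_cost(placement, {"p": "r", "q": "r"},
                    {"p": [("r", "B")], "q": [("r", "A")]},
                    {frozenset({"r", "B"}): 2.0, frozenset({"r", "A"}): 3.0})
print(base, base + tree)  # 7.0 12.0
```

Here storage costs $f_B^p + f_A^q = 4$, access costs $1 \cdot 1 + 2 \cdot 1 = 3$, and the two one-edge trees add $5$ in the connected variant. An evaluator like this is mainly useful for checking the output of any approximation algorithm against brute-force optima on small instances.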
We show that our algorithms can be adapted to handle two variants of the problem: (a) a k-median variant, where there is a specified bound on the number of caches that may contain a given object; and (b) a generalization where objects have lengths and the total length of the objects stored in any cache must not exceed its capacity.
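The two variants change only the feasibility constraints, which the following sketch checks for a given placement. With no lengths, every object counts as one unit, matching the basic capacity $u_i$; with lengths, capacity becomes a total-length budget; and an optional per-object bound limits how many caches may hold each object. The function `is_feasible` and its parameters `length` and `max_copies` are hypothetical names chosen for illustration, not taken from the paper.

```python
from collections import Counter
from typing import Dict, Optional, Set


def is_feasible(
    placement: Dict[str, Set[str]],               # objects held by each cache
    capacity: Dict[str, float],                   # u_i (count, or length budget)
    length: Optional[Dict[str, float]] = None,    # variant (b): object lengths
    max_copies: Optional[Dict[str, int]] = None,  # variant (a): copy bound per object
) -> bool:
    for i, objs in placement.items():
        # Unit-length objects when no lengths are given, as in the basic problem.
        used = sum(length[o] for o in objs) if length is not None else len(objs)
        if used > capacity[i]:
            return False
    if max_copies is not None:
        # Count how many caches hold each object.
        copies = Counter(o for objs in placement.values() for o in objs)
        if any(copies[o] > k for o, k in max_copies.items()):
            return False
    return True


placement = {"A": {"p", "q"}, "B": {"p"}}
print(is_feasible(placement, {"A": 2, "B": 1}))                                   # True
print(is_feasible(placement, {"A": 2, "B": 1}, max_copies={"p": 1}))              # False
print(is_feasible(placement, {"A": 2.5, "B": 1.5}, length={"p": 1.5, "q": 1.0}))  # True
```

Treating the basic problem as the special case of unit lengths keeps a single capacity check for both settings.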
