Sparse matrix-vector multiplication is the core kernel of many scientific computations. Parallelizing this operation requires the matrix to be divided among processors, and this division is commonly phrased in terms of graph partitioning. Although this abstraction has proved very useful, it has significant flaws and limitations. The cost model implicit in the abstraction is only a weak approximation to the true cost of parallel matrix-vector multiplication, and the graph model itself is unnecessarily restrictive. This paper details the shortcomings of the current paradigm and suggests directions for improvement and further research.
Hendrickson, B. (1998). Graph partitioning and parallel solvers: Has the emperor no clothes? In Lecture Notes in Computer Science (Vol. 1457, pp. 218–225). Springer Verlag. https://doi.org/10.1007/bfb0018541
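The abstract's claim that the graph cost model only weakly approximates the true cost can be illustrated with a small sketch. It assumes a row-wise distribution in which the processor that owns row i also owns x[i], and it compares the number of cut edges with the number of vector entries that actually have to be sent. The function name `row_partition_costs` and the 4-by-4 example are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from collections import defaultdict

def row_partition_costs(rows, part):
    """Compare the graph-model cost with the actual communication volume.

    rows: dict mapping row index -> set of column indices holding nonzeros
          (square matrix, symmetric nonzero pattern assumed).
    part: dict mapping index i -> processor owning row i and x[i].
    Returns (edge_cut, comm_volume).
    """
    cut_edges = set()           # undirected edges {i, j} crossing processors
    needed = defaultdict(set)   # processor -> remote x entries it must receive
    for i, cols in rows.items():
        for j in cols:
            if i != j and part[i] != part[j]:
                cut_edges.add(frozenset((i, j)))
                # The owner of row i needs x[j] once, however many of its
                # rows reference column j.
                needed[part[i]].add(j)
    comm_volume = sum(len(entries) for entries in needed.values())
    return len(cut_edges), comm_volume

# Hypothetical example: vertex 0 on processor 0 is adjacent to
# vertices 1, 2, 3, all on processor 1.
rows = {0: {0, 1, 2, 3}, 1: {0, 1}, 2: {0, 2}, 3: {0, 3}}
part = {0: 0, 1: 1, 2: 1, 3: 1}
edge_cut, comm_volume = row_partition_costs(rows, part)
print(edge_cut, comm_volume)
```

In this example three edges are cut, which under the usual interpretation suggests six values exchanged (one in each direction per edge). Only four values are actually communicated, because x[0] is sent to processor 1 once rather than three times.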