Coded Sequential Matrix Multiplication for Straggler Mitigation

Abstract

In this work, we consider a sequence of J matrix multiplication jobs that need to be distributed by a master across multiple worker nodes. For i ∈ {1, 2, ..., J}, job-i begins in round-i and has to be completed by round-(i+T). In order to provide resiliency against slow workers (stragglers), previous works focus on coding across workers, which is the special case of T = 0. We propose here two schemes with T > 0, which allow for coding across workers as well as across the dimension of time. Our first scheme is a modification of the polynomial coding scheme introduced by Yu et al. and places no assumptions on the straggler model. Exploiting the temporal dimension helps the scheme handle a larger set of straggler patterns than the polynomial coding scheme, for a given computational load per worker per round. The second scheme assumes a particular straggler model to further improve performance (in terms of encoding/decoding complexity). We develop theoretical results establishing (i) optimality of our proposed schemes for certain classes of straggler patterns and (ii) improved performance for the case of i.i.d. stragglers. These are further validated by experiments, where we implement our schemes to train neural networks.
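For orientation, the T = 0 baseline that the abstract refers to (coding across workers only, in the style of the polynomial code of Yu et al.) can be sketched as follows. This is a minimal illustration and not the schemes proposed in the paper: the function names, the block counts m and n, the real-valued evaluation points, and the choice of which workers straggle are all assumptions made for the example.

```python
import numpy as np

def encode(A, B, m, n, points):
    """Build one coded task per worker (hypothetical helper).

    A is split into m row blocks and B into n column blocks. The worker
    with evaluation point x receives sum_i A_i x^i and sum_j B_j x^(j*m).
    """
    A_blocks = np.split(A, m, axis=0)
    B_blocks = np.split(B, n, axis=1)
    tasks = []
    for x in points:
        A_enc = sum(Ai * x**i for i, Ai in enumerate(A_blocks))
        B_enc = sum(Bj * x**(j * m) for j, Bj in enumerate(B_blocks))
        tasks.append((A_enc, B_enc))
    return tasks

def decode(results, points, m, n):
    """Recover A @ B from any m*n finished workers (hypothetical helper).

    Each worker's product is an evaluation of a degree (m*n - 1) polynomial
    whose coefficient at power i + j*m is the block A_i @ B_j, so m*n
    evaluations at distinct points determine all blocks.
    """
    idx = sorted(results)[: m * n]
    V = np.vander([points[k] for k in idx], N=m * n, increasing=True)
    Y = np.stack([results[k] for k in idx])
    coeffs = np.linalg.solve(V, Y.reshape(m * n, -1)).reshape(Y.shape)
    rows = [np.hstack([coeffs[i + j * m] for j in range(n)]) for i in range(m)]
    return np.vstack(rows)

# Example: 6 workers, m = n = 2, so any 4 responses suffice.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 4))
points = np.arange(1, 7, dtype=float)  # assumed distinct evaluation points
tasks = encode(A, B, 2, 2, points)

# Assume workers 1 and 4 straggle and never report back.
results = {k: a @ b for k, (a, b) in enumerate(tasks) if k not in (1, 4)}
C = decode(results, points, 2, 2)
print(np.allclose(C, A @ B))
```

In this sketch each worker multiplies blocks that are 1/m and 1/n the size of the originals, and the master tolerates any two stragglers out of six. Real-valued Vandermonde systems become ill-conditioned as m*n grows, so the evaluation points here are only suitable for small examples.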

Cite

Nikhil Krishnan, M., Hosseini, E., & Khisti, A. (2021). Coded Sequential Matrix Multiplication for Straggler Mitigation. In IEEE Journal on Selected Areas in Information Theory (Vol. 2, pp. 830–844). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/JSAIT.2021.3104970
