A. Proof of Thm. 4.34

  • Cao F

Abstract

We provide a brief proof sketch due to space constraints. We make the same assumptions as [8, 11], listed below.

Assumption 1. Local objectives are smooth, i.e., $\|\nabla f_k(w_1) - \nabla f_k(w_2)\| \le L \|w_1 - w_2\|$, $\forall\, w_1, w_2, k$ and some $L > 0$.

Assumption 2. The global objective is Lipschitz, i.e., $|f(w_1) - f(w_2)| \le L_p \|w_1 - w_2\|$, $\forall\, w_1, w_2$ and some $L_p > 0$.

Assumption 3. Clients' stochastic gradients are unbiased, i.e., $\mathbb{E}[g_k(w)] = \nabla f_k(w)$, $\forall\, k, w$.

Assumption 4. Local models have bounded gradient variance, i.e., $\mathbb{E}\|g_k(w) - \nabla f_k(w)\|^2 \le \sigma^2$, $\forall\, k, w$.

Assumption 5. Client gradients do not deviate much from the global gradient, i.e., $\|\nabla f(w) - \nabla f_k(w)\|^2 \le \epsilon^2$, $\forall\, k, w$.

Assumption 6. Time-independent gradients, i.e., $\mathbb{E}\big[g_k^{(t_1)} g_k^{(t_2)}\big] = \mathbb{E}\big[g_k^{(t_1)}\big]\, \mathbb{E}\big[g_k^{(t_2)}\big]$, $\forall\, t_1 \ne t_2$.

Assumption 7. Client-independent gradients, i.e., $\mathbb{E}\big[g_{k_1}^{(t_1)} g_{k_2}^{(t_2)}\big] = \mathbb{E}\big[g_{k_1}^{(t_1)}\big]\, \mathbb{E}\big[g_{k_2}^{(t_2)}\big]$, $\forall\, k_1 \ne k_2$ and any $t_1, t_2$.

Proof. Since we enforce the sparse structure found in previous iterations during client training and do not allow parameters to resurrect, we only need to demonstrate convergence of the average over $\nabla f(w^{(t)}) \odot m^{(t)}$ terms. Our proof technique is similar to previous approaches that have demonstrated convergence for federated learning under different scenarios [9, 11, 18]. Considering $\mathbb{E}\big[f(w^{(t+1)} \odot m^{(t)}) - f(w^{(t)})\big]$ and using smoothness (Assumption 1), we get

$$\mathbb{E}\big[f(w^{t+1} \odot m^t) - f(w^t)\big] \le \mathbb{E}\big\langle \nabla f(w^t),\; w^{t+1} \odot m^t - w^t \big\rangle + \frac{L}{2}\, \mathbb{E}\big\|w^{t+1} \odot m^t - w^t\big\|^2. \tag{A.1}$$

Considering the first term from above and expanding the inner product into squared-norm terms,

$$\begin{aligned}
\mathbb{E}\big\langle \nabla f(w^t),\; w^{t+1} \odot m^t - w^t \big\rangle
&= -\eta\, \mathbb{E}\Big\langle \nabla f(w^t),\; \frac{1}{N}\sum_{k=1}^{N}\sum_{i=0}^{S-1} g_k^{t,i} \odot m^t \Big\rangle \\
&= -\eta\, \mathbb{E}\Big\langle \nabla f(w^t) \odot m^t,\; \frac{1}{N}\sum_{k=1}^{N}\sum_{i=0}^{S-1} \nabla f_k(w_k^{t,i}) \odot m^t \Big\rangle \\
&= -\eta\, \big\|\nabla f(w^t) \odot m^t\big\|^2 - \eta\, \Big\|\frac{1}{N}\sum_{k=1}^{N}\frac{1}{S}\sum_{i=0}^{S-1} \nabla f_k(w_k^{t,i})\Big\|^2 + \eta\, \Big\|\nabla f(w^t) \odot m^t - \frac{1}{N}\sum_{k=1}^{N}\frac{1}{S}\sum_{i=0}^{S-1} m^t \odot \nabla f_k(w_k^{t,i})\Big\|^2 \\
&\le -\eta\, \big\|m^t \odot \nabla f(w^t)\big\|^2 - \frac{\eta}{NS}\sum_{k=1}^{N}\sum_{i=0}^{S-1} \big\|m^t \odot \nabla f_k(w_k^{t,i})\big\|^2 + \frac{\eta L^2}{NS}\sum_{k=1}^{N}\sum_{i=0}^{S-1} \big\|w^t - w_k^{t,i}\big\|^2.
\end{aligned} \tag{A.2}$$

For the second term in Eq. A.1, using Assumptions 4-7 we can establish that

$$\mathbb{E}\big\|w^{t+1} \odot m^t - w^t\big\|^2 = \mathbb{E}\Big\|\frac{\eta}{N}\sum_{k=1}^{N}\sum_{i=0}^{S-1} m^t \odot g_k^{t,i}\Big\|^2 \le \eta^2 S \sigma^2 + \frac{\eta^2 S}{N}\sum_{k=1}^{N}\sum_{i=0}^{S-1} \mathbb{E}\big\|m^t \odot \nabla f_k(w_k^{t,i})\big\|^2. \tag{A.3}$$

By repeating analysis similar to Lemma 10 from [9], we can obtain the result below:

$$\mathbb{E}\big\|w_k^{t,i} - w^t\big\|^2 \le 16\eta^2 S^2 \big\|m^t \odot \nabla f(w^t)\big\|^2 + 16\eta^2 S^2 \epsilon^2 + 4\eta^2 S \sigma^2. \tag{A.4}$$
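To make the analysed setting concrete, below is a minimal sketch of the kind of masked local-SGD round the bounds refer to: each of $N$ clients runs $S$ local SGD steps with stochastic gradients restricted to a fixed binary mask $m^t$, and the server averages the accumulated masked gradients, matching the aggregate step $w^{t+1} \odot m^t - w^t = -\frac{\eta}{N}\sum_{k}\sum_{i} g_k^{t,i} \odot m^t$ used in Eq. A.2. The function name masked_federated_round, the client_grad_fns interface, and the toy values in the usage comments are assumptions introduced purely for illustration; they are not taken from the source.

import numpy as np

def masked_federated_round(w, mask, client_grad_fns, eta, S):
    """One illustrative communication round of masked federated local SGD (sketch only).

    w               -- global model parameters, shape (d,); pruned entries assumed zero
    mask            -- fixed binary sparsity mask m^t, shape (d,)
    client_grad_fns -- list of callables; client_grad_fns[k](w) returns a stochastic
                       gradient g_k(w), assumed unbiased as in Assumption 3
    eta             -- local learning rate
    S               -- number of local SGD steps per client per round
    """
    N = len(client_grad_fns)
    accumulated = np.zeros_like(w)

    for grad_fn in client_grad_fns:
        w_local = w.copy()
        for _ in range(S):
            g = grad_fn(w_local) * mask      # restrict gradients to the fixed mask
            w_local = w_local - eta * g      # local SGD step; pruned weights never resurrect
            accumulated += g                 # track the masked stochastic gradients

    # Server update: w^{t+1} = w^t - eta * (1/N) * sum_k sum_i m^t (elementwise) g_k^{t,i}
    return w - eta * accumulated / N

# Illustrative usage with toy quadratic client objectives f_k(w) = 0.5 * ||w - a_k||^2:
# rng = np.random.default_rng(0)
# d, N = 10, 4
# anchors = [rng.normal(size=d) for _ in range(N)]
# grad_fns = [lambda w, a=a: (w - a) + 0.01 * rng.normal(size=d) for a in anchors]
# mask = (rng.random(d) < 0.5).astype(float)
# w_next = masked_federated_round(np.zeros(d), mask, grad_fns, eta=0.1, S=5)

Averaging the accumulated masked gradients is algebraically the same as averaging the clients' final local models in this sketch, and the resulting difference from $w^t$ is the quantity bounded in Eq. A.3, while the per-client local drift $\|w_k^{t,i} - w^t\|$ is what Eq. A.4 controls.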
