We provide a brief proof sketch due to space constraints. We make the same assumptions as [8, 11], listed below.

Assumption 1. Local objectives are smooth, i.e., $\|\nabla f_k(w_1) - \nabla f_k(w_2)\| \le L\|w_1 - w_2\|$, $\forall\, w_1, w_2, k$ and some $L > 0$.

Assumption 2. The global objective is Lipschitz, i.e., $\|f(w_1) - f(w_2)\| \le L_p\|w_1 - w_2\|$, $\forall\, w_1, w_2$ and some $L_p > 0$.

Assumption 3. Clients' stochastic gradients are unbiased, i.e., $\mathbb{E}[g_k(w)] = \nabla f_k(w)$, $\forall\, k, w$.

Assumption 4. Local models have bounded gradient variance, i.e., $\mathbb{E}\|g_k(w) - \nabla f_k(w)\|^2 \le \sigma^2$, $\forall\, k, w$.

Assumption 5. The gradients from clients do not deviate much from the global model, i.e., $\|\nabla f(w) - \nabla f_k(w)\|^2 \le \epsilon^2$, $\forall\, k, w$.

Assumption 6. Time-independent gradients, i.e., $\mathbb{E}\big[g_k^{(t_1)} g_k^{(t_2)}\big] = \mathbb{E}\big[g_k^{(t_1)}\big]\,\mathbb{E}\big[g_k^{(t_2)}\big]$, $\forall\, t_1 \neq t_2$.

Assumption 7. Client-independent gradients, i.e., $\mathbb{E}\big[g_{k_1}^{(t_1)} g_{k_2}^{(t_2)}\big] = \mathbb{E}\big[g_{k_1}^{(t_1)}\big]\,\mathbb{E}\big[g_{k_2}^{(t_2)}\big]$, $\forall\, k_1 \neq k_2$ and any $t_1, t_2$.

Proof. Since we enforce the sparse structure found in previous iterations during client training and do not allow parameters to resurrect, we only need to demonstrate convergence of the average over the $\nabla f(w^{(t)}) \odot m^{(t)}$ terms. Our proof technique is similar to previous approaches that have demonstrated convergence for federated learning under different scenarios [9, 11, 18]. Considering $\mathbb{E}\big[f(w^{(t+1)} \odot m^{(t)}) - f(w^{(t)})\big]$, we get
$$
\mathbb{E}\big[f(w^{t+1} \odot m^t) - f(w^t)\big] \le \mathbb{E}\big\langle \nabla f(w^t),\, w^{t+1} \odot m^t - w^t \big\rangle + \frac{L}{2}\,\mathbb{E}\big\| w^{t+1} \odot m^t - w^t \big\|^2 \tag{A.1}
$$
Considering the first term from above,
$$
\begin{aligned}
\mathbb{E}\big\langle \nabla f(w^t),\, w^{t+1} \odot m^t - w^t \big\rangle
&= -\eta\,\mathbb{E}\Big\langle \nabla f(w^t),\ \frac{1}{N}\sum_{k=1}^{N}\sum_{i=0}^{S-1} g_k^{t,i} \odot m^t \Big\rangle \\
&= -\eta\,\mathbb{E}\Big\langle \nabla f(w^t) \odot m^t,\ \frac{1}{N}\sum_{k=1}^{N}\sum_{i=0}^{S-1} \nabla f_k(w_k^{t,i}) \odot m^t \Big\rangle \\
&= -\eta\,\big\|\nabla f(w^t) \odot m^t\big\|^2 - \eta\,\Big\|\frac{1}{N}\sum_{k=1}^{N}\frac{1}{S}\sum_{i=0}^{S-1} \nabla f_k(w_k^{t,i})\Big\|^2 + \eta\,\Big\|\nabla f(w^t) \odot m^t - \frac{1}{N}\sum_{k=1}^{N}\frac{1}{S}\sum_{i=0}^{S-1} m^t \odot \nabla f_k(w_k^{t,i})\Big\|^2 \\
&\le -\eta\,\big\|m^t \odot \nabla f(w^t)\big\|^2 - \frac{\eta}{NS}\sum_{k=1}^{N}\sum_{i=0}^{S-1} \big\|m^t \odot \nabla f_k(w_k^{t,i})\big\|^2 + \frac{\eta L^2}{NS}\sum_{k=1}^{N}\sum_{i=0}^{S-1} \big\|w^t - w_k^{t,i}\big\|^2
\end{aligned} \tag{A.2}
$$
For the second term in Eq.
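For clarity, the expansion in Eq. A.2 is an instance of the standard polarization identity, applied after projecting both sides onto the mask (constant factors are absorbed into $\eta$); here $a$ and $b$ are shorthand we introduce for this remark only:
```latex
% Polarization identity: -2<a,b> = -||a||^2 - ||b||^2 + ||a-b||^2,
% applied with (constants absorbed into the step size \eta)
\begin{aligned}
-\langle a, b\rangle &= -\tfrac{1}{2}\|a\|^2 - \tfrac{1}{2}\|b\|^2 + \tfrac{1}{2}\|a-b\|^2, \\
a &:= \nabla f(w^t)\odot m^t, \qquad
b := \frac{1}{N}\sum_{k=1}^{N}\frac{1}{S}\sum_{i=0}^{S-1} m^t \odot \nabla f_k(w_k^{t,i}).
\end{aligned}
```
The final inequality then bounds the $\|a-b\|^2$ term via Jensen's inequality and the smoothness of the local objectives (Assumption 1).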
A.1, we can establish using Assumptions 4–7 that
$$
\mathbb{E}\big\|w^{t+1} \odot m^t - w^t\big\|^2 = \eta^2\,\mathbb{E}\Big\|\frac{1}{N}\sum_{k=1}^{N}\sum_{i=0}^{S-1} m^t \odot g_k^{t,i}\Big\|^2 \le \frac{\eta^2 S \sigma^2}{N} + \frac{\eta^2 S}{N}\sum_{k=1}^{N}\sum_{i=0}^{S-1} \mathbb{E}\big\|m^t \odot \nabla f_k(w_k^{t,i})\big\|^2 \tag{A.3}
$$
By repeating analysis similar to Lemma 10 from [9], we can obtain the result below:
$$
\mathbb{E}\big\|w_k^{t,i} - w^t\big\|^2 \le 16\eta^2 S^2 \big\|m^t \odot \nabla f(w^t)\big\|^2 + 16\eta^2 S^2 \epsilon^2 + 4\eta^2 S \sigma^2 \tag{A.4}
$$
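To indicate how the pieces fit together, a hedged sketch of the final step: substituting Eqs. A.3 and A.4 into Eq. A.1, choosing $\eta$ small enough that the negative term in Eq. A.2 dominates, and telescoping over $t = 0, \ldots, T-1$ yields a stationarity bound of the form below, where $c_1, c_2$ are placeholder constants (depending on $L$, $S$, $N$, $\eta$) and not values derived in this sketch:
```latex
% Telescoped bound on the masked gradient norm; c_1, c_2 are
% illustrative placeholders, f^* denotes a lower bound on f.
\frac{1}{T}\sum_{t=0}^{T-1} \big\|m^t \odot \nabla f(w^t)\big\|^2
\;\le\; \frac{f(w^0) - f^\star}{c_1\,\eta S T} \;+\; c_2\big(\sigma^2 + \epsilon^2\big).
```
This is the usual shape of such results: the optimality gap vanishes at rate $1/T$, leaving a residual floor governed by the gradient variance $\sigma^2$ and the client-heterogeneity bound $\epsilon^2$.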