We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the existing theory and provide values of the optimal stepsize and optimal number of local iterations. Our bounds are based on a new notion of variance that is specific to local SGD methods with different data. The tightness of our results is guaranteed by recovering known statements when we plug in H = 1, where H is the number of local steps. The empirical evidence further validates the severe impact of data heterogeneity on the performance of local SGD.
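The abstract refers to the local SGD template: each of M workers runs H local stochastic gradient steps on its own data, and the iterates are averaged only at communication rounds. A minimal sketch of that loop, where the quadratic objective, noise level, stepsize, and all variable names are illustrative assumptions rather than details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
M, H, R = 4, 10, 50   # workers, local steps per round, communication rounds
d = 5                 # problem dimension
stepsize = 0.05       # illustrative choice, not the paper's optimal value

# Heterogeneous-data regime (hypothetical example): worker m minimizes
# f_m(x) = 0.5 * ||x - b_m||^2, so local optima b_m differ across workers
# while the global optimum is their mean.
b = rng.normal(size=(M, d))

x = np.zeros(d)  # shared starting point
for _ in range(R):
    local = np.tile(x, (M, 1))  # each worker copies the current average
    for _ in range(H):          # H local steps without any communication
        # stochastic gradient of f_m plus illustrative noise
        grads = (local - b) + 0.1 * rng.normal(size=(M, d))
        local -= stepsize * grads
    x = local.mean(axis=0)      # averaging step = one communication round

print("distance to global optimum:", np.linalg.norm(x - b.mean(axis=0)))
```

Setting H = 1 in this sketch recovers fully synchronized (minibatch-style) parallel SGD, which is the consistency check the abstract mentions.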
| Original language | English (US) |
| Title of host publication | NeurIPS 2019 Federated Learning Workshop |
| Number of pages | 10 |
| State | Published - 2020 |