Foundations of Multiple-Time-Scale Stochastic Approximation for Fast and Resilient Distributed Optimization Algorithms
Abstract
This dissertation establishes a theoretical framework for analyzing and designing fast and resilient distributed optimization algorithms for large-scale networks. The central focus is on understanding and leveraging two-time-scale dynamics, which may arise naturally from the underlying network structure or be introduced deliberately through algorithmic design. Motivated by challenges in large-scale decentralized networks, such as communication constraints, agent failures, and dynamic participation, this work makes four main contributions.

First, we study consensus-based gradient methods over networks with a clustered structure, characterized by dense intra-cluster and sparse inter-cluster communication. This setting naturally gives rise to two-time-scale dynamics. We develop a framework to analyze these dynamics and, using the singular perturbation method, derive new convergence-rate results that reveal how local and global interactions influence the scalability and performance of distributed algorithms over clustered networks.

Second, we analyze a two-time-scale distributed gradient descent algorithm for networks whose agents communicate using quantized information. By averaging the quantized values at a faster time scale and performing gradient updates at a slower time scale, the algorithm effectively mitigates the impact of quantization errors. We introduce a novel analysis and step-size selection strategy that establishes exact convergence to the optimal solution at the optimal rate.

Third, we develop a resilient distributed algorithm for networks exposed to malicious agents. By assigning separate update speeds to the gradient-estimation and gradient-descent steps, the algorithm exhibits two-time-scale behavior. We show that this design effectively mitigates the impact of both malicious agents and stochastic gradient noise, while still ensuring exact convergence at the optimal rate.
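To make the two-time-scale idea concrete, the following is a toy sketch in the spirit of the quantized setting above; it is not the dissertation's algorithm. All details are assumed for illustration: quadratic local objectives f_i(x) = (x - a_i)^2 / 2 (so the network optimum is the mean of the targets a_i), a uniform quantizer, a constant fast step size for consensus, and a diminishing slow step size for the gradient.

```python
import random

def quantize(v, delta=0.01):
    """Uniform quantizer: agents may only exchange quantized values."""
    return delta * round(v / delta)

def two_time_scale_dgd(targets, neighbors, iters=2000,
                       beta=0.5, alpha0=1.0, delta=0.01):
    """Toy two-time-scale distributed gradient descent (illustrative only).

    Assumed local objectives f_i(x) = (x - targets[i])**2 / 2, so the
    network-wide optimum is the mean of `targets`. Consensus on the
    quantized neighbor values uses a constant (fast) step size beta,
    while the gradient step uses a diminishing (slow) step size
    alpha_k = alpha0 / k.
    """
    n = len(targets)
    x = [random.uniform(-1.0, 1.0) for _ in range(n)]
    for k in range(1, iters + 1):
        alpha = alpha0 / k                        # slow time scale
        q = [quantize(xi, delta) for xi in x]     # what agents transmit
        x_next = []
        for i in range(n):
            # fast time scale: average the quantized neighbor values
            avg = sum(q[j] for j in neighbors[i]) / len(neighbors[i])
            consensus = x[i] + beta * (avg - q[i])
            grad = x[i] - targets[i]              # exact local gradient
            x_next.append(consensus - alpha * grad)
        x = x_next
    return x
```

On a small ring topology (with each agent's own index included in its neighbor list, so averaging preserves the network mean), every iterate settles near the average of the targets, with a residual on the order of the quantization resolution.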
Finally, we study open networks, where agents can dynamically join or leave. By leveraging redundancy in local objectives, we convert the time-varying problem into an effectively static one. This enables us to design both a local stochastic gradient descent algorithm and a gradient-balancing protocol for resource allocation, each of which achieves optimal and exact convergence.
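For the resource-allocation piece, a gradient-balancing protocol of the kind mentioned can be sketched as follows. This is purely illustrative and not the dissertation's protocol: it assumes quadratic costs f_i(x) = c_i x^2 / 2 and a fixed edge set, and shifts resource along each edge toward the agent with the lower marginal cost, so the total allocation stays feasible while the local gradients equalize.

```python
def gradient_balancing(costs, budget, edges, iters=500, eta=0.1):
    """Toy gradient-balancing sketch for resource allocation.

    Minimizes sum_i costs[i] * x[i]**2 / 2 subject to sum(x) == budget
    (assumed quadratic costs, for illustration). Each edge (i, j) moves
    an amount proportional to the gradient gap between its endpoints,
    which preserves the total allocation at every step.
    """
    n = len(costs)
    x = [budget / n] * n                              # feasible start
    for _ in range(iters):
        grad = [c * xi for c, xi in zip(costs, x)]    # f_i'(x_i) = c_i * x_i
        for i, j in edges:
            # move resource from the higher-gradient agent to the lower one
            flow = eta * (grad[i] - grad[j])
            x[i] -= flow
            x[j] += flow
    return x
```

At a fixed point all marginal costs c_i x_i are equal, which for these quadratics gives x_i = lambda / c_i with lambda = budget / sum(1 / c_j), i.e., the exact optimal allocation.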