Abstract: Distributed machine learning (DML) has seen widespread adoption in recent years. A major performance bottleneck is the costly communication required for gradient synchronization. Recently, ...