Always great to see more work extending DiLoCo and reducing the bandwidth requirements of pre-training!
Amir Sarfi · August 22, 2025
Introducing SparseLoCo: a communication-efficient method for LLM pre-training. TL;DR: We combine Top-k sparsification and error feedback with DiLoCo's infrequent outer steps, communicating only 1–3% of the gradients with 2-bit quantization, and outperform both DiLoCo and DeMo. 1/N, ArXiv: Github:
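For intuition, here is a minimal PyTorch sketch of the core idea named in the post: Top-k sparsification of an infrequent outer pseudo-gradient, with an error-feedback buffer that carries the untransmitted residual into the next round. The function name, the `k_frac` parameter, and the per-tensor structure are illustrative assumptions, not SparseLoCo's actual implementation.

```python
import torch

def topk_sparsify_with_error_feedback(delta: torch.Tensor,
                                      error: torch.Tensor,
                                      k_frac: float = 0.02):
    """Sketch of Top-k sparsification with error feedback.

    delta:  the outer pseudo-gradient for one tensor (hypothetical setup).
    error:  residual accumulated from previous rounds (same shape as delta).
    k_frac: fraction of entries to communicate (the post cites 1-3%).
    """
    corrected = delta + error                    # error feedback: re-add past residual
    k = max(1, int(k_frac * corrected.numel()))  # number of entries to keep
    flat = corrected.flatten()
    idx = flat.abs().topk(k).indices             # largest-magnitude entries
    values = flat[idx]                           # these would be communicated
    # (per the post, SparseLoCo also quantizes the sent values to 2 bits;
    # that step is omitted here for brevity)
    sparse = torch.zeros_like(flat)
    sparse[idx] = values
    new_error = (flat - sparse).view_as(delta)   # residual carried to next round
    return idx, values, new_error

# Illustrative usage: one tensor, one communication round.
delta = torch.randn(1024)
err = torch.zeros_like(delta)
idx, vals, err = topk_sparsify_with_error_feedback(delta, err, k_frac=0.02)
```

The error buffer is what makes aggressive sparsification viable: entries dropped in one round are not lost but accumulate until they grow large enough to be selected.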