Mason Wang
Distributed Training
DDP synchronizes gradients across all replicas (an all-reduce averages them after the backward pass).
Then each optimizer applies the same update to its own copy of the weights, so the replicas stay identical.
Last Reviewed: 4/28/25