Mason Wang
Distributed Training
DDP synchronizes gradients across all replicas (an all-reduce averages them after the backward pass).
Then each optimizer applies the same update to its own copy of the weights, so the replicas stay identical.
Last Reviewed: 4/28/25