Mason Wang

Training

Batch Size

LR

Gradient Clipping

Very commonly used. A clipping threshold of 1.0 can help with models prone to exploding gradients.
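A minimal sketch of clipping by global norm, assuming the 1.0 in the note is the maximum L2 norm of the full gradient (the note does not say which kind of clipping is meant). The function name `clip_by_global_norm` and the toy gradients are made up for illustration.

```python
import math

def clip_by_global_norm(grads, max_norm=1.0, eps=1e-6):
    """Rescale a list of gradient vectors so their combined L2 norm is at most max_norm.

    Hypothetical helper for illustration; grads is a list of lists of floats.
    """
    # Global norm: L2 norm over every gradient entry, across all vectors.
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # Shrink only when the norm exceeds the threshold; never scale up.
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in vec] for vec in grads]
    return clipped, total_norm

# Toy "exploding" gradient: global norm = sqrt(9 + 16 + 144) = 13
grads = [[3.0, 4.0], [0.0, 12.0]]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))

# A small gradient (norm 0.5) is left unchanged.
small, small_norm = clip_by_global_norm([[0.3, 0.4]], max_norm=1.0)
```

The direction of the gradient is preserved and only its length is capped. In PyTorch the same idea is available as `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)`, called after `loss.backward()` and before the optimizer step.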

My thoughts

Last Reviewed: 5/1/25