Mason Wang

Momentum, RMSProp, Adam

Notes from “A visual explanation”

Momentum in physics - F = ma: a constant force produces a constant acceleration, i.e. a steady change in velocity. Momentum in ML works the same way: the momentum term is the velocity, the decay factor acts as friction, and each new gradient is a force applied for one time step, which accelerates (changes) the velocity. Momentum helps carry the optimizer across plateaus and out of shallow local minima.
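A minimal sketch of the analogy, assuming the common "heavy ball" form of the update. The names momentum_step, lr and beta are made up for illustration, and the toy objective f(w) = w^2 is an assumption, not something from the source notes.

```python
def momentum_step(w, v, grad, lr=0.1, beta=0.9):
    """One gradient-descent-with-momentum update on a scalar parameter."""
    # Friction: the old velocity decays by beta.
    # Force: the new gradient is added on for this one time step.
    v = beta * v + grad
    # The parameter moves along the (negative) velocity.
    w = w - lr * v
    return w, v


def minimize_quadratic(steps=300):
    """Toy run on f(w) = w**2, whose gradient is 2*w (assumed example)."""
    w, v = 5.0, 0.0
    for _ in range(steps):
        w, v = momentum_step(w, v, 2.0 * w)
    return w
```

With beta = 0 the velocity has no memory and this reduces to plain gradient descent; a larger beta means less friction, so past gradients keep pushing for longer.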

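AdaGrad - the squared gradients for each direction (parameter) accumulate over time, and each update in that direction is divided by the square root of this sum. Steps shrink in directions that have already seen large gradients and stay relatively large in directions where not many changes have happened, which encourages exploration there.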
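A rough sketch of the AdaGrad idea, assuming the usual form where the step is divided by the square root of the accumulated squared gradients. The function name adagrad_step and the values of lr and eps are illustrative choices only.

```python
import math


def adagrad_step(w, g_sq_sum, grad, lr=0.5, eps=1e-8):
    """One AdaGrad update; w, g_sq_sum and grad are lists with one entry per direction."""
    # Accumulate the squared gradient for each direction (the history never decays).
    g_sq_sum = [s + g * g for s, g in zip(g_sq_sum, grad)]
    # Divide each direction's step by the square root of its own history.
    w = [
        wi - lr * g / (math.sqrt(s) + eps)
        for wi, g, s in zip(w, grad, g_sq_sum)
    ]
    return w, g_sq_sum
```

For instance, on a first step with gradients 10 and 0.1, both directions move by about lr: the direction with the tiny gradient is not left behind, which is the "encourages exploration" effect.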

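RMSProp - like AdaGrad, but the squared gradients decay: the history is an exponential moving average of squared gradients (the squared gradients have momentum), so old gradients fade instead of accumulating forever.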
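A small sketch of RMSProp under the usual formulation, where the squared-gradient history is an exponential moving average. The name rmsprop_step and the defaults for lr, rho and eps are assumptions for illustration.

```python
import math


def rmsprop_step(w, s, grad, lr=0.01, rho=0.9, eps=1e-8):
    """One RMSProp update on a scalar parameter."""
    # Decaying average of squared gradients: old history fades by rho each step.
    s = rho * s + (1 - rho) * grad * grad
    # Scale the step by the root of that average.
    w = w - lr * grad / (math.sqrt(s) + eps)
    return w, s
```

Unlike AdaGrad's running sum, s can shrink again when gradients get small, so the effective step size is not forced toward zero over a long run.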

Adam - combines the two ideas: the gradients have momentum (as in Momentum), and so do the squared gradients (as in RMSProp).
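A sketch of one Adam step, assuming the standard formulation with a moving average of gradients (m) and of squared gradients (v). It also includes the usual bias correction, which compensates for both averages starting at zero. The name adam_step and the default hyperparameters are illustrative assumptions.

```python
import math


def adam_step(w, m, v, grad, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on a scalar parameter; t is the 1-based step count."""
    # Momentum on the gradients (first moment).
    m = beta1 * m + (1 - beta1) * grad
    # Momentum on the squared gradients (second moment).
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction: m and v start at 0, so early averages are too small.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v
```

On the very first step the corrected averages equal the raw gradient and its square, so the parameter moves by roughly lr rather than by a much smaller amount.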

Notes from Andrew Ng:

Momentum cancels oscillations.

Corrections (bias corrections for the zero-initialized moving averages) are usually applied to Adam so things get rolling earlier.

Last Reviewed: 11/9/24