Mason Wang

Pretrain on ImageNet (~1M labeled images).

Fine-tune on small-scale downstream data; since the data is more limited, use a lower learning rate than in pretraining.
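A minimal sketch of fine-tuning at a reduced learning rate, assuming PyTorch and a torchvision ResNet-18 pretrained on ImageNet; the 10-class head and the exact learning rate are illustrative placeholders, not from the notes:

```python
import torch
import torchvision.models as models

# Load an ImageNet-pretrained backbone.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Linear(model.fc.in_features, 10)  # hypothetical 10-class task

# Use a learning rate well below typical pretraining rates so the
# pretrained features are nudged toward the new task, not overwritten.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
```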

The pretrain-then-fine-tune recipe has been revolutionary across NLP, speech, and robotics.

Data used to be a burden; now it helps you learn better representations. This has revolutionized LLMs and vision.

Classification is a common pretraining task.

You don't need to copy all the layers from pretraining; some can be randomly initialized instead.
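A sketch of keeping the early pretrained layers but re-initializing the last stage and the head from scratch; which layers to re-initialize is a design choice, and `layer4` here is just the last residual stage of a torchvision ResNet-18:

```python
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

def reinit(module):
    # Reset any layer that defines its own default initializer.
    if hasattr(module, "reset_parameters"):
        module.reset_parameters()

model.layer4.apply(reinit)   # randomly re-initialize the last residual stage
model.fc.reset_parameters()  # and the classifier head
```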

You can also freeze layers and train only the rest.
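A sketch of freezing the whole pretrained backbone and training only a fresh head (freezing everything but the head is often called linear probing); again the 10-class head and learning rate are hypothetical:

```python
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

for param in model.parameters():
    param.requires_grad = False  # freeze all pretrained weights

# The newly created head defaults to requires_grad=True, so only it trains.
model.fc = torch.nn.Linear(model.fc.in_features, 10)
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
```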

You can do some network surgery: modify some layers, e.g., add more heads for downstream tasks.
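A sketch of that kind of surgery: a shared pretrained backbone whose original classifier is cut off, with two task-specific heads attached. The task names and output sizes are hypothetical:

```python
import torch.nn as nn
import torchvision.models as models

class MultiHead(nn.Module):
    def __init__(self, num_classes=10, num_attributes=5):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()  # remove the original ImageNet classifier
        self.backbone = backbone
        self.class_head = nn.Linear(feat_dim, num_classes)  # downstream task 1
        self.attr_head = nn.Linear(feat_dim, num_attributes)  # downstream task 2

    def forward(self, x):
        feats = self.backbone(x)  # shared pretrained features
        return self.class_head(feats), self.attr_head(feats)
```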

Last Reviewed: 10/25/2025