Pretrain on ImageNet (~1M images).
Fine-tune on the small-scale downstream data, which is more limited; use a lower learning rate.
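A minimal sketch of the lower-learning-rate idea, using PyTorch parameter groups; the `backbone`/`head` modules here are toy stand-ins for a real pretrained network and its new task layer:

```python
import torch
import torch.nn as nn

# Toy stand-ins: `backbone` plays the role of pretrained layers,
# `head` a newly added layer for the downstream task.
backbone = nn.Linear(16, 16)
head = nn.Linear(16, 10)

# Common recipe: pretrained weights get a ~10-100x smaller learning
# rate than freshly initialized ones, so fine-tuning doesn't destroy
# the learned representations.
optimizer = torch.optim.SGD([
    {"params": backbone.parameters(), "lr": 1e-4},  # pretrained: small lr
    {"params": head.parameters(),     "lr": 1e-2},  # new head: larger lr
], momentum=0.9)

print([g["lr"] for g in optimizer.param_groups])  # → [0.0001, 0.01]
```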
This recipe has been revolutionary across vision, NLP, speech, and robotics.
Large datasets used to be a burden; now they help you learn better representations. Pretraining revolutionized LLMs and vision.
Classification is a common pretraining task.
Don't need to copy all the layers from pretraining; some can be randomly initialized.
Can freeze layers too.
Can do some network surgery: modify some layers, e.g. add new heads for downstream tasks.
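The three points above can be sketched together in PyTorch. This is a toy stand-in (a small `nn.Sequential` instead of a real pretrained model such as a torchvision ResNet): copy the pretrained backbone, freeze it, drop the pretraining classifier, and attach a randomly initialized head for the downstream task:

```python
import torch.nn as nn

# Toy "pretrained" backbone; in practice you'd load real weights
# (e.g. a torchvision model), this just illustrates the surgery.
backbone = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
)
pretrain_head = nn.Linear(64, 1000)  # original 1000-way classifier, discarded

# Freeze the copied layers so gradients don't update them.
for p in backbone.parameters():
    p.requires_grad = False

# Network surgery: replace the pretraining head with a new,
# randomly initialized 10-class head for the downstream task.
model = nn.Sequential(backbone, nn.Linear(64, 10))

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # → ['1.weight', '1.bias'] (only the new head trains)
```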
Last Reviewed: 10/25/2025