BERT
Masks some of the input tokens,
then tries to predict the masked tokens from the surrounding context (masked language modeling).
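Before ChatGPT, it was the most popular representation learning method, for about two years.
(Less efficient than next-token prediction: only the masked positions produce a training signal, while next-token prediction gets one from every position.)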
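A minimal sketch of the masking step, to make the efficiency point concrete. Assumptions not in these notes: the helper name `mask_tokens`, the `[MASK]` placeholder string, whitespace tokenization, and a 15% mask rate (the rate used in the original BERT setup). It is simplified in that every selected token is replaced with `[MASK]`, whereas BERT itself sometimes substitutes a random token or leaves the token unchanged.

```python
import random

MASK = "[MASK]"  # placeholder string, assumed for illustration


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels).

    labels[i] holds the original token at masked positions and None
    elsewhere, so only masked positions would contribute to the loss.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)   # hide the token from the model
            labels.append(tok)    # the model must predict this one
        else:
            masked.append(tok)    # token stays visible
            labels.append(None)   # no prediction target here
    return masked, labels


tokens = "the quick brown fox jumps over the lazy dog".split()
masked, labels = mask_tokens(tokens, mask_prob=0.15, seed=0)
n_targets = sum(label is not None for label in labels)
print(masked)
print(f"{n_targets} of {len(tokens)} positions contribute to the loss")
```

With a 15% mask rate, roughly one position in seven yields a prediction target per pass, whereas next-token prediction gets a target at every position of the same sequence.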
Last Reviewed: 10/25/2025