Mason Wang

BERT

Mask tokens

tries to predict masked tokens.

before chatGPT, most popular representation learning method, for two years

(less efficient than next-token prediction)

Last Reviewed: 10/25/2025