HuBERT

offline clustering step provides frame-aligned target labels for a BERT-like masked prediction loss

applies the prediction loss over the masked regions only, forcing the model to learn a combined acoustic and language model over the continuous inputs
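
A minimal sketch of a loss computed over masked frames only, assuming plain per-frame logits over cluster ids. This is a simplification for illustration: the function name `masked_prediction_loss` and the toy inputs are hypothetical, and the actual HuBERT scoring of frames against cluster targets is not reproduced here.

```python
import math

def masked_prediction_loss(logits, targets, mask):
    """Average cross-entropy over masked frames only.

    logits:  list of per-frame score lists, one score per cluster (assumed form)
    targets: list of per-frame cluster ids from the offline clustering step
    mask:    list of bools, True where the input frame was masked
    """
    total, count = 0.0, 0
    for frame_logits, target, is_masked in zip(logits, targets, mask):
        if not is_masked:
            continue  # unmasked frames contribute nothing to this loss
        # Numerically stable log-sum-exp: subtract the max before exponentiating.
        peak = max(frame_logits)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in frame_logits))
        total += log_norm - frame_logits[target]  # negative log-probability of the target
        count += 1
    return total / count if count else 0.0

# Toy example (made-up numbers): 4 frames, 3 clusters, frames 1 and 2 masked.
logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 5.0, 0.0], [1.0, 1.0, 1.0]]
targets = [0, 2, 1, 0]
mask = [False, True, True, False]
loss = masked_prediction_loss(logits, targets, mask)
```

Because unmasked frames are skipped, the model gets no credit for copying what it can already see; it has to infer the hidden cluster ids from the surrounding context.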

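starts with a simple k-means teacher of 100 clusters (fit on MFCC features in the first iteration); later iterations re-cluster the model's own learned representations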
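
A rough sketch of how a k-means teacher turns feature frames into one cluster id per frame, which is what makes the labels frame-aligned. Everything here is illustrative: the `kmeans` and `nearest` helpers, the evenly spaced initialization, and the 2-D toy frames are assumptions standing in for real acoustic features and a production clustering library.

```python
def nearest(frame, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(frame, c))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))

def kmeans(frames, k, iters=20):
    """Plain Lloyd's algorithm; returns (centroids, per-frame labels)."""
    # Deterministic stand-in for random or k-means++ initialization:
    # take k evenly spaced frames as the starting centroids.
    centroids = [list(frames[i * len(frames) // k]) for i in range(k)]
    for _ in range(iters):
        # Assignment step: each frame goes to its nearest centroid.
        labels = [nearest(f, centroids) for f in frames]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    # Final assignment against the last centroids gives the training targets.
    labels = [nearest(f, centroids) for f in frames]
    return centroids, labels

# Toy 2-D "frames" (made-up values) forming two well-separated groups.
frames = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centroids, labels = kmeans(frames, k=2)
```

The returned `labels` list has one entry per input frame, so it can be used directly as the per-frame targets for masked prediction.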

matches or improves on wav2vec 2.0 on the LibriSpeech (960h) and Libri-light (60,000h) benchmarks

Last Reviewed: 10/31/25