Language Modeling from Scratch
BPE (byte-pair encoding)
Few-Shot/Zero-Shot Generalization
Scaling Laws: plotting loss against parameters, data, or training time gives an approximately straight line on log-log axes, i.e. a power law
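A straight line on log-log axes means the loss follows a power law, loss ~= a * x^(-alpha), so the exponent can be recovered with an ordinary least-squares fit on (log x, log loss). The sketch below illustrates that idea only. The function name `fit_power_law` and the values of `params`, `true_a`, and `true_alpha` are assumptions made up for the example, not measurements or anything taken from these notes.

```python
import math


def fit_power_law(xs, losses):
    """Fit loss ~= a * x**(-alpha) by least squares on (log x, log loss).

    Returns (a, alpha). Hypothetical helper for illustration.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in losses]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope and intercept of the best-fit line in log-log space.
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A line log(loss) = intercept + slope * log(x) is the power law
    # loss = exp(intercept) * x**slope, so alpha = -slope.
    return math.exp(intercept), -slope


# Synthetic, illustrative data only: an exact power law in parameter count.
params = [1e6, 1e7, 1e8, 1e9]
true_a, true_alpha = 10.0, 0.08
losses = [true_a * p ** (-true_alpha) for p in params]

a, alpha = fit_power_law(params, losses)
```

Because the synthetic data is an exact power law, the fit returns `true_a` and `true_alpha` up to floating-point error. Real training curves are noisy and usually flatten toward an irreducible loss, so a fit like this is only a first approximation.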
Last Reviewed: 6/1/24