PAC-Bayes

Countable hypothesis bound.

\[R(h) \leq \hat{R}(h) + \Delta \sqrt{\frac{\log \frac{1}{P(h)} + \log \frac{1}{\delta}}{2n}}\]
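
A minimal sketch for plugging numbers into the bound above. It assumes the usual reading of the symbols, which the notes do not spell out: \(\hat{R}(h)\) is the empirical risk on \(n\) i.i.d. samples, \(P(h)\) is the prior probability assigned to \(h\) before seeing data, \(\delta\) is the failure probability, and \(\Delta\) is the range of the loss. The function name and argument names are hypothetical.

```python
import math


def countable_hypothesis_bound(emp_risk, prior_prob, n, delta=0.05, loss_range=1.0):
    """Right-hand side of the countable hypothesis bound.

    Assumed reading of the symbols (not stated in the notes):
      emp_risk   -- empirical risk R_hat(h)
      prior_prob -- prior probability P(h), fixed before seeing the data
      n          -- number of i.i.d. training samples
      delta      -- failure probability of the bound
      loss_range -- Delta, the width of the interval the loss lives in
    """
    if not 0.0 < prior_prob <= 1.0:
        raise ValueError("prior_prob must be in (0, 1]")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    if n <= 0:
        raise ValueError("n must be positive")

    # log(1/P(h)) + log(1/delta): the complexity term under the square root
    complexity = math.log(1.0 / prior_prob) + math.log(1.0 / delta)
    return emp_risk + loss_range * math.sqrt(complexity / (2.0 * n))


if __name__ == "__main__":
    # Hypothetical numbers: a hypothesis with prior mass 2^-10, 10,000 samples.
    print(countable_hypothesis_bound(emp_risk=0.1, prior_prob=2 ** -10, n=10_000))
```

Halving the prior mass of \(h\) adds \(\log 2\) to the numerator, so the penalty grows only with the square root of the log-prior.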

For a bad prior, the trained model \(h\) gets very little prior mass \(P(h)\), so \(\log \frac{1}{P(h)}\) is large and the bound becomes vacuous.
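
A rough numerical illustration of this point, with made-up numbers. It assumes the loss takes values in \([0, \Delta]\), so any bound at or above \(\Delta\) says nothing. The bound is computed from \(-\log P(h)\) in nats rather than from \(P(h)\) itself, since \(P(h)\) for a large model would underflow a float. All names and values here are hypothetical.

```python
import math


def risk_bound(emp_risk, neg_log_prior, n, delta=0.05, loss_range=1.0):
    """Countable hypothesis bound, parameterised by -log P(h) in nats."""
    complexity = neg_log_prior + math.log(1.0 / delta)
    return emp_risk + loss_range * math.sqrt(complexity / (2.0 * n))


def is_vacuous(bound, loss_range=1.0):
    """Assuming the loss lies in [0, loss_range], a bound >= loss_range is trivial."""
    return bound >= loss_range


if __name__ == "__main__":
    n = 50_000
    emp_risk = 0.1

    # Hypothetical values of -log P(h) for the same trained model under two priors.
    good_prior_nll = 100.0  # prior concentrates near the trained model
    bad_prior_nll = 1e6     # prior spreads its mass far from the trained model

    for name, nll in [("good prior", good_prior_nll), ("bad prior", bad_prior_nll)]:
        b = risk_bound(emp_risk, nll, n)
        print(f"{name}: bound = {b:.3f}, vacuous = {is_vacuous(b)}")
```

With \(n\) fixed, the bound stays below \(\Delta\) only while \(\log \frac{1}{P(h)} + \log \frac{1}{\delta}\) is small relative to \(2n\).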

Why does pretraining work so well?

fine-tuning

maybe D

Last Reviewed 10/7/25