PAC Bayes
Countable hypothesis bound.
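\[R(h) \leq \hat{R}(h) + \Delta \sqrt{\frac{\log \frac{1}{P(h)} + \log \frac{1}{\delta}}{2n}}\]
Here \(\hat{R}(h)\) is the empirical risk on n samples, \(\Delta\) is the range of the loss, \(P(h)\) is the prior mass on hypothesis h, and the bound holds with probability at least \(1 - \delta\).
With a bad prior, the trained model gets very little prior mass, so \(\log \frac{1}{P(h)}\) is large and the bound becomes vacuous.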
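One standard route to a bound of this form is Hoeffding's inequality plus a weighted union bound. This is a sketch, assuming n i.i.d. samples, a loss bounded in an interval of length \(\Delta\), and a prior fixed before seeing the data; the per-hypothesis slack \(\epsilon_h\) is notation introduced here. Give each h a failure budget of \(\delta P(h)\):
\[\Pr\left[R(h) - \hat{R}(h) > \epsilon_h\right] \leq \exp\left(-\frac{2n\epsilon_h^2}{\Delta^2}\right) = \delta P(h) \quad\Rightarrow\quad \epsilon_h = \Delta \sqrt{\frac{\log \frac{1}{P(h)} + \log \frac{1}{\delta}}{2n}}\]
Summing the budgets over all h gives \(\sum_h \delta P(h) \leq \delta\), so the bound holds for every h at once with probability at least \(1 - \delta\).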
- decide priors
- train a model
- measure empirical risk
- calculate P(h), decide gamma, plug in.
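The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `countable_hypothesis_bound`, its parameters, and the example numbers are all hypothetical. It assumes the confidence parameter is the \(\delta\) from the bound and takes the prior as a natural log, `log_prior = log P(h)`, to avoid underflow for tiny prior masses.

```python
import math


def countable_hypothesis_bound(emp_risk, log_prior, n, delta=0.05, loss_range=1.0):
    """Upper bound on the true risk R(h) from the countable hypothesis bound.

    emp_risk   -- empirical risk of the trained model on n samples
    log_prior  -- natural log of the prior mass P(h), so it must be <= 0
    n          -- number of i.i.d. samples used to measure emp_risk
    delta      -- allowed failure probability (bound holds w.p. >= 1 - delta)
    loss_range -- Delta, the width of the interval the loss lives in
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    if log_prior > 0.0:
        raise ValueError("log_prior must be <= 0 because P(h) <= 1")

    # log(1/P(h)) + log(1/delta)
    complexity = -log_prior + math.log(1.0 / delta)
    return emp_risk + loss_range * math.sqrt(complexity / (2.0 * n))


if __name__ == "__main__":
    n = 10_000
    # Hypothetical good prior: the trained model has prior mass 2^-20.
    good = countable_hypothesis_bound(0.10, -20 * math.log(2), n)
    # Hypothetical bad prior: the trained model has prior mass 2^-100000.
    bad = countable_hypothesis_bound(0.10, -100_000 * math.log(2), n)
    print(f"good prior: R(h) <= {good:.3f}")
    print(f"bad prior:  R(h) <= {bad:.3f}")
```

With a 0-1 loss, any value above 1 is vacuous, which is what happens in the bad-prior case: the \(\log \frac{1}{P(h)}\) term swamps the 2n in the denominator.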
Why does pretraining work so well?
fine-tuning
maybe D
Last Reviewed 10/7/25