Mason Wang

PAC Bayes

Countable hypothesis bound.

\[R(h) \leq \hat{R}(h) + \Delta \sqrt{\frac{\log \frac{1}{P(h)} + \log \frac{1}{\delta}}{2n}}\]
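A short sketch of where this comes from, assuming the loss takes values in an interval of width \(\Delta\), the \(n\) samples are i.i.d., and the prior \(P\) is fixed before seeing the data (the per-hypothesis budget \(\delta_h\) is notation introduced here). Apply Hoeffding's inequality to a single \(h\) with failure probability \(\delta_h = \delta P(h)\):

\[\Pr\left[R(h) > \hat{R}(h) + \Delta \sqrt{\frac{\log \frac{1}{\delta P(h)}}{2n}}\right] \leq \delta P(h)\]

A union bound over the countable hypothesis class then gives a total failure probability of at most

\[\sum_{h} \delta P(h) \leq \delta\]

so the bound holds for all \(h\) simultaneously with probability at least \(1 - \delta\), and \(\log \frac{1}{\delta P(h)} = \log \frac{1}{P(h)} + \log \frac{1}{\delta}\).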

With a bad prior, the trained model h receives very little prior mass, so log 1/P(h) is large and the bound becomes vacuous.

  1. decide the prior P over hypotheses (before seeing the training data)
  2. train a model
  3. measure empirical risk
  4. calculate P(h), choose delta, plug into the bound.
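The steps above can be sketched numerically. This is a minimal illustration, not a definitive implementation: the function name `occam_bound`, its parameters, and the example numbers (a hypothesis costing 20 bits versus 100,000 bits under the prior) are all assumptions made for this sketch. It takes log 1/P(h) directly, in nats, since P(h) itself would underflow for realistic models.

```python
import math


def occam_bound(emp_risk, neg_log_prior, n, delta=0.05, loss_range=1.0):
    """Right-hand side of the countable hypothesis bound.

    emp_risk:      empirical risk of the trained hypothesis h
    neg_log_prior: log(1 / P(h)) in nats (hypothetical input; e.g. bits * ln 2)
    n:             number of i.i.d. samples used to measure emp_risk
    delta:         allowed failure probability
    loss_range:    width of the interval the loss lives in (Delta in the bound)
    """
    if n <= 0 or not (0.0 < delta < 1.0) or neg_log_prior < 0.0:
        raise ValueError("need n > 0, 0 < delta < 1, and log(1/P(h)) >= 0")
    complexity = neg_log_prior + math.log(1.0 / delta)
    return emp_risk + loss_range * math.sqrt(complexity / (2 * n))


# Illustrative numbers only: same empirical risk and sample size, two priors.
n = 10_000
good = occam_bound(emp_risk=0.05, neg_log_prior=20 * math.log(2), n=n)
bad = occam_bound(emp_risk=0.05, neg_log_prior=100_000 * math.log(2), n=n)

print(f"good prior: R(h) <= {good:.3f}")
print(f"bad prior:  R(h) <= {bad:.3f}")  # exceeds 1, so vacuous for a 0-1 loss
```

With a loss in [0, 1], any bound above 1 says nothing, which is the failure mode of a prior that assigns the trained model almost no mass.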

Why does pretraining work so well?

fine-tuning

maybe D

Last Reviewed 10/7/25