KL-Divergence
D_KL(P||Q): symbols are drawn from P, but we encode them with a code built for Q. How many extra bits do we use in expectation? D_KL(P||Q) = E_{x~P}[log P(x) - log Q(x)] (log base 2 gives bits). Asymmetric b/c the expectation is taken under the distribution you're sampling from, so D_KL(P||Q) != D_KL(Q||P) in general - two examples (Should read more)
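A quick numeric check of the asymmetry, as a minimal sketch: the function name `kl_divergence_bits` and the two toy distributions are made up for illustration, and the distributions are assumed to be discrete, given as probability lists over the same outcomes.

```python
import math

def kl_divergence_bits(p, q):
    """D_KL(P || Q) in bits for discrete distributions given as probability lists."""
    total = 0.0
    for p_x, q_x in zip(p, q):
        if p_x == 0:
            continue  # outcomes P never produces contribute nothing
        if q_x == 0:
            return math.inf  # P produces an outcome that Q says is impossible
        total += p_x * math.log2(p_x / q_x)
    return total

# Hypothetical example distributions (not from the notes)
p = [0.5, 0.5]  # fair coin
q = [0.9, 0.1]  # heavily biased coin

forward = kl_divergence_bits(p, q)  # sample from P, encode as if Q: about 0.74 bits
reverse = kl_divergence_bits(q, p)  # sample from Q, encode as if P: about 0.53 bits
print(forward, reverse)
```

The two directions give different values because each one averages the log-ratio under a different sampling distribution.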
Intuition: how much more surprised you'd be, on average, seeing samples from P while expecting Q
Last Reviewed: 10/27/24
Reference Page: #2