Cross Entropy
-∫ p(x) log(q(x)) dx
Entropy is: -∫ p(x) log(p(x)) dx. Call this Ent(p, p).
Cross Entropy is: -∫ p(x) log(q(x)) dx. Call this Ent(p, q).
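A minimal numeric sketch of the two definitions for discrete distributions (the values of p and q below are made up for illustration; natural log is used, so the results are in nats):

```python
import numpy as np

# Illustrative discrete distributions over the same 3-outcome support (made-up values).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

def entropy(p):
    # Ent(p, p) = -sum_x p(x) log p(x)
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    # Ent(p, q) = -sum_x p(x) log q(x)
    return -np.sum(p * np.log(q))

print(entropy(p))          # ~1.030 nats
print(cross_entropy(p, q)) # ~1.055 nats; >= entropy(p), equal only when q == p
```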
KL Divergence is:
-∫ p(x) log(q(x)) dx - (-∫ p(x) log(p(x)) dx), or: -∫ p(x) log(q(x)) dx + ∫ p(x) log(p(x)) dx
This is Ent(p, q) - Ent(p, p).
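A small follow-on sketch (same made-up p and q as above) checking this decomposition numerically, with base-2 logs so the values are in bits, which matches the coding interpretation below:

```python
import numpy as np

# Same illustrative distributions as above.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

# Base-2 logs, so every quantity is measured in bits.
ent_pp = -np.sum(p * np.log2(p))        # Ent(p, p): entropy
ent_pq = -np.sum(p * np.log2(q))        # Ent(p, q): cross entropy
kl_direct = np.sum(p * np.log2(p / q))  # KL(p || q) computed from its own formula

print(ent_pq - ent_pp)  # ~0.036 extra bits per sample
print(kl_direct)        # same value: KL(p || q) = Ent(p, q) - Ent(p, p)
```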
When we add KL divergence and entropy, we get cross entropy.

Cross entropy = number of bits it takes to encode samples from P using an encoding trained on Q.
Entropy = number of bits it takes to encode samples from P using an encoding trained on P.
KL Divergence = number of extra bits it takes to encode samples from P using an encoding trained on Q. Or, KL divergence is cross entropy minus entropy.

Last Reviewed: 1/20/25