# Information Theory
- Information of an event x is I(x) = -log P(x).
- Independent events have additive information: I(x, y) = I(x) + I(y).
- Less likely events carry more information; an event that is certain carries 0.
- Knowing the outcome of an event with 50% probability gives exactly 1 bit.
- Measured in nats (base-e log) or bits (base-2 log); recall logs of all bases are proportional, so the choice only rescales.
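A quick numeric check of the bullets above (a minimal sketch; `info_bits` is just a helper name, not from the note):

```python
import math

def info_bits(p: float) -> float:
    """Self-information -log2(p) of an event with probability p, in bits."""
    return -math.log2(p)

# A fair coin flip (p = 0.5) carries exactly 1 bit.
print(info_bits(0.5))                 # 1.0
# A certain event (p = 1) carries 0 bits.
print(info_bits(1.0))                 # 0.0
# Independent events: probabilities multiply, so information adds.
p, q = 0.5, 0.25
print(info_bits(p * q))               # 3.0
print(info_bits(p) + info_bits(q))    # 1.0 + 2.0 = 3.0
```

The additivity falls directly out of log(pq) = log p + log q, which is the reason log shows up in the definition at all.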
Setup: a bitstream encodes a sequence of random variables. The prefix-free requirement imposes a cost: a codeword of length l uses up a 2^-l fraction of the code space, so the lengths must satisfy the Kraft inequality, sum of 2^-l_i <= 1.
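The 2^-l budget can be checked directly (a sketch; `kraft_sum` is a hypothetical helper, and the example codes are standard illustrations, not from the note):

```python
def kraft_sum(lengths):
    """Each codeword of length l consumes 2**-l of the prefix 'budget';
    a prefix-free binary code with these lengths exists iff the sum <= 1."""
    return sum(2.0 ** -l for l in lengths)

# Lengths {1, 2, 3, 3} exactly exhaust the budget
# (realized by the prefix-free code 0, 10, 110, 111).
print(kraft_sum([1, 2, 3, 3]))   # 1.0
# Lengths {1, 1, 2} overspend it: no prefix-free code can have these lengths.
print(kraft_sum([1, 1, 2]))      # 1.25
```

This is why rare events can afford long codewords while common events should get short ones: short codewords are expensive in budget terms.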
Think about information in terms of signals: a prior distribution over which signal was sent lets you encode it more efficiently. Processing the signal through neural network layers can only lose information, i.e. expand the set of possible signals it could have been (this is the data processing inequality).
How spread out this distribution of possible signals is, i.e. how uncertain you are about the outcome, is its entropy: H(X) = E[-log P(x)] = -sum over x of P(x) log P(x).
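Entropy as "how spread the distribution is" can be made concrete (a minimal sketch; `entropy_bits` is just an illustrative name):

```python
import math

def entropy_bits(probs):
    """H(X) = -sum p * log2(p): the average self-information, in bits.
    Terms with p = 0 contribute nothing (lim p->0 of p*log p is 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))   # 1.0  (fair coin: maximally spread over 2 outcomes)
print(entropy_bits([0.9, 0.1]))   # ~0.469 (peaked distribution: less uncertain)
print(entropy_bits([1.0]))        # 0.0  (certain outcome: zero entropy)
print(entropy_bits([0.25] * 4))   # 2.0  (uniform over 4 outcomes)
```

Note the pattern: for a fixed number of outcomes, the uniform distribution maximizes entropy, and any peaking of the distribution lowers it.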
Last Reviewed: 10/27/24. Reference Sheet #3, 3.1