A Short Introduction to Entropy, Cross-Entropy and KL-Divergence - YouTube
Motivation. Suppose a weather station is sending you information about the current weather. There is a 50% chance of sun and a 50% chance of rain. Then the weather station can send you just one bit to summarize the weather: 1 if sunny, 0 if rainy.
def. Shannon Information. Given a random variable $X$, the information given by a particular realization $x$ of $X$ is:

$$I(x) = -\log_b P(x)$$

where $b$, the base, determines the units (either bits when $b = 2$ or nats when $b = e$).
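To make the definition concrete, here is a minimal Python sketch (the function name `self_information` is an illustrative choice, not from the video) that computes $I(x)$ for a given outcome probability and base:

```python
import math

def self_information(p: float, base: float = 2) -> float:
    """Shannon information -log_b(p) of an outcome with probability p."""
    return -math.log(p, base)

# A 50% outcome carries exactly 1 bit of information.
print(self_information(0.5))           # 1.0 bit
# Rarer outcomes carry more information.
print(self_information(0.125))         # 3.0 bits
# The same outcome measured in nats (base e).
print(self_information(0.5, math.e))   # ~0.693 nats
```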
def. Shannon Entropy. Given a random variable $X$, the entropy of this random variable is:

$$H(X) = \mathbb{E}[I(X)] = -\sum_x P(x) \log_b P(x)$$
Intuition. Entropy is the average amount of information transmitted per realization of $X$.
Example. In our weather station example, $X$ is the random variable:

$$P(X = \text{sun}) = 0.5, \quad P(X = \text{rain}) = 0.5$$

When $X = \text{sun}$ is transmitted, $I(\text{sun}) = -\log_2 0.5 = 1$ bit of information is received; same when $X = \text{rain}$ is transmitted. The entropy of $X$ is:

$$H(X) = -\left(0.5 \log_2 0.5 + 0.5 \log_2 0.5\right) = 1 \text{ bit}$$
which matches our intuition that entropy is the average amount of information transmitted.
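As a quick numerical check, a short Python sketch averaging the information over the distribution reproduces this result (the helper name `entropy` is an assumption, not from the video):

```python
import math

def entropy(probs, base: float = 2) -> float:
    """Shannon entropy: expected information, -sum p * log_b(p)."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Weather station: 50% sun, 50% rain -> 1 bit on average.
print(entropy([0.5, 0.5]))   # 1.0
# A more skewed forecast is less surprising on average.
print(entropy([0.9, 0.1]))   # ~0.469
```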
def. Cross Entropy.