A Short Introduction to Entropy, Cross-Entropy and KL-Divergence - YouTube

Motivation. Suppose a weather station is sending you information about the current weather. There is a 50% chance of sun and a 50% chance of rain. The weather station can then convey this information with a single bit: 1 if sunny, 0 if rainy.

def. Shannon Information.[1] Given a random variable $X$, the information carried by a particular realization $x$ of $X$ is:

$$I(x) = -\log_b p(x), \qquad p(x) = P(X = x)$$

where $b$, the base, determines the units (bits when $b = 2$, nats when $b = e$).
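A minimal Python sketch of this definition follows. It assumes an outcome is described only by its probability $p(x)$. The function name `information` is a hypothetical choice, not taken from the source. The function raises an error for $p(x) = 0$, where the information is undefined (infinite).

```python
import math


def information(p: float, base: float = 2.0) -> float:
    """Shannon information I(x) = -log_b p(x) of an outcome with probability p.

    base=2 gives bits, base=math.e gives nats. The name `information` is
    illustrative only.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]; I(x) is infinite when p(x) = 0")
    return -math.log(p) / math.log(base)


print(information(0.5))             # 1.0 bit: a fair sun/rain outcome
print(round(information(0.25), 6))  # 2.0 bits: rarer outcomes carry more information
print(information(0.5, math.e))     # the same outcome measured in nats (ln 2)
```

Halving the probability of an outcome adds one bit of information, which is why the logarithm appears in the definition.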

def. Shannon Entropy.[2] Given a random variable $X$, the entropy of $X$ is:

$$H(X) = \mathbb{E}\left[I(X)\right] = -\sum_{x} p(x) \log_b p(x)$$

Intuition. Entropy is the expected (average) amount of information carried by each transmitted message.
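The entropy definition translates directly into code. The sketch below assumes a discrete distribution given as a list of probabilities, and `entropy` is a hypothetical name. It computes $H(X)$ as the probability-weighted average of $-\log_b p(x)$, which is the "average information" reading above. Zero-probability outcomes are skipped, following the usual convention $0 \log 0 = 0$.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float], base: float = 2.0) -> float:
    """Shannon entropy H(X) = -sum_x p(x) log_b p(x) of a discrete distribution.

    `probs` lists P(X = x) for every outcome x. Terms with p(x) = 0 are
    dropped, using the convention 0 * log 0 = 0.
    """
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probs must be non-negative and sum to 1")
    # Average of the per-outcome information -log_b p(x), weighted by p(x).
    return sum(p * -math.log(p) / math.log(base) for p in probs if p > 0)


print(entropy([0.5, 0.5]))  # 1.0 bit
print(entropy([0.9, 0.1]))  # roughly 0.47 bits: a more predictable source
print(entropy([0.25] * 4))  # about 2 bits for four equally likely outcomes
```

The 0.9/0.1 split is a hypothetical distribution, used only to show that a more predictable source carries less information on average.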

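Example. In our weather station example, $X$ is the random variable:

$$X = \begin{cases} 1 & \text{if sunny, with probability } 0.5 \\ 0 & \text{if rainy, with probability } 0.5 \end{cases}$$

When $X = 1$ is transmitted, it carries $I(1) = -\log_2 0.5 = 1$ bit of information; the same holds when $X = 0$ is transmitted. The entropy of $X$ is:

$$H(X) = -0.5 \log_2 0.5 - 0.5 \log_2 0.5 = 0.5 + 0.5 = 1 \text{ bit}$$

This matches our intuition that entropy is the average amount of information transmitted: every message carries exactly 1 bit, so the average is 1 bit. That is also the single bit the station needed in the motivating example.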
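A few lines of Python can check the arithmetic of the weather-station example. This sketch hard-codes the 50/50 distribution from the example. It computes the information of each outcome and then their probability-weighted average. The variable names are illustrative.

```python
import math

# Weather-station distribution from the example: P(sunny) = P(rainy) = 0.5.
weather = {"sunny": 0.5, "rainy": 0.5}

# Information carried by each transmitted symbol, in bits.
info = {outcome: -math.log2(p) for outcome, p in weather.items()}

# Entropy = probability-weighted average of the per-outcome information.
H = sum(p * info[outcome] for outcome, p in weather.items())

print(info)  # {'sunny': 1.0, 'rainy': 1.0}
print(H)     # 1.0
```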

def. Cross Entropy.


  1. Information content - Wikipedia

  2. Entropy (information theory) - Wikipedia