One of the fundamental drives of Mathematics is to study the world around us and to shed some light on the underlying processes that govern it. Sometimes this approach leads to surprising results: what seems like a simple task can turn out to be impossible. Sometimes, however, just the opposite is true.

Unpredictability is, by its very nature, well, unpredictable. You might expect a concept like unpredictability to be a nightmare for Mathematicians, a spanner in the works of airtight proofs and certain knowledge. But Mathematics is not to be beaten by anything in this world, and it has found a way to measure unpredictability.

Enter ‘Entropy’. Informally defined as a *measure of the unpredictability of information content*, entropy is an important concept in the branch of Mathematics known as Information Theory. Entropy can be understood intuitively when viewed through the lens of analyzing information. We all know that the outcome of rolling a die is harder to predict than that of flipping a coin. This can be seen as a consequence of there being more information to consider when rolling a die (six equally likely outcomes) than in the relatively simple coin flip (two equally likely outcomes). Thus flipping a fair coin has lower entropy than rolling a fair die.

Ideas have little place in Mathematics without rigour, so how exactly do we formally define Entropy? As seems natural, the entropy of a random variable X is written using probability notation:

$$H(X) = \mathbb{E}[I(X)] = \mathbb{E}[-\log_2 P(X)] = -\sum_{x} P(x)\log_2 P(x)$$

Here H(X) is the entropy, P(X) is the probability mass function of X, and I(X) is the ‘information content’ of X. For now, we can treat I(X) as fancy terminology for the expression it stands for, -log₂ P(X); indeed, information content is sometimes loosely used as a synonym for Entropy, although there are subtle differences between the two. Taking logarithms to base 2 gives entropy in bits.
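To make the formula concrete, here is a minimal Python sketch of H(X) = -Σ P(x) log₂ P(x) for a discrete random variable supplied as a list of outcome probabilities. The function name `entropy` is an illustrative choice, not anything from a particular library. Applied to a fair coin and a fair die, it reproduces the coin-versus-die comparison from earlier.

```python
import math

def entropy(probs, base=2):
    """Shannon entropy -sum(p * log(p)) of a probability mass function.

    `probs` is a sequence of outcome probabilities that should sum to 1.
    Zero-probability outcomes are skipped, following the convention
    that p * log(p) -> 0 as p -> 0.
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0)

coin = [1/2, 1/2]   # fair coin: two equally likely outcomes
die = [1/6] * 6     # fair die: six equally likely outcomes

print(f"fair coin: {entropy(coin):.3f} bits")
print(f"fair die:  {entropy(die):.3f} bits")
```

The coin comes out at exactly 1 bit and the die at log₂ 6, roughly 2.585 bits, matching the intuition that the die is harder to predict.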

Let’s see this formula in action.

Again consider a coin flip, but this time with a biased coin. Plotting the entropy H(X) against the probability of the coin landing heads, we obtain the following graph:

From this graph we can see that H(X) is maximal, equal to one bit (the unit of entropy when logarithms are taken to base 2), when flipping a fair coin. H(X) is minimal, equal to zero bits, when the coin is so biased that every flip has the same outcome. This makes sense: when the result is predetermined there is no information to consider, so a totally biased coin has zero entropy. A fair coin is the opposite extreme. Both results are equally likely, so there is no way to favour one over the other, and it is natural that the entropy is maximal in this case.
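The curve described above is the binary entropy function, H(p) = -p log₂ p - (1 - p) log₂(1 - p), where p is the probability of heads. The short Python sketch below (the name `binary_entropy` is illustrative) tabulates it for a few biases, so you can see it rise from 0 bits at p = 0 to 1 bit at p = 0.5 and fall back symmetrically to 0 at p = 1.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a coin that lands heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # the outcome is certain, so there is no uncertainty
    q = 1.0 - p     # probability of tails
    return -(p * math.log2(p) + q * math.log2(q))

# Sweep a few biases, from "always tails" to "always heads"
for p in [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]:
    print(f"P(heads) = {p:.2f}  ->  H(X) = {binary_entropy(p):.3f} bits")
```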

One application of Information Theory is determining how to encode a language into binary in the most efficient way. Put simply, assigning short sequences of 0s and 1s to the most commonly appearing letters or phrases in a language means less binary code has to be transmitted. The entropy of a language gives the minimum average number of bits per character that any lossless encoding of it can achieve, which indicates how hard the language is to transmit. The English language has been estimated to require between 0.6 and 1.3 bits per character.
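The post does not name a particular encoding scheme. Huffman coding is one standard way to realise the idea of giving frequent symbols short codes, so the Python sketch below uses it as an assumed stand-in. It builds a Huffman code from the letter counts of a toy sentence (both the sentence and the `huffman_code` helper are illustrative, not real data or a library API) and compares the code's average length with the entropy of the letter frequencies. For a Huffman code, the average length is known to lie between H and H + 1 bits per symbol.

```python
import heapq
import math
from collections import Counter

def huffman_code(freqs):
    """Build a prefix-free binary code (Huffman coding) from symbol counts.

    Returns a dict mapping each symbol to its bit string; frequent
    symbols end up with shorter codes.
    """
    # Heap entries: (total count, unique tie-breaker, {symbol: code so far})
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # a lone symbol still needs one bit per occurrence
        return {sym: "0" for sym in freqs}
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two rarest groups, prefixing 0 to one and 1 to the other
        c1, _, codes1 = heapq.heappop(heap)
        c2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]

text = "this is just a toy example of some english text"  # illustrative sample
freqs = Counter(text)
n = len(text)

codes = huffman_code(freqs)
avg_bits = sum(freqs[s] * len(codes[s]) for s in freqs) / n
h = -sum((c / n) * math.log2(c / n) for c in freqs.values())

print(f"entropy:           {h:.3f} bits/char")
print(f"Huffman average:   {avg_bits:.3f} bits/char")
print(f"fixed-length code: {math.ceil(math.log2(len(freqs)))} bits/char")
```

Because this sketch only counts single letters and ignores context, its entropy figure will sit well above the 0.6 to 1.3 bits per character quoted for English, which also accounts for how predictable each letter is given the text that precedes it.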

Entropy is a fantastic little tool in Mathematics’ arsenal, showing that unpredictability is, well, quite predictable.