Cameron Gordon

Telephone and the Data Processing Inequality

A basic rule of mathematics is that it should seek to illuminate, not obscure. A complicated concept should be understandable by a grandmother or a preschooler. In this spirit, here is a primer on one of the most powerful concepts in information theory, the data processing inequality, using the game of Telephone.


In Telephone, a message (e.g. "This is a farcical example") is whispered down a line of people, each relying on their best interpretation of what the person before them said. After a few people the message begins to get garbled ("This is a fart icicle sample"), and before long it turns into complete nonsense.*
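To make the garbling concrete, here is a toy simulation in Python. It is a sketch under a deliberately crude assumption: each listener independently mishears each character with a fixed probability (the hypothetical parameter `p_error`) and substitutes a random letter or space, whereas real mishearing is phonetic. The names `whisper`, `play_telephone` and `fraction_intact` are illustrative only.

```python
import random
import string

# Characters a listener might "hear" instead of the real one (lowercase only,
# so the example message is written in lowercase too).
ALPHABET = string.ascii_lowercase + " "


def whisper(message: str, p_error: float, rng: random.Random) -> str:
    """Pass the message to the next person: each character is misheard
    (replaced by a random letter or space) with probability p_error."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < p_error else ch
        for ch in message
    )


def play_telephone(message: str, n_people: int, p_error: float = 0.05,
                   seed: int = 0) -> list[str]:
    """Return the message as held by each person in the line, person 1 first.
    Each person only ever hears the person directly before them."""
    rng = random.Random(seed)
    heard = [message]
    for _ in range(n_people - 1):
        heard.append(whisper(heard[-1], p_error, rng))
    return heard


def fraction_intact(original: str, received: str) -> float:
    """Share of character positions that still match the original message."""
    return sum(a == b for a, b in zip(original, received)) / len(original)


if __name__ == "__main__":
    line = play_telephone("this is a farcical example", n_people=20)
    for i, msg in enumerate(line, start=1):
        print(f"person {i:2d}: {msg!r}  intact={fraction_intact(line[0], msg):.2f}")
```

The share of intact characters tends to drift downward along the line; the exact garbling depends on the seed, and a lucky substitution can occasionally restore a character by chance.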


Why does this happen? We can explain it through the data processing inequality. In short, the information retained from the original message can never increase, and typically decreases, as the number of people grows.


A line of people is a Markov chain: a sequence of nodes in which each node's state depends only on the node directly before it. Each person's message can be described by the information it contains, and the information two nodes share (how much knowing one person's message tells you about the other's) is known as their mutual information.
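The standard definition, for discrete random variables A and B, is I(A; B) = Σ p(a, b) log2[p(a, b) / (p(a) p(b))], measured in bits. Below is a minimal Python sketch of that formula, assuming the joint distribution is supplied as a table `joint[a][b]` (a representation chosen here for illustration) and shrinking each person's "message" to a single bit.

```python
import math


def mutual_information(joint: list[list[float]]) -> float:
    """I(A; B) in bits, given joint[a][b] = P(A = a, B = b)."""
    p_a = [sum(row) for row in joint]          # marginal of A (row sums)
    p_b = [sum(col) for col in zip(*joint)]    # marginal of B (column sums)
    total = 0.0
    for a, row in enumerate(joint):
        for b, p_ab in enumerate(row):
            if p_ab > 0:  # terms with zero probability contribute nothing
                total += p_ab * math.log2(p_ab / (p_a[a] * p_b[b]))
    return total


if __name__ == "__main__":
    # Person 2 always repeats person 1's bit correctly: they share 1 bit.
    perfect = [[0.5, 0.0], [0.0, 0.5]]
    # Person 2 ignores person 1 entirely: they share nothing.
    independent = [[0.25, 0.25], [0.25, 0.25]]
    # Person 2 mishears 10% of the time: somewhere in between.
    noisy = [[0.45, 0.05], [0.05, 0.45]]
    print(mutual_information(perfect))      # 1.0
    print(mutual_information(independent))  # 0.0
    print(mutual_information(noisy))        # about 0.531
```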


Because each node depends only on the previous one for the message (which it transmits as faithfully as it can), information can be lost at each step but never regained. The mutual information between person 1 and person N is therefore less than or equal to that between person 1 and person 2.

Loosely speaking, what person 1 and person 2 have in common is "This is a farcical _ample." It only gets worse as the chain lengthens.
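The claim that the mutual information between person 1 and person N can only shrink can be checked numerically in a simplified model. This is a sketch assuming each person's message is a single fair bit and every whisper independently flips it with the same probability p (a binary symmetric channel, a textbook stand-in rather than anything specific to Telephone). Because person 1's bit is uniform, I(person 1; person N) = 1 - H(q), where q is the probability that the bit has flipped an odd number of times by person N and H is the binary entropy. The function names are illustrative.

```python
import math


def binary_entropy(q: float) -> float:
    """H(q) in bits for a biased coin with P(heads) = q."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def end_to_end_flip_prob(p: float, n_people: int) -> float:
    """Probability that person n_people holds the opposite bit to person 1,
    when each of the n_people - 1 whispers flips the bit with probability p."""
    q = 0.0  # person 1 trivially agrees with themselves
    for _ in range(n_people - 1):
        # Flipped after this whisper if it was already flipped and survives,
        # or was correct and now gets flipped.
        q = q * (1 - p) + (1 - q) * p
    return q


def mi_first_and_nth(p: float, n_people: int) -> float:
    """I(person 1; person n_people) in bits, assuming person 1's bit is a fair coin."""
    return 1.0 - binary_entropy(end_to_end_flip_prob(p, n_people))


if __name__ == "__main__":
    for n in (1, 2, 3, 5, 10, 20):
        print(f"I(person 1; person {n:2d}) = {mi_first_and_nth(0.1, n):.3f} bits")
```

With p = 0.1 the shared information starts at 1 bit (person 1 compared with themselves), drops to about 0.531 bits for person 2, and keeps shrinking toward zero as the line grows; in this model it never goes back up.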

This is the data processing inequality. More formally, for a Markov chain X → Y → Z, and denoting the mutual information between two nodes A and B as I(A; B), we have I(X; Y) ≥ I(X; Z).
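A slightly more general check is a Python sketch that draws random Markov chains X → Y → Z over a small alphabet and confirms I(X; Y) ≥ I(X; Z) each time. The starting distribution and both channels are arbitrary random tables built by the illustrative helpers `random_distribution` and `random_channel`; this is a numerical sanity check under those assumptions, not a proof. The Markov property enters in exactly one place: P(x, z) is computed by summing P(x) P(y | x) P(z | y) over y, so Z sees X only through Y.

```python
import math
import random


def random_distribution(k: int, rng: random.Random) -> list[float]:
    """A random probability vector of length k."""
    w = [rng.random() for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]


def random_channel(k_in: int, k_out: int, rng: random.Random) -> list[list[float]]:
    """channel[i][j] = P(output = j | input = i); each row is a distribution."""
    return [random_distribution(k_out, rng) for _ in range(k_in)]


def mutual_information(joint: list[list[float]]) -> float:
    """I(A; B) in bits, given joint[a][b] = P(A = a, B = b)."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    return sum(
        p_ab * math.log2(p_ab / (p_a[a] * p_b[b]))
        for a, row in enumerate(joint)
        for b, p_ab in enumerate(row)
        if p_ab > 0
    )


def check_dpi(k: int = 4, seed: int = 0) -> tuple[float, float]:
    """Build a random chain X -> Y -> Z and return (I(X; Y), I(X; Z))."""
    rng = random.Random(seed)
    p_x = random_distribution(k, rng)
    p_y_given_x = random_channel(k, k, rng)
    p_z_given_y = random_channel(k, k, rng)
    # P(x, y) = P(x) P(y | x)
    joint_xy = [[p_x[x] * p_y_given_x[x][y] for y in range(k)] for x in range(k)]
    # Markov property: P(x, z) = sum over y of P(x) P(y | x) P(z | y)
    joint_xz = [
        [sum(joint_xy[x][y] * p_z_given_y[y][z] for y in range(k)) for z in range(k)]
        for x in range(k)
    ]
    return mutual_information(joint_xy), mutual_information(joint_xz)


if __name__ == "__main__":
    violations = 0
    for seed in range(1000):
        i_xy, i_xz = check_dpi(seed=seed)
        if i_xz > i_xy + 1e-12:  # small tolerance for floating-point rounding
            violations += 1
    print("chains violating I(X;Y) >= I(X;Z):", violations)
```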


The data processing inequality is extremely useful. In telecommunications, it describes how a signal degrades over a transmission line and needs boosting. In business, it shows why going direct to the source is often more useful than dealing with a line of intermediaries. In neural networks, it means no layer can hold more information about the input than the layers before it, which helps explain why very deep networks can struggle, as with the vanishing gradient problem. And in organisational structures (a topic I'll return to in a later post), it explains why important information can be lost between ground-level staff and board-level management.


* For reasons unknown to humanity, with a long enough line the final message will invariably end with the phrase "Purple monkey dishwasher".
