Friendship Paradox

You know, your friends, on average, have more friends than you do! I know it is a bit difficult to swallow that feeling. We will explore it mathematically. On the one hand, it follows from what we have seen before – the inspection paradox and the waiting-time paradox. But we will use a different approach here.

Count your friends

Consider the following relationship tree.

What it means is that A has only one friend (i.e., B) denoted by A(1; B). Similarly B(4; A, C, E, H), C (4; B, D, E, H), D(2; C, H), E(3; B, C, F), F(2; E, G), G(1; F), H(3; B, C, D). So the total number of friends among those eight is 1 + 4 + 4 + 2 + 3 + 2 + 1 + 3 = 20. The average number of friends, therefore, is 20/8 = 2.5.

And their friends

How do we do it? The easier way is to call out each of them and ask how many friends they have. For example, take A: she will ask her only friend, B, to call out her friends. B has four friends (note that it also includes A). Let us represent that as A{B(4)}. Similarly, B{A(1), C(4), E(3), H(3)}, C {B(4), D(2), E(3), H(3)}, D{C(4), H(3)}, E{B(4), C(4), F(2)}, F{E(3), G(1)}, G{F(2)}, H{B(4), C(4), D(2)}. The total number is 60. To calculate the average friends’ friends, you should divide 60 by the friends you calculated earlier, i.e., 20. 60/20 = 3.

So by counting, you prove that the average number of friends (2.5) is smaller than friends’ friends (3). The whole exercise can be summarised in the following table

IndividualFriends
(P)
Friends’ friends
(Q)
Mean
Friends’ friends
(P/Q)
A1 44
B4 112.75
C4123
D273.5
E3103.33
F242
G1 22
H3103.33
Total206023.91

Analytical Proof

Look at the diagram once more. We replace the number of friends that A have with dA (dA represents the degree of the vertex that points from person A).

\\ \text{In other words, the total number of friends here is }, \\\\ d_A + d_B + d_C + d_D + d_E + d_F + d_G + d_H } =  \Sigma{x_i}

which we know is 20. The average number of friends is obtained by dividing this by the total number of individuals, n.

\\ \frac{d_A + d_B + d_C + d_D + d_E + d_F + d_G + d_H}{n} = \frac{\Sigma{x_i}}{n}

Now, we will move to the total friends of friends.

\\ \text{the number of friends that A's friends have = } d_B \\ \\  \text{the number of friends that B's friends have= } d_A + d_C + d_E + d_H \\ \\  \text{the number of friends that C's friends have= } d_B + d_D + d_E + d_H \\ \\  \text{the number of friends that D's friends have= } d_C + d_H \\ \\  \text{the number of friends that E's friends have= } d_B + d_C + d_F\\ \\  \text{the number of friends that F's friends have= } d_E + d_G \\ \\  \text{the number of friends that G's friends have= } d_F \\ \\  \text{the number of friends that H's friends have= } d_B + d_C + d_D \\ \\  \text{Total number of friends that all friends have= }   d_B+ d_A + d_C + d_E + d_H + d_B + d_D + d_E + d_H +  d_C + d_H + d_B + d_C +  d_E + d_G +  d_F + d_B + d_C + d_D = \\ \\ d_A + 4 d_B + 4 d_C + 2 d_D + 3 d_E + 2 d_F + d_G + 3 d_H \\ \\ \text{the average number of friends of friends is = } \frac{d_A + 4 d_B + 4 d_C + 2 d_D + 3 d_E + 2 d_F + d_G + 3 d_H}{d_A + d_B + d_C + d_D + d_E + d_F + d_G + d_H}

which was 60/20 in our case. If you are confused, remember these:

the total number of individuals = n
the avg. number of friends of individuals = total number of friends / total number of individuals
the avg. number of friends of friends = total number of friends of friends / total number of friends

Back to the equations.

\\ \text{the average number of friends of friends = } \\\\ \frac{d_A + 4 d_B + 4 d_C + 2 d_D + 3 d_E + 2 d_F + d_G + 3 d_H}{d_A + d_B + d_C + d_D + d_E + d_F + d_G + d_H}

Look carefully, dA appears once in the numerator, and dA has one friend (B). dB appears four times and dB has four friends, and so on. This is no accident as the number of friends is counted multiple times as they appear in the friends’ friends list. Apply this rule to the equation, and you get.

\\ \text{the average number of friends of friends = } \\\\ \frac{d_A*d_A + d_B*d_B + d_C*d_C + d_D*d_D + d_E*d_E + d_F*d_F + d_G*d_G + d_H*d_H}{d_A + d_B + d_C + d_D + d_E + d_F + d_G + d_H} \\ \\ = \frac{d_A^2 + d_B^2 + d_C^2 + d_D^2 + d_E^2 + d_F^2 + d_G^2 + d_H^2}{d_A + d_B + d_C + d_D + d_E + d_F + d_G + d_H} \\ \\ = \frac{\Sigma{x_i^2}}{\Sigma{x_i}} \\ \\ \text{divide the numerator and the denominator by n} \\ \\  =  \frac{\Sigma{x_i^2}/n}{\Sigma{x_i}/n} \\ \\ \text{add and subtract } (\Sigma{x_i})^2/n^2 \text { at the numerator} \\ \\  =  \frac{[\Sigma{x_i^2}/n +  (\Sigma{x_i})^2/n^2  -   (\Sigma{x_i})^2/n^2]}{\Sigma{x_i}/n} \\ \\ =  \frac{[\Sigma{x_i^2}/n  -   (\Sigma{x_i})^2/n^2]}{\Sigma{x_i}/n} +  \frac{[(\Sigma{x_i})/n]^2}{\Sigma{x_i}/n} \\ \\ =  \frac{[\Sigma{x_i^2}/n  -   (\Sigma{x_i})^2/n^2]}{\Sigma{x_i}/n} + {\Sigma{x_i}/n}

The second term, we know from the earlier section, is the average number of friends of individuals. The first term is nothing but the variance divided by the mean.

The mean number of friends of friends = (mean of friends ) + (variance / mean), which is equal to or greater than the mean of friends.

References

Why do your friends have more friends than you do? The American Journal of Sociology

The friendship paradox: MIT Blossoms