The Federalist papers were published anonymously in 1787-88 by Alexander Hamilton, John Jay and James Madison. Of the 77 essays, it is generally agreed that Jay wrote 5, Hamilton 43 and Madison 14. The remaining 15 were either written jointly by Hamilton and Madison or are disputed papers written by one of the two (not Jay). Mosteller and Wallace solved the problem with a Bayesian approach, using the Poisson distribution to model word frequencies.
Here we walk through a simplified version of their approach, applying Bayes' rule directly to an observed frequency table. Consider paper No. 54. Mosteller and Wallace's starting point was writing style. However, Hamilton and Madison wrote in very similar styles, so style alone did not give an easy answer. They therefore looked at how often specific words were used, such as by, from, to, while, whilst and war. We take one such word, upon. The table below gives the distribution of the rate of upon (occurrences per 1000 words) in papers of known authorship by Hamilton and Madison, including some written outside The Federalist. Each cell is a number of papers:
| Rate of upon per 1000 words | Hamilton | Madison |
|---|---|---|
| 0 | 0 | 41 |
| (0,1] | 1 | 7 |
| (1,2] | 10 | 2 |
| (2,3] | 11 | 0 |
| (3,4] | 11 | 0 |
| (4,5] | 10 | 0 |
| (5,6] | 3 | 0 |
| (6,7] | 1 | 0 |
| (7,8] | 1 | 0 |
| Total | 48 | 50 |
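As a sketch, the table can be turned into the conditional probabilities P(rate bin | author) by dividing each count by that author's total. The Python below assumes the blank Madison cells mean 0 papers, which is consistent with Madison's total of 50. The names `UPON_COUNTS` and `likelihoods` are illustrative and are not taken from Mosteller and Wallace:

```python
from fractions import Fraction

# Number of papers per "upon" rate bin (occurrences per 1000 words),
# transcribed from the table. Empty Madison cells are taken to be 0.
UPON_COUNTS = {
    "0":     {"Hamilton": 0,  "Madison": 41},
    "(0,1]": {"Hamilton": 1,  "Madison": 7},
    "(1,2]": {"Hamilton": 10, "Madison": 2},
    "(2,3]": {"Hamilton": 11, "Madison": 0},
    "(3,4]": {"Hamilton": 11, "Madison": 0},
    "(4,5]": {"Hamilton": 10, "Madison": 0},
    "(5,6]": {"Hamilton": 3,  "Madison": 0},
    "(6,7]": {"Hamilton": 1,  "Madison": 0},
    "(7,8]": {"Hamilton": 1,  "Madison": 0},
}

def likelihoods(counts):
    """Convert per-bin paper counts into P(bin | author) as exact fractions."""
    authors = list(next(iter(counts.values())))
    # Column totals: 48 papers for Hamilton, 50 for Madison.
    totals = {a: sum(row[a] for row in counts.values()) for a in authors}
    return {
        label: {a: Fraction(row[a], totals[a]) for a in authors}
        for label, row in counts.items()
    }

LIK = likelihoods(UPON_COUNTS)
print(LIK["(0,1]"]["Hamilton"], LIK["(0,1]"]["Madison"])  # 1/48 7/50
```

Note that any bin where one author has no papers gives that author a likelihood of exactly 0, so a single observation in such a bin would rule that author out entirely. This is one motivation for using a smooth model such as the Poisson distribution, as Mosteller and Wallace did, rather than raw counts.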
Bayesian Problem
Let's formulate the problem using upon as the marker word:
In paper 54, the word upon appears 2 times in 2004 words. This is a rate of 2/2004 × 1000 ≈ 0.998 per 1000 words, which falls in the (0,1] row of the table.
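The following minimal Python sketch computes the rate and looks up the matching table row. The helper names `rate_per_1000` and `bin_label` are hypothetical. The bin convention is read off the table's row labels: a separate row for a rate of exactly 0, then half-open intervals (k-1, k].

```python
import math

def rate_per_1000(occurrences, total_words):
    """Occurrences of a word per 1000 words of text."""
    return occurrences / total_words * 1000

def bin_label(rate):
    """Map a rate to the table's row label: exactly 0 -> '0', else '(k-1,k]'."""
    if rate == 0:
        return "0"
    upper = math.ceil(rate)  # e.g. 0.998 -> 1, so the row is (0,1]
    return f"({upper - 1},{upper}]"

rate_54 = rate_per_1000(2, 2004)  # paper No. 54: 2 occurrences of upon in 2004 words
print(round(rate_54, 3), bin_label(rate_54))  # 0.998 (0,1]
```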
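Since the table only records how many papers fall in each rate bin, we condition on the bin that contains the observed rate, (0,1]:
$$P(\text{Hamilton} \mid \text{upon} \in (0,1]) = \frac{P(\text{upon} \in (0,1] \mid \text{Hamilton})\,P(\text{Hamilton})}{P(\text{upon} \in (0,1] \mid \text{Hamilton})\,P(\text{Hamilton}) + P(\text{upon} \in (0,1] \mid \text{Madison})\,P(\text{Madison})}$$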
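Dividing Bayes' rule for Hamilton by the same expression for Madison cancels the shared denominator. This gives the equivalent odds form, in which only the ratio of the two likelihoods (the Bayes factor) matters. Here u is shorthand introduced only for this derivation: it stands for the observed evidence, a rate of upon in the (0,1] bin. The Bayes factor uses the table's counts, 1 of 48 Hamilton papers and 7 of 50 Madison papers in that bin:

$$\frac{P(\text{Hamilton} \mid u)}{P(\text{Madison} \mid u)} = \frac{P(u \mid \text{Hamilton})}{P(u \mid \text{Madison})} \times \frac{P(\text{Hamilton})}{P(\text{Madison})}, \qquad \frac{P(u \mid \text{Hamilton})}{P(u \mid \text{Madison})} = \frac{1/48}{7/50} = \frac{25}{168} \approx 0.149$$

A Bayes factor below 1 means the evidence points toward Madison. With equal priors, the posterior odds are simply equal to the Bayes factor.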
P(upon ∈ (0,1] | Hamilton) = 1/48, since 1 of Hamilton's 48 papers falls in that bin
P(upon ∈ (0,1] | Madison) = 7/50, since 7 of Madison's 50 papers fall in that bin
P(Hamilton) could be set to 43/77 from the known authorship counts, but we take 0.5 so that neither author is preferred in advance
P(Madison) could likewise be set to 14/77, but we take 0.5
P(Hamilton | upon ∈ (0,1]) = (1/48 × 0.5) / (1/48 × 0.5 + 7/50 × 0.5) = 25/193 ≈ 0.13, or 13%. Since Hamilton and Madison are the only candidates, P(Madison | upon ∈ (0,1]) = 1 − P(Hamilton | upon ∈ (0,1]) ≈ 87%. On the evidence of upon alone, paper No. 54 is much more likely to be Madison's.
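The whole calculation can be sketched in Python with exact fractions. The `posterior` function is a hypothetical helper that applies Bayes' rule over any finite set of candidate authors. The sketch also shows how the answer changes under the alternative 43/77 vs 14/77 prior mentioned above:

```python
from fractions import Fraction

def posterior(likelihood, prior):
    """Bayes' rule over a finite set of candidate authors.

    likelihood[a] = P(evidence | a) and prior[a] = P(a). The priors only need
    to be proportional, because the result is divided by the total evidence.
    """
    joint = {a: likelihood[a] * prior[a] for a in likelihood}
    evidence = sum(joint.values())
    return {a: joint[a] / evidence for a in joint}

# P(upon in (0,1] | author), from the table (1 of 48 and 7 of 50 papers).
lik = {"Hamilton": Fraction(1, 48), "Madison": Fraction(7, 50)}

# Equal priors, as in the text.
equal = posterior(lik, {"Hamilton": Fraction(1, 2), "Madison": Fraction(1, 2)})
print(equal["Hamilton"], round(float(equal["Hamilton"]), 2))  # 25/193 0.13

# Alternative prior from the known authorship counts (43 vs 14 papers).
informed = posterior(lik, {"Hamilton": Fraction(43, 77), "Madison": Fraction(14, 77)})
print(round(float(informed["Hamilton"]), 2))  # 0.31
```

Because the function normalizes by the total evidence, priors of 43/77 and 14/77 (which sum to 57/77 rather than 1) give the same posterior as 43/57 and 14/57. Even with this Hamilton-leaning prior, upon alone still points toward Madison, at roughly 31% vs 69%.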
The Federalist Papers: No. 54
Inference in an Authorship Problem: Mosteller and Wallace, Journal of the American Statistical Association, 58 (302), 275-309 (1963)