Inference with multinomial data: why to weaken the prior strength
Cassio de Campos and Alessio Benavoli
This paper considers inference from multinomial data and addresses the problem of choosing the strength of the Dirichlet prior under a mean-squared error criterion. We compare the Maximum Likelihood Estimator (MLE) with the most commonly used Bayesian estimators (such as Laplace, Perks, Jeffreys and Haldane) obtained by assuming a Dirichlet prior with non-informative parameters, i.e., the parameters of the Dirichlet are equal and together sum to the so-called strength of the prior. In particular, we show that, under mean-squared error, the MLE becomes preferable to these Bayesian estimators as the number of categories of the multinomial increases. This happens because the non-informative Bayesian estimators do have a region of the parameter space where they dominate, but that region shrinks quickly as the number of categories grows. This shrinkage can be avoided, however, if the strength of the prior is not kept constant but is decreased with the number of categories. We argue that the strength should decrease at a rate of one over the number of categories.
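For reference, and using the abstract's own parameterization (a total strength $s$ split equally over $k$ categories, so each Dirichlet parameter equals $s/k$), the posterior-mean estimator given counts $n_1,\dots,n_k$ with $N=\sum_i n_i$ takes the standard form
\[
  \hat{\theta}_i \;=\; \frac{n_i + s/k}{N + s}, \qquad i = 1,\dots,k,
\]
where $\hat{\theta}_i$, $n_i$ and $N$ are notation introduced here for illustration rather than taken from the original text. Under this parameterization the named priors correspond to strengths $s = k$ (Laplace, each parameter $1$), $s = k/2$ (Jeffreys, each $1/2$), $s = 1$ (Perks, each $1/k$) and $s = 0$ (Haldane, whose posterior mean coincides with the MLE $n_i/N$). A strength decreasing as one over the number of categories, as the abstract advocates, makes the total pseudo-count added by the prior vanish as $k$ grows.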