Seminars 2008 — Abstracts
Friday, October 10
Speaker: Gauri Datta, Georgia
Title: Bayesian model selection for count data: Poisson or zero-inflated Poisson?
Abstract: Count data arise in many studies, most frequently in disease modeling. The Poisson distribution, which is usually adopted to model such data, can fit poorly when the data contain many zeros. To account for excess zeros in count data, the zero-inflated Poisson (ZIP) distribution has been proposed in the literature. A ZIP distribution is a mixture of a standard Poisson distribution and a degenerate distribution at zero, with mixing probability p. The ZIP distribution has been used both for independent and identically distributed (i.i.d.) observations and for non-i.i.d. observations where suitable auxiliary variables are available to model the mean. In the latter case, referred to as a ZIP regression model, each count has its own distribution depending on one or more explanatory variables, and generalized linear models are fitted to the Poisson parameter, the mixing probability, or both. Although a number of frequentist papers discuss statistical inference for such models, Bayesian contributions to this problem are limited. In this talk, we propose a Bayesian solution. Treating the choice between the Poisson and ZIP models as a model selection problem, we rewrite the ZIP model as a mixture of a zero-truncated Poisson distribution and a degenerate distribution at zero. We justify an objective prior for the new parameters. Using this prior together with the standard Jeffreys prior for the Poisson mean, we obtain the Bayes factor for the ZIP model versus the standard Poisson model. We apply our method to several examples and propose a practical extension to the regression case.
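The reparameterization described in the abstract can be checked numerically. The following is a minimal Python sketch (standard library only), assuming the usual ZIP definition: with probability p an observation is a structural zero, otherwise it is drawn from Poisson(λ). Writing ω = p + (1 − p)e^(−λ) for the total probability of a zero gives P(Y = 0) = ω and P(Y = k) = (1 − ω) λ^k e^(−λ) / [k! (1 − e^(−λ))] for k ≥ 1, that is, a mixture of a point mass at zero and a zero-truncated Poisson. The standard Poisson model corresponds to ω = e^(−λ), i.e. p = 0. All function names and the sample data are hypothetical, and the sketch does not reproduce the speaker's objective prior or Bayes factor, which the abstract does not specify.

```python
import math


def poisson_pmf(k, lam):
    """Standard Poisson probability P(Y = k) with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def zip_pmf(k, lam, p):
    """ZIP in its usual form: structural zero with probability p, else Poisson(lam)."""
    base = (1.0 - p) * poisson_pmf(k, lam)
    return p + base if k == 0 else base


def zero_probability(lam, p):
    """Hypothetical helper: omega, the total probability of observing a zero under ZIP."""
    return p + (1.0 - p) * math.exp(-lam)


def zero_truncated_poisson_pmf(k, lam):
    """Poisson(lam) conditioned on Y >= 1; zero for k < 1."""
    if k < 1:
        return 0.0
    return poisson_pmf(k, lam) / (1.0 - math.exp(-lam))


def zip_as_truncated_mixture(k, lam, omega):
    """Same distribution written as: point mass at zero (weight omega)
    mixed with a zero-truncated Poisson (weight 1 - omega)."""
    if k == 0:
        return omega
    return (1.0 - omega) * zero_truncated_poisson_pmf(k, lam)


def zip_loglik_reparam(data, lam, omega):
    """Log-likelihood in the (omega, lam) parameterization, assuming 0 < omega < 1.
    It splits into a part depending only on omega and a part depending only on lam."""
    n0 = sum(1 for y in data if y == 0)
    n_pos = len(data) - n0
    zero_part = n0 * math.log(omega) + n_pos * math.log(1.0 - omega)
    count_part = sum(math.log(zero_truncated_poisson_pmf(y, lam)) for y in data if y > 0)
    return zero_part + count_part


if __name__ == "__main__":
    lam, p = 2.0, 0.3
    omega = zero_probability(lam, p)
    # Both columns should agree for every k.
    for k in range(5):
        print(k, round(zip_pmf(k, lam, p), 6), round(zip_as_truncated_mixture(k, lam, omega), 6))
```

One convenient property visible in `zip_loglik_reparam`: in the (ω, λ) parameterization the log-likelihood separates into a term in ω alone, driven only by the numbers of zero and non-zero counts, and a term in λ alone, driven only by the positive counts through the zero-truncated Poisson.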