
Seminars 2009 — Abstracts

Friday, May 29


Speaker: David Pitt, Melbourne

Title: Model selection and workers’ compensation claim frequency analysis

Abstract: We consider a set of workers’ compensation insurance claim data from the USA in which the aggregate number of losses (claims) reported to insurers is classified by the year of occurrence of the event causing the loss, the state in which the loss event occurred, and the occupation class of the insured workers. A measure of exposure, equal to the total payroll of the observed workers in each three-way classification, is also included in the dataset. The data employed in this paper include aggregate losses for ten US states, 25 occupation classes and seven observation years. A regression model with indicator variables for each of the occupation classes and each of the states, a linear term for year, and an intercept can be specified in a very large number of ways: there are more than 17 billion different possible models! In addition, one would anticipate that the numbers of claims recorded in different years for the same state and the same occupation class are positively correlated, and different modelling assumptions about the nature of this correlation must also be considered. On the other hand, it may reasonably be assumed that the numbers of losses reported from different states and from different occupation classes are independent. Our data can therefore be modelled using the statistical techniques applicable to panel data, and we work with generalised estimating equations (GEE) in the paper. Because GEE do not specify a full likelihood, model selection in this framework cannot proceed using the usual Akaike’s Information Criterion (AIC) or the commonly performed likelihood ratio tests. Pan (2001) suggested an alternative to the AIC, the quasi-likelihood information criterion (QIC), for model comparison. This paper investigates the use of a Gibbs sampling algorithm for efficiently locating, among the more than 17 billion possible models that could be considered for the analysis, the model with the optimal (smallest) QIC value.
The technique is illustrated using both a simulation study and the workers’ compensation insurance claim data.
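
As a rough illustration of the search idea only (not the speaker's implementation), the sketch below runs a Gibbs sampler over include/exclude indicators for candidate regression terms and keeps the best model it visits. Everything named here is an assumption made for illustration: `toy_criterion` is a hypothetical stand-in for QIC (a real analysis would fit a GEE for each candidate model and compute its QIC), and the target distribution proportional to exp(-criterion / temperature), the `temperature` value and the number of sweeps are choices of this sketch rather than details taken from the abstract.

```python
import itertools
import math
import random


def toy_criterion(gamma, signal=(0, 3, 7), penalty=2.0, gain=10.0):
    """Hypothetical stand-in for QIC (smaller is better).

    Omitting a 'signal' term costs `gain`; every included term costs `penalty`.
    """
    missing = sum(1 for j in signal if not gamma[j])
    return gain * missing + penalty * sum(gamma)


def gibbs_model_search(criterion, p, sweeps=200, temperature=0.5, seed=0):
    """Gibbs sampling over inclusion indicators gamma in {0, 1}^p.

    The assumed target is proportional to exp(-criterion(gamma) / temperature),
    so models with a small criterion value are visited most often.
    Returns the best model visited and its criterion value.
    """
    rng = random.Random(seed)
    gamma = [rng.randint(0, 1) for _ in range(p)]
    best, best_value = tuple(gamma), criterion(gamma)
    for _ in range(sweeps):
        for j in range(p):
            # Evaluate the criterion with term j included and excluded,
            # holding all other indicators fixed.
            gamma[j] = 1
            c_in = criterion(gamma)
            gamma[j] = 0
            c_out = criterion(gamma)
            # Full conditional probability of including term j.
            z = max(-700.0, min(700.0, (c_in - c_out) / temperature))
            prob_in = 1.0 / (1.0 + math.exp(z))
            gamma[j] = 1 if rng.random() < prob_in else 0
            value = c_in if gamma[j] else c_out
            if value < best_value:
                best, best_value = tuple(gamma), value
    return best, best_value


# Small example where exhaustive search is still possible as a check.
p = 12
best, best_value = gibbs_model_search(toy_criterion, p)
exhaustive = min(itertools.product((0, 1), repeat=p), key=toy_criterion)
print(best == exhaustive, best_value)
```

With 12 candidate terms the 4,096 models can be enumerated directly, which is why the example can compare the sampler's answer with an exhaustive search. That check stops being practical as the number of terms grows: with 34 on/off terms, for instance, there would be 2^34 = 17,179,869,184 candidate models, and a stochastic search that evaluates only the models it visits becomes attractive.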