Strategies for dealing with overdispersion in contingency tables when performing correspondence analysis
Eric Beh and Rosaria Lombardo
When correspondence analysis is applied to a two-way contingency table, it begins by decomposing a matrix of standardised residuals using the singular value decomposition. The advantage of doing so is that the sum of squares of these residuals, and hence the sum of the squared singular values, equals Pearson's classic chi-squared statistic (divided by the sample size when the residuals are computed from proportions rather than counts). Such residuals, which are treated as asymptotically normally distributed, arise from assuming that the cell frequencies of the contingency table are Poisson random variables, so that their expectation and variance are equal. However, there is clear evidence in the statistics literature that the variance of the cell frequencies can exceed their expectation; that is, the table exhibits overdispersion. This project therefore investigates strategies for dealing with overdispersion. These include assuming that the cell counts come from a generalised Poisson, Conway-Maxwell Poisson or negative binomial distribution. Variance-stabilising strategies can also be considered, such as the adjusted standardised residual and the Freeman-Tukey residual. Adopting such strategies requires examining their impact on how the overall association between the variables is quantified, and on the interpretation of the low-dimensional visual display that can be generated. Extensions of this work to multiple categorical variables are also under consideration.
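As a rough illustration of the quantities mentioned above, the following Python sketch computes the standardised residuals from proportions, checks that the sum of their squared singular values matches Pearson's chi-squared statistic divided by the sample size, and evaluates two of the alternative residuals. It is a minimal sketch under stated assumptions and not the authors' implementation: the 3 x 3 table is made-up data, the function names are hypothetical, the adjusted standardised residual is taken in its usual Haberman form, and the Freeman-Tukey residual is taken in its classic count form, which may differ from the variant adopted in the project.

```python
import numpy as np

def expected_counts(N):
    """Expected cell counts under independence of rows and columns."""
    N = np.asarray(N, dtype=float)
    return np.outer(N.sum(axis=1), N.sum(axis=0)) / N.sum()

def standardised_residuals(N):
    """Standardised residuals on the proportion scale: (p_ij - r_i c_j) / sqrt(r_i c_j)."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()
    E = np.outer(P.sum(axis=1), P.sum(axis=0))
    return (P - E) / np.sqrt(E)

def pearson_chi2(N):
    """Pearson's chi-squared statistic computed directly from the counts."""
    N = np.asarray(N, dtype=float)
    E = expected_counts(N)
    return ((N - E) ** 2 / E).sum()

def adjusted_standardised_residuals(N):
    """Assumed Haberman form: (n_ij - e_ij) / sqrt(e_ij (1 - r_i)(1 - c_j))."""
    N = np.asarray(N, dtype=float)
    E = expected_counts(N)
    r = N.sum(axis=1) / N.sum()
    c = N.sum(axis=0) / N.sum()
    return (N - E) / np.sqrt(E * np.outer(1 - r, 1 - c))

def freeman_tukey_residuals(N):
    """Assumed classic form: sqrt(n_ij) + sqrt(n_ij + 1) - sqrt(4 e_ij + 1)."""
    N = np.asarray(N, dtype=float)
    E = expected_counts(N)
    return np.sqrt(N) + np.sqrt(N + 1) - np.sqrt(4 * E + 1)

# Hypothetical 3 x 3 contingency table, used only for illustration.
N = np.array([[20, 10, 5],
              [8, 25, 12],
              [4, 9, 30]], dtype=float)
n = N.sum()

Z = standardised_residuals(N)
sv = np.linalg.svd(Z, compute_uv=False)   # singular values of the residual matrix
total_inertia = (sv ** 2).sum()           # equals Pearson's chi-squared / n

adj = adjusted_standardised_residuals(N)
ft = freeman_tukey_residuals(N)

print("total inertia:", total_inertia)
print("chi-squared / n:", pearson_chi2(N) / n)
```

Replacing `Z` with a matrix built from one of the alternative residuals changes what the sum of squared singular values measures, which is why the choice of residual affects both the measure of overall association and the resulting display.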