Correspondence analysis and the Cressie-Read family of divergence statistics
Eric Beh and Rosaria Lombardo
The foundations of correspondence analysis rest on Pearson's famous chi-squared statistic, which provides the numerical groundwork for visualising how categorical variables are associated. It has recently been shown that the Freeman-Tukey statistic can also play an important role, confirming the advantages of the Hellinger distance that have long been advocated in the literature. Pearson's statistic and the Freeman-Tukey statistic are two of five commonly used special cases of the Cressie-Read family of divergence statistics. Correspondence analysis can therefore be expanded so that this family lies at the heart of how the association is quantified and visualised. An advantage of using the Cressie-Read family when performing correspondence analysis is that it includes, as special cases, two variants that have gained some attention in the literature: the Hellinger distance decomposition (HDD) method and log-ratio analysis (LRA). Expanding correspondence analysis in this way also enables some general features to be obtained (such as coordinate systems, models of association/correlation, and distance measures) and allows flexibility when defining the “best” and “worst” possible visualisation of the association. This project therefore examines the role of the Cressie-Read family of divergence statistics in the correspondence analysis of a two-way contingency table. Possible extensions include the analysis of multi-way contingency tables, examining the impact on the visual display (such as the traditional correspondence plot or the biplot), and exploring whether asymmetric associations can be incorporated into this framework.
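To make the family concrete, the following is a minimal sketch (not the authors' implementation) of the Cressie-Read divergence statistic for a two-way contingency table under independence. It assumes the standard form CR(λ) = 2/(λ(λ+1)) Σ n_ij [(n_ij / e_ij)^λ − 1], where e_ij is the expected count under independence and the cases λ = 0 and λ = −1 are taken as limits. The function names and the example table are hypothetical, and all cell counts are assumed to be strictly positive.

```python
import math

def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def cressie_read(table, lam):
    """Cressie-Read divergence statistic with parameter lam.

    Assumes every observed count is strictly positive.
    lam = 0 and lam = -1 are handled through their limiting forms.
    """
    expected = expected_counts(table)
    total = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            if lam == 0:        # log-likelihood ratio limit
                total += o * math.log(o / e)
            elif lam == -1:     # modified log-likelihood ratio limit
                total += e * math.log(e / o)
            else:
                total += o * ((o / e) ** lam - 1.0) / (lam * (lam + 1.0))
    return 2.0 * total

# Hypothetical 2 x 3 table of counts, used only for illustration.
table = [[10, 20, 30], [20, 15, 5]]

# The five commonly used special cases and their lambda values.
special_cases = {
    "Pearson chi-squared": 1.0,
    "log-likelihood ratio": 0.0,
    "Freeman-Tukey": -0.5,
    "modified log-likelihood ratio": -1.0,
    "Neyman modified chi-squared": -2.0,
}

for name, lam in special_cases.items():
    print(f"{name} (lambda = {lam}): {cressie_read(table, lam):.4f}")
```

The sketch covers only the scalar measure of association. The decomposition of that measure into coordinates for a correspondence plot, which is where the family enters correspondence analysis proper, is not shown here.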