## Current projects

*Sugnet Lubbe & Raeesa Ganey*

The Generalised Singular Value Decomposition (GSVD), also termed the quotient SVD, simultaneously decomposes two matrices **A** and **B** with an equal number of columns, each into a product of three matrices:

**A = UCH**

**B = VSH**

As with the SVD, the matrices **U** and **V** have orthonormal columns and **C** and **S** are diagonal. The matrix **H** is in general not orthogonal, but it is non-singular, and the same matrix **H** appears in the decompositions of both **A** and **B**.
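To make the two decompositions concrete, here is a minimal NumPy sketch that builds a GSVD from a QR factorisation of the stacked matrix followed by an ordinary SVD (a CS-decomposition route). It assumes that **A** and **B** each have at least as many rows as columns, that the stacked matrix is of full column rank, and that **S** is non-singular. It also uses the normalisation **C**² + **S**² = **I**. The function name `gsvd` is hypothetical. Production code would more typically call LAPACK's `xGGSVD3` routine.

```python
import numpy as np


def gsvd(A, B):
    """Sketch of a GSVD A = U C H, B = V S H (assumed normalisation C^2 + S^2 = I).

    Assumes A (m x p) and B (n x p) each have at least p rows, that the stacked
    matrix [A; B] has full column rank, and that S is non-singular.
    """
    m = A.shape[0]
    # Thin QR of the stacked matrix: [A; B] = Q R, with R (p x p) non-singular.
    Q, R = np.linalg.qr(np.vstack([A, B]))
    Q1, Q2 = Q[:m], Q[m:]
    # SVD of the top block: Q1 = U diag(c) W^T, so A = Q1 R = U C (W^T R).
    U, c, Wt = np.linalg.svd(Q1, full_matrices=False)
    # Q1^T Q1 + Q2^T Q2 = I implies the columns of Q2 W are mutually orthogonal
    # with squared lengths 1 - c^2; normalising them gives V.
    s = np.sqrt(np.clip(1.0 - c**2, 0.0, None))
    V = (Q2 @ Wt.T) / s
    # The shared factor: non-singular, but in general not orthogonal.
    H = Wt @ R
    return U, np.diag(c), V, np.diag(s), H


rng = np.random.default_rng(0)
A = rng.normal(size=(8, 4))   # illustrative random data only
B = rng.normal(size=(6, 4))
U, C, V, S, H = gsvd(A, B)
print(np.allclose(U @ C @ H, A), np.allclose(V @ S @ H, B))  # prints: True True
```

Because **H** = **W**ᵀ**R**, we have **H**ᵀ**H** = **A**ᵀ**A** + **B**ᵀ**B**. This shows why **H** is invertible but generally not orthogonal.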

This project explores different avenues for constructing biplots associated with analyses based on the GSVD.

*Niël le Roux & Sugnet Lubbe*

CVA biplots are useful for visualising group separation and overlap associated with linear discriminant analysis (LDA). Since LDA is based on maximising the ratio of between-group to within-group variance, the dimension of the canonical space depends on the rank of the between-group sums-of-squares-and-cross-products matrix. Assuming more variables than groups, the rank of this matrix is the number of groups minus one. In the two-group case, therefore, the canonical space reduces to a one-dimensional line. A transformation can be made from the p-dimensional original space to a p-dimensional canonical space. However, since all eigenvalues except the first are zero, the second, third and subsequent dimensions are neither ordered nor uniquely defined. In this project an optimal second dimension is found to construct a useful 2D CVA biplot.
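The rank argument can be checked numerically. The following minimal NumPy sketch computes the between-group (B) and within-group (W) SSCP matrices for two simulated groups. It then obtains the eigenvalues of the CVA problem B v = λ W v through a Cholesky factor of W. With two groups, only one eigenvalue is non-zero. The data and the function name `cva_eigenvalues` are illustrative assumptions. The sketch does not reproduce the optimal second dimension developed in the project.

```python
import numpy as np


def cva_eigenvalues(X, groups):
    """Between-group (B) and within-group (W) SSCP matrices and the CVA eigenvalues.

    The eigenvalues solve B v = lambda W v; they are obtained here from the
    symmetric matrix L^{-1} B L^{-T}, where W = L L^T (Cholesky factor).
    """
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    B = np.zeros((p, p))
    W = np.zeros((p, p))
    for g in np.unique(groups):
        Xg = X[groups == g]
        d = (Xg.mean(axis=0) - grand_mean)[:, None]   # group mean deviation (p x 1)
        B += len(Xg) * (d @ d.T)
        centred = Xg - Xg.mean(axis=0)
        W += centred.T @ centred
    L = np.linalg.cholesky(W)
    M = np.linalg.solve(L, np.linalg.solve(L, B).T)   # L^{-1} B L^{-T}
    eigenvalues = np.linalg.eigvalsh((M + M.T) / 2)[::-1]   # descending order
    return B, W, eigenvalues


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 4)),   # group 0
               rng.normal(1.0, 1.0, size=(20, 4))])  # group 1, shifted mean
groups = np.repeat([0, 1], 20)
B, W, lam = cva_eigenvalues(X, groups)
print(np.round(lam, 6))
```

With two groups the weighted mean deviations sum to zero, so B is a multiple of a single outer product. Its rank is one, and every eigenvector orthogonal (in the W metric) to the first has eigenvalue zero. This leaves the second axis undetermined.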

*Johané Nienkemper-Swanepoel, Sugnet Lubbe & Niël le Roux*

Multiple imputation is a well-established technique for analysing missing data. Several imputed data sets are obtained and analysed separately using standard complete-data techniques, and the estimates from the separate analyses are then combined for inference. However, the options for exploratory analysis of multiply imputed data sets are limited. Biplots are regarded as generalised scatterplots that provide a simultaneous configuration of both samples and variables. A biplot can therefore be constructed and interpreted for each of the multiply imputed data sets individually. To reach an unbiased conclusion, however, these visualisations have to be appropriately combined into a unified interpretation. The GPAbin technique has been developed to address this problem for multiple correspondence analysis biplots of multiply imputed data sets. Generalised orthogonal Procrustes analysis (GPA) is used to align the biplots before combining them in a mean coordinate matrix. The name GPAbin is derived from the amalgamation of GPA and Rubin’s rules, the combining steps used after multiple imputation. Simulation studies have confirmed the usefulness of the GPAbin method for categorical data. In this project the GPAbin methodology is extended to multivariate continuous data using principal component analysis (PCA) biplots.
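The align-then-average step can be illustrated with a minimal NumPy sketch of generalised orthogonal Procrustes analysis applied to PCA row coordinates. Each configuration is centred and then repeatedly rotated, with reflections allowed because separately computed PCA axes can differ in sign, towards the current mean configuration. The mean coordinate matrix is returned at the end. Several simplifications are assumptions of this sketch: scaling is omitted, the "imputed" data sets are simulated as small perturbations of one matrix, and all function names are hypothetical. It is not the GPAbin implementation, and it does not reproduce how GPAbin handles the variable axes.

```python
import numpy as np


def procrustes_rotation(X, target):
    """Orthogonal Q minimising ||X Q - target||_F (reflections allowed), via the SVD of X^T target."""
    U, _, Vt = np.linalg.svd(X.T @ target)
    return U @ Vt


def generalised_procrustes(configs, n_iter=100, tol=1e-10):
    """Minimal generalised orthogonal Procrustes analysis (translation + rotation/reflection, no scaling).

    Each configuration is centred and then repeatedly rotated towards the current
    mean configuration until the mean stops changing.
    """
    aligned = [X - X.mean(axis=0) for X in configs]
    mean = aligned[0]
    for _ in range(n_iter):
        aligned = [X @ procrustes_rotation(X, mean) for X in aligned]
        new_mean = np.mean(aligned, axis=0)
        converged = np.linalg.norm(new_mean - mean) < tol
        mean = new_mean
        if converged:
            break
    return aligned, mean


def pca_coordinates(X, r=2):
    """Row coordinates of a rank-r PCA biplot (centred data projected on the first r components)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:r].T


rng = np.random.default_rng(2)
complete = rng.normal(size=(30, 5))
# Hypothetical stand-ins for 5 imputed data sets: small perturbations of one matrix.
imputed = [complete + 0.05 * rng.normal(size=complete.shape) for _ in range(5)]
configs = [pca_coordinates(D) for D in imputed]
aligned, mean_config = generalised_procrustes(configs)
print(mean_config.shape)  # (30, 2)
```

Without the alignment step, averaging raw PCA coordinates can cancel out whole configurations when one solution is a reflection of another. This is why the configurations are rotated before the mean is formed.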

*Carel van der Merwe & Delia Sandilands*

Biplots are useful for visualising multivariate data. They can, however, be challenging to interpret, for example when axes and points overcrowd the plot. This overcrowding is often due to many variables, highly correlated variables, or simply a large number of observations. In this paper improvements to the biplot are made to address these shortcomings. These improvements include: i) the automatic parallel translation, or "explosion", of axes, ii) the use of densities on the axes to improve interpretation and representation of large data sets, and iii) interactive biplots via the Plotly package in R. The improvements produce a better-composed plot that appears less crowded and is more easily interpretable. They also offer additional information that can otherwise be lost when the volume of data is high, and they allow the user to inspect the biplot element-wise. An accompanying Shiny web-based application was also created and is available at https://carelvdmerwe.shinyapps.io/ExplodingBiplots/.
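One way to see why translating axes parallel to themselves is legitimate: if an axis and its calibration markers are shifted perpendicular to the axis direction, the value read off by orthogonal projection does not change. The following minimal NumPy sketch demonstrates this for a standard PCA biplot, with axis directions taken from the rows of the first two right singular vectors. The function names and the `shift` offset are hypothetical. The sketch does not reproduce the automatic placement of exploded axes, the axis densities or the Plotly interactivity described above.

```python
import numpy as np


def pca_biplot(X, r=2):
    """Row coordinates Z = Xc V_r and axis directions (rows of V_r) for a PCA biplot."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Vr = Vt[:r].T          # p x r: row k is the direction of the axis for variable k
    return Xc @ Vr, Vr


def read_axis(z, v, shift=0.0):
    """Centred value of a variable read off its axis by orthogonal projection.

    The axis is the line {d + t v} with d = shift * (unit normal to v), i.e. a
    parallel translation by `shift` (a hypothetical explosion offset). Calibration
    markers move with the axis: the marker for value mu sits at d + mu v / (v . v).
    The foot of the perpendicular from z is d + t v with t = (z - d) . v / (v . v),
    so its marker value is mu = (z - d) . v.
    """
    normal = np.array([-v[1], v[0]]) / np.linalg.norm(v)
    d = shift * normal
    return (z - d) @ v


rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))
Z, Vr = pca_biplot(X)
original = read_axis(Z[0], Vr[0])              # axis through the origin
exploded = read_axis(Z[0], Vr[0], shift=1.5)   # same axis, translated sideways
print(np.isclose(original, exploded))  # True: d is orthogonal to v
```

Since d · v = 0, the reading reduces to z · v, which is the rank-2 PCA approximation of the centred value. The translation only changes where the axis is drawn, not what it predicts.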