## Research interests

**Visualisation of multi-dimensional data and biplots**

This well-established research group collaborates with several international experts in the field, as well as with industry in South Africa. New students join the group continually to proceed with their master's or doctoral studies. Since 1996 the group has produced 30 master's and 10 PhD graduates. Several additional registered postgraduate students are currently engaged in research on the visualisation of multi-dimensional data and related biplots. As a result, numerous R packages and chunks of R code for biplots have been developed by many different individuals. The research group was awarded the EMS faculty's elite research grant for 2020, specifically to collate and enhance the capabilities of this existing software with newer R packages such as Plotly and Shiny. The aim is to provide a coherent, user-friendly visualisation package for research and application. The application of biplot methodology to data sets originating from diverse fields such as archaeology, ecology, psychology, accountancy, chemistry, wood science, health sciences and industry stimulates the development of new theory and procedures, which in turn set the scene for subsequent theoretical research. *NJ le Roux, S Lubbe, CJ van der Merwe & J Nienkemper-Swanepoel*

**Saddlepoint approximations in Extreme Value Theory**

Saddlepoint approximations have been applied successfully in many areas of statistics (as well as in other sciences, e.g. physics, applied mathematics and engineering). However, very little work has been done on applying the saddlepoint approximation in Extreme Value Theory (EVT). In recent research the authors applied it to approximate the distribution of the Hill estimator, the well-known estimator of the extreme value index (EVI). The approximation in that case was extremely accurate. Further research is now being carried out in which the saddlepoint approximation is applied to other estimators of the EVI as well as to estimators of other relevant EVT parameters, e.g. quantiles. The saddlepoint approximation will also be used to find improved confidence intervals for these parameters. **S Buitendag, T de Wet and J Beirlant (KUL, Belgium)**
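For readers unfamiliar with the Hill estimator mentioned above, the following is a minimal sketch of its textbook form: the average log-excess of the `k` largest observations over the `(k+1)`-th largest. The function name `hill_estimator` and the choice of `k` are illustrative assumptions, and the sketch does not cover the saddlepoint approximation itself.

```python
import math


def hill_estimator(sample, k):
    """Hill estimate of the extreme value index from the k largest observations.

    Hypothetical helper for illustration; assumes a positive, heavy-tailed sample.
    """
    if not 1 <= k < len(sample):
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    ordered = sorted(sample, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the threshold observation must be positive")
    # Mean of the log-ratios of the k largest values to the threshold.
    return sum(math.log(x / threshold) for x in ordered[:k]) / k
```

In practice the estimate is sensitive to `k`, so it is usually inspected over a range of values rather than at a single one.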

**Repairable systems in Reliability: Bayesian Approaches**

This research concerns repairable systems and the evaluation of their performance in terms of reliability and availability. Multi-unit systems are investigated. A Bayesian method of assessing reliability is of primary interest, since very little has been published on this topic. *PJ Mostert, VSS Yadavalli (University of Pretoria) and A Bekker (University of Pretoria)*

**Novelty detection using Extreme Value Theory**

Novelty detection is a branch of statistics that concerns detecting deviations from the expected normal behaviour. Anomalous events typically have catastrophic financial or social impacts and occur only rarely. Consequently, the broad approach is to construct a model representing the normal behaviour of the underlying system. New observations are then tested against this model of normality. One approach to discriminating between expected and anomalous observations is to threshold the model of normality probabilistically. This method has the advantage that the certainty in discriminating between normal and novel observations is quantified. Recently, an approach based on extreme value theory has been formulated to threshold the model representing the normal state. Under the assumption that the probability density function of the variables in their normal state is well defined, extreme value theory is utilised to derive a limiting distribution for the minimum probability density of the data. A significant advantage of this approach is the ability to perform novelty detection in multimodal and multivariate data spaces. Further research is now being carried out in which the theory of second order regular variation is used to determine the rate of convergence of the extreme value-based novelty detection algorithm. This research extends current models by using the lower order statistics of the probability density values to approximate the limiting distribution of the minimum probability density. Consequently, the extreme value distribution is approximated by using more information than only the sample of minima. *ML Steyn and T de Wet*
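The idea of thresholding a model of normality on its density values can be sketched as follows. This is a deliberately simplified illustration: it assumes a univariate Gaussian model of normality and takes the threshold from a low empirical quantile of the training densities, which stands in for (and is not) the extreme value limiting distribution described above. The names `fit_density_threshold`, `is_novel` and the level `alpha` are hypothetical.

```python
from statistics import NormalDist


def fit_density_threshold(training, alpha=0.05):
    """Fit a Gaussian model of normality and pick a density threshold.

    Assumption: the threshold is roughly the alpha-quantile of the training
    density values (an empirical stand-in, not an EVT-derived threshold).
    """
    model = NormalDist.from_samples(training)
    densities = sorted(model.pdf(x) for x in training)
    index = max(0, int(alpha * len(densities)) - 1)
    return model, densities[index]


def is_novel(model, threshold, x):
    """Flag x as novel when its density under the model falls below the threshold."""
    return model.pdf(x) < threshold
```

For example, after `model, threshold = fit_density_threshold(readings)`, a new observation `x` is tested with `is_novel(model, threshold, x)`.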

**Application of measures of divergence in statistical inference**

In the literature a large number of measures of divergence between two probability distributions have been proposed, such as Pearson's chi-square divergence, power density divergence, phi measures of divergence and many others. Some of these measures have been applied to particular areas of statistical inference, but some areas have either been “avoided” or “neglected”, receiving only marginal attention. In this research divergence measures are applied to some of these areas, including goodness-of-fit tests and extreme value theory. In particular the class of f-divergences is studied, extended and applied to such problems. **T de Wet and F Österreicher (University of Salzburg)**
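As a small illustration of the f-divergence class, the sketch below evaluates the standard discrete form, the sum over outcomes of q_i f(p_i / q_i) for a convex function f with f(1) = 0. The function names are illustrative assumptions, and the sketch requires every q_i to be strictly positive.

```python
import math


def f_divergence(p, q, f):
    """Discrete f-divergence of p from q: sum of q_i * f(p_i / q_i)."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    if any(qi <= 0 for qi in q):
        raise ValueError("this sketch requires q_i > 0 for every outcome")
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def pearson_chi_square(t):
    """Generator of Pearson's chi-square divergence."""
    return (t - 1.0) ** 2


def kullback_leibler(t):
    """Generator of the Kullback-Leibler divergence, with 0 * log 0 taken as 0."""
    return t * math.log(t) if t > 0 else 0.0


def total_variation(t):
    """Generator of the total variation distance."""
    return 0.5 * abs(t - 1.0)
```

Swapping the generator `f` changes the divergence without changing the surrounding code, which is what makes the class convenient to study as a whole.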

**Campanometry**

Campanology, the study of bells, bell-casting and bell-ringing, is quite an old discipline. A major aspect of campanology is clearly the sound produced, and thus the discipline is traditionally closely tied to physics, in particular to acoustics, the scientific study of sound and sound waves. In contrast, the study of the quantitative or statistical aspects of bells and their properties is much more recent. The term Campanometry was coined for this multi-disciplinary field of music, history, mathematics/statistics, acoustics and metallurgy. The famous campanologist Andre Lehr (1929–2007) is credited as the founder of Campanometry. Of particular interest is the measurement and statistical study of the different partials of bells and carillons. Since bells are usually tuned in order to have their partials at the ideal values, the deviations from these ideal values supply important information on the sound quality of a bell. Furthermore, measurements of their physical properties also provide useful data for analyses. In this research bells in the Western Cape are identified, photographed and measured physically and acoustically, and the information is stored in the SUNDigital Collections, the digital heritage repository of the Stellenbosch University Library. The data thus obtained are used to statistically model different aspects of bells and carillons, inter alia to determine to what extent they comply with certain standard design criteria and to statistically analyse the sound quality of the bells. Furthermore, using statistical classification techniques, bells in the database whose founders are unknown can be attributed to a particular founder. The latter is analogous to the statistical attribution of manuscripts of unknown authorship. **T de Wet, PJU van Deventer and JL Teugels (KUL, Belgium)**

**Statistical inference of complex survey data**

Most survey data analysed in practice originate from non-simple random sampling (non-SRS) designs. These designs typically combine different sampling methods, such as stratification and cluster sampling. This is known as complex sampling, a technique employed to ensure that the sample collected represents the target population as closely as possible. This project extends our previous research in the field of complex sampling to develop models and methods of analysis that account for the complex design of non-SRS multivariate data, a largely unexplored area. The newly developed models and methods will be evaluated in two ways: using simulated hierarchical data, so that the evaluation can be carried out under controllable circumstances, and using real-world data, to ensure that the developed models account for real-world anomalies. **R Luus (UWC), A Neethling (SU and UFS) and T de Wet**

**Bayesian analysis of cancer survival data using the lifetime model**

Bayes estimators for some of the lifetime distribution parameters, such as the mean survival time, the hazard function and the survival distribution function, are derived for survival data from various lifetime models. The estimators are derived using a selection of loss functions. The survival data are usually censored; the theory is based on right-censored data, although other types of censoring are also investigated, both non-parametrically and parametrically. Various types of prior distribution are used in this study. **PJ Mostert, JJJ Roux (University of South Africa) and A Bekker (University of Pretoria)**
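A minimal worked case of such estimators, under assumptions not taken from the project itself: lifetimes are exponential with hazard rate λ, the prior on λ is Gamma(shape, rate), the data are right-censored, and the loss is squared error (so each Bayes estimator is a posterior mean). With d observed events and total follow-up time T, the posterior is Gamma(shape + d, rate + T). All function names are hypothetical.

```python
def exponential_gamma_posterior(times, observed, prior_shape=1.0, prior_rate=1.0):
    """Posterior Gamma(shape, rate) for the hazard of right-censored exponential data.

    times:    follow-up time of each subject
    observed: True if the event was observed, False if right-censored
    """
    if len(times) != len(observed):
        raise ValueError("times and observed must have the same length")
    events = sum(1 for d in observed if d)
    exposure = sum(times)
    return prior_shape + events, prior_rate + exposure


def bayes_hazard(shape, rate):
    """Posterior mean of the hazard rate (Bayes estimator under squared error loss)."""
    return shape / rate


def bayes_mean_survival_time(shape, rate):
    """Posterior mean of 1 / hazard, which exists only for shape > 1."""
    if shape <= 1:
        raise ValueError("posterior mean of the survival time requires shape > 1")
    return rate / (shape - 1)


def bayes_survival_function(shape, rate, t):
    """Posterior mean of exp(-hazard * t), i.e. (rate / (rate + t)) ** shape."""
    return (rate / (rate + t)) ** shape
```

Other loss functions or lifetime models would lead to different estimators; this conjugate case is shown only because its posterior is available in closed form.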

**Forecasting by identification of linear structure in a time series**

Forecasting is an important and difficult problem in time series analysis. Traditional methods are based on fitting a model to the available data, and extrapolating to future time points. An example is the class of Box-Jenkins models. In this research a new model-free approach is investigated, based on the principal components structure of the so-called time-delay matrix. **H Viljoen**
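To make the time-delay matrix concrete, the sketch below stacks overlapping windows of a series as rows and extracts the leading principal direction of the uncentred matrix by power iteration. The window length, the uncentred convention, the use of power iteration and the function names are all assumptions made for illustration; the sketch stops short of forecasting and is not the project's method.

```python
import math


def time_delay_matrix(series, window):
    """Rows are consecutive overlapping windows (lagged vectors) of the series."""
    if not 1 <= window <= len(series):
        raise ValueError("window must satisfy 1 <= window <= len(series)")
    return [list(series[i:i + window]) for i in range(len(series) - window + 1)]


def leading_component(rows, iterations=200):
    """Largest eigenvalue and eigenvector of X^T X, found by power iteration.

    Assumes the starting vector is not orthogonal to the leading eigenvector.
    """
    width = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(width)]
            for i in range(width)]
    v = [1.0 / math.sqrt(width)] * width
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(width)) for i in range(width)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("the time-delay matrix is identically zero")
        v = [x / norm for x in w]
    # Rayleigh quotient of the normalised vector gives the eigenvalue.
    eigenvalue = sum(v[i] * sum(gram[i][j] * v[j] for j in range(width))
                     for i in range(width))
    return eigenvalue, v
```

A series that follows a linear recurrence of low order yields a time-delay matrix of correspondingly low rank, which is the kind of linear structure such an analysis looks for.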

**Analysis of the performance of actuarial science students**

Studies have been carried out to better understand the performance of actuarial science students, both in the various university modules and in the examinations of the actuarial profession. Performance has been analysed by degree programme, investigating the time taken to graduate, the number of exemptions obtained from the profession's examinations, the programmes to which students who leave the actuarial programme migrate, and the influence on performance of factors such as school mathematics, language, etc. The perceptions of students regarding the factors that lead to success have also been investigated. The performance of students in the profession's examinations has also been investigated, taking into account the university attended, gender, race, examination centre, etc. **PG Slattery**
