## Research interests

**Accounting fair valuation in the context of sparse data**

The International Accounting Standards Board (IASB) defines, in International Financial Reporting Standard (IFRS) 13 *Fair Value Measurement*, the fair value of financial instruments as the price that would be received to sell an asset or paid to transfer a liability in an orderly transaction between market participants at the measurement date, i.e. an exit price. The definition of fair value is similar under the Financial Accounting Standards Board's Accounting Standards Codification (ASC) Topic 820 (formerly Statement of Financial Accounting Standards (SFAS) 157) *Fair Value Measurement*. While there is much published research on how to calculate these fair values where data are readily available, little research has been done on how to calculate fair values where data are sparse. Through the application of simulation and statistical learning techniques, we construct a framework that can be applied in areas with sparse data, which includes the estimation of risk-free rates, credit spreads and expected exposure for xVA calculations, amongst others, in order to calculate the fair value. **CJ van der Merwe, T de Wet, D Heyman (UGent, Belgium)**
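As a purely illustrative caricature of the kind of xVA quantity mentioned above (not the authors' framework), a unilateral CVA charge can be sketched as Monte Carlo expected exposure combined with a flat hazard rate and a flat discount curve; the random-walk value process and all parameter values are hypothetical:

```python
import math
import random

def expected_exposure(path_fn, n_paths, times, seed=0):
    """Monte Carlo expected exposure EE(t) = E[max(V_t, 0)] on a time grid."""
    rng = random.Random(seed)
    totals = [0.0] * len(times)
    for _ in range(n_paths):
        path = path_fn(rng, times)            # simulated instrument values V_t
        for i, v in enumerate(path):
            totals[i] += max(v, 0.0)
    return [tot / n_paths for tot in totals]

def cva(ee, times, hazard, recovery, r):
    """Unilateral CVA = (1 - R) * sum_i EE(t_i) * DF(t_i) * PD(t_{i-1}, t_i],
    assuming a flat hazard rate and a flat risk-free rate."""
    total, prev_t = 0.0, 0.0
    for t, e in zip(times, ee):
        pd_incr = math.exp(-hazard * prev_t) - math.exp(-hazard * t)
        total += (1.0 - recovery) * e * math.exp(-r * t) * pd_incr
        prev_t = t
    return total

def random_walk(rng, times):
    """Toy driftless random-walk value process (purely illustrative)."""
    v, out = 0.0, []
    for _ in times:
        v += rng.gauss(0.0, 0.1)
        out.append(v)
    return out

times = [0.5 * i for i in range(1, 11)]       # semiannual grid out to five years
ee = expected_exposure(random_walk, 2000, times)
charge = cva(ee, times, hazard=0.02, recovery=0.4, r=0.03)
```

In practice each ingredient (the exposure simulation, the hazard curve, the discount curve) is exactly what becomes difficult to estimate when market data are sparse, which is the focus of the project.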

**Visualisation of multivariate data and biplots**

This well-established research group collaborates with Prof John Gower (UK), Prof Patrick Groenen (The Netherlands) and Prof Michael Greenacre (Spain), as well as with industry in South Africa. New students join the group continually to pursue their master's or doctoral studies. Since 2001 the group has produced 11 master's and 4 PhD graduates, and several additional registered master's and PhD students are currently engaged in research on the visualisation of multivariate data and related biplots. Two invited papers in WIREs Computational Statistics provide an overview of the biplot methodology developed for quantitative (doi 10.1002/wics.1338) and qualitative (doi 10.1002/wics.1377) data. The second paper raised research questions for further development, especially in the area of Fisher's optimal scores (FOS). FOS is essentially a method dating back to 1938, but revisiting it in the modern computational age has opened up a plethora of possibilities to explore. The visualisation of multivariate categorical data, including compositional data, is currently a research focus. The three main objectives of the project are: to extend biplot methodology to address several challenges in visualising multivariate data; to apply newly derived biplot-based techniques to data sets originating from various fields of application; and, lastly, to develop extensive R collections of functions for constructing all necessary graphical displays and performing all biplot-based techniques. The application of biplot methodology to data sets from diverse fields such as archaeology, ecology, psychology, accountancy, chemistry, wood science, health sciences and industry stimulates the development of new theory and procedures, which in turn sets the scene for subsequent theoretical research. **NJ le Roux & S Lubbe**

**Saddlepoint approximations in Extreme Value Theory**

Saddlepoint approximations have been applied successfully in many areas of statistics (as well as in other sciences, e.g. physics, applied mathematics and engineering). However, very little work has been done on applying the saddlepoint in Extreme Value Theory (EVT). In recent research the authors have applied it to approximating the distribution of the Hill estimator, the well-known estimator for the extreme value index (EVI). The approximation in that case was extremely accurate. Further research is now being carried out in which the saddlepoint is applied to other estimators of the EVI as well as to estimators of other relevant EVT parameters e.g. quantiles. The saddlepoint will also be used to find improved confidence intervals for these parameters. **S Buitendag, T de Wet and J Beirlant (KUL, Belgium)**
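The Hill estimator mentioned above has a simple closed form: with order statistics $X_{(1)} \geq \dots \geq X_{(n)}$, it is $H_{k,n} = \frac{1}{k}\sum_{i=1}^{k} \log X_{(i)} - \log X_{(k+1)}$. A minimal pure-Python sketch, with the Pareto simulation purely illustrative (exact Pareto data have extreme value index $1/\alpha$):

```python
import math
import random

def hill_estimator(sample, k):
    """Hill estimator of the extreme value index from the k largest order statistics."""
    xs = sorted(sample, reverse=True)        # X_(1) >= X_(2) >= ... >= X_(n)
    log_threshold = math.log(xs[k])          # log X_(k+1)
    return sum(math.log(xs[i]) - log_threshold for i in range(k)) / k

# Pareto(alpha) data via the inverse-CDF method; the true EVI is 1/alpha.
rng = random.Random(42)
alpha = 2.0
sample = [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(5000)]
gamma_hat = hill_estimator(sample, k=200)    # should be roughly 0.5
```

The saddlepoint work concerns the *distribution* of such estimators, for which the asymptotic normal approximation can be inaccurate at realistic sample sizes.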

**Repairable systems in Reliability: Bayesian Approaches**

Research is conducted on repairable systems and the evaluation of their performance in terms of reliability and availability. Multi-unit systems are investigated. A Bayesian method of assessing reliability is of primary interest, since very little has been published on this topic. **PJ Mostert, VSS Yadavalli (University of Pretoria) and A Bekker (University of Pretoria)**

**Novelty detection using Extreme Value Theory**

Novelty detection is the branch of statistics concerned with detecting deviations from expected normal behaviour. The anomalous events of interest typically have catastrophic financial or social impacts and, therefore, occur only rarely. Consequently, the broad approach is to construct a model representing the normal behaviour of the underlying system; new observations are then tested against this model of normality. One approach to discriminating between expected and anomalous observations is to threshold the model of normality probabilistically. This method has the advantage that the certainty in discriminating between normal and novel observations is quantified. Recently, an approach based on extreme value theory has been formulated to threshold the model representing the normal state. Under the assumption that the probability density function of the variables in their normal state is well defined, extreme value theory is utilised to derive a limiting distribution for the minimum probability density of the data. A significant advantage of this approach is the ability to perform novelty detection in multimodal and multivariate data spaces. Further research is now being carried out in which the theory of second-order regular variation is used to determine the rate of convergence of the extreme value-based novelty detection algorithm. This research extends current models by using the lower order statistics of the probability density values to approximate the limiting distribution of the minimum probability density. Consequently, the extreme value distribution is approximated using more information than only the sample of minima. **ML Steyn and T de Wet**
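The probabilistic thresholding of a model of normality described above can be sketched in one dimension, with a simple Gaussian model of the normal state standing in for the density estimate (a toy illustration only, not the extreme value-based algorithm itself):

```python
import math
import random

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Fit a Gaussian "model of normality" to training data from the normal state.
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(2000)]
mu = sum(train) / len(train)
sigma = math.sqrt(sum((x - mu) ** 2 for x in train) / (len(train) - 1))

# Probabilistic threshold: flag points whose density falls below the 1st
# percentile of the training densities, so by construction about 1% of
# data from the normal state would be flagged as novel.
dens = sorted(gaussian_pdf(x, mu, sigma) for x in train)
threshold = dens[len(dens) // 100]

def is_novel(x):
    return gaussian_pdf(x, mu, sigma) < threshold
```

The extreme value approach replaces this empirical percentile with the limiting distribution of the minimum density, which is what makes it tractable in multimodal and multivariate settings.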

**Application of measures of divergence in statistical inference**

In the literature a large number of measures of divergence between two probability distributions have been proposed, such as Pearson's chi-square divergence, the power density divergence, phi measures of divergence and many others. Some of these measures have been applied to particular areas of statistical inference, but other areas have either been "avoided" or "neglected", receiving only marginal attention. In this research divergence measures are applied to some of these areas, including goodness-of-fit tests and extreme value theory. In particular, the class of f-divergences is studied, extended and applied to such problems. **T de Wet and F Österreicher (University of Salzburg)**
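For discrete distributions the class of f-divergences takes the form $D_f(P \| Q) = \sum_i q_i\, f(p_i / q_i)$, with $f$ convex and $f(1) = 0$; a minimal sketch recovering Kullback-Leibler and Pearson chi-square divergence as special cases:

```python
import math

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_i q_i * f(p_i / q_i) for discrete distributions."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q) if qi > 0)

def kl(t):                      # f(t) = t log t  gives Kullback-Leibler divergence
    return t * math.log(t) if t > 0 else 0.0

def pearson(t):                 # f(t) = (t - 1)^2 gives Pearson chi-square divergence
    return (t - 1.0) ** 2

p = [0.5, 0.3, 0.2]
q = [1 / 3, 1 / 3, 1 / 3]
d_kl = f_divergence(p, q, kl)        # equals sum_i p_i log(p_i / q_i)
d_chi2 = f_divergence(p, q, pearson) # equals sum_i (p_i - q_i)^2 / q_i
```

Choosing different convex generators $f$ yields the whole family, which is what makes the class flexible enough to tailor to problems such as goodness-of-fit testing.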

**Campanometry**

Campanology, the study of bells, bell-casting and bell-ringing, is quite an old discipline. A major aspect of campanology is clearly the sound produced, and thus the discipline is traditionally closely tied to physics, and in particular to acoustics, the scientific study of sound and sound waves. In contrast, the study of the quantitative or statistical aspects of bells and their properties is much more recent. The term Campanometry was coined for this multi-disciplinary field of music, history, mathematics/statistics, acoustics and metallurgy. The famous campanologist André Lehr (1929–2007) is credited as the founder of Campanometry. Of particular interest is the measurement and statistical study of the different partials of bells and carillons. Since bells are usually tuned so that their partials are at ideal values, the deviations from these ideal values supply important information on the sound quality of a bell. Furthermore, measurements of their physical properties also provide useful data for analyses. In this research bells in the Western Cape are identified, photographed and measured physically and acoustically, and the information is stored in the SUNDigital Collections, the digital heritage repository of the Stellenbosch University Library. The data thus obtained are used to statistically model different aspects of bells and carillons, inter alia the extent to which they comply with certain standard design criteria, and to statistically analyse the sound quality of the bells. Furthermore, using statistical classification techniques, bells in the database whose founder is unknown can be attributed to a particular founder, a problem analogous to the statistical classification of manuscripts of unknown authorship. **T de Wet, PJU van Deventer and JL Teugels (KUL, Belgium)**
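To illustrate the kind of measurement involved: deviations of measured partials from their ideal values are conventionally expressed in cents (1200 cents per octave). The sketch below assumes the classical ideal partial ratios of a minor-third bell relative to the prime (hum 0.5, prime 1, tierce 1.2, quint 1.5, nominal 2); the measured frequencies are hypothetical:

```python
import math

def cents(freq, ref):
    """Deviation of a measured frequency from a reference, in cents (1200 per octave)."""
    return 1200.0 * math.log2(freq / ref)

# Ideal partials of a minor-third bell, as ratios of the prime frequency.
IDEAL_RATIOS = {"hum": 0.5, "prime": 1.0, "tierce": 1.2, "quint": 1.5, "nominal": 2.0}

def partial_deviations(measured, prime_freq):
    """Deviation of each measured partial from its ideal value, in cents."""
    return {name: cents(measured[name], ratio * prime_freq)
            for name, ratio in IDEAL_RATIOS.items() if name in measured}

# Hypothetical measurements (Hz) for a bell with its prime at 440 Hz:
measured = {"hum": 221.0, "prime": 440.0, "tierce": 530.0,
            "quint": 660.0, "nominal": 882.0}
devs = partial_deviations(measured, prime_freq=440.0)
```

Vectors of such deviations across the partials are a natural input for the statistical modelling and classification of bells described above.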

**Statistical inference of complex survey data**

Most survey data analysed in practice originate from non-simple random sampling (non-SRS) designs. These designs typically combine different sampling methods, such as stratification and cluster sampling. This is known as complex sampling, a technique employed to ensure that the sample represents the target population as closely as possible. This project extends our previous research in the field of complex sampling to develop models and methods of analysis that account for the complex design of non-SRS multivariate data, a largely unexplored area. The newly developed models and methods will be evaluated in two ways: using simulated hierarchical data, so that the evaluation can be carried out under controllable circumstances, and using real-world data, to ensure that the developed models account for real-world anomalies. **R Luus (UWC), A Neethling (SU and UFS) and T de Wet**
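To illustrate why the design cannot be ignored, here is a minimal sketch of a design-weighted (Hájek-type) mean under a toy stratified design, contrasted with the naive unweighted mean; the data and sampling rates are hypothetical:

```python
def weighted_mean(values, weights):
    """Design-weighted estimator of a population mean: each sampled unit is
    weighted by the inverse of its inclusion probability."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Toy stratified design: stratum A sampled at rate 1/10, stratum B at 1/2,
# so the design weights are 10 and 2 respectively.
values  = [4.0, 6.0, 5.0, 20.0, 22.0]    # first three from A, last two from B
weights = [10.0, 10.0, 10.0, 2.0, 2.0]

est = weighted_mean(values, weights)      # accounts for unequal inclusion
naive = sum(values) / len(values)         # SRS-style mean, biased under this design
```

Stratum B is heavily over-sampled relative to A, so the unweighted mean overstates the population mean; the design weights correct for this, and analogous corrections are what the project develops for multivariate analyses.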

**Bayesian analysis of cancer survival data using the lifetime model**

Bayes estimators for some of the lifetime distribution parameters, such as the mean survival time, the hazard function and the survival distribution function, are derived for survival data from various lifetime models. The estimators are derived using a selection of loss functions. Survival data are typically censored, and the theory is based on right-censored data, although other types of censoring are also investigated, both non-parametrically and parametrically. Various types of prior distribution are used in this study. **PJ Mostert, JJJ Roux (University of South Africa) and A Bekker (University of Pretoria)**
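As a minimal conjugate illustration of the kind of estimator involved (the exponential lifetime model with a gamma prior, not the full models of the project), the posterior under right censoring and the Bayes estimates of the rate and mean survival time under squared-error loss can be written in a few lines; the data are hypothetical:

```python
def exp_gamma_posterior(times, events, a0, b0):
    """Posterior of an exponential rate lambda with a Gamma(a0, b0) prior,
    given right-censored data: conjugate update a = a0 + number of events,
    b = b0 + total time at risk (events and censored observations alike)."""
    d = sum(events)                       # observed deaths (event indicator = 1)
    t = sum(times)                        # total follow-up time
    return a0 + d, b0 + t

def bayes_estimates(a, b):
    """Bayes estimators under squared-error loss (posterior means)."""
    post_mean_rate = a / b                # E[lambda | data]
    post_mean_survtime = b / (a - 1)      # E[1/lambda | data], requires a > 1
    return post_mean_rate, post_mean_survtime

times  = [2.0, 3.5, 1.0, 4.0, 5.0]        # follow-up times
events = [1,   1,   0,   1,   0]          # 0 = right-censored
a, b = exp_gamma_posterior(times, events, a0=1.0, b0=1.0)
rate_hat, mean_surv_hat = bayes_estimates(a, b)
```

Other loss functions (e.g. LINEX or absolute-error loss) lead to different functionals of the same posterior, which is one of the comparisons the research pursues.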

**Forecasting by identification of linear structure in a time series**

Forecasting is an important and difficult problem in time series analysis. Traditional methods are based on fitting a model to the available data, and extrapolating to future time points. An example is the class of Box-Jenkins models. In this research a new model-free approach is investigated, based on the principal components structure of the so-called time-delay matrix. **H Viljoen**
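The time-delay (trajectory) matrix and its leading principal component can be sketched in a few lines of pure Python; power iteration stands in for a full singular value decomposition, and the sine series is purely illustrative:

```python
import math

def time_delay_matrix(series, window):
    """Stack lagged windows of the series as rows (a Hankel/trajectory matrix)."""
    n = len(series) - window + 1
    return [series[i:i + window] for i in range(n)]

def leading_component(matrix, iters=200):
    """Leading eigenvector of M^T M via power iteration (a pure-Python PCA sketch)."""
    m = len(matrix[0])
    v = [1.0] * m
    for _ in range(iters):
        mv = [sum(row[j] * v[j] for j in range(m)) for row in matrix]   # M v
        w = [sum(matrix[i][j] * mv[i] for i in range(len(matrix)))      # M^T (M v)
             for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

series = [math.sin(0.3 * t) for t in range(100)]
X = time_delay_matrix(series, window=10)
pc1 = leading_component(X)               # dominant linear structure in the lags
```

In the model-free approach, the dominant principal components of this matrix capture the linear structure in the lags, which is then exploited for extrapolation instead of an explicit Box-Jenkins model.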

**Process prior for hazard in Cox regression from a Bayesian viewpoint**

Process prior methodology within Cox regression models and non-parametric Bayesian survival analysis is investigated. Estimation of the baseline hazard and the cumulative baseline hazard, with or without ties, is the main focus, with estimation carried out within the Bayesian framework. The focus also falls on kernel estimation of the non-parametric hazard rate with respect to bias and variance. **PJ Mostert**
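For context, the classical non-parametric estimate of the cumulative hazard, against which process-prior posteriors are typically centred or compared, is the Nelson-Aalen estimator; a minimal sketch (not the process-prior methodology itself), with hypothetical data:

```python
def nelson_aalen(times, events):
    """Nelson-Aalen estimate of the cumulative hazard H(t) from right-censored
    data: at each distinct event time add d_i / n_i (deaths over number at risk)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    H, out, i = 0.0, [], 0
    while i < len(data):
        t, d, removed = data[i][0], 0, 0
        j = i
        while j < len(data) and data[j][0] == t:   # group tied times
            d += data[j][1]
            removed += 1
            j += 1
        if d > 0:
            H += d / n_at_risk
            out.append((t, H))
        n_at_risk -= removed                        # events and censorings leave
        i = j
    return out

times  = [1.0, 2.0, 2.0, 3.0, 4.0]
events = [1,   1,   0,   1,   0]                    # 0 = right-censored
H = nelson_aalen(times, events)
```

A gamma-process prior on the cumulative hazard yields a posterior mean that shrinks an estimate of this kind towards a prior guess, which is the flavour of result the project studies.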

**Analysis of the performance of actuarial science students**

Studies have been carried out to better understand the performance of actuarial science students, both in the various university modules and in the examinations of the actuarial profession. Performance has been analysed by degree programme, investigating the time taken to graduate, the number of exemptions from the profession's examinations obtained, the programmes to which students who leave the actuarial programme migrate, and the influence on performance of factors such as school mathematics, language, etc. The perceptions of students regarding the factors which lead to success have been investigated. The performance of students in the profession's examinations has also been investigated, taking into account the university attended, gender, race, examination centre, etc. **PG Slattery**


**Feature selection for multi-label classification**

Single-label classification is concerned with learning from a set of instances where each instance is associated with a single label from a set of disjoint labels. This includes binary classification, where only two labels are available, and multi-class classification, where more than two labels are available. Multi-label learning problems are concerned with learning from instances where each instance is associated with multiple labels. This is an important problem in applications such as textual classification, music categorisation, protein function classification and the semantic classification of images.

There has been a significant increase in the literature available on feature selection in the multi-label context. We consider the problem of variable selection in a multi-label setting. Variable selection aims to identify the variables which are relevant or important when we assign labels to a new data case. This is a more complicated question than in single-label scenarios, since a variable may be relevant when we consider assigning, say, Label 1, while it is largely irrelevant when a decision has to be made regarding the assignment of the other labels. **IE Contardo-Berning**
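One simple per-label relevance score of the kind described is the mutual information between a feature and each label separately (a binary-relevance view, used here purely as an illustration); in the toy data below, feature 0 is informative for label 0 but much less so for label 1:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information between two discrete variables, in nats."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p = c / n
        mi += p * math.log(p * n * n / (px[x] * py[y]))
    return mi

# Two binary features per instance, and two binary labels per instance.
X = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 1]]
Y = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 1]]

# Score each (feature, label) pair separately, since a feature can matter
# for one label while being irrelevant for another.
scores = {(f, l): mutual_information([row[f] for row in X],
                                     [row[l] for row in Y])
          for f in range(2) for l in range(2)}
```

The resulting per-label score table makes the asymmetry explicit, which is precisely why multi-label variable selection cannot be reduced to a single ranking of features.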