## Research interests

**Accounting fair valuation in the context of sparse data**

The International Accounting Standards Board (IASB) defines, within the International Financial Reporting Standards (IFRS) 13 *Fair Value Measurement*,
the fair value for financial instruments as the price that would be
received from selling an asset or paid to transfer a liability in an
orderly transaction between market participants at the measurement date,
i.e. an exit price. The definition of fair value is similar under the
Financial Accounting Standards Board's Accounting Standards Codification
(ASC) Topic 820 (formerly Statement of Financial Accounting Standards
(SFAS) 157) *Fair Value Measurement*. While there is much
published research on how to calculate these fair values where
data are readily available, little research has been done on how to
calculate them where data are sparse. Through the
application of simulation and statistical learning techniques, we
construct a framework for calculating fair value in areas with sparse
data. The framework includes, amongst others, the estimation of
risk-free rates, credit spreads and expected exposure for xVA
calculation. **CJ van der Merwe, T de Wet, D Heyman (UGent, Belgium)**

**Visualisation of multivariate data and biplots**

This well-established research group collaborates with Prof John Gower (UK),
Prof Patrick Groenen (The Netherlands) and Prof Michael Greenacre (Spain),
as well as with industry in South Africa. New students join the group
continually to pursue their master's or doctoral studies. Since
2001 the group has produced 11 master's and 4 PhD graduates.
Several additional registered master's and PhD students are currently
engaged in research on the visualisation of multivariate data and related
biplots. Two invited papers in WIREs Computational Statistics provide
an overview of the biplot methodology developed for quantitative (doi
10.1002/wics.1338) and qualitative (doi 10.1002/wics.1377) data. The
second paper raised research questions for further development,
especially in the area of Fisher's optimal scores (FOS). FOS is
essentially a method dating back to 1938, but revisiting it in the
modern computational age has opened up many new
possibilities to explore. The visualisation of multivariate categorical
data, including compositional data, is currently a research focus. The
three main objectives of the project are: to extend biplot methodology
to address several challenges in visualising multivariate data; to
apply newly derived biplot-based techniques to data sets originating
from various fields of application; and to develop
extensive collections of R functions for constructing all necessary
graphical displays and performing all biplot-based techniques. The
application of biplot methodology to data sets originating from diverse
fields such as archaeology, ecology, psychology,
accountancy, chemistry, wood science, health sciences and industry
stimulates the development of new theory and procedures, which in turn
set the scene for subsequent theoretical research. **NJ le Roux & S Lubbe**

**Saddlepoint approximations in Extreme Value Theory**

Saddlepoint
approximations have been applied successfully in many areas of
statistics (as well as in other sciences, e.g. physics, applied
mathematics and engineering). However, very little work has been done on
applying the saddlepoint in Extreme Value Theory (EVT). In recent
research the authors have applied it to approximating the distribution
of the Hill estimator, the well-known estimator for the extreme value
index (EVI). The approximation in that case was extremely accurate.
Further research is now being carried out in which the saddlepoint is
applied to other estimators of the EVI as well as to estimators of other
relevant EVT parameters, e.g. quantiles. The saddlepoint will also be
used to find improved confidence intervals for these parameters. **S Buitendag, T de Wet and J Beirlant (KUL, Belgium)**
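
For readers unfamiliar with the Hill estimator mentioned above, the following is a minimal Python sketch of the estimator itself, not of the saddlepoint approximation to its distribution. The function name `hill_estimator`, the choice of `k` and the simulated Pareto data are illustrative assumptions and do not come from the research described.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimate of the extreme value index from the k largest observations."""
    if not 1 <= k < len(sample):
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    ordered = sorted(sample, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the k+1 largest observations must be positive")
    log_threshold = math.log(threshold)
    # Average log-excess of the k largest observations over the threshold.
    return sum(math.log(x) - log_threshold for x in ordered[:k]) / k


if __name__ == "__main__":
    # Hypothetical check on simulated Pareto data with tail index 2,
    # for which the extreme value index is 1/2.
    rng = random.Random(1)
    data = [rng.paretovariate(2.0) for _ in range(5000)]
    print(hill_estimator(data, k=500))
```

The estimate depends on `k`, the number of upper order statistics used, which in practice has to be chosen with care.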

**Repairable systems in Reliability: Bayesian Approaches**

Research focuses on repairable systems and the evaluation of their performance in
terms of reliability and availability. Multi-unit systems are
investigated. A Bayesian method of assessing reliability is of primary
interest, since very little has been published on this topic. **PJ Mostert, VSS Yadavalli (University of Pretoria) and A Bekker (University of Pretoria)**

**Novelty detection using Extreme Value Theory**

Novelty
detection is a branch of statistics that concerns detecting deviations
from the expected normal behaviour. It is generally the case that the
anomalous events have catastrophic financial or social impacts and,
therefore, only occur rarely. Consequently, the broad approach is to
construct a model representing the normal behaviour of the underlying
system. New observations are then tested against this model of
normality. One approach to discriminate between expected and anomalous
observations is to threshold the model of normality probabilistically.
This method has the advantage that the certainty in discriminating
between normal and novel observations is quantified. Recently, an
approach based on extreme value theory has been formulated to threshold
the model representing the normal state. Under the assumption that the
probability density function of the variables in their normal state is
well defined, extreme value theory is utilised to derive a limiting
distribution for the minimum probability density of the data. A
significant advantage that this approach inherits is the ability to
perform novelty detection in multimodal and multivariate data spaces.
Further research is now being carried out in which the theory of second
order regular variation is used to determine the rate of convergence of
the extreme value-based novelty detection algorithm. This research
extends current models by using the lower order statistics of the
probability density values to approximate the limiting distribution of
the minimum probability density. Consequently, the extreme value
distribution is approximated by using more information than only the
sample of minima. **ML Steyn and T de Wet**
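
The general idea of thresholding a model of normality probabilistically can be sketched as follows. This is a simplified illustration under stated assumptions: the model of normality is a univariate Gaussian, and the threshold is an empirical quantile of the training densities rather than the extreme value-based limiting distribution described above. The names `fit_normality_model` and `is_novel` and the parameter `alpha` are hypothetical.

```python
import random
from statistics import NormalDist


def fit_normality_model(train, alpha=0.01):
    """Fit a Gaussian model of normality and a density threshold.

    The threshold is the empirical alpha-quantile of the training densities,
    a simple stand-in for an extreme value-based threshold.
    """
    if len(train) < 2:
        raise ValueError("at least two training observations are required")
    model = NormalDist.from_samples(train)
    densities = sorted(model.pdf(x) for x in train)
    index = max(0, int(alpha * len(densities)) - 1)
    return model, densities[index]


def is_novel(x, model, threshold):
    """Flag x as novel when its density under the model is below the threshold."""
    return model.pdf(x) < threshold


if __name__ == "__main__":
    rng = random.Random(0)
    normal_data = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    model, threshold = fit_normality_model(normal_data)
    print(is_novel(10.0, model, threshold), is_novel(0.0, model, threshold))
```

Thresholding on the density rather than on the observation itself is what allows the same idea to carry over to multimodal and multivariate models.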

**Application of measures of divergence in statistical inference**

In
the literature a large number of measures of divergence between two
probability distributions have been proposed such as Pearson's
chi-square divergence, power density divergence, phi measures of
divergence and many others. Some of these measures have been applied to
particular areas of statistical inference, but some areas have either
been “avoided” or “neglected”, receiving only marginal attention. In
this research divergence measures are applied to some of these areas,
including goodness-of-fit tests and extreme value theory. In particular
the class of f-divergences is studied, extended and applied to such
problems. **T de Wet and F Österreicher (University of Salzburg)**
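
As a brief illustration of the class of f-divergences, the sketch below evaluates D_f(P || Q) = sum_i q_i f(p_i / q_i) for two discrete distributions. It assumes every q_i is strictly positive, and the function names are hypothetical. The two generator functions shown give the Kullback-Leibler divergence and Pearson's chi-square divergence.

```python
import math


def f_divergence(p, q, f):
    """D_f(P || Q) = sum_i q_i * f(p_i / q_i) for discrete distributions."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    if any(qi <= 0 for qi in q):
        raise ValueError("this sketch assumes q_i > 0 for every outcome")
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def kl_generator(t):
    """f(t) = t log t, with f(0) = 0 by continuity (Kullback-Leibler)."""
    return t * math.log(t) if t > 0 else 0.0


def pearson_generator(t):
    """f(t) = (t - 1)^2 (Pearson's chi-square divergence)."""
    return (t - 1.0) ** 2


if __name__ == "__main__":
    p = [0.5, 0.5]
    q = [0.25, 0.75]
    print(f_divergence(p, q, kl_generator))
    print(f_divergence(p, q, pearson_generator))
```

Passing the generator `f` as an argument keeps the code close to the definition: a different convex function with f(1) = 0 gives a different member of the class.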

**Campanometry**

Campanology,
the study of bells, bell-casting and bell-ringing, is quite an old
discipline. A major aspect of campanology is clearly the sound produced
and thus the discipline is traditionally closely tied to physics, and in
particular to acoustics, the scientific study of sound and sound waves.
In contrast to this, the study of the quantitative or statistical
aspects of bells and their properties is much more recent. The term
Campanometry was coined for this multi-disciplinary field of music,
history, mathematics/statistics, acoustics and metallurgy. The famous
campanologist André Lehr (1929–2007) is credited as the founder of
Campanometry. Of particular interest is the measurement and statistical
study of the different partials of bells and carillons. Since bells are
usually tuned in order to have their partials at the ideal values, the
deviations from these ideal values supply important information on the
sound quality of a bell. Furthermore, measurements on their physical
properties also provide useful data for analyses. In this research bells
in the Western Cape are identified, photographed and measured physically
and acoustically, and the information is stored in the SUNDigital
Collections, the digital heritage repository of the Stellenbosch
University Library. The data thus obtained are used to
statistically model different aspects of bells and carillons, inter alia
to determine to what extent they comply with certain standard design
criteria, and to statistically analyse the sound quality of the bells. Furthermore,
using statistical classification techniques, bells in the database of
unknown founders can be classified as being founded by a particular
founder. The latter is analogous to the statistical classification of
unknown authors of manuscripts. **T de Wet, PJU van Deventer and JL Teugels (KUL, Belgium)**

**Statistical inference of complex survey data**

Most
survey data analysed in practice originate from non-simple random
sampling (non-SRS) designs. These designs typically combine different
sampling methods, such as stratification and cluster sampling. This is
known as complex sampling, a technique employed to ensure that the
sample collected represents the target population as closely as
possible. This project extends our previous research in the field of
complex sampling to develop models and methods of analysis to account
for the complex design of non-SRS multivariate data, a largely unexplored
area. The newly developed models and methods will be evaluated in two
ways: using simulated hierarchical data, so that the evaluation can be
carried out under controlled circumstances, and using
real-world data, to ensure that the developed models account for
real-world anomalies. **R Luus (UWC), A Neethling (SU and UFS) and T de Wet**

**Bayesian analysis of cancer survival data using the lifetime model**

Bayes estimators for some of the lifetime distribution parameters, such as
the mean survival time, the hazard function and the survival
distribution function, are derived for survival data from various
lifetime models. The estimators are derived using a selection of loss
functions. The survival data are normally censored, and the theory is
developed, non-parametrically and parametrically, for right-censored
data; other types of censoring are also investigated. Various types of
prior distributions are used in this study. **PJ Mostert, JJJ Roux (University of South Africa) and A Bekker (University of Pretoria)**
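
One simple instance of such Bayes estimators can be sketched as follows. It assumes an exponential lifetime model with hazard rate lambda, a conjugate Gamma(shape, rate) prior and squared error loss, under which the Bayes estimator is the posterior mean. These choices, and all function names, are illustrative assumptions and are not necessarily the models, priors or loss functions used in the study.

```python
def gamma_posterior(times, observed, prior_shape, prior_rate):
    """Posterior Gamma(shape, rate) for the exponential hazard rate.

    times    : follow-up time of each subject
    observed : 1 if the death was observed, 0 if right-censored
    """
    if len(times) != len(observed):
        raise ValueError("times and observed must have the same length")
    events = sum(observed)   # number of uncensored lifetimes
    exposure = sum(times)    # total time at risk, censored or not
    return prior_shape + events, prior_rate + exposure


def bayes_hazard(shape, rate):
    """Posterior mean of the hazard rate lambda."""
    return shape / rate


def bayes_mean_survival(shape, rate):
    """Posterior mean of the mean survival time 1 / lambda (needs shape > 1)."""
    if shape <= 1:
        raise ValueError("the posterior mean of 1/lambda requires shape > 1")
    return rate / (shape - 1)


def bayes_survival(t, shape, rate):
    """Posterior mean of the survival function S(t) = exp(-lambda * t)."""
    return (rate / (rate + t)) ** shape


if __name__ == "__main__":
    # Hypothetical data: the second lifetime is right-censored.
    shape, rate = gamma_posterior([2.0, 3.0, 5.0], [1, 0, 1], 1.0, 1.0)
    print(bayes_hazard(shape, rate), bayes_mean_survival(shape, rate))
```

A censored subject contributes to the total time at risk but not to the event count, which is how right-censoring enters the posterior.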

**Forecasting by identification of linear structure in a time series**

Forecasting
is an important and difficult problem in time series analysis.
Traditional methods are based on fitting a model to the available data,
and extrapolating to future time points. An example is the class of
Box-Jenkins models. In this research a new model-free approach is
investigated, based on the principal components structure of the
so-called time-delay matrix. **H Viljoen**
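
To make the time-delay matrix concrete, the sketch below builds it from a series and measures how much of its (uncentred) variation lies along the leading principal direction. It illustrates only the construction, not the forecasting approach under investigation. The function names, the window length and the use of power iteration are assumptions made for this example.

```python
import math


def time_delay_matrix(series, window):
    """Rows are consecutive lagged windows of the series."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [list(series[i:i + window]) for i in range(len(series) - window + 1)]


def leading_component_share(matrix, iterations=200):
    """Share of total uncentred variation along the leading principal direction."""
    width = len(matrix[0])
    # Gram matrix X^T X of the time-delay matrix X.
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(width)]
            for i in range(width)]
    trace = sum(gram[i][i] for i in range(width))
    if trace == 0:
        raise ValueError("the series is identically zero")
    # Power iteration; the start vector is arbitrary and could, in rare
    # cases, be orthogonal to the leading direction.
    vector = [float(i + 1) for i in range(width)]
    for _ in range(iterations):
        product = [sum(g * v for g, v in zip(gram_row, vector)) for gram_row in gram]
        norm = math.sqrt(sum(value * value for value in product))
        if norm == 0:
            raise ValueError("start vector lies in the null space; try another")
        vector = [value / norm for value in product]
    product = [sum(g * v for g, v in zip(gram_row, vector)) for gram_row in gram]
    eigenvalue = sum(v * p for v, p in zip(vector, product))  # Rayleigh quotient
    return eigenvalue / trace


if __name__ == "__main__":
    # A geometric series obeys a one-term linear recurrence, so its
    # time-delay matrix has rank one.
    series = [2.0 * 1.1 ** t for t in range(30)]
    print(leading_component_share(time_delay_matrix(series, window=5)))
```

A series with strong linear structure concentrates its variation in a few principal directions, whereas for noise the variation is spread over all of them.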

**Process prior for hazard in Cox regression from a Bayesian viewpoint**

Process
prior methodology within the Cox regression models and non-parametric
Bayesian survival analysis are investigated. The main focus is the
estimation of the baseline hazard and the cumulative baseline hazard,
with or without ties, using the Bayesian approach. A further focus is
kernel estimation of the non-parametric hazard rate with respect to bias and
variation. **PJ Mostert**

**Analysis of the performance of actuarial science students**

Studies
have been carried out to better understand the performance of actuarial
science students, both in the various university modules and in
the examinations of the actuarial profession. Performance has been
analysed by degree programme, investigating the time taken to graduate,
the number of exemptions obtained from the profession's examinations,
the programmes to which students who leave the actuarial programme
migrate, and the influence on performance of factors such as school
mathematics and language. The perceptions of students on the factors
which lead to success have been investigated. The performance of students
in the profession's examinations has also been investigated, taking into
account the university attended, gender, race and examination centre, amongst others.
**PG Slattery**

**Feature selection for multi-label classification**

Single-label classification is concerned with learning from a set of instances where each instance is associated with a single label from a set of disjoint labels. This includes binary classification, where only two different labels are available, and multi-class classification, where more than two labels are available. Multi-label learning problems are concerned with learning from instances where each instance is associated with multiple labels. This is an important problem in applications such as text classification, music categorisation, protein function classification and the semantic classification of images.

There has been a significant increase in the literature available on feature selection in the multi-label context. We consider the problem of variable selection in a multi-label setting. Variable selection aims to identify the variables which are relevant or important when we assign labels to a new data case. This is a more complicated question than in single-label scenarios, since a variable may be relevant when we consider assigning, say, Label 1, while being largely irrelevant when a decision has to be made regarding the assignment of the other labels. **IE Contardo-Berning**
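
The point that a variable can matter for one label and not for another can be illustrated with a small sketch. It scores each variable separately for each label by the absolute Pearson correlation between the variable and the label indicator. This is a deliberately simple filter chosen for illustration, not the selection method studied here; the function names and the toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def relevance_by_label(features, labels):
    """scores[l][j] = |correlation| between variable j and the indicator of label l."""
    feature_columns = list(zip(*features))
    label_columns = list(zip(*labels))
    return [[abs(pearson(column, label)) for column in feature_columns]
            for label in label_columns]


if __name__ == "__main__":
    # Toy data: one row per data case, two variables and two labels.
    features = [[1.0, 5.0], [2.0, 1.0], [3.0, 5.0], [4.0, 1.0]]
    labels = [[0, 1], [0, 0], [1, 1], [1, 0]]
    for label_index, scores in enumerate(relevance_by_label(features, labels)):
        print(label_index, scores)
```

In this toy example the first variable is the more relevant one for the first label and the second variable for the second label, so a single ranking across all labels would hide the difference.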