Engaging Data with R
The course is suitable for postgraduate students, staff and professionals who are familiar with statistical analysis and have managed or interacted with datasets using other software platforms. It is assumed that course participants are already beginner to intermediate users of R.
In the field of Data Science, R is a powerful resource for engaging with data to extract meaningful information and actionable insights across a wide variety of scientific disciplines and businesses. Before any modelling or advanced analysis, data scientists must wrangle their data into “tidy” datasets and learn how to probe and interrogate their data through summaries and visualizations. This module is a natural precursor to the modules “Shareable and reproducible reporting with R and Rmarkdown” and “Fundamentals of Data Visualisation”, but can also be taken as a stand-alone module. The module covers: a refresher on basic programming in R; cleaning and manipulating datasets; creating custom summary tables; and generating professional plots. For those who are already intermediate or advanced users of R, this course will consolidate their programming knowledge under the R “tidyverse” framework for managing and exploring datasets. Please note: this course does not cover any form of statistical modelling (e.g. statistical tests, or predictive or causal modelling).
Participants will acquire the skills to:
Clean and transform data into a format suitable for analysis, using the packages dplyr and tidyr
Create summaries of data with tailor-made tables
Explore data with ggplot2, a package used to produce professional plots
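As a rough illustration of the kind of workflow these skills add up to, here is a minimal sketch. It is not taken from the course materials. It assumes the dplyr, tidyr and ggplot2 packages are installed and uses R's built-in airquality dataset; the object names air_long, air_summary and p are hypothetical and chosen only for this example.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Clean and transform: drop rows with a missing Ozone reading, label the
# months, and reshape three measurements into "long" (tidy) format.
air_long <- airquality %>%
  drop_na(Ozone) %>%
  mutate(Month = factor(Month, levels = 5:9, labels = month.abb[5:9])) %>%
  pivot_longer(cols = c(Ozone, Wind, Temp),
               names_to = "measure", values_to = "value")

# Summarise: one row per month and measure, with the mean and the row count.
air_summary <- air_long %>%
  group_by(Month, measure) %>%
  summarise(mean_value = mean(value), n = n(), .groups = "drop")

# Explore: boxplots of each measure by month, one panel per measure.
p <- ggplot(air_long, aes(x = Month, y = value)) +
  geom_boxplot() +
  facet_wrap(~ measure, scales = "free_y") +
  labs(x = "Month", y = "Value")

print(air_summary)
print(p)
```

The summary table and the plot are both built from the same long-format data, so a change to the cleaning step carries through to each of them.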
There will be lectures, interwoven with practical exercises. Participants will be encouraged to follow along with exercises and programming on their own laptops.
Dr Roxanne Beauclair is a specialist in applying biostatistical methods to epidemiological data. She holds a PhD in this field from Ghent University, and has launched her own statistical consultancy company, Data Yarn, based in Pretoria. She received training in Epidemiology (MPH) from the University of Cape Town in South Africa. She has worked in an analytical capacity on several epidemiological studies. Over the past few years she has become an R enthusiast and enjoys learning new ways to improve statistical programmes by writing clean, reproducible, and legible code.