Introduction to Machine Learning with Python


20 to 24 January 2020 (Monday to Friday)

Target audience

The course is suitable for undergraduate students who have completed courses in linear algebra, calculus and introductory programming and who want practical experience in machine learning, for example second-year engineering or BSc students. It could also be valuable to postgraduate students or faculty who already have experience in statistics and machine learning (for instance, having programmed in R) and want to gain practical experience in Python.


The goal of this course is to introduce a number of core machine learning (ML) techniques, while participants gain practical experience in Python along the way. Rather than only showing participants how to do ML by calling Python libraries, the aim is for them to understand the mathematics and principles behind a small number of ML techniques, and to leave with an awareness of the pitfalls of applying these techniques in practice. We hope that the course will provide a foundation for further study and exploration.


Although this is an introductory short course, it will not be lightweight. Concretely, attendees should already know what a vector and a matrix are, how to multiply two matrices, and what a derivative is. Prior programming experience in any language (for example R, MATLAB, C, Java, Delphi or BASIC) is required; unfortunately, the course is not a general introduction to programming.

Expected outcomes

Participants will acquire the skills to: 
  • Use Python to load and process data, specifically using Jupyter notebooks and the NumPy package.
  • Derive linear regression with one variable by hand.
  • Implement simple and multiple linear regression from scratch in Python using NumPy.
  • Use the sklearn package to perform more advanced forms of regression.
  • Write down the mathematics and pseudocode for K-nearest neighbour (KNN) and logistic regression classification.
  • Implement KNN classification from scratch using NumPy. 
  • Use sklearn to perform logistic regression classification. 
  • Write down the mathematics and pseudocode for K-means clustering.
  • Implement K-means clustering from scratch using NumPy, and use more advanced forms of clustering using sklearn.
  • Understand the importance of splitting data into training, validation and test sets.
  • Know the difference between supervised, unsupervised, semi-supervised and reinforcement learning. 
  • Know when not to use ML.
  • Run the entire workflow on a remote compute environment in the cloud using Amazon Web Services (AWS).

Course format

Each day will consist of two lectures and one or two practical sessions. The rough outline for the week is as follows: 
Day 1: Introduction to the AWS features and functionality used during the course
Day 2: Introducing Python 
Day 3: Regression 
Day 4: Practical ML and classification 
Day 5: Clustering and thoughts on ML


Herman Kamper is a senior lecturer in Electrical and Electronic Engineering at Stellenbosch University. He uses machine learning to solve problems in speech and language processing, with a specific focus on low- and zero-resource settings. The teaching assistants for this short course are all active postgraduate researchers who use machine learning to solve foundational and practical problems.