The South African Tuberculosis Bioinformatics Initiative (SATBBI) hosts and maintains a number of computational resources for use by members of the group as well as the division.
- The oldest of these servers is Khaos, a dual 4-core Intel Xeon E5520 machine running CentOS Linux, with 96 GB of RAM and 88 TB of RAID 6 disk space.
- A somewhat newer server is Aither, an Ubuntu Linux server with 256 GB of RAM and 50 TB of RAID 6 disk space. This server has two Intel Xeon E5-2690 v3 12-core CPUs.
- Another, newer, server is Gaia, a CentOS Linux server with 512 GB of RAM and xx TB of RAID 60 disk space. This server has two Intel Xeon xxx. 16-core CPUs.
- The newest server is Hemera, an Ubuntu Linux server with 1,536 GB of RAM and 280 TB of RAID 6 disk space. This server has two 18-core Intel Xeon Gold 6354 CPUs (3.00 GHz).
- Additionally, SATBBI has access to high-performance clusters both on the SU campus and through a project on the DSI Centre for High-Performance Computing (CHPC), which enables SATBBI to support projects with very high computational needs.
SATBBI is also the driving force behind ResComDat, an initiative to implement a large-capacity storage system based on Ceph technology. The Research Commons Data (ResComDat) project will provide infrastructure that can expand to exabyte capacity and is intended to be a repository for large, well-annotated (FAIR-compliant) datasets. Large datasets from high-throughput techniques, such as next-generation sequencing, mass spectrometry and cytometry by time of flight (CyTOF), cannot be accommodated in other resources. ResComDat will fill this storage need.
Useful Resources to Prepare for Bioinformatics
1. Become familiar with Unix/Linux
a) Gain access to a Unix/Linux computer
- Existing server
If there is a server available, ask the systems administrator to create a user account for you. You will likely need to use secure shell (ssh) to access the server.
- Virtual instance of Linux (Ubuntu) on VirtualBox
VirtualBox is virtualization software from Oracle. Please follow the instructions here
- macOS – terminal
- The underlying operating system of macOS is a variant of Unix derived from NeXTSTEP (built as a variant of UNIX without the proprietary UNIX constraints, not unlike Linux)
- One can try to follow the Bash exercises in the Mac terminal. Bash was the default shell up to macOS 10.14; from macOS 10.15 (Catalina) onwards the default shell is zsh. If you experience problems, type “echo $0” and record the name of the shell. If it is not bash, it might be best to install VirtualBox and Ubuntu.
b) Practice using commands in the Bash shell
1. There are some resources below. Please try to work through the examples and exercises in these resources. Where an example shows output, type the commands yourself and compare the results. Some important commands are:
- cd – change directory
- ls – list directory contents
- pwd – print working directory (where am I?)
- echo – usually “echo $varname”, prints the content of the variable varname.
- date – prints the current date (and time) to the screen
- cat – concatenate. cat filename will print the contents of the file. cat can also be used to append files together.
- tree – not always installed by default. Displays a directory in a text-based tree structure
- mv – move a file, also acts to rename a file
- rm – remove a file
- rmdir – remove a directory (if empty)
- exit – leave the current shell (closes shell and terminal window)
2. Elementary Bash
3. Bash Scripting:
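As a small practice session, the commands listed under b) can be tried together in a throwaway directory. This is only a sketch: the file names, the variable name greeting and the use of mktemp and mkdir (which are not in the list above) are choices made for illustration, and the exit command is left out because it would close the shell.

```shell
# Work in a throwaway directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"
pwd                            # show where we are

greeting="hello"
echo "$greeting"               # print the contents of the variable greeting
date                           # print the current date and time

echo "first line" > a.txt      # create two small files
echo "second line" > b.txt
cat a.txt                      # print one file
cat a.txt b.txt > both.txt     # join two files into a third
ls                             # list directory contents

mv both.txt combined.txt       # rename a file
contents=$(cat combined.txt)   # keep the file contents in a variable
rm a.txt b.txt combined.txt    # remove files
mkdir empty_dir
rmdir empty_dir                # works only because the directory is empty

cd /                           # step out, then remove the practice directory
rmdir "$workdir"
```

Typing the lines one at a time, rather than pasting the whole block, makes it easier to see what each command prints.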
2. Learn R
a) Download R and install
b) Download RStudio and install
c) Install the tidyverse set of packages in R.
Note that on Windows, it is often easiest to install everything as a personal installation, i.e., in your userspace. If you install R, RStudio and the packages for everyone on the computer, you need to run the RGUI as administrator.
d) Some beginning resources (free):
- https://bookdown.org/rwnahhas/IntroToR/ (NOTE: You can look around bookdown.org for more books on R)
e) The R for Data Science book is free online here
f) Handy “cheat sheets” (2-page summaries) for RStudio and tidyverse packages are available here
3. Learn Python
a) Download Python and install
b) Download an integrated development environment (IDE):
- PyDev (plugin for Eclipse)
- Visual Studio
c) Learning Resources
- Python Tutorial
- Other websites