The Informatics Lab focuses on data analytics and visualisation, molecular visualisation, systems biology, and computational systems for data storage and retrieval. It is also the Research Unit’s primary connection to the UCT Medical School. Using data from the Medical School and techniques such as machine learning, bioinformatics and visual analytics, the Informatics Lab produces visualisations of gene expression for use in cancer diagnostics. Upregulated genes are then investigated as targets for drug design by the Computational GlycoEnzymology Group.
The resources associated with the Informatics Lab are:
- a nine-screen wall allowing for analysis of multiple data set models and UHD biomolecular structures
- an IBM DS3512 storage area network (SAN) with two expansion drawers
Machine Learning and Bioinformatics Group
The main objective of the Machine Learning and Bioinformatics group at SCRU is to develop methods for data analytics research in cancer and respiratory infection. We process and analyse data produced by next-generation sequencing (NGS), together with data from publicly accessible databases. Using a systems biology approach, we aim to understand the role of glycoenzymes in causing respiratory infection as well as in tumourigenesis.
We employ statistics and bioinformatics methods such as multivariate analysis and gene expression profiling. The core research focus of the bioinformatics members of the group is developing computational strategies that provide novel biological insights into human disease mechanisms and identify potential biomarkers for early diagnosis. The machine learning members of the group develop unsupervised learning methods for the analysis of noisy biological data. The Denoising Autoencoder Self Organising Map (DASOM) is our signature software, on which we build deep learning and growing hierarchical variations of the method.
The members of the Informatics group have developed an efficient Biomolecular Reaction and Interaction Dynamics Global Environment (BRIDGE) platform, based on Galaxy, for high-throughput screening (HTS). Virtual HTS identifies TSA inhibitor families (hits) based on the findings from the reaction dynamics studies undertaken in the Computational GlycoEnzymology Group. These families are then narrowed down to leads using cheminformatics methods and passed on to the Glycobiomedical Laboratory for synthesis, testing and systems data development.
Laboratory Infrastructure and Resources
The SCRU hardware platform includes state-of-the-art GPU clusters, data servers and InfiniBand clusters. The computational capability is modest and is designed for code development, model testing and short runs prior to using either the national HPC facility (CHPC) or the UCT and regional facility (Ilifu). Our computing environment supports computational software that is developed in-house, produced by academic developers, or commercially licensed. This software has been modularised so that users can manage their environment in either an interactive session or a batch job. Our software modules also enable users to change environments dynamically. User workstations in the computation and modelling lab run computations on the GPU and CPU cluster compute nodes.