Pathogens lie behind the deadliest pandemics in history. To date, the AIDS pandemic has resulted in more than 25 million deaths,
while tuberculosis and malaria annually claim
more than 2 million lives.
Comparative genomic analyses are needed to
gain insights into the molecular mechanisms
of pathogens, but the abundance of biological
data dictates that such studies cannot be
performed without the assistance of computational approaches.
This explains the significant need for computational pipelines for genome assembly
The aim of this research is to develop such
This work utilizes various bioinformatics
approaches to analyze the high-throughput
genomic sequence data that has been obtained from several strains of bacterial pathogens.
A quality-control pipeline for sequencing and assembly has been compiled, and
several protocols have been developed to detect contamination.
Visualizations of genomic data have been generated
in various formats, alongside alignment, homology detection and sequence variant detection.
We have also implemented a metaheuristic
algorithm that significantly improves
bacterial genome assemblies compared to other
known methods. Experiments on Mycobacterium
tuberculosis H37Rv data showed that our method improved the N50 value by up to 9697% while consistently maintaining
high accuracy, covering around 98% of the
Other improvement efforts were also implemented, consisting of iterative local assemblies and iterative correction of contiguated bases.
Our results expedite the genomic analysis of virulence genes down to single-base-pair resolution.
It is also applicable to virtually every pathogenic microorganism, propelling further research in the control of and protection from pathogen-associated diseases.
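For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function names `n50` and `percent_improvement` and the contig lengths in the usage lines are hypothetical and chosen for illustration only; they are not taken from the thesis or its pipeline.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig where the cumulative length
        # reaches at least half of the total assembly length.
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def percent_improvement(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Hypothetical contig lengths, for illustration only.
draft = [2, 2, 2, 3, 3, 4, 8, 8]
improved = [12, 20]
print(n50(draft), n50(improved))
print(percent_improvement(n50(draft), n50(improved)))
```

Because N50 depends only on the distribution of contig lengths, a large gain can come from merging many short contigs into a few long ones, which is why it is usually reported together with an accuracy or coverage figure.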
|Date of Award||Jun 2011|
- Computer, Electrical and Mathematical Science and Engineering
|Supervisor||Arnab Pain (Supervisor)|