Roughly 50% of the human genome contains noncoding sequences that serve as regulatory elements responsible for the diverse gene expression of the cells in the body. Enhancers are one of the best-studied categories of regulatory elements. They increase transcriptional output in cells by remodeling chromatin or by recruiting complexes of binding proteins. Identifying enhancers with computational techniques is an active area of research, and several approaches have been proposed to date. However, current state-of-the-art methods face limitations: although the function of enhancers has been clarified, their mechanism of action is not well understood.
This PhD thesis presents a bioinformatics/computer science study of the problem of identifying enhancers in different human cells using computational techniques. The dissertation is organized into four main tasks, each presented in its own chapter. First, since many enhancer functions are not well understood, we study the basic biological models by which enhancers trigger transcription, and we comprehensively survey more than 30 bioinformatics approaches for identifying enhancers.
Second, we examine the availability of enhancer data produced by different enhancer identification methods and experimental procedures. In particular, we analyze the advantages and disadvantages of existing solutions and report obstacles that require further consideration. To mitigate these problems, we developed the Database of Integrated Human Enhancers (DENdb), a centralized online repository that archives enhancer data from 16 ENCODE cell lines. The integrated enhancer data are combined with many other experimental datasets that can be used to interpret enhancer content and to generate a novel enhancer annotation that complements the existing integrative annotation proposed by the ENCODE consortium.
Third, we propose the first deep-learning computational framework for identifying enhancers. The proposed system, called Dragon Ensemble Enhancer Predictor (DEEP), is based on a novel deep-learning two-layer ensemble algorithm capable of identifying enhancers characterized by different cellular conditions. Experimental results on data from ENCODE and FANTOM5 demonstrate that DEEP outperforms the major enhancer prediction systems in recognition performance and generalizes very well to unseen cell lines and tissues.
Finally, we go a step further by developing a novel feature selection method that supports a computational framework for analyzing the genomic content of enhancers and reporting cell-line-specific predictive signatures.
Date of Award: Mar 24 2016
Original language: English (US)
- Computer, Electrical and Mathematical Science and Engineering
Supervisor: Panos Kalnis
- machine learning
- computer science
- transcription regulation