Towards Structured Prediction in Bioinformatics with Deep Learning

  • Yu Li

Student thesis: Doctoral Thesis

Abstract

Using machine learning, especially deep learning, to facilitate biological research is a fascinating research direction. However, in addition to standard classification and regression problems, whose outputs are simple vectors or scalars, bioinformatics often requires predicting more complex structured targets, such as 2D images and 3D molecular structures. Such tasks are referred to as structured prediction. Structured prediction is more complicated than traditional classification but has much broader applications, especially in bioinformatics, where most of the original problems have complex output objects. Because these problems have properties such as problem-specific constraints and dependencies within the labeling space, the straightforward application of existing deep learning models to them can lead to unsatisfactory results. In this dissertation, we argue that the following two ideas can help resolve a wide range of structured prediction problems in bioinformatics. First, we can combine deep learning with other classic algorithms, such as probabilistic graphical models, which model the problem structure explicitly. Second, we can design and train problem-specific deep learning architectures or methods by considering the structured labeling space and problem constraints, either explicitly or implicitly. We demonstrate our ideas with six projects from four bioinformatics subfields: sequencing analysis, structure prediction, function annotation, and network analysis. The structured outputs cover 1D electrical signals, 2D images, 3D structures, hierarchical labeling, and heterogeneous networks. With the help of the above ideas, all of our methods achieve state-of-the-art performance on the corresponding problems.
The success of these projects motivates us to extend our work to other, more challenging but important problems, such as health-care problems, whose solutions can directly benefit people's health and wellness. We thus conclude this thesis by discussing such future work and its potential challenges and opportunities.
Date of Award: Nov 1, 2020
Original language: English (US)
Awarding Institution
  • Computer, Electrical and Mathematical Science and Engineering
Supervisor: Xin Gao

Keywords

  • Bioinformatics
  • Structured prediction
  • Deep learning
