Motivation: For the last few years, Bayesian networks (BNs) have received increasing attention from the computational biology community as models of gene networks, though learning them from gene-expression data is problematic. Most gene-expression databases contain measurements for thousands of genes, but the existing algorithms for learning BNs from data do not scale to such high-dimensional databases. This means that the user has to decide in advance which genes are included in the learning process, typically no more than a few hundreds, and which genes are excluded from it. This is not a trivial decision. We propose an alternative approach to overcome this problem. Results: We propose a new algorithm for learning BN models of gene networks from gene-expression data. Our algorithm receives a seed gene S and a positive integer R from the user, and returns a BN for the genes that depend on S such that less than R other genes mediate the dependency. Our algorithm grows the BN, which initially only contains S, by repeating the following step R + 1 times and, then, pruning some genes; find the parents and children of all the genes in the BN and add them to it. Intuitively, our algorithm provides the user with a window of radius R around S to look at the BN model of a gene network without having to exclude any gene in advance. We prove that our algorithm is correct under the faithfulness assumption. We evaluate our algorithm on simulated and biological data (Rosetta compendium) with satisfactory results.
ASJC Scopus subject areas
- Statistics and Probability
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics