Highly scalable Ab initio genomic motif identification

Benoit Marchand, Vladimir B. Bajic, Dinesh Kaushik

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

We present results of scaling an ab initio motif family identification system, Dragon Motif Finder (DMF), to 65,536 processor cores of IBM Blue Gene/P. DMF seeks groups of mutually similar polynucleotide patterns within a set of genomic sequences and builds various motif families from them. Such information is of relevance to many problems in life sciences. Prior attempts to scale such ab initio motif-finding algorithms achieved limited success. We solve the scalability issues using a combination of mixed-mode MPI-OpenMP parallel programming, master-slave work assignment, multi-level workload distribution, multi-level MPI collectives, and serial optimizations. While the scalability of our algorithm was excellent (94% parallel efficiency on 65,536 cores relative to 256 cores on a modest-size problem), the final speedup with respect to the original serial code exceeded 250,000 when serial optimizations were included. This enabled us to carry out many large-scale ab initio motif-finding simulations in a few hours, while the original serial code would have needed decades of execution time. Copyright 2011 ACM.
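
To make the quoted efficiency figure concrete, the following minimal sketch works through the arithmetic, assuming the usual fixed-problem-size (strong-scaling) definition of relative parallel efficiency. That definition is an assumption here, since the abstract does not state how the 94% was computed. The function names and the timings in the last lines are hypothetical and are not taken from the paper.

```python
def relative_efficiency(t_base, base_cores, t_scaled, cores):
    """Parallel efficiency of a run on `cores` relative to a run on `base_cores`.

    Assumes the same problem size in both runs: efficiency is the ratio of
    core-time consumed by the baseline run to core-time consumed by the
    scaled run.
    """
    return (t_base * base_cores) / (t_scaled * cores)


def relative_speedup(efficiency, base_cores, cores):
    """Speedup over the baseline run implied by a relative efficiency."""
    return efficiency * cores / base_cores


# Figures quoted in the abstract: 94% efficiency on 65,536 cores vs. 256 cores.
speedup = relative_speedup(0.94, 256, 65536)
print(round(speedup, 2))  # 240.64, out of an ideal 65536 / 256 = 256

# Hypothetical timings (NOT from the paper): a 100 s run on 256 cores that
# takes 100 / 240.64 s on 65,536 cores recovers the same 94% efficiency.
eff = relative_efficiency(100.0, 256, 100.0 / 240.64, 65536)
print(round(eff, 2))
```

Under this reading, the 65,536-core run is about 240 times faster than the 256-core run, against an ideal of 256 times. The separate speedup figure of more than 250,000 is measured against the original serial code and also includes the serial optimizations.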
Original language: English (US)
Title of host publication: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11
Publisher: Association for Computing Machinery (ACM)
ISBN (Print): 9781450307710
State: Published - 2011
