ReFeaFi: Genome-wide prediction of regulatory elements driving transcription initiation

Ramzan Umarov, Yu Li, Takahiro Arakawa, Satoshi Takizawa, Xin Gao, Erik Arner

Research output: Contribution to journal › Article › peer-review

Abstract

Regulatory elements control gene expression through transcription initiation (promoters) and by enhancing transcription at distant regions (enhancers). Accurate identification of regulatory elements is fundamental for annotating genomes and understanding gene expression patterns. Although there have been many attempts to develop computational promoter and enhancer identification methods, reliable tools to analyze long genomic sequences are still lacking. Prediction methods often perform poorly on the genome-wide scale because the number of negatives is much higher than that in the training sets. To address this issue, we propose a dynamic negative set updating scheme with a two-model approach, using one model for scanning the genome and the other one for testing candidate positions. The developed method achieves good genome-level performance and maintains robust performance when applied to other vertebrate species, without re-training. Moreover, the unannotated regulatory regions predicted on the human genome are enriched for disease-associated variants, suggesting that they are potentially true regulatory elements rather than false positives. We validated high-scoring “false positive” predictions using a reporter assay, and all tested candidates were successfully validated, demonstrating the ability of our method to discover novel human regulatory regions.
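
The abstract does not spell out how the dynamic negative set update or the two models are implemented, so the following is only a toy sketch of the general idea under stated assumptions. The k-mer count scorer, the helper names (`train`, `scan`, `grow_negatives`, `predict`), the thresholds, and the toy sequences are all hypothetical stand-ins, not the authors' models or data. The sketch shows one plausible reading: windows that score highly during a genome scan but do not overlap known elements are added to the negative set before re-training, and a second model then tests the candidates proposed by the first.

```python
from collections import Counter

K = 3  # k-mer size of the toy scorer (assumption, not from the paper)

def kmers(seq):
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + K] for i in range(len(seq) - K + 1)]

def train(positives, negatives):
    """Toy stand-in for a model: k-mer weight = positive count - negative count."""
    weights = Counter()
    for s in positives:
        weights.update(kmers(s))
    for s in negatives:
        weights.subtract(kmers(s))
    return weights

def score(model, seq):
    """Mean k-mer weight of a window (unseen k-mers count as 0)."""
    ks = kmers(seq)
    return sum(model[k] for k in ks) / len(ks)

def scan(model, genome, width, threshold=0.0):
    """Start positions of all windows scoring above the threshold."""
    return [i for i in range(len(genome) - width + 1)
            if score(model, genome[i:i + width]) > threshold]

def grow_negatives(genome, known_starts, positives, negatives, width, rounds=3):
    """Re-train, scan, and add non-overlapping false positives to the negatives."""
    negatives = list(negatives)
    for _ in range(rounds):
        scanner = train(positives, negatives)
        hits = scan(scanner, genome, width)
        # A hit is treated as a false positive if it overlaps no known element.
        false_pos = [genome[i:i + width] for i in hits
                     if all(abs(i - s) >= width for s in known_starts)]
        new = [s for s in false_pos if s not in negatives]
        if not new:
            break
        negatives.extend(new)
    return negatives

def predict(scanner, tester, genome, width):
    """Two-model prediction: the scanner proposes, the tester confirms."""
    candidates = scan(scanner, genome, width)
    return [i for i in candidates if score(tester, genome[i:i + width]) > 0.0]

# Toy data, made up purely for illustration.
genome = "GCGCGCTATAAAGCGCATATATGCGCGCTATAAAGC"
width = 6
positives = ["TATAAA"]
known_starts = [i for i in range(len(genome) - width + 1)
                if genome[i:i + width] == "TATAAA"]
initial_negatives = ["GCGCGC"]

updated = grow_negatives(genome, known_starts, positives, initial_negatives, width)
scanner = train(positives, initial_negatives)  # assumption: scanner sees the initial set
tester = train(positives, updated)             # assumption: tester sees the updated set
candidates = scan(scanner, genome, width)
predictions = predict(scanner, tester, genome, width)
```

In this sketch the tester can only remove candidates, never add them, so `predictions` is always a subset of `candidates`. How the two real models differ in architecture or training data is not stated in the abstract.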
Original language: English (US)
Pages (from-to): e1009376
Journal: PLOS Computational Biology
Volume: 17
Issue number: 9
DOIs
State: Published - Sep 7 2021

ASJC Scopus subject areas

  • Ecology
  • Cellular and Molecular Neuroscience
  • Ecology, Evolution, Behavior and Systematics
  • Genetics
  • Modeling and Simulation
  • Computational Theory and Mathematics
  • Molecular Biology
