DNA-Prot: Identification of DNA binding proteins from protein sequence information using random forest

K. Krishna Kumar, Pugalenthi Ganesan, P. N. Suganthan*

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    83 Scopus citations

    Abstract

    DNA-binding proteins (DNABPs) are important for various cellular processes, such as transcriptional regulation, recombination, replication, repair, and DNA modification. So far, various bioinformatics and machine learning techniques have been applied to identify DNA-binding proteins from protein structure. Only a few methods are available for identifying DNA-binding proteins from protein sequence. In this work, we report a random forest method, DNA-Prot, to identify DNA-binding proteins from protein sequence. Training was performed on a dataset containing 146 DNA-binding proteins and 250 non-DNA-binding proteins. The algorithm was tested on a dataset containing 92 DNA-binding proteins and 100 non-DNA-binding proteins. We obtained 80.31% accuracy from training and 84.37% accuracy from testing. Benchmarking analysis on an independent dataset of 823 DNA-binding proteins and 823 non-DNA-binding proteins shows that our approach can distinguish DNA-binding proteins from non-DNA-binding proteins with more than 80% accuracy. We also compared our method with the DNAbinder method on the test dataset and two independent datasets. Comparable performance was observed from both methods on the test dataset. On the benchmark dataset containing 823 DNA-binding proteins and 823 non-DNA-binding proteins, DNA-Prot performed significantly better, with 81.83% accuracy, whereas DNAbinder achieved only 61.42% accuracy using amino acid composition and 63.5% using PSSM profile. Similarly, DNA-Prot achieved better performance on the benchmark dataset containing 88 DNA-binding proteins and 233 non-DNA-binding proteins. This result shows that DNA-Prot can be used efficiently to identify DNA-binding proteins from sequence information. The dataset and standalone version of the DNA-Prot software can be obtained from http://www3.ntu.edu.sg/home/EPNSugan/index_files/dnaprot.htm.
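
As a rough illustration of sequence-based classification and of how the accuracy figures above are defined, the sketch below turns a protein sequence into a 20-value amino acid composition vector and scores a set of predictions. This is not the DNA-Prot feature set, which the abstract does not specify; amino acid composition is named here only as one of the DNAbinder inputs. The function names `amino_acid_composition` and `accuracy` are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in `sequence` (20 values).

    Assumption: non-standard symbols (X, B, Z, gaps) are simply ignored.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def accuracy(true_labels, predicted_labels):
    """Percentage of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)
```

In a setup of this kind, one such vector per protein, labeled 1 for DNA-binding and 0 otherwise, would be the input to a random forest implementation, and `accuracy` would be evaluated separately on the training, test, and benchmark sets.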

    Original language: English (US)
    Pages (from-to): 679-686
    Number of pages: 8
    Journal: Journal of Biomolecular Structure and Dynamics
    Volume: 26
    Issue number: 6
    State: Published - Jan 1 2009

    ASJC Scopus subject areas

    • Structural Biology
    • Molecular Biology
