EcmPred: Prediction of extracellular matrix proteins based on random forest with maximum relevance minimum redundancy feature selection

Krishna Kumar Umar Kandaswamy, Pugalenthi Ganesan, Kai Uwe Kalies, Enno Hartmann, Thomas M. Martinetz

Research output: Contribution to journal › Article › peer-review

18 Scopus citations

Abstract

The extracellular matrix (ECM) is a major component of the tissues of multicellular organisms. It consists of secreted macromolecules, mainly polysaccharides and glycoproteins. Malfunctions of ECM proteins lead to severe disorders such as Marfan syndrome, osteogenesis imperfecta, numerous chondrodysplasias, and skin diseases. In this work, we report a random forest approach, EcmPred, for the prediction of ECM proteins from protein sequences. EcmPred was trained on a dataset containing 300 ECM and 300 non-ECM proteins and tested on a dataset containing 145 ECM and 4187 non-ECM proteins. EcmPred achieved 83% accuracy on the training dataset and 77% on the test dataset. EcmPred correctly predicted 15 out of 20 experimentally verified ECM proteins. By scanning the entire human proteome, we predicted novel ECM proteins, which we validated with Gene Ontology and InterPro. The dataset and a standalone version of the EcmPred software are available at http://www.inb.uni-luebeck.de/tools-demos/Extracellular_matrix_proteins/EcmPred. © 2012 Elsevier Ltd.
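The title names maximum relevance minimum redundancy (mRMR) feature selection as the step that precedes the random forest. As a rough illustration only, the sketch below implements the generic greedy mRMR scheme in its "mutual information difference" form: each round it picks the feature whose mutual information with the class label, minus its mean mutual information with the already selected features, is largest. Assumptions: features are already discretized, the MID criterion is used, and the names `mutual_information` and `mrmr_select` are hypothetical. This record does not state which sequence features, discretization, or mRMR variant EcmPred actually used, and the random forest training step is not shown.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information I(X;Y) in bits between two equal-length discrete sequences."""
    n = len(x)
    px = Counter(x)
    py = Counter(y)
    pxy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr_select(features, labels, k):
    """Greedy mRMR (MID variant): return up to k feature indices.

    `features` is a list of columns; each column is a list of discrete values,
    one per sample, aligned with `labels`.
    """
    # Relevance of each feature to the class label, computed once.
    relevance = [mutual_information(f, labels) for f in features]
    selected = []
    remaining = set(range(len(features)))

    def score(j):
        # First pick: pure relevance. Later picks: relevance minus mean redundancy.
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(features[j], features[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        # Iterate in index order so ties resolve to the lowest index.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data (made up for illustration): f1 is an exact copy of f0,
    # f2 is equally informative about y but independent of f0.
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    f0 = [0, 0, 0, 1, 1, 1, 1, 0]
    f1 = list(f0)
    f2 = [0, 0, 1, 0, 1, 1, 0, 1]
    print(mrmr_select([f0, f1, f2], y, 2))
```

On this toy input the selection contains f0 and f2 and skips f1: its relevance is as high as f0's, but it is fully redundant with it. That is the behaviour that distinguishes mRMR from ranking features by relevance alone.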
Original language: English (US)
Pages (from-to): 377-383
Number of pages: 7
Journal: Journal of Theoretical Biology
Volume: 317
State: Published - Jan 2013

ASJC Scopus subject areas

  • Agricultural and Biological Sciences (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Modeling and Simulation
  • Applied Mathematics
  • Statistics and Probability
  • Immunology and Microbiology (all)
  • Medicine (all)

