PLoS Comput Biol. 2005 Oct;1(5):e45. Epub 2005 Oct 7.
Protein molecular function prediction by Bayesian phylogenomics.
Engelhardt BE, Jordan MI, Muratore KE, Brenner SE.
Department of Electrical Engineering and Computer Sciences, University of
California, Berkeley, California, United States of America.
We present a statistical graphical model to infer specific molecular function for
unannotated protein sequences using homology. Based on phylogenomic principles,
SIFTER (Statistical Inference of Function Through Evolutionary Relationships)
accurately predicts molecular function for members of a protein family given a
reconciled phylogeny and available function annotations, even when the data are
sparse or noisy. Our method produced specific and consistent molecular function
predictions across 100 Pfam families in comparison to the Gene Ontology
annotation database, BLAST, GOtcha, and Orthostrapper. We performed a more
detailed exploration of functional predictions on the
adenosine-5'-monophosphate/adenosine deaminase family and the lactate/malate
dehydrogenase family, in the former case comparing the predictions against a gold
standard set of published functional characterizations. Given function
annotations for 3% of the proteins in the deaminase family, SIFTER achieves 96%
accuracy in predicting molecular function for experimentally characterized
proteins as reported in the literature. The accuracy of SIFTER on this dataset is
a significant improvement over other currently available methods such as BLAST
(75%), GeneQuiz (64%), GOtcha (89%), and Orthostrapper (11%). We also
experimentally characterized the adenosine deaminase from Plasmodium falciparum,
confirming SIFTER's prediction. The results illustrate the predictive power of
exploiting a statistical model of function evolution in phylogenomic problems. A
software implementation of SIFTER is available from the authors.
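The abstract does not spell out SIFTER's model, so what follows is only a minimal sketch of the general idea. It places a probabilistic model of function evolution on a rooted phylogeny and propagates sparse, noisy leaf annotations to every node using exact sum-product message passing on a tree. Several pieces are illustrative assumptions and not SIFTER's published parameterization: the symmetric exponential-decay transition model (`rate`), the uniform annotation-noise model (`eps`), the uniform root prior, and every name used here (`transition_matrix`, `evidence`, `posteriors`, the toy tree and function labels). All of them are hypothetical.

```python
import math


def transition_matrix(k, rate, t):
    """P[i][j] = Pr(child has function j | parent has function i) over a branch
    of length t. ASSUMPTION: a symmetric exponential-decay model, used only for
    illustration (not SIFTER's published transition model)."""
    decay = math.exp(-rate * t)
    stay = 1.0 / k + (1.0 - 1.0 / k) * decay
    move = (1.0 - decay) / k
    return [[stay if i == j else move for j in range(k)] for i in range(k)]


def evidence(k, label, eps):
    """Likelihood of the observed annotation for each possible true function.
    ASSUMPTION: uniform noise eps; label=None marks an unannotated protein."""
    if label is None:
        return [1.0] * k
    return [1.0 - eps if j == label else eps / (k - 1) for j in range(k)]


def posteriors(children, branch, annotations, functions, rate=1.0, eps=0.05, root="root"):
    """Posterior over functions at every node of a rooted tree (sum-product)."""
    k = len(functions)
    idx = {f: i for i, f in enumerate(functions)}
    evid, up, msg, trans, outside, post = {}, {}, {}, {}, {}, {}

    def kids(n):
        return children.get(n, [])

    def upward(n):
        # up[n][i] = Pr(annotations in the subtree below n | n has function i)
        label = annotations.get(n)
        evid[n] = evidence(k, None if label is None else idx[label], eps)
        vec = list(evid[n])
        for c in kids(n):
            upward(c)
            trans[c] = transition_matrix(k, rate, branch[c])
            # Message from child c to its parent, as a function of the parent's state.
            msg[c] = [sum(trans[c][i][j] * up[c][j] for j in range(k)) for i in range(k)]
            vec = [vec[i] * msg[c][i] for i in range(k)]
        up[n] = vec

    def downward(n):
        # outside[n][i] = Pr(annotations outside n's subtree, n has function i)
        joint = [outside[n][i] * up[n][i] for i in range(k)]
        z = sum(joint)
        post[n] = {functions[i]: joint[i] / z for i in range(k)}
        for c in kids(n):
            w = [outside[n][i] * evid[n][i] for i in range(k)]
            for s in kids(n):
                if s != c:  # include the siblings' evidence, excluding c itself
                    w = [w[i] * msg[s][i] for i in range(k)]
            outside[c] = [sum(w[i] * trans[c][i][j] for i in range(k)) for j in range(k)]
            downward(c)

    upward(root)
    outside[root] = [1.0 / k] * k  # ASSUMPTION: uniform prior at the root
    downward(root)
    return post


# Toy example: protein C is unannotated, and its sister B is annotated "f2".
children = {"root": ["A", "n1"], "n1": ["B", "C"]}
branch = {"A": 0.5, "n1": 0.5, "B": 0.5, "C": 0.5}
post = posteriors(children, branch, {"A": "f1", "B": "f2"}, ["f1", "f2", "f3"])
print(max(post["C"], key=post["C"].get))  # f2: C's closest annotated relative is B
```

The prediction for C follows its nearest annotated relative rather than a majority vote over all annotated leaves. This is the phylogenomic intuition the abstract appeals to. A leaf's own noisy annotation enters as a likelihood rather than a hard constraint, which is one way such a model can tolerate sparse or noisy data.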