Protein Function Prediction: Problems and Pitfalls

William R. Pearson

University of Virginia School of Medicine, Charlottesville
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 4.12
DOI: 10.1002/0471250953.bi0412s51
Online Posting Date: September 2015
New sequencing technologies have revolutionized the characterization of new genomes based on their protein sets, but biologists seeking to exploit this sequence information are often frustrated by the difficulty of accurately assigning biological functions to newly identified proteins. Here, we highlight some of the challenges in inferring function from sequence similarity. Investigators can improve the accuracy of function prediction by (1) being conservative about the evolutionary distance to a protein of known function; (2) considering the ambiguous meaning of “functional similarity”; and (3) being aware of the limitations of annotations in functional databases. Protein function prediction does not offer “one‐size‐fits‐all” solutions; prediction strategies work better when the idiosyncrasies of function and functional annotation are better understood. © 2015 by John Wiley & Sons, Inc.

Keywords: homology; orthology; paralogy; function prediction; gene ontology; EC numbers

Table of Contents

  • Introduction
  • Annotating Function
  • Homologs, Orthologs, and Paralogs
  • Function Prediction and Evolutionary Distance
  • Similarity Search, Database Size, and Database Redundancy
  • Summary
  • Literature Cited
  • Tables

