Inferring Protein Function from Homology Using the Princeton Protein Orthology Database (P‐POD)

Michael S. Livstone1, Rose Oughtred1, Sven Heinicke1, Benjamin Vernot2, Curtis Huttenhower3, Dannie Durand4, Kara Dolinski1

1 Lewis‐Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey; 2 Currently at the Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington; 3 Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts; 4 Department of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 6.11
DOI:  10.1002/0471250953.bi0611s33
Online Posting Date:  March, 2011
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


Inferring a protein's function by homology is a powerful tool for biologists. The Princeton Protein Orthology Database (P‐POD) offers a simple way to visualize and analyze the relationships between homologous proteins in order to infer function. P‐POD contains computationally generated analyses that distinguish orthologs from paralogs, combined with curated published information on functional complementation and on human diseases. P‐POD also features an applet, Notung, that lets users explore and modify phylogenetic trees and generate their own ortholog/paralog calls. This unit describes how to search P‐POD for precomputed data, how to find and use the associated curated information from the literature, and how to use Notung to analyze and refine the results. Curr. Protoc. Bioinform. 33:6.11.1‐6.11.12. © 2011 by John Wiley & Sons, Inc.

Keywords: functional complementation; disease; conservation; phylogenetic analysis; trees; paralogs; Notung

Table of Contents

  • Introduction
  • Basic Protocol 1: Searching for Homologs
  • Basic Protocol 2: Investigating the Conserved Function and Significance of a Protein
  • Basic Protocol 3: Using the Notung Applet to Examine Homology Relationships in Greater Detail
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables

Literature Cited

   Alexeyenko, A., Tamas, I., Liu, G., and Sonnhammer, E.L.L. 2006. Automatic clustering of orthologs and inparalogs shared by multiple proteomes. Bioinformatics 22:e9‐e15.
   Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
   Durand, D., Halldórsson, B.V., and Vernot, B. 2006. A hybrid micro‐macroevolutionary approach to gene tree reconstruction. J. Comput. Biol. 13:320‐335.
   Guindon, S. and Gascuel, O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696‐704.
   Hamosh, A., Scott, A.F., Amberger, J., Bocchini, C., Valle, D., and McKusick, V.A. 2002. Online Mendelian Inheritance in Man (OMIM): A knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 30:52‐55.
   Heinicke, S., Livstone, M.S., Lu, C., Oughtred, R., Kang, F., Angiuoli, S.V., White, O., Botstein, D., and Dolinski, K. 2007. The Princeton Protein Orthology Database (P‐POD): A comparative genomics analysis tool for biologists. PLoS One 2:e766.
   Katoh, K. and Toh, H. 2008. Recent developments in the MAFFT multiple sequence alignment program. Brief. Bioinform. 9:286‐298.
   Li, L., Stoeckert, C.J. Jr., and Roos, D.S. 2003. OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res. 13:2178‐2189.
   Mi, H., Dong, Q., Muruganujan, A., Gaudet, P., Lewis, S., and Thomas, P.D. 2010. PANTHER version 7: Improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium. Nucleic Acids Res. 38:D204‐D210.
   Reference Genome Group of the Gene Ontology Consortium. 2009. The Gene Ontology's Reference Genome Project: A unified framework for functional annotation across species. PLoS Comput. Biol. 5:e1000431.
Key References
   Heinicke et al., 2007. See above.
  The original 2007 P‐POD paper, with discussion of the reasons for building P‐POD and testing of the literature curation. The pipeline and user interface have changed since 2007; refer to the P‐POD help page (below) for a current technical description of P‐POD.
   Durand et al., 2006. See above.
  Technical description of Notung.
Internet Resources
  The main P‐POD page and search interface.
  The P‐POD help page contains an overview of the P‐POD pipeline, a brief tutorial, and links to additional information.
  Valid identifiers for P‐POD and sample searches.
  P‐POD technical information, including version numbers and settings for all software in the P‐POD pipeline.
  A more extensive and illustrated explanation of how Notung infers orthologs and paralogs in P‐POD.
  The P‐POD ftp site containing all families, support files, and the 48‐species PANTHER 7.0 dataset. The current release is in the “version4” folder. More detail is available in the README files.
  Archival technical information for the original 2007 P‐POD release only.
∼durand/Notung/
  The Notung application and documentation.
  Online Mendelian Inheritance in Man (OMIM).
  The PANTHER 7.0 database.
  The Saccharomyces Genome Database.
  Homepage of the Gene Ontology Consortium's Reference Genome project.
‐bin/amigo/go.cgi
  The Gene Ontology Consortium's AmiGO database.
  Description of the Newick tree format.