Prediction of Protein‐Protein Interaction Networks

Shawn M. Gomez1, Kwangbom Choi2, Yang Wu1

1 Joint Department of Biomedical Engineering, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, 2 Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 8.2
DOI:  10.1002/0471250953.bi0802s22
Online Posting Date:  June, 2008
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


This unit offers a general overview of several techniques that have been developed for inferring functional and/or protein‐protein interaction networks. The majority of these use whole‐genome sequences as their primary source of input data. In addition, a few recently developed methods use both protein features and experimental protein‐protein interaction data directly in the prediction of new interactions. Although the list of approaches is not exhaustive, the reader should gain a sense of how these approaches are implemented, an idea of their relative strengths and weaknesses, and a broader perspective on the type of work being conducted in this highly active area of research. Curr. Protoc. Bioinform. 22:8.2.1–8.2.14. © 2008 by John Wiley & Sons, Inc.

Keywords: protein interactions; bioinformatics; interaction networks

Table of Contents

  • Introduction
  • Approaches
  • Observations and Conclusions
  • Literature Cited
  • Figures

Literature Cited

   Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI‐BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25:3389‐3402.
   Aparicio, S., Chapman, J., Stupka, E., Putnam, N., Chia, J.M., Dehal, P., Christoffels, A., Rash, S., Hoon, S., Smit, A., Gelpke, M.D., Roach, J., Oh, T., Ho, I.Y., Wong, M., Detter, C., Verhoef, F., Predki, P., Tay, A., Lucas, S., Richardson, P., Smith, S.F., Clark, M.S., Edwards, Y.J., Doggett, N., Zharkikh, A., Tavtigian, S.V., Pruss, D., Barnstead, M., Evans, C., Baden, H., Powell, J., Glusman, G., Rowen, L., Hood, L., Tan, Y.H., Elgar, G., Hawkins, T., Venkatesh, B., Rokhsar, D., and Brenner, S. 2002. Whole‐genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297:1301‐1310.
   Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., Durbin, R., Falquet, L., Fleischmann, W., Gouzy, J., Hermjakob, H., Hulo, N., Jonassen, I., Kahn, D., Kanapin, A., Karavidopoulou, Y., Lopez, R., Marx, B., Mulder, N.J., Oinn, T.M., Pagni, M., Servant, F., Sigrist, C.J., and Zdobnov, E.M. 2001. The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res. 29:37‐40.
   Bader, G.D., Donaldson, I., Wolting, C., Ouellette, B.F., Pawson, T., and Hogue, C.W. 2001. BIND—The Biomolecular Interaction Network Database. Nucleic Acids Res. 29:242‐245.
   Barabasi, A.L. and Albert, R. 1999. Emergence of scaling in random networks. Science 286:509‐512.
   Barker, D. and Pagel, M. 2005. Predicting functional gene links from phylogenetic‐statistical analyses of whole genomes. PLoS Comput. Biol. 1:e3.
   Bateman, A., Birney, E., Durbin, R., Eddy, S.R., Finn, R.D., and Sonnhammer, E.L. 1999. Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins. Nucleic Acids Res. 27:260‐262.
   Ben‐Hur, A. and Noble, W.S. 2005. Kernel methods for predicting protein‐protein interactions. Bioinformatics 21:i38‐i46.
   Berger, J.M., Gamblin, S.J., Harrison, S.C., and Wang, J.C. 1996. Structure and mechanism of DNA topoisomerase II. Nature 379:225‐232.
   Bock, J.R. and Gough, D.A. 2001. Predicting protein‐protein interactions from primary structure. Bioinformatics 17:455‐460.
   Botstein, D. 1999. Of genes and genomes. Ann. N.Y. Acad. Sci. 882:32‐41.
   Corpet, F., Gouzy, J., and Kahn, D. 1998. The ProDom database of protein domain families. Nucleic Acids Res. 26:323‐326.
   Craig, R.A. and Liao, L. 2007. Phylogenetic tree information aids supervised learning for predicting protein‐protein interaction based on distance matrices. BMC Bioinformatics 8:6.
   Dandekar, T., Snel, B., Huynen, M., and Bork, P. 1998. Conservation of gene order: A fingerprint of proteins that physically interact. Trends Biochem. Sci. 23:324‐328.
   Demerec, M.E. and Hartman, P. 1959. Complex loci in microorganisms. Annu. Rev. Microbiol. 13:377‐406.
   Deng, M., Mehta, S., Sun, F., and Chen, T. 2002. Inferring domain‐domain interactions from protein‐protein interactions. Genome Res. 12:1540‐1548.
   Eisen, M.B., Spellman, P.T., Brown, P.O., and Botstein, D. 1998. Cluster analysis and display of genome‐wide expression patterns. Proc. Natl. Acad. Sci. U.S.A. 95:14863‐14868.
   Eisenberg, D., Marcotte, E.M., Xenarios, I., and Yeates, T.O. 2000. Protein function in the post‐genomic era. Nature 405:823‐826.
   Enright, A.J., Iliopoulos, I., Kyrpides, N.C., and Ouzounis, C.A. 1999. Protein interaction maps for complete genomes based on gene fusion events. Nature 402:86‐90.
   Fryxell, K.J. 1996. The coevolution of gene family trees. Trends Genet. 12:364‐369.
   Goh, C.‐S., Bogan, A.A., Joachimiak, M., Walther, D., and Cohen, F.E. 2000. Co‐evolution of proteins with their interaction partners. J. Mol. Biol. 299:283‐293.
   Gomez, S.M. and Rzhetsky, A. 2002. Towards the prediction of complete protein‐protein interaction networks. Pac. Symp. Biocomput. 2002:413‐424.
   Gomez, S.M., Lo, S.H., and Rzhetsky, A. 2001. Probabilistic prediction of unknown metabolic and signal‐transduction networks. Genetics 159:1291‐1298.
   Gomez, S.M., Noble, W.S., and Rzhetsky, A. 2003. Learning to predict protein‐protein interactions from protein sequences. Bioinformatics 19:1875‐1881.
   Hallas, C., Pekarsky, Y., Itoyama, T., Varnum, J., Bichi, R., Rothstein, J.L., and Croce, C.M. 1999. Genomic analysis of human and mouse TCL1 loci reveals a complex of tightly clustered genes. Proc. Natl. Acad. Sci. U.S.A. 96:14418‐14423.
   Ito, T., Tashiro, K., Muta, S., Ozawa, R., Chiba, T., Nishizawa, M., Yamamoto, K., Kuhara, S., and Sakaki, Y. 2000. Toward a protein‐protein interaction map of the budding yeast: A comprehensive system to examine two‐hybrid interactions in all possible combinations between the yeast proteins. Proc. Natl. Acad. Sci. U.S.A. 97:1143‐1147.
   Jansen, R., Yu, H., Greenbaum, D., Kluger, Y., Krogan, N.J., Chung, S., Emili, A., Snyder, M., Greenblatt, J.F., and Gerstein, M. 2003. A Bayesian networks approach for predicting protein‐protein interactions from genomic data. Science 302:449‐453.
   Jeong, H., Tombor, B., Albert, R., Oltvai, Z.N., and Barabasi, A.L. 2000. The large‐scale organization of metabolic networks. Nature 407:651‐654.
   Jeong, H., Mason, S.P., Barabasi, A.L., and Oltvai, Z.N. 2001. Lethality and centrality in protein networks. Nature 411:41‐42.
   Jothi, R., Kann, M.G., and Przytycka, T.M. 2005. Predicting protein‐protein interaction by searching evolutionary tree automorphism space. Bioinformatics 21:i241‐i250.
   Jothi, R., Cherukuri, P.F., Tasneem, A., and Przytycka, T.M. 2006. Co‐evolutionary analysis of domains in interacting proteins reveals insights into domain‐domain interactions mediating protein‐protein interactions. J. Mol. Biol. 362:861‐875.
   Lawrence, J.G. 2002. Shared strategies in gene organization among prokaryotes and eukaryotes. Cell 110:407‐413.
   Lin, N., Wu, B., Jansen, R., Gerstein, M., and Zhao, H. 2004. Information assessment on predicting protein‐protein interactions. BMC Bioinformatics 5:154.
   Lu, L.G., Xia, Y., Paccanaro, A., Yu, H., and Gerstein, M. 2005. Assessing the limits of genomic data integration for predicting protein networks. Genome Res. 15:945‐953.
   Marcotte, E.M., Pellegrini, M., Ng, H.L., Rice, D.W., Yeates, T.O., and Eisenberg, D. 1999. Detecting protein function and protein‐protein interactions from genome sequences. Science 285:751‐753.
   Moyle, W.R., Campbell, R.K., Myers, R.V., Bernard, M.P., Han, Y., and Wang, X. 1994. Co‐evolution of ligand‐receptor pairs. Nature 368:251‐255.
   Overbeek, R., Fonstein, M., D'Souza, M., Pusch, G.D., and Maltsev, N. 1999. The use of gene clusters to infer functional coupling. Proc. Natl. Acad. Sci. U.S.A. 96:2896‐2901.
   Pazos, F. and Valencia, A. 2001. Similarity of phylogenetic trees as an indicator of protein‐protein interaction. Protein Eng. 14:609‐614.
   Pazos, F., Ranea, J.A., Juan, D., and Sternberg, M.J. 2005. Assessing protein co‐evolution in the context of the tree of life assists in the prediction of the interactome. J. Mol. Biol. 352:1002‐1015.
   Pellegrini, M., Marcotte, E.M., Thompson, M.J., Eisenberg, D., and Yeates, T.O. 1999. Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. Proc. Natl. Acad. Sci. U.S.A. 96:4285‐4288.
   Ramani, A.K. and Marcotte, E.M. 2003. Exploiting the co‐evolution of interacting proteins to discover interaction specificity. J. Mol. Biol. 327:273‐284.
   Rhodes, D.R., Tomlins, S.A., Varambally, S., Mahavisno, V., Barrette, T., Kalyana‐Sundaram, S., Ghosh, D., Pandey, A., and Chinnaiyan, A.M. 2005. Probabilistic model of the human protein‐protein interaction network. Nat. Biotechnol. 23:951‐959.
   Riley, R., Lee, C., Sabatti, C., and Eisenberg, D. 2005. Inferring protein domain interactions from databases of interacting proteins. Genome Biol. 6:R89.
   Sato, T., Yamanishi, Y., Kanehisa, M., and Toh, H. 2005. The inference of protein‐protein interactions by co‐evolutionary analysis is improved by excluding the information about the phylogenetic relationships. Bioinformatics 21:3482‐3489.
   Scott, M.S. and Barton, G.J. 2007. Probabilistic prediction and ranking of human protein‐protein interactions. BMC Bioinformatics 8:239.
   Sprinzak, E. and Margalit, H. 2001. Correlated sequence‐signatures as markers of protein‐protein interaction. J. Mol. Biol. 311:681‐692.
   Sprinzak, E., Sattath, S., and Margalit, H. 2003. How reliable are experimental protein‐protein interaction data? J. Mol. Biol. 327:919‐923.
   Uetz, P., Giot, L., Cagney, G., Mansfield, T.A., Judson, R.S., Knight, J.R., Lockshon, D., Narayan, V., Srinivasan, M., Pochart, P., Qureshi‐Emili, A., Li, Y., Godwin, B., Conover, D., Kalbfleisch, T., Vijayadamodar, G., Yang, M., Johnston, M., Fields, S., and Rothberg, J.M. 2000. A comprehensive analysis of protein‐protein interactions in Saccharomyces cerevisiae. Nature 403:623‐627.
   von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S.G., Fields, S., and Bork, P. 2002. Comparative assessment of large‐scale data sets of protein‐protein interactions. Nature 417:399‐403.
   Witten, I.H. and Frank, E. 2000. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco, Calif.
   Wojcik, J. and Schachter, V. 2001. Protein‐protein interaction map inference using interacting domain profile pairs. Bioinformatics 17:S296‐S305.
   Wu, Q. and Maniatis, T. 1999. A striking organization of a large family of human neural cadherin‐like cell adhesion genes. Cell 97:779‐790.
   Xenarios, I., Rice, D.W., Salwinski, L., Baron, M.K., Marcotte, E.M., and Eisenberg, D. 2000. DIP: The database of interacting proteins. Nucleic Acids Res. 28:289‐291.
   Yanai, I., Derti, A., and DeLisi, C. 2001. Genes linked by fusion events are generally of the same functional category: A systematic analysis of 30 microbial genomes. Proc. Natl. Acad. Sci. U.S.A. 98:7940‐7945.
   Yeang, C.‐H. and Haussler, D. 2007. Detecting coevolution in and among protein domains. PLoS Comput. Biol. 3:e211.
   Zhang, L.V., Wong, S.L., King, O.D., and Roth, F.P. 2004. Predicting co‐complexed protein pairs using genomic and proteomic data integration. BMC Bioinformatics 5:38.

Internet Resources

  The Database of Interacting Proteins (DIP). A database of both manually and automatically curated experimental protein‐protein interactions.
  STRING is a database of known and predicted protein‐protein interactions. The interactions include direct (physical) and indirect (functional) associations taken from high‐throughput experiments, genomic context, coexpression, and literature.
  The Biomolecular Interaction Network Database (BIND). Database of interactions, molecular complexes, and pathways. Includes interactions other than protein‐protein (e.g., protein‐DNA).
  The Molecular Interactions Database (MINT). A manually curated database designed to store functional interactions between biological molecules (i.e., proteins, RNA, and DNA).
  PathCalling Yeast Interaction Database. Database of results from Uetz et al. (2000).
  The WIT homepage. A Web site of reconstructed metabolic pathways for a number of genomes.
  The Munich Information Center for Protein Sequences (MIPS) homepage. Maintains a curated database designed to store functional interactions between biological molecules (e.g., proteins, RNA, DNA).
  KEGG: Kyoto Encyclopedia of Genes and Genomes. In addition to other material, this site provides a database of molecular interactions as well as metabolic and signal transduction pathways.
  The Encyclopedia of Escherichia coli Genes and Metabolism (EcoCyc) Web site.
  Web site for Hybrigenics’ Protein Interaction Map (PIM) functional proteomics software platform.