The InterPro Database and Tools for Protein Domain Analysis

Nicola J. Mulder1, Rolf Apweiler1

1 The EMBL Outstation, Wellcome Trust Genome Campus, Hinxton, Cambridge, U.K.
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 2.7
DOI: 10.1002/0471250953.bi0207s21
Online Posting Date: March 2008
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

InterPro provides a one‐stop shop for protein‐sequence classification, sparing the user from visiting multiple databases separately and reconciling results returned in different formats. This unit describes how to submit a sequence to InterProScan via a Web server and provides instructions for installing and running InterProScan locally. It also explains how to browse InterPro families and domains of interest using the InterPro Web server and the sequence retrieval system (SRS), showing users how to get the most from the resource. Curr. Protoc. Bioinform. 21:2.7.1‐2.7.18. © 2008 by John Wiley & Sons, Inc.

Keywords: InterPro; SRS; protein domain

     
 

Table of Contents

  • Introduction
  • Basic Protocol 1: Protein Sequence Classification Using InterProScan via the Internet
  • Alternate Protocol 1: Local InterProScan Installation for Bulk Sequence Searches
  • Basic Protocol 2: Browsing the InterPro Database from the Web Server
  • Alternate Protocol 2: Browsing InterPro with an SRS‐Based Text Search
  • Alternate Protocol 3: Searching InterPro in SRS
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables
     
 

Key References
   Biswas, M., O'Rourke, J.F., Camon, E., Fraser, G., Kanapin, A., Karavidopoulou, Y., Kersey, P., Kriventseva, E., Mittard, V., Mulder, N., Phan, I., Servant, F., and Apweiler, R. 2002. Applications of InterPro in protein annotation and genome analysis. Brief. Bioinform. 3: 225‐235.
  These papers are from a special issue of Briefings in Bioinformatics on InterPro and its member databases.
   Mulder, N.J., Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Binns, D., Biswas, M., Bradley, P., Bork, P., Bucher, P., Copley, R., Courcelle, E., Durbin, R., Falquet, L., Fleischmann, W., Gouzy, J., Griffith‐Jones, S., Haft, D., Hermjakob, H., Hulo, N., Kahn, D., Kanapin, A., Krestyaninova, M., Lopez, R., Letunic, I., Pagni, M., Peyruc, D., Ponting, C.P., Servant, F., and Sigrist, C.J.A. 2002. InterPro–An integrated documentation resource for protein families, domains and functional sites. Brief. Bioinform. 3: 285‐295.
Internet Resources
   http://www.cathdb.info/latest/index.html
  The CATH Web site. A database of protein Class, Architecture, Topology, and Homology.
   http://www.ebi.ac.uk/interpro
  The InterPro Web site. Integrated documentation resource for protein families, domains, and functional sites.
   http://www.ebi.uniprot.org/index.shtml
  The UniProt Web site. A universal protein‐sequence database.
   http://www.sanger.ac.uk/Software/Pfam/
  The Pfam Web site. A collection of multiple sequence alignments and hidden Markov models (UNIT ).
   http://prodom.prabi.fr/prodom/current/html/home.php
  The ProDom Web site. An automatic compilation of homologous domains.
   http://www.expasy.ch/prosite
  The PROSITE Web site. A database of patterns and profiles describing protein families and domains (UNIT ).
   http://www.bioinf.manchester.ac.uk/dbbrowser/sprint/
  The PRINTS Web site. A compendium of protein fingerprints.
   http://smart.embl-heidelberg.de
  A Simple Modular Architecture Research Tool (SMART). A collection of protein families and domains.
   http://www.tigr.org/TIGRFAMs/index.shtml
  The TIGRFAMs Web site. A database of protein families based on Hidden Markov Models.
   http://pir.georgetown.edu/pirsf
  The PIRSF Database Web site. A database of protein families based on full‐length Hidden Markov Models.
   http://supfam.org/SUPERFAMILY/index.html
  The SUPERFAMILY Web site. A database of Hidden Markov Model domains based on SCOP superfamilies.
   http://gene3d.biochem.ucl.ac.uk/Gene3D/
  The Gene3D Web site. A database of Hidden Markov Model domains based on CATH superfamilies.
   http://www.pantherdb.org/
  The PANTHER Web site. A database of families based on Hidden Markov Models.
   http://scop.mrc-lmb.cam.ac.uk/scop
  Homepage for Structural Classification of Proteins (SCOP).