Protein Databases on the Internet

Dong Xu¹

¹ Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri
Publication Name: Current Protocols in Protein Science
Unit Number: Unit 2.6
DOI: 10.1002/0471140864.ps0206s70
Online Posting Date: November, 2012
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


Protein databases have become a crucial part of modern biology. Large amounts of data on protein structures, functions, and, in particular, sequences are being generated. Searching databases is often the first step in the study of a new protein. Comparison between proteins or between protein families provides information about the relationships between proteins within a genome or across species, and hence offers much more information than can be obtained by studying an isolated protein. Secondary databases derived from experimental databases are also widely available. These databases reorganize and annotate the data or provide predictions. The use of multiple databases often helps researchers understand the structure and function of a protein. Although some protein databases are widely known, they are far from being fully utilized in the protein science community. This unit provides a starting point for readers to explore the potential of protein databases on the Internet. Curr. Protoc. Protein Sci. 70:2.6.1‐2.6.17. © 2012 by John Wiley & Sons, Inc.

Keywords: bioinformatics; biological databases; protein analysis; protein modeling

Table of Contents

  • Introduction
  • Protein Sequence Databases
  • Protein Structural Databases
  • Protein Family Databases
  • Other Databases
  • Summary
  • Acknowledgments
  • Literature Cited
  • Figures
  • Tables

Literature Cited

   Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI‐BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25:3389‐3402.
   Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., Durbin, R., Falquet, L., Fleischmann, W., Gouzy, J., Hermjakob, H., Hulo, N., Jonassen, I., Kahn, D., Kanapin, A., Karavidopoulou, Y., Lopez, R., Marx, B., Mulder, N.J., Oinn, T.M., Pagni, M., Servant, F., Sigrist, C.J., and Zdobnov, E.M. 2001. The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res. 29:37‐40.
   Arnold, K., Kiefer, F., Kopp, J., Battey, J.N., Podvinec, M., Westbrook, J.D., Berman, H.M., Bordoli, L., and Schwede, T. 2009. The protein model portal. J. Struct. Funct. Genomics 10:1‐8.
   Attwood, T.K., Flower, D.R., Lewis, A.P., Mabey, J.E., Morgan, S.R., Scordis, P., Selley, J., and Wright, W. 1999. PRINTS prepares for the new millennium. Nucleic Acids Res. 27:220‐225.
   Bairoch, A. 1993. The ENZYME data bank. Nucleic Acids Res. 21:3155‐3156.
   Bairoch, A. and Apweiler, R. 1999. The SWISS‐PROT protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acids Res. 27:49‐54.
   Barker, W.C., Garavelli, J.S., McGarvey, P.B., Marzec, C.R., Orcutt, B.C., Srinivasarao, G.Y., Yeh, L.L., Ledley, R.S., Mewes, H., Pfeiffer, F., Tsugita, A., and Wu, C. 1999. The PIR‐international protein sequence database. Nucleic Acids Res. 27:39‐42.
   Benson, D.A., Boguski, M.S., Lipman, D.J., Ostell, J., Ouellette, B.F., Rapp, B.A., and Wheeler, D.L. 1999. GenBank. Nucleic Acids Res. 27:12‐17.
   Bernstein, F.C., Koetzle, T.F., Williams, G.J.B., Meyer, E.F., Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T., and Tasumi, M. 1977. The protein data bank: A computer‐based archival file for macromolecular structures. J. Mol. Biol. 112:535‐542.
   Bourne, P., Berman, H., Watenpaugh, K., Westbrook, J., and Fitzgerald, P. 1997. The macromolecular crystallographic information file (mmCIF). Methods Enzymol. 277:571‐590.
   Contreras‐Moreira, B. 2010. 3D‐footprint: A database for the structural analysis of protein‐DNA complexes. Nucleic Acids Res. 38:D91‐D97.
   Corpet, F., Gouzy, J., and Kahn, D. 1999. Recent improvements of the ProDom database of protein domain families. Nucleic Acids Res. 27:263‐267.
   Etzold, T., Ulyanov, A., and Argos, P. 1996. SRS: Information retrieval system for molecular biology data banks. Methods Enzymol. 266:114‐128.
   Finn, R.D., Mistry, J., Tate, J., Coggill, P., Heger, A., Pollington, J.E., Gavin, O.L., Gunasekaran, P., Ceric, G., Forslund, K., Holm, L., Sonnhammer, E.L., Eddy, S.R., and Bateman, A. 2010. The Pfam protein families database. Nucleic Acids Res. 38:D211‐D222.
   Fryklund, L. and Sievertsson, H. 1978. Primary structure of somatomedin B: A growth hormone‐dependent serum factor with protease inhibiting activity. FEBS Lett. 87:55‐60.
   Gao, J., Agrawal, G.K., Thelen, J.J., and Xu, D. 2009. P3DB: A plant protein phosphorylation database. Nucleic Acids Res. 37:D960‐D962.
   The Gene Ontology Consortium. 2000. Gene ontology: Tool for the unification of biology. Nat. Genet. 25:25‐29.
   Gerstein, M. and Krebs, W. 1998. A database of macromolecular motions. Nucleic Acids Res. 26:4280‐4290.
   Gibrat, J.F., Madej, T., and Bryant, S.H. 1996. Surprising similarities in structure comparison. Curr. Opin. Struct. Biol. 6:377‐385.
   Gromiha, M.M., An, J., Kono, H., Oobatake, M., Uedaira, H., and Sarai, A. 1999. ProTherm: Thermodynamic database for proteins and mutants. Nucleic Acids Res. 27:286‐288.
   Gupta, R., Birch, H., Rapacki, K., Brunak, S., and Hansen, J.E. 1999. O‐GLYCBASE version 4.0: A revised database of O‐glycosylated proteins. Nucleic Acids Res. 27:370‐372.
   Hanson, R.M. 2010. Jmol—A paradigm shift in crystallographic visualization. J. Appl. Crystallogr. 43:1250‐1260.
   Heazlewood, J.L., Verboom, R.E., Tonti‐Filippini, J., Small, I., and Millar, A.H. 2007. SUBA: The Arabidopsis Subcellular Database. Nucleic Acids Res. 35:D213‐D218.
   Hendlich, M. 1998. Databases for protein‐ligand complexes. Acta Crystallogr. D 54:1178‐1182.
   Henikoff, J.G., Henikoff, S., and Pietrokovski, S. 1999. New features of the blocks database servers. Nucleic Acids Res. 27:226‐228.
   Hofmann, K., Bucher, P., Falquet, L., and Bairoch, A. 1999. The PROSITE database, its status in 1999. Nucleic Acids Res. 27:215‐219.
   Holm, L. and Sander, C. 1996. Mapping the protein universe. Science 273:595‐602.
   Hu, Z.Z., Mani, I., Hermoso, V., Liu, H., and Wu, C.H. 2004. iProLINK: An integrated protein resource for literature mining. Comput. Biol. Chem. 28:409‐416.
   Huang, D.W., Sherman, B.T., and Lempicki, R.A. 2009. Systematic and integrative analysis of large gene lists using DAVID Bioinformatics Resources. Nat. Protoc. 4:44‐57.
   Jensen, L.J., Kuhn, M., Stark, M., Chaffron, S., Creevey, C., Muller, J., Doerks, T., Julien, P., Roth, A., Simonovic, M., Bork, P., and von Mering, C. 2009. STRING 8—A global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res. 37:D412‐D416.
   Kraulis, P. 1991. MOLSCRIPT—A program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallogr. 24:946‐950.
   Krissinel, E. and Henrick, K. 2007. Inference of macromolecular assemblies from crystalline state. J. Mol. Biol. 372:774‐797.
   Laskowski, R.A., Hutchinson, E.G., Michie, A.D., Wallace, A.C., Jones, M.L., and Thornton, J.M. 1997. PDBsum: A web‐based database of summaries and analyses of all PDB structures. Trends Biochem. Sci. 22:488‐490.
   Letunic, I., Doerks, T., and Bork, P. 2009. SMART 6: Recent updates and new developments. Nucleic Acids Res. 37:D229‐D232.
   Levy, E.D., Pereira‐Leal, J.B., Chothia, C., and Teichmann, S.A. 2006. 3D complex: A structural classification of protein complexes. PLoS Comput. Biol. 2:e155.
   Liang, J., Edelsbrunner, H., and Woodward, C. 1998. Anatomy of protein pockets and cavities: Measurement of binding site geometry and implications for ligand design. Protein Sci. 7:1884‐1897.
   Liu, T., Lin, Y., Wen, X., Jorissen, R.N., and Gilson, M.K. 2007. BindingDB: A Web‐accessible database of experimentally determined protein‐ligand binding affinities. Nucleic Acids Res. 35:D198‐D201.
   Liu, Z., Cao, J., Gao, X., Zhou, Y., Wen, L., Yang, X., Yao, X., Ren, J., and Xue, Y. 2011. CPLA 1.0: An integrated database of protein lysine acetylation. Nucleic Acids Res. 39:D1029‐D1034.
   Marchler‐Bauer, A., Addess, K.J., Chappey, C., Geer, L., Madej, T., Matsuo, Y., Wang, Y., and Bryant, S.H. 1999. MMDB: Entrez's 3D structure database. Nucleic Acids Res. 27:240‐243.
   Murzin, A.G., Brenner, S.E., Hubbard, T., and Chothia, C. 1995. SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247:536‐540.
   Nogales‐Cadenas, R., Abascal, F., Díez‐Pérez, J., Carazo, J.M., and Pascual‐Montano, A. 2009. CentrosomeDB: A human centrosomal proteins database. Nucleic Acids Res. 37:D175‐D180.
   Ogata, H., Goto, S., Sato, K., Fujibuchi, W., Bono, H., and Kanehisa, M. 1999. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 27:29‐34.
   Orengo, C.A., Michie, A.D., Jones, D.T., Swindells, M.B., and Thornton, J.M. 1997. CATH—A hierarchic classification of protein domain structures. Structure 5:1093‐1108.
   Pierleoni, A., Martelli, P.L., Fariselli, P., and Casadio, R. 2007. eSLDB: Eukaryotic subcellular localization database. Nucleic Acids Res. 35:D208‐D212.
   Porter, C.T., Bartlett, G.J., and Thornton, J.M. 2004. The Catalytic Site Atlas: A resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res. 32:D129‐D133.
   Prilusky, J., Hodis, E., Canner, D., Decatur, W.A., Oberholser, K., Martz, E., Berchanski, A., Harel, M., and Sussman, J.L. 2011. Proteopedia: A status report on the collaborative, 3D Web‐encyclopedia of proteins and other biomolecules. J. Struct. Biol. 175:244‐252.
   Rebhan, M., Chalifa‐Caspi, V., Prilusky, J., and Lancet, D. 1998. GeneCards: A novel functional genomics compendium with automated data mining and query reformulation support. Bioinformatics 14:656‐664.
   Scheer, M., Grote, A., Chang, A., Schomburg, I., Munaretto, C., Rother, M., Söhngen, C., Stelzer, M., Thiele, J., and Schomburg, D. 2011. BRENDA, the enzyme information system in 2011. Nucleic Acids Res. 39:D670‐D676.
   Shindyalov, I.N. and Bourne, P.E. 1998. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 11:739‐747.
   Sigurdardottir, O. and Wiman, B. 1994. Identification of a PAI‐1 binding site in vitronectin. Biochim. Biophys. Acta 1208:104‐110.
   Sobolev, V., Sorokine, A., Prilusky, J., Abola, E.E., and Edelman, M. 1999. Automated analysis of interatomic contacts in proteins. Bioinformatics 15:327‐332.
   Sprenger, J., Lynn Fink, J., Karunaratne, S., Hanson, K., Hamilton, N.A., and Teasdale, R.D. 2008. LOCATE: A mammalian protein subcellular localization database. Nucleic Acids Res. 36:D230‐D233.
   Stark, C., Breitkreutz, B.J., Chatr‐Aryamontri, A., Boucher, L., Oughtred, R., Livstone, M.S., Nixon, J., Van Auken, K., Wang, X., Shi, X., Reguly, T., Rust, J.M., Winter, A., Dolinski, K., and Tyers, M. 2011. The BioGRID Interaction Database: 2011 update. Nucleic Acids Res. 39:D698‐D704.
   Tatusov, R.L., Koonin, E.V., and Lipman, D.J. 1997. A genomic perspective on protein families. Science 278:631‐637.
   UniProt Consortium. 2011. Ongoing and future developments at the Universal Protein Resource. Nucleic Acids Res. 39:D214‐D219.
   University of Wisconsin. 1999. BioMagResBank. University of Wisconsin, Madison, Wisconsin.
   Wiwatwattana, N. and Kumar, A. 2005. Organelle DB: A cross‐species database of protein localization and function. Nucleic Acids Res. 33:D598‐D604.
   Wu, C.H., Huang, H., Nikolskaya, A., Hu, Z., and Barker, W.C. 2004. The iProClass integrated database for protein functional analysis. Comput. Biol. Chem. 28:87‐96.
   Xenarios, I., Rice, D.W., Salwinski, L., Baron, M.K., Marcotte, E.M., and Eisenberg, D. 2000. DIP: The database of interacting proteins. Nucleic Acids Res. 28:289‐291.
   Yu, N.Y., Laird, M.R., Spencer, C., and Brinkman, F.S. 2011. PSORTdb—An expanded, auto‐updated, user‐friendly protein subcellular localization database for bacteria and Archaea. Nucleic Acids Res. 39:D241‐D244.
Internet Resources
  Bioinformatics Links Directory.
  Pedro's biomolecular research tool.
  SIB Bioinformatics Resource Portal.
  Genomics, Proteomics and Bioinformatics Knowledge Base.
  Bioinformatics tools and algorithms.