The Importance of Biological Databases in Biological Discovery

Andreas D. Baxevanis¹, Alex Bateman²

¹ Bethesda, Maryland; ² European Bioinformatics Institute (EMBL‐EBI), Hinxton
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 1.1
DOI: 10.1002/0471250953.bi0101s50
Online Posting Date: June 2015

Biological databases play a central role in bioinformatics. They give scientists access to a wide variety of biologically relevant data, including the genomic sequences of an increasingly broad range of organisms. This unit provides a brief overview of major sequence databases and portals, such as GenBank, the UCSC Genome Browser, and Ensembl. It also covers model organism databases, including WormBase, The Arabidopsis Information Resource (TAIR), and those made available through the Mouse Genome Informatics (MGI) resource, as well as non‐sequence‐centric databases such as Online Mendelian Inheritance in Man (OMIM), the Protein Data Bank (PDB), MetaCyc, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). © 2015 by John Wiley & Sons, Inc.

Keywords: biological database; sequence database; structure database; model organisms; biological pathways
