Using Model Organism Databases (MODs)

Stacia R. Engel1, Kevin A. MacPherson1

1 Stanford University School of Medicine, California
Publication Name: Current Protocols Essential Laboratory Techniques
Unit Number: Unit 11.4
DOI: 10.1002/cpet.4
Online Posting Date: November, 2016
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


Model Organism Databases (MODs) represent the union of database technology and biology, and are essential to modern biological and medical research. Research communities are producing floods of new data, of increasingly different types and complexity. MODs assimilate this information from a wide variety of sources, organize it in a comprehensible manner, and make it freely available to the public via the Internet. MODs permit researchers to sort through massive amounts of data, providing access to key information that they might otherwise have overlooked. The protocols in this unit offer a general introduction to different types of data available in the growing number of MODs, and approaches for accessing, browsing, and querying these data. © 2016 by John Wiley & Sons, Inc.

Keywords: genome project; genetics; DNA sequence; gene model; protein function


Table of Contents

  • Overview and Principles
  • Basic Protocol 1: General Guidelines for Using a Model Organism Database Using the Saccharomyces Genome Database as an Example
  • Basic Protocol 2: Obtaining a Sequence from JBrowse
  • Basic Protocol 3: Using Textpresso to Search Full‐Text Papers
  • Basic Protocol 4: Using InterMine to Perform Complex Data Queries
  • Commentary
  • Literature Cited
  • Figures

Literature Cited

  Arnaud, M.B., Costanzo, M.C., Skrzypek, M.S., Binkley, G., Lane, C., Miyasato, S.R., and Sherlock, G. 2005. The Candida Genome Database (CGD), a community resource for Candida albicans gene and protein information. Nucleic Acids Res. 33:D358‐D363. doi: 10.1093/nar/gki003.
  Arnaud, M.B., Chibucos, M.C., Costanzo, M.C., Crabtree, J., Inglis, D.O., Lotia, A., Orvis, J., Shah, P., Skrzypek, M.S., Binkley, G., Miyasato, S.R., Wortman, J.R., and Sherlock, G. 2010. The Aspergillus Genome Database, a curated comparative genomics resource for gene, protein, and sequence information for the Aspergillus research community. Nucleic Acids Res. 38:D420‐D427. doi: 10.1093/nar/gkp751.
  Blake, J.A. and Harris, M.A. 2008. The Gene Ontology (GO) Project: Structured vocabularies for molecular biology and their application to genome and expression analysis. Curr. Protoc. Bioinform. 23:7.2.1‐7.2.9. doi: 10.1002/0471250953.bi0702s23.
  Blake, J.A., Bult, C.J., Eppig, J.T., Kadin, J.A., Richardson, J.E., and the Mouse Genome Database Group. 2009. The Mouse Genome Database genotypes::phenotypes. Nucleic Acids Res. 37:D712‐D719. doi: 10.1093/nar/gkn886.
  Cherry, J.M., Adler, C., Ball, C., Chervitz, S.A., Dwight, S.S., Hester, E.T., Jia, Y., Juvik, G., Roe, T., Schroeder, M., Weng, S., and Botstein, D. 1998. SGD: Saccharomyces Genome Database. Nucleic Acids Res. 26:73‐79. doi: 10.1093/nar/26.1.73.
  Dwight, S.S., Balakrishnan, R., Christie, K.R., Costanzo, M.C., Dolinski, K., Engel, S.R., Feierbach, B., Fisk, D.G., Hirschman, J., Hong, E.L., Issel‐Tarver, L., Nash, R.S., Sethuraman, A., Starr, B., Theesfeld, C.L., Andrada, R., Binkley, G., Dong, Q., Lane, C., Schroeder, M., Weng, S., Botstein, D., and Cherry, J.M. 2004. Saccharomyces genome database: Underlying principles and organisation. Brief. Bioinform. 5:9‐22. doi: 10.1093/bib/5.1.9.
  Gelbart, W.M., Crosby, M., Matthews, B., Rindone, W.P., Chillemi, J., Russo Twombly, S., Emmert, D., Ashburner, M., Drysdale, R.A., Whitfield, E., Millburn, G.H., de Grey, A., Kaufman, T., Matthews, K., Gilbert, D., Strelets, V., and Tolstoshev, C. 1997. FlyBase: A Drosophila Database. Nucleic Acids Res. 25:63‐66. doi: 10.1093/nar/25.1.63.
  Gene Ontology Consortium. 2008. The Gene Ontology project in 2008. Nucleic Acids Res. 36:D440‐D444. doi: 10.1093/nar/gkm883.
  Goffeau, A., Barrell, B.G., Bussey, H., Davis, R.W., Dujon, B., Feldmann, H., Galibert, F., Hoheisel, J.D., Jacq, C., Johnston, M., Louis, E.J., Mewes, H.W., Murakami, Y., Philippsen, P., Tettelin, H., and Oliver, S.G. 1996. Life with 6000 genes. Science 274:546‐567. doi: 10.1126/science.274.5287.546.
  Karolchik, D., Hinrichs, A.S., and Kent, W.J. 2012. The UCSC genome browser. Curr. Protoc. Bioinform. 40:1.4.1‐1.4.33. doi: 10.1002/0471250953.bi0104s40.
  Kreppel, L., Fey, P., Gaudet, P., Just, E., Kibbe, W.A., Chisholm, R.L., and Kimmel, A.R. 2004. dictyBase: A new Dictyostelium discoideum genome database. Nucleic Acids Res. 32:D332‐D333. doi: 10.1093/nar/gkh138.
  Lamesch, P., Dreher, K., Swarbreck, D., Sasidharan, R., Reiser, L., and Huala, E. 2010. Using The Arabidopsis Information Resource (TAIR) to find information about Arabidopsis genes. Curr. Protoc. Bioinform. 30:1.11.1‐1.11.51. doi: 10.1002/0471250953.bi0111s30.
  Laulederkind, S.J.F., Hayman, G.T., Wang, S.‐J., Lowry, T.F., Nigam, R., Petri, V., Smith, J.R., Dwinell, M.R., Jacob, H.J., and Shimoyama, M. 2012. Exploring genetic, genomic, and phenotypic data at the rat genome database. Curr. Protoc. Bioinform. 40:1.14.1‐1.14.27. doi: 10.1002/0471250953.bi0114s40.
  Müller, H.M., Kenny, E.E., and Sternberg, P.W. 2004. Textpresso: An ontology‐based information retrieval and extraction system for biological literature. PLoS Biol. 2:e309. doi: 10.1371/journal.pbio.0020309.
  Ono, B.I., Hazu, T., Yoshida, S., Kawato, T., Shinoda, S., Brzywczy, J., and Paszewski, A. 1999. Cysteine biosynthesis in Saccharomyces cerevisiae: A new outlook on pathway and regulation. Yeast 15:1365‐1375. doi: 10.1002/(SICI)1097‐0061(19990930)15:13<1365::AID‐YEA468>3.0.CO;2‐U.
  Rhee, S.Y. 2000. Bioinformatic resources, challenges, and opportunities using Arabidopsis as a model organism in post‐genomic era. Plant Physiol. 124:1460‐1464. doi: 10.1104/pp.124.4.1460.
  Schwarz, E.M. and Sternberg, P.W. 2006. Searching WormBase for information about Caenorhabditis elegans. Curr. Protoc. Bioinform. 14:1.8.1‐1.8.43. doi: 10.1002/0471250953.bi0108s14.
  Shaw, D.R. 2009. Searching the Mouse Genome Informatics (MGI) resources for information on mouse biology from genotype to phenotype. Curr. Protoc. Bioinform. 25:1.7.1‐1.7.14. doi: 10.1002/0471250953.bi0107s25.
  Skinner, M.E. and Holmes, I.H. 2010. Setting up the JBrowse genome browser. Curr. Protoc. Bioinform. 32:9.13.1‐9.13.13. doi: 10.1002/0471250953.bi0913s32.
  Skinner, M.E., Uzilov, A.V., Stein, L.D., Mungall, C.J., and Holmes, I.H. 2009. JBrowse: A next‐generation genome browser. Genome Res. 19:1630‐1638. doi: 10.1101/gr.094607.109.
  Smith, R.N., Aleksic, J., Butano, D., Carr, A., Contrino, S., Hu, F., Lyne, M., Lyne, R., Kalderimis, A., Rutherford, K., Stepan, R., Sullivan, J., Wakeling, M., Watkins, X., and Micklem, G. 2012. InterMine: A flexible data warehouse system for the integration and analysis of heterogeneous biological data. Bioinformatics 28:3163‐3165. doi: 10.1093/bioinformatics/bts577.
  Stein, L.D., Sternberg, P., Durbin, R., Thierry‐Mieg, J., and Spieth, J. 2001. WormBase: Network access to the genome and biology of Caenorhabditis elegans. Nucleic Acids Res. 29:82‐86. doi: 10.1093/nar/29.1.82.
  Stein, L.D., Mungall, C., Shu, S., Caudy, M., Mangone, M., Day, A., Nickerson, E., Stajich, J.E., Harris, T.W., Arva, A., and Lewis, S. 2002. The generic genome browser: A building block for a model organism system database. Genome Res. 12:1599‐1610. doi: 10.1101/gr.403602.
  Stover, N.A. and Cavalcanti, A.R. 2014. Using NCBI BLAST. Curr. Protoc. Essen. Lab. Tech. 11:11.1.1‐11.1.35. doi: 10.1002/9780470089941.et1101s08.
  Twigger, S.N., Shimoyama, M., Bromberg, S., Kwitek, A.E., Jacob, H.J., and the RGD Team. 2007. The Rat Genome Database, update 2007—Easing the path from disease to data and back again. Nucleic Acids Res. 35:D658‐D662. doi: 10.1093/nar/gkl988.

Internet Resources

  Resource for functional analysis of agricultural plant and animal gene products.
  The Arabidopsis Information Resource (TAIR): Database of genetic and molecular biology data for the plant Arabidopsis thaliana.
  Ascidian Network for In Situ Expression and Embryological Data (ANISEED): Database for Ciona intestinalis, C. savignyi, Halocynthia roretzi, and Phallusia mammillata.
  Ashbya Genome Database (AGD): Database of gene annotation and microarray data for Ashbya gossypii and Saccharomyces cerevisiae.
  Aspergillus Genome Database (AspGD): Resource for genomic sequence data and gene and protein information for Aspergilli.
  Database that integrates bovine genomics data with structural and functional annotations of genes and the genome.
  Candida Genome Database (CGD): Database that serves as a resource for genomic sequence data and gene and protein information for Candida albicans.
  dictyBase: Resource for the biology and genomics of the social amoeba Dictyostelium discoideum.
  Centralized resource linking various E. coli online information services, databases, and Web sites.
  FlyBase: Database of Drosophila genes and genomes.
  Generic Model Organism Database (GMOD) project: Collection of open‐source software tools for creating genome‐scale biological databases.
  Data resource for comparative genome analysis in the grasses.
  Hymenoptera Genome Database (BeeBase, NasoniaBase, Ant Genomes Portal): Database of genes and genomes of Apis mellifera, Nasonia vitripennis, and other Hymenopterans.
  InterMine: Open‐source data warehouse system that enables the creation of biological databases that integrate multiple types of data from different sources.
  Mouse Genome Informatics: Resource for the laboratory mouse, providing genetic, genomic, and biological data for the study of human health and disease.
  Database of genomic sequence and genetic data for Paramecium tetraurelia.
  Rat Genome Database (RGD): Database of laboratory rat genetic and genomic data, including information for quantitative trait loci, mutations, and phenotypes.
  Saccharomyces Genome Database (SGD): Scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae.
  Schizosaccharomyces pombe GeneDB: Database of genetic features, functional annotations, and other information for fission yeast.
  Schmidtea mediterranea Genome Database (SmedGD): Database for information associated with the planarian genome.
  Textpresso: Text‐mining system for scientific literature.
  Web service that provides gene and genomic information for species of the genus Daphnia, commonly known as the water flea.
  WormBase: Biology and genomic information for Caenorhabditis species.
  Zebrafish Information Network: Database for the molecular biology and genetics of zebrafish.