Sequence Databases: Integrated Information Retrieval and Data Submission

Jane M. Weisemann¹, Mark S. Boguski¹, B.F. Francis Ouellette²

¹ National Center for Biotechnology Information, Bethesda, Maryland; ² Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver
Publication Name:  Current Protocols in Human Genetics
Unit Number:  Unit 6.7
DOI:  10.1002/0471142905.hg0607s27
Online Posting Date:  May, 2001
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


This unit describes the NCBI's Entrez database browser. Entrez integrates DNA and protein sequence data, three-dimensional structures, and taxonomic information with the associated abstracts and citations contained in PubMed (MEDLINE). The Entrez information space can be searched using conventional queries (authors, gene names, map location) as well as by bibliographic associations (articles that are related to one another) and sequence homology. Also described are the procedures for submitting new data, updates, and corrections to the sequence databases.

Table of Contents

  • Introduction to Entrez
  • Data Submission: General Considerations
  • Submitting a Sequence to the Nucleotide Database
  • Submitting an Update or Correction to an Existing GenBank Entry
  • Submitting EST, STS, or GSS Data
  • Submitting High‐Throughput Genome Sequences (HTGS)
  • Conclusion
  • Literature Cited
  • Figures

Literature Cited

   Adams, M.D., Kelley, J.M., Gocayne, J.D., Dubnick, M., Polymeropoulos, M.H., Xiao, H., Merril, C.R., Wu, A., Olde, B., Moreno, R.F., Kerlavage, A.R., McCombie, W.R., and Venter, J.C. 1991. Complementary DNA sequencing: Expressed sequence tags and human genome project. Science 252:1651‐1656.
   Barrell, B.G. and Clark, B.F.C. 1974. Handbook of Nucleic Acid Sequences. Joynson‐Bruvvers, Oxford.
   Baxevanis, A.D., Boguski, M.S., and Ouellette, B.F.F. 1997. Computational analysis of DNA and protein sequences. In Genome Analysis: A Laboratory Manual (B. Birren, E.D. Green, S. Klapholz, R.M. Myers, and J. Roskams, eds.) pp. 533‐586. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
   Benson, D.A., Karsch‐Mizrachi, I., Lipman, D.J., Ostell, J., Rapp, B.A., and Wheeler, D.L. 2000. GenBank. Nucl. Acids Res. 28:15‐18.
   Boguski, M. and McEntyre, J. 1994. I think therefore I publish. Trends Biochem. Sci. 19:71.
   Boguski, M.S., Lowe, T.M., and Tolstoshev, C.M. 1993. dbEST: Database for “Expressed Sequence Tags”. Nature Genet. 4:332‐333.
   Church, D.M., Stotler, C.J., Rutter, J.L., Murrell, J.R., Trofatter, J.A., and Buckler, A.J. 1993. Isolation of genes from complex sources of mammalian genomic DNA using exon amplification. Nature Genet. 6:98‐105.
   Cockerill, M. 1994. A versatile tool for retrieving molecular sequences. Trends Biochem. Sci. 19:94‐96.
   Harper, R. 1994. Access to DNA and protein databases on the Internet. Current Opin. Biotechnol. 5:4‐18.
   Khan, A.S., Wilcox, A.S., Polymeropoulos, M.H., Hopkins, J.A., Stevens, T.J., Robinson, M., Orpana, A.K., and Sikela, J.M. 1992. Single pass sequencing and physical and genetic mapping of human brain cDNAs. Nature Genet. 2:180‐185.
   Kans, J.A. and Ouellette, B.F.F. 1998. Submitting DNA sequences to the databases. In Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins (A.D. Baxevanis and B.F.F. Ouellette, eds.) pp. 319‐353. John Wiley & Sons, New York.
   Okubo, K., Hori, N., Matoba, R., Niiyama, T., Fukushima, A., Kojima, Y., and Matsubara, K. 1992. Large scale cDNA sequencing for analysis of quantitative and qualitative aspects of gene expression. Nature Genet. 2:173‐179.
   Schuler, G.D., Epstein, J.A., Ohkawa, H., and Kans, J.A. 1996. Entrez: Molecular biology database and retrieval system. Methods Enzymol. 266:141‐162.
   Smith, M.W., Holmsen, A.L., Wei, Y.H., Peterson, M., and Evans, G.A. 1994. Genomic sequence sampling: A strategy to high resolution sequence‐based physical mapping of complex genomes. Nature Genet. 7:40‐47.
   Smith, T.F. 1990. The history of the genetic sequence databases. Genomics 6:702‐707.
   Waterston, R., Martin, C., Craxton, M., Huynh, C., Coulson, A., Hillier, L., Durbin, R., Green, P., Shownkeen, R., Halloran, N., Metzstein, M., Hawkins, T., Wilson, R., Berks, M., Du, Z., Thierry‐Mieg, J., and Sulston, J. 1992. A survey of expressed genes in Caenorhabditis elegans. Nature Genet. 1:114‐123.
Internet Resources
   DNA Data Bank of Japan (DDBJ; Center for Information Biology, National Institute of Genetics), 1111 Yata, Mishima, Shizuoka 411, Japan; Fax 81‐559‐81‐6849. e‐mail submissions:, updates:, information:, home page:, WWW submissions:
   European Molecular Biology Laboratory (EMBL), EMBL Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom. e‐mail submissions:, updates:, information:, home page:, WWW submissions:, WebIn:
   National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), Bldg. 38A, Room 8N‐803, 8600 Rockville Pike, Bethesda, Maryland 20894; Telephone: 301‐496‐2475; Fax: 301‐480‐9241. e‐mail submissions:, EST/GSS/STS:, updates:, information:, home page:, WWW submissions:, BankIt:
Appendix: Sample GenBank Records