Using Chado to Store Genome Annotation Data

Pinglei Zhou1, David Emmert1, Peili Zhang1

1 Harvard University, Cambridge, Massachusetts
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 9.6
DOI:  10.1002/0471250953.bi0906s12
Online Posting Date:  January, 2006
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


Chado is a relational database schema that can be used to manage a wide variety of biological information, including genome annotation, genetic, phenotypic, and expression data. Its flexibility comes from its use of “ontologies,” which are controlled vocabularies that describe data types and the relationships among them. By changing its ontologies, Chado can be customized to suit many different needs. Chado's modular design adds further flexibility: users can adopt only those features of Chado that are suitable for their needs. XORT is the main software tool used to move data in and out of Chado databases. XORT uses an XML‐based file format, called ChadoXML, for data import and export. The protocols described in this chapter show how to use XORT and related software to import genome annotation data into Chado databases, and how to export data stored in Chado databases into different file formats for reporting and data mining purposes.
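
As an illustration of how ontology terms can act as data types, the following is a minimal sketch rather than the actual Chado schema. It uses Python's built-in sqlite3 module and heavily simplified stand-ins for Chado's cv, cvterm, and feature tables. The column subset, the helper function names, and all sample data are assumptions made for this example only.

```python
import sqlite3

# Simplified, hypothetical stand-ins for Chado's cv, cvterm, and feature
# tables. The real schema has many more columns and constraints.
SCHEMA = """
CREATE TABLE cv (
    cv_id INTEGER PRIMARY KEY,
    name  TEXT UNIQUE NOT NULL
);
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    cv_id     INTEGER NOT NULL REFERENCES cv(cv_id),
    name      TEXT NOT NULL,
    UNIQUE (cv_id, name)
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
"""


def build_demo_db():
    """Create an in-memory database with a tiny controlled vocabulary."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO cv (cv_id, name) VALUES (1, 'sequence')")
    for term in ("gene", "mRNA", "exon"):
        add_term(conn, term)
    # Sample features (made-up names); each is typed by an ontology term.
    for uniquename, type_name in [
        ("demo_gene_1", "gene"),
        ("demo_transcript_1", "mRNA"),
        ("demo_exon_1", "exon"),
        ("demo_exon_2", "exon"),
    ]:
        add_feature(conn, uniquename, type_name)
    return conn


def add_term(conn, name, cv_id=1):
    """Add a new data type by inserting a vocabulary term (no schema change)."""
    conn.execute("INSERT INTO cvterm (cv_id, name) VALUES (?, ?)", (cv_id, name))


def add_feature(conn, uniquename, type_name):
    """Insert a feature whose type is looked up in the vocabulary."""
    row = conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE name = ?", (type_name,)
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown ontology term: {type_name}")
    conn.execute(
        "INSERT INTO feature (uniquename, type_id) VALUES (?, ?)",
        (uniquename, row[0]),
    )


def features_of_type(conn, type_name):
    """Return the unique names of all features typed by the given term."""
    rows = conn.execute(
        "SELECT f.uniquename FROM feature f "
        "JOIN cvterm t ON f.type_id = t.cvterm_id "
        "WHERE t.name = ? ORDER BY f.uniquename",
        (type_name,),
    )
    return [r[0] for r in rows]
```

In this sketch, supporting a new kind of feature only requires calling add_term with the new term name; the table definitions stay unchanged, which mirrors the customization-by-ontology idea described above.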

Keywords: Chado; genome; annotation; database; XORT; GAME; GMOD

Table of Contents

  • Basic Protocol 1: Installing Chado and XORT in the Unix/Linux Environment
  • Basic Protocol 2: Building a Chado Annotation Database
  • Basic Protocol 3: Loading a GenBank File
  • Basic Protocol 4: Querying a Chado Annotation Database Using SQL
  • Basic Protocol 5: Generating Standard Reports from a Chado Annotation Database
  • Support Protocol 1: Installing Software for a Unix‐Like Environment on a PC
  • Commentary
  • Literature Cited
  • Figures
  • Tables

Literature Cited

   Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
   Burge, C. and Karlin, S. 1997. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268:78‐94.
   Florea, L., Hartzell, G., Zhang, Z., Rubin, G.M., and Miller, W. 1998. A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Res. 8:967‐974.
   Lewis, S.E., Searle, S.M.J., Harris, N., Gibson, M., Iyer, V., Richter, J., Wiel, C., Bayraktaroglu, L., Birney, E., Crosby, M.A., Kaminker, J.S., Matthews, B.B., Prochnik, S.E., Smith, C.D., Tupy, J.L., Rubin, G.M., Misra, S., Mungall, C.J., and Clamp, M.E. 2002. Apollo: A sequence annotation editor. Genome Biol. 3(12).
   Mungall, C.J., Misra, S., Berman, B.P., Carlson, J., Frise, E., Harris, N., Marshall, B., Shu, S., Kaminker, J.S., Prochnik, S.E., Smith, C.D., Smith, E., Tupy, J.L., Wiel, C., Rubin, G.M., and Lewis, S.E. 2002. An integrated computational pipeline and database to support whole‐genome sequence annotation. Genome Biol. 3(12).
   Reese, M.G., Kulp, D., Tammana, H., and Haussler, D. 2000. Genie: Gene finding in Drosophila melanogaster. Genome Res. 10:529‐538.
   Stein, L.D., Mungall, C., Shu, S., Caudy, M., Mangone, M., Day, A., Nickerson, E., Stajich, J.E., Harris, T.W., Arva, A., and Lewis, S. 2002. The generic genome browser: A building block for a model organism system database. Genome Res. 12:1599‐1610.

Internet Resources

  Web site of GMOD.
  Web site of FlyBase.
  Location of GAME XML DTD.