DRAGON and DRAGON View: Information Annotation and Visualization Tools for Large‐Scale Expression Data

Christopher M.L.S. Bouton1, Jonathan Pevsner2

1 LION Bioscience Research, Cambridge, Massachusetts, 2 Kennedy Krieger Institute and Johns Hopkins University School of Medicine, Baltimore, Maryland
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 7.4
DOI:  10.1002/0471250953.bi0704s02
Online Posting Date:  August, 2003
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

The Database Referencing of Array Genes ONline (DRAGON) database system consists of information derived from publicly available databases including UniGene, SWISS‐Prot, Pfam, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). Through a Web‐accessible interface, the DRAGON Annotate tool rapidly supplies information pertaining to a range of biological characteristics of all the genes in any large‐scale gene expression data set. The subsequent inclusion of this information during data analysis and visualization allows for deeper insight into gene expression patterns. The set of DRAGON View tools provides methods for the analysis and visualization of expression patterns in relation to annotated information. Instead of incorporating the standard set of clustering and graphing tools available in many large‐scale expression data analysis software packages, DRAGON View has been specifically designed to allow for the analysis of expression data in relation to the biological characteristics of gene sets.

     
 

Table of Contents

  • Basic Protocol 1: Preparing Data for Use with the DRAGON Database and Analyzing Data with DRAGON View
  • Support Protocol 1: Analyzing Data with the DRAGON Families Tool
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
     
 

Materials

Basic Protocol 1: Preparing Data for Use with the DRAGON Database and Analyzing Data with DRAGON View

  Necessary Resources
      Hardware
  • Windows, Linux, Unix, or Macintosh computer with Internet connection (preferably broadband connection, e.g., T1, T3, cable, or DSL service)
      Software
  • Internet browser: e.g., MS Internet Explorer 5 (or higher) or Netscape 6 (or higher) on Windows or Macintosh systems; Opera, Netscape 6 (or higher), or Mozilla on Linux‐based systems. Internet Explorer 5 or higher and Netscape 6 or higher are preferred, because Netscape 4.x is not capable of supporting all of the functionality provided in the DRAGON Paths tool.
  • Also required:
  • Spreadsheet program: e.g., MS Excel on Windows or Macintosh systems or Sun Microsystems Star Office suite on Linux systems.
  • Text editor: e.g., TextPad (http://www.textpad.com/) or Notepad on Windows systems; XEmacs (http://www.xemacs.org) on Linux systems.
  • Finally, for advanced users who want more flexibility in manipulating their text files, the Perl programming language is powerful and easy to use, and it allows the user to automate text‐formatting, file‐creation, and file‐alteration tasks that are useful when analyzing large data sets. ActiveState (http://www.activestate.com) has developed a version of Perl for Windows computers (http://www.activestate.com/Products/ActivePerl/). Otherwise, the http://www.perl.com Web site provides downloads of Perl for Linux, Unix, and Macintosh computers.
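
As an illustration of the kind of automated text formatting mentioned above, the following is a minimal sketch written in Python rather than Perl. It is not part of the DRAGON tools. The function names and the sample identifiers are hypothetical, and the sketch assumes the input is a tab‐delimited table whose first non‐blank line is the header.

```python
import csv
import io


def clean_tab_delimited(text):
    """Return (header, rows) with trimmed fields, no blank rows, and padded short rows."""
    # Normalize Windows and classic Macintosh line endings before parsing.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # QUOTE_NONE keeps quotation marks in gene names as literal characters.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [[field.strip() for field in row] for row in reader]
    # Drop rows that are empty or contain only empty fields.
    rows = [row for row in rows if any(row)]
    if not rows:
        return [], []
    header, body = rows[0], rows[1:]
    # Pad rows that are shorter than the header so every row has the same width.
    body = [row + [""] * (len(header) - len(row)) for row in body]
    return header, body


def to_tab_delimited(header, rows):
    """Serialize the cleaned table back to tab-delimited text with Unix line endings."""
    lines = ["\t".join(header)] + ["\t".join(row) for row in rows]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample input: mixed line endings, stray spaces, a blank line, a short row.
    sample = "ID\tT0\tT1\r\nACC001 \t 1.0\t2.0\r\n\r\nACC002\t0.5\r\n"
    header, rows = clean_tab_delimited(sample)
    print(to_tab_delimited(header, rows), end="")
```

Rows longer than the header are left unchanged in this sketch, so a real script would need to decide how to handle them.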
      Files
  • The Iyer et al. (1999) example data files were obtained from the Stanford Microarray data Web site (http://genome-www.stanford.edu/serum/data.html). The two files used for demonstration purposes in this unit may be downloaded from the following URLs, respectively:
    • http://genome-www.stanford.edu/serum/fig2data.txt
    • http://genome-www.stanford.edu/serum/data/fig2clusterdata.txt
  • Both files are also available at the Current Protocols Web site:
    • http://www3.interscience.wiley.com/c_p/cpbi_sampledatafiles.htm
  • Optional: The DRAGON database is generated through the automated parsing of flat files provided by publicly available databases (see the DRAGON Web site for a list of the database flat files used by DRAGON). The information in these files is then loaded into a back‐end MySQL (http://www.mysql.com; unit 9.2) relational database for use by DRAGON (Fig. .). Although most users will find it easier and more intuitive to access the information in these files via the DRAGON Web site, some readers may want to use this information in their own relational databases. For this purpose, all of the tables used in the DRAGON database are provided for download on the DRAGON Web site (http://pevsnerlab.kennedykrieger.org/download.htm) or can be ordered on CD if desired (http://pevsnerlab.kennedykrieger.org/order.htm).
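
For readers who want to load downloaded tables into their own relational database, the general idea can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built‐in SQLite module in place of MySQL so that it runs without a database server, and the table names, column names, and sample values are hypothetical and do not reflect the actual DRAGON schema.

```python
import csv
import io
import sqlite3


def load_table(conn, table, text):
    """Create a table from tab-delimited text whose first line holds the column names."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    # Skip malformed rows whose field count does not match the header.
    good_rows = (row for row in reader if len(row) == len(header))
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', good_rows)


def annotate(conn):
    """Attach annotation to every expression row; unmatched rows get None."""
    query = (
        "SELECT e.accession, e.ratio, a.unigene_cluster "
        "FROM expression AS e "
        "LEFT JOIN annotation AS a ON e.accession = a.accession "
        "ORDER BY e.accession"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    # Hypothetical sample data standing in for an expression file and an annotation table.
    expression = "accession\tratio\nACC001\t2.5\nACC002\t0.4\n"
    annotation = "accession\tunigene_cluster\nACC001\tCLUSTER_A\n"
    conn = sqlite3.connect(":memory:")
    load_table(conn, "expression", expression)
    load_table(conn, "annotation", annotation)
    for row in annotate(conn):
        print(row)
```

The left join keeps every gene from the expression table even when no annotation is found, so the annotated output has the same number of rows as the input.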

Support Protocol 1: Analyzing Data with the DRAGON Families Tool

  Necessary Resources
      Hardware
  • Windows, Linux, Unix, or Macintosh computer with an Internet connection (preferably broadband connection, e.g., T1, T3, cable, or DSL service)
      Software
  • Internet browser: e.g., MS Internet Explorer 5 (or higher) or Netscape 6 (or higher) on Windows or Macintosh systems; Opera, Netscape 6 (or higher) or Mozilla on Linux‐based systems. Internet Explorer 5 or higher and Netscape 6 or higher are preferred, because Netscape 4.x is not capable of supporting all of the functionality provided in the DRAGON Paths tool.
  • Also required:
  • Spreadsheet program: e.g., MS Excel on Windows or Macintosh systems or Sun Microsystems Star Office suite on Linux systems.
  • Text editor: e.g., TextPad (http://www.textpad.com/) or Notepad on Windows systems; XEmacs (http://www.xemacs.org) on Linux systems.
      Files
  • An annotated master matrix file created by running the DRAGON Annotate tool (figure2_combined_data_KWS.txt; see Basic Protocol 1)

Literature Cited

   Bailey, S.N., Wu, R.Z., and Sabatini, D.M. 2002. Applications of transfected cell microarrays in high‐throughput drug discovery. Drug Discov. Today 7:S113‐S118.
   Bouton, C.M. and Pevsner, J. 2001. DRAGON: Database Referencing of Array Genes Online. Bioinformatics 16:1038‐1039.
   Bouton, C.M. and Pevsner, J. 2002. DRAGON View: Information visualization for annotated microarray data. Bioinformatics 18:323‐324.
   Bowtell, D.D.L. 1999. Options available‐from start to finish‐for obtaining expression data by microarray. Nat. Genet. Suppl. 21:25‐32.
   Cheung, V.G., Morley, M., Aguilar, F., Massimi, A., Kucherlapati, R., and Childs, G. 1999. Making and reading microarrays. Nat. Genet. Suppl. 21:15‐19.
   Chu, S., DeRisi, J., Eisen, M., Mulholland, J., Botstein, D., Brown, P.O., and Herskowitz, I. 1998. The transcriptional program of sporulation in budding yeast. Science 282:699‐705.
   Colantuoni, C., Henry, G., Zeger, S., and Pevsner, J. 2002. SNOMAD (Standardization and NOrmalization of MicroArray Data): Web‐accessible gene expression data analysis. Bioinformatics 18:1540‐1541.
   Duggan, D.J., Bittner, M., Chen, Y., Meltzer, P., and Trent, J.M. 1999. Expression profiling using cDNA microarrays. Nat. Genet. Suppl. 21:10‐14.
   Eisen, M.B., Spellman, P.T., Brown, P.O., and Botstein, D. 1998. Cluster analysis and display of genome‐wide expression patterns. Proc. Natl. Acad. Sci. U.S.A. 95:14863‐14868.
   Frishman, D., Heumann, K., Lesk, A., and Mewes, H‐W. 1998. Comprehensive, comprehensible, distributed and intelligent databases: Current status. Bioinformatics 14:551‐561.
   Gawantka, V., Pollet, N., Delius, H., Vingron, M., Pfister, R., Nitsch, R., Blumenstock, C., and Niehrs, C. 1998. Gene expression screening in Xenopus identifies molecular pathways, predicts gene function and provides a global view of embryonic gene expression. Mech. Dev. 77:95‐141.
   Gibbons, F.D. and Roth, F.P. 2002. Judging the quality of gene expression‐based clustering methods using gene annotation. Genome Res. 12:1574‐1581.
   Heyer, L.J., Kruglyak, S., and Yooseph, S. 1999. Exploring expression data: Identification and analysis of coexpressed genes. Genome Res. 9:1106‐1115.
   Iyer, V.R., Eisen, M.B., Ross, D.T., Schuler, G., Moore, T., Lee, J.C., Trent, J.M., Staudt, L.M., Hudson, J. Jr., Boguski, M.S., Lashkari, D., Shalon, D., Botstein, D., and Brown, P.O. 1999. The transcriptional program in the response of human fibroblasts to serum. Science 283:83‐87.
   Kanehisa, M. and Goto, S. 2000. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28:27‐30.
   Kanehisa, M. et al. 2002. The KEGG databases at GenomeNet. Nucleic Acids Res. 30:42‐46.
   Lal, S.P., Christopherson, R.I., and dos Remedios, C.G. 2002. Antibody arrays: An embryonic but rapidly growing technology. Drug Discov. Today 7:S143‐S149.
   Liang, S., Fuhrman, S., and Somogyi, R. 1998. Reveal, a general reverse engineering algorithm for inference of genetic network architectures. Pac. Symp. Biocomput. 3:18‐29.
   Lipshutz, R.J., Fodor, S.P.A., Gingeras, T.R., and Lockhart, D.J. 1999. High density synthetic oligonucleotide arrays. Nat. Genet. Suppl. 21:20‐24.
   Macauley, J., Wang, H., and Goodman, N. 1998. A model system for studying the integration of molecular biology databases. Bioinformatics 14:575‐582.
   Michaels, G.S., Carr, D.B., Askenazi, M., Fuhrman, S., Wen, X., and Somogyi, R. 1998. Cluster analysis and data visualization of large‐scale gene expression data. Pac. Symp. Biocomput. 3:42‐53.
   Somogyi, R., Fuhrman, S., Askenazi, M., and Wuensche, A. 1997. The gene expression matrix: Towards the extraction of genetic network architectures. Proc. Second World Cong. Nonlinear Analysts 1996. 30:1815‐1824.
   Spellman, P.T. and Rubin, G.M. 2002. Evidence for large domains of similarly expressed genes in the Drosophila genome. J. Biol. 1:5.1‐5.8.
   Szallasi, Z. 1999. Genetic network analysis in light of massively parallel biological data acquisition. Pac. Symp. Biocomput. 4:5‐16.
   Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Kitareewan, S., Dmitrovsky, E., Lander, E.S., and Golub, T.R. 1999. Interpreting patterns of gene expression with self‐organizing maps: Methods and applications to hematopoietic differentiation. Proc. Natl. Acad. Sci. U.S.A. 96:2907‐2912.
   Toronen, P., Kolehmainen, M., Wong, G., and Castren, E. 1999. Analysis of gene expression data using self‐organizing maps. FEBS Lett. 451:142‐146.
   Velculescu, V.E., Zhang, L., Vogelstein, B., and Kinzler, K.W. 1995. Serial analysis of gene expression. Science 270:484‐487.
   Wen, X., Fuhrman, S., Michaels, G.S., Carr, D.B., Smith, S., Barker, J.L., and Somogyi, R. 1998. Large‐scale temporal gene expression mapping of central nervous system development. Proc. Natl. Acad. Sci. U.S.A. 95:334‐339.
   Zhang, M.Q. 1999. Large‐scale gene expression data analysis: A new challenge to computational biologists. Genome Res. 9:681‐688.
Key References
   Bouton and Pevsner, 2001. See above.
  Original publication concerning the DRAGON database.
   Bouton and Pevsner, 2002. See above.
  Original publication concerning the DRAGON View visualization tools.
   Bouton, C.M., Hossain, M.A., Frelin, L.P., Laterra, J., and Pevsner, J. 2001. Microarray analysis of differential gene expression in lead‐exposed astrocytes. Toxicol. Appl. Pharmacol. 176:34‐53.
  Research publication that reports use of DRAGON and DRAGON View in the context of a toxicogenomic microarray study.
   Iyer et al. 1999. See above.
  Reports the microarray study from which the example data sets for this unit were derived.