GrailEXP and Genome Analysis Pipeline for Genome Annotation

Edward C. Uberbacher, Doug Hyatt, Manesh Shah

Oak Ridge National Laboratory, Oak Ridge, Tennessee
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 4.9
DOI: 10.1002/0471250953.bi0409s04
Online Posting Date: February, 2004
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


The Basic Protocol describes the use of GrailEXP, the latest version of the gene-finding system from Oak Ridge National Laboratory. GrailEXP builds gene models using sequence similarity to Expressed Sequence Tags (ESTs) and known genes. It also provides alternatively spliced constructs for each gene, based on the available EST evidence. The Support Protocol describes the use of the Genome Analysis Pipeline, a web application that lets users perform comprehensive sequence analysis by choosing from a wide range of supported gene finders, other biological feature finders, and database searches.


Table of Contents

  • Basic Protocol 1: Performing Gene Predictions Using the GrailEXP Web Interface
  • Alternate Protocol 1: Using Genome Analysis Pipeline for Comprehensive Analysis of DNA Sequences
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures


Basic Protocol 1: Performing Gene Predictions Using the GrailEXP Web Interface

  Necessary Resources
  • Hardware
    • Any computer workstation (PC, Macintosh, Unix, Linux) with Web access
  • Software
    • Web browser (e.g., Netscape Navigator, Microsoft Internet Explorer)
  • Files
    • DNA sequence of interest in Raw or FASTA format (Appendix 1B)
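
As an illustration of the input requirement above, the following minimal Python sketch checks that a sequence is in Raw or single-record FASTA format before it is pasted into a web form. The function name `normalize_dna` is hypothetical and is not part of GrailEXP. The sketch also assumes that raw input may carry whitespace or position numbers and that IUPAC ambiguity codes are acceptable; neither assumption is confirmed by the protocol.

```python
import re

# Assumption: IUPAC nucleotide codes are acceptable input characters.
_VALID = set("ACGTUNRYSWKMBDHV")


def normalize_dna(text):
    """Return (header, sequence) from Raw or single-record FASTA text.

    header is None for Raw input. Raises ValueError on empty input,
    multiple FASTA records, or unexpected characters.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = None
    if lines and lines[0].startswith(">"):
        # FASTA: the first line is a description line starting with '>'.
        header = lines[0][1:].strip()
        lines = lines[1:]
    if any(ln.startswith(">") for ln in lines):
        raise ValueError("more than one FASTA record found")
    # Drop whitespace and position numbers, then upper-case the residues.
    seq = re.sub(r"[\s\d]", "", "".join(lines)).upper()
    if not seq:
        raise ValueError("empty sequence")
    bad = sorted(set(seq) - _VALID)
    if bad:
        raise ValueError("unexpected characters: " + "".join(bad))
    return header, seq
```

Calling `normalize_dna(open("my_sequence.fa").read())` (the file name is a placeholder) returns the description line, if any, together with a cleaned, upper-case sequence string.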



Literature Cited

   Brunak, S., Engelbrecht, J., and Knudsen, S. 1990. Neural network detects errors in the assignment of mRNA splice sites. Nucl. Acids Res. 18:4797‐4801.
   Brunak, S., Engelbrecht, J., and Knudsen, S. 1991. Prediction of human mRNA donor and acceptor sites from the DNA sequence. J. Mol. Biol. 220:49‐65.
   Claverie, J.‐M., Sauvaget, I., and Bougueleret, L. 1990. K‐tuple frequency analysis: From intron/exon discrimination to T‐cell epitope mapping. Methods Enzymol. 183:237‐252.
   Dong, S. and Searls, D.B. 1994. Gene structure prediction by linguistic methods. Genomics 23:540‐551.
   Fickett, J.W. 1982. Recognition of protein coding regions in DNA sequences. Nucl. Acids Res. 10:5303‐5318.
   Fickett, J.W. and Tung, C.‐S. 1992. Assessment of protein coding measures. Nucl. Acids Res. 20:6441‐6450.
   Gelfand, M.S. 1990. Computer prediction of the exon‐intron structure of mammalian pre‐mRNAs. Nucl. Acids Res. 18:5865‐5869.
   Guigo, R., Knudsen, S., Drake, N., and Smith, T. 1992. Prediction of gene structure. J. Mol. Biol. 226:141‐157.
   Henikoff, S. and Henikoff, J. 1991. Automated assembly of protein blocks for database searching. Nucl. Acids Res. 19:6565‐6572.
   Hutchinson, G.B. and Hayden, M.R. 1992. The prediction of exons through an analysis of spliceable open reading frames. Nucl. Acids Res. 20:3453‐3462.
   Hyatt, D. and Uberbacher, E.C. 2002. Computational DNA sequence analysis and annotation. In Genomic Technologies: Present and Future (D.J. Galas and S.J. McCormack, eds.) pp. 345‐374. Caister Academic Press, Norfolk, U.K.
   Mani, G.S. 1992. Long‐range correlations in DNA and the coding regions. J. Theor. Biol. 158:447‐464.
   Mural, R.J., Einstein, J.R., Guan, X., Mann, R.C., and Uberbacher, E.C. 1992. An artificial intelligence approach to DNA sequence feature recognition. Trends Biotech. 10:67‐69.
   Snyder, E.E. and Stormo, G.D. 1993. Identification of coding regions in genomic DNA sequences: An application of dynamic programming and neural networks. Nucl. Acids Res. 21:607‐613.
   Solovyev, V.V., Salamov, A.A., and Lawrence, C.B. 1994. Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames. Nucl. Acids Res. 22:5156‐5163.
   Staden, R. 1984. Measurements of the effects that coding for a protein has on a DNA sequence and their use for finding genes. Nucl. Acids Res. 12:505‐519.
   Uberbacher, E.C. and Mural, R.J. 1991. Locating protein‐coding regions in human DNA sequences by a multiple sensor‐neural network approach. Proc. Natl. Acad. Sci. U.S.A. 88:11261‐11265.
   Xu, Y., Mural, R., Shah, M., and Uberbacher, E. 1994a. Recognizing exons in genomic sequence using GRAIL II. In Genetic Engineering, Principles and Methods (J.K. Setlow, ed.) vol. 15, pp. 241‐253. Plenum, New York.
   Xu, Y., Mural, R.J., and Uberbacher, E.C. 1994b. Constructing gene models from accurately‐predicted exons: An application of dynamic programming. CABIOS 10:613‐623.

Key References

   Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
   Antequera, F. and Bird, A. 1993. Number of CpG islands and genes in human and mouse. Proc. Natl. Acad. Sci. U.S.A. 90:11995‐11999.
   Bairoch, A. 1993. The PROSITE dictionary of sites and patterns in proteins, its current status. Nucl. Acids Res. 21:3097‐3103.
   Bairoch, A. and Boeckmann, B. 1993. The SWISS‐PROT protein sequence data bank, recent developments. Nucl. Acids Res. 21:3093‐3094.
   Beck, S., Kelly, A., Radley, E., Khurshid, F., Alderton, R.P., and Trowsdale, J. 1992. DNA sequence analysis of 66 kb of the human MHC class II region encoding a cluster of genes for antigen processing. J. Mol. Biol. 228:433‐441.
   Benson, D., Lipman, D.J., and Ostell, J. 1993. GenBank. Nucl. Acids Res. 21:2963‐2965.
   Bilofsky, H.S. and Burks, C. 1988. The GenBank genetic sequence data bank. Nucl. Acids Res. 16:1861‐1864.
   Boguski, M.S., Lowe, T.M., and Tolstoshev, C.M. 1993. dbEST—database for “expressed sequence tags.” Nature Genet. 4:332‐333.
   Brody, L.C., Abel, K.J., Castilla, L.H., Couch, F.J., McKinley, D.R., Yin, G.Y., Ho, P.P., Merajver, S., Chandrasekharappa, S.C., Xu, J., Cole, J.L., Struewing, J.P., Valdes, J.M., Collins, F.S., and Weber, B.L. 1995. Construction of a transcription map surrounding the BRCA1 locus of human chromosome 17. Genomics 26:238‐247.
   Fields, C., Adams, M.D., White, O., and Venter, J.C. 1994. How many genes in the human genome? Nature Genet. 7:345‐346.
   Gardiner‐Garden, M. and Frommer, M. 1987. CpG islands in vertebrate genomes. J. Mol. Biol. 196:261‐282.
   John, R.M., Robbins, C.A., and Myers, R.M. 1994. Identification of genes within CpG‐enriched DNA from human chromosome 4p16.3. Hum. Mol. Gen. 3:1611‐1616.
   Jurka, J., Walichiewicz, J., and Milosavljevic, A. 1992. Prototypic sequences from human repetitive DNA. J. Mol. Evol. 35:286‐291.
   Larsen, F., Gundersen, G., Lopez, R., and Prydz, H. 1992. CpG islands as gene markers in the human genome. Genomics 13:1095‐1107.
   Lawrence, B.J., Schwabe, W., Kloschis, P., Coy, J.F., Poustka, A., Brennan, M.B., and Hochgeschwender, U. 1994. Rapid identification of gene sequences for transcriptional map assembly by direct cDNA screening of genomic reference libraries. Hum. Mol. Gen. 3:2014‐2023.
   Marshall, E. 1995. A strategy for sequencing the genome 5 years early. Science 267:783‐784.
   Pearson, W.R. and Lipman, D.J. 1988. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. U.S.A. 85:2444‐2448.
   Peltoketo, H., Isomaa, V., Maeentausta, O., and Vihko, R. 1988. Complete amino acid sequence of human placenta 17‐β‐hydroxysteroid dehydrogenase deduced from cDNA. FEBS Lett. 239:73‐77.
   Smith, T.F. and Waterman, M.S. 1981. Identification of common molecular subsequences. J. Mol. Biol. 147:195‐197.
   Smith, M.W., Holmsen, A.L., Wei, Y.H., Peterson, M., and Evans, G.A. 1994. Genomic sequence sampling: A strategy for high resolution sequence‐based physical mapping of complex genomes. Nature Genet. 6:40‐47.
   Wiginton, D.A., Kaplan, D., States, J.C., Akeson, A.L., Perme, C.M., Bilyk, I.J., Vaughn, A.J., Lattier, D.C., and Hutton, J.J. 1986. Complete sequence and structure of the gene for human adenosine deaminase. Biochemistry 25:8234‐8244.
   Xu, H., Wei, H., Tassone, F., Graw, F., Gardiner, K., and Weissman, S. 1995a. Search for genes from the dark band region of chromosome 21. Genomics 27:1‐8.
   Xu, Y., Mural, R.J., and Uberbacher, E.C. 1995b. Correcting sequencing errors in DNA coding regions using a dynamic programming approach. CABIOS 11:117‐124.