GrailEXP and Genome Analysis Pipeline for Genome Annotation

Edward C. Uberbacher¹, Doug Hyatt¹, Manesh Shah¹

¹ Oak Ridge National Laboratory, Oak Ridge, Tennessee
Publication Name: Current Protocols in Human Genetics
Unit Number: Unit 6.5
DOI: 10.1002/0471142905.hg0605s39
Online Posting Date: February 2004
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


The Gene Recognition and Analysis Internet Link (GRAIL) is one of the most widely used systems for evaluating the protein‐coding potential of anonymous DNA sequences. This unit describes the use of the XGRAIL and genQuest client‐server applications to locate exons in DNA sequences, to develop gene models, and to search databases for homologs. A support protocol describes how to obtain the GRAIL and genQuest client software by anonymous FTP.


Table of Contents

  • Basic Protocol 1: Performing Gene Predictions Using the GrailEXP Web Interface
  • Alternate Protocol 1: Using Genome Analysis Pipeline for Comprehensive Analysis of DNA Sequences
  • Commentary
  • Literature Cited
  • Figures


Basic Protocol 1: Performing Gene Predictions Using the GrailEXP Web Interface

  • Any computer workstation (PC, Macintosh, Unix, Linux) with Web access
  • Web browser (e.g., Netscape Navigator, Microsoft Internet Explorer)
  • DNA sequence of interest in Raw or FASTA format
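
As an illustration of the two accepted input formats, the following minimal Python sketch normalizes either a raw sequence or a FASTA record into a single FASTA record before it is pasted into a Web form. The function name `to_fasta`, the default record name `query`, and the 60-column line width are assumptions made for this example; they do not describe how GrailEXP itself parses its input.

```python
def to_fasta(text: str, name: str = "query", width: int = 60) -> str:
    """Return `text` (raw sequence or one FASTA record) as a FASTA string.

    Hypothetical helper for illustration only; not part of GrailEXP.
    """
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(">"):
        # Input is already FASTA: keep its header line as-is.
        header, body = lines[0], lines[1:]
    else:
        # Raw input: supply an assumed header.
        header, body = f">{name}", lines
    # Keep letters only, dropping the whitespace and position numbers
    # that often accompany a pasted raw sequence.
    seq = "".join(ch for ch in "".join(body) if ch.isalpha()).upper()
    if not seq:
        raise ValueError("no sequence residues found")
    wrapped = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + wrapped) + "\n"


if __name__ == "__main__":
    print(to_fasta("acgt acgt 10\nttga"), end="")
```

The sketch handles a single record only; a multi-record FASTA file would need to be split on lines beginning with `>` first.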


