Using GlimmerM to Find Genes in Eukaryotic Genomes

Mihaela Pertea¹, Steven L. Salzberg¹

¹ The Institute for Genomic Research, Rockville, Maryland
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 4.4
DOI:  10.1002/0471250953.bi0404s00
Online Posting Date:  November, 2002
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


GlimmerM is a eukaryotic gene finder that has been used in the annotation of the genomes of Plasmodium falciparum (the malaria parasite), the model plant Arabidopsis thaliana, Oryza sativa (rice), the parasite Theileria parva, and the fungus Aspergillus fumigatus. A unique feature of the system compared to other eukaryotic gene finders is a module that allows users to provide their own data and train GlimmerM for any organism.

Table of Contents

  • Basic Protocol 1: Running GlimmerM Locally to Identify Genes
  • Support Protocol 1: Training GlimmerM for a Specific Organism
  • Alternate Protocol 1: Running GlimmerM via the Web
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables

Literature Cited

   Altschul, S., Gish, W., Miller, W., Myers, E., and Lipman, D. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
   Altschul, S., Madden, T., Schaffer, A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. 1997. Gapped BLAST and PSI‐BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25:3389‐3402.
   The Arabidopsis Genome Initiative. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796‐815.
   Bowman, S., Lawson, D., Basham, D., Brown, D., Chillingworth, T., Churcher, C.M., Craig, A., Davies, R.M., Devlin, K., Feltwell, T., Gentles, S., Gwilliam, R., Hamlin, N., Harris, D., Holroyd, S., Hornsby, T., Horrocks, P., Jagels, K., Jassal, B., Kyes, S., McLean, J., Moule, S., Mungall, K., Murphy, L., Barrell, B.G., et al. 1999. The complete nucleotide sequence of chromosome 3 of Plasmodium falciparum. Nature 400:532‐538.
   Brendel, V. and Kleffe, J. 1998. Prediction of locally optimal splice sites in plant pre‐mRNA with applications to gene identification in Arabidopsis thaliana genomic DNA. Nucleic Acids Res. 26:4748‐4757.
   Burge, C. 1997. Ph.D. thesis. Identification of Genes in Human Genomic DNA. Stanford University, Calif.
   Burge, C.B. and Karlin, S. 1997. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268:78‐94.
   Dietrich, R.C., Incorvaia, R., and Padgett, R.A. 1997. Terminal intron dinucleotide sequences do not distinguish between U2‐ and U12‐dependent introns. Mol. Cell. 1:151‐160.
   Florea, L., Hartzell, G., Zhang, Z., Rubin, G.M., and Miller, W. 1998. A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Res. 8:967‐974.
   Fraser, C.M., Casjens, S., Huang, W.M., Sutton, G.G., Clayton, R., Lathigra, R., White, O., Ketchum, K.A., Dodson, R., Hickey, E.K., Gwinn, M., Dougherty, B., Tomb, J.F., Fleischmann, R.D., Richardson, D., Peterson, J., Kerlavage, A.R., Quackenbush, J., Salzberg, S., Hanson, M., van Vugt, R., Palmer, N., Adams, M.D., Gocayne, J., Venter, J.C., et al. 1997. Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi. Nature 390:580‐586.
   Fraser, C.M., Norris, S.J., Weinstock, G.M., White, O., Sutton, G.G., Dodson, R., Gwinn, M., Hickey, E.K., Clayton, R., Ketchum, K.A., Sodergren, E., Hardham, J.M., McLeod, M.P., Salzberg, S., Peterson, J., Khalak, H., Richardson, D., Howell, J.K., Chidambaram, M., Utterback, T., McDonald, L., Artiach, P., Bowman, C., Cotton, M.D., Venter, J.C., et al. 1998. Complete genome sequence of Treponema pallidum, the syphilis spirochete. Science 281:375‐388.
   Gardner, M.J., Tettelin, H., Carucci, D.J., Cummings, L.M., Aravind, L., Koonin, E.V., Shallom, S., Mason, T., Yu, K., Fujii, C., Pederson, J., Shen, K., Jing, J., Aston, C., Lai, Z., Schwartz, D.C., Pertea, M., Salzberg, S., Zhou, L., Sutton, G.G., Clayton, R., White, O., Smith, H.O., Fraser, C.M., Hoffman, S.L. 1998. Chromosome 2 sequence of the human malaria parasite Plasmodium falciparum. Science 282:1126‐1132.
   Heidelberg, J.F., Eisen, J.A., Nelson, W.C., Clayton, R.A., Gwinn, M.L., Dodson, R.J., Haft, D.H., Hickey, E.K., Peterson, J.D., Umayam, L., Gill, S.R., Nelson, K.E., Read, T.D., Tettelin, H., Richardson, D., Ermolaeva, M.D., Vamathevan, J., Bass, S., Qin, H., Dragoi, I., Sellers, P., McDonald, L., Utterback, T., Fleishmann, R.D., Nierman, W.C., and White, O. 2000. DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature 406:477‐483.
   Jelinek, F. 1997. Statistical Methods for Speech Recognition. MIT Press, Cambridge, MA.
   Murthy, S.K., Kasif, S., Salzberg, S., and Beigel, R. 1993. OC1: Randomized induction of oblique decision trees. Proc. 11th Natl. Conf. on Artificial Intelligence 322‐327.
   Murthy, S.K., Kasif, S., and Salzberg, S. 1994. A system for induction of oblique decision trees. J. of Artificial Intelligence Res. 2:1‐32.
   Nelson, K.E., Eisen, J.A., and Fraser, C.M. 2001. Genome of Thermotoga maritima MSB8. Methods Enzymol. 330:169‐180.
   Pavy, N., Rombauts, S., Dehais, P., Mathe, C., Ramana, D.V., Leroy, P., and Rouze, P. 1999. Evaluation of gene prediction software using a genomic data set: Application to Arabidopsis thaliana sequences. Bioinformatics 15:887‐899.
   Pertea, M., Salzberg, S.L., and Gardner, M.J. 2000. Finding genes in Plasmodium falciparum. Nature 404:34‐35.
   Pertea, M. and Salzberg, S.L. 2002. Computational gene finding in plants. Plant Molecular Biology 48:9‐48.
   Pertea, M., Lin, X., and Salzberg, S.L. 2001. GeneSplicer: A new computational method for splice site prediction. Nucleic Acids Res. 29:1185‐1190.
   Salzberg, S.L. 1997. A method for identifying splice sites and translational start sites in eukaryotic mRNA. Comput. Appl. Biosci. 13:365‐376.
   Salzberg, S.L., Delcher, A.L., Kasif, S., and White, O. 1998a. Microbial gene identification using interpolated Markov models. Nucleic Acids Res. 26:544‐548.
   Salzberg, S., Delcher, A.L., Fasman, K.H., and Henderson, J. 1998b. A decision tree system for finding genes in DNA. J. Comput. Biol. 5:667‐680.
   Salzberg, S.L., Pertea, M., Delcher, A.L., Gardner, M.J., and Tettelin, H. 1999. Interpolated Markov models for eukaryotic gene finding. Genomics 59:24‐31.
   Stephens, R., Kalman, S., Lammel, C., Fan, J., Marathe, R., Aravind, L., Mitchell, W., Olinger, L., Tatusov, R., Zhao, Q., Koonin, E.V., and Davis, R.W. 1998. Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. Science 282:754‐759.
   Wu, Q. and Krainer, A.R. 1996. U1‐mediated exon definition interactions between AT‐AC and GT‐AG introns. Science 274:1005‐1008.
   Yuan, Q., Quackenbush, J., Sultana, R., Pertea, M., Salzberg, S.L., and Buell, C.R. 2001. Rice bioinformatics: Analysis of rice sequence data and leveraging the data to other plant species. Plant Physiol. 125:1166‐1174.
Key References
   Salzberg et al., 1999. See above.
  This paper introduces the GlimmerM method, which was first used to find genes in Plasmodium falciparum, and describes how GlimmerM was applied to the annotation of chromosome 2 of P. falciparum.
Internet Resources
  GlimmerM Web site.
  A preliminary annotation of chromosomes 10, 11, and 14 of P. falciparum. (This will change when the P. falciparum genome is completed.)