Using geneid to Identify Genes

Enrique Blanco1, Genís Parra1, Roderic Guigó1

1 Centre de Regulació Genòmica, Institut Municipal d'Investigació Mèdica, Universitat Pompeu Fabra, Barcelona, Spain
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 4.3
DOI:  10.1002/0471250953.bi0403s18
Online Posting Date:  June, 2007
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

This unit describes the use of geneid, an efficient gene‐finding program that can analyze large genomic sequences, including whole mammalian chromosomes. These sequences can be partially annotated, and geneid can be used to refine this initial annotation. Training geneid is relatively easy, and parameter configurations exist for a number of eukaryotic species. Geneid produces output in a variety of standard formats, so the results can be processed by a variety of software tools, including visualization programs. The geneid software is in the public domain and is under constant development. It is easy to install and use. Exhaustive benchmark evaluations show that geneid compares favorably with other existing gene‐finding tools.

Keywords: Gene identification; genes; exons; splicing; genome annotation; bioinformatics

     
 
Table of Contents

  • Basic Protocol 1: Using the geneid Unix Application to Predict Genes
  • Basic Protocol 2: Visualizing geneid Predictions
  • Basic Protocol 3: Using External Information to Solidify geneid Predictions
  • Alternate Protocol 1: Using the geneid Web Server to Predict Genes
  • Support Protocol 1: How to Get geneid and Visualization Programs
  • Guidelines for Understanding Results
  • Commentary
  • Figures
  • Tables
     
 
Literature Cited
   Abril, J.F. and Guigó, R. 2000. gff2ps: Visualizing genomic annotations. Bioinformatics 16:743‐744.
   Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
   Aury, J.M., Jaillon, O., Duret, L., Noel, B., Jubin, C., Porcel, B.M., Segurens, B., Daubin, V., Anthouard, V., Aiach, N., Arnaiz, O., Billaut, A., Beisson, J., Blanc, I., Bouhouche, K., Camara, F., Duharcourt, S., Guigó, R., Gogendeau, D., Katinka, M., Keller, A.M., Kissmehl, R., Klotz, C., Koll, F., Le Mouel, A., Lepere, G., Malinsky, S., Nowacki, M., Nowak, J.K., Plattner, H., Poulain, J., Ruiz, F., Serrano, V., Zagulski, M., Dessen, P., Betermier, M., Weissenbach, J., Scarpelli, C., Schachter, V., Sperling, L., Meyer, E., Cohen, J., and Wincker, P. 2006. Global trends of whole‐genome duplications revealed by the ciliate Paramecium tetraurelia. Nature 444:171‐178.
   Birney, E. and Durbin, R. 2000. Using GeneWise in the Drosophila annotation experiment. Genome Res. 10:547‐548.
   Brent, M.R. and Guigó, R. 2004. Recent advances in gene structure prediction. Curr. Opin. Struct. Biol. 14:264‐272.
   Castellano, S., Novoselov, S.V., Kryukov, G.V., Lescure, A., Blanco, E., Krol, A., Gladyshev, V.N., and Guigó, R. 2004. Reconsidering the evolution of eukaryotic selenoproteins: A novel nonmammalian family with scattered phylogenetic distribution. EMBO Rep. 5:71‐77.
   Castellano, S., Morozova, N., Morey, M., Berry, M.J., Serras, F., Corominas, M., and Guigó, R. 2001. In silico identification of novel selenoproteins in the Drosophila melanogaster genome. EMBO Rep. 2:697‐702.
   Fagioli, M., Alcalay, M., Pandolfi, P.P., Venturini, L., Mencarelli, A., Simeone, A., Acampora, D., Grignani, F., and Pelicci, P.G. 1992. Alternative splicing of PML transcripts predicts coexpression of several carboxy‐terminally different protein isoforms. Oncogene 7:1083‐1091.
   Glockner, G., Eichinger, L., Szafranski, K., Pachebat, J.A., Bankier, A.T., Dear, P.H., Lehmann, R., Baumgart, C., Parra, G., Abril, J.F., Guigó, R., Kumpf, K., Tunggal, B., Cox, E., Quail, M.A., Platzer, M., Rosenthal, A., Noegel, A.A.; Dictyostelium Genome Sequencing Consortium. 2002. Sequence and analysis of chromosome 2 of Dictyostelium discoideum. Nature 418:79‐85.
   Guigó, R. 1998. Assembling genes from predicted exons in linear time with dynamic programming. J. Comp. Biol. 5:681‐702.
   Guigó, R., Knudsen, S., Drake, N., and Smith, T. 1992. Prediction of gene structure. J. Mol. Biol. 226:141‐157.
   Guigó, R., Flicek, P., Abril, J.F., Reymond, A., Lagarde, J., Denoeud, F., Antonarakis, S., Ashburner, M., Bajic, V.B., Birney, E., Castelo, R., Eyras, E., Ucla, C., Gingeras, T.R., Harrow, J., Hubbard, T., Lewis, S.E., and Reese, M.G. 2006. EGASP: The human ENCODE Genome Annotation Assessment Project. Genome Biol. 7:S2.1‐S2.31.
   Hinrichs, A.S., Karolchik, D., Baertsch, R., Barber, G.P., Bejerano, G., Clawson, H., Diekhans, M., Furey, T.S., Harte, R.A., Hsu, F., Hillman‐Jackson, J., Kuhn, R.M., Pedersen, J.S., Pohl, A., Raney, B.J., Rosenbloom, K.R., Siepel, A., Smith, K.E., Sugnet, C.W., Sultan‐Qurraie, A., Thomas, D.J., Trumbower, H., Weber, R.J., Weirauch, M., Zweig, A.S., Haussler, D., and Kent, W.J. 2006. The UCSC Genome Browser Database: Update 2006. Nucl. Acids Res. 34:D590‐D598.
   International Chicken Genome Sequencing Consortium. 2004. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432:695‐716.
   Jaillon, O., Aury, J.M., Brunet, F., Petit, J.L., Stange‐Thomann, N., Mauceli, E., Bouneau, L., Fischer, C., Ozouf‐Costaz, C., Bernot, A., Nicaud, S., Jaffe, D., Fisher, S., Lutfalla, G., Dossat, C., Segurens, B., Dasilva, C., Salanoubat, M., Levy, M., Boudet, N., Castellano, S., Anthouard, V., Jubin, C., Castelli, V., Katinka, M., Vacherie, B., Biemont, C., Skalli, Z., Cattolico, L., Poulain, J., De Berardinis, V., Cruaud, C., Duprat, S., Brottier, P., Coutanceau, J.P., Gouzy, J., Parra, G., Lardier, G., Chapple, C., McKernan, K.J., McEwan, P., Bosak, S., Kellis, M., Volff, J.N., Guigó, R., Zody, M.C., Mesirov, J., Lindblad‐Toh, K., Birren, B., Nusbaum, C., Kahn, D., Robinson‐Rechavi, M., Laudet, V., Schachter, V., Quetier, F., Saurin, W., Scarpelli, C., Wincker, P., Lander, E.S., Weissenbach, J., and Roest Crollius, H. 2004. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto‐karyotype. Nature 431:916‐917.
   Lewis, S.E., Searle, S.M.J., Harris, N., Gibson, M., Iyer, V., Richter, J., Wiel, C., Bayraktaroglu, L., Birney, E., Crosby, M.A., Kaminker, J.S., Matthews, B., Prochnik, S.E., Smith, C.D., Tupy, J.L., Rubin, G.M., Misra, S., Mungall, C.J., and Clamp, M.E. 2002. Apollo: A sequence annotation editor. Genome Biol. 3:research0082.
   Mott, R. 1997. EST_GENOME: A program to align spliced DNA sequences to unspliced genomic DNA. Comp. Appl. Biosci. 13:477‐478.
   Mouse Genome Sequencing Consortium. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420:520‐562.
   Parra, G., Blanco, E., and Guigó, R. 2000. geneid in Drosophila. Genome Res. 10:511‐515.
   Parra, G., Agarwal, P., Abril, J.F., Wiehe, T., Fickett, J.W., and Guigó, R. 2003. Comparative gene prediction in human and mouse. Genome Res. 13:108‐117.
   Pearson, W.R. 1990. Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 183:63‐98.
   Rat Genome Sequencing Project Consortium. 2004. Genome sequence of the brown Norway rat yields insights into mammalian evolution. Nature 428:493‐521.
   Stormo, G.D. 2000. Gene‐finding approaches for eukaryotes. Genome Res. 10:394‐397.
   Zhang, M.Q. 2002. Computational prediction of eukaryotic protein‐coding genes. Nat. Rev. Genet. 3:698‐709.
Key References
   Guigó et al., 1992. See above.
  Description of the first implementation of geneid.
   Guigó et al., 2006. See above.
  A community experiment to assess the state‐of‐the‐art in one percent of the human genome sequence.
   Parra et al., 2000. See above.
  Description of geneid v 1.0 used in the Adh region of Drosophila melanogaster.
Internet Resources
   http://genome.imim.es/software/geneid/index.html
  This is the geneid Web page.
   http://genome.imim.es/software/gfftools/GFF2PS.html
  This is the gff2ps Web page.
   http://www.fruitfly.org/annot/apollo/
  This is the Apollo Web page (see UNIT ).
   http://genome.ucsc.edu/
  This is the UCSC Genome Browser (golden path; UNIT ).
   http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml
  This is the GFF format Web page.
   http://www.w3.org/XML/
  This is the XML format Web page.