Using N‐SCAN or TWINSCAN to Predict Gene Structures in Genomic DNA Sequences

Marijke J. van Baren1, Brian C. Koebbe1, Michael R. Brent1

1 Washington University, St. Louis, Missouri
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 4.8
DOI:  10.1002/0471250953.bi0408s20
Online Posting Date:  December, 2007


N‐SCAN is a gene‐prediction system that combines the methods of ab initio predictors like GENSCAN with information derived from genome comparison. It is the latest in the TWINSCAN series of programs. This unit describes the use of N‐SCAN to identify gene structures in eukaryotic genomic sequences. Protocols are provided for using N‐SCAN through its Web interface and from the command line in a Linux environment. Appropriate parameter settings, input‐sequence processing, and the choice of genome for comparison are discussed in detail. Curr. Protoc. Bioinform. 20:4.8.1‐4.8.16. © 2007 by John Wiley & Sons, Inc.

Keywords: N‐SCAN; TWINSCAN; gene prediction; sequence alignment; comparative genome analysis; cross‐species sequence comparison; genome annotation

Table of Contents

  • Introduction
  • Basic Protocol 1: Using the N‐SCAN Web Server
  • Run N‐SCAN from the Command Line on a Local Computer
  • Alternate Protocol 1: Preparing Data Files and Running N‐SCAN Manually
  • Alternate Protocol 2: Using nscan_driver.pl on a Local Computer
  • Support Protocol 1: Obtaining and Installing N‐SCAN on a Local Computer
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures

Literature Cited

   Alexandersson, M., Cawley, S., and Pachter, L. 2003. SLAM‐Cross‐species gene finding and alignment with a generalized pair hidden Markov model. Genome Res. 13:496‐502.
   Allen, J.E. and Salzberg, S.L. 2005. JIGSAW: Integration of multiple sources of evidence for gene prediction. Bioinformatics 21:3596‐3603.
   Allen, J.E., Pertea, M., and Salzberg, S.L. 2004. Computational gene prediction using multiple sources of evidence. Genome Res. 14:142‐148.
   Brown, R.H., Gross, S.S., and Brent, M.R. 2005. Begin at the beginning: Predicting genes with 5′ UTRs. Genome Res. 15:742‐747.
   Burge, C. 1997. Identification of Genes in Human Genomic DNA. Ph.D. thesis, Stanford University, Stanford, Calif.
   Burge, C. and Karlin, S. 1997. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268:78‐94.
   Elsik, C.G., Mackey, A.J., Reese, J.T., Milshina, N.V., Roos, D.S., and Weinstock, G.M. 2007. Creating a honey bee consensus gene set. Genome Biol. 8:R13.
   Flicek, P., Keibler, E., Hu, P., Korf, I., and Brent, M.R. 2003. Leveraging the mouse genome for gene prediction in human: From whole‐genome shotgun reads to a global synteny map. Genome Res. 13:46‐54.
   Gross, S.S. and Brent, M.R. 2006. Using multiple alignments to improve gene prediction. J. Comput. Biol. 13:379‐393.
   Guigo, R., Agarwal, P., Abril, J.F., Burset, M., and Fickett, J.W. 2000. An assessment of gene prediction accuracy in large DNA sequences. Genome Res. 10:1631‐1642.
   Guigo, R., Dermitzakis, E.T., Agarwal, P., Ponting, C.P., Parra, G., Reymond, A., Abril, J.F., Keibler, E., Lyle, R., Ucla, C., Antonarakis, S.E., and Brent, M.R. 2003. Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. Proc. Natl. Acad. Sci. U.S.A. 100:1140‐1145.
   Guigo, R., Flicek, P., Abril, J.F., Reymond, A., Lagarde, J., Denoeud, F., Antonarakis, S., Ashburner, M., Bajic, V.B., Birney, E., Castelo, R., Eyras, E., Ucla, C., Gingeras, T.R., Harrow, J., Hubbard, T., Lewis, S.E., and Reese, M.G. 2006. EGASP: The human ENCODE Genome Annotation Assessment Project. Genome Biol. 7:S21‐S31.
   Hubbard, T.J., Aken, B.L., Beal, K., Ballester, B., Caccamo, M., Chen, Y., Clarke, L., Coates, G., Cunningham, F., Cutts, T., Down, T., Dyer, S.C., Fitzgerald, S., Fernandez‐Banet, J., Graf, S., Haider, S., Hammond, M., Herrero, J., Holland, R., Howe, K., Howe, K., Johnson, N., Kahari, A., Keefe, D., Kokocinski, F., Kulesha, E., Lawson, D., Longden, I., Melsopp, C., Megy, K., Meidl, P., Ouverdin, B., Parker, A., Prlic, A., Rice, S., Rios, D., Schuster, M., Sealy, I., Severin, J., Slater, G., Smedley, D., Spudich, G., Trevanion, S., Vilella, A., Vogel, J., White, S., Wood, M., Cox, T., Curwen, V., Durbin, R., Fernandez‐Suarez, X.M., Flicek, P., Kasprzyk, A., Proctor, G., Searle, S., Smith, J., Ureta‐Vidal, A., and Birney, E. 2007. Ensembl 2007. Nucleic Acids Res. 35:D610‐D617.
   Keibler, E. and Brent, M.R. 2003. Eval: A software package for analysis of genome annotations. BMC Bioinformatics 4:50.
   Korf, I., Flicek, P., Duan, D., and Brent, M.R. 2001. Integrating genomic homology into gene structure prediction. Bioinformatics 17:S140‐S148.
   Mouse Genome Sequencing Consortium et al. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420:520‐562.
   Parra, G., Agarwal, P., Abril, J.F., Wiehe, T., Fickett, J.W., and Guigo, R. 2003. Comparative gene prediction in human and mouse. Genome Res. 13:108‐117.
   Salamov, A.A. and Solovyev, V.V. 2000. Ab initio gene finding in Drosophila genomic DNA. Genome Res. 10:516‐522.
   Stanke, M. and Waack, S. 2003. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19:II215‐II225.
   Stanke, M., Tzvetkova, A., and Morgenstern, B. 2006. AUGUSTUS at EGASP: Using EST, protein and genomic alignments for improved gene prediction in the human genome. Genome Biol. 7:S11.1‐S11.8.
   van Baren, M.J. and Brent, M.R. 2006. Iterative gene prediction and pseudogene removal improves genome annotation. Genome Res. 16:678‐685.
   Wei, C. and Brent, M.R. 2006. Using ESTs to improve the accuracy of de novo gene prediction. BMC Bioinformatics 7:327.