An Introduction to Genome Annotation

Michael S. Campbell¹, Mark Yandell²

¹ Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah; ² USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, Utah
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 4.1
DOI: 10.1002/0471250953.bi0401s52
Online Posting Date: December 2015

Abstract

Genome projects have evolved from large international undertakings into tractable endeavors for a single lab. Accurate genome annotation is critical for successful genomic, genetic, and molecular biology experiments. These annotations can be generated using a variety of approaches and available software tools. This unit describes methods for genome annotation and several software tools commonly used in gene annotation. © 2015 by John Wiley & Sons, Inc.

Table of Contents

  • Introduction
  • Evidence Generation
  • Synthesis
  • Quality Control
  • Quality Metrics
  • Community Curation
  • Annotation Updating and Management
  • Future Considerations
  • Figures
  • Tables

Literature Cited

  Abeel, T., Van Parys, T., Saeys, Y., Galagan, J., and Van De Peer, Y. 2012. GenomeView: a next‐generation genome browser. Nucleic Acids Res. 40:1‐10. doi: 10.1093/nar/gkr995.
  Adams, M.D., Celniker, S.E., Holt, R.A., Evans, C.A., Gocayne, J.D., Amanatides, P.G., Scherer, S.E., Li, P.W., Hoskins, R.A., Galle, R.F., George, R.A., Lewis, S.E., Richards, S., Ashburner, M., Henderson, S.N., Sutton, G.G., Wortman, J.R., Yandell, M.D., Zhang, Q., Chen, L.X., Brandon, R.C., Rogers, Y.H., Blazej, R.G., Champe, M., Pfeiffer, B.D., Wan, K.H., Doyle, C., Baxter, E.G., Helt, G., Nelson, C.R., Gabor, G.L., Abril, J.F., Agbayani, A., An, H.J., Andrews‐Pfannkoch, C., Baldwin, D., Ballew, R.M., Basu, A., Baxendale, J., Bayraktaroglu, L., Beasley, E.M., Beeson, K.Y., Benos, P.V., Berman, B.P., Bhandari, D., Bolshakov, S., Borkova, D., Botchan, M.R., Bouck, J., Brokstein, P., Brottier, P., Burtis, K.C., Busam, D.A., Butler, H., Cadieu, E., Center, A., Chandra, I., Cherry, J.M., Cawley, S., Dahlke, C., Davenport, L.B., Davies, P., de Pablos, B., Delcher, A., Deng, Z., Mays, A.D., Dew, I., Dietz, S.M., Dodson, K., Doup, L.E., Downes, M., Dugan‐Rocha, S., Dunkov, B.C., Dunn, P., Durbin, K.J., Evangelista, C.C., Ferraz, C., Ferriera, S., Fleischmann, W., Fosler, C., Gabrielian, A.E., Garg, N.S., Gelbart, W.M., Glasser, K., Glodek, A., Gong, F., Gorrell, J.H., Gu, Z., Guan, P., Harris, M., Harris, N.L., Harvey, D., Heiman, T.J., Hernandez, J.R., Houck, J., Hostin, D., Houston, K.A., Howland, T.J., Wei, M.H., Ibegwam, C., Jalali, M., Kalush, F., Karpen, G.H., Ke, Z., Kennison, J.A., Ketchum, K.A., Kimmel, B.E., Kodira, C.D., Kraft, C., Kravitz, S., Kulp, D., Lai, Z., Lasko, P., Lei, Y., Levitsky, A.A., Li, J., Li, Z., Liang, Y., Lin, X., Liu, X., Mattei, B., McIntosh, T.C., McLeod, M.P., McPherson, D., Merkulov, G., Milshina, N.V., Mobarry, C., Morris, J., Moshrefi, A., Mount, S.M., Moy, M., Murphy, B., Murphy, L., Muzny, D.M., Nelson, D.L., Nelson, D.R., Nelson, K.A., Nixon, K., Nusskern, D.R., Pacleb, J.M., Palazzolo, M., Pittman, G.S., Pan, S., Pollard, J., Puri, V., Reese, M.G., Reinert, K., Remington, K., Saunders, R.D., Scheeler, F., Shen, H., Shue, B.C., Sidén‐Kiamos, I., Simpson, M., Skupski, M.P., Smith, T., Spier, E., Spradling, A.C., Stapleton, M., Strong, R., Sun, E., Svirskas, R., Tector, C., Turner, R., Venter, E., Wang, A.H., Wang, X., Wang, Z.Y., Wassarman, D.A., Weinstock, G.M., Weissenbach, J., Williams, S.M., Woodage, T., Worley, K.C., Wu, D., Yang, S., Yao, Q.A., Ye, J., Yeh, R.F., Zaveri, J.S., Zhan, M., Zhang, G., Zhao, Q., Zheng, L., Zheng, X.H., Zhong, F.N., Zhong, W., Zhou, X., Zhu, S., Zhu, X., Smith, H.O., Gibbs, R.A., Myers, E.W., Rubin, G.M., and Venter, J.C. 2000. The genome sequence of Drosophila melanogaster. Science 287:2185‐2195. doi: 10.1126/science.287.5461.2185.
  Allen, J.E. and Salzberg, S.L. 2005. JIGSAW: integration of multiple sources of evidence for gene prediction. Bioinformatics 21:3596‐3603. doi: 10.1093/bioinformatics/bti609.
  Amemiya, C.T., Alföldi, J., Lee, A.P., Fan, S., Philippe, H., Maccallum, I., Braasch, I., Manousaki, T., Schneider, I., Rohner, N., Organ, C., Chalopin, D., Smith, J.J., Robinson, M., Dorrington, R.A., Gerdol, M., Aken, B., Biscotti, M.A., Barucca, M., Baurain, D., Berlin, A.M., Blatch, G.L., Buonocore, F., Burmester, T., Campbell, M.S., Canapa, A., Cannon, J.P., Christoffels, A., De Moro, G., Edkins, A.L., Fan, L., Fausto, A.M., Feiner, N., Forconi, M., Gamieldien, J., Gnerre, S., Gnirke, A., Goldstone, J.V., Haerty, W., Hahn, M.E., Hesse, U., Hoffmann, S., Johnson, J., Karchner, S.I., Kuraku, S., Lara, M., Levin, J.Z., Litman, G.W., Mauceli, E., Miyake, T., Mueller, M.G., Nelson, D.R., Nitsche, A., Olmo, E., Ota, T., Pallavicini, A., Panji, S., Picone, B., Ponting, C.P., Prohaska, S.J., Przybylski, D., Saha, N.R., Ravi, V., Ribeiro, F.J., Sauka‐Spengler, T., Scapigliati, G., Searle, S.M., Sharpe, T., Simakov, O., Stadler, P.F., Stegeman, J.J., Sumiyama, K., Tabbaa, D., Tafer, H., Turner‐Maier, J., van Heusden, P., White, S., Williams, L., Yandell, M., Brinkmann, H., Volff, J.N., Tabin, C.J., Shubin, N., Schartl, M., Jaffe, D.B., Postlethwait, J.H., Venkatesh, B., Di Palma, F., Lander, E.S., Meyer, A., and Lindblad‐Toh, K. 2013. The African coelacanth genome provides insights into tetrapod evolution. Nature 496:311‐316. doi: 10.1038/nature12027.
  Angiuoli, S.V., Hotopp, J.C.D., Salzberg, S.L., and Tettelin, H. 2011. Improving pan‐genome annotation using whole genome multiple alignment. BMC Bioinformatics 12:272. doi: 10.1186/1471-2105-12-272.
  Bairoch, A. and Apweiler, R. 2000. The SWISS‐PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28:45‐48. doi: 10.1093/nar/28.1.45.
  Bao, Z. and Eddy, S.R. 2002. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 12:1269‐1276. doi: 10.1101/gr.88502.
  Benson, G. 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27:573‐580. doi: 10.1093/nar/27.2.573.
  Bhagwat, M., Young, L., and Robison, R.R. 2012. Using BLAT to find sequence similarity in closely related genomes. Curr. Protoc. Bioinform. 37:10.8.1–10.8.24.
  Birney, E., Clamp, M., and Durbin, R. 2004. GeneWise and Genomewise. Genome Res. 14:988‐995. doi: 10.1101/gr.1865504.
  Borodovsky, M. and Lomsadze, A. 2011a. Gene identification in prokaryotic genomes, phages, metagenomes, and EST sequences with GeneMarkS suite. Curr. Protoc. Bioinform. 35:4.5.1–4.5.17.
  Borodovsky, M. and Lomsadze, A. 2011b. Eukaryotic gene prediction using GeneMark.hmm‐E and GeneMark‐ES. Curr. Protoc. Bioinform. 35:4.6.1–4.6.10.
  Campbell, M.S., Holt, C., Moore, B., and Yandell, M. 2014a. Genome annotation and curation using MAKER and MAKER‐P. Curr. Protoc. Bioinform. 48:4.11.1‐4.11.39. doi: 10.1002/0471250953.bi0411s48.
  Campbell, M.S., Law, M., Holt, C., Stein, J.C., Moghe, G.D., Hufnagel, D.E., Lei, J., Achawanantakun, R., Jiao, D., Lawrence, C.J., Ware, D., Shiu, S.H., Childs, K.L., Sun, Y., Jiang, N., and Yandell, M. 2014b. MAKER‐P: a tool‐kit for the rapid creation, management, and quality control of plant genome annotations. Plant Physiol. 164:513‐524. doi: 10.1104/pp.113.230144.
  Cantarel, B.L., Korf, I., Robb, S.M.C., Parra, G., Ross, E., Moore, B., Holt, C., Sánchez Alvarado, A., and Yandell, M. 2008. MAKER: an easy‐to‐use annotation pipeline designed for emerging model organism genomes. Genome Res. 18:188‐196. doi: 10.1101/gr.6743907.
  Carbone, L., Harris, R.A., Gnerre, S., Veeramah, K.R., Lorente‐Galdos, B., Huddleston, J., Meyer, T.J., Herrero, J., Roos, C., Aken, B., Anaclerio, F., Archidiacono, N., Baker, C., Barrell, D., Batzer, M.A., Beal, K., Blancher, A., Bohrson, C.L., Brameier, M., Campbell, M.S., Capozzi, O., Casola, C., Chiatante, G., Cree, A., Damert, A., de Jong, P.J., Dumas, L., Fernandez‐Callejo, M., Flicek, P., Fuchs, N.V., Gut, I., Gut, M., Hahn, M.W., Hernandez‐Rodriguez, J., Hillier, L.W., Hubley, R., Ianc, B., Izsvák, Z., Jablonski, N.G., Johnstone, L.M., Karimpour‐Fard, A., Konkel, M.K., Kostka, D., Lazar, N.H., Lee, S.L., Lewis, L.R., Liu, Y., Locke, D.P., Mallick, S., Mendez, F.L., Muffato, M., Nazareth, L.V., Nevonen, K.A., O'Bleness, M., Ochis, C., Odom, D.T., Pollard, K.S., Quilez, J., Reich, D., Rocchi, M., Schumann, G.G., Searle, S., Sikela, J.M., Skollar, G., Smit, A., Sonmez, K., ten Hallers, B., Terhune, E., Thomas, G.W., Ullmer, B., Ventura, M., Walker, J.A., Wall, J.D., Walter, L., Ward, M.C., Wheelan, S.J., Whelan, C.W., White, S., Wilhelm, L.J., Woerner, A.E., Yandell, M., Zhu, B., Hammer, M.F., Marques‐Bonet, T., Eichler, E.E., Fulton, L., Fronick, C., Muzny, D.M., Warren, W.C., Worley, K.C., Rogers, J., Wilson, R.K., and Gibbs, R.A. 2014. Gibbon genome and the fast karyotype evolution of small apes. Nature 513:195‐201. doi: 10.1038/nature13679.
  Carver, T., Harris, S.R., Berriman, M., Parkhill, J., and McQuillan, J.A. 2012. Artemis: an integrated platform for visualization and analysis of high‐throughput sequence‐based experimental data. Bioinformatics 28:464‐469. doi: 10.1093/bioinformatics/btr703.
  Castoe, T.A., de Koning, A.P., Hall, K.T., Card, D.C., Schield, D.R., Fujita, M.K., Ruggiero, R.P., Degner, J.F., Daza, J.M., Gu, W., Reyes‐Velasco, J., Shaney, K.J., Castoe, J.M., Fox, S.E., Poole, A.W., Polanco, D., Dobry, J., Vandewege, M.W., Li, Q., Schott, R.K., Kapusta, A., Minx, P., Feschotte, C., Uetz, P., Ray, D.A., Hoffmann, F.G., Bogden, R., Smith, E.N., Chang, B.S., Vonk, F.J., Casewell, N.R., Henkel, C.V., Richardson, M.K., Mackessy, S.P., Bronikowski, A.M., Bronikowsi, A.M., Yandell, M., Warren, W.C., Secor, S.M., and Pollock, D.D. 2013. The Burmese python genome reveals the molecular basis for extreme adaptation in snakes. Proc. Natl. Acad. Sci. U.S.A. 110:20645‐20650. doi: 10.1073/pnas.1314475110.
  Coghlan, A., Fiedler, T.J., McKay, S.J., Flicek, P., Harris, T.W., Blasiar, D., and Stein, L.D. 2008. nGASP—the nematode genome annotation assessment project. BMC Bioinformatics 9:549. doi: 10.1186/1471-2105-9-549.
  Curwen, V., Eyras, E., Andrews, T.D., Clarke, L., Mongin, E., Searle, S.M.J., and Clamp, M. 2004. The Ensembl automatic gene annotation system. Genome Res. 14:942‐950. doi: 10.1101/gr.1858004.
  Dobin, A. and Gingeras, T.R. 2015. Mapping RNA‐seq reads with STAR. Curr. Protoc. Bioinform. 51:11.14.1‐11.14.19. doi: 10.1002/0471250953.bi1114s51.
  Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M., and Gingeras, T.R. 2013. STAR: ultrafast universal RNA‐seq aligner. Bioinformatics 29:15‐21. doi: 10.1093/bioinformatics/bts635.
  Donlin, M.J. 2009. Using the generic genome browser (GBrowse). Curr. Protoc. Bioinform. 28:9.9.1‐9.9.25.
  Eilbeck, K., Moore, B., Holt, C., and Yandell, M. 2009. Quantitative measures for the management and comparison of annotated genomes. BMC Bioinformatics 10:67. doi: 10.1186/1471-2105-10-67.
  Ellinghaus, D., Kurtz, S., and Willhoeft, U. 2008. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9:18. doi: 10.1186/1471-2105-9-18.
  Elsik, C.G., Worley, K.C., Bennett, A.K., Beye, M., Camara, F., Childers, C.P., de Graaf, D.C., Debyser, G., Deng, J., Devreese, B., Elhaik, E., Evans, J.D., Foster, L.J., Graur, D., Guigo, R., HGSC production teams, Hoff, K.J., Holder, M.E., Hudson, M.E., Hunt, G.J., Jiang, H., Joshi, V., Khetani, R.S., Kosarev, P., Kovar, C.L., Ma, J., Maleszka, R., Moritz, R.F., Munoz‐Torres, M.C., Murphy, T.D., Muzny, D.M., Newsham, I.F., Reese, J.T., Robertson, H.M., Robinson, G.E., Rueppell, O., Solovyev, V., Stanke, M., Stolle, E., Tsuruda, J.M., Vaerenbergh, M.V., Waterhouse, R.M., Weaver, D.B., Whitfield, C.W., Wu, Y., Zdobnov, E.M., Zhang, L., Zhu, D., Gibbs, R.A., and Honey Bee Genome Sequencing Consortium. 2014. Finding the missing honey bee genes: lessons learned from a genome upgrade. BMC Genomics 15:86. doi: 10.1186/1471-2164-15-86.
  Fernández‐Suárez, X.M. and Schuster, M.K. 2010. Using the Ensembl genome server to browse genomic sequence data. Curr. Protoc. Bioinform. 30:1.15.1‐1.15.48.
  Foissac, S., Gouzy, J., Rombauts, S., Mathé, C., Amselem, J., Sterck, L., Van de Peer, Y., Rouzé, P., and Schiex, T. 2008. Genome annotation in plants and fungi: EuGene as a model platform. Curr. Bioinformatics 3:87‐97. doi: 10.2174/157489308784340702.
  Gibney, G. and Baxevanis, A.D. 2011. Searching NCBI databases using Entrez. Curr. Protoc. Bioinform. 34:1.3.1‐1.3.25.
  Goff, S.A., Vaughn, M., McKay, S., Lyons, E., Stapleton, A.E., Gessler, D., Matasci, N., Wang, L., Hanlon, M., Lenards, A., Muir, A., Merchant, N., Lowry, S., Mock, S., Helmke, M., Kubach, A., Narro, M., Hopkins, N., Micklos, D., Hilgert, U., Gonzales, M., Jordan, C., Skidmore, E., Dooley, R., Cazes, J., McLay, R., Lu, Z., Pasternak, S., Koesterke, L., Piel, W.H., Grene, R., Noutsos, C., Gendler, K., Feng, X., Tang, C., Lent, M., Kim, S.J., Kvilekval, K., Manjunath, B.S., Tannen, V., Stamatakis, A., Sanderson, M., Welch, S.M., Cranston, K.A., Soltis, P., Soltis, D., O'Meara, B., Ane, C., Brutnell, T., Kleibenstein, D.J., White, J.W., Leebens‐Mack, J., Donoghue, M.J., Spalding, E.P., Vision, T.J., Myers, C.R., Lowenthal, D., Enquist, B.J., Boyle, B., Akoglu, A., Andrews, G., Ram, S., Ware, D., Stein, L., and Stanzione, D. 2011. The iPlant collaborative: cyberinfrastructure for plant biology. Front. Plant Sci. 2:34. doi: 10.3389/fpls.2011.00034.
  Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q., Chen, Z., Mauceli, E., Hacohen, N., Gnirke, A., Rhind, N., di Palma, F., Birren, B.W., Nusbaum, C., Lindblad‐Toh, K., Friedman, N., and Regev, A. 2011. Full‐length transcriptome assembly from RNA‐Seq data without a reference genome. Nat. Biotechnol. 29:644‐652. doi: 10.1038/nbt.1883.
  Gross, S.S., Do, C.B., Sirota, M., and Batzoglou, S. 2007. CONTRAST: a discriminative, phylogeny‐free approach to multiple informant de novo gene prediction. Genome Biol. 8:R269. doi: 10.1186/gb-2007-8-12-r269.
  Guigó, R., Flicek, P., Abril, J.F., Reymond, A., Lagarde, J., Denoeud, F., Antonarakis, S., Ashburner, M., Bajic, V.B., Birney, E., Castelo, R., Eyras, E., Ucla, C., Gingeras, T.R., Harrow, J., Hubbard, T., Lewis, S.E., and Reese, M.G. 2006. EGASP: the human ENCODE genome annotation assessment project. Genome Biol. 7:S2.1‐31. doi: 10.1186/gb-2006-7-s1-s2.
  Haas, B.J., Salzberg, S.L., Zhu, W., Pertea, M., Allen, J.E., Orvis, J., White, O., Buell, C.R., and Wortman, J.R. 2008. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 9:R7. doi: 10.1186/gb-2008-9-1-r7.
  Han, Y. and Wessler, S.R. 2010. MITE‐Hunter: a program for discovering miniature inverted‐repeat transposable elements from genomic sequences. Nucleic Acids Res. 38:e199. doi: 10.1093/nar/gkq862.
  Hoff, K.J. and Stanke, M. 2013. WebAUGUSTUS—a web service for training AUGUSTUS and predicting genes in eukaryotes. Nucleic Acids Res. 41:W123‐W128. doi: 10.1093/nar/gkt418.
  Holt, C. and Yandell, M. 2011. MAKER2: an annotation pipeline and genome‐database management tool for second‐generation genome projects. BMC Bioinformatics 12:491. doi: 10.1186/1471-2105-12-491.
  Jurka, J., Kapitonov, V.V., Pavlicek, A., Klonowski, P., Kohany, O., and Walichiewicz, J. 2005. Repbase update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110:462‐467. doi: 10.1159/000084979.
  Kapustin, Y., Souvorov, A., Tatusova, T., and Lipman, D. 2008. Splign: algorithms for computing spliced alignments with identification of paralogs. Biol. Direct 3:20. doi: 10.1186/1745-6150-3-20.
  Kent, W.J. 2002. BLAT—the BLAST‐like alignment tool. Genome Res. 12:656‐664. doi: 10.1101/gr.229202.
  Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. 2013. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14:R36. doi: 10.1186/gb-2013-14-4-r36.
  Korf, I. 2004. Gene finding in novel genomes. BMC Bioinformatics 5:59. doi: 10.1186/1471-2105-5-59.
  Korf, I., Yandell, M., and Bedell, J. 2003. BLAST. O'Reilly and Associates, Inc., Sebastopol, CA.
  Ladunga, I. 2009. Finding homologs in amino acid sequences using network BLAST searches. Curr. Protoc. Bioinform. 25:3.4.1‐3.4.34.
  Lamesch, P., Dreher, K., Swarbreck, D., Sasidharan, R., Reiser, L., and Huala, E. 2010. Using the Arabidopsis Information Resource (TAIR) to find information about Arabidopsis genes. Curr. Protoc. Bioinform. 30:1.11.1‐1.11.51.
  Lamesch, P., Berardini, T.Z., Li, D., Swarbreck, D., Wilks, C., Sasidharan, R., Muller, R., Dreher, K., Alexander, D.L., Garcia‐Hernandez, M., Karthikeyan, A.S., Lee, C.H., Nelson, W.D., Ploetz, L., Singh, S., Wensel, A., and Huala, E. 2012. The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res. 40:D1202‐D1210. doi: 10.1093/nar/gkr1090.
  Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., Funke, R., Gage, D., Harris, K., Heaford, A., Howland, J., Kann, L., Lehoczky, J., LeVine, R., McEwan, P., McKernan, K., Meldrim, J., Mesirov, J.P., Miranda, C., Morris, W., Naylor, J., Raymond, C., Rosetti, M., Santos, R., Sheridan, A., Sougnez, C., Stange‐Thomann, Y., Stojanovic, N., Subramanian, A., Wyman, D., Rogers, J., Sulston, J., Ainscough, R., Beck, S., Bentley, D., Burton, J., Clee, C., Carter, N., Coulson, A., Deadman, R., Deloukas, P., Dunham, A., Dunham, I., Durbin, R., French, L., Grafham, D., Gregory, S., Hubbard, T., Humphray, S., Hunt, A., Jones, M., Lloyd, C., McMurray, A., Matthews, L., Mercer, S., Milne, S., Mullikin, J.C., Mungall, A., Plumb, R., Ross, M., Shownkeen, R., Sims, S., Waterston, R.H., Wilson, R.K., Hillier, L.W., McPherson, J.D., Marra, M.A., Mardis, E.R., Fulton, L.A., Chinwalla, A.T., Pepin, K.H., Gish, W.R., Chissoe, S.L., Wendl, M.C., Delehaunty, K.D., Miner, T.L., Delehaunty, A., Kramer, J.B., Cook, L.L., Fulton, R.S., Johnson, D.L., Minx, P.J., Clifton, S.W., Hawkins, T., Branscomb, E., Predki, P., Richardson, P., Wenning, S., Slezak, T., Doggett, N., Cheng, J.F., Olsen, A., Lucas, S., Elkin, C., Uberbacher, E., Frazier, M., Gibbs, R.A., Muzny, D.M., Scherer, S.E., Bouck, J.B., Sodergren, E.J., Worley, K.C., Rives, C.M., Gorrell, J.H., Metzker, M.L., Naylor, S.L., Kucherlapati, R.S., Nelson, D.L., Weinstock, G.M., Sakaki, Y., Fujiyama, A., Hattori, M., Yada, T., Toyoda, A., Itoh, T., Kawagoe, C., Watanabe, H., Totoki, Y., Taylor, T., Weissenbach, J., Heilig, R., Saurin, W., Artiguenave, F., Brottier, P., Bruls, T., Pelletier, E., Robert, C., Wincker, P., Smith, D.R., Doucette‐Stamm, L., Rubenfield, M., Weinstock, K., Lee, H.M., Dubois, J., Rosenthal, A., Platzer, M., Nyakatura, G., Taudien, S., Rump, A., Yang, H., Yu, J., Wang, J., Huang, G., Gu, J., Hood, L., Rowen, L., Madan, A., Qin, S., Davis, R.W., Federspiel, N.A., Abola, A.P., Proctor, M.J., Myers, R.M., Schmutz, J., Dickson, M., Grimwood, J., Cox, D.R., Olson, M.V., Kaul, R., Raymond, C., Shimizu, N., Kawasaki, K., Minoshima, S., Evans, G.A., Athanasiou, M., Schultz, R., Roe, B.A., Chen, F., Pan, H., Ramser, J., Lehrach, H., Reinhardt, R., McCombie, W.R., de la Bastide, M., Dedhia, N., Blöcker, H., Hornischer, K., Nordsiek, G., Agarwala, R., Aravind, L., Bailey, J.A., Bateman, A., Batzoglou, S., Birney, E., Bork, P., Brown, D.G., Burge, C.B., Cerutti, L., Chen, H.C., Church, D., Clamp, M., Copley, R.R., Doerks, T., Eddy, S.R., Eichler, E.E., Furey, T.S., Galagan, J., Gilbert, J.G., Harmon, C., Hayashizaki, Y., Haussler, D., Hermjakob, H., Hokamp, K., Jang, W., Johnson, L.S., Jones, T.A., Kasif, S., Kaspryzk, A., Kennedy, S., Kent, W.J., Kitts, P., Koonin, E.V., Korf, I., Kulp, D., Lancet, D., Lowe, T.M., McLysaght, A., Mikkelsen, T., Moran, J.V., Mulder, N., Pollara, V.J., Ponting, C.P., Schuler, G., Schultz, J., Slater, G., Smit, A.F., Stupka, E., Szustakowki, J., Thierry‐Mieg, D., Thierry‐Mieg, J., Wagner, L., Wallis, J., Wheeler, R., Williams, A., Wolf, Y.I., Wolfe, K.H., Yang, S.P., Yeh, R.F., Collins, F., Guyer, M.S., Peterson, J., Felsenfeld, A., Wetterstrand, K.A., Patrinos, A., Morgan, M.J., de Jong, P., Catanese, J.J., Osoegawa, K., Shizuya, H., Choi, S., Chen, Y.J., Szustakowki, J., and International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409:860‐921. doi: 10.1038/35057062.
  Lee, E., Helt, G.A., Reese, J.T., Munoz‐Torres, M.C., Childers, C.P., Buels, R.M., Stein, L., Holmes, I.H., Elsik, C.G., and Lewis, S.E. 2013. Web Apollo: a web‐based genomic annotation editing platform. Genome Biol. 14:R93. doi: 10.1186/gb-2013-14-8-r93.
  Liu, Q., Mackey, A.J., Roos, D.S., and Pereira, F.C.N. 2008. Evigan: a hidden variable model for integrating gene evidence for eukaryotic gene prediction. Bioinformatics 24:597‐605. doi: 10.1093/bioinformatics/btn004.
  Lomsadze, A., Burns, P.D., and Borodovsky, M. 2014. Integration of mapped RNA‐Seq reads into automatic training of eukaryotic gene finding algorithm. Nucleic Acids Res. 42:1‐8. doi: 10.1093/nar/gku557.
  Lomsadze, A., Ter‐Hovhannisyan, V., Chernoff, Y.O., and Borodovsky, M. 2005. Gene identification in novel eukaryotic genomes by self‐training algorithm. Nucleic Acids Res. 33:6494‐6506. doi: 10.1093/nar/gki937.
  Meyer, I.M. and Durbin, R. 2004. Gene structure conservation aids similarity based gene prediction. Nucleic Acids Res. 32:776‐783. doi: 10.1093/nar/gkh211.
  Ming, R., VanBuren, R., Liu, Y., Yang, M., Han, Y., Li, L.T., Zhang, Q., Kim, M.J., Schatz, M.C., Campbell, M., Li, J., Bowers, J.E., Tang, H., Lyons, E., Ferguson, A.A., Narzisi, G., Nelson, D.R., Blaby‐Haas, C.E., Gschwend, A.R., Jiao, Y., Der, J.P., Zeng, F., Han, J., Min, X.J., Hudson, K.A., Singh, R., Grennan, A.K., Karpowicz, S.J., Watling, J.R., Ito, K., Robinson, S.A., Hudson, M.E., Yu, Q., Mockler, T.C., Carroll, A., Zheng, Y., Sunkar, R., Jia, R., Chen, N., Arro, J., Wai, C.M., Wafula, E., Spence, A., Han, Y., Xu, L., Zhang, J., Peery, R., Haus, M.J., Xiong, W., Walsh, J.A., Wu, J., Wang, M.L., Zhu, Y.J., Paull, R.E., Britt, A.B., Du, C., Downie, S.R., Schuler, M.A., Michael, T.P., Long, S.P., Ort, D.R., Schopf, J.W., Gang, D.R., Jiang, N., Yandell, M., dePamphilis, C.W., Merchant, S.S., Paterson, A.H., Buchanan, B.B., Li, S., and Shen‐Miller, J. 2013. Genome of the long‐living sacred lotus (Nelumbo nucifera Gaertn.). Genome Biol. 14:R41. doi: 10.1186/gb-2013-14-5-r41.
  Oliver, S.L., Lenards, A.J., Barthelson, R.A., Merchant, N., and McKay, S.J. 2013. Using the iPlant collaborative discovery environment. Curr. Protoc. Bioinform. 42:1.22.1‐1.22.26.
  Pertea, M., Pertea, G.M., Antonescu, C.M., Chang, T.‐C., Mendell, J.T., and Salzberg, S.L. 2015. StringTie enables improved reconstruction of a transcriptome from RNA‐seq reads. Nat. Biotechnol. 33:290‐295. doi: 10.1038/nbt.3122.
  Prado‐Martinez, J., Sudmant, P.H., Kidd, J.M., Li, H., Kelley, J.L., Lorente‐Galdos, B., Veeramah, K.R., Woerner, A.E., O'Connor, T.D., Santpere, G., Cagan, A., Theunert, C., Casals, F., Laayouni, H., Munch, K., Hobolth, A., Halager, A.E., Malig, M., Hernandez‐Rodriguez, J., Hernando‐Herraez, I., Prüfer, K., Pybus, M., Johnstone, L., Lachmann, M., Alkan, C., Twigg, D., Petit, N., Baker, C., Hormozdiari, F., Fernandez‐Callejo, M., Dabad, M., Wilson, M.L., Stevison, L., Camprubí, C., Carvalho, T., Ruiz‐Herrera, A., Vives, L., Mele, M., Abello, T., Kondova, I., Bontrop, R.E., Pusey, A., Lankester, F., Kiyang, J.A., Bergl, R.A., Lonsdorf, E., Myers, S., Ventura, M., Gagneux, P., Comas, D., Siegismund, H., Blanc, J., Agueda‐Calpena, L., Gut, M., Fulton, L., Tishkoff, S.A., Mullikin, J.C., Wilson, R.K., Gut, I.G., Gonder, M.K., Ryder, O.A., Hahn, B.H., Navarro, A., Akey, J.M., Bertranpetit, J., Reich, D., Mailund, T., Schierup, M.H., Hvilsom, C., Andrés, A.M., Wall, J.D., Bustamante, C.D., Hammer, M.F., Eichler, E.E., and Marques‐Bonet, T. 2013. Great ape genetic diversity and population history. Nature 499:471‐475. doi: 10.1038/nature12228.
  Price, A.L., Jones, N.C., and Pevzner, P.A. 2005. De novo identification of repeat families in large genomes. Bioinformatics 21:i351‐i358. doi: 10.1093/bioinformatics/bti1018.
  Quinlan, A.R. 2014. BEDTools: the Swiss‐army tool for genome feature analysis. Curr. Protoc. Bioinform. 47:11.12.1‐11.12.34.
  Quinlan, A.R. and Hall, I.M. 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26:841‐842. doi: 10.1093/bioinformatics/btq033.
  Robertson, G., Schein, J., Chiu, R., Corbett, R., Field, M., Jackman, S.D., Mungall, K., Lee, S., Okada, H.M., Qian, J.Q., Griffith, M., Raymond, A., Thiessen, N., Cezard, T., Butterfield, Y.S., Newsome, R., Chan, S.K., She, R., Varhol, R., Kamoh, B., Prabhu, A.L., Tam, A., Zhao, Y., Moore, R.A., Hirst, M., Marra, M.A., Jones, S.J., Hoodless, P.A., and Birol, I. 2010. De novo assembly and analysis of RNA‐seq data. Nat. Methods 7:909‐912. doi: 10.1038/nmeth.1517.
  Robinson, J.T., Thorvaldsdóttir, H., Winckler, W., Guttman, M., Lander, E.S., Getz, G., and Mesirov, J.P. 2011. Integrative genomics viewer. Nat. Biotechnol. 29:24‐26. doi: 10.1038/nbt.1754.
  Schatz, M.C., Maron, L.G., Stein, J.C., Hernandez Wences, A., Gurtowski, J., Biggers, E., Lee, H., Kramer, M., Antoniou, E., Ghiban, E., Wright, M.H., Chia, J.M., Ware, D., McCouch, S.R., and McCombie, W.R. 2014. Whole genome de novo assemblies of three divergent strains of rice, Oryza sativa, document novel gene space of aus and indica. Genome Biol. 15:506. doi: 10.1186/s13059-014-0506-z.
  Schweikert, G., Zien, A., Zeller, G., Behr, J., Dieterich, C., Ong, C.S., Philips, P., De Bona, F., Hartmann, L., Bohlen, A., Krüger, N., Sonnenburg, S., and Rätsch, G. 2009. mGene: accurate SVM‐based gene finding with an application to nematode genomes. Genome Res. 19:2133‐2143. doi: 10.1101/gr.090597.108.
  Shapiro, M.D., Kronenberg, Z., Li, C., Domyan, E.T., Pan, H., Campbell, M., Tan, H., Huff, C.D., Hu, H., Vickrey, A.I., Nielsen, S.C., Stringham, S.A., Hu, H., Willerslev, E., Gilbert, M.T., Yandell, M., Zhang, G., and Wang, J. 2013. Genomic diversity and evolution of the head crest in the rock pigeon. Science 339:1063‐1067. doi: 10.1126/science.1230422.
  Skinner, M.E. and Holmes, I.H. 2010. Setting up the JBrowse genome browser. Curr. Protoc. Bioinform. 32:9.13.1‐9.13.13.
  Skinner, M.E., Uzilov, A.V., Stein, L.D., Mungall, C.J., and Holmes, I.H. 2009. JBrowse: a next‐generation genome browser. Genome Res. 19:1630‐1638. doi: 10.1101/gr.094607.109.
  Slater, G.S.C. and Birney, E. 2005. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6:31. doi: 10.1186/1471-2105-6-31.
  Smith, C.D., Zimin, A., Holt, C., Abouheif, E., Benton, R., Cash, E., Croset, V., Currie, C.R., Elhaik, E., Elsik, C.G., Fave, M.J., Fernandes, V., Gadau, J., Gibson, J.D., Graur, D., Grubbs, K.J., Hagen, D.E., Helmkampf, M., Holley, J.A., Hu, H., Viniegra, A.S., Johnson, B.R., Johnson, R.M., Khila, A., Kim, J.W., Laird, J., Mathis, K.A., Moeller, J.A., Muñoz‐Torres, M.C., Murphy, M.C., Nakamura, R., Nigam, S., Overson, R.P., Placek, J.E., Rajakumar, R., Reese, J.T., Robertson, H.M., Smith, C.R., Suarez, A.V., Suen, G., Suhr, E.L., Tao, S., Torres, C.W., van Wilgenburg, E., Viljakainen, L., Walden, K.K., Wild, A.L., Yandell, M., Yorke, J.A., and Tsutsui, N.D. 2011. Draft genome of the globally widespread and invasive Argentine ant (Linepithema humile). Proc. Natl. Acad. Sci. U.S.A. 108:5673‐5678. doi: 10.1073/pnas.1008617108.
  Smith, C.R., Smith, C.D., Robertson, H.M., Helmkampf, M., Zimin, A., Yandell, M., Holt, C., Hu, H., Abouheif, E., Benton, R., Cash, E., Croset, V., Currie, C.R., Elhaik, E., Elsik, C.G., Favé, M.J., Fernandes, V., Gibson, J.D., Graur, D., Gronenberg, W., Grubbs, K.J., Hagen, D.E., Viniegra, A.S., Johnson, B.R., Johnson, R.M., Khila, A., Kim, J.W., Mathis, K.A., Munoz‐Torres, M.C., Murphy, M.C., Mustard, J.A., Nakamura, R., Niehuis, O., Nigam, S., Overson, R.P., Placek, J.E., Rajakumar, R., Reese, J.T., Suen, G., Tao, S., Torres, C.W., Tsutsui, N.D., Viljakainen, L., Wolschin, F., and Gadau, J. 2011. Draft genome of the red harvester ant Pogonomyrmex barbatus. Proc. Natl. Acad. Sci. U.S.A. 108:5667‐5672. doi: 10.1073/pnas.1007901108.
  Smith, J.J., Kuraku, S., Holt, C., Sauka‐Spengler, T., Jiang, N., Campbell, M.S., Yandell, M.D., Manousaki, T., Meyer, A., Bloom, O.E., Morgan, J.R., Buxbaum, J.D., Sachidanandam, R., Sims, C., Garruss, A.S., Cook, M., Krumlauf, R., Wiedemann, L.M., Sower, S.A., Decatur, W.A., Hall, J.A., Amemiya, C.T., Saha, N.R., Buckley, K.M., Rast, J.P., Das, S., Hirano, M., McCurley, N., Guo, P., Rohner, N., Tabin, C.J., Piccinelli, P., Elgar, G., Ruffier, M., Aken, B.L., Searle, S.M., Muffato, M., Pignatelli, M., Herrero, J., Jones, M., Brown, C.T., Chung‐Davidson, Y.W., Nanlohy, K.G., Libants, S.V., Yeh, C.Y., McCauley, D.W., Langeland, J.A., Pancer, Z., Fritzsch, B., de Jong, P.J., Zhu, B., Fulton, L.L., Theising, B., Flicek, P., Bronner, M.E., Warren, W.C., Clifton, S.W., Wilson, R.K., and Li, W. 2013. Sequencing of the sea lamprey (Petromyzon marinus) genome provides insights into vertebrate evolution. Nat. Genet. 45:415‐421, 421e1‐2. doi: 10.1038/ng.2568.
  Solovyev, V., Kosarev, P., Seledsov, I., and Vorobyev, D. 2006. Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol. 7:S10.1‐12. doi: 10.1186/gb-2006-7-s1-s10.
  Stanke, M. and Waack, S. 2003. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19:ii215‐ii225. doi: 10.1093/bioinformatics/btg1080.
  Stanke, M., Diekhans, M., Baertsch, R., and Haussler, D. 2008. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24:637‐644. doi: 10.1093/bioinformatics/btn013.
  Stein, L.D., Mungall, C., Shu, S., Caudy, M., Mangone, M., Day, A., Nickerson, E., Stajich, J.E., Harris, T.W., Arva, A., and Lewis, S. 2002. The generic genome browser: a building block for a model organism system database. Genome Res. 12:1599‐1610. doi: 10.1101/gr.403602.
  Suen, G., Teiling, C., Li, L., Holt, C., Abouheif, E., Bornberg‐Bauer, E., Bouffard, P., Caldera, E.J., Cash, E., Cavanaugh, A., Denas, O., Elhaik, E., Favé, M.J., Gadau, J., Gibson, J.D., Graur, D., Grubbs, K.J., Hagen, D.E., Harkins, T.T., Helmkampf, M., Hu, H., Johnson, B.R., Kim, J., Marsh, S.E., Moeller, J.A., Muñoz‐Torres, M.C., Murphy, M.C., Naughton, M.C., Nigam, S., Overson, R., Rajakumar, R., Reese, J.T., Scott, J.J., Smith, C.R., Tao, S., Tsutsui, N.D., Viljakainen, L., Wissler, L., Yandell, M.D., Zimmer, F., Taylor, J., Slater, S.C., Clifton, S.W., Warren, W.C., Elsik, C.G., Smith, C.D., Weinstock, G.M., Gerardo, N.M., and Currie, C.R. 2011. The genome sequence of the leaf‐cutter ant Atta cephalotes reveals insights into its obligate symbiotic lifestyle. PLoS Genet. 7:e1002007. doi: 10.1371/journal.pgen.1002007.
  Tarailo‐Graovac, M. and Chen, N. 2009. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinform. 25:4.10.1‐4.10.14.
  Ter‐Hovhannisyan, V., Lomsadze, A., Chernoff, Y.O., and Borodovsky, M. 2008. Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res. 18:1979‐1990. doi: 10.1101/gr.081612.108.
  The Arabidopsis Genome Initiative. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796‐815. doi: 10.1038/35048692.
  Thibaud‐Nissen, F., Souvorov, A., Murphy, T., DiCuccio, M., and Kitts, P. 2013. Eukaryotic genome annotation pipeline. In The NCBI Handbook. 2nd edition. National Center for Biotechnology Information. Bethesda, MD. http://www.ncbi.nlm.nih.gov/books/NBK169439/
  Thorvaldsdóttir, H., Robinson, J.T., Turner, D., and Mesirov, J.P. 2015. A genomic data viewer for iPad. Genome Biol. 16:1‐6. doi: 10.1186/s13059-015-0595-3.
  Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H., Salzberg, S.L., Rinn, J.L., and Pachter, L. 2012. Differential gene and transcript expression analysis of RNA‐seq experiments with TopHat and Cufflinks. Nat. Protoc. 7:562‐578. doi: 10.1038/nprot.2012.016.
  Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A., Gocayne, J.D., Amanatides, P., Ballew, R.M., Huson, D.H., Wortman, J.R., Zhang, Q., Kodira, C.D., Zheng, X.H., Chen, L., Skupski, M., Subramanian, G., Thomas, P.D., Zhang, J., Gabor Miklos, G.L., Nelson, C., Broder, S., Clark, A.G., Nadeau, J., McKusick, V.A., Zinder, N., Levine, A.J., Roberts, R.J., Simon, M., Slayman, C., Hunkapiller, M., Bolanos, R., Delcher, A., Dew, I., Fasulo, D., Flanigan, M., Florea, L., Halpern, A., Hannenhalli, S., Kravitz, S., Levy, S., Mobarry, C., Reinert, K., Remington, K., Abu‐Threideh, J., Beasley, E., Biddick, K., Bonazzi, V., Brandon, R., Cargill, M., Chandramouliswaran, I., Charlab, R., Chaturvedi, K., Deng, Z., Di Francesco, V., Dunn, P., Eilbeck, K., Evangelista, C., Gabrielian, A.E., Gan, W., Ge, W., Gong, F., Gu, Z., Guan, P., Heiman, T.J., Higgins, M.E., Ji, R.R., Ke, Z., Ketchum, K.A., Lai, Z., Lei, Y., Li, Z., Li, J., Liang, Y., Lin, X., Lu, F., Merkulov, G.V., Milshina, N., Moore, H.M., Naik, A.K., Narayan, V.A., Neelam, B., Nusskern, D., Rusch, D.B., Salzberg, S., Shao, W., Shue, B., Sun, J., Wang, Z., Wang, A., Wang, X., Wang, J., Wei, M., Wides, R., Xiao, C., Yan, C., Yao, A., Ye, J., Zhan, M., Zhang, W., Zhang, H., Zhao, Q., Zheng, L., Zhong, F., Zhong, W., Zhu, S., Zhao, S., Gilbert, D., Baumhueter, S., Spier, G., Carter, C., Cravchik, A., Woodage, T., Ali, F., An, H., Awe, A., Baldwin, D., Baden, H., Barnstead, M., Barrow, I., Beeson, K., Busam, D., Carver, A., Center, A., Cheng, M.L., Curry, L., Danaher, S., Davenport, L., Desilets, R., Dietz, S., Dodson, K., Doup, L., Ferriera, S., Garg, N., Gluecksmann, A., Hart, B., Haynes, J., Haynes, C., Heiner, C., Hladun, S., Hostin, D., Houck, J., Howland, T., Ibegwam, C., Johnson, J., Kalush, F., Kline, L., Koduru, S., Love, A., Mann, F., May, D., McCawley, S., McIntosh, T., McMullen, I., Moy, M., Moy, L., Murphy, B., Nelson, K., Pfannkoch, C., Pratts, E., Puri, V., Qureshi, H., Reardon, M., Rodriguez, R., Rogers, Y.H., Romblad, D., Ruhfel, B., Scott, R., Sitter, C., Smallwood, M., Stewart, E., Strong, R., Suh, E., Thomas, R., Tint, N.N., Tse, S., Vech, C., Wang, G., Wetter, J., Williams, S., Williams, M., Windsor, S., Winn‐Deen, E., Wolfe, K., Zaveri, J., Zaveri, K., Abril, J.F., Guigó, R., Campbell, M.J., Sjolander, K.V., Karlak, B., Kejariwal, A., Mi, H., Lazareva, B., Hatton, T., Narechania, A., Diemer, K., Muruganujan, A., Guo, N., Sato, S., Bafna, V., Istrail, S., Lippert, R., Schwartz, R., Walenz, B., Yooseph, S., Allen, D., Basu, A., Baxendale, J., Blick, L., Caminha, M., Carnes‐Stine, J., Caulk, P., Chiang, Y.H., Coyne, M., Dahlke, C., Mays, A., Dombroski, M., Donnelly, M., Ely, D., Esparham, S., Fosler, C., Gire, H., Glanowski, S., Glasser, K., Glodek, A., Gorokhov, M., Graham, K., Gropman, B., Harris, M., Heil, J., Henderson, S., Hoover, J., Jennings, D., Jordan, C., Jordan, J., Kasha, J., Kagan, L., Kraft, C., Levitsky, A., Lewis, M., Liu, X., Lopez, J., Ma, D., Majoros, W., McDaniel, J., Murphy, S., Newman, M., Nguyen, T., Nguyen, N., Nodell, M., Pan, S., Peck, J., Peterson, M., Rowe, W., Sanders, R., Scott, J., Simpson, M., Smith, T., Sprague, A., Stockwell, T., Turner, R., Venter, E., Wang, M., Wen, M., Wu, D., Wu, M., Xia, A., Zandieh, A., and Zhu, X. 2001. The sequence of the human genome. Science 291:1304‐1351. doi: 10.1126/science.1058040.
  Vonk, F.J., Casewell, N.R., Henkel, C.V., Heimberg, A.M., Jansen, H.J., McCleary, R.J., Kerkkamp, H.M., Vos, R.A., Guerreiro, I., Calvete, J.J., Wüster, W., Woods, A.E., Logan, J.M., Harrison, R.A., Castoe, T.A., de Koning, A.P., Pollock, D.D., Yandell, M., Calderon, D., Renjifo, C., Currier, R.B., Salgado, D., Pla, D., Sanz, L., Hyder, A.S., Ribeiro, J.M., Arntzen, J.W., van den Thillart, G.E., Boetzer, M., Pirovano, W., Dirks, R.P., Spaink, H.P., Duboule, D., McGlinn, E., Kini, R.M., and Richardson, M.K. 2013. The king cobra genome reveals dynamic gene evolution and adaptation in the snake venom system. Proc. Natl. Acad. Sci. U.S.A. 110:20651‐20656. doi: 10.1073/pnas.1314702110.
  Wang, K., Singh, D., Zeng, Z., Coleman, S.J., Huang, Y., Savich, G.L., He, X., Mieczkowski, P., Grimm, S.A., Perou, C.M., MacLeod, J.N., Chiang, D.Y., Prins, J.F., and Liu, J. 2010. MapSplice: accurate mapping of RNA‐seq reads for splice junction discovery. Nucleic Acids Res. 38:1‐14. doi: 10.1093/nar/gkq622.
  Wang, Y., Chen, L., Song, N., and Lei, X. 2015. GASS: genome structural annotation for eukaryotes based on species similarity. BMC Genomics 16:1‐14. doi: 10.1186/s12864-015-1353-3.
  Wegrzyn, J.L., Liechty, J.D., Stevens, K.A., Wu, L.S., Loopstra, C.A., Vasquez‐Gross, H.A., Dougherty, W.M., Lin, B.Y., Zieve, J.J., Martínez‐García, P.J., Holt, C., Yandell, M., Zimin, A.V., Yorke, J.A., Crepeau, M.W., Puiu, D., Salzberg, S.L., Dejong, P.J., Mockaitis, K., Main, D., Langley, C.H., and Neale, D.B. 2014. Unique features of the loblolly pine (Pinus taeda L.) megagenome revealed through sequence annotation. Genetics 196:891‐909. doi: 10.1534/genetics.113.159996.
  Wu, T.D. and Nacu, S. 2010. Fast and SNP‐tolerant detection of complex variants and splicing in short reads. Bioinformatics 26:873‐881. doi: 10.1093/bioinformatics/btq057.
  Yandell, M. and Ence, D. 2012. A beginner's guide to eukaryotic genome annotation. Nat. Rev. Genet. 13:329‐342. doi: 10.1038/nrg3174.
Internet Resources
  http://repeatmasker.org
  RepeatMasker. Contributed by A. Smit, R. Hubley, and P. Green.
  http://www.repeatmasker.org/RepeatModeler.html
  RepeatModeler. Contributed by A. Smit and R. Hubley.
  http://weatherby.genetics.utah.edu/MAKER/data/ProtExcluder1.1.tar.gz
  Repeat Library Construction. This wiki site describes how to use MAKER to construct a repeat library from a newly sequenced genome. Contributed by N. Jiang.
  http://www.novocraft.com/products/novoalign/
  NovoAlign
  http://glean-gene.sourceforge.net/
  GLEAN
  http://aws.amazon.com/ec2/
  Amazon EC2
  ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR_gene_confidence_ranking/DOCUMENTATION_TAIR_Gene_Confidence.pdf
  The Arabidopsis Information Resource (TAIR) documentation for the gene model and exon confidence ranking system. 2009.
  http://www.broadinstitute.org/annotation/argo
  Argo
  http://www.ncbi.nlm.nih.gov/core/assets/genome/files/Gnomon-description.pdf
  Gnomon: NCBI eukaryotic gene prediction tool. 2010. Contributed by A. Souvorov, Y. Kapustin, B. Kiryutin, V. Chetvernin, T. Tatusova, and D. Lipman. National Center for Biotechnology Information. Bethesda, MD.