Genome Annotation and Curation Using MAKER and MAKER‐P

Michael S. Campbell¹, Carson Holt², Barry Moore², Mark Yandell²

¹ Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah; ² USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, Utah
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 4.11
DOI: 10.1002/0471250953.bi0411s48
Online Posting Date: December 2014
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


This unit describes how to use the genome annotation and curation tools MAKER and MAKER‐P to annotate protein‐coding and noncoding RNA genes in newly assembled genomes, update/combine legacy annotations in light of new evidence, add quality metrics to annotations from other pipelines, and map existing annotations to a new assembly. MAKER and MAKER‐P can rapidly annotate genomes of any size, and scale to match available computational resources. © 2014 by John Wiley & Sons, Inc.

Keywords: genome annotation; comparative genomics; gene finding; plants


Table of Contents

  • Introduction
  • Strategic Planning
  • Basic Protocol 1: De Novo Genome Annotation Using MAKER
  • Alternate Protocol 1: De Novo Genome Annotation Using Pre‐Existing Evidence Alignments and Gene Predictions
  • Alternate Protocol 2: Parallelized De Novo Genome Annotation Using MPI
  • Alternate Protocol 3: Parallelized De Novo Genome Annotation Without MPI
  • Support Protocol 1: Training Gene Finders for Use with MAKER
  • Support Protocol 2: Renaming Genes for GenBank Submission
  • Support Protocol 3: Assigning Putative Gene Function
  • Support Protocol 4: Labeling Evidence Sources for Display in Genome Browsers
  • Basic Protocol 2: Updating/Combining Legacy Annotation Datasets in Light of New Evidence
  • Basic Protocol 3: Adding MAKER's Quality‐Control Metrics to Annotations from Another Pipeline
  • Basic Protocol 4: Mapping Annotations to a New Assembly
  • Basic Protocol 5: The MAKER Gene Build/Rescuing Rejected Gene Models
  • Guidelines for Understanding Results
  • Commentary
  • Figures
  • Tables


