Next Generation Sequence Assembly with AMOS

Todd J. Treangen1, Dan D. Sommer1, Florent E. Angly2, Sergey Koren3, Mihai Pop3

1 Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, 2 Advanced Water Management Centre, University of Queensland, St. Lucia, Brisbane, Australia, 3 Department of Computer Science, University of Maryland, College Park, Maryland
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 11.8
DOI: 10.1002/0471250953.bi1108s33
Online Posting Date: March, 2011
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


A Modular Open‐Source Assembler (AMOS) was designed to offer a modular approach to genome assembly. AMOS includes a wide range of tools for assembly, including the lightweight de novo assemblers Minimus and Minimo, and Bambus 2, a robust scaffolder able to handle metagenomic and polymorphic data. This protocol describes how to configure and use AMOS for the assembly of Next Generation sequence data. Additionally, we provide three tutorial examples that include bacterial, viral, and metagenomic datasets with specific tips for improving assembly quality. Curr. Protoc. Bioinform. 33:11.8.1‐11.8.18. © 2011 by John Wiley & Sons, Inc.

Keywords: next‐generation sequencing; genome assembly; open source

Table of Contents

  • Introduction
  • Basic Protocol 1: Assembly of a Small Bacterial Genome
  • Basic Protocol 2: Assembly of a Phage Genome
  • Basic Protocol 3: Metagenomic Example Assembly
  • Support Protocol 1: Downloading and Installing AMOS
  • Support Protocol 2: Modifying the Minimus/Minimo Pipeline
  • Support Protocol 3: Validating an Assembly Inside AMOS
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables

Literature Cited

   Angly, F.E., Willner, D., Prieto‐Davó, A., Edwards, R.A., Schmieder, R., Vega‐Thurber, R., Antonopoulos, D.A., Barott, K., Cottrell, M.T., Desnues, C., Dinsdale, E.A., Furlan, M., Haynes, M., Henn, M.R., Hu, Y., Kirchman, D.L., McDole, T., McPherson, J.D., Meyer, F., Miller, R.M., Mundt, E., Naviaux, R.K., Rodriguez‐Mueller, B., Stevens, R., Wegley, L., Zhang, L., Zhu, B., and Rohwer, F. 2009. The GAAS metagenomic tool and its estimations of viral and microbial average genome size in four major biomes. PLoS Comput. Biol. 5:e1000593.
   Berger, B., Laserson, J., Jojic, V., and Koller, D. 2010. Research in Computational Molecular Biology (B. Berger, ed.). Springer, Heidelberg, Germany.
   Dalloul, R.A., Long, J.A., Zimin, A.V., Aslam, L., Beal, K., Ann Blomberg, L., Bouffard, P., Burt, D.W., Crasta, O., Crooijmans, R.P., Cooper, K., Coulombe, R.A., De, S., Delany, M.E., Dodgson, J.B., Dong, J.J., Evans, C., Frederickson, K.M., Flicek, P., Florea, L., Folkerts, O., Groenen, M.A., Harkins, T.T., Herrero, J., Hoffmann, S., Megens, H.J., Jiang, A., de Jong, P., Kaiser, P., Kim, H., Kim, K.W., Kim, S., Langenberger, D., Lee, M.K., Lee, T., Mane, S., Marcais, G., Marz, M., McElroy, A.P., Modise, T., Nefedov, M., Notredame, C., Paton, I.R., Payne, W.S., Pertea, G., Prickett, D., Puiu, D., Qioa, D., Raineri, E., Ruffier, M., Salzberg, S.L., Schatz, M.C., Scheuring, C., Schmidt, C.J., Schroeder, S., Searle, S.M., Smith, E.J., Smith, J., Sonstegard, T.S., Stadler, P.F., Tafer, H., Tu, Z.J., Van Tassell, C.P., Vilella, A.J., Williams, K.P., Yorke, J.A., Zhang, L., Zhang, H.B., Zhang, X., Zhang, Y., and Reed, K.M. 2010. Multi‐platform next‐generation sequencing of the domestic turkey (Meleagris gallopavo): Genome assembly and analysis. PLoS Biol. 8:e1000475.
   Fleischmann, R., Adams, M.D., White, O., Clayton, R.A., Kirkness, E.F., Kerlavage, A.R., Bult, C.J., Tomb, J.F., Dougherty, B.A., Merrick, J.M., et al. 1995. Whole‐genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496‐512.
   Idury, R.M. and Waterman, M.S. 1995. A new algorithm for DNA sequence assembly. J. Comput. Biol. 2:291‐306.
   Kelley, D.R., Schatz, M.C., and Salzberg, S.L. 2010. Quake: quality‐aware detection and correction of sequencing errors. Genome Biol. 11:R116.
   Kislyuk, A.O., Katz, L.S., Agrawal, S., Hagen, M.S., Conley, A.B., Jayaraman, P., Nelakuditi, V., Humphrey, J.C., Sammons, S.A., Govil, D., Mair, R.D., Tatti, K.M., Tondella, M.L., Harcourt, B.H., Mayer, L.W., and Jordan, I.K. 2010. A computational genomics pipeline for prokaryotic sequencing projects. Bioinformatics 26:1819‐1826.
   Li, Y., Hu, Y., Bolund, L., and Wang, J. 2010. State of the art de novo assembly of human genomes from massively parallel sequencing data. Hum. Genomics 4:271‐277.
   Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka, J., Braverman, M.S., Chen, Y.J., Chen, Z., Dewell, S.B., Du, L., Fierro, J.M., Gomes, X.V., Godwin, B.C., He, W., Helgesen, S., Ho, C.H., Irzyk, G.P., Jando, S.C., Alenquer, M.L., Jarvie, T.P., Jirage, K.B., Kim, J.B., Knight, J.R., Lanza, J.R., Leamon, J.H., Lefkowitz, S.M., Lei, M., Li, J., Lohman, K.L., Lu, H., Makhijani, V.B., McDade, K.E., McKenna, M.P., Myers, E.W., Nickerson, E., Nobile, J.R., Plant, R., Puc, B.P., Ronan, M.T., Roth, G.T., Sarkis, G.J., Simons, J.F., Simpson, J.W., Srinivasan, M., Tartaro, K.R., Tomasz, A., Vogt, K.A., Volkmer, G.A., Wang, S.H., Wang, Y., Weiner, M.P., Yu, P., Begley, R.F., and Rothberg, J.M. 2005. Genome sequencing in microfabricated high‐density picolitre reactors. Nature 437:376‐380.
   Miller, J.R., Koren, S., and Sutton, G. 2010. Assembly algorithms for next‐generation sequencing data. Genomics 95:315‐327.
   Myers, E.W. 2000. A whole‐genome assembly of Drosophila. Science 287:2196‐2204.
   Myers, E.W. 2005. The fragment assembly string graph. Bioinformatics 21:ii79‐ii85.
   Nagarajan, N. and Pop, M. 2010. Sequencing and genome assembly using next‐generation technologies. Methods Mol. Biol. 673:1‐17.
   Pevzner, P.A., Tang, H., and Waterman, M.S. 2001. An Eulerian path approach to DNA fragment assembly. Proc. Natl. Acad. Sci. U.S.A. 98:9748‐9753.
   Phillippy, A.M., Schatz, M.C., and Pop, M. 2008. Genome assembly forensics: Finding the elusive mis‐assembly. Genome Biol. 9:R55.
   Pop, M. and Salzberg, S.L. 2008. Bioinformatics challenges of new sequencing technology. Trends Genet. 24:142‐149.
   Pop, M., Kosack, D.S., and Salzberg, S.L. 2004. Hierarchical scaffolding with Bambus. Genome Res. 14:149‐159.
   Salmela, L. 2010. Correction of sequencing errors in a mixed set of reads. Bioinformatics 26:1284‐1290.
   Sanger, F. and Coulson, A. 1975. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J. Mol. Biol. 94:441‐448.
   Schatz, M.C., Phillippy, A.M., Shneiderman, B., and Salzberg, S.L. 2007. Hawkeye: An interactive visual analytics tool for genome assemblies. Genome Biol. 8:R34.
   Schatz, M.C., Delcher, A.L., and Salzberg, S.L. 2010. Assembly of large genomes using second‐generation sequencing. Genome Res. 20:1165‐1173.
   Simpson, J.T., Wong, K., Jackman, S.D., Schein, J.E., Jones, S.J.M., and Birol, I. 2009. ABySS: A parallel assembler for short read sequence data. Genome Res. 19:1117‐1123.
   Sommer, D.D., Delcher, A.L., Salzberg, S.L., and Pop, M. 2007. Minimus: A fast, lightweight genome assembler. BMC Bioinformatics 8:64.
   Yang, X., Dorman, K.S., and Aluru, S. 2010. Reptile: Representative tiling for short read error correction. Bioinformatics 26:2526‐2533.
   Zerbino, D.R. and Birney, E. 2008. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18:821‐829.
   Zhao, X., Palmer, L.E., Bolanos, R., Mircean, C., Fasulo, D., and Wittenberg, G.M. 2010. EDAR: an efficient error detection and removal algorithm for next generation sequencing data. J. Comput. Biol. 17:1549‐1560.
Key References
   Sommer et al., 2007. See above.
  This first publication describing Minimus focused on the algorithm and implementation details. It also includes assemblies of a gene and bacterium.
   Pop et al., 2004. See above.
  This is the original Bambus publication describing the scaffolder's algorithm and implementation. A new publication describing Bambus 2, the updated scaffolder referenced throughout this manuscript, is currently under review.
Internet Resources
  AMOS Sourceforge website, where code, tutorials and general information on AMOS can be accessed.